SQL Server 2016 – CREATE/ALTER PROCEDURE must be the first statement in a query batch

Everyone knows that SQL Server started supporting the CREATE OR ALTER statement with SQL Server 2016, and that it can be used with a few object types (stored procedures, functions, triggers and views). However, with my new installation of SQL Server 2016, I noticed this error, which I was not supposed to see with 2016.
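A minimal sketch of the kind of batch that raised it (the object and column names here are illustrative, not from the original post):

CREATE OR ALTER PROCEDURE dbo.GetSalesOrders
AS
BEGIN
    SELECT SalesOrderID, OrderDate
    FROM Sales.SalesOrderHeader;
END;
GO
-- On a SQL Server 2016 instance without SP1, this batch fails with the
-- "CREATE/ALTER PROCEDURE must be the first statement in a query batch" error
-- instead of creating the procedure.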
What could be the reason? I could not immediately figure out the issue and kept wondering what went wrong; it should be working, as this is SQL Server 2016.
After a few minutes, I realized the issue;
It is all about the version I used. This feature was introduced with SQL Server 2016 Service Pack 1 (SP1), and I had not installed SP1 on my new instance. You may experience the same, hence this post to note it :).
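If you are unsure of the service pack level of an instance, a quick check with SERVERPROPERTY shows it:

SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('ProductLevel')   AS ProductLevel;   -- returns RTM, SP1, etc.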

SQL Server TRY_CONVERT returns an error instead of NULL

Can this happen? The functions TRY_CONVERT and TRY_CAST have been designed for handling conversion errors, and they should return NULL when there is an error.
But it is not always true.
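Here is a minimal sketch that reproduces the behavior (the error text is paraphrased):

-- a permitted conversion that fails at runtime returns NULL as expected
SELECT TRY_CONVERT(int, 'abc');              -- returns NULL

-- a conversion that is not permitted at all raises an error instead
SELECT TRY_CONVERT(uniqueidentifier, 100);
-- Msg 529: Explicit conversion from data type int to uniqueidentifier is not allowed.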
What is the reason? Although TRY_CONVERT returns NULL when a conversion fails at runtime, that does not happen if the conversion is not a permitted one at all. The above code tries to convert an integer into a uniqueidentifier, which is an explicitly disallowed conversion, hence it fails with an error instead of returning NULL.

SQL Server – Merging Partitions – Does it physically move data from a data file to another?

Partitioning is a common practice when maintaining a large number of records in tables, regardless of the database type: whether the database is an OLTP database or a data warehouse based on dimensional modeling. Partitioning requires a Partition Function that describes the boundary values and the number of partitions required, and a Partition Scheme that assigns File Groups to the partitions.
Here are a few posts I have made on partitioning;
Once a table is partitioned, based on the workload, we add new boundary values to introduce new partitions to the table, and we remove boundary values to combine (or remove) partitions. These two operations are done with two partition-related operations: SPLIT and MERGE. Now the question is: when we remove a partition using MERGE, and the partitions are distributed across different file groups (meaning different data files), does SQL Server move data from one file to another? Do we really have to consider it?
Let's make a database and do a simple test: create a database called AdventureWorksTest and add multiple file groups and data files.
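A minimal sketch along the lines of that setup follows; the file group and file names match the ones referenced later in the post, while the file paths are my assumptions:

CREATE DATABASE AdventureWorksTest;
GO

ALTER DATABASE AdventureWorksTest ADD FILEGROUP FileGroup1;
ALTER DATABASE AdventureWorksTest ADD FILEGROUP FileGroup2;
ALTER DATABASE AdventureWorksTest ADD FILEGROUP FileGroup3;
ALTER DATABASE AdventureWorksTest ADD FILEGROUP FileGroup4;
GO

ALTER DATABASE AdventureWorksTest
  ADD FILE (NAME = 'AdventureWorksTest_Data_01', FILENAME = 'C:\Data\AdventureWorksTest_Data_01.ndf') TO FILEGROUP FileGroup1;
ALTER DATABASE AdventureWorksTest
  ADD FILE (NAME = 'AdventureWorksTest_Data_02', FILENAME = 'C:\Data\AdventureWorksTest_Data_02.ndf') TO FILEGROUP FileGroup2;
ALTER DATABASE AdventureWorksTest
  ADD FILE (NAME = 'AdventureWorksTest_Data_03', FILENAME = 'C:\Data\AdventureWorksTest_Data_03.ndf') TO FILEGROUP FileGroup3;
ALTER DATABASE AdventureWorksTest
  ADD FILE (NAME = 'AdventureWorksTest_Data_04', FILENAME = 'C:\Data\AdventureWorksTest_Data_04.ndf') TO FILEGROUP FileGroup4;
GO

-- check the file groups of the database
SELECT name, type_desc FROM AdventureWorksTest.sys.filegroups;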
As you see, we have five file groups now.
Let's create a Partitioned Table and load it using the AdventureWorks2014 Sales.SalesOrderHeader table. Note that the below code creates a Partition Function with 2014, 2015, 2016 and 2017 as boundary values, and it creates the Partition Scheme mapping the partitions to the multiple file groups.
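A sketch along those lines, assuming a partition function named pf_OrderYear and a scheme named ps_OrderYear; the column list and the exact load are my assumptions, not the original code:

USE AdventureWorksTest;
GO

-- Partition Function with 2014, 2015, 2016 and 2017 as boundary values (5 partitions)
CREATE PARTITION FUNCTION pf_OrderYear (int)
AS RANGE RIGHT FOR VALUES (2014, 2015, 2016, 2017);
GO

-- Partition Scheme mapping the partitions to the file groups created above
CREATE PARTITION SCHEME ps_OrderYear
AS PARTITION pf_OrderYear
TO ([PRIMARY], FileGroup1, FileGroup2, FileGroup3, FileGroup4);
GO

-- Partitioned table, partitioned on the order year
CREATE TABLE dbo.SalesOrderHeader
(
    SalesOrderID int NOT NULL,
    OrderYear int NOT NULL,
    OrderDate datetime NOT NULL,
    CustomerID int NOT NULL,
    SubTotal money NOT NULL
) ON ps_OrderYear (OrderYear);
GO

-- Load it from AdventureWorks2014; depending on your copy of the data, you may need
-- to shift OrderDate (e.g. with DATEADD) so that rows fall into the 2015 and 2016 partitions
INSERT INTO dbo.SalesOrderHeader (SalesOrderID, OrderYear, OrderDate, CustomerID, SubTotal)
SELECT SalesOrderID, YEAR(OrderDate), OrderDate, CustomerID, SubTotal
FROM AdventureWorks2014.Sales.SalesOrderHeader;
GO

-- check how rows are distributed across the partitions
SELECT $PARTITION.pf_OrderYear(OrderYear) AS partition_no, COUNT(*) AS record_count
FROM dbo.SalesOrderHeader
GROUP BY $PARTITION.pf_OrderYear(OrderYear);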
Here is the result of the query.
Here is another query to see how data is distributed across the file groups.
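Something like the following works, joining the partition metadata to the file groups (assuming the table created above):

SELECT
    p.partition_number,
    fg.name AS file_group_name,
    p.rows
FROM sys.partitions AS p
    INNER JOIN sys.allocation_units AS au
        ON au.container_id = p.hobt_id AND au.type = 1   -- IN_ROW_DATA
    INNER JOIN sys.filegroups AS fg
        ON fg.data_space_id = au.data_space_id
WHERE p.object_id = OBJECT_ID('dbo.SalesOrderHeader')
ORDER BY p.partition_number;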
If I illustrate the same in a different manner;
This is how the data is distributed in my system now. If I try to delete a data file, for example AdventureWorksTest_Data_03, which belongs to FileGroup3, SQL Server will not allow me to do so, as it holds data.
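Something like the following shows the attempt and the failure (error text paraphrased from memory):

-- this fails because the file still holds data for the partitioned table
ALTER DATABASE AdventureWorksTest REMOVE FILE AdventureWorksTest_Data_03;
-- Msg 5042: The file 'AdventureWorksTest_Data_03' cannot be removed because it is not empty.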
If I decide to combine 2015 and 2016, in other words, if I want to remove the 4th partition that holds 2016 data, what will happen? Let's do it and see. The below code merges the partitions;
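A sketch of the merge, assuming the partition function created above (the boundary value being removed is 2016):

-- remove the 2016 boundary value; the 2016 partition is combined with the 2015 partition
ALTER PARTITION FUNCTION pf_OrderYear() MERGE RANGE (2016);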
Once merged, let me run the same query to see how data is partitioned now.
As you see, FileGroup3 is no longer used by the table and the number of records in FileGroup2 has increased. The number of partitions, which was 5, is now 4. This means that the data that was maintained in FileGroup3 (or AdventureWorksTest_Data_03) has been moved to FileGroup2 (or AdventureWorksTest_Data_02). This is what has happened;
This clearly shows that data is moved when partitions are merged. Okay, do we really have to consider this? Yes, we do. When the table is small, we do not see any performance issue, but if the table is large, moving data from one file to another will surely take time. Hence, think twice before merging, specifically when partitions are distributed across multiple files.
** Now, if you need to, you can delete the AdventureWorksTest_Data_03 file, and SQL Server will allow you to do so as it is empty.
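Since the file no longer holds data for the table, the statement that failed earlier now succeeds:

ALTER DATABASE AdventureWorksTest REMOVE FILE AdventureWorksTest_Data_03;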

How to refer files in HDInsight – Azure Storage using different ways

If you have started working with Big Data, you surely need to check Microsoft's support for it via the Azure platform – the HDInsight service. HDInsight allows you to create a Hadoop environment within a few minutes, and it can be scaled out or in at any time based on your requirements. I have written a few posts on this; you can have a look at them using the following links;
In order to work with data loaded to HDInsight, or Hadoop, data files have to be referred to using a supported syntax. There are multiple ways of referring to files in the storage with HDFS. Here are the ways;

Fully qualified path with wasb(s) protocol

This is the most accurate and correct way of referring to files in the storage. Here is the pattern;
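The general form is as follows (the placeholders are to be replaced with your own container, storage account and file path):

wasb[s]://<container_name>@<storage_account_name>.blob.core.windows.net/<path/to/file>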
Here is an example using PuTTY, connecting with HDInsight and reading a file (produced by Hive) that exists in the storage. My container name is dinesqlashdinsight and my storage account name is dinesqlasstorage. The file path is data/cleanlog/000000_0 (this is a Hive table, in fact).
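A command along these lines reads that file; the original post shows it via a screenshot, and hdfs dfs -cat is one way to do it:

hdfs dfs -cat wasbs://dinesqlashdinsight@dinesqlasstorage.blob.core.windows.net/data/cleanlog/000000_0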

Connecting with the default container

If your files are in the default container, you can skip the container name and storage account name as follows;
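For example (the same file as above, assuming it sits in the default container):

hdfs dfs -cat wasbs:///data/cleanlog/000000_0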
Note the three slashes. They are required when you do not mention the container name.

Connecting using Hadoop/Linux/Unix native ways

Generally, when you work with Hadoop on Linux/Unix, you refer to files without the protocol. Azure HDInsight supports the same, and we can refer to files using that syntax.
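For example (again the same file, with no protocol prefix at all):

hdfs dfs -cat /data/cleanlog/000000_0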

Do I need double quotes for my paths?

They are required when you have odd characters, such as the equal (=) sign, in your path. See the example below. I try to read a data file that exists in the cluster; the path has equal signs, hence the path is enclosed in double quotes.
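A sketch of such a command; the partitioned-folder path with equal signs here is illustrative, not the path from the original post:

hdfs dfs -cat "wasbs:///data/cleanlog/year=2016/month=12/000000_0"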

SQL Server – Dropping Primary Key will drop the Clustered Index?

Everyone knows that SQL Server creates a Clustered Index when we add a Primary Key, if a Clustered Index does not already exist in the table. It adds a Non-Clustered Index for the Primary Key if we have already added a Clustered Index. However, they are two different objects; one is a Constraint and the other is an Index. What if I drop one object? Will it drop the other as well?
Let's make a test and see. If I create a table called Customer like below, making CustomerKey the Primary Key;
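A minimal sketch of such a table (the column list and constraint name are illustrative):

CREATE TABLE dbo.Customer
(
    CustomerKey int NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
    CustomerName varchar(200) NOT NULL
);
GO

-- check what has been created: the constraint and its clustered index share the same name
SELECT name, type_desc FROM sys.key_constraints WHERE parent_object_id = OBJECT_ID('dbo.Customer');
SELECT name, type_desc FROM sys.indexes WHERE object_id = OBJECT_ID('dbo.Customer') AND name IS NOT NULL;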
As you see, it creates both the Key and the Index.
Now if I drop either one, it will drop the other as well. For example, if I drop the Primary Key, it will drop the Index too.
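For example (assuming the constraint name used above):

-- dropping the Primary Key constraint removes the clustered index as well
ALTER TABLE dbo.Customer DROP CONSTRAINT PK_Customer;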
If I need to make sure that this does not happen, I can create them separately: first create the Clustered Index on CustomerKey and then make CustomerKey the Primary Key. However, that will add another index specifically for the Primary Key.
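A sketch of that approach (index and constraint names are illustrative):

CREATE CLUSTERED INDEX IX_Customer_CustomerKey ON dbo.Customer (CustomerKey);
GO
-- the Primary Key now gets its own non-clustered index;
-- dropping the constraint no longer touches the clustered index
ALTER TABLE dbo.Customer ADD CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (CustomerKey);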
The reason for the above behavior is that a Primary Key needs an Index. It is always associated with an index; therefore, if one of them is dropped, its associate gets dropped as well.

Dimension with just one attribute – data warehouse design

Have you ever come across a situation where you have to design a dimension that has no attributes except the one that explains the dimension itself? I am sure that you have experienced such dimensions, and you might have created a table in the relational data warehouse for maintaining it. But is that the best way of managing such dimensions?
Assume that you have a fact table that maintains Issues, and it is linked with multiple dimensions such as Date, Warehouse and Customer. And you have identified Issue Number as another dimension, as there is a requirement for analyzing claims by issue number. You may design your data warehouse as below.
As you see, there are no other identified attributes for the Issue Number dimension. Do we have an alternate way of designing it?
This is where we use a dimension type called a Degenerate Dimension. Whenever we see such a dimension, and we know for sure that it has only one attribute and there will always be a unique value per transaction in the fact, it is better to add it to the fact table itself rather than maintaining another table. A dimension that is maintained in the fact table is called a Degenerate Dimension.
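A sketch of how the fact table could look with Issue Number kept as a degenerate dimension (the table and column names are illustrative, not from the original design):

CREATE TABLE dbo.FactIssue
(
    DateKey int NOT NULL,               -- FK to DimDate
    WarehouseKey int NOT NULL,          -- FK to DimWarehouse
    CustomerKey int NOT NULL,           -- FK to DimCustomer
    IssueNumber varchar(20) NOT NULL,   -- degenerate dimension: no separate DimIssueNumber table
    IssuedQuantity int NOT NULL
);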

How to Connect HDInsight using Putty

I have written many articles on the HDInsight Service, which is an Azure service that allows us to provision an Apache Hadoop cluster and use it for various types of big data related implementations. If you need to see all my articles written on HDInsight, please use this link: http://dinesql.blogspot.com/search?q=hdinsight.
Although I have mainly used the Azure portal (Ambari dashboard) to demonstrate HDInsight, many use a tool called PuTTY, which is a free implementation of SSH and Telnet for Windows and Unix platforms. It is a good tool to work with, specifically if you need to connect to the cluster remotely and work with it for navigating nodes or doing some data work with sub-projects like Hive against the cluster. Remember, if you have configured your HDInsight cluster as a Linux-based cluster, you need an SSH client to remotely connect to it. Let's see the way of using it.
First of all, you need to download PuTTY in order to use it, as Windows machines do not have it as a built-in tool (Linux and Mac OS computers have an SSH client built in). Here is the link for downloading PuTTY.
There is no installation with it; you can just run putty.exe. However, you need a host name to connect. If you go to the portal and open your cluster blade, you can get the Host Name for connecting with an SSH client by clicking the Secure Shell (SSH) button.
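For a Linux-based HDInsight cluster, the SSH host name generally follows this pattern (your cluster name plus the -ssh suffix):

<your-cluster-name>-ssh.azurehdinsight.net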
Note that there are two: one for Windows and another for Linux, Unix and OS X users. Make sure that you copy the Windows one.
When you open PuTTY, you should see an input box called Host Name for entering the copied host name. Enter the copied value and make sure that the Connection Type is selected as SSH.
All you have to do now is click Open. This opens the standard window with the standard font and font size. If you need a larger font, you can go to Window – Appearance and change the settings as you want.
When you open this for the first time, you will get a security warning as below;
This says that we do not have the key related to the host cached locally, hence PuTTY cannot be sure whether we are connecting to the right one. Since we know that we are connecting to the right host, we can simply click Yes and continue.
Once connected, you will be prompted for a user id and password. Note that you need to use the one you configured as the SSHUser and its Password when the cluster was created. It does not accept the HTTP User.
Enter the SSHUser and the password, and you will be connected with the default node.
Everything is ready now. As you see, the first part of the last line is the connected user, and hn0 is one of the head nodes. The last part of it shows the cluster name, but we see it only partially as it is lengthy. Now I can navigate, make folders, move files or connect with Hive without any issue.
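A few examples of what can be done from that prompt (the paths are illustrative):

hdfs dfs -ls /                        # navigate the cluster storage
hdfs dfs -mkdir -p /data/staging      # make a folder
hive                                  # open the Hive CLI to run queries against the cluster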