Azure Marketplace – SQL Server 2016 Virtual Machine – Data Tools cannot be installed

As you know, easiest way of getting a SQL Server set up in Azure is, purchasing a Virtual Machine in the Marketplace. There are multiple VMs for SQL Server based on Standard, Enterprise and Developer editions, you can pick the one as per your requirements.
Remember, it does not come with all tools. It has Management Studio installed but it does not have Data Tools installed, hence it needs to be installed.
I had configured a SQL Server 2016 SP1 VM and have been using for sometime. There was no requirement for Data Tools but I experienced an issue when I try to install Data Tools using standard Setup.exe.
I noticed that Setup fails when it tries to download required components from Microsoft site.
……Error 0x80072f08: Failed to send request to URL:……………
I did not immediately identify the issue but with my fellow MVPs’ help, I realized that it can happen with a block related to security.
If you face the same, make sure that no firewall settings block required site and Administrator (or whoever run) has no restrictions. You can make sure that Administrator has no restrictions by checking Server Manager – Local Server – IE enhanced security configuration.

Run Hive Queries using Visual Studio

Once HDInsight cluster is configured, we generally use either the portal dashboard (Powered by Ambari) or a tool like PuTTY for executing queries against data loaded. Although they are not exactly a developer related tools, or in other words, not an IDE, we had to use because we did not have much options. However, now we can use the IDE we have been using for years for connecting with HDInsight and executing various types of queries such as Hive, Pig and USQL. It is Visual Studio.
Let’s see how we can use Visual Studio for accessing HDInsight.

Making Visual Studio read for HDInsight

In order to work with HDInsight using Visual Studio, you need to install few tools on Visual Studio. Here are the supported versions;
  • Visual Studio 2013 Community/Professional/Premium/Ultimate with Update 4
  • Visual Studio 2015 any edition
  • Visual Studio 2017 any edition
You need to make sure that you have installed Azure SDK on your Visual Studio. Click here for downloading the Web Platform Installer and make sure following are installed;
This installs Microsoft Azure Data Lake Tools for Visual Studio as well, make sure it is installed.
Now your Visual Studio is ready for accessing HDInsight.

Connecting with HDInsight

Good thing is, you can connect with your cluster even without creating a project. However, once the SDK is installed, you can see new Templates called Azure Data Lake – HIVE (HDInsight), Pig (HDInsight), Storm (HDInsight) and USQL (ADLA) and HIVE template can be used for creating a project.
Project creates one hql file for you and you can use it from executing your Hive Queries. In addition to that, You can open Server Explorer (View Menu -> Server Explorer), and expand Azure (or connect to your Azure account and then expand) for seeing all components related to Azure.
As you see, you can see all databases, internal and external tables, views and columns. Not only that, by right-clicking the cluster, you can open a windows for writing a query or viewing jobs. Here is the screen when I use the first option that is Write a Hive Query.
Did you notice Intelli-Sense? Yes, it supports with almost all metadata, hence it is really easy to write a query.

Executing Queries

If you need to see records in tables without limiting data with predicates or constructing the query with additional functions, you can simply right-click on the table in Server Explorer and select View top 100 Rows.
If you need to construct a query, then use the above method for opening a window and write the query. There are two ways of executing the code: Batch and Interactive. Batch mode does not give you the result immediately but you will be able to see or download once the job submitted is completed. If you use the Interactive, then it is similar to SSMS result.
If you use the Batch mode, you can see the way job is getting executed. Once the job is completed, you can click on Job Output for seeing or downloading the output.
As you see, there is no graphical interface to see the job execution. Visual Studio will show the job execution using a graphical interface only when the job is executed by Tez Engine. Remember, HDInsight will always use Tez Engine to execute Hive Queries but simpler queries will be executed using Map Reduce Engine.
See this query that has some computation;
Can we create table with this IDE?
Yes, it is possible. You can right-click on the your database in Azure Server Explorer and select Create table menu item.
Let’s talk about more on this with later posts.

How to Connect HDInsight using Putty

I have written many articles on HDInsight Service which is an Azure service that allows us to provision Apache Hadoop Cluster and use it for various type of big data related implementations. If you need to see all my articles written on HDInsight, please use the link:
Although I have mainly used Azure portal to demonstrate HDInsight (Ambari dashboard), many use a tool called PuTTY that is a free implementation of SSH and Telnet for Windows and Unix platform. It is a good tool to work with, specifically if you need to connect the cluster remotely and work with it for navigating nodes or doing some data work with some sub projects like Hive against the cluster. Remember, if you have configured your HDInsight cluster as a Linux-based cluster, you need a SSH client to remotely connect with it. Let’s see the way of using it.
First of all you need to download PuTTY for using it as Windows machines do not have it as a built-in tool (Linux and Mac OS computers have a SSH client interface built-in). Here is the link for downloading PuTTY.
There is no installation with it. You can just use the putty.exe. However, you need a host name to connect. If you go to the portal and open your cluster blade, you can get the Host Name for connecting with a SSH Client by clicking Secure Shell (SSH) button.
Note that there are two, one for Windows and another for Linux, Unix and OS X users, make sure that you copy the Windows one.
When you open PuTTY, you should see an input box called Host Name for entering the copied host name. Enter the copied value and make sure that the Connection Type is selected as SSH.
All you have to do now is, click Open. This opens the standard window with standard font and font size. If you need a larger font, you can go to Window – Appearance  and change settings as you want.
When you open this first time, you will get a security warning as below;
This says that we do not have the key related to the host cached locally, hence it is not secure or not sure whether we connect to the right one. Since we know that we connect to the right host, we can simply click Yes and continue.
Once connected, you will be promoted for user id and password. Note that you need to use the one you configured as SSHUser and the Password when the cluster was created. It does not accept the HTTP User.
Enter the SSHUser and the password, you will be connected with the default node.
Everything is ready now. As you see, first part of the last line is the user connected and hn0 is one of the head nodesLast part of it shows the cluster name but we see it partially as it is lengthy. Now I can navigate, make folders, move files or connect with Hive without any issue.

Azure SQL Data Warehouse – Part I – Creating and Connecting

We have been implementing on-premises data warehouses for years and everyone is well aware on difficulties-faced at the first phase of the project, that is requesting hardware for the project. In most cases, either the project is stopped or postponed by the management or client saying that no-budget. But now we have a solution that addresses the initial issue we have. Not only that, this solution gives you many benefits in terms of performance as well. It is Azure SQL Data Warehouse.

The Azure SQL Data Warehouse is a cloud-based SQL Data Warehouse which can be easily scaled with cloud-based-resources and pay as-it-grows. It is a fully managed service that that covers maintenance, software patches and backups. It can store and process both relational and non-relational data, and many Azure features such as Azure Blob Storage, Azure Data Factory can be used for implementing it.

Let’s talk about some benefits in details;

Scalability and Cost
This is one of the major benefits. You can scale Azure Data Warehouse, increasing resources when required and cost will be calculated based on it. Since there is no upfront cost involved with this, you can simply start with less number of resources and adjust when you want. You will be charged on two resource usage types: Storage and Compute.

The storage configured for Azure SQL Data Warehouse is automatically re-scaled as you add and remove data, hence you pay for what you use.

The value set for Compute is the measurement for performance of execution in Azure SQL Data Warehouse. It is measured in Data Warehouse Units (DWU) and you can start with 100 DWU (USD 2 per hour) to 6000 DWU (USD 123 per hour) as per the requirements. This value can be anytime changed and cost of the usage will be calculated as per the value set.

Read more on DWU at:

Since this can be scaled out as you need, data and queries can be distributed, increasing the performance. This uses Massive Parallel Processing Architecture (MPP), hence workload can be spread across many nodes using complex statistics calculated, increasing the performance significantly.

This uses Columnstore Indexing that is 10 times faster than Rowstore Indexing. Since it supports 5 times more compression than standard Rowstore Indexes reducing IOs, it improves the performance more.

Hybrid Cloud
It is a hybrid cloud environment. It supports seamless integration with on-premises data, cloud data and of course unstructured data. With Polybase, different type of sources can be queried using TSQL as data is stored in local, making sure that everything required is shown in a single database.

Not only that, copying data from other sources can be easily done via Azure Data Factory.

Okay, now let’s see how we can create one and star using it. First of all, you need an Azure Account. Once an account is created, you need to create a SQL Server which is a logical server for holding your data warehouse. If you have already created a SQL Sever, then it can be used, else follow the steps below for creating a logical server.

1. Login to the Azure Portal and open SQL Servers Blade.

2. Set Sever name, Server admin login, Password, Resource Group and Location. Note that you can add database specific logins later.
Azure DW 02

  1. Click on Create button to get it created. If you want to see the server in Azure Dashboard, make sure that Pin to dashboard is checked.

Now the server is ready. Let’s create a data warehouse.

  1. Open the SQL Data Warehouses blade.

  2. Enter the Database Name. With this test, we will be using AdventureWorks (we will be creating different databases later), hence set the name as AdventureWorksDW.

  3. Select the appropriate Resource Group. Make sure that you select the same selected for the server.

  4. Select Sample for Select source.

  5. Select AdventureWorsDW as the sample database.

  6. Select the server created with above steps and enter the admin account and its password.

  7. Select DWU as you need. Let’s initially set this as 200.

    Azure DW 01

  8. Select Pin to dashboard and click Create to get the data warehouse created.

Database is created. Remember, you cannot connect with the server configured until you are permitted to connect from your IP address. Initially you can configure your IP address to the server using the portal. Later, you can connect to the server using Management Studio and configure IP addresses for others using TSQL, either at server level or database level. Here are the steps.

  1. Open the portal and open SQL Servers blade.
  2. Click on the server created (or the one used for adding the data warehouse) to get its properties.
  3. You can add your IP address by clicking Add client IP button and clicking Save button. Or else, you can manually set it using the given grid.
    Azure DW 03
  4. Add it and save.

Now firewall rule is configured for you. Let’s connect and see.

  1. Open Management Studio and connect with the server account as below.
    Azure DW 04
  2. Once connected, open a new Query Window and type below code for seeing firewall rules set.
    Azure DW 05
  3. If you need to add more rules for others, use sp_set_firewall_rule for server level rules and sp_set_database_firewall_rule for adding database level rules.
  4. Query your database and see now. You should be able to query your table without any issue.
    Azure DW 06

With the next post, let’s see how we can use other tools like Visual Studio and Power BI for connecting with the data warehouse created.

Here is the link for Part II –