As shown in Figure 6.21, Azure Batch Explorer provides a nice overview of the status of your Azure Batch account. You can download Azure Batch Explorer from https://azure.github.io/BatchExplorer.
FIGURE 6.21 Azure Batch Explorer
PolyBase
What batching means depends on the context. The examples you have read about up to now concern batch jobs, which are small snippets of code that typically manipulate or move large amounts of data. PolyBase also performs something called batching, but it is different from a batch job. PolyBase is a technology that sits between a file and the ability to run T-SQL commands against that file, and it is commonly implemented using external tables. Another feature of PolyBase is the batch loading of data, for example, when bulk loading data with the COPY INTO command. Why is this batching? COPY INTO and PolyBase are considered batching models because data is not copied from the data source row by row. Instead, the data is loaded in bulk, many rows at a time, which significantly reduces load times.
In Exercise 4.11 you created an external table using the following sequential steps:
- CREATE DATABASE
- CREATE EXTERNAL DATA SOURCE
- CREATE EXTERNAL FILE FORMAT
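The three statements above might look like the following minimal sketch. It assumes a serverless SQL pool in Azure Synapse Analytics and Parquet files in an ADLS container. The names SampleDb, SampleDataSource, and SampleParquetFormat are hypothetical, and the <container> and <account> placeholders must be replaced with your own values. This is not the exact code from Exercise 4.11.

```sql
-- Step 1: create a database to hold the external objects.
-- SampleDb is a hypothetical name used only for illustration.
CREATE DATABASE SampleDb;
GO  -- GO is a batch separator understood by client tools, not a T-SQL statement

USE SampleDb;
GO

-- Step 2: point at the storage location that contains the files.
-- Assumption: the caller already has access to the container
-- (for example, through Azure AD pass-through), so no credential is declared.
CREATE EXTERNAL DATA SOURCE SampleDataSource
WITH (
    LOCATION = 'abfss://<container>@<account>.dfs.core.windows.net'
);
GO

-- Step 3: describe the format of the files, here Parquet.
CREATE EXTERNAL FILE FORMAT SampleParquetFormat
WITH (
    FORMAT_TYPE = PARQUET
);
GO
```

The order matters because each later statement refers to an object created by an earlier one.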
After creating those three references, you used them with the CREATE EXTERNAL TABLE statement, which creates the external table. The following is an example of that statement. You can see it in practice in Figure 4.33.
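As a hedged illustration rather than the listing from the exercise, a CREATE EXTERNAL TABLE statement generally takes the following shape. The table name, column names, and folder path are hypothetical, and the data source and file format are assumed to exist already under the hypothetical names SampleDataSource and SampleParquetFormat.

```sql
-- Hypothetical external table over Parquet files in an ADLS container.
-- The column list must match the structure of the underlying files.
CREATE EXTERNAL TABLE dbo.SampleReadings (
    ReadingId    INT,
    ReadingDate  DATETIME2,
    ReadingValue DECIMAL(8, 5)
)
WITH (
    LOCATION    = 'sample-folder/*.parquet',  -- path relative to the data source (assumed)
    DATA_SOURCE = SampleDataSource,           -- created with CREATE EXTERNAL DATA SOURCE
    FILE_FORMAT = SampleParquetFormat         -- created with CREATE EXTERNAL FILE FORMAT
);
```

Once the table exists, an ordinary query such as SELECT TOP 10 * FROM dbo.SampleReadings reads directly from the files. No data is copied into the database.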
It is not required, but if you append an AS SELECT clause to a CREATE EXTERNAL TABLE statement, you have created a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement. In Exercise 4.13 you performed the COPY INTO exercise, which moves data from a data source in batches. You can see here how to use the COPY INTO command to bulk load a CSV file into a table. The rows are grouped into large batches and written to the table in bulk, which is why the load completes very quickly.
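A minimal COPY INTO sketch follows. It assumes a dedicated SQL pool, a target table that already exists, and a CSV file with a header row. The table name dbo.SampleReadingsStaging and the file path are hypothetical, and the <account> and <container> placeholders must be replaced with your own values. It is not the exact statement from Exercise 4.13.

```sql
-- Bulk load a CSV file into an existing table in a dedicated SQL pool.
-- dbo.SampleReadingsStaging is a hypothetical table name.
COPY INTO dbo.SampleReadingsStaging
FROM 'https://<account>.blob.core.windows.net/<container>/sample-folder/readings.csv'
WITH (
    FILE_TYPE       = 'CSV',
    FIELDTERMINATOR = ',',     -- column delimiter used in the file
    ROWTERMINATOR   = '0x0A',  -- line feed ends each row
    FIRSTROW        = 2,       -- skip the header row
    -- Assumption: the workspace managed identity has read access to the storage account.
    CREDENTIAL      = (IDENTITY = 'Managed Identity')
);
```

The statement contains no loop and no per-row INSERT. The engine reads the file and writes the rows in bulk, which is the batching behavior described above.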
In both scenarios, the external table and COPY INTO, the presence of PolyBase is not obvious. Unless you knew of its existence, you might assume that working with data in files in an ADLS container using SQL queries simply works natively. It does not. PolyBase performs its magic a few layers of abstraction below the interface, and magic it is: PolyBase is a very powerful tool.