Develop a Batch Processing Solution Using an Azure Data Factory Pipeline – Create and Manage Batch Processing and Pipelines

  1. Log in to the Azure portal at https://portal.azure.com ➢ navigate to the Azure Data Factory workspace you created in Exercise 3.10 ➢ click the Open link in the Open Azure Data Factory Studio tile on the Overview blade ➢ select the Author hub ➢ select the + to the right of the search box ➢ select Pipeline from the pop‐out menu ➢ select the pipeline ➢ expand Batch Service from the Activities pane ➢ and then drag and drop a Custom activity to the editor canvas.
  2. Enter a name for the Custom activity (I used Calculate Frequency Median) ➢ select the Azure Batch tab ➢ select the + New link to the right of the Azure Batch Linked Service drop‐down list box ➢ enter a name (I used BrainjammerAzureBatch) ➢ enable interactive authoring ➢ enter the Azure Batch Access key (located in the Primary Access Key text box on the Keys blade for the Azure Batch account created in Exercise 6.1) ➢ enter the account name (I used brainjammer) ➢ enter the Batch URL (also available on the Keys blade for the Azure Batch account called Account Endpoint) ➢ and then enter the pool name you created in Exercise 6.1 (I used brainwaves).
  3. Select + New from the Storage Linked Service Name drop‐down list box ➢ create a linked service to the storage account where you placed the batch code (brainjammer‐batch.exe) in step 3 of Exercise 6.1 ➢ click Test Connection ➢ click Create ➢ select the Settings tab ➢ enter run.bat into the Command text box ➢ select the Azure Storage linked service you just created from the Resource Linked Service drop‐down ➢ click the Browse Storage button next to the Folder Path text box ➢ navigate to the Exercise6.1 directory, which contains the run.bat file you uploaded in Exercise 6.1 ➢ click OK ➢ and then click Save. The configuration should resemble Figure 6.20.

FIGURE 6.20 Azure Batch custom pipeline activity, Azure Data Factory
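The Custom activity configured in the preceding steps is stored in the pipeline's JSON definition. A sketch of what that definition might look like follows; the storage linked service name (BrainjammerStorage) and the folder path container prefix are illustrative placeholders, while the activity name, Batch linked service, command, and Exercise6.1 directory come from the steps above:

```json
{
  "name": "Calculate Frequency Median",
  "type": "Custom",
  "linkedServiceName": {
    "referenceName": "BrainjammerAzureBatch",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "command": "run.bat",
    "resourceLinkedService": {
      "referenceName": "BrainjammerStorage",
      "type": "LinkedServiceReference"
    },
    "folderPath": "batch/Exercise6.1"
  }
}
```

You can view or edit this JSON directly in Azure Data Factory Studio by clicking the curly‐braces ({}) Code button in the upper‐right corner of the pipeline editor.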

  4. Click the Validate button ➢ click the Debug button to test the batch job ➢ and then navigate to your ADLS container. New files are rendered into the path you provided for outputLocation in Exercise 6.1, appended with the current year, month, day, and hour. Navigate back to the Azure Data Factory workspace ➢ rename the pipeline (I used TransformSessionFrequencyToMedian) ➢ and then click Publish.
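The internals of brainjammer‐batch.exe are not shown in this exercise, but conceptually the job reduces a session's frequency readings to their median and writes the result under a dated output path. A rough Python sketch of that transformation, in which the sample readings and the base path are made‐up illustrations, not values from the exercise:

```python
import statistics
from datetime import datetime, timezone

def median_frequency(frequencies: list[float]) -> float:
    """Return the median of a list of brainwave frequency readings."""
    return statistics.median(frequencies)

def output_path(base: str, now: datetime) -> str:
    """Append year/month/day/hour to the configured outputLocation,
    mirroring the dated folders the batch job creates in ADLS."""
    return f"{base}/{now.year}/{now.month:02}/{now.day:02}/{now.hour:02}"

# Hypothetical frequency readings from one brainjammer session
readings = [7.5, 8.1, 9.4, 10.2, 12.0]
print(median_frequency(readings))  # 9.4

# "output" is a placeholder for the outputLocation value from Exercise 6.1
print(output_path("output", datetime.now(timezone.utc)))
```

Running the pipeline in Debug mode and then listing the container should show a new file under a path shaped like the one this sketch produces.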

You can use the Azure Batch Explorer client application to view usage and performance metrics for your Azure Batch account.

