Lambda Architecture – Create and Manage Batch Processing and Pipelines

The lambda architecture was covered in detail in the sections on the design (Chapter 3) and implementation (Chapter 4) of the serving layer. Like the serving layer, the batch layer is a component of the lambda architecture. The remaining component, the speed layer, is covered in Chapter 7, “Design and Implement a Data Stream Processing Solution.” The batch layer is responsible for incrementally loading noncontinuous data from your data lake into the serving layer. This data flows along the cold path and therefore takes longer to become available to consumers and reporting. One reason for the delay is the large volume of data on which a batch job typically runs. Another is that the data must be in an efficient querying format before it is placed on the serving layer for consumption. The complexity, variety, and quantity of the data have a great impact on the time required to transform it to this level of quality. This is one reason why numerous models are based on lambda but do not mirror it exactly.
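The incremental loading described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data lake is stood in for by an in-memory list, the serving layer by a dictionary, and the names Record, incremental_batch_load, and watermark are hypothetical and do not come from any Azure product. The idea is that each run processes only the records modified since the previous run.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """Hypothetical data lake record with a last-modified timestamp."""
    key: str
    value: float
    modified: datetime


def incremental_batch_load(lake, serving, watermark):
    """Copy records modified after `watermark` from `lake` into `serving`.

    `lake` is a list of Record objects and `serving` is a dict; both are
    in-memory stand-ins (assumptions) for real storage. Returns the new
    watermark so the next run can skip what was already loaded.
    """
    new_watermark = watermark
    for rec in lake:
        if rec.modified > watermark:
            # Stand-in for the transformation into a query-friendly format.
            serving[rec.key] = round(rec.value, 2)
            if rec.modified > new_watermark:
                new_watermark = rec.modified
    return new_watermark
```

Storing the returned watermark between runs is what makes the load incremental: a second run with the same lake contents changes nothing in the serving layer.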

Refer to the illustration of the lambda architecture in Figure 3.13 to see where the batch layer sits and what it is responsible for. You might also be able to superimpose the serving layer on the Analytical Data Storage component in Figure 6.4. Figure 6.4 mirrors many elements of the lambda architecture but adds a layer that provides an opportunity to perform further transformation before the results of the batch processing are exposed to consumers and reporting resources.

Develop Batch Processing Solutions

Many Azure products can be used to develop a batch processing solution. As you have read, a batch process was historically defined as a program hosted on a batch server and triggered on a regular basis. There is no limit on what a batch job is allowed to do, but in most cases it accesses a data source, retrieves data, transforms it, and outputs the result. Alternatively, the batch job might simply update some data in a table and be done with it. The point is that it is currently difficult to describe, from an implementation perspective, what a batch process is. Does it mean that you must use a product that contains the word batch in its name? Consider that the data flow transformation feature can perform actions very similar to what was historically done by what was commonly called a batch job.

Recognize this ambiguity and don’t get caught up in semantics; the objective is to manage your data and transform it as quickly and securely as possible, with the highest possible degree of quality. The following are examples of some batch processing solutions. Some use products that include the word batch, and some do not. As you work through the examples, learn them, and then apply what you’ve learned to the context in which you are working. Because you will be exposed to numerous products and features, you can choose the one that best fits your requirements.
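The typical batch job described above (access a source, retrieve data, transform it, output the result) can be illustrated with a small product-neutral sketch. Everything here is an assumption made for illustration: the function name run_batch_job, the region and amount columns, and the use of CSV text as both source and output are hypothetical, and only the Python standard library is used.

```python
import csv
import io


def run_batch_job(source_csv: str) -> str:
    """Read CSV text, total `amount` per normalized `region`, return CSV text.

    The column names are hypothetical; a real job would read from and write
    to storage rather than in-memory strings.
    """
    # Retrieve: parse the source rows.
    reader = csv.DictReader(io.StringIO(source_csv))

    # Transform: normalize region names and aggregate amounts.
    totals = {}
    for row in reader:
        region = row["region"].strip().upper()
        totals[region] = totals.get(region, 0.0) + float(row["amount"])

    # Output: write one row per region, sorted for a stable result.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total"])
    for region in sorted(totals):
        writer.writerow([region, f"{totals[region]:.2f}"])
    return out.getvalue()
```

Whether logic like this runs on a scheduled server, in a data flow, or in a notebook, the shape of the work is the same, which is why the product name matters less than the requirements.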


Ileana Pecos
