Data Modeling and Usage – Transform, Manage, and Prepare Data

Data modeling simplifies data by reorganizing it into a state from which insights can be gained and business decisions can be made. Consider, for example, how the brainjammer brain wave data looked in its initial state. At first glance, there is little to no value to be gained from data in such a format.

Performing some transformation on the data provided some opportunity to gain value. Viewing the data after its initial transformation provides a better understanding of what the data is meant to represent.

Next, you performed some exploratory data analysis to look for insights or traits in the data. The data does provide some insights, but it is best consumed visually and in relation to data with similar characteristics. In this case, comparing the statistical frequency values to those of other scenarios can suggest some conclusions and raise more questions.
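The kind of per-scenario comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: the `readings` rows, their values, and the `median_by_scenario` helper are hypothetical and do not come from the brainjammer dataset or the exercises.

```python
from collections import defaultdict
from statistics import median

# Hypothetical readings as (scenario, frequency band, value) rows.
# The values are invented for illustration, not real brain wave data.
readings = [
    ("Meditation", "ALPHA", 5.0),
    ("Meditation", "ALPHA", 7.0),
    ("Meditation", "ALPHA", 6.0),
    ("ClassicalMusic", "ALPHA", 3.0),
    ("ClassicalMusic", "ALPHA", 4.0),
    ("Meditation", "THETA", 9.0),
]


def median_by_scenario(rows):
    """Group values by (scenario, band) and return the median of each group."""
    grouped = defaultdict(list)
    for scenario, band, value in rows:
        grouped[(scenario, band)].append(value)
    return {key: median(values) for key, values in grouped.items()}


medians = median_by_scenario(readings)
for (scenario, band), value in sorted(medians.items()):
    print(f"{scenario:15} {band:6} median={value}")
```

Placing the medians for the same frequency band side by side is what makes a single scenario's numbers meaningful, which is why the comparison across scenarios matters more than any one value.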

What is next? Data modeling also links into the machine learning context. Remember that one objective of the insights learned from the brainjammer brain wave data is to find a trend or pattern in the data. You can then use that pattern to analyze brain wave readings, in real time or near real time, to determine which scenario the individual is performing. If you study Figure 5.43, you might be able to draw some educated conclusions about brain wave reading ranges using the median value. There is significant overlap between the ranges, however, which may hinder your ability to predict the scenario precisely. Azure Machine Learning (AML) offers some advanced capabilities for producing more precise results.
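To see why overlapping ranges limit a median-based approach, consider the following sketch. It assumes you already have one median and one value range per scenario for a single frequency band; the numbers, the `ranges_overlap` helper, and the `nearest_scenario` helper are hypothetical stand-ins and not the technique AML applies.

```python
# Hypothetical per-scenario medians and (low, high) ranges for one band.
# All numbers are invented for illustration.
medians = {"Meditation": 6.0, "ClassicalMusic": 3.5, "TikTok": 4.0}
ranges = {
    "Meditation": (4.0, 8.0),
    "ClassicalMusic": (2.0, 5.0),
    "TikTok": (2.5, 5.5),
}


def ranges_overlap(a, b):
    """Return True when two closed intervals (low, high) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


def nearest_scenario(value, scenario_medians):
    """Pick the scenario whose median is closest to the reading."""
    return min(scenario_medians, key=lambda s: abs(scenario_medians[s] - value))


# A reading of 4.5 falls inside all three ranges, so the ranges alone
# cannot separate the scenarios; the nearest median is only a best guess.
reading = 4.5
candidates = [s for s, r in ranges.items() if r[0] <= reading <= r[1]]
print("Scenarios whose range contains the reading:", candidates)
print("Nearest median guess:", nearest_scenario(reading, medians))
```

A nearest-median guess uses one number per scenario and ignores everything else about the readings. That limitation is the gap a trained machine learning model is meant to narrow.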

Data Modeling with Machine Learning

Machine learning is a very interesting emerging area. You shouldn’t expect many questions concerning AML on the exam, but as a data engineer you should know something about how to use it and what you can expect from it. Exercise 5.15 covers both. Before you begin, however, please note that, as of this writing, in order to run an Automated Machine Learning (AutoML) job from Azure Synapse Analytics, your Spark pool must be version 2.4. The pool you created in Exercise 3.4 targeted version 3.1. Therefore, to complete this exercise, you need to create a second Spark pool that targets version 2.4. This requirement might change in the future.

Ileana Pecos
