Exam Essentials – Transform, Manage, and Prepare Data

The T in ETL. After data is ingested (extracted), it must be molded into a format and structure that can flow through the remaining steps of your pipeline. This molding and formatting activity is called transformation. It is the most critical part of the ETL process, because it converts data that arrives in many shapes and forms into a standard structure that is ready for loading and analysis.
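As a small illustration of that idea, the sketch below takes records from two hypothetical sources whose field names, date formats, and number types differ, and maps both onto one standard shape. The field names, the `transform` helper, and the assumption that the second source writes dates as day/month/year are all invented for this example and do not come from any Azure service.

```python
from datetime import datetime

# Hypothetical raw records from two sources with different shapes.
raw_records = [
    {"CustomerName": " Ana ", "OrderDate": "2021-03-05", "Total": "19.99"},
    {"name": "Bo", "date": "05/03/2021", "amount": 5},
]


def transform(record: dict) -> dict:
    """Map a raw record of either (hypothetical) shape onto one standard schema."""
    if "CustomerName" in record:
        name, amount = record["CustomerName"], record["Total"]
        order_date = datetime.strptime(record["OrderDate"], "%Y-%m-%d").date()
    else:
        name, amount = record["name"], record["amount"]
        # Assumption: this source writes dates as day/month/year.
        order_date = datetime.strptime(record["date"], "%d/%m/%Y").date()
    return {
        "customer": name.strip(),               # trim stray whitespace
        "order_date": order_date.isoformat(),   # one date format for every source
        "amount": round(float(amount), 2),      # one numeric type for every source
    }


standardized = [transform(r) for r in raw_records]
print(standardized)
```

Both records come out with the same keys, an ISO 8601 date, and a float amount, which is the "standard structure" the later load and analysis steps can rely on.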

Jupyter notebooks. Sharing source code and business logic can be a challenge for small, medium, and large teams alike. Storing a project's code and dependencies in a standard format makes sharing easier. Jupyter notebooks can be exported from and imported into Azure HDInsight, Azure Databricks, and Azure Synapse Analytics, which gives you the flexibility to take advantage of what each product offers.
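Part of what makes notebooks portable is that an `.ipynb` file is plain JSON. The sketch below builds a minimal notebook document in the nbformat 4 layout using only the standard library and round-trips it through JSON. The `make_notebook` helper and the cell contents are hypothetical, and the sketch does not show whether a given service accepts a particular notebook unchanged; product-specific metadata and kernel settings may still need adjusting after an import.

```python
import json


def make_notebook(code_cells: list) -> dict:
    """Build a minimal nbformat 4 notebook holding the given code cell sources (hypothetical helper)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # never executed yet
                "metadata": {},
                "outputs": [],            # no stored results
                "source": source,
            }
            for source in code_cells
        ],
    }


# The cell sources are just text here; they are not executed.
notebook = make_notebook(["df = spark.read.csv('/data/raw.csv')", "df.show()"])
text = json.dumps(notebook, indent=1)  # what would be written to a .ipynb file
restored = json.loads(text)            # what an importing tool would read back
```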

Encoding data. If you run your data analytics entirely in English, you likely will not need to worry much about encoding and decoding data. If that is not the case, however, you need to take steps to ensure that data containing non-English characters (also called special characters) is handled, selected, and rendered as expected, which in practice means reading and writing it with a consistent encoding such as UTF-8.
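To show what goes wrong when encodings do not match, the sketch below encodes a string containing non-English characters as UTF-8 and then decodes the same bytes twice: once as UTF-8 and once, incorrectly, as Latin-1. The sample text is arbitrary; the point is that the mismatched decode silently produces garbled output ("mojibake") rather than an error.

```python
text = "Café São Paulo"

# Characters outside ASCII are the ones that depend on the chosen encoding.
non_ascii = [ch for ch in text if not ch.isascii()]

utf8_bytes = text.encode("utf-8")               # é and ã take two bytes each
decoded_correctly = utf8_bytes.decode("utf-8")  # same encoding on both sides
decoded_wrongly = utf8_bytes.decode("latin-1")  # mismatched encoding

print(non_ascii)
print(decoded_correctly)
print(decoded_wrongly)
```

The same rule applies when reading files in a pipeline: pass the encoding explicitly (for example `open(path, encoding="utf-8")` in Python) instead of relying on a platform default.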

Data normalization. Normalizing data has more than one meaning, depending on the context. In an RDBMS, it refers to organizing tables according to rules called normal forms, which are intended to reduce duplication (redundancy) and protect data integrity. Another meaning relates to the visualization of data points. When data is normalized in this context, every plotted value is rescaled to fall between 0 and 1 so that series with very different ranges fit nicely on the same chart.
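The 0-to-1 rescaling described above is usually called min-max normalization: each value x becomes (x - min) / (max - min). The sketch below implements that formula; the `min_max_normalize` name and the sample readings are illustrative only.

```python
def min_max_normalize(values: list) -> list:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal: there is no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


readings = [10.0, 20.0, 25.0, 110.0]
scaled = min_max_normalize(readings)
print(scaled)
```

Note that the 110.0 reading still lands at 1.0 and pushes the other values toward 0: rescaling changes the range of the data, not the relative position of an outlier.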

Data modeling with machine learning. Changing data from a raw form into a simpler structure sounds like the definition of transformation. That is true, but in this context the term refers to Azure Machine Learning model types such as classification, regression, and time series forecasting.
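For a feel of what one of those model types does, the sketch below fits a simple linear regression with ordinary least squares in plain Python and uses it to predict a new value. This is a generic textbook technique, not the Azure Machine Learning API; the `fit_line` and `predict` names and the sample data are hypothetical.

```python
def fit_line(xs: list, ys: list) -> tuple:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: tuple, x: float) -> float:
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(model, predict(model, 5.0))
```

A managed service automates the training, evaluation, and deployment steps, but the output is the same kind of artifact: a model that transforms input features into predictions.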

Ileana Pecos
