The Role of Data Engineering and Analytics in an Organization

1. Introduction

Before learning how to work with tools, it is very important to understand how a business works, how it uses data, and how that data can be useful.

2. The Role of Analytics in an Organization

An organization exists to deliver value. There are 3 key groups who can benefit from a business: customers, employees, and owners.

The most important group is the customers, as many modern companies operate on the “customer obsession” principle.

For the business to grow, it needs to create more value for each of these groups: a better customer experience for customers, work-life balance and salary for employees, and income for owners. All of these groups make decisions in order to grow the business and get their jobs done, and to make decisions you need data. The data can be raw data or organized raw data. One of the tasks of the data engineer is to provide data to the groups described above for further decision making. Therefore, it is very important to understand exactly how the work done by the data engineer affects what happens to the business.

Video 1: Importance of data to an organization

3. Analytics Objectives

Analytics is the part of the business that uses data to produce the information behind decisions that keep the business running efficiently. Analytics is needed for:

4. MindMap of data engineering

A mind map is a tool for displaying information visually.

5. Major roles in the data world

Traditional category:

Data Engineer Category:

Profile category (Data Science, IT):

Advanced analytics category (forecasting elements):

This blog focuses on the data engineer and traditional category roles.

An engineer does not know everything about everything. An engineer understands the basic principles, sees the ultimate goal, and then builds using tools and skills. Data engineers, who were previously known under the following roles:

can be divided into 2 main types:

  1. A programmer who became a Data Engineer.
  2. A BI / DW / ETL developer who became a Data Engineer.

Let’s take a closer look at the differences. The task of the Data Engineer is to create a platform where data is loaded automatically and transformed into a form that is accessible to end users (usually business users).

Data sources can be different: relational databases, SFTP, API, log files, sensors. Data types can also be different: structured data in tabular format, semi-structured (JSON, XML) and unstructured (video, audio).
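To make the difference between structured and semi-structured data concrete, here is a minimal sketch that reads a small CSV table and a small JSON document into the same record shape. The sample data, the field names (`order_id`, `amount`, `tags`) and the function names are invented for illustration, and only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical sample inputs, invented for this illustration.
CSV_TEXT = "order_id,amount\n1,10.5\n2,20.0\n"
JSON_TEXT = '[{"order_id": 3, "amount": 7.25, "tags": ["web"]}]'


def read_structured(text):
    """Parse tabular CSV text into a list of typed row dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # CSV values arrive as strings, so cast them explicitly.
        rows.append({"order_id": int(row["order_id"]), "amount": float(row["amount"])})
    return rows


def read_semi_structured(text):
    """Parse JSON text and keep only the fields the tabular form also has."""
    return [
        {"order_id": int(item["order_id"]), "amount": float(item["amount"])}
        for item in json.loads(text)
    ]


# Both sources end up as one uniform list of records.
records = read_structured(CSV_TEXT) + read_semi_structured(JSON_TEXT)
```

The JSON record carries an extra nested field (`tags`) that has no place in the flat table, which is typical of semi-structured data: the reader has to decide which fields to keep.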

Depending on the business requirements, the Data Engineer needs to create a data pipeline that will automatically pick up data and load it into the data platform (data warehouse or data lake). This also means choosing the tools for working with the data.
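As a rough illustration of what such a pipeline does, the sketch below extracts a few records, transforms them, and loads them into a table. It is only a toy under stated assumptions: the raw records and the `orders` table are hypothetical, and an in-memory SQLite database stands in for a real data warehouse or data lake.

```python
import sqlite3

# Hypothetical raw records, standing in for data picked up from a source.
RAW_ORDERS = [
    {"order_id": "1", "customer": " Alice ", "amount": "10.50"},
    {"order_id": "2", "customer": "Bob", "amount": "20.00"},
    {"order_id": "3", "customer": "alice", "amount": "not-a-number"},
]


def extract():
    """Pick up the raw data (here, simply copy the in-memory sample)."""
    return list(RAW_ORDERS)


def transform(rows):
    """Cast types, normalize customer names, and skip unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean


def load(conn, rows):
    """Write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    load(conn, transform(extract()))


conn = sqlite3.connect(":memory:")  # stand-in for the data platform
run_pipeline(conn)
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

In a real project each of the three steps would usually be handled by a dedicated tool or service and run on a schedule, but the extract, transform, load shape stays the same.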

Our goal is simple: to help businesses extract valuable information from data. To do this, you need to create an analytical solution where users can independently work with data, test their hypotheses and analyze business problems using the right metrics.

To build such a solution, you need a Data Engineer. In my case, the job is not just creating a data stream and transforming and loading data. It is full-fledged work with business units: understanding their needs and providing them with the tools to solve their problems.

One can build a solution with programming languages such as Java or Python: this is Data Engineer #1 (Technical Data Engineer). Or one can use ready-made solutions that make it possible to build scalable, secure solutions and achieve results quickly: this is Data Engineer #2 (let it be the Result Oriented Data Engineer).

Programming is indispensable even for type 2, but you do not need to be a programming guru. It is enough to understand how Python works and to use small pieces of code to customize the solution.

6. Architecture of the analytic solution

3 layers of architecture:

Sometimes another layer is used: the Processing / Compute Layer, where the data is transformed before it is loaded into storage.

One more simplified architecture diagram (source: [2]) is as follows:

References:

[1] https://github.com/Data-Learn/data-engineering

[2] https://github.com/Artyom174