The Data Architecture

An article about how data can be managed.

In this article, you will learn about the different data architecture options used to manage data in data science projects. Organizations understand that their ability to generate actionable insights from data is of strategic significance. To use data science, and hence become data-driven, they are deploying cloud-based technologies, including AI and machine learning models.

The success of these tools, however, will be limited unless there exists an abundance of high-quality, accessible data. In this regard, selecting the right data architecture with the most efficient data management tools is a key characteristic of a data-driven organization. Nevertheless, managing data in an enterprise context is a highly complex endeavor.

As new data technologies are introduced, the burden of legacy systems and data silos grows unless they can be easily integrated. Dealing with a fragmented data architecture is exhausting and frustrating for many businesses, thanks not just to silos but also to the diversity of on-premises and cloud-based tools available. Combined with data quality issues, this fragmentation prevents organizations’ data platforms, including their machine learning and analytics algorithms, from delivering the anticipated business value.

The top data priorities for businesses over the next couple of years fall into three areas, all supported by wider adoption of cloud platforms:

  • Improving data management
  • Enhancing data analytics
  • Expanding the use of all types of enterprise data (including streaming and unstructured data)

Open standards are a top requirement of future data architecture strategies. If businesses could build a new data architecture, its most critical advantage over the existing one would be a greater embrace of open-source standards and open data formats.

Data Architecture options

There are three data architecture options to manage data for data science:

Data warehouse

A data warehouse (DW) is a system designed to support decision-making by receiving data from multiple operational data sources. It is also defined as a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of the decision-making process. See the figure below:

[Figure: enterprise systems, accounting systems, e-commerce, sourcing systems, human capital, customer experience, the warehouse management system, and the call center all feed into the data warehouse.]
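The four properties in the definition above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse engine. The table name `sales_fact`, the two source systems, and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3


def build_warehouse(accounting_rows, ecommerce_rows):
    """Load rows from two hypothetical operational sources into one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sales_fact (
            source      TEXT NOT NULL,  -- which operational system the row came from
            customer_id TEXT NOT NULL,  -- integrated: one shared key format
            amount      REAL NOT NULL,  -- subject-oriented: everything here is about sales
            load_date   TEXT NOT NULL   -- time-variant: every row is dated
        )
        """
    )
    # Non-volatile: rows are only appended, never updated or deleted.
    conn.executemany(
        "INSERT INTO sales_fact VALUES ('accounting', ?, ?, ?)", accounting_rows
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES ('ecommerce', ?, ?, ?)", ecommerce_rows
    )
    conn.commit()
    return conn


def revenue_per_customer(conn):
    """Decision-support query: total revenue per customer across all sources."""
    cur = conn.execute(
        "SELECT customer_id, SUM(amount) FROM sales_fact "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return cur.fetchall()


conn = build_warehouse(
    accounting_rows=[("C1", 100.0, "2024-01-31"), ("C2", 50.0, "2024-01-31")],
    ecommerce_rows=[("C1", 25.0, "2024-02-01")],
)
print(revenue_per_customer(conn))
```

The key design point is that the schema is fixed before any data is loaded, so every source has to be transformed to fit it. That is what makes the later query simple.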

Data lake

A data lake is a system made up of multiple technologies that allow querying data stored as files or blob objects. The term data lake was coined in 2010. A data lake enables the analysis of structured and unstructured data at scale and supports a diverse set of analytic functions, including SQL queries, big data analytics, full-text search, real-time analytics, and machine learning. The number of organizations adopting a data lake architecture has grown rapidly. See the figure below:

[Figure: a visualisation of the technologies mentioned above, floating in water.]
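To illustrate querying data that is stored as plain files, here is a minimal sketch in which a temporary directory plays the role of the lake. The file names, the `station` and `rain_mm` fields, and the helper functions are hypothetical. A real data lake would typically sit on object storage and use a query engine rather than hand-written parsing.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_lake(lake_dir):
    """Yield records from every supported file in the lake, parsing on read."""
    for path in sorted(Path(lake_dir).iterdir()):
        if path.suffix == ".csv":
            with path.open(newline="") as f:
                for row in csv.DictReader(f):
                    yield row
        elif path.suffix == ".jsonl":
            with path.open() as f:
                for line in f:
                    if line.strip():
                        yield json.loads(line)
        # Other formats (free text, images, ...) stay in the lake untouched.


def total_by_station(lake_dir):
    """Aggregate rainfall per station across all structured files."""
    totals = {}
    for rec in read_lake(lake_dir):
        station = rec["station"]
        totals[station] = totals.get(station, 0.0) + float(rec["rain_mm"])
    return totals


# Files land in the lake in whatever format the source produced them.
lake = tempfile.mkdtemp()
Path(lake, "2024-01.csv").write_text("station,rain_mm\nA,1.5\nB,2.0\n")
Path(lake, "2024-02.jsonl").write_text('{"station": "A", "rain_mm": 0.5}\n')
Path(lake, "notes.txt").write_text("unstructured field notes\n")

result = total_by_station(lake)
print(result)
```

Unlike the warehouse, nothing is transformed when the data is stored. Structure is applied only when the data is read, which is why structured and unstructured files can sit side by side.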

Data lakehouse

The data lakehouse architecture combines features of both the data lake and the data warehouse. See the figure below:

[Figure: from Delta Lake, to the Delta Engine, to the different technologies found in the data lake.]
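One way to picture this combination is a set of lake-style data files managed through a warehouse-style table abstraction. The sketch below is a toy illustration only: the `TinyLakehouseTable` class, its `_log` directory, and the file naming are hypothetical, and it is not the API of Delta Lake or of any other product. It assumes a single writer and ignores concurrency.

```python
import json
import tempfile
from pathlib import Path


class TinyLakehouseTable:
    """Hypothetical toy table: plain data files plus an append-only commit log."""

    def __init__(self, root):
        self.root = Path(root)
        self.log_dir = self.root / "_log"
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _versions(self):
        return sorted(int(p.stem) for p in self.log_dir.glob("*.json"))

    def append(self, rows):
        """Write rows to a new data file, then publish it with a log entry."""
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        data_file = f"part-{version:05d}.jsonl"
        (self.root / data_file).write_text(
            "".join(json.dumps(r) + "\n" for r in rows)
        )
        # Readers only see the data file once this log entry exists.
        (self.log_dir / f"{version:05d}.json").write_text(
            json.dumps({"add": data_file})
        )
        return version

    def read(self, version=None):
        """Return all rows as of a given version (default: latest)."""
        rows = []
        for v in self._versions():
            if version is not None and v > version:
                break
            entry = json.loads((self.log_dir / f"{v:05d}.json").read_text())
            with (self.root / entry["add"]).open() as f:
                rows.extend(json.loads(line) for line in f if line.strip())
        return rows


table = TinyLakehouseTable(tempfile.mkdtemp())
v0 = table.append([{"station": "A", "rain_mm": 1.5}])
v1 = table.append([{"station": "B", "rain_mm": 2.0}])
latest = table.read()
as_of_v0 = table.read(version=v0)
print(len(latest), len(as_of_v0))
```

The data still lives in open files, as in a data lake, while the log gives readers a consistent, versioned view of the table, which is closer to what a warehouse offers.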

Note that all three architectures are still in active use by organizations today. The choice among them should depend on the nature and volume of the data and on its intended use.

© Luleå University of Technology
This article is from the free online

Data Science for Climate Change

Created by
FutureLearn - Learning For Life
