
The Current Technology Landscape: Data and AI

In this article, we explore the technology landscape, looking at data infrastructure, data engineering, data visualisation and data science.
© Torrens University

It can be a nightmare for organisations and analysts to keep up with every technology. The technology landscape can, however, be classified and simplified based on how each technology is applied across the analytics and data science value streams.

A view of the technology landscape was captured by Matt Turck (figure can also be downloaded below).

Matt Turck’s Data and AI Landscape 2019 chart

Data Infrastructure

Data infrastructure includes the technology infrastructure used to store and process data and make it easily accessible for further analysis. Traditionally, companies have used relational database technologies (e.g., Microsoft SQL Server, Oracle Database and IBM DB2) to build data warehouses as ways of storing data in a structured format that is easy to query.

Most of the data were stored on-site on large servers hosted within the organisation. However, such technologies were limited in the amount of data they could store and the volume of data that could be processed, queried and transmitted. If data or processing requirements increased, organisations would have to buy larger, more expensive servers. Over the past decade, the growth in big data and cloud technologies has changed the way organisations host and process their data.

Big data technologies, such as Hadoop and NoSQL ("not only SQL") databases, allow organisations to store large amounts of data safely on cheap hardware using distributed computing. They also allow users to query large amounts of data quickly. Hortonworks and Cloudera are two major companies working in big data technology. MongoDB is an example of a NoSQL database.

Recent growth in cloud technologies has commoditised the computation, storage and transmission of data. Cloud technologies have enabled organisations to store large volumes of data securely and at low cost in data centres distributed globally. These technologies provide flexibility and elasticity. Organisations can increase and decrease data storage space and compute power based on their requirements and no longer have to buy expensive new hardware each time. Organisations also do not have to worry about maintaining the systems, as this is taken care of by the cloud technology vendors. Microsoft Azure, Amazon Web Services (AWS) and Google Cloud Platform (GCP) are all major cloud computing platforms.

Setting up an appropriate data infrastructure is the foundation for solving complex problems using data science. Thus, it has become a necessity for organisations and governments alike. For example, the government has focused on setting up data infrastructure for a vital initiative: the Square Kilometre Array (SKA). The SKA is a next-generation radio telescope that will ultimately have a square kilometre of collecting area, making it the most sensitive radio telescope in the world.

Data Engineering

Data engineering can be defined as the use of technology to integrate data from different sources and build a data pipeline to extract information from data. Companies use different systems to run business functions. For example, a typical company would have systems such as enterprise resource planning (ERP), a customer relationship management (CRM) tool, campaign tools, web analytics tools and chatbots.

These systems store data in their own databases in different formats. For example, an ERP system typically stores data in a relational database, whereas a chatbot may store data in JSON format. To perform an analysis and make sense of such data, a company needs to integrate data from all these systems in the same format on a common platform. An enterprise service bus (ESB) enables data to be integrated from various platforms. Extract, Transform and Load (ETL) tools, many of which are available in the market, allow developers to extract data from various sources, transform them based on business rules and load the data onto a common platform for analysis. Structured Query Language (SQL) is the foundation of traditional ETL tools and is used to communicate with the data in data warehouses.
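The extract, transform and load steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular ETL product works: the in-memory SQLite databases stand in for an ERP system and a data warehouse, and the table names, field names and sample records are all hypothetical.

```python
import json
import sqlite3

# Hypothetical source 1: an ERP-style relational table (in-memory SQLite stands in for it).
erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
erp.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.5), (1, 40.0)],
)

# Hypothetical source 2: chatbot messages exported as JSON text.
chatbot_json = (
    '[{"customerId": "1", "message": "Where is my order?"},'
    ' {"customerId": "2", "message": "Thanks!"}]'
)


def extract():
    """Read the raw records from both source systems."""
    orders = erp.execute("SELECT customer_id, amount FROM orders").fetchall()
    chats = json.loads(chatbot_json)
    return orders, chats


def transform(orders, chats):
    """Bring both sources into one common row format: (customer_id, source, detail)."""
    rows = [(customer_id, "order", str(amount)) for customer_id, amount in orders]
    rows += [(int(chat["customerId"]), "chat", chat["message"]) for chat in chats]
    return rows


def load(rows):
    """Write the unified rows to a common platform (a second in-memory database)."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE customer_events (customer_id INTEGER, source TEXT, detail TEXT)"
    )
    warehouse.executemany("INSERT INTO customer_events VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return warehouse


warehouse = load(transform(*extract()))
event_count = warehouse.execute("SELECT COUNT(*) FROM customer_events").fetchone()[0]
print(event_count)
```

Once both sources share one format, a single query can answer questions that span them, such as listing every order and chat message for one customer.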

SQL queries have been used to extract data from the data warehouse, manipulate this data based on the requirement of the analyst and load the data in a format that is easy to interpret. However, SQL queries have limitations; for example, they can only read data from relational databases that store data in a structured format (i.e., rows and columns).
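As a small illustration of querying structured rows and columns with SQL, the sketch below summarises a table by region. It assumes a hypothetical `sales` table with made-up figures and uses Python's built-in SQLite module in place of a real data warehouse.

```python
import sqlite3

# An in-memory database stands in for the data warehouse (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Widget", 100.0),
        ("North", "Gadget", 50.0),
        ("South", "Widget", 75.0),
    ],
)

# Extract the rows, aggregate them per region and return an easy-to-read summary.
query = """
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region
ORDER BY region
"""
totals = conn.execute(query).fetchall()

for region, total_sales in totals:
    print(region, total_sales)
```

The query only works because every record has the same columns, which is the structured-format limitation mentioned above.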

Over the last decade, data has become diverse and voluminous. Big data, NoSQL and cloud technologies enable organisations to store and process different kinds of semi-structured and unstructured data, such as free text, video and audio. The growth of big data saw the emergence of new techniques and tools, such as MapReduce, Pig, Hive and Apache Spark. MapReduce uses two tasks: a Map task and a Reduce task. Together, these tasks enable distributed parallel processing by splitting a large dataset into smaller chunks. The Map tasks process the chunks simultaneously, and the Reduce tasks combine the intermediate results to produce the final outcome quickly. Apache Spark can additionally be used to process live streaming data, such as machine and security logs, real-time web browsing data and live credit card transactions.
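The Map and Reduce tasks can be illustrated with a word count, a common teaching example. The sketch below runs on a single machine in plain Python and only mimics the pattern; it is not Hadoop's API, and the function names and sample text are assumptions made for illustration. In a real cluster, each chunk would be handled by a different machine.

```python
from collections import defaultdict


def map_task(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group the intermediate pairs by key so each word's values end up together."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# The large dataset is split into smaller chunks (hypothetical sample text).
chunks = ["big data big", "data pipeline", "big pipeline"]

# Each chunk is mapped independently, so these calls could run in parallel.
mapped = [map_task(chunk) for chunk in chunks]
grouped = shuffle(mapped)
word_counts = dict(reduce_task(key, values) for key, values in grouped.items())

print(word_counts)
```

Because each Map call depends only on its own chunk, adding more machines lets more chunks be processed at the same time.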

Cloud vendors have also developed data engineering tools to handle a large volume and variety of data. Amazon has EMR, Google has BigQuery and Dataflow, while Azure has Data Factory.

In recent years, there has also been a rise in focus on ETL workflow automation with tools such as Alteryx.

Data Visualisation

As discussed above, data visualisation tools support descriptive and diagnostic analytics. These tools help represent a large amount of data in the form of reports and dashboards containing charts, graphs and tables, as these are easy for business users to interpret. These tools also enable users to dissect pre-processed data across various dimensions using filters.

Microsoft Excel has been widely used by businesses over the past 20 years to develop reports for limited amounts of data. IBM Cognos and SAP Business Objects drove some of the earlier developments in this area. In the last decade, there has been rapid growth in data visualisation tools. Tableau revolutionised the field of data visualisation by enabling modern visualisations of large amounts of data. Qlik and Microsoft Power BI have gained popularity in the last five years. These business intelligence and data visualisation tools are popular not just among analysts but also in the business community because of the ease with which they can be used. They also provide self-service functionality for analysts and tech-savvy business users.

Open-source tools, such as the JavaScript library D3 (used with HTML) and R Shiny, are popular among developers. They provide libraries of visualisations that help develop custom visualisations for specific business requirements.

Data Science

Data science tools support predictive and prescriptive analytics. They enable running data mining and statistical functions on large amounts of data to identify statistical patterns. These tools include:

  • SAS: Developed in the 1970s, SAS is one of the earliest tools used by organisations and academics;
  • R and Python: These open-source tools have become extremely popular over the past decade among the analytics and data science communities. They have large user communities that contribute by publishing libraries of statistical functions online;
  • Cloud vendors: These vendors provide tools on their platforms that are useful in developing and operationalising data science algorithms. For example, AWS has SageMaker, while Google has Datalab and TensorFlow; and
  • KNIME.
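To give a flavour of what predictive analytics means in practice, the sketch below fits a straight line to past observations and uses it to predict a new value. It is a minimal ordinary least-squares example in plain Python; the advertising and sales figures are made up, and the tools listed above provide far richer versions of this idea.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: advertising spend (x) and resulting sales (y).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)

# Predict sales for a spend level that has not been observed yet.
predicted_sales = slope * 5.0 + intercept
print(slope, intercept, predicted_sales)
```

The fitted line captures the statistical pattern in the historical data, and the prediction simply extends that pattern to a new input.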
This article is from the free online course Introduction to Digital Transformation: Understand and Manage Digital Transformation in the Workplace

Created by
FutureLearn - Learning For Life
