
Data ingestion: Transporting data from source to storage

Learn how to ingest data by transporting it from source to storage.

In the video, you learned what data ingestion is, the different sources of data, the two approaches to data ingestion, and the three basic actions in data ingestion.

Data ingestion in the CRISP-DM process

As we learned earlier, one of the first phases of CRISP-DM is data understanding. Before you can analyse and understand data, you need to collect it from data sources that relate to the business problem. This initial phase of data collection is usually called 'data ingestion'.

Diagram showing the CRISP-DM process, which is made up of six steps. 'Business Understanding' links to 'Data Understanding' with a two-way arrow. 'Data Understanding' links to 'Data Preparation' with a one-way arrow. 'Data Preparation' links to 'Modeling' with a two-way arrow. 'Modeling' links to 'Evaluation' with a one-way arrow. 'Evaluation' links to 'Deployment' with a one-way arrow, and also to 'Business Understanding' with a one-way arrow. In the middle of all the steps is 'Data'.

Data ingestion is the process of reading and loading data from various underlying data sources into Python. From there, the data can be processed and transformed as per the requirements of the application. Each kind of data source has its own protocol for transferring data and, as a data analyst, you must be aware of these protocols.

Most of the time, the data is available to us in the following formats:

  • Text data (CSV, JSON, Excel, etc.)
  • Web data (HTML, XML)
  • Databases (SQL and NoSQL)
  • Binary data formats
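
Each of these formats has a corresponding reader in Python. As a minimal sketch of the idea, text data such as CSV can be ingested with the standard library's `csv` module; the sample data below is illustrative and is not taken from the course data sets:

```python
# A minimal sketch of ingesting CSV text data with Python's standard library.
# The column names and values here are made up for illustration.
import csv
import io

csv_text = "name,total_bill,tip\nAlice,16.99,1.01\nBob,10.34,1.66\n"

# csv.DictReader parses each data row into a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

print(rows[0]["name"])        # Alice
print(float(rows[1]["tip"]))  # 1.66
```

In practice, a data analyst would more often use a library such as pandas (e.g. `pandas.read_csv`), which loads the same text into a DataFrame ready for processing and transformation; the later units in this course cover that workflow.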

Note: Download the zip file containing the data sets as .csv and .txt files that you will need in this unit. Be sure to extract the files and save them in the same folder as your Jupyter Notebooks. All the data in the zip file is sourced from GitHub. [1]

Next, let us learn how to ingest (i.e. load and read) text data in Python.

References

  1. mwaskom. seaborn-data [Internet]. GitHub; 2021 Jun 23. Available from: https://github.com/mwaskom/seaborn-data
This article is from the free online course Introduction to Data Analytics with Python, created by FutureLearn.
