Importance of Data Quality

Learn more about the importance of data quality.
According to this picture, there are six attributes for data quality: accuracy, relevance, accessibility, completeness, clarity, and timeliness.
© Kilkenny, M.F., and Robinson, K.M., 2018. Data quality: “Garbage in–garbage out”

After learning so many methods, you might feel overwhelmed, and it is tempting to focus only on finding the best method to answer your question. However, choosing the most appropriate machine learning method is not the most challenging step.

The foremost step is the tedious process of collecting and cleaning the data. You must obtain the correct data. Then, you have to filter the data with suitable criteria. Only once these steps are done can you use descriptive or predictive modeling.

The article that we will read today is Kilkenny, M.F., and Robinson, K.M., 2018. Data quality: “Garbage in–garbage out.” The article emphasizes the process of data collection. Even if you use an excellent method, the results will be wrong if the data is wrong. After spending many hours fighting with RStudio or any other programming tool, it would be dreadful to realize that the data was wrong.

The result can differ if you change the input data. It is extremely annoying to find that your original answer was wrong because the input data was wrong, since you then have to redo the whole analysis with the corrected input. The new run may well show that your “good” answer was, in fact, wrong.
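To make this concrete, here is a minimal sketch of how a single wrong entry shifts a result. The numbers are made up for illustration and do not come from the article; the only assumption is that one reading was entered as 5 when it should have been 5.5.

```python
from statistics import mean

# Hypothetical measurements, invented for this example.
correct = [5.5, 6.0, 4.5, 5.0]
# The same data with the first reading entered as 5 instead of 5.5.
recorded = [5.0, 6.0, 4.5, 5.0]

mean_correct = mean(correct)
mean_recorded = mean(recorded)

# Difference caused by one wrong entry out of four.
shift = mean_correct - mean_recorded

print(f"mean of correct data:  {mean_correct}")
print(f"mean of recorded data: {mean_recorded}")
print(f"shift caused by one wrong entry: {shift}")
```

Even this simple average moves when one value is off. A model fitted to the same data would inherit the error in the same way, so the whole analysis has to be rerun once the input is corrected.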

Although data collection and cleaning are the most boring parts of the data mining process, they are the first thing you must get right for a sound data analysis. Yes, it takes a lot of time.

The article “Data quality: Garbage in–garbage out” suggests four criteria for useful data: accuracy, validity, completeness, and availability. Accuracy is the most important of these. If your data is accurate, then you are at a good starting point. Accuracy covers both wrong and missing data. For example, a value recorded as 5 when it should be 5.5 is wrong. You also do not want too much missing data. Finally, you should consider timing: we often deal with data that has a timestamp, and we want the time information to be exact.
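Checks for missing values and inexact timestamps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`value`, `timestamp`), and the rule that a timestamp must include seconds are all hypothetical and are not taken from the article.

```python
from datetime import datetime

# Hypothetical records: field names and values are invented for illustration.
records = [
    {"id": 1, "value": 5.5, "timestamp": "2018-03-01T09:00:00"},
    {"id": 2, "value": None, "timestamp": "2018-03-01T09:05:00"},
    {"id": 3, "value": 5.0, "timestamp": ""},
    {"id": 4, "value": 6.1, "timestamp": "2018-03-01 09:15"},
]


def missing_fraction(rows, field):
    """Share of rows where the field is absent, None, or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) in (None, ""))
    return missing / len(rows)


def has_exact_timestamp(row, field="timestamp"):
    """True if the field parses as a full date and time, down to the second.

    The required format is an assumption made for this sketch.
    """
    raw = row.get(field)
    if not raw:
        return False
    try:
        datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True


value_missing = missing_fraction(records, "value")
bad_timestamps = [row["id"] for row in records if not has_exact_timestamp(row)]

print(f"fraction of missing values: {value_missing}")
print(f"records without an exact timestamp: {bad_timestamps}")
```

Checks like these only catch missing or malformed entries. A value that is present but wrong, such as 5 instead of 5.5, cannot be detected this way and has to be verified against the original source.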

Remind yourself of the colloquial saying “Garbage in, garbage out.” Collecting the right data is demanding, but it must be done correctly. Otherwise, there is no reason to do any analysis.

This article is from the free online course Artificial Intelligence and Machine Learning for Business, created by FutureLearn.
