Automating the analysis of ‘big data’
Having ensured our data is suitable, we now consider how best to automate the analysis of this ‘big data’.
Earlier we introduced the ‘3Vs’, and you explored IBM’s ‘4Vs’ and Microsoft’s ‘6Vs’. We use these to try to define ‘big data’.
Why have we used the term ‘big data’ and what does it mean?
There is no formal definition of ‘big data’. However, we can consider ‘big data’ as typically too large to fit into available memory, or taking too long to examine, or never ending.
In most cases, business problems will be on-going and fall into the ‘never ends’ category, but they may also fall into the ‘takes too long to examine’ category if we want updates every hour and analysis takes all day. If we only had a small amount of data, would we need Machine Learning (ML) and Artificial Intelligence (AI) to make sense of it?
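As a rough illustration, the three informal criteria above can be written as a simple checklist. This is only a sketch: the function name `big_data_reasons`, its parameters, and the figures in the example are all made up for illustration and are not part of any formal definition.

```python
def big_data_reasons(data_bytes, memory_bytes, analysis_hours,
                     update_interval_hours, never_ends):
    """Return which of the three informal 'big data' criteria apply."""
    reasons = []
    if data_bytes > memory_bytes:
        reasons.append("too large to fit into available memory")
    if analysis_hours > update_interval_hours:
        reasons.append("takes too long to examine")
    if never_ends:
        reasons.append("never ends")
    return reasons


# Hypothetical business case: 2 GB of data on a 16 GB machine,
# an analysis that takes all day (24 hours), updates wanted every hour,
# and new data arriving continually.
reasons = big_data_reasons(
    data_bytes=2 * 10**9,
    memory_bytes=16 * 10**9,
    analysis_hours=24,
    update_interval_hours=1,
    never_ends=True,
)
print(reasons)
```

In this made-up case the data fits comfortably in memory, yet the problem still behaves like ‘big data’ because the analysis cannot keep up with the hourly updates and the data never stops arriving.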
Depending on the type of data and its source, we may need different ML and AI techniques to deal with the data:
- When we have a large, historical data set to analyse, it won’t change, so we can use static analysis techniques.
- When we have a large volume of data and continually add to it, we may not want to repeatedly re-analyse the whole data set. In this case, we use dynamic techniques that update with each new datum. This will help us understand ‘concept drift’, such as a general increase in consumption.
- When we have fast-moving data streams, and old data is no longer relevant, our analysis must evolve to suit. For this type of analysis, we use evolving data analysis techniques. This allows for new technologies or products to move in and take market share. For example, consumers may come to prefer goat’s cheese to cow’s cheese.
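To make the difference between the three approaches concrete, here is a minimal sketch that uses a simple average as a stand-in for a real analysis. The class names, the window size, and the consumption figures are all hypothetical. A sliding window is only one possible way of forgetting old data, chosen here for simplicity.

```python
from collections import deque


def static_mean(values):
    """Static analysis: one pass over a fixed, historical data set."""
    return sum(values) / len(values)


class RunningMean:
    """Dynamic analysis: update the estimate with each new datum,
    without re-reading the data already seen."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


class WindowedMean:
    """Evolving analysis: only the most recent `window` values count,
    so old data that is no longer relevant is forgotten."""

    def __init__(self, window):
        self.recent = deque(maxlen=window)

    def update(self, x):
        self.recent.append(x)
        return sum(self.recent) / len(self.recent)


# Made-up daily consumption figures: a stable history, then a shift.
history = [10.0, 10.0, 10.0, 10.0]
new_data = [20.0, 20.0, 20.0, 20.0]

static_result = static_mean(history)  # computed once, never updated

running = RunningMean()
windowed = WindowedMean(window=4)
for x in history + new_data:
    running_result = running.update(x)
    windowed_result = windowed.update(x)

print(static_result, running_result, windowed_result)
```

With these figures, the static result stays at 10, the running mean moves to about 15 because it still weighs all past data equally, and the windowed mean reaches 20 because it only reflects the most recent values.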
If in doubt, remember that we are continually adding to our data: we may not have big data now, but we might in the future. As we will see in future short courses, different algorithms may generate slightly different results. We want our analysis to be consistent over time, so we should consider this at the earliest stage.
If you want to analyse sales in your new, small online book store, you need to consider that it might grow in future:
If you want to count the number of internet users:
- China = 765 million
- India = 391 million
- United States = 245 million
- Brazil = 126 million
- Japan = 116 million
- Russia = 109 million
(Roser, Ritchie and Ortiz-Ospina 2019)
If you want to examine land use:
If you want to measure traffic use:
(Department for Transport 2017)
The next short course, in the AI Technologies for Business and Management program, will illustrate the different types of machine learning algorithms.
Think of the milk production scenario, or another one from a workplace that you are familiar with. Try to come up with examples of:
- Static data
- Dynamic data
- Evolving data
Share your examples in the comments area and comment on those from fellow learners.
Statista (2019) Combined desktop and mobile visits to Amazon.com from May 2019 to October 2019 [online] available from https://www.statista.com/statistics/623566/web-visits-to-amazoncom/ [22 April 2020]
Roser, M., Ritchie, H., and Ortiz-Ospina, E. (2019) ‘Internet’. OurWorldInData.org [online] available from https://ourworldindata.org/internet [22 April 2020]
OurWorldInData.org (2019) Global Land Use for Food Production [online] available from https://ourworldindata.org/uploads/2019/11/Global-land-use-graphic.png [22 April 2020]
Department for Transport (2017) Road Lengths in Britain 2016 [online] available from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/611185/road-lengths-in-great-britain-2016.pdf [22 April 2020]
© Coventry University. CC BY-NC 4.0