
Automating the analysis of 'big data'

Having ensured our data is suitable, we then consider how best to automate the analysis of this ‘big data’.

Earlier we introduced the ‘3Vs’, and you explored IBM’s ‘4Vs’ and Microsoft’s ‘6Vs’. Each of these is an attempt to define ‘big data’.

Why have we used the term ‘big data’ and what does it mean?

There is no formal definition of ‘big data’. However, we can think of ‘big data’ as data that is typically too large to fit into available memory, takes too long to examine, or never ends.
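To make the ‘too large to fit into available memory’ case concrete, here is a minimal Python sketch of one common workaround: reading the data in small chunks and keeping only a running total and count. The function names `read_in_chunks` and `chunked_mean` are hypothetical and chosen for illustration; they are not part of the course material or of any library.

```python
from typing import Iterable, Iterator, List


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists holding at most chunk_size values."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def chunked_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute a mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in read_in_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("cannot take the mean of an empty data set")
    return total / count


if __name__ == "__main__":
    # range() is lazy, so the million values are never stored all at once.
    print(chunked_mean(range(1, 1_000_001), chunk_size=10_000))
```

In practice the `values` argument would more likely be a file or database cursor read row by row, but the idea is the same: the summary is small even when the data is not.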

In most cases, business problems will be on-going and fall into the ‘never ends’ category, but they may also fall into the ‘takes too long to examine’ category if we want updates every hour and analysis takes all day. If we only had a small amount of data, would we need Machine Learning (ML) and Artificial Intelligence (AI) to make sense of it?

Depending on the type of data and its source, we may need different ML and AI techniques to deal with the data:

  • When we have a large, historical data set to analyse, it won’t change, so we can use static analysis techniques.

  • When we have a large volume of data and continually add to it, we may not want to repeatedly re-analyse the whole data set. In this case, we use dynamic techniques that update with each new datum. This will help us understand ‘concept drift’, such as a general increase in consumption.

  • When we have fast-moving data streams, and old data is no longer relevant, our analysis must evolve to suit. For this type of analysis, we use evolving data analysis techniques. This allows for new technologies or products moving in and taking market share, for example, when people start to prefer goat’s cheese to cow’s cheese.
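The three situations above can be sketched with one very simple statistic, the average. The Python below is an illustration under stated assumptions rather than a recommended method: `static_mean`, `RunningMean` and `ForgettingMean` are hypothetical names, the consumption figures are made up, and an exponentially weighted average is just one possible way to let old data fade.

```python
from typing import List, Optional


def static_mean(history: List[float]) -> float:
    """Static: the data set is complete, so analyse it once in full."""
    return sum(history) / len(history)


class RunningMean:
    """Dynamic: update the result with each new datum, without re-reading old data."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Incremental mean: nudge the old mean towards the new value.
        self.mean += (value - self.mean) / self.count
        return self.mean


class ForgettingMean:
    """Evolving: weight recent data more heavily so old data fades away."""

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in the range (0, 1]")
        self.alpha = alpha  # assumed smoothing factor; higher forgets faster
        self.mean: Optional[float] = None

    def update(self, value: float) -> float:
        if self.mean is None:
            self.mean = value
        else:
            self.mean = self.alpha * value + (1.0 - self.alpha) * self.mean
        return self.mean


if __name__ == "__main__":
    # Made-up daily consumption figures with a step change halfway through.
    litres = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]

    running = RunningMean()
    forgetting = ForgettingMean(alpha=0.5)
    for value in litres:
        running.update(value)
        forgetting.update(value)

    print("static:", static_mean(litres))
    print("dynamic:", running.mean)
    print("evolving:", forgetting.mean)
```

With these made-up figures, the static and dynamic versions agree once all the data has been seen, while the evolving version sits closer to the most recent values because the early readings count for less.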

If in doubt, remember that we are continually adding to our data: we may not have big data now, but we might in the future. As we will see in future short courses, different algorithms may generate slightly different results. We want our analysis to be consistent over time, so we should consider this at the earliest stage.

Examples

If you want to analyse sales in your new, small online book store, you need to consider that it might grow in future:

This graph illustrates the number of visits to Amazon.com from February 2018 to July 2019. The graph rises and falls over this period but shows a significant spike, from 2390 to 2975, between October 2018 and December 2018, followed by a sharp drop to 2247 in February 2019, after which the graph starts climbing again. (Statista 2019)

If you want to count the number of internet users:

  • China = 765 million
  • India = 391 million
  • United States = 245 million
  • Brazil = 126 million
  • Japan = 116 million
  • Russia = 109 million

(Roser, Ritchie and Ortiz-Ospina 2019)

If you want to examine land use:

This bar chart shows how land is used globally for food production. There are six bars. From top to bottom they are labelled 'Earth's surface', 'Land surface', 'Habitable land', 'Agricultural land', 'Global supply' and 'Global protein supply'. The top bar shows that 29% of the Earth's surface is land and 71% is ocean. The other bars show what percentage of this land is used for food production in each of the relevant categories. (OurWorldInData.org 2019)

If you want to measure traffic use:

This graph illustrates the length of each road type by country in 2016. It shows three circles: one for England, one for Wales and one for Scotland. The different road types are 'Motorway', 'Trunk "A" Road', 'Principal "A" Road', '"B" Road' and '"C" and "U" Road'. These are depicted as coloured wedges and the length is expressed as a percentage along the outer edge of the circle. (Department for Transport 2017)

This bar chart illustrates the length of each road type for the regions of England in 2016. From top to bottom the regions are: South West, South East, London, East of England, West Midlands, Yorkshire and The Humber, North West, North East. (Department for Transport 2017)

The next short course, in the AI Technologies for Business and Management program, will illustrate the different types of machine learning algorithms.

Your task

Think of the milk production scenario, or another one from a workplace that you are familiar with. Try to come up with examples of:

  • Static data
  • Dynamic data
  • Evolving data

Share your examples in the comments area and comment on those from fellow learners.


References

Statista (2019) Combined desktop and mobile visits to Amazon.com from May 2019 to October 2019 [online] available from https://www.statista.com/statistics/623566/web-visits-to-amazoncom/ [12 December 2019]

Roser, M., Ritchie, H., and Ortiz-Ospina, E. (2019) ‘Internet’. OurWorldInData.org [online] available from https://ourworldindata.org/internet [12 December 2019]

OurWorldInData.org (2019) Global Land Use for Food Production [online] available from https://ourworldindata.org/uploads/2019/11/Global-land-use-graphic.png [12 December 2019]

Department for Transport (2017) Road Lengths in Britain 2016 [online] available from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/611185/road-lengths-in-great-britain-2016.pdf [12 December 2019]

This article is from the free online course:

Using Artificial Intelligence (AI) Technologies for Business Planning and Decision-making

Coventry University