Demonstrating data modelling

We’ll take a look at a demonstration of the process now. As with our previous demonstrations, much of this one will be interactive, using Jupyter Notebook. Before we move on to the demonstration, let us refresh our understanding of data modelling in general.

Data preparation

As we know, data preparation is one of the first steps of any data analytics process, such as the CRISP-DM method.

The graphic shows the "Data analytics CRISP process" with its six phases: "Business Understanding", "Data Understanding", "Data Preparation", "Modeling", "Evaluation", and "Deployment". Arrows run in both directions between the phases, showing that the sequence is not strict and that you can move back and forth between phases; this represents the cyclic nature of data mining.

During data preparation, you clean and transform the data and account for missing values, so that the data set is ready to be mined and is in the format required by the subsequent algorithm or data mining procedure.
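As a small illustration of cleaning data and accounting for missing values, here is a minimal sketch using only Python's standard library. The column names (`state`, `fires`) and the raw rows are hypothetical, and dropping incomplete rows is only one strategy; filling in (imputing) values is a common alternative.

```python
import csv
import io

# Hypothetical raw extract: stray spaces, a blank value and a non-numeric entry.
RAW = """state, fires
 NSW ,12
VIC,
QLD, 7
TAS,n/a
"""

def clean_rows(text):
    """Strip whitespace, convert counts to int and drop rows with missing or invalid values."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    cleaned, dropped = [], 0
    for row in reader:
        state = (row.get("state") or "").strip()
        raw_fires = (row.get("fires") or "").strip()
        try:
            fires = int(raw_fires)
        except ValueError:
            # Missing ("") or non-numeric ("n/a") counts: account for them by dropping the row.
            dropped += 1
            continue
        cleaned.append({"state": state, "fires": fires})
    return cleaned, dropped

rows, n_dropped = clean_rows(RAW)
print(rows)       # [{'state': 'NSW', 'fires': 12}, {'state': 'QLD', 'fires': 7}]
print(n_dropped)  # 2
```

Reporting how many rows were dropped keeps the effect of the missing-data decision visible, rather than silently shrinking the data set.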

Data preparation can take an enormous amount of time, depending on the volume of the data and the number of data sources involved. It can account for 70–80% of your effort in the entire data analysis process.

If you follow the links to the sources from which the files in the zipped folder were extracted, you will see that the data was originally messy and raw. As part of data preparation, these data sets were cleaned, transformed, and organised into CSV files for you. This brings the data into the context of the business question or analysis (here, finding answers to the question we introduced earlier). The CSVs provided were cleaned using Excel to normalise the areas into square kilometres and to combine data and metadata into a single sheet.
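The two Excel steps described above can also be expressed in code. The following is a minimal sketch, not the procedure actually used to build the provided CSVs: the input rows, the unit labels (`ha`, `sq_mi`, `km2`), and the metadata fields are all hypothetical.

```python
# Conversion factors from each (hypothetical) unit label to square kilometres.
TO_KM2 = {
    "km2": 1.0,
    "ha": 0.01,               # 1 hectare = 0.01 km²
    "sq_mi": 2.589988110336,  # 1 square mile = 1.609344² km²
}

def normalise_area(rows):
    """Return rows with every area expressed in square kilometres."""
    return [
        {"state": row["state"], "area_km2": round(row["area"] * TO_KM2[row["unit"]], 4)}
        for row in rows
    ]

def attach_metadata(rows, metadata):
    """Copy the data set's metadata onto every row so data and metadata sit in one table."""
    return [{**row, **metadata} for row in rows]

raw_rows = [
    {"state": "NSW", "area": 500.0, "unit": "ha"},
    {"state": "VIC", "area": 2.0, "unit": "sq_mi"},
]
metadata = {"dataset": "forest_area", "area_unit": "km2"}

combined = attach_metadata(normalise_area(raw_rows), metadata)
for row in combined:
    print(row)
```

Using one unit throughout matters later, when areas are compared or correlated across states.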

Data modelling

Once the data is prepared, you explore its properties to assess how it can help achieve the business goals, and then analyse it to find the patterns that can lead to answers. In data modelling, you apply a set of techniques to organise and analyse your data. In our case, since this is a data visualisation course, we will look at graph data modelling.

Graph data modelling is a relatively new approach in data analytics and visualisation. When we want to show relationships and correlations among variables, we tend to plot them on graphs. The concept itself is vast and complex; to learn more, read the article by a data architect linked below.
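To give a feel for what a graph data model looks like, here is a toy sketch in plain Python (not Neo4j or any graph database API). It shows one possible modelling choice: turning a categorical variable, here a hypothetical `state` value on each fire report, into its own node, so that all reports for a state can be found by following relationships. The node ids, the `REPORTED_IN` relationship name, and the counts are made up for illustration.

```python
from collections import defaultdict

class Graph:
    """A tiny in-memory property graph: nodes carry properties, edges carry a relationship type."""

    def __init__(self):
        self.nodes = {}                    # node id -> dict of properties
        self.incoming = defaultdict(list)  # target id -> [(relationship, source id)]

    def add_node(self, node_id, **props):
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, src, rel, dst):
        # Store edges by target so "which reports point at this state?" is a direct lookup.
        self.incoming[dst].append((rel, src))

    def sources(self, dst, rel):
        """Return the ids of nodes linked to dst by relationship rel."""
        return [src for r, src in self.incoming[dst] if r == rel]

g = Graph()
# Hypothetical fire reports; the categorical state value becomes a node of its own.
reports = [("r1", "NSW", 12), ("r2", "NSW", 9), ("r3", "VIC", 4)]
for report_id, state, fires in reports:
    g.add_node(report_id, kind="FireReport", fires=fires)
    g.add_node(state, kind="State")
    g.add_edge(report_id, "REPORTED_IN", state)

nsw_reports = g.sources("NSW", "REPORTED_IN")
nsw_total = sum(g.nodes[r]["fires"] for r in nsw_reports)
print(nsw_reports)  # ['r1', 'r2']
print(nsw_total)    # 21
```

Whether to model a category as a node or simply as a property is a design decision: a node makes the category easy to traverse from, while a property keeps the model smaller.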

Read: Graph Data Modeling: Categorical Variables [1]

Demonstration: Data exploration

Now, let us get down to business! Download the Jupyter Notebook linked below and get a taste of real data exploration. Check whether you have already downloaded the zipped folder earlier. If so, make sure you extract the individual files and save them in the same folder as this Notebook.
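If you prefer to do the extraction from inside the Notebook, the following is a minimal sketch using Python's standard `zipfile` and `pathlib` modules. The function name `extract_flat` is hypothetical, as is the archive name you would pass to it. The function writes each file directly into the target folder, dropping any subfolders inside the archive, so the CSVs end up next to the Notebook; a Jupyter kernel usually starts in the Notebook's own folder, which is why the default target is `"."`.

```python
import tempfile
import zipfile
from pathlib import Path

def extract_flat(zip_path, target_dir="."):
    """Write every file in the archive directly into target_dir, ignoring folders inside the zip."""
    target = Path(target_dir)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries such as "data/"
            # Keep only the file name, so "data/fires.csv" becomes "fires.csv".
            # This also stops names like "../x" from escaping target_dir.
            dest = target / Path(info.filename).name
            dest.write_bytes(archive.read(info))
            extracted.append(dest.name)
    return sorted(extracted)

# Demo with a throwaway archive; in the Notebook you would call extract_flat("<your zip>.zip").
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("data/", "")
        zf.writestr("data/fires.csv", "state,fires\nNSW,12\n")
    names = extract_flat(demo_zip, tmp)
    print(names)
```

Flattening means two files with the same name in different subfolders would overwrite each other, so check the archive's contents first if that is a risk.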

Follow the instructions below in the Notebook and run the cells to render the outputs, so that you understand the exploration process based on the scenario we introduced earlier.

  1. Load all the datasets and define the related functions.
  2. Plot the number of fires for each month (Date Reported is recorded once a month) and look for patterns using relation plots.
  3. Aggregate the data to give the mean number of fires per year for each state, which still allows us to look at trends, correlations, and differences between states.
  4. Check what correlations can be drawn between:

    a. population and mean fires

    b. population density and mean fires

    c. tree cover area and fire count

    d. forest area and number of fires

    e. date reported and number of fires
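The steps above are carried out in the Notebook itself. As a rough, standard-library-only sketch of steps 1 to 4a, the block below uses a handful of made-up monthly records and hypothetical population figures (not the course data). Step 3 is read here as the mean of the monthly counts within each year; the Notebook may aggregate differently. Plotting (the relation plots in step 2) is left out so the example runs without extra packages.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical monthly fire reports; these numbers are made up for illustration.
FIRES_CSV = """state,date_reported,fires
NSW,2019-01,10
NSW,2019-02,14
NSW,2020-01,20
VIC,2019-01,4
VIC,2020-01,6
QLD,2019-01,8
QLD,2020-01,12
"""
# Hypothetical population figures (not real data).
POPULATION = {"NSW": 800, "VIC": 650, "QLD": 500}

# Step 1: load the data set.
records = [
    {"state": r["state"], "month": r["date_reported"], "fires": int(r["fires"])}
    for r in csv.DictReader(io.StringIO(FIRES_CSV))
]

# Step 2: total number of fires for each reporting month (the data behind a relation plot).
fires_per_month = defaultdict(int)
for rec in records:
    fires_per_month[rec["month"]] += rec["fires"]

# Step 3: mean monthly fires for each (state, year), keeping the yearly trend visible.
monthly_by_state_year = defaultdict(list)
for rec in records:
    year = rec["month"][:4]
    monthly_by_state_year[(rec["state"], year)].append(rec["fires"])
mean_fires = {key: mean(values) for key, values in monthly_by_state_year.items()}

# Collapse the yearly means into one value per state for the correlation in step 4a.
yearly_means_by_state = defaultdict(list)
for (state, _year), value in mean_fires.items():
    yearly_means_by_state[state].append(value)
state_mean_fires = {state: mean(v) for state, v in yearly_means_by_state.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Step 4a: population vs mean fires, pairing the two values by state.
states = sorted(state_mean_fires)
r = pearson([POPULATION[s] for s in states], [state_mean_fires[s] for s in states])
print(dict(fires_per_month))
print(mean_fires)
print(round(r, 3))
```

With only three states, the coefficient says very little; it is shown only to illustrate the calculation. With pandas, `DataFrame.groupby` and `DataFrame.corr` would do the same aggregation and correlation more concisely, and Python 3.10+ also provides `statistics.correlation`.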

Download: Final-data-exploration.ipynb

References

  1. Allen D. Graph Data Modeling: Categorical Variables [Article]. Medium; 2019 Oct 7. Available from: https://medium.com/neo4j/graph-data-modeling-categorical-variables-dd8a2845d5e0

This article is from the free online course Data Visualisation with Python: Bokeh and Advanced Layouts, created by FutureLearn.
