
# Demonstrating data modelling

We’ll now take a look at a demonstration of the process. As with our previous demonstrations, much of it will be interactive, using a Jupyter Notebook. Before we move on to the demonstration itself, let us briefly review data modelling in general.

## Data preparation

As we know, data preparation is one of the initial steps of any data analytics process: for example, the CRISP-DM method.

During data preparation, you clean and transform the data and account for missing values, ensuring that the data set is ready to be mined and is in the format required by the subsequent algorithm or data mining procedure.
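As a minimal sketch of what this step can look like in practice, the snippet below uses pandas to fill a missing value and convert units. The column names and the median-fill strategy are illustrative assumptions, not the exact steps used to prepare the course data sets.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one area value is missing and units are imperial
raw = pd.DataFrame({
    "state": ["A", "B", "C"],
    "area_sq_mi": [100.0, np.nan, 250.0],
})

# Account for missing data: fill with the column median (one common choice)
raw["area_sq_mi"] = raw["area_sq_mi"].fillna(raw["area_sq_mi"].median())

# Transform: normalise to square kilometres (1 sq mi = 2.589988 sq km)
raw["area_sq_km"] = raw["area_sq_mi"] * 2.589988
clean = raw.drop(columns=["area_sq_mi"])
```

Median imputation is only one option; depending on the analysis, dropping incomplete rows or interpolating may be more appropriate.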

Data preparation can take an enormous amount of time, depending on the volume of data and the number of data sources involved: it can account for 70–80% of the effort in the entire data analysis process.

As you can see from the links to their sources, the files provided in the zipped folder were extracted from messy, raw data. As part of data preparation, these data sets were cleaned, transformed, and organised into CSV files for you. This brings the data into the context of the business or analysis (here, finding answers to the question we introduced earlier). The CSVs provided were cleaned using Excel to normalise the areas into square kilometres and to combine data and metadata into a single sheet.
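The "combine data and metadata into a single sheet" step can be sketched with a pandas merge. The file name, column names, and figures below are hypothetical; they only illustrate the pattern of joining measurements with per-state metadata and writing one CSV.

```python
import pandas as pd

# Hypothetical measurements and a separate metadata sheet, keyed by state
data = pd.DataFrame({
    "state": ["A", "B"],
    "tree_cover_sq_mi": [120.0, 340.0],  # imperial units in the raw file
})
meta = pd.DataFrame({
    "state": ["A", "B"],
    "population": [1_200_000, 4_500_000],
})

# Normalise areas into square kilometres (1 sq mi = 2.589988 sq km)
data["tree_cover_sq_km"] = data["tree_cover_sq_mi"] * 2.589988
data = data.drop(columns=["tree_cover_sq_mi"])

# Combine data and metadata into a single table, then write one CSV
combined = data.merge(meta, on="state")
combined.to_csv("combined.csv", index=False)
```

An inner merge on the shared key keeps only states present in both tables, which is usually what you want when metadata is meant to annotate every measurement row.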

## Data modelling

Data preparation leaves you with clean data and an assessment of how it can help achieve the business goals. In data modelling, you then choose a set of techniques to analyse that data and find the patterns that can lead to answers. For the purposes of this data visualisation course, we will look at graph data modelling.

Graph data modelling is a relatively new concept in the data analytics and visualisation space. When it comes to exploring relationships and correlations among variables, we lean towards plotting them on graphs. The concept itself is vast and complex; to learn more, read the article by a data architect listed in the references below.

## Demonstration: Data exploration

Now, let us talk business! Download the Jupyter Notebook and get a taste of real data exploration. If you have already downloaded the zipped folder, make sure you extract the individual files and save them in the same folder as the Notebook.

Follow the instructions in the Notebook and render the outputs to understand the exploration process, based on the scenario we introduced earlier:

1. Load all the data sets and define the related helper functions.
2. Plot the number of fires for each month (fires are reported once a month in the Date Reported column) and look for patterns using relational plots.
3. Aggregate the data to give the mean number of fires per year for each state; this still allows us to look at trends, correlations, and differences between states.
4. Check what correlations can be drawn between:

a. population and mean fires

b. population density and mean fires

c. tree cover area and fire count

d. forest area and number of fires

e. date reported and number of fires
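Steps 3 and 4 above can be sketched as follows. The data here is synthetic and the column names (`state`, `date_reported`, `number`, `population`) are assumptions standing in for the course CSVs; the pattern of grouping by state and year and then computing a Pearson correlation is what the Notebook exercise asks for.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic fire records: one row per state per month over two years
fires = pd.DataFrame({
    "state": np.repeat(["A", "B", "C"], 24),
    "date_reported": pd.to_datetime(
        list(pd.date_range("2019-01-01", periods=24, freq="MS")) * 3
    ),
    "number": rng.integers(0, 200, size=72),
})

# Step 3: mean number of fires per year for each state
fires["year"] = fires["date_reported"].dt.year
mean_fires = (
    fires.groupby(["state", "year"])["number"]
         .mean()
         .rename("mean_fires")
         .reset_index()
)

# Step 4 (a): correlate mean fires with a state-level attribute
# (population figures are made up for illustration)
population = pd.DataFrame({
    "state": ["A", "B", "C"],
    "population": [2_000_000, 8_000_000, 500_000],
})
merged = mean_fires.merge(population, on="state")
corr = merged["mean_fires"].corr(merged["population"])  # Pearson r
```

The same merge-then-`corr` pattern applies to the other pairs in step 4: swap in population density, tree cover area, or forest area as the second column.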

## References

1. Allen D. Graph data modeling: categorical variables [Internet]. Medium; 2019 Oct 7. Available from: https://medium.com/neo4j/graph-data-modeling-categorical-variables-dd8a2845d5e0