
# What data is required to run a model?

Getting the data needed to run your model is often the most frustrating part of the whole modelling process.

When it comes to designing your own models, you will likely stumble upon data issues sooner rather than later. In this lecture, we will take a short look at some points you want to consider when handling your model’s data.

With conceptual models, the mathematics and the interpretation are the main output you’re aiming for. Even though small numerical examples are often used for illustration, they are not that demanding in terms of data management. But with numerical modelling, you’re looking for quantifications. And the solution of your model will consequently provide numbers. Lots of numbers.

And to get those, you will have to put in numbers first. Let’s take our simple Optimus system model as an example. Before getting any model outcomes, you will have to define the different cost parameters (c var and c invest), the price for emissions (t), and the emission factors (em) for each of the technologies k you want to consider, as well as the time structure l of your model and the resulting demand levels (dem). Once you have all those and your model can finally run, you will get the values for the generation (q) and the installed capacities (q max). This may sound rather unproblematic. But if you design an energy system model on an hourly basis for investment decisions up to 2050, those numbers add up to huge data sets in no time.

What data do we typically need to design a running numerical model? As those models are normally representations of markets or systems, we basically need the structural elements describing our system. This can include a given market infrastructure, like production or transport facilities, as well as physical aspects like input-output relations, construction times and lifetimes, and cost structures. Similarly, we may need definitions of the respective environment we have included in our model, like emission factors or the assimilative capacity of an ecosystem.

In contrast to this structural data, we have the operational data those systems produce, namely output and prices. Often, our model will have those elements as endogenous variables and we don’t need them as direct input. But it’s always helpful if we have a real-world observation we can calibrate our model to. Similarly, if the model is focused on one single market, the output and prices of another market may be a needed input for our model. For example, coal, oil, and gas prices are typically needed for electricity market modelling.

Where can you find the data you need? There are plenty of sources out there: statistical offices of the respective countries, international organisations, yearly company reports, and so forth. Gathering the data is often a bit frustrating. And sometimes, you simply cannot find the data needed. But don’t give up yet.

To fill the data gaps of your model, you may be able to extrapolate the data you have or derive meaningful assumptions by using a few data points and economic theory. For example, future investment costs can be constructed by assuming cost-reduction levels based on historic cost reductions. Or you can construct a demand function from a few observations of real-world demand and price pairs. The most important aspect of those derivations and assumptions is to make clear what you do and what your assumptions are.

When it comes to your model results, there are two important rules you should follow before making the effort of writing up your interpretation. Rule number one relates to the fact that we design our own models, and thereby should have a good idea in which direction the results are likely to develop. Thus, if you are really surprised by your findings, chances are high that they are caused by some model error. Maybe there’s a problem with the data upload, or the model code and the model you think you’ve coded are not the same. So take great care to figure out what drives your results. Those errors can be hidden really well.

Rule number two is basically the same story. Even if your results look reasonable, there’s always a chance that there’s a model error in the background. I have already written up whole results sections more than once, each time with really good-looking results. But one time there was a data upload error, and another time a misplaced bracket. Well, that’s the life of a modeller. So, essentially, make sure you know your model is working as you think it should work. That’s why you should always make a small, traceable model example first and then go big later.

If you’re convinced enough that your results are correct, you’ll likely end up with up to millions of data points. Remember, we want to connect those with our problem. So we need to interpret our results with our problem in mind. To do so, we need number crunching. On the one hand, we need to filter all our results and boil them down to meaningful aggregates or representative cases, maybe deriving yearly average prices from hourly prices or aggregating locational information to regional or national totals. On the other hand, we need to present them. Numerical models have the advantage that they are well suited to producing simple, yet meaningful, graphical representations of complex processes.

Remember, a single picture can be worth a thousand data points.

Data plays a crucial role in many models, especially large-scale numerical scenario models.
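To get a feel for the scale involved, here is a back-of-the-envelope count for an hourly model. It is only a sketch: the start year (2025), the number of technologies (10) and the function name `count_values` are assumptions made for illustration, and every year is treated as a non-leap year that is modelled in full.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year, every hour modelled


def count_values(first_year, last_year, n_technologies):
    """Count time steps and values for an hourly model (illustrative only)."""
    n_years = last_year - first_year + 1
    time_steps = n_years * HOURS_PER_YEAR
    return {
        "time_steps": time_steps,
        "demand_values": time_steps,                       # one demand level per step
        "generation_values": time_steps * n_technologies,  # one value per technology and step
    }


# Hypothetical setup: 2025 to 2050 inclusive, 10 technologies
size = count_values(2025, 2050, 10)
print(size)
```

With these assumed settings, the generation results alone already run into the millions of values, before any scenarios are added.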

Getting the data needed to run your model is often the most frustrating part of the whole modelling process, simply because data is hard to get and seldom in the exact format you need for your model.

Data availability has increased in recent years, with statistical offices, agencies and associations providing diverse databases on multiple energy- and environment-related topics.

Even if you cannot find the desired data, you may still be able to generate reasonable estimates. Always think about the underlying technical or natural structures that define the numbers you would need. For example, there are often no plant-specific generation costs available, but using fuel prices and plant efficiencies you can generate a close proxy.
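The fuel-price proxy mentioned above could be sketched as follows. This is a minimal illustration, not a definitive cost model: the function name, the units and all numbers are assumptions, and the optional emission term (emission factor times emission price) is an addition that goes beyond the fuel-and-efficiency proxy described in the text.

```python
def proxy_generation_cost(fuel_price, efficiency,
                          emission_factor=0.0, emission_price=0.0):
    """Approximate variable generation cost per MWh of electricity.

    fuel_price:      cost per MWh of fuel (thermal energy)
    efficiency:      share of fuel energy converted to electricity, in (0, 1]
    emission_factor: tonnes of CO2 per MWh of fuel (optional, assumed)
    emission_price:  cost per tonne of CO2 (optional, assumed)
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Each MWh of electricity needs 1 / efficiency MWh of fuel
    return (fuel_price + emission_factor * emission_price) / efficiency


# Made-up numbers for illustration, not real plant data
gas_plant = proxy_generation_cost(fuel_price=20.0, efficiency=0.5)
coal_plant = proxy_generation_cost(fuel_price=10.0, efficiency=0.4,
                                   emission_factor=0.34, emission_price=25.0)
print(gas_plant, coal_plant)
```

Whatever proxy you use, state it explicitly alongside your results so that readers can see which numbers are observed and which are derived.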

## Input data

As a modeller, you are more interested in transforming your input data into output, but don’t neglect the interesting information contained in the input data itself.

• Can you see shifts in the hourly demand of the last years?
• Is there some underlying price trend in your fuel costs?
• Does the spatial distribution of pollution give you an insight into non-observable pollution sources?

Your model will use this data to hopefully generate interesting findings, but sometimes the data itself already provides some findings that may motivate you to add or change elements of your model.
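A quick way to check for an underlying price trend in fuel costs is to fit a straight line through the yearly values. The sketch below uses ordinary least squares in plain Python; the helper name `linear_trend` and the price series are hypothetical and only serve as an illustration.

```python
def linear_trend(years, values):
    """Return (slope, intercept) of a least-squares line through the points."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (year, value) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up yearly fuel prices, for illustration only
years = [2018, 2019, 2020, 2021, 2022]
fuel_price = [20.0, 22.0, 21.0, 25.0, 27.0]
slope, intercept = linear_trend(years, fuel_price)
print(f"average change per year: {slope:.2f}")
```

A clearly positive or negative slope is a hint that a constant fuel price may be a poor assumption for your scenarios.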

When you finally get your model results there are two things to do:

1. First, you need to make sure the numbers are correct.
2. Second, you need to translate the numbers into understandable results for your audience.

## Coding issues

Hopefully you will have a working model that produces correct results. However, there is always the risk that you made some small errors while writing your model code or constructing your data upload.

Always remember: your model software will do whatever you told it, not what you think you told it. So double-check that your coding is correct.
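One way to double-check is to run a few automated plausibility tests on a small, traceable example before scaling up. The sketch below is an assumption about what such checks could look like: it borrows the names `q`, `q_max` and `dem` from the lecture, and it assumes a model in which generation must exactly cover demand in every time step (no storage, trade or curtailment). The function name `check_results` is hypothetical.

```python
def check_results(q, q_max, dem, tol=1e-6):
    """Return a list of problems found in the model results.

    q:     {technology: [generation per time step]}
    q_max: {technology: installed capacity}
    dem:   [demand per time step]
    """
    problems = []
    for k, series in q.items():
        if len(series) != len(dem):
            problems.append(f"{k}: {len(series)} time steps, expected {len(dem)}")
            continue
        for step, value in enumerate(series):
            if value < -tol:
                problems.append(f"{k}, step {step}: negative generation {value}")
            if value > q_max[k] + tol:
                problems.append(
                    f"{k}, step {step}: generation {value} exceeds capacity {q_max[k]}")
    for step, demand in enumerate(dem):
        # Assumed balance: total generation equals demand in each step
        supplied = sum(series[step] for series in q.values() if step < len(series))
        if abs(supplied - demand) > tol:
            problems.append(
                f"step {step}: generation {supplied} does not match demand {demand}")
    return problems


# Tiny hand-checkable example with made-up numbers
q = {"gas": [10.0, 30.0], "wind": [40.0, 40.0]}
q_max = {"gas": 30.0, "wind": 40.0}
dem = [50.0, 70.0]
print(check_results(q, q_max, dem))
```

An empty list only means that these particular checks passed. It does not prove that the model is correct, so results that surprise you still deserve a closer look.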

Presenting your results can sometimes be a challenge, as you literally produce thousands of numbers. Boil them down to a reasonable, reader-friendly summary: use descriptive statistics, aggregate results, show examples, limit the number of scenarios and highlight the insights.
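Two typical aggregation steps, yearly average prices from hourly prices and regional totals from locational values, could be sketched like this. The function names, data layout and numbers are assumptions chosen for illustration. Real model output will usually come in a different format.

```python
from collections import defaultdict
from statistics import mean


def yearly_average(hourly_prices):
    """hourly_prices: iterable of (year, price) pairs -> {year: mean price}."""
    by_year = defaultdict(list)
    for year, price in hourly_prices:
        by_year[year].append(price)
    return {year: mean(prices) for year, prices in by_year.items()}


def regional_totals(node_values, node_to_region):
    """Sum values given per node (location) into totals per region."""
    totals = defaultdict(float)
    for node, value in node_values.items():
        totals[node_to_region[node]] += value
    return dict(totals)


# Made-up data, for illustration only
prices = [(2030, 40.0), (2030, 60.0), (2031, 50.0)]
generation = {"n1": 100.0, "n2": 50.0, "n3": 25.0}
regions = {"n1": "north", "n2": "north", "n3": "south"}

avg_price = yearly_average(prices)
totals = regional_totals(generation, regions)
print(avg_price, totals)
```

Aggregates like these are also a natural input for the tables and figures you show to your audience.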

Tables and figures are your friends when it comes to presenting your model outcomes, but always remember that it’s what you read out of those numbers, not the numbers themselves, that makes your interpretation.