Skip to 0 minutes and 6 seconds You designed a reasonable model representation of a real-world problem. You ensured that it is computationally feasible and small enough to be solved in a reasonable time. Now all you need is some data, and you can hit the Solve button. And this is where most of the trouble in numerical modeling really starts. Remember what you learned about endogenous and exogenous variables. This is where it becomes tricky in numerical modeling. Let's take our electricity market example again. Whatever you defined to be an endogenous variable works fine, as it will be generated by the model. But whatever you considered an external parameter needs to be given to the model, which means you will have to find data for it.

Skip to 1 minute and 2 seconds Sometimes it may not be possible to find the data you would like to have. This does not necessarily mean you can't continue. Remember that you are the one designing the model. So if data is not available, go back to the model design step and think about ways to reformulate your model without this data requirement. Maybe you can make it an endogenous variable of the model, or you can tweak your model by shifting the focus so that you don't need the specific data anymore, or you can construct data. Similar to the computational restrictions, data restrictions have a large feedback effect on model design.

Skip to 1 minute and 49 seconds What data do we typically need to design a running numerical model? As those models are normally representations of markets or systems, we basically need the structural elements describing our system. This can include a given market infrastructure, like production or transport facilities, as well as the physical aspects like input/output relations, construction and lifetimes, and cost structures. Similarly, we may need definitions of the respective environment we have included in our model, like emission factors or the assimilative capacity of an ecosystem. In contrast to this structural data, we have the operational data those systems produce, namely output and prices. Often our model will have those elements as endogenous variables, and we don't need them as direct input.

Skip to 2 minutes and 45 seconds But it always helps if we have real-world observations we can calibrate our model to. Similarly, if the model is focused on a single market, the output and price information of another market may be a needed input for our model. For example, coal, oil, and gas prices are typically needed inputs for electricity market models.

Skip to 3 minutes and 11 seconds Gathering the data is often a bit frustrating, and sometimes one cannot find the data needed. So you have to become a bit creative when it comes to filling the data gaps of your model. Let's take a simple and pretty common example. Assume you want to include this simple demand function in your model to capture the consumer side. Now, what you typically can find about markets are historic prices and quantities. This provides you with a single point of your demand function. But normally, you won't find demand function data out there. Now what to do? Well, we can proceed by doing what economists always do. We assume something.

Skip to 3 minutes and 57 seconds First, we need the relation between price and quantity changes, in other words, the demand elasticity. Second, we assume a specific functional form. To keep things simple, linear functions are always a good first choice. And there you go. Starting with just two data inputs, the observed price-quantity point and the elasticity, you just constructed a demand function for your model. Similar approaches can be helpful in other settings as well. Finally, where can you find data? There are plenty of sources out there, ranging from the statistical offices of the respective countries to international organisations and yearly company reports. If the World Wide Web fails to deliver what you need, there's one final option left: the real world, where you can collect data.
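The demand function construction described in the transcript can be written out in a few lines. The following is a minimal sketch, assuming a linear form Q(p) = intercept + slope * p and a point elasticity defined at the observed reference point; the function names and the numbers are illustrative assumptions, not taken from the course material.

```python
def linear_demand(p_ref, q_ref, elasticity):
    """Calibrate Q(p) = intercept + slope * p to one observed point.

    Assumption: elasticity is the point elasticity (dQ/dp) * (p / Q)
    evaluated at the reference point (p_ref, q_ref).
    """
    if p_ref <= 0 or q_ref <= 0:
        raise ValueError("reference price and quantity must be positive")
    if elasticity >= 0:
        raise ValueError("a downward-sloping demand needs a negative elasticity")
    slope = elasticity * q_ref / p_ref      # dQ/dp implied by the elasticity
    intercept = q_ref - slope * p_ref       # forces the line through the reference point
    return intercept, slope


def quantity_demanded(price, intercept, slope):
    """Evaluate the linear demand function; demand cannot go below zero."""
    return max(0.0, intercept + slope * price)


# Made-up reference point: price 50, quantity 1000, assumed elasticity -0.2.
intercept, slope = linear_demand(p_ref=50.0, q_ref=1000.0, elasticity=-0.2)
demand_at_reference = quantity_demanded(50.0, intercept, slope)
demand_at_higher_price = quantity_demanded(60.0, intercept, slope)
```

By construction the line passes through the observed price-quantity point, so the model reproduces the historic observation and only the response to price changes rests on the assumed elasticity.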

# Data input

Getting the data needed to run a numerical model is often the most frustrating part of the whole modeling process, simply because data is hard to get and seldom in the exact format you need for your model. To help you reduce the time spent searching for data, we provide a Google Form where you can submit your links and share them publicly via Google Docs.

Even if you cannot find the desired data, you may still be able to generate reasonable estimates. Always think about the underlying technical or natural structures that define the numbers you need. For example, plant-specific generation costs are often not available, but using fuel prices and plant efficiencies you can generate a close proxy. Similarly, economic mechanisms can help you to construct the data you need, just as we discussed in the demand function example of the tutorial.
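The generation cost proxy mentioned above can be sketched as follows. This is an illustration under stated assumptions: the fuel price is given per unit of thermal energy, the efficiency is the share of that energy converted into electricity, and the plant names and numbers are hypothetical.

```python
def marginal_generation_cost(fuel_price, efficiency, other_variable_cost=0.0):
    """Proxy for generation cost per unit of electricity.

    Assumptions: fuel_price is per unit of thermal energy, efficiency is
    between 0 and 1, and other_variable_cost is an optional add-on for
    costs not captured by fuel (hypothetical parameter).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return fuel_price / efficiency + other_variable_cost


# Made-up inputs for illustration only.
fuel_prices = {"coal": 10.0, "gas": 25.0}
plants = {
    "coal_plant": ("coal", 0.40),
    "gas_combined_cycle": ("gas", 0.50),
    "gas_turbine": ("gas", 0.25),
}

# Cost proxy per plant, then plants ordered from cheapest to most expensive.
costs = {
    name: marginal_generation_cost(fuel_prices[fuel], eff)
    for name, (fuel, eff) in plants.items()
}
merit_order = sorted(costs, key=costs.get)
```

Sorting plants by this proxy gives a rough supply ordering that an electricity market model could use in place of the unavailable plant-specific costs.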

After getting the data, another step you may want to consider is a simple descriptive analysis of it. Can you see shifts in hourly demand over the last years? Is there an underlying price trend in your fuel costs? Does the spatial distribution of pollution give you insight into non-observable pollution sources? Your model will use this data to hopefully generate interesting findings, but sometimes the data itself already provides findings that may motivate you to add or change elements of your model.
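As one example of such a descriptive check, the question about a price trend in fuel costs could be approached with an ordinary least-squares line through the observations. The sketch below assumes equally spaced observations (for instance one per year); the function name and the price series are made up for illustration.

```python
from statistics import mean


def linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ...

    Assumption: observations are equally spaced in time.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    ts = range(n)
    t_bar = mean(ts)
    y_bar = mean(values)
    s_tt = sum((t - t_bar) ** 2 for t in ts)
    s_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, values))
    slope = s_ty / s_tt
    intercept = y_bar - slope * t_bar
    return slope, intercept


# Hypothetical yearly fuel prices, not real data.
fuel_price_series = [20.0, 21.5, 21.0, 23.0, 24.5]
slope, intercept = linear_trend(fuel_price_series)
has_upward_trend = slope > 0
```

A clearly non-zero slope would suggest treating the fuel price as a time-dependent input rather than a single constant parameter.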

© University of Basel