This article explains the history, concept and structure of databases in information and technology.
In order to start discussing computerised information systems, we need to clarify some basics. There is one thing common to all computerised information systems and that is that there is some sort of data stored and processed. Therefore the first step is to talk about data storage (the term databases is often used in the same sense, but in this step we need to make a strict distinction). From there it will only be a quick jump to get to the all-encompassing organisational information systems called Enterprise Resource Planning (ERP) systems.
In terms of information systems in organisations, historically databases (DB) appeared first. DBs store data in relational tables (a set of interlinked tables). They belong to the realm of electronic data processing (EDP) and they are built with a data focus. They were located where the data was originated to make recording the data as easy as possible. As the databases (and the computers in general) were initially quite expensive, even a large organisation was only able to purchase a few. The first ones usually served the manufacturing processes. Additional ones were then purchased for finance, marketing, etc. As there are potentially many places in an organisation where data is produced, there were soon many databases around. The problem that went unrecognised for a while was that the databases were usually introduced with no single conception throughout the organization. Being locally controlled, different structures and technologies were adopted and even the definition of data (metadata) was different. The result of this was that the DBs usually became technically incompatible and redundantly stored large amounts of data. The result of the latter was that the DBs became contextually incompatible, i.e. the same name would appear with different meanings and there were often different terms used to label the same data. For instance, ‘sales volume’ could mean the number of sold cars for manufacturing, the calculated cost for purchasing, and the sales income for the sales department. At the same time, a purchased item could appear as cost in accounting, as expense in finance and raw material in manufacturing. As databases had a limited capacity and the production of data was (and usually still is) notoriously under-estimated, in DBs data were periodically (or occasionally) erased. During the upgrades these problems became worse and worse and most of the organizational database systems were soon falling apart.
The second type of data storages that appeared were called data marts (DM). These are ‘user-side’ data storages and are usually associated with the idea of management information systems (MIS). They were meant to satisfy the data needs of specific analyses and were therefore located at the point at which the data is used. DMs were normally expected to retrieve data from both internal databases as well as external sources. As there can be lots of analyses done in any organization, there were soon numerous data marts also with different technologies and different data formats, and so the situation described in the case of DBs was seen with DMs also. The overall situation actually got even worse as in order to retrieve data from the DBs, at the interface points between DBs and DMs, converters were created that translated the data from one technology to another, one structure to another, and one interpretation to another. The outcome was an enormous mess of connections between the various storages with numerous translators and gradually, even the most ICT-minded people started to lose sight of what was where. Something had to be done, and fast.
Data warehouses (DW) then came to rescue. DWs are central storages of data. Essentially, the idea was to bring everything from all the databases into a single storage. They are meant to be integrative elements of database systems and they should eliminate many disadvantages of DBs and DMs. As a DW is a single unit, it uses a single technology, retrieving the data from various databases and converting them to a single data format. During this process data are also cleaned to avoid redundancy. In order to avoid any misunderstandings about the definition/meaning of data, DWs also store meta-data, i.e. data about data. These tell the user where the data is coming from, how it is calculated, what it means, which measuring units are used and so forth. In turn, all the data marts can either be removed, i.e. using the DW as a data source instead, or they ‘feed’ from the DW. While the DBs are periodically erased, the capacity of a DW is periodically expanded, so that no data needs to be erased. Of course, this also means a fast-growing amount of data, and thus DW stores data in time series to facilitate searching.
In a well organised organisational data system we can expect to have all three forms of data storages. The DBs will remain at the point of the origin of the data while serving as sources for the DW. There should be a single DW for the whole organization which retrieves and reorganizes/cleans the data from the DBs. On the other side, the DW can feed the DMs with up to date data for their analysis requirements. In fact, DMs do not even need to store data (even if they often do). It is sufficient if it contains only queries, i.e. tables that indicate where to retrieve the data from, and they can then retrieve it in real time as needed. This is what we may call a three-tier data storage system. This is the backbone of the organisation-wide information systems, called Enterprise Resource Planning (ERP) systems, which do all the information jobs within a single system. This is why, when I want to tease my friends who are ERP consultants, I simply call ERPs ‘organisation-wide databases’. Of course, this is a little unfair, as apart from having all the data stored in an appropriate format, ERP systems also carry out a huge number of data transactions, i.e. the process your bills, continuously update warehouse details based on purchases and manufacturing, etc.
There is one further thing that ERP systems provide and that is analytical tools. It all starts with the simple ability to group your data according to any aspect (variable, data field), which is called OLAP. OLAP stands for On-Line Analytic Processing. Being online means that the OLAP is refreshed immediately if data are changed, i.e. it works with real-time data. We can have various dimensions within the same OLAP cube, with a maximum of 32. We can also have various cubes at the same time which essentially means unlimited dimensions for analysis. It is easy to think of OLAPs as multi-dimensional Excel tables with graphs/charts. The OLAP brings the idea of MIS to its fulfilment; they (supposedly) produce processed information that managers can make use of. It is useful to think about OLAPs being the fourth layer following the DMs.
© University of Strathclyde