Security incident response processes
The main processes of the SOC will centre on its key responsibility of security incident response.
The processes involved with monitoring and Security Information and Event Management (SIEM) also link into the security incident response processes, and as such are normally considered together. These processes consist of the following stages and are linked as follows:
[Figure: the security incident response processes]
In the preparation phase we are looking at processes that need to be in place before we can properly deal with a security incident. The first of these is the process of asset discovery. In an ideal world, our information technology service management (ITSM) system (which we will talk about later) has enumerated and recorded all the configuration items in the system and the discovery phase is merely extracting the relevant information from the configuration management database (CMDB).
Often, however, it is useful to undertake our own discovery processes (which may be informed by the CMDB) to ensure that we have the necessary information in a form that is usable by the SOC. This information also needs to be kept up to date, and it is common to undertake detailed scans of the infrastructure at regular intervals to detect any changes. Also in the preparation phase is the creation of appropriate policies and procedures, which will help guide our activities in the later stages.
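To make the idea of regular scans concrete, the following is a minimal sketch of comparing two discovery scans to find what has changed. It assumes each scan has already been reduced to a list of hosts; the Asset fields, the diff_scans function and the sample addresses are all invented for illustration and do not correspond to any particular scanner or CMDB schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """One discovered host. The fields are illustrative, not a CMDB schema."""
    address: str
    hostname: str
    open_ports: frozenset


def diff_scans(previous, current):
    """Compare two scans keyed by address and report what changed."""
    prev = {asset.address: asset for asset in previous}
    curr = {asset.address: asset for asset in current}
    return {
        # Addresses seen now but not in the earlier scan.
        "added": sorted(curr.keys() - prev.keys()),
        # Addresses that have disappeared since the earlier scan.
        "removed": sorted(prev.keys() - curr.keys()),
        # Addresses present in both scans whose details differ.
        "changed": sorted(
            address
            for address in prev.keys() & curr.keys()
            if prev[address] != curr[address]
        ),
    }


# Made-up sample data standing in for two scheduled scans.
last_week = [
    Asset("10.0.0.5", "web01", frozenset({80, 443})),
    Asset("10.0.0.9", "db01", frozenset({5432})),
]
today = [
    Asset("10.0.0.5", "web01", frozenset({80, 443, 8080})),
    Asset("10.0.0.12", "build01", frozenset({22})),
]

changes = diff_scans(last_week, today)
print(changes)
```

In practice the "changed" and "added" entries are the ones most likely to need a follow-up, since a new open port or an unrecorded host may indicate either an undocumented change or something less benign.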
Detection and evaluation
In the detection and evaluation phase, we’re looking at uncovering potential issues and deciding if they need to be dealt with. The first stage in this is the detection process. Here we’re looking at the detection of indicators of compromise (IOCs).
This has to be a largely automated stage with only important IOCs being relayed to the analysts in the SOC, preferably after a summarisation process. When an IOC has been detected, the next stage is to triage the potential security incident. The concept of triage comes from medicine where patients are classified based on the severity of their condition and the likelihood of the treatment being effective.
For our purposes, the triage process consists of three sub-processes – verification, where the analyst confirms that this is a previously unseen incident; initial classification, where the type and severity of the incident are initially determined; and, finally, assignment, where the incident is assigned to a particular analyst or team. This triage stage needs to be undertaken quickly and efficiently, with priority given to the high-value assets; however, any successful compromise should be investigated.
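The three triage sub-processes can be sketched in code as follows. This is only an illustration under stated assumptions: the asset names, team names, severity labels and the rule for spotting a previously seen incident are all hypothetical, and a real SOC would take them from its own policies and tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values; a real SOC would define these in its own policies.
HIGH_VALUE_ASSETS = {"db01", "payroll01"}
TEAM_FOR_TYPE = {"malware": "endpoint-team", "phishing": "email-team"}
DEFAULT_TEAM = "soc-analysts"


@dataclass
class Alert:
    alert_id: str
    asset: str
    incident_type: str
    confirmed_compromise: bool


@dataclass
class Ticket:
    alert_id: str
    severity: str
    assigned_to: str


def triage(alert: Alert, seen: set) -> Optional[Ticket]:
    """Verify, classify and assign one alert; return None for a repeat."""
    # 1. Verification: treat the same incident type on the same asset
    #    as an incident we have already seen (a simplifying assumption).
    fingerprint = (alert.asset, alert.incident_type)
    if fingerprint in seen:
        return None
    seen.add(fingerprint)

    # 2. Initial classification: high-value assets take priority, but a
    #    confirmed compromise is never rated below "medium".
    if alert.asset in HIGH_VALUE_ASSETS:
        severity = "high"
    elif alert.confirmed_compromise:
        severity = "medium"
    else:
        severity = "low"

    # 3. Assignment: route by incident type, with a default queue.
    team = TEAM_FOR_TYPE.get(alert.incident_type, DEFAULT_TEAM)
    return Ticket(alert.alert_id, severity, team)


seen_incidents = set()
first = triage(Alert("a1", "db01", "malware", False), seen_incidents)
repeat = triage(Alert("a2", "db01", "malware", False), seen_incidents)
other = triage(Alert("a3", "laptop17", "usb", True), seen_incidents)
print(first, repeat, other, sep="\n")
```

Here the first alert becomes a high-severity ticket for the endpoint team, the second is recognised as a repeat and dropped, and the third is rated medium because the compromise succeeded even though the asset is not high value.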
In order to properly analyse the incident, it is likely that the analyst will need to undertake further data collection. This data collection might be from the IT monitoring, but could also be from threat intelligence or from other systems such as a door entry log. Once the initial data has been collected, the data analysis can begin.
In the initial stages, this will often involve trying to answer who (was the attacker), what (did they compromise), where (did they compromise), when (did they compromise), and how (did they compromise the system). It may not be possible to answer all of these questions, but attempts should be made to answer as many as possible.
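One simple way to keep track of these questions during an investigation is a record with one field per question, so that the gaps are easy to see. The sketch below is an assumption about how such a record might look, not a prescribed format; the class name and the sample answers are invented.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IncidentAnalysis:
    """One field per analysis question; None means not yet answered."""
    who: Optional[str] = None
    what: Optional[str] = None
    where: Optional[str] = None
    when: Optional[str] = None
    how: Optional[str] = None

    def unanswered(self):
        """Return the questions that still have no answer, in order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Invented example: three of the five questions have been answered so far.
analysis = IncidentAnalysis(
    what="customer database credentials",
    when="overnight, outside working hours",
    how="phishing email with a malicious attachment",
)
print(analysis.unanswered())  # ['who', 'where']
```

Leaving a field empty rather than guessing keeps the record honest about what is still unknown, which matters when the analysis later feeds into the incident report.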
Another way of looking at the processes involved in the detection and evaluation phase is to use an OODA loop. You’ll remember from the previous step that the OODA loop was developed to deal with rapidly changing military situations and to enable military personnel to respond appropriately. This approach facilitates a rapid response to a situation, but care needs to be taken to ensure that the focus is on an appropriate response and not just a fast response.
After the data analysis stage, we can enter the response phase. The first step here is containment, as we need to ensure that any intrusion is not left to spread unchecked through the IT system. The exception is when the compromise is confined to part of a honeynet. As the honeynet is separate, and set up to test our system, we may wish to allow the compromise to spread to other parts of the honeynet (though obviously not beyond).
Containment is followed by the recovery of our system. This will commonly involve rebuilding the system and restoring the data from backups. In the case of complex or advanced persistent threat (APT) attacks on mission-critical systems, it may even be desirable to replace the hardware, though this is normally not necessary.
When thinking about containment and recovery we need to weigh up two conflicting demands – do we recover quickly and potentially destroy evidence, or do we investigate and try to recover evidence at the expense of operational requirements? The choice made will depend on the nature of the service compromised and how likely a compromise is to spread, and it is here that a honeypot/honeynet can be very useful.
Once recovery has been completed, the incident and the response(s) taken need to be fully reported. This will normally involve recording the information in an appropriate knowledge base (and potentially the CMDB) and informing management of what occurred. Care must be taken that reports are used to help improve performance and not just archived, and this is where the analysis of response stage comes in.
Here, the SOC should consider what went wrong and how to prevent it and similar incidents from happening again. It may also be desirable at this stage to evaluate the processes being used and to see how SOC processes could be improved to ensure that future incidents of this type are prevented from even occurring. Some organisations apply a plan-do-check-act (PDCA) loop to help with this process.
© Coventry University. CC BY-NC 4.0