You have an incident report. What now? This second article on incidents sets a path to effective incident management without spending too much time on it.
In this article I will do my best to help you with incident management: setting up an incident handling process, and analysing and applying the knowledge you gain from your incident reporting process. It is a follow-up to my recent article “How to set up incident reporting”. I highly recommend that you read that one first, unless, of course, you already have incident reporting in place.
Again we dive into the two standards covering the topic: ISO 27002 and ISO 45001. (These standards are worth reading for a more in-depth understanding of incident management.)
In the last part of the article, I will give you a step-by-step guide on how to turn thought into action and create an incident management process using Gluu’s business process management platform.
So, an incident report has landed in someone’s inbox. What happens now? This is where incident management comes in.
Incident management starts with the ‘root cause’
First, we need to understand the incident. This starts with finding the ‘root cause’. What happened – and most importantly – why did the incident happen? Armed with this information we can start the work on preventing a new, similar incident from happening.
ISO 45001 has a great description of root cause analysis:
“Root cause analysis refers to the practice of exploring all the possible factors associated with an incident or nonconformity by asking what happened, how it happened and why it happened, to provide the input for what can be done to prevent it from happening again.”
And again: “This analysis can identify multiple contributory failures, including factors related to communication, competence, fatigue, equipment or procedures.”
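To make the three questions concrete, here is a minimal sketch of how an analysis record could be structured. It is my own illustration, not something prescribed by ISO 45001 or built into Gluu: the class and field names are hypothetical, and the factor list simply mirrors the examples in the quote above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Factor(Enum):
    """Contributory factors named in the ISO 45001 quote above."""
    COMMUNICATION = "communication"
    COMPETENCE = "competence"
    FATIGUE = "fatigue"
    EQUIPMENT = "equipment"
    PROCEDURES = "procedures"


@dataclass
class RootCauseAnalysis:
    # Hypothetical record: one free-text answer per question.
    what_happened: str = ""
    how_it_happened: str = ""
    why_it_happened: str = ""
    factors: set[Factor] = field(default_factory=set)

    def missing_answers(self) -> list[str]:
        """Return the questions that still lack an answer."""
        answers = {
            "what": self.what_happened,
            "how": self.how_it_happened,
            "why": self.why_it_happened,
        }
        return [question for question, text in answers.items() if not text.strip()]

    def is_complete(self) -> bool:
        """Complete means all three questions answered and at least one factor chosen."""
        return not self.missing_answers() and bool(self.factors)
```

A record with only the “what” filled in would report “how” and “why” as missing, which is a useful signal that the analysis is not ready for corrective action yet.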
As you might remember from the incident reporting article, we created a Gluu form to collect as much information as possible. If you later realise that you do not have enough information to prevent recurrence, that is a good time to go back and update the form with the needs you have discovered.
Once the true root cause has been found, corrective actions must be taken. According to ISO 45001, these must “... be appropriate to the effects or potential effects of the incidents or non-conformities encountered.”
How to set up your incident management process in Gluu
Having an incident handling process is not the goal, but it provides a clear way of working for everyone and is required by the standards. This is the incident reporting process that we have from our last article. It is the first part of our overall incident management flow:
To go from reporting to analysis and management I add two new incident handling processes to this: ‘Analyse HSE incident’ and ‘Analyze IT incident’. The basis, however, is the same. This is what the process looks like now:
As the two fields of analysis involve different roles, it makes sense to create them as sub-processes of this process; otherwise the process map would simply be too large to understand. Notice that I keep them separate, since different roles will work with them and different methods are required for analysis.
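The split can be pictured as a simple routing step. The sketch below is only an illustration of the idea and does not use Gluu's API; the role names are hypothetical placeholders, while the sub-process names are the two from this article.

```python
from dataclasses import dataclass
from enum import Enum


class Area(Enum):
    HSE = "HSE"
    IT = "IT"


@dataclass(frozen=True)
class SubProcess:
    name: str
    analyst_role: str  # hypothetical role name, for illustration only


# One analysis sub-process per area, each owned by a different role.
SUB_PROCESSES = {
    Area.HSE: SubProcess("Analyse HSE incident", "HSE coordinator"),
    Area.IT: SubProcess("Analyze IT incident", "IT security officer"),
}


def route_incident(area: Area) -> SubProcess:
    """Pick the analysis sub-process that matches the reported area."""
    return SUB_PROCESSES[area]
```

Keeping the mapping in one place means the shared reporting process stays small, while each area can change its own analysis steps independently.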
Taking corrective action to mitigate the incident
Once the real root cause has been found, corrective actions must be taken. According to ISO 45001, these must “... be appropriate to the effects or potential effects of the incidents or non-conformities encountered.”
Corrective actions aim to prevent that exact adverse event from happening again by removing (or improving on) the root cause of the problem, without doing anything excessive.
In Gluu we can treat it as a separate sub-process for both HSE and IT, as the activities, organisational escalation, changes etc. require different roles and (potentially) work instructions.
The ISO standards jointly suggest:
- Changes to any relevant management system
- Implementation of new technology or software
- Review of existing procedures / processes / manuals
- Revised training
Corrective action can take many different forms. Some are more effective than others. Let’s look into two guiding principles for implementing the changes needed to mitigate the risk.
Selecting a strategy for corrective action
The HSE area has the “Hierarchy of controls” principle for prioritising initiatives.
At the heart of incident management lies the goal of preventing recurrence. But you need a balanced approach to the solution. At one end, you can remove the hazard, which is very effective as no one will encounter the situation again. At the other, you accept the hazard as it is and protect your colleagues from coming into too much contact with it. The logical conclusion for preventing accidents would then be to eliminate all hazards. However, is that always possible? Hardly.
Let’s say you work with offshore oil drilling. That field of work carries a lot of inherent risks that cannot be eliminated without preventing the actual work from being done.
From an IT perspective you could stop access to email in order to prevent phishing email issues. This might be very effective, but would make it hard for co-workers to get work done. In practice, you would instead install software that scans all incoming emails and stops as many malicious ones as possible.
Incident management is also about choosing a balanced approach.
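Here is a small sketch of that balancing act. It assumes the commonly cited five levels of the hierarchy of controls (elimination, substitution, engineering controls, administrative controls, personal protective equipment) and picks the most effective option that is still feasible for the business. The class names, the feasibility flag and the example options are my own illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Control(IntEnum):
    """Hierarchy of controls; a lower number means a more effective control."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5


@dataclass(frozen=True)
class Option:
    description: str
    level: Control
    feasible: bool  # can the business still get its work done?


def pick_control(options: list[Option]) -> Option | None:
    """Return the most effective feasible option, or None if there is none."""
    feasible = [option for option in options if option.feasible]
    if not feasible:
        return None
    return min(feasible, key=lambda option: option.level)


# The phishing example from above, with one extra hypothetical option.
phishing_options = [
    Option("Remove access to email", Control.ELIMINATION, feasible=False),
    Option("Scan all incoming email", Control.ENGINEERING, feasible=True),
    Option("Phishing awareness training", Control.ADMINISTRATIVE, feasible=True),
]
```

With these inputs, `pick_control(phishing_options)` selects the email scanning option: removing email altogether ranks higher in the hierarchy but is ruled out because it is not feasible.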
Manage risk without closing the business
For both HSE and IT it seems very reasonable that the cure must not be worse than the problem. So how do you find the point where lowering the risk further can’t be justified from a business perspective?
This is covered by a principle called “ALARP” (As Low As Reasonably Practicable). Although the term originates in safety management, the concept is an essential one for all forms of risk management:
Within IT you have to accept that your colleagues need to use the intranet and log on to the VPN from the outside, and that their password is sometimes the name of their dog 🙂
This shouldn’t prevent them from using the network at all.
A simple mitigation strategy is to set requirements for the length and complexity of their password, and for how often they need to change it. This doesn’t require a lot of resources and reduces the risk significantly. So the password becomes the name of their dog plus some special characters 🙂
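A rough sketch of such a policy check could look like this. The thresholds (12 characters, 90 days) and the exact complexity rules are assumptions I picked for the example, not values taken from either standard, so adjust them to your own policy.

```python
from __future__ import annotations

import string
from datetime import date, timedelta

# Assumed policy values, chosen only for this example.
MIN_LENGTH = 12
MAX_AGE = timedelta(days=90)


def password_problems(password: str, last_changed: date, today: date) -> list[str]:
    """Return a list of policy violations; an empty list means the password passes."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append("too short")
    if not any(char.isalpha() for char in password):
        problems.append("no letter")
    if not any(char.isdigit() for char in password):
        problems.append("no digit")
    if not any(char in string.punctuation for char in password):
        problems.append("no special character")
    if today - last_changed > MAX_AGE:
        problems.append("expired")
    return problems
```

The function returns all violations at once rather than stopping at the first one, so a colleague can fix everything in a single attempt.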
Back to the incident handling process in Gluu
So where do these controls fit into our incident handling process? In Gluu we define two sub-processes for each area of work, as the work instructions and roles are different.
The most appropriate person to implement the actual change is the supervisor or line manager, as they have the daily working knowledge of the field.
As mentioned in “How to set up incident reporting”, it is vital to keep the flow of incident reports steady. The “Implement corrective actions” activity should also include feedback (a nice “Thank you”) to the person who reported the incident.
Adding the final activity to the line manager’s role is easy in Gluu.
I hope you learned something about incident management, reporting and creating an incident handling process. If you’re serious about this, then I would like to offer you a free and personalized feedback meeting where I can help you further on your way. You can book a meeting directly in my calendar here.