Demystifying the SOC, Part 5: The New SOC Maturity Model based on Outcomes
In our last blog post, we described the legacy SOC maturity model, which is based on “speeds and feeds”: tracking activity volume, mean time to detect (MTTD), and mean time to respond (MTTR). We demonstrated why SOCs that try to improve these metrics are not as effective or efficient as they could be. Although these metrics are prevalent in the industry and legacy maturity models are built on them, they have typically yielded approaches that aim for completeness but end up too complex and expensive, and that lead to staff burnout.
Tracking the Threat Detection, Investigation and Response Lifecycle as a single continuum
Over time, it has become evident that the effectiveness and efficiency of a SOC should not be measured by the volume of activity at each phase of the Threat Detection, Investigation and Response (TDIR) workflow. Measuring volume leads the SOC to focus on the wrong metrics and promotes a “garbage in, garbage out” syndrome at each internal phase. If the focus at each phase is quantity rather than quality, accuracy problems are simply pushed to subsequent phases.
Instead, we need to look at the full lifecycle of TDIR as a continuous process while focusing on the fundamental mission of the SOC: to bring the organization back to a known good state after being hit by an attacker.
New SOC maturity model based on Outcomes
Rather than rating SOC maturity by functions mastered, SOC Maturity Model 2.0 focuses on outcomes, and on what use cases the SOC is able to deliver consistently and efficiently with minimal manual intervention. This approach unifies the SOC’s TDIR phases into one holistic, integrated workflow that can be performed at scale. The more mature SOCs are capable of consistently delivering outcomes across more sophisticated use cases, whereas less mature SOCs can only tackle simpler and easier use cases. In our next blog installment, we will dive into more details on what a use case entails, and what tools can do in order to help analysts with the delivery of these use cases.
Level 1 SOCs typically have the ability to detect, investigate and respond to a common set of use cases that are asset- and device-centric, such as phishing, malware, and ransomware. These attacks are typically driven by machines exploiting known or zero-day vulnerabilities which, although very dangerous, are finite in nature, and hence easier to address than people-centric attacks.
Level 2 SOCs typically have the ability to detect, investigate and respond to a common set of use cases focused on compromised insiders. A compromised insider (as opposed to a malicious insider) is actually a hacker on the Internet who somehow stole a staff member’s or contractor’s credentials and is disguising their actions as the employee’s. These exploits are more challenging and complex to address than Level 1’s asset-driven attacks because they leverage the identities and access privileges of known and trusted employees. Compared with the finite world of machines, networks, and protocols, the universe of human behavior and creativity is limitless, which makes these attacks more dangerous.
Level 3 SOCs typically have the ability to detect, investigate and respond to a common set of use cases involving malicious insiders. A malicious insider, such as an employee or contractor, is significantly more dangerous than a compromised insider. Why? A true insider is more likely to have extensive knowledge of how the organization and its infrastructure function. They often know where the controls, tools and security traps lie, how they’re configured and operate, and how to avoid getting caught. For that reason, the actions of a malicious insider are considerably more difficult to expose and address.
Level 4 SOCs are able to address some very advanced and custom use cases. Often these use cases and scenarios are related to the business, and go way beyond cyber-hygiene use cases. Examples include fraud and supply chain attacks.
Each level up requires greater sophistication and maturity, including more features in SOC tools and more out-of-the-box content to cover a broader scope. At every level, the goal is to have tools that automate as many TDIR functions as possible, including not just detection but also alert validation and triage, investigation, response, and answer, so as to get infrastructure back to a known good state. The goal is not a fully automated SOC (although that would be nice!) but tools that augment and amplify people’s efficiency and free up human time, so analysts can focus on what the tools cannot drive automatically.
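As a rough illustration of how the four levels could be expressed in practice, here is a minimal Python sketch. It assumes the levels are cumulative (a SOC reaches a level only if it also delivers every use case of the levels below it), which the model implies but does not state outright. The catalog entries, the use case identifiers, and the function name maturity_level are hypothetical and chosen only for this example.

```python
from typing import Dict, Iterable, Set

# Hypothetical catalog: use cases named in the level descriptions above,
# grouped by maturity level. A real catalog would be far more detailed.
USE_CASES_BY_LEVEL: Dict[int, Set[str]] = {
    1: {"phishing", "malware", "ransomware"},   # asset- and device-centric
    2: {"compromised_insider"},                 # stolen credentials
    3: {"malicious_insider"},                   # true insiders
    4: {"fraud", "supply_chain_attack"},        # advanced, business-related
}


def maturity_level(delivered: Iterable[str]) -> int:
    """Return the highest level whose use cases, and those of every
    lower level, are all in `delivered`. Returns 0 if Level 1 is incomplete.

    Assumption: `delivered` lists the use cases the SOC handles
    consistently, end to end, with minimal manual intervention.
    """
    delivered_set = set(delivered)
    achieved = 0
    for level in sorted(USE_CASES_BY_LEVEL):
        if USE_CASES_BY_LEVEL[level] <= delivered_set:  # subset check
            achieved = level
        else:
            break  # a gap at this level caps the rating
    return achieved


if __name__ == "__main__":
    soc = ["phishing", "malware", "ransomware", "compromised_insider"]
    print(maturity_level(soc))
```

Under these assumptions, a SOC that covers malicious insiders but has a gap in its Level 2 coverage would still rate as Level 1, which reflects the idea that maturity means delivering outcomes consistently rather than mastering isolated functions.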
Benefits of this outcome-based approach
The advantage of this outcome-based approach is that it focuses on the fundamental mission of a SOC: to bring the environment back to a known good, secure operating state after being hit, or to prevent a security incident from developing into a breach, while learning the lessons of the attack and engaging in a continuous improvement cycle. This approach gives the SOC a more relevant set of goals and progressions: it can focus resources on the most common threats first, then address use cases of growing sophistication and complexity.
If an organization can automate most of TDIR for Levels 1 and 2, it can most likely handle 80 to 95 percent of anticipated threats without much effort and focus manual resources on the more challenging, potentially riskier Level 3 and other edge cases. This is no longer about generating scores of alerts and false positives and closing them as soon as possible. It is about understanding that TDIR is an end-to-end workflow that focuses on outcomes.
Not only does this maturity model make the SOC more effective, it is also likely to make the staff feel more effective, with more satisfaction and less burnout. In our next post, we’ll dive into use cases.
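To put the 80 to 95 percent range in perspective, the short Python sketch below computes how many incidents would be left for manual handling at each end of that range. The figure of 1,000 incidents is purely hypothetical, as is the function name manual_workload. The calculation also assumes that every automated incident needs no analyst time at all, which is a simplification.

```python
import math


def manual_workload(total_incidents: int, automated_percent: int) -> int:
    """Incidents left for analysts when `automated_percent` of them
    are handled end to end by tooling (rounded up to a whole incident)."""
    if not 0 <= automated_percent <= 100:
        raise ValueError("automated_percent must be between 0 and 100")
    return math.ceil(total_incidents * (100 - automated_percent) / 100)


if __name__ == "__main__":
    total = 1000  # hypothetical incident count, for illustration only
    for pct in (80, 95):
        print(pct, manual_workload(total, pct))
```

With these illustrative numbers, analysts would handle 200 incidents at 80 percent automation and 50 at 95 percent, which is the time the model suggests redirecting toward Level 3 and edge cases.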