10 Reasons SIEM Should Remain Dedicated to Security — Part 3
Data Considerations
This is the third post in our series. The first installment introduced the series with an analogy on market segmentation for cars: the perfect car for everyone doesn’t exist. The second discussed how different stakeholders, different owners, different delivery processes and different outcomes usually mean different tools.
In this blog, we’ll address data considerations. When I was at Gartner, I published several research notes on use cases that shed light on “the use case triangle”. Different insights (and hence different use cases) need different data as described in the following diagram:
Today, let’s dive deep into the data required to solve security, ITOM and APM use cases.
4. Different data sources for different use cases
Different use cases require a diverse range of data sources. For example, ITOM needs data that reflects the state of the IT infrastructure, providing detailed insights into the performance, usage, and availability of assets and resources. APM and observability, on the other hand, require application-centric data that supplies intricate details about each application’s performance. In contrast, security use cases require event data sources across the enterprise (both on-prem and in the cloud), streaming feeds, and user and application context.
While it might seem convenient to gather as much data as possible, it’s important to avoid irrelevant data sources. The costs associated with collecting, storing, and processing irrelevant data can balloon quickly, often with diminishing returns, especially as data volumes are only growing.
Let’s look at the data set required to solve use cases in security, ITOM and APM. It is rare for the same data set to be used for all of these use cases (in my decades of work in cybersecurity, I have never seen it). Let’s represent this as a Venn diagram, as below:
If we wanted to solve security, ITOM and APM use cases with the same tool, the data sources required would look like this:
In fact, the subset of data that is used at the same time for security, ITOM and APM use cases is a very small subset of the overall set, as described in the diagram below:
As illustrated in Figure 4, the utilization of a joint tool for all use cases yields only minimal savings in terms of data collection and storage. This implies that the potential drawbacks and complexities associated with a joint tool far outweigh any marginal benefits.
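To make the overlap argument concrete, here is a minimal Python sketch of how the comparison could be computed. The source names and their assignment to each use case are hypothetical placeholders, not the contents of the diagrams above, and the sketch counts sources rather than data volume; a real assessment would weight each source by its daily volume.

```python
# Hypothetical data sources per use case (illustrative only, not from the diagrams).
security = {"firewall", "edr", "identity_provider", "cloud_audit", "dns", "os_auth"}
itom = {"cpu_metrics", "disk_metrics", "network_devices", "cmdb", "dns", "os_auth"}
apm = {"app_traces", "app_metrics", "app_errors", "load_balancer", "dns"}


def overlap_report(*use_cases):
    """Compare separate collection per tool with one joint collection."""
    separate = sum(len(s) for s in use_cases)     # each tool collects its own set
    joint = len(set().union(*use_cases))          # one tool collects the union
    shared_by_all = set.intersection(*use_cases)  # centre of the Venn diagram
    return {
        "separate": separate,
        "joint": joint,
        "saved": separate - joint,
        "shared_by_all": shared_by_all,
    }


report = overlap_report(security, itom, apm)
print(report)
```

With these placeholder sets, a joint tool would collect 14 sources instead of 17, and only one source is shared by all three use cases. The numbers are made up, but the shape of the calculation is the one the Venn diagrams describe.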
5. Different logs even for the same data source
Logs used across different use cases can vary significantly, even when they are generated by the same data source.
For example, saying “I need Linux logs to solve security, ITOM and APM use cases” hides the nuance and complexity of what exactly “Linux logs” are. In fact, there are entire families of Linux logs. Some of these families are mainly relevant to ITOM use cases, some align more with APM/observability, while others are more suitable for security use cases. The 24 families are described in the following table:
As previously mentioned, addressing all three use cases requires the union of all relevant logs. Here, again, we observe only marginal storage savings when employing a single, joint solution, and the same Venn diagram can be used to represent potential savings in the Linux log families, as described below (the same Venn diagram could be used for most other relevant data sources too):
This illustration underscores that while there is some overlap, most log data is unique to each use case, limiting the efficiency of a single, unified tool. Therefore, it becomes crucial to recognize and appreciate these distinctions while choosing the appropriate tools for different organizational needs.
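One way to picture this is a routing step that sends each log family only to the use case that needs it. The sketch below is a hedged illustration: the family names and their use case assignments are assumptions made for the example, not the table from this post, and `route` is a hypothetical helper rather than a feature of any particular product.

```python
from collections import defaultdict

# Hypothetical mapping of Linux log families to the use cases that consume them.
# Family names and assignments are assumptions, not the table from this post.
FAMILY_USE_CASES = {
    "authentication": {"security"},
    "audit": {"security"},
    "kernel": {"itom", "security"},
    "scheduler": {"itom"},
    "package_manager": {"itom"},
    "application": {"apm"},
    "web_server": {"apm", "security"},
}


def route(records):
    """Group (family, message) records by the use case that needs them."""
    routed = defaultdict(list)
    for family, message in records:
        # Families that no use case needs are dropped rather than collected.
        for use_case in sorted(FAMILY_USE_CASES.get(family, ())):
            routed[use_case].append(message)
    return dict(routed)


sample = [
    ("authentication", "failed login for user alice"),
    ("scheduler", "job backup.sh finished"),
    ("application", "request took 840 ms"),
    ("mail", "queue flushed"),
]
routed = route(sample)
print(routed)
```

In this sample, each record lands in exactly one destination and the unmapped `mail` record is not collected at all, which mirrors the point that the same host produces different logs for different consumers.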
That’s enough for today. Next time, we’ll wrap up our data considerations.