Tuesday, July 19, 2022

Fully Exploiting Data Sources

Very often, we view data sources as somewhat one-dimensional, and don't think about how we can really get value from them. We're usually working a case, just the investigation that's in front of us, and we're so "heads down" that we may not consider that what we see as a single data source, or an entry from that data source (artifact, indicator), is really much more useful, more valuable, than how we're used to viewing it.

So, what am I talking about? Let's consider some of the common data sources we access during investigations, and how they're accessed. Take a data source that we often say (albeit incorrectly) indicates program execution: the "AppCompatCache", or "ShimCache". Let's say that we parse the AppCompatCache and find an entry of interest, a path to a file with a name that is relevant to our investigation. Many of us will look at that entry and just think, "...that program executed at this time...". But would that statement be correct?

As with most things in life, the answer is, "it depends." For example, if you read Caching Out: The Value Of ShimCache to Investigators (Mandiant), it becomes pretty clear that the AppCompatCache is not the same on all versions of Windows. On some, an associated time stamp does indeed indicate that the file was executed, but on others, only that the file existed on the system, and not that it was explicitly executed. The time stamp associated with the entry is not (with the exception of 32-bit Windows XP) the time that the file was executed; rather, it's the last modification time from the $STANDARD_INFORMATION attribute in the MFT record for that file. To understand if that time stamp corresponds to the time that the file was executed, we need to consider artifact constellations, correlating the data point with other data sources to develop the context, to develop a better understanding of the data source (and point), and to validate our findings.
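To make the idea of an artifact constellation a little more concrete, here is a minimal Python sketch of that correlation step. Everything in it is an assumption for illustration: the `Observation` record, the `corroborate` helper, the source labels, the paths, and the time stamps are all made up, and none of it reflects the output format of any particular parsing tool. The point it tries to show is simply that each time stamp carries its own meaning, and that a ShimCache entry is best read next to what other data sources say about the same file.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Observation:
    """One data point from one data source (hypothetical record layout)."""
    source: str          # label for the data source, e.g. "AppCompatCache"
    path: str            # file path as recorded by that source
    timestamp: datetime  # time stamp recorded by that source
    meaning: str         # what that time stamp represents in that source


def normalize(path: str) -> str:
    """Compare paths case-insensitively, with a single separator style."""
    return path.replace("/", "\\").lower()


def corroborate(entry: Observation, others: Iterable[Observation]) -> List[Observation]:
    """Return observations from *other* sources that reference the same path."""
    key = normalize(entry.path)
    hits = [
        o for o in others
        if o.source != entry.source and normalize(o.path) == key
    ]
    return sorted(hits, key=lambda o: o.timestamp)


# Made-up sample data, for illustration only.
shim = Observation(
    "AppCompatCache",
    r"C:\Users\Public\tool.exe",
    datetime(2022, 7, 1, 10, 0, 0),
    "file last modification time ($STANDARD_INFORMATION)",
)
others = [
    Observation("Prefetch", r"c:\users\public\TOOL.EXE",
                datetime(2022, 7, 3, 14, 30, 0), "last run time"),
    Observation("Prefetch", r"C:\Windows\System32\cmd.exe",
                datetime(2022, 7, 3, 14, 29, 0), "last run time"),
]

for hit in corroborate(shim, others):
    print(f"{hit.source}: {hit.path} at {hit.timestamp} ({hit.meaning})")
```

In this made-up case, the second source places the file's execution two days after the time stamp in the ShimCache entry, which is exactly the kind of context the ShimCache entry alone would not provide.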

Further, we need to remember that ShimCache entries are written at shutdown; as a result, a file may exist on the system long enough to be included in the ShimCache, but a shutdown or two later, that entry will no longer be available within the data source. This can tell us something about the efforts of the threat actor or malware author (malware authors have been known to embed and launch copies of sdelete.exe), and it also tells us something about the file system at a point in time during the incident.

The point is that the data sources we rely on very often have much more value and context than we realize or acknowledge, and are often much more nuanced than we might imagine. With the ShimCache, for example, an important factor to understand is the version of Windows from which the data was retrieved...because it matters. And that's just the beginning.

I hope this is beginning to shine light on the fact that the data sources we very often rely on are actually multidimensional, have context and nuance, and have a number of attributes. For example, some artifacts (constituents of data sources) do not have an indefinite lifetime on the system, and some artifacts are more easily mutable than others. To that point, Joe Slowik wrote an excellent paper last year on Formulating a Robust Pivoting Methodology. At the top of the third page of that paper, Joe refers to IOCs as "compound objects linking multiple observations and context into a single indicator", and I have to say, that is the best, most succinct description I think I've ever seen. The same can be said for indicators found within the various data sources we access during investigations, so the question is, are we fully exploiting those data sources?
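One way to picture an indicator as a "compound object" is as a small data structure that carries its context along with its value. The Python sketch below is only that, a sketch: the `Indicator` class, its field names, the `Mutability` ratings, and the sample values are all hypothetical, chosen to mirror the attributes mentioned above (source, Windows version, lifetime, mutability) rather than any established schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Mutability(Enum):
    """Hypothetical two-level rating of how easily an artifact can be altered."""
    LOW = "low"
    HIGH = "high"


@dataclass
class Indicator:
    """An indicator plus the context needed to interpret it (illustrative only)."""
    value: str                   # e.g. a file path
    data_source: str             # where the indicator was found
    os_version: str              # version of Windows the data came from
    persists_indefinitely: bool  # does the artifact have an indefinite lifetime?
    mutability: Mutability
    observations: List[str] = field(default_factory=list)

    def add_observation(self, note: str) -> None:
        """Attach a related observation from another data source."""
        self.observations.append(note)

    def caveats(self) -> List[str]:
        """List the reasons this indicator should not be read in isolation."""
        notes = []
        if not self.persists_indefinitely:
            notes.append("artifact may no longer exist at collection time")
        if self.mutability is Mutability.HIGH:
            notes.append("artifact is easily modified")
        return notes


# Made-up example; the mutability rating is a placeholder, not an assessment.
ind = Indicator(
    value=r"C:\Users\Public\tool.exe",
    data_source="AppCompatCache",
    os_version="Windows 10",
    persists_indefinitely=False,
    mutability=Mutability.LOW,
)
ind.add_observation("same path referenced by a second data source")
print(ind.caveats())
```

Keeping the context in the same object as the value makes it harder to quote the indicator without also seeing the conditions under which it means what we think it means.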
