I've been putting some serious thought into the topic of a new #DFIR model, and in an effort to extend and expand upon my previous post a bit, I wanted to take the opportunity to document and share some of my latest thoughts.
I've discussed toolmarks and artifact constellations previously in this blog, and how they apply to attribution. In discussing a new #DFIR model, the question that arises is, how do we describe an artifact or toolmark constellation in a structured manner, so that it can be communicated and shared?
Of course, the next step after that, once we have a structured format for describing these constellations, is automating the sharing and "machine ingestion" of these constellation descriptions. But before we get ahead of ourselves, let's discuss a possible structure a bit more.
The New #DFIR Model
First off, to orient ourselves, figure 1 illustrates the proposed "new" #DFIR model from my previous blog post. We still have the collect, parse, and enrich/decorate phases prior to the output and data going to the analyst, but in this case, I've highlighted the "enrich/decorate" phase with a red outline, as that is where the artifact constellations would be identified.
|Fig 1: New DFIR Model|
- UserAssist entry in the NTUSER.DAT indicating Defender Control was launched
- Prefetch file created for Defender Control (file system/MFT; not for Windows server systems)
- Registry values added/modified in the Software hive
- "Microsoft-Windows-Windows Defender%4Operational.evtx" event records generated
|Fig 2: WinDefend Exclusions|
At this point, a couple of thoughts or ideas jump out at me. First, the individual artifacts within the constellation can be listed in a fashion similar to what's seen in Yara rules, with similar "strings" based upon the source. Remember, by the time we're to the "enrich/decorate" phase, we've already normalized the disparate data sources into a common structure, perhaps something similar to the five-field TLN format used in (my) timelines. The time field of the structure would allow us to identify artifacts within a specified temporal proximity, and each description field would need to be treated or handled (that is, itself parsed) differently based upon the source field. The source field from the normalized structure could be used in a similar manner as the various 'string' identifiers in Yara (i.e., 'ascii', 'nocase', 'wide', etc.) in that they would identify the specific means by which the description field should be addressed.
Some elements of the artifact constellation may not be required, and this could easily be addressed through something similar to Yara 'conditions', in that the various artifacts could be grouped with parens, as well as 'and' and 'or', identifying those artifacts that may not be required for the constellation to be effective, although not complete. From the above examples, the Registry values being modified would be "required", as without them, Windows Defender would not be disabled. However, a Prefetch file would not be "required", particularly when the platform being analyzed is a Windows server. This could be addressed through the "condition" statement used in Yara rules, and a desirable side effect of having a "scoring value" would be that an identified constellation would then have something akin to a "confidence rating", similar to what is seen on sites such as VirusTotal (i.e., "this sample was identified as malicious by 32/69 AV engines"). For example, from the above bulleted artifacts that make up the illustrated constellation, the following values might be applied:
- Required - +1
- Not required - +1, if present
- +1 for each of the values, depending upon the value data
- +1 for each event record
A notional example constellation description based on something similar to Yara might then look something like the following:
$str1 = UserAssist entry for Defender Control
$str2 = Prefetch file for Defender Control
$str3 = Windows Defender DisableAntiSpyware value = 1
$str4 = Windows Defender event ID 5010 generated
$str5 = Windows Defender DisableRealtimeMonitoring value = 1
$str6 = Windows Defender event ID 5001 generated
$str1 or $str2 and ($str3 and $str4 and $str5 and $str6);
Again, temporal proximity/dispersion would need to be addressed (most likely within the scanning engine itself), either with an automatic 'value' set, or by providing a user-defined value within the rule metadata. Additionally, the order of the individual artifacts would be important, as well. You wouldn't want to run this rule and in the output find that $str1 was found 8 days after the conditions for $str3 and $str5 being met. Given that the five-field TLN format includes a time stamp as its first field, it would be pretty trivial to compute a temporal "Hamming distance", of sorts, a well as ensure proper sequencing of the artifacts or toolmarks themselves. That is to say that $str1 should appear prior to $str3, rather than after it, but not so far so as to be unreasonable and create a false positive detection.
Finally, similar to Yara rules, the rule name would be identified in the output, along with a "confidence rating" of 6/6 for a Windows 10 system (assuming all artifacts in the cluster were available), or 5/6 for Windows Server 2019.
Something else that we need to account for when addressing artifact constellations is counter-forensics, even that which is unintentional, such as the passage of time. Specifically, how do we deal with identifying artifact constellations when artifacts have been removed, such as application prefetching being disabled on Windows 10 (which itself may be part of a different artifact constellation), or files being deleted, or something like CCleaner being run?
Maybe a better question is, do we even need to address this circumstance? After all, the intention here is not to address every possible eventuality or possible circumstance, and we can create artifact constellations for various Windows functionality being disabled (or enabled).