Wednesday, May 08, 2019

Deep Knowledge, and the Pursuit Thereof

When IR was largely DF-related work, relatively few in the industry held deep knowledge of artifacts.  Over the years, IR has moved from "image all the things" to "find and image the impacted systems" to "let's deploy an enterprise agent or sensor and collect data from all the things".  As the need for enterprise-wide response became more evident, we developed concepts such as "triage", collecting specific, targeted data from systems in order to decide whether they were "in scope" or not.  We adapted concepts such as "sniper forensics", seeking out that targeted data, from disk forensics to the enterprise.  As we've moved to enterprise-scale response, deploying sensors, agents, and automated means of data collection and parsing, we need to ensure that we continue to progress beyond where we were before, because practitioners can now be even further removed from developing a deep knowledge of the data.  This isn't to say that this is the case for all analysts; being absolute would be both incorrect and pointless.

As tools and frameworks have been designed specifically to address the enterprise issue, deep knowledge of systems and artifacts can remain with a few, rather than the door being opened and that knowledge extended to the many.  As practitioners, we have to be wary of tools and frameworks standing between us and a deep knowledge of the nature and context of the data itself.

Many, many moons ago, back when I was QSA-certified and conducting PCI examinations (ssshhh...don't tell anyone...), our team was using a commercial forensic suite to perform searches across acquired images for credit card numbers (CCNs).  Our assumption was that the commercial product performed as advertised, in that it found "valid" CCNs, per the definition of the PCI Council (which, at the time, was Visa).  We had three checks at the time...BIN, length, and Luhn...and if a CCN that was found passed all three, it was passed along to the appropriate card brand for verification.  At one point, we had a case for which we knew CCNs from two specific brands had been used, but running the commercial product produced no results for those brands.  Our initial query as to what the product considered a "valid" CCN resulted in a link to a wiki page on credit card numbers, but did nothing to explain why we weren't receiving the expected results.  Finally, a deeper investigation, which included further questions and no small amount of testing, revealed that the product at the time did not consider some valid CCNs to be "valid".  Rather than waiting for the core, underlying code to be updated, we opted to go with seven distinct regexes; while this slowed the search process down, it did give us the needed capability.
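For illustration, those three checks amount to very little code.  What follows is a minimal sketch in Python; the BIN prefixes and lengths are simplified examples for a few brands (not an authoritative list), and it is not the set of seven regexes we ultimately used.

import re
from typing import Optional

# Simplified, illustrative BIN prefixes and lengths only; real PCI work used
# (and uses) far more extensive ranges, which also change over time.
CARD_PATTERNS = {
    "visa":       re.compile(r"^4\d{15}$"),
    "mastercard": re.compile(r"^5[1-5]\d{14}$"),
    "amex":       re.compile(r"^3[47]\d{13}$"),
}

def luhn_ok(digits: str) -> bool:
    """Return True if the candidate number passes the Luhn (mod 10) check."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify_ccn(candidate: str) -> Optional[str]:
    """Apply the three checks: BIN prefix, length, and Luhn."""
    digits = re.sub(r"[^\d]", "", candidate)
    for brand, pattern in CARD_PATTERNS.items():
        if pattern.match(digits) and luhn_ok(digits):
            return brand
    return None

# 4111111111111111 is a well-known test number that passes all three checks.
print(classify_ccn("4111-1111-1111-1111"))    # visa
print(classify_ccn("4111111111111112"))       # None (fails the Luhn check)

The point of the example isn't the code itself; it's that if you don't know what checks your tool applies, you can't know what it's missing.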

My point is that all of this came from a few analysts who were close to the data and had deep knowledge of the artifacts...or, at least, knowledge deep enough to know where they needed to go deeper than what the tool was presenting.  At the time, this was not something that was plastered all over the Internet; no, it was these few analysts who were looking at the issue and, subsequently, wondering what else had been missed.

When I first released RegRipper over a decade ago, my intention was for it to be a community-based tool.  There was one thing that I was absolutely sure of...I would never see everything there was to see, even in the Registry, nor would I know everything there was to know.  As such, I wanted to provide a means by which analysts could either write their own plugins (some did, starting with copy-paste...) and share them with the community, or reach out and share data so that a plugin could be written or updated.  Over the years, more than a few have done so, but for the most part, those who use the tool do so by downloading and running it.

In 2013, Corey Harrell released auto_rip, a tool that brought a modicum of automation to RegRipper. In releasing it, Corey stepped on to the path of sharing his thought process when it came to analysis; in auto_rip, Corey shared how he structures the collected data for analysis, moving RegRipper from a point-and-fire tool to one used to take a targeted approach to data parsing and presentation. Much more recently, Silv3rHorn released autoripy, in part because auto_rip hadn't been updated in some time.
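To illustrate the idea (and this is not Corey's code), structuring the output is really just a matter of grouping plugins by analysis category and writing each group's results to its own report.  The categories, plugin names, and paths below are examples I've chosen for the sketch, not auto_rip's actual lists.

import subprocess
from pathlib import Path

RIP = "rip.exe"                        # path to the RegRipper command line tool
HIVES = {                              # hive name -> collected hive file
    "software": Path("collected/SOFTWARE"),
    "system":   Path("collected/SYSTEM"),
    "ntuser":   Path("collected/NTUSER.DAT"),
}

# Illustrative categories only; the plugin names are examples, not a complete
# (or current) list.
CATEGORIES = {
    "os_info":           [("software", "winver"), ("system", "timezone")],
    "user_activity":     [("ntuser", "userassist"), ("ntuser", "recentdocs")],
    "program_execution": [("system", "appcompatcache")],
}

outdir = Path("reports")
outdir.mkdir(exist_ok=True)

for category, targets in CATEGORIES.items():
    report = outdir / f"{category}.txt"
    with report.open("w") as out:
        for hive, plugin in targets:
            # rip -r <hive file> -p <plugin> runs a single plugin against a hive
            result = subprocess.run(
                [RIP, "-r", str(HIVES[hive]), "-p", plugin],
                capture_output=True, text=True,
            )
            out.write(result.stdout)
            out.write("\n" + "-" * 60 + "\n")

The value isn't in the script; it's in the thought process of deciding which plugins answer which analysis questions.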

Nuix has a free extension for their Workstation product for automating the use of RegRipper (and one for Yara, as well), and automatically incorporating the results directly into your Nuix 'case'.  This extension automates almost all of what an analyst would need to do to run RegRipper; it automatically locates the hives (independent of the version of Windows), and runs plugins based on the profiles for each hive.  But remember, I said, "almost".  The analyst has to download RegRipper themselves, as well as ensure that the profiles for each hive are updated, based on the currently available plugins.  This is easy enough to do, as the command line tool for RegRipper (i.e., 'rip') includes a switch for automating this process.  But this isn't something that the extension does; to make the best use of the extension, the analyst needs to do just a little bit more work. 
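For what it's worth, "updating the profiles" boils down to listing the available plugins, grouping them by hive, and writing each group out as a plain text profile file.  The sketch below is one way to do that from the listing produced by 'rip -l -c'; it is not the extension's code, and the CSV column layout assumed here (plugin name first, hive third) may differ across versions, so check it against your own copy's output.

import csv
import subprocess
from collections import defaultdict
from pathlib import Path

RIP = "rip.exe"                  # path to the RegRipper command line tool
PLUGIN_DIR = Path("plugins")     # profiles live alongside the plugins

# 'rip -l -c' lists the available plugins in CSV form; the column order
# assumed below is plugin, version, hive, description.
listing = subprocess.run([RIP, "-l", "-c"], capture_output=True, text=True)

by_hive = defaultdict(list)
for row in csv.reader(listing.stdout.splitlines()):
    if len(row) < 3:
        continue
    plugin = row[0].strip()
    hive = row[2].strip().lower().replace(".dat", "")
    by_hive[hive].append(plugin)

# A profile is just a text file of plugin names, one per line, which is what
# rip's -f switch expects.
for hive, plugins in by_hive.items():
    profile = PLUGIN_DIR / hive
    profile.write_text("\n".join(sorted(plugins)) + "\n")
    print(f"wrote {profile} ({len(plugins)} plugins)")

Whether you script it yourself or use the switch that rip provides, the point is the same: keeping the profiles current is the analyst's job, not the extension's.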

The question then becomes, are you doing it?  Or are you downloading RegRipper and running it via the extension, with no modifications?  Are you pulling everything you need for your case from the Registry hives, or are you relying on the tool to do it for you?

While I have a great appreciation and fondness for automation, and respect for the effort that goes into creating it, my concern with tools such as RegRipper, log2timeline, plaso, KAPE, and the like is that rather than pulling back the "veil of mystery" and making the data more accessible to analysts (and therefore more within their realm of knowledge), thereby building deep knowledge across a much wider range of analysts, the result is the opposite.  Are we allowing automation, particularly at the enterprise level, to add additional layers of abstraction between the data and the analyst?

Don't get me wrong...I'm not bashing tools such as plaso, KAPE, or any other tools like them.  Not at all.  If it makes someone's job easier and more efficient, that's awesome.  It doesn't matter if it's a full-blown compiled application or a batch file...if it works, so be it.  All of these things are wonderful.  But as practitioners, we have to be careful about how we view and use the tools. 

A side effect (or ancillary effect, depending upon how you look at it) of this is that the community has weaponized terms like 'expert' and 'authority'.  These terms are used to set their designees apart, unreachable and untouchable.  "They're the expert, so all of the functionality I would need must be included in that tool that they released for free, and it's not something I need to concern myself with."

Circling back to RegRipper, I don't know everything there is to know about the Windows Registry, and I certainly don't have any insight into your analysis goals, nor the data you're currently examining.  Even if you've updated RegRipper with the latest set of plugins, there's no guarantee that there are plugins that will extract and parse the data pertinent to your case.  There may be a plugin that parses data from an older version of the application the user launched, but that plugin hasn't been updated in eight years.  Or maybe the user used an application for which no plugin exists at all.

Tool and framework development is great; what better way to make your work quicker, more efficient, and less error-prone than automation?  And I don't expect that, all of a sudden, everyone will know everything; again, that just doesn't make sense.  However, as practitioners, we shouldn't rely on these tools and frameworks to automagically provide all of the data needed for our case, parsed and displayed for our analysis.  Instead, we need to be vigilant, and ensure that we're looking at such things with a critical eye.

My hope is that more folks in the DFIR industry will use these tools and frameworks as a means to develop deeper knowledge of the data and artifacts, rather than an excuse to not do so.

2 comments:

@_N4rr34n6_ said...

It's a good article, Harlan. And you're not wrong. Congratulations.
I also believe that 'automatic' tools can be used to acquire that deep knowledge about an artifact. Today we are lucky enough to be able to simulate various scenarios for a given type of artifact, process that artifact with some tools, and see how the information we get from it behaves, and whether we get all of the information.
That is to say, on the one hand we are lucky that, thanks to those tools, the information is more accessible; on the other hand, thanks to those same tools (if the analyst shows interest), one can acquire a deeper knowledge of the artifacts and their behaviour in different scenarios.
In terms of efficiency, tools tend to be faster and faster, but speed does not mean efficiency.

H. Carvey said...

> ...believe that 'automatic' tools can be used to acquire that deep knowledge about an artifact.

I agree, to an extent...it depends a great deal on the individual analyst, and the culture in which they're operating.