Thursday, August 21, 2014

What does that "look like"?

We've heard this question a lot, haven't we?

I attended a conference about 2 1/2 years ago, and the agenda for that conference had about half a dozen or more presentations that contained "APT" in their title.  I attended several of them, and I have to say...I walked out of some of them.  However, hearing comments from other attendees, many folks felt exactly the same way; not only were they under-whelmed, but I heard several attendees express their disappointment with respect to the content of these presentations.  During one presentation, the speaker stated that the bad guys, "...move laterally."  One of the attendees asked, "what does that look like on systems?", and the speaker's response was to repeat his previous statement.  It was immediately clear that he had no idea...but then, neither did anyone else.

Corey has asked this question in his blog, and he's also done some great work demonstrating what various activities "look like" on systems, such as when systems are exploited using a particular vulnerability.

What I'm referring to in this post isn't (well, mostly...) something like "...look at this Registry key."  Rather, it's about clusters or groups of artifacts that are indicative of an action or event that occurred on a system.

As we share and use artifacts, we can take a step back and look at where those artifacts exist on systems.  Then we begin to see that, depending upon the types of cases we're working, artifacts are clustered in relatively few data sources.  What this means is that on drives that are 500GB, 1TB, or larger, I'm really only interested in a few MB of actual data.  This means that during incident response activities, I can focus my attention on those data sources, and more quickly triage systems.  Rather than backing up a van full of hard drives and imaging 300 or more systems, I can quickly narrow down my approach to the few systems that truly need to be acquired and analyzed.

This also means a significant speed-up for digital analysis, as well.  I don't maintain tables of how long it takes to acquire different hard drives, but not long ago, I had one hard drive that took 9 hrs to acquire, and another that was 250GB that took 5 hrs to acquire.  Knowing the data sources that would provide the biggest bang for the buck, I could have retrieved those after I connected the hard drive to the write-blocker, but before I acquired the hard drive image.

As we share and use artifacts, we begin to see things again and again.  This is a very good thing, because it shows us that the artifacts are reliable.

Like others, including Corey, I don't see sharing artifacts on any sort of scale.  Yes, there are sites such as, but they don't appear to be heavily trafficked or used.  Also, I generally don't find the types of artifacts I'm looking for at those sites.  I've been achieving reliability, on my own, for various artifacts by using them across cases.  For example, I found one particular artifact that nailed down a particular variant of a lateral movement technique; once I completed my analysis of that system, I went back and searched the entire timeline I'd developed for that system, and found that that artifact was unique to the event I was interested in.  I've since been able to use that artifact to quickly search successive timelines, significantly speeding up my analysis process.  Not finding that artifact is equally important, as well, because (a) it tells me that I need to look for something else, and (b) in searching for it, I can show a client that I've done an extremely thorough job of analysis, and done it much quicker.

A great reason for sharing artifacts is the oxidation of those artifacts.  Okay, so what does this mean?

The type of artifacts I'm referring to are, for the most part, not single artifacts.  Rather, when I talk about what something "looks like" on a system, I'm generally referring to a number of artifacts (Windows Event Log records, Registry keys/values, etc.) clustered "near" each other in a timeline.  How "near" they are...ranging from within the same second to perhaps a couple of seconds apart...can vary.  So, let's say that you're doing some testing and replicating a particular activity, and you immediately "freeze" your test system and find six artifacts that, when clustered near each other, are indicative of the action you took.

How many times as incident responders and digital forensic analysts do we get access to a system immediately after the initial intrusion occurred?  What's more likely to happen is that the initial response occurs hours, days, or even weeks after in the initial incident, and is the result of an alert or a victim notification.  Given the passage of time, these artifact clusters tend to oxidize due to the passage of time, as the system continues to run, and to be used.  For example, if a system becomes infected by a browser drive-by and IE is used in the infrastructure, some artifacts of the drive-by may be oxidized simply due to the normal IE cache maintenance mechanism.  Logs with fixed file sizes roll over, the operating system may delete files based on some sort of timing mechanism, etc.  All of this assumes, of course, that someone hasn't done something to purposely remove those artifacts.

Someone may share a cluster of six artifacts that indicate a particular event.  As others incorporate those artifacts into their analysis, the reliability of that cluster grows.  Then someone analyzes a system on which two of those six artifacts have oxidized, and shares their findings, we can see how reliable the remaining four artifacts can be.

I specifically chose to use the term "oxidize", because the term "expire" or "expired" seem to imply that the lifetime of the artifact had passed.  Sometimes, specific artifacts may not be part of a cluster, due to specific actions taken by the intruder.  For example, we've all seen files be deleted, time stomped, and other actions taken that force artifacts to be removed, rather than allowing them to timeout.

At the moment, there are many questions with respect to sharing artifacts; one is the format.  What format is most useful to get examiners to incorporate those artifacts into their analysis?  Because, after all...what's the point of sharing these artifacts if others aren't going to incorporate them into their analysis processes?

During July, 2013, I posted a number of articles to this blog that I referred to as "HowTos"...they were narratives that described what to look for on systems, given various analysis goals.  Unfortunately, the response (in general) to those articles was the Internet equivalent of "cool story, bro."

A great indicator that I've used comes from pg 553 of The Art of Memory Forensics.  The indicator I'm referring to is pretty easy to pick out on that page...I've highlighted it in green in my book.  The authors shared it with us, and I found it valuable enough to write a RegRipper plugin so that I can incorporate that information directly into my own timelines for analysis...and yes, I have found this artifact extremely accurate and reliable, and as such, very valuable.  Having incorporated this indicator into my work, I began to see other artifacts clustered "around" the indicator in my timelines.  I also found that the indicator maintained it's reliability when some of those other artifacts were oxidized due to the passage of time; in one instance, the *.pf file was deleted due to how Windows XP manages the contents of that folder.

Final Thoughts
A good deal of the indicators that I'm referring to can be abstracted to more general cases.  What I mean is, an artifact cluster that is indicative of targeted threat (or "APT") can be abstracted and used to determine if a system was infected with commodity malware.  Other artifact clusters can similarly be extrapolated to more general's really more about how reliable the artifacts in the cluster are, than anything else.