Saturday, February 28, 2009

TimeLine Analysis, pt III

This whole "timeline analysis" thing is getting a little bit of play, it seems. Yeah, I know, I've been on my soapbox about it here and here, and even Rob Lee got into the mix on the SANS Forensic blog, as well. This is a good thing...timeline analysis, when it comes to digital forensics, can be extremely important. Timelines can be about who did what, when, or they can be about what happened when, on what system (or from which system did it originate?). Either way, timelines can do a great deal to answer questions.

From what I've seen, and done, most analysts seem to be taking something of a manual approach to timeline generation; finding "important" events and adding them manually to a spreadsheet. This is fine, but there are drawbacks. First, it's not entirely scalable. Second, as you start adding sources beyond file system data, you start adding complexity and time to the equation. In commercial consulting, time is money. For LE work, time is backlog. There's got to be a better way...seriously.

As a recap, some of where this originates is with Brian Carrier's TSK tools. Brian has some info on timelines here, in which he discusses using the "fls" tool to create a body file, which can then be parsed by mactime or ex-tip. This data can also be graphically displayed via Zeitline (note: Zeitline hasn't been updated since June, 2006). The TSK tools are fantastic for what they do, but maybe what needs to be done is to take the output of fls to the next level.

Now, something that folks (Mike, with ex-tip, and Rob, via SIFT) have done is to include Registry hive files in the timeline analysis, following the same sort of body file format as is used by fls...after all, Registry key LastWrite times are analogous to file last written/modified times. However, there are some potential shortcomings with this approach. The most notable is that if you pull all keys and their LastWrite times from a hive file, you'll get a LOT of data that you're simply not interested in; many of the keys within the Registry that are modified during the course of normal operations may not be of interest to the analyst. Also, simply displaying a Registry key's LastWrite time can provide little to no context regarding what actually happened; this is especially true with MRU lists. This is pretty easy to overcome, though, by adding the ability to write timeline data to RegRipper.

Okay, but what about the other sources mentioned? What about Event Logs? Event Log records may be important, but they generally don't fit the model used for the body file. Evt2Xls has been updated (after this tool was copied to the WFA 2/e master DVD and sent to the publisher) to write out the information that is necessary for timeline analysis. Other tools can also be included through the use of import filters, which is the direction Mike Cloppert went with ex-tip. However, as we start adding sources (log files, EVT files, Registry hives, network captures, etc.) we need to add additional information to our "events" so that we can differentiate items such as sources, hosts, users, etc.

As I see it, there are essentially 5 different fields that define a timeline event:

Time - MS systems use 64-bit FILETIME objects in many cases; however, for the purposes of normalization, 32-bit Unix epoch times will work just fine

Source - fixed-length field for the source of the data (i.e., file system, Registry, EVT/EVTX file, AV or application log file, etc.) and may require a key or legend. For graphical representation, each source can be associated with a color.

Host - The host system, defined by IP or MAC address, NetBIOS or DNS name, etc. (may also require a key or legend)

User - User, defined by user name, SID, email address, IM screenname, etc. (may also require a key or legend)

Description - The description of what happened; this is where context comes in...

Now, for the purposes of data reduction, we can also define a sixth field, called "Type". There are essentially two types of events: point or span. Point events have a single time associated with them, while span events (i.e., AV scans) have a start and an end time associated with them. As there are only two, this can be a binary value (a 1 or a 0). Maybe this is getting a bit ahead of myself, but here's what I was thinking: I've had a number of examinations where many files in the system32 directory had their last access times modified within a three-minute range, and reviewing the AV application logs showed that an AV scan had been run at that time.
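As a rough sketch of how such an event might be represented (Python here purely for illustration; the FILETIME-to-epoch conversion is standard, but the structure and field names are just my reading of the fields described above, not any actual tool's format):

```python
from dataclasses import dataclass

# 100-ns intervals are the FILETIME unit; 11644473600 seconds separate
# the FILETIME epoch (Jan 1, 1601) from the Unix epoch (Jan 1, 1970)
EPOCH_DELTA = 11644473600

def filetime_to_unix(ft: int) -> int:
    """Normalize a 64-bit FILETIME value to 32-bit Unix epoch seconds."""
    return ft // 10_000_000 - EPOCH_DELTA

@dataclass
class TimelineEvent:
    time: int           # 32-bit Unix epoch seconds (normalized)
    source: str         # e.g., file system, Registry, EVT/EVTX, AV log
    host: str           # IP/MAC address, NetBIOS or DNS name, etc.
    user: str           # username, SID, email address, etc.
    description: str    # what happened -- this is where context comes in
    span: bool = False  # Type: False = point event, True = span event
```

A span event would additionally need an end time; the binary Type flag above just marks which kind of event the record represents.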

Now, one thing about the five fields is that they won't all be filled in by the available data, all the time. For example, when parsing Event Logs, there may be a user identifier (SID) in the data, but there may not be host or system information available. Also, the source field will most likely always need to be filled in by either the analyst or the filter. This isn't really a problem, because when it comes to an actual timeline, all you really need is the time (or start and end times) and a description of the event, which can include things such as the host and user fields. But one thing to remember is that what this is all really about is data reduction and representation; having fields to parse on can let you narrow down activity. For example, if you suspect that a particular user was involved in an incident, you can parse your data based on that user...either by username, by SID, or, as users of Analyst's Notebook may be familiar with, by email address.
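That kind of user-based data reduction is straightforward once each event carries a User field; here's a minimal sketch (the event tuples are illustrative, borrowing values from the .tln examples later in this post):

```python
def filter_by_user(events, user_id):
    """Data reduction: keep only the events whose User field matches."""
    return [e for e in events if e[3] == user_id]

# Each event tuple: (time, source, host, user, description)
events = [
    (1123619815, "EVT", "PETER", "", "crypt32/2;..."),
    (1123619816, "EVT", "PETER",
     "S-1-5-21-839522115-1801674531-2147200963-1003",
     "MsiInstaller/11707;..."),
]

hits = filter_by_user(events,
                      "S-1-5-21-839522115-1801674531-2147200963-1003")
```

The same one-line filter works for any of the fields...host, source, or a substring match against the description.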

One of the main issues with this is that analysts need to be aware of what they can do...don't discount doing something just because there doesn't seem to be an easily available means to do it right now. Timeline analysis is an extremely valuable tool to use, and the first step is to recognize that fact.

Resources
Geoff Black's CEIC 2007 presentation on timeline analysis
TimeMap Timeline Software (LexisNexis)
Lance's Windows Event Log post

Addendum: After posting, I finished updating evt2xls.pl, adding the capability to print to .csv as well as to a timeline (.tln) format. The .tln format looks like this:

1123619815|EVT|PETER||crypt32/2;EVENTLOG_INFORMATION_TYPE;http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab

1123619816|EVT|PETER|S-1-5-21-839522115-1801674531-2147200963-1003|MsiInstaller/11707;EVENTLOG_INFORMATION_TYPE;Product: Windows Genuine Advantage v1.3.0254.0 -- Installation completed successfully. (NULL) (NULL) (NULL)

1123619888|EVT|PETER|S-1-5-18|Userenv/1517;EVENTLOG_WARNING_TYPE;PETER\Harlan

1125353382|EVT|PETER||VMTools/105;EVENTLOG_INFORMATION_TYPE;

What we see here are 4 entries from a parsed Application Event Log file. There are essentially 5 fields, all pipe ("|") separated. The format looks like this:

Time|Source|Host|User|Description

Again, the Time value is normalized to 32-bit Unix epoch time, and is the Time Generated field from the event record (there is also a Time Written field). What this does is allow an analyst to specify a time window, and then search a file (or several files) for all events that fall within that window; times and dates, as we see them on a live system (i.e., "02/12/2009 06:57 PM") or in log files, can be easily translated to 32-bit Unix epoch format, and at that point, searching for a specific time or within a specific time window is a simple matter of greater-than or less-than comparisons.
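For example, translating a date string into epoch time and checking it against a window takes just a couple of lines (a Python sketch rather than Perl; the timestamps are treated as UTC here, and the window bounds are arbitrary):

```python
import calendar
import time

def to_epoch(datestr, fmt="%m/%d/%Y %I:%M %p"):
    """Translate a date/time string (e.g., "02/12/2009 06:57 PM"),
    treated as UTC, into 32-bit Unix epoch seconds."""
    return calendar.timegm(time.strptime(datestr, fmt))

# A one-day window covering the first three .tln entries shown above
start = to_epoch("08/09/2005 12:00 AM")
end = to_epoch("08/10/2005 12:00 AM")

def in_window(t, start, end):
    """Window membership is just greater-than/less-than."""
    return start <= t <= end
```

With every source normalized to the same epoch format, the same two comparisons work across file system data, Event Logs, Registry hives, and anything else that's been run through a filter.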

Also, you'll notice that in the 4 events listed above, only two have User fields populated, and both are SIDs. This is one way of identifying users, and the SID can be "translated" by using RegRipper to parse the Software hive (specifically, the ProfileList key) from that system.

In the case of the Event Logs, the Description field is made of the following:

Event Source/Event ID; Event Type; Event Strings

This way, everything is easily parsed for analysis. The size of the fields can be reduced by not translating the event type field to a string identifier...this would make comparisons "easier" programmatically, but as it is now, visually it's a bit easier for an analyst to understand.
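Pulling the fields back apart is then trivial; here's a minimal sketch of a .tln parser (not the actual tool, just an illustration of the format described above):

```python
def parse_tln(line):
    """Split a .tln line into its five pipe-separated fields, and break
    an EVT Description down into its three semicolon-separated parts."""
    t, source, host, user, desc = line.split("|", 4)
    ev_src_id, ev_type, ev_strings = desc.split(";", 2)
    return {
        "time": int(t),
        "source": source,
        "host": host,
        "user": user,
        "event": ev_src_id,     # Event Source/Event ID
        "type": ev_type,        # Event Type
        "strings": ev_strings,  # Event Strings
    }

rec = parse_tln("1123619888|EVT|PETER|S-1-5-18|"
                "Userenv/1517;EVENTLOG_WARNING_TYPE;PETER\\Harlan")
```

Note the maxsplit arguments: they keep any pipes or semicolons that happen to appear in the event strings from breaking the parse.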

As this process/framework is being developed, there will be trade-offs along the way...

7 comments:

Anonymous said...

I've been giving this a lot of thought, and I almost wonder if the best approach here is a common timeline database schema rather than a timeline file format. The reason I feel this way is the set of properties we find in such a recording tool, which we're already talking about:
- sparse data set
- highly categorical data
- enumerated data types that may need to be extended
- referential nature of some fields

The unfortunate tradeoff with a schema rather than a file format is portability, but I think as we mature timeline tools we may have no choice.

-Michael Cloppert

H. Carvey said...

Michael,

Ultimately, I do think that a schema is going to be the way to go, as this will be the only way to really manage the data as it comes in. Of course, this will require import filters much as you've described in ex-tip already, but that's to be expected. Having just file system and Registry data from one or two systems can make this sort of thing pretty cumbersome to deal with, particularly when you simply want to see all of the events within a specific time window.
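The database idea can be sketched quickly; here's a minimal SQLite schema in Python, purely illustrative (the table and column names are my assumptions, not anything from ex-tip or SIFT):

```python
import sqlite3

# In-memory database purely for illustration
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    time        INTEGER NOT NULL,  -- normalized 32-bit Unix epoch
    source      TEXT    NOT NULL,  -- file system, Registry, EVT, ...
    host_id     INTEGER,           -- reference into the hosts 'legend'
    user_id     INTEGER,           -- reference into the users 'legend'
    description TEXT    NOT NULL
);
CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT, ip TEXT, mac TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, sid TEXT);
CREATE INDEX idx_events_time ON events(time);  -- time-window queries
""")

conn.execute(
    "INSERT INTO events (time, source, description) VALUES (?, ?, ?)",
    (1123619815, "EVT", "crypt32/2;..."),
)
count = conn.execute(
    "SELECT COUNT(*) FROM events WHERE time BETWEEN ? AND ?",
    (1123545600, 1123632000),
).fetchone()[0]
```

The hosts and users tables act as the legends mentioned above, so that one host seen by IP in one source and by NetBIOS name in another resolves to a single identifier.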

As mentioned before, there will also need to be pivot tables or legends of some kind, such that systems are translated from IP address, MAC address, etc., to a common format, say...NetBIOS or system name.

Agreed, portability may be an issue, but tools such as ZeroWine may provide a means of addressing it, offering at least a modicum of portability.

Anonymous said...

Hello Harlan,

Is there a location where we can download the latest evt2xls.pl?
It does not seem to be available at http://sourceforge.net/projects/windowsir/

thanks a lot for all your posts and tools

-JulienT

H. Carvey said...

Julien,

I haven't been posting the tools, essentially for this reason...over the past couple of months, I've received emails from people who've asked me for copies of tools, due either to the fact that they needed them RIGHT NOW, or because the tools had been updated and they wanted the latest copies before they were released. In some of the instances where I've provided the tool, I don't get so much as a "thank you" or even an acknowledgement that the tool was received. When I am able to follow up, I will usually ask that, if the user was happy with the tool and the results, they say so publicly, as an endorsement (in part) for my upcoming book. This rarely happens as well, even after the user has said that they would do so.

The most notable exceptions are Lance Mueller's mention of evt2xls, and Ovie's reference in the most recent Cyberspeak podcast to a tool I provided to him.

I know, I probably sound like a complete jerk, asking for something like this...sorry.

Brett Shavers said...

Goodness...a "thank you" is the least someone can do when given help or tools at no cost. Who can't afford to say "Thanks"?

H. Carvey said...

You'd be surprised, dude..."I don't have time, I'm working on a big case", etc, etc.

Anonymous said...

OK, I understand your position; no problem with this.
Sadly, that's also what I've observed in many forums and mailing lists: no thanks, no constructive feedback.

All in all, thanks for the answer :)

JulienT