Wednesday, March 17, 2010

Timeline Creation and Analysis

I haven't really talked about timelines in a while, in part because I've been creating and analyzing them as part of every engagement I've worked. I do this because in most...well, in all cases...the analysis I need to do involves something that happened a certain time. Sometimes it's a matter of determining what the event or events were, other times it's a matter of determining when the event(s) happened. The fact is that the analysis involves something happened at some point in time...and that's the perfect time to create a timeline.

With respect to creating timelines, I'm not the only using timelines. Chris posted recently on using the TSK tool fls to create a bodyfile from a live system, and Rob posted on creating timelines from Windows Volume Shadow Copies. Using Volume Shadow Copies to create timelines is a great way to get a view into the state of the system at some point in the past...something that can be extremely valuable in an investigation.

These are great places to start, but consider all that you could do if you took advantage of other data on the system. In order to get a more granular view into what happened when on a system, timelines should need to incorporate other data sources. Incorporating Event Log (.evt, .evtx) records may show you who was logged on to the system, and how (i.e., locally, via RDP, etc.). Now, auditing isn't always enabled, or enabled enough to provide indications of what you're looking for, but many times, there's some information there that may be helpful.

Including user web browser activity into a timeline has been extremely useful in tracking down things like browser drive-bys, etc. For example, by including web browsing activity, you may see the site that the user visited just prior to a DLL being created on the system and a BHO being added to the Registry. Also, don't forget to check the user's Bookmarks or Favorites...there at timestamps in those files, as well.

When I was working at IBM and conducting data breach investigations, many times we'd see SQL Injection being used in some manner. Parsing all of the web server logs for the necessary data required an iterative approach (i.e., search for SQL injection, collect IP addresses, re-run searches for the IP addresses, etc.), but adding those log entries to the timeline can provide a great deal of context to youranalysis. Say, for example, that the MS SQL Server database is on the same system as the IIS web server...any commands run via SQLi would leave artifacts on that system, just as creating/modifying files would. If the database is on another system entirely, and you're using the five field TLN format, you can easily correlate data from both systems in the same timeline (taking clock skew into account, of course). This works equally well for exams involving ASP or PHP web shells, as you can see where the command was sent (in the web server logs), as well as the artifacts (within the file system, other artifacts), all right there together.

Consider all of the other sources of data on a system, such as other application (i.e., AV, etc.) logs. And don't get me started on the Registry...well, okay, there you go. There're also Task Scheduler Logs, Prefetch files, as well as metadata from other files (PDF, Office documents, etc.) that can be added to a timeline as necessary. Depending on the system, and what you're looking for, there can be quite a lot of data.

But what does this work get me? Well, a couple of things, actually. For one, there's context. Say you start with the file system metadata in your timeline, and you kind of have a date that you're interested in, when you think the incident may have happened. So, you add the contents of the Event Logs, and you see that the user "Kanye" logged in...event ID 528, type 10. Hey, wait a sec...since when does "Kanye" log in via RDP? Why would he, if he's in the office, sitting at this desk? So then we add the user Registry hive information, and we see "cmd/1" in the RunMRU key (the most recent entry) and shortly thereafter we notice that "Kany3" logged in via RDP. We can get the user information from the SAM hive, as well as any additional information from the newly-created user profile. So as we add data, we begin to also add context with respect to activity we're seeing on the system.

We can also use the timeline to provide an increasing or higher level of overall confidence in the data itself. Let's say that we start with the file system metadata...well, we know that this may not be entirely accurate, as file system MAC times can be easily manipulated. These times, as presented by most tools, are usually derived from the $STANDARD_INFORMATION attribute within the MFT. However, what if I add the creation date of the file from the $FILE_NAME attribute, or simply compare that value to the creation date from the $STANDARD_INFORMATION attribute? Okay, maybe now I've raised my relative confidence level with respect to the data. So now, I add other sources of data, so rather than just seeing a file creation or modification event, I now see other activity (within close temporal proximity) that leads to that event?

Let's say that I start off with a timeline based just on file system metadata (Windows XP), and I see a file creation event for a Prefetch file. The Prefetch file is for an application accessed through the Windows shell, so I would want to perhaps see if the Event Log contained any login information so I could determine which user was logged in, as well as when they'd logged in; however, I find out that auditing of login events is not enabled. Okay, I check the ProfileList key in the Software hive against the user profile directories, and I find out that all users who've logged into the system are local users...so I can go into the SAM hive and get things like Last Login dates. I then parse the UserAssist key for each user, and I find that just prior to the Prefetch file I'm interested in being created, the user "Kanye" navigated through the Start button to launch that application. Now, the file system time may be easily changed, but I now have less mutable data (i.e., a timestamp embedded in a binary Registry value) that corroborates the file system time, which increases my relative level of confidence with respect to the data.

Now, jump ahead a couple of days in time...other things had gone on on the system prior to acquisition, and this time, I'm interested in the creation AND modification times of this Prefetch file. It's been a couple of days, and what I find at this point is that the UserAssist information tells me that the application referred to by the Prefetch file has actually been run several times, between the creation and modification date; now, my UserAssist information corresponds to modification time of the file. So, now I add metadata from the Prefetch file, and I have data that supports the modification time (the last time the application was run, the timestamp for which is embedded in the Prefetch file, would correspond to when the Prefetch file was last modified), as well as the number of times the user launched the application.

Now, if the application of interest is something like MSWord, I might also be interested in things such as any documents recently accessed, particularly via common dialogs. The point is that most analysts understand that file system metadata may be easily modified, or perhaps misinterpreted; by adding additional information to the timeline, I not only add context, but by adding data sources that are less likely to be modified (timestamps embedded in files as metadata, Registry key LastWrite times, etc.), I can raise my relative level of confidence in the data itself.

One final point...incident responders increasingly face larger and larger data sets, requiring some sort of triage to identify, reduce, or simply prioritize the scope of an engagement. As such, having access to extra eyes and hands...quickly...can be extremely valuable. So consider this...which is faster? Imaging 50 systems and sitting down and going through them, or collecting specific data (file system metadata and selected files) and providing to someone else to analyze, while on-site response activities continue? The off-site analyst gets the data, processes it and beings analysis, narrowing the scope...now we're down from 50 systems, to 10...and most importantly, we're already starting to get answers.

Let's say that I have a system with a 120GB system partition, of which 50 GB is used. Which is faster to collect...the overall image, or file system metadata? Which is smaller? Which can be more easily provided to someone off-site? Let's say that the file created when collecting file system metadata is 11MB. Okay...add Registry data, Event Logs, and maybe some specific files, and I'm up to...what...13MB. This is even smaller if I zip it...let's say 9MB. So now, while the next 12oGB system is being acquired, I'm providing the data to an off-site analyst, and she's able to follow a process for creating a timeline, and begin analyzing the data. Uploading a 9MB file is much faster than shipping a 120GB image via FedEx.

As a responder, I've had customers in California. Call me after happy hour on the East Coast, and the first flight out will be sometime in the next 12 hrs. It's usually 4 1/2 hrs to the San Jose area, but 6 hrs to LA or Seattle, WA. Then depending on where the customer is actually located, it may be another 2 hrs for me to get to the gate, get a rental car and arrive on-site. However, if there are trained first responders on staff, I can begin analyzing data (and requesting additional data) within, say, 2 hours of the initial call.

So another way cool thing is that this can also be used in data breach cases. How's that? Well, if you're shipping compressed file system metadata to someone (and you've encrypted it), you're not shipping file contents...so you're not exposing sensitive data. Providing the necessary information may not answer the question, but it can definitely narrow down the answer and help to identify and reduce the overall scope of an incident.

6 comments:

Chris said...

The thing I like about timelines is that they can be added to other system timelines to get a more comprehensive view of activity. Taking 10 systems and looking at 10 separate timelines may not provide all the context but merging them together and looking at them simultaneously, you might be able to see how one system was used to compromise another. This can be useful in profiling the attack and the methodologies that were used.

Keydet89 said...

Chris,

Very true...this is where the TLN format fields come into play...

Randall said...

Harlan,

I want to second your suggestion about digging for corroborative data. I'm working a case right now where the question was raised of whether porn was viewed by a particular user. An initial screen of the computer showed no bad images either allocated or unalocated. Deleted UserAssist keys, link files, and other evidence pointed strongly to privacy-cleaning software being used. However, analyzing the user's registry under HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Comdlg32 showed multiple .mov, .mwv, .jpg and .zip files with porn names accessed using a memory card, USB drive and Apple Nano. I also found evidence of this activity in the Pagefile.sys. BUSTED!
Attempts to erase/hide user activity by savvy users will likely continue. How successful they will be will depend on what software they use and how knowledgeable they are. Fortunately the software and the users will always have limitations and what they miss will be what we can use. But this means we'll have to dig deeper through many different sources of data just as you demonstrated in your timeline analysis.
Thank you for sharing your line of thinkig by the way. That is a good guide for how a lot of investigations will need to be conducted.

Keydet89 said...

Randall,

Thanks for the comment.

Depending on the OS, you may have access to so much more. With XP, there are Restore Points, and with Vista and above, VSCs.

I'm sure folks are going to use wiping tools, but there's always unallocated space within the hive file itself, too.

That is a good guide for how a lot of investigations will need to be conducted.

We can only hope.

I recently completed an exam during which I found that web server logs were very pertinent to the exam. A timeline based on file system metadata alone gave me file paths, as well as when the files were created, modified and accessed. Adding Event Logs gave me a clue as to *how* the initial compromise occurred, and adding the web logs let me know what led to the creation of certain files, as well as who created them.

cpldbc said...

Using timelines has become much more powerful now that I have incorporated other data (restore points, event logs, setupapi, etc). I recently had to revisit an older file coming due in court. Two years ago, I had manually parsed out all the relevant times to show when/why the computer was brought down and back up.

This time round, my script-created timeline had all the same data and presented it easily when I wanted to show it to counsel.

Keydet89 said...

cpl,

Good to hear that you're finding timeline creation to be a powerful analysis tool.