Tuesday, February 02, 2010

More Thoughts on Timeline Analysis

I had a conversation with Cory recently, and during the conversation, he mentioned that if I was going to present at a conference and talk about timeline analysis, I should present something novel. I struggled with that one...I don't see a lot of folks talking about using timeline analysis, and that may have to do with the fact that constructing and analyzing a timeline is a very manual process at this point, and that's likely too high an obstacle for many folks, even with the tools I've provided, or using other tools, such as log2timeline.

Something Cory mentioned really caught my attention, as well. He suggested that various data sources might provide the analyst with a relative level of confidence as to the data itself, and what's being shown. For example, when parsing the MFT (via analyzeMFT or Mark Menz's MFTRipper), the analyst might have more confidence in the temporal values from the $FILE_NAME attribute than from the $STANDARD_INFORMATION attribute, as tools that modify file MAC times modify the temporal values in the latter attribute. See Episode 84 from the CommandLine KungFu blog for a good example that illustrates what I'm talking about...
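To make that a bit more concrete, here's a minimal Python sketch of one common heuristic: flag any file whose $STANDARD_INFORMATION creation time is earlier than its $FILE_NAME creation time. The record layout, field names, and sample values below are made up for illustration; this is not the output format of analyzeMFT or MFTRipper, and a hit is a reason to look closer, not proof of tampering.

```python
from datetime import datetime

def flag_si_predates_fn(records):
    """Return the names of records whose $STANDARD_INFORMATION creation
    time is earlier than the $FILE_NAME creation time."""
    flagged = []
    for rec in records:
        if rec["si_created"] < rec["fn_created"]:
            flagged.append(rec["name"])
    return flagged

# Hypothetical, hand-built records (assumed layout, invented values).
records = [
    {
        "name": "notepad.exe",
        "si_created": datetime(2008, 4, 14, 12, 0, 0),
        "fn_created": datetime(2008, 4, 14, 12, 0, 0),
    },
    {
        "name": "evil.exe",
        # $SI says 2004, but $FN says the file arrived in 2010.
        "si_created": datetime(2004, 8, 4, 12, 0, 0),
        "fn_created": datetime(2010, 1, 28, 14, 3, 12),
    },
]

print(flag_si_predates_fn(records))
```

Ordinary file operations can also leave the two attributes out of step, so I'd treat this as one indicator to be weighed against other sources.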

This is an interesting concept, and something that I really wanted to noodle over and expand. One of the reasons I look to the Registry for so much valuable data is...well...because it's there, but also because I have yet to find a public API that allows you to arbitrarily alter Registry key LastWrite times. Sure, if you want to change a LastWrite time, simply add and delete a value from a key...but I have yet to find an API that will allow me to backdate a LastWrite time on a live system. But LastWrite times aren't the full story...there are a number of keys whose value data contains timestamps.
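As a small aside for anyone who wants to pull these values into a timeline themselves: Registry key LastWrite times are stored as 64-bit FILETIME values (100-nanosecond intervals since January 1, 1601 UTC), and many of the timestamps embedded in value data use the same format. The sketch below shows the conversion; the function names are my own, and the sample bytes are constructed for the example rather than taken from a real hive.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals from this epoch (UTC).
EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime):
    """Convert a 64-bit FILETIME integer to a timezone-aware UTC datetime.
    Sub-microsecond precision is dropped."""
    return EPOCH_1601 + timedelta(microseconds=filetime // 10)

def filetime_bytes_to_datetime(raw):
    """Convert 8 little-endian bytes (as found in binary value data)."""
    (filetime,) = struct.unpack("<Q", raw)
    return filetime_to_datetime(filetime)

# 116444736000000000 is the FILETIME value of the Unix epoch.
unix_epoch = filetime_to_datetime(116444736000000000)
print(unix_epoch.isoformat())
```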

Particularly for Windows systems, there are a number of sources of timestamped data that can be added to a timeline...metadata from shortcut files, Prefetch files, documents, etc. There are also Event Log records, and entries from other logs (mrt.log, AV logs, etc.).

So, while individual sources of timeline data may provide the analyst with varying levels of relative confidence as to the veracity and validity of the data, populating a timeline with multiple sources of data can serve to raise the analyst's level of relative confidence.

Let's look at some examples of how this sort of thinking can be applied. I did PCI breach investigations for several years, and one of the things I saw pretty quickly was that locating "valid" credit card numbers within an image gave a lot of false positives, even with three different checks (i.e., overall length, BIN, and Luhn check). However, as we added additional checks for track data, our confidence that we had found a valid credit card number increased. Richard talks about something similar in his Attribution post...by using 20 characteristics, your relative confidence of accurate attribution is increased over using, say, 5 characteristics. Another example is malware detection...running 3 AV scanners provides an analyst with a higher level of relative confidence than running just one, just as following a comprehensive process that includes other checks and tools provides an even higher level of relative confidence.
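For those who haven't seen it, here's a rough Python sketch of two of the checks mentioned above, overall length and the Luhn check. The BIN check is left out because it depends on a current list of issuer prefixes, and the 13 to 19 digit range is my assumption for the example, not a statement of what any particular search tool uses.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn check."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def passes_basic_checks(candidate, min_len=13, max_len=19):
    """Length check plus Luhn check (assumed length range; no BIN check)."""
    digits = [ch for ch in candidate if ch.isdigit()]
    return min_len <= len(digits) <= max_len and luhn_valid(candidate)

print(passes_basic_checks("4111111111111111"))  # a widely used test number
print(passes_basic_checks("4111111111111112"))  # last digit changed
print(passes_basic_checks("0000000000000000"))  # all zeros
```

Note that a string of sixteen zeros passes both checks, which is exactly the sort of false positive I'm talking about; each check on its own is weak, and it's the combination that raises confidence.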

Another aspect of timeline analysis that isn't readily apparent is that as we add more sources, we also add context to the data. For example, we have a Prefetch file from an XP or Vista system, so we have the metadata from that Prefetch file. If we add the file system metadata, we have when the file was first created on the system, and the last modification time of the file should be very similar to the timestamp we extract from the Prefetch file metadata. We may also have other artifacts from the file system metadata, such as other files created or modified as a result of the application itself being run. Now, Prefetch files and file system metadata apply to the system, but not to the specific user...so we may get a great deal of context if we find that a user launched the application, as well as when they took this action. We may also get additional context from an Event Log record that shows, perhaps, a login with event ID 528, type 10, indicating a login via RDP. But wait, we know that the person to whom that user account belongs was in the office that day...

See how using multiple data sources builds our "story" and adds context to our data? Further, the more data we have that shows the same or similar artifacts, the greater relative confidence we have in the data itself. This is, of course, in addition to the relative level of confidence that we have in the various individual sources. I'm not a mathy guy, so I'm not really sure how to represent this in a way that's not purely arbitrary, but to me, this is really a compelling reason for creating timelines for analysis.
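Here's one very rough way to sketch the idea in Python: merge events from several sources into one sorted timeline, then ask which distinct sources have an event near a moment of interest. The tuple layout, source tags, sample events, and the ten-minute window are all invented for illustration, and counting sources is just as arbitrary as any other scheme; it says nothing about how much to trust each source individually.

```python
from datetime import datetime, timedelta

def merge_timeline(*sources):
    """Merge lists of (timestamp, source, description) tuples into one
    list sorted by timestamp."""
    return sorted((event for src in sources for event in src),
                  key=lambda event: event[0])

def corroborating_sources(timeline, anchor, window=timedelta(minutes=10)):
    """Return the set of source tags with an event within `window` of `anchor`."""
    return {src for ts, src, _ in timeline if abs(ts - anchor) <= window}

# Hypothetical events from four data sources (all values invented).
prefetch = [(datetime(2010, 1, 28, 14, 3, 12), "PREF", "APP.EXE last run")]
filesystem = [(datetime(2010, 1, 28, 14, 3, 14), "FILE", "Prefetch file for APP.EXE modified")]
eventlog = [(datetime(2010, 1, 28, 13, 58, 40), "EVT", "Security/528 type 10 logon")]
registry = [(datetime(2010, 1, 25, 9, 0, 0), "REG", "unrelated key LastWrite")]

timeline = merge_timeline(prefetch, filesystem, eventlog, registry)
nearby = corroborating_sources(timeline, prefetch[0][0])
print(sorted(nearby))
```

In this made-up case, three of the four sources line up around the Prefetch timestamp, and the RDP logon a few minutes earlier is the piece of context that changes the story.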

What say you?

5 comments:

Anonymous said...

Hi Harlan,

I responded on Forensic Focus, but thought your readers may like to see as well;

A common intelligence practice is to rate your source and data according to an A1-F6 scale. I've applied this quite a few times on digital analysis work.

On the wikipedia page for Intelligence Collection Management (http://en.wikipedia.org/wiki/Intelligence_collection_management)

scroll down to;
Ratings by the Collection Department

(and don't rate everything as 'F6'!)

Software such as i2 Analyst Notebooks allows you to build this into the charting side of things, and shows the connecting links as dotted, dashed, straight, bold, etc, based on the rating given.

I am putting together a paper which goes through how timeline analysis works for intelligence practitioners, and how it can be applied in digital analysis, which includes information rating.

regards,
D

Randall Karstetter said...

Harlan,

Great thoughts as always. Something else to think about. You made the comment "but I have yet to find an API that will allow me to backdate a LastWrite time on a live system." You don't have to find an API. Change the motherboard BIOS date and time, disconnect the ethernet cable so no AV logs or outside server times will be seen, open your Regedit and edit a registry key. Shut down the system, restore the BIOS time, reboot the system and use a Hex editor to fix the event logs and any other logs. Go look at your LastWriteTime for your registry key. I've been experimenting with the effects of changing the BIOS time and how that can/can't be revealed. If someone is real determined and knows what they're doing, they can make it very difficult for a forensic analyst to uncover their actions. That's another reason why your recommendation for gathering as much information from different sources is so important. I.e., a knowledgeable suspect might be able to delete nine of their tracks but the tenth one they missed may trip them up. If the CFE isn't looking for all corroborating evidence, they might miss the telltale trace and get fooled.

So IMHO, your suggestion of gathering as much independent data as possible to increase one's confidence in the data is not only good practice, it may be crucial to uncovering the truth.

I'd like to relate a real case I had that was my most challenging timeline analysis. A woman is mad at her boyfriend and wants to get even. She goes to Hawaii on vacation for a week. Comes home and "discovers" her house was broken into. Nothing is taken but she discovers her computer was turned on while she was gone and pictures of her and her boyfriend were burned onto a CD and then deleted off the computer. She says that the only person who would want to break into her house and delete these pictures is her boyfriend. He's arrested. Police examine her computer, I examine her computer. Event logs, everything made it look like the computer was turned on in the middle of her vacation, a CD was burned and pictures deleted. She had dial-up and the computer didn't connect to the internet on that day. The computer was only on for a few minutes. I thought there was a possibility that just before she went to the airport, she could have turned her computer off, changed the BIOS to five days ahead, turned it on, burned a CD, deleted files, turned it off and changed the BIOS back. She knew his work schedule and could time the computer activity to correspond to when he got off work and would be driving home. Then when she got home, she could have "discovered" the computer activity while she was gone. BTW, she was once a computer programmer and had some training in computer security. Neither the police nor I could prove, one way or another, if she had done this. He was convicted of breaking-and-entering. To this day I think she set him up, but I couldn't prove it with the computer evidence. That's one of the reasons why I have been studying BIOS time changes and why anyone doing timeline analysis needs to consider that as well.

H. Carvey said...

Change the motherboard BIOS date...

To my point, that wouldn't be a live system. What I was getting at is that there doesn't seem to be a timestomp for Registry key LastWrite times.

On the wikipedia page for Intelligence Collection Management...

Very interesting stuff, thanks! I don't see this being added to your usual SOP for analysis, but I do tend to think that this sort of thing should be added at some level of training for forensic analysts, as well as responders (responders would have an impact on initial collection).

If the goal of the forensic analysis is to determine something...and very often it is...or to go beyond that to achieve prosecution, then doing something like timeline analysis in order to increase the relative confidence and context of the data and the events would seem to me to be extremely important, and offer significant advantages.

fifth.sentinel said...

The more I look at Windows systems for either IR (e.g. malware indicators) or internal investigations, the more I am amazed at just how much information is stored either on purpose or as a side effect of normal operations. If you look at the registry there is a wealth of information, and timeline-based analysis can be the key item to break open an analysis.

But what if you could then somehow do a timeline analysis that includes the live HIVES plus incorporates how they have changed by mapping in the System Restore snapshots. This obviously significantly improves the correlation of events (and as a side effect may lead to the ability to filter "normal HIVE changes"). We can then add in other data like filesystem, MFS, Windows .dat files, other browser database...
While correlation improves with the more we add, we start to suffer from:
a) information overload
b) trying to map together all the timeline indicators from each source without missing something.

What I have had rumbling in the back corner of my head for a while is how we can take all this timeline data, and use dynamic visualizations (i.e. click to zoom on timeperiod, or source for more detail) as a tool to quickly focus efforts for detailed analysis of areas of interest.

Standard line/column timeline visualizations would not allow the human ability to see patterns to come to the fore, and so my current thinking is some sort of adaptation of things like:-
Circos
and Axiis looks interesting as shown here:-
Historic Browser Stats

This has gotten long so I will stop here and leave it for a future blog entry once the rumblings have stopped and the lightbulb finally goes off.

@fifth_sentinel

H. Carvey said...

@fifth_sentinel

Interesting comment, and something that definitely gets the creative juices flowing.

But what if you could then somehow do a timeline analysis that includes the live HIVES plus incorporates how they have changed by mapping in the System Restore snapshots.

Brendan/moyix has done a great deal of work associated with extracting Registry information from a memory dump, so this may help with what you're referring to.

However, adding information from the XP System Restore Points is pretty easy with things like RegRipper and ripXP. This is definitely worth adding to a timeline...thanks for providing your comments.