Friday, February 05, 2010

Is anyone doing timeline analysis??

Apparently, someone is...the Illustrious Don Weber, the inspiration behind the ITB, to be specific. In a recent SecRipcord blog post, he talks about finding the details of a Hydraq infection via timeline creation and analysis.

In his post, Don also illustrates some information from a malicious service that includes a ServiceDllUnloadOnStop value. I hadn't seen this value before, and it appears that Geoff Chappell has a very detailed explanation of that value, as well as some others that are also part of the service keys. The presence of this value can add a good deal of context to your findings, particularly since it isn't often seen in legitimate Windows services. Sometimes searching or sorting by service Registry key LastWrite times isn't all that fruitful, as many seem to be updated when the system boots. So add something else to your "what's unusual or suspicious" checks for services...lack of descriptions, apparently random names, and some of these values.
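For illustration, here's a rough sketch of how a check like that could be scripted...this is just an assumption of how you might approach it with the python-registry module against an exported System hive (the hive path and control set below are placeholders), not something pulled from any of the tools mentioned here:

# Sketch: flag services with no Description value, or with a
# ServiceDllUnloadOnStop value under their Parameters subkey.
# Assumes the python-registry module and an exported System hive.
from Registry import Registry

reg = Registry.Registry("System")               # exported System hive (placeholder path)
services = reg.open("ControlSet001\\Services")  # pick the control set that applies

for svc in services.subkeys():
    flags = []
    if "Description" not in [v.name() for v in svc.values()]:
        flags.append("no Description value")
    if "Parameters" in [k.name() for k in svc.subkeys()]:
        params = svc.subkey("Parameters")
        if "ServiceDllUnloadOnStop" in [v.name() for v in params.values()]:
            flags.append("ServiceDllUnloadOnStop is set")
    if flags:
        print("%s (LastWrite: %s): %s" % (svc.name(), svc.timestamp(), ", ".join(flags)))

Apparently random service names still take an analyst's eye, but checks like these are easy to automate.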

Don then goes on to talk about what an APT-style manual compromise "looks like" via timeline analysis. Don includes the contents of a Task Scheduler log file in his timeline, and also shows what would appear to be a remote intruder interacting with the system via a remote shell...running native tools allows the intruder to conduct recon without installing additional tools. After all, if the system already has everything you need on it...nbtstat, net commands, etc., why deposit other tools on the system that will essentially provide the same information?

What Don's posts illustrate are great examples of the immense value of timeline analysis, and how it can be used to provide a greater level of confidence in, as well as context to, your data.

Addendum: I had a conversation via IM with Chris yesterday...with over 2 feet of snow, what else am I going to do, right? We were exchanging ideas about timeline analysis and how things could be represented graphically for analysis purposes, particularly given the nature of the data (LOTS of it) and the nature of malware and intrusions (LFO). I think we came up with some pretty good ideas, and there's still some thinking and looking around to do...the problem is that neither one of us is a graphics programmer, so there's going to be a good deal of trial and error using currently available tools. We'll just have to see how that goes.

I think that the major obstacle to moving forward is going to be the lack of a standard. While I applaud the work that Don's done and admire his initiative and sense of innovation, looking at his posts, it's clear that he's decided to take things in his own direction. Don't get me wrong...there's nothing wrong with that at all. Where it does come into play is that if a particular next-step tool relies on a particular format or structure for its data, then it's going to be difficult to transition other 'branches' to that tool.

Log2timeline is another, more comprehensive framework for developing timelines, and a great piece of work from Kristinn. It's highly automated, uses some of the code from my tools, and provides other output formats in addition to TLN.

So, overall, I'm seeing that there's quite a bit of interest in helping responders, analysts, and examiners move beyond the manual spreadsheet approach to timeline analysis, but perhaps it's time to come together and find some common ground that we can all stand on.

13 comments:

Don C. Weber said...

Hmm, I thought by going with the TLN format I was standardizing output formats within syscombotln. As for log2timeline, I started syscombotln before I knew about it. As log2timeline is being actively developed, there is probably more value in helping him port it to Windows than in having separate efforts. Syscombotln is fairly crude, after all. But for me it has proven to be easy to use, quick, and effective. After a few uses, log2timeline should have the same effect. I'll have to test.

As to visualization, I still like Highlighter to help identify areas of interest and narrow down times of interest. Large files and large search lists are still a challenge, but if handled properly even these issues can be overcome. I really like being able to "remove" lines that contain specific content. That said, a GUI tool that would "fold" or "contract" lines would be nice as well.

So, if TLN is not the standard, what needs to be the standard? XML, CSV, XLS? Who creates and maintains this standard? Wouldn't just picking a well-documented and understood output format make it easier for developers to integrate into their input and output? I understand that having multiple outputs is nice, but I moved all of the outputs involved with syscombotln to a standard TLN format (and cleaned up areas where I initially didn't follow the format) so that people using or moving to other tools, such as log2timeline, could easily import the results.

Go forth and do good things,
Don C. Weber

H. Carvey said...

Don,

I can see that you're using the 5 field TLN format, but as I mentioned, you've also "made it your own" and extended it somewhat. What I mean by this is that instead of using "EVT" as your source, you've decided to add additional information to this field, with "SysEvent.Evt – EVT". In another part of your timeline, you've opted to go with "Registry Hive: system" instead of just "REG".
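Just to be clear about what I mean by the five fields, here's a rough sketch of reading and writing TLN lines...the sample event below is made up, and the pipe-delimited layout (time|source|system|user|description, with the time as a Unix epoch normalized to UTC) is how I've been describing the format:

# Rough sketch of handling the five-field TLN format:
# time|source|system|user|description (time = Unix epoch, UTC).
# The sample event below is made up for illustration.
import time
from collections import namedtuple

TLNEvent = namedtuple("TLNEvent", ["time", "source", "system", "user", "desc"])

def parse_tln(line):
    t, source, system, user, desc = line.rstrip("\n").split("|", 4)
    return TLNEvent(int(t), source, system, user, desc)

def emit_tln(evt):
    return "%d|%s|%s|%s|%s" % evt

evt = TLNEvent(1265241600, "EVT", "REG-OIPK81M2WC8", "-", "sample event description")
print(emit_tln(evt))
print(time.asctime(time.gmtime(parse_tln(emit_tln(evt)).time)))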

I see a great deal of value in using TLN as the standard for a format. Kristinn has included this option, which I think is good. Depending on how people plan to use the output, other options may be of value, as well.

As to visualization, I see value in Highlighter as a tool, just not for this. When doing timeline analysis, I'm not looking for the longest or shortest lines, or clusters of long or short lines...I'm looking for content.

I'm still working on ideas with respect to timeline visualization...

Don C. Weber said...

Ah, I understand now. Yes, I thought the "Source" field needed a little more clarification so that the entry could be followed up on if necessary. This is actually important when you consider that some of the content from the original source might need to be stripped during parsing because of control or double-byte characters. However, if you can point me to the TLN source format specification, I could move the more specific source information into the data section and maintain the Source field more consistently. Would this be your recommendation? Personally, I believe that there are going to be so many different sources that it will be difficult to maintain a specific list. I would think it easier to expect tools to be sufficiently descriptive about the source to benefit analysts who need to track down the specific information in the original source.

As to Highlighter, perhaps you should take another look at the functionality provided. Although the line length feature is one means of anomaly detection, I believe you are correct when it comes to timeline analysis. However, the actual highlighting feature coupled with the "next" and "previous" hot keys allows the analyst to quickly hop through specific search items such as "nbtstat" or other system artifacts to identify areas/times of interest. Additionally, since timelines can get very large very fast, the fact that it does not load the whole file means that it is a little easier on system memory. I have used it successfully on 500MB timeline files (yes, text files) to quickly pinpoint target areas and export highlighted lines of interest for more detailed analysis and reporting. I am hoping to get the developer to write an article for Into The Boxes, which might clarify some of this functionality. We'll see how that works out.

Thank you for the clarification,
Don

H. Carvey said...

...some of the content from the original source might need to be stripped...

I'm not sure I follow. Whether you're using analyzeMFT to parse the $FILE_NAME timestamps from the MFT, or using FTK Imager to export a file listing, it's all file system metadata. Same with EVT data...the data comes from the Event Logs, so I'm not entirely sure why you need to specify which file the entry came from.

...if you can point me to the TLN source format specification...

It's in the blog, my friend. You seem to have found it and followed it already...I'd recommend referring back to it again if you'd like to review it.

Anonymous said...

Interesting stuff, Harlan. I started looking at MS Project for visualization of timelines. Very flexible on input formats and fields. Not everyone's cup of tea, but quick and dirty! Rgds, James

Kristinn said...

The lack of a really good output method that can be imported easily into multiple tools or is defined as a timeline standard is exactly the reason why I chose to provide several different output mechanisms in my tool. I agree, standards are good and we should try to pick one and stick with it, but which one...and again, as Don said, who defines and maintains it?

I still think that providing multiple outputs can be of value, even once we agree upon a default standard, since there will always be new tools that can be used to analyse timelines that do not support our chosen standard.
And I also believe that log2timeline should be ported so that it can be used with Windows; I don't think it would require much work, most of it should work out of the box. The reason why it isn't supported as of now is the fact that I develop the tool on Mac OS X and I use Linux as my analysis and testing station, so I would need to set up a Windows box with Perl to properly code and test it (something I do not have time to do right now). So any help with the Windows porting would be appreciated. Another thing which is missing is a proper GUI, and since I'm not a GUI developer any help in that department would also go a long way.

But besides that, I do believe moving to a database format could potentially make things easier, for instance to a SQLite database (one proposed schema is provided with log2timeline). That would make any manipulation easier, such as: reading multiple records, hiding events that are not of interest, indexing, sorting, flagging, highlighting, etc. The only problem here is that it would require creating a tool that could actually read that particular database and work with the data...something that I plan to create one day.
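Just to illustrate the idea (this is not the schema that ships with log2timeline, only a rough assumption for discussion), a timeline table in SQLite could be as simple as:

# Rough sketch only: an illustrative SQLite schema for timeline events,
# not the schema provided with log2timeline.
import sqlite3

conn = sqlite3.connect("timeline.db")
conn.execute("""CREATE TABLE IF NOT EXISTS events (
    id     INTEGER PRIMARY KEY,
    time   INTEGER NOT NULL,    -- Unix epoch, UTC
    source TEXT NOT NULL,       -- EVT, REG, FILE, ...
    host   TEXT,
    user   TEXT,
    descr  TEXT,
    hidden INTEGER DEFAULT 0    -- viewer flag: hide from display, keep the data
)""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_time ON events (time)")
conn.execute("INSERT INTO events (time, source, host, user, descr) VALUES (?, ?, ?, ?, ?)",
             (1265241600, "EVT", "somehost", "-", "sample event"))
conn.commit()

# Sorting, filtering and "hiding" events then become simple queries:
for row in conn.execute("SELECT time, source, descr FROM events WHERE hidden = 0 ORDER BY time"):
    print(row)

That would keep the original data intact while letting a front-end flag, hide, or highlight records.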

H. Carvey said...

Kristinn,

Thanks for the comment.

Do you see any issues with the 5-field TLN output as a standard? Very little work seemed to have been done in that area for a while, which is why I came up with what I saw as a minimum data set to describe an event.

As to defining it, we can decide to do so. RE: maintaining it...well, I came up with it; how difficult can it be to maintain it?

I think that the key would be to get it accepted and used.

RE: GUI...with respect to developing the timeline, I'm considering using something similar to RegRipper, providing a base GUI with plugins to parse various files and formats.

Kristinn said...

Personally, I don't see anything wrong with TLN as an adopted standard. Yet for some file formats you do have additional information, such as multiple timestamps per file/artifact, e.g. atime/ctime/... or created time, modified, etc. The question is whether or not this should be a special field, that is, a type field or something like that. Otherwise this information has to be included in the description field, which can get quite long. So I personally think it would be a better idea to make a list of valid types and have a special TYPE field within it.

And then there's the question of the source field: should that be a standardised field with fixed entries (REG, EVT, ...) or just a flexible field which is left to the developer to fill in (EventLog instead of EVT, for instance)?

I do feel we need to have at least one ASCII standard and then to create a database schema as well, so that other options can be included, such as to group events, highlight, hide, etc.

So for maintenance of this standard, as long as we all agree upon the fields, I do not see much of a problem with maintaining it...it's your standard after all...you could publish on your site a simple text document or something like that which describes the fields and their values, or just keep referring to a blog post that describes them (just like your latest post).

H. Carvey said...

Kristinn,

Thanks for the comment.

...such as multiple timestamps per file/artifact, e.g. atime/ctime/...

Understood. In the case of file system metadata, here's an example I pulled from a timeline generated from an example image:

FILE REG-OIPK81M2WC8 - MA.E C:/WINDOWS/bootstat.dat

In this case, the source is "FILE", the host is "REG-OIPK81M2WC8", there is no user (ie, "-") and the event description addresses the multiple timestamps.

Something like this could also be applied to Windows shortcut/LNK file data.

What would the TYPE field include?

...should that be a standardised field with fixed entries...

I can see fixed fields...EVT/EVTX, REG, LNK, etc. Firewall logs - FWLog.

The reason for this is twofold. I currently do a lot of work with mini-timelines, using only EVT data. Sometimes, that may be all I have (at least, at the moment), and other times, I create these mini-timelines to get a frame of reference. As such, if I were to put all data into a database, I'd want to have a standard field to search/sort on. Also, some of what Chris and I talked about regarding visualization had to do with overlays, and being able to create an EVT file overlay would be very beneficial.

Also, having everyone understand and work from a standard definition means that we all have a common understanding. If the field says "EVT", I already know that the data comes from Event Logs with the ".evt" extension, which tells me that the data came from a Windows 2000, XP or 2003 system, providing me with a great deal of context. Allowing deviations from this can alter the context...for example, what if someone decides to use "SysEvent"...does that come from Vista or XP?

I do feel we need to have at least one ASCII standard and then to create a database schema as well, so that other options can be included, such as to group events, highlight, hide, etc.

I agree with the ASCII standard, but can't that be mapped directly to a database?

IMHO, highlight and hide would necessarily be functions of a viewing tool, not of the format. Telling a tool like Highlighter to hide all Prefetch file metadata entries doesn't remove that data from the timeline, it simply hides/masks it in the viewer.

Kristinn said...

Harlan,

FILE REG-OIPK81M2WC8 - MA.E C:/WINDOWS/bootstat.dat

In this case, the source is "FILE", the host is "REG-OIPK81M2WC8", there is no user (ie, "-") and the event description addresses the multiple timestamps.


Well, I see some issues with this, namely that it can be confusing to include it in the description field, since it will not be part of the standard. This means that one developer chooses to use MA.E while another uses mtime or crtime, and yet another uses Modified (or mod)...you get the picture. And another thing: you might want to filter based on a type, just to see a MOD action or something like that. And this increases the length of the description field, which would otherwise have been simpler to read...

I think that adding this could be a good thing, but again... this is your standard and your call...

What would the TYPE field include?

Something fixed and short, for instance

MOD (modified)
CRE (created)
DEL (deleted)
...

just to construct a small list of allowed entries here, that are predefined and easy to create filters from.

I can see fixed fields...EVT/EVTX, REG, LNK, etc. Firewall logs - FWLog.

OK, I agree with you here...now there is just a need to actually construct a list of accepted values and stick to them. So the standard has to include the allowed values, and then the list can be extended as the need arises (e.g., as new formats or artifacts are parsed).

H. Carvey said...

...that is the fact that it can be confusing to include it in the description field, since it will not be part of the standard.

I'm not sure I follow...if it's in the Description field, why wouldn't it be part of the standard?

Also, this is similar to the output produced by Rob Lee's mactime. I've found it to be extremely concise and descriptive, particularly since I'm looking at the events based on when they occurred, rather than what occurred.

I'm not sure that I see the need for the TYPE field as defined, as that can be covered in the Description field. From file system metadata, one event with a time that includes "MAC." tells me everything in one entry, rather than three.

Kristinn said...

Harlan,

What I mean...perhaps I wasn't clear enough...is that mactime does include a type field (called activity type) that is defined as a MACB field, something that TLN does not do. That is, in the mactime standard there is a field clearly denoting the type of action, which is strictly set to MACB. This is fine for filesystem timestamps, but might not be sufficient when extending the timeline to include artifacts found inside the operating system.

Currently I include this information in the description field in log2timeline, and include it as well in the mactime output using the appropriate MACB fields in the activity field.

The TLN format as described contains one timestamp per record (unlike the mactime bodyfile, for instance). This means that if you include filesystem timestamps, you would sometimes need to create multiple records per file, each one with a different description field (basically including MACB or the activity type field inside the description field). Not necessarily a bad thing (the mactime output format uses only one timestamp per entry as well; it's just the body file that includes all four).

Creating such a field would still mean that if I include timestamps from the filesystem, I would have to have three or four entries per file (or find some method of reducing that, like MACB...perhaps use different letters?). So one method would be to have the type field as a short text description or, as in the mactime format, letters, each one representing which types apply.
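To show what I mean by reducing it (just a rough sketch with made-up timestamps, not anything implemented in log2timeline today), the four timestamps could be collapsed into one record per distinct time, with letters marking which types apply:

# Rough sketch: collapse a file's M/A/C/B timestamps into one record
# per distinct time, with a MACB-style code in the description.
# The timestamps and host below are made up.
stamps = {"M": 1265241600, "A": 1265241600, "C": 1265200000, "B": 1265241600}

by_time = {}
for letter in "MACB":
    by_time.setdefault(stamps[letter], []).append(letter)

for t, letters in sorted(by_time.items()):
    code = "".join(l if l in letters else "." for l in "MACB")
    print("%d|FILE|somehost|-|%s C:/some/file" % (t, code))

# -> 1265200000|FILE|somehost|-|..C. C:/some/file
#    1265241600|FILE|somehost|-|MA.B C:/some/file

That keeps the number of records per file down without losing which timestamps each record represents.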

I'm not sure I follow...if it's in the Description field, why wouldn't it be part of the standard?

What I mean "not in the standard" is the fact that since it is not defined as a separate field it is left to the developer to include it in the "leftovers" or in the description field. The description field is flexible, it just contains all the extracted information to describe the event itself, so that one developer could just skip this information altogether, or say M.CB or M.CE or just modified, etc.... defining a type field would standardize the type instead of allowing it to be included wherever the developer wants it to be and in whatever format (or not at all).

Another thing with a type field: using MACB is very good for describing filesystem timestamps, but what about a user printing a document (information extracted from a Word document)? The only method to display it would then be to say that the particular time is MACB, or all of the timestamps...is that really descriptive enough, or would it be better to use a type of PRINT or something like that? That is, to include a type that is more descriptive than MACB...and then of course we have the fact that MACB means different things on different filesystems, requiring more knowledge of the examiner; he/she has to know the true meaning behind an M record on that particular filesystem...would DEL be more appropriate? Or MOD, to indicate that this is in fact a modification to the file itself, not the metadata information associated with it? Just some thoughts, you might not agree with me, but I wanted to include this in the discussion...(you would obviously include in the description field text that clearly indicates that the user printed a document, which might be descriptive enough, but just to give an example)

And again, if we are adding a type called PRINT, what about all the other possibilities? Perhaps this is not a good idea, as this field could potentially include so many different actions that it would render itself useless...I'm not sure, just throwing this up in the air...

Kristinn said...

And finally, would there be value in adding a field that includes a file's MD5 sum or any other hash? It would slow the tool down considerably if it had to calculate an MD5 sum for each file, but it could be very useful to include it nonetheless (or at least to provide the option). Again, the mactime bodyfile includes a field called MD5, although it is not used by the mactime output format...