Monday, February 08, 2010

Timeline Analysis...do we need a standard?

Perhaps more appropriately, does anyone want a standard, specifically when it comes to the output format?

Almost a year ago, I came up with a 5-field output format for timeline data. I was looking for something to 'define' events, given the number of data sources on a system. I also needed to include the possibility of using data sources from other systems, outside of the system being examined, such as firewalls, IDS, routers, etc.

Events within a timeline can be concisely described using the following five fields:

Time - A timeline is based on times, so I put this field first, as the timeline is sorted on this field. Now, Windows systems have a number of time formats...the 32-bit Unix time_t format, 64-bit FILETIME objects, and the 128-bit SYSTEMTIME structure. The FILETIME object has granularity to 100 nanoseconds, and the SYSTEMTIME structure has granularity to the millisecond...but is either really necessary? I opted to settle on the Unix time_t format, as the other times could be easily reduced to that format without losing significant granularity.
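As an illustration of that reduction, here is a minimal sketch in Python (not one of the tools discussed in this post, just the arithmetic) that reduces a 64-bit FILETIME value to the Unix time_t format:

# FILETIME counts 100-nanosecond intervals since January 1, 1601 (UTC);
# Unix time_t counts seconds since January 1, 1970 (UTC).
EPOCH_DELTA = 11644473600  # seconds between the 1601 and 1970 epochs

def filetime_to_unix(filetime):
    # Integer division drops the sub-second granularity; that is the
    # trade-off being accepted here.
    return (filetime // 10000000) - EPOCH_DELTA

A SYSTEMTIME value can be reduced the same way, by converting its individual date/time members to seconds since the Unix epoch.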

Source - This is the source from which the timeline data originates. For example, using TSK's fls.exe allows the analyst to compile file system metadata. If the analyst parses the MFT using MFTRipper or analyzeMFT, she still has file system metadata. The source remains the same, even though the method of obtaining the data may vary...and as such, should be documented in the analyst's case notes.

Sources can include Event Logs (EVT or EVTX), the Registry (REG), etc. I had thought about restricting this to 8-12 characters...again, the source of the data is independent of the extraction method.

Host - This is the host or name of the system from which the data originated. I included this field, as I considered being able to compile a single timeline using data from multiple systems, and even including network devices, such as firewalls, IDS, etc. This can be extremely helpful in pulling together a timeline for something like SQL injection, including logs from the web server, artifacts from the database server, and data from other systems that had been connected to.

Now, when including other systems, differences in clocks (offsets, skew, etc.) need to be taken into account and dealt with prior to entering the data into the timeline; again, this should be thoroughly documented in the analyst's case notes.

Host ID information can come in a variety of forms...MAC address, IP address, system/NetBIOS name, DNS name, etc. In a timeline, it's possible to create a legend with a key value or identifier, and have the timeline generation tools automatically translate all of the various identifiers to the key value.

This field can be set to a suitable length (25 characters?) to contain the longest identifier.
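As a sketch of what that legend idea might look like in practice (the identifiers and key value below are hypothetical, purely to show the translation step):

# Legend mapping each known identifier for a system to a single key value.
HOST_LEGEND = {
    "192.168.1.15": "WEBSRV01",          # IP address
    "00:0c:29:3a:7f:11": "WEBSRV01",     # MAC address
    "websrv01.example.com": "WEBSRV01",  # DNS name
}

def normalize_host(identifier):
    # Translate any known identifier to the key value; pass
    # unrecognized identifiers through unchanged.
    return HOST_LEGEND.get(identifier.lower(), identifier)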

User - This is the user associated with the event. In many cases, this may be empty; for example, consider file system or Prefetch file metadata - neither is associated with a specific user. However, for Registry data extracted from the NTUSER.DAT or USRCLASS.DAT hives, the analyst will need to ensure that the user is identified, whereas this field is auto-populated by my tools that parse the Event Logs (.evt files).

Much like the Host field, users can be identified by a variety of means...SID, username, domain\username, email address, chat ID, etc. This field can also have a legend, allowing the analyst to convert all of the various values to a single key identifier.

Usually, a SID will be the longest means of referring to a user, and as such would likely define the maximum length for this field.

Description - This is something of a free-form, variable-length field, including enough information to concisely describe the event in question. For Event Log records, I tend to include the event source and ID (so that it can be easily researched on EventID.net), as well as the event message strings.
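Putting the five fields together, a single event might be emitted as one delimited line; the pipe delimiter and sample values below are assumptions for illustration, as nothing above mandates a particular delimiter or serialization:

def tln_event(time, source, host, user, description):
    # One event per line; the timeline is later sorted on the Time field.
    return "|".join([str(time), source, host, user, description])

# A hypothetical Event Log record, time already reduced to Unix time_t:
print(tln_event(1265629200, "EVT", "WEBSRV01",
                "S-1-5-21-1234567890-1234567890-1234567890-1004",
                "Security/528;Successful Logon"))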

Now, for expansion, there may need to be a number of additional, perhaps optional fields. One is a means for grouping individual events into a super-event or a duration event, such as in the case of a search or AV scan. How this identifier is structured still needs to be defined; it can consist of an identifier in an additional column, or it may consist of some other structure.

Another possible optional field can be a notes field of some kind. For example, event records from EVT or EVTX files can be confusing; adding additional information from EventID.net or other credible sources may add context to the timeline, particularly if multiple examiners will be reviewing the final timeline data.

This format allows for flexibility in storing and processing timeline data. For example, I currently use flat ASCII text files for my timelines, as do others. Don has mentioned using Highlighter as a means for analyzing an ASCII text timeline. However, this does not preclude using a database rather than flat text files; in fact, as the amount of data grows and as visualization methods are developed for timelines, using a database may become the standard for storing timeline data.
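For the flat-file case, the processing stays simple; here's a sketch (assuming the pipe-delimited layout from the example above) that sorts a timeline file on the Time field:

def sort_timeline(path):
    # Read delimited events, sort numerically on the Time field, and
    # print them in chronological order. The maxsplit of 4 keeps any
    # pipes inside the Description field intact.
    with open(path) as f:
        events = [line.rstrip("\n").split("|", 4) for line in f if line.strip()]
    events.sort(key=lambda fields: int(fields[0]))
    for fields in events:
        print("|".join(fields))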

It is my hope that keeping the structure of the timeline data simple and well-defined will assist in expanding the use of timeline creation and analysis. The structure defined in this post is independent of the raw data itself, as well as of the means by which the data is extracted. Further, the structure is independent of the storage means, be it a flat ASCII text file, a spreadsheet, or a database. I hope that those of us performing timeline analysis can settle/agree upon a common structure for the data; from there, we can move on to visualization methods.

What are your thoughts?

14 comments:

Don C. Weber said...

There seem to be two ways to do this officially. One is via IEEE and requires a membership, which appears to be ~$200US. (This is better than the ~$3000US I thought it was going to be. Unless I am missing something.) The IEEE Standards Initiative, which is how standards are proposed and reviewed, can be found here.

The other option is the Open Source Initiative, which has a standards development section here. I did not see any dues requirements.

There may be others but I stopped at two.

Not sure as to the benefit of one over the other. I imagine that an IEEE Standard will be harder to modify once implemented, but the Open Source Standard would be less likely to be implemented in commercial software.

As to a need, well, there is obviously a need. The real question is whether anybody would really care enough to implement the standard, since the majority of the commercial tools that provide data analysis capabilities have their own directions. I would imagine that to get something wide-reaching we would need to have at least one, if not several, of these companies on board and actively participating in development as well as implementation. That, in my opinion, would require that the standard provide them some type of cost benefit, as I don't see them doing it out of the kindness of their hearts. Non-commercial tool development, I think, would utilize the standard initially, as it would help developers by already providing an initial framework to begin development. However, once projects start taking off and these developers see other needs or requirements, I see them deviating from the standard rather than trying to help it grow and advance.

So, I guess the real question to you is: is it worth your time and effort to champion an effort such as a Timeline Analysis Format Standard? Only you can answer that question. I say if you get at least four people talking to you about it and saying they will devote some time then it might be worth it. I would also say, as I mentioned before, you will need an organization to help champion it as well. A commercial data analysis software company would be nice. Endorsements from some of the security organizations would help as well. Would the standard benefit the community enough to justify the extra effort?

Go forth and do good things,
Don C. Weber

H. Carvey said...

I honestly don't see where any of this is required. Seriously, it's all really a matter of adoption.

For example, what directs the commercial tools? Given my experience with two of them, it seems that the big push comes from customers; if enough customers, and big enough customers, want and push for something, the commercial tools will follow suit.

But is that really necessary? You've indicated in your blog that there are EnScripts that you use to export data; I've written ProDiscover ProScripts. So a scripting capability provides the means; it's up to the users to develop the specific capabilities and functionality.

I think that all it really needs is agreement between some key players to get some momentum.

Don C. Weber said...

Well, by using IEEE or Open Source it means that the standard is public and not just maintained across several blog posts. Also, it means that the "key players" are known officially rather than across several blog posts and agreements in comments, emails, or the halls of conferences.

I understand what you mean by scripting, but eventually the momentum of timeline analysis will pique the interest of the commercial tool developers. Then, lacking a well-defined, maintained, and public standard, they will implement their own (although the scripting feature should still be in place and usable). This means divergence, which I believe is what you are attempting to avoid by broaching this subject.

Don C. Weber

H. Carvey said...

Keeping the discussion at this point in blogs and on lists allows for the kinks to be worked out. You and Kristinn have commented on the blog posts; right now, we're trying to reach consensus. Why would anyone want to spend the effort and resources to submit a standard that no one agreed to?

As to key players...there are RFCs that include key players, definitions that are thought of as standards, etc.

We need to figure out if we can reach agreement first.

I don't see commercial tools picking this up for a while. Also, when/if they do, I would hope that there are enough examples of standards divergence leading to serious issues that this would be considered and taken into account.

But again, for right now, none of this matters until the "key players" figure out if we can come up with a common structure or not.

Kristinn said...

I have to agree with Harlan here; I don't think this standard needs to be adopted by IEEE, at least not at this point in time. I think that would really be overkill.

I think Harlan is right that, at least for the moment, it would be sufficient for some of the key players to settle on a standard and then move on from there. If the people that are currently developing tools for timeline analysis settle on a standard and continue to develop tools using it, I think that would push others to adopt it as well.

And in the future, if we see that this is getting out of hand, then perhaps it would be wise to reconsider and go through the procedure to make the standard official. This is a process that can take quite some time and effort, so it is not something to do without a clear need to actually go forward with it.

For now, I think we can settle on a single standard, whether that be a temporary one or a permanent one. The standard has to be flexible enough that it can be changed if in the future we see a need to do so (something that can be quite difficult to do if it is an IEEE standard).

And speaking of flexible...is there a need to maintain a version number of the standard within it? That is, to have a version field denoting the TLN version number, or is that something that is obvious (5 fields is version 0.1, 6 fields is version 0.2, or something like that)?

H. Carvey said...

Kristinn,

Maybe a good idea would be to figure out what the fields should be first. I think having some sort of versioning in place is a good idea, but IMHO, it would be better to figure the format out, identifying required and optional fields first, and get as far along to the first version as we can before submitting something. That way, we avoid a lot of changes and versions in a short period of time (no pun intended).

Kristinn said...

it would be better to figure the format out, identifying required and optional fields first, and get as far along to the first version as we can before submitting something. That way, we avoid a lot of changes and versions in a short period of time (no pun intended).

Couldn't agree more...

Don C. Weber said...

Okay, I forgot about RFCs.

I think that the development of this needs to be maintained as a separate page on one of our sites instead of across blog posts. People will need a single point of reference rather than trying to guess at the latest version. I offer a page on my site, but I am flexible. I say we take this to email and invite/include any of the "key players" you guys propose. I will start this later today if one of you doesn't beat me to the punch.

Don C. Weber

H. Carvey said...

Here's what I suggest...send me an email (keydet89 at yahoo dot com) from the email address that you'd like to use for this, and include suggestions for who you think should be included...please contact them first and see if they even want to be involved.

I will compile the information and put together a text file that outlines what we have so far, and send it around.

Kristinn said...

ok, this sounds like a good idea ;) this is better than tossing it around in comments ;)

Geoff said...

Gentlemen, good luck in your endeavor. Getting Big Player #1 to fall in line with the scripts they build internally will be difficult, but it's a worthwhile pursuit. Hopefully the exposure will educate the community that there's more to look at than SIA timestamps.

H. Carvey said...

...Getting Big Player #1 to fall in line...

Not sure who you're referring to, but I doubt that any commercial vendor is going to "fall in line", particularly with something as little used as timeline analysis.

Hopefully the exposure will educate the community that there's more to look at than SIA timestamps.

I hope to be able to get that concept out there, too...particularly that there are timestamps less mutable and perhaps more reliable than SIA timestamps.

In addition, I'd like to get out there the idea that the use of multiple data sources can increase confidence beyond what can be provided by individual sources (thanks to Cory for pointing the way to that one...).

Anonymous said...

Hey Harlan and everyone else, this sounds like a great idea. From the LE perspective, timelines are often a really important aspect of evidence presentation. I would disagree with "something as little used as timeline analysis"; the reason it is little used is that there are no good tools to do it. (If you know of any _good_ tools, please let me know.)

However, this is a chicken-and-egg problem. Do you have a standard first and then develop the tools, or develop the tool first and create a demand for the standard? Having been involved in developing ISO standards, I cannot think of a better way of killing off a standards effort than getting involved with a standards body. Think about tcpdump: in that case, a great tool was created, and the format is now the de facto standard for packet captures. The same has happened with the EWF format (potential religious argument starting here); however, like it or not, the EWF format is the de facto standard because the tools support it. Andy Rosen has developed his sparse file format (which I think is a great idea), and the electronic evidence bag also has potential, but until they are supported by the tools, no one will use them.
So if you want this to work, you need to come up with a tool that you can plug a bunch of data into and generate lots of pretty pictures your average jury can understand.
Hmm, this sounds like a great product...maybe I should start coding...

H. Carvey said...

Anonymous,

...there are not good tools to do it.

I posted a set of tools and a document to the Win4n6 group, and there's log2timeline, as well.

So if you want this to work you need to come up with a tool.

I've already been using the stuff I wrote, quite effectively. While it's still a pretty manual process at the moment, the benefit is well worth the cost.