Tuesday, January 01, 2013


I've recently been working on a script to parse the NTFS $UsnJrnl:$J file, also known as the USN Change Journal.  Rather than blogging about the technical aspects of what this file is, or why a forensic analyst would want to parse it, I thought that this would be a great opportunity to instead talk about programming and parsing binary structures.

There are several things I like about being able to program, as an aspect of my DFIR work:
- It very often allows me to achieve something that I cannot achieve through the use of commercially available tools.  Sometimes it allows me to "get there" faster, other times, it's the only way to "get there".
- I tend to break my work down into distinct, compartmentalized tasks, which lends itself well to programming (and vice versa).
- It gives me a challenge.  I can focus my effort and concentration on solving a problem, one that I will likely see again, and for which I will already have an automated solution the next time I encounter it.
- It allows me to see the data in its raw form, not filtered through an application written by a developer.  This allows me to see data within the various structures (based on structure definitions from MS and others), and possibly find new ways to use that data.

One of the benefits of programming is that I have all of this code available, not just as complete applications, but also as snippets I've written to help me perform analysis: code for translating time values (FILETIME objects, DOSDate time stamps, etc.), as well as a printData() function that takes binary data of an arbitrary length and translates it into a hex editor-style view, which makes it easy to print out sections of data and work with them directly.  Being able to reuse this code (even if "code reuse" is simply a matter of copy-and-paste) means that I can achieve a pretty extensive depth of analysis in fairly short order, reducing the time it takes for me to collect, parse, and analyze data at a more comprehensive level than before.  If I'm parsing some data and use the printData() function to display the binary data in hex at the console, I may very well recognize a 64-bit time stamp at a regular offset, and then be able to add that to my parsing routine.  That's kind of how I went about writing the shellbags.pl plugin for RegRipper.
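To give a sense of what these reusable helpers look like, here's a minimal sketch in Python (my own versions are in Perl, and the names here are hypothetical stand-ins): a FILETIME translator and a printData()-style hex viewer.  A FILETIME is a 64-bit count of 100-nanosecond intervals since January 1, 1601 UTC.

```python
from datetime import datetime, timezone

EPOCH_DELTA = 11644473600  # seconds between 1601-01-01 and the Unix epoch

def filetime_to_datetime(ft):
    """Convert a 64-bit FILETIME (100-ns intervals since 1601-01-01 UTC) to a datetime."""
    return datetime.fromtimestamp(ft / 10_000_000 - EPOCH_DELTA, tz=timezone.utc)

def print_data(data, base=0):
    """Hex editor-style dump: offset, hex bytes, and printable ASCII, 16 bytes per row."""
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        print(f"{base + i:08x}  {hex_part:<47}  {ascii_part}")
```

Dumping a buffer this way at the console is exactly how a regularly-spaced run of eight bytes starts to look like a time stamp.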

I've also recently been looking at IE index.dat files in a hex editor, and writing my own parser based on the MSIE Cache File Format specification put together by Joachim Metz.  So far, my initial parser works very well against the index.dat file in the TIF folder, as well as the one associated with the cookies.  But what's really fascinating about this is what I'm seeing...each record has two FILETIME objects and up to three DOSDate (aka, FATTime) time stamps, in addition to other metadata.  For any given entry, all of these fields may not be populated, but the fact is that I can view them...and verify them with a hex editor, if necessary.
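For reference, a DOSDate (FATTime) stamp packs the date and time into two 16-bit words, with the year stored as an offset from 1980 and the seconds stored at two-second granularity.  A sketch of the decoding (field layout per the on-disk FAT format; the function name is my own):

```python
def dosdate_to_tuple(date_word, time_word):
    """Decode a DOSDate/FATTime pair of 16-bit words into (Y, M, D, h, m, s)."""
    year = ((date_word >> 9) & 0x7F) + 1980   # bits 9-15: years since 1980
    month = (date_word >> 5) & 0x0F           # bits 5-8
    day = date_word & 0x1F                    # bits 0-4
    hour = (time_word >> 11) & 0x1F           # bits 11-15
    minute = (time_word >> 5) & 0x3F          # bits 5-10
    second = (time_word & 0x1F) * 2           # bits 0-4, 2-second resolution
    return (year, month, day, hour, minute, second)
```

Note the two-second resolution; it's one of the things that makes these stamps easy to recognize (and easy to misread) next to FILETIME values in the same record.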

As a side note regarding that code, I've found it very useful so far.  I can run the code at the command line, and pipe the output through one or more "find" commands in order to locate or view specific entries.  For example, the following command line gets the "Location : " fields for me, and then looks for specific entries; in this case, "apple":

C:\tools>parseie.pl index.dat | find "Location :" | find /i "apple"

Using the above command line, I'm able to narrow the user's access down to specific things, such as purchases made via the Apple Store, etc.

I've also been working on a $UsnJrnl (actually, the $UsnJrnl:$J ADS file) parser, which itself has been fascinating.  This work was partially based on something I've felt that I've needed to do for a while now, and talking to Corey Harrell about some of his recent findings has renewed my interest in this effort, particularly as it applies to malware detection.
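To give a sense of what a $UsnJrnl:$J parser has to do, here's a minimal Python sketch of pulling apart a single USN_RECORD_V2 structure (the field layout is per Microsoft's documentation; the function and key names are my own, and a real parser also has to walk the file, skip sparse runs, and honor the record length):

```python
import struct

def parse_usn_record_v2(buf, offset=0):
    """Parse one USN_RECORD_V2 structure from buf at offset."""
    (rec_len, major, minor, frn, parent_frn, usn, timestamp,
     reason, source_info, sec_id, attrs,
     name_len, name_off) = struct.unpack_from("<IHHQQQQIIIIHH", buf, offset)
    # FileName is UTF-16LE, located via FileNameOffset/FileNameLength
    name = buf[offset + name_off:offset + name_off + name_len].decode("utf-16-le")
    return {"length": rec_len, "usn": usn, "timestamp": timestamp,
            "reason": reason, "attributes": attrs, "name": name}
```

The timestamp field is a FILETIME, and the reason field is a bitmask of change flags (USN_REASON_FILE_CREATE, USN_REASON_DATA_EXTEND, etc.), which is exactly what makes this data so useful for timelining file activity.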

Understanding binary structures can be very helpful.  For example, consider the target.lnk file illustrated in this write-up of the Gauss malware.   If you parse the information manually, using the MS specification...which should not be hard because there are only 0xC3 bytes visible...you'll see that the FILETIME time stamps for the target file are nonsense (Cheeky4n6Monkey got that, as well).  As you parse the shell item ID list, based on the MS specification, you'll see that the first item is a System folder that points to "My Computer", and the second item is a Device entry whose GUID is "{21ec2020-3aea-1069-a2dd-08002b30309d}".  When I looked this GUID up online, I found some interesting references to protecting or locking folders, such as this one at LIUtilities, and this one at GovernmentSecurity.org.  I found this list of shell folder IDs, which might also be useful.
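As an aside, a GUID like the one above is stored on disk in Windows' mixed-endian layout (the first three fields little-endian, the rest byte-for-byte), so reading it naively from a hex dump gives the wrong string.  Python's standard library handles the swapping; a sketch, using the GUID from the write-up:

```python
import uuid

def guid_to_string(raw):
    """Render a 16-byte on-disk GUID (little-endian Data1-Data3) in registry form."""
    return "{" + str(uuid.UUID(bytes_le=bytes(raw))) + "}"

# On-disk bytes for the Device entry's GUID
raw = bytes.fromhex("2020ec21ea3a6910a2dd08002b30309d")
print(guid_to_string(raw))  # {21ec2020-3aea-1069-a2dd-08002b30309d}
```

Getting the byte order right matters, because a GUID transcribed wrong will never match anything when you look it up.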

The final shell item, located at offset 0x84, is type 0x06, which isn't something that I've seen before.  But there's nothing in the write-up that explains in detail how this LNK file might be used by the malware for persistence or propagation, so this was just an interesting exercise for me, as well as for Cheeky4n6Monkey, who also worked on parsing the target.lnk file manually.  So, why even bother?  Well, like I said, it's extremely beneficial to understand the format of various binary structures, but there's another reason.  Have you read these posts over on the CyanLab blog? No?  You should.  I've seen shortcut/LNK files with no LinkInfo block, only the shell item ID list, that point to devices; as such, being able to parse and understand these...or even just recognize them...can be very beneficial if you're at all interested in determining USB storage devices that had been connected to a system.  So far, most of these devices that I have seen have been digital cameras and smart phone handsets.

Okay, right about now, you're probably thinking, "so what?"  Who cares, right?  Well, this should be a very interesting, if not outright important issue for DFIR analysts...many of whom want to see everything when it comes to analysis.  So the question then becomes...are you seeing everything?  When you run your tool of choice, is it getting everything?

Folks like Chris Pogue talk a lot about analysis techniques like "sniper forensics", which is an extremely valuable means for performing data collection and analysis.  However, let's take another look at the above question, from the perspective of sniper forensics...do you have the data you need?  If you don't know what's there, how do you know?

If you don't know that Windows shortcut files include a shell item ID list, and what that data means, then how can you evaluate the use of a tool that parses LNK files?  I'm using shell item ID lists as an example, simply because they're so very pervasive on Windows 7 systems...they're in shortcut files, Jump Lists, Registry value data.  They're in a LOT of Registry value data.  But the concept applies to other aspects of analysis, such as browser analysis.  When you're performing browser analysis in order to determine user activity, are you just checking the history and cookies, or are you including Registry settings ("TypedURLs" key values for IE 5-9, and "TypedURLsTimes" key values on Windows 8), bookmarks, and session restore files?  When performing USB device analysis on Windows systems, are you looking for all devices, or are you using checklists that only cover thumb drives and external hard drives?

I know that my previous paragraph covers a couple of different levels of granularity, but the point remains the same...are you getting everything that you need or want to perform your analysis?  Does the tool you're using get all system and/or user activity, or does it get some of it?

Can we ever know it all?
One of the aspects of the DFIR community is that, for the most part, we seem to work in isolation.  We work our cases and exams, and don't really bother too much with asking someone else, someone we know and trust, "hey, did I look at everything I could have here?" or "did I look at everything I needed to in order to address my analysis goals in a comprehensive manner?"  For a variety of reasons, we don't tend to seek out peer review, even after cases are over and done.

But you know something...we can't know it all.  No one of us is as smart or experienced as several or all of us working together.  This can be close collaboration, face-to-face, or online collaboration through blogs, or sites such as the ForensicsWiki, which makes a great repository, if it's used.

Finally, a word about choices in programming languages to use.  Some folks have a preference.  I've been using Perl for a long time, since 1999.  I learned BASIC in the '80s, as well as some Pascal, and then in the mid-'90s, I picked up some Java as part of my graduate studies.  I know some folks prefer Python, and that's fine.  Some folks within the community would like to believe that there are sharp divides between these two camps, that some who use one language detest the other, as well as those who use it.  Nothing could be further from the truth.  In fact, I would suggest that this attempt to create drama where there is none is simply a means of masking the fact that some analysts and examiners simply don't understand the technical aspects of the work that's actually being done.

Forensics from the Sausage Factory - USN Change Journal
Security BrainDump - Post regarding the USN Change Journal
OpenFoundry - Free tools; page includes link to a Python script for parsing the $UsnJrnl:$J file

1 comment:

dfirfpi said...

I agree with everything you wrote. I personally try to write at least a "couple" of lines of code whenever I face a new artifact: IMHO, it's the only way I can feel comfortable in my understanding of it. Besides, it's also a way to test the tools that I use during the analysis.