Wednesday, July 30, 2014

Book Review: "The Art of Memory Forensics"

I recently received a copy of The Art of Memory Forensics (thanks, Jamie!!), with a request that I write a review of the book.  Being a somewhat outspoken proponent of constructive and thoughtful feedback within the DFIR community, I agreed.

This is the seminal resource/tome on memory analysis, brought to you by THE top minds in the field.  The book covers Windows, Linux, and Mac memory analysis, and as such must be part of every DFIR analyst's reading and reference list.  The book is 858 pages (not including the ToC, Introduction, and index), and is quite literally packed with valuable information.

Some context is necessary...I'm writing this review as someone who has used Volatility for some time, albeit not to it's fullest possible extent.  I'm more of an incident responder, and not so much a malware reverse engineer; I tend to work with some really good malware RE folks and usually go to them for the deeper stuff.  I've converted hibernation files and found some pretty interesting artifacts within the resulting raw memory (my case notes are rife with some of these artifacts), and I've reached to Jamie Levy on several occasions for support.  In addition, I recently completed the five-day Volatility training course.

Also, I spend most of my time working on Windows systems; as such, I cannot offer a great deal of value, nor insight, when it comes to reviewing the information that this book contains on Linux and Mac memory. However, I have worked with some of the folks who provided material for these sections, and I've seen them present at the Open Memory Forensics Workshop (OMFW), and to say that these folks are competent is a gross understatement.

That being said, this book is the most comprehensive reference that covers the topic of memory analysis, from start to finish, available.  The authors begin the book by providing a detailed description of system architecture, as it pertains to memory, discussing address translation and paging (among other topics) before progressing into data structures.  This ground-up approach provides the foundational knowledge that's really required for a complete understanding of memory analysis.  The book then proceeds with a complete walk-through of the Volatility Framework itself, covering topics such as plugins, basic and advanced usage, etc.  There is even a chapter that covers just memory acquisition, addressing tools, tool usage, and hive extraction (using the TSK tools) to assist in profile identification.  All of this information is covered prior to addressing actual memory analysis, so that by the time a reader gets to chapter 5, they should have some understanding of memory structure and how to acquire memory.

Something pointed out in chapter 4 (Memory Acquisition) is worth repeating...that memory acquisition via software is a "topic of heated debate".  While the authors do provide a comprehensive list of software tools that can be used to acquire memory, they also state that the list is not to be viewed as an evaluation, nor should the reader consider the fact that a tool is on the list as an endorsement of that tool.  As such, YMMV based on personal experience...

Throughout the book, the authors bring their incredible wealth of experience to bear in this book, as well.  After all, who better to write a book such as this than the folks who developed the Volatility Framework as a means to meet their own needs in memory analysis, while working on what are arguably the most technologically complex cases seen.  The section on Windows memory forensics covers 14 chapters, and interspersed throughout those chapters are examples of how memory analysis can be used to assist in a wide range of analysis.  Each section starts with an "objectives" section that outlines what the reader can expect to understand once they've completed the section, and many sections provide IRL (or near-IRL) examples of how to use Volatility to support the analysis in question.  As such, the authors are not just providing a "...use this plugin...", as much as they're also providing examples of what the output of the plugin means, and how it pertains to the investigation or analysis in question.

At this point, I've had my copy of the book for a few days, and I've had a ruler and highlighter on hand since I first cracked the spine.  The formatting of the book is such that I've already started adding my own notes to the margins, based on my own exams.  I've found it valuable to go back to case notes and write notes in the margins of the book, adding context from my own exams to what the author's have provided.  This simply increases the value to the book as a reference resource.  In addition, the book is rife with caveats, concerns, and tidbits...such as the section on Timestomping Registry Keys, and what intruders have done that modify the LastWrite time of the Policy\Secrets key in the Security hive.  There's even an entire section on timelining!

If you have an interest in memory analysis, this is THE MUST-HAVE resource!  To say that if you or anyone on your team is analyzing Windows systems and doesn't have this book on your shelf is wrong, is wholly incorrect.  Do NOT keep this book on a shelf...keep it on your desk, and open!  Within the first two weeks of this book arriving into your hands, it should have a well-worn spine, and dirty finger prints and stains on the pages!  If you have a team of analysts, purchase multiple copies and engage the analysts in discussions.  If one of your analysts receives a laptop system for analysis and the report does not include information regarding the analysis of the hibernation file, I would recommend asking them why - they may have a perfectly legitimate reason for not analyzing this file, but if you had read even just a few chapters of this book, you'd understand why memory analysis is too important to ignore.

Thursday, July 24, 2014

File system ops, testing phase 2

As I mentioned in my previous post on this topic, there were two other tests that I wanted to conduct with respect to file system operations and the effects an analyst might expect to observe within the MFT, and the USN change journal.  My thoughts were that if an intruder were accessing a system via RDP, they might not do the drag-and-drop method to move files, or if they were accessing the system via a RAT and they only had command line access, they might use native, command line tools to conduct file operations.

Testing Protocol
All of the same conditions exist from the previous tests, in fact, I didn't even boot the VM between tests.  What I wanted to do this time is look at what effects one could expect to see for copy and move operations conducted via the command line, rather than via the shell.  I wanted to run these tests, as they would better represent the file system operations that may occur during a malware infection.

For this set of tests, I logged into the VM, opened a command prompt and typed the following commands:

C:\>copy c:\tools\eula_30.txt c:\temp\eula_31.txt
C:\>move c:\tools\procmon.exe c:\temp\procmon.exe

Test 1 - copy operation, via the command line

Original file record:

44657      FILE Seq: 3    Links: 1   
[FILE],[BASE RECORD]
.\tools\eula_30.txt
    M: Fri Nov  8 15:17:17 2013 Z
    A: Fri Jul 28 14:32:44 2006 Z
    C: Thu Jul 17 20:38:52 2014 Z
    B: Fri Jul 28 14:32:44 2006 Z
  FN: eula_30.txt  Parent Ref: 44361/32
  Namespace: 3
    M: Fri Nov  8 15:17:17 2013 Z
    A: Fri Jul 28 14:32:44 2006 Z
    C: Fri Nov  8 15:17:17 2013 Z
    B: Fri Jul 28 14:32:44 2006 Z

Resulting file record:

23643      FILE Seq: 6    Links: 1   
[FILE],[BASE RECORD]
.\temp\eula_31.txt
    M: Fri Nov  8 15:17:17 2013 Z
    A: Thu Jul 24 14:57:41 2014 Z
    C: Thu Jul 24 14:57:41 2014 Z
    B: Thu Jul 24 14:57:41 2014 Z
  FN: eula_31.txt  Parent Ref: 44311/7
  Namespace: 3
    M: Thu Jul 24 14:57:41 2014 Z
    A: Thu Jul 24 14:57:41 2014 Z
    C: Thu Jul 24 14:57:41 2014 Z
    B: Thu Jul 24 14:57:41 2014 Z

From the USN change journal (as with the previous test, these entries are not in order):

eula_31.txt: Named_Data_Extend,Data_Extend,Data_Overwrite,Stream_Change  FileRef: 23643/6  ParentRef: 44311/7
eula_31.txt: Data_Extend,Data_Overwrite  FileRef: 23643/6  ParentRef: 44311/7
eula_31.txt: Close,File_Create  FileRef: 23643/6  ParentRef: 44311/7
eula_31.txt: Data_Extend,Data_Overwrite,Stream_Change  FileRef: 23643/6  ParentRef: 44311/7
eula_31.txt: Data_Extend  FileRef: 23643/6  ParentRef: 44311/7
eula_31.txt: Named_Data_Extend,Data_Extend,Data_Overwrite,Named_Data_Overwrite,Close,Stream_Change  FileRef: 23643/6  ParentRef: 44311/7
eula_31.txt: Named_Data_Extend,Data_Extend,Data_Overwrite,Named_Data_Overwrite,Stream_Change  FileRef: 23643/6  ParentRef: 44311/7
eula_31.txt: File_Create  FileRef: 23643/6  ParentRef: 44311/7

Results:
The results of the file copy operation, with respect to the MFT record (i.e., attribute time stamps, parent ref number, etc.) are identical to what we saw when the test was performed via the shell.  The most notable exception is the absence of references to consent.exe being launched in the USN change journal data.

Test 2 - move operation, via the command line

File record following previous test:

22977      FILE Seq: 12   Links: 1   
[FILE],[BASE RECORD]
.\tools\procmon.exe
    M: Thu Jul 17 20:40:35 2014 Z
    A: Thu Jul 17 20:40:35 2014 Z
    C: Thu Jul 17 20:40:35 2014 Z
    B: Fri May 31 20:54:54 2013 Z
  FN: procmon.exe  Parent Ref: 44361/32
  Namespace: 3
    M: Thu Jul 17 20:40:35 2014 Z
    A: Thu Jul 17 20:40:35 2014 Z
    C: Thu Jul 17 20:40:35 2014 Z
    B: Fri May 31 20:54:54 2013 Z

File record following move operation:

22977      FILE Seq: 12   Links: 1   
[FILE],[BASE RECORD]
.\temp\procmon.exe
    M: Thu Jul 17 20:40:35 2014 Z
    A: Thu Jul 17 20:40:35 2014 Z
    C: Thu Jul 24 14:57:55 2014 Z
    B: Fri May 31 20:54:54 2013 Z
  FN: procmon.exe  Parent Ref: 44311/7
  Namespace: 3
    M: Thu Jul 17 20:40:35 2014 Z
    A: Thu Jul 17 20:40:35 2014 Z
    C: Thu Jul 17 20:40:35 2014 Z
    B: Fri May 31 20:54:54 2013 Z

From the USN change journal:

procmon.exe: Rename_New_Name,Close  FileRef: 22977/12  ParentRef: 44311/7
procmon.exe: Rename_New_Name  FileRef: 22977/12  ParentRef: 44311/7
procmon.exe: Rename_Old_Name  FileRef: 22977/12  ParentRef: 44361/32

Results
The results of this test were similar to the results observed in the previous test, with the exception that consent.exe was not run. The only change to the record was the modification of the parent ref number, which was reflected in the MFT entry change (C) time stamp in the $STANDARD_INFORMATION attribute being updated.

Take Aways
A couple of interesting "take aways" from this testing...

1.  When a file is copied or moved via the shell, we can expect to see consent.exe run, and on workstation systems (Win7, Win8.1) an application prefetch/*.pf file created.  This artifact on Win8.1 will be very beneficial, as the structure of *.pf files on that platform allows for up to 8 launch times to be recorded, adding much more granularity to our timelines.

2.  If an intruder accesses a system using compromised credentials, such as via RDP, there can be a great deal of activity 'recorded' in various locations within the system (i.e., Registry, Windows Event Log, etc.).  However, it an intruder is accessing the system via a RAT, there will be an apparent dearth of artifacts on the system, unless the analyst knows where to look.  This, of course, is in lieu of any additional instrumentation used to monitor the endpoints.

3.  For those who perform dynamic analysis of malware and exploit kits for the purposes of developing threat intel, adding this sort of thing to your analysis would very likely assist in developing a much more detailed picture of what's happening on the host, even weeks or months after the fact.

Final Note
I know that this testing is pretty rudimentary, and that much of the results have been documented already (via MS Knowledge Base articles, the SANS 2012 DFIR poster, etc.), but I wanted to take the testing a step further by looking at other artifacts in the individual MFT records, as well as the USN change journal.  In a lot of ways, the results of these tests serve a IoCs that can be used to help analysts add additional context to their timelines, and ultimately to their analysis.

Tuesday, July 22, 2014

File system ops, effects on MFT records

I recently conducted some testing of different actions on a Windows 7 system, with the specific purpose of identifying artifacts within the file system (in this case, the MFT and the USN change journal), particularly within individual records.  I wanted to take a look at the effects of different actions to see what they "look like" within the individual records, as well as within the USN change journal, in hopes that things would pop out that could be used during forensic exams.  Once I completed my testing, I decided to share what I'd done and what I'd found, in hopes that others might find it useful.

Testing Platform: 32-bit Windows 7 Ultimate VM running in Virtual Box.

Tools: My own custom stuff.  I updated the MFT parser included with WFA 4/e, and used usnj.pl to parse the USN Change Journal, and parse.pl to translate the output of the change journal parser into a timeline.  This page at MS identifies that USN record v2 structure, and the reason codes, used by usnj.pl.

Methodology:  I started by writing down and outlining all of the tests that I wanted to perform.  I had a total of 5 tests that I wanted to run in order to see what the effects of each individual action was on the MFT, and individual records within the MFT.  I picked 5 different files within the VM to use in each test, respectively.  Once that was done, I added the VM to FTK Imager as an evidence item and extracted the MFT; this was my "before" sample.  Then, I launched the VM, performed all of the tests, logged out and shut down the VM, and extracted the MFT (my "after" sample) and the USN change journal.

All testing occurred on 17 July 2014.  In all of the tests, I've changed the font color for items of interest to red.

Test 1 - Renaming a file
This was a simple test, but something I hadn't specifically looked at before.  All I did with this one was open a command prompt, change to the directory in question, and issued the command, "ren eula.txt eula30.txt".

Here's the record details from before the test was run:

44657      FILE Seq: 3    Links: 1   
[FILE],[BASE RECORD]
.\tools\Eula.txt
    M: Fri Nov  8 15:17:17 2013 Z
    A: Fri Jul 28 14:32:44 2006 Z
    C: Fri Nov  8 15:17:17 2013 Z
    B: Fri Jul 28 14:32:44 2006 Z
  FN: Eula.txt  Parent Ref: 44361/32
  Namespace: 3
    M: Fri Nov  8 15:17:17 2013 Z
    A: Fri Nov  8 15:17:17 2013 Z
    C: Fri Nov  8 15:17:17 2013 Z
    B: Fri Nov  8 15:17:17 2013 Z

...and here are the record details after the test:

44657      FILE Seq: 3    Links: 1   
[FILE],[BASE RECORD]
.\tools\eula_30.txt
    M: Fri Nov  8 15:17:17 2013 Z
    A: Fri Jul 28 14:32:44 2006 Z
    C: Thu Jul 17 20:38:52 2014 Z
    B: Fri Jul 28 14:32:44 2006 Z
  FN: eula_30.txt  Parent Ref: 44361/32
  Namespace: 3
    M: Fri Nov  8 15:17:17 2013 Z
    A: Fri Jul 28 14:32:44 2006 Z
    C: Fri Nov  8 15:17:17 2013 Z
    B: Fri Jul 28 14:32:44 2006 Z

Again, this was an atomic action; that is to say, all I did with respect to this file was run the ren command.  I honestly have no idea why the last accessed (A) and creation (B) dates from the $STANDARD_INFORMATION attribute would be copied into the corresponding time stamps of the $FILE_NAME attribute for a rename operation.  However, notice that very little else about the record changed; the record number (from the DWORD at offset 0x2C within the record header), the sequence number, and the parent file reference number remained the same, which is to be expected.

Here are the changes recorded in the USN change journal:

eula_30.txt: Rename_New_Name  FileRef: 44657/3  ParentRef: 44361/32
eula_30.txt: Rename_New_Name,Close  FileRef: 44657/3  ParentRef: 44361/32
Eula.txt: Rename_Old_Name  FileRef: 44657/3  ParentRef: 44361/32

Now, these changes are not in the specific order in which they occurred...they're listed in a timeline, so they occurred within the same second.  But it is interesting that there is rename_old_name and rename_new_name identifiers for the actions that took place.  Perhaps because a good deal of the analysis work that I do comes from corporate environments, I've been seeing a lot of Windows 7 systems with VSCs disabled in the Registry; as such, I haven't had access to an older version of the MFT via a VSC in order to compare record contents, on a per-record basis.  By incorporating the USN change journal into my analysis, I can get some additional context with respect to what I'm seeing.

The use of the USN change journal can also be useful in identifying activity that occurs during a malware infection.  For example, in some cases, malware may create a downloader, use that to download another bit of malware, and then delete the original downloader.  The USN change journal can help you identify that activity, even if the MFT record for the original downloader has been reused and overwritten.

Test 2 - Adding an ADS to a file
For this test, I added an ADS to a file by typing echo "This is an ADS" > procmon.chm:ads.txt at the command prompt.  Now, this file is the ProcMon help file that is included when you download the ProcMon archive from SysInternals, and as such, it already had a Zone.Identifier ADS associated with the file.

The "before" record:

44401      FILE Seq: 11   Links: 1   
[FILE],[BASE RECORD]
.\tools\procmon.chm
    M: Fri Nov  8 15:17:17 2013 Z
    A: Mon Nov 28 16:46:42 2011 Z
    C: Fri Nov  8 15:17:17 2013 Z
    B: Mon Nov 28 16:46:42 2011 Z
  FN: procmon.chm  Parent Ref: 44361/32
  Namespace: 3
    M: Fri Nov  8 15:17:16 2013 Z
    A: Fri Nov  8 15:17:16 2013 Z
    C: Fri Nov  8 15:17:16 2013 Z
    B: Fri Nov  8 15:17:16 2013 Z
**ADS: Zone.Identifier

...and the "after" record:

44401      FILE Seq: 11   Links: 1   
[FILE],[BASE RECORD]
.\tools\procmon.chm
    M: Thu Jul 17 20:39:22 2014 Z
    A: Mon Nov 28 16:46:42 2011 Z
    C: Thu Jul 17 20:39:22 2014 Z
    B: Mon Nov 28 16:46:42 2011 Z
  FN: procmon.chm  Parent Ref: 44361/32
  Namespace: 3
    M: Fri Nov  8 15:17:16 2013 Z
    A: Fri Nov  8 15:17:16 2013 Z
    C: Fri Nov  8 15:17:16 2013 Z
    B: Fri Nov  8 15:17:16 2013 Z
**ADS: ads.txt
**ADS: Zone.Identifier

In this case, you'll notice that only the M (modified) and C (MFT entry change) times in the $STANDARD_INFORMATION attribute have changed.  I would expect that the C (entry changed) time stamp would change, as the addition of an ADS constitutes a change to the MFT record itself, but the M (last modified) time stamp changed, also.

From the USN change journal:

procmon.chm: Stream_Change  FileRef: 44401/11  ParentRef: 44361/32
procmon.chm: Named_Data_Extend,Close,Stream_Change  FileRef: 44401/11  ParentRef: 44361/32
procmon.chm: Named_Data_Extend,Stream_Change  FileRef: 44401/11  ParentRef: 44361/32

So now, if an ADS is suspected, a good place to look for indications of when the ADS was added to a file (or folder) would be to parse the USN change journal and look for stream_change entries.  This can be valuable during an examination because an ADS does not have any unique time stamps associated with it within the MFT record.  An ADS is a $DATA attribute within the MFT record, and as such, does not have a unique $STANDARD_INFORMATION or $FILE_NAME attribute associated with it.

Test 3 - File system tunneling
In this test, I created a batch file named "tunnel.bat" in the C:\Tools folder, with the following contents:

del procmon.exe
echo "This is a test file" > procmon.exe

For this test, I ran the batch file, which deletes procmon.exe and then creates a new file named procmon.exe in the same folder, in relatively short order.  In fact, for file system tunneling to take effect, the entire process has to happen within 15 seconds (by default; the time can be changed, or file system tunneling itself disabled, via the Registry).  As we'll see, the entire process took place within a second.

The original MFT record appears as follows:

44631      FILE Seq: 4    Links: 1   
[FILE],[BASE RECORD]
.\tools\Procmon.exe
    M: Fri Nov  8 15:17:17 2013 Z
    A: Fri May 31 20:54:54 2013 Z
    C: Fri Nov  8 15:17:17 2013 Z
    B: Fri May 31 20:54:54 2013 Z
  FN: Procmon.exe  Parent Ref: 44361/32
  Namespace: 3
    M: Fri Nov  8 15:17:17 2013 Z
    A: Fri Nov  8 15:17:17 2013 Z
    C: Fri Nov  8 15:17:17 2013 Z
    B: Fri Nov  8 15:17:17 2013 Z
**ADS: Zone.Identifier

After the test was run, the MFT record appeared as follows:

44631      FILE Seq: 5    Links: 1   
[FILE],[DELETED],[BASE RECORD]
    M: Fri Nov  8 15:17:17 2013 Z
    A: Fri May 31 20:54:54 2013 Z
    C: Fri Nov  8 15:17:17 2013 Z
    B: Fri May 31 20:54:54 2013 Z
  FN: Procmon.exe  Parent Ref: 44361/32
  Namespace: 3
    M: Fri Nov  8 15:17:17 2013 Z
    A: Fri Nov  8 15:17:17 2013 Z
    C: Fri Nov  8 15:17:17 2013 Z
    B: Fri Nov  8 15:17:17 2013 Z
**ADS: Zone.Identifier

Here's the new file record for the file:

22977      FILE Seq: 12   Links: 1   
[FILE],[BASE RECORD]
.\tools\procmon.exe
    M: Thu Jul 17 20:40:35 2014 Z
    A: Thu Jul 17 20:40:35 2014 Z
    C: Thu Jul 17 20:40:35 2014 Z
    B: Fri May 31 20:54:54 2013 Z
  FN: procmon.exe  Parent Ref: 44361/32
  Namespace: 3
    M: Thu Jul 17 20:40:35 2014 Z
    A: Thu Jul 17 20:40:35 2014 Z
    C: Thu Jul 17 20:40:35 2014 Z
    B: Fri May 31 20:54:54 2013 Z
[RESIDENT]

Notice that the only difference between the two 44631 records is the sequence number, and that the original file record is now marked "DELETED".  What this illustrates is that the MFT record itself is NOT reused during file system tunneling on NTFS, and that a new record is created during the operation.  This was something I'd wondered about for some time, and now I can see the effect of file system tunneling.

We can see in this case that the MAC times for the new file are all for the date of the testing, and that the B (creation) date is from the original file record.  Also, notice the $FILE_NAME attribute time stamps of the new file...very interesting.

Also, because the file went from being a PE file to a string, the resulting file is now resident; I didn't include the hex dump of the file contents, extracted from the MFT record.

This blog post (from 2005) explains why tunneling exists at all.

From the USN change journal:

procmon.exe: Data_Extend,Close,File_Create  FileRef: 22977/12  ParentRef: 44361/32
procmon.exe: Data_Extend,File_Create  FileRef: 22977/12  ParentRef: 44361/32
procmon.exe: File_Create  FileRef: 22977/12  ParentRef: 44361/32
Procmon.exe: File_Delete,Close  FileRef: 44631/4  ParentRef: 44361/32

When I first read about file system tunneling, I was curious as to whether the original MFT record for the deleted file was simply reused, and this test clearly illustrates that is not the case.

Additional Resources:
Here's a jIIr post from Corey Harrell in which he discusses the use of the USN change journal and file system tunneling
- Eric Huber's blog post on file system tunneling
- Blazer Catzen discussed some file system tunneling testing he'd done on David Cowen's Forensic Lunch podcast, and posted the presentation he'd put together on the subject.

Test 4 - Copy a file to another location in the same volume
In this test, I copied C:\Windows\Logs\IE9_NR_setup.log to C:\Users\IE9_NR_setup.log, using drag-n-drop via the Windows Explorer shell.

From "before" MFT:

96296      FILE Seq: 3    Links: 2   
[FILE],[BASE RECORD]
.\Windows\Logs\IE9_NR_Setup.log
    M: Fri Nov  8 13:26:02 2013 Z
    A: Fri Nov  8 13:26:02 2013 Z
    C: Fri Nov  8 13:26:02 2013 Z
    B: Fri Nov  8 13:26:02 2013 Z
  FN: IE9_NR~1.LOG  Parent Ref: 1966/1
  Namespace: 2
    M: Fri Nov  8 13:26:02 2013 Z
    A: Fri Nov  8 13:26:02 2013 Z
    C: Fri Nov  8 13:26:02 2013 Z
    B: Fri Nov  8 13:26:02 2013 Z
  FN: IE9_NR_Setup.log  Parent Ref: 1966/1
  Namespace: 1
    M: Fri Nov  8 13:26:02 2013 Z
    A: Fri Nov  8 13:26:02 2013 Z
    C: Fri Nov  8 13:26:02 2013 Z
    B: Fri Nov  8 13:26:02 2013 Z

From the "after" MFT, the original file:

96296      FILE Seq: 3    Links: 2   
[FILE],[BASE RECORD]
.\Windows\Logs\IE9_NR_Setup.log
    M: Fri Nov  8 13:26:02 2013 Z
    A: Fri Nov  8 13:26:02 2013 Z
    C: Fri Nov  8 13:26:02 2013 Z
    B: Fri Nov  8 13:26:02 2013 Z
  FN: IE9_NR~1.LOG  Parent Ref: 1966/1
  Namespace: 2
    M: Fri Nov  8 13:26:02 2013 Z
    A: Fri Nov  8 13:26:02 2013 Z
    C: Fri Nov  8 13:26:02 2013 Z
    B: Fri Nov  8 13:26:02 2013 Z
  FN: IE9_NR_Setup.log  Parent Ref: 1966/1
  Namespace: 1
    M: Fri Nov  8 13:26:02 2013 Z
    A: Fri Nov  8 13:26:02 2013 Z
    C: Fri Nov  8 13:26:02 2013 Z
    B: Fri Nov  8 13:26:02 2013 Z

...and the resulting file:

22987      FILE Seq: 12   Links: 2   
[FILE],[BASE RECORD]
.\Users\IE9_NR_Setup.log
    M: Fri Nov  8 13:26:02 2013 Z
    A: Thu Jul 17 20:41:39 2014 Z
    C: Thu Jul 17 20:41:39 2014 Z
    B: Thu Jul 17 20:41:39 2014 Z
  FN: IE9_NR~1.LOG  Parent Ref: 486/1
  Namespace: 2
    M: Thu Jul 17 20:41:39 2014 Z
    A: Thu Jul 17 20:41:39 2014 Z
    C: Thu Jul 17 20:41:39 2014 Z
    B: Thu Jul 17 20:41:39 2014 Z
  FN: IE9_NR_Setup.log  Parent Ref: 486/1
  Namespace: 1
    M: Thu Jul 17 20:41:39 2014 Z
    A: Thu Jul 17 20:41:39 2014 Z
    C: Thu Jul 17 20:41:39 2014 Z
    B: Thu Jul 17 20:41:39 2014 Z

Now, one question you might have is that if I dragged-and-dropped the file, shouldn't the record show indications of the file having been accessed?  Well, we have to remember that as of Vista, the NtfsDisableLastAccessUpdate value is enabled by default, meaning that "normal" user actions won't cause the

From the USN change journal:

IE9_NR_Setup.log: Data_Extend,Data_Overwrite,File_Create  FileRef: 22987/12  ParentRef: 486/1
IE9_NR_Setup.log: File_Create  FileRef: 22987/12  ParentRef: 486/1
IE9_NR_Setup.log: Data_Extend,File_Create  FileRef: 22987/12  ParentRef: 486/1
CONSENT.EXE-531BD9EA.pf: Data_Extend,Data_Truncation,Close  FileRef: 1582/11  ParentRef: 59062/1
CONSENT.EXE-531BD9EA.pf: Data_Truncation  FileRef: 1582/11  ParentRef: 59062/1
CONSENT.EXE-531BD9EA.pf: Data_Extend,Data_Truncation  FileRef: 1582/11  ParentRef: 59062/1
IE9_NR_Setup.log: Data_Extend,Data_Overwrite,Close,File_Create  FileRef: 22987/12  ParentRef: 486/1

From the USN change journal, we see a reference to consent.exe being run; this is the dialog that pops up when you drag-and-drop a file between folders, asking if you want to copy or move the file, or cancel the operation.

Test 5 - Move a file to another location in the same volume
Moved C:\Windows\Logs\IE10_NR_setup.log to C:\Temp\IE10_NR_setup.log (drag-n-drop, via the Windows Explorer shell)

The "before" record:

16420      FILE Seq: 15   Links: 2   
[FILE],[BASE RECORD]
.\Windows\Logs\IE10_NR_Setup.log
    M: Fri Nov  8 14:24:59 2013 Z
    A: Fri Nov  8 14:24:59 2013 Z
    C: Fri Nov  8 14:24:59 2013 Z
    B: Fri Nov  8 14:24:59 2013 Z
  FN: IE10_N~1.LOG  Parent Ref: 1966/1
  Namespace: 2
    M: Fri Nov  8 14:24:59 2013 Z
    A: Fri Nov  8 14:24:59 2013 Z
    C: Fri Nov  8 14:24:59 2013 Z
    B: Fri Nov  8 14:24:59 2013 Z
  FN: IE10_NR_Setup.log  Parent Ref: 1966/1
  Namespace: 1
    M: Fri Nov  8 14:24:59 2013 Z
    A: Fri Nov  8 14:24:59 2013 Z
    C: Fri Nov  8 14:24:59 2013 Z
    B: Fri Nov  8 14:24:59 2013 Z

...and the "after" record:

16420      FILE Seq: 15   Links: 2   
[FILE],[BASE RECORD]
.\temp\IE10_NR_Setup.log
    M: Fri Nov  8 14:24:59 2013 Z
    A: Fri Nov  8 14:24:59 2013 Z
    C: Thu Jul 17 20:41:58 2014 Z
    B: Fri Nov  8 14:24:59 2013 Z
  FN: IE10_N~1.LOG  Parent Ref: 44311/7
  Namespace: 2
    M: Fri Nov  8 14:24:59 2013 Z
    A: Fri Nov  8 14:24:59 2013 Z
    C: Fri Nov  8 14:24:59 2013 Z
    B: Fri Nov  8 14:24:59 2013 Z
  FN: IE10_NR_Setup.log  Parent Ref: 44311/7
  Namespace: 1
    M: Fri Nov  8 14:24:59 2013 Z
    A: Fri Nov  8 14:24:59 2013 Z
    C: Fri Nov  8 14:24:59 2013 Z
    B: Fri Nov  8 14:24:59 2013 Z

Okay, the file was moved (copy + delete operations), but we might expect to see some changes in the time stamps...shouldn't we?  Well, in this case, we cannot tell if the $FILE_NAME attribute time stamps had been changed, because for this file, all of the time stamps, in all of the available attributes, were the same.  We do, however, see that the C (entry modified) time in the $STANDARD_INFORMATION attribute changed (as expected) and that the parent file reference number changed.

From the USN change journal:

IE10_NR_Setup.log: Security_Change  FileRef: 16420/15  ParentRef: 44311/7
IE10_NR_Setup.log: Rename_New_Name,Close  FileRef: 16420/15  ParentRef: 44311/7
IE10_NR_Setup.log: Rename_New_Name  FileRef: 16420/15  ParentRef: 44311/7
IE10_NR_Setup.log: Security_Change,Close  FileRef: 16420/15  ParentRef: 44311/7
IE10_NR_Setup.log: Rename_Old_Name  FileRef: 16420/15  ParentRef: 1966/1
CONSENT.EXE-531BD9EA.pf: Data_Extend,Data_Truncation,Close  FileRef: 1582/11  ParentRef: 59062/1
CONSENT.EXE-531BD9EA.pf: Data_Truncation  FileRef: 1582/11  ParentRef: 59062/1
CONSENT.EXE-531BD9EA.pf: Data_Extend,Data_Truncation  FileRef: 1582/11  ParentRef: 59062/1

Again, we see a reference to consent.exe having been launched.  I'm not entirely sure why the "Security_change" reason code in the USN change journal was generated for a move operation.

Both tests 4 and 5 validate what's described in MS KB article 299648, keeping in mind that the article only discusses time stamps from the $STANDARD_INFORMATION attribute.

Summary
Again, I ran these tests as a means for determining what different file operations look like in the MFT and USN change journal, and what the effects are on individual records.  This information can be helpful in a variety of investigation types, such as malware detection, and finding indications of historical activity and data (i.e., files that are no longer on the system).

Future Efforts
For the future, I'll need to look at copy and move file operations performed at the command line, using the copy and move commands, respectively.

Thursday, July 10, 2014

Random Stuff

Host-Based Digital Analysis
There are a lot of folks with different skill sets and specialties involved in targeted threat analysis and threat intel collection and dissemination.  There are a lot of researchers with specific skill sets in network traffic analysis, malware reverse engineering, etc.

One of the benefits I find in host-based analysis is that the disk is one of the least volatile of the data sources.  Ever been asked to answer the "what data left our organization" definitively?  Most often, the answer to that question is, if you didn't conduct full packet capture when the data was leaving, at the time that it was leaving, then you really have no way of knowing definitively.  Information in memory persists longer than what's on the wire, but if you're not there to collect memory within a reasonable time frame, you're likely going to miss the artifacts you're interested in, just the same.  While the contents on disk won't tell you definitively what left that system, artifacts on disk persist far longer than those available via other sources.

With malware RE or dynamic analysis, you're getting a very limited view of what could have happened on the infected host, rather than looking at what did happen.  A malware RE analyst with only a sample to to work with will be able to tell you what that sample was capable of, but won't be able to tell you what actually happened on the infected host.  They can tell you that the malware included the capability to perform screen captures and keystroke logging, but they can't tell you if either or both of those capabilities were actually used.

One of the aspects of targeted threat incidents is the longevity of these groups.  During one investigation I worked on a number of years ago, our team found that the original compromise had occurred via a phishing email opened by three specific employees, 21 months prior to our being called for assistance.  More recently, I've found evidence of the creation of and access to web shells going back a year prior to the activity that caught our attention in the first place.  Many of those who respond to these types of incidents will tell you that it is not at all unusual to find that the intruders had compromised the infrastructure several months (sometimes even a year or more) before the activity that got someone's attention (C2 comms, etc.) was generated, and it's often host-based analysis that will demonstrate that.

Also, what happens when these groups no longer use malware?  If malware isn't being used, then what will be monitored on the network and looked for in logs?  That's when host-based analysis becomes really important.  While quite a few analysts know how to use application prefetch/*.pf files in their analysis, what happens when the intruder accesses a server?  There is a great deal of information available within a system image that can provide insight into what the intruder was doing, what they were interested in, etc., if you know where to go to get it.  For example, I've seen intruders use Windows Explorer to access FTP sites, and the only place that artifacts of this activity appear are in the user shellbags.

Used appropriately, host-based analysis can assist in scoping an incident, as well as be extremely valuable for collecting detailed information about an intruder's activities, even going back several months.

Now, some folks think that host-based analysis takes far too long to get answers and is not suitable for use in high-tempo environments.  When used appropriately, this aspect of analysis can provide some extremely valuable insights. Like the other aspects of analysis (memory, network), host-based analysis can provide findings unique to that aspect that are not available via the others.  Full disk acquisition is not always required; nor is completely indexing the image, or running keyword searches across the entire image.  When done correctly, answers to critical questions can be retrieved from limited data sources, allowing the response team to take appropriate action based on those findings.

RTFM
I recently received a copy of RTFM that I'd purchased, and I have to say, I really like the layout of this book.  It is definitely a "field manual", something that can be taken on-site and used to look up common command line options for widely-used tools (particularly when there is no, or limited, external access), and something that an analyst can write their own notes and reminders in.  For example, the book includes some common WMIC and PowerShell commands to use to quickly collect information from a compromised system.  In a lot of ways, it reads like one of the O'Reilly Publishing "...in a Nutshell" books...just the raw facts, assuming a certain level of competency in the reader, and no fluff.

As anyone who has read my books knows, I have a number checklists that I use (included in the book materials), and it occurred to me that they'd make a great field manual when pulled together in a similar format.  For example, I have a cheatsheet that I use for timeline creation...rather than printing it out over and over, I could put something like this into a field manual that I could then reference when I need to without having to have an Internet connection, or look up on my system.

I think that having a field manual that includes commonly used command line options is a great idea.  Also, sometimes it's hard to remember all of the different artifacts that can fall into different categories, such as 'program execution', or things to look for if you're interested in determining lateral movement within an infrastructure.  Many times, it's hard for me to remember the different artifacts on different versions of Windows that fall into these categories, and having a field manual would be very useful.  There are a number of useful tidbits in my blog that I cannot access if I don't have Internet access, and I can't remember everything (which is kind of why I write stuff into my blog).  Having a reference guide would be extremely beneficial, I think...and I already have a couple of great sources for this sort of information - my case notes, my blog, etc.

Actually, I think that a lot of us have a whole bunch of little tidbits that we don't write down and share, and don't occur to us during the heat of the moment (because analysis can often be a very high-energy event...), but would be extremely valuable if they were shared somehow.

I'm not one of those people with an eidetic memory who can remember file and Registry paths, particularly on different versions of Windows, unless I'm using them on a regular basis.  The same is true for things like tools available for different artifacts...different tools provide different information, and are useful in different circumstances.

Telling A Story
Chris Pogue recently published a post on the blog of his new employer, Nuix, describing how an investigator needs to be a story teller.  Chris points out some very important points that many of us who work in this field likely see over and over, in particular the three points he lists right at about the middle of the post.  Chris's article is worth a read.  And congratulations to Chris on his new opportunity...I'm sure he'll do great.

Something to keep in mind, as well, is that when developing our story, when translating what you've done...log analysis, pcap capture and analysis, host-based analysis...into something that the C-suite executives can digest, we must all be very mindful to do so based on the facts that we've observed.  That is to say, we must be sure to not fill in gaps in the story with assumption, or embellishment.  As "experts" (in the client's eyes), we were asked to provide answers...so when telling our story, expecting the client to just "get it", or giving them a reference to go look up or research really isn't telling our story.  It's being lazy.  Our job is to take a myriad of highly technical facts and findings, and weaving them into a story that allows the C-suite executives to make critical business decisions, in a timely manner.  That means we need to be correct, accurate, and timely.  To paraphrase what I learned in the military, many times a good answer now is better than the best answer delivered too late.  We need to keep in mind that while we're looking at logs, network traffic, the output of Volatility plugins, or parsing host-based data, there's a C-suite executive who has to report to a compliance or regulatory board, to whom bits and bytes, flags and Registry values mean absolutely nothing.

All of this also means that we need to be open to exposure and criticism.  What Chris says in his article is admittedly easier said than done.  How often do we get feedback from clients?

Mentoring
So how do we get better at telling our story, particularly when each response engagement is as different from the others we've done as snowflakes? This leads us right into a thread over on Twitter where mentoring was part of the topic.  Our community is in dire need of mentoring.  Mentoring is a great way to go about improving what we do, because many times we're so busy and engaged in response and analysis that we don't have the time to step back and see the forest for the trees, as it were.  Sometimes it takes an outside influence to get us to see the need to change, or to show us a better way.  However, I do not get the impression that many of the folks in our community are open to mentoring, and that impression has very little to do with distance.

First, mentoring should/needs to be an active, give-and-take relationship, and my experience in the community (as an analyst, writer, presenter, etc.) at large has been that there is a great deal of passivity.  We rarely see thoughtful reviews of things such as books, presentations, and conferences in this community.  People don't want to share their thoughts, nor have their name associated with such things, and as such, we're missing a great opportunity for overall improvement and advancement in this industry, without this give-and-take.

Second, mentoring opens the mentee to exposing what they're currently doing.  Very few in this community appear to want or seek out that kind of exposure, even if it's limited to just the mentor.  Years ago, I was part of a team and our manager instructed everyone to upload the reports that they'd sent to clients to a file share.  After several months, I accessed the file share to upload my most recent report, and found that the folder for that quarter was empty, even though I knew that other analysts had been working really hard and billing a great deal of hours.  Our manager conducted an audit and found that only a very few of us were following his instructions.  While there was never any explanation that I was aware of for other analysts not uploading their reports, my thought remains that they did not want to expose what they were doing.  As Chris mentioned in his article, he's been tasked with reviewing reports provided to clients by other firms.  When we were on the IBM ISS ERS team together, I can remember him reviewing two such reports.  I've been similarly tasked during my time in this field, and I've seen a wide range of what's been sent to clients.  I've taken those experiences and tried to incorporate them into how I write my reports; I covered a great deal of this in chapter 9 of Windows Forensic Analysis 4/e.

Monday, June 30, 2014

RegRipper

Just a reminder to everyone out there that the OFFICIAL download link for the most current version of RegRipper is available from the link found here, or here (i.e., at the [RegRipper download]" link).

Some folks have reached to me recently and said, "I have the most recent download...", and that's apparently not been the case.  I left the Google Code page for RegRipper populated in part because there is some information that I put in the Wiki pages that I still want to be able to access.

Just a note...if you think that the download link is broken, be sure to check to see if your infrastructure allows access to Dropbox.

If you want to see what's going to be new with RegRipper, be sure to vote for the presentation at OSDFCon.

Thursday, May 22, 2014

Book Writing: To Self-Publish, or Not

The CEIC Conference is going on as I write this, and Suzanne Widup's author panel went on yesterday.  I'm not at the conference, so like many others, I live vicariously through what gets Tweeted about the conference, as well as about specific portions of the conference, such as the panel.

I saw a question posted to Twitter, in which the tweeter asked, "for the panel, why not self-publish like RTFM?"

My initial thought was, you need to consider the members of the panel and the books they've written or co-authored; those titles really don't lend themselves too well to a format similar to RTFM, which, in some cases, is described as a collection of notes and tips bound into a book.  For example, I don't think I could see Hacking Exposed Computer Forensics in a format similar to RTFM.  As such, the question is essentially an apples-to-oranges comparison.  While self-publishing definitely has it's place, but may not always the best option for the material, nor for the author.  But that doesn't mean that there aren't publishing possibilities out there that would be very well suited to a format similar to RTFM.

I've addressed this topic before, but it is a good question, and certainly bears addressing.  Essentially, the choice of whether to go with a publisher or to self-publish comes down to how much time and effort you do want to invest in getting the book published...at least, that's my perspective.  Other perspectives might be about what the author gets out of it, or how much someone has to pay for the book.  Writing a book is tough enough as it is...having someone there to do some of those things that need to be done (i.e., formatting, illustrations, copy editing, printing, etc.) in order for the book to be available to others means that the author can focus time and effort on writing the book, and not have to stop and figure out how get something done, or find a resource.

A friend of mine told me that her husband publishes CDs for his band on his own, rather than going through a production firm.  That means that he does everything himself; in some cases this makes perfect sense.  In others, such as writing a DFIR book, maybe not so much.  Most of us don't like to write as it is; if you self-publish, would you have someone review your materials for grammar, consistency, and technical accuracy?  If so, would it be someone you pay for that service?  Where's that money going to come from?  How will you handle illustrations or figures?

As such, consider this... if self-publishing were the sole route available, we'd likely have far fewer books available in the DFIR field.  Or, maybe another way to put it is that if self-publishing really were that easy, we'd have more books.  In the years that I've been involved in writing books, I've seen a fairly good number of folks start down the road and not make it very far, for a variety of reasons.  In some cases, it's due to the realization that there's much more to writing a book than simply having an idea.  When the publisher comes back and gives you a bunch of forms to fill out, and requests a market analysis and a detailed outline, with a swag on a word count, the reality of the situation becomes readily apparent.  I've seen people stop there.  I've also seen one instance where the author got past the point of signing a contract, and the publisher came back later and modified the contract, almost doubling the word count for the final manuscript, but made no other changes to contract, including the delivery date. The author simply walked away.

I've read a number of the reviews for RTFM, and to be honest, the book sounds like a fantastic idea; it was apparently originally intended to be an accumulation of someone's notes to be passed on to their team.  In the right hands, something like that can be extremely useful, and I can relate; when I was in grad school in '95-ish, I taught myself Java programming and relied heavily on O'Reilly's "...in a Nutshell" books for tips and guidance.  I found it very useful, because I wasn't looking for the basics of programming, and the basics of Java programming to be explained to me...I just wanted the bare bones stuff, with no fluff.  Material that might be better suited to an RTFM-like format might be something like what's found here.

Self-publishing simply isn't for everyone...the audience for a book like this is pretty limited.  I can see books like this for using other tools, but I think that one of the strengths of RTFM is that there's the base assumption that anyone purchasing the book is familiar with both operating at the command line, and with Linux.  While there are certain segments of the DFIR community that would strongly suggest that that's exactly how it should be, the fact is that this is far from reality.

Resources
Self-publishing a book: 25 things you need to know - I strongly suggest that you read them all
Lulu.com - self-publishing company
How to self-publish - a guide, with pictures

Saturday, May 17, 2014

Artifacts

I received a request right before WFA 4/e hit the streets...after the writing and editing was complete and while the printed book was being shipped...to "talk about anti-forensics".  Unfortunately, at this point, I still haven't heard any more than just that, but I've had more than a couple of instances where knowledge of artifacts and Windows structures has allowed me to gather valuable data for analysis, even when the bad guy took steps, however unknowingly, to remove other artifacts.  I say "unknowingly" because sometimes the steps taken may not specifically be intended to be "anti-forensic" in nature, but may still have that effect.

Something that I've found over the years is that even when steps are taken to remove indications of activity, there may still be artifacts available that can prove valuable to an analyst.  While the analyst may not be able to answer THE question that they have, there may be data available that will still provide insight into the case and allow other questions to be answered.  For example, if the intruder accessed a system via RDP and removed or obscured some valuable data source (i.e., cleared the Windows Event Log, etc.) and the question you have is, "where did they access this system from?", you might not be able to answer that question.  However, using other data, you would be able to show when they were active on the system, what they were doing at the time, and even demonstrate access to other systems.

To quote Blade: "When you understand the nature of a thing, you know what it's capable of."  I know, I know...but I really wanted to work that quote into this post.  ;-)  What I'll do now is take a look at some of the things I and others have seen, and provide some thoughts as to other data sources that would be of value.

FTP Via Windows Explorer
I've seen the native ftp.exe client used on systems in a variety of cases, and not only to exfiltrate data.  Back when I was doing PCI forensic analysis, we saw a good deal of SQL injection activity, some of which would use echo to create an FTP script on a system, and then launch that script using the -s switch with ftp.exe.  The use of ftp.exe to infiltrate or exfil data can leave artifacts in the Registry, and for workstation systems, there will be an application prefetch file created.  On XP systems, the last accessed time on the file will be updated, and there will very likely be a value created in the user's MUICache key for ftp.exe.

However, you can use Windows Explorer to connect to an FTP server.  My publisher used to have me do this in order to transfer chapters, and I've seen this used a number of times on various cases.  The interesting thing about this is that while it involves interaction via the GUI shell, it leaves far fewer artifacts than using the command line utility.  In fact, having looked at several cases where this technique is used, the only place that I've found artifacts of this activity is in the user's shellbag artifacts.  I've discussed these artifacts before, so I won't go into a great deal of detail here.  Suffice to say, shellbags can be a great resource, demonstrating access to network resources such as shares (even C$ shares), MTP devices (digital cameras and smartphones), FTP sites, etc., providing artifacts of activity that you might not find anywhere else on the system.

Clearing Windows Event Logs
Ever access a system and find out that the records in the Security and System Event Logs only go back a day or two?  One of the things I talk about in my books and presentations is that while it's easy to assume that the default configuration of the Event Logs caused them to roll over, it's also pretty trivial to check and see if there was some other reason for this, such as a user clearing the Event Logs.  If this happens, you'll likely see a record in the Security Event Log indicating that this happened, so look for the appropriate event ID (517 or 1102).  I've seen intruders do this, and I've seen admins who are responding to and troubleshooting an "incident" do this, as well.  Many times when the Event Log is cleared, you'll see a user accessing the Event Viewer (usually visible via the UserAssist data) just prior to that time.

When the Event Log is cleared, that doesn't mean that all data goes away.  You can try to recover Windows Event Log records using Willi Ballenthin's EVTXtract, or depending on what you're trying to illustrate, you can look to other data.  For example, I've had instances when the Windows Event Logs have been cleared, but I've been able to demonstrate a user's windows of activity over time using other sources of data, such as the Registry, VSCs, etc.

The Power of Mini-, Micro-, and Nano-Timelines
Daniel Garcia recently added a review of WFA 4/e to the Amazon page for the book (thanks again, Daniel, for taking the time do to that, I greatly appreciate it); in that review he mentioned mini-timelines.  Interestingly enough, I use this technique all the time.  Many times, I'll grab some information and start putting together a timeline from a small subset of data sources, in order to get an idea of what's going on, and then once I have that info, I'll kick off a heftier process and let that run while I'm analyzing what I have.  Or, as is often the case, the results of analyzing the mini-timeline will provide me with the direction for my next steps.  This allows me to see things that I might have missed had I included voluminous amounts of file system metadata, Windows Event Logs, etc., and goes back to the technique of using overlays that I mentioned over two years ago.  This technique has provided useful in a number of cases.  For example, if someone is in a data center acquiring data, I can send them a batch script (similar to auto_rip) that runs various tools (RegRipper, etc.) and have them ship me the output of the tools. This allows me to start analysis while the bulk of the data is in transit, and when it shows up, I'm ready to start my focused analysis.  Or, they can acquire the data and once it's been verified, send me subsets of the data (Registry hive files, Windows Event Logs, etc.) in a secure archive, allowing me to begin analysis on a few KB of data while the full archive (several hundred GB of data) is enroute.

Not long ago, I collected the NTUSER.DAT, USRCLASS.DAT and index.dat files from three user profiles within an image.  These profiles were thought to be active during the time of the incident, so I parsed the Registry hives with RegRipper, and the index.dat files with a custom tool, and created a micro-timeline that showed me not just times of activity, but patterns of activity that I would have missed had I included all of the data (file system, WEVTX, Registry hives, other user profiles, etc.) available within the image.  The results of this analysis allowed me to then focus my analysis on the more inclusive timeline and develop a much clearer picture of the activity that was the focus of my interest.

Browser Analysis
When we hear 'browser analysis', most of us think about data sources such as index.dat files or SQLite databases, and tools like IEF.  But there are other potentially valuable data sources available to us, such as cookie files, bookmarks/favorites, and session recovery files.

If the user is using IE and you're interested in their activity during a specific point in time, you may have options available to you to get the information you're looking for.  For example, the TypedURLs key (and TypedURLsTime key, if they're using Windows 8) may prove fruitful, particularly when used in conjunction with VSCs.  If IE crashed (for whatever reason) while the user was browsing the web, you'll have the Travelog files available, and these can provide much more insight into what the user was doing than an index.dat record would.

The IE session restore files are structured storage/OLE format, and Yogesh has an EnScript available for parsing them.  I've used strings to get the data that I want, and MiTeC's Structured Storage Viewer to view the contents of individual streams within the file.  Python has a good module for parsing OLE files (I really haven't found anything that works as well in Perl, and have written some of my own stuff), and it shouldn't take too much effort to put a parser together for these files.  What's really fascinating about these files is that within a timeline, you may see where the user launched IE (UserAssist data), accessed a particular site (TypedURLs key, index.dat data), but at that point, you really can't tell too much about what they did, or what sort of interaction they had with the page, or pages, that they visited.  If you're lucky and there's a session saved in a Travelog file, then you can see what they were doing at the time of the crash.  I've seen commands sent to database servers via default stored procedures.  So, these files can be a rich source of data.

For other browsers, here's information on session restore functionality:
Chrome User Data Directory  (here's a tip for restoring the last session from the command line)
Firefox - Mozilla Session Restore

Summary
My point in all this is that while in most cases we really want to see all of the data, there are times when we either don't need everything, or as is often the case, everything simply isn't available and we have to make the best use of what we have.  For example, if I simply want to see when a user was active on a system, over time, I wouldn't need everything from the system, and I wouldn't need everything from the user profile.  All I'd need to get started are the two Registry hives, browser history files, and maybe the Jump Lists.  The total size of this data is much less than the full image, and it's even smaller if I can get someone on site to run the tools and just send me the data.