Monday, August 20, 2012

SetRegTime

I like good stuff, interesting stuff.  I particularly like stuff that gets me thinking, specifically about validating my analysis process.

I ran across SetRegTime today, from Joakim Schicht.  Basically, Joakim started from the position that, based on his reading on the Internet, modification of Registry key LastWrite times to arbitrary values was "not possible".  So he set out to turn that line of thinking around, achieved it, and released a proof-of-concept tool to demonstrate the capability.

In my courses, I have specifically stated that I was not aware of any open, public APIs (such as the GetFileTime()/SetFileTime() functions) that allow for arbitrary modification of Registry key LastWrite times.  Now, thanks to Joakim, we all are aware of one. 

I greatly applaud and appreciate Joakim's efforts in producing and releasing SetRegTime, as it:

1.  Identifies the public API and increases the possibility (albeit not the likelihood) of this occurring.
2.  Illustrates the need for an overall analysis process.
3.  Illustrates the need for a greater understanding of the Registry as an investigative resource.
4.  Illustrates more than ever the need for timeline analysis.

So, the big question for most analysts will likely be...okay, so what does this do to my examinations?  I'm sure that the thought will be that it throws an additional level of uncertainty into the exams, but I would suggest that if you have an analysis process, then this won't be the case at all.  With an analysis process, you will likely find indications of this sort of activity occurring, particularly if you are using timeline analysis. 

In addition, when performing malware analysis, you would want to look for the use of the APIs that Joakim mentions (i.e., NtCreateKey, NtOpenKey, NtSetInformationKey, NtFlushKey), as well as the use of the Windows internal names for the Registry (i.e., paths beginning with \Registry\Machine or \Registry\User).  Behavioral analysis of the malware will likely illustrate this activity, as well.
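
As a very rough first pass during static analysis, you can simply scan a suspect binary for those API names and the kernel-style Registry path prefixes.  The Perl sketch below does nothing more than that...keep in mind that a plain string scan is easily defeated if the names are obfuscated or resolved at runtime, so treat a "not found" as inconclusive rather than as a clean bill of health:

#!/usr/bin/perl
# Minimal sketch: scan a binary for the Nt* Registry API names and the
# kernel-style Registry path prefixes mentioned above.  A plain string
# scan is crude, but it's a quick first pass during static review.
use strict;
use warnings;

my $file = shift or die "Usage: $0 <binary>\n";
open(my $fh, '<:raw', $file) or die "Cannot open $file: $!\n";
my $data = do { local $/; <$fh> };
close($fh);

my @indicators = ('NtCreateKey', 'NtOpenKey', 'NtSetInformationKey',
                  'NtFlushKey', '\\Registry\\Machine', '\\Registry\\User');

foreach my $ind (@indicators) {
    my $found = index($data, $ind) >= 0 ? "FOUND" : "not found";
    printf "%-25s : %s\n", $ind, $found;
}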

So, if you're "poking around" in the Registry and find something interesting, and rely on that one artifact or finding as the foundation for your case, you're likely going to be building a house of cards.  However, if you have an overall analysis process that incorporates multiple data sources and multiple artifacts to support your conclusions, then you're likely going to pick up on the use of this sort of software, and be able to address it accordingly.

Wednesday, August 15, 2012

ShellBag Analysis

What are "shellbags"?
To get an understanding of what "shellbags" are, I'd suggest that you start by reading Chad Tilbury's excellent SANS Forensic blog post on the topic.  I'm not going to try to steal Chad's thunder...he does a great job of explaining what these artifacts are, so there's really no sense in rehashing everything.

Discussion of this topic goes back well before Chad's post, with this DFRWS 2009 paper.  Before that, John McCash talked about ShellBag Registry Forensics on the SANS Forensics blog.  Even Microsoft mentions the keys in question in KB 813711.

Without going into a lot of detail, a user's shell window preferences are maintained in the Registry, and the hive and keys being used to record these preferences will depend upon the version of the Windows operating system.  Microsoft wants the user to have a great experience while using the operating system and applications, right?  If a user opens up a window on the Desktop and repositions and resizes that window, how annoying would it be to shut the system down, and have to come back the next day and have to do it all over again?  Because this information is recorded in the Registry, it is available to analysts who can parse and interpret the data. As such, "ShellBags" is sort of a colloquial term used to refer to a specific area of Registry analysis.

Tools such as Registry Decoder, TZWorks sbag, and RegRipper are capable of decoding and presenting the information available in the ShellBags.
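
If you want to see where this data lives for yourself, the following standalone Perl sketch (not a RegRipper plugin) uses Parse::Win32Registry to walk the BagMRU key in a Vista/Win7 USRCLASS.DAT hive, listing each key's LastWrite time and the number of shell item values beneath it.  The actual decoding of the shell item data is what the tools above do for you; this only shows where the data lives:

#!/usr/bin/perl
# Minimal sketch: walk the BagMRU key in a Vista/Win7 USRCLASS.DAT hive
# and list each subkey's LastWrite time and how many numbered shell item
# values sit beneath it.
use strict;
use warnings;
use Parse::Win32Registry qw(iso8601);

my $hive = shift or die "Usage: $0 <USRCLASS.DAT>\n";
my $reg  = Parse::Win32Registry->new($hive) or die "Not a hive file?\n";
my $root = $reg->get_root_key;

my $path   = 'Local Settings\\Software\\Microsoft\\Windows\\Shell\\BagMRU';
my $bagmru = $root->get_subkey($path) or die "BagMRU key not found\n";

walk($bagmru, '');

sub walk {
    my ($key, $prefix) = @_;
    # Numbered values (0, 1, 2, ...) hold the shell item data
    my @items = grep { $_->get_name =~ /^\d+$/ } $key->get_list_of_values;
    printf "%-40s %s  (%d items)\n",
        $prefix || '(root)', iso8601($key->get_timestamp), scalar(@items);
    walk($_, $prefix . '\\' . $_->get_name) for $key->get_list_of_subkeys;
}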

How can ShellBags help an investigation?
I think that one of the biggest issues with ShellBags analysis is that, much like other lines of analysis that involve the Windows Registry, they're poorly understood, and as such, underutilized.  Artifacts like the ShellBags can be very beneficial to an examiner, depending upon the type of examination they're conducting.  Much like the analysis of other Windows artifacts, ShellBags can demonstrate a user's access to resources, often well after that resource is no longer available.  ShellBag analysis can demonstrate access to folders, files, external storage devices, and network resources.  Under the appropriate conditions, the user's access to these resources will be recorded and persist well after the accessed resource has been deleted, or is no longer accessible via the system.

If an organization has an acceptable use policy, ShellBags data may demonstrate violations of that policy, by illustrating access to file paths with questionable names, such as what may be available via a thumb drive or DVD.  Or, it may be a violation of acceptable use policies to access another employee's computer without their consent, such as:

Desktop\My Network Places\user-PC\\\user-PC\Users

...or to access other systems, such as:

Desktop\My Network Places\192.168.23.6\\\192.168.23.6\c$

Further, because of how .zip files are handled by default on Windows systems, ShellBag analysis can illustrate that a user not only had a zipped archive on their system, but that they opened it and viewed subfolders within the archive.

This is what it looks like when I accessed Zip file subfolders on my system:

Desktop\Users\AppData\Local\Temp\RR.zip\DVD\RegRipper\DVD

Access to devices will also be recorded in these Registry keys, including access to specific resources on those devices.

For example, from the ShellBags data available on my own system I was able to see where I'd accessed an Android system:


Desktop\My Computer\F:\Android\data


...as well as a digital camera...

Desktop\My Computer\Canon EOS DIGITAL REBEL XTi\CF\DCIM\334CANON

...and an iPod.

Desktop\My Computer\Harlan s iPod\Internal Storage\DCIM

Another aspect of ShellBags analysis that can be valuable to an examination is the analyst developing an understanding of the actual data structures, referred to as "shell item ID lists", used within the ShellBags.  It turns out that these data structures are not only used in other values within the Registry, but they're also used in other artifacts, such as Windows shortcut/LNK files, as well as within Jump List files.  Understanding and being able to recognize and parse these structures lets an analyst get the most out of the available data.
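
To illustrate, here's a minimal Perl sketch that does nothing more than split a raw shell item ID list into its component items; each item is prefixed with a 2-byte little-endian size (which includes the size field itself), and a 2-byte zero terminates the list.  Interpreting each item's contents (folders, files, URIs, devices) is where the real work...and the real value...lies:

#!/usr/bin/perl
# Minimal sketch: walk a shell item ID list, printing the offset, size,
# and type byte of each item.  Feed it the raw data of a BagMRU value,
# an LNK file's ID list, or a Jump List LNK stream's ID list.
use strict;
use warnings;

sub walk_idlist {
    my ($data) = @_;
    my $ofs = 0;
    while ($ofs + 2 <= length($data)) {
        my $size = unpack("v", substr($data, $ofs, 2));
        last if $size == 0;                       # 2-byte zero terminator
        my $type = unpack("C", substr($data, $ofs + 2, 1));
        printf "  item at ofs %4d, size %4d, type 0x%02x\n", $ofs, $size, $type;
        $ofs += $size;
    }
}

my $file = shift or die "Usage: $0 <raw_idlist_file>\n";
open(my $fh, '<:raw', $file) or die "Cannot open $file: $!\n";
my $raw = do { local $/; <$fh> };
close($fh);
walk_idlist($raw);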

Locating Possible Data Exfil/Infil Paths via ShellBags
As information regarding access to removable storage devices and network resources can be recorded in the ShellBags, this data may be used to demonstrate infiltration/infection or data exfiltration paths.

For example, one means of getting data off of a system is via FTP.  Many Windows users aren't aware that Windows has a command line FTP client, although some are; in my experience, it's more often intruders who are conversant in the use of the command line client.  One way that analysts look for the use of the FTP client (i.e., ftp.exe) is via Prefetch files, as well as via the MUICache Registry key.

However, another way to access FTP on a Windows system is via the Windows Explorer shell itself.  I've worked with a couple of organizations that used FTP for large file transfers and had us use this process rather than use the command line client.   A couple of sites provide simple instructions regarding how to use FTP via Windows Explorer:

MS FTP FAQ
HostGator: Using FTP via Windows Explorer

Here's what a URI entry looks like when parsed (obfuscated, of course):

Desktop\Explorer\ftp://www.site.com

One of the data types recorded within the ShellBags keys is a "URI", and this data structure includes an embedded time stamp, as well as the protocol (i.e., ftp, http, etc.) used in the communications.  The embedded time stamp appears (via timeline analysis) to correlate with when the attempt was made to connect to the FTP site.  If the connection is successful, you will likely find a corresponding entry for the site in the NTUSER.DAT hive, in the path:

HKCU\Software\Microsoft\FTP\Accounts\www.site.com
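
Checking for that key is straightforward; the following standalone Perl sketch (again, not a RegRipper plugin) lists any sites recorded beneath the FTP Accounts key in an NTUSER.DAT hive, along with each subkey's LastWrite time, which makes a nice corroborating data point for the ShellBags URI entries:

#!/usr/bin/perl
# Minimal sketch: list the sites recorded beneath the FTP Accounts key
# in a user's NTUSER.DAT hive, with each subkey's LastWrite time.
use strict;
use warnings;
use Parse::Win32Registry qw(iso8601);

my $hive = shift or die "Usage: $0 <NTUSER.DAT>\n";
my $root = Parse::Win32Registry->new($hive)->get_root_key;

my $key = $root->get_subkey('Software\\Microsoft\\FTP\\Accounts')
    or die "FTP\\Accounts key not found in this hive\n";

foreach my $site ($key->get_list_of_subkeys) {
    printf "%-30s  LastWrite: %s\n", $site->get_name, iso8601($site->get_timestamp);
}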

Much like access via the keyboard, remote access to the system that provides shell-based control, such as via RDP, will often facilitate the use of other graphical tools, including the use of the Windows Explorer shell for off-system communications.  ShellBag analysis may lead to some very interesting findings, not only with respect to what a user may have done, but also other resources an intruder may have accessed.

Summary
Like other Windows artifacts, ShellBags persist well after the accessed resources are no longer available.  Knowing how to parse and interpret this Registry data can add significant value to an examination.

Parsing ShellBags data can provide you with indications of access to external resources, potentially providing indications of one avenue of off-system communications.  If the concern is data infiltration ("how did that get here?"), you may find indications of access to an external resource, followed by indications of access to Zip file subfolders.  ShellBags can be used to demonstrate access to resources where no other indications are available (because they weren't recorded somewhere else, or because they were intentionally deleted), or they can be used in conjunction with other resources to build a stronger case.  Incorporation of ShellBag data into timelines for analysis can also provide some very valuable insight that might not otherwise be available to the analyst.

Resources
ForensicsWiki Shell Item page

Saturday, August 11, 2012

RegRipper Updates

In an effort to consolidate some of the information regarding RegRipper in one consistent location, I started the RegRipper Google Code site.  My hope is that this will provide a much more stable means for folks to find information regarding RegRipper.

Now, my setting up this site does not take anything at all away from the RegRipper Plugin Google Code site that Brett Shavers set up...in fact, these two sites will support each other.  The RegRipper plugin site will continue to be the location to get the plugin distributions, as this is something that folks are likely to do much more often, once they have the RegRipper suite of tools.  However, the point is to have just these two sites so that there is no confusion as to where you can go to get the latest and greatest information about RegRipper, or distribution of RegRipper plugins.

The RegRipper site will be one consolidated location where you can go to get information regarding RegRipper.  For example, I've started adding pages to the wiki, such as this one describing the plugin architecture.  The benefit of this is that the pages are right there...you don't have to search the blog, or via Google, to find what you're looking for.  Also, the information can be updated in place, so the latest version is always in that one location.

Speaking of plugins, Hal Pomeranz sent me two plugins recently...I took a look at them and forwarded them on to Brett, and you should be seeing them shortly on the plugin site.  Hats off to Hal for recognizing that something he saw could be put into a plugin, and for putting forth the time and effort to write the plugins, and share them with the rest of the community.

Also, I've uploaded two plugins to the RegRipper site.  One is appcompatcache.pl, which was developed from the information available in this Mandiant blog post.  Thanks to their great work, and the code that they provided, I was able to put together a RegRipper plugin to parse this same information.

Note: If you're going to use this plugin, and not use the EXE versions of the RegRipper tools (i.e., you're using the Perl scripts), be sure to update your copy of the Parse::Win32Registry module.  The easiest way to do this, if you're using ActiveState Perl, is to use PPM:

C:\perl>ppm update parse-win32registry

The other plugin I uploaded is shellbags.pl, which is in an archive with a readme file...puh-lease read that file if you have any questions at all about the plugin.  This plugin parses the Shell\BagMRU shell item information from the USRCLASS.DAT hives from Vista, Win2008R2, and Win7 systems.  I had only a limited amount of testing data available (thank you to all of you who sent me samples) when developing this plugin, and most of it was from Win7 systems.  I happened to find a couple of hives from Vista systems and one from a Win2008R2 system, so testing was extremely limited.  As such, it might have some hiccups on various data types, particularly ones I have not yet seen in testing.  If you run into any issues, contact me at the email address listed in the header of the plugin.

The output from this plugin is similar to what you see when you use TZWorks' sbag prototype utility.  One notable difference is that when printing the Registry key LastWrite times in the far left-hand column (I call it "MRU time"), I apply that time only to the first value identified in the MRUListEx value immediately beneath that key.  Many thanks to Dave and Jonathan for providing their excellent tool.
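
For those unfamiliar with MRUListEx values, here's a small Perl sketch of that logic: the value data is a series of 4-byte little-endian indices, most recently used first, terminated by 0xFFFFFFFF.  The key LastWrite time can reasonably be applied only to the value named by the first index...we know the order of the rest, but not when they were last used.  The sample data below is made up purely for the example:

#!/usr/bin/perl
# Minimal sketch of the "MRU time" logic: parse an MRUListEx value into
# its ordered list of indices and apply the key LastWrite time only to
# the first (most recently used) entry.
use strict;
use warnings;

sub parse_mrulistex {
    my ($data) = @_;
    my @order;
    foreach my $idx (unpack("V*", $data)) {
        last if $idx == 0xFFFFFFFF;    # list terminator
        push @order, $idx;
    }
    return @order;
}

# Made-up sample data: entries 2, 0, 1 in MRU order
my $data  = pack("V*", 2, 0, 1, 0xFFFFFFFF);
my @order = parse_mrulistex($data);
print "MRU order                  : @order\n";
print "LastWrite applies to value : $order[0]\n" if @order;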

In developing the shellbags.pl plugin, I had access to a great deal of useful information to get me started.  I've begun linking to most of that information on the ShellBags page in the RegRipper wiki, but I do want to thank Willi, Andrew, Joachim, and Kevin for their assistance in this endeavor.  Without the foundation that they'd provided, developing this plugin would have been much more difficult.  However, the effort put into developing the shellbags.pl plugin doesn't end here...the data structures used in the values beneath these keys are also used in other places throughout Windows systems, and not just in the Registry.  This work does mean, however, that I'll be able to finally finish the comdlg32.pl plugin updates.

In developing both of these plugins, I found some fascinating information.  After all, I was working at the binary level, dumping bits of the data to a hex editor-style format, and looking for patterns.  For example, when looking at some of the output of the shellbags.pl plugin during development stages, I found that the data within URI types contains an embedded time stamp...including this in a timeline, alongside information from the NTUSER.DAT hive from the same user profile, proved to be extremely illuminating.

Okay, so why would you care about viewing/analyzing this "shell bags" information?  As mentioned here (albeit four years ago...), this information can point you toward a user's access to external storage devices, including external drives, shares, and network resources.  I've seen that it can also include devices such as iPods and digital cameras.  Fifth Sentinel talks about these artifacts here, and Alissa Torres has talked a great deal about these artifacts at SANS events.  If your case involves determining a user's access to resources, this information can be extremely valuable, not only in and of itself, but also when included in or acting as a pivot or reference point during timeline analysis.  Combine this information with other data, including documents/files that the user accessed, removable devices connected to the system, Jump Lists, etc., and you can build a pretty interesting picture of user activity.  Another thing I really like about this artifact, as well as what's provided via the appcompatcache.pl plugin, is that the data persists well after the original resource is no longer available, and some of the data structures include embedded time stamps.

If you have any questions about the two plugins I uploaded, or about RegRipper in general, please feel free to contact me (keydet89 at yahoo dot com).

Tuesday, July 31, 2012

Links and Updates

Blogosphere
Corey Harrell has another valuable post up, this one walking through an actual root cause analysis.  I'm not going to steal Corey's thunder on this...the post or the underlying motive...but suffice it to say that performing a root cause analysis is critical, and it's something that can be done in an efficient manner.  There's more to come on this topic, but this sort of analysis needs to be done, it can be done effectively and efficiently, and if you don't do it, it will end up being much more expensive for you in the long run.

Jimmy Weg started blogging a bit ago, and in very short order has already posted several very valuable articles.  Jimmy's posts so far are very tutorial in nature, and provide a great deal of information regarding the analysis of Volume Shadow Copies.  He has a number of very informative posts available, including a very recent post on using X-Ways to cull evidence from VSCs.

Mari has started blogging, and her inaugural post sets the bar pretty high.  I mentioned in this blog post that blogging is a great way to get started with respect to sharing DFIR information, and that even an initial blog post can lead to further research.  Mari's post included enough information to begin writing another parser for the Bing search bar artifacts, if you're looking to write one that presents the data in a manner that's usable in your reporting format.

David Nides recently posted to his blog regarding what he discussed in his SANS Forensic Summit presentation.  David's post focuses exclusively on log2timeline as the sole means for creating timelines, as well as some of what he sees as the shortcomings with respect to analyzing the output of Kristinn's framework.  However, that's not what struck me about David's post...rather, what caught my attention was statements such as, "for the average DFIR professional who is not familiar with CLI".

Now, don't think that I'm on the side of the fence that feels that every DFIR "professional" must be well-versed in CLI tools.  Not at all.  I don't even think that it should be a requirement that DFIR folks be able to program.  However, I do see something...off...about a statement that includes the word "professional" along with "not familiar with CLI".

I've worked with several analysts throughout my time in the infosec field, and I've taught (and teach) a number of courses.  I have worked with analysts who have used *only* Linux, and done so at the command line...and I have engaged with paid professionals tasked with analysis work who are only able to use one commercial analysis framework.  So, while I am concerned by repeated statements in a post that seem to say, "...this doesn't work, because Homey don't play that...", I am also familiar with the reality of it.

Speaking of the SANS Forensic Summit, the Volatility blog has a new post up that is something of a different perspective on the event.  Sometimes it can be refreshing to get away from the distractions of the cartoons, and it's always a good idea to get different perspectives on events. 

Tools
The folks over at TZWorks have put together a number of new tools.  Their Jump List parser works for both the *.automaticDestinations-ms Jump Lists and the *.customDestinations-ms files, as well.  There's a Prefetch file parser, a USB storage parser, and a number of other very useful utilities, all freely available for a variety of platforms (Win32- and 64-bit, Linux, Mac OS X).

If you're not familiar with the TZWorks.net site, take a look and bookmark it.  If you're downloading the tools for use, be sure to read the license agreement.  Remember, if you're reporting on your analysis properly, you're identifying the tools (and the versions) that you used, and relying on these tools for commercial work may come back and bite you.

Andrew posted to the ForensicsArtifacts.com site recently regarding MS Office Trust Records, which appear to be generated when a user trusts content via MS Office 2010.  Andrew, co-creator of Registry Decoder, pointed out that Mark Woan's RegExtract parses this information, and shortly after reading his post, I wrote a RegRipper plugin to extract the information, and then created another version of that plugin to extract the data in TLN format.

This information is very valuable, as it is an indicator of explicit user activity...when opening a document from an untrusted source, the user must click the "Enable Editing" button that appears in the application in order to proceed with editing it.  Clearly, this requires some additional testing to determine the actions that cause this artifact to be populated, etc., but for now, it clearly demonstrates user access to resources (i.e., network drives, external drives, files, etc.).

In the limited testing that I've done so far, the time stamp associated with the data appears to be when the document was created on the system, not when the user clicked the "Enable Editing" button.  What I've done is downloaded a document (MS Word .docx) to my desktop via Chrome, recorded the date and time of the download, and then opened the file.  When the "Enable Editing" button is present in the warning ribbon at the top of the document, I will wait up to an hour (sometimes more) to click the button and record the time I did so.  Once I do, I generally close the document.  I then reboot the system and use FTK Imager to get a copy of the NTUSER.DAT hive, and run the plugin.  In every case so far, what I've seen is that the time stamps associated with the values in the key correlate to the creation time of the file, further evidenced by running "dir /tc".
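
For anyone who wants to poke at this data outside of RegRipper, here's a standalone Perl sketch that lists the TrustRecords values from a user's NTUSER.DAT hive.  The key path shown is for Word 2010 (Excel and PowerPoint have parallel keys), and treating the first eight bytes of each value's data as a FILETIME is an assumption based on my limited testing, so verify it against your own data:

#!/usr/bin/perl
# Sketch: list MS Office 2010 (Word) TrustRecords entries from an
# NTUSER.DAT hive.  Assumes the first 8 bytes of each value's data hold
# a FILETIME; verify against your own test data.
use strict;
use warnings;
use Parse::Win32Registry;
use POSIX qw(strftime);

my $hive = shift or die "Usage: $0 <NTUSER.DAT>\n";
my $root = Parse::Win32Registry->new($hive)->get_root_key;

my $path = 'Software\\Microsoft\\Office\\14.0\\Word\\Security\\Trusted Documents\\TrustRecords';
my $key  = $root->get_subkey($path) or die "TrustRecords key not found\n";

foreach my $val ($key->get_list_of_values) {
    my $data = $val->get_data;
    next unless defined $data && length($data) >= 8;
    my ($lo, $hi) = unpack("VV", $data);
    # Convert 64-bit FILETIME (100ns intervals since 1601) to Unix epoch
    my $epoch = int((($hi * 4294967296) + $lo) / 10000000) - 11644473600;
    printf "%s  %s\n", strftime("%Y-%m-%d %H:%M:%S Z", gmtime($epoch)), $val->get_name;
}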

Monday, July 30, 2012

Adding Value to Timelines

Timeline analysis is valuable to an analyst, in that a timeline of system events provides context, situational awareness, and an increased relative confidence in the data with which the analyst is engaged.

We can increase the value of a timeline by adding events to that timeline, but adding events for their own sake isn't what we're particularly interested in.  Timeline analysis is a form of data reduction, and adding events to our timeline for their own sake moves away from that premise.  What we want to do is add events of value, and we can do that in a couple of ways.

Categorizing Events
Individual events within a timeline, in and of themselves, can have little meaning, particularly if we're unfamiliar with those specific events.  We try to minimize the amount of information that's in an event, in order to get as many events as we can on our screen and within our field of vision, in order to get some context or situational awareness around that particular event.  As we see events over and over again, we develop something of an "expert" or "experience" recognition system in our minds...we recognize that some events, or groups of events, are most often associated with various system or user activities.  For example, we begin to recognize, through repetition and research, that one event (or a series of events) indicates that a USB device was connected to a system, or a program was installed, or that a user accessed a file with a particular program.  In our minds, we begin to group these events into categories.

Consider this...given the myriad of events listed in the Windows Event Log, particularly on Windows 7 and 2008 R2 systems, having the ability to map events to categories, based on event source and ID pairs, can be extremely valuable to an analyst.  An analyst can do the research regarding an event once, and then add the event source/ID pair to the event mapping file, along with an event category and a credible reference.  From that point on, the event mapping file gets used over and over again, automatically mapping event source/ID pairs to the category that the analyst identified.  If there's any question about the meaning or context of a particular event, the reference is right there in the event mapping file.

As an example of this event mapping, we may find through analysis and research that the event source WPD-ClassInstaller with the ID 24576 within the System Event Log refers to a successful driver installation, and as such, we might give this event a category ID of "[Driver Install]" for easy reference.  We might also then know to look for events with source UserPnp and IDs 20001 and 20003 in order to identify the USB device that was installed.  This event mapping also allows us to identify specific events of interest, events that we may want to focus on in our exams.
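
To make the idea a bit more concrete, here's an illustrative Perl sketch of what such an event mapping might look like.  The pipe-delimited format ("source/ID|category|reference") and the file contents are simply assumptions for the example, not a published standard:

#!/usr/bin/perl
# Illustrative sketch: load a pipe-delimited event mapping and tag an
# event source/ID pair with its category.
use strict;
use warnings;

my %map;
while (my $line = <DATA>) {
    chomp $line;
    next if $line =~ /^\s*(#|$)/;                 # skip comments and blanks
    my ($pair, $category, $ref) = split /\|/, $line, 3;
    $map{lc $pair} = { category => $category, ref => $ref };
}

# Return the category tag for a source/ID pair, or an empty string
sub tag_event {
    my ($source, $id) = @_;
    my $hit = $map{lc "$source/$id"};
    return $hit ? "[" . $hit->{category} . "]" : "";
}

printf "WPD-ClassInstaller/24576 -> %s\n", tag_event("WPD-ClassInstaller", 24576);
printf "UserPnp/20001            -> %s\n", tag_event("UserPnp", 20001);

__DATA__
# source/ID|category|reference
WPD-ClassInstaller/24576|Driver Install|System log; successful portable device driver install
UserPnp/20001|Driver Install|System log; pair with 20003 to identify the device installed
UserPnp/20003|Driver Install|System log; device install events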

We can then codify this "expert system" (perhaps a better term is an "experience system") by adding category IDs to events.  One benefit of this is quicker recognition: we're no longer relying on memory, but instead adding our experience to our timeline analysis process, thereby adding value to the end result.

Note: In the above paragraph, I am not referring to adding category information to an event after the timeline has been generated.  Instead, I am suggesting that category IDs be added to events, so that they "live" with the event.

Another benefit is that by sharing this "experience system" with others, we reduce their initial cost of entry into analyzing timelines, and increase the ability of the group to recognize patterns in the data. By adding the ability to recognize patterns to the group as a whole, we then provide a greater capability for processing the overall data.

Now, some events may fit into several categories at once.  For example, the simple fact that we have an *.automaticDestinations-ms Jump List on a Windows 7 or 2008 R2 system indicates an event category of "Program Execution"; after all, the file would not exist unless an application had been executed.  Depending upon which application was launched, other event categories may also apply to the various entries found within the DestList stream of the Jump List file.  For MS Word, the various entries refer to files that had been accessed; as such, each entry might fall within a "File Access" event category.  As Jump Lists are specific to a user, events extracted from the DestList stream or from the corresponding LNK streams within the Jump List file may also fall within a more general "User Activity" event category.

Incorporating Metadata
One of the things missing from the traditional approach to creating timelines is the incorporation of file metadata into the timeline itself.

Let's say that we run the TSK tool fls.exe against an image in order to get the file system metadata for files in a particular volume.  Now we have what amounts to the time stamps from the $STANDARD_INFORMATION attribute (yes, we're assuming NTFS) within the MFT.  This is clearly useful, but depending upon our goals, we can potentially make this even more useful by accessing each of the files themselves and (a) determining what metadata may be available, and (b) providing the results of filtering that metadata.

Here's an example...let's say that you're analyzing a system thought to have been infected with malware of some kind, and you've already run an AV scan or two and not found anything conclusive.  What are some of the things that you could look for beyond simply running an AV scan (or two)?  If there are multiple items that you'd look for, what's the likelihood that you'll remember all of those items, for every single case?  How long does it take you to walk through your checklist by hand, assuming you have one?  Let's take just one potential step in that checklist...say, scanning a user's temporary directory.  You open the image in FTK Imager, navigate in the tree view to the appropriate directory, and you see that the user has a lot of files in their temp directory, all with .tmp extensions.  So you start accessing each file via the FTK Imager hex view and you see that some of these files appear to be executable files.  Ah, interesting.  Wouldn't it be nice to have that information in your timeline, to have something that says, "hey, this file with the .tmp extension is really an executable file!"
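
As a simple example of automating that one check, the Perl sketch below walks a directory exported from the image (say, the user's Temp folder pulled out with FTK Imager) and flags any file whose first two bytes are "MZ", regardless of extension.  The output here is just a flat list; in practice, you'd emit it in whatever format your timeline uses (TLN, bodyfile, etc.):

#!/usr/bin/perl
# Minimal sketch: walk a directory exported from an image and flag any
# file that begins with the "MZ" executable signature, whatever its
# extension says.
use strict;
use warnings;
use File::Find;

my $dir = shift or die "Usage: $0 <exported_dir>\n";

find(sub {
    return unless -f $_;
    open(my $fh, '<:raw', $_) or return;
    my $read = read($fh, my $sig, 2);
    close($fh);
    print "Executable content: $File::Find::name\n"
        if defined $read && $read == 2 && $sig eq 'MZ';
}, $dir);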

Let's say you pick a couple of those files at random, export them, and after analysis using some of your favorite tools, determine that some of them are packed or obfuscated in some way.  Wouldn't it be really cool to have this kind of information in your timeline in some way, particularly within the context that you're using for your analysis? 

For an example of why examining the user's temporary folder might be important, take a look at Corey Harrell's latest Malware Root Cause Analysis blog post.

Benefits
Some benefits of adding these two practices to our timeline creation and analysis process are that we automate the collection and presentation of low-hanging fruit, increase the efficiency with which we do so, and reduce the potential for mistakes (forgetting things, following the wrong path to various resources, etc.).  As such, root cause analysis becomes something that we no longer have to forego because "it takes too long".  We can achieve that "bare metal analysis".

Summary
When creating timelines, we want to look at adding value, not volume (particularly not for volume's sake).  Yes, there is something to be said regarding the value of seeing as much activity as possible that is related to an event, particularly when external sources of information regarding certain aspects of an event may fall short in their descriptions and technical details.  Having all of the possible information may allow you to find a unique artifact that will allow you to better monitor for future activity, to find indications of the incident across your enterprise, or to increase the value of the intelligence you share with other organizations.


Tuesday, July 17, 2012

Thoughts on RegRipper Support

One of the things I've considered recently is taking a more active role in supporting RegRipper, particularly when it comes to plugins.

When I first released RegRipper in 2009 or so, my hope was that it would be completely supported by the community.  For a while there, we saw some very interesting things being done, such as RegRipper being added as a viewer to EnCase.  Over the past year or so, largely thanks to the wonderful and greatly appreciated support of folks like Brett Shavers and Corey Harrell, RegRipper has sort of taken off.

From the beginning, I think that the message about RegRipper has been largely garbled, confused, or simply misunderstood.  As such, I'd like to change that.

When someone has wanted a plugin created or modified, I've only ever asked for two things...a concise description of what you're looking for, and a sample hive.  Now, over the past 3 or so years, I've received requests for plugins, accompanied by either a refusal to provide a sample hive, or the sample hive simply being absent.  For those of you who have provided a sample hive, you know that I have treated that information in the strictest confidence and wiped the hive or hives after usage.  In addition, rather than being subjected to a barrage of emails to get more information about what you are looking for, those of you who have provided sample hives have also received the requested plugin in very short order, often as quickly as within the hour.

One of the things I've tried to do is be responsive to the community regarding needs.  For example, I provided a means for listing available plugins as part of the CLI component of RegRipper (i.e., rip.pl/.exe).  As this is CLI, some folks wanted a GUI method for doing the same thing, so I wrote the Plugin Browser.  Even so, to this day, I get questions about the available plugins; I was recently asked if two plugins were available, one that was originally written almost 3 years ago, and one that I'd written two months ago.

I'm not trying to call anyone out, but what I would like to know is, what is a better means for getting information out there and in the hands of those folks using RegRipper?

Recently, some confusion in the RegRipper message became very apparent to me, when word was shared across the community that another Perl script I had released was a RegRipper plugin.  It turned out that, in fact, that script had nothing whatsoever to do with either RegRipper or the Registry.

Speaking of plugins, there are a number of folks who've taken it upon themselves to write RegRipper plugins of their own, and share them with the public, and for that, I salute you.  Would it be useful to have a testing and review mechanism, or at least identify the state (testing, dev, beta, final) of plugins?

Finally, I've written a good number of plugins myself that I haven't yet provided to the general public.  I have provided many of those plugins to a couple of folks within the community who I knew would use them (and have), and provide feedback.  In some cases, I haven't released the plugins because of the amount of confusion there seems to be with regards to what a plugin is and how it's used by RegRipper; i.e., as it's currently written, you can't just drop a plugin in the RegRipper plugins directory and have it run by RegRipper (or via rip.pl/.exe).  Some effort is required on the part of the analyst to include plugins in a profile in order to have them run by RegRipper.

As such, I've considered becoming more active in getting the message about RegRipper out to the DFIR community as a whole, and I'd like to know, from the folks who use RegRipper, how would we/I do a better job with RegRipper, as well as in supporting it?

Publishing DFIR Materials, pt II

After posting on this topic previously and getting several comments, along with comments via other venues, I wanted to follow up with some further thoughts on opportunities for publishing DFIR materials.

When it comes to publishing DFIR materials, there are a number of ways to publish or provide information to the community, from tweets and commenting on blog posts, all the way up to writing books.  As it turns out, this may be a good way to define the spectrum for publishing DFIR materials...starting with blog posts, and progressing through a number of media formats to book publishing.

Based on some previous thought and comments, I wanted to share some of the different publishing mechanisms within that spectrum, as well as what might be pros and cons of each of them.

Blogging
Blogging is a great way to make DFIR information (both technical and non-technical) available to the community.  It can be quick, with some bloggers posting within minutes of an event or of finding information.

One of the best examples of DFIR blogging that I've seen, in which the content is consistently excellent, is Corey Harrell's Journey Into IR blog.  Corey's posts are consistently well-written, insightful, and chock full of great technical information.  Corey has taken the opportunity a number of times to post not only his research set-up, but also to provide a comprehensive write-up regarding the tools and techniques he used, as well as his findings.

Pros
Blogging is a great way to get information out on a particular topic, particularly if your goal is to get the initial information out to show the results of a tool or initial research.  This is a great way to see if anyone else is interested in what you're working on, either investigating or developing, and to see if there's interest in taking the research further.

This is also a great way to quickly put out information regarding findings, particularly if it requires more than 140 characters, but isn't too voluminous or extensive.

Cons
There can be a number of 'cons' associated with this publishing mechanism...one of which is the simple fact that it can be very difficult to keep up with all of the possible blogs that are out there.  Also, with blogs, it can be difficult to see the progression of information as (or 'if') it is updated.

Call me a spelling n@zi if you like, but another issue that I see with blogs is that pretty much anyone can create one, and if the author has little interest in such things as grammar and spelling, it can be difficult to read, if not find, such blogs.  If you're searching for something via Google as part of your research, and someone posted a great blog post but opted to not spell certain terms properly, or opted to not use the generally accepted terminology (i.e., Registry key vs. value), you might have some difficulty finding that post.

If you're interested in purely technical DFIR information, blogs may or may not be the best resource.  Some authors do not feel the need to research their information or provide things such as references, and some may not provide solely technical information via their blog, using it also as a personal diary or political platform.  There's nothing wrong with this, mind you...it's just that it may be difficult to find something if it's mixed in with a lot of other stuff.

Some blogs and blog posts provide nothing more than a list of links, with no insight or commentary from the author.  While this method of blogging can provide the information to a wider audience than would normally view the original blog post, it really doesn't do much to further the community as a whole.  If someone posts about finding and using a tool, and feels as if they want to post a blog of their own, why not provide some insight into how you found the tool to be useful, or not, if that's the case?  What if it's not a tool, but information...wouldn't it be useful to others within the community if you provided similar insight into how you used or validated that information?

Wiki
I like wikis, as they can provide a valuable means for maintaining updated, accurate information, particularly on very technical subjects.  Most of the formats I've seen include the ability to add references and links to supporting information, which add credibility to the information being provided.  Blogs provide this as well, but a wiki allows you to edit the information, providing the latest and most up-to-date information in one location.

Pros
Wikis can be extremely beneficial resources, in that they can provide a single, updated repository of information on DFIR topics.

Perhaps the best use of a wiki is as an internal resource, one in which members of your team are the only ones who can access it and update it.

Cons
One of the primary cons I would associate with the use of Wikis is that a lot of folks don't seem to use them. One of the wikis I frequent is the ForensicsWiki; while I find this to be a valuable resource and have even posted information there myself, my experience within the public lists and forums is that most folks don't seem to consider going to sites such as this, or using them as a resource.  I know that schools and publishers, including my own, frown upon the use of wikis as references, but if the information is accurate (which you've determined through research and testing), what's really the issue?

PDFs
After I got involved in writing books, I started to see the value of providing up-to-date documents on specific analysis topics.  Rather than writing a book, take a single analysis technique (say, file extension analysis), or a series of steps to perform a specific type of analysis (i.e., determining CD burning by a user, etc.), write it up into a 6-10 page PDF document and release it.

To see an example of this publishing mechanism, go to my Google Code book repository, download the "RR.zip" archive, and look in the DVD\stuff subdirectory.  You'll find a number of PDF documents that I'd written up and provided as "bonus" material with the book.  Since releasing this information, I haven't heard from anyone how useful it is, or if it's completely worthless.  ;-)

Another excellent example of this sort of publishing is a newsletter such as the Into The Boxes e-zine.  It's unfortunate that the community support wasn't there to keep Don's efforts going.  Another excellent example of using this mechanism to publish DFIR information is the DFIR poster that the SANS folks made available recently.

Pros
This mechanism can be extremely valuable to analysts in a number of areas.  While I was on the IBM team, we wanted to have a way to provide analysts with information that they could download and take with them when they were headed out on a response engagement.  This was "just-in-time" familiarization and/or training that could get an analyst up to speed on a particular topic quickly, and could also be used as an on-site reference.  Our thinking was that if we had someone who had to go on-site in order to acquire a MacBook, or a boot-from-SAN device, or try to conduct a live acquisition of a system that has only USB 1.0 connections, we could provide extremely useful reference information so that the analyst could act with confidence, which is paramount when you're in front of a customer.  Many times, while we had other analysts who were just a phone call away, we would find ourselves either in a data center with no cell phone signal, or standing directly in front of a customer.

I talked to several LE analysts about this type of JIT training, and received some enthusiastic responses at the time.  Having 6-10 page PDFs that can be printed out and included in a binder, with updated PDFs replacing older information, was seen as very valuable.  I know that some folks have also expressed a desire to have something easily searchable.

Cons
This publishing mechanism depends on the expertise of the individual author, and their willingness to not only provide the information, but keep it up to date.  If this is something that someone just decides to do, then you have similar issues as with blogging...spelling, grammar, completeness and accuracy of information.  One way around this is to have a group available, either through volunteers or a publisher, that provides for reviews of submitted material, checking for clarity, consistency, and accuracy.

IOCs/Plugins
IOCs, or "indicators of compromise", should be included as a publishing mechanism, as it is intended for sharing information and/or intelligence, albeit following a specific structure or specification.  Perhaps the most notable effort along these lines is OpenIOC.org, which uses a schema developed by the folks at Mandiant.  The OpenIOC framework is intended to provide an extendable, customizable structure for sharing sophisticated indicators and intelligence in order to promote advanced threat detection.

I would also include plugins in this category, particularly (although not specifically) those associated with RegRipper.  I know that other tools have taken up an approach similar to RegRipper's plugins, and this is a good place to include them, even if they don't follow as structured a format as IOCs.

Pros
IOCs and plugins can be a great publishing mechanism, providing for the retention of corporate knowledge, as well as being a force multiplier.  Let's say someone finds something after 8 or 12 hrs of analysis, something that they hadn't seen before...then they write up an IOC or plugin, and share it with their 10 other team members.  With a few minutes of time, they've just saved their team at least 10 x 12 hrs, or 120 hrs of work, where each team member (assuming equal skill level across the team) would have had to spend 12 hrs of their own time to find that same indicator.  Now, each team member has 100% of the knowledge and capability for locating the indicator, while having to spend 0 time in attaining that knowledge.

IOCs and plugins put tools and capabilities in the hands of the analysts who need them, and using the appropriate mechanism for querying for the indicators provides for those indicators to be searched for every time, in an automated manner.

Cons
One 'con' I have seen so far with respect to IOCs is that there is either a limitation within the schema, or a self-imposed (by the IOC author) limitation of some kind.  What I mean by this is, I've seen several malware-specific IOCs released online recently, and in some cases, there is no persistence mechanism listed within the IOC.  I contacted the author, and was told that while the particular malware sample used the ubiquitous Run key within the Windows Registry for persistence, the value name used could be defined by the malware author and was, in essence, completely random.  As such, the author found no easy means for codifying this information via the schema, and felt that the best thing to do was to simply leave it out.  To me, this seems like a self-imposed blind spot and a gap in potential intelligence.  I'm not familiar enough with the OpenIOC schema to know whether it provides a means for identifying this sort of information, but I do think that by leaving it out entirely, the author has created a significant blind spot.

Another 'con' associated with IOCs that I have heard others mention, particularly at the recent SANS Forensic Summit, is that no one is going to give away their "secret sauce", thereby giving up their competitive advantage.  I would go so far as to say that this applies to other publishing mechanisms, as well, and is not a 'con' that is specific solely to IOCs.  Like most, I am fully aware that while some sites (i.e., blogs, etc.) may provide DFIR information, not all of it is necessarily cutting edge.  In fact, there are a number of sites where, when DFIR information does appear, it is understood to be 6 months or more old, for that particular provider, even though others may not have seen it before.

As with IOCs, RegRipper plugins can be difficult for folks to write, or write correctly, on their own.  This can be particularly true if the potential author is new to either programming or to the response and analysis techniques that generally go hand-in-hand with, or precede, the ability to write IOCs and plugins.

Short Form
I recently had a discussion with a member of the Syngress publishing staff regarding a "new" publishing format that the publishing company is pursuing; specifically, rather than having authors write an entire book, have them instead write a "module", which is not so much a part or portion of a book, but more of a standalone publishing mechanism.  The idea with this "short form" of publishing is that DFIR information will be available to the community quicker, as the short form is easier for the author to write, and for the publisher to review and publish.

A very good example of short form publishing is the IACIS Quick Reference from Lock and Code, which is an excellent reference, and available in both a free and a for-fee form.

In a lot of ways, this is very similar to the PDF publishing mechanism I mentioned earlier, albeit this mechanism can reach over 100 pages; while it's longer than a PDF, it is still shorter than a complete book.

Pros
Benefits of this publishing mechanism are that the information is more complete than a blog post or PDF, is reviewed by someone for technical accuracy, as well as spelling and grammar, is formatted, and is available quicker than a full book.

Another benefit of this mechanism is that folks can pick the modules that they're interested in, rather than purchasing a full book of 8 or more chapters, when they're only interested in about half of the content.  Hopefully, this will also mean that folks who are interested in several modules and want a hard-copy version of the material can choose the modules that they want and have them printed to a soft-bound edition.

Cons
Even the short form publishing mechanism can take time to make it "to market" and be available to the community.  For example, in my experience, it can take quite a while for someone to write something that is 100 pages long, even if they are experienced writers.  Let's say that the author is focused, has some good guidance and motivation, and gets something through the authoring, review, and revision process in 90 days. How long will it take to then have that information available to the public?  At this point, everything is dependent upon the publisher's schedule...who is available to review the module and get it into a printer-ready format?  What about contracts with printers?  Will an electronic version of the module be ready sooner than the hard-copy version?

Books
Books are great resources for DFIR information, whether the author is going through a publishing company, or following a self-publishing route.

Pros
One of the biggest 'pros' of publishing books containing DFIR material is that a publishing company has a structure already set up for publishing books, which includes having the book technically reviewed by someone known within the field, as well as reviewed for consistency, grammar, spelling, etc., prior to being sent to the printer.

Writing a book can be an arduous undertaking, and keeping track of everything that goes into it...paragraph formats, side bars, code listings, figures, etc...can be a daunting task.  Working with a publisher means that you have a signed contract and schedule to meet, which can act as the "hot poker motivation" that is often needed to get an author to sit down and start writing.  As chapters are written, they're sent off to someone to perform a technical review, which can be very beneficial because the author may lose sight of the forest for the trees, and having someone who's not so much "in the weeds" review the material is a great way to keep you on track.  Finally, having someone review the finished product for grammar and spelling, and catching all of those little places where you put in the wrong word or left one out can be very helpful.  Overall, this structure adds credibility to the finished product.

Cons
Publishing a book can take some time.  My first book literally took me a year to write, and from there 3 - 3 1/2 months to go from an MS Word manuscript to the PDF proofs to a published book available for purchase.  Due to the amount of time and effort it takes, some authors who start down the road of writing a book and even get to the point of having a signed contract never even get to the point where they have a published book.  As I've progressed along in writing books, I've been able to reduce the total amount of time between the signed contract and the publication date, but the fact is that it can still take a year or more.

Another aspect of the book form is that different publishers may support different ebook formats.  When my first book was published with Syngress, there was a PDF version of the book available, and for a while after the soft-bound book was available, those purchasing the book via Syngress would also receive a PDF copy of the book, as well.  However, shortly thereafter, Syngress was purchased by Elsevier, a publishing company that does not support the PDF ebook format.

One of the benefits for some folks, believe it or not, of working with a publisher is that they have a schedule and the 'hot poker motivation' to get the work done.  As such, one of the detriments of self-publishing is that without the necessary internal stimulus to keep the author to a schedule, the finished product may never materialize.

Overall Pros
Publishing DFIR information can potentially make us all stronger examiners.  No single analyst knows everything that there is to know about DFIR analysis, but working together and sharing information and intel, we can all be much stronger analysts.  It's the sharing of not only information and intelligence, but digesting that information, providing insights on it, and engaging in discussions of it that makes us all stronger analysts.

Some information changes quickly, while other information remains pretty consistent over a considerable length of time.  Choosing the appropriate publishing mechanism can make the appropriate information available in a timely manner; for example, a blog post can raise awareness of an issue or indicator, which can lead to more research and the creation of a tool, an IOC, or a plugin.

Overall Cons
All publishing mechanisms rely on the interest and desire of the author(s) to provide information, to research it, and to keep it up to date.  Sometimes due to work, life, or simply lack of interest, information isn't kept up to date.  However, the 'pro' that can come out of this, the 'silver lining' if you will, is that perhaps all that is needed is to provide that initial information on a topic, and someone else may pick it up.

Another significant 'con' associated with publishing DFIR material to the community in general is a lack of support and feedback from the community.  Well, to be honest, this is a 'con' only if it's something that you're looking for; I happen to be of a mind that no one knows everything, and no one of us is as smart as all of us. As such, I honestly believe that the way to improve overall is to provide insightful commentary and feedback to someone who has provided something to the community, be it a tool or utility, or something published using any of the above means.  If someone provides DFIR information, I try to take the time to let them, or the community as a whole, know what was useful about it, from my perspective.  Feedback from the community is what leads to improvement in and expansion of the material itself.  Not everyone has the same perspective on the cases that they work, nor of the information that they look at on any particular case.  You may have two examiners with the same system, but as they're from different organizations, the goals of their exam, as well as their individual perspectives, will be different.

Goals
Finally, something needs to be said about the goals of publishing DFIR information; often, this is highly personal, in that an author's goals or reasons for publishing may be something that they do not want to discuss...and there's nothing wrong with that.

Usually, if someone is publishing DFIR information, it's because they wanted, first and foremost, to make the information available and contribute something to the community at large.  However, there can be other goals to publishing that motivate someone and direct them toward a particular publishing mechanism.  For example, writing blog posts that are of a consistently high quality (with respect to both the information and the presentation) will lead to that author becoming recognized as an expert in their field.  One follow-on to this that is often mentioned is that by being recognized as something of an expert, that author will be consulted for advice, or contracted with to perform work...I personally haven't encountered this, per se, but it is something that is mentioned by others.

Another goal for publishing information and choosing the appropriate mechanism is that the author(s) may want to be compensated for their time and work, and who can really blame them?  I mean, really...is it such a bad thing to, after sacrificing evenings and weekends to produce something that others find of value, want to take your wife to dinner?  How about to raise money to contribute to a charity?  Or to pay for the tools that you purchased in order to conduct your research, or tools you'll need to further that research?

Friday, June 29, 2012

SANS DFIR Summit Follow-up

First off, I want to thank Rob Lee for asking me to provide a keynote presentation to the 2012 SANS Forensic Summit (presentation slides are available here).  It was truly an honor, and once again, I was blessed to be in the presence of so many great speakers, and some of the brightest minds in the community.  Also, I have to give a heartfelt thanks to the wonderful SANS staff who made the entire conference possible.  Without your work and dedication, the summit wouldn't be the incredible resource that it is.

I attended a number of presentations while at the summit and I thought I'd share my thoughts and views about each of them, as well as the summit as a whole.  Hopefully, others will do the same.

Det. Cindy Murphy's keynote was well thought-out and very well received.  Cindy is a well-known figure within the DFIR community, and her presentation really addressed a lot of the aspects of sharing within the community that many of us have been talking about for some time.  One of the strengths of the community that Cindy mentioned was that we all have different perspectives, and we can use that fact to build up the community as a whole.  However, one of the weaknesses of the community is that we don't share those perspectives.

At the same time, I also think that Chris Pogue hit the nail on the head when he made his comment about sharing IOCs...no individual or organization is going to share their 'secret sauce'.  Within the community, there are a number of businesses, and the nature of a business is to make money.  If you're giving away your competitive advantage, you're not making money.  This is just one of the obstacles to sharing within the DFIR community, and hopefully by engaging more, we can discuss ways to overcome some of these obstacles.

Alissa Torres had an excellent point regarding not staying "in your lane" with respect to what you do.  I completely agree with her sentiment, and anyone who attended her presentation could clearly see how learning pen testing techniques from her co-workers has benefited her.


Nick Harbour of CrowdStrike gave an interesting talk on anti-forensics, covering some of the techniques that could be used, and I spent most of the presentation thinking to myself, "how would we detect that?"  For instance, one of the techniques Nick mentioned for communicating off of a system was to launch IE as a COM object, and send data out in that manner.  This is nothing new...I remember the Setiri presentation at BH 2002 discussing a similar approach.  But the fact is, it can still work to hide activity from a particular area of analysis.

I enjoyed seeing Chris Pogue back in action again with his Sniper Forensics 3: The Hunt presentation.  You can find previous iterations of Chris's presentations here and here.

Elizabeth Schweinsberg took an interesting approach in her presentation on Registry analysis - she crawled an AV website to collect data on reported Registry modifications made by various malware, and presented that data as a means for targeting your response and investigations.  This was very interesting, and in discussing it with her afterward, I think it would be a great idea to do the same with other AV vendors, as well.  I agree that this is not the ideal data set...after all, AV companies receive samples of malware out of context, and over the years some of the Registry artifacts associated with malware have been self-inflicted (that is, a result of how the AV analyst launches the malware).  But it's the best data we have available.  One of the interesting statistics that Elizabeth came up with was the continued widespread use of the Run key as a persistence mechanism.  Even after returning from the conference, I still see malware that uses this key for persistence.

In her presentation, Elizabeth also provided something of a showdown between GRR, RegRipper, and Registry Decoder, using various criteria.  In some ways, I think it was good to show the differences in the tools, but in others, I didn't follow the reasoning for holding tools up against criteria for which they were neither designed nor written.  After all, to say that tool X isn't scalable, when it wasn't written to be scalable, doesn't really give the tool a fair representation.  RegRipper was designed from the beginning to provide the ability for the community to write plugins, and one of the criteria it was measured against was the ability to extract data from the RunOnce key.  Elizabeth was correct in that the plugin did not exist when she was testing the tool, but is that really a "con" (as opposed to a "pro") for the tool?  Also, I found myself thinking on my flight home that if Elizabeth had contacted me during her testing, I could have provided that plugin or anything else she needed.  After I got home, I reached out to her and found out that she had, in fact, written her own module, but had not included that in the presentation.  I look forward to future conferences where both Elizabeth and other members of the Google team will be presenting.

The last presentation of the conference was from Carbon Black's CEO, Mike Viscuso.  In his presentation, Mike demonstrated the value of Cb without ever describing Cb in detail, and given the nature of the conference, I think it was very important that he not cross that line into a vendor presentation.  Instead, Mike clearly illustrated the need for the concept behind Cb, which is to redefine the data set that we, as incident responders, want access to when responding.  As a result, there were several excellent questions that came up, mostly from folks who had (understandably) never heard of nor looked at Cb.  The first comment during the Q&A session asked whether identifying a new 'data set' that DFIR folks would like to have available would put current IR folks out of work; to be honest, nothing could be further from the truth.  Cb, and tools like it, are changing the face of incident response, but not in a way that puts current IR staff out of work...rather, they require a change in the business model that is currently in use.  The emergency model of IR is not sustainable; this is true for the consulting company that provides the service, as well as for their customers.  Moving to a proactive, "security camera" approach does not remove the need for highly skilled responders; it simply changes the business model to one that is more advantageous, more sustainable, and much easier to manage from a budget perspective than the current model.  And this applies to both sides of the equation, the consultants as well as their customers.

A final thought about the presentations, and the summit in general...this is a great opportunity for folks who attend the conferences to really network and engage, particularly with the authors and presenters.  The SANS Summit is a small conference (when compared to others) and provides a fantastic opportunity, not just for networking in general, but also for attendees to engage in a direct and meaningful manner with the speakers.  It doesn't matter whether you've got questions or just really liked what you heard; you can walk up to the presenter and say something.  After all, if you're 20 feet from the presenter, why send an email or Tweet saying that you liked the presentation...why not just introduce yourself and say so directly?  The size of the summit really facilitates that kind of close, direct interaction.

Thursday, June 28, 2012

Publishing DFIR Materials


At the recent SANS DFIR Summit, Corey Harrell, Christopher Witter and I had a chance to chat with someone from Syngress Publishing, who proposed a new business model for DFIR materials to us, and I wanted to get a feel for how others felt about it.

Right now, for a new author, it can take a long time to get material out into a book format.  My first book took about a year to write, and then 3 1/2 months to get through printing.  Some authors don't get beyond the initial couple of chapters before walking away from the project.  Writing a book can be a daunting, and often overwhelming project, and even if it is finished, it can take a year or more before any of the information appears in the public.

The new model takes a different approach.  Instead of full books, authors would write "modules", 30 - 120 page packages that might be part of a book, but that stand alone in and of themselves.  If you've seen WFAT 3/e, you'll see that there are several chapters in the book that could be provided in this manner, perhaps with some additional work.  These modules would go through the same review process but, being shorter, would be available in a much quicker time frame.  Initially, they would be available in electronic format (hopefully at a reduced price).  This way, if you were waiting for WFAT 3/e to come out because you were interested in chapters 3 and 5, you wouldn't have to wait a full year or more for the materials.  Instead, you would have access to them much sooner, and then as other modules became available, you would be able to combine the modules into print material.

This model reduces the time in which material is available, reduces the cost-of-entry for the material, and takes a great deal of burden off of the author, as well.  Rather than being engaged in a project that is a year long, the author might be engaged for only 2 months at a time.  Technical reviews would be much quicker, as would the overall final review before going to "printing".  This model also allows for updates...if you purchase a module, there will be an update model available for you to get the latest and greatest version of the module.

From a topics perspective, look at it this way...take one chapter from WFAT 3/e, perhaps expand it a bit with some applicable screen captures or other applicable material, and consider that a module.

Given all of this, I wanted to get some feel from the community at large as to (a) how you feel about this approach, (b) what topics you might like to see covered, and (c) who might be interested in providing this material.  Feel free to comment here, or email me at keydet89 at yahoo dot com.

Saturday, June 23, 2012

When was a file accessed?

One of the aspects of Windows analysis that I discuss in the courses we're offering is that the version of Windows you're analyzing is significant.  For example, as of Windows Vista, updating of file system last accessed times as a result of normal user behavior is disabled by default.  However, even though we can't look to file access times as an indication of when a user accessed the files, there are a number of artifacts on Windows systems, in particular Windows 7, which will tell us not only that a user accessed a file (based on the context of those artifacts), but also when.  As such, we can add category IDs or tags (e.g., "[File Access]") to those events (something that I've discussed previously) in order to make them much easier to identify in timelines, as well as in other reporting formats.

I'll take a moment to discuss a few of the artifact sources we can use on Windows 7 systems that provide indications of file access...

LNK Files
One of the ways that LNK files are created on a system is that a user will double-click a file which is located somewhere on that system, on removable media, or even on a network share.  When this happens, a shortcut file that points to the target will be created in the user's Recent folder.  The operating system will select the appropriate application (based on the extension of the target file) with which to open the file.

As such, under "normal" circumstances, the creation date of the LNK file would correspond to when the target file was first accessed, and the last modification date of the LNK file would correspond to when the target file was most recently accessed. [Ref: Harry Parsonage's excellent "The Meaning of LIFE" white paper.]

Jump Lists
On Windows 7 systems, we now have new Task Bar artifacts called Jump Lists available for analysis.  The AutomaticDestinations Jump Lists are produced by activities very similar to those associated with LNK files, with the added advantage that the Jump Lists are associated with an application (based on the AppID), as well as with a user.

Let's say that the user accesses a Word .docx file by double-clicking it.  When this happens, an LNK file is created, and a Jump List associated with the version of MS Word installed on the system is created, if it doesn't already exist.  These Jump Lists are based on the MS Compound Document format; when the file is accessed, an entry containing an LNK stream is created within the Jump List file, and a structure is added to the DestList stream.  That structure includes the time of the activity, which can be used to illustrate the most recent time the user accessed that file.
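
Because the AutomaticDestinations Jump Lists follow the MS Compound Document (OLE) format, the streams can be enumerated with the olefile Python module.  This is just a minimal sketch to show where the DestList stream lives; the file name and path are hypothetical, and parsing the DestList structures themselves is beyond the scope of this example.

import olefile

# Hypothetical AutomaticDestinations Jump List exported from an image
jl = r"D:\case\export\Recent\AutomaticDestinations\1b4dd67f29cb1962.automaticDestinations-ms"

if olefile.isOleFile(jl):
    ole = olefile.OleFileIO(jl)
    # The numbered streams are essentially LNK streams; DestList holds the
    # structures that record the times of the user's activity
    for stream in ole.listdir():
        print("/".join(stream))
    if ole.exists("DestList"):
        print("DestList stream: %d bytes" % len(ole.openstream("DestList").read()))
    ole.close()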

As the LNK streams that point to the target file are not files themselves, they do not have MACB file system times specifically associated with each of them.  They do contain the MA.B times of the target file, embedded within the stream, as they follow the binary format specification described by MS. 

MRU Lists
There are a number of Registry keys (specifically within the user's NTUSER.DAT hive file) that maintain references to files that the user has accessed.  Some, such as the RecentDocs key, maintain only the names of files, while others, such as the Paint subkey beneath the user's Applets key (see the RegRipper applets.pl plugin), provide the full path to the file.  Many of these keys also maintain Most Recently Used (MRU) lists, meaning that the key's LastWrite time may reflect when the file at the top of the MRU list was most recently accessed.
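
As a quick illustration, the RecentDocs key and its LastWrite time can be pulled from an exported NTUSER.DAT hive with the python-registry module; the hive path is hypothetical, and the binary value data would still need to be decoded.

from Registry import Registry  # python-registry module

# Hypothetical path to an NTUSER.DAT hive exported from an image
reg = Registry.Registry(r"D:\case\export\Users\john\NTUSER.DAT")
key = reg.open("Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\RecentDocs")

# The key's LastWrite time may reflect when the most recently listed file was accessed
print("RecentDocs LastWrite: %s" % key.timestamp())
for value in key.values():
    # Each value (other than MRUListEx) contains a Unicode file name followed
    # by a shell item; decoding that data is left to a full parser or plugin
    print(value.name())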

Document Metadata
There are a number of file formats that allow for metadata to be stored within the file itself.  MS Office has long been known for providing a good deal of (potentially embarrassing) metadata.  While more recent formats of MS Office documents don't contain as much metadata as previous versions, we may still be able to use this information to provide indications of file access.
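
For example, the newer Office Open XML formats (.docx, .xlsx, etc.) are simply zipped archives, and the core document properties, including the created and modified times recorded by the application, live in docProps/core.xml.  A minimal sketch, with a hypothetical file path:

import zipfile
import xml.etree.ElementTree as ET

# Hypothetical path to a document recovered from an image
doc = r"D:\case\export\Users\john\Documents\report.docx"

with zipfile.ZipFile(doc) as z:
    core = z.read("docProps/core.xml")

# Elements include dc:creator, cp:lastModifiedBy, dcterms:created, dcterms:modified
for elem in ET.fromstring(core):
    print("%-70s %s" % (elem.tag, elem.text))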

VSCs
Let's not forget that previous versions of each of the artifacts we've discussed so far may be located within available Volume Shadow Copies; as such, we may want to take a targeted (perhaps even laser-focused) approach to parsing previous versions of each of these artifacts for comparative, historical data.

Summary
As you can see, even though the updating of last access times for files is disabled by default on Windows systems as of Vista, this doesn't mean that we can't determine when a user accessed particular files.

Wednesday, June 20, 2012

Training, and Learning

I finished up leading a Timeline Analysis Course on Tuesday afternoon, ending two days of some pretty intensive training.  One of the things I find when I'm putting presentations or courses together, and then actually giving the presentation, is that I very often end up learning a good deal along the way, and this time around was no different.  As has happened in the past, what I learn leads me to revisit and possibly even modify tools or analysis techniques, and again, this time was no different.

One of my biggest takeaways from the training is that I need to reconsider how at least some of the available time stamped data is presented in a timeline.  One of those items is how Prefetch file metadata is represented and displayed; I've since updated the parsing tool to address this particular item.  Another, and the one I'm going to discuss in this blog post, is how Windows shortcut (LNK file) information might be displayed in a timeline, or more specifically, what information about LNK files might possibly need to be presented in a timeline.

Category IDs
So, as a bit of background, I've been thinking quite a lot lately about how to better take advantage of timeline data.  As I was putting the timeline course together, it occurred to me that I was going to be spending a good deal of time describing to attendees how to create a timeline, and even walking them through this process with demonstrations and hands-on exercises, but spending very little time discussing how to actually analyze the timeline data.  The simplest answer is, it depends.  It depends on your exam goals, why you're performing the exam, and why you created a timeline in the first place.  It occurred to me that I was making an assumption that most analysts would have a good, solid justification for creating a timeline as part of their analysis process.  If that were the case every time, I wouldn't have folks signing up for a timeline analysis course, would I?  I'm not saying that analysts don't have a justification for creating a timeline, but sometimes that justification may be, "...that's what we always do...", or "...that's what I did last time."

Timeline analysis is something of a data reduction technique...we go from a 500GB hard drive or image, to somewhere around a GB or so of data, and we then arrange it based on a time value in the hopes of obtaining some context and increasing our relative confidence in the data that we're looking at; that's the goal, anyway.  But by grabbing just the data directly associated with a time value, we end up performing a great deal of data reduction.  Even so, we still need a means for directing our analysis, or getting the cream to rise to the top of the container.

Something I'd discussed in a previous blog post was the concept of categories for events.  Rob Lee has done some considerable work in this area already, providing a color-coded Excel macro that implements the category ID scheme he's identified via resources such as the SANS DFIR poster.  Regardless of the method used to identify event types or categories, the idea is to develop some method to assist the examiner in her analysis of the timeline.  After all, if you have something of an idea of what you're looking for, then finding it might be a bit easier if you classify various events by type or category, and then have some means to identify the events accordingly (via color, a tag or identifier, etc.).
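
Whichever method you choose, the tagging itself can be automated.  What follows is a minimal sketch of rule-based tagging in Python, assuming a five-field, pipe-delimited TLN events file (time|source|system|user|description); the keywords in the rules are purely illustrative, and you'd tune them to your own exam goals.

# Illustrative category rules; the keywords here are examples only
rules = {
    "[Program Execution]": ("UserAssist", "AppCompatCache", ".pf last run"),
    "[File Access]": ("RecentDocs", ".lnk", "DestList"),
}

with open("events.txt") as events:
    for line in events:
        fields = line.rstrip("\n").split("|")
        if len(fields) != 5:
            continue
        desc = fields[4]
        for tag, keywords in rules.items():
            if tag not in desc and any(k.lower() in desc.lower() for k in keywords):
                fields[4] = tag + " " + desc
                break
        print("|".join(fields))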

Shortcut/LNK Files
Speaking of categories, perhaps one of the most difficult artifacts to classify into a single category is Windows shortcut/LNK files.  Without getting into a long discussion about this, let's take a look at an example of what's available in an LNK file found in a user's Recent folder:

atime                         Tue May 15 21:11:59 2012                         
basepath                    C:\Users\                                        
birth_obj_id_node       08:00:27:dd:64:d1                                
birth_obj_id_seq         9270                                             
birth_obj_id_time        Tue May 15 21:09:27 2012                         
birth_vol_id                 2C645C57...13C2834AAD2                 
commonpathsuffix       john\Downloads\Autoruns.zip                      
ctime                           Tue May 15 21:11:59 2012                         
filesize                        535772                                           
machineID                  john-pc                                          
mtime                         Tue May 15 21:11:59 2012                         
netname                     \\JOHN-PC\Users                                  
new_obj_id_node        08:00:27:dd:64:d1                                
new_obj_id_seq          9270                                             
new_obj_id_time        Tue May 15 21:09:27 2012                         
new_vol_id                 2C645C57...13C2834AAD2                 
relativepath                ..\..\..\..\..\Downloads\Autoruns.zip            
vol_sn                        F405-DAC1                                        
vol_type                     Fixed Disk          

As you can see, we have a number of data elements available to us once we've decoded the binary contents of the LNK file, any of which (or any combination of which) may be relevant or significant to our analysis.  For example, as the LNK file was found in the user's Recent folder, we can assume that the existence of the file indicates some form of user activity; that is, the user must have done something, must have performed a specific action (such as double-clicking the file) that caused that LNK file to be created.

Next, we have the path to where the target file was located within the file system, as well as the MA.B times of the target file at the time that the shortcut was created. That might be significant to your analysis, as it demonstrates both knowledge of and access to a file, and will persist even after the target file is no longer available.

Had the LNK file been located on the user's Desktop and pointed to an EXE target file, this might illustrate specific actions taken by the user, such as installing an application.  This might also indicate program execution, rather than file access.

If the target file in the shortcut is a document or image, this might also illustrate program execution.  Launching the shortcut would cause the Windows system to reach into the Registry in order to determine which application is associated with the target file's extension.  For example, let's say a shortcut "points to" a .avi video file.  On some systems, launching a shortcut that points to a video file might cause Windows Media Player to be launched automatically; on other systems, it might be another application altogether.  Either way, the existence of an LNK file might also illustrate program execution or application launch.

Finally, we see that the last item visible in our example is called "vol_type", which refers to the type of volume where the target file was at the time of the activity.  In this case, the C:\ volume is a "Fixed Disk"; if it wasn't, would that be significant?  For example, if the volume type were "removable media" or a "network share", would that be significant to your exam?  In some cases, it could very well be, and we might look to that information for indications of access to or use of removable storage devices, or of network shares.

Perhaps the idea here isn't to classify LNK files into a single category, but instead filter the various data items found within LNK files based on a set of rules, and produce timeline events based on the output of each of the rules, where appropriate.  This might mean that for a single LNK file, we might end up with multiple events in our timeline.
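
A minimal sketch of what that rule-based approach might look like, assuming the LNK metadata has already been parsed into a dictionary similar to the field listing above; the rules, tags, and field names here are illustrative, not a fixed specification.

def lnk_events(lnk):
    # Apply simple rules to a parsed LNK dictionary and return zero or more
    # candidate timeline events
    events = []
    target = lnk.get("commonpathsuffix", "")
    if target.lower().endswith((".exe", ".bat", ".ps1")):
        events.append("[Program Execution] possible launch of %s" % target)
    else:
        events.append("[File Access] %s" % target)
    if lnk.get("vol_type", "").lower() != "fixed disk":
        events.append("[Removable/Remote] target was on a %s" % lnk.get("vol_type"))
    if lnk.get("netname"):
        events.append("[Network] share %s referenced" % lnk["netname"])
    return events

example = {"commonpathsuffix": "john\\Downloads\\Autoruns.zip",
           "vol_type": "Fixed Disk",
           "netname": "\\\\JOHN-PC\\Users"}
for event in lnk_events(example):
    print(event)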

Resources
ForensicsWiki LNK page

Thursday, June 14, 2012

Timeline Analysis, and Program Execution

I mentioned previously that I've been preparing for an upcoming Timeline Analysis course offered through my employer.  As part of that preparation, I've been using the tools to walk through the course materials, and in particular one of the hands-on exercises that we will be doing in the course.

One of the things I'd mentioned in my previous post is that Rob Lee has done a great deal of work for SANS, particularly in providing an Excel macro to add color-coding of different events to log2timeline output files.  I've had a number of conversations and exchanges with Corey Harrell and others (but mostly Corey) regarding event categorization, and the value of adding these categories to a timeline in order to facilitate analysis.  This can be particularly useful when working with Windows Event Log data, as there are a good number of events recorded by default, and all of that information can be confusing if you don't have a quick visual reference.

As I was running through the exercises, I noticed something very interesting in the timeline with respect to the use of the Autoruns tool from SysInternals; specifically, that there were a good number of artifacts associated with both the download and use of the tool.  I wanted to extract just those artifacts directly associated with Autoruns from the timeline events file, in order to demonstrate how a timeline can illustrate indications of program execution.  To do so, I ran the following command:

type events.txt | find "autoruns" /i > autoruns_events.txt

...and then to get my timeline...

parse -f autoruns_events.txt > autoruns_tln.txt

...and got the following:

Tue May 29 12:56:02 2012 Z
  FILE                       - ..C. [195166] C:/Windows/Prefetch/AUTORUNS.EXE-1CF578DD.pf
  FILE                       - ..C. [44056] C:/Windows/Prefetch/AUTORUNSC.EXE-C5802224.pf

Tue May 15 21:14:55 2012 Z
  REG      johns-pc         john - M... HKCU/Software/Sysinternals/AutoRuns
  REG      johns-pc         john - [Program Execution] Software\SysInternals\AutoRuns (EulaAccepted)

Tue May 15 21:14:07 2012 Z
  FILE                       - MA.B [195166] C:/Windows/Prefetch/AUTORUNS.EXE-1CF578DD.pf

Tue May 15 21:13:57 2012 Z
  PREF     johns-PC          - [Program Execution] AUTORUNS.EXE-1CF578DD.pf last run (1)
  REG      johns-pc         john - [Program Execution] UserAssist - C:\tools\autoruns.exe (1)

Tue May 15 21:13:53 2012 Z
  FILE                       - M.C. [640632] C:/tools/autoruns.exe
  FILE                       - M.C. [26] C:/tools/autoruns.exe:Zone.Identifier
  REG      johns-pc     - M... [Program Execution] AppCompatCache - C:\tools\autoruns.exe

Tue May 15 21:13:42 2012 Z
  FILE                       - MAC. [877] C:/Users/john/AppData/Roaming/Microsoft/Windows/Recent/Autoruns.lnk
  JumpList johns-pc         john - C:\Users\john\Downloads\Autoruns.zip

Tue May 15 21:13:32 2012 Z
  FILE                       - MA.B [44056] C:/Windows/Prefetch/AUTORUNSC.EXE-C5802224.pf

Tue May 15 21:13:28 2012 Z
  PREF     johns-PC          - [Program Execution] AUTORUNSC.EXE-C5802224.pf last run (1)
  REG      johns-pc         john - [Program Execution] UserAssist - C:\tools\autorunsc.exe (1)

Tue May 15 21:13:23 2012 Z
  FILE                       - M.C. [49648] C:/tools/autoruns.chm
  FILE                       - M.C. [26] C:/tools/autoruns.chm:Zone.Identifier
  FILE                       - M.C. [559736] C:/tools/autorunsc.exe
  FILE                       - M.C. [26] C:/tools/autorunsc.exe:Zone.Identifier
  REG      johns-pc     - M... [Program Execution] AppCompatCache - C:\tools\autorunsc.exe

Tue May 15 21:12:10 2012 Z
  FILE                       - ...B [877] C:/Users/john/AppData/Roaming/Microsoft/Windows/Recent/Autoruns.lnk
  FILE                       - ..C. [535772] C:/Users/john/Downloads/Autoruns.zip
  FILE                       - ..C. [26] C:/Users/john/Downloads/Autoruns.zip:Zone.Identifier

Tue May 15 21:11:59 2012 Z
  FILE                       - MA.B [535772] C:/Users/john/Downloads/Autoruns.zip
  FILE                       - MA.B [26] C:/Users/john/Downloads/Autoruns.zip:Zone.Identifier

Wed May  9 15:08:16 2012 Z
  FILE                       - .A.B [640632] C:/tools/autoruns.exe
  FILE                       - .A.B [26] C:/tools/autoruns.exe:Zone.Identifier
  FILE                       - .A.B [559736] C:/tools/autorunsc.exe
  FILE                       - .A.B [26] C:/tools/autorunsc.exe:Zone.Identifier

Sat Nov  5 17:52:32 2011 Z
  FILE                       - .A.B [49648] C:/tools/autoruns.chm
  FILE                       - .A.B [26] C:/tools/autoruns.chm:Zone.Identifier

What I find most interesting about this timeline excerpt is that it illustrates a good deal of interaction with respect to the download and launch of the tool within its ecosystem, clearly demonstrating Locard's Exchange Principle.  Now, there are also a number of things that you don't see...for example, this timeline is composed solely of those lines that included the word "autoruns" (irrespective of case) somewhere in the line; as such, we won't see things such as the query to the "Image File Execution Options" key, to determine if there's been a debugger assigned to the tool, nor do you see ancillary events or those that might be encoded.  However, what we do see will clearly allow us to "zoom in" on a specific time window within the overall timeline, and see what other events may be listed there.

The timeline is clearly very illustrative.  We can see the download of the tool (in this case, via Chrome to a Windows 7 platform), and the assignment of the ":Zone.Identifier" ADSs, something that with XP SP2 was done only via IE and Outlook.  Beyond the file system metadata, we start to see even more context, simply by adding additional data sources such as the Registry AppCompatCache value data, UserAssist value data, information derived from the SysInternals key in the user's Registry hive, Jump Lists, etc.  In this case, the Jump List info in the timeline was extracted from the DestList stream found in the Jump List for the Windows Explorer shell, as zipped archives will often be treated as if they were folders.
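
As an aside, if the image is mounted (or you're working on a live system), the contents of a :Zone.Identifier ADS can be read directly; for downloads from the Internet zone you'd typically expect to see a ZoneId of 3.  A quick sketch, with a hypothetical mount path:

# Read a Zone.Identifier alternate data stream directly from an NTFS volume;
# the mounted-image path is hypothetical
ads = r"D:\case\mounted\Users\john\Downloads\Autoruns.zip:Zone.Identifier"
with open(ads) as stream:
    print(stream.read())  # typically "[ZoneTransfer]" followed by "ZoneId=3"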

Another valuable aspect of this sort of timeline data is that it is very useful in the face of the use of counter-forensics techniques, even those that may be unintentional (i.e., performed by an administrator, not to hide data, but to "clean up" the system).  Let's say that this tool had been run, and then deleted; remove all of the "FILE" entries that point to C:/tools from the above timeline, and what do you have left?  You have those artifacts that persist beyond the deletion of files and programs, and provide clear indicators that the tools had been used.  We can apply this same sort of analysis to other situations where tools had been run (programs executed) on a system, and then some steps taken to obviate or hide the data.

M... [Program Execution] AppCompatCache - C:\tools\autorunsc.exe

The "M..." refers to the fact that, as pointed out by Mandiant, when the tool is run, the file modification time for the tool is recorded in the data structure within the AppCompatCache value.  The "[Program Execution]" category identifier, in this case, indicates that the CSRSS flag was set (you'll need to read Mandiant's white paper).  The existence of the application prefetch file for the tool, as well as the UserAssist entry, help illustrate that the program had been executed.

One of the unique things about the SysInternals tools is that after they were taken over by Microsoft, they began to have EULA acceptance dialogs added to them.  Now, there is a command line switch that you can use to run the CLI versions of the tools and accept the EULA, but the tools will create their own subkey beneath the Sysinternals key in the user's hive (NTUSER.DAT), and set the "EulaAccepted" value.  Even if the tool is renamed, these same artifacts will be left on the system.
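
Those artifacts are easy to check for across an exam.  Here's a minimal sketch using the python-registry module against an exported NTUSER.DAT hive (the path is hypothetical):

from Registry import Registry  # python-registry module

# Hypothetical path to an NTUSER.DAT hive exported from an image
reg = Registry.Registry(r"D:\case\export\Users\john\NTUSER.DAT")

for subkey in reg.open("Software\\Sysinternals").subkeys():
    eula = [v.value() for v in subkey.values() if v.name() == "EulaAccepted"]
    # The subkey's LastWrite time approximates when the EULA was accepted,
    # even if the tool itself was later renamed or deleted
    print("%-20s LastWrite: %s  EulaAccepted: %s"
          % (subkey.name(), subkey.timestamp(), eula[0] if eula else "n/a"))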

File system metadata was extracted from the acquired image using TSK fls.exe.  As such, we know that the MACB times are from the $STANDARD_INFORMATION attribute within the MFT, which are highly mutable; that is to say, easily modified to arbitrary values.  We can see from the timeline that Autoruns.zip was downloaded on 15 May, and according to the SysInternals web site, an updated version of the tool was posted on 14 May.  The files were extracted from the zipped archive, carrying with them some of their original file times, which is why we see ".A.B" times prior to the date that the archive was downloaded.  Had the file times been modified to arbitrary values (i.e., "stomped"), rather than the files being deleted, we would still see the other artifacts listed in the timeline, in that order.  In essence, we'd have a "signature" for program execution.

Other sources of data that would not appear in a timeline can include, for example, the user's MUICache key.  This key simply holds a list of values, and in a number of exams, I've found references to malware that was run on the system, even after the actual files had been removed.  Also, if the AutoRuns files had been deleted, I could parse the AutoRuns.lnk Windows shortcut file to get the path to, as well as the MA.B times for, the target file.  In order to illustrate that, what follows is the raw output of an LNK file/stream parser:

atime                         Tue May 15 21:11:59 2012                         
basepath                    C:\Users\                                        
birth_obj_id_node       08:00:27:dd:64:d1                                
birth_obj_id_seq         9270                                             
birth_obj_id_time        Tue May 15 21:09:27 2012                         
birth_vol_id                 2C645C57D81C5047B7DDE13C2834AAD2                 
commonpathsuffix       john\Downloads\Autoruns.zip                      
ctime                           Tue May 15 21:11:59 2012                         
filesize                        535772                                           
machineID                  john-pc                                          
mtime                         Tue May 15 21:11:59 2012                         
netname                     \\JOHN-PC\Users                                  
new_obj_id_node        08:00:27:dd:64:d1                                
new_obj_id_seq          9270                                             
new_obj_id_time        Tue May 15 21:09:27 2012                         
new_vol_id                 2C645C57D81C5047B7DDE13C2834AAD2                 
relativepath                ..\..\..\..\..\Downloads\Autoruns.zip            
vol_sn                        F405-DAC1                                        
vol_type                     Fixed Disk                            

The "mtime","atime", and "ctime" values correspond to the MA.B times, respectively, of the target file, which in this case is the Autoruns.zip archive.  As such, I could either go back and add the LNK info to my timeline, or automatically have that information added during the initial process of collecting data for the timeline.  In this case, what I would expect to see would be MA.B times from both the file system and the LNK file metadata at exactly the same time.  Remember, the absence of an artifact where we expect to find one is itself an artifact, and as such, if the Autoruns.zip file system metadata was not available, that would tell me something and perhaps take my analysis in another direction.

[Note: I know you're looking at the above output and thinking, "wow, that looks like a MAC address in the output!"  You're right, it is.  In this case, looking up the OUI leads us to Cadmus Systems, and yes, the system was from a VM running in VirtualBox.  Also, there's a good deal of additional information available in the LNK file metadata, to include the fact that the target file was on a fixed disk, as opposed to a removable or network drive.]

The Value of Multiple Data Sources
Regarding the value of data from multiple sources (or even additional locations within the same source), Jason Hale, in a comment to his post regarding a RegRipper plugin that he'd written, points out, quite correctly:

I didn't think there was a whole lot of value in the information from the TypedURLsTime key itself (other than knowing that computer activity was occurring at that time) without correlating it with the values in TypedURLs.

Jason actually wrote more than one plugin to extract the TypedURLsTime value data (this key is specific to Windows 8 systems).  I've looked at the plugin that outputs in TLN format, for inclusion in a timeline...I use a different source identifier in the version I wrote (I use "REG", for consistency...Jason uses "NTUSER.DAT").  However, we both reached point B, albeit via different routes.  This will definitely be something I'll be including in my Windows 8 exams.
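
The correlation itself is straightforward once both keys have been parsed; the TypedURLsTime values are 64-bit FILETIMEs whose names (url1, url2, and so on) match the values in the TypedURLs key.  What follows is a minimal sketch using the python-registry module, against a hypothetical exported Windows 8 NTUSER.DAT hive.

import struct
from datetime import datetime, timedelta
from Registry import Registry  # python-registry module

# Hypothetical path to a Windows 8 NTUSER.DAT hive exported from an image
reg = Registry.Registry(r"D:\case\export\Users\john\NTUSER.DAT")
urls = dict((v.name(), v.value()) for v in
            reg.open("Software\\Microsoft\\Internet Explorer\\TypedURLs").values())
times = dict((v.name(), v.value()) for v in
             reg.open("Software\\Microsoft\\Internet Explorer\\TypedURLsTime").values())

def filetime_to_dt(raw):
    # Each TypedURLsTime value is a 64-bit FILETIME (100ns intervals since 1601)
    (ft,) = struct.unpack("<Q", raw)
    return datetime(1601, 1, 1) + timedelta(microseconds=ft / 10.0)

for name in sorted(urls):
    when = filetime_to_dt(times[name]) if name in times else "n/a"
    print("%-6s %-50s %s" % (name, urls[name], when))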

Key Concepts
1. Employing multiple data sources to develop a timeline of system activity provides context, as well as increases our relative confidence in the data itself.
2. Employing multiple data sources can demonstrate program execution.
3. Employing multiple data sources can illustrate and overcome the use of counter-forensics activities, however unintentional those activities may be.