The Windows Incident Response Blog is dedicated to the myriad information surrounding and inherent to the topics of IR and digital analysis of Windows systems. This blog provides information in support of my books: "Windows Forensic Analysis" (1st through 4th editions) and "Windows Registry Forensics", as well as the book I co-authored with Cory Altheide, "Digital Forensics with Open Source Tools".
Wednesday, May 29, 2013
Good Reading, Tools
Reading
Cylance Blog - Uncommon Event Log Analysis - some great stuff here showing what can be found with respect to indirect or "consequential" artifacts, particularly within the Windows Event Logs on Vista systems and above. The author does a pretty good job of pointing out how some useful information can be found in some pretty unusual places within Windows systems. I'd be interested to see where things fall out when a timeline is assembled, as that's how I most often locate indirect artifacts.
Cylance Blog - Uncommon Handle Analysis - another blog post by Gary Colomb, this one involving the analysis of handles in memory. I liked the approach taken, wherein Gary explains the why, and provides a tool for the how. A number of years ago, I had written a Perl script that would parse the output of the MS SysInternals tool handle.exe (ran it as handle -a) and sort the handles found based on least frequency of occurrence, in order to do something similar to what's described in the post.
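For anyone curious what that looks like, here's a minimal sketch of the least-frequency-of-occurrence approach, assuming the output of "handle -a" has been redirected to a text file; the regex is a guess at the column layout and will likely need to be adjusted to your version of handle.exe.

```perl
#!/usr/bin/perl
# Minimal sketch: least-frequency-of-occurrence sort of handle names.
# Assumes the output of "handle -a" has been saved to a text file; the
# regex below is a guess at the column layout and may need tuning.
use strict;
use warnings;

my %count;
open(my $fh, '<', $ARGV[0]) or die "Could not open $ARGV[0]: $!\n";
while (my $line = <$fh>) {
    chomp($line);
    # crude match: "  <hex>: <Type>  [(access)]  <name>" - hypothetical layout
    if ($line =~ /^\s+[0-9A-Fa-f]+:\s+(\S+)\s+(?:\(.+?\)\s+)?(.+)$/) {
        $count{"$1|$2"}++;
    }
}
close($fh);

# least frequent handles first - the outliers float to the top
foreach my $key (sort { $count{$a} <=> $count{$b} } keys %count) {
    printf("%-6d %s\n", $count{$key}, $key);
}
```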
Security BrainDump - Bugbear found some interesting ZeroAccess artifacts; many of the artifacts are similar to what is seen in other variants of ZA, as well as in other malware families (e.g., file system tunneling), but in this case, the click fraud artifacts appeared in the systemprofile folder...that's very interesting.
SpiderLabs Anterior - The White X - this was an interesting and insightful read, in that it fits right along with Chris Pogue's Sniper Forensics presentations, particularly when he talks about 'expert eyes'. One thing Chris is absolutely correct about is that we, as a community, need to continue to shift our focus away from tools and more toward methodologies and processes. Corey Harrell has said the same thing, and I really believe this to be true. While others have suggested that the tools help to make non-experts useful, I would suggest that the usefulness of these "non-experts" is extremely limited. I'm not suggesting that one has to be an expert in mechanical engineering and combustion engine design in order to drive a car...rather, I'm simply saying that we have to have an understanding of the underlying data structures and what the tools are doing when we run those tools. We need to instead focus on the analysis process.
Java Web Vulnerability Mitigation on Windows - A great and very timely blog post, which includes information that can be used in conjunction with RegRipper to determine the initial infection vector (IIV) during analysis.
ForkSec Blog - "new" blog I saw referenced on Twitter one morning, and I started my reading with the post regarding the review of the viaExtract demo. I don't do any mobile forensics at the moment, but I did enjoy reading the post, as well as seeing the reference to Santoku Linux.
Tools
win-sshfs - ssh(sftp) file system for Windows - I haven't tried this one but it does look interesting.
4Discovery recently announced that they'd released a number of tools to assist in forensic analysis. I downloaded and ran two of the tools...LinkParser and shellbagger. I ran LinkParser against a legitimate LNK file, pulled from a system, that contained only a header and a shell item ID list (it had no LinkInfo block), and LinkParser didn't display anything. I also ran LinkParser against a couple of LNK files that I have been using to test my own tools, and it did not seem to parse the shell item ID lists. I then ran shellbagger against some test data I've been working with and found that, similar to other popular tools, it missed some shell items completely. I did notice that when the tool found a GUID it didn't know, it said so...but it didn't display the GUID in the GUI so that the analyst could look it up. I haven't yet had a chance to run the other tools, and there are reportedly more coming in the future, so keep an eye on the web site.
ShadowKit - I saw via Chad Tilbury on G+ recently that ShadowKit v1.6 is available. Here's another blog post that talks about how to use ShadowKit; the process for setting up your image to be accessed is identical to the process I laid out in WFAT 3/e...so, I guess I'm having a little difficulty seeing the advantages of this tool over native tools such as vssadmin + mklink, beyond the fact that it provides a GUI.
Autopsy - Now has a graphical timeline feature; right now, the feature appears to include only file system metadata, but the approach certainly has potential. Based on my experience with timeline analysis, I don't see the immediate value in bringing graphical features to the front end of timeline analysis...other tools take a similar approach, and as with those, most often I'm not looking for where or when the greatest number of events occur; I'm usually looking for the needle in a stack of needles. However, I do see the potential of this technique. Adding Registry, Windows Event Log, and other events will only increase the amount of data, but one means of addressing this would be to include alerts in the timeline data, and then show all events as one color and alerts as another. Alerts could be based on either direct or indirect/consequential artifacts, and can be extremely valuable in a number of types of cases, directing the analyst's attention to critical areas for analysis.
NTFS TriForce - David Cowen has released the public beta of his NTFS TriForce tool. I didn't see David's presentation on this tool, but I did get to listen to the recording of the DFIROnline presentation - the individual artifacts that David describes are very useful, but the real value is obtained when they're all combined.
Auto-rip - Corey has unleashed auto-rip; Corey's done a great job of automating data collection and initial analysis, with the key to this automation being that Corey knows and understands EXACTLY what he's doing and why when he launches auto-rip. This is really the key to automating any DFIR task...while some will say that "it goes without saying", too often there is a lack of understanding with respect to the underlying data structures and their context when automated tools are run.
WebLogParser - Eric Zimmerman has released a log parser with geolocation, DNS lookups, and more.
Tuesday, May 21, 2013
Plugin: SAMParse
I thought I'd take a moment to discuss the samparse.pl plugin. This plugin parses the SAM hive file for information regarding user accounts local to the system itself, as well as their group membership, both of which can be very valuable and provide a good amount of insight for the analyst, depending upon the case. The information retrieved by this plugin should be correlated against the output of the profilelist.pl plugin, as well as the user profiles found within the file system.
One of the initial sources for parsing the binary data maintained within the SAM hive is the Offline Windows Password and Registry Editor. There is also a good deal of useful information in this AccessData PDF document.
An interesting piece of information displayed by this plugin, if available, is the user's password hint. This capability was added to the plugin on 20 Oct 2009 (password hints themselves go back to XP), and was discussed by SpiderLabs almost 3 years later. This may provide useful information for an analyst...I have actually seen what turned out to be the user's password here!
Perhaps one of the most confusing bits of information in the output of the samparse.pl plugin is the "Password not required" entry. This is based on a check of a flag value, and means just that...that a password is not required. It does NOT mean that the account does not have a password...it simply means that one is not required. As such, you may find that the account does, indeed, have a password. I've seen posts to various forums and lists that either ask about this setting, or simply state that the output of RegRipper is incorrect. I am always glad to entertain and consider issues where the interpretation of a Registry value or data flag setting is incorrect, particularly if it is supported with solid data.
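For illustration, here's a minimal sketch of what that flag check amounts to, assuming the 16-bit ACB flags word has already been extracted from the user's F value (samparse.pl does that for you); the bit meanings follow the commonly documented ACB flag layout.

```perl
# Minimal sketch: interpreting the ACB flags word from a SAM user record.
# Assumes the 16-bit flags value has already been extracted from the F value;
# bit meanings follow the commonly documented ACB flag layout.
use strict;
use warnings;

my %acb = (
    0x0001 => "Account Disabled",
    0x0004 => "Password not required",
    0x0010 => "Normal user account",
    0x0200 => "Password does not expire",
    0x0400 => "Account locked out",
);

sub acb_flags {
    my $flags = shift;
    return map { $acb{$_} } grep { $flags & $_ } sort { $a <=> $b } keys %acb;
}

# Example: 0x0214 -> normal account, password not required, never expires
print(join("\n", acb_flags(0x0214)), "\n");
```

Note that the 0x0004 bit says nothing about whether a password is actually set...it only reflects the policy for that account, which is exactly the point made above.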
If you're analyzing a Vista or Windows 7 system and run across something suspicious regarding the local user accounts, remember that you will have a copy of the SAM hive in the Windows\system32\config\RegBack folder that you can incorporate into your analysis, and that you may also have older SAM hives in available VSCs.
Finally, there's a version of this plugin that provides timeline (TLN) output for various bits of time stamped data, including the account creation date, the password reset date, the last password failure date, and the last login. Incorporating this into your timeline, along with the historical information available in other Registry resources (such as those mentioned in the above paragraph), can provide considerable insight into user activity on the system.
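For reference, the TLN format used here (and throughout this blog) is just five pipe-delimited fields; a minimal sketch, with made-up values, is below.

```perl
# Minimal sketch of the five-field TLN format (time|source|system|user|desc);
# the epoch value, system name, and user name below are made up.
use strict;
use warnings;

sub tln {
    my ($epoch, $source, $system, $user, $desc) = @_;
    return join('|', $epoch, $source, $system, $user, $desc);
}

# e.g., a password reset date pulled from the SAM hive
print tln(1369785600, "SAM", "CASE-PC", "jdoe", "Password reset date"), "\n";
# -> 1369785600|SAM|CASE-PC|jdoe|Password reset date
```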
Resources
MS KB305144 -
Scripting Guy blog, 7/7/2006
Thursday, May 16, 2013
The Tool Validation "Myth-odology"
I posted recently about understanding data structures, and I wanted to continue with that thought process and line of reasoning into the area of the current state of tool validation.
What we have seen in the community for some time is that a new tool is announced or mentioned, and members of the community begin clamoring for their copy of that tool. Many times, one of the first questions is, "where can I download a copy of the tool?" The reasons most give for wanting to download a copy of the tool is so that they can "test" it, or use it to validate the output of other tools. To that, I would pose this question - if you do not understand what the tool is doing, what it is designed to do, and you do not understand the underlying data structures being parsed, how can you then effectively test the tool, or use that tool to validate other tools?
As such, the current state of tool validation, for the most part, isn't so much a methodology as it is a myth-odology. Obviously, this isn't associated with testing and validation processes such as those used by NIST and other organizations, and applies more to individual analysts.
There are tools out there right now that are being recommended as THE tool for parsing a particular artifact or set of artifacts. The tools are, in fact, very good at what they do, but the fact is that some of them do not parse all of the data structures available within the set of artifacts, nor do they identify in their output that these structures are being missed. I'm aware of analysts who, in some cases, have stated that the fact that a tool doesn't parse and display specific artifacts isn't an issue for them, because the tool showed them what they were looking for. I think what's happening is that someone will run a tool against a data set, see a lot of data in the output, and deem it "good". They may then run another tool against the same data set, see different output, and deem one of the tools "not good", or at the very least "questionable". What I don't think is happening is analysts testing the tools against the data structures themselves; instead, the data is viewed as a 'blob', and the tools are relied upon to provide that layer of abstraction I mentioned in my previous post.
Consider the parsing of shell items, and shell item ID lists. These artifacts abound on Windows systems, more so as the versions of Windows increase. One place that they've existed for some time is in the Windows shortcuts (aka, LNK files). Some of the tools that we've used for years parse both the headers and LinkInfo blocks of these files, but it's only been in the past 12 - 18 months or so that tools have parsed the shell item ID lists. Why is this important? These blog posts do a great job of explaining why...give them a read. Another reason is that over the past year or so, I've run across several LNK files that consisted solely of the header and the shell item ID list...there was no LinkInfo block to parse. As such, some of the tools that were available at the time would simply return blank output.
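To make this concrete, here's a minimal sketch of walking the shell item ID list in a LNK file, based on the publicly documented ShellLinkHeader layout; it makes no attempt to interpret the individual items, it simply shows that they're there (which is exactly what the tools in question were missing).

```perl
# Minimal sketch: walk the shell item ID list in a .lnk file, per the
# publicly documented ShellLinkHeader layout. No attempt is made to
# interpret the individual items - this just shows that they exist.
use strict;
use warnings;

open(my $fh, '<:raw', $ARGV[0]) or die "Could not open $ARGV[0]: $!\n";
read($fh, my $hdr, 0x4C) == 0x4C or die "Short read on LNK header\n";

my $flags = unpack("V", substr($hdr, 0x14, 4));    # LinkFlags
die "No shell item ID list in this file\n" unless ($flags & 0x01);

read($fh, my $buf, 2);
my $idlist_size = unpack("v", $buf);               # IDListSize
read($fh, my $idlist, $idlist_size);
close($fh);

my ($ofs, $num) = (0, 0);
while ($ofs < length($idlist)) {
    my $sz = unpack("v", substr($idlist, $ofs, 2));
    last if ($sz == 0);                            # 2-byte terminator
    $num++;
    printf("Item %d: size = %d bytes, type byte = 0x%02x\n",
        $num, $sz, unpack("C", substr($idlist, $ofs + 2, 1)));
    $ofs += $sz;
}
print("Total shell items: $num\n");
```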
There is also the issue of understanding how a tool performs its function. Let's take a look at the XP Event Log example again. Tools that use the MS API for parsing these files are likely going to return the "corrupted file" message that we're all used to seeing, but tools that parse the files on a binary level, going record-by-record, will likely work just fine.
Another myth, or misconception, seen too often is that the quality of a tool is determined by how much space its output consumes. This simply is not the case. Again, consider the shell item ID lists in LNK files. Some of the structures that make up these lists contain time stamps, and a number of tools display them. What do these time stamps mean? How are they generated or produced? Perhaps equally important: what format are the time stamps saved in? As it turns out, they're in the DOSDate format, consuming 32 bits and having a 2-second granularity. On NTFS systems, a folder entry (that leads to the target file) appearing in the shell item ID list will have its 64-bit FILETIME time stamp converted to a 32-bit DOSDate time stamp, with a corresponding loss in granularity. As such, it's important to understand not only the data structure and its various elements, but also the context of those elements. If one tool lists all of the elements of the component data structures and another does not, is the second tool any less valid or correct?
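As an aside on those DOSDate time stamps, here's a minimal sketch of the conversion (and of where the 2-second granularity comes from); the example values at the bottom are made up.

```perl
# Minimal sketch: convert a DOSDate date/time pair (two 16-bit values) to
# something readable. Note the 2-second granularity of the seconds field.
use strict;
use warnings;

sub dosdate_to_str {
    my ($date, $time) = @_;
    my $day   =   $date        & 0x1F;
    my $month =  ($date >> 5)  & 0x0F;
    my $year  = (($date >> 9)  & 0x7F) + 1980;
    my $sec   =  ($time        & 0x1F) * 2;     # stored as seconds/2
    my $min   =  ($time >> 5)  & 0x3F;
    my $hour  =  ($time >> 11) & 0x1F;
    return sprintf("%04d-%02d-%02d %02d:%02d:%02d",
        $year, $month, $day, $hour, $min, $sec);
}

# made-up example values: 0x426B, 0x7A8C -> 2013-03-11 15:20:24
print dosdate_to_str(0x426B, 0x7A8C), "\n";
```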
Returning to the subject of data structures, does this mean that every analyst must know and understand the details for every available data structure on, say, a Windows system? No, not at all...that's simply not realistic. The answer, IMHO, is that analysts need to engage. If you're unclear about something, ask. If you need a reference, ask someone. There are some great structure references posted on the ForensicsWiki, including those posted by Joachim Metz, but I think that far too few analysts use that site as a resource. By sharing what we know, and coupling that with what we need to know, we can approach a better method for validating the tools and methodologies that we use.
Monday, May 13, 2013
Understanding Data Structures
Sometimes at conferences or during a presentation, I'll provide a list of tools for parsing a specific artifact (i.e., MFT, Prefetch files, etc.), and I'll mention a tool or script that I wrote that presents specific data in a particular format. Invariably when this happens, someone asks for a copy of the tool/script. Many times, these scripts may not be meant for public consumption, and are only intended to illustrate what data is available within a particular structure. As such, I'll ask why, with all of the other available tools, someone would want a copy of yet another tool, and the response is most often, "...to validate the output of the other tools." So, I'm left wondering...if you don't understand the data structure that is being accessed or parsed, how is having another tool to parse it beneficial?
Tools provide a layer of abstraction over the data, and as such, while they allow us access to information within these data structures (or files) in a much more timely manner than if we were to attempt to do so manually, they also tend to separate us from the data...if we allow this to happen. For many of the more popular data structures or sources available, there are likely multiple tools that can be used to display information from those sources. But the questions then become, (a) do you understand the data source(s) being parsed, and (b) do you know what the tool is doing to parse those data structures? Is the tool using an MS API to parse the data, or is it doing so on a binary level?
A great example of this is what many of us will remember seeing when we have extracted Windows XP Event Logs from an image and attempted to open them in the Event Viewer on our analysis system. In some cases, we'd see a message that told us that the Event Log was corrupted. However, it was very often the case that the file wasn't actually corrupted, but instead that our analysis system did not have the appropriate message DLLs installed for some of the records. Microsoft does, however, provide very clear and detailed definitions of the Event Log structures, and as such, tools that do not use the Windows API to parse the Event Log files can be used to much greater effect, to include parsing individual records from unallocated space. This could not be done without an understanding of the data structures.
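As a quick illustration, here's a minimal sketch of carving XP/2003 Event Log records on a binary level by scanning for the "LfLe" record signature; it follows the documented EVENTLOGRECORD layout and pulls out only a few fields.

```perl
# Minimal sketch: carve XP/2003 Event Log (.evt) records on a binary level
# by scanning for the "LfLe" record signature; follows the documented
# EVENTLOGRECORD layout and pulls out only a few fields.
use strict;
use warnings;
use POSIX qw(strftime);

open(my $fh, '<:raw', $ARGV[0]) or die "Could not open $ARGV[0]: $!\n";
my $data = do { local $/; <$fh> };
close($fh);

my $ofs = 0;
while (($ofs = index($data, "LfLe", $ofs)) > -1) {
    my $rec_ofs = $ofs - 4;            # the Length DWORD precedes the magic
    if ($rec_ofs >= 0) {
        my ($len, $magic, $rec_num, $time_gen, $time_wrt, $event_id) =
            unpack("V6", substr($data, $rec_ofs, 24));
        my ($type) = unpack("v", substr($data, $rec_ofs + 24, 2));
        printf("Record %-8d %s  ID: %-6d Type: %d\n",
            $rec_num,
            strftime("%Y-%m-%d %H:%M:%S", gmtime($time_gen)),
            $event_id & 0xFFFF, $type);
    }
    $ofs += 4;                         # keep scanning past this hit
}
```

The same scan works just as well against unallocated space or a memory dump as it does against an .evt file pulled from an image, which is the point...no API, no "corrupted file" message.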
Not long ago, Francesco contacted me about the format of automaticDestinations Jump List files, because he'd run a text search across an image and found a hit "in" one of these files, but parsing the file with multiple tools gave no indication of the search hit. It turned out that understanding the format of MS compound file binary (CFB) files gives us a clear means of mapping the unallocated 'sectors' within the Jump List file itself, and of determining why he'd seen a search hit 'in' the file even though that hit didn't appear in the output of the commonly-used tools for parsing these files.
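A simplified sketch of that mapping appears below; it reads only the DIFAT entries carried in the 512-byte compound file header (enough for small files such as Jump Lists) and flags any sector whose FAT entry is marked free, which is where that "orphaned" search hit lived.

```perl
# Simplified sketch: flag unallocated ("free") sectors inside an OLE/compound
# file, such as an automaticDestinations Jump List. Only the 109 DIFAT
# entries carried in the header are read, which is enough for small files.
use strict;
use warnings;

open(my $fh, '<:raw', $ARGV[0]) or die "Could not open $ARGV[0]: $!\n";
my $data = do { local $/; <$fh> };
close($fh);

my $sec_size = 2 ** unpack("v", substr($data, 30, 2));   # usually 512
my @difat    = unpack("V109", substr($data, 76, 109 * 4));

my @fat;
foreach my $fat_sec (@difat) {
    last if ($fat_sec == 0xFFFFFFFF);                    # unused DIFAT slot
    # sector N starts at (N + 1) * sector size; the header is "sector -1"
    push(@fat, unpack("V" . ($sec_size / 4),
        substr($data, ($fat_sec + 1) * $sec_size, $sec_size)));
}

for my $i (0 .. $#fat) {
    my $sec_ofs = ($i + 1) * $sec_size;
    last if ($sec_ofs >= length($data));                 # past end of file
    if ($fat[$i] == 0xFFFFFFFF) {                        # FREESECT
        printf("Sector %d (file offset 0x%x) is unallocated\n", $i, $sec_ofs);
    }
}
```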
Another great example of this came to my attention this morning via the SQLite: Hidden Data in Plain Sight blog post from the Linuxsleuthing blog. This blog post further illustrates my point; however, in this case, it's not simply a matter of displaying information that is there but not displayed by the available tools. Rather, it is also a matter of correlating the various information that is available in a manner that is meaningful and valuable to the analyst.
The Linuxsleuthing blog post also asks the question, how do we overcome the shortcomings of the common SQLite Database analysis techniques? That's an important question to ask, but it should also be expanded to just about any analysis technique available, and not isolated simply to SQLite databases. What we need to consider and ask ourselves is, how do we overcome the shortcomings of common analysis techniques?
Tools most often provide a layer of abstraction over available data (structures, files, etc.), allowing for a modicum of automation and allowing the work to be done in a much more timely manner than using a hex editor. However, much more is available to us than simply parsing raw data structures and providing some of the information to the analyst. Tools can parse data based on artifact categories, as well as generate alerts for the analyst, based on known-bad or known-suspicious entries or conditions. Tools can also be used to correlate data from multiple sources, but to really understand the nature and context of that data, the analyst needs to have an understanding of the underlying data structures themselves.
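As a simple illustration of the alerting idea, here's a minimal sketch that runs parsed values past a list of known-suspicious conditions and flags the hits; the indicators are illustrative only, not a vetted watch list.

```perl
# Minimal sketch of the alerting idea: run parsed values (file paths, Run key
# entries, etc.) past a list of known-suspicious conditions and flag the hits.
# The indicators below are illustrative only, not a vetted watch list.
use strict;
use warnings;

my @alerts = (
    qr/\\Temp\\[^\\]+\.exe$/i,              # executable run from a Temp folder
    qr/\\AppData\\Roaming\\.+\.exe$/i,      # executable in the Roaming profile
    qr/\\Recycle/i,                         # path referencing the Recycle Bin
);

sub check_alerts {
    my $value = shift;
    return grep { $value =~ $_ } @alerts;
}

foreach my $entry (@ARGV) {
    if (check_alerts($entry)) {
        print "[ALERT] $entry\n";
    }
    else {
        print "        $entry\n";
    }
}
```

Something along these lines can be folded directly into a timeline tool, tagging or color-coding events rather than simply printing them.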
Addendum
This concept becomes crystallized when looking at any shell item data structures on Windows systems. Shell items are not documented by MS, and yet are more and more prevalent on Windows systems as the versions progress. An analyst who correctly understands these data structures and sees them as more than just "a bunch of hex" will reap the valuable rewards they hold.
Shell items and shell item ID lists are found in the Registry (shellbags, itempos* values, ComDlg32 subkey values on Vista+, etc.), as well as within Windows shortcut artifacts (LNK files, Win7 and 8 Jump Lists, Photos artifacts on Windows 8, etc.). Depending upon the type of shell item, they may contain time stamps in DOSDate format (usually found in file and folder entries), or they may contain time stamps in FILETIME format (found in some variable type entries). Again, tools provide a layer of abstraction over the data itself, and as such, the analyst needs to understand the nature of the time stamp, as well as what that time stamp represents. Not all time stamps are created equal...for example, DOSDate time stamps within the shell items are created by converting the file system metadata time stamps of the file or folder being referred to, reducing the granularity from 100 nanoseconds to 2 seconds (i.e., the stored seconds value is multiplied by 2 when the time stamp is parsed).
Resources
Windows Shellbag Forensics - Note: the first colorized hex dump includes a reported invalid SHITEM_FILEENTRY, in green; it's not actually invalid, it's just a different type of shell item.