Thursday, May 16, 2013

The Tool Validation "Myth-odology"

I posted recently about understanding data structures, and I wanted to continue with that thought process and line of reasoning into the area of the current state of tool validation.

What we have seen in the community for some time is that a new tool is announced or mentioned, and members of the community begin clamoring for their copy of that tool. Many times, one of the first questions is, "where can I download a copy of the tool?"  The reason most give for wanting to download a copy of the tool is so that they can "test" it, or use it to validate the output of other tools.  To that, I would pose this question - if you do not understand what the tool is doing, what it is designed to do, and you do not understand the underlying data structures being parsed, how can you then effectively test the tool, or use that tool to validate other tools?

As such, the current state of tool validation, for the most part, isn't so much a methodology as it is a myth-odology.  Obviously, this isn't associated with testing and validation processes such as those used by NIST and other organizations, and applies more to individual analysts.

There are tools out there right now that are being recommended as being THE tool for parsing a particular artifact or set of artifacts.  The tools are, in fact, very good at what they do, but the fact is that some of them do not parse all of the data structures available within the set of artifacts, nor do they identify the fact that they're missing these structures in their output.  I'm aware of analysts who, in some cases, have stated that the fact that the tool doesn't parse and display specific artifacts isn't an issue for them, because the tool showed them what they were looking for.  I think what's happening is that someone will run a tool against a data set, see a lot of data in the output, and deem it "good".  They may then run another tool against the same data set, see different output, and deem one of the tools "not good" or at the very least, "questionable".  What I don't think is happening is that analysts are testing the tools against the data structures themselves; instead, they view the data itself as a 'blob' and rely on the tools to provide that layer of abstraction I mentioned in my previous post.

Consider the parsing of shell items, and shell item ID lists.  These artifacts abound on Windows systems, more so as the versions of Windows increase.  One place that they've existed for some time is in the Windows shortcuts (aka, LNK files).  Some of the tools that we've used for years parse both the headers and LinkInfo blocks of these files, but it's only been in the past 12 - 18 months or so that tools have parsed the shell item ID lists.  Why is this important?  These blog posts do a great job of explaining why...give them a read.  Another reason is that over the past year or so, I've run across several LNK files that consisted solely of the header and the shell item ID list...there was no LinkInfo block to parse.  As such, some of the tools that were available at the time would simply return blank output.

There is also the issue of understanding how a tool performs its function.  Let's take a look at the XP Event Log example again.  Tools that use the MS API for parsing these files are likely going to return the "corrupted file" message that we're all used to seeing, but tools that parse the files on a binary level, going record-by-record, will likely work just fine.

Another myth or misconception that is seen too often is that the quality of the tool is determined by how much space the output consumes. This simply is not the case.  Again, consider the shell item ID lists in LNK files.  Some of the structures that make up these lists contain time stamps, and a number of tools display the time stamps.  What do these time stamps mean?  How are they generated/produced?  Perhaps equally important is the question, what format are the time stamps saved in?  As it turns out, the time stamps are in DOSDate format, consuming 32 bits and having a 2-second granularity.  On NTFS systems, a folder entry (that leads to the target file) that appears in the shell item ID list will have a 64-bit FILETIME time stamp converted to a 32-bit DOSDate time stamp, with a corresponding loss in granularity.  As such, it's important to not only understand the data structure and its various elements, but also the context of those structure elements.  So, if one tool lists all of the elements of the component data structures, and another does not, is the second tool any less valid or correct?
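To make the 2-second granularity concrete, here is a minimal Python sketch of decoding a DOSDate value. It assumes the standard MS-DOS layout (date word: 7 bits of years since 1980, 4 bits of month, 5 bits of day; time word: 5 bits of hours, 6 bits of minutes, 5 bits of seconds divided by 2). The function name is made up for illustration, and the order in which the two 16-bit words appear within a given shell item should be confirmed against a structure reference.

```python
from datetime import datetime

def decode_dosdate(date_word, time_word):
    """Decode the 16-bit DOS date and time words into a datetime."""
    year = ((date_word >> 9) & 0x7F) + 1980
    month = (date_word >> 5) & 0x0F
    day = date_word & 0x1F
    hour = (time_word >> 11) & 0x1F
    minute = (time_word >> 5) & 0x3F
    # Only 5 bits are available for seconds, so the stored value counts
    # 2-second intervals; odd seconds cannot be represented.
    second = (time_word & 0x1F) * 2
    return datetime(year, month, day, hour, minute, second)
```

Because the seconds field can only hold 0 through 29, every decoded time stamp lands on an even second, no matter what the original file system time stamp was.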

Returning to the subject of data structures, does this mean that every analyst must know and understand the details for every available data structure on, say, a Windows system?  No, not at all...that's simply not realistic.  The answer, IMHO, is that analysts need to engage.  If you're unclear about something, ask.  If you need a reference, ask someone.  There are some great structure references posted on the ForensicsWiki, including those posted by Joachim Metz, but I think that far too few analysts use that site as a resource.  By sharing what we know, and coupling that with what we need to know, we can approach a better method for validating the tools and methodologies that we use.

Monday, May 13, 2013

Understanding Data Structures

Sometimes at conferences or during a presentation, I'll provide a list of tools for parsing a specific artifact (i.e., MFT, Prefetch files, etc.), and I'll mention a tool or script that I wrote that presents specific data in a particular format.  Invariably when this happens, someone asks for a copy of the tool/script.  Many times, these scripts may not be meant for public consumption, and are only intended to illustrate what data is available within a particular structure.  As such, I'll ask why, with all of the other available tools, someone would want a copy of yet another tool, and the response is most often, "...to validate the output of the other tools."  So, I'm left wondering...if you don't understand the data structure that is being accessed or parsed, how is having another tool to parse it beneficial?

Tools provide a layer of abstraction over the data, and as such, while they allow us access to information within these data structures (or files) in a much more timely manner than if we were to attempt to do so manually, they also tend to separate us from the data...if we allow this to happen.  For many of the more popular data structures or sources available, there are likely multiple tools that can be used to display information from those sources.  But the questions then become, (a) do you understand the data source(s) being parsed, and (b) do you know what the tool is doing to parse those data structures?  Is the tool using an MS API to parse the data, or is it doing so on a binary level? 

A great example of this is what many of us will remember seeing when we have extracted Windows XP Event Logs from an image and attempted to open them in the Event Viewer on our analysis system.  In some cases, we'd see a message that told us that the Event Log was corrupted.  However, it was very often the case that the file wasn't actually corrupted, but instead that our analysis system did not have the appropriate message DLLs installed for some of the records.  Microsoft does, however, provide very clear and detailed definitions of the Event Log structures, and as such, tools that do not use the Windows API to parse the Event Log files can be used to much greater effect, to include parsing individual records from unallocated space.  This could not be done without an understanding of the data structures.
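As a rough illustration of parsing on a binary level, the following Python sketch scans a buffer (a file, or a chunk of unallocated space) for the "LfLe" signature and unpacks the fixed 56-byte portion of each record, following the documented EVENTLOGRECORD layout. The function name and the returned field names are hypothetical, and the sanity checks are only one possible choice; this is a sketch, not a complete parser (it does not extract the source name, strings, or SID).

```python
import struct
from datetime import datetime, timezone

MAGIC = b"LfLe"              # signature stored right after the 4-byte length
FIXED_FMT = "<6I4H6I"        # fixed portion of EVENTLOGRECORD
FIXED_SIZE = struct.calcsize(FIXED_FMT)  # 56 bytes

def carve_event_records(buf):
    """Yield a dict for each plausible event record found in buf."""
    pos = buf.find(MAGIC)
    while pos != -1:
        start = pos - 4  # a record begins with its own length
        if start >= 0 and start + FIXED_SIZE <= len(buf):
            f = struct.unpack_from(FIXED_FMT, buf, start)
            length = f[0]
            end = start + length
            # Sanity checks: the record must be larger than the fixed
            # portion (which also skips the 48-byte file header) and must
            # repeat its length in its last four bytes.
            if length >= FIXED_SIZE + 4 and end <= len(buf):
                if struct.unpack_from("<I", buf, end - 4)[0] == length:
                    yield {
                        "offset": start,
                        "record_number": f[2],
                        "time_generated": datetime.fromtimestamp(
                            f[3], tz=timezone.utc),
                        "event_id": f[5] & 0xFFFF,  # low word of EventID
                        "event_type": f[6],
                    }
        pos = buf.find(MAGIC, pos + 1)
```

Since nothing here depends on the file header or on message DLLs, the same loop works whether the buffer is a complete .evt file or a fragment.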

Not long ago, Francesco contacted me about the format of  automaticDestinations Jump List files, because he'd run a text search across an image and found a hit "in" one of these files, but parsing the file with multiple tools gave no indication of the search hit.  It turned out that understanding the format of MS compound file binary files provides us with a clear indication of how to map unallocated 'sectors' within the Jump List file itself, and determine why he'd seen a search hit 'in' the file, but that hit wasn't part of the output of the commonly-used tools for parsing these files.
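A minimal Python sketch of that kind of mapping is shown below. It reads the compound file header, follows the first 109 DIFAT entries to the FAT sectors, and lists the regular sectors that the FAT marks as free. The function name is hypothetical, and the sketch makes simplifying assumptions that are called out in the comments: it ignores additional DIFAT sectors (needed only for larger files) and does not examine the mini FAT, where the small streams in Jump Lists usually live, so a real check would need to cover that as well.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
FREESECT = 0xFFFFFFFF    # FAT entry value for an unallocated sector
MAXREGSECT = 0xFFFFFFFA  # largest valid regular sector number

def free_sectors(data):
    """Return (sector number, file offset) pairs for sectors marked free."""
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file binary file")
    sector_size = 1 << struct.unpack_from("<H", data, 0x1E)[0]
    num_fat_sectors = struct.unpack_from("<I", data, 0x2C)[0]
    # Assumption: all FAT sectors are listed in the 109 header DIFAT
    # entries; files that need extra DIFAT sectors are not handled.
    difat = struct.unpack_from("<109I", data, 0x4C)
    fat_ids = [s for s in difat if s <= MAXREGSECT][:num_fat_sectors]
    fat = []
    for sid in fat_ids:
        offset = (sid + 1) * sector_size  # sector 0 follows the header
        fat.extend(struct.unpack_from("<%dI" % (sector_size // 4), data, offset))
    sectors_in_file = len(data) // sector_size - 1
    return [(i, (i + 1) * sector_size)
            for i, entry in enumerate(fat[:sectors_in_file])
            if entry == FREESECT]
```

A search hit whose file offset falls within one of the returned ranges is in a sector that no stream references, which is why stream-oriented tools would never display it.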

Another great example of this came to my attention this morning via the SQLite: Hidden Data in Plain Sight blog post from the Linuxsleuthing blog.  This blog post further illustrates my point; however, in this case, it's not simply a matter of displaying information that is there but not displayed by the available tools.  Rather, it is also a matter of correlating the various information that is available in a manner that is meaningful and valuable to the analyst.

The Linuxsleuthing blog post also asks the question, how do we overcome the shortcomings of the common SQLite Database analysis techniques?  That's an important question to ask, but it should also be expanded to just about any analysis technique available, and not isolated simply to SQLite databases.  What we need to consider and ask ourselves is, how do we overcome the shortcomings of common analysis techniques?

Tools most often provide a layer of abstraction over available data (structures, files, etc.), allowing for a modicum of automation and allowing the work to be done in a much more timely manner than using a hex editor.  However, much more is available to us than simply parsing raw data structures and providing some of the information to the analyst.  Tools can parse data based on artifact categories, as well as generate alerts for the analyst, based on known-bad or known-suspicious entries or conditions.  Tools can also be used to correlate data from multiple sources, but to really understand the nature and context of that data, the analyst needs to have an understanding of the underlying data structures themselves.

Addendum
This concept becomes crystallized when looking at any shell item data structures on Windows systems.  Shell items are not documented by MS, and yet are more and more prevalent on Windows systems as the versions progress.  An analyst who correctly understands these data structures and sees them as more than just "a bunch of hex" will reap the valuable rewards they hold.

Shell items and shell item ID lists are found in the Registry (shellbags, itempos* values, ComDlg32 subkey values on Vista+, etc.), as well as within Windows shortcut artifacts (LNK files, Win7 and 8 Jump Lists, Photos artifacts on Windows 8, etc.).  Depending upon the type of shell item, they may contain time stamps in DOSDate format (usually found in file and folder entries), or they may contain time stamps in FILETIME format (found in some variable type entries).  Again, tools provide a layer of abstraction over the data itself, and as such, the analyst needs to understand the nature of the time stamp, as well as what that time stamp represents.  Not all time stamps are created equal...for example, DOSDate time stamps within the shell items are created by converting the file system metadata time stamps from the file or folder that is being referred to, reducing the granularity from 100 nanoseconds to 2 seconds (i.e., the seconds are stored as a count of 2-second intervals, so the stored value is multiplied by 2 when decoded).
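The loss of granularity can be sketched in a few lines of Python. FILETIME is a 64-bit count of 100-nanosecond intervals since 1 January 1601 (UTC). The helper names below are hypothetical, and the second function simply truncates to the previous even second; that is an assumption made for illustration, as the rounding that Windows actually applies during the conversion may differ.

```python
from datetime import datetime, timedelta

FILETIME_EPOCH = datetime(1601, 1, 1)

def filetime_to_datetime(filetime):
    """Convert a FILETIME (100 ns intervals since 1601) to a datetime."""
    # Python datetimes resolve to microseconds, so the last digit
    # of the FILETIME value is dropped here.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

def to_dos_granularity(dt):
    """Reduce a datetime to what a DOSDate value is able to hold.

    Assumption: truncation to the previous even second.
    """
    return dt.replace(second=dt.second - (dt.second % 2), microsecond=0)
```

Two events that occurred more than a second apart on the file system can therefore end up with identical DOSDate values in a shell item, which matters when comparing shell item times against MFT times.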

Resources
Windows Shellbag Forensics - Note: the first hex dump includes a reported invalid SHITEM_FILEENTRY; it's not actually invalid, it's just a different type of shell item.

Monday, April 29, 2013

There Are Four Lights: Incident Response

When I first thought of what became the Forensic Scanner (free version available here), my goal was to provide a solution for getting analysts to the point of analyzing images acquired from systems sooner; that is, to optimize an analyst's time when it comes to dead-box analysis.  Taking a page from Deming's book, my approach was to take a look at what could be optimized, and I figured that getting analysts to the point of actually doing analysis faster, by automating those tasks that we tend to do over and over again would be a great way to speed things up a bit.

The Forensic Scanner was designed to be used by mounting an acquired image on your analysis system as an accessible volume.  You can mount acquired images using FTK Imager, ImDisk, ProDiscover, or even converting the image to a VHD using vhdtool.

One of the things that's come up since I started talking about the Forensic Scanner is the question of whether this tool can be used in the triage of live systems.  Now, the Scanner was not designed for this purpose, particularly because some of the Perl modules used do not work against the Registry on a live system - a different API is required.  However, as it turns out, with the right tools, you can, in fact, use the Forensic Scanner to triage remote live systems.  For example, if you have F-Response, you can use the Forensic Scanner to retrieve information from remote live systems. I've also heard from one person recently that they were able to use the Forensic Scanner via EnCase PDE.  I don't have any specifics about how they did this, and I am unable to test this myself.

If you don't have access to either of these tools, but still want to use the Forensic Scanner in an infrastructure, take a look at Andrew Hay's post regarding the NBDServer application.  His methodology is a bit involved, but from the perspective of trying to perform remote incident response on a shoe-string budget, the only "costs" involved are two systems (or a VM or two...) and a bit of a learning curve.

RegRipper Updates

I've made some updates to RegRipper that I wanted to let everyone know about, in case you want to take advantage of them.

Version 2.8 is a minor update, and includes an additional function/subroutine that is available to the plugins: alertMsg().  In short, the tools (RegRipper, rip) provide the functionality, which is then used by the plugins themselves.  The updates to the tools simply provide the functionality; several of the plugins have been updated to make use of that functionality.  If you'd like to use this functionality, then you want to download the files rrv2.8.zip and plugins20130429.zip.

How is this alertMsg() function useful?  Well, consider Corey's recent post regarding the soft_run.pl and user_run.pl plugins; in the post, he illustrates several values of interest, that point (in his case) to malware.

As such, I added two checks to both of the plugins; one checks for "Temp" or "temp" in the path found in the value data (this would catch "Local Settings\Temp", "Temporary Internet Files", and "Templates"), and the other checks to see if the path in the value data ends in ".com" or ".bat". 
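The two checks can be sketched in Python as follows. This is not the plugin code (the plugins are written in Perl); the function name and alert text are made up, and treating the extension comparison as case-insensitive is an assumption.

```python
import re

TEMP_RE = re.compile(r"[Tt]emp")
SUSPECT_EXTENSIONS = (".com", ".bat")

def run_value_alerts(value_data):
    """Return a list of alert strings for one Run key value's data."""
    alerts = []
    path = value_data.strip().strip('"')
    # Matches "Local Settings\Temp", "Temporary Internet Files",
    # and "Templates" alike.
    if TEMP_RE.search(path):
        alerts.append("path contains Temp: " + path)
    # Assumption: the extension comparison is case-insensitive.
    if path.lower().endswith(SUSPECT_EXTENSIONS):
        alerts.append("path ends in .com or .bat: " + path)
    return alerts
```

Note that a broad pattern such as "[Tt]emp" is deliberately noisy; the point of an alert is to draw the analyst's eye, not to render a verdict.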

Other updated plugins include (but are not limited to):
  • appinitdlls - generate an alert if the value is not blank
  • appcompatcache - generate an alert for any path that includes "[Tt]emp"
  • attachmgr - generates alerts per Corey's blog post (ref: KB883260)
  • imagefile - generate an alert if a Debugger value is found
  • user_run, soft_run - alert on paths that contain "[Tt]emp"
  • winlogon, winlogon_u - added several alerts
What this means is that for RegRipper (the GUI), any alerts generated by the plugins will be added to the end of the report file, and for rip.exe, any alerts will appear last in the output.  The purpose of this is to allow analysts to focus on what might be most important to them.  Many of the plugins are meant to provide information for the analyst to review and use, and therefore will not generate alerts (nor need to).  However, other plugins (such as those described in this post) include specific items that can be checked, and if found, an alert can be generated.  This does not mean that the output of the plugin will not be generated; in fact, quite the opposite occurs.  The alerts are generated in addition to the output of the plugins.

So a big question is going to be, where do the alerts come from? The answer is pretty simple...they come from stuff I, and others (specifically, Corey Harrell), have seen.  For example, one of the checks that occurs in the soft_run.pl and user_run.pl plugins is that every value data (i.e., path) is checked to see if it contains "[Tt]emp"; an alert will be generated if it contains "Templates", "Local Settings\Temp", or "Temporary Internet Files", for example.  This is important because (a) I've seen applications set to run from those paths, and (b) you generally don't want that sort of thing to happen, particularly from "Temporary Internet Files".

Now, there are two things to keep in mind...the first is that not all plugins will necessarily generate alerts.  Some plugins, such as networklist.pl, do not necessarily provide information that should be alerted on.  The output of this plugin is mostly for informational purposes, and you should check it if you're looking for something specific.  Other plugins do provide information that can be alerted on; for example, in winlogon.pl, one alert will be generated if the TaskMan value is found, and another will be generated if the Userinit value is found to have more than simply what is expected.  Someplace that this might be useful...look for alerts from the winlogon.pl plugin, which would detect Ramnit.
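The winlogon.pl checks described above can be approximated with the Python sketch below. It is not the plugin's code; the function name is hypothetical, the input is assumed to be a plain dictionary of value names and data already read from the Winlogon key, and "what is expected" is assumed to mean a single entry ending in userinit.exe.

```python
def winlogon_alerts(values):
    """Return alert strings for a dict of Winlogon value names and data."""
    alerts = []
    # Registry value names are case-insensitive, so normalize them.
    by_name = {name.lower(): data for name, data in values.items()}
    if "taskman" in by_name:
        alerts.append("TaskMan value found: " + by_name["taskman"])
    # Userinit normally holds one path followed by a trailing comma.
    entries = [e.strip() for e in by_name.get("userinit", "").split(",")
               if e.strip()]
    unexpected = [e for e in entries if not e.lower().endswith("userinit.exe")]
    if len(entries) > 1 or unexpected:
        alerts.append("Userinit has unexpected data: " + ",".join(entries))
    return alerts
```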

The second is that the plugins that do generate alerts will only generate those alerts that are included in (i.e., coded into) the plugins.  You can see what is generating an alert by locating any instance of ::alertMsg() in the plugin, and taking a look at the code around it.  If a plugin isn't alerting on something that you want, it may be because no one has shared that with me...so just send me an email and I'll see what I can do (note: I may need sample data in order to test it).

Several of the plugins that were updated to include this ::alertMsg() functionality have also been converted to TLN output so that the alerts can be included in a timeline.  My hope is that this will bring a measure of intelligence to timeline analysis, by including things that would be of interest directly in the timeline.  In many cases, the location of the alert in the timeline may be imprecise...the time stamp value is based on the LastWrite time of the key; however, my hope is that seeing an event source of "ALERT" in the timeline (which you can search for using Notepad++, etc.) will raise awareness of areas that should be checked by bringing them to the attention of the analyst.
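If you'd rather pull the alerts out programmatically than search in an editor, something like the following Python sketch would do it. It assumes the events file is pipe-delimited with five fields per line and the event source in the second field; the function name is hypothetical.

```python
def alert_events(lines):
    """Yield the split fields of every event whose source is ALERT."""
    for line in lines:
        fields = line.rstrip("\r\n").split("|", 4)
        # Skip anything that isn't a well-formed five-field event.
        if len(fields) == 5 and fields[1] == "ALERT":
            yield fields
```

For example, `list(alert_events(open("events.txt")))` would return just the alert entries, each still carrying its time stamp so that it can be located in the full timeline.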

A caveat...if you're using ripXP (is anyone using that??), then you want to use the plugins in the archive for 18 April 2013.  I'll leave that archive up and available, but I will not be updating ripXP with the alertMsg() functionality, so you'll need to use the appropriate plugins.  This is easy to do, simply create a separate folder for ripXP.

Thursday, April 18, 2013

RegRipper Plugin Updates

The RegRipper plugin archive has been updated.

The archive contains a text file that lists the updates, which are also listed here.  The wiki also contains a Plugins page, where descriptions of plugins (what they check, what they're intended for, how to use the data, etc.) will be maintained.

Monday, April 15, 2013

Plugin: Winlogon

The Winlogon plugin is a pretty comprehensive plugin, in that since the RegRipper consolidation release, several plugins have been retired and their functionality incorporated into this one plugin.

The Winlogon plugin is a valuable resource when it comes to determining autostart information for the system.  For example, the UserInit and Shell values point to the shell that is launched when a user logs in.  From here:

The Winlogon key controls actions that occur when you log on to a computer running Windows 7. Most of these actions are under the control of the operating system, but you can also add custom actions here. The “HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\Userinit” and “HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\Shell” subkeys can automatically launch programs. 

MS KB 555648 addresses an issue where either the Shell or Userinit values have been modified.

The Winlogon plugin extracts values and data from beneath the HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon key, as well as its accompanying Wow6432Node cousin on 64-bit Windows systems, and it also collects information from several subkeys.

This Microsoft page provides additional information about some of the values that appear beneath this key.  Another way that a value beneath this key can be used to subvert the system is to add the TaskMan value, and point to malicious software.

Notify
This subkey maintains a running list of functionality made available to Windows systems via notification packages.  In short, a "package" (DLL) can receive notifications from Windows when certain events occur.  When these events occur, Windows will look for the package and launch the handler for that specific event.  For example, you can have specific functions run automatically when a user logs on, locks the console, when a smartcard is plugged into the system, etc.

As with other functionality on Windows systems, this also provides a great persistence mechanism for malware (see this Cutwail example).

Special Accounts
One of the subkeys that can exist beneath the Winlogon key is the "SpecialAccounts\UserList" subkey.  The values beneath this key, and each value's accompanying data, determine whether or not the specific account appears on the Welcome screen.  Very often, this information is used for legitimate purposes, so that the screen isn't cluttered with accounts that are not used for logging into the system at the console.  However, this functionality can be, and has been, used for malicious purposes.  I've seen this in the wild, most often when an intruder has accessed an infrastructure via RDP, and creates accounts on systems that they can use to log in; hiding the user account from the Welcome screen prevents legitimate users from seeing anything suspicious when the system is rebooted.  In one instance, I saw this being used, but the "SpecialAccounts" key had been misspelled, so the functionality was not enabled.

The Winlogon plugin encapsulates data from several plugins, which led me to retire those other plugins.  For example, I added the checks from the taskman.pl, notify.pl, and specaccts.pl plugins to the winlogon.pl plugin, and retired those other plugins.  All of this will appear in the history file associated with the next roll-out of the plugin archive.  The output of the winlogon.pl plugin also includes analysis notes, so that the analyst has information right there in the report with respect to what they should look for, and what might be suspicious.

Resources
Winlogon\Notify entries
MS KB 102972: Explains many of the Winlogon values

Thursday, April 11, 2013

Plugin: specaccts.pl

As is the case with many of the RegRipper plugins, the specaccts.pl plugin initially came about because of something I read about, and after running it, it actually found what I was looking for in the wild.

Beneath the Winlogon key (specifically, HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon), there may be a subkey path of "SpecialAccounts\UserList".  The values listed beneath the UserList key would be user account names, and if the data associated with a value is "0", then that account will not appear on the Welcome screen (any value greater than 0 allows the account to appear on the Welcome logon screen).
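Expressed as a short Python sketch, the check amounts to the following. The function name is hypothetical, and the input is assumed to be a dictionary of value names and their numeric data, already extracted from the UserList key by whatever means you prefer.

```python
def hidden_accounts(userlist_values):
    """Return the account names whose UserList data is 0 (hidden)."""
    return sorted(name for name, data in userlist_values.items()
                  if data == 0)
```

Whether a hidden account is a HelpDesk account or an intruder's account is not something the data alone can tell you; the list is a starting point for further checks, such as when the account was created.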

I've seen this used twice in the wild...once, it worked, and the second time, the bad guy had misspelled "SpecialAccounts", and as such, the functionality that they were trying to achieve wasn't realized.  Sometimes, a little attention to detail can go a long way.

There is malware that uses these Registry keys to keep new user accounts hidden from view on a live system, such as TrojanSpy:Win32/Ursnif , Trojan:Win32/Starter, and EyeStye.  As such, this plugin can provide indicators of a malware infection, an intrusion, or of malicious user intent on the system.  However, keep in mind, that this functionality can also be used for legitimate purposes, such as hiding an Administrator or HelpDesk account from view on the Welcome screen.

As of this writing, Corey Harrell and I are finishing updates to a number of plugins, and looking at merging plugins where appropriate.  As the information that we're looking for with the specaccts.pl plugin is beneath the Winlogon key, I've rolled the functionality into the winlogon.pl plugin, and retired the specaccts.pl plugin.

So, the functionality isn't going away...rather, it's going to be incorporated into an existing plugin.

Monday, April 08, 2013

Plugin: *_tln

If you've downloaded the new RegRipper plugins archive, you may have noticed several plugins whose names end in "_tln.pl".  These plugins specifically output their collected information in the five-field timeline (TLN) events file format that I use for creating timelines.
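For anyone unfamiliar with the format, here is a minimal Python sketch of building and splitting a TLN line. It assumes the five fields are time (as a Unix epoch value), source, system, user, and description, separated by pipes; the function names are hypothetical and are not part of the RegRipper tools.

```python
def tln_line(epoch, source, system, user, description):
    """Join the five TLN fields into one pipe-delimited line."""
    return "|".join([str(int(epoch)), source, system, user, description])

def parse_tln(line):
    """Split a TLN line back into its five fields."""
    # Limit the split so that pipes within the description survive.
    time_str, source, system, user, description = line.rstrip("\r\n").split("|", 4)
    return int(time_str), source, system, user, description
```

Keeping the time field as a plain integer is what makes it easy to merge events from many sources into one file and sort them afterward.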

Many folks using the RegRipper tools may not be aware that you can use rip.exe to determine information about the plugins that you currently have available.  For example, the following command will allow you to see all the plugins that you have, listed in a tabular format:

rip -l

This next command will allow you to see all of the plugins you have, listed in CSV format:

rip -l -c

This command will let you see all of the plugins that end in "*_tln", in CSV format:

rip -l -c | find "_tln"

Now that we have a list of the plugins that provide TLN output, we can easily include the output of the plugin in our timeline events file by using the following command:

rip -r path -p plugin  -u user -s server >> events.txt

An example of how this can be useful is in adding the UserAssist data for a specific user to the timeline events file...you can do that using the following command:

rip -r path -p userassist_tln -u user -s server >> events.txt

Very easy, very straightforward, and the use of these plugins can provide us with a good deal of granularity in our timeline.

Something that's very important to understand about the TLN plugins is that, in most cases, they will not display the same information as their accompanying plugin without "_tln" in the name.  In many cases, the information maintained in the keys and values extracted via the plugins is stored in a "most recently used" or "MRU" format, and as such, the LastWrite time of the key is associated with the most recent entry.  An example of this is the shellbags_tln.pl plugin...running this one side-by-side with the shellbags.pl plugin won't provide you with the same information, nor the same number of lines in the output.  However, this is by design...shellbag data is one of those "MRU" sources within the Registry.  One exception to this is the output of the userassist_tln.pl plugin; the time stamp data extracted by this plugin is stored in the binary content of the value data.

Typing the command to list the *_tln plugins will illustrate that most of the plugins appear to be oriented toward the NTUSER.DAT and Software hives.  The shellbags_tln.pl plugin was written to run against the USRCLASS.DAT hive, and lists its output based on the key LastWrite time or "MRU Time"; it does not list information in TLN format based on the created, last accessed or last modified times extracted from the shell items.  The samparse_tln.pl plugin will list information in TLN format based on various time stamps associated with each user account.  Also, with this plugin, you don't need to add the "-u" switch, as the user information is embedded within the hive file itself.