Friday, June 29, 2012

SANS DFIR Summit Follow-up

First off, I want to thank Rob Lee for asking me to provide a keynote presentation to the 2012 SANS Forensic Summit (presentation slides are available here).  It was truly an honor, and once again, I was blessed to be in the presence of so many great speakers, and some of the brightest minds in the community.  Also, I have to give a heartfelt thanks to the wonderful SANS staff who made the entire conference possible.  Without your work and dedication, the summit wouldn't be the incredible resource that it is.

I attended a number of presentations while at the summit and I thought I'd share my thoughts and views about each of them, as well as the summit as a whole.  Hopefully, others will do the same.

Det. Cindy Murphy's keynote was well thought-out and very well received.  Cindy is a well-known figure within the DFIR community, and her presentation really addressed a lot of the aspects of sharing within the community that many of us have been talking about for some time.  One of the strengths of the community that Cindy mentioned was that we all have different perspectives, and we can use that fact to build up the community as a whole.  However, one of the weaknesses of the community is that we don't share those perspectives.

At the same time, I also think that Chris Pogue hit the nail on the head when he commented that no individual or organization is going to share their 'secret sauce'.  Within the community, there are a number of businesses, and the nature of a business is to make money.  If you're giving away your competitive advantage, you're not making money.  This is just one of the obstacles to sharing within the DFIR community, and hopefully by engaging more, we can discuss ways to overcome some of these obstacles.

Alissa Torres had an excellent point regarding not staying "in your lane" with respect to what you do.  I completely agree with her sentiment, and anyone who attended her presentation could clearly see how learning pen testing techniques from her co-workers has benefited her.

Nick Harbour of CrowdStrike gave an interesting talk on anti-forensics.  It was interesting to see some of the techniques that could be used discussed, and I spent most of the presentation thinking to myself, "how would we detect that?"  For instance, one of the techniques Nick mentioned for communicating off of a system was to launch IE as a COM object, and send data out in that manner.  This is nothing new...I remember the Setiri presentation at BH 2002 discussing a similar approach.  But the fact is, it can still work to hide activity from a particular area of analysis. 

I enjoyed seeing Chris Pogue back in action again with his Sniper Forensics 3: The Hunt presentation.  You can find previous iterations of Chris's presentations here and here.

Elizabeth Schweinsberg took an interesting approach in her presentation on Registry analysis - she crawled an AV website to collect data on reported Registry modifications made by various malware, and presented that data as a means for targeting your response and investigations.  This was very interesting, and discussing it with her afterward, I think it would be a great idea to do that with other AV vendors, as well.  I agree that this is not the ideal data set...after all, AV companies receive samples of malware out of context and over the years some of the Registry artifacts associated with malware have been self-inflicted (that is, a result of how the AV analyst launches the malware).  But, it's the best data we have available.  One of the interesting statistics that Elizabeth came up with was the continued wide-spread use of the Run key as a persistence mechanism.  Even after returning from the conference, I still see malware that uses this key for persistence.

In her presentation, Elizabeth also provided something of a showdown between GRR, RegRipper, and Registry Decoder, using various criteria.  In some ways, I think it was good to show the differences in the tools, but in others, I didn't follow the reasoning for holding up tools against criteria for which they were neither designed nor written.  After all, to say that tool X wasn't scalable, when it wasn't necessarily written to be scalable, isn't necessarily giving the tool a fair representation.  RegRipper was designed from the beginning to provide the ability for the community to write plugins, and one of the criteria that it was measured against was the ability to extract data from the RunOnce key.  Elizabeth was correct in that the plugin did not exist when she was testing the tool, but is that really a "con" (as opposed to a "pro") statement about the tool?  Also, I found myself thinking on my flight home that if Elizabeth had contacted me during her testing, I could have provided that plugin or anything else she needed.  After I got home, I reached out to her and found out that she had, in fact, written her own module but not included that in the presentation.  I look forward to future conferences where both Elizabeth and other members of the Google team will be presenting.

The last presentation of the conference was from Carbon Black's CEO, Mike Viscuso.  In his presentation, Mike demonstrated the value of Cb without ever describing Cb in detail, and for the nature of the conference, I think it was very important that he not cross that line into a vendor presentation.  Instead, Mike clearly illustrated the need for the concept behind Cb, which was to redefine the data set that we, as incident responders, want access to when responding.  As a result, there were several excellent questions that came up, mostly from folks who had (understandably) never heard of nor looked at Cb.  The first comment during the Q&A session asked whether identifying a new 'data set' that DFIR folks would like to have available would put our current IR folks out of work, and to be honest, nothing could be further from the truth.  Cb, and tools like it, are changing the face of incident response, but not in a way that puts current IR staff out of work...rather, it requires a change in the business model that is currently in use.  The emergency model of IR is not sustainable; this is true for the consulting company that provides the service, as well as their customers.  Moving to a proactive, "security camera" approach does not remove the need for highly skilled responders; it simply changes the business model to one that is more advantageous, more sustainable, and much easier to manage from a budget perspective than the current model.  And this applies to both sides of the equation, both the consultants and their customers.

A final thought about the presentations, and the summit in general...this is a great opportunity for folks who attend the conferences to really network and engage, particularly with the authors and presenters.  The SANS Summit is a small conference (when compared to others) and provides a fantastic opportunity, not just for networking in general, but also for attendees to engage in a direct and meaningful manner with the speakers.  It doesn't matter whether you've got questions or just really liked what you heard, you can walk up to the presenter and say something.  After all, if you're 20 feet from the presenter, why send an email or Tweet saying that you liked the presentation...why not just walk up to the presenter, introduce yourself, and tell them directly?  The size of the summit really facilitates that kind of close, direct interaction.

Thursday, June 28, 2012

Publishing DFIR Materials

At the recent SANS DFIR Summit, Corey Harrell, Christopher Witter and I had a chance to chat with someone from Syngress Publishing, who proposed a new business model for DFIR materials to us, and I wanted to get a feel for how others felt about it.

Right now, for a new author, it can take a long time to get material out into a book format.  My first book took about a year to write, and then 3 1/2 months to get through printing.  Some authors don't get beyond the initial couple of chapters before walking away from the project.  Writing a book can be a daunting and often overwhelming project, and even if it is finished, it can take a year or more before any of the information appears in public.

The new model takes a different approach.  Instead of full books, authors will write "modules", 30 - 120 page packages of what might be part of a book, but stand alone in and of themselves.  If you've seen WFAT 3/e, you'll see that there are several chapters in the book that could be provided in this manner, perhaps with some additional work.  These modules would go through the same review process but, being shorter, would be available in a much quicker time frame.  Initially, they would be available in electronic format, (hopefully) at a reduced price.  This way, if you were waiting for WFAT 3/e to come out because you were interested in chapters 3 and 5, you wouldn't have to wait a full year or more for the materials.  Instead, you would have access to them much sooner, and then as other modules became available, you would be able to combine the modules into print material.

This model reduces the time in which material is available, reduces the cost-of-entry for the material, and takes a great deal of burden off of the author, as well.  Rather than being engaged in a project that is a year long, the author might be engaged for only 2 months at a time.  Technical reviews would be much quicker, as would the overall final review before going to "printing".  This model also allows for updates...if you purchase a module, there will be an update model available for you to get the latest and greatest version of the module.

From a topics perspective, look at it this way...take one chapter from WFAT 3/e, perhaps expand it a bit with some applicable screen captures or other applicable material, and consider that a module.

Given all of this, I wanted to get some feel from the community at large as to (a) how you feel about this approach, (b) what topics you might like to see covered, and (c) who might be interested in providing this material.  Feel free to comment here, or email me at keydet89 at yahoo dot com.

Saturday, June 23, 2012

When was a file accessed?

One of the aspects of Windows analysis that I discuss in the courses we're offering is that the version of Windows you're analyzing is significant.  For example, as of Windows Vista, updating of file system last accessed times, as a result of normal user behavior, is disabled by default.  However, even though we can't look to file access times as an indication of when a user accessed the files, there are a number of artifacts on Windows systems, in particular Windows 7, which will tell us not only that a user accessed a file (based on the context of those artifacts), but also when.  As such, we can add category IDs or tags (i.e., "[File Access]") to those events (something that I've discussed previously) in order to make them much easier to identify in timelines, as well as in other reporting formats.

I'll take a moment to discuss a few of the artifact sources we can use on Windows 7 systems that provide indications of file access...

LNK Files
One of the ways that LNK files are created on a system is that a user will double-click a file which is located somewhere on that system, on removable media, or even on a network share.  When this happens, a shortcut file that points to the target will be created in the user's Recent folder.  The operating system will select the appropriate application (based on the extension of the target file) with which to open the file.

As such, under "normal" circumstances, the creation date of the LNK file would correspond to when the target file was first accessed, and the last modification date of the LNK file would correspond to when the target file was most recently accessed. [Ref: Harry Parsonage's excellent "The Meaning of LIFE" white paper.]

Jump Lists
On Windows 7 systems, we now have new Task Bar artifacts called Jump Lists available for analysis.  The AutomaticDestinations Jump Lists are produced by activities very similar to those associated with LNK files, with the added advantage that the Jump Lists are associated with an application (based on the AppID), as well as with a user.

Let's say that the user accesses a Word .docx file by double-clicking it.  When this happens, an LNK file is created, and a Jump List associated with the version of MS Word installed on the system is created, if it doesn't already exist.  These Jump Lists are based on the MS Compound Document format, and an entry that contains an LNK stream is created within the Jump List file, and a structure is added to the DestList stream within the Jump List.  When the file is accessed and the DestList stream structure is added, the time of the activity is included within that structure.  This time can be used to illustrate the most recent time the user accessed that file.

As the LNK streams that point to the target file are not files themselves, they do not have MACB file system times specifically associated with each of them.  They do contain the MA.B times of the target file, embedded within the stream, as they follow the binary format specification described by MS. 

MRU Lists
There are a number of Registry keys (specifically within the user's NTUSER.DAT hive file) that maintain references to files that the user has accessed.  Some, such as the RecentDocs key, maintain simply names of files, while others, such as the Paint subkey beneath the user's Applets key (see the RegRipper plugin), provide the full path to the file.  Many of these keys also contain Most Recently Used entries, indicating that the key's LastWrite time may reflect when the appropriately listed file was most recently accessed. 
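To make the MRU ordering idea a bit more concrete, here is a minimal sketch in Python.  It assumes the ordering is stored the way I understand MRUListEx-style values (found beneath keys such as RecentDocs) to be stored: a run of 4-byte little-endian integers, most recent first, ending with 0xFFFFFFFF.  The function name and the sample bytes are made up for illustration; this is not how RegRipper or any other tool does it.

```python
import struct

def parse_mrulistex(data: bytes) -> list[int]:
    """Return the MRU order (most recent first) from raw MRUListEx-style bytes."""
    order = []
    # Walk the buffer four bytes at a time, ignoring any trailing partial chunk.
    for offset in range(0, len(data) - len(data) % 4, 4):
        (item,) = struct.unpack_from("<I", data, offset)
        if item == 0xFFFFFFFF:  # terminator marks the end of the list
            break
        order.append(item)
    return order

# Hypothetical value data: entries 2, 0, 1, followed by the terminator.
sample = struct.pack("<4I", 2, 0, 1, 0xFFFFFFFF)
mru_order = parse_mrulistex(sample)
print(mru_order)
```

The first number returned names the value that was used most recently, which is the entry the key's LastWrite time may apply to.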

Document Metadata
There are a number of file formats that allow for metadata to be stored within the file itself.  MS Office has long been known for providing a good deal of (potentially embarrassing) metadata.  While more recent formats of MS Office documents don't contain as much metadata as previous versions, we may still be able to use this information to provide indications of file access.

Let's not forget that previous versions of each of the artifacts we've discussed so far may be located within available Volume Shadow Copies; as such, we may want to take a targeted (perhaps even laser-focused) approach to parsing previous versions of each of these artifacts for comparative, historical data.

As you can see, even though the updating of last access times for files is disabled by default on Windows systems as of Vista, this doesn't mean that we can't determine when a user accessed particular files.

Wednesday, June 20, 2012

Training, and Learning

I finished up leading a Timeline Analysis Course on Tuesday afternoon, ending two days of some pretty intensive training.  One of the things I find when I'm putting presentations or courses together, and then actually giving the presentation, is that I very often end up learning a good deal along the way, and this time around was no different.  As has happened in the past, what I learn leads me to revisit and possibly even modify tools or analysis techniques, and again, this time was no different.

One of my biggest takeaways from the training is that I need to reconsider how at least some of the available time stamped data is presented in a timeline.  One of those items is how Prefetch file metadata is represented and displayed; I've since updated the parsing tool to address this particular item.  Another, and the one I'm going to discuss in this blog post, is how Windows shortcut (LNK file) information might be displayed in a timeline, or more specifically, what information about LNK files might possibly need to be presented in a timeline.

Category IDs
So, as a bit of background, I've been thinking quite a lot lately about how to better take advantage of timeline data.  As I was putting the timeline course together, it occurred to me that I was going to be spending a good deal of time describing to attendees how to create a timeline, and even walking them through this process with demonstrations and hands-on exercises, but spending very little time discussing how to actually analyze the timeline data.  The simplest answer is, it depends.  It depends on your exam goals, why you're performing the exam, and why you created a timeline in the first place.  It occurred to me that I was making an assumption that most analysts would have a good solid justification for creating a timeline as part of their analysis process.  If that were the case every time, I wouldn't have folks signing up for a timeline analysis course, would I?  I'm not saying that analysts don't have a justification for creating a timeline, but sometimes that justification may be, "...that's what we always do...", or "...that's what I did last time."

Timeline analysis is something of a data reduction technique...we go from a 500GB hard drive or image, to somewhere around a GB or so of data, and we then arrange it based on a time value in the hopes of obtaining some context and increasing our relative confidence in the data that we're looking at; that's the goal, anyway.  But by grabbing just the data directly associated with a time value, we end up performing a great deal of data reduction.  Even so, we still need a means for directing our analysis, or getting the cream to rise to the top of the container.

Something I'd discussed in a previous blog post was the concept of categories for events.  Rob Lee has done some considerable work in this area already, providing a color-coded Excel macro that implements the category ID scheme he's identified via resources such as the SANS DFIR poster.  Regardless of the method used to identify event types or categories, the idea is to develop some method to assist the examiner in her analysis of the timeline.  After all, if you have something of an idea of what you're looking for, then finding it might be a bit easier if you classify various events by type or category, and then have some means to identify the events accordingly (via color, a tag or identifier, etc.).
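As a rough sketch of what tagging events by category might look like, the Python below applies a couple of pattern rules to an event description and prefixes the first tag that matches.  The rules, the patterns, and the function name are my own assumptions for illustration; they are not Rob's category scheme or the logic of any existing tool.

```python
import re

# Hypothetical rules: a pattern to look for in the event description, and the tag to apply.
CATEGORY_RULES = [
    (re.compile(r"UserAssist|AppCompatCache|Prefetch", re.I), "[Program Execution]"),
    (re.compile(r"[\\/]Recent[\\/].*\.lnk|JumpList", re.I), "[File Access]"),
]

def tag_event(description):
    """Prefix the description with the first matching category tag, if any."""
    if description.startswith("["):  # already carries a tag, leave it alone
        return description
    for pattern, tag in CATEGORY_RULES:
        if pattern.search(description):
            return f"{tag} {description}"
    return description

print(tag_event(r"UserAssist - C:\tools\autoruns.exe (1)"))
```

Keeping the rules in a simple list means an examiner can add or reorder categories to suit the goals of a particular exam without touching the tagging code.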

Shortcut/LNK Files
Speaking of categories, perhaps one of the most difficult artifacts to classify into a single category is Windows shortcut/LNK files.  Without getting into a long discussion about this, let's take a look at an example of what's available in an LNK file found in a user's Recent folder:

atime                         Tue May 15 21:11:59 2012                         
basepath                    C:\Users\                                        
birth_obj_id_node       08:00:27:dd:64:d1                                
birth_obj_id_seq         9270                                             
birth_obj_id_time        Tue May 15 21:09:27 2012                         
birth_vol_id                 2C645C57...13C2834AAD2                 
commonpathsuffix       john\Downloads\                      
ctime                           Tue May 15 21:11:59 2012                         
filesize                        535772                                           
machineID                  john-pc                                          
mtime                         Tue May 15 21:11:59 2012                         
netname                     \\JOHN-PC\Users                                  
new_obj_id_node        08:00:27:dd:64:d1                                
new_obj_id_seq          9270                                             
new_obj_id_time        Tue May 15 21:09:27 2012                         
new_vol_id                 2C645C57...13C2834AAD2                 
relativepath                ..\..\..\..\..\Downloads\            
vol_sn                        F405-DAC1                                        
vol_type                     Fixed Disk          

As you can see, we have a number of data elements available to us once we've decoded the binary contents of the LNK file, any of which (or any combination of which) may be relevant or significant to our analysis.  For example, as the LNK file was found in the user's Recent folder, we can assume that the existence of the file indicates some form of user activity; that is, the user must have done something, must have performed a specific action (such as double-clicking the file) that caused that LNK file to be created.

Next, we have the path to where the target file was located within the file system, as well as the MA.B times of the target file at the time that the shortcut was created. That might be significant to your analysis, as it demonstrates both knowledge of and access to a file, and will persist even after the target file is no longer available.

Had the LNK file been located on the user's Desktop and pointed to an EXE target file, this might illustrate specific actions taken by the user, such as installing an application.  This might also indicate program execution, rather than file access.

If the target file in the shortcut is a document or image, this might also illustrate program execution.  Launching the shortcut would cause the Windows system to reach into the Registry in order to determine which application is associated with the target file's extension.  For example, let's say a shortcut "points to" a .avi video file.  On some systems, launching a shortcut that points to a video file might cause Windows Media Player to be launched automatically; on other systems, it might be another application altogether.  Either way, the existence of an LNK file might also illustrate program execution or application launch.

Finally, we see that the last item visible in our example is called "vol_type", which refers to the type of volume where the target file was at the time of the activity.  In this case, the C:\ volume is a "Fixed Disk"; if it wasn't, would that be significant?  For example, if the volume type were "removable media" or a "network share", would that be significant to your exam?  In some cases, it could very well be, and we might look to that information for indications of access to or use of removable storage devices, or of network shares.

Perhaps the idea here isn't to classify LNK files into a single category, but instead filter the various data items found within LNK files based on a set of rules, and produce timeline events based on the output of each of the rules, where appropriate.  This might mean that for a single LNK file, we might end up with multiple events in our timeline.
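Here is one way the rule-based idea might be sketched in Python.  Everything in it is an assumption for the sake of illustration: the rule set, the field names ("lnk_path", "target", "vol_type"), the sample shortcut, and the pipe-delimited time|source|host|user|description output layout.  The time value is simply passed in as an epoch integer standing in for whichever LNK-related time the examiner decides is appropriate.

```python
# Hypothetical rules: each pairs a category tag with a test over parsed LNK fields.
RULES = [
    ("[File Access]", lambda f: "\\recent\\" in f["lnk_path"].lower()),
    ("[Program Execution]", lambda f: f["target"].lower().endswith(".exe")),
    ("[Removable Media]", lambda f: f["vol_type"].lower() == "removable media"),
    ("[Network Share]", lambda f: f["vol_type"].lower() == "network share"),
]

def lnk_events(fields, when, host, user):
    """Return one pipe-delimited event line for each rule that fires."""
    return [
        f"{when}|LNK|{host}|{user}|{tag} {fields['target']}"
        for tag, test in RULES
        if test(fields)
    ]

# Made-up shortcut: an installer on removable media, found in the Recent folder.
sample = {
    "lnk_path": r"C:\Users\john\AppData\Roaming\Microsoft\Windows\Recent\setup.lnk",
    "target": r"E:\setup.exe",
    "vol_type": "Removable Media",
}
events = lnk_events(sample, 1337116422, "john-pc", "john")
for line in events:
    print(line)
```

In this made-up case, the one LNK file yields three timeline events, one per rule that matched, which is the "multiple events from a single LNK file" outcome described above.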

ForensicsWiki LNK page

Thursday, June 14, 2012

Timeline Analysis, and Program Execution

I mentioned previously that I've been preparing for an upcoming Timeline Analysis course offered through my employer.  As part of that preparation, I've been using the tools to walk through the course materials, and in particular one of the hands-on exercises that we will be doing in the course.

One of the things I'd mentioned in my previous post is that Rob Lee has done a great deal of work for SANS, particularly in providing an Excel macro to add color-coding of different events to log2timeline output files.  I've had a number of conversations and exchanges with Corey Harrell and others (but mostly Corey) regarding event categorization, and the value of adding these categories to a timeline in order to facilitate analysis.  This can be particularly useful when working with Windows Event Log data, as there are a good number of events recorded by default, and all of that information can be confusing if you don't have a quick visual reference.

As I was running through the exercises, I noticed something very interesting in the timeline with respect to the use of the Autoruns tool from SysInternals; specifically, that there were a good number of artifacts associated with both the download and use of the tool.  I wanted to extract just those artifacts directly associated with Autoruns from the timeline events file, in order to demonstrate how a timeline can illustrate indications of program execution.  To do so, I ran the following command:

type events.txt | find "autoruns" /i > autoruns_events.txt

...and then to get my timeline...

parse -f autoruns_events.txt > autoruns_tln.txt

...and got the following:

Tue May 29 12:56:02 2012 Z
  FILE                       - ..C. [195166] C:/Windows/Prefetch/
  FILE                       - ..C. [44056] C:/Windows/Prefetch/

Tue May 15 21:14:55 2012 Z
  REG      johns-pc         john - M... HKCU/Software/Sysinternals/AutoRuns
  REG      johns-pc         john - [Program Execution] Software\SysInternals\AutoRuns (EulaAccepted)

Tue May 15 21:14:07 2012 Z
  FILE                       - MA.B [195166] C:/Windows/Prefetch/

Tue May 15 21:13:57 2012 Z
  PREF     johns-PC          - [Program Execution] last run (1)
  REG      johns-pc         john - [Program Execution] UserAssist - C:\tools\autoruns.exe (1)

Tue May 15 21:13:53 2012 Z
  FILE                       - M.C. [640632] C:/tools/autoruns.exe
  FILE                       - M.C. [26] C:/tools/autoruns.exe:Zone.Identifier
  REG      johns-pc     - M... [Program Execution] AppCompatCache - C:\tools\autoruns.exe

Tue May 15 21:13:42 2012 Z
  FILE                       - MAC. [877] C:/Users/john/AppData/Roaming/Microsoft/Windows/Recent/Autoruns.lnk
  JumpList johns-pc         john - C:\Users\john\Downloads\

Tue May 15 21:13:32 2012 Z
  FILE                       - MA.B [44056] C:/Windows/Prefetch/

Tue May 15 21:13:28 2012 Z
  PREF     johns-PC          - [Program Execution] last run (1)
  REG      johns-pc         john - [Program Execution] UserAssist - C:\tools\autorunsc.exe (1)

Tue May 15 21:13:23 2012 Z
  FILE                       - M.C. [49648] C:/tools/autoruns.chm
  FILE                       - M.C. [26] C:/tools/autoruns.chm:Zone.Identifier
  FILE                       - M.C. [559736] C:/tools/autorunsc.exe
  FILE                       - M.C. [26] C:/tools/autorunsc.exe:Zone.Identifier
  REG      johns-pc     - M... [Program Execution] AppCompatCache - C:\tools\autorunsc.exe

Tue May 15 21:12:10 2012 Z
  FILE                       - ...B [877] C:/Users/john/AppData/Roaming/Microsoft/Windows/Recent/Autoruns.lnk
  FILE                       - ..C. [535772] C:/Users/john/Downloads/
  FILE                       - ..C. [26] C:/Users/john/Downloads/

Tue May 15 21:11:59 2012 Z
  FILE                       - MA.B [535772] C:/Users/john/Downloads/
  FILE                       - MA.B [26] C:/Users/john/Downloads/

Wed May  9 15:08:16 2012 Z
  FILE                       - .A.B [640632] C:/tools/autoruns.exe
  FILE                       - .A.B [26] C:/tools/autoruns.exe:Zone.Identifier
  FILE                       - .A.B [559736] C:/tools/autorunsc.exe
  FILE                       - .A.B [26] C:/tools/autorunsc.exe:Zone.Identifier

Sat Nov  5 17:52:32 2011 Z
  FILE                       - .A.B [49648] C:/tools/autoruns.chm
  FILE                       - .A.B [26] C:/tools/autoruns.chm:Zone.Identifier

What I find most interesting about this timeline excerpt is that it illustrates a good deal of interaction with respect to the download and launch of the tool within its ecosystem, clearly demonstrating Locard's Exchange Principle.  Now, there are also a number of things that you don't see...for example, this timeline is comprised solely of those lines that included the word "autoruns" (irrespective of case) somewhere in the line; as such, we won't see things such as the query to the "Image File Execution Options" key, to determine if there's been a debugger assigned to the tool, nor do you see ancillary events or those that might be encoded.  However, what we do see will clearly allow us to "zoom in" on a specific time window within the overall timeline, and see what other events may be listed there.
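For anyone not working at a Windows command prompt, the keyword filtering step can be approximated in a few lines of Python.  This is only a sketch of the case-insensitive match; the function name and the sample lines are made up, and the real events file will have its own layout.

```python
def filter_events(lines, keyword):
    """Keep only the lines that contain the keyword, ignoring case."""
    needle = keyword.lower()
    return [line for line in lines if needle in line.lower()]

# Made-up sample lines; only the descriptions borrow from the timeline above.
sample_lines = [
    r"UserAssist - C:\tools\autoruns.exe (1)",
    r"Software\SysInternals\AutoRuns (EulaAccepted)",
    "an unrelated event recorded in the same second",
]
kept = filter_events(sample_lines, "autoruns")
print(len(kept), "of", len(sample_lines), "lines kept")
```

The third line is dropped even though it occurred at the same time, which is exactly the limitation noted above: a keyword filter only ever shows the events that happen to contain the keyword.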

The timeline is clearly very illustrative.  We can see the download of the tool (in this case, via Chrome to a Windows 7 platform), and the assignment of the ":Zone.Identifier" ADSs, something that with XP SP2 was done only via IE and Outlook.  Beyond the file system metadata, we start to see even more context, simply by adding additional data sources such as the Registry AppCompatCache value data, UserAssist value data, information derived from the SysInternals key in the user's Registry hive, Jump Lists, etc.  In this case, the Jump List info in the timeline was extracted from the DestList stream found in the Jump List for the Windows Explorer shell, as zipped archives will often be treated as if they were folders.

Another valuable aspect of this sort of timeline data is that it is very useful in the face of the use of counter-forensics techniques, even those that may be unintentional (i.e., performed by an administrator, not to hide data, but to "clean up" the system).  Let's say that this tool had been run, and then deleted; remove all of the "FILE" entries that point to C:/tools from the above timeline, and what do you have left?  You have those artifacts that persist beyond the deletion of files and programs, and provide clear indicators that the tools had been used.  We can apply this same sort of analysis to other situations where tools had been run (programs executed) on a system, and then some steps taken to obviate or hide the data.
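The "remove the FILE entries" thought experiment can be sketched in Python as well.  The (source, description) pairs below are taken from the timeline above, but holding them as tuples, along with the function name and its default argument, is purely an assumption for the example.

```python
# A few (source, description) pairs borrowed from the timeline above.
timeline = [
    ("FILE", "M.C. [640632] C:/tools/autoruns.exe"),
    ("REG", r"[Program Execution] UserAssist - C:\tools\autoruns.exe (1)"),
    ("REG", r"M... [Program Execution] AppCompatCache - C:\tools\autoruns.exe"),
    ("REG", r"[Program Execution] Software\SysInternals\AutoRuns (EulaAccepted)"),
    ("PREF", "[Program Execution] last run (1)"),
    ("FILE", ".A.B [640632] C:/tools/autoruns.exe"),
]

def surviving(events, deleted_dir="C:/tools/"):
    """Drop file system entries beneath the deleted directory; keep everything else."""
    prefix = deleted_dir.lower()
    return [
        (source, desc)
        for source, desc in events
        if not (source == "FILE" and prefix in desc.lower())
    ]

remaining = surviving(timeline)
for source, desc in remaining:
    print(source, desc)
```

What is left are the Registry and Prefetch entries, all of which still point to the tool having been run.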

M... [Program Execution] AppCompatCache - C:\tools\autorunsc.exe

The "M..." refers to the fact that, as pointed out by Mandiant, when the tool is run, the file modification time for the tool is recorded in the data structure within the AppCompatCache value.  The "[Program Execution]" category identifier, in this case, indicates that the CSRSS flag was set (you'll need to read Mandiant's white paper).  The existence of the application prefetch file for the tool, as well as the UserAssist entry, help illustrate that the program had been executed.

One of the unique things about the SysInternals tools is that after they were taken over by Microsoft, they began to have EULA acceptance dialogs added to them.  Now, there is a command line switch that you can use to run the CLI versions of the tools and accept the EULA, but the tools will create their own subkey beneath the SysInternals key in the Software hive, and set the "EulaAccepted" value.  Even if the tool is renamed, these same artifacts will be left on a system.

File system metadata was extracted from the acquired image using TSK fls.exe.  As such, we know that the MACB times are from the $STANDARD_INFORMATION attribute within the MFT, which are highly mutable; that is to say, easily modified to arbitrary values.  We can see from the timeline that the archive was downloaded on 15 May, and according to the SysInternals web site, an updated version of the tool was posted on 14 May.  The files were extracted from the zipped archive, carrying with them some of their original file times, which is why we see ".A.B" times prior to the date that the archive was downloaded.  Had the file times been modified to arbitrary values (i.e., "stomped"), rather than the files being deleted, we would still see the other artifacts listed in the timeline, in that order.  In essence, we'd have a "signature" for program execution.

Other sources of data that would not appear in a timeline can include, for example, the user's MUICache key.  This key simply holds a list of values, and in a number of exams, I've found references to malware that was run on the system, even after the actual files had been removed.  Also, if the AutoRuns files had been deleted, I could parse the AutoRuns.lnk Windows shortcut file to get the path to, as well as the MA.B times for, the target file.  In order to illustrate that, what follows is the raw output of an LNK file/stream parser:

atime                 Tue May 15 21:11:59 2012
basepath              C:\Users\
birth_obj_id_node     08:00:27:dd:64:d1
birth_obj_id_seq      9270
birth_obj_id_time     Tue May 15 21:09:27 2012
birth_vol_id          2C645C57D81C5047B7DDE13C2834AAD2
commonpathsuffix      john\Downloads\
ctime                 Tue May 15 21:11:59 2012
filesize              535772
machineID             john-pc
mtime                 Tue May 15 21:11:59 2012
netname               \\JOHN-PC\Users
new_obj_id_node       08:00:27:dd:64:d1
new_obj_id_seq        9270
new_obj_id_time       Tue May 15 21:09:27 2012
new_vol_id            2C645C57D81C5047B7DDE13C2834AAD2
relativepath          ..\..\..\..\..\Downloads\
vol_sn                F405-DAC1
vol_type              Fixed Disk

The "mtime", "atime", and "ctime" values correspond to the M, A, and B times, respectively, of the target file, which in this case is the archive.  As such, I could either go back and add the LNK info to my timeline, or automatically have that information added during the initial process of collecting data for the timeline.  In this case, what I would expect to see would be MA.B times from both the file system and the LNK file metadata at exactly the same time.  Remember, the absence of an artifact where we expect to find one is itself an artifact, and as such, if the file system metadata was not available, that would tell me something and perhaps take my analysis in another direction.
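To make the idea of adding the LNK info to a timeline a bit more concrete, here is a minimal Python sketch (not the Perl tooling used for the course) that turns the time fields from parser output like the above into five-field TLN lines (time|source|system|user|description).  The function name, the "LNK" source identifier, the description layout, and the assumption that the parser reports times in UTC are all illustrative choices, not something taken from the actual tools.

```python
import calendar
import time

# LNK time fields mapped to MACB-style labels: the LNK "ctime" is the
# target's creation (B) time. The "C" (entry modified) time is not stored.
LNK_TIME_FIELDS = {"mtime": "M", "atime": "A", "ctime": "B"}


def lnk_to_tln(fields, lnk_name, system="", user=""):
    """Return TLN lines for the target file described by parsed LNK fields.

    `fields` is a dict of name -> value as shown in the parser output.
    Assumption: the time strings are UTC.
    """
    target = fields.get("basepath", "") + fields.get("commonpathsuffix", "")
    by_time = {}
    for key, label in LNK_TIME_FIELDS.items():
        if key in fields:
            parsed = time.strptime(fields[key], "%a %b %d %H:%M:%S %Y")
            epoch = calendar.timegm(parsed)  # interpret the struct as UTC
            by_time.setdefault(epoch, set()).add(label)

    lines = []
    for epoch in sorted(by_time):
        # Collapse identical timestamps into one event, e.g. "MA.B".
        macb = "".join(c if c in by_time[epoch] else "." for c in "MACB")
        desc = f"{macb} {lnk_name} -> {target}"
        lines.append(f"{epoch}|LNK|{system}|{user}|{desc}")
    return lines


sample = {
    "mtime": "Tue May 15 21:11:59 2012",
    "atime": "Tue May 15 21:11:59 2012",
    "ctime": "Tue May 15 21:11:59 2012",
    "basepath": "C:\\Users\\",
    "commonpathsuffix": "john\\Downloads\\",
}
for line in lnk_to_tln(sample, "AutoRuns.lnk", system="john-pc"):
    print(line)
```

Because all three times in the sample are identical, they collapse into a single "MA.B" event, which is what we would then expect to line up with the file system metadata for the same file.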

[Note: I know you're looking at the above output and thinking, "wow, that looks like a MAC address in the output!"  You're right, it is.  In this case, looking up the OUI leads us to Cadmus Computer Systems, and yes, the system was from a VM running in VirtualBox.  Also, there's a good deal of additional information available in the LNK file metadata, to include the fact that the target file was on a fixed disk, as opposed to a removable or network drive.]

The Value of Multiple Data Sources
Regarding the value of data from multiple sources (even additional locations within the same source), in a comment to his post regarding a RegRipper plugin that he'd written, Jason Hale points out, quite correctly:

I didn't think there was a whole lot of value in the information from the TypedURLsTime key itself (other than knowing that computer activity was occurring at that time) without correlating it with the values in TypedURLs.

Jason actually wrote more than one plugin to extract the TypedURLsTime value data (this key is specific to Windows 8 systems).  I've looked at the plugin that outputs in TLN format, for inclusion in a timeline...I use a different source identifier in the version I wrote (I use "REG", for consistency...Jason uses "NTUSER.DAT").  However, we both reached point B, albeit via different routes.  This will definitely be something I'll be including in my Windows 8 exams.
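The correlation Jason describes can be sketched in a few lines of Python.  This is only an illustration, not either plugin: it assumes the values have already been read out of the hive into two dictionaries, that TypedURLsTime uses the same value names as TypedURLs (url1, url2, ...), and that each time value is an 8-byte FILETIME.  The function names are made up for the example.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(raw):
    """Convert an 8-byte little-endian FILETIME (100 ns ticks) to UTC."""
    (ticks,) = struct.unpack("<Q", raw)
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def correlate_typed_urls(typed_urls, typed_urls_time):
    """Pair each typed URL with its timestamp via the shared value name.

    typed_urls:      {"url1": "http://...", ...}   (strings)
    typed_urls_time: {"url1": b"........", ...}    (raw 8-byte values)
    Returns (datetime or None, value name, url) tuples, oldest first,
    with URLs that have no usable timestamp at the end.
    """
    rows = []
    for name, url in typed_urls.items():
        raw = typed_urls_time.get(name)
        when = None
        if raw is not None and len(raw) == 8:
            when = filetime_to_datetime(raw)
        rows.append((when, name, url))
    rows.sort(key=lambda r: (r[0] is None, r[0] or FILETIME_EPOCH))
    return rows
```

A URL with no matching time value is kept rather than dropped, since a missing artifact where we expect one can itself be worth noting.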

Key Concepts
1. Employing multiple data sources to develop a timeline of system activity provides context, as well as increases our relative confidence in the data itself.
2. Employing multiple data sources can demonstrate program execution.
3. Employing multiple data sources can illustrate and overcome the use of counter-forensics activities, however unintentional those activities may be.

Monday, June 11, 2012

Timeline Analysis

As I've been preparing for our upcoming timeline analysis course, I've been putting some work into updating some of the tools that I use for creating timelines, which are also provided to attendees for their use, along with the other course materials.  Some of the updates I've been doing are intended to bring a new level of capabilities to the analyst, and really illustrate the power that timelines bring to an analyst.

One of the things Rob Lee has talked about in his timeline analysis courses is the idea of "pivot points", events within a timeline that would likely serve as anchor points for our analysis, or for what you're interested in determining.  I've had some conversations with folks at work and some extensive email exchanges with Corey Harrell lately, both of which have involved determining what some of these pivot or anchor points might be.  One place from which I've obtained pivot points is the initial triage phone call with the customer; maybe the customer had logged into a system and found that, at some point, WinRAR had been installed.  Or maybe they saw a pop-up from the AV application.  Or maybe banking or credit card fraud was reported by the bank to have occurred on a certain date, so you know that access to a system had to have occurred prior to that date.  The point is that we usually have some piece of information that leads us to the decision to create a timeline; after all, we wouldn't simply create a timeline "...because that's what we've always done."  I say, nay, nay...we're not likely to index an entire image unless we're planning to perform a keyword search, and we're equally unlikely to create a timeline unless we have a good reason for doing so.

My point is this...when we sit down to analyze an acquired image, how many times have we opened the image in our commercial analysis framework and just started poking around aimlessly?  The answer is probably...far too often.  What if I had a timeframe ("prior to April 5th"), or better yet, a specific event ("WinRAR was installed") that led me to creating a timeline?   I would then have a specific point within the timeline that I could go to in order to begin my analysis.

A Brief Word About Goals...
During the recent Intro to Windows Forensic Analysis course, one of the attendees asked me, "How do you determine the goals of your analysis if your customer doesn't even know them?"  Well, the fact of the matter is that they will know their goals, even if they don't know that they do; as an analyst, it's your job to work with them and draw that out.  Start with something simple, such as why the customer called you in the first place, or what led them to identify a system that they want you to acquire and analyze; now you just need to determine, analyze for what?  Sometimes, this can involve something (AV alert, pop-up, etc.) having occurred on a specific date, and that's a start; you'll at least know that you're looking for an event that occurred on a certain date, and that would be your initial anchor point.

And Now, Back To Timelines...
When I'm building a timeline, I like to use LogParser to extract data from the Windows Event Log (*.evtx) files.  I then use a Perl script (provided as a standalone Windows executable) to transition the resulting .csv format logs into TLN format for inclusion into the timeline.  One of the tool updates I've completed recently is to add an event ID mapping capability to that script; as it parses through the output of LogParser, for each event, it checks a lookup table, based on event source and event ID pairs, for an identifier of what type or category the event is, and adds an identifier or tag to the event description.  For example, there are event records that tell you when a program has been installed, removed, or launched.  There are event records that tell you when the system has been connected to a wireless access point.  And there are a LOT of event records that indicate login attempts, either locally or remotely.  As I was working my way through an analysis, I thought that it might be useful to be able to quickly see at a glance what I was looking for...login events, program execution, etc.

Note: While I refer to the tools as Perl scripts, I also provide course attendees with copies of the scripts compiled into Windows executables via Perl2Exe.

The event ID mapping file is a flat text file, and in my modifications to the script, I included the ability to add comments to the file.  As part of my research, I've included links to Microsoft resources (as comments) that identify what certain events mean; so, it's not me saying that a particular event means that a program was installed, it's Microsoft stating that that's what the event identifies, and I provide a link to a vendor resource that analysts can use to validate that information.  This provides a great facility for an analyst to not only easily research the event, but also add their own event identifiers.  I've also taken this a step further by adding similar identifiers to the TLN output of other tools, including the RegRipper plugins and other data parsers that are provided along with the course materials.
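As a rough illustration of how a mapping like this can work, here is a small Python sketch (the course tools themselves are Perl).  The file layout shown, "source/event ID:tag" with "#" starting a comment, is an assumption made for the example and not the actual format of the mapping file; the tag names and function names are likewise hypothetical.

```python
# Hypothetical mapping file contents. Lines starting with '#' are comments,
# which is where links to vendor documentation for each event would go.
SAMPLE_MAP = """
# Windows Installer: installation completed successfully
MsiInstaller/11707:Program Install
# A service was installed in the system
Service Control Manager/7045:Service Install
"""


def load_event_map(text):
    """Build a {(source, event_id): tag} lookup table from mapping text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        key, _, tag = line.partition(":")
        source, _, event_id = key.rpartition("/")
        if source and event_id and tag:
            mapping[(source.strip().lower(), event_id.strip())] = tag.strip()
    return mapping


def tag_event(mapping, source, event_id, description):
    """Prefix the event description with its tag, if the pair is mapped."""
    tag = mapping.get((source.lower(), str(event_id)))
    return f"[{tag}] {description}" if tag else description


event_map = load_event_map(SAMPLE_MAP)
print(tag_event(event_map, "MsiInstaller", 11707, "Product installed"))
```

Keeping the lookup in a flat file rather than in the code means an analyst can add their own source/ID pairs, and the supporting reference for each, without touching the parser.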

Another potential means for identifying pivot or anchor points for your analysis is to add an additional layer of filtering to the tools.  For example, I wrote a RegRipper plugin that replicates Mandiant's Python script, and we know from the published research that the entries identify programs that had been executed on the system.  Now, what if we were to not only tag these entries in our timeline as identifying program execution, but also scan each entry and identify those in particular that were run from a directory that includes the word "temp" in the path (such as "Local Settings\Temp" or "Temporary Internet Files")?  With all of the available data in a timeline, adding tags to identify pivot or anchor points in this way would likely be extremely useful.
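The additional filtering layer can be very simple.  The following Python sketch (again, not the RegRipper plugin itself) tags each executable path as program execution and adds a second tag when the directory portion of the path contains "temp"; the tag text and function names are assumptions for illustration only.

```python
import ntpath  # Windows path rules, regardless of the analysis platform


def is_temp_path(path):
    """True if the directory portion of a Windows path contains 'temp'."""
    return "temp" in ntpath.dirname(path).lower()


def tag_execution_entries(paths):
    """Return a description string for each path, tagged for quick review."""
    tagged = []
    for path in paths:
        tags = "[Program Execution]"
        if is_temp_path(path):
            tags += " [Temp Path]"
        tagged.append(f"{tags} {path}")
    return tagged


entries = [
    r"C:\Windows\System32\cmd.exe",
    r"C:\Documents and Settings\user\Local Settings\Temp\a.exe",
    r"C:\Users\john\AppData\Local\Microsoft\Windows\Temporary Internet Files\b.exe",
]
for line in tag_execution_entries(entries):
    print(line)
```

Checking only the directory portion avoids flagging a file simply because its name contains "temp", although a plain substring match like this will still produce the occasional false positive that the analyst has to review.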

This isn't anything new, and I'm not the only one to look at things this way.  At the SANS360 event last year, Rob Lee spent his 6 minutes talking about an Excel macro he had created to color code events in a similar manner, so that they could be easily identified in the output of log2timeline.  Rob's also created a poster of these event categories.

Some analysts have asked me about the timeline analysis course that we're offering, and why I don't use other, perhaps more popular tools when I perform my analysis.  I'm not against the use of other tools; in fact, if you have the time and interest, I strongly encourage you to use multiple tools to look at data.  Creating my own tools serves two purposes; it forces me to better understand the actual data and see how it can be useful, and it allows me a finer, more granular level of access to the data.  Sometimes, I don't want a full timeline; in fact, I may not even want a timeline created from just one source.  Other times, I may not have a complete image to work with; rather, I will have a selected set of files from a system.  I conduct various levels of analysis using a selected set of files in cases where it takes far too long to obtain and ship a full image file, when working with compromised systems that may contain sensitive data or illicit images, or when working to assist another analyst, to name but a few instances.  Or, based on the questions that the customer has, I may want a timeline created solely from a subset of one data source, such as a timeline of remote logins and from where they originated.  If that were the case, I might use a command line such as the following:

Logparser -i:evt -o:csv "Select RecordNumber,TO_UTCTIME(TimeGenerated),EventID,SourceName,ComputerName,SID,Strings from Log.evtx" | find "TerminalServices-LocalSessionManager/XX" > remote_logins.csv

Now that I have just the remote login messages, I can (a) parse these into a timeline, and (b) quickly write up a Perl script that will run through the information in the .csv file and provide me with a list of unique IP addresses.
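For step (b), a script along these lines would do the job.  This is a Python sketch rather than the Perl mentioned above, and it makes one simplifying assumption: instead of relying on a particular column layout in remote_logins.csv, it just pulls anything that looks like an IPv4 address out of each line.

```python
import re
from collections import Counter

# Four dot-separated groups of 1-3 digits; octet values are checked below.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def unique_ips(lines):
    """Count each IPv4 address seen in the given lines of text."""
    counts = Counter()
    for line in lines:
        for candidate in IPV4.findall(line):
            if all(int(octet) <= 255 for octet in candidate.split(".")):
                counts[candidate] += 1
    return counts


def report(lines):
    """Return 'count  address' lines, most frequent address first."""
    return [f"{n:6d}  {ip}" for ip, n in unique_ips(lines).most_common()]


# Usage against the file produced by the LogParser command above:
#     with open("remote_logins.csv", encoding="utf-8", errors="replace") as fh:
#         print("\n".join(report(fh)))
```

Counting occurrences, rather than only listing unique addresses, makes it easier to spot the one address that shows up only once or twice among the routine logins.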

Again, I'm not averse to using other tools, and definitely would not advocate against the use of other tools.  This is simply how I prefer to go about creating timelines, and I think that it serves as an excellent foundation from which to teach timeline creation and analysis.

Wednesday, June 06, 2012

Training, and Host-Based Analysis

I posted recently on the host-based analysis topic, and just yesterday, I finished up teaching the first rendition of the ASI Intro to Windows Forensic Analysis course, which focuses on host-based analysis.  I'm really looking forward to teaching the Timeline Analysis course in a week and a half.  As with analysis engagements, I tend to take things that I learned in a recent engagement and apply them to future work;  the same is true with the training I provide.  I've been spending a considerable amount of time over the past 12 or so hours going back over the course I just taught, and looking at ways to improve not only the next iteration of the course, but also to improve the next course that I teach.

Both of the courses are two days long, and I try to have a good deal of hands-on work and exercises.  As such, folks planning to attend the course should have some experience in performing forensic analysis, and also be comfortable (not expert, just comfortable) working at the command line.  I found that a great way to illustrate the value of a particular artifact is to provide tools and hands-on exercises built around it.  As such, I provide a number of my own tools, which I've updated in functionality.  I also provide various sample files so that folks attending the course can practice using the tools.

As an example, I provide the tools discussed here, including pref.exe.  I also provide sample Prefetch files extracted from an XP system, as well as others extracted from a Vista system.  In the Intro course, we walk through each artifact, and provide several means for extracting the data of value from each.  In some cases, I only talk about various tools and provide screen shots, as due to the license agreements, I can't distribute copies of the tools themselves.

Both courses start off with the core concepts; the why of what we're doing, before we step off into the how.  In the Intro course, I spend a good deal of time talking about using multiple data sources to illustrate that something occurred.  In our first exercise scenario, we look at how to determine the active users on a system, and discuss the various sources we can look to for data; while some of these data sources may appear to be redundant, we want to look to them in order to validate our other data, as well as provide a means of analysis in the face of counter- or anti-forensics activities, no matter how unintentional those activities may be.  The reason for this is two-fold...first, some data sources are more easily mutable than others, and may be changed either over the course of time while the system is active, or changed intentionally.  I've had exams where one of the initial steps taken by administrators, prior to contacting our IR team, included removing user accounts.  As such, another reason for making use of redundant data sources is to address those times when the one data source we usually rely on isn't available, or isn't something we necessarily trust.

Another area we look at in the Intro course is indications of program execution.  We look to a number of different locations within a Windows system for indications of programs having been executed (either simply executed on the system, or those that can be associated with a user), and as such, we use a number of RegRipper plugins that you won't find anywhere else, and are only provided in conjunction with the courses. There's one to parse the AppCompatCache value data (as well as run checks for any program paths with 'temp' in the path), another to display the information in TLN format for inclusion in a timeline, as well as others to query, parse and display other Registry data that is relevant to indications of program execution, and other categories of activity.

We also discuss Volume Shadow Copies, as well as means of accessing them when analyzing an acquired image.  In the course, we focus on the VHD method for accessing VSCs, but we also discuss other means, as well.  I stress throughout the course that the purpose of accessing various artifacts in this manner is to let the analyst see what's available, so that they're better able to select the appropriate tool for the job.  For example, if you're accessing VSCs a great deal, perhaps it would be valuable to use techniques such as those that Corey's blogged about, or use something like ShadowKit, or perhaps take advantage of the VSC functionality included in TechPathway's ProDiscover.

One of the things I really like about these training courses is engaging with others and discussing how to go about performing host-based analysis based on identified goals.

Friday, June 01, 2012

Host-Based Analysis

I came across an article on malware 'licensing' the other day that caught my attention.  When I read this, I thought to myself, "wow, what is this going to do to dynamic malware analysis?"  Seriously...what are some of the common approaches to malware analysis?  Get a sample, and either send it off to the AV vendor you're paying, upload it to VirusTotal, or, if you have the in-house capability, perform some form of dynamic analysis (i.e., actually run it to see what it does).  No, I'm not saying that this is all that's done...there are some extremely knowledgeable and capable reverse engineers out there who use their RE-fu.  But the fact is that a lot of folks who "do" malware analysis don't exactly break out the debugger and set break points as their first order of business...although some do.

What I am saying is that, in my IR experience, a common approach to "malware analysis" is (a) get a copy and give it to the AV vendor, (b) deploy the .dat, and (c) wipe and reinstall the infected box(es).  Or, someone will load a sample up into a VM and run it with some monitoring tools to see what it does.

Overall, it seems to me that this malware 'licensing' really drives the need for host-based analysis, particularly if you want to collect intel from which to build your defense, or to just address an issue.  One benefit of host-based analysis is that you can determine artifacts of an infection or compromise that you might not see in AV vendor write-ups, specifically when the question isn't "what could the malware do", but instead, "what did the malware do?"

So why isn't this something that we normally do?  Because it takes too long?  Well, with training and education, we can achieve that targeted, "sniper" approach that Chris talks about, allowing us to perform more efficient, timely, and comprehensive analysis, if we understand the nature of what we're looking for (goals), as well as the systems we're analyzing.

Shameless plug: It's exactly this knowledge and some of the analysis techniques that we'll be discussing in our training.

As a bit of a tangent, here are some links to some excellent blog posts that may be of assistance to analysts, in helping them to get from "I've heard of that" to "I've done that"...

SecurityBananas - From Hibernation file to Malware Analysis with Volatility
Willi Ballenthin's DFIROnline video, showing how to take advantage of INDX records
Girl, Unallocated - Report writing cheat sheet
Cheeky4n6Monkey - looking for a better way to manage massive amounts of image EXIF data?  This would also work with documents, as well.

A final note on host-based analysis: one of the things I try very hard to avoid when performing malware detection is to start looking for malware based on file names alone.  We are all aware that files can be named anything on Windows systems; remember the ntshrui.dll issue?  Many times, if I go searching within an image for a file name, it's due to an entry in the Run key, or more recently, something interesting in the AppCompatCache data (yes, Virginia, I do use a RegRipper plugin for that, as well as many others).  Corey went back and took a look at some older exams, and found some very interesting information; I've included it in every exam that I do in which I'm even remotely interested in determining indications of program execution.