This article isn't going to be directed toward digital analysts; rather, it will be directed more to folks who hire or contract with analysts or firms, and are the recipients (or customers) of the technical work performed by those digital forensics analysts. My goal here is to simply express some thoughts on how customers might go about determining if the results of the work that they contracted for are meeting their needs.
Previously in this blog, I asked the question, Are you being served? If you've asked yourself that question, you may be wondering...how would I know? Selecting a DFIR analyst (either an individual or a firm) is really no different from evaluating and hiring any other service provider, such as a plumber or auto mechanic. The difference is that plumbers and mechanics fix something for you, and you can evaluate their services based on whether the problem is fixed, and for how long. For customers of digital analysis services, determining if you're getting what you paid for is a bit more difficult.
In exploring the subject of finding a digital forensics expert, I ran across this article at the Law.com web site. The article covers a number of aspects of digital analysis services that lawyers should consider when looking for a digital forensics expert. For example, the article suggests that when asked to identify methods of data exfiltration, analysts should include USB devices. This is good to know, but more importantly, does the analyst identify all such devices, or only thumb drives? How do you know? Does the analyst make an attempt to determine the use of counter-forensics techniques, where a user might delete certain artifacts in an attempt to hide the fact that they connected a specific device to the system? What details can the analyst provide with respect to the device being connected to the system, and how a user may have interacted with that device? Regardless of the data exfiltration method used (USB device, web mail, Bluetooth, etc.), how does the analyst address data movement, in particular?
Beyond those items addressed in the article, some other things to consider include (but are not limited to):
Does the analyst explore historical data, such as Volume Shadow Copies (VSCs), when and where it is appropriate to do so? If not, why? If the methodology used by the analyst fails to find any VSCs, what does the analyst state as the reason for this finding?
What about other artifacts? When the analyst provides a finding, do they have additional artifacts to support their findings, or are their findings based on that one artifact? If artifacts (such as Prefetch files) are not examined or missing, what reason does the analyst provide?
If you're interested in the existence of malware on a system, what does the analyst do to address this issue? Do they run AV against the mounted image? What else do they do? If malware is found, do they determine the initial infection vector? Do they determine if the malware ever actually executed?
When you look at the report provided, does the information in it answer your questions and address your concerns, or are there gaps? Does the analyst connect the dots in the report, or do they skip over many of the dots, and fill in the gaps using speculation?
One question that you might consider asking is what tools the analyst uses, but I would suggest that it's more important to know how those tools are used. For example, having access to one of the commercial analysis suites can be a good thing, particularly if the analyst states that they will use it on your case to perform a keyword search. But does it make sense to do so? Did they work with you to develop a list of keywords to use in the search? I've heard of examinations that were delayed for some time while the data was being preprocessed and indexed in preparation for a keyword search, yet none of the analysts could state why the keyword search was necessary or of value to the case itself.
There is often much more to digital analysis than simply finding one or two artifacts in order to "solve the case". Systems today are sufficiently complex that multiple artifacts are needed to provide context for a single finding, such as a tool not finding VSCs within an image of a Windows 7 system. Digital analysis is very often used as the basis for making critical business decisions or addressing legal questions, so the question remains...are you being served? Are you getting the data that you need, in a timely manner, and in a form that you can understand and use?
Resources
Law.com - How to Find a Digital Forensics Expert
Interested in Windows DFIR training? Windows Forensic Analysis, 11-12 Mar; Timeline Analysis, 9-10 Apr. Pricing and Calendar. Send email here to register.
Monday, January 28, 2013
Why "BinMode"?
You may be wondering why I've started posting articles to my blog with titles that start with "BinMode" and "There Are Four Lights".
The "BinMode" posts are dedicated to deeply technical posts; the name comes from the fact that sometimes I'll write a Perl script that requires me to open a file using binmode(), so that I can parse the file on a binary level. These are generally posts that go beyond the tools, which tend to provide a layer of abstraction between the data and analyst. I feel that it's important for analysts to understand what data is available to them, so that they can make better decisions as to which tool to use to extract and process that data.
An example of this is the recent work I've done parsing the Java deployment cache index (*.idx) files. Beyond opening these files in a hex editor, one resource that I had access to in order to assist me in parsing the files is this source code page: CacheEntry.java. Another resource that became available later in the process is the format specification that Mark Woan documented. What these resources show is that within the binary data, there is potentially some extremely valuable information. This information might be most useful during a root cause analysis investigation, perhaps to determine the initial infection vector of malware, or how a compromise occurred.
The "Four Lights" articles are partly a nod to the inner geek (and Star Trek fan) in all of us, but they're also to address something that may be lesser known, or perhaps seen as a misconception within the digital forensic analysis community. The title alludes to an episode of ST:TNG, during which his captors attempted to get the greatest starship captain...EVER...to say that there were only three lights, when, in fact, there were four.
If there is a particular topic that you'd like me to expand upon, or if there's something that you'd like to see addressed, feel free to leave a comment here, or to send me an email.
Interested in Windows DF training? Check it out: Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar. Be sure to check the WindowsIR Training Page for updates.
The "BinMode" posts are dedicated to deeply technical posts; the name comes from the fact that sometimes I'll write a Perl script that requires me to open a file using binmode(), so that I can parse the file on a binary level. These are generally posts that go beyond the tools, which tend to provide a layer of abstraction between the data and analyst. I feel that it's important for analysts to understand what data is available to them, so that they can make better decisions as to which tool to use to extract and process that data.
An example of this is the recent work I've done parsing the Java deployment cache index (*.idx) files. Beyond opening these files in a hex editor, one resource that I had access to in order to assist me in parsing the files is this source code page: CacheEntry.java. Another resource that became available later in the process is the format specification that Mark Woan documented. What these resources show is that within the binary data, there is potentially some extremely valuable information. This information might be most useful during a root cause analysis investigation, perhaps to determine the initial infection vector of malware, or how a compromise occurred.
The "Four Lights" articles are partly a nod to the inner geek (and Star Trek fan) in all of us, but they're also to address something that may be lesser known, or perhaps seen as a misconception within the digital forensic analysis community. The title alludes to an episode of ST:TNG, during which his captors attempted to get the greatest starship captain...EVER...to say that there were only three lights, when, in fact, there were four.
If there is a particular topic that you'd like me to expand upon, or if there's something that you'd like to see addressed, feel free to leave a comment here, or to send me an email.
Interested in Windows DF training? Check it out: Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar. Be sure to check the WindowsIR Training Page for updates.
Monday, January 21, 2013
BinMode: Parsing Java *.idx files, pt. deux
My last post addressed parsing Java *.idx files, and since I released that post, a couple of resources related to the post have been updated. In particular, Joachim Metz has updated the ForensicsWiki page he started to include more information about the format of the *.idx files, with some information specific to what is thought to be the header of the files.
Also, Corey Harrell was kind enough to share the *.idx file from this blog post with me (click here to see the graphic of what the file "looks like" in Corey's post), and I ran it through the parser to see what I could find:
File: d:\test\781da39f-6b6c0267.idx
Times from header:
------------------------------
time_0: Sun Sep 12 15:15:32 2010 UTC
time_2: Sun Sep 12 22:38:40 2010 UTC
URL: http://xhaito.com/work/builds/exp_files/rox.jar
IP: 91.213.217.31
Server Response:
------------------------------
HTTP/1.1 200 OK
content-length: 14226
last-modified: Sun, 12 Sep 2010 15:15:32 GMT
content-type: text/plain
date: Sun, 12 Sep 2010 22:38:35 GMT
server: Apache/2
deploy-request-content-type: application/x-java-archive
Ah, pretty interesting stuff. Again, the "Times from header" section comprises, at this point, data from those offsets within the header that Joachim has identified as possibly being time stamps. In the code, I have it display only those times that are not zero. What we don't have at the moment is information about the structure of the header that would let us identify to what the time stamps refer.
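For what it's worth, the decoding itself is straightforward if you assume (as the research so far suggests, but does not confirm) that the candidate fields are 64-bit, big-endian values holding Java-style milliseconds since the Unix epoch. The sketch below shows that assumption in code; the offsets are placeholders, not confirmed structure members.

# Sketch only: assumes the candidate header fields are 64-bit, big-endian,
# Java-style millisecond timestamps; the offsets below are placeholders.
use strict;
use POSIX qw(strftime);

my $file = shift || die "You must enter a filename.\n";
open(my $fh, "<", $file) || die "Could not open ".$file.": $!\n";
binmode($fh);
my $header;
read($fh, $header, 0x80);
close($fh);
die "Short read.\n" unless (defined $header && length($header) >= 0x28);

my %candidates = (
    "time_0" => substr($header, 0x10, 8),   # placeholder offset
    "time_1" => substr($header, 0x18, 8),   # placeholder offset
    "time_2" => substr($header, 0x20, 8),   # placeholder offset
);

foreach my $label (sort keys %candidates) {
    my ($hi, $lo) = unpack("N N", $candidates{$label});
    my $msec = ($hi * (2 ** 32)) + $lo;
    next if ($msec == 0);                    # display only non-zero times
    print $label.": ".strftime("%a %b %d %H:%M:%S %Y UTC", gmtime(int($msec / 1000)))."\n";
}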
However, this code can be used to parse *.idx files and help determine to what the times refer. For example, in the output above we see that "time_0" is equivalent to the "last modified" field in the server response, and that the "time_2" field is a few seconds after the "date" field in the server response. Perhaps incorporating this information into a timeline might be useful, while research continues in order to identify what the time stamps represent. What is very useful is that the *.idx files are associated with a specific user profile, so for testing purposes, an analyst should be able to incorporate browser history and *.idx info into a timeline, and perhaps be able to "see" what the time stamps may refer to...if the analyst were to control the entire test environment, to include the web server, even more information may be developed.
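Getting a parsed value into a timeline is just a matter of emitting a line in the five-field TLN format (time|source|system|user|description). The sketch below builds one such line from the server response "date" field shown above; the system and user values are placeholders you'd fill in from the case data.

# Sketch: emit one line in TLN format (time|source|system|user|description).
# The system and user values are placeholders for data parsed elsewhere.
use strict;
use Time::Local;

# "date" field from the server response: Sun, 12 Sep 2010 22:38:35 GMT
my $server_time = timegm(35, 38, 22, 12, 8, 2010);   # month is 0-based
my $url         = "http://xhaito.com/work/builds/exp_files/rox.jar";
my $system      = "HOSTNAME";                        # placeholder
my $user        = "username";                        # placeholder

print join("|", $server_time, "JAVA_IDX", $system, $user,
           "Server response date - ".$url)."\n";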
Speaking of timelines, Sploited commented on my previous post about developing timeline analysis pivot points from other resources; the comment mentioned a script for parsing IE history files (urlcache.pl). I would suggest that incorporating a user's web history, as well as searches against the Malware Domain List, might be extremely helpful in identifying initial infection vectors and entry points.
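As a notional example of what that might look like, the sketch below checks the hosts from parsed URLs against a local, one-host-per-line copy of the Malware Domain List; the "mdl_hosts.txt" file name and the @urls list are placeholders for data gathered elsewhere.

# Sketch: flag parsed URLs whose hosts appear in a local copy of the
# Malware Domain List; "mdl_hosts.txt" and @urls are placeholders.
use strict;

my %mdl = ();
open(my $fh, "<", "mdl_hosts.txt") || die "Could not open host list: $!\n";
while (<$fh>) {
    chomp;
    $mdl{lc($_)} = 1 if ($_ ne "");
}
close($fh);

my @urls = ("http://xhaito.com/work/builds/exp_files/rox.jar");
foreach my $url (@urls) {
    if ($url =~ m/^https?:\/\/([^\/:]+)/i) {
        my $host = lc($1);
        print "Possible hit: ".$url."\n" if (exists $mdl{$host});
    }
}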
Interested in Windows DF training? Check it out: Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar. Be sure to check the WindowsIR Training Page for updates.
Saturday, January 19, 2013
BinMode: Parsing Java *.idx files
One of the Windows artifacts that I talk about in my training courses is application log files, and I tend to sort of gloss over this topic, simply because there are so many different kinds of log files produced by applications. Some applications, in particular AV, will write their logs to the Application Event Log, as well as a text file. I find this to be very useful because the Application Event Log will "roll over" as it gathers more events; most often, the text logs will continue to be written to by the application. I talk about these logs in general because it's important for analysts to be aware of them, but I don't spend a great deal of time discussing them because we could be there all week talking about them.
With the recent (Jan, 2013) issues regarding a Java 0-day vulnerability, my interest in artifacts of compromise was piqued yet again when I found that someone had released some Python code for parsing Java deployment cache *.idx files. I located the *.idx files on my own system, opened a couple of them up in a hex editor, and began conducting pattern analysis to see if I could identify a repeatable structure. I found enough information to create a pretty decent parser for the *.idx files to which I have access.
Okay, so the big question is...so what? Who cares? Well, Corey Harrell had an excellent post to his blog regarding Finding (the) Initial Infection Vector, which I think is something that folks don't do often enough. Using timeline analysis, Corey identified artifacts that required closer examination; using the right tools and techniques, this information can also be included directly into the timeline (see the Sploited blog post listed in the Resources section below) to provide more context to the timeline activity.
The testing I've been able to do with the code I wrote has been somewhat limited, as I haven't had a system that might be infected come across my desk in a bit, and I don't have access to an *.idx file like what Corey illustrated in his blog post (notice that it includes "pragma" and "cache control" statements). However, what I really like about the code is that I have access to the data itself, and I can modify the code to meet my analysis needs, much the way I did with the Prefetch file analysis code that I wrote. For example, I can perform frequency analysis of IP addresses or URLs, server types, etc. I can perform searches for various specific data elements, or simply run the output of the tool through the find command, just to see if something specific exists. Or, I can have the code output information in TLN format for inclusion in a timeline.
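As an example of the kind of frequency analysis I'm talking about, here's a notional sketch that tallies "IP:" lines across the output of an *.idx parser run against a directory of files; "parseidx.pl" is a hypothetical script name standing in for whatever parser you're actually using, and the "IP:" output format it expects is simply an assumption for illustration.

# Sketch: frequency analysis of "IP:" lines produced by an *.idx parser;
# "parseidx.pl" is a hypothetical script name used for illustration.
use strict;

my $dir = shift || die "You must enter a directory path.\n";
my %counts = ();

opendir(my $dh, $dir) || die "Could not open ".$dir.": $!\n";
my @files = grep { /\.idx$/i } readdir($dh);
closedir($dh);

foreach my $f (@files) {
    foreach my $line (`perl parseidx.pl "$dir\\$f"`) {
        $counts{$1}++ if ($line =~ m/^IP:\s+(\S+)/);
    }
}

foreach my $ip (sort { $counts{$b} <=> $counts{$a} } keys %counts) {
    printf "%-16s %d\n", $ip, $counts{$ip};
}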
Regardless of what I do with the code itself, I now have automatic access to the data, and I have references included in the script itself; as such, the headers of the script serve as documentation, as well as a reminder of what's being examined, and why. This bridges the gap between having something I need to check listed in a spreadsheet, and actually checking or analyzing those artifacts.
Resources
ForensicsWiki Page: Java
Sploited blog post: Java Forensics Using TLN Timelines
jIIr: Almost Cooked Up Some Java, Finding Initial Infection Vector
Interested in Windows DF training? Check it out: Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar.
Saturday, January 12, 2013
There Are Four Lights: The Analysis Matrix
I've talked a lot in this blog about employing event categories when developing, and in particular when analyzing, timelines, and the fact is that we can use these categories for much more than just adding analysis functionality to our timelines. In fact, using artifact and event categories can greatly enhance our overall analysis capabilities. This is something that Corey Harrell and I have spent a great deal of time discussing.
For one, if we categorize events, we can raise our level of awareness of the context of the data that we're analyzing. Having categories for various artifacts can help us increase our relative level of confidence in the data that we're analyzing, because instead of looking at just one artifact, we're going to be looking at various similar, related artifacts together.
Another benefit of artifact categories is that they help us remember what various artifacts relate to...for example, I developed an event mapping file for Windows Event Log records, so as a tool parses through the information available, it can assign a category to various event records. This way, you no longer have to search Google or look up on a separate sheet of paper what an event refers to...you have "Login" or "Failed Login Attempt" right there next to the event description. This is particularly useful because, as of Vista, Microsoft began employing a new Windows Event Log model, which means that there are a LOT more Event Logs than just the three main ones we're used to seeing. Sometimes, you'll see one event in the System or Security Event Log that will have corresponding events in other event logs, or there will be one event all by itself...knowing what these events refer to, and having a category listed for each, is extremely valuable, and I've found it to help me a great deal with my analysis.
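To illustrate the idea, here's a notional sketch of that kind of lookup; the tab-delimited "eventmap.txt" layout shown (source, event ID, category) is strictly an example of the approach, not the actual mapping file format used by my tools.

# Sketch: assign a category to Windows Event Log records via a simple
# mapping file; the tab-delimited "eventmap.txt" layout (source, event ID,
# category) is hypothetical, for illustration only.
use strict;

my %map = ();
open(my $fh, "<", "eventmap.txt") || die "Could not open mapping file: $!\n";
while (<$fh>) {
    chomp;
    next if ($_ =~ m/^#/ || $_ eq "");
    my ($source, $id, $category) = split(/\t/, $_, 3);
    $map{lc($source)."/".$id} = $category;
}
close($fh);

# Example lookup for a parsed record (placeholder values)
my ($source, $id) = ("Microsoft-Windows-Security-Auditing", 4624);
my $category = $map{lc($source)."/".$id} || "";
print $source."/".$id.($category ne "" ? "  [".$category."]" : "")."\n";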
One way to make use of event categories is to employ an analysis matrix. What is an "analysis matrix"? Well, what happens many times is that analysts will get some general (read: "vague") analysis goals, and perhaps not really know where to start. By categorizing the various artifacts on a Windows system, we can create an analysis matrix that provides us with a means for at least beginning our analysis.
An analysis matrix might appear as follows:
|                   | Malware Detection | Data Exfil | Illicit Images | IP Theft |
|-------------------|-------------------|------------|----------------|----------|
| Malware           | X                 | X          |                |          |
| Program Execution | X                 | X          | X              |          |
| File Access       | X                 | X          | X              |          |
| Storage Access    | X                 | X          | X              |          |
| Network Access    | X                 |            |                |          |
Again, this is simply a notional matrix, and is meant solely as an example. However, it's also a valid matrix, and something that I've used. Consider "data exfiltration"...the various categories we use to describe a "data exfiltration" case may often depend upon what you learn from a "customer" or other source. For example, I did not put an "X" in the row for "Network Access", as I have had cases where access to USB devices was specified by the customer...they felt confident that, given how their infrastructure was designed, this was not an avenue they wanted me to pursue. However, you may want to add this one...I have also conducted examinations in which part of what I was asked to determine was network access, such as a user taking their work laptop home and connecting to other wireless networks.
The analysis matrix is not intended to be the "be-all-end-all" of analysis, nor is it intended to be written in stone. Rather, it's intended to be something of a living document, something that provides analysts with a means for identifying what they (intend to) do, as well as serve as a foundation on which further analysis can be built. By using an analysis matrix, we have case documentation available to us immediately. An analysis matrix can also provide us with pivot points for our timeline analysis; rather than combing through thousands of records in a timeline, we now not only have a means of going after that information which may be most important to our examination, but it also helps us avoid those annoying rabbit holes that we find ourselves going down sometimes.
Finally, consider this...trying to keep track of all of the possible artifacts on a Windows system can be a daunting task. However, it can be much easier if we were to compartmentalize various artifacts into categories, making it an easier task to manage by breaking it down into smaller, easier-to-manage pieces. Rather than getting swept up in the issues surrounding a new artifact (Jump Lists are new as of Windows 7, for example...) we can simply place that artifact in the appropriate category, and incorporate it directly into our analysis.
I've talked before in the blog about how to categorize various artifacts...in fact, in this post, I talked about the different ways that Windows shortcut files can be categorized. We can look at access to USB devices as storage access, and include sub-categories for various other artifacts.
Interested in Windows DFIR training? Check it out...Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar.
Tuesday, January 08, 2013
Training
For those readers who may not be aware, I teach a couple of training courses through my employer, at our facility in Reston, VA. We're also available to deliver those courses at your location, if requested. As such, I thought it might be helpful to provide some information about the courses, so in this post, I'll talk about the courses we offer, some we're looking to offer, and what you can expect to get out of the courses.
Windows Forensic Analysis
Day 1 starts with a course introduction, and then we get right into discussing some core analysis concepts, which will be addressed again and again throughout the training. From there, we begin exploring and discussing some of the various data sources and artifacts available on Windows 7 systems. Knowing that XP is still out there, we don't ignore that version of Windows; we simply focus primarily on Windows 7. Artifacts specific to other systems are discussed as they come up.
Throughout the course, we also discuss the various artifact categories, and how to create and use an analysis matrix to focus and document your analysis. We discuss what data is available, how to get it, how to correlate that data with other available data, and how to get previous versions of that data by accessing Volume Shadow Copies. All of this is accompanied by hands-on demonstrations of tools and techniques; many of the tools used are only available to those attending the training.
Day 2 starts with a quick review of the previous day's materials and answering any questions attendees may have; if there's any material that needs to be completed from the first day, we finish up with that, and then move into the hands-on exercises. Depending upon the attendee's familiarity with the tools and techniques used, these exercises may be guided, or they will be completed by attendees, in teams or individually.
Do you want to know what secrets lie hidden within Windows shortcut files and Jump Lists? Want to know more about "shellbags"? How about other artifacts? This course will tell...no, show...you. Not only that, we'll show you how to use this information to a greater effect, in a more timely and efficient manner, in order to extend your analysis.
Each attendee receives a copy of Windows Forensic Analysis Toolkit 3/e.
Timeline Analysis
Day 1 - Much like the Windows Forensic Analysis course, we start the first day with some core analysis concepts specific to timeline analysis, and then we jump right into exploring and discussing various data sources and artifacts as they relate to creating and analyzing timelines. We discuss the various artifact and event categories, and how this information can be used to get more out of your timeline analysis.
Day 2 starts off with completing any material from the first day, answering any questions the attendees may have, and then kicking off into a series of scenarios where questions are answered based on findings from a timeline; we not only go over how to create a timeline, but also how to go about analyzing that timeline and finding the answers to the questions.
If you can't remember all of the commands that we go over in the course, don't worry...you can write down notes on the provided copies of the slides, or you can turn to the provided cheat sheet for hints and reminders. Many of the tools used in this course are only available to those attending the course.
Each attendee receives a copy of Windows Forensic Analysis Toolkit 3/e.
Registry Analysis
This 1-day course is based on the material in my book, Windows Registry Forensics. As such, we spend some time in this course discussing not only the structure of the Registry, but also the value of performing Registry analysis. There is a good deal of information in the Registry that can significantly impact your analysis, and the goal of this course is to allow you to go beyond assumption to determining explicitly why you're seeing what you're seeing.
As you would guess, we spend some time discussing various tools, and some attention is given to RegRipper. For those interested, attendees will receive plugins that are not available through the public distribution. We also spend some time discussing the RegRipper components and structure, how it's used, and how to get the most out of it.
One of the take-aways we provide with this course is a graphic illustrating various components of USB device analysis, showing artifacts that aren't addressed anywhere else.
Each attendee receives a copy of Windows Registry Forensics.
Why Should I Attend?
That's always a great question; it's one I ask myself, as well, whenever I have an option to attend training.
Each attendee is provided the tools for the course, which includes tools that are only available to you if you attend the course. Tools for parsing various data structures, including RegRipper plugins that you can't get any place else. Several publicly available tools are discussed in the courses, but due to licenses, are not provided with the course materials. In such cases, the materials provide links to the tools.
I continually update the course materials. I sit down with the materials immediately following a course and look at my notes, any questions asked by attendees, and I pay particular attention to the course evaluation forms. When something new pops up in the media, I like to be sure to include it in the course for discussion. Updates come from other areas, as well...most notably, what I get from and how I perform my analysis. New techniques and findings are continually incorporated directly into the training materials.
As the Windows operating systems have gotten more complex, it's proven to be difficult for a lot of analysts to maintain current knowledge of the various artifacts, as well as analysis tools and techniques. These courses will not only provide you with the information, but also provide you with an opportunity to use those tools and employ those techniques, developing an understanding of each so that you can incorporate them into your analysis processes.
What Do I Need To Know Before Attending?
For the currently available courses, we ask that you arrive with a laptop with Windows 7 installed (can be a VM), a familiarity with operating at the command prompt, and a desire to learn. Bring your questions. While sample data is provided with the course materials, feel free to bring your own data, if you like.
The courses are designed so that you should NOT book all of them as a single 5-day training event. The reason is that a great deal of information is provided in the Windows Forensic Analysis course, and if you've never done timeline analysis before (and in some cases, even if you have), you do not want to immediately step off into the Timeline Analysis course. It is best to take the Windows Forensic Analysis (and perhaps the Registry Analysis) course(s), return to your shop, and develop your familiarity with the data sources before taking the Timeline Analysis course.
If you've ever seen or heard me present, you know that I am less about lecturing and more about interacting. If you're interested in engaging and interacting with others to better understand data sources and artifacts, as well as how they can be used to further your analysis, then sign up for one of our courses.
Upcoming Course(s)
Malware Detection - By request, I'm working on a course that addresses malware detection within an acquired image. I've taught courses similar to this before, and I think that in a lot of ways, it's an eye-opener for a lot of folks, even those who deal with malware regularly. This is NOT a malware analysis course...the purpose of this course is to help analysts understand how to locate malware within an acquired image. This is one of those analysis skills that traverses a number of cases, from breaches to data theft, even to claims of the "Trojan Defense".
Others - TBD.
Our website includes information regarding the schedule of courses, as well as the cost for each course. Check back regularly, as the schedule may change. Also, if you're interested in having us come to you to provide the training, let us know.
Saturday, January 05, 2013
There Are Four Lights: USB-Accessible Storage
There's been a good deal of discussion and documentation regarding discovering USB devices that had been connected to a Windows system, as this seems to be very important to a number of examiners. In 2005, Cory Altheide and I published some initial information, and over the years since then, that information has been expanded, simply because it continues to grow. For example, Rob Lee has published valuable checklists via the SANS Forensics Blog, and Jacky Fox recently published her dissertation, which includes some interesting and valuable information regarding interpreting some of the information that is available regarding user access to USB devices via the Registry. Ms. Fox determined that when a USB device is connected to a system and mounted as a volume, that volume GUID is added to the MountPoints2 key for all logged in users, not just the user logged in at the console.
Further, Mark Woan recently updated information collected by his USBDeviceForensics tool, to include querying some additional keys/values.
Regarding the additional keys/values that Mark's tool is querying, Windows 7 and 8 systems have additional values beneath the device keys in the System hive, specifically a "Properties" key with a number of GUID subkeys. This blog post provides some very good information that facilitates further searches, which leads us to information regarding a time stamp value that pertains to the InstallDate, as well as one that pertains to the FirstInstallDate.
So what? Well, let's take a look at the MS definition for the FirstInstallDate:
Windows sets the value of DEVPKEY_Device_FirstInstallDate with the time stamp that specifies when the device instance was first installed in the system.
Pretty cool, eh? This is what MS says about the InstallDate time stamp:
This time stamp value changes for each successive update of the device driver. For example, this time stamp reports the date and time when the device driver was last updated through Windows Update.
Ah, interesting. So it would appear that, based on the MS definitions for these values, we now have the information about when the device was first connected to the system available right there in the Registry. I'm not saying that we don't have to go anywhere else...rather, I'm suggesting that we have corroborating data that we can use to provide an increased relative confidence (a phrase that you usually see in my posts regarding timelines) in the data that we're analyzing.
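For anyone writing their own tools, the time stamp math here is the same FILETIME conversion we use everywhere else on Windows...a 64-bit count of 100-nanosecond intervals since 1 Jan 1601. The sketch below assumes the raw 8 bytes have already been extracted from the value data; the bytes shown are placeholders, not data from a real device key.

# Sketch: convert a 64-bit FILETIME (100-ns intervals since 1601-01-01,
# little-endian) to a Unix epoch time; $data holds placeholder bytes
# standing in for the raw value data extracted from the Registry.
use strict;
use POSIX qw(strftime);

my $data = pack("V V", 0xd53e8640, 0x01cdee55);   # placeholder bytes
my ($lo, $hi) = unpack("V V", $data);
my $ft = ($hi * (2 ** 32)) + $lo;
my $epoch = int(($ft - 116444736000000000) / 10000000);
print strftime("%Y-%m-%d %H:%M:%S UTC", gmtime($epoch))."\n";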
Something that hasn't been addressed is that most of the publicly-available processes that are currently being used are not as complete as they could be. Wait...what? Well, this is where specificity of language within the DFIR community comes into play...it turns out that the processes are actually really very good, as long as all we're interested in is specifically USB thumb drives or external drives. However, there are devices that can be connected to Windows systems via USB and accessed as storage devices (digital cameras, iStuff, smartphone handsets), that do not necessarily become apparent to analysts using the commonly-accepted tools, processes and checklists. We can find these devices by looking beneath other Registry keys, as well as in other locations beyond the Registry, and by correlating information between them. This is particularly useful when counter-forensics techniques have been used (however unintentional...), as not everything may be completely gone, and we may be able to find some remnant (LNK file, shellbags, deleted Registry keys/values, Windows Event Log, etc.) that will point us to the use of such devices.
One of the pitfalls of interpretation of Registry data, as Ms. Fox pointed out in her dissertation, is that we often don't have current, up-to-date databases of all devices that could be connected to a Windows system, so we might see vendor ID (VID) and product ID (PID) values within key names beneath the Enum\USB key, but not know what they translate to...I've found Motorola devices, for instance, that required a good deal of searching in order to determine which smartphone handset was pointed to by the PID value. As such, no process is going to be 100%, push-a-button complete, but the point is that we will know that the data is there, we know to get it, and we know how to use it.
Full analysis of USB-accessible storage media can be extremely important to a number of exams, such as illicit image and IP theft cases. Many examiners used to think that sneaking a thumb drive into an infrastructure was a threat...and it still is; these devices get smaller and smaller every day, while their capacity increases. But we need to start thinking about other USB-accessible storage, such as smartphones and iDevices, not because they're easily hidden, but because they're so ubiquitous that we tend to not focus on them...we take them for granted.
A Mapping Technique
The EMDMgmt subkey (within the Software Registry hive) names include the volume serial number (VSN) of the mounted volume, which is also found in the MS-SHLLINK structure, which itself is found in Windows shortcut/LNK files, as well as in Windows 7 and 8 Jump Lists. By correlating the VSNs from multiple sources, I was able to illustrate access to external storage devices in a manner that overcomes the shortcoming identified by Ms. Fox. What I've done is used code to parse through the LNK structures (LNK files in the Recent folder, for example, and the LNK streams within the Jump Lists) to list the VSNs, looking for the one (or two, or however many...) that point to the device identified in the EMDMgmt subkey name.
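A notional sketch of that correlation is below; it assumes (based on observation, not documentation) that the EMDMgmt subkey name ends in the VSN as a decimal number, and parseLnkVSN() is a hypothetical helper standing in for whatever LNK/Jump List parser you're actually using.

# Sketch: normalize a VSN taken from the tail of an EMDMgmt subkey name
# (assumed to be decimal) and compare it to VSNs parsed from LNK files;
# parseLnkVSN() is a hypothetical helper, stubbed out here.
use strict;

my $emd_key = '_??_USBSTOR#Disk&Ven_Generic&Prod_Flash&Rev_8.07#12345&0#{...}MYVOLUME_1234567890';
my ($dec) = $emd_key =~ m/_(\d+)$/;          # trailing decimal VSN (assumption)
my $emd_vsn = sprintf "%08X", $dec;          # compare in hex, e.g. 499602D2

foreach my $lnk (glob('*.lnk')) {
    my $vsn = parseLnkVSN($lnk);             # hypothetical helper, returns hex VSN
    print "Match: ".$lnk." -> ".$emd_vsn."\n" if (defined $vsn && uc($vsn) eq $emd_vsn);
}

sub parseLnkVSN { return undef; }            # stub for illustration only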
Tuesday, January 01, 2013
BinMode
I've recently been working on a script to parse the NTFS $UsnJrnl:$J file, also known as the USN Change Journal. Rather than blogging about the technical aspects of what this file is, or why a forensic analyst would want to parse it, I thought that this would be a great opportunity to instead talk about programming and parsing binary structures.
There are several things I like about being able to program, as an aspect of my DFIR work:
- It very often allows me to achieve something that I cannot achieve through the use of commercially available tools. Sometimes it allows me to "get there" faster, other times, it's the only way to "get there".
- I tend to break my work down into distinct, compartmentalized tasks, which lends itself well to programming (and vice versa).
- It gives me a challenge. I can focus my effort and concentration on solving a problem, one that I will likely see again, and for which I will already have an automated solution the next time I encounter it.
- It allows me to see the data in its raw form, not filtered through an application written by a developer. This allows me to see data within the various structures (based on structure definitions from MS and others), and possibly find new ways to use that data.
One of the benefits of programming is that I have all of this code available, not just as complete applications but also stuff I've written to help me perform analysis. Stuff like translating time values (FILETIME objects, DOSDate time stamps, etc.), as well as a printData() function that takes binary data of an arbitrary length and translates it into a hex editor-style view, which makes it easy to print out sections of data and work with them directly. Being able to reuse this code (even if "code reuse" is simply a matter of copy-paste) means that I can achieve a pretty extensive depth of analysis in fairly short order, reducing the time it takes for me to collect, parse, and analyze data at a more comprehensive level than before. If I'm parsing some data, and use the printData() function to display the binary data in hex at the console, I may very well recognize a 64-bit time stamp at a regular offset, and then be able to add that to my parsing routine. That's kind of how I went about writing the shellbags.pl plugin for RegRipper.
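None of that code is reproduced here, but to give a sense of how little is involved, below is a minimal stand-in for that kind of hex-editor-style display, along with sketches of the FILETIME and DOSDate translations mentioned above; the function names are my own shorthand for this example, not the ones in my actual scripts:

  # Minimal stand-in for a printData()-style hex view; takes a scalar of binary data
  sub hexDump {
      my $data = shift;
      return unless (length($data) > 0);
      foreach my $i (0 .. int((length($data) - 1) / 16)) {
          my $chunk = substr($data, $i * 16, 16);
          my $hex   = join(" ", map { sprintf "%02x", $_ } unpack("C*", $chunk));
          (my $asc  = $chunk) =~ s/[^\x20-\x7e]/./g;
          printf "0x%08x  %-48s %s\n", $i * 16, $hex, $asc;
      }
  }

  # 64-bit FILETIME (as two little-endian 32-bit values) to Unix epoch time
  sub getEpoch {
      my ($lo, $hi) = @_;
      my $t = ($hi * 4294967296 + $lo) / 10000000 - 11644473600;
      return ($t < 0) ? 0 : int($t);
  }

  # 16-bit DOS date and time words to a human-readable string
  sub getDOSDate {
      my ($date, $time) = @_;
      my $day = $date & 0x1f;
      my $mon = ($date >> 5) & 0x0f;
      my $yr  = (($date >> 9) & 0x7f) + 1980;
      my $sec = ($time & 0x1f) * 2;
      my $min = ($time >> 5) & 0x3f;
      my $hr  = ($time >> 11) & 0x1f;
      return sprintf "%04d-%02d-%02d %02d:%02d:%02d", $yr, $mon, $day, $hr, $min, $sec;
  }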
I've also recently been looking at IE index.dat files in a hex editor, and writing my own parser based on the MSIE Cache File Format put together by Joachim Metz. So far, my initial parser works very well against the index.dat file in the TIF folder, as well as the one associated with the cookies. But what's really fascinating about this is what I'm seeing...each record has two FILETIME objects and up to three DOSDate (aka, FATTime) time stamps, in addition to other metadata. For any given entry, all of these fields may not be populated, but the fact is that I can view them...and verify them with a hex editor, if necessary.
As a side note regarding that code, I've found it very useful so far. I can run the code at the command line, and pipe the output through one or more "find" commands in order to locate or view specific entries. For example, the following command line gets the "Location : " fields for me, and then looks for specific entries; in this case, "apple":
C:\tools>parseie.pl index.dat | find "Location :" | find /i "apple"
Using the above command line, I'm able to narrow down the access to specific things, such as purchase of items via the Apple Store, etc.
I've also been working on a $UsnJrnl (actually, the $UsnJrnl:$J ADS file) parser, which itself has been fascinating. This work was partially based on something I've felt that I've needed to do for a while now, and talking to Corey Harrell about some of his recent findings has renewed my interest in this effort, particularly as it applies to malware detection.
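To give a sense of what that involves, the USN_RECORD_V2 structure documented by MS has a fixed-size header followed by a Unicode file name; a simplified sketch of pulling one record apart is below (error handling, the version check, and walking from one record to the next are left out):

  # $rec holds the raw bytes of a single USN_RECORD_V2 structure
  sub parseUsnRecord {
      my $rec = shift;
      my %r;
      ($r{len}, $r{major}, $r{minor}) = unpack("Vvv", substr($rec, 0, 8));
      $r{file_ref}   = substr($rec, 8, 8);     # 48-bit MFT record number + 16-bit sequence number
      $r{parent_ref} = substr($rec, 16, 8);
      ($r{usn_lo}, $r{usn_hi})   = unpack("VV", substr($rec, 24, 8));
      ($r{time_lo}, $r{time_hi}) = unpack("VV", substr($rec, 32, 8));  # FILETIME; convert with a helper
      ($r{reason}, $r{source}, $r{sec_id}, $r{attrs}) = unpack("VVVV", substr($rec, 40, 16));
      my ($name_len, $name_off) = unpack("vv", substr($rec, 56, 4));
      $r{name} = substr($rec, $name_off, $name_len);
      $r{name} =~ s/\x00//g;    # quick-and-dirty Unicode-to-ASCII
      return \%r;
  }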
Understanding binary structures can be very helpful. For example, consider the target.lnk file illustrated in this write-up of the Gauss malware. If you parse the information manually, using the MS specification...which should not be hard because there are only 0xC3 bytes visible...you'll see that the FILETIME time stamps for the target file are nonsense (Cheeky4n6Monkey got that, as well). As you parse the shell item ID list, based on the MS specification, you'll see that the first item is a System folder that points to "My Computer", and the second item is a Device entry whose GUID is "{21ec2020-3aea-1069-a2dd-08002b30309d}". When I looked this GUID up online, I found some interesting references to protecting or locking folders, such as this one at LIUtilities, and this one at GovernmentSecurity.org. I found this list of shell folder IDs, which might also be useful.
The final shell item, located at offset 0x84, is type 0x06, which isn't something that I've seen before. But there's nothing in the write-up that explains in detail how this LNK file might be used by the malware for persistence or propagation, so this was just an interesting exercise for me, as well as for Cheeky4n6Monkey, who also worked on parsing the target.lnk file manually. So, why even bother? Well, like I said, it's extremely beneficial to understand the format of various binary structures, but there's another reason. Have you read these posts over on the CyanLab blog? No? You should. I've seen shortcut/LNK files with no LinkInfo block, only the shell item ID list, that point to devices; as such, being able to parse and understand these...or even just recognize them...can be very beneficial if you're at all interested in determining USB storage devices that had been connected to a system. So far, most of these devices that I have seen have been digital cameras and smart phone handsets.
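If you want to try this sort of manual parsing yourself, walking the shell item ID list in a LNK file amounts to skipping the 76-byte header, checking the HasLinkTargetIDList flag, and then reading size-prefixed items until you hit the two-byte terminator. A bare-bones sketch, per the MS-SHLLINK specification and without any per-item interpretation, looks something like this:

  use strict;

  open(my $fh, "<", $ARGV[0]) or die "Cannot open $ARGV[0]\n";
  binmode($fh);
  local $/;
  my $data = <$fh>;
  close($fh);

  # LinkFlags live at offset 0x14; bit 0 is HasLinkTargetIDList
  my $flags = unpack("V", substr($data, 0x14, 4));
  die "No LinkTargetIDList in this file\n" unless ($flags & 0x01);

  # The IDList size immediately follows the 76-byte (0x4c) header
  my $list_size = unpack("v", substr($data, 0x4c, 2));
  my $ofs = 0x4e;
  my $end = $ofs + $list_size;

  while ($ofs < $end) {
      my $sz = unpack("v", substr($data, $ofs, 2));
      last if ($sz == 0);              # TerminalID
      my $type = unpack("C", substr($data, $ofs + 2, 1));
      printf "Shell item at offset 0x%x, size %d, type 0x%02x\n", $ofs, $sz, $type;
      $ofs += $sz;
  }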
Everything
Okay, right about now, you're probably thinking, "so what?" Who cares, right? Well, this should be a very interesting, if not outright important issue for DFIR analysts...many of whom want to see everything when it comes to analysis. So the question then becomes...are you seeing everything? When you run your tool of choice, is it getting everything?
Folks like Chris Pogue talk a lot about analysis techniques like "sniper forensics", which is an extremely valuable means for performing data collection and analysis. However, let's take another look at the above question, from the perspective of sniper forensics...do you have the data you need? If you don't know what's there, how do you know?
If you don't know that Windows shortcut files include a shell item ID list, and what that data means, then how can you evaluate the use of a tool that parses LNK files? I'm using shell item ID lists as an example, simply because they're so very pervasive on Windows 7 systems...they're in shortcut files, Jump Lists, Registry value data. They're in a LOT of Registry value data. But the concept applies to other aspects of analysis, such as browser analysis. When you're performing browser analysis in order to determine user activity, are you just checking the history and cookies, or are you including Registry settings ("TypedURLs" key values for IE 5-9, and "TypedURLsTimes" key values on Windows 8), bookmarks, and session restore files? When performing USB device analysis on Windows systems, are you looking for all devices, or are you using checklists that only cover thumb drives and external hard drives?
I know that my previous paragraph covers a couple of different levels of granularity, but the point remains the same...are you getting everything that you need or want to perform your analysis? Does the tool you're using get all system and/or user activity, or does it get some of it?
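As one concrete example of the browser-related Registry settings mentioned above, here's a minimal sketch that lists the TypedURLs values from an NTUSER.DAT hive using Parse::Win32Registry; the hive path is an assumption, and on Windows 8 you'd also want to pull the corresponding TypedURLsTime values, which hold 64-bit FILETIME data:

  use strict;
  use Parse::Win32Registry;

  my $ntuser = Parse::Win32Registry->new("D:\\case\\NTUSER.DAT") or die "Cannot open hive\n";
  my $key = $ntuser->get_root_key->get_subkey("Software\\Microsoft\\Internet Explorer\\TypedURLs")
      or die "TypedURLs key not found\n";

  print "TypedURLs  LastWrite: ".gmtime($key->get_timestamp)." UTC\n";
  foreach my $val ($key->get_list_of_values) {
      print "  ".$val->get_name." -> ".$val->get_data."\n";
  }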
Can we ever know it all?
One of the aspects of the DFIR community is that, for the most part, most of us seem to work in isolation. We work our cases and exams, and don't really bother too much with asking someone else, someone we know and trust, "hey, did I look at everything I could have here?" or "did I look at everything I needed to in order to address my analysis goals in a comprehensive manner?" For a variety of reasons, we don't tend to seek out peer review, even after cases are over and done.
But you know something...we can't know it all. No one of us is as smart or experienced as several or all of us working together. This can be close collaboration, face-to-face, or online collaboration through blogs, or sites such as the ForensicsWiki, which makes a great repository, if it's used.
Choices
Finally, a word about choices in programming languages to use. Some folks have a preference. I've been using Perl for a long time, since 1999. I learned BASIC in the '80s, as well as some Pascal, and then in the mid-'90s, I picked up some Java as part of my graduate studies. I know some folks prefer Python, and that's fine. Some folks within the community would like to believe that there are sharp divides between these two camps, that some who use one language detest the other, as well as those who use it. Nothing could be further from the truth. In fact, I would suggest that this attempt to create drama where there is none is simply a means of masking the fact that some analysts and examiners simply don't understand the technical aspects of the work that's actually being done.
Resources
Forensics from the Sausage Factory - USN Change Journal
Security BrainDump - Post regarding the USN Change Journal
OpenFoundry - Free tools; page includes link to a Python script for parsing the $UsnJrnl:$J file
Friday, December 28, 2012
Malware Detection
Corey recently posted to his blog regarding his exercise of infecting a system with ZeroAccess. In his post, Corey provides a great example of a very valuable malware artifact, as well as an investigative process, that can lead to locating malware that may be missed by more conventional means.

This post isn't meant to take anything away from Corey's exceptional work; rather, my intention is to show another perspective of the data, sort of like "The Secret Policeman's Other Ball". Corey's always done a fantastic job of performing research and presenting his findings; I would simply like to present another perspective, utilizing Corey's work and blog post as a basis and a stepping stone.
The ZA sample that Corey looked at was a bit different from what James Wyke of SophosLabs wrote about, but there were enough commonalities that some artifacts could be used to create an IOC or plugin for detecting the presence of this bit of malware, even if AV didn't detect it. Specifically, the file "services.exe" was infected, an EA attribute was added to the file record in the MFT, and a Registry modification occurred in order to create a persistence mechanism for the malware. Looking at these commonalities is similar to looking at the commonalities between various versions of the Conficker family, which created a randomly-named service for persistence.
From the Registry hives from Corey's test, I was able to create and test a RegRipper plugin that does a pretty good job of filtering through the Classes/CLSID subkey (from the Software hive) and locating anomalies (a simplified skeleton of that type of plugin is sketched below). In its original form, the MFT parser that I wrote finds the EA attribute, but doesn't specifically flag on it, and it can't extract the shell code and the malware PE file (because the data is non-resident). However, there were a couple of interesting things I got from parsing the MFT...
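For those who haven't looked inside a RegRipper plugin, the skeleton below sketches the general approach of iterating through the Classes\CLSID subkeys; it is not the plugin I ran against Corey's hives, the anomaly check is just a placeholder, and the remaining get*() functions that rip.pl expects are left out for brevity (rip.pl also supplies ::rptMsg() and loads Parse::Win32Registry):

  package clsid_scan;
  use strict;

  my %config = (hive => "Software", osmask => 22, version => 20130101);
  sub getConfig     { return %config; }
  sub getHive       { return $config{hive}; }
  sub getVersion    { return $config{version}; }
  sub getShortDescr { return "Scan Classes\\CLSID subkeys for anomalies"; }

  sub pluginmain {
      my $class = shift;
      my $hive  = shift;
      my $reg   = Parse::Win32Registry->new($hive);
      my $root  = $reg->get_root_key;
      my $clsid = $root->get_subkey("Classes\\CLSID") || return;
      foreach my $key ($clsid->get_list_of_subkeys) {
          foreach my $sub ($key->get_list_of_subkeys) {
              # Placeholder check: report subkeys outside the handful normally seen
              # beneath a CLSID entry; the real tests depend on your indicators
              next if ($sub->get_name =~ /^(InprocServer32|LocalServer32|ProgID|TypeLib|Version|DefaultIcon|Instance)$/i);
              ::rptMsg($key->get_name."\\".$sub->get_name."  LastWrite: ".gmtime($sub->get_timestamp));
          }
      }
  }
  1;

Now, back to the MFT...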
If you refer to Corey's post, take a look at the section regarding the MFT record for the infected services.exe file. If you look at the time stamps and compare those from the $STANDARD_INFORMATION attribute to those of the $FILE_NAME attribute that Corey posted, you'll see an excellent example of file system tunneling. I've talked about this in a number of my presentations, and it's pretty cool to see an actual example of it. I know that this isn't really "outside the lab", per se, but still, it's pretty cool to see this functionality show up as a result of a sample of malware, rather than a contrived exercise. Hopefully, this example will go a long way toward helping analysts understand what they're seeing in the time stamps.
Corey also illustrated an excellent use of timeline analysis to locate other files that were created or modified around the same time that the services.exe file was infected. What the timeline doesn't show clearly is that the time stamps were extracted from the $FILE_NAME attribute in the MFT...the $STANDARD_INFORMATION attributes for those same files indicate that there was some sort of time stamp manipulation ("timestomping") that occurred, as many of the files have M, A, and B times from 13 and 14 Jul 2009. However, the date in question that Corey looked at in his blog post was 6 Dec 2012 (the day of the test). Incorporating Prefetch file metadata and Registry key LastWrite times into a timeline would show a pretty tight "grouping" of these artifacts at or "near" the same time.
Another interesting finding in analyzing the MFT is that the "new" services.exe file was MFT record number 42756 (see Corey's blog entry for the original file's record number). Looking "near" the MFT record number, there are a number of files and folders that are created (and "timestomped") prior to the new services.exe file record being created. Searching for some of the filenames and paths (such as C:\Windows\Temp\fwtsqmfile00.sqm), I find references to other variants of ZeroAccess. But what is very interesting about this is the relatively tight grouping of the file and folder creations, not based on time stamps or time stamp anomalies, but instead based on MFT record numbers.
Some take-aways from this...at least what I took away...are:
1. Timeline analysis is an extremely powerful analysis technique because it provides us with context, as well as an increased relative level of confidence in the data we're analyzing.
2. Timeline analysis can be even more powerful when it is not the sole analysis technique, but incorporated into an overall analysis plan. What about that Prefetch file for services.exe? A little bit of Prefetch file analysis would have produced some very interesting results, and using what was found through this analysis technique would have led to other artifacts that should be examined in the timeline. Artifacts found outside of timeline analysis could be used as search terms or pivot points in a timeline, which would then provide context to those artifacts, which could then be incorporated back into other analyses.
3. Some folks have told me that having multiple tools for creating timelines makes creating timelines too complex a task; however, the tools I tend to create and use are multi-purpose. For example, I use pref.pl (I also have a 'compiled' EXE) for Prefetch file analysis, as well as parsing Prefetch file metadata into a timeline. I use RegRipper for parsing (and some modicum of analysis) of Registry hives, as well as to generate timeline data from a number of keys and value data. I find this to be extremely valuable...I can run a tool, find something interesting in a data set as a result of the analysis, and then run the tool again, against the same data set, but with a different set of switches, and populate my timeline. I don't need to switch GUIs and swap out dongles. Also, it's easy to remember the various tools and switches because (a) each tool is capable of displaying its syntax via '-h', and (b) I created a cheat sheet for the tool usage.
4. Far too often, a root cause analysis, or RCA, is not performed, for whatever reason. We're losing access to a great deal of data, and as a result, we're missing out on a great deal of intel. Intel such as, "hey, what this AV vendor wrote is good, but I tested a different sample and found this...". Perhaps the reason for not performing the RCA is that "it's too difficult", "it takes too long", or "it's not worth the effort". Well, consider my previous post, Mr. CEO...without an RCA, are you being served? What are you reporting to the board or to the SEC, and is it correct? Are you going with, "it's correct to the best of my knowledge", after you went to "Joe's Computer Forensics and Crabshack" to get the work done?
Now, to add to all of the above, take a look at this post from the Sploited blog, entitled Timeline Pivot Points with the Malware Domain List. This post provides an EXCELLENT example of how timeline analysis can be used to augment other forms of analysis, or vice versa. The post also illustrates how this sort of analysis can easily be automated. In fact, this can be part of the timeline creation mechanism...when any data source is parsed (e.g., browser history list, TypedUrls Registry key, shellbags, etc.), have any extracted URLs compared against the MDL, and then generate a flag of some kind within the timeline events file, so that the flag "lives" with the event. That way, you can search for those events (based on the flag) after the timeline is created, or, as part of your analysis, create a timeline of only those events. This would be similar to scanning all files in the Temp and system32 folders, looking for PE files with odd headers or mismatched extensions, and then flagging them in the timeline, as well.
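The comparison itself takes very little code to bolt onto a timeline process; the sketch below assumes that the MDL has been exported to a local text file of one domain per line, and that the intermediate events file is plain text with one event per line. Both file names are hypothetical:

  use strict;

  # Load the Malware Domain List export; assumed to be one domain per line
  open(my $mdl, "<", "mdl_domains.txt") or die "Cannot open mdl_domains.txt\n";
  my %bad = map { chomp; lc($_) => 1 } <$mdl>;
  close($mdl);

  # Walk the events file and tag any event whose description contains a URL
  # with a domain found on the MDL
  open(my $ev, "<", "events.txt") or die "Cannot open events.txt\n";
  while (my $line = <$ev>) {
      chomp($line);
      my $flag = "";
      if ($line =~ m/https?:\/\/([^\/\s\|]+)/i) {
          my $domain = lc($1);
          $flag = "  [MDL]" if (exists $bad{$domain});
      }
      print $line.$flag."\n";
  }
  close($ev);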
Great work to both Corey and Sploited for their posts!

Friday, December 21, 2012
Are You Being Served?
Larry Daniel recently posted to his Ex Forensis blog on a very interesting topic, "The Perils of Using the Local Computer Shop for Computer Forensics". I've thought about this before...when I was on the ISS ERS (and later the IBM ISS ERS) team, on more than one occasion we'd arrive on-site to work with another team, or to take over after someone else had already done some work. In a couple of instances, I worked with other teams that, while technically skilled, were not full-time DFIR folks. Larry's post got me to thinking about who is being asked to perform DFIR work, and the overall effect that it has on the industry.
There's a question that I ask myself sometimes, particularly when working on exams...am I doing all I can to provide the best possible product to my customers? As best I can, I work closely with the customer to establish the goals of the exam, to determine parameters of what they are most interested in. I do this because, like most analysts, I can spend weeks finding all manner of "interesting" stuff, but my primary interest lies in locating artifacts that pertain to what the customer's interested in, so that I can provide them with what they need in order to make the decisions that they need to make. As much as I can, I try to find multiple artifacts to clearly support my findings, and I avoid blanket statements and speculation.
Also, something that I do after every exam is take a look at what I did and what I needed to do, and ask myself if there's a way I could do it better (faster, more comprehensive and complete, etc.) the next time.
Let's take a step away from DFIR work for a moment. Like many, I make use of others' services. I own a vehicle, which requires regular upkeep and preventative maintenance. Sometimes, if all I need is an oil change, I'll go to one of the commercial in-and-out places, because I've looked into the service that they provide, what it entails, and that's all I need at the moment. However, when it comes to other, perhaps more specialized maintenance...brake work, inspections recommended by the manufacturer, as well as inspections of a trailer I own...I'm going to go with someone I know and trust to do the work correctly. Another thing I like about working with folks like this is that we tend to develop a relationship where, if during the course of their work, they find something else that requires my attention, they'll let me know, inform me about the issue, and let me make the decision. After all, they're the experts.
Years ago...1992, in fact...I owned an Isuzu Rodeo. I'd take it to one of the drive-in places to get the oil changed on a Saturday morning. The first time I took it to one place, I got an extra charge on my bill for a 4-wheel drive vehicle. Hold on, I said! Why are you adding a charge for a 4-wheel drive vehicle, when the vehicle is clearly 2-wheel drive? The manager apologized, and gave me a discount on my next oil change. However, a couple of months later, I came back to the same shop with the same vehicle and went through the same thing all over again. Needless to say, had I relied on the "expertise" of the mechanics, I'd have paid more than I needed to, several times over. I never went back to that shop again, and from that point on, I made sure to check everything on the list of services performed before paying the bill.
Like many, I own a home, and there are a number of reasons for me to seek services...HVAC, as well as other specialists (particularly as a result of Super Storm Sandy). I tend to follow the same sort of path with my home that I do with my vehicles...small stuff that I can do myself, I do. Larger stuff that requires more specialized work, I want to bring in someone I know and trust. I'm a computer nerd...I'm not an expert in automobile design, nor am I an expert in home design and maintenance. I can write code to parse Registry data and shell items, but I am not an expert in building codes.
So, the question I have for you, reader, is this...how do you know that you're getting quality work? To Larry's point, who are you hiring to perform the work?
At the first SANS Forensic Summit, I was on a panel with a number of the big names in DFIR, several of whom are SANS instructors. One of the questions that was asked was, "what qualities do you look for in someone you're looking to hire to do DFIR work?" At the time, my response was simply, "what did they do last week?" My point was, are you going to hire someone to do DFIR work, if last week they'd done a PCI assessment and the week prior to that, they'd performed a pen test? Or would you be more likely to hire someone who does DFIR work all the time? I stand by that response, but would add other qualifications to it. For example, how "tied in" are the examiners? Do they simply rely on the training they received at the beginning of their careers, or do they continually progress in their knowledge and education? Do they seek professional improvement and continuing education? More importantly, do they use it? Maybe the big question is not so much that the examiners do these things, but do their managers require that the examiners do these things, and make them part of performance evaluations?
Are you being served?
Addendum: Why does any of this matter? So what? Well, something to consider is, what will a CEO be reporting to the board, as well as to the SEC? Will the report state, "nothing found", or worse, will the report be speculation of a "browser drive-by"? In my experience, most regulatory organizations want to know the root cause of an issue (such as a compromise or data leakage)...they don't want a laundry list of what the issue could have been.
In addition, consider the costs associated with PCI (or any other sensitive information) data theft; if an organization is compromised, and they hire the local computer repair shop to perform the "investigation", what happens when PCI data is discovered to be involved, or potentially involved? Well, you have to go pay for the investigation all over again, only this time it's after someone else has come in and "investigated", and this is going to have a potentially negative effect on the final report. I think plumbers have a special fee for helping folks who have already tried to "fix" something themselves. ;-)
Look at the services that you currently have in your business. Benefits management. Management of a retirement plan. Payroll. Do you go out every month and select the lowest bidder to provide these services? Why treat the information security posture of your organization this way?
Saturday, December 15, 2012
There are FOUR lights!

I recently posted the question in a forum regarding Shellbag analysis, and asked who was actually performing it as part of their exams. One answer I got was, "...I need to start." When I asked this same question of a roomful of forensicators at the PFIC 2012 conference, two raised their hands...and one admitted that they hadn't done so since SANS training.
During exams, I've seen shellbags that contain artifacts of user activity that are not found anywhere else on the system. For example, I've seen the use of Windows Explorer to perform FTP transfers (my publisher used to have me do this to transfer files), where artifacts of that activity were not found anywhere else on the system. When this information was added to a timeline, a significant portion of the exam sort of snapped into place, and became crystal clear.
Something I've seen with respect to USB devices that were connected to Windows systems is that our traditional methodologies for parsing this information out of a system are perhaps...incomplete. I have seen systems where some devices are not so much identified as USB storage devices by Windows systems (rather, they're identified as portable devices...iPods, digital cameras, etc.), and as such, starting by examining the USBStor subkeys means that we may miss some of these devices that could be used in intellectual property theft, as well as the creation and trafficking of illicit images. Yet, I have seen clear indications of a user's access to these devices within the shellbags artifacts, in part because of my familiarity with the actual data structures themselves.
The creation and use of these artifacts by the Windows operating system goes well beyond just the shellbags, as these artifacts are comprised of data structures known as "shell items", which can themselves be chained together into "shell item ID lists". Rather than providing a path that consists of several ASCII strings that identify resources such as files and directories, a shell item ID list builds a path to a resource using these data structures, which some in the community have worked very hard to decipher. What this work has demonstrated is that there is a great deal more information available than most analysts are aware.
So why is understanding shell items and shell item ID lists important? Most of the available tools for parsing shellbags, for example, simply show the analyst the path to the resource, but never identify the data structure in question...they provide only the ASCII representation. These structures are used in the ComDlg32 subkey values in the NTUSER.DAT hive on Windows Vista and above systems, as well as in the IE Start Menu and Favorites artifacts within the Registry. An interesting quote from the post:
Of importance to the forensic investigator is the fact that, in many cases, these subkeys and their respective Order values retain references to Start Menu and Favorites items after the related applications or favorites have been uninstalled or deleted.
I added the emphasis to the second half of the quote, because it's important. Much like other artifacts that are available, references to files, folders, network resources and even applications are retained long after they've been uninstalled or deleted. So understanding shell items is foundational to understanding larger artifacts.
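As a small illustration, the ComDlg32 values mentioned above can be listed with the Parse::Win32Registry Perl module; each value's data (aside from MRUListEx) is a raw shell item ID list, so the same size-prefixed item structure applies, just without a LNK header in front of it. The key path below is the Vista-and-later OpenSavePidlMRU location, and the hive path is an assumption:

  use strict;
  use Parse::Win32Registry;

  my $ntuser = Parse::Win32Registry->new("D:\\case\\NTUSER.DAT") or die "Cannot open hive\n";
  my $path = "Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\ComDlg32\\OpenSavePidlMRU";
  my $key = $ntuser->get_root_key->get_subkey($path) or die "Key not found\n";

  foreach my $sub ($key->get_list_of_subkeys) {
      print $sub->get_name."  LastWrite: ".gmtime($sub->get_timestamp)." UTC\n";
      foreach my $val ($sub->get_list_of_values) {
          next if ($val->get_name eq "MRUListEx");
          # Each remaining value's data is a raw shell item ID list (no LNK header);
          # hand it to a shell item parser to recover the path it points to
          print "  ".$val->get_name.": ".length($val->get_data)." bytes of shell item data\n";
      }
  }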
But it doesn't stop with the Registry...shell item ID lists are part of Windows shortcut (LNK) files, which means that they're also part of the Jump Lists found on Windows 7 and 8.
Okay, but so what, right? Well, the SpiderLabs folks posted a very interesting use of LNK files to gather credentials during a pen test; have any forensic analysts out there seen the use of this technique before? Perhaps more importantly, have you looked for it? Would you know how to look for this during an exam?
Here's a really good post that goes into some detail regarding how LNK files can be manipulated with malicious intent, demonstrating how important it is to parse the shell item ID lists.
So, the point of the graphic, as well as of the post overall, is this...if you're NOT parsing shellbags as part of your exam, and if you're NOT parsing through shortcut files as part of your root cause analysis (RCA), then you're only seeing three lights.
There are, in fact, four lights.
Resources
DOSDate Time Stamps in Shell Items
ShellBag Analysis, Revisited...Some Testing
Thursday, November 29, 2012
Forensic Scanner has moved
In order to be in line with other projects available through my employer, the Forensic Scanner has moved from Google Code to GitHub. When you get to the page, simply click the "Zip" button and the project will download as a Zip archive.
There has been no change to the Scanner itself.
Also, note that the license has changed to the Perl Artistic License.
Monday, November 26, 2012
The Next Big Thing
First off, this is not an end-of-year summary of 2012, nor where I'm going to lay out my predictions for 2013...because that's not really my thing. What I'm more interested in addressing is, what is "The Next Big Thing" in DFIR? Rather than making a prediction, I'm going to suggest where, IMHO, we should be going within our community/industry.
There is, of course, the CDFS, which provides leadership and advocacy for the DFIR profession. If you want to be involved in a guiding force behind the direction of our profession, and driving The Next Big Thing, consider becoming involved through this group.
So what should be The Next Big Thing in DFIR? In the time I've been in and around this profession, one thing I have seen is that there is still a great deal of effort directed to providing a layer of abstraction to analysts in order to represent the data. Commercial tools provide frameworks for looking at the available (acquired) data, as do collections of free tools. Some tools or frameworks provide different capabilities, such as allowing the analyst to easily conduct keyword searches, or providing default viewers or parsers for some file types. However, what most tools do not provide is an easy means for analysts to describe the valuable artifacts that they've found, nor an easy means to communicate intelligence gathered through examination and research to other analysts.
Some of what I see happening includes analysts going to training and/or a conference, hearing "experts" (don't get me wrong, many speakers are, in fact, experts in their field...) speak, and then returning to their desks with...what? Not long ago, I was giving a presentation and the subject of analysis of shellbag artifacts came up. I asked how many of the analysts in the room did shellbag analysis and two raised their hands. One of them stated that they had analyzed shellbag artifacts when they attended a SANS training course, but they hadn't done so since. I then asked how many folks in the room conducted analysis where what the user did on the system was of primary interest in most of their exams, and almost everyone in the room raised their hands. The only way I can explain the disparity between the two responses is that the tools used by most analysts provide a layer of abstraction to the data (acquired images) that they're viewing, and leave the identification of valuable (or even critical) artifacts and the overall analysis process up to the analyst. A number of training courses provide information regarding analysis processes, but once analysts return from these courses, I'm not sure that there's a great deal of stimulus for them to incorporate what they just learned into what they do. As such, I tend to believe that there's a great deal of extremely valuable intelligence either missed or lost within our community.
I'm beginning to believe more and more that tools that simply provide a layer of abstraction to the data viewed by analysts are becoming a thing of the past. Or, maybe it's more accurate to say that they should become a thing of the past. The analysis process needs to be facilitated more, and the sharing of information and intelligence between both the tools used, as well as the analysts using them, needs to become more part of our daily workflow.
Part of this belief may be because many of the tools available don't necessarily provide an easy means for analysts to share that process and intelligence. What do I mean by that? Take a look at some of the tools used by analysts today, and consider why those tools are used. Now, think to yourself for a moment...how easy is it for one analyst using that tool to share any intelligence that they've found with another (any other) analyst? Let's say that one analyst finds something of value during an exam, and it would behoove the entire team to have access to that artifact or intelligence. Using the tool or framework available, how does the analyst then share the analysis or investigative process used, as well as the artifact found or intelligence gleaned? Does the framework being used provide a suitable means for doing so?
Analysts aren't sharing intelligence for two reasons...they don't know how to describe it, and even if they do, there's no easy means for doing so within the framework that they're using. They can't easily share information and intelligence between the tools they're using, nor with other analysts, even those using the same tools.
For a great example of what I'm referring to, take a look at Volatility. This started out as a project that was delivering something not available via any other means, and the folks that make up the team continue to do just that. The framework provides much more than just a layer of abstraction that allows analysts to dig into a memory dump or hibernation file...the team also provides plugins that serve to illustrate not just what's possible to retrieve from a memory dump, but also what they've found valuable, and how others can find these artifacts via a repeatable process. Another excellent resource is MHL et al's book, The Malware Analyst's Cookbook, which provides a great deal of process information via the format, as well as intel via the various 'recipes'.
I kind of look at it this way...when I was in high school, we read Chaucer's Canterbury Tales, and each year the books were passed down from the previous year. If you were lucky, you'd get a copy with some of the humorous or ribald sections highlighted...but what wasn't passed down was the understanding of what was leading us to read these passages in the first place. Sure, there's a lot of neat and interesting stuff that analysts see on a regular basis, but what we aren't good at is sharing the really valuable stuff and the intel with other analysts. If that's something that would be of use...one analyst being aware of what another analyst found...then as consumers we need to engage tool and process developers directly and consistently, let them know what our needs are, and start intelligently using those processes and tools that meet our needs.
There is, of course, the CDFS, which provides leadership and advocacy for the DFIR profession. If you want to be involved in a guiding force behind the direction of our profession, and driving The Next Big Thing, consider becoming involved through this group.
So what should be The Next Big Thing in DFIR? In the time I've been in and around this profession, one thing I have seen is that there is still a great deal of effort directed toward providing a layer of abstraction to analysts in order to represent the data. Commercial tools provide frameworks for looking at the available (acquired) data, as do collections of free tools. Some tools or frameworks provide different capabilities, such as allowing the analyst to easily conduct keyword searches, or providing default viewers or parsers for some file types. However, what most tools do not provide is an easy means for analysts to describe the valuable artifacts that they've found, nor an easy means to communicate intelligence gathered through examination and research to other analysts.
Wednesday, November 21, 2012
Updates
Timeline Analysis
I recently taught another iteration of our Timeline Analysis course, and as is very often the case, I learned some things as a result.
First, the idea (in my case, thanks goes to Corey Harrell and Brett Shavers) of adding categories to timelines in order to increase the value of the timeline, as well as to bring a new level of efficiency to the analysis, is a very good one. I'll discuss categories a bit more later in this post.
Second (and thanks goes out to Cory Altheide for this one), I'm reminded that timeline analysis provides the examiner with context to the events being observed, as well as a relative confidence in the data. We get context because we see more than just a file being modified...we see other events around that event that provide indications as to what led to the file being modified. Also, we know that some data is easily mutable, so seeing other events that are perhaps less mutable occurring "near" the event in question gives us confidence that the data we're looking at is, in fact, accurate.
Another thing to consider is that timelines help us reduce complexity in our analysis. If we understand the nature of the artifacts and events that we observe in a timeline, and understand what creates or modifies those artifacts, we begin to see what is important in the timeline itself. There is no magic formula for creating timelines...we may have too little data in a timeline (i.e., just a file being modified) or we may have too much data. Knowing what various artifacts mean or indicate allows us to separate the wheat from the chaff, or separate what is important from the background noise on systems.
Categories
Adding category information to timelines can do a great deal to make analysis ssssooooo much easier! For example, when adding Prefetch file metadata to a timeline, identifying the time stamps as being related to "Program Execution" can do a great deal to make analysis easier, particularly when it's included along with other data that is in the same category. Also, as of Vista (and particularly so with Windows 7 and 2008 R2), there has been an increase in the number of event logs, and many of the event IDs that we're familiar with from Windows XP have changed. As such, being able to identify the category of an event source/ID pair, via a short descriptor, makes analysis quicker and easier.
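To illustrate what I mean by a short descriptor, here's a minimal sketch (in Perl) of the kind of source/ID-to-category lookup that could be applied while timeline entries are being generated. The source/ID pairs and category strings shown are just examples, not a complete or authoritative mapping.

```perl
#!/usr/bin/perl
# Minimal sketch: map event log source/ID pairs to short category
# descriptors while building timeline entries. The pairs and category
# strings below are examples only, not a complete mapping.
use strict;
use warnings;

my %categories = (
    'Microsoft-Windows-Security-Auditing/4624' => '[Logon]',
    'Microsoft-Windows-Security-Auditing/4634' => '[Logoff]',
    'Service Control Manager/7045'             => '[Service Install]',
);

# $source and $id would come from a parsed event record
sub get_category {
    my ($source, $id) = @_;
    return $categories{"$source/$id"} || '';
}

print get_category('Service Control Manager', 7045), "\n";
```

The point isn't the specific pairs; it's that the descriptor rides along with the timeline entry, so the analyst doesn't have to keep the meaning of every source/ID pair in their head while scrolling.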
One thing that is very evident to me is that many artifacts will have a primary, as well as a secondary (or even tertiary) category. For example, let's take a look at shortcut/LNK files. Shortcuts found in a user's Recents folder are created via a specific activity performed by the user...most often, by the user double-clicking a file via the shell. As such, the primary category that a shortcut file will belong to is something akin to "File Access", as the user actually accessed the file. While it may be difficult to keep the context of how the artifact is created/modified in your mind while scrolling through thousands of lines of data, it is oh so much easier to simply provide the category right there along with the data.
Now, take a look at what happens when a user double-clicks a file...that file is opened in a particular application, correct? As such, a secondary category for shortcut files (found in the user's Recents folder) might be "Program Execution". Now, the issue with this is that we would need to do some file association analysis to determine which application was used to open the file...we can't always assume that files ending in the ".txt" extension are going to be opened via Notepad. File association analysis is pretty easy to do, so it's well worth doing.
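As an illustration of how easy that file association analysis can be, here's a minimal sketch using Parse::Win32Registry, the module RegRipper is built on. The hive path and extension are placeholders, and keep in mind that per-user choices (for example, under the user's FileExts key) can override the machine-wide association shown here.

```perl
#!/usr/bin/perl
# Minimal sketch: resolve a file extension to its "open" command via the
# Software hive (Classes subkey). Hive path and extension are examples.
use strict;
use warnings;
use Parse::Win32Registry;

my $hive = Parse::Win32Registry->new('D:\\case\\SOFTWARE')
    or die "Could not open hive\n";
my $root = $hive->get_root_key;

my $ext     = '.txt';
my $ext_key = $root->get_subkey("Classes\\$ext") or die "No key for $ext\n";
my $pv      = $ext_key->get_value('') or die "No default value for $ext\n";
my $progid  = $pv->get_data;                       # default value = ProgID

my $cmd_key = $root->get_subkey("Classes\\$progid\\shell\\open\\command");
my $command = ($cmd_key && $cmd_key->get_value(''))
            ? $cmd_key->get_value('')->get_data
            : 'not found';

print "$ext -> $progid -> $command\n";
```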
Not all artifacts are created alike, even if they have the same file extension...that is to say, some artifacts may have to have categories based on their context or location. Consider shortcut files on the user's desktop...many times, these are either specifically created by the user, or are placed there as the result of the user installing an application. For those desktop shortcuts that point to applications, they do not so much refer to "File Access", as they do to "Application Installation", or something similar. After all, when applications are installed and create a shortcut on the desktop, that shortcut very often contains the command line "app.exe %1", and doesn't point to a .docx or .txt file that the user accessed or opened.
Adding categories to your timeline can bring a great deal of power to your fingertips, in addition to reducing the complexity and difficulty of finding the needle(s) in the haystack...or stack of needles, as the case may be. However, this addition to timeline analysis is even more powerful when it's done with some thought and consideration given to the actual artifacts themselves. Our example of LNK files clearly shows that we cannot simply group all LNK files in one category. The power and flexibility to include categories for artifacts based on any number of conditions is provided in the Forensic Scanner.
RegRipper
Sorry that I didn't come up with a witty title for this section of the post, but I wanted to include something here. I caught up to SketchyMoose's blog recently and found this post that included a mention of RegRipper.
In the post, SM mentions a plugin named 'findexes.pl'. This is an interesting plugin that I created as a result of something Don Weber found during an exam when we were on the ...that the bad guy was hiding PE files (or portions thereof) in Registry values! That was pretty cool, so I wrote a plugin. See how that works? Don found it, shared the information, and then a plugin was created that could be run during other exams.
SM correctly states that the plugin is looking for "MZ" in the binary data, and says that it's looking for it at the beginning of the value. I know it says that in the comments at the top of the plugin file, but if you look at the code itself, you'll see that it runs a grep(), looking for 'MZ' anywhere in the data. As you can see from the blog post, the plugin not only lists the path to the value, but also the length of the binary data being examined...it's not likely that you're going to find executable code in 32 bytes of data, so it's a good visual check for deciding which values you want to zero in on.
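For those curious about what that kind of check looks like, here's a minimal sketch along the lines described above...it is not the findexes.pl code itself, just an illustration of walking a hive with Parse::Win32Registry, grep()'ing binary value data for 'MZ', and reporting the data length. The hive path is a placeholder.

```perl
#!/usr/bin/perl
# Minimal sketch (not findexes.pl itself): walk a hive, look for 'MZ'
# anywhere in binary value data, and report the data length as a quick
# sanity check. Hive path is an example.
use strict;
use warnings;
use Parse::Win32Registry qw(:REG_);

my $hive = Parse::Win32Registry->new('D:\\case\\ntuser.dat')
    or die "Could not open hive\n";

walk($hive->get_root_key);

sub walk {
    my $key = shift;
    foreach my $val ($key->get_list_of_values) {
        my $data = $val->get_data;
        next unless defined $data;
        if ($val->get_type == REG_BINARY && grep(/MZ/, $data)) {
            printf "%s\\%s : %d bytes\n",
                $key->get_path, $val->get_name, length($data);
        }
    }
    walk($_) foreach $key->get_list_of_subkeys;
}
```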
SM goes on to point out the results of the userinit.pl plugin...which is very interesting. Notice that in the output of that plugin, there's a little note that indicates what 'normal' should look like...this is a question I get a lot when I give presentations on Registry or Timeline Analysis...what is 'normal', or what about the data I'm looking at should jump out at me as 'suspicious'? With this plugin, I've provided a little note that tells the analyst, hey, anything other than just "userinit.exe" is gonna be suspicious!
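For illustration only, a minimal sketch of that sort of 'normal vs. suspicious' check against the Userinit value might look like the following. This is not the userinit.pl plugin itself, and the hive path is a placeholder.

```perl
#!/usr/bin/perl
# Minimal sketch (not userinit.pl itself): pull the Userinit value from a
# Software hive and flag anything beyond a single userinit.exe entry.
# Hive path is an example.
use strict;
use warnings;
use Parse::Win32Registry;

my $reg  = Parse::Win32Registry->new('D:\\case\\SOFTWARE')
    or die "Could not open hive\n";
my $key  = $reg->get_root_key->get_subkey('Microsoft\\Windows NT\\CurrentVersion\\Winlogon')
    or die "Winlogon key not found\n";
my $val  = $key->get_value('Userinit') or die "Userinit value not found\n";
my $data = $val->get_data;

print "Userinit = $data\n";

# The value normally holds a single entry ending in userinit.exe (plus a
# trailing comma); anything else is worth a closer look.
my @entries = grep { /\S/ } split(/,/, $data);
if (!@entries || @entries > 1 || $entries[0] !~ /userinit\.exe$/i) {
    print "Possibly suspicious: review the entries above\n";
}
```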
USB Stuff
SM also references a Hak5 episode by Chris Gerling, Jr, that discusses mapping USB storage devices found on Windows systems. I thought I'd reference that here, in order to say, "...there are more things in heaven and earth than are dreamt of in your philosophy, Horatio!" Okay, so what does quoting the Bard have to do with anything? In her discussion of her dissertation entitled, Pitfalls of Interpreting Forensic Artifacts in the Registry, Jacky Fox follows a similar process for identifying USB storage devices connected to a Windows system. However, the currently accepted process for doing this USB device identification has some...shortcomings...that I'll be addressing. Strictly speaking, the process works, and works very well. In fact, if you follow all the steps, you'll even be able to identify indications of USB thumb drives that the user may have tried to obfuscate or delete. However, this process does not identify all of the devices that are presented to the user as storage.
Please don't misunderstand me here...I'm not saying that either Chris or Jacky are wrong in the process that they use to identify USB storage devices. Again, they both refer to using regularly accepted examination processes. Chris refers to Windows Forensic Analysis 2/e, and Jacky has a lot of glowing and positive things to say about RegRipper in her dissertation (yes, I did read it...the whole thing...because that's how I roll!), and some of those resources are based on information that Rob Lee has developed and shared through SANS. However, as time and research have progressed, new artifacts have been identified and need to be incorporated into our analysis processes.
Propagation
I ran across this listing for Win32/Phorpiex on the MS MMPC blog, and it included something pretty interesting. This malware includes a propagation mechanism that makes use of removable storage devices.
While this propagation mechanism seems pretty interesting, it's not nearly as interesting as it could be, because (as pointed out in the write up) when the user clicks on the shortcut for what they think is a folder, they don't actually see the folder opening. As such, someone might look for an update to this propagation mechanism in the near future, if one isn't already in the wild.
What's interesting to me is that there's no effort taken to look at the binary contents of the shortcut/LNK files to determine if there's anything odd or misleading about them. For example, most of the currently used tools only parse the LinkInfo block of the LNK file...not all tools parse the shell item ID list that comes before the LinkInfo block. MS has done a great job of documenting the binary specification for LNK files, but commercial tools haven't caught up.
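As a simple illustration of what parsing beyond the LinkInfo block involves, here's a minimal sketch that reads the LNK header and checks the HasLinkTargetIDList flag described in the MS-SHLLINK specification. The file path is a placeholder, and a full shell item parser would obviously go much further than this.

```perl
#!/usr/bin/perl
# Minimal sketch: check whether a .lnk file carries a shell item ID list
# (LinkTargetIDList) by reading the LinkFlags field of the 76-byte header,
# per the MS-SHLLINK specification. File path is an example.
use strict;
use warnings;

my $file = shift || 'D:\\case\\sample.lnk';
open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
binmode($fh);

my $hdr;
read($fh, $hdr, 0x4C) == 0x4C or die "Short read on header\n";

# HeaderSize (4 bytes), skip LinkCLSID (16 bytes), LinkFlags (4 bytes)
my ($size, $flags) = unpack("V x16 V", $hdr);
die "Not a shell link (header size != 0x4C)\n" unless $size == 0x4C;

if ($flags & 0x1) {    # HasLinkTargetIDList
    my $buf;
    read($fh, $buf, 2);
    my $idlist_size = unpack("v", $buf);
    print "LinkTargetIDList present, $idlist_size bytes of shell items\n";
}
else {
    print "No LinkTargetIDList in this file\n";
}
```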
In order to see where/how this is an issue, take a look at this CyanLab blog post.
Malware Infection Vectors
This blog post recently turned up on MMPC...I think that it's great because it illustrates how systems can be infected via drive-bys that exploit Java vulnerabilities. However, I also think that blog posts like this aren't finishing the race, as it were...they start, get most of the way down the track, and then stop...they stop before they show what this exploit looks like on a system. Getting and sharing this information would serve two purposes...collecting intelligence that they (MS) and others could use, and helping get everyone else closer to conducting root cause analyses after an incident. I think that the primary reason that RCAs aren't being conducted is that most folks think that it takes too long or is too difficult. I'll admit...the further away from the actual incident that you detect a compromised or infected system, the harder it can be to determine the root cause or infection vector. However, understanding the root cause of an incident, and incorporating it back into your security processes, can go a long way toward helping you allocate resources toward protecting your assets, systems, and infrastructure.
If you want to see what this stuff might look like on a system, check out Corey's jIIr blog posts that are labeled "exploits". Corey does a great job of exploiting systems and illustrating what that looks like on a system.
Wednesday, November 07, 2012
PFIC2012 slides
Several folks at PFIC 2012 asked that I make my slides from the Windows 7 Forensic Analysis and Timeline Analysis presentations available...so here they are.
I'll have to admit, I've become somewhat hesitant to post slides, not because I don't want to share the info, but because posting the slides from my presentations doesn't share the info...most of the information that is shared during a presentation isn't covered in the slides.
Wednesday, October 31, 2012
Shellbag Analysis, Revisited...Some Testing
I blogged previously on the topic of Shellbag Analysis, but I've found that in presenting on the topic and talking to others, there may be some misunderstanding of how these Registry artifacts may be helpful to an analyst. With Jamie's recent post on the Shellbags plugin for Volatility, I thought it would be a good idea to revisit this information, as sometimes repeated exposure is the best way to start developing an understanding of something. In addition, I wanted to do some testing in order to determine the nature of some of the metadata associated with shellbags.
In her post, Jamie states that the term "Shellbags" is commonly used within the community to indicate artifacts of user window preferences specific to Windows Explorer. MS KB 813711 indicates that the artifacts are created when a user repositions or resizes an Explorer window.
ShellItem Metadata
As Jamie illustrates in her blog post, many of the structures that make up the SHELLITEMS (within the Shellbags) contain embedded time stamps, in DOSDate format. However, there's still some question as to what those values mean (even though the available documentation refers to them as MAC times for the resource in question) and how an analyst may make use of them during an examination.
Having some time available recently due to inclement weather, I thought I would conduct a couple of very simple tests to begin to address these questions.
Testing Methodology
On a Windows 7 system, I performed a number of consecutive, atomic actions and recorded the system time (visible via the system clock) for when each action was performed. The following table lists the actions I took, and the time (in local time format) at which each action occurred.
| Action | Time |
| --- | --- |
| Create a dir: mkdir d:\shellbag | 12:54pm |
| Create a file in the dir: echo "..." > d:\shellbag\test.txt | 1:03pm |
| Create another dir: mkdir d:\shellbag\test | 1:08pm |
| Create a file in the new dir: echo "..." > d:\shellbag\test\test.txt | 1:16pm |
| Delete a file: del d:\shellbag\test.txt | 1:24pm |
| Open D:\shellbag\test via Explorer, reposition/resize the window | 1:29pm |
| Close the Explorer window opened in the previous step | 1:38pm |
The purpose of having some time pass between actions is so that they can be clearly differentiated in a timeline.
Once these steps were completed, I restarted the system, and once it came back up, I extracted the USRCLASS.DAT hive from the relevant user account into the D:\shellbag directory for analysis (at 1:42pm). I purposely chose this directory in order to determine how actions external to the shellbags artifacts affect the overall data seen.
Results
The following table lists the output from the shellbags.pl RegRipper plugin for the directories in question (all times are in UTC format):
| Directory | MRU Time | Modified | Accessed | Created |
| --- | --- | --- | --- | --- |
| Desktop\My Computer\D:\shellbag | 2012-10-29 17:29:25 | 2012-10-29 17:24:26 | 2012-10-29 17:24:26 | 2012-10-29 16:55:00 |
| Desktop\My Computer\D:\shellbag\test | 2012-10-29 17:29:29 | 2012-10-29 17:16:20 | 2012-10-29 17:16:20 | 2012-10-29 17:08:18 |
Let's walk through these results. First, I should remind you that the MRU Time is populated from Registry key LastWrite times (FILETIME format, granularity of 100 ns) while the MAC times are embedded within the various shell items (used to reconstruct the paths) in DOSDate time format (granularity of 2 seconds).
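To make the granularity point concrete, here's a minimal sketch of decoding the 16-bit DOSDate and DOSTime values embedded in shell items. The example values are arbitrary and simply show the 2-second resolution of the seconds field.

```perl
#!/usr/bin/perl
# Minimal sketch: decode the 16-bit DOSDate and DOSTime values embedded in
# shell items into a human-readable timestamp (2-second granularity).
# The example values are arbitrary.
use strict;
use warnings;

sub dosdate_to_string {
    my ($date, $time) = @_;
    my $day   =  $date        & 0x1F;
    my $month = ($date >> 5)  & 0x0F;
    my $year  = (($date >> 9) & 0x7F) + 1980;
    my $sec   = ($time        & 0x1F) * 2;    # stored in 2-second units
    my $min   = ($time >> 5)  & 0x3F;
    my $hour  = ($time >> 11) & 0x1F;
    return sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                   $year, $month, $day, $hour, $min, $sec);
}

# Example: two 16-bit words as they might be unpacked ("v v") from a shell item
print dosdate_to_string(0x415D, 0x8A08), "\n";
```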
First, we can see that the Created dates for both folders correspond approximately to when the folders were actually created. We can also see that the same thing is true for the Modified dates. Going back to the live system and typing "dir /tw d:\shell*" shows me that the last modification time for the directory is 1:42pm (local time), which corresponds to changes made to that directory after the USRCLASS.DAT hive file was extracted.
Next, we see that MRU Time values correspond approximately to when the D:\shellbag\test folder was opened and then resized/repositioned via the Explorer shell, and not to when the Explorer window was actually closed.
Based on this limited test, it would appear that the DOSDate time stamps embedded in the shell items for the folders correspond to the MAC times of that folder, within the file system, at the time that the shell items were created. In order to test this, I deleted the d:\shellbag\test\test.txt file at 2:14pm, local time, and then extracted a copy of the USRCLASS.DAT and parsed it the same way I had before...and saw no changes in the Modified times listed in the previous table.
In order to test this just a bit further, I opened Windows Explorer, navigated to the D:\shellbag folder, and repositioned/resized the window at 2:21pm (local time), waited 2 minutes, and closed the window. I extracted and parsed the USRCLASS.DAT hive again, and this time, the MRU Time for the D:\shellbag folder had changed to 18:21:48 (UTC format). Interestingly, that was the only time that had changed...the Modified time for the D:\shellbag\test folder remained the same, even though I had deleted the test.txt file from that directory at 2:14pm local time ("dir /tw d:\shellbag\te*" shows me that the last written time for that folder is, indeed, 2:14pm).
Summary
Further testing is clearly required; however, it would appear that based on this initial test, we can draw the following conclusions with respect to the shellbag artifacts on Windows 7:
1. The embedded DOSDate time stamps appear to correspond to the MAC times of the resource/folder at the time that the shell item was created. If the particular resource/folder was no longer present within the active file system, an analyst could use the Created date for that resource in a timeline.
2. Further testing needs to be performed in order to determine the relative value of the Modified date, particularly given that events external to the Windows Explorer shell (i.e., creating/deleting files and subfolders after the shell items have been created) may have limited effect on the embedded dates.
3. The MRU Time appears to correspond to when the folder was resized or repositioned. Analysts should keep in mind that (a) there are a number of ways to access a folder that do not require the user to reposition or resize the window, and (b) the MRU Time is a Registry key LastWrite time that only applies to one folder within the key...the Most Recently Used folder, or the one listed first in the MRUListEx value (a quick way to decode that value is sketched below).
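Since item 3 mentions the MRUListEx value, here's a minimal sketch of decoding it...the value data is a run of 4-byte little-endian entry numbers terminated by 0xFFFFFFFF, with the first number identifying the most recently used item. The example data below is fabricated for illustration.

```perl
#!/usr/bin/perl
# Minimal sketch: decode an MRUListEx value (a run of 4-byte little-endian
# entry numbers terminated by 0xFFFFFFFF); the first number identifies the
# most recently used item. $data would be the raw value data from the hive.
use strict;
use warnings;

sub parse_mrulistex {
    my $data = shift;
    my @order;
    foreach my $num (unpack("V*", $data)) {
        last if $num == 0xFFFFFFFF;    # terminator
        push @order, $num;
    }
    return @order;
}

# Example: items 2, 0, 1 in MRU order, followed by the terminator
my $data = pack("V*", 2, 0, 1, 0xFFFFFFFF);
my @mru  = parse_mrulistex($data);
print "MRU order: @mru (most recent: $mru[0])\n";
```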
I hope that folks find this information useful. I also hope that others out there will look at this information, validate it through their own testing, and even use it as a starting point for their own research.