Friday, October 29, 2010

Analysis Techniques

Now and again, it's a good idea to revisit old thoughts and ideas, to dredge them up again and expose them to the light of day. One of the things that I have spent a lot of time thinking about is the analysis techniques that I'm using during examinations, and in particular taking a step forward and seeing how those techniques stack up against some of the threats and issues that are being seen.

First, a caveat...a lot of what I think (and sometimes talk) about focuses on disk analysis. This is due to the realities of the work I (and others) do. As a responder, many times all I have access to is a disk image. Or, as is often the case now, I focus on disk analysis because I'm part of a team that includes folks who are way smarter and far more capable than I am in network, memory and malware analysis. So the disk stuff is a piece of the puzzle, and should be treated as such, even if it's the only piece you have.

So, there are a couple of general techniques I tend to use in analysis, and I often start with timeline analysis. This is a great analysis technique to use because when you build a timeline from multiple data sources on and from within a system, you give yourself two things that you don't normally have through more traditional analysis techniques...context, and a greater relative level of confidence in your data. By context, we aren't simply seeing an event such as a file being created or changed...we're seeing other surrounding events that can (and often do) indicate what led to that event occurring. The multiple data sources included in a timeline provide a great deal of information; Event Logs may show us a user login, Registry contents may provide additional indications of that login, we may see indicators of web browsing activity, perhaps opening an email attachment, etc...all of which may provide us with the context of the event in which we are most interested.

As to the overall relative level of confidence in our data, we have to understand that all data sources have a relative level of confidence associated with each of them. For example, from Chris's post, we know that the relative confidence level of the time stamps within the $STANDARD_INFORMATION attributes within the MFT (and file system) is (or should be) low. That's because these values are fairly easily changed, often through "time stomping", so that the MACB times (particularly the "B" time, or creation date of the file) do not fall within the initial timeframe of the incident. However, the time stamps within the $FILE_NAME attributes can provide us with a greater level of confidence in the data source (MFT, in this case). By adding other data sources (Event Log, Registry, Prefetch file metadata, etc.), particularly data sources whose time stamps are not so easily modified (such as Registry key LastWrite times), we can elevate our relative confidence level in the data.
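As an illustration, the idea of merging multiple data sources into one time-sorted timeline, with each event tagged by the relative confidence of its source, can be sketched in a few lines of Python. The source names and confidence labels here are illustrative assumptions, not any fixed standard:

```python
from datetime import datetime

# Relative confidence per data source; these labels are illustrative
# assumptions, not a formal scale.
SOURCE_CONFIDENCE = {
    "$STANDARD_INFORMATION": "low",    # easily "time stomped"
    "$FILE_NAME": "higher",
    "EventLog": "higher",
    "Registry": "higher",              # key LastWrite times
}

def build_timeline(events):
    """Merge (timestamp, source, description) events from multiple data
    sources into one time-sorted timeline, tagging each event with the
    relative confidence of its source."""
    timeline = []
    for ts, source, desc in events:
        conf = SOURCE_CONFIDENCE.get(source, "unknown")
        timeline.append((ts, source, conf, desc))
    timeline.sort(key=lambda e: e[0])
    return timeline

# Hypothetical events from three different sources:
events = [
    (datetime(2010, 10, 1, 14, 3), "EventLog", "user login (ID 528)"),
    (datetime(2010, 10, 1, 14, 1), "$STANDARD_INFORMATION", "malware.exe created"),
    (datetime(2010, 10, 1, 14, 2), "Registry", "Run key LastWrite updated"),
]

for ts, source, conf, desc in build_timeline(events):
    print(ts.isoformat(), source, conf, desc)
```

The point of the sort-and-tag step is that surrounding events supply context, while the confidence column reminds the analyst which entries could have been tampered with.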

Another aspect of this is that by adding multiple sources, we will begin to see patterns in the data and also begin to see where there are gaps in that data.

This is particularly important as intrusions and malware are very often the least frequency of occurrence on a system. Credit for this phrase goes to Pete Silberman of Mandiant, and it's an extremely important concept to understand, particularly when it comes to timeline analysis. In short, many times analysts will look for large tool kits or spikes in the event volume as an indication of compromise. However, most often, this simply is not the case...spikes in activity in a timeline will correspond to an operating system or application update, a Restore Point being created, etc. So, in short, intrusions and even malware have taken a turn toward minimization on systems, so looking for spikes in activity likely won't get you anywhere. This is not to say that if your FTP server is turned into a warez server, you won't see a spike in activity or event volume...rather, the overall effects of an incident are most likely minimized. A user clicks a link or an application is exploited, something is uploaded to the system, and data gets exfiltrated at some point...disk forensics artifacts are minimized, particularly if the data is exfiltrated without ever writing it to disk.
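The "least frequency of occurrence" idea translates directly into a simple frequency count over timeline events. A hypothetical sketch (the event descriptions here are made up for illustration):

```python
from collections import Counter

def least_frequent(events, n=3):
    """Return the n least frequently occurring event types in a timeline.
    Intrusion and malware artifacts often appear only once or twice,
    while OS/application updates generate large spikes."""
    counts = Counter(events)
    # most_common() sorts descending by count; slice the tail, reversed,
    # to get the rarest entries first.
    return counts.most_common()[:-n - 1:-1]

# Hypothetical timeline: big spikes from benign activity, plus two
# single-occurrence events worth a closer look.
timeline_events = (
    ["Windows Update file change"] * 500 +
    ["Restore Point created"] * 40 +
    ["suspicious driver loaded"] +
    ["unknown service installed"]
)

for desc, count in least_frequent(timeline_events, n=2):
    print(count, desc)
```

The spikes (updates, Restore Points) dominate the counts; the events of interest sit at the bottom of the list.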

Timeline Creation
When creating a timeline, I tend to take what most analysts consider a very manual process, in that I do not use tools that simply sweep through an image and collect all possible timeline data. However, there is a method to my madness, which can be seen in part in Chris's Sniper Forensics presentation. I tend to take a targeted approach, adding the information that is necessary to complete the picture. For example, when analyzing a system that had been compromised via SQL injection, I included the file system metadata and only the web server logs that contained the SQL injection attack information. There was no need to include user information (Registry, index.dat, etc.); in fact, doing so would have added considerable noise to the timeline, and the extra data would have required significantly more effort to analyze and parse through in order to find what I was looking for.

Many times when creating a timeline, I don't want to see everything all at once. When examining a Windows system, there are so many possible data sources that filling in the timeline with the appropriate sources in an iterative manner is...for me...more productive and efficient than loading everything up into the timeline all at once, and parsing things out from there. If a system had a dozen user profiles, but only one or two are of interest, I'm not about to populate the timeline with the LastWrite times of all the keys in the other users' NTUSER.DAT hives. Also, when you get to more recent versions of Windows, specific keys in the USRCLASS.DAT hive become more important, and I don't want to add the LastWrite time of all of those keys when I'm more interested in some specific values from specific keys.
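The targeted, iterative approach amounts to filtering: start from the data sources (and user profiles) that bear on the analysis goals, and add more only as the picture demands it. A minimal sketch, with hypothetical event records:

```python
def filter_timeline(events, sources=None, users=None):
    """Keep only events from the data sources and user profiles that are
    relevant to the analysis goals; everything else is noise at this
    stage and can be added iteratively later if needed."""
    keep = []
    for ev in events:
        if sources is not None and ev["source"] not in sources:
            continue
        if users is not None and ev.get("user") not in users:
            continue
        keep.append(ev)
    return keep

# SQL injection example: file system metadata plus the web server logs
# only; user-profile data would just add noise.
events = [
    {"source": "filesystem", "desc": "cmd.exe copied to web root"},
    {"source": "web_logs", "desc": "request containing SQL injection string"},
    {"source": "ntuser", "user": "alice", "desc": "unrelated user shell activity"},
]
relevant = filter_timeline(events, sources={"filesystem", "web_logs"})
for ev in relevant:
    print(ev["source"], "-", ev["desc"])
```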

Part of the iterative approach in developing the timeline is looking at the data you have available and seeing gaps. Some gaps may be due to the fact that the data no longer exists...file last access/modified times may be updated to more recent times, Event Logs may "roll over", etc. Looking at gaps has a lot to do with your analysis goals and the questions you're trying to answer, and knowing what you would need to fill that gap. Many times, we may not have access to a direct artifact (such as an event record with ID 528, indicating a login...), but we may be able to fill that gap (at least partially) with indirect artifacts (i.e., UserAssist key entries, etc.).
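Thinking about gaps in terms of direct versus indirect artifacts can itself be written down as a simple decision: if the direct artifact is gone, fall back to the indirect ones. A toy sketch of the login example above (the artifact values are hypothetical):

```python
def login_evidence(security_event_ids, userassist_entries):
    """Direct artifact: a Security Event Log record with ID 528 (XP-era
    interactive logon). Indirect artifact: UserAssist entries record
    programs launched via the shell, implying an interactive session
    even after the Event Log has rolled over."""
    if 528 in security_event_ids:
        return "direct"
    if userassist_entries:
        return "indirect"
    return "no evidence"

# Event Log has rolled over, but UserAssist still shows shell activity:
print(login_evidence([], ["UEME_RUNPATH:C:\\Windows\\notepad.exe"]))
```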

Feedback Loop
Timeline analysis provides a view of a system that, in most cases, an analyst wouldn't get any other way. This leads to discovering things that wouldn't otherwise be found, due in part to the view enabled via context. This context can also lead to finding those things much more quickly. Once you've done this, it's critical for future exams that you take what you have learned and roll it back into your analysis process. After all, what good would it be for anyone for you to simply let that valuable institutional knowledge disappear? Retaining that institutional knowledge allows you to fine-tune and hone your analysis process.

Consider folks that work on teams...say, there's a team of 12 analysts. One analyst finds something new after 16 hrs of analysis. If every analyst were to take 16 hrs to find the same thing, then by not sharing what you found, your team consumes an additional 176 (16 x 11) hrs. That's not terribly efficient, particularly when it could be obviated by a 30 minute phone call. However, if you share your findings with the rest of the team, they will know to look for this item. If you share it with them via a framework such as a forensic scanner (see the Projects section of this post), similar to RegRipper (for example), then looking for that same set of artifacts is now no longer something they have to memorize and remember, as it takes just seconds to check for them via the scanner. All that's really needed is that your image intake process is modified slightly; when the image comes in, make your working copy, verify the file system of the copy, and then run the forensic scanner. Using the results, you can determine whether or not things that you've seen already were found, removing the need to remember all of the different artifacts and freeing your analysts up for more in-depth analysis.
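A forensic scanner along these lines can be as simple as a list of small check functions, each one encoding something an analyst found the hard way; new findings become new checks that the whole team benefits from. This is only a sketch of the plugin idea, with hypothetical artifacts and names, not RegRipper's actual design:

```python
CHECKS = []

def check(fn):
    """Register a check function so scan() picks it up automatically."""
    CHECKS.append(fn)
    return fn

@check
def known_bad_run_key(image_data):
    # A Run key value seen in a previous exam (hypothetical indicator).
    for name in image_data.get("run_keys", []):
        if name.lower() == "updater.exe":
            return "known-bad Run key value: " + name
    return None

@check
def unrecognized_service(image_data):
    # A service name not on the baseline list (hypothetical indicator).
    for svc in image_data.get("services", []):
        if svc not in image_data.get("known_services", set()):
            return "unrecognized service: " + svc
    return None

def scan(image_data):
    """Run every registered check against the parsed image data and
    return whatever findings come back."""
    return [hit for fn in CHECKS if (hit := fn(image_data)) is not None]

# Hypothetical data parsed from a working copy of an image:
image_data = {
    "run_keys": ["Updater.exe"],
    "services": ["wuauserv", "xqjzsvc"],
    "known_services": {"wuauserv", "BITS"},
}
for finding in scan(image_data):
    print(finding)
```

Adding a new check is just writing another decorated function, which is the "roll it back into your process" step in code form.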

So What?
So why does all this matter? Why is it important? The short answer is that things aren't going to get easier, and if we (analysts) don't take steps to improve what we do, we're going to quickly fall further behind. We need to find and use innovative analysis techniques that allow us to look at systems and the available data in different ways, adding context and increasing our relative level of confidence in the data, particularly as some data becomes less and less reliable. We also need to consider and explore previously un- or under-utilized data sources, such as the Registry.

Consider malware detection...some folks may think that you can't use RegRipper or tools like it for something like finding malware, but I would suggest that that simply isn't the case. Consider Conficker...while the different variants escaped detection via AV scanners, there were consistent Registry artifacts across the entire family. And it's not just me...folks like MHL (and his co-authors) have written RegRipper plugins (if you deal with malware at all, you MUST get a copy of their book...) to assist them in their analysis.

The same thing can be said of intrusions...understanding Windows from a system perspective, and understanding how different artifacts are created or modified, can show us what we need to focus on, and what data sources we need to engage for both direct and indirect artifacts. But these techniques are not limited to just malware and intrusions...they can similarly be used to analyze other incidents, as well.

Monday, October 18, 2010


CyberSpeak is Back!
Ovie returns, sans Bret, to talk about browser forensics, and more...check it out!


Stuxnet
ESET's paper, Stuxnet Under the Microscope, has so far been an excellent read. It's 72 pages, but there are a lot of graphics. ;-) Take a look...there's some good info there.

Addendum: Check out Symantec's Stuxnet dossier, as well...

XP Restore Points
A bit ago, I'd reached out to some friends at Microsoft regarding the structure of the drivetable.txt file in XP Restore Points. I received a response recently, and I'm posting it here with a big thanks to Jim for providing it and for the permission to post it:

C:\/\\?\Volume{181b2120-e4ac-11de-a517-806d6172696f}\ 3b 0 15604 / <>

If you see the flags field with a nonzero value, this is what they mean…

The reason I'd asked about this was that I'd seen the question posted by LE and hadn't seen a response. Again, thanks to Jim for sussing this one out...

Speaking of Exploits...
Both Brian Krebs and the MMPC are reporting an increase in exploits targeting Java (not JavaScript). This is easy for both of them to report, because the solution is simply to update your Java installation. However, what's not being mentioned anywhere is what such an exploit looks like on a system. Should forensic analysts be looking for .jar files in the browser cache? Following one of the vulnerability links from Brian's post takes us to a CVE entry that starts with:

Unspecified vulnerability in the Java Runtime Environment component in Oracle Java SE and Java for Business 6 Update 18 and 5.0 Update 23 allows remote attackers to affect...

Unspecified? So how do I know if I've been affected by it? It's relatively easy for me to check to see if I'm vulnerable...I just check my version of Java. But how would I know if I've already been affected by this? More importantly, how does an analyst examining a system figure out if this is the initial infection vector?

This isn't just an idle question I'm asking. This potentially affects victims subject to PCI investigations, and a wide range of other compliance issues.
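One place an analyst could start is the browser cache itself: a cached .jar (or .class) file near the time of the suspect activity may point toward Java as the initial infection vector. A minimal sketch of that check; cache locations vary by browser and OS, and the throwaway directory below is a stand-in for a real cache path, not an actual one:

```python
import os
import tempfile

def find_java_archives(cache_dir):
    """Walk a browser cache directory and report any Java archives or
    class files; one of these created near the time of infection could
    support (or rule out) a Java exploit as the initial vector."""
    hits = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            if name.lower().endswith((".jar", ".class")):
                hits.append(os.path.join(root, name))
    return hits

# Demonstration against a throwaway directory standing in for a cache:
demo = tempfile.mkdtemp()
open(os.path.join(demo, "abc123.jar"), "w").close()
open(os.path.join(demo, "page.html"), "w").close()
print(find_java_archives(demo))
```

In practice you'd pair the file list with timestamps from the timeline to see whether the archive landed just before the malicious activity began.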

Friday, October 15, 2010

Back from WACCI

So I got up this morning to find myself back from the WACCI conference, which is still going on, with today being the final day.

First, I want to thank Cindy and Drew for doing such a great job setting up and organizing this conference, which is geared toward computer crime investigators. The conference venue was spacious, well laid out, and very close to the hotel. As with many conferences, there were often two (or more) interesting talks scheduled at the same time.

I also want to thank those who attended for...attending. One of the things I really like about conferences is the networking opportunity. I got to see some friends, like Ovie Carroll, Ken Pryor, and Mark McKinnon. I also got to make some new friends and meet some great people...Sam Brothers, Brad Garnett, Mark Lachinet, and Fergus Toolan, to name a very few.

WACCI brought together private industry analysts, LE, and academics, all in one place. This is a smallish, regional conference, but these kinds of conferences are important, as they foster relationships that lead to sharing...which was the central point of my keynote, and as it turns out, an important element of Ovie's, as well.

On Tuesday, we started off with a welcome message to the conference from the Dane County sheriff, had lunch, and then kicked off into Ovie's keynote. Having been an instructor at TBS back in the '90s, I am very familiar with presentations given right after lunch, during "the death hour". Ovie is a very dynamic speaker, and his presentation was very engaging, not only with the movement and transitions on the slides, but more importantly, with the content. He made some excellent points throughout the presentation, particularly with respect to sharing information and intel amongst LE.

For my keynote...which was the "zombie hour", as I followed both lunch and Ovie (I KID!!!)...I opted to go commando. That's right...no slides. What I tried to engage the audience in was a discussion regarding the need not just for sharing, but simply for communications between LE and the private sector. Folks in the private sector, such as myself, Chris Pogue, etc., tend to see a lot of different things and run across (and very often solve) some of the same challenges met by LE. As such, there is really no need for LE to spend the time re-solving technical problems that have already been addressed by others. Reaching out can decrease the amount of time needed to complete a case while increasing the volume/quality of information/data retrieved. The "Trojan Defense" comes to mind. Remember, those of us in the private sector don't just deal with intrusions and compromises, we address issues and solve problems...it just happens that the folks who call us have intrusion issues. Many of us are more than willing to assist local LE with issues where we can, which is something Chris and Maj Carole Newell talked about at the SANS Forensic Summit this past summer.

I didn't expect to solve any problems with this discussion. Instead, I wanted to engage others in a discussion about how we could engage in more sharing and communication between the sectors. For me personally, success is measured in having one member of LE who hadn't reached out before overcome those often self-imposed obstacles and share, either by asking a question or contributing something (white paper, finding, etc.).

Perhaps the most significant thing to come out of the discussion is the need for a central repository of some kind where folks (LE in particular) can go for credible information, and even provide information in a manner that protects the anonymity (although there was discussion that suggested that anonymity is inversely proportional to credibility) and privacy of the poster. The MITRE CVE was suggested as a model. One of the issues I've heard time and again is that the information is out there; the problem is, how do you find it, and how do you find the credible information? In most cases, I agree with that, but I also think that sometimes the issue is a lack of knowledge, or perhaps something else. During the conference, several questions came up (mostly during the more social parts of the conference) where someone would say, "I've been looking all over for information on this...", and within seconds, someone else would get on their computer or smartphone, call up Google, and find highly relevant links. But that's the first step...the sharing, the asking.

As a side note to this, I've had an opportunity to start discussing something like this, a forensic artifact repository, with Troy Larson of Microsoft. However, if you haven't seen it yet, be sure to check it out...something like this is a start, and what's needed are forums...some open, some vetted...for discussions.

Ken Pryor has already had his comments about the first day of WACCI posted on the SANS Forensic Blog. Be sure to check these and other comments out for varying views on the conference and presentations.

I'll admit that I only attended one presentation, as I spent a lot of time engaging and networking with folks at the conference, including the vendors (Cellebrite, in particular). One of the most vocal attendees during my keynote (besides Cindy!!) was Sam Brothers. Sam is not only well-published in mobile forensics, but he's also an amazing magician, regaling us with some truly incredible card tricks! Listening and talking to Sam, I know who I'm going to go to if/when I encounter any forensic issues with mobile devices!

In closing, I just want to thank Cindy and Drew again for putting together such a great conference, and Cindy in particular for inviting me (and the rest of us) to her home for a great social event. I also want to thank Mark McKinnon for sharing such wonderful gifts with us...Bells Two-Hearted Ale (to which Sam whispered, "this is awesome!!"), steak brats, and Michigan cherry gourmet coffee. A great big "THANK YOU" to everyone who attended and made this conference a great success!