Monday, August 22, 2011

More Updates

Scanning Hosts
There's a great post over on the SANS ISC blog regarding how to find unwanted files on workstations...if you're a sysadmin for an organization and have any responsibilities regarding IR, this is a post you should really take a look at.

As a responder, one of the things I've run across is finding something on a system that appeared to be a pretty solid indicator of compromise (IoC) or infection.  Sometimes this is a file, a directory name, or even a Registry key or value.  This indicator may be something that we could use to sweep across the entire enterprise, looking for it on other systems...and very often the question becomes, is there a tool I can use to scan my infrastructure for other compromised/infected systems?  Well, there is...it's called a "batch file".
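To illustrate what I mean, here's a minimal sketch of that sort of sweep, written in Perl rather than as a batch file (the same idea works with a simple for loop in a batch file); it assumes you have admin rights to each host's C$ share, and the hosts.txt file and the indicator path are just placeholders:

#!/usr/bin/perl
# Sketch: sweep a list of hosts for a known-bad file via the C$ admin share.
# Assumes admin rights to each host; hosts.txt and $indicator are placeholders.
use strict;
use warnings;

my $indicator = "Windows\\System32\\bad.dll";    # hypothetical IoC path

open(my $fh, "<", "hosts.txt") or die "Could not open hosts.txt: $!";
while (my $host = <$fh>) {
	chomp($host);
	next unless ($host);
	my $path = "\\\\".$host."\\C\$\\".$indicator;
	print $host." : indicator found\n" if (-e $path);
}
close($fh);

From there, the same sort of loop can be extended to check Registry keys or values (reg.exe can query remote systems), or to copy suspect files off for analysis.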

Forensics
One of the things most of us are aware of as analysts is that in many cases, deleted stuff really isn't; files, Registry keys, etc., get deleted but are often recoverable if you know what you're looking for.  Well, here's a great example of how that was used recently.

CyberSpeak Podcast
I was listening to Ovie's latest CyberSpeak podcast, and very early on in the show, Ovie read a listener email from a LEO who does forensics as a collateral duty.  Now, this is really nothing new...I've met a number of LEOs for whom forensics is an additional duty.  Also, for a lot of LEOs, digital forensics isn't the primary task of the job, even if it is the primary assignment, as LEOs need to remain qualified as LEOs in order to get promoted, and very often they do.  This means that someone will come into the digital forensics field as a seasoned investigator, and several years later move on to another aspect of the law enforcement field.

I had an opportunity to sit down with some LEOs a couple of weeks ago, and one of the things we came up with as a resource is the LEO's rolodex; if you run into something that you have a question or thought on, call someone you know and trust who may have detailed knowledge of the subject, or who knows someone who does.  None of us knows everything, but there may be someone out there that you know who knows just a little bit more about something...they may have read one or two more articles, or they may have done one more bit of research or testing.

Ovie also mentioned the ForensicsWiki as a resource, and I completely agree.  This is a great resource that needs to be updated by folks with knowledge and information on the topic areas, so that it will become a much more credible resource.

Also, I have to say that I disagree with Ovie's characterization that there are two different types of forensics; "intrusion forensics" and "regular forensics".  I've heard this sort of categorization before and I don't think that that's really the way it should be broken out...or that it should be broken out at all.  For example, I spoke to a LEO at the first OSDFC who informed me that "...you do intrusion and malware forensics; we do CP and fraud cases."  My response at the time was that I, and others like me, solve problems, and we just get called by folks with intrusion problems.  In addition, there's a lot of convergence in the industry, and you really can't separate the aspects of our industry out in that way.  So let's say that as a LEO, you have a CP case, and the defense counsel alludes to the "Trojan Defense"...you now have a malware aspect to your case, and you have to determine if there is a Trojan/malware on the system and if it could have been responsible for the files having been placed on the system.  Like many examiners, I've done work on CP cases, and the claim was made that someone accessed the system remotely...so now I had an intrusion component of the examination to address.

I went on and listened to Ovie's interview with Drew Fahey...great job, guys!

Time and Timelines
When I give presentations or classes on timeline analysis, one of the things I discuss (because it's important to do so) is those things that can affect time and how it's recorded and represented on a system.  One of the things I refer to is file system tunneling, which is a very interesting aspect of file systems, particularly on Windows.  In short, by default, on both FAT and NTFS systems, if you delete a file in a directory and create a new file of the same name within 15 seconds (the default setting), then that new file retains the original file's creation date in both the $STANDARD_INFORMATION and $FILE_NAME attributes.
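If you want to see tunneling in action, here's a minimal sketch in Perl that you can run on a Windows test system; it assumes Perl on Windows, where the ctime field returned by stat() reflects the file's creation time, and the file name is just a placeholder.  (If I recall correctly, MS KB article 172190 describes the behavior, along with the MaximumTunnelEntryAgeInSeconds and MaximumTunnelEntries values beneath the HKLM\System\CurrentControlSet\Control\FileSystem key.)

#!/usr/bin/perl
# Sketch: demonstrate file system tunneling on a Windows test box.
# Create a file, delete it, and recreate a file with the same name within
# the 15-sec window; on Windows, (stat)[10] holds the creation time.
use strict;
use warnings;

my $file = "tunnel_test.txt";

open(my $out, ">", $file) or die $!;
print $out "first\n";
close($out);
my $ctime1 = (stat($file))[10];

sleep(5);
unlink($file) or die "Could not delete ".$file.": $!";

open($out, ">", $file) or die $!;
print $out "second\n";
close($out);
my $ctime2 = (stat($file))[10];

print "Original creation time: ".localtime($ctime1)."\n";
print "New file creation time: ".localtime($ctime2)."\n";
print "Creation times match - tunneling in effect.\n" if ($ctime1 == $ctime2);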

This is just one of the things that can affect time on systems.  Grayson, a member of the Trustwave team along with Chris, recently posted to his blog regarding his MAC(b) Daddy presentation from DefCon19, and in that post, linked to this Security BrainDump blog post.

Tools
There's a "new" version of Autopsy available...v3.0.0beta.  Apparently, this one is Windows-only, according to the TSK download page.  I've been using the command line TSK tools for some time, in particular mmls, fls, and blkls...but this updated version of Autopsy brings the power of the TSK tools to the Windows platform in a bit of a more manageable manner.

Similarly, I received an email not long ago regarding a new beta version of OSForensics (v0.99f) being available for testing.  I'd taken a look at this tool earlier this year...I haven't looked at this new version yet, but it does seem to have some very interesting capabilities.  For one, it appears to be able to capture and parse memory; in fact, the tool still seems to be written primarily to interact with a live system...I'll have to take another look at this latest version.

For mounting images, PassMark also makes their OSFMount tool available for free.  This tool is capable of mounting a variety of image formats, which is great...generally, I look for "read-only", but OSFMount has the ability to mount some image formats "read-write", as well.

In chapter 3 of Windows Registry Forensics, I mentioned some tools that you can use to gather information about passwords within an image acquired from a Windows system.  This included tools that would not only quickly illustrate whether or not a user account has a password, but also allow you to do some password cracking.  Craig Wright has started a thread of posts on his blog regarding password cracking tools; this thread goes into a bit more detail regarding the use, as well as the pros and cons, of each tool.

Thoughts on Tool Validation
Once again, I see something posted in the lists and forums regarding tool validation, followed by a lot of agreement, but with little discussion regarding what that actually means.  I have also been contacted by folks who have asked about RegRipper validation, or who have wanted to validate RegRipper.

When I ask what that means, often "validation" seems to refer to "showing expected results".  Okay, that's fine...but what, exactly, are the expected results?  What is the basis of the reviewer's expectation with respect to results?  When I was doing PCI work, for example, we'd have to scan acquired images for credit card numbers (CCNs), and we sort of knew what the "expected result" should look like; however, we very often found CCNs that weren't actually CCNs (they were GUIDs embedded in MS PE files), but had passed the three tests that we used.  When looking for track data, we were even more sure of the accuracy of the results, as more tests were applied.  Also, we found that a lot of CCNs were missed; we were using the built-in isValidCreditCard() function that was part of the commercial forensic analysis tool we used, and it turned out that what the vendor considered to be a valid CCN (at the time) and what Visa considered to be a valid CCN were not completely overlapping or congruent sets.  We ended up seeking assistance to rewrite the functionality of that built-in function, and ended up sacrificing speed for accuracy.
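For reference, one of the basic tests commonly applied to candidate CCNs is the Luhn (mod-10) check; here's a minimal sketch of one in Perl.  Keep in mind that passing this check doesn't make a string a valid CCN...the GUIDs mentioned above are a case in point...and the 13-to-16 digit length range is just a rough assumption:

#!/usr/bin/perl
# Sketch: Luhn (mod-10) check for a candidate CCN; length range is an assumption.
use strict;
use warnings;

sub luhn_ok {
	my $num = shift;
	$num =~ s/\D//g;                       # strip spaces/dashes
	return 0 if (length($num) < 13 || length($num) > 16);
	my ($sum, $i) = (0, 0);
	foreach my $d (reverse(split(//, $num))) {
		my $n = $d;
		if ($i++ % 2) {                    # double every second digit
			$n *= 2;
			$n -= 9 if ($n > 9);
		}
		$sum += $n;
	}
	return (($sum % 10) == 0);
}

print(luhn_ok("4111-1111-1111-1111") ? "passes\n" : "fails\n");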

The point of this is that we found an issue with respect to the expected results returned by a specific bit of functionality, and compared that to what we considered a "known good".  We knew what the expected result should look like, and as part of the test, we purposely seeded several test files into an image with examples of data that should have been correctly and accurately parsed, and then ran side-by-side tests between the built-in function and our home-brew function.
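As an aside, building that sort of seeded data set doesn't take much; here's a minimal sketch of the idea in Perl, where the file name and test values are simply placeholders (the "should hit" values are published test numbers that pass the mod-10 check, and the "should miss" values do not):

#!/usr/bin/perl
# Sketch: seed a test file with strings that should and should not be
# flagged as candidate CCNs; file name and values are placeholders.
use strict;
use warnings;

my @should_hit  = ("4111111111111111", "4012-8888-8888-1881");   # pass the mod-10 check
my @should_miss = ("4111111111111112", "1234567812345678");      # fail the mod-10 check

open(my $out, ">", "ccn_seed.txt") or die $!;
print $out "Should be flagged:\n";
print $out $_."\n" foreach (@should_hit);
print $out "Should not be flagged:\n";
print $out $_."\n" foreach (@should_miss);
close($out);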

When someone takes it upon themselves to "validate" a tool, I have to ask myself, on what are they basing this validation?  For example, if someone says that they need to validate RegRipper (and by extension, rip.pl/.exe), what does that mean?  Does the person validating the tool understand the structures of Registry keys and values, and do they know what the expected result of a test (data extraction, I would presume) would be?  Validation should be performed against or in relation to something, such as a known-good standard...so, in this case, what standard would RegRipper be validated against?  If the validation is against another tool, then is the assumption made that the other tool is "correct"?
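Just as a thought exercise, here's a minimal sketch in Perl of what one small piece of such a validation could look like: pull a single value directly from a hive via Parse::Win32Registry (the same module RegRipper is built on) and compare the data against something you've verified by hand, in a hex editor or by some other means.  The hive file name, key path, and expected data are placeholders...and keep in mind that because RegRipper uses the same module, this only demonstrates the comparison itself; the real "known good" has to come from manually parsing the key and value structures, or from a documented test hive:

#!/usr/bin/perl
# Sketch: compare a single Registry value against a manually-verified value.
# Hive file, key path, and expected data below are placeholders.
use strict;
use warnings;
use Parse::Win32Registry;

my $hive     = "system";                                           # hive file exported from an image
my $key_path = "ControlSet001\\Control\\ComputerName\\ComputerName";
my $val_name = "ComputerName";
my $expected = "HOSTNAME1";                                        # value verified by hand

my $reg  = Parse::Win32Registry->new($hive) or die "Could not parse ".$hive."\n";
my $root = $reg->get_root_key();
my $key  = $root->get_subkey($key_path) or die "Key not found: ".$key_path."\n";
my $val  = $key->get_value($val_name) or die "Value not found: ".$val_name."\n";
my $data = $val->get_data();

print "Extracted: ".$data."\n";
print(($data eq $expected) ? "Matches the known good.\n" : "Does NOT match the known good.\n");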

Another question to consider is, is the function and design of the tool itself understood?  With respect to RegRipper, if the test is to see if a certain value is extracted and it isn't, is the tool deemed a failure?  Did the person making the assessment ever check to see if there was a plugin to retrieve the value in question, or did they simply assume that any possible condition they established was accounted for by the tool?  The same thing would be true for tools such as Nessus...in a validation, are the properly constructed plugins available for the test?  RegRipper is open-source, and its functionality isn't necessarily limited by any arbitrary measure.

Why do we validate tools?  We should be validating our tools to ensure that not only do they return the expected results, but at the same time, we're validating that we understand what that expected result should be.  Let's say that you're looking at Windows 7 Jump Lists, and your tool-of-choice doesn't give the expected result; i.e., it chokes and spits out an error.  What do you do?  I know what I do, and when I hear from folks about tools (either ones I've written or ones I know about) that have coughed up an error or not returned an expected result, I often find myself asking the same round of questions over and over.  So, here's what I do:

1.  Based on my case notes, I attempt to replicate the issue.  Can I perform the same actions, and get the same results that I just saw?

2.  Attempt to contact the author of the tool.  As I have my case notes currently available (and since my case notes don't contain any sensitive data, I can take them home, so "...my case notes are in the office..." is never an excuse...), I know what the data was that I attempted to access, where/how that data was derived, which tool and version I was using, and any additional tools I may have used to perform the same task that may have given the same or similar results.  I can also specify to the author the process that I used...this is important because some tools require a particular process to be correctly employed in their use, and you can't simply use it the way you think it should be used.  An example of this is ripXP...in order to properly use the tool, you need to either mount the XP image via FTK Imager v3.0 in "file system/read-only" mode, or you have to extract all of the RPx subdirectories from the "_restore{GUID}" directory.  Doing it any other way, such as extracting the System hive files from each RP into the same directory (renaming each one) simply won't work, as the tool wasn't designed to address the situation in that manner.

3.  Many times, I'm familiar with what the format of the data should look like...in particular, Registry hive files, Jump Lists, etc.  Now, I do NOT expect every analyst to be intimately familiar with binary file formats.  However, as a professional analyst, I would hope that most of us would follow a troubleshooting process that doesn't simply start and end with posting to a list or forum.  At the very least, get up from your workbench or cubicle and get another analyst to look at what you're trying to do.  I've always said that I am not an expert and I don't know everything, so even with a simple tool, I could be missing a critical step, and I'll ask someone.  In a lot of cases, it could be just that simple, and reaching out to a trusted resource to ask a question ends up solving the problem.  I once had a case where I was searching an image for CCNs, and got several hits "in" Registry hive files.  I then opened the suspect hive files in a Registry viewer and searched for those hits, but didn't find anything.  As I am somewhat familiar with the binary format of Registry keys and values, I was able to determine that the hits were actually in yet-to-be-allocated sections of the hives...the operating system had selected sectors from disk for use as the hive files grew in size, and those sectors had once been part of a file that contained the potential CCNs.  So, the sectors were "added" to the hive files, but hadn't been completely written to by the time the system was acquired.

So, the point is, when we're "validating" tools, what does that really mean?  I completely agree that tools need to be validated, but at what point is "validating" a buzzword, and at what point is it meaningful?

6 comments:

Anonymous said...

I tend to break validation into two parts, with validation of the tool as the least significant. For instance, with RegRipper, I ran it in over 40 "test" cases against known registries that I had manually parsed in the past. My "validation" was that it returned everything I knew was there and that it didn't return things that were not. If it did, then I had to revisit my manual parse to see if 1) it existed and I missed it, or 2) RegRipper produced false results.
The second and, to me, more meaningful validation was validating myself with the tool. Did I understand what it did? And did I understand everything the tool produced? If not, the results would be less than completely useful. One of the benefits of open source is you can go through the code and make sure you KNOW the tool's method and operation, and thereby validate yourself and the results.
Just my two cents.

H. Carvey said...

Excellent input, Bill! Thanks!

Troy said...

For what it is worth, I break forensics into two broad categories: 1) content focused and 2) activity focused. They are not mutually exclusive. Oftentimes, cases will involve elements of both types of focus, or start as one type of focus and end as another. The point of the distinction, however, is in the methodologies typically employed. Content indexing, for example, is a common task for content focused cases, whereas registry and file system differencing, and time stamp analysis, are common for activity focused cases.

eDiscovery and CP cases would be examples of content focused investigations, as in such cases the investigator focuses on finding files with specific types of content.

Intrusion and malware cases are examples of activity focused cases, as there the focus is on determining what happened on a system or network.

In any event, the distinction is really only useful to me in developing training materials and in organizing forensics procedures into some kind of logical taxonomy.

H. Carvey said...

Interesting distinctions, Troy...I'll have to think about those and try wrapping my head around them.

I think that the most important aspect of them, though, is that you've thought about and considered them, and they work for you. You're not sitting back waiting for someone to give you "best practices"...

Mark G said...

I don't usually comment, but... I like Troy's categories and I agree with your LEO friend. There really are two types, as well as some that cross those boundaries. I started out doing mostly content focused work. Recently, I have been crossing the boundaries a lot, with a few activity focused cases along the way. After 6+ years, I find LEO work (in my experience) to be mostly content based.

For reference - I sent you a coin a couple years ago from SoCal..

H. Carvey said...

Mark G.,

I don't usually comment...

That's too bad, really. You clearly have some well thought-out comments.

I sent you a coin a couple years ago from SoCal..

Thanks! You're one of the few/only that's done anything like that, and I greatly appreciate it.

I don't see there being different categories, per se, when it comes to forensic analysis. IMHO, you need to keep to the core foundational concepts, and you can have a different focus or different procedures based on those foundations. I think that the issue that occurs with categorizations is that there's a great deal of overlap and convergence in what analysts do, even when we think that what we do is different from the next guy. CP cases quickly become malware cases when the defense claims the "Trojan Defense", or an intrusion case (or both intrusion and malware) when the claim is made that someone accessed the system remotely.