Windows Incident Response: February 2010

Tuesday, February 23, 2010

More on AV write-ups

Okay, okay...the title of this post isn't the greatest, I get it...but no pun intended. Anyway, I left it as is because (a) I couldn't think of anything witty, and (b) it is kinda funny.

Anyway, on with the show...

I was looking at an issue recently and I came across the following statement in a malware write-up:

It also creates a hidden user account named "HelpAssistant" and creates the following hidden folder: C:\Documents and Settings\HELPASSISTANT

Hhhmmm. Okay, so an artifact for an infected system is this hidden user account...interesting. So I go to a command prompt on my Windows XP box and type net user, and I get a list of user accounts on my system, one of which is HelpAssistant. Wow. So does that mean I'm infected?

Well, the next thing I do is export the Registry hive files from my system and hit the SAM hive with the samparse.pl RegRipper plugin, and I see:

Username : HelpAssistant [1000]
Full Name : Remote Desktop Help Assistant Account
User Comment : Account for Providing Remote Assistance
Account Created : Mon Aug 7 20:23:36 2006 Z
Last Login Date : Never
Pwd Reset Date : Mon Aug 7 20:23:36 2006 Z
Pwd Fail Date : Never
Login Count : 0
--> Password does not expire
--> Account Disabled
--> Normal user account

Okay, so this is a normal user account, never logged in, and appears to have been created in 2006. I think that this is interesting, because I installed this system in Sept, 2009. It appears that this is a default account that's set up with some settings already set to specific values.

Now, a little research tells me that this is an account used for Remote Assistance. If that's the case, does malware create or take over the account? It's possible, with the appropriate privileges, to use the API (or the net user commands) to delete and then create the account. To see if this is what happened, you may be able to find some information in the Event Log (assuming the proper auditing is enabled...) having to do with account deletion/creation. Another analysis technique is to examine the RID on the account as RIDs are assigned sequentially, and to check the unallocated space within the SAM hive (using regslack) to see if the original key for the HelpAssistant account was deleted.

What about this hidden thing? Well, as the write-up never states how the account is hidden, one thing to consider is that the fact that it's hidden is part of normal system behavior. That's right...Windows has a special Registry key that tells it to hide user accounts from view on the Welcome screen, essentially making those accounts hidden. Win32/Starter and Win32/Ursnif both take advantage of this key.

This is just another example of how AV write-ups can be incomplete and misleading, and how responders and analysts should not allow themselves to be misled by the information provided in these write-ups.

Researching Artifacts

One of the things I really like about this industry is that there's always something new...a new challenge, a new twist to old questions, etc. This is fun, because I like to look for novel ways of approaching these issues.

Here's an example; I recently found this article discussing an issue with web cams on laptops issued to high school students having been allegedly turned on remotely and used to monitor students in their homes. More and more laptops are available with built-in web cams, and web cams are relatively inexpensive. How long before there are stalking cases or civil suits in which the victim's web cam is enabled? The "Trojan Defense" (ie, the malware did it, not me) has been around for a while, so how long before we can expect to see other devices (web cams, in particular) being recognized as a source for illicit images, or somehow involved in other issues or crimes? Not long afterward, we're going to hear, "hey, I didn't do it...it was the virus."

So the novel approach comes in when you start to consider, what are the artifacts of the use of a web cam on a system? How do you tell if a web cam (or any other device) has been used, and more importantly, how do you address attribution? Was it the local user that started the web cam, was it malware, or was the web cam activated remotely by a legitimate user (or, activated remotely by someone with access to a legitimate user account)?

So what happens when this sort of issue lands on an analyst's desk? This may be an example of one of those new, "we haven't seen this kind of thing before" issues. There very likely isn't a public repository of data, artifacts, and analysis plans somewhere, is there? Maybe there's a private one, but how does that help folks who don't have access to it, particularly if it's only accessible by a very small group of individuals? Where do folks go to start developing answers to questions like those in the previous paragraph, and once they determine those answers, what do they then do with the information? Is it available to the next analyst who runs into this sort of thing, or do we have to start all over again?

There's a good deal of research that goes on in a number of areas within the IR/DF community...file carving, for example. However, a lot of new issues that land on an analyst's desk are just that...new. New issue, new device, new operating system. Most of us are intimately familiar with the fact that the automated analysis approach we used on XP systems was, in some cases, broken when we got our first Vista system in for analysis. Oh, and hey...guess what? Windows 7 is out...in some ways, we need to start all over again.

So what happens when something new...some new issue, operating system, or application...comes out? Sometimes, someone puts forth the effort to conduct analysis and document the process and the findings, and then make that available, like what was done with Limewire examinations, for example.

Speaking of artifacts, I've posted before about browser stuff to look at beyond the traditional TypedURLs key and index.dat files. Well, as it happens, there appears to be data that indicates that it's not so much the browser that's being targeted...it's the stuff running in support of the browser. Brian Krebs posted recently about BLADE (no, not this Blade); the point of the post is that it isn't the browser that's the issue, it's the stuff running behind the scenes; the plugins, the add-ons, etc.

Consider this...someone gets an email or IM with a link to a PDF or other file format, and they click on it. Their default browser is opened, but it isn't the browser that's popped...it's the old/unpatched version of Adobe Reader (or some other unpatched add-on) that results in the system being compromised. Ultimately, a compromise like this could lead to significant losses. So while there will be artifacts in the browser history, this tells us that we need to look beyond that artifact if we're going to attribute an incident to the correct root cause; finding the first artifact and attributing the issue to a browser drive-by may not be correct, and in the long run, may hurt both your employer's reputation, and most certainly your customer. What happens if your customer reads your report and updates or changes the browser used throughout their infrastructure, only to get hit again?

IT firm loses...a lot!

I caught a very interesting post on Brian Krebs' site this morning...you'll find it here.

As an incident responder, the first thing that caught my eye was:

Since the incident, he has conducted numerous scans with a variety of anti-virus and anti-malware products – which he said turned up no sign of malicious software.

Ouch! When I read things like that, I hope that it's not all (nor the first thing) that was done, and that it's a gross oversimplification of the response activities. Most times, though, it isn't.

I've read Brian's stuff for years, and I think that he's done a great job of bringing some very technical issues into the public eye without subjecting them to the glitz and hoopla that you see in shows like CSI. For example, while Brian mentioned some specific malware that could have been involved, he also made a very clear statement at the beginning of a paragraph that it has not been confirmed that this or any other malware had been involved. I think that's very important when presenting these kinds of stories.

So, look at the situation...the IT firm had a dedicated system with extra protective measures that was used to perform online banking. Even with those measures in place (I did some research on biometric devices back in 2001, and they don't provide the level of protection one would think), a bank official "...said the bank told him that whoever initiated the bogus transaction did so from another Internet address in New Hampshire, and successfully answered two of his secret questions."

I think that Brian's story is a very good illustration of what many of us see in the response community.

Malware may have been associated with what happened, but no one knows for sure. Many of us have been on-site, working with victims, and AV scans can't find anything, but the victims were clearly (and we later determine it to be true) subject to some sort of malware infection. It's interesting how an AV scan won't find anything, but check a few Registry keys and you start to find all sorts of interesting things.

Many of the "protection measures" that folks have in place are easily circumvented, or worse, lead the victims themselves to not consider that as an avenue of infection or compromise, because of the fact that they do have that "protection".

Finally, if malware was involved in this situation, it's a great illustration of how attacks are becoming smarter...for example, rather than logging keystrokes, as pointed out in the article, the malware will read the contents of the form fields; when it comes to online banking and some of the protective measures that have been put in place, this approach makes sense.

Friday, February 19, 2010

Fun Analysis Stuff

Event Log Analysis
Here's another one for all of you out there doing Event Log Analysis. I installed Office 2007 (ie, version 12) on an XP system, and now I have two new .evt files...Microsoft Office Sessions and Microsoft Office Diagnostics. The Microsoft Office Sessions Event Log really seems promising...most of the events are ID 7000 or 7003 (session terminated unexpectedly). The ID 7000 events include how long the session was up, and how long it was active. While the event record doesn't appear to include a specific username or SID, this information can be correlated to Registry data...UserAssist, RecentDocs, application MRUs, etc...to tie the session to a specific user.

As we've seen before, Event Log records can be very useful...sorting them based on record number may show us that the system clock had been manipulated in some way. Another use is to show activity on a system during a specific time frame.

Timeline Analysis
Speaking of Event Log records, an interesting and useful way to determine if the system clock had been set back is to sort Event Log records by event record number and observe the times...for each sequential record number, does the generated time for the record increment accordingly?
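The record-number check described above can be sketched in a few lines of Python. This is a minimal sketch, not a parser: it assumes the records have already been extracted from the .evt file as (record number, generated time) pairs, with the time as a Unix epoch value in UTC. The function names and the sample data are made up for illustration.

```python
from datetime import datetime, timezone


def find_time_regressions(records):
    """Return (previous, current) pairs where the generated time goes
    backwards even though the record number goes up.

    `records` is an iterable of (record_number, generated_time) tuples,
    with generated_time as a Unix epoch value in UTC. Extracting those
    tuples from the .evt file is outside the scope of this sketch.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    regressions = []
    for previous, current in zip(ordered, ordered[1:]):
        if current[1] < previous[1]:
            regressions.append((previous, current))
    return regressions


def describe(regression):
    """Format one regression as a human-readable line."""
    (prev_num, prev_time), (cur_num, cur_time) = regression

    def fmt(epoch):
        stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
        return stamp.strftime("%Y-%m-%d %H:%M:%S Z")

    return f"record {prev_num} at {fmt(prev_time)} -> record {cur_num} at {fmt(cur_time)}"


if __name__ == "__main__":
    # Hypothetical sample data: record 103 appears to predate record 102.
    sample = [(101, 1266500000), (102, 1266500060),
              (103, 1266413600), (104, 1266413700)]
    for hit in find_time_regressions(sample):
        print(describe(hit))
```

A hit only says that the clock went backwards between two records; it says nothing about who or what changed it.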

Another way to check for this (on XP) via the Event Log is to look for event ID 520, with a source of "Security". This event indicates the system time was successfully changed, and includes such information as the PID and name of the process responsible for the change, as well as the old system time (prior to the change) and the new time. An excellent resource and example of this is Lance's first practical.

Now, does event ID 520 necessarily mean that the user changed the system time? By itself, no, it doesn't. In fact, if you create a timeline using the image Lance provided in his first practical, incorporating the Event Logs, you'll see where event ID 520 is in close association with an event ID 35, with a source of W32Time...the system time was automatically updated by the W32Time service! You'll also find a number of instances where the system time was updated via other means. I'll leave it as an exercise for the reader to determine those other means.
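One way to triage this in a timeline is to separate the event ID 520 records that sit next to a W32Time event ID 35 from those that don't. The sketch below assumes the events are already parsed into dicts with "time", "source", and "event_id" keys; that layout, the function name, and the 60-second window are all assumptions made for illustration. Keep in mind that a time change can itself move the two records further apart than the window.

```python
def unexplained_time_changes(events, window_seconds=60):
    """Return event ID 520 records with no W32Time event ID 35 close by.

    `events` is an iterable of dicts with "time" (Unix epoch), "source",
    and "event_id" keys. The dict layout and the default window are
    assumptions made for this sketch.
    """
    events = list(events)
    sync_times = [
        e["time"] for e in events
        if e["source"] == "W32Time" and e["event_id"] == 35
    ]
    flagged = []
    for e in events:
        if e["source"] == "Security" and e["event_id"] == 520:
            near_sync = any(abs(e["time"] - t) <= window_seconds for t in sync_times)
            if not near_sync:
                flagged.append(e)
    return flagged
```

Anything this returns is a lead to follow up on (the process name in the event record is a good start), not a finding.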

An interesting side-effect of creating a timeline using multiple sources is that it provides us with event context. Historically, timelines have consisted of primarily file system metadata, and as such, did not give us a particularly clear picture of what was going on on the system when, say, a file was accessed or modified. Who was logged in, and from where? Was a user logged in? Was someone logged in via RDP? Was the file change a result of someone running a remote management tool such as PSExec, or perhaps due to something else entirely?

Devices
It's been a while since Cory Altheide and I published a paper on tracking USB removable storage devices on Windows systems. Recently, Cory asked me about web cams, and I started looking around to see what I could find out about these devices. As you might think, Windows stores a LOT of information about devices that have been connected to it...and with USB ports, and so many devices coming with USB cables, it just makes sense to connect them to your computer for updates, etc.

Now you may be wondering...who cares? Someone has a web cam...so what? Well, if you're law enforcement, you might be interested to know if a web cam, or a digital camera, or a cell phone...pretty much anything capable of taking or storing pictures...had been connected to the system. Or if there's an issue of communications, and you know the application (Skype, NetMeeting, etc.), then knowing that there was a web cam attached might be of interest. I'm thinking that having device information would be useful when dealing with pictures (EXIF data), as well as looking at different aspects of the use of applications such as Skype...did the user send info as an IM, or via video chat, etc.?

Interestingly, I have access to a Dell Latitude with a built-in web cam, and I took a couple of pictures with the software native to Windows XP...the pictures were placed in the "All Users" profile.

Speaking of taking pictures, got nannycam? Microsoft PowerToys for XP includes a Webcam Timershot application.

Resources
If you don't have a copy of the paper that Cory and I wrote, there's another one available here

Addendum: Volume Shadow Copies
Much like System Restore Points, you can't say enough about accessing files in Volume Shadow Copies...I'm sure that a lot of it bears repeating. Continually. Like from the sausage factory.

Wednesday, February 17, 2010

Monday, February 15, 2010

Links Plus

I've spent a lot of space in this blog talking about timeline analysis lately, so I wanted to take something of a different tack for a post or two...mostly to let the idea of timeline analysis marinate a bit, and let folks digest that particular analysis technique.

PDF Forensics
Didier Stevens has provided a fantastic resource and tools for analyzing PDF files...so much so, that some have been incorporated into VirusTotal. Ray Yepes has provided an excellent article for locating MYD files, the MySQL database files used by Adobe Organizer that maintain information about PDF files that have been accessed. Congrats, Ray, on some excellent work!

Web Browser Forensics
When most folks think "web browser forensics", they think cache and cookie files. I also mentioned some other browser stuff that might be of interest...in particular bookmarks and favorites, as well as some other tidbits. Bringing even more to the game, Harry Parsonage has put together an excellent resource describing web browser session restore forensics (woany released a tool inspired by the paper). Here's some additional value add to Harry's information, from the sausage factory.

Associated with web browser forensics, Jeff Hamm has written an excellent paper (all of the papers are excellent!) regarding Google Toolbar Search Artifacts. Jeff also has a paper available regarding the Adobe Photoshop Album Cache File.

Resources
Woany also has other tools...woanware...available for parsing other data that may be associated with web browser forensics, as well as data from other sources. Some of the other interesting tools include ForensicUserInfo and RegExtract.

NirSoft provides a number of excellent utilities for password recovery, etc. If you're analyzing an acquired image, you may need to boot the image with LiveView and login to run some of the tools.

JADSoftware has several excellent tools, including a couple of free ones. Even the ones that aren't free are definitely worth the purchase price, particularly if you're doing the kind of work that requires you to look at these areas a lot.

Activity
Now and again, I see a posting to a forum or receive an email, and the basic question is, how do I determine if there was activity on a system during a specific time period?

The historical approach to answering this type of question is to look at the file system metadata, and see if there are any file creation, access, or modification times during the window in question. However, this presents us with a couple of challenges. In Vista, MS disabled updating of file last access times by default...it's now the default setting, rather than an option that an administrator has to set. Then what happens if we're looking for activity on a system a couple of weeks or months ago? File system metadata will show us the most recent changes to the system, but much of that may be relatively close to our current time and not give us a view into what may have happened during the time window we're interested in.

However, we have more than just file system metadata available to us to answer this type of question (I know...we're circling back to timeline analysis now...):

MFT Analysis: Generate a timeline based on $FILE_NAME attribute timestamps. Chapter 12 of Brian Carrier's File System Forensic Analysis book contains a good deal of information relating to these timestamps.

Event Log Analysis: Generate a timeline based on EVT/EVTX file entries. For EVT records, don't rely on just those in the system32\config\*.evt files; see if there's any indication of records being backed up, and also check the pagefile and unallocated space. All you may need to demonstrate activity during a time window is the event record header information anyway.

Log Files: Windows systems maintain a number of log files that record activity. For example, there's the Task Scheduler log file (SchedLgu.txt), setupapi.log, mrt.log, etc. If you're looking at a Windows XP system, System Restore Points each have an rp.log file that states when the Restore Point was created, as well as the reason for the creation, giving you more than just "something happened on this day". Also, look for application logs, particularly AV application logs...some AV applications may also write to the Application Event Log, in addition to maintaining their own log files.

File Metadata: Lots of applications maintain time-stamped information embedded within the structure of the files they interact with; for example, application Prefetch files on XP and Vista. Also, Scheduled Task/.job files. Office documents are also widely known for maintaining a staggering amount of very useful metadata.

Registry Analysis: Ah...the Registry. In some cases, time-stamped information is maintained as Registry key LastWrite times, but there is also considerable information maintained in binary value data, as well. The system-wide hives...SAM, Software, System, and Security...will maintain some useful information (LastShutdownTime, etc.), but you may find more valuable information in the user's NTUSER.DAT and USRCLASS.DAT hives. Also, don't forget that you may also find considerable information in the unallocated space within hive files! Specifically, when keys are deleted, their LastWrite time is updated to reflect when they were deleted, providing what may be some very valuable information.

Of course, when we're talking about Registry hives, we also have to keep in mind that we may have hive files available in either XP System Restore Points, or within Volume Shadow Copies.

In short, if you need to determine if there was activity on a system during a particular window, and perhaps relate that activity to a particular user account, there are a number of data sources available to you. This type of question lends itself very well to timeline analysis, too.
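To make the "was there activity during this window?" question concrete, here is a minimal sketch that filters already-extracted events to a window and counts them per data source. It assumes each event is a (time, source, description) tuple with the time as a Unix epoch value in UTC; the tuple layout, the function names, and the source labels are assumptions for illustration.

```python
from collections import Counter


def activity_in_window(events, start, end):
    """Return events whose time falls within [start, end], oldest first.

    `events` is an iterable of (time, source, description) tuples, where
    time is a Unix epoch value in UTC. The tuple layout is an assumption.
    """
    hits = [e for e in events if start <= e[0] <= end]
    return sorted(hits, key=lambda e: e[0])


def count_by_source(events):
    """Count events per data source (file system, Event Log, Registry, etc.)."""
    return Counter(e[1] for e in events)
```

Seeing several independent sources represented in the same window adds more confidence than a pile of events from one source.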

Monday, February 08, 2010

MFT Analysis

As an aside to timeline analysis, I've been considering the relative confidence levels inherent to certain data sources, something I had discussed with Cory. One of the things we'd discussed was the relative confidence level of file system metadata, specifically the timestamps in the $STANDARD_INFORMATION attribute versus those in the $FILE_NAME attribute. Brian Carrier addresses some specifics along these lines in chapter 12 of his File System Forensic Analysis book.

So, I've been looking at the output of tools like Mark Menz's MFTRipper and David Kovar's analyzeMFT.py tools. Based on the information in Brian's book and my chat with Cory, it occurred to me that quite a bit of analysis could be done automatically, using just the MFT and one of the two tools. One thing that could be done is to compare the timestamps in both attributes, as a means of possibly detecting the use of anti-forensics, similar to what Lance described here.
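As a rough illustration of comparing the two attributes, the sketch below flags any $STANDARD_INFORMATION timestamp that is earlier than its $FILE_NAME counterpart, which is one commonly cited indicator that the $STANDARD_INFORMATION values may have been backdated. The record layout (a dict with "si" and "fn" keys holding Unix epoch values) and the names are hypothetical; this is not the output format of MFTRipper or analyzeMFT.py, and legitimate file operations can also produce differences, so treat hits as leads only.

```python
# Names for the four timestamps carried in each attribute (hypothetical labels).
TIMESTAMP_NAMES = ("created", "modified", "mft_modified", "accessed")


def si_earlier_than_fn(record):
    """Return the names of timestamps where $STANDARD_INFORMATION predates
    $FILE_NAME for a single parsed MFT record.

    `record` is a dict with "si" and "fn" keys, each mapping the names in
    TIMESTAMP_NAMES to Unix epoch values. The layout is an assumption.
    """
    return [
        name for name in TIMESTAMP_NAMES
        if record["si"][name] < record["fn"][name]
    ]
```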

Another thing that could be done is to parse the output of the tools and build a bodyfile using the timestamps from the $FILE_NAME attribute only. However, this would require rebuilding the directory paths from just what's available in the MFT...that is, record numbers, and file references that include the parent record number for the file or folder. That's the part that I got working tonight...I rebuilt the directory paths from the output of David's tool...from there, it's a trivial matter to employ the same code with Mark's tool. And actually, that's the hardest part of the code...the rest is simply extracting timestamps and translating them, as necessary.
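The path rebuilding step can be sketched as a walk up the parent references until the root directory is reached. This is not the code described above, just a minimal Python sketch under a few assumptions: the MFT has already been reduced to a dict of record number to (name, parent record number), sequence numbers in the file references are ignored, and anything whose parent chain is broken or loops is marked with a made-up "<orphan>" placeholder.

```python
ROOT_RECORD = 5  # on NTFS, MFT record 5 is the root directory


def build_path(record_number, records):
    """Rebuild the full path for one MFT record.

    `records` maps record number -> (name, parent_record_number). That
    mapping is an assumed intermediate format, not a tool's output.
    """
    parts = []
    seen = set()
    current = record_number
    while current != ROOT_RECORD:
        # A missing parent or a loop means the chain cannot be completed.
        if current not in records or current in seen:
            parts.append("<orphan>")
            break
        seen.add(current)
        name, parent = records[current]
        parts.append(name)
        current = parent
    return "\\" + "\\".join(reversed(parts))


if __name__ == "__main__":
    # Hypothetical records for illustration.
    sample = {
        30: ("WINDOWS", 5),
        31: ("system32", 30),
        32: ("config", 31),
    }
    print(build_path(32, sample))
```

Caching the path for each directory record would avoid re-walking the same chain for every file, which matters on a full-size MFT.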

Also, I didn't want to miss mentioning that there's a tool for performing temporal analysis of the MFTRipper output from Mark McKinnon over at RedWolf Computer Forensics. I haven't tried it yet, but Mark's stuff is always promising.

Timeline Analysis...do we need a standard?

Perhaps more appropriately, does anyone want a standard, specifically when it comes to the output format?

Almost a year ago, I came up with a 5-field output format for timeline data. I was looking for something to 'define' events, given the number of data sources on a system. I also needed to include the possibility of using data sources from other systems, outside of the system being examined, such as firewalls, IDS, routers, etc.

Events within a timeline can be concisely described using the following five fields:

Time - A timeline is based on times, so I put this field first, as the timeline is sorted on this field. Now, Windows systems have a number of time formats...32-bit Unix time_t format, 64-bit FILETIME objects, and the 128-bit SYSTEMTIME format. The FILETIME object has granularity to 100 nanoseconds, and the SYSTEMTIME structure has granularity to the millisecond...but is either really necessary? I opted to settle on the Unix time_t format, as the other times could be easily reduced to that format, without losing significant granularity.

Source - This is the source from which the timeline data originates. For example, using TSK's fls.exe allows the analyst to compile file system metadata. If the analyst parses the MFT using MFTRipper or analyzeMFT, she still has file system metadata. The source remains the same, even though the method of obtaining the data may vary...and as such, should be documented in the analyst's case notes.

Sources can include Event Logs (EVT or EVTX), the Registry (REG), etc. I had thought about restricting this to 8-12 characters...again, the source of the data is independent of the extraction method.

Host - This is the host or name of the system from which the data originated. I included this field, as I considered being able to compile a single timeline using data from multiple systems, and even including network devices, such as firewalls, IDS, etc. This can be extremely helpful in pulling together a timeline for something like SQL injection, including logs from the web server, artifacts from the database server, and data from other systems that had been connected to.

Now, when including other systems, differences in clocks (offsets, skew, etc.) need to be taken into account and dealt with prior to entering the data into the timeline; again, this should be thoroughly documented in the analyst's case notes.

Host ID information can come in a variety of forms...MAC address, IP address, system/NetBIOS name, DNS name, etc. In a timeline, it's possible to create a legend with a key value or identifier, and have the timeline generation tools automatically translate all of the various identifiers to the key value.

This field can be set to a suitable length (25 characters?) to contain the longest identifier.

User - This is the user associated with the event. In many cases, this may be empty; for example, consider file system or Prefetch file metadata - neither is associated with a specific user. However, for Registry data extracted from the NTUSER.DAT or USRCLASS.DAT hives, the analyst will need to ensure that the user is identified, whereas this field is auto-populated by my tools that parse the Event Logs (.evt files).

Much like the Host field, users can be identified by a variety of means...SID, username, domain\username, email address, chat ID, etc. This field can also have a legend, allowing the analyst to convert all of the various values to a single key identifier.

Usually, a SID will be the longest method of referring to a user, and as such would likely be the maximum length for this field.

Description - This is something of a free-form, variable length field, including enough information to concisely describe the event in question. For Event Log records, I tend to include the event source and ID (so that it can be easily researched on EventID.net), as well as the event message strings.
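To show how the five fields fit together, here is a minimal Python sketch that reduces FILETIME and SYSTEMTIME values to Unix time and then writes and reads a five-field event line. The pipe separator, the function names, and the sample values are assumptions for illustration; the field order is the one described above.

```python
import calendar
import struct

# Number of 100-nanosecond intervals between 1601-01-01 and 1970-01-01 (UTC).
EPOCH_DELTA_100NS = 116444736000000000


def filetime_to_unix(filetime):
    """Reduce a 64-bit FILETIME value to whole seconds since the Unix epoch."""
    return (filetime - EPOCH_DELTA_100NS) // 10_000_000


def systemtime_to_unix(raw):
    """Reduce a 16-byte SYSTEMTIME structure (eight little-endian WORDs)
    to Unix time, treating the value as UTC and dropping milliseconds."""
    year, month, _day_of_week, day, hour, minute, second, _ms = struct.unpack("<8H", raw)
    return calendar.timegm((year, month, day, hour, minute, second))


def format_event(time, source, host, user, description, sep="|"):
    """Build one event line. The first four fields must not contain `sep`;
    the description may, because it is always the last field."""
    return sep.join([str(int(time)), source, host, user, description])


def parse_event(line, sep="|"):
    """Split an event line back into its five fields."""
    time, source, host, user, description = line.rstrip("\n").split(sep, 4)
    return int(time), source, host, user, description


if __name__ == "__main__":
    # Hypothetical event: empty User field, Event Log source.
    print(format_event(1266413600, "EVT", "HOSTA", "", "Security/520;system time changed"))
```

Because the time is the first field and a plain integer, a list of parsed events sorts into timeline order with an ordinary sort.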

Now, for expansion, there may need to be a number of additional, perhaps optional fields. One is a means for grouping individual events into a super-event or a duration event, such as in the case of a search or AV scan. How this identifier is structured still needs to be defined; it can consist of an identifier in an additional column, or it may consist of some other structure.

Another possible optional field can be a notes field of some kind. For example, event records from EVT or EVTX files can be confusing; adding additional information from EventID.net or other credible sources may add context to the timeline, particularly if multiple examiners will be reviewing the final timeline data.

This format allows for flexibility in storing and processing timeline data. For example, I currently use flat ASCII text files for my timelines, as do others. Don has mentioned using Highlighter as a means for analyzing an ASCII text timeline. However, this does not obviate using a database rather than flat text files; in fact, as the amount of data grows and as visualization methods are developed for timelines, using a database may become the standard for storing timeline data.
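As a small illustration of the database option, the sketch below loads five-field text lines into an in-memory SQLite table and pulls everything associated with one user, across hosts, in time order. The pipe separator, the table and column names, and the function names are assumptions; nothing here is an existing tool's schema.

```python
import sqlite3


def load_timeline(lines, sep="|"):
    """Load five-field timeline lines into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_time INTEGER, source TEXT, host TEXT, "
        "user_name TEXT, description TEXT)"
    )
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        time, source, host, user, description = line.split(sep, 4)
        rows.append((int(time), source, host, user, description))
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_events_time ON events (event_time)")
    conn.commit()
    return conn


def events_for_user(conn, user):
    """Return all events for one user, oldest first, regardless of host."""
    cursor = conn.execute(
        "SELECT event_time, source, host, user_name, description "
        "FROM events WHERE user_name = ? ORDER BY event_time",
        (user,),
    )
    return cursor.fetchall()
```

The same table can just as easily be filtered by host, source, or time range, which is where a database starts to pay off over grep on a flat file.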

It is my hope that keeping the structure of the timeline data simple and well-defined will assist in expanding the use of timeline creation and analysis. The structure defined in this post is independent of the raw data itself, as well as the means by which the data is extracted. Further, structure is independent of the storage means, be it a flat ASCII text file, a spreadsheet or a database. I hope that those of us performing timeline analysis can settle/agree upon a common structure for the data; from there, we can move on to visualization methods.

What are your thoughts?

Friday, February 05, 2010

Is anyone doing timeline analysis??

Apparently, someone is...the Illustrious Don Weber, the inspiration behind the ITB, to be specific. In a recent SecRipcord blog post, he talks about finding the details of a Hydraq infection via timeline creation and analysis.

In his post, Don also illustrates some information from a malicious service that includes a ServiceDllUnloadOnStop value. I hadn't seen this value before, and it appears that Geoff Chappell has a very detailed explanation of that value, as well as some others that are also part of the service keys. This can add a good deal of context to the information, particularly since this isn't often seen in legitimate Windows services. Sometimes searching or sorting by service Registry key LastWrite times isn't all that fruitful, as many seem to be updated when the system boots. So add something else to your "what's unusual or suspicious" checks for services...lack of descriptions, apparently random names, and some of these values.

Don then goes on to talk about what an APT-style manual compromise "looks like" via timeline analysis. Don includes the contents of a Task Scheduler log file in his timeline, and also shows what would appear to be a remote intruder interacting with the system via a remote shell...running native tools allows the intruder to conduct recon without installing additional tools. After all, if the system already has everything you need on it...nbtstat, net commands, etc., why deposit other tools on the system that will essentially provide the same information?

What Don's posts illustrate are great examples of the immense value of timeline analysis, and how it can be used to provide a greater level of confidence in, as well as context to, your data.

Addendum: I had a conversation via IM with Chris yesterday...with over 2 feet of snow, what else am I going to do, right? We were exchanging ideas about timeline analysis and how things could be represented graphically for analysis purposes, particularly given the nature of the data (LOTS of it) and the nature of malware and intrusions (LFO). I think we came up with some pretty good ideas, and there's still some thinking and looking around to do...the problem is that neither one of us is a graphics programmer, so there's going to be a good deal of trial and error using currently available tools. We'll just have to see how that goes.

I think that the major obstacle to moving forward is going to be a lack of a standard. While I applaud the work that Don's done and admire his initiative and sense of innovation, looking at his posts, it's clear that he's decided to sort of take things in his own direction. Don't get me wrong...there's nothing wrong with that at all. Where it does come into play is that if there's a particular next step tool that relies on a particular format or structure for data, then it's going to be difficult to transition other 'branches' to that tool.

Log2timeline is another, more comprehensive framework for developing timelines, and a great piece of work from Kristinn. It's very automated, uses some of the code in my tools, and provides other output formats in addition to TLN.

So, overall, I'm seeing that there's quite a bit of interest in helping responders, analysts, and examiners move beyond the manual spreadsheet approach to timeline analysis, but perhaps it's time to come together and find some common ground that we can all stand on.

Thursday, February 04, 2010

How Did THAT Get There???

Didier posted recently regarding a VBA macro in Excel that allowed him to launch a command shell. This got me to thinking about something I'd read about in the Mandiant M-Trends report...specifically:

For starters, the attackers conduct reconnaissance to identify workers to target in spear-phishing attack...

From an analysis perspective, this can be something of a concern for a responder. One of the biggest analysis issues I've seen has been determining the original infection or compromise vector for an incident. Very often, the analyst can easily locate malware or new user accounts created on a compromised system, but these are often secondary or tertiary artifacts of the original compromise. While these artifacts do provide significant information (i.e., add context and provide a timeframe for the compromise), many times, the initial means of compromise will not be determined...at least, not in a manner that is supported by data.

One of the first steps to determine the initial infection vector may be to identify the malware (secondary artifacts) and determine how it propagates. If there are indications of web browser or email client use on the system...most often for workstations/laptops, but not unheard of on servers...then the initial attack vector may have been via a document-borne mechanism. In this case, the analyst would want to look for indications of documents in email attachment, browser cache, or temp directories. The analyst may be looking for PDF or MSWord documents, or Excel spreadsheets.

So once you locate the files in question, what tools are out there to parse them?

PDF
Didier's PDF Tools are pretty much the de facto standard

Word
cat_open_xml.pl

Excel
Strings. Seriously. Look for stuff you wouldn't see in a spreadsheet. In the case of Didier's cmd.dll...it's a DLL, so look for stuff that might appear in the Import Table..."CreateThread"?
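
To make the "strings" idea concrete, here's a minimal Python sketch that pulls printable ASCII runs out of a file's raw bytes and flags any that contain API names you wouldn't expect in a spreadsheet. The watch list is purely illustrative (an assumption, not a vetted signature set), and the sketch only covers single-byte strings; UTF-16 strings would need a separate pass.

```python
import re

# Hypothetical watch list: API names that have no business inside a spreadsheet.
SUSPICIOUS = (b"CreateThread", b"VirtualAlloc", b"LoadLibrary", b"GetProcAddress")

def ascii_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group()

def suspicious_strings(data: bytes):
    """Return the extracted strings that contain any name on the watch list."""
    return [s for s in ascii_strings(data) if any(name in s for name in SUSPICIOUS)]
```

In practice you'd feed it something like open(path, "rb").read() for the document in question. A hit isn't proof of anything by itself...it's just a reason to look closer.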

Also, the analyst may want to look for indications of the user actually opening files, via the RecentDocs key or application MRU keys.

So the point is that yes, something happened on the system, but how did it get there? More importantly, how do we prove it and not just speculate? Something like this may obviate or support the "Trojan Defense" claim...after all, if you find no indications of a doc-borne attack (spear phishing), then might that not be one way to obviate the claim?

Wednesday, February 03, 2010

Forensic Analysis and Intel Gathering

Continuing the vein of my previous post, while I do see some benefit to an intelligence rating system being adopted and utilized when it comes to forensic analysis, I'm not entirely sure that, particularly at this point, this is something that can be formalized or used more widely, for two reasons...

First, I don't see this as being particularly wide-spread. I do think that there are analysts out there who would see the value of this and take it upon themselves to adopt it and incorporate it into what they do. However, I don't necessarily see this as being part of introductory or maybe even intermediate level training. This might be training that's conducted internally by organizations that conduct analysis or provide analysis services, someplace where it's easier to answer questions and provide more direct guidance to analysts who are seeing this for the first time. Further, something like this may have already been adopted by an analyst who is associated with the intel community in some way.

Second, the rating system is somewhat subjective, and this is where you can get really caught up in non-technical/political issues. For example, regarding my statement on the $STANDARD_INFORMATION and $FILE_NAME attributes; when making a statement like that, I would cite Brian Carrier's excellent book, as well as perhaps conduct some testing and document my findings. Based on this, I might assign a fairly high level of confidence to the information; but that's me. Or, what if the information is from the Registry...how is this information rated? Get a roomful of people, you'll get a lot of different answers.

But why is this important? Well, for one, a good deal of forensic analysis has progressed beyond finding files (i.e., pictures, movies, etc.) on systems. Consider issues surrounding web application security...there are vulnerabilities to these systems that allow a knowledgeable attacker to gain access to a system without writing anything to disk; all of the artifacts of the exploit would be in memory. Subsequently, nothing would necessarily be written to disk until the attacker moved beyond the initial step of gaining access, but at that point, anything written to disk might simply appear to be part of the normal system activity.

Consider Least Frequency of Occurrence, or LFO. Pete Silberman was on target when he said that malware has the LFO on a system, and that same sort of thinking applies to intrusions, as well. Therefore, we can't expect to find what we're looking for...the initial intrusion vector, indicators of what the intruder did, or even if the system is compromised...by only looking at one source of data. What we need to do is overlay multiple sources of data, all with their own indicators, and only then will we be able to determine the activity that occurs least frequently. Think of this as finding what we're looking for by looking for the effects of the artifacts being created; we would know that something was dropped in a pond without seeing it being dropped, but by observing the ripples or waves that resulted from the object being dropped into the pond.
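
As a rough illustration of the LFO idea, the Python sketch below tallies how often each item appears across whatever has been overlaid (here, hypothetical executable names) and lists the rarest ones first. The function name and the sample data are assumptions made for the example, not part of any existing tool.

```python
from collections import Counter

def least_frequent(items, n=5):
    """Return the n items that occur least often, as (item, count) pairs."""
    counts = Counter(items)
    # Sort by count first, then by name so that ties come out in a stable order.
    return sorted(counts.items(), key=lambda pair: (pair[1], pair[0]))[:n]
```

The rarest entries aren't automatically bad...they're simply where I'd start looking, and the more data sources that go into the tally, the more the one-off activity stands out.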

Matt Frazier posted to the Mandiant blog recently regarding sharing indicators of compromise...the graphic in the post is an excellent example that demonstrates multiple sources of data. Looking at the graphic and understanding that not everything can be included (for the sake of space), I can see file system metadata, metadata from the EXEs themselves, and partial Registry data included as "indicators". In addition to what's there, I might include the Registry key LastWrite times, any Prefetch files, etc., and then look for "nearby" data, such as files being created in Internet cache or an email attachments directory.

Consider the Trojan Defense. With multiple data sources from the system, as well as potentially from outside the system, it would stand to reason that the relative confidence level and context of the data, based on individual ratings for sources as well as a cumulative rating for the data as a whole, would be of considerable value, not only to the analyst, but to the prosecutor. Or perhaps the defense. By that I mean that as much as most of us would want to have a bad guy punished, we also would not want to wrongly convict an innocent person.

In summary, I really see this sort of thought and analysis process as a viable tool. I think that many analysts have been using this to one degree or another, but maybe hadn't crystallized this in their minds, or perhaps hadn't vocalized it. But I also think that incorporating this into a level of training closer to the initial entry point for analysts and responders would go a long way toward advancing all analysis work. Whether as organizations or individuals, effort should be directed toward investigating and developing/supporting methods for quickly, efficiently, and accurately collecting the necessary information...I'd hate to see something this valuable fall by the wayside and not be employed, simply because someone thinks that it's too hard to learn or use.

Tuesday, February 02, 2010

More Thoughts on Timeline Analysis

I had a conversation with Cory recently, and during the conversation, he mentioned that if I was going to present at a conference and talk about timeline analysis, I should present something novel. I struggled with that one...I don't see a lot of folks talking about using timeline analysis, and that may have to do with the fact that constructing and analyzing a timeline is a very manual process at this point, and that's likely too high an obstacle for many folks, even with the tools I've provided, or using other tools, such as log2timeline.

Something Cory mentioned really caught my attention, as well. He suggested that various data sources might provide the analyst with a relative level of confidence as to the data itself, and what's being shown. For example, when parsing the MFT (via analyzeMFT or Mark Menz's MFTRipper), the analyst might have more confidence in the temporal values from the $FILE_NAME attribute than from the $STANDARD_INFORMATION attribute, as tools that modify file MAC times modify the temporal values in the latter attribute. See Episode 84 from the CommandLine KungFu blog for a good example that illustrates what I'm talking about...
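
One way to act on that difference in confidence is to compare the two attributes directly. The Python sketch below flags records whose $STANDARD_INFORMATION creation time is earlier than the $FILE_NAME creation time, which is a commonly cited hint that the former was backdated. The MftRecord structure is hypothetical and deliberately simplified; it is not the output format of analyzeMFT or MFTRipper, and a hit is a lead to follow up on, not a conclusion.

```python
from datetime import datetime
from typing import NamedTuple

class MftRecord(NamedTuple):
    # Hypothetical, simplified record; real MFT parsers expose many more fields.
    path: str
    si_created: datetime  # creation time from $STANDARD_INFORMATION
    fn_created: datetime  # creation time from $FILE_NAME

def si_predates_fn(records):
    """Return records whose $SI creation time is earlier than the $FN one."""
    return [r for r in records if r.si_created < r.fn_created]
```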

This is an interesting concept, and something that I really wanted to noodle over and expand. One of the reasons I look to the Registry for so much valuable data is...well...because it's there, but also because I have yet to find a public API that allows you to arbitrarily alter Registry key LastWrite times. Sure, if you want to change a LastWrite time, simply add and delete a value from a key...but I have yet to find an API that will allow me to backdate a LastWrite time on a live system. But LastWrite times aren't the full story...there are a number of keys whose value data contains timestamps.
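
For anyone who wants to work with these values directly: a key's LastWrite time is stored as a 64-bit FILETIME, a count of 100-nanosecond intervals since 1 January 1601 UTC. A minimal Python conversion looks like the sketch below; the function name is my own, and sub-microsecond precision is dropped because Python's datetime can't hold it.

```python
from datetime import datetime, timedelta, timezone

# FILETIME values count 100-nanosecond intervals from this point in time.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a 64-bit FILETIME value to a timezone-aware UTC datetime."""
    # Integer division by 10 turns 100 ns ticks into whole microseconds.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
```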

Particularly for Windows systems, there are a number of sources of timestamped data that can be added to a timeline...metadata from shortcut files, Prefetch files, documents, etc. There are also Event Log records, and entries from other logs (mrt.log, AV logs, etc.).

So, while individual sources of timeline data may provide the analyst with varying levels of relative confidence as to the veracity and validity of the data, populating a timeline with multiple sources of data can serve to raise the analyst's level of relative confidence.

Let's look at some examples of how this sort of thinking can be applied. I did PCI breach investigations for several years, and one of the things I saw pretty quickly was that locating "valid" credit card numbers within an image gave a lot of false positives, even with three different checks (i.e., overall length, BIN, and Luhn check). However, as we added additional checks for track data, our confidence that we had found a valid credit card number increased. Richard talks about something similar in his Attribution post...by using 20 characteristics, your relative confidence of accurate attribution is increased over using, say, 5 characteristics. Another example is malware detection...running 3 AV scanners provides an analyst with a higher level of relative confidence than running just one, just as following a comprehensive process that includes other checks and tools provides an even higher level of relative confidence.
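
The three checks mentioned above can be sketched in Python as follows. The Luhn routine is the standard algorithm; the 13-to-19 digit length range and the short list of BIN prefixes are illustrative assumptions, not a complete or current set of issuer ranges.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def passes_basic_checks(candidate: str,
                        bin_prefixes=("4", "51", "52", "53", "54", "55")) -> bool:
    """Apply the length, BIN prefix, and Luhn checks to a candidate string."""
    digits = "".join(c for c in candidate if c.isdigit())
    return (13 <= len(digits) <= 19
            and digits.startswith(tuple(bin_prefixes))
            and luhn_valid(digits))
```

Roughly one in ten arbitrary digit strings will pass the Luhn check by chance, which goes a long way toward explaining the false positives, and why each additional check raises confidence in a hit.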

Another aspect of timeline analysis that isn't readily apparent is that as we add more sources, we also add context to the data. For example, we have a Prefetch file from an XP or Vista system, so we have the metadata from that Prefetch file. If we add the file system metadata, we have when the file was first created on the system, and the last modification time of the file should be very similar to the timestamp we extract from the Prefetch file metadata. We may also have other artifacts from the file system metadata, such as other files created or modified as a result of the application itself being run. Now, Prefetch files and file system metadata apply to the system, but not to the specific user...so we may get a great deal of context if we find that a user launched the application, as well as when they took this action. We may also get additional context from an Event Log record that shows, perhaps, a login with event ID 528, type 10, indicating a login via RDP. But wait, we know that the user to whom the account belongs was in the office that day...

See how using multiple data sources builds our "story" and adds context to our data? Further, the more data we have that shows the same or similar artifacts, the greater relative confidence we have in the data itself. This is, of course, in addition to the relative level of confidence that we have in the various individual sources. I'm not a mathy guy, so I'm not really sure how to represent this in a way that's not purely arbitrary, but to me, this is really a compelling reason for creating timelines for analysis.
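
To show what overlaying sources looks like mechanically, here's a minimal Python sketch that merges events from several sources into one sorted timeline. It assumes the five-field, pipe-delimited TLN layout (time|source|system|user|description, with the time as a Unix epoch value); the Event type and function names are made up for the example and aren't taken from any of the tools mentioned.

```python
from typing import NamedTuple

class Event(NamedTuple):
    time: int         # Unix epoch seconds, UTC
    source: str       # e.g. FILE, EVT, REG, PREF
    system: str
    user: str
    description: str

def parse_tln(line: str) -> Event:
    """Parse one TLN line; any extra '|' characters stay in the description."""
    time, source, system, user, description = line.rstrip("\n").split("|", 4)
    return Event(int(time), source, system, user, description)

def merge_timelines(*sources):
    """Combine any number of iterables of TLN lines into one sorted list."""
    events = [parse_tln(line) for lines in sources for line in lines if line.strip()]
    # Tuples compare field by field, so this sorts on the timestamp first.
    return sorted(events)
```

Once everything is in one list, events from different sources that land close together in time are easy to spot, and that's where the added context comes from.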

What say you?

A Conference By Any Other Name...

...would still smell as sweet.

In a somewhat lame attempt at paraphrasing Willie the Shakes, I wanted to point out that it's that time of year again when folks start looking at training and conference options for the year, and I'm no different. The DoD CyberCrime 2010 conference finished up last week, so I'm keeping an eye on my RSS feeds for attendees posting on their thoughts and experiences, and what feedback there may be. I'm also going to be looking for presentations (and feedback on them) to be posted ...some conferences don't provide that sort of thing, but authors (like Jesse) may.

This got me to thinking...what is it that I look for in a conference? While I've thought about it, I've never really written down what those thoughts are, and then stepped back and taken a look at them. In the past, I've looked forward to conference attendance because of the hype and the titles of the presentations (and the chance to get out of the office, of course), and been sorely disappointed when the presentations ended up being about wicca or being more of a blue comedy routine. Consequently, no amount of hype would get me to go (or recommend going) after that.

As an example of a presentation title being out of whack with the actual content, when Network Associates purchased Secure Networks and their Ballista product (gawd, dude, how old am I??), I attended a presentation by Art Wong entitled, "The Art of Incident Response". Oddly enough, the presentation had nothing whatsoever to do with incident response.

I think that most people attend conferences for two basic reasons...quality talks, and networking. Okay, the unspoken third reason applies, too..."boondoggle". But for the most part, I think that most conference attendees go to see presentations that could directly and immediately impact what they do, and to meet up with others in the community.

From my own perspective, I generally tend to look for conferences that are going to have some impact on what I do...either because I'm going to see presentations that will impact what I do, or because I can meet and talk to other examiners, as well as potential customers. Something else I also look for is whether or not Syngress is going to have a bookstore at the conference, although this usually isn't the primary reason for going, nor is it a deal breaker.

I attended part of one day of Blackhat DC today, mostly to see Nick talk about TrustWave's numbers. Now, on the surface, you might think that this doesn't impact what I do so much, as I'm no longer in the PCI game. However, the numbers themselves are interesting, and Nick talked about not only the incidents that TW had responded to, but also the scans they'd run. This gave a bit of a different perspective, but an interesting one nonetheless. I also talked to Colin Sheppard for quite a while, and also to Richard Bejtlich (more on that conversation in another post).

So my brief attendance (cut short by an impending snow storm...last week, the weather man said "light dusting" and we got 6+ inches of snow!!) at BHDC was fruitful. In addition to the professional networking, some of the things I heard sparked ancillary ideas...no, Jamie, I wasn't taking notes on Nick's presentation, my furtive scribbling was me jotting down ideas...

Some New Stuff

Podcast
Caught the new CyberSpeak podcast Monday morning...this one was dated 31 Jan, and was sans Bret...I grabbed it from the CyberSpeak Facebook page. It's up on the CyberSpeak page now, so give it a listen.

In The News
Brian Krebs has an interesting tale of two victims, check it out. By now, most are probably aware that Brian's no longer with the Washington Post, but that hasn't diminished his inquisitiveness or passion for writing, particularly on infosec topics.

eEvidence
I probably don't say this often enough, but here goes...check out the What's New section of the eEvidence site. Christina does a great job of posting links to some very interesting and relevant information.

Monday, February 01, 2010

Forensic Analysis Process/Procedures

I've seen posts recently on some of the lists regarding processing forensic data...in most cases, the original question seems to center around, what is (are) the first thing(s) you do with your forensic data?

I thought I'd approach a response from a couple of different perspectives...

Goals
The VERY first thing I start with, regardless of what type of work I'm doing...IR, data collection, CSIRP development, forensic analysis...whatever...are the customer goals. Always. Every time.

Customer goals are documented early on in the engagement, most often from the very first call. They're also revisited throughout the engagement, to ensure that the on-site responder or the analyst is on track, and also ensure that the customer's expectations for the engagement are managed properly. It's a pretty bad feeling to have no communication with the customer, and deliver a report, only to have them say, "...uh...I thought you were going to do X..."

The thing about customer goals is that I can go on-site and back up a Ryder rental truck and acquire images of 300 or 500 systems...but for what? At what expense? If the customer needs an immediate answer, by telling them, "hold on...I've gotta image all of these systems first...", I've already done my customer a huge disservice. The customer's goals dictate everything about the engagement...how many responders are sent on-site, which ones, which analysts will be engaged, how long the engagement will take, etc.

Another thing about goals is that an analyst can easily consume 40 to 80 hours just analyzing an acquired image, and never answer the questions that the customer asked. Not only does the analyst need to be sure to keep the customer's goals in the forefront of their minds, but they also need to ensure that the goals are clear, understandable, and achievable. One common issue is the customer who asks the analyst to, "...find everything suspicious...", and the analyst takes that and starts analysis...without ever determining what constitutes "suspicious". I've analyzed systems used by pen testers, and systems used by administrators...for these systems, the existence of tools like MetaSploit, pskill, psexec, etc., wouldn't necessarily be "suspicious". What if you find password cracking tools on a system, and spend considerable time determining and reporting on their use...only to find out that the user was an admin whose job it was to test password strength?

Makin' Copies
Pretty much the first thing I do once I have my images in the lab is make working copies and secure the originals. This is a basic step, I know...but for me, all it takes is getting caught once, going on-site and then not being able to access my data. I'm one of those guys that tries to learn from the mistakes of others, rather than my own. I'm not always quite as successful as I'd like to be, but I was almost caught by this once, so I learned my lesson. I had just copied and verified an image from a USB external wallet drive, and began working on the copy. After a day of processing, I went to get a file off the USB ext HDD, and Windows asked me if I wanted to format the drive. For some reason, something had happened to the drive...I have no idea what it was. The point was, however, that I had made my working copy...I had copied the image file, verified the file system, and used Jesse's md5deep to ensure that the image file had completely and correctly copied.
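
For illustration, the same "did it copy completely and correctly" check can be sketched with Python's standard hashlib module. This is not md5deep, just the same idea in a few lines; the function names are my own, and the file is read in chunks so that a large image doesn't have to fit in memory.

```python
import hashlib

def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_verified(original, working_copy):
    """Return True if both files hash to the same MD5 value."""
    return file_md5(original) == file_md5(working_copy)
```

Recording the digest in the case notes at the time of the copy means the comparison can be repeated later, even if the original media is no longer readable.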

Documentation
Before addressing the actual analysis of an acquired image (or any other data), I make sure that my documentation is in order. At this point in my case management, this usually means that I've started my case notes...the first thing at the very top of my case notes is usually the customer goals. This helps me focus my analysis efforts...I usually even outline an analysis plan to get myself started at this point.

Documentation is a consistent aspect of engagements, beginning to end. Documentation keeps me on track, and also allows me to pass off the work to someone else in case of an emergency, without having them start over completely. It also allows an analyst to pick up a case 6 months or a year later and see how things went...particularly if they need to go to court. Documentation should be clear, concise, and maintained in a thorough enough manner that the analysis is repeatable and can be verified by another analyst.

Analysis
After all that, the very first thing I like to do before doing any actual analysis is to extract specific files from the image...Registry hives, Event Logs, etc...and to look for specific items (i.e., AV logs, MRT logs, etc.). Depending upon the goals of my analysis, I may even run TSK's fls.exe to generate a bodyfile, ultimately for inclusion into a timeline for analysis.
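
As a sketch of what happens to the bodyfile next, the Python below turns one bodyfile line into timeline events. It assumes the TSK 3.x layout (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime, with times as Unix epoch values); the function name and the M/A/C/B labels are choices made for the example.

```python
def bodyfile_events(line):
    """Turn one bodyfile line into (epoch, label, name) tuples, oldest first."""
    # Assumed TSK 3.x layout:
    # MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime
    md5, rest = line.rstrip("\n").split("|", 1)
    # Split from the right so a "|" inside the file name does not shift fields.
    name, inode, mode, uid, gid, size, atime, mtime, ctime, crtime = rest.rsplit("|", 9)
    stamps = (("M", mtime), ("A", atime), ("C", ctime), ("B", crtime))
    # A zero timestamp means "not set", so it is left out of the timeline.
    return sorted((int(value), label, name) for label, value in stamps if int(value) != 0)
```

Each tuple can then be written out in whatever event format the rest of the timeline uses and sorted in with data from the Registry, Event Logs, and other sources.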

I tend to do this sort of thing if I have something that may take a while...AV scans of a mounted image, keyword searches, etc...because I don't want to go back to the customer and say, "sorry it took so long, but I was running a scan and couldn't do anything else." To me, that's simply unprofessional...analysts are hired as "experts" or at least relied upon as professionals, and should conduct themselves as such. That, in part, means that doing analysis tasks in a parallel, rather than serial, fashion should simply be part of what we do. So, if I have a process that I need to run that's going to take some time, I'll extract pertinent files first so that I can continue with my analysis while the other task is running.

Of course, this all ties directly back to the goals of the engagement. In fact, depending upon the goals, I may make two working copies of the image file...one to work on while the other is being scanned. Or, I may not even run AV scans...I've found the malware that the customer described without having to run any scans.

So...what are your first steps?