Monday, January 20, 2014

Book Review: Cloud Storage Forensics

I had an opportunity to review Cloud Storage Forensics recently, and I wanted to provide my thoughts on the contents of the book.  I generally don't find book reviews that read like a table of contents (i.e., "...chapter 1 covers...") entirely useful, and I'm not sure that others would find them useful, either.  As such, I'm going to approach my review in a different manner.

The book addresses digital forensic analysis of client systems used to connect to and make use of several "cloud storage providers".  This is important to point out, as the terms 'cloud' and 'cloud storage' can so often be misunderstood.  Some may think, for example, that this may have to do with those services available through Amazon Web Services.

The book primarily addresses three cloud storage providers...SkyDrive, Dropbox, and Google Drive...each accessed from a Windows 7 PC and an Apple iPhone 3G.  In both instances, access to the storage facilities was conducted via the browser, as well as via the client application for the particular provider.

Brett's review of the book can be found here.  Brett is also the author of the sole review available on the book's Amazon page.  It turns out that Brett was the technical editor of the book (he was also the technical editor for WFA 4/e, and he is the author of Placing the Suspect Behind the Keyboard), and as such, I was able to get a little bit of valuable insight into the process that went into getting this particular book published.  This was as enlightening as it is important, because not all books, even books produced by the same publisher, follow the same process.  A number of years ago, I was reading a book that was very popular at the time on the topic of computer forensics and incident response, and based on something I read, I contacted the authors to ask for clarification.  One of the authors responded with, "...we wrote that section three years ago and didn't touch it before the book was published."  So...not all books follow the "...sit down, write, review, publish..." format that is completed in a year (or less).

One of the things I liked about the book was the detailed, methodical approach that the authors took to populating their test environment with data, as it provides an excellent road map not only for testing, but also for reasoning during the analysis process.  Too many times in DFIR work, too much is left to assumption, in part because analysts simply receive a hard drive or image, and are not equipped to address potential gaps between the data they observe and the questions that they need to answer.  One of the very first things I noticed about this book is the thorough approach taken to documenting the testing environment.

Also, the authors clearly stated the tools and versions that they used during their analysis.  Some analysts may not realize it, but this is very important, as tools can vary in their capabilities (sometimes, quite significantly) between versions.

This aspect of 'full disclosure' (i.e., clearly identifying the tools and versions used) is near and dear to me, as it is a significant aspect of chapter 9, Reporting, of my upcoming book, Windows Forensic Analysis 4/e.

On the subject of the tools used, when I read the tool listing on pg 27 (I was reading the soft cover edition, not the Kindle edition), in ch. 3, I thought back to the "challenges faced by law enforcement and government agencies" in ch. 1; it occurred to me that the reason the authors were using the tools on that list was that those are the tools most often used by law enforcement and government agencies.

The authors address a great number of data sources, including not just Prefetch, LNK files, and Event Logs, but also browser artifacts.  The authors also explored (to some extent) what was still available in memory.  This can be very valuable, as analysts should consider parsing available hibernation files, as well as the pagefile.

The chapters that address the actual location of artifacts include additional information regarding the use of anti-forensic techniques (through the use of tools such as Eraser and CCleaner), and illustrate the artifacts that remain.  Further, these chapters also include sections on Presentation, as well as tables that summarize the available artifacts.  I have found this type of summary to be very valuable when teaching courses, and it works equally well in the book.

The book was published in 2014, and very shortly into chapter 3, it already appears out of date.  For example, one of the tools used is "RegRipper version 20080909".

The version of X-Ways used in the book was version 16.5, which, according to Facebook, first became available in May, 2012 (see the graphic to the right).  Now, I'm not bringing this up to say that the most up-to-date version of a tool must always be used...not at all.  But this information gives us a time frame to understand when the authors were writing the book.  It also brings into question why some artifacts (in this case, shellbags) were not discussed, as some of the discussions of artifacts were alarmingly light.  For example, on pg 40 (in ch 3), one sentence starts, "References were also found within the UsrClass.dat Registry files..."; clearly, the authors are referring to shellbags, but there was no further discussion of the artifact, nor anything that illustrated the artifact for the reader.  A similar reference to artifacts in the UsrClass.dat Registry hive was made on pg 75 (ch 4) and on pg 105 (ch 5), but again, there were no further details.

What's also curious about the Registry hive file references is that when the client applications are used to access the cloud storage, there is no mention in any of the three instances (mentioned in the previous paragraph) of UserAssist artifacts.  After all, it would stand to reason that when the user accesses the client application, they would most likely double-click an icon on their desktop, or click an entry on their Start menu...doing so would likely create artifacts in the UserAssist key.  The Registry section on pg 105 in particular specifically mentions the use of "keyword searches", which would not locate entries in the UserAssist key, as the value names are ROT-13 encoded (an encoding, not encryption, but enough to defeat a plain-text keyword search).
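
To illustrate why a keyword search misses these entries, here is a minimal Python sketch.  The value name used is made up for illustration (it is not taken from the book or from a real system), and real UserAssist value names on Windows 7 often contain folder GUIDs as well; the point is only that the keyword becomes visible after ROT-13 decoding.

```python
import codecs


def decode_userassist_name(value_name: str) -> str:
    """Decode a ROT-13 encoded UserAssist value name back to readable text."""
    return codecs.decode(value_name, "rot_13")


def keyword_in(text: str, keyword: str) -> bool:
    """Case-insensitive substring check, standing in for a keyword search."""
    return keyword.lower() in text.lower()


# Hypothetical value name, encoded here so the example is self-contained.
plain_name = r"C:\Users\test\Desktop\Dropbox.lnk"
encoded_name = codecs.encode(plain_name, "rot_13")

print(encoded_name)
print(keyword_in(encoded_name, "dropbox"))                          # search on raw value name
print(keyword_in(decode_userassist_name(encoded_name), "dropbox"))  # search after decoding
```

A tool that parses the UserAssist key and decodes the value names before searching would surface the entry; a raw keyword search across the hive file would not.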

Many of the artifacts (RecentDocs listing from the Registry, Recycle Bin, browser artifacts) displayed in figures and tables in the book include time stamps (which allow us to see when the research was conducted), but there are no analysis techniques illustrated beyond simply locating and displaying the contents of the individual data sources.  Specifically, there are no illustrations of timeline analysis that show not just the available artifacts, but how those artifacts might relate to each other.  There were several examples of timelines (figures 3.2, 4.4, etc.), but these were used for presentation of data, not for data analysis.
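
As a rough idea of what I mean by timeline analysis, the Python sketch below merges events from several data sources into a single time-ordered view.  Every event, time stamp, and description here is invented for illustration; none of it comes from the book, and a real timeline would be built from parsed artifacts rather than hand-entered records.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Event(NamedTuple):
    when: datetime      # time stamp, normalized to UTC
    source: str         # data source the event came from
    description: str    # what the artifact indicates


def build_timeline(*sources):
    """Merge events from every data source and order them by time."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event.when)


def utc(*args):
    """Helper for building UTC time stamps in the sample data."""
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical sample data, one list per data source.
prefetch = [Event(utc(2012, 5, 1, 10, 0, 5), "Prefetch", "client application executed")]
recentdocs = [Event(utc(2012, 5, 1, 10, 2, 30), "RecentDocs", "document opened")]
browser = [Event(utc(2012, 5, 1, 9, 58, 0), "Browser history", "provider login page visited")]

timeline = build_timeline(prefetch, recentdocs, browser)
for event in timeline:
    print(event.when.isoformat(), event.source, event.description, sep=" | ")
```

Seeing the events side by side, in order, is what lets an analyst reason about how the artifacts relate to each other, rather than viewing each data source in isolation.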

The book is very well structured, has a very methodical approach, and as such, it's easy to locate information in the book.  Each section is structured identically...when the Windows 7 PC is used to access SkyDrive or Dropbox, the sections listing the findings of artifacts are the same as when the iPhone 3G is used to access the same storage facilities.  This structure provides a framework for other analysts who want to use updated, more recent versions of the platforms (Windows 8, iPad, iPhone 5+, etc.), as well as of the client applications for the cloud storage facilities.

However, the book was a bit light on the approach to artifacts; rather than taking a targeted approach to artifact (e.g., shellbags) analysis, and using timelines in the analysis of the systems, the primary means of analysis appears to have been keyword searches and the use of tools such as Magnet Forensics' IEF.  There is nothing inherently wrong or incorrect about this approach, other than that it is known to miss certain artifacts (e.g., UserAssist data).  I had hoped that the backgrounds of the authors, particularly the number of forensic investigations undertaken by one, would have obviated sections of the book that included, "...keyword was found in the UsrClass.dat file...".

Sunday, January 12, 2014

Malware RE - IR Disconnect

Not long ago, I'd conducted some analysis that I had found to be...well, pretty fascinating...and shared some of the various aspects of the analysis that were most fruitful.  In particular, I wanted to share how various tools had been used to achieve the findings and complete the analysis.

Part of that analysis involved malware known as PlugX, and as such, a tweet that pointed to this blog post recently caught my attention.  While the blog post, as well as some of the links in the post, contains some pretty fascinating information, I found that in some ways, it illustrates a disconnect between the DFIR and malware RE analysis communities.

I've noticed this disconnect for quite some time, going back as far as at least this post...however, I'm also fully aware that AV companies are not in the business of making the job of DFIR analysts any easier.  They have their own business model, and even if they actually do run malware (i.e., perform dynamic analysis), there is no benefit to them (the AV companies) if they engage in the detailed analysis of host-based artifacts.  The simple fact and the inescapable truth is that an AV vendor's goals are different from those of a DFIR analyst.  The AV vendor wants to roll out an updated .dat file across the enterprise in order to detect and remove all instances of the malware, whereas a DFIR analyst is usually tasked with answering such questions as "...when did the malware first infect the system/infrastructure?", "...how did it get in?", and "...what data was taken?"

These are very different questions that need to be addressed, and as such, have very different models for the businesses/services that address them.  This is not unlike the differences between the PCI assessors and the PCI forensic analysts.

Specifically, what some folks on one side find to be valuable and interesting may not be useful to folks on the other side.  As such, what's left is two incomplete pictures of the overall threat to the customer, with little (if any) overlap between them.  In the end, this simply leads to both sides having an incomplete view of what happened, and the result is that the customer...the one with questions that need to be answered...isn't provided the value that could potentially be there.

I'd like to use the Cassidian blog post as an example and walk through what I, as a host-based analysis guy, see as some of the disconnects.  I'm not doing this to highlight the post and say that something was done wrong or incorrectly...not at all.  In fact, I greatly appreciate the information that was provided; however, I think that we can all agree that there are disconnects between the various infosec sub-communities, and my goal here is to see if we can't get folks from the RE and IR communities to come together just a bit more.  So what I'll do is discuss/address the content from some of the sections of the Cassidian post.

Seeing the evolution of malware, in general, is pretty fascinating, but to be honest, it really doesn't help DFIR analysts understand the malware to the point where it helps them locate it on systems and answer the questions that the customer may have.  However, it is useful information and is part of the overall intelligence picture that can be developed of the malware and its use, and it may even lead (along with other information) to attribution.

Network Communications
Whenever an analyst identifies network traffic, that information is valuable to SOC analysts and folks looking at network traffic.  However, if you're doing DFIR work, many times you're handed a hard drive or an image and asked to locate the malware.  As such, whenever I see a malware RE analyst give specifics regarding network traffic, particularly HTTP requests, I immediately want to know which API was used by the malware to send that traffic.  I want to know this because it helps me understand what artifacts I can look for within the image.  If the malware uses the WinInet API, I know to look in index.dat files (for IE versions 5 through 9), and depending upon how soon after some network communications I'm able to obtain an image of the system, I may be able to find some server responses in the pagefile.  If raw sockets are used, then I'd need to look for different artifacts.

Where network communications information has proved to be very useful during host-based analysis is during memory analysis, such as locating open network connections in a memory capture or hibernation file.  Also, sharing information between malware RE and DFIR analysts has really pushed an examination to new levels, as in the case where I was looking at an instance where Win32/Crimea had been used by a bad guy.  That case, in particular, illustrated to me how things could have taken longer or possibly even been missed had the malware RE analyst or I worked in isolation, whereas working together and sharing information provided a much better view of what had happened.

The information described in the post is pretty fascinating, and can be used by analysts to determine or confirm other findings; for example, given the timetable, this might line up with something seen in network or proxy logs.  There's enough information in the blog post that would allow an accomplished programmer to write a parser...if there were some detailed information about where the blob (as described in the post) was located.

The blog post describes a data structure used to identify the persistence mechanism of the malware; in this case, that can be very valuable information, specifically if the malware creates a Windows service for persistence.  This tells me where to look for artifacts of the malware, and even gives me a means for determining specific artifacts in order to nail down when the malware was first introduced onto the system.  For example, if the malware uses the WinInet API (as mentioned above), that would tell me where to look for the index.dat file, based on the version of Windows I'm examining.

Also, as the malware uses a Windows service for persistence, I know where to look for other artifacts associated with the malware (Registry keys, Windows Event Log records, etc.), again, based on the version of Windows I'm examining.

Unused Strings
In this case, the authors found two unused strings, set to "1234", in the malware configuration.  I had seen a sample where that string was used as a file name.

Other Artifacts
The blog post makes little mention of other (specifically, host-based) artifacts associated with the malware; however, this resource describes a Registry key created as part of the malware installation, and in an instance I'd seen, the LastWrite time for that key corresponded to the first time the malware was run on the system.
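
For anyone who hasn't worked with key LastWrite times directly: they are stored as 64-bit FILETIME values, i.e., the number of 100-nanosecond intervals since January 1, 1601 (UTC).  The Python sketch below shows the conversion; the function names are my own, and the raw bytes in the example are constructed for illustration rather than read from a real hive file.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals from this point in time.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a 64-bit FILETIME value to a UTC datetime (microsecond precision)."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def lastwrite_from_bytes(raw: bytes) -> datetime:
    """Interpret 8 little-endian bytes as a FILETIME and convert it."""
    (filetime,) = struct.unpack("<Q", raw)
    return filetime_to_datetime(filetime)


# Illustrative value only: the FILETIME that corresponds to the Unix epoch.
sample = struct.pack("<Q", 116444736000000000)
print(lastwrite_from_bytes(sample).isoformat())
```

Converting every time stamp to UTC like this is what makes it possible to line up a key's LastWrite time against other artifacts on the system.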

In the case of the Cassidian post, it would be interesting to hear if the FAST key was found in the Registry; if so, this might be good validation, and if not, this might indicate either a past version of the malware, or a branch taken by another author.

Something else that I saw that really helped me nail down the first time that the malware was executed on the system was the existence of a subkey beneath the Tracing key in the Software hive.  This was pretty fascinating and allowed me to correlate multiple artifacts in order to develop a greater level of confidence in what I was seeing.

Not specifically related to the Cassidian blog post, I've seen tweets that talk about the use of Windows shortcut/LNK files in a user's Startup folder as a persistence mechanism.  This may not be particularly interesting to an RE analyst, but for someone like me, that's pretty fascinating, particularly if the LNK file does not contain a LinkInfo block.
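
Checking for the LinkInfo block is straightforward, as the shell link header carries a LinkFlags field that states which optional structures follow it.  The Python sketch below reads only the header fields defined in the MS-SHLLINK specification (header size, class identifier, and the HasLinkInfo flag); the function name is my own, and the sample headers are synthetic, built in the code rather than taken from a real LNK file.

```python
import struct

HEADER_SIZE = 0x4C                      # fixed size of the shell link header
# CLSID 00021401-0000-0000-C000-000000000046 as it is laid out on disk.
LINK_CLSID = bytes.fromhex("0114020000000000C000000000000046")
HAS_LINK_TARGET_ID_LIST = 0x00000001    # LinkFlags bit 0
HAS_LINK_INFO = 0x00000002              # LinkFlags bit 1


def lnk_has_linkinfo(data: bytes) -> bool:
    """Return True if the LNK header says a LinkInfo structure is present."""
    if len(data) < HEADER_SIZE:
        raise ValueError("data is too short to hold a shell link header")
    (header_size,) = struct.unpack_from("<I", data, 0)
    if header_size != HEADER_SIZE or data[4:20] != LINK_CLSID:
        raise ValueError("data does not start with a shell link header")
    (link_flags,) = struct.unpack_from("<I", data, 20)
    return bool(link_flags & HAS_LINK_INFO)


def synthetic_header(link_flags: int) -> bytes:
    """Build a zero-padded header for demonstration purposes only."""
    head = struct.pack("<I", HEADER_SIZE) + LINK_CLSID + struct.pack("<I", link_flags)
    return head + bytes(HEADER_SIZE - len(head))


print(lnk_has_linkinfo(synthetic_header(HAS_LINK_TARGET_ID_LIST | HAS_LINK_INFO)))
print(lnk_has_linkinfo(synthetic_header(HAS_LINK_TARGET_ID_LIST)))
```

Against a real file, the same function could be handed the bytes read from the LNK file in the user's Startup folder.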

Once again, my goal here is not to suggest that the Cassidian folks have done anything wrong...not at all.  The information in their post is pretty interesting.  Rather, what I wanted to do is see if we, as a community, can't agree that there is a disconnect, and then begin working together more closely.  I've worked with a number of RE analysts, and each time, I've found that in doing so, the resulting analysis is more complete, more thorough, and provides more value to the customer.  Further, future analysis is also more complete and thorough, in less time, and when dealing with sophisticated threat actors, time is of the essence.