Sunday, April 13, 2014

TTPs


Within the DFIR and threat intel communities, there has been considerable talk about "TTPs" - tactics, techniques and procedures used by targeted threat actors.  The most challenging aspect of this topic is that there's a great deal of discussion of "having TTPs" and "getting TTPs", but when you look hard at that discussion, it becomes clear that you're going to be left wondering, "where're the TTPs?"  I'm still struggling a bit with this, and I'm sure others are, as well.

I ran across Jack Crook's blog post recently, and didn't see how just posting a comment to his article would do it justice.  Jack's been sharing a lot of great stuff, and there's a lot of great stuff in this article, as well.  I particularly like how Jack tied what he was looking at directly into the Pyramid of Pain, as discussed by David Bianco.  That's something we don't see often enough...rather than going out and starting something from scratch, build on some of the great stuff that others have done.  Jack does this very well, and it was great to see him using David's pyramid to illustrate his point.

A couple of posts that might be of interest in fleshing out Jack's thoughts are HowTo: Track Lateral Movement, and HowTo: Determine Program Execution.

More than anything else, I would suggest that the pyramid that David described can be seen as an indicator of the level of maturity of the IR capability within an organization.  What this means is that the more you've moved up the pyramid (Jack provides a great walk-through of moving up the pyramid), the more mature your activity tends to be.  When you've matured to the point where you're focused on TTPs, you're actually using a process, rather than simply looking for specific data points.  And because you have a process, you're going to be able to not only detect and respond to other threats that use different TTPs, but you'll also be able to detect when those TTPs change.  Remember, when it comes to these targeted threats, you're not dealing with malware that simply does what it does, really fast and over and over again.  Your adversary can think and change what they do, responding to any perceived stimulus.

Consider the use of PSExec (a tool) as a means of lateral movement.  If you're looking for hashes, you may miss it...all someone has to do is flip a single bit within the PE file itself, particularly one that has no consequence on the function of the executable, and your detection method has been obviated.  If a different tool (there are a number of variants...) is used, and you're looking for specific tools, then similarly, your detection method is obviated.  However, if your organization has a policy that such tools will not be used, and it's enforced, and you're looking for Service Control Manager event records (in the System Event Log) with event ID 7045 (indicating that a service was installed), you're likely to detect the use of the tool on the destination systems, as well as the use of other similar tools.  In the case of more recent versions of Windows, you can then look at other event records in order to determine the originating system for the lateral movement.
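
Just to illustrate what that might look like in practice, here's a minimal sketch for sweeping an exported System Event Log for event ID 7045 records.  It assumes the third-party python-evtx library (and Python 3.8 or later for the namespace wildcard); the file name is hypothetical, so adjust it for your own environment.

import xml.etree.ElementTree as ET
from Evtx.Evtx import Evtx

# Scan an exported System Event Log for service installation events (ID 7045).
with Evtx("System.evtx") as log:
    for record in log.records():
        xml = record.xml()
        root = ET.fromstring(xml)
        event_id = root.find(".//{*}EventID")    # '{*}' namespace wildcard requires Python 3.8+
        if event_id is not None and event_id.text == "7045":
            print(xml)    # review the service name, image path, and account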

Book
Looking for Service Control Manager/7045 events is one of the items listed on the malware detection checklist that goes along with chapter 6 of Windows Forensic Analysis, 4/e.

When it comes to malware, I would agree with Jake's blog post regarding not uploading malware/tools that you've found to VT, but I would also suggest that the statement, "...they create a new piece of malware, with a unique hash, just for you..." falls short of the bigger issue.  If you're focused on hashes, and specific tools/malware, yes, the bad guy making changes is going to have a huge impact on your ability to detect what they're doing.  After all, flipping a single bit somewhere in the file that does not affect the execution of the program is sufficient to change the hash.  However, if you're focused on TTPs, your protection and detection process will likely sweep up those changes, as well.  I get it that the focus of Jake's blog post is to make the information more digestible, but I would also suggest that the bar needs to be raised.
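
If you want to see just how fragile hash-based detection is, a quick sketch makes the point...the sample file name below is hypothetical, and in practice the bad guy would flip a bit that doesn't affect execution:

import hashlib

with open("sample.exe", "rb") as f:     # hypothetical sample
    original = f.read()

modified = bytearray(original)
modified[-1] ^= 0x01                    # flip a single bit

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(bytes(modified)).hexdigest())   # a completely different hash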

Uploading to VT
One issue not mentioned in Jake's post is that if you upload a sample that you found on your infrastructure, you run the risk of not only letting the bad guy know that his stuff has been detected, but more than once responders have seen malware samples include infrastructure-specific information (domains, network paths, credentials, etc.) - uploading that sample exposes the information to the world.  I would strongly suggest that before you even consider uploading something (sample, hash) to VT, you invest some time in collecting additional artifacts about the malware itself, either through your own internal resources or through the assistance of experts that you've partnered with.

A down-side of this pyramid approach, if you're a consultant (third-party responder), is that if you're responding to a client that hasn't engineered their infrastructure to help them detect TTPs, then what you've got left is the lower levels of the pyramid...you can't change the data that you've got available to you.  Of course, your final report should make suitable recommendations as to how the client might improve their posture for responding.  One example might be to ensure that systems are configured to audit at a certain level, such as Audit Other Object Access - by default, this isn't configured.  This would allow for scanning (via the network, or via a SIEM) for event ID 4698 records, indicating that a scheduled task was created.  Scanning for these, and filtering out the known-good scheduled tasks within your infrastructure would allow for this TTP to be detected.
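
As a rough sketch of what that scanning might look like against an exported Security Event Log, again assuming the python-evtx library...the whitelist of known-good task names below is purely a hypothetical example, and would be populated from your own infrastructure:

import xml.etree.ElementTree as ET
from Evtx.Evtx import Evtx

KNOWN_GOOD = {r"\Microsoft\Windows\Defrag\ScheduledDefrag"}   # hypothetical known-good tasks

# Pull scheduled task creation events (ID 4698) and filter out the known-good entries.
with Evtx("Security.evtx") as log:
    for record in log.records():
        root = ET.fromstring(record.xml())
        event_id = root.find(".//{*}EventID")
        if event_id is None or event_id.text != "4698":
            continue
        task = root.find(".//{*}Data[@Name='TaskName']")
        if task is not None and task.text not in KNOWN_GOOD:
            print(task.text)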

For a good example of changes in TTPs, take a look at this CrowdStrike video that I found out about via Twitter, thanks to Wendi Rafferty.  The speakers do a very good job of describing changes in TTPs.  One thing from the video, however...I wouldn't think that a "new approach" to forensics is required, per se, at least not for response teams that are organic to an organization.  As a consultant, I don't often see organizations that have enabled WMI logging, as recommended in the video, so it's more a matter of a combination of having an established relationship with your client (minimize response time, maximize available data), and having a detailed and thorough analysis process.

Regarding TTPs, Ran2 says in this EspionageWare blog post, "Out of these three layers, TTP carries the highest intelligent value to identify the human attackers."  While being the most valuable, they also seem to be the hardest to acquire and pin down.

Thursday, April 03, 2014

What's Up?

TTPs
A bit ago, I ran across this fascinating blog post regarding the Pyramid of Pain.  Yes, it's over a year old, but it's still relevant today.

For one thing, back when I was doing PCI exams (while a member of the IBM ISS ERS team), Visa would send us these lists which included file names (no paths) and hashes...we had to search for them in every exam, so we did.  While I could see the value in the searches themselves, I felt at the time that Visa was sitting on a great deal of valuable intelligence that, if shared and used properly, could not only help us with our analysis, but could also be used to protect the victim merchants.  After all, if we knew more about how the bad guys were getting in and how they were targeting specific systems (TTPs), we could help prevent and detect such things.  As such, it was validating to see someone else discuss the value of such things.

Over the years, I've heard others talk about things like attribution, when it came to "APT" (I apologize for using that term...).  In fact, at one conference in particular, the speaker talked about how examples of code could be used for attribution, but shortly thereafter stated that the code could be downloaded from the Internet, and that various snippets could be pasted together to form working malware.

Indicators
In his recent RSA Conference State of the Hack talk, Kevin Mandia said that in the APT1 report, 3000 indicators were released...he mentioned domains and IP addresses, specifically.  Those are pretty low on the pyramid.

I have to agree that tools don't make the threat group, as many seem to be using the same tools.  In fact, it seems that a lot of the tools used for infrastructure recon are native tools...so which group gets the attribution?  Microsoft?  ;-)

Every time I read over the blog post, I keep coming back to agreeing with what the author says about TTPs.  Once you're to the point of detecting behaviors, you're no longer concerned with things like disclosure (public or otherwise) resulting in an adversary changing/adapting their tactics...because now, rather than focusing on individual data points, you have a process in place.  In particular, that process is iterative...anything new you learn gets rolled right back into the process and shared amongst all responders, thereby improving the overall response process.  What you want to do is get to the point where you stop reacting to the adversary, and instead, they react to you.

RegRipper EnScript
I recently found out that someone is selling an EnScript to tie RegRipper into EnCase.  Some folks tweeted questions asking how I felt about it, whether I was receiving royalties, and whether it "violated the GPL"...which, honestly, I didn't understand, as the license file in the archive specifies the license.  Why ask me if the answer is right there?

Since Twitter really isn't the medium for this sort of thing (and I honestly have no idea why so many people in the DFIR community restrict themselves to that medium), I thought I'd share my thoughts here.  First, others are making money off of free software, in general, and a few are doing so specifically with RegRipper...so why is this particular situation any different?  The GPL v3 quick guide states, in part, that users should have the freedom to "use the software for any purpose".  It further goes on to state that free software should remain free...and in this case, it would appear that RR remains free, and the $15 pays for the EnScript.  As such, I wouldn't think that selling the EnScript violates anything.

Second, this is clearly an attempt to bring the use of RR to users of EnCase.  I've never been a big fan of EnCase, but I realize that there are a number of folks who are, and that many specifically rely on this software.  My original purpose for releasing RegRipper was to put it out in the community for others to use and improve upon...well, that second part really hasn't happened to a great extent, and I don't see this EnScript either taking anything away from RegRipper, or adding anything to it.

RegRipper
I'm not saying that no one has offered up ways for improving RegRipper...some have. Not long ago, I received a plugin from someone, and Corey Harrell submitted one just the other day. I've had exchanges recently with some folks who have had some thoughtful suggestions regarding how to improve RegRipper, and perhaps make it more useful to a wider range of users. All I'm saying is that it hasn't happened to a great extent; some of the improvements and updates (inclusion of alerts, RLO plugin, etc.) are things I've added for my own benefit, and I don't want to maintain two disparate source trees.

Does this mean that more people are likely to use it?  Hhhhhmmmm...maybe.  Folks who go this route are likely going to go the same route as most of the folks who already use RegRipper, either by downloading it or using it as part of a consolidated distribution.  That is to say, they're just going to blindly run all plugins against the hives that they have available, and it's unlikely that they're going to have any ideas for new plugins (or tool updates or improvements as a whole) coming from this crowd.

So, in short, I don't see how this EnScript is a violation of anything...it's no different from what others are doing, and RegRipper itself remains free.  Further, it takes nothing away from RegRipper, nor adds anything to it.

Finally, Jamie had a good point on Twitter...if you don't want this to happen, don't put stuff out there for free.  Point well taken.

Speaking of which, I had an email exchange with Jamie and Corey Harrell recently, where we discussed some well-considered possible future additions to RegRipper.

Book
Windows Forensic Analysis 4/e is due to be released soon...I need to complete the archive of materials that go along with the book and get it posted.  As soon as that's done, I will start working on the possible additions to RegRipper.

Malware Detection
Speaking of WFA 4/e, one of the chapters I kept was the one on Malware Detection.  Not long ago, I was following the steps that I had laid out in that chapter, and I found that the system had McAfee AV installed.  So, per my process, I noted this to (a) be sure that I didn't run the same product against the image (mounted as read-only volume) and (b) look for logs and quarantined items.

It turns out that when McAfee AV quarantines an item, it creates a .bup file, which follows the MS CFB file format.  This Open Security Research blog post is very helpful in opening the files up, in part because it points to this McAfee Knowledge Center article on the same topic.
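
As a rough sketch of what pulling one of these apart might look like, assuming the third-party olefile library and the single-byte XOR key of 0x6A that's commonly reported for .bup files (verify both against the references above and your own samples; the file name is hypothetical):

import olefile

bup = olefile.OleFileIO("quarantined.bup")
print(bup.listdir())                        # typically a 'Details' stream and one or more 'File_N' streams

details = bup.openstream("Details").read()
decoded = bytes(b ^ 0x6A for b in details)  # XOR-decode the stream contents
print(decoded.decode("latin-1", errors="replace"))
bup.close()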

Some additional resources:
Punbup - Python script for parsing BUP files
Bup-parse - Enscript for parsing BUP files
McBup - Another Python script that may be useful

Conferences
I'll be speaking at a couple of conferences here in the near future.  I'm giving two presentations at the USACyberCrime Conference (formerly known as the DoD CyberCrime Conference, or DC3) at the end of April.  My presentations will be "APT sans Malware", and "Registry Analysis".  It's unlikely that I will be posting the PPTXs for these, as I'm not putting everything I'm going to say in bullets in the slides...if I did, what would be the point of me actually speaking, right?

Thanks to Suzanne Widup, I'll be speaking on the author's panel at the SANS Forensics Summit in Austin, TX, in June.  This is something new, and something I'm looking forward to.  Because I don't get feedback from the community regarding what they'd like to see or hear in a presentation, I've backed away somewhat from submitting presentations to CfPs that are posted.  One of my go-to presentations is Registry Analysis, in part because I really believe that it's a critical component of Windows forensic analysis, and also because I'm not sure that analysts are doing it correctly.  However, I've been told that I need to present on something else...but not what.  Also, the panel format is more free-form...I was on a panel at one of the first SANS Forensic Summits, and if you've attended any of the Open Memory Forensic Workshops, you've seen how interesting a panel can be.

Saturday, March 29, 2014

Writing DFIR Books: Questions

Based on my Writing DFIR Books post, Alissa Torres tweeted that she had a "ton of questions", so I encouraged her to start asking them.  I think that getting the questions out and asked now would be a great way to get started, for a couple of reasons.  First, the Summit is a ways away still, and it's unlikely that she's going to remember the questions.  Second, we don't know how the panel itself is going to go, so even if she did remember her "ton of questions", she may not be able to ask all of them.  Third, it's likely that some questions, and responses, are going to generate further questions, which themselves won't be asked due to time constraints.  Finally, it's unlikely that everyone is going to see the questions and responses, and it's likely that other panelists are going to have answers of their own.  So...I don't really see how someone asking their questions now is really going to take anything away from the panel that Suzanne Widup is putting together...if anything, I believe strongly that getting questions and answers out there now is going to make the panel that much better.

So, I scooped up some of the questions from her tweets, and decided to answer them via a medium more conducive to doing so, and here are my answers...

Forensics research is constantly in a state of new discovery. When does one stop researching and start writing?

The simple answer is that you're going to have to stop researching and start writing at some point.  It's up to you to decide, based on your topic, what you want to address, your outline, your schedule, etc.  The best advice I can give about this is to write the book the way you'd write a report...you'll want to be able to explain to a client (or anyone else) how you reached your conclusions 6 months or a year later, right?  The same holds true for the book...explain what you were doing in your research, in a clear and concise manner.  That way, if someone comes to you with a question about a new discovery after the book is published, you can discuss this new information intelligently.

Publishing Timelines
One thing to keep in mind about writing books is that the book doesn't immediately go to print as soon as you "put down your pen". Rather, once you've completed writing the manuscript, it goes into a review process (separate from the technical review process) and the proofs are then sent to you for review. Once you approve the proofs and send them back, it can be 2 or 3 more months before the book is actually available on shelves.  So, the simple fact is that a published book is always going to be just a bit behind new developments.  However, that doesn't make a book any less valuable...there are always new people coming into the field, and none of us knows everything, so a well-written book is going to be very useful, regardless.

If new research disproves something that you wrote, does it work against you later as an expert witness?

With respect to this question, writing a book is no different from conducting analysis and writing a report for a client.  Are you going to write something into a report that someone working for the client is going to disprove in a week or two when they read it?  If you found during your analysis that malware on the system had created a value beneath the user's Run key in order to remain persistent, are you going to say in your report that the malware started up each time the system was booted?  No, you're going to say that it was set to start whenever the user logged in, and because you did a thorough analysis, which included creating a timeline of system activity, you're going to have multiple data points to support that statement.

That is not to say that something won't change...things change all the time, particularly when it comes to DFIR work, and particularly with respect to Windows systems.  However, there's very likely going to be something that changed...some other application was installed on the system, some Registry value was set a certain way, a patch had been installed that modified a DLL, etc.

If you've decided to do "research" and add it to your book, do the same thing you would with a report that you're writing for a client.  Describe the conditions, the versions of tools and OS utilized, etc.  Be clear and concise, and caveat your statements as necessary.

When I was writing the fourth edition of Windows Forensic Analysis, I wanted to include updated information regarding Windows 8 and VSCs in chapter 3, so I took what was in that chapter in the third edition, and I ran through the process I'd described, using an image acquired from a Windows 8 system...and it didn't work.  So, I figured out why, and was sure to provide the updated information in the chapter.

Something else to keep in mind is that most publishers want you to have a technical reviewer or editor, someone who will be reviewing each chapter as you submit it.  You can stick with whomever they give you, and take your chances, or you can find someone you know and trust to hold you accountable, and offer their name to the publisher.  This is a great way to ensure that something doesn't "slip through the cracks".  Like a report, you can also have someone else review your work...submit it to peer review.  This way, you're less likely to provide research and documentation that is so weak that it's easily disproved.

As to the part about being an expert witness, well...as Alissa said before, "forensics research is constantly in a state of new discovery".  I've never been an expert witness, but I could not imagine an attorney putting an expert witness on the stand to testify based on research or findings that are five years old, or so weak that they could be so easily disproved.   I mean, I'd hardly think that such a witness would qualify as "expert".

You all have to address time management as well - how did you juggle paid work/full-time job with book writing?

Short answer: you do.

Okay...longer answer:  This is something you have to consider before you even sign a contract...when am I going to write?  How often, how much, etc?

I learned some useful techniques while writing fitness reports in the Marine Corps...one being that it's easier to correct and modify something than it is to fill empty space.  Write something, then step away from it.  When I wrote fitreps, I'd jot some bullets down, flesh out a paragraph, and step away from it for a day or so.  Coming back to it later would give me a fresh perspective on what I was writing, allowing my thoughts to marinate a bit.  Of course, it also goes without saying that I didn't wait until the last minute to get started.

Something that I've recommended to folks before they start looking at signing a contract to have a book published is to try writing a couple of chapters.  I will provide a template for them...the one that I use for my publisher...and have them try writing a chapter or two.  I think that this is a very good approach to getting folks to see if they really want to invest the time required to write a book.  One of the things I've learned about the DFIR community, and technical folks as a whole, is that people really do not like writing...case notes, reports, books, etc.  So the first hurdle is for a potential author to see what it's like to actually write, and it's usually much harder if they haven't put a good deal of thought into what they want to write, and they haven't started by putting a detailed outline together.  Once something is ready for review, I then offer to take a look at it and provide feedback...writing a book, just like a report, isn't about the first words you put down on paper.  Then the potential author gets to see what that part of the process is like...and it's like having to do 50 push ups, and then being told to do them over because 19 of them didn't count.  ;-)

So far, good questions.  Like I said, I think that getting some of these questions out there and answered now really doesn't take away from the panel, but instead, brings more attention to it.  And it appears that Suzanne agrees, so keep the questions coming...

Addendum:  Shortly after I tweeted this blog post, Corey Harrell tweeted this question:

What's the one thing you know now that you wish you knew writing your first book?

That it's so hard to get input or thoughtful feedback from the community.  Most often, if you do get anything, it's impossible to follow up and better understand the person's perspective.

Seriously...and I'm not complaining.  It's just a fact that I've come to accept over the years.

Most folks who do this sort of thing want some kind of feedback.  When I taught courses, I had feedback forms.  I know other courses, and even some conferences, include feedback forms.  It's this interaction that allows for the improvement of things such as books, open source tools, and analysis processes.  I'm a firm believer that it's impossible to know everything, but by engaging with each other, we can all become better analysts.  The great thing about writing a book, in this context, is that I've taken the first step by putting something out there to be scrutinized.

One of the things I've found over time is that my books have been and are being used in academic and training (government, military) courses.  This is great, and I really appreciate the fact that the course developers and instructors find enough value in my books to use them.  When I have had the chance to talk to some of these instructors, they've mentioned that they have thoughts on what could be done...what could be added or modified in the book...to make it more useful for their purposes.  When I've asked them to share their thoughts, or asked them to elaborate on statements such as "...cover anti-forensics...", most often, I don't hear anything.

Now and again, I do hear through the grapevine that someone has/had comments about a book, or specific material in one of my books, but what I've yet to see much of, beyond the reviews posted on Amazon, is thoughtful feedback on how the books might be improved.  That is not to say that I haven't received it...just recently I did receive some thoughtful feedback on one of my books from a course instructor, but it was a one-shot deal and it's been impossible to engage in a discussion so that I can better understand what they're asking.

Had I known that when writing my first book, I would've had different expectations.

Friday, March 28, 2014

Writing DFIR Books

Suzanne Widup (of Verizon) recently asked me to sit on an author's panel that she's putting together, in order to make the rounds of several conferences.  I won't be available for the panel at CEIC, but David Cowen will be there...be sure to stop by and see what he, and the other panel members have to share about their experiences.

I thought I'd put together a blog post on this topic for a couple of reasons.  First, to organize my thoughts a bit, and provide some of those thoughts.  Also, I wanted to let folks know that I'll be a member of the author panel at the SANS DFIR Summit in Austin, TX.

How did I get started?
Several years ago, a friend asked me to be a tech reviewer for a book that he was co-authoring, and during the process, I provided some input into the book itself.  It wasn't a great deal of information really, just a paragraph or so, but I provided more than just a terse "looks good" or "needs work".  After the book was completed, the publisher asked my friend if they knew of anyone who wanted to write a book, and he provided three names...of the three, I was apparently the only one to respond.   From there, I went through the process of writing my first book.  After that book was published, it turned out the publisher had no intention of continuing with follow-on editions; our contract provided them with the right of first refusal, which they declined to exercise, and I moved on to another publisher.

Why write a book?
I can't speak for everyone, but the reason I decided to write a book initially was because I couldn't find a book that covered the digital forensic analysis of Windows systems in the manner that I would've liked.  Like many in the DFIR community, I've had notes and stuff sitting all over, and in a lot of cases, those notes were on systems that I no longer have; over time, I've upgraded systems, or installed new OSs.  Writing a book to use as a reference means that rather than rummaging all over looking for something, I can reach up to my bookshelf, open to the chapter in question and find what I'm looking for.

What goes into writing a book?
A lot of work.  Writing books for the DFIR community is hard, because there's so much information out there that is constantly changing.  Even if you pick a niche to focus on, it's still a lot of work, because historically, our community isn't really that good at writing.  People in the DFIR community tend to not like to write case notes and reports, let alone a book.  

For most, the process of writing a book starts with an idea, and then finding someone to publish the book.  Once there's interest from a publisher, the potential author starts the process of putting the necessary information together in the format needed by the publisher, which is usually a proposal or questionnaire of some kind.  If the proposal is accepted, the author likely receives a contract spelling out things like the timeline for the book development, page/word count, specifics regarding advances and royalties, etc.  Once the contract is signed, the writing process begins.  As the author sends chapters in, they are subjected to a review process and sent back to the author for updates.  Once the manuscript...chapters, front- and back-matter, etc...is all submitted, the proofs are put together for the author to review, and once those are done, the book goes to printing.  The time between the author sending the proofs back and the book being available for shipping can be 2 - 3 months.

I'm not saying that I agree with this process...in fact, I think that there is a lot that can be done to not only make the entire process easier, but also result in better quality DFIR books being available.  However, my thoughts on this are really a matter for another blog post.

How do you decide what to put in the book?
When you feel like you would like to write a DFIR book, start with an outline.  Seriously.  This helps organize your thoughts, and it also helps you see if there's enough information to put into a book. Preparation is key, and this is true when taking on the task of writing a book.  I've found over time that the more effort I put into the outline and organizing my thoughts ahead of time, the easier it is to write the book.  Because, honestly, we all know how much folks in the DFIR profession like to write...

What do you get from writing a book?
It depends on what you want from writing a book, and what you put into it.  For example, I started writing books because I wanted a reference, some sort of consolidated location for all of the bits and pieces, tips and tricks that I have laying around.

First, let me clear something up...some people seem to think that when you write a book, you make a lot of money.  Yes, there are royalties...most contracts include them...but it's also really easy to sit back and assume what the percentages are and what the checks look like, and the fact is that most people that think like that are wrong.  I've had people ask me why I didn't include information about Windows Mobile Devices in my books...either for full analysis or just the Registry...and they've suggested that I make enough money in royalties to purchase these devices.  If you think that writing a book for the DFIR community is going to make you enough money to do something like that, then you probably shouldn't start down that road.  Yes, a royalty check is nice, but it's also considered taxable income by the IRS, and it does get reported to the IRS, and it does get taxed.  This takes a small amount and makes it smaller.  I'm not complaining...I have engaged in this process with my eyes open...I'm simply stating the facts as they are so that others are aware.

One thing that you do get from writing a book, whether you want it or not, is notoriety.  This is especially true if the book is useful to folks, and looked upon favorably...they get to know your name (the same is also true if the book ends up being really bad).  And yes, this notoriety kind of puts you in a club because honestly, the percentage of folks who have written successful DFIR books is kind of small.  But this can also have a down-side; a lot of people will look at you as unapproachable.   I was told once during an interview process that the potential employer didn't feel that they could afford me...even though we hadn't talked salary, nor salary history...because I'd written books.  I've received emails from people in the industry, some that I've met in person, in which they've said that they didn't feel that they could ask me a question because I'd written books.

What I get from writing books is the satisfaction of completing the book and seeing it on my bookshelf.  I've actually had occasion to use my books as references, which is exactly what I intended them to be.  I've gone back and looked up tools, commands, and data formats, and used that information to complete exams.  I've also been further blessed, in that some of my books have been translated into other languages, which adds color to my bookshelf.

Book Reviews
Book reviews are very important, not just for marketing the book, but because they're one way that the author gets feedback and can decide to improve the book (if they opt to develop another edition).

Book reviews need to be more than "...chapter 1 contains...chapter 2 covers..."; that's a table of contents (ToC), not a book review, and most books already have a ToC.  

A review of a DFIR book should be more about what you can't get from the ToC.  If you review a restaurant on Yelp, do you repeat the menu (which is usually already available online), or do you talk about your experience?  I tend to talk about the atmosphere, how crowded the restaurant was, how the service was, how the food was, etc.  I tend to do something similar when reviewing DFIR books.  The table of contents is going to be the same regardless of who reads the book; what's going to be different is the reader's experience with the book.

When writing a book review and making suggestions or providing input, it really helps (the author, and ultimately the community) to think about what you're suggesting or asking for.  For example, now and again, one of the things I've been asked to add to the WFA book is a chapter on memory analysis.  Why would I do that if the Volatility folks (and in particular, Jamie Levy) have already put a great deal of effort into the tool documentation AND they have a book coming out?  The book is currently listed as being 720 pages long...what would I be able to provide in a chapter that they don't already provide in a much more complete and thorough manner?

Now, I know that not everyone who purchases a book is going to actually open it.  I know this because there are folks who've asked me questions and knowing that they own a copy of the book, I've referenced the appropriate page number in the book.  But if you do have a book, and you have some strong feelings about it (whether positive or negative), I would strongly encourage you to write a review, even if you're only going to send it to the author.  The reason is that if the author has any thought of updating the book to another edition, or writing another book all together, your feedback can be helpful in that regard.  In fact, it could change the direction of the book completely.  If you share the review publicly, and the author has no intention of updating the book, someone else may see your review and that might be the catalyst for them to write a book.  

Saturday, March 22, 2014

Coding for Digital Forensic Analysis

Over the years, I've seen a couple of questions on the topic of coding for digital forensic analysis.  Many times, these questions tend to devolve into a quasi-religious debate over the programming language used, and quite honestly, that detracts from the discussion as a whole, because regardless of the language used, these questions are very often more about deconstructing or reconstructing data structures, processing logs, or simply obtaining context and meaning from the available mass of data.

Languages
Programming languages abound, and from what I've seen, the one chosen comes down to personal preference, usually based on experience and knowledge of the language.

I started programming BASIC on the Apple IIe back around 1982.  I typed a couple of BASIC programs into a Timex-Sinclair 1000, and then took the required course in BASIC programming my freshman year in college.  In high school, I took the brand new AP Computer Science course, which focused on using PASCAL...I ended up using TurboPASCAL at home to compile my programs and bring them in on a 5.25 in. floppy disk.  In graduate school, I took a C/C++ programming course, but it didn't involve much more than opening a file, writing to it, and then closing it.  I also did some M68000 assembly language programming. We used MatLab a lot in some of the different courses, particularly digital signal processing and neural networks, and I used it to perform some statistical analysis for my thesis.

In 1999, I started teaching myself to program Perl in order to have something to do.  I was working as a consultant at Predictive Systems, and we didn't have a security practice at the time.  I could hear the network operations team talking about how they needed someone who could program Perl, so I started teaching myself what I could so that maybe I could offer some assistance.  From there, I branched out to interacting with Windows systems through the API provided by Dave Roth's Perl modules.  At one point, while working at TDS, I had a fully-functional product that we were using to replace the use of ISS's Internet Scanner, due to the number of false positives and cryptic responses we were receiving.

Since then, I've used Perl for a wide variety of tasks, from IIS web server log processing to RegRipper to decoding binary data structures.

However, I'm NOT suggesting that Perl is the be-all and end-all of programming languages, particularly for DFIR work. Not at all.  All I've done is provided my experience.  Over the years, other programming languages have been found to be extremely useful.  I've seen R used for statistical analysis, and it makes a lot of sense to use this language for that task.  I've also seen a lot of programming in the DFIR space using C# and .NET, and even more using Python.  I've seen folks switch from another language to Python because "everyone is doing it".  I've seen so much of a use of Python, that I've started learning it myself (albeit slowly) in order to better understand the code, and even create my own.  The list of projects written in Python is pretty extensive, so it just makes sense to learn something about this language.

Defining the Problem
The programming language you use to solve a problem is really irrelevant.  I've got bits of Perl code lying around...stuff for parsing certain data structures, printing binary data to the console in hex editor format, etc...so if I need to pull something together quickly, I'm likely going to use a snippet of Perl code that I already have.  I'm sure that I'm no different from any other analyst in that regard.
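
As an example of the kind of snippet I'm talking about, here's a minimal hex editor-style dump routine...shown in Python simply for illustration, as the same thing is easily done in Perl:

def hexdump(buf, base=0, width=16):
    # Print binary data to the console in a hex editor-style layout.
    for i in range(0, len(buf), width):
        chunk = buf[i:i + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        print("%08x  %-*s  %s" % (base + i, width * 3, hex_part, ascii_part))

hexdump(b"regf\x00\x01\x02\x03Hello, world!")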

But when it comes to the particular language or approach I'm using, it depends on what I'm trying to achieve...what are my goals?  If I'm trying to put something together that I'm going to either have a client use, or leave behind after I'm done for the client to use, I may opt for a batch file.  If I need something quickly, but with more flexibility than is offered by a batch file, I may opt for a Perl script.

I've found over time that some of the programming languages used can be difficult to work with...what I mean by that is that some of the tools written and made available by the authors display their results in a GUI, and are closed source.  So, you can't see what the tool is doing, and you can't easily incorporate the output of the tools using techniques like timeline analysis.

Task-Specific Programming
Some folks have asked about resources for learning programming specifically for DFIR work; I'm not sure that there are any.  What it comes down to is, what do you want to do?  If you want to read binary data and parse out data structures based on some definition, then C/C++, C#, Python, Perl, and some other languages work well.  For many, some snippets of code are already available online for these tasks.  If you're trying to process text-based log files, I've found Perl to be well-suited for this task, but if you're more familiar with and comfortable with Python, go for it.

When it really comes down to it, it isn't about which programming language is "the best", it's about what your goal is, and how you want to go about achieving it.

Resources
Python tutorial
80+ Best Free Python Tutorials/books
List of free programming books, Python section

Saturday, March 01, 2014

Reconstructing Data Structures

I've posted before on the topic of understanding data structures (here, here, and here), and some recent analysis brought this back to me yet again.  I had an opportunity to make use of my understanding of data structures, specifically within the Windows Registry, in order to attempt to gain some information from a file where the tools normally used by analysts had failed.

The situation was that we had a Windows system that had been compromised...the bad guy had accessed the system using stolen credentials, then used it to move laterally to other systems.  Between this and the response activities, the system had been infected with malware that overwrites and deletes files.  A responder had collected potentially usable files from this system, including the Registry hives from the compromised user account, and had then run some Registry parsing tools against the hives, none of which worked.  Yep.  All of the tools failed...even the viewers.

I was sent a Registry hive file and a list of "strings of interest", and asked to provide some context to those strings, and if possible, when those strings had been created within the hive file.  My first step was to get an idea as to why the tools used to view and parse the hive had reportedly failed...so, I opened the file in a hex editor.

The first thing I noticed was that there was no 'regf' header.  In Windows Registry hive files, the first four bytes of the file should read "regf" (or "72 65 67 66" in hexadecimal)...there was no header.  Next, I looked for the first 'hbin' section, which is usually found at offset 0x1000 within the file, and starts with 'hbin'.  In this case, I didn't find an 'hbin' section until I reached offset 0x10000 in the file...and the entire space up to that point was all zeros.  I could tell right away that this wasn't good.  Even worse, when I was finally able to locate a key node structure within the hive file, it wasn't the root node.  In fairly short order, it was easy to see why all of the parsing tools had failed, as none had been able to discern a recognizable file structure.
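
A quick way to triage a hive for exactly this sort of damage is to check those offsets programmatically.  Here's a minimal sketch (the hive file name is hypothetical):

with open("NTUSER.DAT", "rb") as f:    # hypothetical hive file
    data = f.read()

print("regf signature present:", data[0:4] == b"regf")

first_hbin = None
for offset in range(0x1000, len(data), 0x1000):    # hbin sections sit on 0x1000 boundaries
    if data[offset:offset + 4] == b"hbin":
        first_hbin = offset
        break

print("first hbin found at:", hex(first_hbin) if first_hbin is not None else "none found")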

Each hbin section is 4096 (0x1000) bytes in size, which means that if the first hbin section was located at offset 0x10000 within the file, I was missing over a dozen complete hbin sections.  Not a great way to get started, eh?  As I scrolled through the rest of the file quickly, I could see what looked like legitimate key and value nodes, but I could also see other sections of the file...large sections...that were full of zeros, as well as some that were full of binary stuff that made no sense whatsoever.  In some cases, I could see sections that contained what appeared to be Unicode strings, but there were no discernible structures surrounding those strings.

When I was reading over this article prior to posting it, I tried to imagine that last paragraph as scary as possible, just for effect. I pictured myself reading this out loud like a scout leader telling a bunch of scouts a ghost story around a campfire, or huddled under a blanket with a flashlight. I don't know if that helps get the point across about how badly damaged this file was, or if it was just funny.   I mean, for me, saying, "...the first hbin in the hive file was found at offset 0x10000..." IS a horror story!  Either way, in attempting to provide anything at all, I had my work cut out for me...this was going to be tough.

Some background about myself...I was "trained" as an electrical engineer; that is, my undergraduate studies were in EE.  One of my professors would constantly say that "electrical engineers are inherently lazy", meaning that rather than making a complicated solution, or worse, making wild, unsupported assumptions about what we were looking at, electrical engineers would always seek the simplest solution. He even told us a story once about how a radar system used across the Air Force had been "fixed" by soldering a resistor in parallel to another resistor on a circuit board, rather than replace the entire circuit board.  Seeking a simple solution, I thought it best to seek out some reference material for help.  My reference for this work was/is Windows Registry Forensics; in this case, the lower half of pg 26 (in the soft cover edition).

By now I pretty much knew that I had my work cut out for me.  I had a list of strings of interest, but just the strings.  If I was going to make any sense of these strings at all, I'd have to know where they were located within the file.  So, I ran MS/SysInternals strings.exe with the -o switch, so that I could get the offset of where the string was located within the file.  Once I had done that, I noted a couple of the strings of interest, and opened the hive file in a hex editor.  I picked one of the strings...the output of strings gives me the offset in decimal, so I converted the offset to hexadecimal...and located it in the file.  I did this with several of the strings, and in each case, my findings fell into one of three categories:

1.  The string was a value name.
Pp. 29 - 31 of WRF cover the structure of a Registry value.  The value cell starts with a 4-byte (DWORD) value that is the size of the overall structure itself (i.e., the size DWORD, the 20-byte value node header, and the name).  An example value is illustrated in figure 1.

Fig 1: Registry value structure
In figure 1, the value header starts 8 bytes into the listing, with "D8 FF FF FF".  This value translates to -40, and tells us that the structure is 40 bytes in size.  Next, we see "vk", which is the value node identifier.  The next two bytes, "10 00", tell us that the name of the value is 16 bytes long.  The name starts immediately following the value header, so there is no offset to the name listed in the header itself; in this case, we can see the name, "GroupByDirection", which is clearly 16 bytes in size.  (A short parsing sketch that ties these fields together appears after the third category below.)

The remaining elements of the value header can be found in table 1.2 in WRF.  The value types can be found in table 1.3.

2.  The string was the name of a deleted value.
In several instances, I located the string in question and following the structure of a 'nearby' value, found that the string was indeed the value name.  However, when a node (key, value) is deleted in a Registry hive file, the size value (first DWORD) is converted to a positive number.

Consider figure 1 again...the first DWORD is "D8 FF FF FF", which is equal to -40.  Had the value been deleted, the DWORD would be "28 00 00 00".

3.  The string was value data.
In some cases, I found strings at offsets where there was no 'nearby' value (vk) or key (nk) node.   Instead, immediately before the string was what appeared to be a DWORD indicating a negative size value.  In figure 2, the value is '88 FF FF FF'.

Fig 2: Value Data

The string illustrated in figure 2 is an excerpt of a value data entry that I extracted from one of my own systems, but it illustrates the point very well.  In this case, the first DWORD translates to -120, indicating that the value data cell is 120 bytes in length.  Again, figure 2 illustrates an excerpt of the Registry value data, not the entire string.
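
To tie the three categories together, here's a minimal parsing sketch for a value (vk) cell, following the field layout described above and in WRF.  Treat the name decoding as a simplification...depending on the flags, value names can be stored as ASCII or Unicode:

import struct

def parse_vk_cell(buf, offset):
    # Parse a value (vk) cell at the given file offset; returns None if the signature doesn't match.
    (size,) = struct.unpack_from("<i", buf, offset)    # signed DWORD; negative means the cell is in use
    sig, name_len, data_len, data_off, val_type, flags = struct.unpack_from("<2sHIIIH", buf, offset + 4)
    if sig != b"vk":
        return None
    name = buf[offset + 24:offset + 24 + name_len]     # the name immediately follows the 20-byte header
    return {
        "in_use": size < 0,                    # a positive size indicates a deleted (freed) cell
        "cell_size": abs(size),
        "name": name.decode("latin-1", errors="replace"),
        "data_len": data_len & 0x7FFFFFFF,     # high bit set means the data is stored in the offset field
        "data_offset": data_off,               # relative to the first hbin; add 0x1000 for the file offset
        "type": val_type,
    }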

Just to be clear, I wasn't looking for all available strings, and I also did not look at all of the strings of interest. By this point, I had looked at about a dozen or so of the strings of interest, out of several dozen.  I mention this because some of the strings of interest could have been key names, but at this point, none of the strings I'd looked at were, in fact, key names.  I should mention that some of the strings appeared as Unicode strings in those sections of the file that I mentioned earlier in this post...while the string was clearly visible, and of interest in the context of the overall examination, I could not find any discernible structure (Registry key or value node, shell item, etc.) near or surrounding the string.

Now, the Windows Registry is described as a hierarchical database, but it's also something of a singly-linked list of structures.   What I mean by this is that the root key node in a Registry hive file points to other keys (subkeys) and values.  Subkeys point to other keys and values, and values only point to data.  When I say "point to", I'm referring to the offset within the value or key header structure that tells us where the next element is located.  This offset is not measured from the beginning of the file (from 0); rather, it starts at the beginning of the first hbin structure, which is (usually) at offset 0x1000.  Let's say that you find an offset value within a value header structure that points to the data, and the offset is "D8 54 01 00".  Translating endianness, we would look for that data structure at 0x0154D8 + 0x1000 within the hive file, or at offset 0x0164D8 from the beginning of the file.

Because key nodes point to other key nodes and value nodes, and value nodes point to data, the Registry can be described as a singly-linked list.  By contrast, active processes in memory are maintained as a doubly-linked list...each process points to the next process in the list, as well as the previous process in the list.  In the Registry, a value does not point to its parent key, so an orphaned value can't easily be tied back to the key that owned it.  This can make it difficult to reconstruct a damaged Registry hive file, particularly when strings of interest that may be pertinent to the investigation can be seen within the file.

In the instance I was looking at, just scrolling through the hive file, I could see that a good deal of it had been destroyed.  In fact, it looked as if the virus had successfully overwritten entire sectors with zeros, and in some other cases, some of the sectors that made up the rest of the file I was sent did not even contain discernible structures.  I knew that I couldn't start at the beginning of the file and reconstruct the hive file...too much was missing.  One approach might have been to comb through the file and catalog all of the key and value node structures that I could locate (both allocated and unallocated), and then run consecutive scans to (a) locate associated value data, and (b) correlate the values to the appropriate keys.

What I did instead was pick out a couple of the more interesting strings, and locate them within the hive file.  I had the offset to the string from the strings.exe output, so I went to that offset in the file, found the string, and then found the beginning of the structure, of which the string was a member.  I noted the structure type (value, data) and recorded the offset for the structure.  I then subtracted 0x1000 from the offset, reversed the endianness, and searched for the value in the hive file.

Here's an example...in one instance, I located a string that was a value name, and the value structure began at offset 0x164D8 within the file.  Subtracting 0x1000, I got 0x154D8, and reversing endianness, I got "D8 54 01 00".  I then searched for that hex string in UltraEdit.  What that led to was a structure that maintained a list of offsets to values associated with a key.  So, I repeated this process, using the location where this structure started.  What that led me to discover was that the key had been obliterated from the hive; it wasn't "deleted" in the sense that the first DWORD had been converted to a positive value...it simply no longer existed in the file.
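
Here's a minimal sketch of the offset arithmetic used in that process, with the worked numbers from the example above:

import struct

HBIN_BASE = 0x1000    # offsets stored within cells are relative to the start of the first hbin

def cell_offset_to_file_offset(cell_offset):
    # Offset as stored in a key/value cell -> offset from the beginning of the hive file.
    return cell_offset + HBIN_BASE

def file_offset_to_search_pattern(file_offset):
    # File offset of a located structure -> the little-endian DWORD another cell would use to reference it.
    return struct.pack("<I", file_offset - HBIN_BASE)

print(hex(cell_offset_to_file_offset(0x154D8)))        # 0x164d8
print(file_offset_to_search_pattern(0x164D8).hex())    # 'd8540100' - the hex pattern to search for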

As laborious as it is, this process can be used to gain some modicum of context and value for the investigator.

Process
When confronted with a hive file as badly damaged as the one I was looking at, there are basically two ways to go about collecting some modicum of information and context from the file.  The first is to comb through the entire file, cataloging each discernible structure, as if you were putting puzzle pieces out on a table.  Each structure would have to include not just the information it contained, but also the offset to where it was located within the file.  Once the scanning process had been completed, the pieces could be assembled much like a puzzle, albeit with a lot of missing pieces.

The other way to go about this would be to do something like what I did...find a string of interest, locate it within the file, determine what type of structure it belonged to (if it was part of a structure...), and attempt to reconstruct the path based on knowledge of the structures.  I opted for this approach because it gave me some answers quickly.

Take-away
My biggest take away from this exercise was that understanding the structure of what I was looking at allowed me to not only troubleshoot the issue and determine why the tools weren't working, but it also allowed me to provide some information, context, and insight regarding the data when those tools were not able to do so.

Final Thought
One of the comments to one of my previous posts on this topic included the following question:

Another question might be: With all the data structures out there, is it even possible to truly understand them all?

I would suggest that, no, it's not possible for any single analyst to understand all of the structures available on a Windows system.  That's why none of us try to do so...instead, some of us document what we know, in books or by posting to a wiki, and then offer ourselves as resources.  That way, if someone has a question, all they have to do is ask.  So, if an analyst runs across something that they don't understand, they can continue to not understand it, or they can ask someone who appears to know something about the topic, and in fairly short order, get an understanding.

Monday, February 17, 2014

More Tracking User Activity via the Registry

I have previously posted on the topic of determining a user's access to files, and thanks to Jason Hale's recent post on a similar topic (i.e., MS Word 2013 Reading Locations), I got interested in the topic again.  I was interested in seeing if there were any similar artifacts available for other MS Office apps, such as PowerPoint or Excel.  I exchanged a couple of emails with Jason, and ran a somewhat simple, atomic test  to see what artifacts might be available.  Jason took a different approach to testing than I did...he used ProcMon on a live system, whereas I ran my test, pulled the pertinent hive files off of the system and created a timeline.

Based on Jason's testing with MS Word, I wanted to see whether, if a user were to open a PowerPoint presentation, get partway through it, and close the application, PowerPoint would make some record of which slide was last in focus when the app was closed.  Jason ran a test or two of his own, and shared with me that he found some very interesting information beneath the following key:

HKCU\Software\Microsoft\Office\15.0\Common\Roaming\Identities\Anonymous\Settings\1076\{00000000-0000-0000-0000-000000000000}\PendingChanges\

I ran my own tests, using MS Office 2013 installed on Windows 8.1.  I plugged in a thumb drive (mounted as G:\) and double-clicked a PPTX file in the root of that volume (file named "shell items.pptx").  Once the presentation was open, I clicked on the third slide, and then closed the application, launched FTK Imager on the system, copied off the NTUSER.DAT and USRCLASS.DAT hives for the user account, and shut the system down.  I moved the thumb drive to my analysis system and created a timeline of the key LastWrite times from the hives...only to find that changes hadn't been written to the hives.  So, I booted the Win8.1 system again, copied off the hives again, and repeated the process, and was able to see what I was looking for.

First, I had used my Live ID account when logging into the system, so the path to the PendingChanges subkeys was a bit different from what Jason found:

HKCU/Software/Microsoft/Office/15.0/Common/Roaming/Identities/c4d2c8d78e44cf84_LiveId/Settings/1076/{00000000-0000-0000-0000-000000000000}/PendingChanges/dafbbe5e/22434

Beneath this key are several values, including one named "ItemData" that appears to contain XML-like contents, in Unicode format, in the binary value data.  Another value, "LastModified", has data that appears to be a SystemTime structure.
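
If that data is in fact a 16-byte SYSTEMTIME structure (eight unsigned 16-bit fields: year, month, day-of-week, day, hour, minute, second, millisecond), a minimal decoder would look something like this:

import struct

def parse_systemtime(buf):
    # Decode a 16-byte Windows SYSTEMTIME structure from the raw value data.
    year, month, dow, day, hour, minute, second, ms = struct.unpack("<8H", buf[:16])
    return "%04d-%02d-%02d %02d:%02d:%02d.%03d" % (year, month, day, hour, minute, second, ms)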

I had incorporated details from the UserAssist data from the hive into the timeline, and as such, I could see in the timeline where that data marked the start of the artifacts I had hoped to see.  For example, within the same second in the timeline, I could see the UserAssist data illustrating that PowerPoint had been launched, as well as an entry beneath the RecentDocs\.pptx key being created, pointing to the file "shell items.pptx".

I also found values beneath the HKCU\Software\Microsoft\Office\15.0\PowerPoint\User MRU\LiveId_blah\File MRU and Place MRU keys that were very useful.  Specifically, both keys had values named "Item 1" with data that pointed to the resource in question.  For the File MRU key, the "Item 1" value data was:

[F00000000][T01CF2BE8F8E1FA40][O00000000]*G:\shell items.pptx

The Place MRU "Item 1" value data was similar, but only pointed to "G:\".  Notice the "T" value in the brackets...that's a FILETIME structure that correlates to the time the file was accessed...in this case, 17 Feb 2014 at approx. 14:03:02 UTC.
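
If you want to check that translation yourself, the "T" value is just a 64-bit FILETIME (a count of 100-nanosecond intervals since 1 Jan 1601 UTC), so something like the following will do it:

from datetime import datetime, timedelta

def filetime_to_utc(ft):
    # FILETIME: 100-nanosecond intervals since 1601-01-01 UTC
    return datetime(1601, 1, 1) + timedelta(microseconds=ft // 10)

# "T" value from the File MRU "Item 1" data above
print(filetime_to_utc(0x01CF2BE8F8E1FA40))    # lines up with 17 Feb 2014, ~14:03:02 UTC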

A couple of things to note at this point:

1.  This really illustrates what I've been saying for some time; that as the versions of Windows have increased over the years, more information is being recorded and maintained with respect to user activity, particularly within the Registry.  These artifacts are based solely on the Registry; the file system is likely to have additional artifacts, such as Jump Lists, that can provide even more context and visibility into user activity.

2.  This was an atomic test...I performed one action and collected data to see what changes occurred.  We all know that DFIR work never occurs this way; in fact, most often, we may not get access to the data for weeks or months after the activity in question occurred.  As such, you should not expect the artifacts to appear exactly as described in this post.

3.  The available findings are based on extremely limited testing and data; more testing and data means better results.

Monday, January 20, 2014

Book Review: Cloud Storage Forensics


I had an opportunity to review Cloud Storage Forensics recently, and I wanted to provide my thoughts on the contents of the book.  I generally don't find book reviews that read like a table of contents (i.e., "...chapter 1 covers...ch 2 covers...") entirely useful, and I'm not sure that others would find them useful, either.  As such, I'm going to approach my review in a different manner.

The book addresses digital forensic analysis of client systems used to connect to and make use of several "cloud storage providers".  This is important to point out, as the terms 'cloud' and 'cloud storage' can so often be misunderstood.  Some may think, for example, that this may have to do with those services available through Amazon Web Services.

The book primarily addresses three cloud storage providers...SkyDrive, Dropbox, and Google Drive...each accessed from a Windows 7 PC and an Apple iPhone 3G.  In both instances, access to the storage facilities was conducted via the browser, as well as via the client application for the particular provider.

Brett's review of the book can be found here.  Brett is also the author of the sole review available on the book's Amazon page.  It turns out that Brett was the technical editor of the book (he was also the technical editor for WFA 4/e, and he is the author of Placing the Suspect Behind the Keyboard), and as such, I was able to get a little bit of valuable insight into the process that went into getting this particular book published.  This was as enlightening as it was important, because not all books, even books produced by the same publisher, follow the same process.  A number of years ago, I was reading a book that was very popular at the time on the topic of computer forensics and incident response, and based on something I read, I contacted the authors to ask for clarification.  One of the authors responded with, "...we wrote that section three years ago and didn't touch it before the book was published."  So...not all books follow the "...sit down, write, review, publish..." format that is completed in a year (or less).

Pros
One of the things I liked about the book was the detailed, methodical approach that the authors took to populating their test environment with data, as it not only provides an excellent road map for testing, but also for reasoning during the analysis process.  Too many times in DFIR work, too much is left to assumption, in part because analysts simply receive a hard drive or image, and are not equipped to address potential gaps between the data they observe and the questions that they need to answer.  One of the very first things I noticed about this book was the thorough approach taken to documenting the testing environment.

Also, the authors clearly stated the tools and versions that they used during their analysis.  Some analysts may not realize it, but this is very important, as tools can vary in their capabilities (sometimes, quite significantly) between versions.

Reporting
This aspect of 'full disclosure' (i.e., clearly identifying the tools and versions used) is near and dear to me, as it is a significant aspect of chapter 9, Reporting, of my upcoming book, Windows Forensic Analysis 4/e.

On the subject of the tools used, when I read the tool listing on pg 27 (I was reading the soft cover edition, not the Kindle edition), in ch. 3, I thought back to the "challenges faced by law enforcement and government agencies" in ch. 1; it occurred to me that the reason the authors were using the tools on that list was that those are the tools most often used by law enforcement and government agencies.

The authors address a great number of data sources, including not just Prefetch, LNK files, and Event Logs, but also browser artifacts.  The authors also explored (to some extent) what was still available in memory.  This can be very valuable, as analysts should consider parsing available hibernation files, as well as the pagefile.

The chapters that address the actual location of artifacts include additional information regarding the use of anti-forensic techniques (through the use of tools such as Eraser and CCleaner), and illustrate the artifacts that remain.  Further, these chapters also include sections on Presentation, as well as tables that summarize the available artifacts.  I had found this type of summary to be very valuable when teaching courses, and it works equally well in the book.

Cons
The book was published in 2014, and very shortly into chapter 3, it already appears out of date.  For example, one of the tools used is "RegRipper version 20080909".

The version of X-Ways used in the book was 16.5, which, according to Facebook, first became available in May 2012.  Now, I'm not bringing this up to say that the most up-to-date version of a tool must always be used...not at all.  But this information gives us a time frame to understand when the authors were writing the book.  It also brings into question why some artifacts (in this case, shellbags) were not discussed, as some of the discussions of artifacts were alarmingly light.  For example, on pg 40 (in ch 3), one sentence starts, "References were also found within the UsrClass.dat Registry files..."; clearly, the authors are referring to shellbags, but there was no further discussion of the artifact, nor anything that illustrated the artifact for the reader.  A similar reference to artifacts in the UsrClass.dat Registry hive was made on pg 75 (ch 4) and on pg 105 (ch 5), but again, there were no further details.

What's also curious about the Registry hive file references is that when the client applications are used to access the cloud storage, there is no mention in any of the three instances (mentioned in the previous paragraph) of UserAssist artifacts.  After all, it would stand to reason that when the user accesses the client application, they would most likely double-click an icon on their desktop, or click an entry on their Start menu...doing so would likely create artifacts in the UserAssist key.  The Registry section on pg 105 in particular specifically mentions the use of "keyword searches", which would not locate entries in the UserAssist key, as the value names are ROT-13 encrypted.
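
As a quick illustration of why a straight keyword search misses this artifact, the UserAssist value names have to be ROT-13 translated before they'll match anything; the value name below is hypothetical, but the translation is a one-liner:

import codecs

# hypothetical ROT-13 encoded UserAssist value name
name = "HRZR_EHACNGU:P:\\Jvaqbjf\\flfgrz32\\pnyp.rkr"
print(codecs.decode(name, "rot_13"))    # UEME_RUNPATH:C:\Windows\system32\calc.exe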

Many of the artifacts (RecentDocs listing from the Registry, Recycle Bin, browser artifacts) displayed in figures and tables in the book include time stamps (which allows us to see when the research was conducted), but there are no analysis techniques illustrated beyond simply locating and displaying the contents of the individual data sources.  Specifically, there are no illustrations of timeline analysis to illustrate not just the available artifacts, but how those artifacts might relate to each other.  There were several examples of timelines (figures 3.2, 4.4, etc.), but these were used for presentation of data, not for data analysis.

Summary
The book is very well structured and takes a very methodical approach, and as such, it's easy to locate information in the book.  Each section is structured identically...when the Windows 7 PC is used to access SkyDrive or Dropbox, the sections listing the artifact findings are the same as when the iPhone 3G is used to access the same storage facilities.  This structure provides a framework for other analysts who want to use updated, more recent versions of the platforms (Windows 8, iPad, iPhone 5+, etc.), as well as of the client applications for the cloud storage facilities.

However, the book was a bit light on the approach to artifacts; rather than taking a targeted approach to artifact (i.e., shellbags, etc.) analysis, and using timelines in the analysis of the systems, the primary means of analysis appears to have been keyword searches and the use of tools such as Magnet Forensics' IEF.  There is nothing inherently wrong or incorrect about this approach, other than that this approach is known to miss certain artifacts (i.e., UserAssist data).  I had hoped that the backgrounds of the authors, particularly the number of forensic investigations undertaken by one, would have precluded sections of the book that included, "...keyword was found in the UsrClass.dat file...".

Sunday, January 12, 2014

Malware RE - IR Disconnect

Not long ago, I'd conducted some analysis that I had found to be...well, pretty fascinating...and shared some of the various aspects of the analysis that were most fruitful.  In particular, I wanted to share how various tools had been used to achieve the findings and complete the analysis.

Part of that analysis involved malware known as PlugX, and as such, a tweet that pointed to this blog post recently caught my attention.  While the blog post, as well as some of the links in the post, contains some pretty fascinating information, I found that in some ways, it illustrates a disconnect between the DFIR and malware RE analysis communities.

Caveat
I've noticed this disconnect for quite some time, going back as far as at least this post...however, I'm also fully aware that AV companies are not in the business of making the job of DFIR analysts any easier.  They have their own business model, and even if they actually do run malware (i.e., perform dynamic analysis), there is no benefit to them (the AV companies) in engaging in the detailed analysis of host-based artifacts.  The simple fact and the inescapable truth is that an AV vendor's goals are different from those of a DFIR analyst.  The AV vendor wants to roll out an updated .dat file across the enterprise in order to detect and remove all instances of the malware, whereas a DFIR analyst is usually tasked with answering such questions as "...when did the malware first infect the system/infrastructure?", "...how did it get in?", and "...what data was taken?"

These are very different questions that need to be addressed, and as such, have very different models for the businesses/services that address them.  This is not unlike the differences between the PCI assessors and the PCI forensic analysts.

Specifically, what some folks on one side find to be valuable and interesting may not be useful to folks on the other side.  As such, what's left is two incomplete pictures of the overall threat to the customer, with little (if any) overlap between them.  In the end, both sides are left with an incomplete view of what happened, and the customer...the one with questions that need to be answered...doesn't get the value that could potentially be there.

I'd like to use the Cassidian blog post as an example and walk through what I, as a host-based analysis guy, see as some of the disconnects.  I'm not doing this to highlight the post and say that something was done wrong or incorrectly...not at all.  In fact, I greatly appreciate the information that was provided; however, I think that we can all agree that there are disconnects between the various infosec sub-communities, and my goal here is to see if we can't get folks from the RE and IR communities to come together just a bit more.  So what I'll do is discuss/address the content from some of the sections of the Cassidian post.

Evolution
Seeing the evolution of malware, in general, is pretty fascinating, but to be honest, it really doesn't help DFIR analysts understand the malware to the point where they can locate it on systems and answer the questions that the customer may have.  However, again...it is useful information, it is part of the overall intelligence picture that can be developed of the malware and its use, and it may (along with other information) even lead to attribution.

Network Communications
Whenever an analyst identifies network traffic, that information is valuable to SOC analysts and folks looking at network traffic.  However, if you're doing DFIR work, many times you're handed a hard drive or an image and asked to locate the malware.  As such, whenever I see a malware RE analyst give specifics regarding network traffic, particularly HTTP requests, I immediately want to know which API was used by the malware to send that traffic.  I want to know this because it helps me understand what artifacts I can look for within the image.  If the malware uses the WinInet API, I know to look in index.dat files (for IE versions 5 through 9), and depending upon how soon after some network communications I'm able to obtain an image of the system, I may be able to find some server responses in the pagefile.  If raw sockets are used, then I'd need to look for different artifacts.
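
When an RE write-up doesn't say which API was used, one quick (and admittedly rough) way to get a hint from the sample itself is to look at its import table; this is a minimal sketch, assuming the pefile module and a placeholder file name, and it obviously won't tell you much if the sample is packed:

import pefile

pe = pefile.PE("sample.bin")    # placeholder file name

# look for networking DLLs in the import table
for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
    dll = entry.dll.decode("ascii", "ignore").lower()
    if dll in ("wininet.dll", "winhttp.dll", "ws2_32.dll", "wsock32.dll"):
        for imp in entry.imports:
            if imp.name:
                print("%s -> %s" % (dll, imp.name.decode("ascii", "ignore")))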

Where network communications has proved to be very useful during host-based analysis is memory analysis, such as locating open network connections in a memory capture or hibernation file.  Also, sharing information between malware RE and DFIR analysts has really pushed an examination to new levels, as in the case where I was looking at an instance where Win32/Crimea had been used by a bad guy.  That case, in particular, illustrated to me how things could have taken longer, or possibly even been missed, had the malware RE analyst or I worked in isolation, whereas working together and sharing information provided a much better view of what had happened.

Configuration
The information described in the post is pretty fascinating, and can be used by analysts to determine or confirm other findings; for example, given the timetable, this might line up with something seen in network or proxy logs.  There's enough information in the blog post that would allow an accomplished programmer to write a parser...if there were some detailed information about where the blob (as described in the post) was located.

Persistence
The blog post describes a data structure used to identify the persistence mechanism of the malware; in this case, that can be very valuable information...specifically, whether the malware creates a Windows service for persistence.  This tells me where to look for artifacts of the malware, and even gives me a means for determining specific artifacts in order to nail down when the malware was first introduced onto the system.  For example, if the malware uses the WinInet API (as mentioned above), that would tell me where to look for the index.dat file, based on the version of Windows I'm examining.

Also, as the malware uses a Windows service for persistence, I know where to look for other artifacts (Registry keys, Windows Event Log records, etc.) associated with the malware, again, based on the version of Windows I'm examining.
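
As a rough sketch of what that looks like from the host-based side, the service-install records can be pulled from a System.evtx exported from the image; this assumes the python-evtx module, and the string match on the event ID is admittedly crude:

import Evtx.Evtx as evtx

with evtx.Evtx("System.evtx") as log:    # exported from the image
    for record in log.records():
        xml = record.xml()
        # 7045 - "A service was installed in the system"
        if ">7045<" in xml:
            print(xml)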

Unused Strings
In this case, the authors found two unused strings, set to "1234", in the malware configuration.  I had seen a sample where that string was used as a file name.

Other Artifacts
The blog post makes little mention of other (specifically, host-based) artifacts associated with the malware; however, this resource describes a Registry key created as part of the malware installation, and in an instance I'd seen, the LastWrite time for that key corresponded to the first time the malware was run on the system.

In the case of the Cassidian post, it would be interesting to hear if the FAST key was found in the Registry; if so, this might be good validation, and if not, this might indicate either a past version of the malware, or a branch taken by another author.

Something else that I saw that really helped me nail down the first time that the malware was executed on the system was the existence of a subkey beneath the Tracing key in the Software hive.  This was pretty fascinating and allowed me to correlate multiple artifacts in order to develop a greater level of confidence in what I was seeing.
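
For anyone who wants to look, that key is typically found beneath Microsoft\Tracing in the Software hive; here's a minimal sketch (again assuming the python-registry module, with a placeholder hive name) of listing the subkeys and their LastWrite times:

from Registry import Registry

reg = Registry.Registry("SOFTWARE")    # placeholder; Software hive exported from the image
tracing = reg.open("Microsoft\\Tracing")

for subkey in tracing.subkeys():
    print("%s  %s" % (subkey.timestamp().isoformat(), subkey.name()))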

Not specifically related to the Cassidian blog post, I've seen tweets that talk about the use of Windows shortcut/LNK files in a user's Startup folder as a persistence mechanism.  This may not be particularly interesting to an RE analyst, but for someone like me, that's pretty fascinating, particularly if the LNK file does not contain a LinkInfo block.
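
As an aside, checking whether a shortcut file contains a LinkInfo block is pretty simple; per the MS-SHLLINK format, it's a single bit in the LinkFlags field of the header.  A minimal sketch (the file name is a placeholder):

import struct

HAS_LINK_INFO = 0x00000002    # LinkFlags bit, per MS-SHLLINK

def has_linkinfo(path):
    with open(path, "rb") as f:
        header = f.read(24)
    # LinkFlags is the 4-byte field at offset 20 of the ShellLinkHeader
    flags = struct.unpack("<I", header[20:24])[0]
    return bool(flags & HAS_LINK_INFO)

print(has_linkinfo("suspect.lnk"))    # placeholder file name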

Once again, my goal here is not to suggest that the Cassidian folks have done anything wrong...not at all.  The information in their post is pretty interesting.  Rather, what I wanted to do is see if we, as a community, can't agree that there is a disconnect, and then begin working together more closely.  I've worked with a number of RE analysts, and each time, I've found that in doing so, the resulting analysis is more complete, more thorough, and provides more value to the customer.  Further, future analysis is also more complete and thorough, in less time, and when dealing with sophisticated threat actors, time is of the essence.

Saturday, December 28, 2013

Quick Post

RegRipper
At the end of this past summer and into the fall, I was working on the print matter for Windows Forensic Analysis 4/e, and I'm in the process now of getting extra, downloadable materials (I decided a while back to forego the included DVD...) compiled and ready to post.  During the entire process, and while conducting my own exams, I have updated a number of aspects of RegRipper...some of the code to RegRipper itself has been updated, and I've written or updated a number of plugins.  Some recently posted blogs have provided information that has led to updates, or at least to a better understanding of the artifacts themselves (how they're created or modified, etc.).

I figure that it'll be time soon for an update to RegRipper.  To that end, Brett has graciously provided me access to the Wordpress dashboard for the RegRipper blog, so this will be THE OFFICIAL SITE for all things RegRipper.

Now, I know that not everyone who uses RegRipper is entirely familiar with the tool, how to use it, and what really constitutes "Registry analysis".  My intention is to have this site become the clearing house for all information related to RegRipper, from information about how to best use the tool to new or updated plugins.

I think that one of the biggest misconceptions about RegRipper is that it does everything right out of the box.  What people believe RegRipper does NOT do has been a topic of discussion, to my knowledge, since a presentation at the SANS Forensic Summit in the summer of 2012.  Unfortunately, in most cases, folks have used presentations and social media to state what they think RegRipper does not do, rather than ask how to get it do those things.  Corey has done a fantastic job of getting RegRipper to do things that he's needed done.  From the beginning, RegRipper was intended to be community-based, meaning that if someone needed a plugin created or modified, they could go to one resource with the request and some sample data for testing, and that's it.  That model has worked pretty well, when it's been used.  For example, Corey posted a great article discussing PCA, Yogesh posted about another aspect of that topic (specifically, the AmCache.hve file), and Mari shared some data with me so that I could get a better, more thorough view of how the data is maintained in the file.  Now, there's a RegRipper plugin that parses this file.  The same thing is true with shellbags...thanks to the data Dan provided along with his blog post, there have been updates to the shellbags.pl plugin.

So, expect to see posts to the RegRipper site in 2014, particularly as I begin working on the updates to Windows Registry Forensics.

USB Devices
Speaking of the Registry...

Thanks to David, I saw that Nicole recently posted some more testing results, this time with respect to USB device first insertion.  She also has a post up regarding directory traversal artifacts for those devices; that's right, another shellbag artifact post!  Add this one to Dan's recent comprehensive post regarding the same artifacts, and you've got quite a bit of fascinating information between those two posts!

Reading through the posts, it's clear that Nicole's blog is definitely one that you want to add to your blogroll.

Yogesh posted to his blog recently on USB Registry artifacts on Windows 8, specifically with respect to some Registry values that are new to Windows 8.