Saturday, March 29, 2014

Writing DFIR Books: Questions

Based on my Writing DFIR Books post, Alissa Torres tweeted that she had a "ton of questions", so I encouraged her to start asking them.  I think that getting the questions out and asked now would be a great way to get started, for a couple of reasons.  First, the Summit is a ways away still, and it's unlikely that she's going to remember the questions.  Second, we don't know how the panel itself is going to go, so even if she did remember her "ton of questions", she may not be able to ask all of them.  Third, it's likely that some questions, and responses, are going to generate further questions, which themselves won't be asked due to time constraints.  Finally, it's unlikely that everyone is going to see the questions and responses, and it's likely that other panelists are going to have answers of their own.  So...I don't really see how someone asking their questions now is really going to take anything away from the panel that Suzanne Widup is putting together...if anything, I believe strongly that getting questions and answers out there now is going to make the panel that much better.

So, I scooped up some of the questions from her tweets, and decided to answer them via a medium more conducive to doing so, and here are my answers...

Forensics research is constantly in a state of new discovery. When does one stop researching and start writing?

The simple answer is that you're going to have to stop researching and start writing at some point.  It's up to you to decide, based on your topic, what you want to address, your outline, your schedule, etc.  The best advice I can give about this is to write the book the way you'd write a report...you'll want to be able to explain to a client (or anyone else) how you reached your conclusions 6 months or a year later, right?  The same holds true for the book...explain what you were doing in your research, in a clear and concise manner.  That way, if someone comes to you with a question about a new discovery after the book is published, you can discuss this new information intelligently.

Publishing Timelines
One thing to keep in mind about writing books is that the book doesn't immediately go to print as soon as you "put down your pen". Rather, once you've completed writing the manuscript, it goes into a review process (separate from the technical review process) and the proofs are then sent to you for review. Once you approve the proofs and send them back, it can be 2 or 3 more months before the book is actually available on shelves.  So, the simple fact is that a published book is always going to be just a bit behind new developments.  However, that doesn't make a book any less valuable...there are always new people coming into the field, and none of us knows everything, so a well-written book is going to be very useful, regardless.

If new research disproves something that you wrote, does it work against you later as an expert witness?

With respect to this question, writing a book is no different from conducting analysis and writing a report for a client.  Are you going to write something into a report that someone working for the client is going to disprove in a week or two when they read it?  If you found during your analysis that malware on the system had created a value beneath the user's Run key in order to remain persistent, are you going to say in your report that the malware started up each time the system was booted?  No, you're going to say that it was set to start whenever the user logged in, and because you did a thorough analysis, which included creating a timeline of system activity, you're going to have multiple data points to support that statement.

That is not to say that something won't change...things change all the time, particularly when it comes to DFIR work, and particularly with respect to Windows systems.  However, there's very likely going to be something that changed...some other application was installed on the system, some Registry value was set a certain way, a patch had been installed that modified a DLL, etc.

If you've decided to do "research" and add it to your book, do the same thing you would with a report that you're writing for a client.  Describe the conditions, the versions of tools and OS utilized, etc.  Be clear and concise, and caveat your statements as necessary.

When I was writing the fourth edition of Windows Forensic Analysis, I wanted to include updated information regarding Windows 8 and VSCs in chapter 3, so I took what was in that chapter in the third edition, and I ran through the process I'd described, using an image acquired from a Windows 8 system...and it didn't work.  So, I figured out why, and was sure to provide the updated information in the chapter.

Something else to keep in mind is that most publishers want you to have a technical reviewer or editor, someone who will be reviewing each chapter as you submit it.  You can stick with whomever they give you, and take your chances, or you can find someone you know and trust to hold you accountable, and offer their name to the publisher.  This is a great way to ensure that something doesn't "slip through the cracks".  Like a report, you can also have someone else review your work...submit it to peer review.  This way, you're less likely to provide research and documentation that is so weak that it's easily disproved.

As to the part about being an expert witness, Alissa said before, "forensics research is constantly in a state of new discovery".  I've never been an expert witness, but I could not imagine an attorney putting an expert witness on the stand to testify based on research or findings that are five years old, or so weak that they could be so easily disproved.   I mean, I'd hardly think that such a witness would qualify as "expert".

You all have to address time management as well - how did you juggle paid work/full-time job with book writing?

Short answer: you do.

Okay...longer answer:  This is something you have to consider before you even sign a contract...when am I going to write?  How often, how much, etc?

I learned some useful techniques while writing fitness reports in the Marine Corps, one being that it's easier to correct and modify something than it is to fill empty space.  Write something, then step away from it.  When I wrote fitreps, I'd jot some bullets down, flesh out a paragraph, and step away from it for a day or so.  Coming back to it later would give me a fresh perspective on what I was writing, allowing my thoughts to marinate a bit.  Of course, it also goes without saying that I didn't wait until the last minute to get started.

Something that I've recommended to folks before they start looking at signing a contract to have a book published is to try writing a couple of chapters.  I will provide a template for them...the one that I use for my publisher...and have them try writing a chapter or two.  I think that this is a very good approach to getting folks to see if they really want to invest the time required to write a book.  One of the things I've learned about the DFIR community, and technical folks as a whole, is that people really do not like to write notes, reports, books, etc.  So the first hurdle is for a potential author to see what it's like to actually write, and it's usually much harder if they haven't put a good deal of thought into what they want to write, and they haven't started by putting a detailed outline together.  Once something is ready for review, I then offer to take a look at it and provide feedback...writing a book, just like a report, isn't about the first words you put down on paper.  Then the potential author gets to see what that part of the process is like...and it's like having to do 50 push ups, and then being told to do them over because 19 of them didn't count.  ;-)

So far, good questions.  Like I said, I think that getting some of these questions out there and answered now really doesn't take away from the panel, but instead, brings more attention to it.  And it appears that Suzanne agrees, so keep the questions coming...

Addendum:  Shortly after I tweeted this blog post, Corey Harrell tweeted this question:

What's the one thing you know now that you wish you knew writing your first book?

That it's so hard to get input or thoughtful feedback from the community.  Most often, if you do get anything, it's impossible to follow up and better understand the person's perspective.

Seriously...and I'm not complaining.  It's just a fact that I've come to accept over the years.

Most folks who do this sort of thing want some kind of feedback.  When I taught courses, I had feedback forms.  I know other courses, and even some conferences, include feedback forms.  It's this interaction that allows for the improvement of things such as books, open source tools, and analysis processes.  I'm a firm believer that it's impossible to know everything, but by engaging with each other, we can all become better analysts.  The great thing about writing a book, in this context, is that I've taken the first step by putting something out there to be scrutinized.

One of the things I've found over time is that my books have been and are being used in academic and training (government, military) courses.  This is great, and I really appreciate the fact that the course developers and instructors find enough value in my books to use them.  When I have had the chance to talk to some of these instructors, they've mentioned that they have thoughts on what could be done...what could be added or modified in the books to make them more useful for their purposes.  When I've asked them to share their thoughts, or asked them to elaborate on statements such as "...cover anti-forensics...", most often, I don't hear anything.

Now and again, I do hear through the grapevine that someone has/had comments about a book, or specific material in one of my books, but what I've yet to see much of, beyond the reviews posted on Amazon, is thoughtful feedback on how the books might be improved.  That is not to say that I haven't received it...just recently I did receive some thoughtful feedback on one of my books from a course instructor, but it was a one-shot deal and it's been impossible to engage in a discussion so that I can better understand what they're asking.

Had I known that when writing my first book, I would've had different expectations.

Friday, March 28, 2014

Writing DFIR Books

Suzanne Widup (of Verizon) recently asked me to sit on an author's panel that she's putting together, in order to make the rounds of several conferences.  I won't be available for the panel at CEIC, but David Cowen will be, so be sure to stop by and see what he and the other panel members have to share about their experiences.

I thought I'd put together a blog post on this topic for a couple of reasons.  First, to organize my thoughts a bit, and provide some of those thoughts.  Also, I wanted to let folks know that I'll be a member of the author panel at the SANS DFIR Summit in Austin, TX.

How did I get started?
Several years ago, a friend asked me to be a tech reviewer for a book that he was co-authoring, and during the process, I provided some input into the book itself.  It wasn't a great deal of information really, just a paragraph or so, but I provided more than just a terse "looks good" or "needs work".  After the book was completed, the publisher asked my friend if they knew of anyone who wanted to write a book, and he provided three names...of the three, I was apparently the only one to respond.   From there, I went through the process of writing my first book.  After that book was published, it turned out the publisher had no intention of continuing with follow-on editions, and our contract provided them with the right of first refusal, which they did, and I moved on to another publisher.

Why write a book?
I can't speak for everyone, but the reason I decided to write a book initially was because I couldn't find a book that covered the digital forensic analysis of Windows systems in the manner that I would've liked.  Like many in the DFIR community, I've had notes and stuff sitting all over, and in a lot of cases, those notes were on systems that I no longer have; over time, I've upgraded systems, or installed new OSs.  Writing a book to use as a reference means that rather than rummaging all over looking for something, I can reach up to my bookshelf, open to the chapter in question and find what I'm looking for.

What goes into writing a book?
A lot of work.  Writing books for the DFIR community is hard, because there's so much information out there that is constantly changing.  Even if you pick a niche to focus on, it's still a lot of work, because historically, our community isn't really that good at writing.  People in the DFIR community tend to not like to write case notes and reports, let alone a book.  

For most, the process of writing a book starts with an idea, and then finding someone to publish the book.  Once there's interest from a publisher, the potential author starts the process of putting the necessary information together in the format needed by the publisher, which is usually a proposal or questionnaire of some kind.  If the proposal is accepted, the author likely receives a contract spelling out things like the timeline for the book development, page/word count, specifics regarding advances and royalties, etc.  Once the contract is signed, the writing process begins.  As the authors send the chapters in, they are subjected to a review process and sent back to the author for updates.  Once the manuscript...chapters, front- and back-matter, etc...are all submitted, the proofs are put together for the author to review, and once those are done, the book goes to printing.  The time between the author sending the proofs back and the book being available for shipping can be 2 - 3 months.

I'm not saying that I agree with this process; I think that there is a lot that can be done to not only make the entire process easier, but also result in better quality DFIR books being available.  However, my thoughts on this are really a matter for another blog post.

How do you decide what to put in the book?
When you feel like you would like to write a DFIR book, start with an outline.  Seriously.  This helps organize your thoughts, and it also helps you see if there's enough information to put into a book. Preparation is key, and this is true when taking on the task of writing a book.  I've found over time that the more effort I put into the outline and organizing my thoughts ahead of time, the easier it is to write the book.  Because, honestly, we all know how much folks in the DFIR profession like to write...

What do you get from writing a book?
It depends on what you want from writing a book, and what you put into it.  For example, I started writing books because I wanted a reference, some sort of consolidated location for all of the bits and pieces, tips and tricks that I have lying around.

First, let me clear something up...some people seem to think that when you write a book, you make a lot of money.  Yes, there are royalties...most contracts include them...but it's also really easy to sit back and assume what the percentages are and what the checks look like, and the fact is that most people that think like that are wrong.  I've had people ask me why I didn't include information about Windows Mobile Devices in my books...either for full analysis or just the Registry...and they've suggested that I make enough money in royalties to purchase these devices.  If you think that writing a book for the DFIR community is going to make you enough money to do something like that, then you probably shouldn't start down that road.  Yes, a royalty check is nice, but it's also considered taxable income by the IRS, and it does get reported to the IRS, and it does get taxed.  This takes a small amount and makes it smaller.  I'm not complaining...I have engaged in this process with my eyes open...I'm simply stating the facts as they are so that others are aware.

One thing that you do get from writing a book, whether you want it or not, is notoriety.  This is especially true if the book is useful to folks, and looked upon favorably...they get to know your name (the same is also true if the book ends up being really bad).  And yes, this notoriety kind of puts you in a club because honestly, the percentage of folks who have written successful DFIR books is kind of small.  But this can also have a down-side; a lot of people will look at you as unapproachable.   I was told once during an interview process that the potential employer didn't feel that they could afford me...even though we hadn't talked salary, nor salary history...because I'd written books.  I've received emails from people in the industry, some that I've met in person, in which they've said that they didn't feel that they could ask me a question because I'd written books.

What I get from writing books is the satisfaction of completing the book and seeing it on my bookshelf.  I've actually had occasion to use my books as references, which is exactly what I intended them to be.  I've gone back and looked up tools, commands, and data formats, and used that information to complete exams.  I've also been further blessed, in that some of my books have been translated into other languages, which adds color to my bookshelf.

Book Reviews
Book reviews are very important, not just for marketing the book, but because they're one way that the author gets feedback and can decide to improve the book (if they opt to develop another edition).

Book reviews need to be more than "...chapter 1 contains...chapter 2 covers..."; that's a table of contents (ToC), not a book review, and most books already have a ToC.  

A review of a DFIR book should be more about what you can't get from the ToC.  If you review a restaurant on Yelp, do you repeat the menu (which is usually already available online), or do you talk about your experience?  I tend to talk about the atmosphere, how crowded the restaurant was, how the service was, how the food was, etc.  I tend to do something similar when reviewing DFIR books.  The table of contents is going to be the same regardless of who reads the book; what's going to be different is the reader's experience with the book.

When writing a book review and making suggestions or providing input, it really helps (the author, and ultimately the community) to think about what you're suggesting or asking for.  For example, now and again, one of the things I've been asked to add to the WFA book is a chapter on memory analysis.  Why would I do that if the Volatility folks (and in particular, Jamie Levy) have already put a great deal of effort into the tool documentation AND they have a book coming out?  The book is currently listed as being 720 pages long...what would I be able to provide in a chapter that they don't already provide in a much more complete and thorough manner?

Now, I know that not everyone who purchases a book is going to actually open it.  I know this because there are folks who've asked me questions and, knowing that they own a copy of the book, I've referenced the appropriate page number in the book.  But if you do have a book, and you have some strong feelings about it (whether positive or negative), I would strongly encourage you to write a review, even if you're only going to send it to the author.  The reason is that if the author has any thought of updating the book to another edition, or writing another book altogether, your feedback can be helpful in that regard.  In fact, it could change the direction of the book completely.  If you share the review publicly, and the author has no intention of updating the book, someone else may see your review and that might be the catalyst for them to write a book.

Saturday, March 22, 2014

Coding for Digital Forensic Analysis

Over the years, I've seen a couple of questions on the topic of coding for digital forensic analysis.  Many times, these questions tend to devolve into a quasi-religious debate over the programming language used, and quite honestly, that detracts from the discussion as a whole, because regardless of the language used, these questions are very often more about deconstructing or reconstructing data structures, processing logs, or simply obtaining context and meaning from the available mass of data.

Programming languages abound, and from what I've seen, the one chosen comes down to personal preference, usually based on experience and knowledge of the language.

I started programming BASIC on the Apple IIe back around 1982.  I typed a couple of BASIC programs into a Timex-Sinclair 1000, and then took the required course in BASIC programming my freshman year in college.  In high school, I took the brand new AP Computer Science course, which focused on using PASCAL...I ended up using TurboPASCAL at home to compile my programs and bring them in on a 5.25 in. floppy drive.  In graduate school, I took a C/C++ programming course, but it didn't involve much more than opening a file, writing to it, and then closing it.  I also did some M68000 assembly language programming. We used MatLab a lot in some of the different courses, particularly digital signal processing and neural networks, and I used it to perform some statistical analysis for my thesis.

In 1999, I started teaching myself to program Perl in order to have something to do.  I was working as a consultant at Predictive Systems, and we didn't have a security practice at the time.  I could hear the network operations team talking about how they needed someone who could program Perl, so I started teaching myself what I could so that maybe I could offer some assistance.  From there, I branched out to interacting with Windows systems through the API provided by Dave Roth's Perl modules.  At one point, while working at TDS, I had a fully-functional product that we were using to replace the use of ISS's Internet Scanner, due to the number of false positives and cryptic responses we were receiving.

Since then, I've used Perl for a wide variety of tasks, from IIS web server log processing to RegRipper to decoding binary data structures.

However, I'm NOT suggesting that Perl is the be-all and end-all of programming languages, particularly for DFIR work. Not at all.  All I've done is provide my experience.  Over the years, other programming languages have been found to be extremely useful.  I've seen R used for statistical analysis, and it makes a lot of sense to use this language for that task.  I've also seen a lot of programming in the DFIR space using C# and .NET, and even more using Python.  I've seen folks switch from another language to Python because "everyone is doing it".  I've seen so much use of Python that I've started learning it myself (albeit slowly) in order to better understand the code, and even create my own.  The list of projects written in Python is pretty extensive, so it just makes sense to learn something about this language.

Defining the Problem
The programming language you use to solve a problem is really irrelevant.  I've got bits of Perl code lying around...stuff for parsing certain data structures, printing binary data to the console in hex editor format, etc.  If I need to pull something together quickly, I'm likely going to use a snippet of Perl code that I already have.  I'm sure that I'm no different from any other analyst in that regard.
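As an illustration of the kind of snippet I mean, here is a minimal sketch of a routine that prints binary data in hex editor format.  It's written in Python rather than Perl, the function name hexdump is made up for this example, and the layout (offset, hex bytes, printable characters) is just one common convention.

```python
def hexdump(data: bytes, start: int = 0, width: int = 16) -> str:
    """Return a hex-editor style listing: offset, hex bytes, printable ASCII."""
    pad = width * 3 - 1  # width of the hex column, so short rows still line up
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show non-printable bytes as '.', the way most hex editors do.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{start + i:08X}  {hex_part:<{pad}}  {text}")
    return "\n".join(lines)


# Made-up sample bytes, just to show the layout.
print(hexdump(b"regf\x00\x01\x02\x03hbin"))
```

The start parameter lets the listing show real file offsets when only a slice of a larger file is being printed.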

But when it comes to the particular language or approach I'm using, it depends on what I'm trying to achieve...what are my goals?  If I'm trying to put something together that I'm going to either have a client use, or leave behind after I'm done for the client to use, I may opt for a batch file.  If I need something quickly, but with more flexibility than is offered by a batch file, I may opt for a Perl script.

I've found over time that some of the programming languages used can be difficult to work with...what I mean by that is that some of the tools written and made available by the authors display their results in a GUI, and are closed source.  So, you can't see what the tool is doing, and you can't easily incorporate the output of the tools using techniques like timeline analysis.

Task-Specific Programming
Some folks have asked about resources for learning programming specifically for DFIR work; I'm not sure if there are any.  What it comes down to is, what do you want to do?  If you want to read binary data and parse out data structures based on some definition, then C/C++, C#, Python, Perl, and some other languages work well.  For many, some snippets of code are already available online for these tasks.  If you're trying to process text-based log files, I've found Perl to be well-suited for this task, but if you're more familiar with and comfortable with Python, go for it.
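To make the log-processing case a bit more concrete, here is a rough sketch in Python of reading a text log in the W3C extended format (the format IIS web server logs commonly use), where a "#Fields:" directive names the columns.  The function name parse_w3c_log and the sample lines are made up for illustration; real logs will have more fields and more edge cases than this handles.

```python
def parse_w3c_log(lines):
    """Yield one dict per entry from W3C extended log format lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; "#Fields:" names the columns that follow.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        # Skip entries that don't match the current field list.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# Hypothetical sample data, not taken from a real system.
sample = [
    "#Software: example",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2014-03-22 10:15:01 192.0.2.10 GET /index.html 200",
    "2014-03-22 10:15:07 192.0.2.10 GET /admin.asp 404",
]

for entry in parse_w3c_log(sample):
    if entry["sc-status"] == "404":
        print(entry["date"], entry["time"], entry["cs-uri-stem"])
```

The same few lines could just as easily be written in Perl; the point is the goal (pull specific fields out of a mass of text), not the language.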

When it really comes down to it, it isn't about which programming language is "the best", it's about what your goal is, and how you want to go about achieving it.

Python tutorial
80+ Best Free Python Tutorials/books
List of free programming books, Python section

Saturday, March 01, 2014

Reconstructing Data Structures

I've posted before on the topic of understanding data structures (here, here, and here), and some recent analysis brought this back to me yet again.  I had an opportunity to make use of my understanding of data structures, specifically within the Windows Registry, in order to attempt to gain some information from a file where the tools normally used by analysts had failed.

The situation was that we had a Windows system that had been compromised...the bad guy had accessed the system using stolen credentials, then used it to move laterally to other systems.  Between this and the response activities, the system had been infected with malware that overwrites and deletes files.  A responder had collected potentially usable files from this system, including the Registry hives from the compromised user account, and had then run some Registry parsing tools against the hives, none of which worked.  Yep.  All of the tools failed...even the viewers.

I was sent a Registry hive file and a list of "strings of interest", and asked to provide some context to those strings, and if possible, when those strings had been created within the hive file.  My first step was to get an idea as to why the tools used to view and parse the hive had reportedly failed, so I opened the file in a hex editor.

The first thing I noticed was that there was no 'regf' header.  In Windows Registry hive files, the first four bytes of the file should read "regf" (or "72 65 67 66" in hexadecimal)...there was no header.  Next, I looked for the first 'hbin' section, which is usually found at offset 0x1000 within the file, and starts with 'hbin'.  In this case, I didn't find an 'hbin' section until I reached offset 0x10000 in the file...and the entire space up to that point was all zeros.  I could tell right away that this wasn't good.  Even worse, when I was finally able to locate a key node structure within the hive file, it wasn't the root node.  In fairly short order, it was easy to see why all of the parsing tools had failed, as none had been able to discern a recognizable file structure.
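The checks described above are easy to script.  What follows is a minimal sketch in Python, not the process I actually used (I did this by eye in a hex editor).  It assumes, as stated above, that a healthy hive starts with 'regf' and has its first 'hbin' at offset 0x1000; the function name triage_hive and the dictionary keys are made up for this example.

```python
HBIN_ALIGN = 0x1000  # first hbin is usually at 0x1000, per the text


def triage_hive(data: bytes) -> dict:
    """Report whether the 'regf' header is present and where the first 'hbin' is."""
    has_regf = data[:4] == b"regf"
    first_hbin = None
    # Only look at 0x1000-aligned offsets, where an hbin would normally start.
    for off in range(0, len(data), HBIN_ALIGN):
        if data[off:off + 4] == b"hbin":
            first_hbin = off
            break
    zeros_before = first_hbin is not None and not any(data[:first_hbin])
    missing = 0
    if first_hbin is not None and first_hbin > HBIN_ALIGN:
        # Number of 0x1000-byte blocks between 0x1000 and the first hbin found.
        missing = (first_hbin - HBIN_ALIGN) // HBIN_ALIGN
    return {
        "has_regf": has_regf,
        "first_hbin": first_hbin,
        "zeros_before_hbin": zeros_before,
        "missing_blocks": missing,
    }


# Synthetic stand-in for the damaged file: zeros up to 0x10000, then an hbin.
damaged = bytes(0x10000) + b"hbin" + bytes(0xFFC)
print(triage_hive(damaged))
```

Run against the synthetic buffer, this reports no 'regf' header, a first 'hbin' at 0x10000, and nothing but zeros before it, which matches what I saw in the hex editor.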

Each hbin section is 4096 (0x1000) bytes in size, which means that if the first hbin section was located at offset 0x10000 within the file, I was missing over a dozen complete hbin sections.  Not a great way to get started, eh?  As I scrolled through the rest of the file quickly, I could see what looked like legitimate key and value nodes, but I could also see other sections of the file...large sections...that were full of zeros, as well as some that were full of binary stuff that made no sense whatsoever.  In some cases, I could see sections that contained what appeared to be Unicode strings, but there were no discernible structures surrounding those strings.

When I was reading over this article prior to posting it, I tried to imagine that last paragraph as scary as possible, just for effect. I pictured myself reading this out loud like a scout leader telling a bunch of scouts a ghost story around a campfire, or huddled under a blanket with a flashlight. I don't know if that helps get the point across about how badly damaged this file was, or if it was just funny.   I mean, for me, saying, "...the first hbin in the hive file was found at offset 0x10000..." IS a horror story!  Either way, in attempting to provide anything at all, I had my work cut out for me...this was going to be tough.

Some background about myself...I was "trained" as an electrical engineer; that is, my undergraduate studies were in EE.  One of my professors would constantly say that "electrical engineers are inherently lazy", meaning that rather than making a complicated solution, or worse, making wild, unsupported assumptions about what we were looking at, electrical engineers would always seek the simplest solution. He even told us a story once about how a radar system used across the Air Force had been "fixed" by soldering a resistor in parallel to another resistor on a circuit board, rather than replace the entire circuit board.  Seeking a simple solution, I thought it best to seek out some reference material for help.  My reference for this work was/is Windows Registry Forensics; in this case, the lower half of pg 26 (in the soft cover edition).

By now I pretty much knew that I had my work cut out for me.  I had a list of strings of interest, but just the strings.  If I was going to make any sense of these strings at all, I'd have to know where they were located within the file.  So, I ran MS/SysInternals strings.exe with the -o switch, so that I could get the offset of where each string was located within the file.  Once I had done that, I noted a couple of the strings of interest, and opened the hive file in a hex editor.  I picked one of the strings...the output of strings gives me the offset in decimal, so I converted the offset to hexadecimal...and located it in the file.  I did this with several of the strings, and in each case, my findings fell into one of three categories:

1.  The string was a value name.
Pp. 29 - 31 of WRF cover the structure of a Registry value.  The value node header is 20 bytes long, and starts with a 4-byte (DWORD) value that is the size of the overall structure itself (i.e., the header and the name).  An example value is illustrated in figure 1.

Fig 1: Registry value structure
In figure 1, the value header starts 8 bytes into the listing, with "D8 FF FF FF".  This value translates to -40, and tells us that the structure is 40 bytes in size.  Next, we see "vk", which is the value node identifier.  The next two bytes, "10 00", tell us that the name of the value is 16 bytes long.  The name starts immediately following the value structure, so there is no offset to the name listed in the value header; in this case, we can see the name, "GroupByDirection", which is clearly 16 bytes in size.

The remaining elements of the value header can be found in table 1.2 in WRF.  The value types can be found in table 1.3.

2.  The string was the name of a deleted value.
In several instances, I located the string in question and following the structure of a 'nearby' value, found that the string was indeed the value name.  However, when a node (key, value) is deleted in a Registry hive file, the size value (first DWORD) is converted to a positive number.

Consider figure 1 again...the first DWORD is "D8 FF FF FF", which is equal to -40.  Had the value been deleted, the DWORD would be "28 00 00 00".

3.  The string was value data.
In some cases, I found strings at offsets where there was no 'nearby' value (vk) or key (nk) node.   Instead, immediately before the string was what appeared to be a DWORD indicating a negative size value.  In figure 2, the value is '88 FF FF FF'.

Fig 2: Value Data

The string illustrated in figure 2 is an excerpt of a value data entry that I extracted from one of my own systems, but it illustrates the point very well.  In this case, the first DWORD translates to -120, indicating that the string value is 120 bytes in length.  Again, figure 2 illustrates an excerpt of the Registry value data, not the entire string.
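All three categories come down to reading the first DWORD of a structure as a signed, little-endian number.  Here's a minimal Python sketch of that check, assuming the layout described above (size DWORD, then "vk", then a 2-byte name length, with the name right after the 20-byte header).  The function name describe_cell and the sample bytes are made up for illustration; the header fields I haven't discussed are simply zero-filled, and the name is assumed to be ASCII.

```python
import struct

VK_HEADER_LEN = 20  # header bytes following the size DWORD, as described above

def describe_cell(buf, pos):
    """Interpret the structure whose size DWORD starts at pos in buf."""
    (size,) = struct.unpack_from("<i", buf, pos)  # signed, little-endian
    info = {
        "length": abs(size),
        "allocated": size < 0,  # negative = in use, positive = deleted
        "type": "data or unknown",
    }
    if buf[pos + 4:pos + 6] == b"vk":
        (name_len,) = struct.unpack_from("<H", buf, pos + 6)
        name_start = pos + 4 + VK_HEADER_LEN
        info["type"] = "value"
        # Assumption: the name is stored as ASCII
        info["name"] = buf[name_start:name_start + name_len].decode("ascii", "replace")
    return info

# Hypothetical sample modeled on figure 1: a 40-byte value structure
sample = (struct.pack("<i", -40) + b"vk" + struct.pack("<H", 16)
          + bytes(16) + b"GroupByDirection")

# The same structure as it would look after deletion (size flipped to +40)
deleted = struct.pack("<i", 40) + sample[4:]
```

Calling describe_cell(sample, 0) reports an allocated value with its name, while describe_cell(deleted, 0) reports the same value as unallocated.  A structure with no "vk" identifier, such as the '88 FF FF FF' example, just comes back with its length and allocation state.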

Just to be clear, I wasn't looking for all available strings, and I also did not look at all of the strings of interest. By this point, I had looked at about a dozen or so of the strings of interest, out of several dozen.  I mention this because some of the strings of interest could have been key names, but at this point, none of the strings I'd looked at were, in fact, key names.  I should mention that some of the strings appeared as Unicode strings in those sections of the file that I mentioned earlier in this post...while the string was clearly visible, and of interest in the context of the overall examination, I could not find any discernible structure (Registry key or value node, shell item, etc.) near or surrounding the string.

Now, the Windows Registry is described as a hierarchical database, but it's also something of a singly-linked list of structures.   What I mean by this is that the root key node in a Registry hive file points to other keys (subkeys) and values.  Subkeys point to other keys and values, and values only point to data.  When I say "point to", I'm referring to the offset within the value or key header structure that tells us where the next element is located.  This offset is not measured from the beginning of the file (from 0); rather, it starts at the beginning of the first hbin structure, which is (usually) at offset 0x1000.  Let's say that you find an offset value within a value header structure that points to the data, and the offset is "D8 54 01 00".  Translating endianness, we would look for that data structure at 0x0154D8 + 0x1000 within the hive file, or at offset 0x0164D8 from the beginning of the file.
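The offset arithmetic is easy to get wrong by hand, so here's a short Python sketch of it.  The function name is my own, and HBIN_START assumes the first hbin structure really is at 0x1000, which (as noted) is usually but not always the case.

```python
import struct

HBIN_START = 0x1000  # assumed location of the first hbin structure

def cell_offset_to_file_offset(raw):
    """Convert the 4 raw bytes of an offset field to an offset from the start of the file."""
    (relative,) = struct.unpack("<I", raw)  # little-endian, unsigned
    return relative + HBIN_START

print(hex(cell_offset_to_file_offset(bytes.fromhex("D8540100"))))  # 0x164d8
```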

Because key nodes point to other key nodes and value nodes, and value nodes point to data, the Registry can be described as a singly-linked list.  By contrast, active processes in memory are maintained as a doubly-linked list...each process points to the next process in the list, as well as the previous process in the list.  In the Registry, a value does not point to its parent key, nor do keys point to their parent key.  This can make it difficult to reconstruct a damaged Registry hive file, particularly when strings of interest that may be pertinent to the investigation can be seen within the file.

In the instance I was looking at, just scrolling through the hive file, I could see that a good deal of it had been destroyed.  In fact, it looked as if the virus had successfully overwritten entire sectors with zeros, and in some other cases, the sectors that made up the rest of the file I was sent did not even contain discernible structures.  I knew that I couldn't start at the beginning of the file and reconstruct the hive file...too much was missing.  One approach might have been to comb through the file and catalog all of the key and value node structures that I could locate (both allocated and unallocated), and then run consecutive scans to (a) locate associated value data, and (b) correlate the values to the appropriate keys.

What I did instead was pick out a couple of the more interesting strings, and locate them within the hive file.  I had the offset to the string from the strings.exe output, so I went to that offset in the file, found the string, and then found the beginning of the structure, of which the string was a member.  I noted the structure type (value, data) and recorded the offset for the structure.  I then subtracted 0x1000 from the offset, reversed the endianness, and searched for the value in the hive file.

In one instance, I located a string that was a value name, and the value structure began at offset 0x164D8 within the file.  Subtracting 0x1000, I got 0x154D8, and reversing endianness, I got "D8 54 01 00".  I then searched for that hex string in UltraEdit.  What that led to was a structure that maintained a list of offsets to values associated with a key.  So, I repeated this process, using the location where this structure started.  What that led me to discover was that the key had been obliterated from the hive; it wasn't "deleted" in the sense that the first DWORD had been converted to a positive value...it simply no longer existed in the file.
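I did the search by hand in a hex editor, but the same "walk backwards" step can be sketched in a few lines of Python.  This is not a tool I used during the exam; the function names are hypothetical, and HBIN_START again assumes the first hbin structure sits at 0x1000.

```python
import struct

HBIN_START = 0x1000  # assumed location of the first hbin structure

def reference_pattern(file_offset):
    """Bytes that a parent structure would contain if it points at file_offset."""
    return struct.pack("<I", file_offset - HBIN_START)

def find_references(data, file_offset):
    """Return every position in data where that 4-byte pattern occurs."""
    pattern = reference_pattern(file_offset)
    hits = []
    pos = data.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = data.find(pattern, pos + 1)
    return hits
```

With the contents of the hive file read into data (opened in binary mode), find_references(data, 0x164D8) lists the places where "D8 54 01 00" appears.  Not every hit is a real pointer; four bytes can match by coincidence, so each one still has to be checked against the structure around it.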

As laborious as it is, this process can be used to gain some modicum of context and value for the investigator.

When confronted with a hive file as badly damaged as the one I was looking at, there are basically two ways to go about collecting some modicum of information and context from the file.  The first is to comb through the entire file, cataloging each discernible structure, as if you were putting puzzle pieces out on a table.  Each structure would have to include not just the information it contained, but also the offset to where it was located within the file.  Once the scanning process had been completed, the pieces could be assembled much like a puzzle, albeit with a lot of missing pieces.
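To give a rough idea of what that cataloging pass might look like, here's a naive Python sketch.  I did not write or run this during the exam; catalog_nodes is a hypothetical name, and the plausibility check (a non-zero size that fits within the remaining data) is my own assumption for cutting down on coincidental matches, not something taken from a specification.

```python
import struct

def catalog_nodes(data):
    """Return (file_offset, kind, allocated, length) for each candidate node."""
    found = []
    for signature, kind in ((b"nk", "key"), (b"vk", "value")):
        pos = data.find(signature, 4)  # start at 4 so a size DWORD can precede it
        while pos != -1:
            start = pos - 4  # the size DWORD sits just before the identifier
            (size,) = struct.unpack_from("<i", data, start)
            # Crude sanity check: non-zero size that fits in what remains
            if size != 0 and abs(size) <= len(data) - start:
                found.append((start, kind, size < 0, abs(size)))
            pos = data.find(signature, pos + 1)
    found.sort()  # order the puzzle pieces by where they sit in the file
    return found
```

Each entry records where the piece was found, whether it looks like a key or a value, and whether it's allocated or deleted.  Because "nk" and "vk" are only two bytes long, a scan like this will still produce false positives in a damaged file, so the results are candidates to be verified rather than answers.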

The other way to go about this would be to do something like what I did...find a string of interest, locate it within the file, determine what type of structure it belonged to (if it was part of a structure...), and attempt to reconstruct the path based on knowledge of the structures.  I opted for this approach because it gave me some answers quickly.

My biggest takeaway from this exercise was that understanding the structure of what I was looking at not only allowed me to troubleshoot the issue and determine why the tools weren't working, but also allowed me to provide some information, context, and insight regarding the data when those tools were not able to do so.

Final Thought
One of the comments to one of my previous posts on this topic included the following question:

Another question might be: With all the data structures out there, is it even possible to truly understand them all?

I would suggest that, no, it's not possible for any single analyst to understand all of the structures available on a Windows system.  That's why none of us try to do so...instead, some of us document what we know, in books or by posting to a wiki, and then offer ourselves as resources.  That way, if someone has a question, all they have to do is ask.  So, if an analyst runs across something that they don't understand, they can continue to not understand it, or they can ask someone who appears to know something about the topic, and in fairly short order, get an understanding.