Sunday, April 19, 2015

Micro- & Mini-Timelines

I don't always create a timeline of system activity...but sometimes when I do, I don't have all of the data from within the system image available.  Many times, I will create a mini-timeline because all I have available is a limited set of data sources, or even just a single data source.  I've been sent a single Event Log (.evt file) or a couple of Windows Event Logs (.evtx files), and asked to answer specific questions, given some piece of information, such as an indicator or a time frame.  I've had other analysts send me Registry hive files and ask me to determine activity within a specific time frame, or associated with a specific event.

Mini, micro, and even nano-timelines can assist an analyst in answering questions and addressing analysis goals in an extremely timely and accurate manner.

There are times when I will have a full image of a system, and only create a mini- or nano-timeline, just to see if there are specific indicators or artifacts available within the image.  This helps me triage systems and prioritize my analysis based upon the goals that I've been given or developed.  For example, if the question before me is to determine if someone accessed a system via RDP, I really only need a very limited number of data sources to answer that question, or even just to determine if it can be answered.  I was once asked to answer that question for a Windows XP system, and all I needed was the System Registry hive file...I was able to show that Terminal Services was not enabled on the system, and never had been.  Again, the question I was asked (my analysis goal) was, "...did someone use RDP to access this system remotely?", and I was able to provide the answer (or, more specifically, determine whether that question could be answered).

Sometimes, I will create a micro-timeline from specific data sources simply to see if there are indicators that pertain to the time frame that I've been asked to investigate.  One example that comes to mind is the USN change journal...I'll extract the file from an image and parse it, first to see if it covers the time frame I'm interested in.  From there, I will either extract specific events from that output to add to my overall timeline, or I'll just add all of the data to the events file so that it's included in the timeline.  There are times when I won't want all of the data, as having too much of it can add a significant amount of noise to the timeline, drowning out the signal.
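
Extracting just the events that fall within a window of interest is easy to script once the USN data has been parsed into an events file.  The sketch below is illustrative only; it assumes the five-field, pipe-delimited TLN events-file format ("epoch|source|host|user|description"), and the sample lines are made up, so adjust it to whatever your parser actually produces:

```python
# Sketch: pull events within a time frame of interest out of a parsed USN
# change journal.  Assumes the pipe-delimited TLN events-file format,
# "epoch|source|host|user|description"; adjust to your parser's output.

def filter_events(lines, start_epoch, end_epoch):
    """Return only those event lines whose timestamp falls within the window."""
    kept = []
    for line in lines:
        fields = line.split("|")
        if not fields[0].isdigit():
            continue  # skip headers or malformed lines
        if start_epoch <= int(fields[0]) <= end_epoch:
            kept.append(line)
    return kept

# Hypothetical sample output from a USN journal parser
sample = [
    "1428000100|USN|HOST-01|-|evil.exe - USN_REASON_FILE_CREATE",
    "1428900000|USN|HOST-01|-|report.docx - USN_REASON_DATA_EXTEND",
]
in_window = filter_events(sample, 1428000000, 1428100000)
```

Events that make the cut can simply be appended to the overall events file before the timeline is generated.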

There are times when I simply don't need all of the available data.  For example, consider a repurposed laptop (provided to an employee, later provided to another employee), or a system with a number (15, 25, or more) of user profiles; there are times that I don't need information from every user profile on the system, and including it in the timeline will simply make the file larger and more cumbersome to open and analyze.

I've also created what I refer to as nano-timelines.  That is, I'll parse a single Windows Event Log (.evtx) file, filter it for a specific event source/ID pair, and then create a timeline from just those events so that I can determine if there's something there I can use.

For example, let's say I'm interested in "Microsoft-Windows-Security-Auditing/5156" events; I'd start by running the Security.evtx file through wevtx.bat:

C:\tools>wevtx.bat F:\data\evtx\security.evtx F:\data\sec_events.txt

Now that I have the events from the Security Event Log in a text file, I can parse out just the events I'm interested in:

C:\tools>type F:\data\sec_events.txt | find "Microsoft-Windows-Security-Auditing/5156" > F:\data\sec_5156_events.txt

Okay, now I have an events file that contains just the event records I'm interested in; time to create the timeline:

C:\tools>parse -f F:\data\sec_5156_events.txt > F:\data\sec_5156_tln.txt

Now, I can open the timeline file, see the date range that those specific events cover, as well as determine which events occurred at a specific time. This particular event can help me find indications of malware (RAT, Trojan, etc.), and I can search the timeline for a specific time frame, correlating outbound connections from the system with firewall or IDS logs.  Because I still have the events file, I can write a quick script that will parse the contents of the events file, and provide me statistics based on specific fields, such as the destination IP addresses of the connections.
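
As an example of what such a quick script might look like, the sketch below tallies IPv4 addresses found in the 5156 events file.  This is just an illustration...the sample lines are made up, and the regex grabs every dotted quad in each line (so source addresses get counted, too); you'd filter on your events file's actual field layout:

```python
# Sketch: tally IPv4 addresses appearing in the Security-Auditing/5156
# events file.  Field layout is assumed for illustration; the regex pulls
# every dotted quad from each line.
import re
from collections import Counter

def ip_stats(lines):
    ip_pat = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
    counts = Counter()
    for line in lines:
        counts.update(ip_pat.findall(line))
    return counts

# Hypothetical event lines for illustration
sample = [
    "...Security-Auditing/5156;Outbound;10.1.1.5 -> 203.0.113.7:443",
    "...Security-Auditing/5156;Outbound;10.1.1.5 -> 203.0.113.7:443",
    "...Security-Auditing/5156;Outbound;10.1.1.5 -> 198.51.100.22:80",
]
for ip, count in ip_stats(sample).most_common():
    print(ip, count)
```

A recurring destination address across many events is exactly the sort of thing to correlate against firewall or IDS logs.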

What's great about the above process is that an analyst working on an engagement can archive the Windows Event Log file in question, send it to me, and I can turn around an answer in a matter of minutes.  Working in parallel, I can assist an analyst who is neck-deep in an IR engagement by providing solid answers to concrete questions, and do so in a timely manner.  My point is that we don't always need a full system image to answer some very important questions during an engagement; sometimes, a well-stated or well-thought-out question can be used as an analysis goal, which leads to a very specific set of data sources within a system being examined, and the answer to whether that system is in scope or not being determined very quickly.

Analysis Process
Regardless of the size of the timeline (full-, mini-, micro-), the process I follow during timeline analysis is best described as iterative.  I'll use initial indicators...time stamps, file names/paths, specific Registry keys or values...to determine where to start.  From there, I'll search "nearby" within the timeline, and look for other indicators.

I mentioned the tool wevtx.bat earlier in this post; something I really like about that tool is that it helps provide me with indicators to search for during analysis.  It does this by mapping various event records to tags that are easy to understand, remember, and search for.  It does this through the use of the eventmap.txt file, which is nothing more than a simple text file that provides mappings of events to tags.  I won't go into detail in this blog post, as the file can be opened and viewed in Notepad.  What I like to do is provide references as to how the tag was "developed"; that is, how did I decide upon the particular tag.  For example, how did I decide that a Microsoft-Windows-TerminalServices-LocalSessionManager record with event ID 22 should get the tag "Shell Start"?  I found it here.
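
To give a sense of what the mappings look like, here are a couple of illustrative entries; these are made up for this example (the tag shows up in the timeline in brackets), so check the eventmap.txt file that ships with the tools for the exact syntax:

```
# event source/ID mapped to an easy-to-search tag (illustrative only)
Microsoft-Windows-TerminalServices-LocalSessionManager/22:[Shell Start]
Microsoft-Windows-Security-Auditing/4624:[Logon]
```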

This is very useful, because my timeline now has tags for various events, and I know what to look for, rather than having to memorize a bunch of event sources and IDs...which, with the advent of Vista and moving into Windows 7 and beyond, has become even more arduous due to the sheer number of Event Logs now incorporated into the systems.

So, essentially, eventmap.txt serves as a list of indicators, which I can use to search a timeline, based upon the goals of my exam.  For example, if I'm interested in locating indications of a remote access Trojan (RAT), I might search for the "[MalDetect]" tag to see if an anti-virus application on the system picked up early attempts to install malware of some kind (I should note that this has been pretty helpful for me).

Once I find something related to the goals of my exam, I can then search "nearby" in the timeline, and possibly develop additional indicators.  I might be looking for indications of a RAT, and while tracking that down, find that the infection vector was via lateral movement (Scheduled Task).  From there, I'd look for at least one Security-Auditing/4624 type 3 event, indicating a network-based logon to access resources on the system, as this would help me determine the source system of the lateral movement.  The great thing about this is that this sort of activity can be determined from just three Windows Event Log files, two Registry hives, and you can even throw in the MFT for good measure, although it's not absolutely required.  Depending on the time frame of response...that is, was the malicious event detected and the system responded to in a relatively short time (an hour or two), or is this the result of a victim notification of something that happened months ago...I may include the USN change journal contents, as well.
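
The "search nearby" step lends itself to a quick script, as well.  The sketch below is illustrative only; it again assumes the five-field TLN events-file format, with made-up sample events, and just pulls everything within a window around a pivot time:

```python
# Sketch: given a pivot time (say, when a suspicious Scheduled Task was
# registered), pull every event within +/- window_secs of it.  Assumes the
# pipe-delimited TLN events-file format, "epoch|source|host|user|description".

def nearby(lines, pivot_epoch, window_secs=300):
    hits = []
    for line in lines:
        fields = line.split("|")
        if fields[0].isdigit() and abs(int(fields[0]) - pivot_epoch) <= window_secs:
            hits.append(line)
    return hits

# Hypothetical events: a type 3 logon shortly before the task was registered
sample = [
    "1428000000|EVTX|HOST-01|admin|Microsoft-Windows-Security-Auditing/4624;3;...",
    "1428000120|EVTX|HOST-01|-|Microsoft-Windows-TaskScheduler/106;at1.job registered",
    "1427900000|EVTX|HOST-01|-|...unrelated event...",
]
cluster = nearby(sample, 1428000120)
```

In this made-up cluster, the type 3 logon two minutes before the task registration is the sort of "nearby" indicator that points back to the source system of the lateral movement.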

My efforts are most often dedicated toward finding clusters of multiple indicators of specific activity, as this provides not only more context, but also a greater level of confidence in the information I'm presenting.  Understanding the content of these indicator clusters is extremely helpful, particularly when anti-forensics actions have been employed, however unknowingly.  Windows Event Logs may have "rolled over", or a malware installer package may include the functionality to "time stomp" the malware files.

Friday, April 10, 2015

Talk Notes

Thanks to Corey Harrell, I was watching the Intro to Large Scale Detection Hunting presentation from the NoLaSec meeting in Dec, 2014, and I started to have some thoughts about what was being said.  I looked at the comments field on YouTube, as well as on David's blog, but thought I would put my comments here instead, as it would give me an opportunity to structure them and build them out a bit before hitting the "send" button.

First off, let me say that I thought the talk was really good.  I get that it was an intro talk, and that it wasn't going to cover anything in any particular depth or detail.  I liked it enough that I not only listened to it beginning-to-end twice this morning, but I also went back to certain things to re-listen to what was said.

Second, the thoughts I'm going to be sharing here are based on my perspective as an incident responder and host/endpoint analyst.

Finally, please do not assume that I am speaking for my employer.  My thoughts are my own and not to be misconstrued as being the policies or position of my employer.

Okay, so, going into this, here are some comments and thoughts based on what I saw/heard in the presentation...

Attribution...does it matter?

As David said, if you're gov, yeah (maybe).  If you're a mom-and-pop, not so much.  I would suggest that during both hunting and IR, attribution can be a distraction.  Why is that?

Let's look at it this way...what does attribution give us?  What does it tell us?  Some say that it informs as to the intent of the adversary, and that it tells us what they're after.  Really?  Many times, an organization that has been compromised doesn't fully understand what it has that's "of value".  Is it data of some kind?  Marketing materials?  Database contents?  Manufacturing specs?  Or, is it the access that organization has to another organization?  If you're doing some hunting, and run across an artifact or indicator that puts you on high alert, how are you able to perform attribution?

Let's say that you find a system with a bunch of batch files on it, and it looks as if the intruder was performing recon, and even dumping credentials, from that foothold...how do you perform attribution?  How do you determine intent?

Here's an example...about 5 yrs ago, I was asked to look at a hard drive from a small company that had been compromised.  Everyone assumed that the intruder was after data regarding the company's clients, but it turned out that the intruder was after this small organization's money, which was managed via online banking.  The intruder had been able to very accurately determine who managed the account, and compromised that specific system with a keystroke logger that loaded into memory, monitored keystrokes sent to the browser when specific web sites were open, and sent the captured keystrokes off of the system without writing them to a file on disk.  It's pretty clear that the bad guy thought ahead, and knew that if the employee was accessing the online banking web site, they could just send the captured data off of the system to a remote site.

"If you can identify TTPs, you can..."

...but how do you identify TTPs?  David talked about identifying TTPs and disrupting those, to frustrate the adversary; examples of TTPs were glossed over, but I get that, it's an intro talk.  This goes back to what does something "look like"...what does a TTP "look like"?

I'm not saying that David missed anything by glossing over this...not at all.  I know that there was a time limit to the talk, and that you can only cover so much in a limited time.

Can't automate everything...

No, you can't.  But there's much that you can automate.  Look at your process in a Deming-esque manner, and maybe even look at ways to improve your process, using automation.

"Can't always rely on signatures..."

That really kind of depends on the "signatures" used.  For example, if you're using "signatures" in the sense that AV uses signatures, then no, you can't rely on them.  However, if you're aware that signatures can be obviated and MD5 hashes can be changed very quickly, maybe you can look at things that may not change that often...such as the persistence mechanism.  Remember Conficker?  MS identified five different variants, but what was consistent across them was the persistence mechanism.

This is a great time to mention artifact categories, which will ultimately lead to the use of an analysis matrix (the link is to a 2 yr old blog post that was fav'd on Twitter this morning...)...all of which can help folks with their hunting.  If you understand what you're looking for...say, indications of lateral movement...you can scope your hunt to what data you need to access, and what you need to look for within that data.

It's all about pivoting...

Yes, it is.

...cross-section of behaviors for higher fidelity indicators...

Pivoting and identifying a cross-section of behaviors can be used together in order to build higher fidelity indicators.  Something else that can be used to do the same thing is...wait for it...sharing.  I know that this is a drum that I beat that a lot of folks are very likely tired of hearing, but a great way of creating higher fidelity indicators is to share what we've seen, let others use it (and there's no point in sharing if others don't use it...), and then validate and extend those indicators.

David also mentioned the tools that we (hunters, responders) use, or can put to use, and that they don't have to be big, expensive frameworks.  While I was working in an FTE security engineer position a number of years ago, I wrote a Perl script that would get a list of systems active in the domain, and then reach out to each one and dump the contents of the Run key from the Registry, for both the system and the logged on user.  Over time, I built out a white list of known good entries, and ended up with a tool I could run when I went to lunch, or I could set up a scheduled task to have it run at night (the organization had shifts for fulfillment, and 24 hr ops).  Either way, I'd come back to a very short list (half a page, maybe) of entries that needed to be investigated.  This also let me know which systems were recurring...which ones I'd clean off and would end up "infected" all over again in a day or two, and we came up with ways to address these issues.
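
That script is long gone, but the core of the idea is simple enough to sketch.  Everything below (host names, paths, the whitelist entries) is hypothetical, and actually collecting the Run key contents from remote systems is left aside:

```python
# Sketch of the idea: diff collected Run key entries against a known-good
# whitelist built up over time.  All names, paths, and hosts here are
# hypothetical; remote collection of the Run key contents is out of scope.

def suspicious_entries(collected, whitelist):
    """Return (host, value_name, command) tuples not found in the whitelist."""
    findings = []
    for host, values in collected.items():
        for name, cmd in values.items():
            if cmd.lower() not in whitelist:
                findings.append((host, name, cmd))
    return findings

# Known-good entries, lower-cased for matching (grown over weeks of runs)
WHITELIST = {r"c:\program files\vmware\vmtoolsd.exe"}

# Hypothetical output of the collection pass: hostname -> {value: command}
collected = {
    "HOST-01": {
        "VMware Tools": r"C:\Program Files\VMware\vmtoolsd.exe",
        "updater": r"C:\Users\bob\AppData\Roaming\svch0st.exe",
    },
}
flagged = suspicious_entries(collected, WHITELIST)
```

The value isn't in the code; it's in the whitelist you accumulate, which is what shrinks the morning's review down to half a page.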

So, my point is that, as David said, if you know your network and you know your data sources, it's not that hard to put together an effective hunting program.

At one point, David mentioned, "...any tool that facilitates lateral movement...", but what I noticed in the recorded presentation was that there were no questions about what those tools might be, or what lateral movement might look like in the available data.

Once we start looking at these questions and their responses, the next step is to ask, do I have the data I need?  If you're looking at logs, do you have the right logs in order to see lateral movement?  If you have the right sources, are they being populated appropriately?  Is the appropriate auditing configured on the system that's doing the logging?  Do you need to add additional sources in order to get the visibility you need?

"Context is King!"

Yes, yes it is.  Context is everything.

"Fidelity of intel has to be sound" 

Intel is worthless if it's built on assumption, and you obviate the need for assumption by correlating logs with network and host (memory, disk) indicators.

"Intel is context applied to your data"

Finally, a couple of other things David talked about toward the end of the presentation were with respect to post-mortem lessons learned (and feedback loops), and that tribal knowledge must be shared.  Both of these are critical.  Why?

I guess that another way to ask that question is, is it really "cost effective" for every one of us to have to learn the same lessons on our own?  Think about how "expensive" that is...you may see something, and even if I were hunting in the same environment, I may not see that specific combination of events for the next 6 months or a year, if ever.  Or, I may see it, but not recognize it as something that needs to be examined.

Sharing tribal knowledge can also mean sharing what you've seen, even though others may have already "seen" the same thing, for two reasons:  (1) it validates the original finding, and (2) it lets others know that what they saw 6 months ago is still in use.

Consider this...I've seen many (and I do mean, MANY) malware analysts simply ignore the persistence mechanism employed by malware.  When asked, some will say, "...yeah, I didn't say anything because it was the Run key...".  Whoa, wait a minute...you didn't think that was important?  Here it is 2015, and the bad guys are still using the Run key for persistence?  That's HUGE!  That not only tells us where to look, but it also tells us that many of the organizations that have been compromised could have detected the intrusion much sooner with even the most rudimentary instrumentation and visibility.

Dismissing the malware persistence mechanism (or any other indicator) in this way blinds the rest of the hunting team (and ultimately, the community) to its value and efficacy in the overall effort.

Again, this was a very good presentation, and I think serves very well to open up further conversation.  There's only so much one can talk about in 26 minutes, and I think that the talk was well organized.  What needs to happen now is that people who see the presentation start implementing what was said (if they agree with it), or asking how they could implement it.

Danny's blog - Theoretical Concerns
David Bianco's blog - Pyramid of Pain
Andrew Case's Talk - The Need for Proactive Threat Hunting (slides)

Monday, April 06, 2015


I caught an interesting thread on Twitter last week..."interesting" in the sense that it revisited one of the questions I see (or hear) quite a bit in DFIR circles; that is, how does one get started in the DFIR community?  The salient points of this thread covered blogging (writing, in general) and interacting within the community.  Blogging is a great way for anyone, regardless of how long you've been "doing" DFIR, to engage and interact with the community at large.

Writing isn't easy.  I get it.  I'm as much a nerd as anyone reading this blog, and I feel the same way most of you do about writing.  However, given my storied background, I have quite a bit of experience writing.  Even though I was an engineering major in college, I had to take writing classes.  One of my English professors asked if I was an English major, saying that I wrote like one...while handing back an assignment with a C (or D) on it.  I had to write in the military....fitreps, jagmans, etc.  I had jobs in the military that required other kinds of writing, for different audiences.

Suffice to say, I have some experience.  But that doesn't make me an expert, or even good at it.  What I've found is that the needs of the assignment, audience, etc., vary and change.

So how do you get better at writing?  Well, the first step is to read.  Seriously.  I read a lot, and a lot of different things.  I read the Bible, I read science fiction, and I read a lot of first person accounts from folks in special ops (great reading while traveling).  Some of the stuff I've read recently has included:

The Finishing School (Dick Couch) - I've read almost all of the books Mr. Couch has published

Computer Forensics: InfoSec Pro Guide (David Cowen)

Do Androids Dream of Electric Sheep? (Philip K. Dick)

I've also read Lone Survivor, American Sniper, and almost every book written by William Gibson.

Another way to get better at writing is to write.  Yep, you read that right.  Write.  Practice writing.  A great way to do that is to open MSWord or Notepad, write something and hand it to someone.  If they say, "...looks good..." and hand it back, give it to someone else.  Get critiqued.  Have someone you trust read what you write.  If you're writing about something you did, have the person reading it follow what you wrote and see if they can arrive at the same end point.  A couple of years ago, I was working with some folks who were trying to write a visual timeline analysis tool, and to get started, the first thing the developer did was sit down with my book and walk through the chapter on timelines.  He downloaded the image and tools, and walked through the entire process.  He did this all of his own accord and initiative, and produced EXACTLY what I had developed.  That was pretty validating for my writing, that someone with no experience in the industry could sit down and just read, and the process was clear enough that he was able to produce exactly what was expected.

Try creating a blog.  Write something.  Share it.  Take comments...ignore the anonymous comments, and don't worry if someone is overly critical.  You can ignore them, too.

My point is, get critiqued.  You don't sharpen a knife by letting it sit, or rubbing it against cotton.  The way to get better as a writer, and as an analyst, is to expose yourself to review.  The cool thing about a blog is that you can organize your thoughts, and you can actually have thoughts that consist of more than 140 characters.  And you don't have to publish the first thing you write.  At any given time, I usually have half a dozen or more draft blog posts...before starting this post, I deleted two drafts, as they were no longer relevant or of interest.

Writing allows you to organize your thoughts.  When I was writing fitness reports for my Marines, I started them days (in some cases, weeks) prior to the due date.  I started by writing down everything I had regarding that Marine, and then I moved it around on paper.  What was important?  What was truly relevant?  What needed to be emphasized more, or less?  What did I need to take out completely?  I'd then let it sit for a couple of days, and then come back to it with a fresh set of eyes.  Fitreps are important, as they can determine if a Marine is promoted or able to re-enlist.  Or they can end a career.  Also, they're critiqued.  As a 22 yr old 2ndLt, I had Majors and Colonels reviewing what I wrote, and that was just within my unit.  Getting feedback, and learning to provide constructive feedback, can go a long way toward making you a better writer.

I included a great deal of my experiences writing reports in chapter 9 of Windows Forensic Analysis Toolkit 4/e, and included an example scenario (associated with an image), case notes and report in the book materials.  So, if you're interested, download the materials and take a look.

One of the tweets from the thread:

it's a large sea of DFIR blogs and could be very intimidating to newbies in the field. What can they offer that is not there

Let's break this down a bit.  Yes, there are a lot of DFIR blogs out there, but as Corey tweeted, The majority of the DFIR blogs in my feed are either not active or do a few posts a year.  The same is true in my feed (and I suspect others will see something similar)...there are a number of blogs I subscribe to that haven't been updated in months or even a year or more (Grayson hasn't updated his blog in over two years).  There are several blogs that I've removed, either because they're completely inactive, or about every 6 months or so, there's a "I know I haven't blogged in a while..." post, but nothing more.

There's no set formula for blog writing.  There are some blogs out there that have a couple of posts a month, and don't really say anything.  Then there are blogs like Mari's...she doesn't blog very much, but when she does, it's usually pure gold.  Corey's blog is a great example of how there's always something that you can write about.

...but I'm a n00b...
The second part of the above tweet is something I've seen many times over the years...folks new to the community say that they don't share thoughts or opinions (or anything else) because they're too new to offer anything of value.

That's an excuse.

A couple of years ago, one of the best experiences in my DFIR career was working with Don Weber.  I had finished up my time in the military as a Captain, and Don had been a Sgt.  On an engagement that we worked together, he was asking me why we were doing certain things, or why we were doing things a certain way.  Don wasn't completely new to the DFIR business, but he was new to the team, and he had a fresh perspective to offer.  Also, his questions got me to think...am I doing this because there's a good reason to do so, or am I doing it because that's the way I've always done it?

One of the things that the "...I'm a n00b and have nothing to offer..." leads to is a lack of validation within the community.  What do I mean by that?  Well, there's not one of us in the field who's seen everything that there is to see.  Some folks are new to the field and don't have the experience to know where to look, or to recognize what they're seeing.  Others have been in the field so long that they no longer see what's going on "in the weeds"; instead, all they have access to is an overview of the incident, and maybe a few interesting tidbits.  Consider the Poweliks malware; I haven't had an investigation involving this malware, but I know folks who have.  My exposure to it has been primarily through AV write-ups, and if someone hadn't shared it with me, I never would've known that it uses other Registry keys for persistence, including CLSID keys, as well as Windows services.  My point is that someone new to the community can read about a particular malware variant, and then after an exam, say, "...I found these four IOCs that you described, and this fifth one that wasn't in any of the write-ups I read...", and that is a HUGE contribution to the community.

Even simply sharing that you've seen the same thing can be validating.  "Yes, I saw that, as well..." lets others know that the IOC they found is being seen by others, and is valid.  When I read the Art of Memory Forensics, and read about the indicator for the use of a credential theft tool, I could have left it at that.  Instead, I created a RegRipper plugin and looked for that indicator on cases I worked, and found a great deal of validation for the indicator...and I shared that with one of the book authors.  "Yes, I'm seeing that, as well..." is validating, and "...and I'm also seeing this other indicator..." serves to move the community forward.

If you're not seeing blog posts about stuff that you are interested in, reach out and ask someone.  Sitting behind your laptop and wondering, "...why doesn't anyone post about their analysis process?" doesn't inherently lend itself to people posting about their analysis process.  Corey's post about his process, I've done it, Mari's done it...if this is something you like to see, reach out to someone and ask them, "hey, could you post your thoughts/process regarding X?"

As Grayson said, get out and network.  Engage with others in the industry.  Reading a blog is passive, and isn't interacting.  How difficult is it to read a blog post, think about it, and then contact the author with a question, or post a comment (if the author has comments enabled)?   Or link to that blog in a post of your own.

Not seeing the content you're interested in within the blogs you follow?  Start your own blog.  Reach out to the authors of the blogs you follow, and either comment on their blogs or email them directly, and share your thoughts.  Be willing to refine or elaborate on your thoughts, offering clarity.  If you are interested in how someone would perform a specific analysis task, be willing to offer up and share data.  It doesn't matter how new you are to the industry, or if you've been in the industry for 15 years...there's always something new that can be shared, whether it's data, or even just a perspective.

Blogging is a great way to organize your thoughts, provide context, and to practice writing.  Who knows, you may also end up learning something in the long run.  I know I have.

Sunday, March 15, 2015

Perspectives on Threat Intel

A while back, I tweeted, saying that "threat intel has it's own order of volatility".  That tweet got one RT and 2 favorites, and at the time, not much of a response beyond that.  Along the way, someone did disagree with me on that, stating that rather than an "order of volatility", threat intel instead has a "shelf life".

Thinking about it, I can see where both are true.

To begin with, let's consider this "order of volatility"...what am I referring to?  Essentially, what I'm talking about was detailed in 2002, in RFC 3227, Guidelines for Evidence Collection and Archiving.  In short, the RFC states that when collecting evidence, "you should proceed from volatile to the less volatile".  What this means is that when collecting "evidence", you should collect that evidence that is most likely to change first, or soonest.  This is a guiding principle that should be used to direct collection methodologies.

As such, the "order of volatility" itself is guidance, as the definition appears in section 2.1 of the RFC, whereas the discussion of actual collection does not begin until section 3.

The term "shelf life" refers to the fact that threat intel indicators have a time period during which they are useful, or have value.  Notice that I don't say "specified time period", because that can vary.  Those who conduct these types of investigations have seen where a single file, with the same name and stored in the same file system location on two different endpoints, placed in those locations minutes apart, may have different hashes.  Or, during an IR investigation, you may find RATs on two different endpoints that are essentially the same version, but with different C2 domain names.  When doing historical analysis of indicators such as collected hashes and domain names, we see how these are useful for a limited amount of time; they have a "shelf life".

Several years ago when I was doing PCI investigations, we ran across a file named "bp0.exe".  We'd seen it before, and happened to have a copy of the file that we'd seen 8 months prior to the current investigation.  Both files had the same path within the file system (on two completely different investigations and endpoints, of course), and they had different MD5 hashes.  Using fuzzy hashing, we found that they were 98% similar (using the phraseology somewhat loosely).  This is a good example of how MD5 hashes have a "shelf life" and tend to remain valid for a limited time.

When you consider David Bianco's Pyramid of Pain from a network- or malware RE-perspective, the technical indicators at the lower half of the pyramid (hash values, C2 IP addresses, domain names) definitely have a shelf life; they are only 'good' (valid) for a specified period of time.  These indicators can change between campaigns, or as is often the case, during a campaign.  Those who perform threat intel collection and analysis are also aware that C2 IP addresses and domain names can also change quickly (often within hours or even minutes), and there are organizations that continually monitor this sort of thing and track them (i.e., what IP addresses various domain names resolve to, etc.) over time.

So, when you look at "threat intel" from a perspective that is external to a compromised infrastructure, using open source intel collection (and including analysis of samples from sites such as VirusTotal...), then the indicators do, indeed, have a shelf life.

However, when considering these same threat intel indicators from the perspective of the endpoint systems, we can see how they have an order of volatility, as well, and that delays in detection and response will lead to some of these endpoint indicators becoming more difficult to recover, and even unavailable.  If an installer is run on a system, and deletes itself after infecting the endpoint, then the file (along with the MD5 hash of that file) is gone.  In some cases, this happens so quickly that the installer file itself may not be written to physical disk, so there is literally nothing that can be recovered, or "carved".  When malware reports out to a C2 domain, how long does that persist on the endpoint?  Well, it may depend on the particular API used by the malware to perform that off-system communications.  If the malware was written to create its own network sockets, the domain name may exist in memory for only a very short time, and persist on the network (this does not include any logs on other endpoints) for an even shorter period of time.  The domain name may be found in the pagefile, but again, this does not mean that the domain name is recorded on the endpoint indefinitely...even a C2 domain name or URL will only persist in the pagefile for so long.

If you consider the Pyramid of Pain from the perspective of endpoints on a compromised infrastructure, the indicators begin to have an "order of volatility", in the manner described in RFC 3227.  Some indicators will persist for only a short time on those endpoints, while others will persist for quite some time.

For example, some indicators may be present in the Security Event Log, and in most cases (that I've dealt with), event records of value have been lost through the normal operation of the system, simply due to the fact that the Security Event Logs have "rolled over", as older event records have been overwritten by newer ones.  I've received images of systems where the Security Event Log was 22MB in size, and when parsed, contained maybe...maybe...2 days worth of events.  The specific event I was interested in occurred weeks (or in some cases, months) prior to when the image was acquired.

Some data sources...the RecentFileCache.bcf file, AppCompatCache data, Prefetch files, for example...can be erased or overwritten through the passage of time and the normal operation of the system.  I've seen IR team response procedures that included running multiple tools on a live system, which caused the Prefetch files of interest to be deleted, as new programs were run and new Prefetch files created by the operating system.

Ultimately, it seems that the discussion of "shelf life" versus "order of volatility" depends upon your perspective.  If you're not considering endpoints at all, and only looking at the "threat intel" that we usually see collected, discussed and shared (MD5 hashes, C2 IP addresses and domains) through open sources, then yes, I believe that it is well understood that the indicators do have a "shelf life"; that is, a C2 IP address or domain may be valid at the time that it was found, but there's nothing to say that it will continue to be valid 6 weeks or 6 months later.

However, from the perspective of the endpoint, it's pretty clear that indicators have an order of volatility all their own, and that order can be impacted (in some cases, significantly) by external stimulus, such as IR data collection procedures, or actions taken by an adversary (or an admin).

Full Disclosure
Like many other organizations, my employer provides threat intelligence to clients, some of which is network- and malware-based, and collected through open sources.  In my role within the company, as an incident responder and digital forensic analyst, I tend to be both a consumer and producer of threat intel that is based on analysis of endpoints.  What I wanted to point out in this post is that there are different perspectives on the issue, and that doesn't mean that any one is wrong.

Addendum, 17 Mar: Looking back on this post, it occurs to me that the "shelf life" description applies to indicators in the bottom half of the pyramid, from a malware RE perspective, and the "order of volatility" description applies to indicators at all levels of the pyramid, from a host or endpoint perspective.

Tuesday, March 10, 2015


Revisiting Macros
Kahu Security posted this recent blog article, which I found pretty interesting.  The trick that was used was, yes, "sneaky"...but part of me was wondering what this sort of thing would "look like" within a system image.  What I mean is, if you're tasked with looking at an image of a system that may have been infected via this sort of trick, what would you look for?

The first thing that jumps out at me is the warning displayed in Word, in the second figure in the post.  Once the user clicks on the "Enable Content" button, a record is created within the user's MSOffice TrustRecords key.  This information can be extracted using the RegRipper plugin, or added to a timeline using the plugin.  These plugins were mentioned as part of the HowTo: Determine User Access to Files blog post from July, 2013.

Once the user clicks on the button and the macros are enabled, you'll see the other files described in the blog post created and launched within the file system.  Because the command execution does not persist in memory once the process completes, this is an excellent argument for the use of tools such as Sysmon and Carbon Black.

If some of the files that are part of the infection process are deleted or time stomped, AND you can get to the system relatively quickly, a great resource for analysis is the USN change journal.

As far as recovering/extracting the actual macro itself, take a look at this blog post for some helpful hints and tools.  Also, the folks at OpenDNS posted about Investigating a Malicious Attachment without Reversing.

USN Change Journal
Speaking of files being created on a system (see what I did there?), Mari has a new blog post up where she shares her experience using the USN change journal during analysis.  As you can see in her excellent blog post, the USN change journal remains an excellent resource for obtaining extremely transitory information regarding system activity; while we don't know exactly which process is responsible for various files being created, we can see within a timeline when the various file system activities occurred, and this can not only provide additional context to a timeline, but it can also fill in some gaps in that timeline.

It's great that Mari is sharing her analysis experience with others, as it really helps those of us who may not have the same types of cases as she does, or those who may not approach (or "do") analysis in the same way.   It's the sharing of those experiences, as well as asking questions, that builds a stronger community.

I ran across Matt's recent blog post...what attracted my attention to it were the references to ADSs and PowerShell.  ADSs are something I've been interested in for quite a long time, and I've included sections in my books that have discussed creating them, running code from within them, and what they "look like" to tools such as Carbon Black.

As to the post, there was an exchange on Twitter regarding the original content of the post, which centered around the use of the phrase "without touching disk"...Matt took the initiative to correct this, as you cannot create an ADS, and at the same time say that you aren't "touching disk".  This is an interesting approach, and Matt's right...under most normal circumstances, at least the circumstances I encounter as an incident responder, something like this would go undetected by sysadmins.  However, this is pretty trivial to address for an incident responder, using various artifacts, some that don't hang around as long (see this blog post), and others that persist for a while longer.

Registry Stuff
I recently ran across this post over on the System Forensics blog...unfortunately, as is the case with many blogs (it seems), comments are turned off, so I have to comment here...

Early on in the post, Patrick says:

Then I ran a few more well known tools and in one case didn’t see some of the entries at all, and in another case saw the entries, but no context was provided.

Interesting...which tools?  Was anything done to contact the author(s)?  Was any data shared so that the author(s) could make the appropriate updates?

At the end of the post, Patrick says:

If you’re simply relying on the output of a tool you’re possibly missing some good information.

He's absolutely right, which is why I strongly recommend that analysts get into the Registry when performing analysis, looking to see what's there.  This is particularly true if there are any issues with tools that don't show you what you expect to see in the output.

This particular MRU isn't something I've seen before, and it is interesting...I can clearly see the value of this data.  If Patrick is willing to share some test data, I'd be more than happy to update the appropriate RegRipper plugin(s).  As the author of RegRipper, I'm fully aware that I don't see everything that is possible to see in the Registry, and as such, I rely on the good will of the DFIR community when it comes to sharing data so that RegRipper plugins can be created or updated.  For example, Eric Zimmerman recently shared some USRCLASS.DAT hive files with me so that I could update the RegRipper plugin to be able to address shell items particular to Windows 8.1; I've updated some of the new folder shell items and I'm working on the MTP device shell items.  I have also taken an opportunity to run the updated plugin against a USRCLASS.DAT hive from Windows 10 TP, but the content is limited.  Therefore, so is the testing.

Also, if Patrick were willing to share what it is he doesn't like about the output of some of the available tools, I'd be happy to consider making changes to RegRipper output, as appropriate.

I submitted two presentation titles in response to the HTCIA 2015 conference "call for papers", and both were accepted.  The conference is in Orlando, FL, at the beginning of September.  Submitting for this conference was an interesting experience, as I started by asking what the attendees might be interested in hearing or seeing, and was told, in short, "yes!"  My presentation titles are "Registry Analysis" and "Lateral Movement".

For the "Lateral Movement" presentation, I plan to discuss various methods of lateral movement within an infrastructure and what they "look like", with respect to the source and destination systems.  I chose this topic as an example, and was told "yes!!"  Also, I've been to conferences before where topics such as this are discussed, but there's been no real discussion or presentation of what the artifacts look like on systems.

I have something of an idea regarding what I'm going to talk about during the "Registry Analysis" presentation, and I'm working on crystallizing it a bit.  At this point, this is going to be an advanced presentation.

My question to you is, if you were to see a 1 hr presentation entitled, "Registry Analysis", what would you hope to get out of it?  What would you look for to be discussed?

Monday, March 02, 2015

How do you "do" analysis?

Everybody remembers "The Matrix", right?  So, you're probably wondering what the image to the right has to do with this article, particularly given the title.  Well, that's easy...this post is about employing various data sources and analysis techniques, and pivoting in order to add context and achieve a greater level of detail in your analysis.  Sticking with just one analysis technique or process, much like simply trying to walk straight through the building lobby to rescue Morpheus, would not have worked.  In order to succeed, Neo and Trinity had to pivot and mutually support each other in order to achieve their collective goal.  So...quite the metaphor for a blog post that involves pivoting, eh?

Timeline Analysis
Timeline analysis is a great technique for answering a wide range of questions.  For malware infections and compromises, timeline analysis can provide the necessary context to illustrate things like the initial infection (or compromise) vector, the window of compromise (i.e., based on when the system was really infected or compromised, if anti-forensics techniques were used), what actions may have been taken following the infection/compromise, the hours during which the intruder tends to operate, and other systems an intruder may have reached out to (in the case of a compromise).

Let's say that I have an image of a system thought to be infected with malware.  All I know at this point is that a NIDS alert identified the system as being infected with a particular malware variant based on C2 communications that were detected on the wire, so I can assume that the system must have been infected on or before the date and time that the alert was generated.  Let's also say that based on the NIDS alert, we know that the malware (at least, some variants of it) persists via a Windows service.  Given this little bit of information, here's an analysis process that I might follow, including pivot points:
  1. Load the timeline into Notepad++, scroll all the way to the bottom, and do a search (going up from the bottom) to look for "Service Control Manager/7045" records.
  2. Locate the file referenced by the event record by searching for it in the timeline.  PIVOT to the MFT: parse the MFT, extract the parsed record contents for the file in question in order to determine if there was any time stomping involved.
  3. PIVOT within the timeline; start by looking "near" when the malware file was first created on the system to determine what other activity occurred prior to that event (i.e., what user was logged in, were there indications of web browsing activity, was the user checking their email, etc.)
  4. PIVOT to the file itself: parse the PE headers to get things like compile time, section names, section sizes, strings embedded in the file, etc.  These can all provide greater insight into the file itself.  Extract the malware file and any supporting files (DLLs, etc.) for analysis.
  5. If the malware makes use of DLL side loading, note the persistent application name, in relation to applications used on the system, as well as within the rest of the infrastructure.  
  6. If your timeline doesn't include AV log entries, and there are AV logs on the system, PIVOT to those in order to potentially get some additional detail or context.  Were there any previous attempts to install malware with the same or a similar name or location?  McAfee AV will flag on behaviors...was the malware installed from a Temp directory, or some other location?  
  7. If the system has a hibernation file that was created or modified after the system became infected, PIVOT to that file to conduct analysis regarding the malicious process.
  8. If the malware is known to utilize the WinInet API for off-system/C2 communications, see if the Local Service or Network Service profiles have a populated IE web history (location depends upon the version of Windows being examined). 
  9. If the system you're analyzing has Prefetch files available, were there any specific to the malware?  If so, PIVOT to those, parsing the modules and looking for anything unusual.  
Again, this is simply a notional analysis, meant to illustrate some steps that you could take during analysis.  Of course, it will all depend on the data that you have available, and the goals of your analysis.
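Steps 1 through 3 above can be sketched in a few lines of code.  The events file below is a hypothetical example in the five-field TLN format (time|source|host|user|description), and the file, service, and host names are all made up:

```python
# Hypothetical events file in the 5-field TLN format (time|source|host|user|description)
TIMELINE = """\
1425221100|FILE|HOST|-|MACB [1024] C:\\Windows\\system32\\bad.exe
1425221160|EVTX|HOST|-|Service Control Manager/7045;A service was installed;badsvc;C:\\Windows\\system32\\bad.exe
1425221220|REG|HOST|-|M... HKLM/System/CurrentControlSet/Services/badsvc
1425307600|EVTX|HOST|-|Security-Auditing/4624;Logon
"""

def find_service_installs(timeline: str):
    """Step 1: locate 'Service Control Manager/7045' (service install) records."""
    return [line for line in timeline.splitlines()
            if "Service Control Manager/7045" in line]

def events_near(timeline: str, epoch: int, window: int = 300):
    """Step 3: pivot within the timeline to events within +/- window seconds."""
    return [line for line in timeline.splitlines()
            if abs(int(line.split("|", 1)[0]) - epoch) <= window]

hits = find_service_installs(TIMELINE)
pivot_time = int(hits[0].split("|", 1)[0])
for line in events_near(TIMELINE, pivot_time):
    print(line)
```

The same two functions work whether the search term is an event record, a file name, or a Registry key path; only the pivot string changes.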

Web Shell Analysis
Web shells are a lot of fun.  Most of us are familiar with web shells, at least to some extent, and recognize that there are a lot of different ways that a web shell can be crafted, based on the web server that's running (Apache, IIS, etc.), other applications and content management systems that are installed, etc.  Rather than going into detail regarding different types of web shells, I'll focus just on what an analyst might be looking for (or find) on a Windows server running the IIS web server.  CrowdStrike has a very good blog post that illustrates some web shell artifacts that you might find if an .aspx web shell is created on such a system.

In this example, let's say that you have received an image of a Windows system, running the IIS web server.  You've created a timeline and found artifacts similar to what's described in the CrowdStrike blog post, and now you're ready to start pivoting in your analysis.

  1. You find indications of a web shell via timeline analysis; you now have a file name.
  2. PIVOT to the web server logs (if they're available), searching for requests for that page.  As a result of your search, you will now have (a) IP address(es) from where the requests originated, and (b) request contents illustrating the commands that the intruder ran via the web shell.
  3. Using the IP address(es) you found in step 2, PIVOT within the web server logs, this time using the class C or class B range for the IP address(es), to cast the net a bit wider.  This can give you additional information regarding the intruder's early attempts to fingerprint and compromise the web server, as you may find indications of web server vulnerability scans originating from the IP address range.  You may also find indications of additional activity originating from the IP address range(s).
  4. PIVOT back into your timeline, using the date/time stamps of the requests that you're seeing in the web server logs as pivot points, in order to see what events occurred on the systems as a result of requests that were sent via the web shell.  Of course, where the artifacts can be found may depend a great deal upon the type of web shell and the contents of the request.
  5. If tools were uploaded to the system and run, PIVOT to any available Prefetch files, and parse out the embedded strings that point to module loaded by the application, in order to see if there are any additional files that you should be looking to.
Once again, this is simply a notional example of how you might create and use pivot points in your analysis.  This sort of process works not just for web shells, but it's also very similar to the process I used on the IBM ISS ERS team when Chris and I were analyzing SQL injection attacks via IIS web servers; conceptually, there is a lot of overlap between the two types of attacks.
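As a sketch of steps 2 and 3, assuming W3C-format IIS logs; the log excerpt, web shell name, and IP addresses below are all hypothetical:

```python
from collections import defaultdict

# Hypothetical IIS W3C log excerpt: date time cs-method cs-uri-stem c-ip sc-status
LOG = """\
2015-02-20 10:01:13 GET /press/about.aspx 203.0.113.24 200
2015-02-20 10:02:45 POST /press/sh3ll.aspx 203.0.113.24 200
2015-02-20 10:03:10 POST /press/sh3ll.aspx 203.0.113.77 200
2015-02-20 09:15:02 GET /press/../etc/passwd 203.0.113.77 404
"""

def requests_for(page: str, log: str):
    """Step 2: find requests for the web shell page; return (timestamp, ip) pairs."""
    out = []
    for line in log.splitlines():
        date, time, method, uri, ip, status = line.split()
        if uri.endswith(page):
            out.append((f"{date} {time}", ip))
    return out

def pivot_on_subnet(ips, log: str):
    """Step 3: cast a wider net using the class C ranges of the attacking IPs."""
    nets = {ip.rsplit(".", 1)[0] for ip in ips}
    hits = defaultdict(list)
    for line in log.splitlines():
        ip = line.split()[4]
        if ip.rsplit(".", 1)[0] in nets:
            hits[ip].append(line)
    return dict(hits)

reqs = requests_for("sh3ll.aspx", LOG)
subnet_hits = pivot_on_subnet([ip for _, ip in reqs], LOG)
```

Note that the subnet pivot surfaces the earlier 404 scan attempt that a search on the web shell's file name alone would miss; the timestamps returned by the first function become the pivot points for step 4.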

Additional Resources (Web Shells)
Security Disclosures blog post
Yara rules - 1aN0rmus, Loki

Memory Analysis
This blog post from Contextis provides a very good example of pivoting during analysis; in this case, the primary data source for analysis was system memory in the form of a hibernation file.  The case started with disk forensics, and a hit for a particular item was found in a crash dump file, and the analyst then pivoted to the hibernation file.

Adam did a great job with the analysis, and in writing up the post.  Given that this post started with disk forensics, some additional pivot points for the analysis are available:

  1. Pivoting within the memory dump, the analyst could have identified any mutex utilized by the malware.  
  2. Pivoting into a timeline, the analyst may have been able to identify when the service itself was first installed (i.e., "Service Control Manager" record with event ID 7045).
  3. Determining when the malicious service was installed can lead the analyst to the initial infection vector (IIV), and will be extremely valuable if the bad guys used anti-forensic techniques such as time stomping the malware files to try to obfuscate the creation date.
  4. Pivot to the MFT and extract records for the malicious DLL files, as well as the keystroke log file.  Many of us have seen malware that includes a keylogger component that will continually time stomp the keystroke log file as new key strokes are added to it.  

"Doing" Analysis
I received an interesting question a while back, asking for tips on how I "do analysis".  I got to thinking about it, and it made sense to add my thoughts to this blog post.

Most times, when I receive an image, I have some sort of artifact or indicator to work with...a file name or path, a date/time, perhaps a notice from AV that something was detected.  That is the reason why I'm looking at the image in the first place.  As a result, producing a timeline is driven by the questions I need to answer; that is to say, I do not create a timeline simply because I received an image.  Instead, I create a timeline because that's often the best way to address the goals of my exam.

When I do create a timeline, I most often have something to look for, to use as an initial starting or pivot point for my analysis.  Let's say that I have a file that I'm interested in; the client received a notification or alert, and that led them to determine that the system was infected.  As such, they want to know what the malware is, how it got on the system, and what may have occurred after the malware infected the system.  After creating the timeline, I can start by searching the timeline for the file listing.  I will usually look for other events "around" the times where I find the file listed...Windows Event Log records, Registry keys being created/modified, etc.

Knowing that most tools (TSK fls.exe, FTK Imager "Export Directory Listing..." functionality) used to populate a timeline will only retrieve the $STANDARD_INFORMATION attributes for the file, I will often extract and parse the $MFT, and then check to see if there are indications of the file being time stomped.  If it does appear that the file was time stomped, I will go into the timeline and look "near" the $FILE_NAME attribute time stamps for further indications of activity.
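The comparison described above can be sketched as follows.  The two heuristics shown (a $STANDARD_INFORMATION creation time that predates the $FILE_NAME creation time, and a $SI time with a zeroed sub-second value) are commonly cited time-stomping indicators, and the timestamps here are invented:

```python
from datetime import datetime

def check_time_stomping(si_created: datetime, fn_created: datetime) -> list:
    """Flag $STANDARD_INFORMATION vs $FILE_NAME anomalies that suggest time
    stomping.  Assumes both timestamps were already parsed from the $MFT record."""
    flags = []
    # $FILE_NAME times are maintained by the kernel and are not modified by the
    # user-mode APIs most stomping tools use; a $SI creation time earlier than
    # the $FN creation time is a classic indicator.
    if si_created < fn_created:
        flags.append("$SI creation predates $FN creation")
    # Stomping via SetFileTime() often writes times with zeroed sub-second
    # precision, while native NTFS times have 100ns granularity.
    if si_created.microsecond == 0 and fn_created.microsecond != 0:
        flags.append("$SI time has zeroed sub-second value")
    return flags

si = datetime(2010, 6, 1, 12, 0, 0)            # suspiciously "old" and round
fn = datetime(2015, 2, 20, 10, 2, 45, 331042)  # when the file actually landed
print(check_time_stomping(si, fn))
```

When either flag fires, the $FILE_NAME creation time becomes the value to pivot "near" in the timeline, as described above.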

One of the things that helps me with my analysis is applying what I've learned from previous engagements to my current analysis.  One of the ways I do this is to use the wevtx.bat tool to parse the Windows Event Logs that I've extracted from the image.  This batch file will first run MS's LogParser tool against the *.evtx files I'm interested in, and then parse the output into the appropriate timeline format, while incorporating header tags from the eventmap.txt event mapping file.  If you open the eventmap.txt file in Notepad (or any other editor) you'll see that it includes not only the mappings, but also URLs that are references for the tags.  So, if I have a timeline from a case where malware is suspected, I'll search for the "[MalDetect]" tag.  I do this even though most of the malware I see on a regular basis isn't detected by AV, because oftentimes, AV will have detected previous malware infection attempts, or it will detect malicious software downloaded after the initial infection (credential dumping tools, etc.).
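The tagging idea works roughly like this; the mapping entries and event records below are hypothetical stand-ins for what actually appears in eventmap.txt:

```python
# Hypothetical subset of an eventmap.txt-style mapping: (source, event ID) -> tag
EVENT_MAP = {
    ("Service Control Manager", 7045): "[SvcInstall]",
    ("Symantec AntiVirus", 51): "[MalDetect]",
    ("Microsoft-Windows-Security-Auditing", 4624): "[Logon]",
}

def tag_record(source: str, event_id: int, descr: str) -> str:
    """Prepend the mapped tag (if any) to an event description, so the tag
    becomes searchable in the resulting timeline."""
    tag = EVENT_MAP.get((source, event_id), "")
    return f"{tag} {descr}" if tag else descr

records = [
    ("Symantec AntiVirus", 51, "Threat found in C:\\Temp\\a.exe"),
    ("Microsoft-Windows-Security-Auditing", 4624, "An account was successfully logged on"),
]
tagged = [tag_record(*r) for r in records]
maldetect = [t for t in tagged if t.startswith("[MalDetect]")]
```

Once the tags are embedded in the timeline, "searching for malware detections" reduces to a plain-text search for "[MalDetect]", which is exactly the workflow described above.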

Note: This approach of extracting Windows Event Logs from an acquired image is necessitated by two factors.  First, I most often do not want all of the records from all of the logs.  On my Windows 7 Ultimate system, there are 141 *.evtx files.  Now, not all of them are populated, but most of them do not contain records that would do much more than fill up my timeline.  To avoid that, there is a list of fewer than a dozen *.evtx files that I will extract from an image and incorporate into a timeline.

Second, I often work without the benefit of a full image.  When assisting other analysts or clients, it's often too cumbersome to have a copy of the image produced and shipped, when it will take just a few minutes for them to send me an archive containing the *.evtx files of interest, and for me to return my findings.  This is not a "speed over accuracy" issue; instead, it's a Sniper Forensics approach that lets me get to the answers I need much quicker.

Another thing I do during timeline analysis is that I keep the image (if available) open in FTK Imager for easy pivoting, so that I can refer to file contents quickly.  Sometimes it's not so much that a file was modified, as much as it is what content was added to the file.  Other times, contents of batch files can lead to additional pivot points that need to be explored.

Several folks have asked me about doing timeline analysis when asked to "find bad stuff".  Like many of you reading this blog post, I do get those types of requests.  I have to remember that sometimes, "bad stuff" leaves a wake.  For example, there is malware that will create Registry keys (or values) that are not associated with persistence; while they do not lead directly to the malware itself (the persistence mechanism will usually point directly to the malware files), they do help in other ways.  One way is that the presence of the key (or value, as the case may be) lets us know that the malware is (or was) installed on the system.  This can be helpful with timeline analysis in general, but also during instances when the bad guy uses the malware to gain access to the system, dump credentials, and then comes back and removes the malware files and persistence mechanism (yeah, I've seen that happen more than a few times).

Another is that the LastWrite time of the key will tell us when the malware was installed.  Files can be time stomped, copied and moved around the file system, etc., all of which will have an effect on the time stamps recorded in the $MFT.  Depending on the $MFT record metadata alone can be misleading, but having additional artifacts (spurious Registry keys created/modified, Windows services installed and started, etc.) can do a great deal to increase our level of confidence in the file system metadata.

So, I like to collect all of those little telltale IOCs, so that when I do get a case of "find the bad stuff", I can check for those indicators quickly.  Do you know where I get the vast majority of the IOCs I use for my current analysis?  From all of my prior analysis.  Like I said earlier in this post, I take what I've learned from previous analysis and apply it to my current analysis, as appropriate.

Sometimes I get indicators from others.  For example, Jamie/@gleeda from Volatility shared with me (it's also in the book) that when the gsecdump credential theft tool is run to extract LSA secrets, the HKLM/Security/Policy/Secrets key LastWrite time is updated.  So I wrote a RegRipper plugin to extract the information and include it in a timeline (without including all of the LastWrite times from all of the keys in the Security hive, which just adds unnecessary volume to my timeline), and since then, I've used it often enough that I'm comfortable with the fidelity of the data.  This indicator serves as a great pivot point in a timeline.

A couple of things I generally don't do during analysis:
I don't include EVERYTHING in the timeline.  Sometimes, I don't have everything...I don't have access to the entire image.  Someone may send me a few files ($MFT, Registry hives, Windows Event Logs, etc.) because it's faster to do that than ship the image.  However, when I do have an image, I very often don't want everything, as getting everything can lead to a great deal of information being put into the timeline that simply adds noise.  For example, if I'm interested in remote access to a system, I generally do not include Windows Event Logs that focus on hardware monitoring events in my timeline.

I have a script that will parse the $MFT and display the $STANDARD_INFORMATION and $FILE_NAME metadata in a timeline...but I don't use it very often.  In fact, I can honestly say that after creating it, I haven't once used it during my own analysis.  If I'm concerned with time stomping, it's most often only for a handful of files, and I don't see that as a reason for doubling the size of my timeline and making it harder to analyze.  Instead, I will run a script that will display various metadata from each record, and then search the output for just the files that I'm interested in.

I don't color code my timeline.  I have been specifically asked about this...for me, with the analysis process I use, color coding doesn't add any value.  That doesn't mean that if it works for you, you shouldn't do it...not at all.  All I'm saying is that it doesn't add any significant value for me, nor does it facilitate my analysis.  What I do instead is start off with my text-based timeline (see ch. 7 of Windows Forensic Analysis) and I'll create an additional file for that system called "notes"; I'll copy-and-paste relevant extracts from the full timeline into the notes file, annotating various things along the way, such as adding links to relevant web sites, making notes of specific findings, etc.  All of this makes it much easier for me to write my final report, share findings with other team members, and consolidate my findings.

Wednesday, February 11, 2015


Microsoft recently released an update (KB 3004375) that allows certain versions of the Windows OS to record command line options, if Process Tracking is enabled, in the Windows Event Log. Microsoft also recently upgraded Sysmon to version 2.0, with some interesting new capabilities.  I really like this tool, and I use it when I'm doing testing in my lab, to provide more detailed information about what's happening on the system. I like to run both of these side-by-side on my testing VMs, to see the difference in what's reported.  I find this to be very valuable, not only in testing, but also in making recommendations regarding auditing and the use of process creation monitoring tools, such as Carbon Black. Even if you're not able to run something like Cb in your environment, monitoring process creation via the Windows Event Log or the use of Sysmon, and shuttling the records off of the system into a SIEM, can be extremely valuable.

If you do have Process Tracking enabled in your Windows Event Log, Willi Ballenthin has released a pretty fascinating tool called process-forest that will parse Windows Security Event Logs (Security.evtx) for process tracking events, and assemble what's found into a process tree, sorting by PID/PPID.  Again, if you've enabled Process Tracking in your logging policy, this tool will be very valuable for displaying the information in your logs in a manner that's a bit more meaningful. If you're a consultant (like me) then having this tool as an option, should the client have the appropriate audit configuration, can provide a quick view of available data that may be very beneficial.
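The core idea, assembling a process tree from PID/PPID pairs, can be sketched as follows.  The events below are invented, and this mirrors the concept rather than process-forest's actual implementation:

```python
# Hypothetical (pid, ppid, image) tuples, as might be recovered from
# process tracking (event ID 4688) records in Security.evtx
EVENTS = [
    (500, 4, "smss.exe"),
    (600, 500, "winlogon.exe"),
    (700, 600, "explorer.exe"),
    (800, 700, "cmd.exe"),
    (900, 800, "whoami.exe"),
]

def build_tree(events):
    """Map each parent PID to the list of its child PIDs."""
    children = {}
    for pid, ppid, _ in events:
        children.setdefault(ppid, []).append(pid)
    return children

def render(events, root, depth=0, out=None):
    """Depth-first render of the process tree, indented by ancestry."""
    out = [] if out is None else out
    names = {pid: name for pid, _, name in events}
    out.append("  " * depth + f"{names[root]} (pid {root})")
    for child in build_tree(events).get(root, []):
        render(events, child, depth + 1, out)
    return out

for line in render(EVENTS, 500):
    print(line)
```

Seeing cmd.exe spawn reconnaissance tools under explorer.exe (versus, say, under a service host) is the kind of at-a-glance context that makes the tree view worth building.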
Willi has also released a Python script for parsing AmCache Registry hive files, which were new to Windows 8, and are available in Windows 10.  To get more of an understanding of the information available in this hive file, check out Yogesh's blog post here, with part 2 here.  RegRipper has had a plugin for over a year.

After reading Jon Glass's blog post on parsing the IE10+ WebCacheV01.dat web history file, I used his code as the basis for creating a similar script, in that I can now parse the history information from the file and include it in my analysis.  This can be very helpful if I need to incorporate it into a timeline, or if I just want to take a look at the information separately.  Thanks to Jon for providing the example code, and to Jamie Levy/@gleeda for helping me parse out the last accessed time stamp information.  Don't expect anything spectacularly new from this code, as it's based on Jon's code...I just needed something to meet my needs.

The NCC Group has released a tool called "Windows Activity Logger", which produces (per the description on the web site), a three hour rolling window of insight into system activity by recording process creation, along with thread creation, LoadImage events, etc.  The free version of the tool allows you to run it on up to 10 hosts.  I'm not sure how effective a "3 hr rolling window" is for some IR engagements (notification occurs months after the fact) but it's definitely a good tool for testing within a lab environment.  I can also see how this can be useful if you have some sort of alerting going on, so that you're able to respond within a meaningful time, in order to take advantage of the available data.

I was doing some reading recently regarding CrowdStrike's new modules in their CrowdResponse tool to assist with collecting application execution information from hosts.  Part of this included the ability to parse SuperFetch files.  As I dug into it a bit more, I ran across the ReWolf SuperFetch Dumper (read about the tool here).

Speaking of Windows 10, there was a recent post to an online forum in which the OP stated that he'd seen something different in the DestList stream of Windows 10 *.automaticDestinations-ms Jump Lists.  I downloaded the Windows 10 Technical Preview, installed it in VirtualBox, and ran it.  I then extracted the two Jump List files from the appropriate folder and started looking at a hex view of their DestList streams.  Within pretty short order, I began to see that many of the offsets I'd identified previously were the same as they were for Windows 7 and 8, so I ran my tool against the files, and it worked just fine, exactly as expected.  As yet, there's been nothing specific from the OP about what they'd seen that was different, but it's entirely possible that something has changed.  Whenever a new version of Windows comes out, DFIR folks seem to immediately ask, "...what's new?"  Why not instead focus on what's the same?  There seem to be far more artifacts that don't change much between versions than there are wildly new structures and formats.  After all, the OLE/structured storage format used by the Jump List files has been around for a very long time.
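Checking those offsets in a hex view amounts to reading a few fixed fields. This is a hedged sketch, not my parsing tool; it assumes the commonly documented Win7/8 DestList header layout (a version DWORD followed by entry counts), and it runs against a synthetic header rather than a real stream.

```python
import struct

def parse_destlist_header(data):
    """Parse the first 12 bytes of a DestList stream.

    Assumed layout (as commonly documented for Win7/8 Jump Lists):
      offset 0: version (DWORD) -- 1 on Win7/8; later builds bumped this
      offset 4: number of current entries (DWORD)
      offset 8: number of pinned entries (DWORD)
    """
    version, num_entries, num_pinned = struct.unpack_from("<III", data, 0)
    return {"version": version, "entries": num_entries, "pinned": num_pinned}

# Synthetic header: version 1, 5 entries, 1 pinned entry
hdr = struct.pack("<III", 1, 5, 1) + b"\x00" * 20
print(parse_destlist_header(hdr))
```

If a new Windows version really did change the stream, a check like this against the version field would be the first place to see it.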

Dell SecureWorks Tools
WindowsIR: FOSS Tools
Loki - Simple IOC Scanner

Writing Tools
I mentioned during my presentation at OSDFCon that tools like RegRipper come from my own use cases.  I've discussed my motivation for writing DFIR books, but I've never really discussed why I write tools.

Why do I write my own tools?
First, writing my own tools allows me to become more familiar with the data itself.  Writing RegRipper put me in a position to become more familiar with Registry data.  Writing an MFT parser got me much more familiar with the MFT, and forced me to look really hard at some of the short descriptions in Brian's book (File System Forensic Analysis).

Sometimes, I want or need a tool to do something specific, and there simply isn't anything available that meets my immediate need, my current use case.  Here's an example...I was once asked to take a look at an image acquired from a system; apparently, a number of sector errors had been reported during the acquisition process.  I was able to open the image in FTK Imager, but could not extract a directory listing, and TSK fls.exe threw an error and quit before any output was generated.  I wanted to see if I could add file system metadata to a timeline, and I was able to use both FTK Imager and TSK icat.exe to extract most of the $MFT file.  Using a Perl script I'd written for parsing the MFT, I was able to incorporate some file system metadata into the timeline...something I was not able to do with other tools.
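To give a sense of what "getting familiar with the MFT" looks like in code, here is a minimal sketch (not my Perl script) that parses just the fixed FILE record header, using the offsets described in Brian's book; the record below is synthetic.

```python
import struct

def parse_file_record_header(rec):
    """Parse the fixed header of an NTFS FILE (MFT) record.

    Offsets per File System Forensic Analysis:
      0:  signature ("FILE")
      16: sequence number (WORD)
      18: hard link count (WORD)
      22: flags (WORD): 0x01 = record in use, 0x02 = directory
    """
    if rec[0:4] != b"FILE":
        return None  # bad or corrupted record (e.g., "BAAD")
    seq, links = struct.unpack_from("<HH", rec, 16)
    flags = struct.unpack_from("<H", rec, 22)[0]
    return {
        "seq": seq,
        "links": links,
        "in_use": bool(flags & 0x01),
        "is_dir": bool(flags & 0x02),
    }

# Synthetic 1024-byte record: in-use file, sequence number 3, one hard link
rec = bytearray(1024)
rec[0:4] = b"FILE"
struct.pack_into("<HH", rec, 16, 3, 1)
struct.pack_into("<H", rec, 22, 0x01)
print(parse_file_record_header(bytes(rec)))
```

A real parser would go on to walk the attribute list for $STANDARD_INFORMATION and $FILE_NAME time stamps, but even this header check is enough to skim a partially recovered $MFT for usable records.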

Why do I share the tools I write?
I share the tools I write in the hopes that others will find them useful and provide feedback or input as to how the tools might be made more useful.  However, I know that this is not why most people download and use tools.  So, rather than expecting feedback, I now put my tools up on GitHub for two reasons: one is so that I can download them for my own use, regardless of where I am.  Two, there is a very small number of analysts who will actually use the tools and give me their feedback, so I share the tools for them.

One of the drawbacks of sharing free tools is that those who use them have no "skin in the game".  The tools are free, so it's just as easy to delete them or never use them as it is to download them.  However, there's nothing in the freely available tools that pushes or encourages those who use them to "develop" them further.  Now, I do get "it doesn't work" emails every now and then, and when I can get a clear, concise description of what's going on, or actual sample data to test against, I can see if I can figure out what the issue is and roll out/commit an update.  However, I have also heard folks say, "we couldn't get it to work"...and nothing else.  Someone recently told me that the output of "recbin -d dir -c" was "messed up", and it turns out that what they meant was that the time stamp was in Unix epoch format.
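For what it's worth, translating that "messed up" output is trivial: a Unix epoch value converts directly to a readable UTC time stamp. A one-function Python sketch:

```python
from datetime import datetime, timezone

def epoch_to_utc(ts):
    """Render a Unix epoch value as a human-readable UTC time stamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC")

print(epoch_to_utc(0))           # the Unix epoch itself
print(epoch_to_utc(1415808000))  # an arbitrary 2014 time stamp
```

Five minutes with a snippet like this beats an "it doesn't work" email every time.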

Similarly, those who incorporate free tools into their distributions or courses seem to rarely contribute to the development and extension of those tools.  I know that RegRipper is incorporated into several *nix-based forensic tools distributions, as well as used in a number of courses, and some courses incorporate scenarios and labs into their coursework; yet, it's extremely rare to get something from one of those groups that extends the use of the tool (i.e., even ideas for new or updated plugins).

I am very thankful to those folks who have shared data; however, it's been limited.  The only way to expand and extend the capabilities of tools like RegRipper is to use them thoughtfully, thinking critically and looking beyond the output to see what other data may be available.  If this isn't something you feel comfortable doing, providing the data may be a better approach, and may result in updates much faster.

Why do I write multiple tools?
I know that some folks have difficulty remembering when and how to use various tools, and to be honest, I get it.  There are a lot of tools out there that do various useful things for analysts.  I know that writing multiple tools means that I have to remember which tool to run under which circumstance, and to help me remember, I tend to give them descriptive names.  For example, for parsing Event Log/*.evt files from XP/2003 systems, I called the tool "", rather than "".

My rationale for writing multiple tools has everything to do with my use case, and the goals of my analysis.  In some cases, I may be assisting another analyst, and may use a single data source (a single Event Log file, or multiple files) in order to collect or validate their findings.  In such cases, I usually do not have access to the full image, and it's impractical for whomever I'm working with to share the full image.  Instead, I'll usually get Registry hives in one zipped archive, and Windows Event Logs in another.

In other cases, I may not need all of the data in an acquired image in order to address my analysis goals.  For example, if the question I'm trying to answer is, "did someone access this Win7 system via the Terminal Services client?", all I need is a limited amount of data to parse.

Finally, if I'm giving a presentation or teaching a class, I would most likely not want to run a full application multiple times, for each different data source that I have available.

Tool Requests
Every now and then, I get requests to create a tool or to update some of the tools I've written.  A while back, a good friend of mine reached out and asked me to assist with parsing Facebook chat messages that had been parsed out of an image via EnCase...she wanted to get them all parsed and reassembled into a complete conversation.  That turned out to be pretty fun, and I had an initial script turned around in about an hour, with a final polished script finished by the end of the day (about four hours).

One tool I was asked to update is recbin, something I wrote to parse both XP/2003 INFO2 files as well as the $I* files found in the Vista+ Recycle Bin.  I received a request to update the tool so that it could be pointed at a folder and parse all of the $I* files in that folder, but I never got around to adding that bit of code.  However, when that person followed up with me recently, it took all of about 5 minutes of Googling to come up with a batch file that would help with that issue...

@echo off 
echo Searching %1 for new $I* files...
for %%F in (%1\$I*) do (recbin -f %%F)

This isn't any different from Corey Harrell's auto_rip script.  My point is that getting additional capabilities and functionality out of the tools you already have is often very easy.  After I sent this batch file to the person who asked about it, I was asked how the output could be listed in CSV format, so I added the "-c" switch to the recbin command and sent the "new" batch file back.
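Part of the reason this kind of glue is so easy is that the underlying format is simple. As a hedged illustration (this is not recbin itself), here is a minimal Python sketch of the Vista/7 $I record layout, run against a synthetic record with a made-up path:

```python
import struct
from datetime import datetime, timezone

FILETIME_EPOCH_DELTA = 116444736000000000  # 100-ns ticks from 1601 to 1970

def parse_dollar_i(data):
    """Parse a Vista/7-format $I record (version 1).

    Assumed layout:
      0:  version (QWORD, 1 on Vista/7)
      8:  original file size (QWORD)
      16: deletion time (FILETIME)
      24: original path, UTF-16LE, NUL-terminated
    """
    version, size, ft = struct.unpack_from("<QQQ", data, 0)
    deleted = datetime.fromtimestamp(
        (ft - FILETIME_EPOCH_DELTA) / 10**7, tz=timezone.utc)
    path = data[24:].decode("utf-16-le").split("\x00")[0]
    return {"size": size,
            "deleted": deleted.strftime("%Y-%m-%d %H:%M:%S"),
            "path": path}

# Synthetic record: 1234-byte file, deleted 2014-11-12 16:00:00 UTC
ft = FILETIME_EPOCH_DELTA + 1415808000 * 10**7
rec = (struct.pack("<QQQ", 1, 1234, ft)
       + "C:\\Users\\harlan\\secret.doc".encode("utf-16-le")
       + b"\x00\x00")
print(parse_dollar_i(rec))
```

Wrap that in a loop over a folder's $I* files and you have essentially what the batch file above does, in a dozen lines.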

Tool Updates
Many times, I don't get a request for new capabilities for a tool; instead, I find something interesting and, based on what I read, update one of the tools myself.  A great example of this is Brian Baskin's DJ Forensic Analysis blog post; the post was published on 11 Nov, and on the morning of 12 Nov, I wrote three RegRipper plugins, did some quick testing (with the limited data that I had available), and committed the new plugins to the GitHub repository...all before 8:30am.  The three plugins can be used by anyone doing analysis to validate Brian's findings, and then hopefully expand upon them.