Monday, April 09, 2018

Mommy, where do plugins come from?

This is one of those questions kids have been asking their parents throughout history, and in more recent times, those parents may have resorted to a book.  Just sayin'.

Well, they come from three general sources, really, none of which involves a stork or a cabbage leaf.

Recently, there were a couple of requests for functionality to be added to RegRipper.  One was for the ability to automatically update the default profiles in RegRipper.  I was speaking with someone recently, and demonstrating the RegRipper extension that had been added to Nuix's Workbench product.  As part of the discussion, I explained that I do not update the default profiles when I create new plugins (something I've mentioned a number of times in this blog), as I don't want to overwrite any customized profiles folks have made to their installations.  This person then asked, "...can you add the ability to update the default profiles automatically?"  I thought for a minute and realized that rip already has about 2/3 of the code I would need to do exactly that.  So, I opened an editor, used that code to populate a hash of arrays, and then wrote the lists of plugin names, each to their own file.  Boom.  Done.

The other "feature" was for a new plugin to be created.  Someone I know reached out to me to say that they'd found value in a particular Registry key/value during an investigation, and that it might make a good plugin to retrieve the value in question.  This person didn't initially provide any test data, and when they did, it was an exported .reg file; I know it sounds easy enough to handle, but this adds several additional steps (i.e., open a VM, transfer the .reg file to it, import the .reg file into a hive, then shut down the VM, open the .vmdk file in FTK Imager, and extract the hive...), as well as a level of uncertainty (are there variations based on the version of Windows, etc.), to the testing process.

Not having data to test on makes it difficult to write a plugin, as well as test the plugin before releasing it. 

Intel from IR engagements
So, where do other plugins come from?  Similar to the request for the new plugin, sometimes I'll find something during an examination that might make a good plugin.  For example, during an examination, we found a Registry value of interest, and as such, I added the key LastWrite times from the user's NTUSER.DAT hive to a timeline, using regtime.exe.  For context, I did the same for the Software hive from that system, and as an aside, found some interesting Registry keys/values associated with the installed AV product.  In this instance, the keys and values were related to the responder's activities, but writing a plugin (or two) to extract data from the Registry keys/values would facilitate research activities, and as such, make it easier to determine their nature and context.

Interestingly enough, this is how RegRipper started, and was the source of most of the plugins I've written in the past decade.

Online research
Another source of plugins is OSINT, and online research.

For example, FireEye recently released their 2018 M-Trends report, and page 23 includes a Registry key that an attacker modified to hide their activity, by adding a folder to the AV exclusion list.  If I had data on which to test a plugin, I'd write one; online research indicates that there's a key within the path that may vary, and as such, I'd need a bit better understanding of the path in order to write a useful plugin. 

Oddly enough, I don't think I have ever received a request for a plugin based on something published online, via a blog post, or an annual report.  I don't see (or know) everything, and it's likely that I may simply have not seen that post or report. 

Another analysis aspect that RegRipper can be used for is to check or verify system configuration.  For example, see this Microsoft documentation regarding making remote calls to the local SAM database; including a plugin to extract these values may help an analyst narrow down the original attack vector, or at least identify possibilities.

Friday, April 06, 2018


Based on some feedback I received recently, I updated RegRipper's rip (and the corresponding .exe, of course) to include the "-uP" switch.

What this switch does is run through all of the plugins, determine to which hives they apply, and then automatically update the default profiles with those plugins.  As I've stated in the past, when I create a new plugin, I do not update the appropriate profile...I just add the plugin to the repository.  If you want to run all of the plugins available for the NTUSER.DAT hive against all NTUSER.DAT hives you get, run rip.exe with the "-uP" switch, and the profile named "ntuser" will include all of those plugins, EXCEPT those with "_tln" in their name.

This switch will create or overwrite (if they already exist) profiles named for the hives (lower case, without ".dat" at the end).  This does NOT affect any custom profiles you've created, unless they use the same names.
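To make the behavior a bit more concrete, here's a minimal Python sketch of the same idea.  To be clear, this is not the code in rip (which is Perl); the function names and the plugin-to-hive mapping passed in are hypothetical, and how each plugin's hive gets determined is simply assumed to have happened already.

```python
from collections import defaultdict
from pathlib import Path

def build_profiles(plugin_hives):
    """Group plugin names into one profile per hive.

    plugin_hives maps a plugin name to the list of hives it applies to
    (hypothetical input; assumed to have been collected from the plugins).
    """
    profiles = defaultdict(list)
    for plugin, hives in plugin_hives.items():
        if "_tln" in plugin:              # timeline plugins are left out
            continue
        for hive in hives:
            name = hive.lower()
            if name.endswith(".dat"):
                name = name[:-4]          # "NTUSER.DAT" -> "ntuser"
            profiles[name].append(plugin)
    return {name: sorted(plugins) for name, plugins in profiles.items()}

def write_profiles(profiles, directory):
    """Write one plugin name per line, creating or overwriting each profile file."""
    for name, plugins in profiles.items():
        Path(directory, name).write_text("\n".join(plugins) + "\n")
```

Note that a custom profile is only touched if it happens to share a name with one of the hive-named files, which matches the behavior described above.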

I recently received a request from someone to create a new plugin to retrieve the IE Search Scopes.  The new plugin was added to the repository today, along with the update mentioned above.

Something I wanted to point out about both of these updates...they started with someone asking.  That's it.  I don't use RegRipper's profiles for the analysis work I do, but I know that others do.  If you use the GUI, then profiles are pretty much what you use, rather than individual plugins.  The profiles are also valuable when using the RegRipper extension added to the Nuix Workbench product (fact sheet); the extension relies on a mapping of the hive type to the RegRipper profile.  You can edit/update the profiles themselves, or you can create your own custom profiles and edit the mapping file (JSON, just open it in Notepad...).  I showed the extension to someone, and they asked, "hey, can you create a tool that automatically updates the default profiles?"

The same is true with the plugin...someone said, hey, there's this thing that I found useful during an investigation, it might make a good plugin.  Boom.  Done. 

If you've been thinking about something along these lines, or trying to find a way to do it manually, maybe there's a way to do it in an automated fashion.  Sometimes, the smallest interaction can lead to a big result.  Don't isolate yourself on your own island.

Saturday, March 31, 2018

DFIR Analysis and EDR/MDR Solutions

There is only so much that DFIR analysis can do.

There it is, I said it.  And it's especially true when the DFIR analysis is the result of external third party incident notification, which we ultimately determine to come months after the incident originally occurred.

Some artifacts exist forever.  Until they don't.  Some artifacts are recorded and exist for an unspecified and indeterminate period of time...nanoseconds, microseconds, weeks, or months.  Processes execute, finish, and the memory they used is freed for use by another process.  Text files exist until they are deleted, and the last modification times on the files remain the same until the next time they're modified.  Windows Event Logs record events, but some event logs "roll over" more quickly than others; events in some may exist for only a few days, while events in others may exist for weeks or even months.  As time passes, artifact clusters corrode to the point where, by the time DFIR analysts get the data for analysis, their ability to definitively answer questions is severely hampered.

The 2017 Ponemon Institute Data Breach Report indicates an average "dwell time" (the time between initial breach and discovery of the breach) of 191 days.  The Nuix 2018 Black Report findings indicate that professional red teamers/pen testers report that they can target, compromise, and exfil data within 15 hrs.  Hardly seems fair, particularly when you consider that if legitimate, scheduled pen tests go undetected, what chance do we have of detecting an unscheduled, uninvited intruder?

Some artifacts are created, are extremely transient (although they do exist), but are never recorded.  An example of this is process command lines; if an adversary runs the "whoami" command as part of their initial attempts to orient themselves, that process exists for a very short time, and then the memory used gets freed for later use.  By default, this is not recorded, so it ceases to exist very quickly, and then ceases to be available a short time later.   The same is true when an intruder runs the command "net user /add" to create a user account on the system; the command runs, and the command line no longer exists.  Yes, the user account is created, so the results of the command persist...but the command line itself, which likely included the password for the account, is no longer available.  Finally, when the adversary stages files or data for exfiltration, many times they'll use rar.exe (often renamed) to archive the collected data with a password...the command line for the process will include the password, but once the command has completed, the plain text password issued at the command line is no longer available.
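To illustrate why a recorded command line matters, here's a small Python sketch that pulls the password out of a rar-style command line, regardless of what the executable has been renamed to.  It assumes the -p<password> and -hp<password> switch forms of the RAR command-line tool; the function name and the sample command line in the usage note are made up for illustration.

```python
import re
import shlex

# RAR's -p<password> and -hp<password> switches carry the password inline,
# so a recorded command line is enough to recover it.
PASSWORD_SWITCH = re.compile(r"^-(?:hp|p)(.+)$", re.IGNORECASE)

def rar_password(command_line):
    """Return the inline password from a rar-style 'a' (add) command, or None."""
    # posix=False keeps Windows backslashes in paths intact
    tokens = shlex.split(command_line, posix=False)
    # tokens[0] is the executable, which may have been renamed, so ignore its name
    if len(tokens) < 2 or tokens[1].lower() != "a":
        return None
    for token in tokens[2:]:
        match = PASSWORD_SWITCH.match(token)
        if match:
            return match.group(1)
    return None
```

For example, calling rar_password() on a hypothetical line such as "svch0st.exe a -hpS3cret! C:\temp\out.rar C:\data\*.doc" keys on the shape of the command line rather than the file name, which is the point when the executable has been renamed.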

Several years ago, I was working a targeted threat response engagement, and we'd observed the adversary staging data for exfiltration.  We alerted on the command line; the tool was rar.exe, although the executable had been renamed.  The full command line included the password that the adversary used, and was recorded by the EDR/MDR solution.  We acquired an image of the system, and through analysis determined that the archive files were no longer visible within the "active" file system.  As such, we carved unallocated space for RAR archives and were able to open the twelve archives we retrieved, using the password recorded in the command line used to archive the files.  The web server logs definitively illustrated that the files had been exfiltrated (the IIS logs included the requested file name, number of bytes transferred, and the success code of "200"), and we had several of the archives themselves; other deleted archives had enough sectors overwritten that we were not able to successfully recover the entire files.
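As a rough sketch of the web server log side of that, the following Python parses W3C extended format log lines and lists successful requests for archive files.  It assumes the logs include the cs-uri-stem, sc-status, and sc-bytes fields (sc-bytes has to be enabled in the logging configuration); the function name is hypothetical, and this is not the tooling used on the engagement.

```python
def successful_downloads(log_lines, extension=".rar"):
    """Yield (uri, bytes_sent) for status-200 requests for files with the extension."""
    fields = []
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns for the lines that follow
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue
        record = dict(zip(fields, line.split()))
        uri = record.get("cs-uri-stem", "")
        if uri.lower().endswith(extension) and record.get("sc-status") == "200":
            sent = record.get("sc-bytes", "")
            # sc-bytes may be missing or "-" if it was not logged
            yield uri, int(sent) if sent.isdigit() else None
```

Comparing the bytes transferred against the size of a carved archive is one way to check whether the whole file went out the door.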

That's a great example of how an EDR/MDR solution can be so powerful in today's world.  Also, consider this recent Tanium blog post regarding the Samsam ransomware, the same ransomware family that recently hit the City of Atlanta...using the EDR/MDR solution to detect malicious actor activity prior to them deploying the ransomware, such as during their initial orientation and recon phase, means that you have a better chance of inhibiting, hampering, or completely obviating their end goal.

Finally, EDR/MDR solutions become budget items; I'm pretty sure that Equifax never budgeted for their breach, or budgeted enough.  So, not only do you detect breaches early in the attack cycle, but with an IR plan, you can also respond in such a way as to prevent the adversary from accessing whatever it is they're after, obviating compliance and regulatory issues, notification, and keeping costs down.

Monday, March 26, 2018

Even More DFIR Brain Droppings...

Something I've seen over the years that I've been interested in addressing is the question of "how do I analyze X?"  Specifically, I'll see or receive questions regarding analysis of a particular artifact on Windows systems, such as "how do I analyze this Windows Event Log?" and I think that after all this time, this is a good opportunity to move beyond the blogosphere to a venue or medium that's more widely accessible.

However, addressing the issue here, the simple fact is that artifacts viewed in isolation are without context.  A single artifact, by itself, can have many potential meanings.  For example, Jonathon Poling correctly pointed out that the RemoteConnectionManager/1149 event ID does not specifically indicate a successful login via Terminal Services; rather, it indicates a network connection.  By itself, in isolation, any "definitive" statement made about the event, beyond the fact that a network connection occurred, amounts to speculation.  However, if we know what occurred "near" that event, with respect to time, we can get enough information to provide some much-needed context.  Sure, we can add Security Event Log records, but what if (and this happens a lot) the Security Event Log only goes back a day or two, and the event you're interested in occurred a couple of months ago?  File system metadata might provide some insight, as would UserAssist data from user accounts.  As you can see, as we start adding specific data sources...not willy-nilly, because some data sources are more valuable than others under the circumstances...we begin to develop context around the event of interest.

The same can be said for other events, such as a Registry key LastWrite time...this could indicate that a key was modified by having a value added, or deleted, or even that the key had been created on that date/time.  In isolation, we don't know...we need more context.  I generally tend to look to the RegBack folder, and then any available VSCs for that additional context.  Using this approach, I've been able to determine when a Registry key was most likely modified, versus the key being created and first appearing in the Registry hive.
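The reasoning behind that approach can be sketched in a few lines of Python.  This assumes you've already pulled the key's LastWrite time from each copy of the hive (RegBack, VSCs, the current hive) using whatever tool you prefer; the function name and the labels in the usage note are hypothetical.

```python
from datetime import datetime

def key_history(snapshots):
    """Describe what happened to a key between consecutive copies of a hive.

    snapshots: list of (label, lastwrite) ordered from oldest to newest copy,
    where lastwrite is a datetime, or None if the key is absent from that copy.
    """
    changes = []
    for (old_label, old_lw), (new_label, new_lw) in zip(snapshots, snapshots[1:]):
        if old_lw is None and new_lw is None:
            what = "absent"
        elif old_lw is None:
            what = "created"       # key first appears in the newer copy
        elif new_lw is None:
            what = "deleted"
        elif old_lw != new_lw:
            what = "modified"      # key existed before; LastWrite moved
        else:
            what = "unchanged"
        changes.append((old_label, new_label, what))
    return changes
```

Feeding it something like [("VSC1", None), ("RegBack", lastwrite_a), ("current", lastwrite_b)] separates "the key first showed up here" from "the key was already there and got modified", which a single LastWrite time can't do on its own.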

As such, going back to the original questions, I strongly recommend against looking at any single artifact in isolation.  In fact, for any artifact with a time stamp, I strongly recommend developing a timeline in order to see that event in context with other events. 
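As a minimal sketch of what "in context" means here, the following Python takes events from several data sources and returns the ones that fall near an event of interest, in time order.  The tuple layout, function name, and default window are assumptions made for illustration, not the format of any particular timeline tool.

```python
from datetime import datetime, timedelta

def events_near(events, pivot_time, window=timedelta(minutes=5)):
    """Return events within `window` of pivot_time, sorted by time.

    events: iterable of (timestamp, source, description) tuples drawn from
    multiple data sources (event logs, file system metadata, Registry, etc.).
    """
    nearby = [event for event in events if abs(event[0] - pivot_time) <= window]
    return sorted(nearby, key=lambda event: event[0])
```

The value isn't in the code, it's in what you feed it; adding the data sources that matter under the circumstances is what builds context around the event.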

LNK Shell Items...what's old is new again
It's great to see LNK Shell Items being discussed on the Port139 blog, as a lot of great stuff is being shared via the blog.  It's good to see stuff that's been discussed previously being raised and discussed again, as over time, artifacts that we don't see a lot of get forgotten and it's good to revisit them.  In this case, being able to parse LNK files is a good thing, as adversaries are using LNK files for more than just simple persistence on systems.  For example, they've been observed sending LNK files to their intended victims, and as was described by JPCERT/CC about 18 months ago, those files can provide clues to the adversary's development environment. LNK files have been sent as attachments, as well as embedded in OLE objects; both can be parsed to provide insight into not just the adversary's development environment, but also potentially to track a single actor/platform used across multiple campaigns.

Another use for parsing LNK files is that adversaries can also use them for maintaining the ability to return to a compromised environment, by modifying the icon filename variable to point to a remote system.  The adversary records and decrypts the authentication attempt, and gets passwords that may have changed.

Something else that hasn't been discussed in some time is the fact that shell items can be used to point to devices attached to a system.  Sure, we know about USB devices, but what about digital cameras, and being able to determine the difference between possession of images, and production?

EDR Solutions
Something I've encountered fairly regularly throughout my DFIR experience is Locard's Exchange Principle. I've also blogged and presented on the topic, as well.  Applied to DFIR, this means that when an adversary connects to/engages with a system on a compromised infrastructure, digital material is exchanged between the two systems.  Now, this "digital material" may be extremely transient and persist for only a few micro-seconds, but the fact is that it's there.  As most commercial operating systems are not created with digital forensics and incident response in mind, most (if not all) of these artifacts are not recorded in any way (meaningful or otherwise).  This is where EDR solutions come in.

For the sake of transparency, I used to work for a company that created endpoint technology that was incorporated into its MSSP offering.  My current employer includes a powerful EDR product amongst the other offerings within their product suite.

For something to happen on a system, something has to be executed.  Nothing happens, malicious or otherwise, without instructions being executed by the CPU.  Let's say that an adversary is able to get a remote access Trojan (RAT) installed on a system, and then accesses that system.  For this to occur, something needed to have happened, something that may have been extremely transient and fleeting.  From that point, commands that the adversary runs to, say, perform host and network reconnaissance,  will also be extremely transient.

For example, one command I've seen adversaries execute is "whoami".  This is a native Windows command, and not often run by normal users.  While the use of the tool is not exclusive to adversaries, it's not a bad idea to consider it a good indicator.  When the command is executed, the vast majority of the time involved isn't in executing the command, but rather in the part of the code that sends the results to the console.  Even so, once the command is executed, the process block in memory is freed for use by other processes, meaning that even after a few minutes, without any sort of logging, there's no indication that this command was ever executed; any indication that the adversary ran the command is gone.
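Here's a minimal Python sketch of that idea, assuming process creation events have already been recorded somewhere.  The event layout (dictionaries with "image" and "command_line" keys), the function name, and the watchlist are hypothetical, and this is not how any particular EDR product works.

```python
import ntpath

# Native commands mentioned in this blog as adversary activity; the list is
# illustrative, not complete, and a hit is an indicator, not a verdict.
WATCHLIST = {"whoami.exe", "net.exe"}

def flag_recon(process_events, watchlist=WATCHLIST):
    """Return the recorded process events whose image name is on the watchlist."""
    hits = []
    for event in process_events:
        # ntpath handles Windows-style paths no matter where this code runs
        name = ntpath.basename(event["image"]).lower()
        if name in watchlist:
            hits.append(event)
    return hits
```

None of this works unless the process creation event was recorded in the first place, which is exactly the gap being described.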

Now, extend this to things like copy commands (i.e., bad guy collects files from the local system or remote shares), archival commands (compressing the collected files into a single archive, for staging), exfiltration, and deletion of the archive.  All of these commands are fleeting, and more importantly, not recorded.  Once the clean-up is done, there're few, if any, artifacts to indicate what occurred, and this is something that, as many DFIR practitioners are aware, is significantly impacted by the passage of time.

This is where an EDR solution comes in.

The dearth of this instrumentation and visibility is what leads to speculation (most often incorrect and over exaggerated) about the "sophistication" of an adversary.  When we don't see the whole picture, because we simply do not accept the fact that we do not have the necessary instrumentation and visibility, we tend to fill the gaps in with assumption and speculation.  We've all been in meetings where someone would say, "...if I were the attacker, this is what I would do...", simply because there's no available data to illustrate otherwise.  Also, it's incredibly easy under such circumstances for us to say that the attacker was "sophisticated", when all they really did was modify the hosts file, and then create, run, and then delete an FTP script file.

Why does any of this matter?

Well, for one, current and upcoming legislation (i.e., GDPR) levies 'cratering' fines for breaches; that is, fines that have what can be a hugely significant impact on the financial status of a company.  If we continue the way we're going now...receiving external notification of an intrusion weeks or months after the attack actually occurred...we're going to see significant losses, beyond what we're seeing now.  Beyond paying for a consulting firm (or multiple firms) to investigate the breach, along with loss of productivity, reporting/notification, lawsuits, impact to brand, drop in stock price, there are these huge fines. 

Oh, and the definition of a breach includes ransomware, so yeah...there's that.

And all of these costs, both direct and indirect, are included in the annual budget for companies...right?  We sit down at a table each year and look at our budget, and take a swag...we're gonna have five small breaches and one epic "Equifax-level" breach next year, so let's set aside this amount of money in anticipation...that actually happens, right?

Why not employ an EDR solution?  It's something you can plan for, include in your budget...costs are known ahead of time.  The end result is that you detect breaches early in the attack cycle, obviating the need to report.  In addition, you now have the actual data to definitively demonstrate that 'sensitive data' was NOT accessed, so why would you need to notify?  If client data is not accessed, and you can demonstrate that, why would you need to notify? 

Recently, following a ransomware attack, an official with a municipality in the US stated that there was "no evidence" that sensitive data was accessed.  What should have been said was that there was simply no evidence...the version of ransomware that impacted that municipality was one that is not email-borne; delivery of the ransomware required that someone access the infrastructure remotely, locate the servers, and deploy the ransomware.  If all of this occurred and no one noticed until the files were no longer accessible and the ransom note was displayed, how can you then state definitively that sensitive data was not accessed?

You can't.  All you can say is that there is "no evidence".

It's almost mid-year 2018...if you don't already have a purchase of an EDR product planned, rest assured that you'll be part of the victim pool in the coming year.

Sunday, March 25, 2018

More DFIR Brain Droppings

Ransomware (TL;DR)
This past week, the City of Atlanta was hit with ransomware. This is not unusual...I've been seeing a lot of municipalities in the US...cities, counties, etc...getting hit, since last fall.  Some of them aren't all that obvious to anyone who isn't trying to obtain government services, of course, and the media articles themselves may not be all that easy to find.  Here are just a few examples:

Mecklenburg County update
Spring Hill, TN update
Murfreesboro, TN update

Just a brief read of the above articles, and maybe doing a quick Google search for other articles specific to the same venues will quickly make it clear how services in these jurisdictions are affected by these attacks.

The ArsTechnica article that covered the Atlanta attack mentioned that the ransomware used in the attack was Samsam.  I've got some experience with this variant, and Kevin Strickland wrote a great blog post about the evolution of the ransomware itself, as well.  (Note: both of those blog posts are almost 2 yrs old at this point).  The intel team at SWRX followed that up with some really good information about the Samas ransomware campaigns recently. The big take-away from this is that the Samsam (or Samas) ransomware has not historically been email-borne. Instead, it's delivered (in most cases, rather thoughtfully) after someone has gained access to the organization, escalated privileges (if necessary), located the specific systems that they want to target, and then manually deployed the ransomware.  All of these actions could have been detected early in the attack cycle. Think of it this way...instead of being delivered via mail (like a letter), it's more like Amazon delivery folks dropping the package...for something that you never ordered...in your house...only you didn't have one of those special lock-and-camera combinations that they talked about, there was just a door or window left open.  Yeah, kind of like that.

On the topic of costs, the CNN article includes:

The Federal Bureau of Investigation and Department of Homeland Security are investigating the cyberattack...


The city engaged Microsoft and a team from Cisco's Incident Response Services in the investigation...

Okay, so we're getting a lot of help, that's good.  But they're getting a lot of help, and that's going to be expensive.

Do you, the reader, think that when the staff and leadership for the City of Atlanta sat down for their budget meetings last year, that they planned for this sort of thing?  When something like this occurs, the direct costs include not only the analyst's time, but food, lodging, etc.  Not only does it add up over time, but it's multiplied...$X per analyst per day, times Y analysts, times Z days.
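Just to show how quickly that multiplication adds up, here's a trivial Python sketch.  The function and parameter names are made up, and any figures you plug in are your own; no actual rates are being quoted here.

```python
def onsite_cost(daily_rate, analysts, days, daily_expenses=0.0):
    """Direct cost of an on-site response: $X per analyst per day, Y analysts, Z days.

    daily_expenses covers per-analyst food, lodging, etc. (hypothetical input).
    """
    return (daily_rate + daily_expenses) * analysts * days
```

Doubling either the number of analysts or the number of days doubles the total, which is why an engagement that starts months after the fact gets expensive so quickly.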

From the USAToday article on the topic:

Such attacks are increasingly common. A report last year from the Ponemon Institute found that half of organizations surveyed had had one or more ransomware incidents in 2017, and 40% had experienced multiple attacks.   

An IBM study found that 70% of businesses have been hit with ransomware. Over half of those paid more than $10,000 to regain their data and 20% paid more than $40,000.

In January, an Indiana hospital system paid a $50,000 ransom to hackers who hijacked patient data. The ransomware attack accessed the computers of Hancock Health in Greenfield through an outside vendor's account Thursday. It quickly infected the system by locking out data and changing the names of more than 1,400 files to "I'm sorry."

Something else that's not addressed in that Ponemon report, or at least not in the quote from the USAToday article, is that even when the ransom is paid, there's no guarantee that you'll get your files back.  Just over a year ago, I did some DFIR analysis work for a client that paid the ransom and didn't get all of their files back, and then paid us to come in and try to provide some answers.  So the costs are stacking up.

What about other direct and/or indirect costs associated with ransomware attacks?  There are hits to productivity and the ability to provide services, costs of downtime, costs to bring consultants in to assist in discovery and recovery, etc.  While we don't have the actual numbers, we can see these stacking up.  But what about other costs?  Not too long ago, another municipality got hit with ransomware, and a quote from one of the articles was along the lines of, "...we have no evidence that sensitive information was accessed."

Yes, of course you don't.  There was no instrumentation, nor visibility, to detect the early stages of the attack, and as a consequence, no evidence of anything about the attack was recorded.  On the surface that sounds like a statement meant to comfort those whose personal data was held by that organization, but focusing just 1nm beyond that statement reveals an entirely different picture; there is "no evidence" because no one was watching.

So what can we expect to see in the future?  We're clearly going to see more of these attacks, because they pay; there is a monetary incentive to conducting these attacks, and an economic benefit (to the attacker) for doing so.  As such, there is no doubt in my mind that two things are going to happen; one is that with the advent of GDPR and other similar legislation, this is going to open up a whole new avenue for extortion.  Someone's going to gain access to an infrastructure, collect enough information to have solid proof that they've done so, and use that as their extortion.  Why is this going to work?  Because going public with that information is going to put the victim organization in the legislative spotlight in a way that they will not be able to avoid.

The other thing that's going to happen is that when statements regarding access to 'sensitive data' are made, there are going to be suits demanding proof.  I don't think that most individuals (regardless of country) have completely given up on "privacy" yet.  Someone...or a lot of someones...is going to go to court to demand definitive answers.  A hand-waving platitude isn't going to be enough, and in fact, it's going to be the catalyst for more questions.

Something else I thought was pretty interesting from the CNN article:

When asked if the city was aware of vulnerabilities and failed to take action, Rackley said the city had implemented measures in the past that might have lessened the scope of the breach. She cited a "cloud strategy" to migrate critical systems to secure infrastructure.

This is a very interesting question (and response), in part because I think we're going to see more questions just like this.  We will never know if this question was a direct result of the Equifax testimony (by the former CEO, before Congress), but I do think that it's plausible enough to assume that that's the case.

And the inevitable has occurred...the curious have started engaging in their own research and posted their findings publicly.  Now, this doesn't explicitly mean that this is the avenue used by the adversary, but it does speak to the question from the CNN article (i.e., "...were you aware of vulnerabilities...").

Attacker Sophistication
Here's a fascinating GovTech article that posits that some data breaches may be the result of professional IT pride.  As I read through the article, I kept thinking of the Equifax breach, which reportedly occurred for want of a single patch, and then my mind pivoted over to what was found recently via online research conducted against the City of Atlanta.

From the article, "It’s sometimes an easy out to say: “the bad guys are just too good.” "  Yes, we see that a lot...statements are made without qualification about the "sophisticated" attacker, but for those of us who've been in the trenches, analyzed the host and log data, and determined the initial avenue that the attacker used to get into the infrastructure...how "sophisticated" does one have to be to guess the password for an Internet-accessible Terminal Services/RDP account when that password is on every password list?  Or to use a publicly available exploit against an unpatched, unmanaged system?  "Hey, here's an exploit against JBoss servers up through and including version 6, and I just found a whole bunch of JBoss servers running version 4...hold my beer."

In my experience, the attacker only needs to be as sophisticated as he needs to be.  I worked an engagement once where the adversary got in, collected and archived data, and exfiltrated the archive out of the infrastructure.  Batch files were left in place, and the archive was copied (not moved) from system to system, and not deleted from the previous location.  The archive itself didn't have a password on it.  Someone said that the adversary was sloppy and not sophisticated.  However, the client had been informed of the breach via third party notification, weeks after the data was taken.

Tool Testing
Daniel Bohannon has a great article up about testing your tools.

I'd like to add to his comments about tool testing...specifically, if you're going to effectively test your tools, you need to understand what the tools do (or at least what they're supposed to do...) and how they work.  Perhaps not at a bit-level (you don't have to be the tool developer), but there are some really simple troubleshooting steps you can follow if you have an issue with a tool and you feel that you want to either ask a question, or report the issue.

For one...and Jamie Levy can back me up on this..."don't work" don't work.  What I mean is, if you're going to go to a tool developer (or you're going to bypass the developer and just go public), it's helpful to understand how the tool works so that you can describe how it appears to not be working.  A long time ago, I learned that Volatility does NOT work if the 8Gb memory dump you received is just 8Gb of zeroes.  Surprising, I know.

Oddly enough, the same is true for RegRipper; if you extract a hive file from an image, and for some reason it's just a big file full of zeroes, then RegRipper will throw errors.  This is also true if you use reg.exe to export hive files, or portions thereof.  RegRipper is intended to be run against the raw hive file, NOT text files with a .reg extension.  Instead of using "reg export", use "reg save".
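A quick sanity check before running a tool against a "hive" can save a support question.  Here's a minimal Python sketch; it relies on raw hive files beginning with the "regf" signature and on .reg exports being text that starts with a "Windows Registry Editor Version 5.00" or "REGEDIT4" header.  The function name is hypothetical, and this is not part of RegRipper.

```python
def describe_hive_candidate(data):
    """Classify a file from its leading bytes (e.g., the first 4096 bytes)."""
    # Only the bytes passed in are examined, so "all zeroes" refers to that sample
    if not data or not any(data):
        return "empty or all zeroes"
    if data[:4] == b"regf":
        return "raw hive (regf signature)"
    head = data[:128]
    text_markers = (
        "Windows Registry Editor Version 5.00".encode("utf-16-le"),  # UTF-16 export
        b"Windows Registry Editor Version 5.00",                      # ANSI/UTF-8 text
        b"REGEDIT4",                                                  # older export format
    )
    if any(marker in head for marker in text_markers):
        return "text export (.reg), not a raw hive"
    return "unknown"
```

Reading the first few kilobytes with open(path, "rb").read(4096) and passing them in is enough to tell these cases apart.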

The overall point is that it's important to test the tools you're using, but it's equally important to understand what the tools are supposed/designed to do.

Speaking of tools, not long ago I ran across a reference to Rattler, which is described as, "...a tool that automates the identification of DLL's which can be used for DLL preloading attacks."  Reading through the blog post that describes the issue that Rattler addresses leads me to believe that this is the DLL search order hijacking issue, in that if you have a malicious DLL in the same folder as the executable, and the executable calls a DLL with the same name that's found in another folder (i.e., C:\Windows\system32), then your malicious DLL will be loaded before the legit DLL.
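The core of the issue can be modeled in a few lines of Python.  This is a deliberately simplified sketch: the real Windows loader has more rules (KnownDLLs and safe DLL search mode, for example), and this is not how Rattler works; the function name and paths are hypothetical.

```python
import ntpath

def first_dll_match(dll_name, search_dirs, existing_files):
    """Return the first path, in search order, at which dll_name exists.

    search_dirs: directories in the order they are searched, with the
    application's own folder first (simplified model of the search order).
    existing_files: full paths of files known to exist on the system.
    """
    existing = {path.lower() for path in existing_files}   # Windows paths are case-insensitive
    for directory in search_dirs:
        candidate = ntpath.join(directory, dll_name)
        if candidate.lower() in existing:
            return candidate
    return None
```

If a DLL of the same name exists in both the application folder and C:\Windows\system32, the copy in the application folder is the one that gets returned, which is the whole problem.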

Monday, March 19, 2018

DFIR Brain Droppings

Live Response
It's been a while since I posted anything on the topic of live response, but I recently ran across something that really needed to be shared as widely as possible.

Specifically, Hadar Yudovich recently authored an article on the Illusive Networks blog about finding time stamps associated with network connections. His blog post is pretty fascinating, as he says some things that are probably true for all of us; in particular, we'll see a native tool (such as netstat.exe), and assume that the data that the tool presents is all that there is.  We simply don't remember that MS did not create an operating system with DFIR in mind.  However, Hadar demonstrates that there is a way to get time stamps for network connections on Windows systems, and wrote a Powershell script to do exactly that.

I downloaded and ran the Powershell script from a command prompt (not "Run as administrator") using the following command line:

powershell -ExecutionPolicy Bypass .\Get-NetworkConnections.ps1

The output is in a table format, but for anyone familiar with Powershell, I'm sure that it wouldn't be hard to modify the output to CSV, or some other format.  Either way, this would be a great addition to any volatile data collection script.

This is pretty cool stuff, and I can see it being included in volatile data collection processes going forward.

Processes and Procedures
Speaking of collecting volatile data...

A couple of things I've seen during my time as an incident responder are (a) a desire within the community for the "latest and greatest" volatile data collection script/methodology, and (b) a marked reticence to document processes and procedures.  I mention these two specifically because from my perspective, they seem to be diametrically opposed; after all, what is a volatile data collection script but a documented process?

Perhaps the argument I've heard against documented processes over the years is that having them "stifles creativity".  My experience has been that by documenting my analysis process for activities such as malware detection within an acquired image, I'm able to ultimately spend more time on the fun and interesting aspects of analysis.  Why is that?  Well, for me, a documented process is a living document, one that is continually used and updated as necessary.  Also, the documented process serves as a means for automation.  As such, as I learn something new, I add it to the process, so that I don't forget something...God knows that most days, I can't remember what I had for breakfast, so how am I going to remember something that I read (didn't find or do as part of my own analysis) six months ago?  The simple fact is that I don't know everything, and I haven't seen everything, but I can take those things I have seen, as well as what others have seen (culled together from blog posts, etc.) and incorporate them into my process.  Having the process automated means that I spend less time doing those things that can be automated, and more time actually investigating those things that I need to be...well...investigating.

An example of this is eventmap.txt; I did not actually work the engagement where the event source/ID pair for "Microsoft-Windows-TaskScheduler/709" event record was observed.  However, based on what it means, and the context surrounding the event being recorded, it was most definitely something I wanted to incorporate into my analysis process.  Even if I never see the event again in the next 999 cases I work, that 1000th case where I do see it will make it worth the effort to document it.

Documenting processes and procedures for many is a funny thing.  Not long ago, with a previous employer, I was asked to draft analysis processes and workflows, because other analysts said that they didn't "have the credibility".  Interestingly enough, some of those who said this are now active bloggers.  Further, after I drafted the workflows, they were neither reviewed, nor actually used.  However, I found (and still find) a great deal of value in having a documented process or workflow, and I continue to use and develop my own.

All this talk of processes and workflows logically leads to questions of, where do I get the data and information that I turn into intelligence and incorporate into my workflows?  Well, for the most part, I tend to get the most intel from cases I work.  What better way to go from GB or TB of raw data to a few KB of actual information, with the context to turn that information into intelligence, than to do so working actual DFIR cases?  Ultimately, that's where it all starts, right?

So, we can learn from our own cases, but we can also learn from what others have learned and shared.  Ah, that's the key, though, isn't it...sharing the information and intelligence.  If I learn something, and keep it to myself, what good is it?  Does it mean that I have some sort of "power"?  Not at all; in fact, it's quite the opposite.  However, if I share that information and/or intelligence with others, then you get questions and different perspectives, which allows us to develop and sharpen that intelligence.  Then someone else can use that intelligence to facilitate their analysis, and perhaps include additional data sources, extending the depth and value of the intelligence. As such, pursuing OSINT sources is a great way to not only further develop your own intel, but to develop indicators that you can then use to further your analysis, and by extension, furthering your intel. 

This recent FireEye blog post is a great example (it's one, there are many others) of OSINT material.  For example, look at the reference to the credential theft tool, HomeFry.  This is used in conjunction with other tools; that alone is pretty powerful.  In the past, we've seen a variety of sources say things like, "X uses Y and Z tools..." without any indication as to how the tools are used.  In one of my own cases, it was clear that the adversary used Trojan A to get on an initial jump system, and from there, installed Trojan B, and then used that as a conduit to push out Trojan B to other systems.  So, it's useful to know that certain tools are used, but yes, those tools are easily changed.  Knowing how the tools are used is even more valuable.  There's also a reference to lure documents that exploit the CVE-2017-11882 vulnerability; here's the PaloAlto Networks analysis of the exploit in the wild (they state that they skipped some of the OLE metadata...), and here are some apparent PoC exploits.

This write-up from ForcePoint is also very informative.  I know a lot of folks in the industry would look at the write-up and say, "yeah, so the malware persists via the user's Run key...big deal."  Yeah, it IS a big deal.  Why?  Because it still works.  You may have seen the use of the Run key for persistence for years, and may think it's passe, but what does it say about the security industry as a whole that this still works, and that this persistence mechanism is found through response post-mortem?

Here's a fascinating write-up on the QWERTY ransomware from BleepingComputer.  Some of what I thought was fascinating about it is the use of batch files and native tools...see, the bad guys automate their stuff, so maybe we should, too...right?  Part of what the ransomware does is use two different commands to delete volume shadow copies, and uses wbadmin to delete backups.  So how is this information valuable?  Well, do you use any of these tools in your organization?  If not, I'd definitely enable filters/alerts for their use in an EDR framework.  If you do use these tools, do you use them in the same way as indicated in the write-up?  No?  Well, now you have alerts you can add to your EDR framework.

Here's another very interesting blog post from Carbon Black, in part describing the use of MSOffice doc macros.  I really like the fact that they used Didier's tool to list the streams and extract the VBS code.  Another interesting aspect of the post is something we tend to see a great deal of from the folks at Carbon Black, and that's the use of process trees to illustrate the context behind one suspicious or apparently malicious process.  Where did it come from?  Is this something we see a great deal of?  For example, in image 7, we see that wmiprvse.exe spawns certutil.exe, which spawns conhost.exe; is this something that we'd want to create an alert for?  If we do and run it in testing mode, or search the EDR data that we already have available, do we see a great deal of this?  If not, maybe we'd want to put that alert into production mode.

Recently, US CERT published an alert, TA18-074A, providing intel regarding specific threat actors.  As an example of what can be done with this sort of information and intelligence, Florian Roth published some revised Yara rules associated with the alert.

Something I saw in the alert that would be useful for both threat hunters and DFIR (i.e., dead box) analysis is the code snippets seen in section 2 of the alert.  Specifically, modified PHP code appears as follows:

img src="file[:]//[.]dd/main_logo.png" style="height: 1px; width: 1px;" /

Even knowing that the IP address and file name could change, how hard would it be to create a Yara rule that looks for elements of that line, such as "img src" AND "file://"?  More importantly, how effective would the rule be?  Would it be a high fidelity rule, in your environment?  I think that the answer to the last question depends on your perspective.  For example, if you're threat hunting within your own environment, and you know that (a) you have PHP code, and (b) your organization does NOT use "img src" in any of your code, then this would be pretty effective.
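
Before committing to a Yara rule, the matching logic can be prototyped to see how noisy it would be in your own environment.  The following is a minimal Python stand-in for that logic, not a Yara rule and not anything taken from the alert itself; the function name and both patterns are assumptions based only on the line quoted above, and the IP address and file name are deliberately not matched.

```python
import re

# Both patterns are assumptions drawn from the line quoted in the alert.
# The second one accepts the plain and the defanged ("file[:]//") forms.
IMG_SRC = re.compile(r"img\s+src", re.IGNORECASE)
FILE_URL = re.compile(r"file\[?:\]?//", re.IGNORECASE)


def looks_like_remote_image_lure(text: str) -> bool:
    """True when a single line contains both 'img src' and a file:// URL."""
    for line in text.splitlines():
        if IMG_SRC.search(line) and FILE_URL.search(line):
            return True
    return False
```

Running something like this across the PHP files you already have gives a rough feel for whether the equivalent Yara rule would be high fidelity for you, or just noise.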

Something else that threat hunters and DFIR analysts alike may want to consider adding to their analysis processes is from the section of the alert that talks about the adversary's use of modified LNK files to collect credentials from systems.  Parsing LNK files and looking for "icon filename" paths to remote systems might not seem like something you'd see a great deal of, but I can guarantee you that it'll be worth the effort the first time you actually do find something like that.
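
Once an LNK parser has pulled out the icon filename string, the check itself is simple.  Here's a small sketch in Python; it assumes the string has already been extracted by whatever parser you use, and the function name is hypothetical.

```python
def is_remote_icon_path(icon_path: str) -> bool:
    """True when an LNK icon filename appears to point at another system."""
    path = icon_path.strip().lower()
    # UNC paths (\\host\share\...) and file:// URLs; note that a local
    # file:///C:/... URL would also be flagged and need a manual look.
    return path.startswith(("\\\\", "file://"))
```

Most icon paths point to local resources, such as DLLs in the system32 folder, so anything this flags should be rare enough to review by hand.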

Side Note: I updated my lnk parsing tool (as well as the source files) to display the icon filename path; the tool would already collect it, it just wasn't displaying it.  It does now. 

If you're looking into threat feeds as a "check in the compliance box", then none of this really matters.  But if you're looking to really develop a hunting capability, either during DFIR/"dead box" analysis, or on an enterprise-wide scale, the real value of threat intel is only realized in the context of your infrastructure, and operational policies. 

Thursday, March 15, 2018

DFIR Questions, How-Tos...

Not long ago, I finished up the content of my latest book, Investigating Windows Systems, and got it all shipped off to the publisher.  The purpose of this book is to go beyond my previous books; rather than listing artifacts and mentioning ways they can be used, I wanted to walk through examinations, using CTF and forensic challenge images that are available online.

A short-coming of this approach is that it leaves a lot of topics not addressed, or perhaps not as fully addressed as they could be.  For example, of the images I used in writing my book, there were no business email compromises, and little in the way of lateral movement, etc.  There was some analysis of user activity, but for the most part, it was limited.

Back in July 2013, I had some time available, and I wrote up about a dozen "How To" blog posts covering various Windows DFIR topics.  What I've thought might be of value to the community is to go back to those "How To" posts, expand and extend them a bit, add coverage for Windows 10, and include them in a book.

My question to the community at large is this...what are some of the topics that should be addressed, beyond those I blogged about almost 5 years ago?

Now, when considering these questions, or opportunities for "How To" chapters, please understand that I may not be able to address all of them.  For example, I've never conducted a business email compromise (BEC) investigation.  As I've pointed out before, even in just over two decades of DFIR consulting, I haven't seen everything, and I don't know everything.  I also do not have access to an AD environment.

Even so, I'd still appreciate your input, because some of the answers and thoughts I can provide may serve as building blocks for larger solutions.

So, again...what are some DFIR analysis topics, specific to Windows systems, that provide good opportunities for "just in time" training via "How To" articles or documents?


Addendum, 20 Mar:
Okay, I was able to pull together some input from other sources, and here's what I've got so far...

How to analyze Windows Event Logs
How to get the most out of RegRipper
How to investigate CD burning
How to perform malware detection
How to detect data exfiltration
File (LNK, DOCX/DOC, PDF) Analysis
How to investigate lateral movement
How to investigate program execution
How to investigate user activity
How to find and interpret true last access time and dates
How to correlate/associate a device with a user (USB, Bluetooth)
How to detect/analyze the use of anti-forensics

This is just the high-level view and not the detailed outline.  However, it does seem pretty extensive.  So...thoughts?  Input?  Comments?  Complaints?  All are welcome...

Monday, March 12, 2018

New and Updated Plugins, Other Items

BAM Key and Process Execution, Updated Plugins
Recently, blog posts describing the "BAM Key" and its viability as a process execution artifact began to appear (port139 blog, padawan-4n6 blog).  The potential for this key was previously mentioned last summer by Alex Ionescu, and this began to come to light as of Feb, 2018.  As such, I wrote two plugins for parsing this data.

There's also been a good bit of testing going on and being shared with respect to when data in Registry transaction logs is committed to the hive files themselves.  In addition to the previously linked blog, there's also this blog post.

I recently ran across a couple of System hive files from Windows 10 systems for which the AppCompatCache data was not parsing correctly.  As such, I updated the and plugins, as well as their corresponding * variants.

Finally, Eric documented changes to the AmCache.hve file, but until recently, I hadn't seen any updated hive files.  Thankfully, Ali Hadi was kind enough to share some hives for testing, and I updated the and plugins accordingly. However, these updates were solely to address the process execution artifacts, and I have not yet updated them to include the device data that Eric pointed out in his blog post.

NOTE: Yes, I uploaded these plugins (a total of 8) to the GitHub repository.  Again, I do not modify the RegRipper profiles when I do so, so if you want to incorporate the plugins in your processes, you'll need to open Notepad and update the profiles yourself.

Part of the reason I made these updates is to perform testing of the various artifacts, specifically to see where the BAM key data falls out in the spectrum of process execution artifacts. 

The data that Ali shared with me included the AmCache.hve, System hive (as well as other hives from the config folder), and the user's NTUSER.DAT hive.  Using these sources, I created a micro-timeline using the AmCache process execution data, AppCompatCache data, BAM key data, and the user's UserAssist data, and the results were really quite fascinating.

Here's an example of some of the data that I observed.  I created my overall timeline, and then picked out one entry (for time2.exe) for a closer look.  I used the type command to isolate just the events I wanted from the events file, and created the below timeline:

Thu Feb 15 16:49:16 2018 Z
  AmCache      - Key LastWrite - f:\time2.exe (46f0f39db5c9cdc5fe123807bb356c87eb08c48e)

Thu Feb 15 16:49:14 2018 Z
  BAM             - \Device\HarddiskVolume4\Time2.exe (S-1-5-21-441239525-4047580167-3361022386-1001)

Thu Feb 15 16:49:10 2018 Z
  REG             forensics - [Program Execution] UserAssist - C:\Users\forensics\Desktop\Time2 - Shortcut.lnk (1)
  REG             forensics - [Program Execution] UserAssist - F:\Time2.exe (1)

Mon Nov  2 22:20:14 2009 Z
  REG             - M... AppCompatCache - F:\Time2.exe

This is pretty fascinating stuff.  We know the context of the time stamps for the AppCompatCache data; even though we understand the data to be populated as a result of process execution events, the time stamp associated with the data is the file system last modification time (specifically, from the $STANDARD_INFORMATION attribute).  The UserAssist data illustrates the user launching the application, and in the following 6 seconds, the BAM and AmCache entries are created.
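
The arithmetic behind that last statement is easy to check.  The short Python sketch below just takes the three February 2018 time stamps from the micro-timeline above and computes how many seconds after the UserAssist entry each of the others appears; nothing here comes from RegRipper itself, and the variable names are only for illustration.

```python
from datetime import datetime

FORMAT = "%a %b %d %H:%M:%S %Y"

# Time stamps copied from the micro-timeline above (all UTC).
events = {
    "UserAssist": "Thu Feb 15 16:49:10 2018",
    "BAM": "Thu Feb 15 16:49:14 2018",
    "AmCache": "Thu Feb 15 16:49:16 2018",
}

parsed = {name: datetime.strptime(value, FORMAT) for name, value in events.items()}
launch = parsed["UserAssist"]

# Seconds elapsed between the user launching the program and each entry.
deltas = {
    name: int((stamp - launch).total_seconds())
    for name, stamp in parsed.items()
}

for name, seconds in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name:<10} +{seconds}s")
```

In this one test, the BAM entry lands 4 seconds after the UserAssist entry, and the AmCache key LastWrite time 6 seconds after.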

Keep in mind that this is just one example, from one test.  In the data I've got, not all entries in the AppCompatCache data have corresponding entries in the BAM key.  For example, there's a file called "slacker.exe" that appears in the AppCompatCache and AmCache data, but there doesn't seem to be an entry in the BAM key. 

Over on the Troy 4n6 blog, Troy has a couple of great comments on testing.  Yes, they're specific to the P2P cases he mentions in the blog, but they're also true throughout the rest of the DFIR community.

Processes and Checklists
Something I've done over the years is develop and maintain processes and checklists for the various types of analysis work I've done.  For example, as an incident responder, I've received a lot of those "...we think this system may have been infected with malware..." cases over the years, and what I've done is maintained processes and checklists for conducting this sort of analysis. 

Now, there's a school of thought within the DFIR community that follows the belief that having defined and documented processes stifles the creativity and innovation of the individual analyst.  I wholeheartedly disagree; in my experience, having a documented process means I don't forget things, and it leads directly to automating the process, so that I spend less time sifting through data and more time conducting actual analysis.

Further, maintaining a documented process does not mean that the process is set in stone once it's written; instead, it's a living document that continues to be updated and developed as new things are learned.  As MS has developed not just new operating systems, but updated the currently available OSs, new artifacts (see the BAM key mentioned above) have been discovered.  And this is solely with respect to the OS, and doesn't take new applications (or new versions of current applications) into account.  Maintaining an ever-expanding list of Windows artifacts is neither as useful nor as viable as maintaining documented processes that illustrate how those artifacts are used in an investigation, so having documented processes is a key component of providing comprehensive and accurate analysis in a timely manner.

Word Metadata Verification
Phill's got a blog post up on documenting Word doc metadata bugs.  My take-aways from his post are:

1. Someone asked him for assistance in verifying what was thought to be a bug in a tool.  This isn't to point out that the first thing that was blamed was the tool...not at all.  Rather, it's to point out that someone said, hey, I'm seeing something odd, I'll see if others are seeing the same thing.  It isn't the first step I'd take, but it was still a good call. 

2. Phill documented and shared his testing methodology.  As someone who's written open source tools, I've gotten a lot of "...the tool doesn't work..." over the years, without any information regarding what was done, or how "doesn't work" was reached.  Sharing what you did and saw in a concise manner is necessary, and it allows others to have a baseline for further testing.

3. Phill stated, "...I decided I was being lazy by not actually looking for the word count in the docx file itself...".  Sometimes, the easiest approach escapes us completely.  I've seen/done the same thing myself, and I've gotten to the point where, if I get odd errors from RegRipper or Volatility, I'll open the target file in a hex editor to make sure it's not all zeroes (yes, that has happened).  I guess that's the benefit of being familiar with file formats; like Phill said, he opened up the .docx file he was using for testing in an archive tool and pulled out what he needed.
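
The "open it in an archive tool" step from item 3 is also easy to script.  Below is a minimal Python sketch of the idea, not Phill's actual method; it assumes the word count is stored in the Words element of docProps/app.xml inside the .docx archive, and the function name is hypothetical.

```python
import io
import re
import zipfile


def docx_word_count(docx_bytes: bytes):
    """Return the <Words> value stored in docProps/app.xml, or None."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        if "docProps/app.xml" not in archive.namelist():
            return None
        app_xml = archive.read("docProps/app.xml").decode("utf-8", errors="replace")
    match = re.search(r"<Words>(\d+)</Words>", app_xml)
    return int(match.group(1)) if match else None
```

Comparing the value this returns with what a tool reports gives you something concrete to include when you do report an issue.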

Sunday, March 11, 2018

Creating Tools and Solving Problems

I received a request recently for a blog post on a specific topic via LinkedIn; the request looked like this:

Have you blogged or written about creating your own tools, such as you do, from a beginner's standpoint? I am interested in learning more about how to do this.

A bit of follow-up revealed a bit more information behind the request:

I teach a DFIR class at a local university and would like to incorporate this into the class.

I don't often get requests, and this one seemed kind of interesting to me anyway, so I thought I'd take a shot at it.

To begin with, I did write a section of a blog post entitled "Why do I write my own tools?"  The post is just a bit more than three years old, and while the comments were short, they still apply today.

The Why?
Why do I write my own tools?  As I mentioned in my previous post, it helps me to understand the data itself much better, and as such, I understand the context and usage of the data much better, as well.

Another reason to write my own tools is to manage the data in a manner that best suits my analysis needs.  RegRipper started that way...well, that and the need for automation.  Over time, I've continued to put a great deal of thought into my analysis process and why I do the things I do, and why I do them the way I do them.  This is, in part, where my five-field TLN format came from, and it still holds up as an extremely successful methodology.

The timeline creation and analysis methodology has proven to be extremely successful in testing, as well.  For example, there was this blog post (not the first, and it won't be the last) that discusses the BAM key.  Again, this isn't the first blog post on the topic, and speaking with the author of the post recently, face-to-face (albeit through a translator), it was clear that he'd found some discrepancies in previously-posted findings regarding when the key is updated.  So, someone is enthusiastically focusing their efforts in determining the nature of the key contents, and as such, I have opted to focus my analysis on the context of the data, with respect to other process execution data (AmCache.hve, UserAssist, AppCompatCache/ShimCache, etc.)  In order to do this, I'd like to see all of the data sources normalized to a common format (TLN) so that I can look at them side-by-side, and the only way I'm going to do that is to write my own tools.  In fact, I have...I have a number of RegRipper plugins that I can use to parse this information out into TLN format, add Windows Event Log data to the Registry data, and boo-yah!  There it is.
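
To make the "common format" point concrete, here's a minimal Python sketch of normalizing events into the five-field TLN layout (time|source|system|user|description, with the time as a Unix epoch value in UTC).  My own tools are RegRipper plugins written in Perl; this sketch is just an illustration, and the function names and the input time format are assumptions.

```python
from calendar import timegm
from datetime import datetime


def to_tln(timestamp: str, source: str, system: str, user: str, description: str) -> str:
    """Build one five-field TLN line: time|source|system|user|description."""
    stamp = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    epoch = timegm(stamp.timetuple())  # the input is treated as UTC
    return "|".join([str(epoch), source, system, user, description])


def sort_tln(lines):
    """Order TLN lines with the most recent event first."""
    return sorted(lines, key=lambda line: int(line.split("|", 1)[0]), reverse=True)
```

Once everything is in one format, data from different sources (Registry, Windows Event Logs, etc.) can be concatenated into a single events file and sorted, which is what makes the side-by-side view possible.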

Another advantage of writing my own tools is that I get to deal directly with the data itself, and in most cases, I don't have to go through an API call.  This is how I ended up writing the Event Log/*.evt file parser, and from that, went on to write a carving tool to look for individual records.  Microsoft has some really clear and concise information about the various structures associated with EVT records, making it really easy to write tools.  Oh, and if you think that's not useful anymore, remember the NotPetya stuff last year (summer, 2017)?  I used the tool I wrote to carve unallocated space for EVT records when a Win2003 server got hit.  You never know when something's going to be useful like that.
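
For anyone curious what carving for EVT records looks like, the following is a stripped-down Python sketch of the general approach, and not the tool I wrote.  It assumes the documented EVT record layout: a 4-byte length, the "LfLe" magic number, the record number, the generated and written times, and the event ID, with the length repeated in the last four bytes of the record.  The function name and the 56-byte minimum are my own choices for illustration.

```python
import struct

EVT_MAGIC = b"LfLe"
MIN_RECORD = 56  # size of the fixed portion of an EVT record header


def carve_evt_records(blob: bytes):
    """Yield (offset, record_number, time_generated, event_id) for each hit."""
    position = blob.find(EVT_MAGIC, 4)
    while position != -1:
        start = position - 4  # the length field precedes the magic number
        (length,) = struct.unpack_from("<I", blob, start)
        end = start + length
        if MIN_RECORD <= length and end <= len(blob):
            # A valid record repeats its length in its last four bytes.
            (trailer,) = struct.unpack_from("<I", blob, end - 4)
            if trailer == length:
                rec_num, generated, _written, event_id = struct.unpack_from(
                    "<IIII", blob, start + 8
                )
                yield start, rec_num, generated, event_id & 0xFFFF
        position = blob.find(EVT_MAGIC, position + 1)
```

The length check at both ends of the record is what keeps random "LfLe" strings in unallocated space from being reported as records; the minimum size also skips the 48-byte file header, which carries the same magic number.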

The How
How do I write my own tools?  That's a good question...is it more about the process itself, or the thought process behind writing a tool?  Well, as I learned early in my military career, "it depends".

First, there are some basics you need to understand, of course...such as endianness.  There is also how to recognize, parse, and translate binary data into something useful.  I usually start out with a hex editor, and I've gotten to the point where I not only recognize 64-bit FILETIME time stamps in binary data, but specifically with respect to shellbags, I've gotten to the point where I recognize patterns that end up being GUIDs.  It's like the line from The Matrix..."all I see are blondes, brunettes, and redheads."
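
As a small illustration of both endianness and translating binary data, here's a Python sketch that converts the eight bytes of a FILETIME into a readable date.  A FILETIME is a little-endian 64-bit count of 100-nanosecond intervals since 1 Jan 1601 UTC; the function name is just one I picked for the example.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(raw: bytes) -> datetime:
    """Convert 8 little-endian bytes holding a FILETIME into a UTC datetime."""
    (ticks,) = struct.unpack("<Q", raw)  # 100-nanosecond intervals since 1601
    # datetime only resolves microseconds, so the last digit is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
```

Read the same eight bytes as big-endian (">Q" instead of "<Q") and you get a completely different number, which is why endianness is one of the basics.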

I start by understanding the structures within data, either by following a programming specification (MS has a number of good ones), or some other format definition.  Many times, I'll start with a hex editor, or a bit of code to dump arbitrary-length binary data to hex format, print it out, and go nuts with highlighters.  For Registry stuff, I started by using Peter Nordahl's offline password editing tool header files to understand the structure of the various cells available within the Registry.  When the Parse::Win32Registry Perl module came along, I used that for accessing the various cells, and was able to shift my focus to identifying patterns in binary data types within values, as well as determining the context of data through the use of testing and timelining.  For OLE files, I started with the MS-CFB definition, and like I said, MS maintains some really good info on Event Log structures.
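
The "bit of code to dump arbitrary-length binary data to hex format" doesn't need to be anything fancy.  A minimal Python version might look like the sketch below; the layout (offset, hex bytes, printable ASCII) is simply the conventional one, not a copy of my own code.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex values, and printable ASCII."""
    pad = width * 3 - 1  # room for a full row of hex pairs and spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)
```

Print the output for a value's data, grab the highlighters, and start marking up the fields as you match them to the structure definition.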

The upshot of this is that I have a better understanding than most, regarding some of the various data types, particularly those that include or present time stamps.  There are a lot of researchers who put effort into understanding the specific actions that cause an artifact to be created or modified, but I think it's also important to understand the time format itself.  For example, FILETIME objects are 64-bit time stamps with a granularity of 100 nanoseconds, where a DOSDate time stamp (embedded within many shell item artifacts) has a granularity of 2 seconds.  The 128-bit SYSTEMTIME structure carries a milliseconds field, while Unix epoch time has a granularity of one second.
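
Decoding a couple of these formats by hand shows where the granularity comes from.  The Python sketch below assumes the usual layouts: a DOSDate value as a 16-bit date word followed by a 16-bit time word (with seconds stored in 2-second units), and SYSTEMTIME as eight little-endian 16-bit words, the last of which holds milliseconds.  The function names are hypothetical.

```python
import struct
from datetime import datetime


def dosdate_to_datetime(raw: bytes) -> datetime:
    """Decode 4 bytes (date word, then time word) into a naive datetime."""
    date_word, time_word = struct.unpack("<HH", raw)
    return datetime(
        1980 + (date_word >> 9),   # years are counted from 1980
        (date_word >> 5) & 0x0F,   # month
        date_word & 0x1F,          # day
        time_word >> 11,           # hour
        (time_word >> 5) & 0x3F,   # minute
        (time_word & 0x1F) * 2,    # seconds are stored in 2-second units
    )


def systemtime_to_datetime(raw: bytes) -> datetime:
    """Decode a 16-byte SYSTEMTIME (eight little-endian 16-bit words)."""
    year, month, _weekday, day, hour, minute, second, millis = struct.unpack("<8H", raw)
    return datetime(year, month, day, hour, minute, second, millis * 1000)
```

With only five bits available for seconds, a DOSDate value simply can't record an odd-numbered second, which matters when you're lining it up against a FILETIME in a timeline.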

In addition to the understanding of time stamp formats, I've also found a good number of time stamps where most folks don't know they exist.  For example, when analyzing Word .doc files, the 'directories' within the OLE structure have time stamps associated with them, and tying that information to other document metadata, compile time stamps for executables found in the same campaign, etc., can all be used to develop a better understanding of the adversary.

Something else that can be valuable if you understand it is the metadata available within LNK files sent by the adversary as an attachment.  Normally, LNK files may be created on a victim system as the result of an installation process, but when the adversary includes an LNK file as an attachment, then you've got information about the adversary's system available to you, and all it takes to unlock that information is an understanding of the structure of the LNK files, which are composed, in part, of shell items.

Things To Consider
Don't get hung up on the programming language.  I started teaching myself Perl a long time ago, in part to assist some network engineering guys.  However, I later learned that at the time, Perl was the only language that had the necessary capability to access the data I needed to access (i.e., live Windows systems).  Perl later remained unique in that manner when it came to "dead box" and file analysis.  Over time, that changed as Python caught up.  Now, you can use languages like Go to parse Windows Event Logs.  Oh, and you can still do a lot with batch files.  So, don't get hung up on which language is "best"; the simple answer is, "the one that works for you", and everything else is just a distraction.

This isn't just about writing tools to get to the data, so that I can perform analysis.  One of the things I'm particular about is developing intelligence from the work I do, learning new things and incorporating or "baking it into" my tools and processes.  This is why I have the eventmap.txt file as part of my process for parsing Windows Event Logs (*.evtx files); I see and learn something new (such as the TaskScheduler/706 event), add it to the file with comments, and then I always have the information available.  Further, sharing it with others means that they can benefit from knowledge of others without having to have had the same experiences.
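
The idea of "baking in" what you learn can be as simple as a lookup file.  The Python sketch below shows the general approach; the "source/ID:tag" line format, the "#" comment style, and the tag text in the sample are assumptions for illustration, not a copy of my eventmap.txt file.

```python
def load_event_map(text: str) -> dict:
    """Parse 'source/ID:tag' lines, skipping blanks and '#' comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, tag = line.partition(":")
        if tag:
            mapping[key.strip()] = tag.strip()
    return mapping


def tag_event(source: str, event_id: int, mapping: dict) -> str:
    """Return the tag for a source/ID pair, or an empty string."""
    return mapping.get(f"{source}/{event_id}", "")


# Hypothetical sample content; the tag wording is made up for this example.
SAMPLE_MAP = """
# seen during an engagement, see case notes
Microsoft-Windows-TaskScheduler/706:[Task Event]
"""

event_map = load_event_map(SAMPLE_MAP)
print(tag_event("Microsoft-Windows-TaskScheduler", 706, event_map))
```

The comments are as important as the mappings; six months from now, they're what tell you why the entry is there.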

Closing Thoughts
Now, is everyone going to write their own tools?  No.  And that's not the expectation at all.  If everyone were writing their own tools, no one would ever get any actual work done.  However, understanding data structures to the point of writing your own tools can really open up new vistas for the use of available data, and of intelligence that can be developed from the analysis that we do.

However, if this is something you're interested in, then once you are able to start recognizing patterns and matching those patterns up to structure definitions, there isn't much that you can't do, as the skills are transferable.  It doesn't matter where the file comes from, or from which device...you'll be able to parse the files.

Monday, February 19, 2018

Stuffy Stuff

...yeah, I get it...I'm not entirely imaginative, and don't come up with the best titles for my blog posts....

#DFIR Intel
About 4 1/2 years ago, I got on a riff and blasted out a bunch of blog posts, over a dozen in one month.  Most of these posts were "how to" articles...

One of the articles I wrote was How To: Add Intelligence to Analysis Processes.  This is also a concept I address to some extent in my new book, Investigating Windows Systems (recently sent the manuscript to the publisher for review and formatting...); not only that, I give lots of examples of how to actually do this.  IMHO, there are a lot of opportunities that are missed during pure #DFIR work to correlate and develop intelligence, which can then be "baked" back into tools and processes to make analysis "better" in the future.

Further, I read Brett's recent post about fitting a square peg into a round #DFIR hole, and I started to see correlations between the intel stuff, and what Brett was saying.  Sometimes, in looking for or at which tool is "best", we may run across something that's extremely valuable during another engagement, or to another analyst altogether.

I can easily see Brett's point; how do you answer the question of "what's the best tool to do X?", if the user/analyst requirements aren't fully understood?  "Best" is relative, after all, isn't it?  But I really think that in the process of determining requirements (what is "best"?) and which tool or process is best suited to provide the necessary answers, there's a great deal of intelligence that can and should be preserved.

A great example of developing intel within the community is from Mari's post, where she evaluated MFT parsers.  In her post, Mari clearly specified the testing requirements, and evaluated the various tools against those requirements.  This was great work, and is not only valid today (the post was published about 2 1/2 yrs ago), but continues to offer a great example of what can be done.

By now, you're probably wondering about the connection between correlating and developing intel from #DFIR engagements, and deciding which is the "better" #DFIR tool.  Well, here it is...by correlating and developing intel from #DFIR cases/engagements, we can make all of our tools (and more importantly, analysis processes) inherently "better".

For example, going back to Mari's blog post on MFT parsers, let's say that the issue at hand is time stomping, and you'd like a means, as part of your analysis process, to just look at one file and determine if it had been time stomped.  I wrote my own parser, in part, to let me do exactly that...and in doing so, improved my analysis process.  You can do the same thing by finding the right tool...or contacting the author of a tool and asking that they provide the needed capability.

What Does It Look Like?
So, what does this "intel" we're talking about look like?  Well, it can be several things, depending upon what you're doing.  For example, if you do string searches, consider using Yara rules to augment your searches.  Yara rules can be incorporated into analysis processes through a variety of means; one of my favorites is checking systems for web shells.  Usually, I'll mount the image via FTK Imager, and run a Yara scan across the web site folder(s), using a combination of open source Yara rules to scan for web shells, as well as things that have been developed internally.  Sometimes, this approach works great for locating web shells right out of the box; in other cases, further investigation of web server logs may be required to narrow down the hits you received from the Yara scan to the file in question.

Side Note:  "Intel" can also "look like" this blog post. Great stuff, and yes, Yara rule(s) resulted from the analysis, as well.

Speaking of Yara rules, there is a LOT you can do with Yara rules beyond just running them against a file system or folder.  You can run them against memory dumps, and in this article, Jeremy Scott mentions the use of page_brute to run Yara rules across a page file, or in the case of more recent versions of Windows (yet another reason why knowing the version of Windows you're examining matters!), all of the page files.

Another means for preserving and sharing intel is through the use of RegRipper plugins.  I know you're thinking, "oh, yeah...easy, right?  You write them all the time..."...and yes, I do.  And getting a plugin written or modified is as easy as asking.  I've been able to turn around plugins pretty quickly with a concise, clear description of what you're looking for, and some sample data for testing.  Another example is that while I was examining one of the CTF images for my upcoming book, I ran across Registry data I've never encountered, and as a result, I wrote a RegRipper plugin.

Another great thing about RegRipper plugins is that Nuix (for transparency, Nuix is my employer) now has a RegRipper extension for their Workbench product, meaning that you can run RegRipper plugins against Registry hives (and depending upon the version of Windows, the AmCache.hve file) right from Workbench, and then incorporate the results directly into your Nuix case.  That way, all of your word searches, etc., will be run against this data, as well.

There are myriad other ways of preserving intel and sharing your findings.  The means you may use for preserving intel depends on what you're doing in your analysis process.  Unusual command line options, such as those seen with cryptocurrency miners, can be preserved in EDR filter or alert rules, or in the case of the browser-based miners, Yara rules to be run against memory.  Sometimes, you may not immediately see how to turn your findings into useful intel or tools, nor how to bake what you found back into your own analysis process...this is a great reason for reaching out to someone, and seeing what they might have to offer.

Side Note: There was a pretty extensive #DFIR sharing thread over on Twitter, that started out as thoughts/discussion, re: writing blogs vs writing books.  As this thread progressed, Jessica reiterated the point that sharing doesn't have to be just about writing, that sharing within the #DFIR community can take a variety of forms; book reviews, podcasts, etc.  I wholeheartedly agree, and also believe at the same time that writing is of paramount importance in our profession, as whether you're writing code, case notes, or a report, you have to be able to communicate.  To that point, Brett shared his thoughts on ham sandwiches.

Friday, February 16, 2018

On Writing (DFIR) Books

After sharing my recent post regarding my next book, IWS, one of the comments I received via social media was a tongue-in-cheek reference to me being a "new" author.  I got the humor in that right away, but it also got me to thinking about something that I hadn't thought about in a while...what it actually takes to write a DFIR book.  I've thought about this before, at considerable length, because over the years, I've talked to others who have considered going down that path but for whatever reason, were not able to complete the journey.

Oftentimes, the magnitude of the endeavor can simply overwhelm folks.  In some cases, events turn out to be much less easy to manage than originally thought.  In one instance, I was once asked for advice from a friend...he and two co-authors had worked through the process of establishing a contract for a book with a publisher.  It turned out that after the contract was signed, the team was assigned an editor, who then informed them that there was an error in the contract; they needed to deliver twice as many words as were previously stated, with no extension on the delivery date.  Needless to say, the team made the decision to not go forward with writing the book.

To be honest, one of the biggest challenges I've seen over the years is the disparity between the publishing company and their SOP, and the authors.  It took me a while to figure this out, but the publishing company (I can't speak to all publishing companies, just the three I've been associated with...) looks to objective measures; word counts, numbers of chapters, numbers of images or figures, etc.  I would think that schedules are pretty much universal, as we all deal with them, but some publishing companies are used to dealing with academia, for whom publishing is often an absolute necessity for survival.  For many of those within the DFIR community who may be considering the idea of becoming a published author, writing a book is one of many things on an already crowded plate.

The other side of the coin is simply that, in my experience, many DFIR folks do not like to write, because they're not good at it.  One of the first companies I worked with out of the military had a forensics guy who apparently did fantastic work, but it took two other people (usually) to turn his reports into something presentable for review...not for the client, but for someone on our team to review before sending them to the client.  I recognize that writing isn't something people like to do, and I also recognize that my background, going back to my time on active duty, includes a great deal of writing (i.e., personnel evaluations/fitness reports, JAG manual investigations, etc.).  As such, I approach it differently.  I documented that approach to some extent in one of my books, providing a chapter on...wait for it...writing reports.  Those same techniques can be used in writing books.

I've been with essentially the same publishing company (that's not to say the same editor, and I haven't worked with the same individuals throughout) since my second book (Elsevier bought Syngress), so needless to say, I've seen a great deal.  I've gone through the effort (and no small amount of pain) and trials to get books published, and as such, I've learned a great deal along the way.  At the same time, I've talked to a number of friends and other folks within the DFIR community who've expressed a desire to write a book, and some who've already demonstrated a very good basis for doing just that.

Some time ago, in a galaxy far, far away, I engaged with my editor to develop a role for myself, one in which, rather than writing books, I engaged with new authors as a liaison.  In this role, I would begin working with aspiring authors in the early stages of developing their ideas, and help them navigate the labyrinth to getting a book published.  I basically sat down and asked myself (after my fourth or fifth book had been published), "what do I know now that I wish I'd known when writing my first book?"  Armed with this information, I thought, here's a great opportunity to present this information to new authors ahead of time, and make the process easier for them.  Or, they may look at the scope and range of the process, and determine that it's not for them.  Either way, it's a win-win.

Also, and I think that this is important to point out, this was not a paying position or role.  There are significant cultural differences between DFIR practitioners, and a publisher of predominantly academic books, and as such, this role needed to be socialized on both sides.  However, before either editor could really wrap their heads around the idea, and socialize it with the publishing company, they moved on to other adventures.

As such, I figured that a good way to help folks interested in writing a book would be to provide some initial thoughts and advice, and then let those who are interested take it a step or two beyond that.

The Idea
All books start with an idea.  What is the basis for what you want to write/communicate?  When I started out with the Windows Forensic Analysis books, the basic idea I had in mind was that I wanted to write a book that I'd want to purchase.  I'd seen a number of the books that were out there that covered, to some extent, the same topic, but not to what I saw as the appropriate depth.  I wanted to be able to go to a bookstore, see a book with the words "Windows" and "forensics" on the spine, and upon opening it, have it be something I'd want to take to the register and purchase. 

Something else to consider is that you do not have to have a new or original idea.  I wrote Windows Registry Forensics because there was nothing out there like it.  But I wrote Windows Forensic Analysis because I wanted to take a different approach to what was already out there...most of what I found didn't go into the depth that I wanted to see.

When I was employed by SecureWorks, I authored a blog post that discussed the use of the Samsam ransomware.  Kevin Strickland, a member of the IR team, took a completely different approach in how he looked at some of the same data, which ended up being one of the most quoted Secureworks blog posts for the entire 2016 year.  My point is that it doesn't always take an original idea...sometimes, all it really takes is a different way of looking at the same data.

Structure Your Thoughts
It may not seem obvious, but structuring your thoughts can go a LONG way toward making your project an achievable success.

The best way to do this, that I've found, is to create a detailed outline.  Actually write down your thoughts.  And don't think you have to do it all at once...when I wrote personnel evaluations in the military, I didn't do it in one sitting, because I didn't think that would be fair to my Marines.  I did it over time...I wrote down my initial thoughts, then let them marinate, and came back to them a day or two later.  The same thing can be done with the outline...create the initial outline, and then walk away from it.  Maybe socialize it with some co-workers, discuss it, see what other ideas may be out there.  Take some of the terms and phrases you used in your outline, and Google them to see what others may be saying about them.  You may find validation that way, like, "yeah, this is a good idea...", or you may find that others are thinking about those terms in a different way.  Either way, use time to develop your ideas.  I do this with my blog posts.  Something to realize is that the outline may be a living document; once you've "completed it", it will likely change and grow over time, as you grow in your writing.  Chapters/thoughts may be consolidated, or you may find that what you thought would be one chapter is actually better communicated as two (or more) chapters.  And that's okay.

What I've learned over the years is that the more detailed your outline is, the easier it is to communicate your ideas to the publisher, because they're going to send your idea out to others for review.  Much like a resume, the thought behind your outline is that you want to leave the person reviewing it no other option than to say, "yes"; the clearer you can be, the more likely this is to happen.  And the other thing I've learned is that the more detailed the outline, the easier it is to actually write the book.  Because you're very likely going to be writing in sections, it's oh, so much easier to pick something back up if you know exactly where you left off, and a detailed outline can help you with that.

Start Writing
That's right...try writing a chapter.  Pick one that's easy, and see what it's like to actually write it.  We all have "life", that stuff we do all the time, and it's a good idea to see how this new adventure fits into yours.  Do you get up early and write before kicking off your work day, or is your best time to write after the work day is over?

Get someone to take a look at what you've written, from the perspective of purchasing the finished product.  We may not hit the bull's eye on the first few iterations, and that's okay. 

Get your initial attempts reviewed by someone you trust to be honest with you.  Too many times over the years, I've provided draft reports for co-workers to review, and within 15 minutes received just a "looks good".  Great, that makes me feel wonderful, but is that realistic for a highly technical report that's over 30 pages long?  In one particular instance, I rewrote the entire report from scratch, and got the same response within the same time frame, from the same co-worker.  Clearly, this is not an honest review.

In the early stages of writing my second book, I had a reviewer selected by the publishing company, and I'd get chapters back that just said, "looks good" or "needs work".  From that point on, I made a point of finding my own reviewer and making arrangements with them ahead of time to get them on-board with the project.  What I wanted to know from the reviewer was, does what I wrote make sense?  Is it easy to follow?  When you're writing a book based on your own knowledge and experience, you're very often extremely close to and intimate with the subject, and someone else who may not be as familiar with it may need a bit more explanation or description.  That's okay...that's what having a reviewer is all about.

At this point, we've probably reached the "TL;DR" mark.  I hope that this article has been helpful, in general, and more specifically, if you're interested in writing a DFIR book.  If you have any thoughts or questions, feel free to comment here, or send them to me.

Wednesday, February 14, 2018


As I'm winding up the final writing for my next book, Investigating Windows Systems, I thought I'd take the opportunity to say/write a few words with respect to what the book is, and what it is not.

In the past, I've written books that have provided walk-thrus of various artifacts on Windows systems.  This seemed to be a good way to introduce folks to the possibilities of what was available on Windows systems, and what they could achieve through their analysis of images acquired from those systems.

With Investigating Windows Systems, I've taken a markedly different approach.  Rather than providing introductory walk-thrus of artifacts, I'm focusing on the analysis process itself, and discussing pivot points, and analysis decisions made along the way.  To do this, I've used available CTF and forensic challenge images (I reached out to the authors to see if it was okay with them to do this...) as the basis, and in chapters 2, 3, and 4, walk through the analysis of the images.  In most cases, I've tried to provide more real world examples of the analysis goals (which we document) than what was provided as part of the CTF.  For instance, one CTF has 31 questions to answer as part of the challenge, some of which are things that should be documented as a matter of SOP in just about every case.  However, I opted to take a different approach with the analysis goals, because in two decades of cybersecurity consulting, I've never worked with a client that has asked 30 or more questions regarding the case, or the image being analyzed.  In the vast majority of cases, the questions have been, "...was the system compromised/infected?", often followed by "...was sensitive data exfiltrated from the system?"  Pretty straightforward stuff, and as such, I wanted to take what I've seen as a realistic, IRL approach to the analysis goals.

Another aspect of the book is that a certain level of knowledge and capability is assumed of the reader, like a "you must be this tall to ride this ride" thing.  For example, throughout the book, in the various sections, I create timelines as part of the analysis process.  However, I don't provide a basic walk-thru of how to create a timeline, because I assume that the reader already knows how to do so, either by using their own process, or from chapter 7 of Windows Forensic Analysis (in both the third and fourth editions).  Also, in the book, I don't spend any time explaining all of the different things you can do with some of the tools that are discussed; rather, I leave that to the reader.  After all, a lot of the things that someone might be curious about are easy to find online.  Now, this doesn't mean that a new analyst can't make use of the book...not at all.  I'm simply sharing this to set the expectation of anyone who's considering purchasing the book.  I don't cover topics such as malware RE, memory acquisition and analysis, etc., as there are some fantastic resources already available that provide in-depth coverage of these topics.

Additional Materials
With some of my previous books, I've established online repositories for additional materials included with the book.  As such, I've established a Github repository for materials associated with this one.  As an example, in writing Chapter 4, I ended up having to write some code to parse some logs...that code is included in the repository.

Producing Intel
Something else I talk about in the book, in addition to the need for documentation, is the need for DFIR analysts to look at what they have available in an IR engagement that they can use in other engagements in the future.  The basic idea behind this is to develop, correlate and maintain corporate knowledge and intelligence.

In one instance in the book, during an analysis, I found something in the Registry that didn't directly pertain to the analysis in question, but I created a new RegRipper plugin anyway, and added the plugin directly to the RegRipper repository.

As a bit of a side note, if you're a Nuix customer, you can now run RegRipper through Workbench.  Nuix has added an extension to their Workbench product that allows you to run RegRipper without having to close out the case or export individual files.  For more details, here's the fact sheet.

Other ways to maintain and share intelligence include Yara rules, endpoint filter/alert rules, adding an entry to eventmap.txt, etc.  But that's not all; there are other ways to share intelligence, such as this blog post that I wrote during previous employment, with a good deal of help from some really smart folks.  That blog post alone has a great deal of valuable intelligence that can be baked back into tools and processes, and extend your team's capabilities.  For example, look at figure 2 in the blog post; it illustrates the command that the adversary issued to take specific actions (fig. 1 illustrates the results of that command).  If you're using an EDR tool, monitoring for that command line (or something similar) will allow you to detect this activity early in the adversary's attack cycle.  If you're not using an EDR tool and want to do some threat hunting, you now have something specific to look for.

How To...

...Parse Windows Event Logs
I caught a really interesting tweet the other day that pointed to the DFIR blog, one that discussed how to parse Windows Event Logs.  I thought the approach was interesting, so I thought I'd share the process I use for parsing Windows Event Logs (*.evtx files).

So, I'm not saying that there's anything wrong with the process laid out in the DFIR blog post...not at all.  In fact, I'm grateful that the author took the time to write it up and share it with others.  It's a fantastic resource, but there's more than one way to accomplish a great many tasks in the DFIR world, aren't there?  As Dan said, there are some great examples in the post.

When I create timelines, I use a batch file (wevtx.bat) that runs LogParser, and as the *.evtx logs are parsed, runs them through eventmap.txt to "tag" interesting events.  The batch file takes two arguments, the path to a file or directory with *.evtx files (LogParser understands wild cards), and the output event file (events are appended to the file if the file already exists).

Now, I did say, "...when I create timelines...", but this method works very well with just *.evtx files, or even just a few, or even one, *.evtx file.

The methodology in the DFIR blog post includes looking for specific event IDs, which is great.  The way I do it in my methodology is that when I parse all of the *.evtx files that I'm going to parse, I have an "events file"; from there, I can parse out event source/ID pairs pretty easily using "type" and "find".  It's pretty easy, like so:

type events.txt | find "Security/4624" > logon_events.txt

You can then add to that file using the append redirection operator (i.e., ">>"), or search for other source/ID pairs and create other output files.  I've used this method to create micro- or nano-timelines of just specific events, so that I can get a view of things that I wouldn't be able to see in a complete timeline.
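
If you'd rather do the same filtering from a script, here's a minimal Python sketch of the "type | find" approach.  The function name (filter_events) and the file names are my own placeholders for illustration, and the only assumption about the events file is that each event sits on its own line and contains the source/ID pair as plain text.  Like "find" without the /I switch, the match is case-sensitive.

```python
def filter_events(events_path, pair, out_path, append=False):
    """Copy every line of the events file that contains the source/ID pair.

    Mirrors `type events.txt | find "<pair>" > out.txt`; pass append=True
    to mimic the `>>` redirection operator instead.
    Returns the number of matching lines written.
    """
    mode = "a" if append else "w"
    count = 0
    with open(events_path, "r", encoding="utf-8", errors="replace") as src, \
            open(out_path, mode, encoding="utf-8") as dst:
        for line in src:
            if pair in line:  # simple case-sensitive substring match
                dst.write(line)
                count += 1
    return count
```

Calling filter_events("events.txt", "Security/4624", "logon_events.txt") gives you the same output file as the command above; calling it again with a different pair and append=True builds up a micro-timeline of just the events you care about.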

Okay, why am I talking about event source/ID "pairs"?  Well, in the DFIR blog post, they're looking in the Security Event Log file (Security.evtx) for specific event IDs, but when you start looking across other *.evtx files and incorporating them into your analysis, you may start to see that some event records may have different sources, but the same event ID, depending upon what's installed on the system.  For example, event ID 6001 can have sources of WinLogon, DNS, and Wlclntfy.

So, for the sake of clarity, I use event source/ID pairs in eventmap.txt; I haven't seen every possible event ID, and therefore, don't want to have something incorrectly tagged.  There's no reason to draw the analyst's attention to something if it's not necessary to do so.
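
To illustrate why the pair matters, here's a small Python sketch of tagging keyed on (source, ID) rather than on ID alone.  To be clear, this is not the actual eventmap.txt format or content; the mapping, the tag text, and the tag_event name are all made up for the example.

```python
# Hypothetical mapping for illustration only; the real eventmap.txt
# entries, tags, and file format may look nothing like this.
EVENT_MAP = {
    ("Wlclntfy", 6001): "[Example tag]",
}


def tag_event(source, event_id, event_map=EVENT_MAP):
    """Return the tag for a source/ID pair, or None if the pair is unmapped."""
    return event_map.get((source, event_id))
```

With this approach, an event ID 6001 record from Wlclntfy gets tagged, while event ID 6001 records from WinLogon or DNS are left alone, because nothing was ever mapped for those pairs.  Keying on the ID by itself would have tagged all three.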

Okay, there are times when Windows Event Logs are not Windows Event Logs...that's when they're Event Logs.  Get it?

Okay, stand by...this is the part where the version of Windows matters.  I've gotten myself in trouble over the years for asking stoopit questions (after someone takes the time to write out their request), like, "What's the version of Windows you're examining?"  I get it.  Bad Harlan.  But you know it matters.  And I know you're going to say, "yeah, dude...but no one uses XP any more."  During the summer of 2017, I was assisting with analyzing some systems that had been hit with NotPetya, and another analyst was examining Windows XP and 2003 systems from another client.

The reason this is important is that, in addition to there being many more log files available on Vista+ systems, the binary format of the log files themselves is different.  For example, I wrote evtparse.exe (NOTE: there is NO "x" in the file name!  Evtxparse.exe is a completely different tool!) specifically to parse Event Log files from XP and Win2003 systems.  The great thing is that it does so on a binary basis, without using the MS API.  This means that if the header information says that there are 400 event records in the file, but there are actually 4004, you will get 4004 records parsed. 
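
To give a feel for what parsing "on a binary basis" can look like, here's a rough Python sketch.  This is not the evtparse code; it's just one way to walk a blob of data looking for XP/2003-style event records without trusting the file header.  It relies on the documented EVENTLOGRECORD layout: a 4-byte record length, the "LfLe" signature, the record number, the two timestamps and the event ID, with the length repeated in the last 4 bytes of the record.  The carve_records name is mine, and the sketch makes no attempt to handle records that wrap around the end of the file.

```python
import struct

MAGIC = b"LfLe"
MIN_RECORD_SIZE = 0x38  # size of the fixed portion of an event record


def carve_records(data):
    """Yield (offset, record_number, time_generated, event_id) per plausible record.

    The file header also carries the "LfLe" signature, but its length
    field (0x30) is below MIN_RECORD_SIZE, so it is skipped.
    """
    pos = data.find(MAGIC)
    while pos != -1:
        start = pos - 4  # the length DWORD sits just before the signature
        if start >= 0:
            (length,) = struct.unpack_from("<I", data, start)
            end = start + length
            if length >= MIN_RECORD_SIZE and end <= len(data):
                # A valid record repeats its length in its last 4 bytes
                (trailer,) = struct.unpack_from("<I", data, end - 4)
                if trailer == length:
                    rec_num, t_gen, _t_written, event_id = struct.unpack_from(
                        "<IIII", data, start + 8)
                    # The low 16 bits of the field hold the event ID itself
                    yield start, rec_num, t_gen, event_id & 0xFFFF
        pos = data.find(MAGIC, pos + 1)
```

Because nothing here reads the record count from the header, the number of records you get back depends only on what's actually in the data.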

I also wrote code to parse Event Log records from unstructured data (pagefile, memory dump, unallocated space, etc.).  I originally wrote this code to assist a friend of mine who'd been working on a way to carve Event Log records from unallocated space from a Win2003 server for about 3 months.  Since I wrote it, I've used it successfully to parse records myself.  Lots of fun!