Thursday, March 20, 2025

Know Your Tools

In 1998, I was in a role where I was leading teams on-site to conduct vulnerability assessments for organizations. For the technical part of the assessments, we were using ISS's Internet Scanner product, which was a commercial scanner. Several years prior, while I was in graduate school, the SATAN scanner had been released, but it was open source, and you could look at the code and see what it was doing. This wasn't the case with Internet Scanner.

What we started to see, when we began looking closely, was that the commercial product was returning results that weren't...well...correct. One really huge example was the AutoAdminLogon setting; you could set this value to "1", and the Administrator account name you chose would be included in another value, and the password would be included in a third value, in plain text. When the system was restarted, those credentials would be used to automatically log in to the system.

Yep. Plain text.

Anyway, we ran a scan across an office within a larger organization, and the product returned 21 instances where the AutoAdminLogon capability was enabled. However, the organization knew that only one had that functionality actually set; the other 20 had had it set at one point, but the capability had been disabled. On those 20 systems, the AutoAdminLogon value was set to "0". We determined that the commercial product was checking for the existence of the AutoAdminLogon value only, and not going beyond that...not checking to see if the value was set to "1", and not checking to see if the value that contained the plain text password actually existed. 
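
Going beyond the mere existence of the value isn't complicated. Here's a minimal, illustrative sketch using Python's winreg module; the Winlogon value names are the documented ones, but the logic is mine, not Internet Scanner's or NTCAT's, and a scanner would make the same queries remotely rather than locally:

    # Sketch: check whether AutoAdminLogon is actually enabled, rather than
    # just checking that the value exists.
    import winreg

    WINLOGON = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon"

    def check_autoadminlogon():
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, WINLOGON) as key:
            try:
                enabled, _ = winreg.QueryValueEx(key, "AutoAdminLogon")
            except FileNotFoundError:
                return "AutoAdminLogon value not present"
            if str(enabled) != "1":
                return "AutoAdminLogon value present, but disabled"
            # Only a real finding if the plain text password value also exists
            try:
                winreg.QueryValueEx(key, "DefaultPassword")
            except FileNotFoundError:
                return "AutoAdminLogon enabled, but no DefaultPassword value"
            return "FINDING: AutoAdminLogon enabled, with a plain text DefaultPassword"

    print(check_autoadminlogon())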

We found a wide range of other checks that were similarly incorrect, and others that were highly suspect. So, we started writing a tool to replace the commercial product, called NTCAT. This tool had various queries, all of which were based on research and included references to the Microsoft Knowledge Base, so anyone running the tool could look up each check, understand what was being queried and what the responses meant, and explain it to the customer.

Later, when supporting CBP in a consulting role and waiting for my agency clearance, the Nessus scanner was the popular scanning tool of the day. One day, I heard a senior member of the CIRT telling someone else who was awaiting their clearance that the Nessus scanner determined the version of the Windows operating system by firing off a series of specially crafted TCP packets at the target endpoint (perhaps via nmap), and then mapping the responses to a matrix. I listened, thinking that was terribly complicated. I found a system with Nessus installed, started reading through the various modules, and found that Nessus determined the version of Windows by attempting to make an SMB connection to the target endpoint and reading the Registry. If the scanner was run without the necessary privileges, a lot of the modules would not connect, and would simply fail.

Jump forward to 2007 and 2008, and the IBM ISS X-Force ERS team was well into performing PCI forensic exams. At the time, we were one of seven companies on the list of certified organizations; merchants informed by their banks that they needed a forensic exam would go to the PCI web site, find the names of the companies listed, and invariably call through all seven to see who could offer the best (i.e., lowest) price, not realizing that, at the time, the price was set by Visa (this was prior to the PCI Council being stood up).

As our team was growing, and because we were required to meet very stringent timelines regarding providing information and reporting, Chris Pogue and I settled on a particular commercial tool that most of our analysts were familiar with, and provided the documented procedures for them to move efficiently through the required processes, including file name, hash, and path searches, and scans for credit card numbers.

During one particular engagement, someone from the merchant site informed us that JCB and Discover cards were processed by the merchant. This was important, because our PCI liaison needed to get points of contact at those brands, so we could share compromised credit card numbers (CCNs). We started doing our work, and found that we weren't getting any hits for those two card brands.

The first step was to confirm that, in fact, the merchant was processing those brands...we got the thumbs-up. Next, we went to the brands, got testing CCNs, and ran our process across those numbers...only to get zero hits. Nada. Nothing. Zippo. 

It turned out that the commercial suite we were using included an internal function called IsValidCreditCard(); through extensive testing, and more than a few queries on the user forums, we found that the function did not recognize those two brands as valid. So, with some outside assistance, we wrote a function to override the internal one, and had everyone on our team add it to their systems. The new function ran a bit slower, but Chris and I were adamant with the team that long-running processes like credit card and AV scans should be run in the evening, not kicked off at the start of the workday. This way, you didn't tie up an image with a long-running process when you could be doing actual work.
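
Conceptually, the override amounted to pairing a Luhn check with the issuer prefixes for the two missing brands. What follows is a rough, illustrative sketch in Python, not the function we actually deployed; the prefix ranges come from public IIN references, not from the commercial suite:

    # Illustrative only -- not the override we actually deployed.
    def luhn_ok(number: str) -> bool:
        total = 0
        for i, ch in enumerate(reversed(number)):
            d = int(ch)
            if i % 2 == 1:
                d *= 2
                if d > 9:
                    d -= 9
            total += d
        return total % 10 == 0

    def brand(number: str):
        # JCB: 3528-3589; Discover: 6011, 65, 644-649 (per public IIN references)
        if len(number) == 16 and 3528 <= int(number[:4]) <= 3589:
            return "JCB"
        if len(number) in (16, 19) and (
            number.startswith(("6011", "65"))
            or 644 <= int(number[:3]) <= 649
        ):
            return "Discover"
        return None

    def is_valid_card(number: str) -> bool:
        return number.isdigit() and luhn_ok(number) and brand(number) is not None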

In 2020, I was working at an IR consulting provider, and found that some of the team used CyLR; at the time, the middleware was plaso, and the backend was Kibana. In March of that year, Microsoft released a fascinating blog post regarding human-operated ransomware, in which they described the DoppelPaymer ransomware as using WMI persistence. Knowing that the team had encountered multiple ransomware engagements involving that particular variant, I asked if they'd seen any WMI persistence. They responded, "no". I asked, how did you determine that? They responded that they'd checked the Kibana output for those engagements, and got no results.

The collection process for that toolset obtained a copy of the WMI repository, so I was curious as to why no results were observed. I then found out that, at least at the time, plaso did not have a parser for the WMI repository; as such, the data was collected, but not parsed...and the result of "no findings" in the backend was accepted without question. 

All of this is just to say that it's important to know and understand how your tools work. When I ran an internal SOC, the L3 DF analysts were able to dump memory from endpoints using the toolset we employed. Very often, they would do so in order to check for a particular IP address; however, most of them felt that running strings and searching for the IP address in question was sufficient. I had to work with them to get them to understand that (a) IP addresses, for the most part, are not stored in memory in ASCII/Unicode, and (b) different tools (Volatility, bulk_extractor) look for different structures in order to identify IP addresses. So, if they were going to dump memory, running strings was neither a sufficient nor appropriate approach to looking for IP addresses. 
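
To illustrate the point: an IPv4 address usually sits in memory as four packed bytes inside a network-related structure, not as a printable string. Here's a minimal sketch of searching a raw memory dump for the packed form; this is illustrative only, as tools like Volatility and bulk_extractor go much further and validate the surrounding structures rather than just matching four bytes:

    # Sketch: search a raw memory dump for the packed (binary) form of an
    # IPv4 address, which is what most network structures actually hold.
    import mmap
    import socket
    import sys

    def find_packed_ip(dump_path, ip):
        needle = socket.inet_aton(ip)   # "192.168.1.10" -> b'\xc0\xa8\x01\x0a'
        hits = []
        with open(dump_path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                idx = mm.find(needle)
                while idx != -1:
                    hits.append(idx)
                    idx = mm.find(needle, idx + 1)
        return hits

    if __name__ == "__main__":
        print(find_packed_ip(sys.argv[1], sys.argv[2]))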

Know how your tools work, how they do what they do. Understand their capabilities and limitations. Sometimes you may encounter a situation or circumstance that you hadn't thought of previously, and you'll have to engage, ask questions, and intentionally dig in, in order to make a determination as to the tool's ability to address the issue.

Monday, March 17, 2025

WMI

The folks over at CyberTriage recently shared a guide to WMI; it's billed as a "complete guide to WMI malware", and it covers a great deal more than just malware. They cover examples of discovery and enumeration, as well as execution, but what caught my attention was persistence. This is due in large part to an investigation we'd done in 2016 that led to a blog post about a novel persistence mechanism. The persistence mechanism illustrated in the blog post bore remnants similar to what was seen in this Mandiant BlackHat 2014 presentation (see slide 44).

What's interesting is that we continue to see this WMI persistence mechanism used again and again, where event consumers are added to the WMI repository. In addition to the 2016 blog post mentioned previously, MS's own Human-operated ransomware blog post from 2020 includes the statement, "...evidence suggests that attackers set up WMI persistence mechanisms...".

In addition to some of the commands offered up by the CyberTriage guide and other resources, MS's own AutoRuns tool includes a check for WMI persistence mechanisms on live systems.
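
For live systems, you can also query the root\subscription namespace directly. Here's a minimal sketch that assumes the third-party Python "wmi" module; PowerShell's Get-CimInstance against the same namespace returns the same data:

    # Sketch: enumerate the three classes that make up a WMI event consumer
    # persistence mechanism on a live system. Assumes "pip install wmi" and
    # sufficient privileges.
    import wmi

    conn = wmi.WMI(namespace="root\\subscription")
    for cls in ("__EventFilter", "__EventConsumer", "__FilterToConsumerBinding"):
        print(f"--- {cls} ---")
        for obj in conn.query(f"SELECT * FROM {cls}"):
            print(obj)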

There are also a number of tools for parsing the WMI repository/OBJECTS.DATA file for event consumers added for persistence during disk or "dead box" forensics, such as wmi-parser and flare-wmi.

Chad Tilbury shared some really great info in his blog post, Finding Evil WMI Event Consumers with Disk Forensics.

Disk forensics isn't just about parsing the WMI repository; there's also the Windows Event Log. From this NXLog blog post regarding WMI auditing, look for event ID 5861 records in the WMI-Activity/Operational Event Log.
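
If you're working from a dead-box collection, pulling those records out doesn't take much. Here's a sketch that assumes Willi Ballenthin's python-evtx library; the log file on disk is typically named Microsoft-Windows-WMI-Activity%4Operational.evtx:

    # Sketch: dump event ID 5861 records (permanent event consumer bindings)
    # from a collected WMI-Activity/Operational Event Log file.
    import sys
    from Evtx.Evtx import Evtx   # pip install python-evtx

    def dump_5861(evtx_path):
        with Evtx(evtx_path) as log:
            for record in log.records():
                xml = record.xml()
                if ">5861</EventID>" in xml:
                    print(xml)
                    print("-" * 60)

    if __name__ == "__main__":
        dump_5861(sys.argv[1])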

I know that some folks like to use plaso, and while it is a great tool, I'm not sure that it parses the WMI repository. I found this issue regarding adding the capability, but I haven't seen where the necessary parser has been added to the code. If this capability has been added, I'd greatly appreciate it if someone could link me to a resource that describes/documents this fact. Thanks!

Monday, March 10, 2025

The Problem with the Modern Security Stack

I read something interesting recently that stuck with me. Well, not "interesting", really...it was a LinkedIn post on security sales. I usually don't read or follow such things, but for some reason, I started reading through this one, and really engaging with the content. This piece said that "SOC analysts want fewer alerts", and went on with further discussions of selling solutions for the "security stack". I was drawn in by the reference to "security stack", and it got me to thinking...what constitutes a "security stack"? What is that, exactly? 

Now, most folks are going to refer to various levels of tooling, but for every definition you hear, I want you to remember one thing...most of these "security stacks" stand on an extremely weak foundation. What this means is that if you lay your "security stack" over default installations of OSs and applications, with no configuration modifications or "hardening", if you have no asset inventory, and you haven't performed even high-level attack surface reduction, it's all for naught. It's difficult to filter out noise and false positives in detections when nothing has been done to configure the endpoints themselves to reduce noise.

One approach to a security stack is to install EDR and other security tooling on the endpoints (all of them, one would hope), and manage it yourself, via your own SOC. I know of one organization several years ago that had installed EDR on a subset of their systems, and enabled it in learning mode. Unfortunately, it was a segment on which a threat actor was very active, and rather than being used to take action against the threat actor, the EDR learned that the threat actor's activity was "normal".

I know of another organization that was hit by a threat actor, and during the after action review, they found that the threat actor had used "net user" (native tool/LOLBin) to create new user accounts within their environment. They installed EDR, and were not satisfied with the default detection rules, so they created one to detect the use of net.exe to create user accounts. They were able to do this because they knew that within their organization, they did not use this LOLBin to manage user accounts, and they also knew which app they used, which admins did this work, and from which workstations. As such, they were able to write a detection rule with 100% fidelity, knowing that any detection was going to be malicious in nature.

What happens if you outsource your "security stack", even just part of it, and don't manage that stack yourself (usually referred to as MDR or XDR)? Outsourcing your security stack can become even more of an issue, because while you have access to expertise (you hope), you're faced with another issue altogether. Those experts in response and detection engineering are now faced with receiving data from literally hundreds of other infrastructures, all in similar (but different) states of disarray as yours. The challenge then becomes, how do you write detections so that they work, but do not flood the SOC (and ultimately, customers) with false positives?

In some cases, the answer is, you don't. There are times when activity that is 100% malicious on one infrastructure is part of a critical business process for others. While I worked for one MDR company in particular, we saw that...a lot. We had a customer that had their entire business built on sharing Office documents with embedded macros over the Internet...and yes, that's exactly how a lot of malware made/makes it on to networks. We also had other customers for whom MSWord or Excel spawning cmd.exe or PowerShell (i.e., running a macro) could be a false positive. Under such circumstances, do you keep the detections and run the risk of regularly flooding the SOC with alerts that are all false positives, or do you take a bit of a different approach and focus on detecting post-exploitation activity only?

Figure 1: LinkedIn Post
A recent LinkedIn post from Chris regarding the SOC survey results is shown in figure 1. One of the most significant issues for a SOC is the "lack of logs". This could be due to a number of reasons, but in my experience over the years, the lack of logs is very often the result of configuration management, or lack thereof. For example, by default MSSQL servers log failed login attempts, and modifications to stored procedures (enabling, disabling); successful logins are not recorded by default. I've also seen customers either disable auditing of both successful logins and failed login attempts, or turn up the auditing so high that the information needed when an attack occurs is overwritten quickly, sometimes within minutes, or even quicker. All of this goes back to how the foundation of the security stack, the operating system and installed applications, is built, configured, and managed.

Figure 2: SecureList blog Infection Flow
Figure 2 illustrates the Infection Flow from a recent SecureList article documenting findings regarding the SideWinder APT. The article includes the statement, "The attacker sends spear-phishing emails with a DOCX file attached. The document uses the remote template injection technique to download an RTF file stored on a remote server controlled by the attacker. The file exploits a known vulnerability (CVE-2017-11882) to run a malicious shellcode..."; yes, it really does say "CVE-2017-11882", and yes, the CVE was published over 7 years ago. I'm sharing the image and link to the article not to shame anyone, but rather to illustrate that the underlying technology employed by many organizations may be out of date, unpatched, and/or consisting of default, easily compromised configurations.

The point I'm making here is that security stacks built on a weak foundation are bound to have problems, perhaps even catastrophic ones. A strong foundation begins with an asset inventory (of both systems and applications), and attack surface reduction (through configuration, patching, etc.). Very often, it doesn't take a great deal to harden systems; for example, here's a Huntress blog post where Dray shared free PowerShell code that provides a modicum of "hardening" to endpoints.

Common issues include:
- Publicly exposed RDP on servers and workstations, with no MFA; no one is watching the logs, so they don't see the brute force attacks from public IP addresses
- Publicly exposed RDP with MFA, but other services not covered by MFA (SMB, MSSQL) are also exposed, so the MFA can be disabled; this applies to other security services, as well, such as anti-virus, and even EDR
- Exposed, unpatched, out of date services
- Disparate endpoints that are not covered by security services (webcams, anyone?) 

Wednesday, February 19, 2025

Lina's Write-up

Lina recently posted on LinkedIn that she'd published another blog post. Her blog posts are always well written, easy to follow, fascinating, and very informative, and this one did not disappoint.

In short, Lina says that she found a bunch of Chinese blog posts and content describing activity that Chinese cybersecurity entities have attributed to what they refer to as "APT-C-40", or the NSA. So, she read through them, translated them, and mapped out a profile of the NSA by overlaying the various write-ups.

Lina's write-up has a lot of great technical information, and like the other stuff she's written, is an enthralling read. Over the years, I've mused with others I've worked with as to whether or not our adversaries had dossiers on us, or other teams, be they blue or red. As it turns out, thanks to Lina, we now know that they do, what those dossiers might look like, and the advantage that the eastern countries have over the west.

For me, the best part of the article was Lina's take-aways. It's been about 30 yrs since I touched a Solaris system, so while I found a lot of what Lina mentioned in the article interesting (like how the Chinese companies knew that APT-C-40 were using American English keyboards...), I really found the most value in the lessons that she learned from her review and translation of open Chinese reporting. Going forward, I'll focus on the two big (for me) take-aways:

There is a clear and structured collaboration...

Yeah...about that.

A lot of this has to do with the business models used for DFIR and CTI teams. More than a few of the DFIR consulting teams I've been a part of, or ancillary to, have been based on a utilization model, even the ones that said they weren't. A customer call comes in, and the scoping call results in an engagement of a specific length; say, 24 or 48 hrs, or something like that. The analyst has to collect information, "do" analysis and write a report, eating any time that goes over the scoped time frame, or taking shortcuts in analysis and reporting to meet the timeline. As such, there's little in the way of cross-team collaboration, because, after all, who's going to pay for that time?

In 2016, I wrote a blog post about the Samas (or SamSam) ransomware activity we'd seen to that point. This was based on correlation of data across half a dozen engagements, each worked by a different analyst. The individual analysts did not engage with each other; rather, they simply proceeded through the analysis and reporting of their engagement, and were then assigned to other engagements.

Shortly after that blog post was published, Kevin Strickland published his analysis of another aspect of the attacks; specifically, the evolution of the ransomware itself.

Two years later, additional information was published about the threat group itself, some of which had been included in the original blog post.

The point is that many DFIR teams do not have a business model that facilitates communications across engagements, and as such, analysts aren't well practiced at large scale communications. Some teams are better at this than others, but that has a lot to do with the business model and culture of the team itself. 

Overall, there really isn't a great deal of collaboration within teams and organizations, largely because everyone is silo'd off by business models; the SOC has a remit that doesn't necessarily align with DFIR, and vice versa; the CTI team doesn't have much depth in DFIR skill sets, and what the CTI team publishes isn't entirely useful on a per-engagement basis to the DFIR team. I've worked with CTI analysts who are very, very good at what they do, like Allison Wikoff (re: Mia Ash), but there was very little overlap between the CTI and IR teams within those organizations.

Now, I'm sure that there's a lot of folks reading this right now who're thinking, "hey, hold on...I/we collaborate...", and that may very well be the case. What I'm sharing is my own experience over the past 25 yrs, working in DFIR as a consultant, in FTE roles, running and working with SOCs, working in companies with CTI teams, etc.

This is an advantage that the east has over the west: collaboration. As Lina mentioned, a lot of the collaboration in the west happens in closed, invite-only groups, so a lot of what is found isn't necessarily shared widely. As a result, those who are not part of those groups don't have access to information or intel that might validate their own findings, or fill in some gaps. Further, those who aren't in these groups have information that would fill in gaps for those who are, but that information can't be shared, nor developed.

...Western methodologies typically focus on constructing a super timeline...

My name is Harlan, and I'm a timeliner. Not "super timelines"...while I'm a huge fan of Kristinn (heck, I bought the guy a lollipop with a scorpion inside once), I'm a bit reticent to hand over control of my timeline development to log2timeline/plaso. This is due, in part, to knowing where the gaps are, what artifacts the tool parses, and which ones it doesn't. Plaso and its predecessor are great tools, but they don't get everything, particularly not everything I need for my investigations, based on my analysis goals.

Okay, getting back on point...I see what Lina's saying, or perhaps it's more accurate to say, yes, I'm familiar with what she describes. In several instances, I've done a good bit of adversary profiling myself, without the benefit of "large scale data analysis using AI" because, well, AI wasn't available, and I started out my investigation looking for those things. In one instance, I could see pretty clearly not just the hours of operation of the adversary, but we'd clearly identified two different actors within the group going through shift changes on a regular basis. On the days where there was activity on one of the nexus endpoints, we'd see an actor log in, open a command prompt/cmd.exe, and then interact with the Event Logs (not clearing them). Then, about 8 hrs later (give or take), that actor would log out, and another actor would log in and go directly to PowerShell. 

Adversary profiling, going beyond IOCs and TTPs to look at hours of operation/operational tempo, situational awareness, etc., is not something that most DFIR teams are tasked with, and deriving that sort of insight from intrusion data is not something either DFIR or CTI teams are necessarily equipped or staffed for. This doesn't mean that it doesn't happen, just that it's not something that we, in the West, see in reporting on a regular basis. We simply don't have a culture of collaboration, neither within nor across organizations. Rather, if detailed information is available, many times it's held close to the vest, as part of a competitive advantage. In my experience, it's less about competitive advantage, and more often the case that, while the data is available, it's not developed into intel, nor insights.
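
For what it's worth, the basic hours-of-operation piece doesn't require AI or a dedicated team; it can fall out of timeline data you already have. Here's a rough, illustrative sketch; the column names are hypothetical, and the input is assumed to be a normalized CSV timeline with a UTC timestamp and an account field:

    # Sketch: bucket timeline events by actor and hour of day to get a rough
    # view of operational tempo / shift patterns.
    import csv
    from collections import Counter, defaultdict
    from datetime import datetime

    def hours_of_operation(csv_path, time_col="timestamp", actor_col="user"):
        profile = defaultdict(Counter)
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                ts = datetime.fromisoformat(row[time_col])
                profile[row[actor_col]][ts.hour] += 1
        for actor, hours in sorted(profile.items()):
            summary = ", ".join(f"{h:02d}:00 x{n}" for h, n in sorted(hours.items()))
            print(f"{actor}: {summary}")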

Conclusion
I really have to applaud Lina not only for taking the time to, as she put it, dive head-first into this rabbit hole, but also for putting forth the effort and having the courage to publish her findings. In his book Call Sign Chaos, Gen. Mattis referred to the absolute need to be well-read, and that applies not just to warfighters, but across disciplines, as well. However, in order for that to be something that we can truly take advantage of, we need writing like Lina's to educate and inspire us.

Sunday, February 16, 2025

The Role of AI in DFIR

The role of AI in DFIR is something I've been noodling over for some time, even before my wife first asked me the question of how AI would impact what I do. I guess I started thinking about it when I first saw signs of folks musing over how "good" AI would be for cybersecurity, without any real clarity or specification as to how that would work.

I recently received a more pointed question regarding the use of AI in DFIR, asking if it could be used to develop investigative plans, or to identify both direct and circumstantial evidence of a compromise. 

As I started thinking about the first part of the question, I was thinking to myself, "...how would you create such a thing?", but then I switched to "why?" and sort of stopped there. Why would you need an AI to develop investigative plans? Is it because analysts aren't creating them? If that's the case, then is this really a problem set for which "AI" is a solution?

About a dozen years ago, I was working at a company where the guy in charge of the IR consulting team mandated that analysts would create investigative plans. I remember this specifically because the announcement came out on my wife's birthday. Several months later, the staff deemed the mandate a resounding success, but no one was able to point to a single investigative plan. Even a full six months after the announcement, the mandate was still considered a success, but no one was able to point to a single investigative plan. 

My point is, if your goal is to create investigative plans and you're looking to AI to "fill the gap" because analysts aren't doing it, then it's possible that this isn't a problem for which AI is a solution. 

As to identifying evidence or artifacts of compromise, I don't believe that's necessarily a problem set that needs AI as the solution, either. Why is that? Well, how would the model be trained? Someone would have to go out and identify the artifacts, and then train the model. So why not simply identify and document the artifacts?

There was a recent post on social media regarding investigating WMI event consumers. While the linked resource includes a great deal of very valuable information, it's missing one thing...specific event records within the WMI-Activity/Operational Event Log that apply to event bindings. This information can be found (it's event ID 5861) and developed, and my point is that sometimes, automation is a much better solution than, say, something like AI, because what we see as the 'training set' is largely insufficient.

What do I mean by that? One of the biggest, most recurring issues I continue to see in DFIR circles is the misrepresentation (sometimes subtle, sometimes gross) of artifacts such as AmCache and ShimCache. If sources such as these, which are very often incomplete and ambiguous, leaving pretty significant gaps in understanding of the artifacts, are what constitute the 'training set' for an AI/LLM, then where is that going to leave us when the output of these models is incorrect? And at this point, I'm not even talking about hallucinations, just models being trained with incorrect information.

Expand that beyond individual artifacts to a SOAR-like capability; the issues and problems simply become compounded as complexity increases. Then, take it another step/leap further, going from a SOAR capability within a single organization, to something so much more complex, such as an MDR or MSSP. Training a model in a single environment is complex enough, but training a model across multiple, often wildly disparate environments increases that complexity by orders of magnitude. Remember, one of the challenges all MDRs face is that what is a truly malicious event in one environment is often a critical business process in others.

Okay, let's take a step back for a moment. What about using AI for other tasks, such as packet analysis? Well, I'm so glad you asked! Richard McKee had that same question, and took a look at passing a packet capture to DeepSeek:

The YouTube video associated with the post can be found here.

Something else I see mentioned quite a bit is how AI is going to impact DFIR, by allowing "bad guys" to uncover zero day exploits. That's always been an issue, and I'm sure that the new issue with AI is that bad guys will cycle faster on developing and deploying these exploits. However, this is only really an issue for those who aren't prepared; if you don't have an asset inventory (of both systems and applications), haven't done anything to reduce your attack surface, haven't established things like patching and IR procedures...oh, wait. Sorry. Never mind. Yeah, that's going to be an issue for a lot of folks.

Monday, January 20, 2025

Artifacts: Jump Lists

In order to fully understand digital analysis, we need to understand the foundational methodology, as well as the various constituent artifacts on which a case may be built. The foundational methodology starts with your goals...what are you attempting to prove or disprove...and once you understand the goals of your analysis, you can assemble the necessary artifacts to leverage in pursuit of those goals.

Like many of the artifacts we might examine on a Windows system, Jump Lists can provide useful information, but they are most useful when viewed in conjunction with other artifacts. Viewing artifacts in isolation deprives the analyst of valuable context.

Dr. Brian Carrier recently published an article on Jump List Forensics over on the CyberTriage blog. In that article, he goes into a good bit of depth regarding both the Automatic and Custom Jump Lists, and for the sake of this article, I'm going to cover just the Automatic Jump Lists. 

As Brian stated in his article, Jump Lists have been around since Windows 7; I'd published several articles on Jump Lists going back almost 14 years at this point. Jump Lists are valuable to analysts because they're (a) created as a result of user interaction via the Windows Explorer shell, (b) evidence of program execution, and (c) evidence of data or file access. 

Automatic Jump Lists follow the old Windows OLE "structured storage" format. Microsoft refers to this as the "compound file binary" format and has thoroughly documented the format structures. Some folks who've been around the industry for a while will remember that the OLE format is what Office documents used to use, and that there was a good bit of metadata associated with these documents. In fact, a good way to find the old school "OG" analysts still hanging around the industry is to mention the Blair document. And the format didn't disappear when Office was updated to the newer style format; rather, the format is used in other areas, such as Jump Lists, and at one point was used for Sticky Notes.

Here's my code for parsing the "structured storage" format; this was specifically developed for Windows 7 Automatic Jump Lists, but the basic code can be repurposed for OLE files in general, or updated for specific fields (i.e., the DestList stream) in newer versions of Windows.

As you saw in Brian's article, Automatic Jump Lists are specific to each user, and are found within the user's profile path. Each Automatic Jump List is named using an "application identifier" or "AppID". This is a value that identifies the application used to open the target files (Notepad, Notepad++, MSWord, etc.), and is consistent across systems. This means that an AppID that refers to a particular application on one Windows system will remain the same on other Windows systems.

Microsoft has referred to the "structured storage" format as a "file system within a file"; if you do a study of the format, you'll see why. This structure results in various 'streams' being within the file, and for Automatic Jump Lists, there are two types of streams. Most of the streams in an Automatic Jump List file follow the Windows shortcut/LNK file format.

The other type of stream is referred to as the "DestList" stream, and the structure of this stream on Windows 7 systems was first documented about 14 yrs ago. The following figure illustrates an Automatic Jump List opened in the Structured Storage Viewer, with the DestList stream highlighted.

The structure of the DestList stream changed slightly between Windows 7 and 10 (and maybe again with Windows 11, I haven't looked yet...), but the overall structure of the Automatic Jump List files remains essentially the same.
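
If you want to poke at the structure yourself, the olefile Python library will list and extract the streams for you. This is a minimal sketch; olefile understands the compound file format generally, and knows nothing about Jump Lists specifically:

    # Sketch: list the streams in an *.automaticDestinations-ms file using
    # olefile (pip install olefile). Numbered streams hold LNK-format data;
    # the DestList stream holds the MRU/MFU metadata.
    import sys
    import olefile

    def list_streams(jumplist_path):
        ole = olefile.OleFileIO(jumplist_path)
        try:
            for entry in ole.listdir():
                name = "/".join(entry)
                print(f"{name:20s} {ole.get_size(name):8d} bytes")
            if ole.exists("DestList"):
                data = ole.openstream("DestList").read()
                print(f"\nDestList stream is {len(data)} bytes")
        finally:
            ole.close()

    if __name__ == "__main__":
        list_streams(sys.argv[1])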

Summary
Automatic Jump Lists help analysts validate that a user was active on the system via the Windows shell (as well as when), that they launched applications (program execution), and that they used those applications to open files (file/data access), and when they did so. As such, parsing Jump Lists and including the data in a timeline can add a good deal of granularity and context to the timeline, particularly as it pertains to user activity.

As always, Automatic Jump Lists should be used in conjunction with other artifacts, such as Prefetch, UserAssist, RecentDocs, etc., and should not be viewed in isolation, pursuant to the analyst's investigative goals.

Something else to remember is this...Automatic Jump Lists are generated by the operating system as the user interacts with the environment. As such, if an application is added, the user uses that application and Automatic Jump Lists are generated, and then the user removes the application, the Automatic Jump Lists remain. The same thing happens with other artifacts, such as Recents shortcuts/LNK files, Registry values, etc. So, as with other artifacts, Automatic Jump Lists can provide indications of applications previously installed or files that previously existed on (or were accessed from) the endpoint.

Monday, January 06, 2025

Carving

Recovering deleted data, or "carving", is an interesting digital forensics topic; I say "interesting" because there are a number of different approaches and techniques that may be valuable, depending upon your goals. 

For example, I've used X-Ways to recover deleted archives from the unallocated space of a web server. A threat actor had moved encrypted archives to the web server, and we'd captured the password they used via EDR telemetry. The carving revealed about a dozen archives, which we opened using the captured password; this allowed our customer to understand what data had been exfil'd, and their risk and exposure.

But carving can be about more than just recovering files from unallocated space. We can carve files and records from unstructured data, or we can treat 'structured' data as unstructured and attempt to recover records. We did this quite a bit during PCI forensic investigations, and found a much higher level of accuracy/fidelity when we carved for track 1 and 2 data, rather than just credit card numbers. 
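
The reason is that track data has a lot more structure to anchor on than a bare 16-digit number. Here's a rough, illustrative sketch; the patterns below are a simplification of the ISO/IEC 7813 track layouts, and a production check would also validate the PAN with a Luhn check and issuer prefixes:

    # Sketch: carve for track 1/track 2 data rather than bare card numbers.
    import re
    import sys

    TRACK1 = re.compile(rb"%B(\d{13,19})\^[^^]{2,26}\^\d{4}\d{3}[^?]*\?")
    TRACK2 = re.compile(rb";(\d{13,19})=\d{4}\d{3}[^?]*\?")

    def carve_tracks(path):
        with open(path, "rb") as f:
            data = f.read()
        for label, pattern in (("track1", TRACK1), ("track2", TRACK2)):
            for m in pattern.finditer(data):
                print(f"{label} @ offset {m.start()}: PAN {m.group(1).decode()}")

    if __name__ == "__main__":
        carve_tracks(sys.argv[1])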

We can also carve within files themselves. Several common file formats are essentially databases, and some are described as a "file system within a file". As such, deleted records and data can be recovered from such file formats, if necessary.

I recently ran across a fascinating post from TheDFIRJournal regarding file carving encrypted virtual disks. The premise of the post is that some file encryption/ransomware software does not encrypt entire files, but rather just part of each file, for the sake of speed. In the case of virtual disks, a partially encrypted file may mean that, even if the disk itself is no longer usable, there may be valuable evidence available within the virtual disk file itself.

I should note that I did recently see a ransomware deployment that used a "--mode fast" switch at the command line, possibly indicating that the entire file would not be encrypted, but rather only a specific number of bytes of the file. As such, with larger files, such as virtual disks, WEVT files, etc., there might be an opportunity to recover valuable data, so file and record carving techniques would be valuable, depending upon your specific investigative goals.
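
One quick way to see whether a large file was only partially encrypted is to walk it and compute entropy per chunk; encrypted regions sit near 8 bits per byte, while untouched content usually falls well below that. A minimal sketch follows, where the chunk size and threshold are arbitrary choices for illustration:

    # Sketch: per-chunk Shannon entropy scan of a large file, e.g. a
    # partially encrypted virtual disk.
    import math
    import sys
    from collections import Counter

    def chunk_entropy(data):
        counts = Counter(data)
        total = len(data)
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    def scan(path, chunk_size=1024 * 1024):
        offset = 0
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                ent = chunk_entropy(chunk)
                flag = "  <-- likely encrypted" if ent > 7.9 else ""
                print(f"{offset:>16d}  {ent:5.2f}{flag}")
                offset += len(chunk)

    if __name__ == "__main__":
        scan(sys.argv[1])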

The premise raised in the article is not unique; in fact, I've run into it before. In 2017, when NotPetya hit, we received a number of system images from customers where the MBR was overwritten. We had someone on our team who could reconstruct the MBR, and we also ran carving for WEVTX records, recovering Security-Auditing/4688 records indicating process creation. The customers had not enabled recording of full command lines, but we were able to reconstruct enough data to illustrate the sequence of processes specific to the infection and impact. So, having a disk image where the MBR and/or the MFT is overwritten is not a new situation, simply one we haven't encountered recently.

TheDFIRJournal article covers a number of tools, including PhotoRec, scalpel (not currently being maintained), and Willi Ballenthin's EVTXtract. The article also covers Simson Garfinkel's bulk_extractor, but looking at the bulk_extractor Github, there do not appear to be Windows releases starting with version 2.0. While some folks have stated that bulk_extractor-rec's capabilities have been added to bulk_extractor, that's kind of a moot point without a Windows build, and the latest release of bulk_extractor-rec will have to suffice.

Addendum, 7 Jan 2025: Thanks to Brian Maloney for sharing that the bulk_extractor 2.0 for Windows CLI tool can be found here.

Also from the article, the author mentioned the use of a custom EVTXParser script, which can be found here. I like this approach, as I'd done something similar with the WinXP/2003 EVT files, where I'd written lfle.pl to parse EVT records from unstructured data, which could include a .EVT file. I wrote this script (a 'compiled' Windows EXE is also available) after finding two complete records embedded in an .EVT file that were not "visible" via the Event Viewer, nor via any other tools that started off by reading the file header to determine where the records were located. The script then evolved into something you could run against any data source. While not the fastest tool, at the time it was the only tool available that would take this approach.
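
The signature-based idea behind that approach is straightforward: scan arbitrary data for the "LfLe" magic used by Windows 2000/XP/2003 event records, then sanity-check each candidate using the record length stored at both the start and the end of the record. Here's a simplified, illustrative sketch (not the original Perl, and without parsing the record fields themselves):

    # Sketch: carve candidate .EVT records from unstructured data by scanning
    # for the "LfLe" signature and checking the leading/trailing length fields.
    import struct
    import sys

    def carve_evt_records(path, max_len=0x10000):
        with open(path, "rb") as f:
            data = f.read()
        idx = data.find(b"LfLe")
        while idx != -1:
            start = idx - 4                       # record length sits just before the magic
            if start >= 0:
                (length,) = struct.unpack_from("<I", data, start)
                end = start + length
                if 0x38 <= length <= max_len and end <= len(data):
                    (trailer,) = struct.unpack_from("<I", data, end - 4)
                    if trailer == length:
                        print(f"candidate record at offset {start}, {length} bytes")
            idx = data.find(b"LfLe", idx + 1)

    if __name__ == "__main__":
        carve_evt_records(sys.argv[1])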

In the past, I've done carving on unallocated space within a disk image, using something like blkls to get the unallocated space into one contiguous file of unstructured data. From there, running tools like bulk_extractor allows for record carving.

I've also had pretty good success running bulk_extractor across memory dumps; this is something I talked about/walked through in my book, Investigating Windows Systems.

Carving can also be done on individual files. For example, in 2013, Mari DeGrazia published a great blog post on recovering deleted data from SQLite databases; carving Registry hive files for deleted keys and values, as well as examining unallocated space within hive files, is something I've been a fan of for quite some time. My thanks go to Jolanta Thomassen for 'cracking the code' on deleted cells within Registry hive files!

Here's a presentation I put together a while back that includes information regarding unallocated space within Registry hive files.

Addendum, 13 Jan: Damien Attoe released his first blog post regarding a tool he's working on called "sqbite"; the alpha functionality is what's currently available, and Damien plans to release additional functionality in March. Reading through his blog post, it appears that Damien is working toward something similar to what Mari talked about and released. It's going to be interesting to see what he develops!