Sunday, July 31, 2022

Virtual Images for Testing

Many within the DFIR community make use of virtual systems for testing...for detonating malware, trying things within a "safe", isolated environment, etc. However, sometimes it can be tough to get hold of suitable images for creating that testing environment.

I've collected a bunch of links to VirtualBox VMs for Windows, but I cannot attest to all of them actually working. But, if you'd like to try any of them, here they are...

MS Edge developer virtual machines (Win7 - 10, limited time)
Windows 7 Image, reports no activation needed
Win95 virtual machine
Various MS virtual machines (MS-DOS, Windows, etc.)
Windows 11 Dev Environment (eval)
Use Disk2vhd to create a virtual machine from an existing installation
ReactOS - clone of Windows 5.2 (XP/2003)

There's no shortage of Linux and Unix variant OS VMs available. For example, you can find Solaris VMs here. For MacOS Big Sur, you can try this site.

Back in 1994 and '95, while I was in graduate school, I went to Fry's Electronics in Sunnyvale (across the street from a store called "Weird Stuff") and purchased a copy of OS/2 2.1. I did that because the box came with a $15 coupon for the impending OS/2 Warp 3.0. If you'd like to give the OS/2 Warp OS a shot, you can try this v4.52 download, or try this site for other versions of OS/2.

If you're a fan of CommodoreOS, you can give this site a shot. For AmigaOS, try here, or here. How about Plan9?

General Download Sites

Hope that helps!

EDR Blindness, pt II

As a follow-on to my earlier blog post, I've seen a few more posts and comments regarding EDR 'bypass' and blinding/avoiding EDR tools, and to be honest, my earlier post stands. However, I wanted to add some additional thoughts...for example, when considering EDR, consider the technology, product, and service in light of not just the threat landscape, but also the other telemetry you have available. 

This uberAgent article was very interesting, in particular the following statement:

“DLL sideloading attack is the most successful attack as most EDRs fail to detect, let alone block it.”

The simple fact is, EDR wasn't designed to detect DLL side loading, so this is tantamount to saying, "hey, I just purchased this brand new car, and it doesn't fly, nor does it drive underwater...". 

Joe Stocker mentions on Twitter that the Windows Defender executable file, MpCmdRun.exe, can be used for malicious DLL side loading. He's absolutely right. I've seen EXEs from other AV tools...Kaspersky, McAfee, Symantec...used for DLL side loading during targeted threat actor attacks. This is nothing new...this tactic has been in use for a decade or more.

When it comes to process telemetry, most EDR starts by collecting information about the process creation, and many will get information about the DLLs or "modules" that are loaded. Many will also collect telemetry about network connections and Registry keys/values accessed during the lifetime of the process, but that's going a bit far afield and off topic for us.

There are a number of ways to detect issues with the executable file image being launched. For example, we can take a hash of the EXE on disk and compare it to known good and known bad lists. We can check to see if the file is signed. We can check to see if the EXE contains file version information, and if so, compare the image file name to the embedded original file name. 
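As a rough illustration of the checks described above, here is a minimal Python sketch. The known good/known bad sets are placeholders, and the embedded original file name is assumed to have already been pulled from the file version information by whatever tool you prefer; none of these names come from a specific EDR product.

```python
import hashlib
import ntpath

# Hypothetical reference lists; populate these from your own sources.
KNOWN_GOOD = set()
KNOWN_BAD = set()


def sha256_of(path, chunk_size=65536):
    """Hash a file on disk in chunks so large EXEs are not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_hash(file_hash, known_good=KNOWN_GOOD, known_bad=KNOWN_BAD):
    """Compare a hash against the known bad list first, then the known good list."""
    if file_hash in known_bad:
        return "known bad"
    if file_hash in known_good:
        return "known good"
    return "unknown"


def name_mismatch(image_path, original_filename):
    """True when the on-disk file name differs from the embedded original name.

    original_filename is assumed to come from the EXE's version information;
    None or an empty string means no version information was available.
    """
    if not original_filename:
        return False
    return ntpath.basename(image_path).lower() != original_filename.lower()
```

A renamed copy of a legitimate EXE would show up as a name mismatch even though its hash is on the known good list, which is exactly why the two checks are worth running together.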

Further, many EDR frameworks allow us to check the prevalence of executables within the environment; how often has this EXE been seen in the environment? Is this the first time the EXE has been seen?
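To make the prevalence question concrete, here is a small sketch that assumes process creation telemetry has already been reduced to (host, EXE hash) pairs. That record layout is an assumption for illustration, not the format of any particular EDR framework.

```python
from collections import defaultdict


def prevalence(events):
    """Map each EXE hash to the set of hosts on which it has been seen."""
    seen = defaultdict(set)
    for host, exe_hash in events:
        seen[exe_hash].add(host)
    return seen


def rare_hashes(events, max_hosts=1):
    """Return hashes seen on no more than max_hosts distinct hosts."""
    return sorted(
        exe_hash
        for exe_hash, hosts in prevalence(events).items()
        if len(hosts) <= max_hosts
    )
```

Counting distinct hosts rather than raw executions keeps a single noisy endpoint from making a rare EXE look common.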

However, something we cannot do, because it's too 'expensive', is to maintain a known good list of all application EXE files, their associated DLLs, as well as their hashes and locations, and then compare what we're seeing being launched to that database. This is why we need to have other measures in place in a defense-in-depth posture. Joe goes on to mention what he'd like to see in an EDR tool, so the question is, is there an available framework that allows this condition to be easily identified, so that it can be acted upon?

DLL side loading is not new. Over a decade ago, we were seeing legitimate EXEs from known vendors being dropped on systems, often in the ProgramData folder, and the "malicious" DLL being dropped in the same folder. However, it's been difficult to detect because the EXE file that is launched to initiate the process is a well-known, legitimate EXE, but one of the DLLs it loads, which is often found in the same folder as the EXE, is in fact malicious. Telemetry might pick up the fact that the process, when launched, had some suspicious network activity associated with it, perhaps even network activity that we've never seen before, but a stronger indicator of something "bad" would be the persistence mechanism employed. Hey, why is this signed, known good EXE, found in a folder that it's not normally found in (i.e., the ProgramData folder), being launched from a Run key, or from a Scheduled Task that was just created a couple of minutes ago?
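The "known good EXE in a folder it doesn't normally live in" question can be sketched as a simple lookup. In this example, the baseline of expected install folders and the vendor_updater.exe name are entirely hypothetical, and the autostart entries are assumed to have already been reduced to (value name, image path) pairs with quotes and arguments stripped.

```python
import ntpath

# Hypothetical baseline: folders where each known executable normally lives.
EXPECTED_DIRS = {
    "vendor_updater.exe": {r"c:\program files\vendor"},
}


def unexpected_location(image_path, expected_dirs=EXPECTED_DIRS):
    """True when a known executable is found outside its expected folders.

    Executables that are not in the baseline return False, because the
    baseline has nothing to say about them.
    """
    directory, name = ntpath.split(image_path.lower())
    known = expected_dirs.get(name)
    return known is not None and directory not in known


def suspicious_autostarts(run_values, expected_dirs=EXPECTED_DIRS):
    """Filter (value_name, image_path) pairs down to out-of-place executables."""
    return [
        (value_name, image_path)
        for value_name, image_path in run_values
        if unexpected_location(image_path, expected_dirs)
    ]
```

This does nothing to identify the malicious DLL itself; it only surfaces the persistence entry that makes the side loading possible.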

The point is, while EDR frameworks may not directly identify DLL side loading, as described in my previous blog post, we can look for other phases of the attack cycle to help us identify such things. We may not directly detect the malicious DLL being "side loaded", but a good bit of activity is required for a threat actor to get to the point of DLL side loading...gain access to the system or infrastructure, create files on the system, create persistence, etc. We can detect all of these activities to help us identify potential DLL side loading, or simply inhibit or even obviate further phases of the attack.

Kevin had an interesting tweet recently about another EDR 'bypass', this one involving the use of Windows Terminal (wt.exe) instead of the usual command prompt (cmd.exe). I've been interested in the use of Windows Subsystem for Linux (wsl.exe) for some time, and would be interested to see how pervasive it is in various environments. However, the point is, if you're able to monitor new processes being created via cmd.exe, are you also able to do the same with other shells, such as PowerShell, wt.exe, or wsl.exe?

Finally, something that I've seen for years in the industry is that it doesn't matter how many alerts are being generated by an EDR framework if no one is listening, or if the alerts are misinterpreted and not acted upon. In this blog post on history repeating itself I shared an example of what it looked like well before the advent of EDR when no one was listening for alerts. Not long after that event, we saw in the industry what happened when firewall (or other device) logs were generated but no one monitored them. This is something that we've seen consistently over the past 30+ years.

I was once on an engagement where the threat actor had accessed a Linux-based infrastructure and been able to access the corporate Windows network; two disparate infrastructures that were not supposed to be connected at all were actually a flat network. The threat actor collected data in an unprotected archive and copied (not "moved") it to the Linux infrastructure. We had both copies of the archive, as well as corresponding netflow that demonstrated the transfer. One of the other analysts in the room offered their insight that this threat actor was not "sophisticated" at all, and was, in fact, rather sloppy with respect to their opsec (they'd also left all of their staged tools on the systems).

I had to remind the team that we were there several months after the threat actor had taken the data and left. They'd completed what they had wanted to do, completely unchallenged, and we were here looking at their footprints.

I've seen other incidents where an external SOC or threat hunting team had sent repeated email notifications to a customer, but no action was taken. It turned out that when the contract was set up, one person was designated to receive the email notifications, but they'd since left the organization. In more than a few instances, alert emails went to a communal inbox and the person who had monitored that inbox had since left the company, or was on vacation, or was too overwhelmed with other duties to keep up with the emails. If there is no plan for what happens when an alert is generated and received, it really doesn't matter what technology you're using.

Tuesday, July 26, 2022

Rods and Cones, and EDR "blindness"

I ran across an interesting post recently regarding blinding EDR on Windows systems, which describes four general techniques for avoiding EDR monitoring. Looking at the techniques, I've seen several of these techniques in use on actual, real world incidents. For example, while I was with the Crowdstrike Overwatch team, we observed a threat actor reach out to determine systems with Falcon installed; of the fifteen systems queried, we knew from our records that only four were covered. We lost visibility because the threat actor moved to one of the other eleven systems. I've also seen threat actors "disappear from view" when they've used the Powershell console rather than cmd.exe, or when the threat actor has shell-based/RDP access to systems and uses a GUI-based tool. EDR telemetry includes process creation information, so we might "see" a GUI-based tool being launched but after that, no new processes are created based on whatever options the threat actor chose, or buttons they pushed, so without some other visibility (file system, Registry, network telemetry) we'd have no visibility into what they were doing.

I know this sounds pretty simplistic to some, but I really don't get the impression that it's commonly understood throughout the industry, and not just with customers.

I've previously discussed EDR bypass in this blog. Anyone who's been involved in a SOC or SOC-like role that involves EDR, including IR engagements where EDR is deployed, has seen many of the same conditions, such as incomplete deployment or roll-out of capabilities. A while back, for some products, EDR monitoring did not natively include the ability to block processes or isolate systems; this was a separate line item, if the capability was available. We've seen customers purchase the capability, and then complain when it didn't work...only to find out that it hadn't been deployed. I've seen infrastructures with 30,000 or 50,000 endpoints, and EDR deployed to only 200 or fewer systems.

The take-away here is that when it comes to EDR, finding a blind spot isn't really the profound challenge it's made out to be, particularly if you understand not only the technology but also the business aspect from the customer's point of view.

All that being said, what does this have to do with "rods and cones"? Okay, just bear with me for a moment. When I was an instructor in the military, we had an exercise for student Lts where we'd take them out at night and introduce them to the nuances of operating at night. Interestingly, I learned a great deal from this exercise, because when I went through the same school several years previously, we didn't have this exercise...I assume we were simply expected to develop an understanding ourselves. Anyway, the point is that the construction of the human eye means that the cones clustered around the center of the eye provide excellent acuity during daylight hours, but at night/low light levels, we don't see something when we're looking directly at it. Rather, for that object to come into "view" (relatively speaking), we need to look slightly to one side or the other so that the object "appears" to us via the rods of the eye.

What does this have to do with EDR? Well, assuming that everything else is in place (complete deployment, monitoring, etc.), if we are aware of blind spots, we need to ensure that we have "nearby" visibility. For example, in addition to process creation events, can we also monitor file system, Registry, and network events? Anyone who's dealt with this level of telemetry knows that it's a great deal of data to process, so if you're not collecting, filtering, and monitoring these events directly through EDR, do you have some other means or capability of getting this information if you need to do so? Can you retrieve/get access to the USN change journal and/or the MFT, to the Registry (on a live system or via triage retrieval), and/or to the Windows Event Logs? The fact is that while EDR does provide considerable visibility (and in a timely manner), it doesn't 'see' everything. As such, when a threat actor attempts to bypass EDR, it's likely that they're going to be visible or leave tracks through some other means which we can access via another data source.

An analogy I like to use at this point is that when walking by a pond, if someone throws a rock into the pond, we don't actually have to see them throw the rock, nor do we have to see that a rock broke the surface of the pond, to understand what's happened. We can hear the splash and see ripples against the shore, and know that something happened. When it comes to monitoring endpoints, the same is true...we don't always have to 'see' an action to know that something happened, particularly something that requires a closer look or further investigation. Many times, we can observe or alert on the effect of that action, the impact of the action within the environment, and understand that additional investigation is required.

An additional thought regarding process creation is that process completion is a "blind spot" within some EDR products. Sysmon has a process termination event (event ID 5), but most EDR tools only include process creation telemetry, and additional access and analysis are needed to validate whether the process executed completely, or if something occurred that prevented the process from completing normally. For example, many SOC analysts have likely seen threat actors use the MpPreference PowerShell cmdlets (Set-MpPreference, Add-MpPreference) to impact Windows Defender, and made statements in tickets and incident diaries to that effect, but what happens if Windows Defender is not enabled, and some other security control is used instead? Well, the command will have no effect; using those cmdlets to, say, set exclusions in Defender will not result in Registry entries or the corresponding Windows Event Log records if Defender is disabled.
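Where Sysmon is available, one way to look at the completion side is to pair process creation events (event ID 1) with process termination events (event ID 5) on the ProcessGuid field. The sketch below assumes the events have already been parsed into dictionaries with event_id, process_guid, and image keys; that layout is an assumption for illustration. Note that a termination event only says the process ended, not that it did what the command line asked, so the effect still needs to be validated elsewhere.

```python
def unterminated(events):
    """Return {process_guid: image} for creations with no matching termination.

    events: iterable of dicts with 'event_id', 'process_guid', and, for
    creation events, 'image'. Event ID 1 is process creation and event ID 5
    is process termination, as in Sysmon.
    """
    created = {}
    ended = set()
    for event in events:
        if event["event_id"] == 1:
            created[event["process_guid"]] = event.get("image", "")
        elif event["event_id"] == 5:
            ended.add(event["process_guid"])
    return {guid: image for guid, image in created.items() if guid not in ended}
```

A process that shows up here may simply still be running, or its termination may fall outside the collected time window, so treat the output as a list of things to look at rather than a finding.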

So, keeping in mind that for something bad to happen on a system, something has to happen, the take-away here is that EDR by-passes or blind spots are something to be understood, not feared. Many times, we can get that "good harvest from low-hanging fruit" (quote attributed to David Kleinatland), but sometimes we need a bit more. When we employ (or deploy) one tool, we need to understand both its strengths and shortcomings, and come up with a plan to address any gaps. When considering EDR, we have to understand the impact of coverage (or lack thereof), both on the endpoint as well as across the infrastructure; we have to understand what it can and cannot 'see', how to properly configure it, and how to use it most effectively to "find bad". Sometimes, this may mean that we'll get a signal that something may be amiss, rather than something that clearly jumps out to us as "this is BAD!!!" Sometimes, we may rely on the impact of the action, rather than directly identifying the action itself.

Sunday, July 24, 2022

History Repeats Itself

It's said that those who do not study history are doomed to repeat it. I'd suggest that the adage should be extended to, "those who do not study history and learn from her lessons are doomed to repeat it."

My engagement with technology began at an early age; in the early '80s, I was programming BASIC on a very early IBM-based PC, the Timex-Sinclair 1000, and Mac IIe. By the mid-'80s, I'd programmed in BASIC and Pascal on the TRS-80. However, it wasn't until I completed my initial military training in 1990 that I began providing technology as a service to others; I was a Communications Officer in the Marine Corps, providing trained technical Marines, servicing technical assets, in support of others. I had been taught about the technology...radios, phones, switchboards, etc...and troubleshooting in school, and now I had to apply what I had learned, as well as continue to learn from the experiences of others.

My first unit was deployed to an exercise in Nov 1990. We set up radio and phone communications as part of our plan, which included a TA-938 phone configured as a "hot line" (think the "Bat phone" from the '60s Batman serial) to another unit. The idea...and implementation...was that when someone picked up the handset on one end, the phone on the other end would ring and a red light would blink. It was set up this way, tested, and found to function as expected.

Except when it didn't. As the platoon leader, I took overnight shifts and filled in on switchboard watch, etc. One night, we got a call from the operations tent that the hotline "wasn't working", so I sent one of my young Marines over to check the phone out. When he returned, he reported that the Captain in the ops tent had repeated that the phone "wasn't working", and said that they hadn't done anything to attempt to troubleshoot or address the problem themselves. The Marine continued by saying that they'd checked the batteries...TA-938 (the TA-838, similar in form to the TA-938, is shown on the right) phones took 2 D cell batteries...and found that someone had, indeed, attempted to troubleshoot the phone. An interesting aspect of the TA-938 phone is that both batteries had to be placed in the phone with their positive terminals facing up, which is something very different from what most folks are used to. The Marine shared that they'd ensured the correct placement of the batteries, and everything seemed to be functioning correctly but when they picked up the hot line, no one on the other end answered. 

About 20 minutes later, we got another call that the hot line in the ops tent wasn't working; this time, there was more frustration in the caller's voice. So, I went over and checked out the phone...only to find that the phone had been disconnected from the wires and thrown across the tent. Fortunately, the phones are "ruggedized", meaning that they are intended to take some level of punishment. I got the phone reconnected, checked to ensure that the batteries hadn't been tampered with, and picked up the handset...only to have the line answered on the second ring. I spoke with the person on the other end, explaining the "issue" we'd experienced at our unit, and found that there was only one person manning the other end of the hot line that night, and that they'd gone for a bathroom break, stopped on their way back to get a cup of coffee, gotten into a conversation while they were out, and had only recently returned to the tent.

The lesson I learned here is that technology most often functions as designed or configured, not as expected. If we don't understand the technology, then we may have expectations of the technology that are far out of alignment with its design. As such, when those expectations aren't met, we may declare that the technology "doesn't work". As someone providing services based on technology...Communications Officer, DFIR consultant, SOC manager, etc...we often need to keep this in mind.

Another lesson I learned that served me very well when I was a DFIR consultant was the Captain's statement that he "hadn't done anything" to troubleshoot the issue. Not understanding the technology, his statement was meant to convey that he hadn't done anything of consequence, or anything that was terribly important. I've also seen this during targeted threat actor/nation-state APT response engagements, where an admin will say that they "didn't do anything", when the evidence clearly demonstrated that they deleted Registry keys, cleared Windows Event Logs, etc. Again, these actions may be part of their normal, day-to-day operations, things that they normally do as a course of business; as such, it doesn't stand out to them that their actions were of consequence.

And it's more than just lessons in technology and its deployment. History also provides us with, among others, lessons in working with others. Some refer to this as "leadership", but in my experience, those who talk about leadership in this way often present it as a bit one-sided.

When I was in military training, we'd have various exercises where we'd leave the school house and go out into the woods to actually do the things we'd learned. One of my favorites was patrolling. One of the students (including myself) would write up a "patrol order", specifying equipment to bring, as well as specific roles for other students. The day of the exercise, we'd take buses out to a remote area that we hadn't encountered in previous exercises, and be dropped off...only to find that some students had not brought the specified equipment. In one instance, the gear list specified that everyone bring a poncho, but only about half of the students showed up with ponchos...and there was no, "hey, give me a sec, I'll be right back." One time, the student who was supposed to be the navigator for the patrol showed up without a compass.

About 20 yrs later, I'm on an IR engagement, one on which the customer very specifically asked me to be an advisor to his team; rather than doing all the work myself, guide his team in the approach, providing "knowledge transfer" throughout the process. At one point, I went to an administrator and asked him to copy the Event Log files from an XP system, onto a thumb drive. I was very specific about my request; I specifically stated that I did not want them to open the Event Viewer and export the contents to text. Instead, I needed him to copy the files to the thumb drive, preserving their original format. After acknowledging that they understood, I went off to complete another task, collecting the thumb drive when I returned. Later that night in my hotel room, before shutting down for the evening, I wanted to take a look at the data (follow a hunch) so I tried to access the copied files with my tools, assuming that my request had been followed; none of them worked. I opened the files in a hex editor, only to see that the administrator had exported the contents of the Event Logs to text, and renamed the output files with the ".evt" extension.

The lesson here is that, even in the face of leadership (be it "good" or "bad"), there's an element of followership. People have a choice...if they don't understand, they can choose (or not) to ask for help. Or, they can choose to follow the request or guidance, or not. Given that, if something has a material impact on the outcome of your engagement or project, consider doing that thing yourself, or closely supervising it. Be thankful when things go right, and be especially grateful when someone involved sees an issue and shares their justification for doing something other than what was requested. However, you need to also be ready for those instances where things do not go as requested, because it will happen at some point.

Let's do more than just study history. Let's dig into it intentionally, with purpose, and develop and learn from those lessons.

Saturday, July 23, 2022

Turning Open Reporting Into Detections

I saw this tweet from Ankit recently, and as soon as I read through it, I thought I was watching "The Matrix" again. Instead of seeing the "blonde, brunette, redhead" that Cypher saw, I was seeing actionable detection opportunities and pivot points. How you choose to use them...detections in EDR telemetry or from a SIEM, threat hunts, or specifically flagging/alerting on entries in DFIR parsing...is up to you, but there are some interesting...and again, actionable...opportunities, nonetheless.

From the tweet itself...

%Draft% is environment variable leading to PowerShell
Environment variables are good...because someone has to set that variable using...wait for it...w  a  i  t   f  o  r    i  t...the 'set' command. This means that if the variable is set via the command line, the process can be detected. 
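As a sketch of what a detection for this might look like, the following Python pulls variable assignments made via 'set' out of a process command line. The regular expression is deliberately simple and is an assumption about what such command lines look like; it is not taken from the tweet.

```python
import re

# Matches: set NAME=value (optionally with a quote before NAME). The value
# stops at a command separator (& or |) or a double quote.
SET_PATTERN = re.compile(r'\bset\s+"?(\w+)=([^&|"]*)', re.IGNORECASE)


def variables_set(command_line):
    """Return {variable_name: value} for each 'set NAME=value' found."""
    return {
        name: value.strip()
        for name, value in SET_PATTERN.findall(command_line)
    }


def sets_variable_to(command_line, needle="powershell"):
    """True when any variable set on the command line has a value containing needle."""
    return any(
        needle.lower() in value.lower()
        for value in variables_set(command_line).values()
    )
```

For example, a command line such as `cmd.exe /c set Draft=powershell.exe && %Draft% -nop` would yield a `Draft` variable pointing at powershell.exe.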

Reads another registry value's base64 blob
Blobs are good...because they're usually of a particular value type (i.e., binary) and usually pretty big. I wrote the RegRipper plugin some time ago to address this exact issue, to find values of a certain size or larger.

If the blob isn't binary and is a base64-encoded string, there are a number of ways to detect strings that are base64 encoded.
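One of those ways, sketched in Python, is to check the string's length and alphabet and then attempt a strict decode. The 40-character minimum is an arbitrary threshold chosen for illustration; tune it to whatever cuts down noise in your data.

```python
import base64
import binascii
import re

# Standard base64 alphabet with up to two padding characters at the end.
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64(text, min_length=40):
    """Heuristic check for a base64-encoded string.

    Requires a minimum length, a length that is a multiple of four, the
    standard alphabet, and a successful strict decode.
    """
    candidate = text.strip()
    if len(candidate) < min_length or len(candidate) % 4 != 0:
        return False
    if not B64_PATTERN.fullmatch(candidate):
        return False
    try:
        base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```

This is a heuristic; a long run of plain letters and digits with the right length will also pass, so it works best as a filter ahead of a closer look at the decoded content.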

What's not stated in the body of the tweet, but instead visible in one of the images is that the Run key value written for persistence has interesting data. First, the type is "REG_EXPAND_SZ", which may be a good detection opportunity. This may take some research to determine how prevalent it is in your environment, or in your customers' environments, but Microsoft's documentation says that values within the Run key contain a "command line no longer than 260 characters". From this, we can assume that the value data are strings, and of type REG_SZ. For my own use, I've updated one of my RegRipper plugins to specifically look for instances where values in the Run key (HKLM or HKCU) are other than "REG_SZ" types.
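This is not the RegRipper plugin itself, but the same idea can be sketched in a few lines of Python. The function assumes the Run key values have already been read into (name, data, type) tuples, which is the shape Python's winreg.EnumValue returns on a live Windows system; how you get them from an offline hive is up to your tooling.

```python
# Standard Windows Registry value type codes.
REG_SZ = 1
REG_TYPE_NAMES = {
    1: "REG_SZ",
    2: "REG_EXPAND_SZ",
    3: "REG_BINARY",
    4: "REG_DWORD",
    7: "REG_MULTI_SZ",
}


def non_string_run_values(values):
    """Return (name, type_name, data) for Run key values that are not REG_SZ.

    values: iterable of (name, data, type_code) tuples.
    """
    return [
        (name, REG_TYPE_NAMES.get(type_code, f"type {type_code}"), data)
        for name, data, type_code in values
        if type_code != REG_SZ
    ]
```

How noisy this is depends on the environment, so it's worth running against a known clean baseline before alerting on it.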

Next, the command line itself has a number of unique items you can hunt on. Even if another attack changes the name of the key and value in which the data blob is stored, the command line still offers ample opportunities for detections.

If you don't have EDR telemetry available, try parsing the Microsoft-Windows-Shell-Core%4Operational Event Log, specifically event IDs 9707/9708. Or, if you're sending the data from that Windows Event Log to a SIEM, try searching on elements from within the command line.

The point of all this is that there is very often actionable info in open reporting, things that we can turn into detections via either EDR telemetry or SIEM searches, for threat hunting, or add to our automated DFIR parsing process as a means of retaining "corporate knowledge" and expanding the experience base of all analysts.

Tuesday, July 19, 2022

Fully Exploiting Data Sources

Very often, we view data sources as somewhat one dimensional, and don't think about how we can really get value from that data source. We're usually working on a case, just that investigation that's in front of us, and we're so "heads down" that we may not consider that what we see as a single data source, or an entry from that data source (artifact, indicator), is really much more useful, more valuable, than how we're used to viewing it.

So, what am I talking about? Let's consider some of the common data sources we access during investigations, and how they're accessed. Consider something that we're looking at during an investigation...say, a data source that we often say (albeit incorrectly) indicates program execution: the "AppCompatCache", or "ShimCache". Let's say that we parse the AppCompatCache, and find an entry of interest, a path to a file with a name that is relevant to our investigation. Many of us will look at that entry and just think, "...that program executed at this time...". But would that statement be correct?

As with most things in life, the answer is, "it depends." For example, if you read Caching Out: The Value Of ShimCache to Investigators (Mandiant), it becomes pretty clear that the AppCompatCache is not the same on all versions of Windows. On some, an associated time stamp does indeed indicate that the file was executed, but on others, only that the file existed on the system, and not that it was explicitly executed. The time stamp associated with the entry is not (with the exception of 32-bit Windows XP) the time that the file was executed; rather, it's the last modification time from the $STANDARD_INFORMATION attribute in the MFT record for that file. To understand if that time stamp corresponds to the time that the file was executed, we need to consider artifact constellations, correlating the data point with other data sources to develop the context, to develop a better understanding of the data source (and point), and to validate our findings.

Further, we need to remember that ShimCache entries are written at shutdown; as a result, a file may exist on the system long enough to be included in the ShimCache, but a shutdown or two later, that entry will no longer be available within the data source. This can tell us something about the efforts of the threat actor or malware author (malware authors have been known to embed and launch copies of sdelete.exe), and it also tells us something about the file system at a point in time during the incident.

The point is that the data sources we rely on very often have much more value and context than we realize or acknowledge, and are often much more nuanced than we might imagine. With the ShimCache, for example, an important factor to understand is the version of Windows from which the data was retrieved...because it matters. And that's just the beginning.

I hope this is beginning to shine light on the fact that the data sources we very often rely on are actually multidimensional, have context and nuance, and have a number of attributes. For example, some artifacts (constituents of data sources) do not have an indefinite lifetime on the system, and some artifacts are more easily mutable than others. To that point, Joe Slowik wrote an excellent paper last year on Formulating a Robust Pivoting Methodology. On the top of the third page of that paper, Joe refers to IOCs as "compound objects linking multiple observations and context into a single indicator", and I have to say, that is the best, most succinct description I think I've ever seen. The same can be said for indicators found within the various data sources we access during investigations, so the question is, are we fully exploiting those data sources?

Sunday, July 17, 2022

StartupApproved\Run, pt II

On the heels of my last blog post on this topic, I had a couple of thoughts and insights that I wanted to research a bit, and then address. I wanted to take a look at ways that the StartupApproved\Run key might be impacted, so I started by grabbing the contents of that key based on what we saw from the previous post, which are illustrated in figure 1.

Fig 1: StartupApproved\Run key contents

Then, I captured the contents of the Run key, illustrated in figure 2.

Fig 2: Run key contents

As you can see in figure 2, there appears to be an entry missing, the "com.squirrel.Teams.Teams" value. We know from the previous blog post that this value was disabled on 14 Jul 2021, just over a year ago. I have no idea how that happened, as it wasn't part of an intentional test at the time, and was just a matter of me not wanting Teams to launch every time I logged in.

As part of this research effort, I deleted the OneDrive value from the Run key (see figure 2 above) via RegEdit, and rebooted the system. When I re-opened RegEdit and navigated to the Run key in my user hive, I confirmed that the OneDrive value was no longer in the Run key. However, when I navigated to the corresponding StartupApproved\Run key, I found that the corresponding OneDrive value still appeared as illustrated in figure 1. From this then, yes, it appears that if you delete a value from the Run key via RegEdit, that entry is not removed from the corresponding StartupApproved\Run key. 

For step 2 in this experiment, I added a value to the Run key via RegEdit; I created a new string value, named it "Calc", and then added the path, "C:\Windows\system32\calc.exe". I rebooted the system, logged in, and the calculator opened on my desktop...but there was no "Calc" value in the corresponding StartupApproved\Run key! 

I then removed the Calc value via RegEdit, and then typed the following command:

reg add HKCU\Software\Microsoft\Windows\CurrentVersion\Run /v Calc /t REG_SZ /d C:\windows\system32\calc.exe /f

After ensuring that the command succeeded, I looked at the contents of the Run key via RegEdit and could see the new value. However, I could not see a corresponding value in the StartupApproved\Run key!

Finally, after having removed the "calc" value from the Run key, I added it back via RegEdit, and then opened the Task Manager Startup Tab to see the "Windows Calculator" value. I then disabled the value via the Startup Tab, and verified that a "calc" value was added to the StartupApproved\Run key, as illustrated in figure 3.

Fig. 3: StartupApproved\Run key after disabling Calc value

So, the question becomes, how do entries make it into the StartupApproved\Run key? If adding a value to the Run key via RegEdit or reg.exe does not lead to a corresponding value being added to the StartupApproved\Run key (say, by the operating system), then how are those values added? Looking back at figure 1, all of the entries in the StartupApproved\Run key were for applications that were added via an installation process, such as an MSI or a setup.exe file. Maybe this is what needs to be better understood and addressed about these keys.

Saturday, July 09, 2022

Does "Autostart" Really Mean "Autostart"?

Most DFIR and SOC analysts are familiar with the Run keys as autostart locations within the Windows Registry:


Values beneath these keys are automatically run asynchronously upon system start and user login, respectively. This is something we've known for a while, and we've dutifully incorporated these autostart locations into our "indicators of program execution" artifact category.

It turns out, that may not be the case.

Wait...what? Did I just say that a value listed in one of the aforementioned Run keys may not, in fact, be executed at system start or user login?? 

Yes...yes, I did.

Let's first start with validating that the entries themselves have been run. We know that we can parse the Microsoft-Windows-Shell-Core%4Operational Event Log, extracting event ID 9707 and 9708 events to see when execution of the values beneath the Run (and RunOnce) keys was started, and then completed (respectively). We can use whatever tool we like to open or parse the Windows Event Log file, and then filter through it to see that, yes, on this date, at this time, these entries were, in fact, launched. That's a great way to validate our findings, based on the entries in the Run key.

It happens that there's another set of Registry keys at play:


If you navigate to the above key, you'll see that the value names within the key reflect the value names beneath the corresponding Run key, but that the data is different. Figure 1 illustrates the data from one of my systems.

Fig 1: StartupApproved\Run key values

As you see in figure 1, the value names reflect entries in the Run key, but the values all have different data. First, the binary data for each value is 12 bytes in length. If the first 4 bytes (DWORD) are 0x02 or 0x06, then the corresponding value in the Run key is enabled. If the first DWORD is 0x03, then the value is disabled, and the remaining 8 bytes (QWORD) are the FILETIME object representing the date and time that the value was disabled. Figure 2 illustrates this data parsed via RegRipper Pro:
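To make the layout concrete, here is a minimal Python sketch of parsing one of these 12-byte values. It is not the RegRipper plugin; the function name is hypothetical, the status codes are simply the ones described above, and any other status is reported as unknown rather than guessed at.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption: status codes as described in the text above.
ENABLED_CODES = {0x02, 0x06}
DISABLED_CODES = {0x03}


def parse_startup_approved(data: bytes):
    """Return (state, disabled_time) for one 12-byte StartupApproved value."""
    if len(data) != 12:
        raise ValueError("expected 12 bytes of value data")
    # Little-endian: a 4-byte status DWORD followed by an 8-byte FILETIME.
    status, filetime = struct.unpack("<IQ", data)
    if status in ENABLED_CODES:
        return "enabled", None
    if status in DISABLED_CODES:
        # FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC.
        epoch_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)
        return "disabled", epoch_1601 + timedelta(microseconds=filetime // 10)
    return "unknown (0x%02x)" % status, None


# An "enabled" value: first DWORD = 0x02, remaining QWORD all zeros.
state, when = parse_startup_approved(bytes.fromhex("020000000000000000000000"))
```

The raw bytes would come from whatever Registry parser you prefer; the sketch only deals with the value data once you have it in hand.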

Fig 2: RegRipper plugin output

Now, here's the kicker...what you see in figures 1 and 2 reflect what happens if the values are disabled via the Startup tab in Task Manager. If you use RegEdit or reg.exe to navigate to the Run keys, you can simply delete entries to prevent them from executing. However, if you use the Task Manager Startup tab to enumerate startup entries, and you disable entries that are found in the Run keys, you see what is illustrated in figures 1 and 2.

What's really interesting about this StartupApproved Registry key is that there's another subkey:


As you might have guessed by the name, this applies to the Startup folder in the file system. Yes, MS has added a Registry key such that, if there are entries in the corresponding Startup folder, there will also be a StartupFolder key that contains value names that mirror those entries. On systems where there are no entries in the Startup folder for the account, I have not found a corresponding StartupFolder key. 

Figure 3 illustrates what this looks like via RegRipper Pro.

Fig 3: Disabled StartupFolder entry, via RegRipper Pro

If you use Task Manager to re-enable the values, the data for the corresponding entry beneath the StartupApproved key is changed back to the "enabled" value (first DWORD = 0x02, remaining QWORD all 0s).

At this point, the takeaway is, rather than just checking the Run key, correlate the entries with the value data within the StartupApproved\Run key, and validate both via the corresponding event IDs in the Microsoft-Windows-Shell-Core%4Operational Event Log.
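The correlation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the two dictionaries (value name to command line, and value name to raw StartupApproved data) are hypothetical, and they stand in for whatever your Registry parser produces. Validation against the Event Log is left out.

```python
import struct


def correlate_run_entries(run_values, approved_values):
    """Pair each Run key value with its state from the StartupApproved key."""
    # Registry value names are case-insensitive, so compare in lower case.
    approved = {name.lower(): data for name, data in approved_values.items()}
    results = []
    for name, command in run_values.items():
        data = approved.get(name.lower())
        if data is None or len(data) < 4:
            # Not every Run value has a StartupApproved counterpart.
            state = "no StartupApproved entry"
        else:
            status = struct.unpack_from("<I", data)[0]
            if status == 0x03:
                state = "disabled"
            elif status in (0x02, 0x06):
                state = "enabled"
            else:
                state = "unknown"
        results.append((name, command, state))
    return results
```

An entry reported as "disabled", or one with no StartupApproved data at all, is exactly the kind of finding that should be checked against the event records before drawing conclusions about execution.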

But wait, there's more!

If the entries in the Run key were instead disabled via Autoruns, something entirely different happens. If a value is disabled beneath the Run key, then when Autoruns is closed, an AutorunsDisabled subkey is created beneath the Run key, and the disabled value is moved to that subkey. If an entry beneath the Startup folder is disabled via Autoruns, an AutorunsDisabled subdirectory is created beneath the Startup folder, and the entry (shortcut, executable, batch file, etc.) is moved to that subdirectory. 

So What?
Why does any of this matter? Well, first off, it means that if we assume that just because there's an entry in a Run key, that the application pointed to was actually executed, we may be wrong. If anything, this reinforces the need to validate and correctly understand findings and artifacts.

How could this be used? Well, one thought is that a threat actor could create a distraction; by putting an entry in a Run key and then disabling it in the corresponding StartupApproved\Run key, admins and analysts might be caught unaware and pursue a line of analysis that actually takes them down a rabbit hole, and cause them not to investigate the real issue.

Think that's a bit far fetched? I've seen it happen.

Friday, July 01, 2022

Distros and RegRipper, pt deux

Now and again I pop my head up and take a look around to see where RegRipper has been, and is being, used. My last blog post on this topic had quite a few listings, but sometimes changing the search terms reveals something new, or someone else has decided to use RegRipper since the last time I looked.

References to RegRipper go way back, almost as far as RegRipper itself (circa 2008):
SANS blog (2009)
SANS blog (2010)
SANS Infosec Handler's Diary blog (2012)
Kali Tools (RR v2.5)
SANS Blog, Mass Triage, pt 4 (2019)

The latest commercial forensics platform that I've found that employs RegRipper is Paraben E3. I recently took a look at the evaluation version, and found "" (RegRipper v3.0 with modifications) in the C:\Program Files\Paraben Corporation\Electronic Evidence Examiner\PerlSmartAnalyzer folder, along with the "plugins" subfolder.

You can see the Registry parsing in action and how it's incorporated into the platform at the Paraben YouTube Channel:
AppCompatCache parsing
Reviewing Data from AmCache

Reviewing the videos, there's something very familiar about the output illustrated on-screen. ;-)

Other Resources (that incorporate RegRipper)
YouTube video by Ric Messier
CAINE forensics video
PacktPub Subscription
LIFARS Whitepaper on Shellbags
Windows Registry Forensics, 1/e (PDF)
Paradigm Solutions blog
Jason Shaver's NPS thesis (2015)

That's just one more step toward world domination! This is where I tent my fingers and say "Excellent" like Mr. Burns!

PS: While I was looking around recently, I saw something I hadn't seen before...early in Jan, 2020, an issue with the Parse::Win32Registry module parsing 64-bit time stamps was identified. I'd updated the module code, recompiled the EXEs, and put them up on Github. 

I found recently that James, the author of the module, had updated it in Sept, 2020. That's great, but there are a few other tweaks I'd made to the code, one that allowed me to check to see if hives are 'dirty'. 

Thursday, May 26, 2022

USB Device Redux, with Timelines

If you ask DFIR analysts, "What is best in life?", the answer you should hear is, "...creating timelines!" After all, industry luminaries such as Andrew said, "Time is the most important thing in life, and timelines are one of the most useful tools for investigation and analysis.", and Chris said, "The timeline is the central concept of all investigative work."

My previous blog post addressed USB-connected devices, but only from the perspective of Windows Event Logs. In this blog post, I wanted to include data from the Registry, incorporated in a timeline so that the various data sources could be viewed through a common lens, in a single pane of glass. 

I started by using wevtutil.exe to export current copies of the five Windows Event Logs to a central location. I then used reg.exe to do the same thing for the System hive. I then used my timeline process (outlined in several of my books) to create the events file from the six data sources; I used wevtx.bat to parse the Windows Event Logs, and three newly created RegRipper Pro plugins to parse the relevant data from the System hive. The specific keys, values and data parsed from the hive were based largely on Yogesh's blog post, and this academic paper posted at the ResearchGate site. I created the initial plugins, and then modified them to display TLN-format output, for inclusion in timelines.

For this research, there were three specific devices I was interested in: my iPod, my iPhone, and a SanDisk Cruzer USB thumb drive. After creating the overall events file, I used the "type" and "find" commands to look for events associated specifically with those devices, isolated each into their own individual "overlay" events file, and then created timelines from each of those events files. This approach makes it easy to "see" what's going on and create artifact constellations, as I don't have to filter out "noise" associated with other events, and I still have the overall events file that I can refer to. 
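For anyone not working at a Windows command prompt, the same "overlay" idea can be sketched in Python. This is an illustration rather than the process used above: the function name is hypothetical, the sample lines are made up for the example, and they assume the five-field, pipe-delimited TLN layout (time|source|system|user|description).

```python
def build_overlay(event_lines, keywords):
    """Return the event lines that mention any of the keywords (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [line for line in event_lines
            if any(k in line.lower() for k in wanted)]


# Illustrative events only, in time|source|system|user|description form.
events = [
    "1641152481|REG|||First Inserted - Apple iPod [6&3091e96e&0&0000]",
    "1652738784|REG|||Last Inserted - SanDisk Cruzer USB Device",
    "1652738784|EVTX|Stewie||Microsoft-Windows-DeviceSetupManager/112;Cruzer,...",
]

# Keep only the events for one device; the full list remains untouched.
cruzer_overlay = build_overlay(events, ["cruzer"])
```

Because the overall events file is never modified, each overlay can be regenerated with different keywords as the constellation is built out.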

What I'm sharing below are partial timelines of events, just enough to demonstrate events based on intentionally limited data sources, so that initial artifact constellations can be developed. From this point, the constellations can be built out; for example, accessing files on the SanDisk Cruzer will produce Windows shortcut files pointing to files on the "E:\" volume. Again, these timeline overlays are not complete, but are intended to demonstrate Registry artifacts associated with USB-connected devices alongside Windows Event Log artifacts.

A while back, I inserted my iPod into my computer in order to retrieve music files, via iTunes, so that I could transfer them to my iPhone. I didn't think much about it at the time, but the connection was clearly "remembered" by Windows 10, specifically via the Registry.

Here are the events around the insertion:

Sun Jan  2 19:41:21 2022 Z
  REG                        - First Inserted - Apple iPod [6&3091e96e&0&0000]
  REG                        - First Install - Apple iPod [6&3091e96e&0&0000]
  EVTX     Stewie     - Microsoft-Windows-WPD-MTPClassDriver/1005;Apple Inc.,Apple iPod,4.3.5,40
  REG                        - Last Inserted - Apple iPod [6&3091e96e&0&0000]

Sun Jan  2 19:41:15 2022 Z
  EVTX     Stewie            - Microsoft-Windows-DeviceSetupManager/112;iPod,{fc916355-34ea-555c-9e24-3c59f6125097},2,42,11

And here are the events around the removal of the device from the computer, a little more than 14 minutes later:

Sun Jan  2 19:55:46 2022 Z
  REG                        - Last Removal - Apple iPod [6&3091e96e&0&0000]

The completed message string for the "Microsoft-Windows-DeviceSetupManager/112" event above is:

Device 'Apple iPod' ({fc916355-34ea-555c-9e24-3c59f6125097}) has been serviced, processed 6 tasks, wrote 34 properties, active worktime was 11748 milliseconds.

I state this specifically because following the "Last Removal" event on 2 Jan 2022, the timeline contains an additional 9 events from 6 Jan to 22 May, all for the same "Microsoft-Windows-DeviceSetupManager/112" event records for the iPod, but the last three string entries are different. In every case, only 1 task is run, and the active worktime runs from 0 to 31 milliseconds. I know that the iPod was not plugged in during these times, and as such, this seems to be an artifact of the installation process.
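To compare those string entries across records, the comma-separated strings from the timeline can be split back into named fields. The following Python sketch assumes the field order implied by the message string above (device name, GUID, tasks, properties, active worktime); the function and key names are hypothetical.

```python
def parse_dsm_112(strings):
    """Split 'name,{guid},tasks,properties,worktime' into named fields."""
    # Split from the right so a comma inside the device name does no harm.
    name, guid, tasks, properties, worktime_ms = strings.rsplit(",", 4)
    return {
        "device": name,
        "guid": guid,
        "tasks": int(tasks),
        "properties": int(properties),
        "worktime_ms": int(worktime_ms),
    }


def render_message(fields):
    """Rebuild the message string from the parsed fields."""
    return ("Device '{device}' ({guid}) has been serviced, processed {tasks} "
            "tasks, wrote {properties} properties, active worktime was "
            "{worktime_ms} milliseconds.").format(**fields)


record = parse_dsm_112(
    "Apple iPhone,{7e8068a1-2d62-53fb-8285-a12072dfa871},4,34,296")
```

With the fields parsed, records where only a single task ran and the worktime is close to zero can be separated from those that reflect a device actually being connected.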

I have connected my iPhone to this Windows 10 system via a USB cable, to transfer pictures from it, and to transfer music files to it, via iTunes. Here we see one such connection on 7 May 2022:

Sat May  7 14:16:35 2022 Z
  REG                        - Last Removal - @oem119.inf,iphone.appleusb.devicedesc%;Apple Mobile Device USB Composite Device [00008030000E6C6C11DA802E]
  REG                        - Last Removal - Apple iPhone [6&139bb8e1&1&0000]

Sat May  7 14:14:57 2022 Z
  EVTX     Stewie            - Microsoft-Windows-WPD-MTPClassDriver/1005;Apple Inc.,Apple iPhone,15.4.1,40
  EVTX     Stewie            - Microsoft-Windows-DeviceSetupManager/112;Apple iPhone,{7e8068a1-2d62-53fb-8285-a12072dfa871},4,34,296

Sat May  7 14:14:56 2022 Z
  REG                        - Last Inserted - Apple iPhone [6&139bb8e1&1&0000]
  REG                        - Last Inserted - @oem119.inf,iphone.appleusb.devicedesc%;Apple Mobile Device USB Composite Device [00008030000E6C6C11DA802E]

There's information later in the timeline regarding another connection to the system, this time to copy pictures off of the iPhone. The "Last Inserted" and "Last Removal" events are from a different Registry key than the one seen above, as noted by the serial number in brackets at the end of the "event".

Fri Apr 15 16:23:13 2022 Z
  REG                        - Last Removal - @oem119.inf,iphone.appleusbmux.devicedesc%;Apple Mobile Device USB Device [6&139bb8e1&1&0001]


Fri Apr 15 16:19:02 2022 Z
  EVTX     Stewie            - Microsoft-Windows-WPD-MTPClassDriver/1005;Apple Inc.,Apple iPhone,15.4.1,40

Fri Apr 15 16:18:57 2022 Z
  EVTX     Stewie            - Microsoft-Windows-DeviceSetupManager/112;Apple iPhone,{7e8068a1-2d62-53fb-8285-a12072dfa871},4,34,140
  REG                        - Last Inserted - @oem119.inf,iphone.appleusbmux.devicedesc%;Apple Mobile Device USB Device [6&139bb8e1&1&0001]

The artifact constellation for the SanDisk Cruzer thumb drive is a bit different from that of the iPhone and the iPod. In this case, the time between the last insertion of the device and its removal from the computer is less than a minute...

Mon May 16 22:07:08 2022 Z
  EVTX     Stewie            - Microsoft-Windows-Partition/1006;1,8208,262401,false,0,0,0,0,0,7,SanDisk,Cruzer,8.02,2443931D6C0226E3,...
  REG                        - Last Removal - SanDisk Cruzer USB Device
  REG                        - Last Removal - Cruzer   [E:\]

Mon May 16 22:06:26 2022 Z
  EVTX     Stewie            - Microsoft-Windows-Ntfs/145;3,{1e09345e-d3d4-11e8-92fd-1c4d704c6039},2,E:,false,0,{fab772f6-83e6-5d5f-1086-740d39e45bff},8,SanDisk ,16,Cruzer ...
  EVTX     Stewie            - Microsoft-Windows-Partition/1006;1,8208,262401,false,0,0,0,512,8036285952,7,SanDisk,Cruzer,8.02,2443931D6C0226E3,Integrated : ...

Mon May 16 22:06:24 2022 Z
  EVTX     Stewie            - Microsoft-Windows-Partition/1006;1,8208,262401,false,0,0,0,0,0,7,SanDisk,Cruzer,8.02,2443931D6C0226E3,...
  EVTX     Stewie            - Microsoft-Windows-DeviceSetupManager/112;Cruzer,{81fa6fcf-bfc9-5887-bdbc-2cffb6be0b29},4,34,281
  REG                        - Last Inserted - Cruzer    [E:\]
  REG                        - Last Inserted - SanDisk Cruzer USB Device

Note that several of the events, particularly those from the Partition/Diagnostic Event Log, are shortened here for readability.

Each of the above three devices appears in the Registry, specifically in the System hive, sometimes in multiple locations. For example, the SanDisk Cruzer thumb drive appears in both the USBStor and WPDBUSENUM subkeys.

From the USBStor key:
    DeviceDesc     : @disk.inf,%disk_devdesc%;Disk drive
    Mfg            : @disk.inf,%genmanufacturer%;(Standard disk drives)
    Service        : disk                          
    FriendlyName   : SanDisk Cruzer USB Device     
    First Install  : 2021-09-09 17:37:15Z     
    First Inserted : 2021-09-09 17:37:15Z     
    Last Inserted  : 2022-05-16 22:06:24Z     
    Last Removal   : 2022-05-16 22:07:08Z     

From the WPDBUSENUM key:
    DeviceDesc     : Cruzer                        
    FriendlyName   : E:\                           
    First Install  : 2021-09-09 17:37:17Z     
    First Inserted : 2021-09-09 17:37:17Z     
    Last Inserted  : 2022-05-16 22:06:24Z     
    Last Removal   : 2022-05-16 22:07:08Z

The Apple devices appear beneath the USB key, based on the vendor ID:
    DeviceDesc     : @oem119.inf,%iphone.appleusb.devicedesc%;Apple Mobile Device USB Composite Device
    Mfg            : @oem119.inf,%aapl%;Apple, Inc.
    Service        : usbccgp                       
    FriendlyName   : @oem119.inf,%iPhone.AppleUSB.DeviceDesc%;Apple Mobile Device USB Composite Device
    First Install  : 2022-01-02 19:41:16Z     
    First Inserted : 2022-01-02 19:41:15Z     
    Last Inserted  : 2022-01-02 19:41:15Z     
    Last Removal   : 2022-01-02 19:55:46Z     

    DeviceDesc     : Apple iPod                    
    Mfg            : Apple Inc.                    
    Service        : WUDFWpdMtp                    
    FriendlyName   : Apple iPod                    
    First Install  : 2022-01-02 19:41:21Z     
    First Inserted : 2022-01-02 19:41:21Z     
    Last Inserted  : 2022-01-02 19:41:21Z     
    Last Removal   : 2022-01-02 19:55:46Z     

    DeviceDesc     : @oem119.inf,%iphone.appleusbmux.devicedesc%;Apple Mobile Device USB Device
    Mfg            : @oem119.inf,%aapl%;Apple, Inc.
    Service        : WINUSB                        
    FriendlyName   : @oem119.inf,%iPhone.AppleUsbMux.DeviceDesc%;Apple Mobile Device USB Device
    First Install  : 2022-01-02 19:41:16Z     
    First Inserted : 2022-01-02 19:41:16Z     
    Last Inserted  : 2022-01-02 19:41:16Z     
    Last Removal   : 2022-01-02 19:55:46Z     

    DeviceDesc     : @oem119.inf,%iphone.appleusb.devicedesc%;Apple Mobile Device USB Composite Device
    Mfg            : @oem119.inf,%aapl%;Apple, Inc.
    Service        : usbccgp                       
    FriendlyName   : @oem119.inf,%iPhone.AppleUSB.DeviceDesc%;Apple Mobile Device USB Composite Device
    First Install  : 2022-01-02 19:56:40Z     
    First Inserted : 2022-01-02 19:56:40Z     
    Last Inserted  : 2022-05-07 14:14:56Z     
    Last Removal   : 2022-05-07 14:16:35Z     

    DeviceDesc     : Apple iPhone                  
    Mfg            : Apple Inc.                    
    Service        : WUDFWpdMtp                    
    FriendlyName   : Apple iPhone                  
    First Install  : 2022-01-02 19:56:46Z     
    First Inserted : 2022-01-02 19:56:46Z     
    Last Inserted  : 2022-05-07 14:14:56Z     
    Last Removal   : 2022-05-07 14:16:35Z     
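Since the plugin output above pairs "Last Inserted" and "Last Removal" times, the length of the most recent connection falls out of simple subtraction. Here is a minimal Python sketch, assuming the timestamp format shown in the output; the function name is hypothetical.

```python
from datetime import datetime

# Timestamp layout as it appears in the output above, e.g. 2022-05-16 22:06:24Z
TIME_FORMAT = "%Y-%m-%d %H:%M:%SZ"


def connection_duration(last_inserted, last_removal):
    """Return the time between the last insertion and the last removal."""
    start = datetime.strptime(last_inserted, TIME_FORMAT)
    end = datetime.strptime(last_removal, TIME_FORMAT)
    return end - start


# SanDisk Cruzer, from the USBStor output
cruzer = connection_duration("2022-05-16 22:06:24Z", "2022-05-16 22:07:08Z")
# Apple iPod, from the USB output
ipod = connection_duration("2022-01-02 19:41:21Z", "2022-01-02 19:55:46Z")
```

For the Cruzer this works out to 44 seconds, and for the iPod to 14 minutes and 25 seconds, in line with the timelines above. The values only record the most recent insertion and removal, so a negative result would simply mean the device was still connected when the hive was collected.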

Additional Resources
Note that per Yogesh's blog post, the "Microsoft-Windows-Kernel-PnP/Device Configuration" Event Log may also contain information about the connected devices.

One More Thing
While I was doing some research for this blog post, I ran across this entry for event ID 112, albeit from the "Microsoft-Windows-TaskScheduler/Operational" Event Log. Once again, please stop referring to event records solely by their ID, and start including the event source, as well.  

Tuesday, May 17, 2022

USB Devices Redux

Back in 2005, Cory Altheide and I published the first paper on tracking USB storage devices across Windows systems; at the time, the focus was Windows XP. A lot has happened since then...I know, that's an understatement...the Windows platform has developed and expanded, initially with Vista, then Windows 7, and even with Windows 10 there have been developments that have come (and gone) just between the various Win10 builds.

With respect to USB devices in particular, not long ago, we (the community) became aware that the Microsoft-Windows-DriverFrameworks-UserMode/Operational Event Log contained quite a bit of information (see this post for event IDs to track) that a digital forensic analyst could use to determine if and when USB devices had been connected to (and disconnected from) the system. This was a pretty profound finding, and very valuable...and then, for some unknown reason, that Windows Event Log was disabled by default. 

Also, in researching information for this topic, I found that the EMDMgmt key in the Software hive, which is associated with ReadyBoost and provided insight into USB-connected devices, is no longer available either. Okay, so that's one less artifact available to us; we just need to adapt.

This is really nothing new, to be honest. DFIR analysts need to be adaptable, regardless of whether we're in a consultant or FTE role. If you're a consultant, you're going to see a new environment on every engagement, and there will be new things to deal with and discover. A while back, a teammate discovered that the customer had LANDesk installed on their systems, and the software monitoring component recorded a great deal of information regarding executed processes right there in the Registry. It's not too different in an internal, FTE role, as you're likely going to run across legacy builds, software loads that haven't been/can't be updated for some reason, applications or packages users have installed specifically to meet the needs of their department or of a customer, etc. Further, we've seen various resources come and go; for example, we got used to having Registry hive backups available during investigations, and then we lost access to them; they're no longer available by default. In addition to the Microsoft-Windows-DriverFrameworks-UserMode/Operational Event Log, the Microsoft-Windows-TaskScheduler/Operational Event Log seems to be disabled by default. As such, when we find that a data source or artifact that we're familiar with and used to using is no longer available, we have to invest in determining an alternate means for determining the same or similar information; we have to rebuild those artifact constellations. 

Something else that has developed over time alongside the development of the Windows platform is how USB devices are treated. For example, Nicole Ibrahim described some time ago how some USB connected devices, specifically smartphones, are treated differently based on the protocol used. Nicole's presentation on the topic can be found here.

The overall point is that we can no longer consider all USB-connected devices to be the same, and as such, we may need to look in different locations within the OS, including different locations within the Registry and within different Windows Event Logs, to find the information pursuant to our analysis goals. Pursuant to this, I sat down with my own local system and started going through the Windows Event Logs, via the Event Viewer, one at a time, looking for indications of connected devices. What I found was that records of connections were dispersed across multiple logs, depending upon the type of device connected (i.e., smartphone/digital camera, ext HDD w/ enclosure, thumb drive, etc.).

As a caveat, these event records are not exclusive; that is to say that the individual event source/ID pairs do not pertain solely to USB connected devices. In many cases, the same event source/ID pair was found to contain information specific to the local physical hard drive, as well as to the different volumes on that hard drive. Also, all of these events are for the latest build of Windows 10 only, because that's all I have to test against.

So here's a look at the five resources I found; keep in mind this is limited based on what I have available to test with, but it should serve as a good starting point...

Event Source: WPD-MTPClassDriver
Event ID: 1005

Fig 1: Event ID 1005

Figure 1 illustrates where I'd connected my iPhone to the system to pull pictures; I also have entries for my iPod (yes, I still have an iPod...) where I wanted to transfer music to my iPhone. Due to the protocol used, this is also where we'd likely find digital cameras, as well.

Event Source: DeviceSetupManager
Event ID: 112

Fig 2: Event ID 112

Figure 2 shows where I'd connected a SanDisk Cruzer thumb drive to the computer.

Event Source: StorageSpaces-Driver
Event ID: 207

Fig 3: Event ID 207

I shared figure 3 because this is specifically an external HDD, one of those "wallet" drives with a small form factor/enclosure. This device is small enough that it's powered from the USB connection; I don't have a larger enclosure that requires an additional power source to test against.

Event Source: Partition
Event ID: 1006

Fig 4: Event ID 1006, XML view

Figure 4 is a bit different, because the "friendly view" simply says, "for internal use only". However, taking a look at the XML view, we see that the SanDisk Cruzer thumb drive and its serial number pop right out!

Event Source: Ntfs
Event ID: 145, 142

Fig 5: Event ID 145

Figure 5 shows the Ntfs/145 event record (in part) from a previous connection of the SanDisk Cruzer thumb drive. Event ID 142 provides additional information, including the volume assignment (C:, D:, F:, etc.), whether the volume is bootable, etc., which can be used to tie to shellbags and Windows Portable Devices artifacts in the Registry.

From a forensic perspective, if you're interested in tracking USB devices connected to systems, I'd recommend enabling the Microsoft-Windows-DriverFrameworks-UserMode/Operational Event Log and forwarding those event records off of the system (for processing via a SIEM), as well as using threat hunting or some other mechanism to ensure that, if the log is disabled again, this is detected and responded to in the appropriate manner.

Note that enabling the Microsoft-Windows-DriverFrameworks-UserMode/Operational Event Log is as straightforward as setting the "Enabled" value in the HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Channels\Microsoft-Windows-DriverFrameworks-UserMode/Operational key to "1" and rebooting the system. As a tip to DFIR analysts who need to perform verification and to threat hunters, the reverse technique (setting the "Enabled" value to "0") can be used to disable normally-available Windows Event Logs.
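As a sketch of what that change looks like from the command line, the following Python builds the corresponding reg.exe arguments without running anything. The function name is hypothetical, and the sketch assumes the "Enabled" value is a REG_DWORD; actually running the command would require an elevated prompt, followed by the reboot noted above.

```python
# Parent key for Windows Event Log channel settings, as given above.
CHANNELS_KEY = r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Channels"


def channel_toggle_command(channel, enabled):
    """Build (but do not run) the reg.exe arguments to toggle a channel."""
    key = CHANNELS_KEY + "\\" + channel
    # Assumption: "Enabled" is stored as a REG_DWORD of 1 (on) or 0 (off).
    data = "1" if enabled else "0"
    return ["reg", "add", key, "/v", "Enabled",
            "/t", "REG_DWORD", "/d", data, "/f"]


enable_cmd = channel_toggle_command(
    "Microsoft-Windows-DriverFrameworks-UserMode/Operational", True)
```

The same argument list with "0" is what a threat hunter would want to watch for in process command lines, since it mirrors the technique for disabling a log.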

Additional Resources
HEFC Blog - Sunday Funday, Daily Blog #197

Friday, May 13, 2022

Understanding Data Sources and File Formats

Following on the heels of my previous post regarding file formats and sharing the link to the post on LinkedIn, I had some additional thoughts that would benefit greatly from being edited and refined via this medium, rather than being blasted out as comments to the original post.

My first thought was, is it necessary for every analyst to have deep, intimate knowledge of file formats? The answer to that is a resounding "no", because it's simply not possible, and not scalable. There are too many possible file formats for analysts to be familiar with; however, if a few knowledgeable analysts, ones who understand the value of the file format information to DFIR, CTI, etc., document the information and are available to act as resources, then that should suffice. With the format and its value documented and reviewed/updated regularly, this should act as a powerful resource.

What is necessary is that analysts be familiar enough with data sources and file formats to understand when something is amiss, or when something is not presented by the tools, and know enough to recognize that fact. From there, simple troubleshooting steps can allow the analyst to develop a thoughtful, reasoned question, and to seek guidance. 

So, why is understanding file formats important for DFIR analysts?

1. Parsing
First, you need to understand how to effectively parse the data. Again, not every analyst needs to be an expert in all file formats - that's simply impossible. But, if you're working on Windows systems, understanding file formats such as the MFT and USN change journal, and how they can be tied together, is important. In fact, it can be critical to correctly answering analysis questions. Many analysts parse these two files separately, but David and Matt's TriForce tool allowed these files (and the $LogFile) to be automatically correlated.

So, do you need to be able to write an OLE parser when so many others already exist? No, not at all. However, we should have enough of an understanding to know that certain tools allow us to parse certain types of files, albeit only to a certain level. We should also have enough understanding of the file format to recognize that not all files that follow the format are necessarily going to have the same content. I know, that sounds somewhat rudimentary, but there are a lot of new folks in the DFIR industry who don't have previous experience with older, perhaps less observed file formats. 

Having an understanding of the format also allows us to ask better questions, particularly when it comes to troubleshooting an issue with the parser. Was something missed because the parser did not address or handle something, or is it because our "understanding" of the file format is actually a "misunderstanding"?

2. The "Other" Data
Second, understanding file formats provides insight into what other data the file format may contain, such as deleted data, "slack", and metadata. Several file formats emulate file systems; MS describes OLE files as "a file system within a file", and Registry hives are described as a "hierarchical database". Each of these file formats has its own method for addressing deleted content, as well as managing "slack". Further, many file formats maintain metadata that can be used for a variety of purposes. In 2002, an MSWord document contained metadata that came back to haunt Tony Blair's administration. More recently, Registry hive files were found to contain metadata that identified hives as "dirty", prompting further actions from DFIR analysts. Understanding what metadata may be available is also valuable when that metadata is not present, as observed recently in "weaponized" LNK files delivered by Emotet threat actors, in a change of TTPs.

3. Carving
Third, "file carving" has been an important topic since the early days of forensic analysis, and analysts have been asked to recover deleted files from a number of file systems. Recovering, or "carving" deleted files can be arduous and error prone, and if you understand file formats, it may be much more fruitful to carve for file records, rather than the entire file. For example, understanding the file and record structure of Windows 2000 and XP Event Logs (.evt files) allowed records to be recovered from memory and unallocated space, where carving for entire files would yield limited results, if any. In fact, understanding the record structure allowed for complete records to be extracted from "unallocated space" within the .evt files themselves. I used this technique successfully in a case where an Event Log file header stated that there were 20 records in the file, but I was able to extract 22 complete records from the file.
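Record-based carving of this kind can be sketched briefly in Python. This is not the original tooling; it is a minimal illustration that relies on the documented .evt record layout, in which each record starts with a 4-byte length, followed by the "LfLe" signature, the record number and the time generated, and ends with a copy of the length. The function name and the 0x38 minimum size (used to skip the 0x30-byte file header, which carries the same signature) are assumptions made for the example.

```python
import struct

MAGIC = b"LfLe"
# Assumption: anything shorter than 0x38 bytes is not an event record;
# this also skips the 0x30-byte file header, which has the same magic.
MIN_RECORD_SIZE = 0x38


def carve_evt_records(buf):
    """Return (offset, length, record_number, time_generated) per record."""
    records = []
    pos = buf.find(MAGIC)
    while pos != -1:
        start = pos - 4  # the length DWORD sits just before the magic
        if start >= 0:
            (length,) = struct.unpack_from("<I", buf, start)
            end = start + length
            if length >= MIN_RECORD_SIZE and end <= len(buf):
                # A complete record repeats its length in the last 4 bytes.
                (tail,) = struct.unpack_from("<I", buf, end - 4)
                if tail == length:
                    number, generated = struct.unpack_from("<II", buf, start + 8)
                    records.append((start, length, number, generated))
        pos = buf.find(MAGIC, pos + 1)
    return records
```

Because the search ignores the file header's record count entirely, it works the same way against a memory dump, unallocated space, or the "unallocated" portion of an .evt file itself. The time generated is returned as the raw 32-bit value and still needs to be converted for a timeline.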

Even today, the same holds true for other types of "records", including "records" such as Registry key and value nodes, etc. Rather than looking for file headers and then grabbing the subsequent X number of bytes, we can instead look for the smaller records. I've used this approach to extract deleted keys and values from the unallocated space within a Registry hive file, and the same technique can be used for other data sources, as well.

4. What's "In" The File
Finally, understanding the file format will help understand what should and should not be resident in the file. One example I like to look back on occurred during a PCI forensic investigation; an analyst on our team ran our process for searching for credit card numbers (CCNs) and stated in their draft report that CCNs were found "in" a Registry hive file. As this is not something we'd seen previously, this piqued our curiosity, and some of us wanted to take a closer look. It turned out that what had happened was this...the threat actor had compromised the system, and run their process for locating CCNs. At the time, the malware used would (a) dump process memory from the back office server process that managed CCN authorization and processing to a file, (b) parse the process memory dump with a 'compiled' Perl script that included 7 regex's to locate CCNs, and then (c) write the potential CCNs to a text file. The threat actor then compressed, encrypted, and exfiltrated the output file, deleting the original text file. This deleted text file then became part of unallocated space within the file system, and the sectors that comprised the file were available for reallocation. 

Later, as the Registry hive file "grew" and new data was added, sectors from the file system were added to the logical structure of the file, and some of those sectors were from the deleted text file. So, while the CCNs were found "in" the logical file structure, they were not actually part of the Registry. The CCN search process we used at the time returned the "hit" as well as the offset within the file; a visual inspection of the file via a hex editor illustrated that the CCNs were not part of the Registry structure, as they were not found to be associated with any key or value nodes.

As such, what at first looked like a new threat actor TTP was really just how the file system worked, which had a significant impact on the message that was delivered to Visa, who "ran" the PCI Council at the time.