Thursday, April 27, 2023

Program Execution

By now, I hope you've had a chance to read and consider the posts I've written discussing the need for validation of findings (third one here). Part of the reason for this series was the pervasive over-reliance on single artifacts as a source of findings, something I and others have seen within the community over the past 2+ decades. One of the most often repeated examples of this is relying on ShimCache or AmCache artifacts as evidence of program execution.

ShimCache
ShimCache, or AppCompatCache (the name of the Registry value where the data is found), is often looked to as evidence of program execution, when really what it demonstrates is that the file was on the system.

From this blog post from Mandiant:

It is important to understand there may be entries in the Shimcache that were not actually executed.

There you go. That's from 2015. And this is why we need to incorporate artifacts such as the ShimCache into an overall constellation, rather than viewing artifacts such as these in isolation. This 13Cubed video provides a clear explanation regarding the various aspects of the ShimCache artifact as it relates to Windows 10; note that the title of the video includes "the most misunderstood artifact".

AmCache
AmCache is another one of those artifacts that is often offered up as "evidence of program execution", as seen in this LinkedIn post. However, the first URL referenced in that post belies the claim that this artifact is "evidence of program execution", as do other statements in the post (i.e., that AmCache is "populated after system shutdown"). From the blog post:

During these tests, it was found that the Amcache hive may have artifacts for executables that weren’t executed at all.

A bit more extensive treatment of the AmCache artifact can be found here. While you may look at the PDF and think, "TL;DR", the short version is that an entry in the AmCache does not explicitly mean, by itself, that the file was executed.

The point is that research demonstrates that, much like the ShimCache artifact, we cannot simply look at an entry and state, "oh, that is evidence of program execution". Even if you don't want to take the time to read and digest either the blog post or the PDF, simply understand that by itself, an AmCache entry does not demonstrate evidence of program execution.

So, again...let's all agree to stop looking just to ShimCache or just to AmCache as evidence of program execution, and instead look to multiple data sources and to artifact constellations to establish whether a program was executed or not.

For some insight as to how ShimCache and AmCache can be used together, check out this blog post from WithSecure.

Keep in mind that even when combining these two artifacts, it still doesn't provide clear indications that the identified executable was launched, and successfully executed. We need to seek other artifacts (Windows Event Log, Registry, etc.) to determine this aspect of the executable.

PCA
Earlier this year, AboutDFIR.com published a blog post regarding a new artifact (new to Windows 11) that appears to demonstrate evidence of program execution. Much like other artifacts (see above), this one has nuances or conditions, in that you cannot look to it to demonstrate execution of all programs; rather, this one seems to apply to either GUI programs, or CLI programs launched via a GUI. This is important to remember, whether you see an application of interest listed in one of the artifacts, or you don't...context matters.

The blog post provides insight into the artifacts, as well as images of the artifacts, and samples you can download and examine yourself. This YouTube video mentions another associated artifact; specifically, Windows Event Log records of interest. Adding the artifacts to a timeline is a pretty trivial exercise; the text-based artifacts are easy to script, and the process for adding Windows Event Logs to a timeline is something that already exists. 

Monday, April 24, 2023

New Events Ripper Plugins

I recently released four new Events Ripper plugins: mssql.pl, scm7000.pl, scm7024.pl, and apppopup26.pl.

The mssql.pl plugin primarily looks for MS SQL failed login events in the Application Event Log. I'd engaged in a response where we were able to validate the failed login attempts first in the MS SQL error logs, but then I learned that the events are also listed in the Windows Event Log, specifically the Application Event Log, and I wanted to provide that insight to the analyst.

The plugin lists the usernames attempted and the frequency of each, as well as the source IP addresses of the login attempts and their frequency. In one instance, we saw almost 35000 failed login attempts, from 4 public IP addresses, three of which were from the same class C subnet. This not only tells us a great deal about the endpoint itself, but also provides significant information that the analyst can use immediately, as well as leverage as pivot points into the timeline. The plugin does not yet list successful MS SQL logins because, by default, that data isn't recorded, and I haven't actually seen such a record.

The plugin also looks for event records indicating settings changes, and lists the settings that changed. Of specific interest is the use of the xp_cmdshell stored procedure. 

So, why does this matter? Not long ago, AhnLab published an article stating that they'd observed attacks against MS SQL servers resulting in the deployment of Trigona ransomware.

The scm7000.pl plugin locates "Service Control Manager/7000" event records, indicating that a Windows service failed to start. This is extremely important when it comes to validation of findings; just because something (i.e., something malicious) is listed as a Windows service does not mean that it launches and runs every time the endpoint is restarted. This is just as important to understand, alongside Windows Error Reporting events, AV events, application crash events, etc. This is why we cannot treat individual events or artifacts in isolation; events are in reality composite objects, and provide (and benefit from) context from "nearby" events.

The scm7024.pl plugin looks for "Service Control Manager/7024" records in the System Event Log, which indicate that a service terminated.

The apppopup26.pl plugin looks for "Application Popup/26" event records in the Application Event Log, and lists the affected applications, providing quick access to pivot points for analysis. If an application of interest to your investigation is listed, the simplest thing to do is pivot into the timeline to see what other events occurred "near" the event in question. Similar to other plugins, this one can provide indications of applications that may have been on the system at one point, and may have been removed.

Events Ripper has so far proven to be an extremely powerful and valuable tool, at least to me. I "see" something, document it, add context, analysis tips, reference, etc., and it becomes part of an automated process. Sharing these plugins means that other analysts can benefit from my experiences, without having to have ever seen these events before.

The tool is described here, with usage information available here, as well as via the command line.

On Validation, pt III

From the first two articles (here, and here) on this topic arises the obvious question...so what? Not validating findings has worked well for many, to the point that the lack of validation is not recognized. After all, who notices that findings were not verified? The peer review process? The manager? The customer? The sheer pervasiveness of training materials and processes that focus solely on single artifacts in isolation should give us a clear understanding that validating findings is not a common practice. That is, if the need for validation is not pervasive in our industry literature, and if someone isn't asking the question, "...but how do you know?", then what leads us to assume that validation is part of what we do?

Consider a statement often seen in ransomware investigation/response reports up until about November 2019; that statement was some version of "...no evidence of data exfiltration was observed...". However, did anyone ask, "...what did you look at?" Was this finding (i.e., "...no evidence of...") validated by examining data sources that would definitely indicate data exfiltration, such as web server logs, or the BITS Client Event Log? Or how about indirect sources, such as unusual processes making outbound network connections? Understanding how findings were validated is not about assigning blame; rather, it's about truly understanding the efficacy of controls, as well as risk. If findings such as "...data was not exfiltrated..." are not validated, what happens when we find out later that it was? More importantly, if you don't understand what was examined, how can you address issues to ensure that these findings can be validated in the future?

When we ask the question, "...how do you know?", the next question might be, "...what is the cost of validation?" And at the same time, we have to consider, "...what is the cost of not validating findings?"

The Cost of Validation
In the previous blog posts, I presented "case studies" or examples of things that should be considered in order to validate findings, particularly in the second article. When considering the 'cost' of validation, what we're asking is, why aren't these steps performed, and what's preventing the analyst from taking the steps necessary to validate the findings?

For example, why would an analyst see a Run key value and not take the steps to validate that it actually executed, including determining if that Run key value was disabled? Or parse the Shell-Core Event Log and perhaps see how many times it may have executed? Or parse the Application Event Log to determine if an attempt to execute the program pointed to resulted in an application crash? In short, why simply state that program execution occurred based on nothing more than observing the Run key value contents? 

Is it because taking those steps is "too expensive" in terms of time or effort, and would negatively impact SLAs, either explicit or self-inflicted? Does it take too long to do so, so much so that the ticket or report would not be issued in what's considered a "timely" manner?

Could you issue the ticket or report in order to meet SLAs, make every attempt to validate your findings, and then issue an updated ticket when you have the information you need?

The Cost of Not Validating
In our industry, an analyst producing a ticket or report based on their analysis is very often well abstracted from the final effects, based on decisions made and resources deployed due to their findings. What this means is that whether in an internal/FTE or consulting role, the SOC or DFIR analyst may not ever know the final disposition of an incident and how that was impacted by their findings. That analyst will likely never see the meeting where someone decides either to do nothing, or to deploy a significant staff presence over a holiday weekend.

Let's consider case study #1 again, the PCI case referenced in the first post. Given that it was a PCI case, it's likely that the bank notified the merchant that they were identified as part of a common point of purchase (CPP) investigation, and required a PCI forensic investigation. The analyst reported their findings, identifying the "window of compromise" as four years, rather than the three weeks it should have been. Many merchants have an idea of the number of transactions they send to the brands on a regular basis...for smaller merchants, it may be a month, and for larger merchants, a week. They also have a sense of the "rhythm" of credit card transactions; some merchants have more transactions during the week and fewer on the weekends. The point is that when the PCI Council decides on a fine, the "window of compromise" is taken into account.

During another incident in the financial sector, a false positive was not validated, and was reported as a true positive. This led to the domain controller being isolated, which ultimately triggered a regulatory investigation.

Consider this...what happens when you tell a customer, "OMGZ!! You have this APT Umpty-Fratz malware running as a Windows service on your domain controller!!", only to later find out that every time the endpoint is restarted, the service failed to start (based on "Service Control Manager/7000" events, or Windows Error Reporting events, application crashes, etc.)? The first message to go out sounds really, REALLY bad, but the validated finding says, "yes, you were compromised, and yes, you do need a DFIR investigation to determine the root cause, but for the moment, it doesn't appear that the persistence mechanism worked."

Conclusion
So, what's the deal? Are you validating findings? What say you?

Sunday, April 16, 2023

On Validation, pt II

My first post on this topic didn't result in a great deal of engagement, but that's okay. I wrote the first post with part II already loaded in the chamber, and I'm going to continue with this topic because, IMHO, it's immensely important. 

I've seen, more times than I care to count, findings and reports going out the door without validation. I saw an analyst declare attribution in the customer's parking lot, as the team was going on-site, only to be proven wrong, with the customer opting to continue the response with another team. Engagements such as this are costly to the consulting team through brand damage and lost revenue, as well as costly to the impacted organization, through delays and additional expenses to reach containment and remediation, all while a threat actor is active on their network.

When I sat down to write the first post, I had a couple more case studies lined up, so here they are...

Case Study #3
Analysts were investigating incidents within an organization, and as part of the response, they were collecting memory dumps from Windows endpoints. They had some information going into the investigations regarding C2 IP addresses, based on work done by other analysts as part of the escalation process, as well as from intel sources and open reporting, so they ran ASCII string searches for the IP addresses against the raw memory dumps. Not getting any hits, they declared in the tickets that there was no evidence of C2 connections.

What was missing from this was the fact that IP addresses are not employed by the operating system and applications as ASCII strings. Yes, you may see an IP address in a string that starts with "HTTP://" or "HTTPS://", but by the time the operating system translates and ingests the IP address for use, it's converted to 4 bytes, and as part of a structure. Tools like Volatility provide the capability to search for certain types of structures that include IP addresses, and bulk_extractor searches for other types of structures, with the end result being a *.pcap file.
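As a quick illustration of the difference, here's a minimal sketch (the address itself is made up) showing the same IP address as the ASCII text an analyst might search for, versus the 4 bytes the operating system actually stores within structures:

```powershell
# The same IP address as ASCII text versus the 4 bytes stored in network structures
$ip = [System.Net.IPAddress]::Parse('192.168.10.25')

# What an ASCII string search is looking for: 13 bytes of text
[System.Text.Encoding]::ASCII.GetBytes('192.168.10.25').Length

# What actually ends up in memory structures: 4 bytes
[BitConverter]::ToString($ip.GetAddressBytes())    # C0-A8-0A-19
```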

In this case, as is often the case, analyst findings are part of an overall corporate-wide process, one that includes further, follow-on findings such as "control efficacy", identifying how effectively the various controls and solutions within the security tech stack prevent, detect, or respond to incidents. Simply stating in the ticket that "no evidence of communication with the C2 IP address was found" is potentially incorrect, in addition to not addressing how this was determined. If no evidence of communications from the endpoint was found, then is there any reason to submit a block for the IP address on the firewall? Is there any reason to investigate further to determine if a prevention or detection control failed?

In the book Investigating Windows Systems, one of the case studies involves both an image and a memory dump, where evidence of connections to an IP address was found in the memory dump that was not found in application logs within the image, using the tools mentioned above. What this demonstrates is that it's entirely possible for evidence to be found via different approaches, and that not employing the full breadth of what's available to the analyst leaves the analysis incomplete.

Case Study #4
Let's look at another simple example - as a DFIR analyst, you're examining either data collected from an endpoint, or an acquired image, and you see a Run key value that is clearly malicious; you've seen this one before in open reporting. You see the same path/file location, same file name. 

What do you report?

Do you report, "...the endpoint was infected with <malicious thing>...", or do you validate this finding? 

Do you:
- determine if the file pointed to by the value exists
- determine if the Run key value was disabled  <-- wait, what??
- review the Microsoft-Windows-Shell-Core/Operational Event Log to see if the value was processed
- review the Application Event Log, looking for crash dumps, WER or Application Popup records for the malware
- review the Security Event Log for Process Creation events (if enabled)
- review Sysmon Event Log (if available)
- review the SRUM db for indications of the malware using the network
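A minimal sketch of a few of those checks, run against a live endpoint (adjust accordingly for an acquired image or extracted hives); the Shell-Core event IDs shown are as commonly documented for Run key processing, so validate them against your own test data:

```powershell
# What's actually listed in the per-user Run key?
Get-ItemProperty 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Run'

# Did the shell actually process the value? (Microsoft-Windows-Shell-Core/Operational)
Get-WinEvent -FilterHashtable @{ LogName = 'Microsoft-Windows-Shell-Core/Operational'; Id = 9707, 9708 } -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message

# Any crashes, WER records, or Application Popup events for the program pointed to?
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; Id = 1000, 1001, 26 } -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, ProviderName, Message
```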

If not, why? Is it too much of a manual process to do so? Can the playbook not be automated through the means or suite you have available, or via some other means?

But Wait, There's More...
During my time as a DFIR analyst, I've seen command lines used to create Windows services, followed by the "Service Control Manager/7045" record in the System Event Log indicating that a new service was installed. I've also seen those immediately followed by a "Service Control Manager/7009" or "Service Control Manager/7011" record, indicating that the service failed to start, rather than the "Service Control Manager/7036" record you might expect. Something else we need to look for, going beyond simply "a Windows service was installed", is indications of Windows Error Reporting events related to the image executable, application popups, or application crashes.

I've seen malware placed on systems that was detected by AV, but the AV was configured to "take no action" (per AV log messages), so the malware executed successfully. We were able to observe this within the acquired image by validating the impacts on the file system, Registry, Windows Event Log, etc.

I've seen threat actors push malware to multiple systems; in one instance, the threat actor pushed their malware to six systems, but it only successfully executed on four of those systems. On the other two, the Application Event Log contained Windows Error Reporting records indicating that there was an issue with the malware. Further examination failed to reveal the other impacts of the malware that had been observed on the four systems that had been successfully infected.

I worked a PCI case once where the malware placed on the system by the threat actor was detected and quarantined by AV within the first few hours it was on the system, and the threat actor did not return to the system for six weeks. It happened that those six weeks fell over the Thanksgiving and Christmas holidays, during a time of peak purchasing. The threat actor returned after Christmas, and placed a new malware executable on the system, one that was not detected by AV, and the incident was detected a week later. In the report, I made it clear that while the threat actor had access to the system, the malware itself was not running and collecting credit card numbers during those six weeks.

Conclusion
In my previous post, I mentioned that Joe Slowik referred to indicators/artifacts as 'composite objects', which is something that, as an industry, we need to understand and embrace. We cannot view artifacts in isolation, but rather we need to consider their nature, which includes both being composite objects, as well as their place within a constellation. We need to truly embrace the significance of an IP address, a Run key value, or any other artifact when conducting and reporting on analysis.

Friday, April 07, 2023

Deriving Value From Open Reporting

There's a good bit of open reporting available online these days, including (but not limited to) the annual reports that tend to be published around this time of year. All of this open reporting amounts to a veritable treasure trove of information, either directly or indirectly, that can be leveraged by SOC and DFIR analysts, as well as detection engineers, to extend protections, as well as detection and response capabilities. 

Sometimes, open reporting will reference incident response activities, and then focus solely on malware reverse engineering. In these cases, information about what would be observed on the endpoint needs to be discerned through indirect means. However, other open reporting, particularly what's available from TheDFIRReport, is much more comprehensive and provides much clearer information regarding the impact of the incident and the threat actor's activities on the endpoint, making it much easier for SOC and DFIR analysts to pursue investigations.

Let's take a look at some of what's shared in a recent write-up of a ransomware incident that started with a "malicious" ISO file. Right away, we get the initial access vector from the title of the write-up! 

Before we jump in, though, we're not going to run through the entire article; the folks at TheDFIRReport have done a fantastic job of documenting what they saw six ways to Sunday, and there's really no need to run through everything in the article! Also, this is not a criticism, nor a critique, and should not be taken as such. Instead, what I'm going to do here is simply expand a bit on a couple of points of the article, nothing more. What I hope you take away from this is that there's a good bit of value within write-ups such as this one, value beyond just the words on paper.

The incident described in the article started with a phishing email, delivering a ZIP archive that contained an ISO file, which in turn contained an LNK file. There's a lot to unravel, just at this point. First off, the email attachment (by default) will have the MOTW attached to it, and MOTW propagation to the ISO file within the archive will depend upon the archival tool used to open it.

Once the archive is opened, the user is presented with the ISO file, and by default, Windows systems allow the user to automatically mount the disk image file by double-clicking it. However, this behavior can be easily modified, for free, while still allowing users to access disk image files programmatically, particularly as part of legitimate business processes. In the referenced Huntress blog post, Dray/@Purp1eW0lf provided Powershell code that you can copy straight out of the post and execute on your system(s); users will then be prevented from automatically mounting disk image files by double-clicking them, while still being able to access the files programmatically, such as mounting VHD files via the Disk Manager.

Next, Microsoft issued a patch in Nov 2022 that enables MOTW propagation inside mounted disk image files; had the system in this incident been patched, the user would have been presented with a warning regarding launching the LNK file. The section of the article that addresses defense evasion states, "These packages are designed to evade controls such as Mark-of-the-Web restrictions." This is exactly right, and it works...if the archival tool used to open the zip file does not propagate MOTW to the ISO file, then there's nothing to be propagated from the ISO file to the embedded LNK file, even if the patch is installed.

Let's take a breather here for a second...take a knee. We're still at the initial access point of an incident that resulted in the domain-wide deployment of ransomware; we're at the desk of that one user who received the phishing email, and the malicious actions haven't been launched yet...and we've identified three points at which we could have inhibited (archiver tool, patched system) or obviated (enable programmatic disk image file access only) the rest of the attack chain. I bring this up because many times we hear how much security "costs", and yet, there's a free bit of Powershell that can be copied out of a blog post, that could have been applied to all systems and literally stopped this attack cycle that, according to the timeline, spanned 5 days, in its tracks. The "cost" of running Dray's free Powershell code versus the "cost" of an infrastructure being encrypted and ransomed...what do those scales look like to you?

Referencing the malicious ISO file, the article demonstrates how the user mounting the disk image file can be detected via the Windows Event Log, stating that the "activity can be tracked with Event 12 from Microsoft-Windows-VHDMP/Operational" Event Log. Later, in the "Execution" section of the article, they state that "Application crashes are recorded in the Windows Application event log under Event ID 1000 and 1001", as a result of...well...the application crashing. Not only can both of these events be extracted as analysis pivot points using Events Ripper, but the application crashes observed in this incident serve to make my point regarding validation, specifically with respect to analysts validating findings.

The article continues illustrating the impact of the attack chain on the endpoint, referencing several other Windows Event Log records, several of which (i.e., "Service Control Manager/7045" events) are also covered/addressed by Events Ripper.

Conclusion
Articles like this one, and others from TheDFIRReport, are extremely valuable to the community. Where a good bit of open reporting will include things like, "...hey, we had 41 Sobinokibi ransomware response engagements in the first half of the year..." but then do an in-depth RE of one sample, with NO host-based impact or artifacts mentioned, articles such as this one do a great job of laying the foundation for artifact constellations, so that analysts can validate findings, and then use that information to help develop protections, detections, and response procedures for future engagements. Sharing this kind of information makes it much easier to detect incidents like these much earlier in the attack cycle, with the goal of obviating file encryption.

Wednesday, April 05, 2023

Unraveling Rorschach

Checkpoint recently shared a write-up on some newly-discovered ransomware dubbed, "Rorschach". The write-up was pretty interesting, and had a good bit of content to unravel, so I thought I'd share the thoughts that had developed while I read and re-read the article.

From the article, the first things that jumped out at me were:

Check Point Research (CPR) and Check Point Incident Response Team (CPIRT) encountered a previously unnamed ransomware strain...

...and...

While responding to a ransomware case...

So, I'm reading this, and at this point, I'm anticipating some content around things like initial access, as well as threat actor "actions on objectives", as they recon and prepare the environment for the ransomware deployment.

However, there isn't a great deal stated in the article about how the ransomware got on the system, nor about how the threat actor gained access to the infrastructure. The article almost immediately dives into the malware execution flow, with no mention of how the system was compromised. We've seen this before; about 3 yrs ago, one IR consulting firm posted a 25-page write-up (which is no longer available) on Sobinokibi ransomware. The write-up started off by saying that during the first half of the year, the firm had responded to 41 Sobinokibi ransomware cases, and then dove into reverse engineering and analysis of one sample, without ever mentioning how the malware got on the system. As you read through Checkpoint's write-up, one of the things they point out (spoiler alert!!) is the speed of the encryption algorithm...if this is something to be concerned about, shouldn't we look to those threat actor activities that we can use to inhibit or obviate the remaining attack chain, before the ransomware is deployed?

Let's take a look at some other interesting statements from the article...

The ransomware is partly autonomous, carrying out tasks that are usually manually performed during enterprise-wide ransomware deployment...

Looking at the description of the actions performed by the ransomware executable, this is something we very often see in RaaS offerings. In June 2020, I read a write-up of a RaaS offering that included commands using "net stop" to halt 156 Windows services, taking something of a "spray-and-pray" approach; there was no advance recon that determined that those 156 services were actually running in the environment. Checkpoint's list of services the ransomware attempts to stop is much shorter, but similarly, there doesn't seem to be any indication that the list is targeted, that it's based on prior recon of the environment. In short, spray-and-pray; take the "shotgun" approach.

However, a downside of this is that while you may be able to detect it (the parent process will be cy.exe, running the "net stop" commands), by that point, it may be too late. You'd need to have a software-based response in place, with rules that state, "if these conditions are met on the endpoint, kill the process on the endpoint." Sending an alert to a SOC will be too late; by the time the alert makes it to the SOC console, files on the endpoint will already be encrypted.

The ransomware was deployed using DLL side-loading of a Cortex XDR Dump Service Tool, a signed commercial security product, a loading method which is not commonly used to load ransomware.

While I can't report seeing this used with ransomware specifically, DLL side-loading via a known good application is a technique that has been used extensively. Even going back a decade or more, I remember seeing legit Kaspersky, McAfee, and Symantec apps dropped in a ProgramData subfolder along with a malicious DLL, and launched as a Windows service, or via a Scheduled Task. The question I had at the time, and one that I still have when I see this sort of tactic used is, does anyone notice the legit program? What I mean is, when I've heard an analyst say that they found PlugX launched via DLL side-loading using a legit Kaspersky app, I've asked, "...were Kaspersky products used in the environment?" Most often, this doesn't seem to be a question that's asked. In the case of the Rorschach ransomware, were Palo Alto software products common in the environment, or was this Cortex tool completely new to the environment? Could something like this be used as a preventive or detective technique? After all, if a threat actor takes a tailored approach to the legit application used, deploying something that is common in the target environment and vulnerable to DLL side-loading, this would indicate a heightened level of situational awareness, rather than just "...I'll use this because I know it works."

At one point in the article, the authors state that the ransomware "...clears the event logs of the affected machines...", and then later state, "Run wevtutil.exe to clear the following Windows event logs: Application, Security, System and Windows Powershell." Okay, so we go from "clears the event logs" (implying that all Windows Event Logs are cleared) to stating that only four specific Windows Event Logs are cleared; that makes a difference. The command to enumerate and clear all Windows Event Logs is a pretty simple one-liner, whereas figure 2 in the article clearly shows four separate instances of wevtutil.exe being launched. And why those four Windows Event Logs? Is it because the threat actor knows that their activities will appear in those logs, or is it because the threat actor understands that most analysts focus on those four Windows Event Logs, based on their training and experience? Is the Powershell Event Log cleared because Powershell is used at some point during the initial access or recon/prep phases of the attack, or are these Windows Event Logs cleared simply because the malware author believes that they are the files most often sought by SOC and DFIR analysts?
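For reference, that "simple one-liner" (shown here as a hedged PowerShell sketch, and obviously something a defender would only ever run against a test system) looks quite different from four discrete executions:

```powershell
# Enumerate every Windows Event Log channel and clear each one in turn --
# contrast this with the four specific "wevtutil cl" commands seen in the article
wevtutil el | ForEach-Object { wevtutil cl "$_" }
```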

One article based on Checkpoint's analysis states, "After compromising a machine, the malware erases four event logs (Application, Security, System and Windows Powershell) to wipe its trace", implying that the malware author was aware of the traces left by the malware, and was trying to inhibit response; clearing the Windows Event Logs does not "erase" the event records, but does require extra effort on the part of the responder.

Conclusion
Even though this ransomware/file-encrypting executable was found as a result of at least one response engagement, and while the analysis of the malware itself is interesting, there's really very little (if any) information in the article regarding how the threat actor gained access to the environment, or the steps taken by the threat actor to recon and prepare the environment prior to deploying the ransomware. From the malware analysis we know a few things that are interesting and useful, but little in the way of what we can do to detect this threat actor early in their attack cycle, allowing defenders to prevent, or detect and respond to, the attack.

Monday, April 03, 2023

On Validation

I've struggled with the concept of "validation" for some time; not the concept in general, but as it applies specifically to SOC and DFIR analysis. I've got a background that includes technical troubleshooting, so "validation" of findings, or the idea of "do you know what you know, or are you just guessing", has been part of my thought processes going back for about...wow...40 years.

Here's an example...when setting up communications during Team Spirit '91 (military exercises in South Korea), my unit had a TA-938 "hot line" with another unit. This is exactly what it sounds like...it was a direct line to that other unit, and if one end was picked up, the other end would automatically ring. Yes, a "Bat phone". Just like that. Late one evening, I was in the "SOC" (our tent with all of the communications equipment) and we got a call that the hot line wasn't working. We checked connections, checked and replaced the batteries in the phone (the TA-938 phones took 2 D cell batteries, both facing the same direction), etc. There were assumptions and accusations thrown about as to why the phone wasn't working, as my team and I worked through the troubleshooting process. We didn't work on assumption; instead, we checked, rechecked, and validated everything. In the end, we found nothing wrong with the equipment on our end; however, the following day, we did find out what the issue was - at the other end, there was only one Marine in the tent, and that person had left the tent for a smoke break during the time of the attempted calls.

We could have just said, "oh, it's the batteries...", and replaced them...and we'd have had the same issue all over again. Or, we could have just stated, "...the equipment on the other end was faulty/broken...", and we would not have made a friend of the maintenance chief from that unit. There were a lot of assumptions we could have made, conclusions we could have jumped to...and we'd have been wrong. We could have stated findings that were trusted, and resulted in decisions being made, assets and resources being allocated, etc., all for the wrong reasons. The end result is that my team and I (especially me, as the officer) would have lost credibility, and the trust and confidence of our fellow team members and our commanding officer. As it was, validating our findings led to the right decisions being made, which were again validated during the exercise after action meetings.

Okay, so jump forward 32 years to present day...how does this idea of "validation" apply to SOC and DFIR analysis? I mean, this seems like such an obvious thing, right? Of course we validate our findings...but do we, really?

Case Study #1
A while back, I attended a conference during which one of the speakers walked through a PCI investigation they'd worked on. As the speaker walked through their presentation, they talked about how they'd used a single artifact, a ShimCache entry for the malware, to demonstrate program execution. This single artifact was used as the basis of the finding that the malware had been on the system for four years.

For those readers not familiar with PCI forensic investigations, the PCI Council specifies a report format and "dashboard", where the important elements of the report are laid out in a table at the top of the report. One of those elements is "window of compromise", or the time between the original infection and when the breach was identified and remediated. Many merchants track the number of credit card transactions they process on a regular basis, including not only during periods of "regular" spending habits, but also off-peak and peak/holiday seasons, and as a result, the "window of compromise" can give the merchant, the bank, and the brand an approximate number of potentially compromised credit card numbers. As you'd imagine, given any average, the number of compromised credit card numbers would be much greater over a four year span than it would for, say, a three week "window of compromise". 

As you'd expect, analysts submitting reports rarely, if ever, find out the results of their work. I was a PCI forensic analyst for about three and a half years, and neither I nor any of my teammates (that I'm aware of) heard what happened to a merchant after we submitted our reports. Even so, I cannot imagine that a report with a "window of compromise" of four years was entirely favorable.

But that begs the question - was the "window of compromise" really four years? Did the analyst validate their finding using multiple data sources? Something I've seen multiple times is that malware is written to the file system, and then "time stomped", often using time stamps retrieved from a native system file. This way, the $STANDARD_INFORMATION attribute time stamps from the $MFT record for the file appear to indicate that the file is "long lived", and has existed on the system for quite some time. This time stomping occurs before the Application Compatibility functionality of the Windows operating system creates an entry for the file, and the last modification time that's recorded for the entry is the one that's "time stomped". As a result, a breach that occurred in May 2013 and was discovered three weeks later ends up having the malware itself being reported as placed on the system in 2009. What impact this had, or might have had on a merchant, is something that we'll never know.

Misinterpreting ShimCache entries has apparently been a time-honored tradition within the DFIR community. For a brief walk-through (with reference links) of ShimCache artifacts, check out this blog post.

Case Study #2
In the spring of 2021, analysts were reporting, based solely on EDR telemetry, that within their infrastructure threat actors were using the Powershell Set-MpPreference cmdlet to "disable Windows Defender". This organization, like many others, was tracking such things as control efficacy (the effectiveness of controls) in order to make decisions regarding actions to take, and where and how to allocate resources. However, these analysts were not validating their findings; they were not checking the endpoints themselves to determine if Windows Defender had, in fact, been disabled, and if the threat actor's attempts had actually impacted the endpoints. As it turns out, that organization had a policy, at the time, of disabling Windows Defender at installation, as they had chosen another option for their security stack. As such, stating in tickets that threat actors were disabling Windows Defender, without validating these findings, led to quite a few questions, and impacted the credibility of the analysts.

Artifacts As Composite Objects
Joe Slowik spoke at RSA in 2022, describing indicators, or technical observables, as "composite objects". This is an important concept in DFIR and SOC analysis, as well, and not just in CTI. We cannot base our findings on a single artifact, treating it as a discrete, atomic indicator, such as an IP address just being a location, or tied to a system, or a ShimCache entry denoting time of execution. We cannot view a process command line within EDR telemetry, by itself, as evidence of program execution. Rather, we need to recognize that artifacts are, in fact, composite objects; in his talk, Joe references Mandiant's definition of indicators of compromise, which can help us understand and visualize this concept. 

Composite objects are made up of multiple elements. An IP address is not just a location, as the IP address is an observable with context. Where was the IP address observed, when was it used, and how was it used? Was it the source of an RDP, or a type 3 login? If the IP address was the source of a successful login, what was the username used? Was the IP address the source of a connection seen in web server or VPN logs? Is it the C2 address? 

If we consider a ShimCache entry, we have to remember that (a) the entry itself does NOT explicitly demonstrate program execution, and that (b) the time stamp is mutable. That is, what we see could have been modified before we saw it. For example, we often see analysts hold up a ShimCache entry as evidence of program execution, often as the sole indicator. We have to understand and remember that the time stamp associated with a ShimCache entry is the last modification time for the entry, taken from the $STANDARD_INFORMATION attribute within the MFT. I've seen several instances where the file is placed on the system and then time stomped (the time stamp is easily mutable) before the entry was added to the Application Compatibility database. This is all in addition to understanding that an entry in the ShimCache does NOT mean that the file was executed. Note that the same is true for AmCache entries, as well.

We can validate indicators of compromise by including them in constellations, including them alongside other associated indicators, as doing so increases fidelity and brings valuable context to our analysis. We see this illustrated when performing searches for PCI data within acquired images; if you just search for a string of 16 characters starting with "4", you're going to get a LOT of results. If you look for strings of characters based on a bank ID number (BIN), length of the string, and if it passes the Luhn check, you're still going to get a lot of results, but not as many. If you also search for the characteristics associated with track 1 and track 2 data, your search results are going to be a smaller set, but with much higher fidelity because we've added layers of context. 
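To make that layering concrete, here's a minimal sketch of just the Luhn check portion of such a search; the BIN, length, and track 1/track 2 formatting checks would be layered on top of it:

```powershell
# Returns $true if the digit string passes the Luhn check -- one layer of context
# on top of BIN, length, and track 1/track 2 formatting checks
function Test-Luhn {
    param([string]$Number)
    $digits = ($Number -replace '\D', '').ToCharArray() | ForEach-Object { [int][string]$_ }
    [array]::Reverse($digits)
    $sum = 0
    for ($i = 0; $i -lt $digits.Count; $i++) {
        $d = $digits[$i]
        if ($i % 2 -eq 1) { $d *= 2; if ($d -gt 9) { $d -= 9 } }   # double every second digit
        $sum += $d
    }
    return ($sum % 10) -eq 0
}

Test-Luhn '4111111111111111'   # $true for this well-known test number
```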

Cost
So the question becomes, what is the cost of validating something versus not validating it? What is the impact or result of either? This seems on the surface like it's a silly question, maybe even a trick question. I mean, it looks that way when I read back over the question after typing it in, but then I think back to all the times I've seen when something hasn't been validated, and I have to wonder, what prevented the analyst from validating their finding, rather than simply basing their finding on a single artifact, out of context?

Let's look at a simple example...we receive an alert that a program executed, based on SIEM data or EDR telemetry. This alert can be based on elements of the command line, process parentage, or a combination thereof. Let's say that based on a number of factors and reliable sources, we believe that the command line is associated with malicious activity.

What do you report?

Do you report that this malicious thing executed, or do you investigate further to see if the malicious thing really did execute, and executed successfully? How would we go about investigating this, and what data sources would we look to? 

As you're thinking about this, as you're walking through this exercise, something I'd like you to keep in mind is that question, what would prevent you from actually examining those data sources you identify? Is there some "cost" (effort, time, other resources) that prevent you from doing so?

Saturday, March 25, 2023

Password Hash Leakage

If you've been in the security community for even a brief time, or you've taken training associated with a certification in this field, you've likely encountered the concept of password hashes. The "Reader's Digest" version of password hashes is that passwords are subject to a one-way cryptographic algorithm for storage, and that same algorithm is applied to passwords that are input, and a comparison is made for authentication. The basic idea is that the password is not stored in its original form. 

Now, we 'see' password hashes being collected by threat actors all the time; grab a copy of the AD database, or of Registry hives from an endpoint. Or, why bother with hashes, when you can use NPPSpy or enable WDigest?

Or, if you wanted to maintain unauthenticated persistence, you could enable RDP and Sticky Keys.

Okay, so neither of those last two instances involves password hashes, so what if that's what you were specifically interested in? What if you wanted to get password hashes, or continue to receive password hashes, even across password resets? There are more than a few ways to go about doing this, all of which take advantage of available "functionality"; all you have to do is set up a file or document to attempt to connect to a remote, threat actor-controlled resource.

Collecting hashes is nothing new...check out this InSecure.org article from 1997. Further, hashes can be leaked via an interesting variety of routes and applications; take a look at this Securify article from 2018. Also, consider the approach presented in ACE Responder's tweet regarding modifying Remote Desktop Client .rdp files.

One means of enabling hash leaks across password resets is to modify the iconfilename field in specifically placed LNK/Windows shortcut files, which is similar to what is described in this article, except that you set the IconLocation parameter to point to a threat actor-controlled resource. There's even a free framework for creating shortcuts called "LNKBomb" available online.

Outlook has been a target for NTLM hash leakage attacks; consider this Red Team Notes article from 2018. More recently, Microsoft published this blog article explaining CVE-2023-23397, and how to investigate attempts to exploit the vulnerability. This PwnDefend article shares some thoughts as to persisting hash collection via the Registry, enabling the "long game".

So, What?
Okay, so what's the big deal? Why is this something that you even need to be concerned about?

Well, there's been a great deal of discussion regarding cyber crime, and in particular, the ransomware economy for some time now. This is NOT a euphemism; cyber crime is an economy focused on money. In 2016, the Samas ransomware actors were conducting their own operations, cradle to grave; at the time, they targeted Java-based JBoss CMS systems as their initial access points. Over the years, an economy has developed around initial access, to the point where there are specialists, initial access brokers (IABs), who obtain and sell access to systems and infrastructures. Once initial access is achieved, they will determine what access is available, to which organization, and it would behoove them to retain access, if possible. Say they sell access, and the threat actor is "noisy", is caught, and the implant or backdoor placed by the IAB (not the initial access point itself) is "burned". NTLM leakage is a means for ensuring later, repeated access, given that one of the response and remediation recommendations is very often a global password change. If one of the routes into the infrastructure used by the IAB requires authentication, then setting up a means for receiving password hashes enables continued access.

What To Do About It
There are a number of ways to address this issue. First, block outbound communications over ports 139 and 445 (because of course you've already blocked inbound communication attempts over those ports!!), and monitor your logs for attempts to do so.
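Blocking outbound SMB/NetBIOS is typically done at the network edge, but as a hedged sketch, a host-based equivalent via the Windows Firewall might look something like this (the rule name and scoping are illustrative; tailor them to your environment before use):

```powershell
# Host-based sketch: block outbound TCP 139/445 to Internet-facing destinations
New-NetFirewallRule -DisplayName 'Block outbound SMB/NetBIOS' `
    -Direction Outbound -Protocol TCP -RemotePort 139, 445 `
    -RemoteAddress Internet -Action Block
```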

Of course, consider using some means of MFA, particularly for higher privilege access.

If your threat hunting allows for access to endpoints (rather than log sources sent to a SIEM) and file shares, searching for LNK files in specific locations and checking their iconfilename attributes is a good hunt, and something you may want to enable on a regular, repeated basis, much like a security patrol.
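A hedged sketch of what such a hunt might look like against a file share (the share path is hypothetical, and IconLocation is the property exposed by the WScript.Shell COM object for an LNK file's icon path):

```powershell
# Flag LNK files whose icon path points to a remote (UNC) resource
$shell = New-Object -ComObject WScript.Shell
Get-ChildItem -Path '\\fileserver\public' -Filter *.lnk -Recurse -ErrorAction SilentlyContinue |
    ForEach-Object {
        $icon = $shell.CreateShortcut($_.FullName).IconLocation
        if ($icon -match '^\\\\') {
            [PSCustomObject]@{ Path = $_.FullName; IconLocation = $icon }
        }
    }
```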

For SOC detections, look for means by which this activity...either enabling or using these attempts at hash leakage...might be detected.

From a DFIR perspective, my recommendation would be to develop an evidence intake process that includes automated parsing and rendering of data sources prior to presenting the information to the DFIR analyst. Think of this as a means of generating alerts, but instead of going to the SOC console, these "alerts" are enriched and decorated for the DFIR analyst. This process should include parsing of LNK files within specific locations/paths in the acquired evidence, as parsing all LNK files might not be effective, nor timely.

The "Why" Behind Tactics

Very often we'll see mention in open reporting of a threat actor's tactics, be they "new" or just what's being observed, and while we may consider how our technology stack might be used to detect these tactics, or maybe how we'd respond to an incident where we saw these tactics used, how often do we consider why the tactic was used?

To see the "why", we have to take a peek behind the curtain of detection and response, if you will.

If you so much as dip your toe into "news" within the cyber security arena, you've likely seen mention that Emotet has returned after a brief hiatus [here, here]. New tactics observed associated with the deployment of this malware include the fact that the lure document is an old-style MS Word .doc file, which presents a message instructing the user to copy the file to a 'safe' location and reopen it. The lure document itself is in excess of 500MB in size (padded with zeros), and when the macros are executed, a DLL that is similarly zero-padded to over 500MB is downloaded.

Okay, why was this approach taken? Why pad out two files to such a size, albeit with zeros? 

Well, consider this...SOC analysts are usually front-line when responding to incident alerts, and they may have a lot of ground to cover while meeting SLAs during their shift, so they aren't going to have a lot of time to invest in investigations. Their approach to dealing with the .doc or even the DLL file will be to first download them from the endpoint...if they can. That's right...does the technology they're using have limits on file sizes for download, and if so, what does it take to change that limit? Can the change be made in a timely manner such that the analyst can simply reissue the request to download the file, or does the change require some additional action? If additional action is required, it likely won't be followed up on.

Once they have the file, what are they going to do? Parse it? Not likely. Do they have the tools available, and skills for parsing and analyzing old-style/OLE format .doc files? Maybe. But it's easier to just upload the file to an automated analysis framework...if that framework doesn't have a file size limit of its own.

Oh, and remember, all of that space full of zeros means the threat actor can change the padding contents (flip a single "0" to a "1") and change the hash without impacting the functionality of the file. So...yeah.
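A quick sketch of the point; two buffers that differ by a single byte in the zero-padded region produce entirely different hashes:

```powershell
# One flipped byte in the padding is enough to change the hash entirely
$sha256 = [System.Security.Cryptography.SHA256]::Create()
$a = [byte[]]::new(1MB)            # stand-in for the zero-padded region
$b = $a.Clone()
$b[500000] = 1                     # flip a single padding byte

[BitConverter]::ToString($sha256.ComputeHash($a))
[BitConverter]::ToString($sha256.ComputeHash($b))
```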

So, what's happening here is that whether or not it's specifically intended, these tactics are targeting analysts, relying on their lack of experience, and targeting response processes within the security stack. Okay, "targeting" implies intent...let's say, "impacting" instead. You have to admit that when looking at these tactics and comparing them to your security stack, in some cases, these are the effects we're seeing, this is what we see happening when we peek behind the curtain.

Consider this report from Sentinel Labs, which mentions the use of the "C:\MS_DATA\" folder by threat actors. Now, consider the approach taken by a SOC analyst who sees this for the first time; given that some SOC analysts are remote, they'll likely turn to Google to learn about this folder, and find that the folder is used by the Microsoft Troubleshooting tool (TSSv2), and at that point, perhaps deem it "safe" or "benign". After all, how many SOCs maintain a central, searchable repository of curated, documented intrusion intel? For those that do, how many analysts on those teams turn to that repository first, every time? 

How about DFIR consulting teams? How many DFIR consulting teams have an automated process for parsing acquired data, and automatically tagging and decorating it based on intrusion intel developed from previous engagements?

In this case, an automated process could parse the MFT and automatically tag the folder with a note for analysts, with tips regarding how to validate the use of TSSv2, and maybe even tag any files found within the folder.

When seeing tactics listed in open reporting, it's not just a good idea to consider, "does my security stack detect this?", but to also think about, "what happens if we do?"

Thursday, March 16, 2023

Threat Actors Changing Tactics

I've been reading a bit lately on social media about how cyber security is "hard" and it's "expensive", and about how threat actors are becoming "increasingly sophisticated". 

The thing is, going back more than 20 yrs, in fact going back to 1997, when I left military active duty and transitioned to the private sector, I've seen something entirely different. 

On 7 Feb 2022, Microsoft announced their plans to change how the Windows platform (OS and applications) handled macros in Office files downloaded from the Internet; they were planning to block them, by default. Okay, so why is that? Well, it turns out that weaponized Office docs (Word documents, Excel spreadsheets, etc.) were popular methods for gaining access to systems. 

As it turns out, even after all of the discussion and activity around this one, single topic, weaponized documents are still in use today. In fact, March 2023 saw the return of Emotet, delivered via an older-style MS Word .doc file that was in excess of 500MB in size. This demonstrates that even with documented incidents and available protections, these attacks will still continue to work, because the necessary steps to help protect organizations are never taken. In addition to using macros in old-style MS Word documents, the actors behind the new Emotet campaigns are also including instructions to the recipient for...essentially...bypassing those protection mechanisms.

Following the Feb 2022 announcement from Microsoft, we saw some threat actors shift to using disk image files to deploy their malware, due in large part to the apparent dearth of security measures (at the time) to protect organizations from such attacks. For example, a BumbleBee campaign was observed using IMG files to help spread malware.

MS later updated Windows to ensure "mark-of-the-web" (MotW) propagation to files embedded within disk image files downloaded from the Internet, so that protection mechanisms were available for some file types, and that at least warnings would be generated for others.

We then saw a shift to the use of malicious embedded attachments in MS OneNote files, as apparently these files weren't considered "MS Office files" (wait...what??).

So, in the face of this constant shifting in and evolution of tactics, what are organizations to do to address these issues and protect themselves? 

Well, the solution for the issue of weaponized Office documents existed well prior to the Microsoft announcement in Feb 2022; in fact, MS was simply implementing it where orgs weren't doing so. And the thing is, the solution was absolutely free. Yep. Free, as in "beer". A GPO, or a simple Registry modification. That's it. 
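For illustration, here's a hedged sketch of what that Registry modification might look like for a single user and a few Office applications; the value name shown is the one Microsoft documents for "Block macros from running in Office files from the Internet", and in practice you'd push this via GPO rather than touching endpoints one at a time:

```powershell
# Per-user sketch -- in an enterprise, deploy the equivalent setting via GPO
foreach ($app in 'Word', 'Excel', 'PowerPoint') {
    $key = "HKCU:\Software\Policies\Microsoft\Office\16.0\$app\Security"
    New-Item -Path $key -Force | Out-Null
    Set-ItemProperty -Path $key -Name 'blockcontentexecutionfrominternet' -Value 1 -Type DWord
}
```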

The issue with the use of disk image files is that when received and a user double-clicks them, they're automatically mounted and the contents accessible to the user. The fix for this...disabling automatically mounting the image files when the user double-clicks them...is similarly free. With two simple Registry modifications, users are prevented from automatically mounting 4 file types - ISO, IMG, VHD, and VHDX. However, this does not prevent users from programmatically accessing these files, such as via a legitimate business process; all it does is prevent the files from being automatically mounted via double-clicking. 
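As a hedged sketch, assuming (per the Huntress post referenced later in this post) that the change amounts to adding a "ProgrammaticAccessOnly" value to the "mount" verb of the two relevant ProgIDs, which between them cover the ISO/IMG and VHD/VHDX extensions:

```powershell
# Remove double-click mounting from the Explorer context while leaving programmatic
# access (e.g., Disk Manager, Mount-DiskImage) intact -- run from an elevated prompt
foreach ($progId in 'Windows.IsoFile', 'Windows.VhdFile') {
    $key = "Registry::HKEY_CLASSES_ROOT\$progId\shell\mount"
    if (Test-Path $key) {
        New-ItemProperty -Path $key -Name 'ProgrammaticAccessOnly' -Value '' -PropertyType String -Force | Out-Null
    }
}
```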

And did I mention that it's free?

What about OneNote files? Yeah, what about them?

My point is that we very often say, "...security is too expensive..." and "...threat actors are increasing in sophistication...", but even with these changes in tactics, is either statement really true? As an incident responder, I've seen the boots-on-the-ground details of attacks over the years, and a great many of them could have been prevented, or at the very least significantly hampered, had a few simple, free modifications been made to the infrastructure.

The Huntress team recently posted an article that includes PowerShell code you can copy-paste and use immediately, and that will address all three of the situations discussed in this blog post.

Sunday, March 12, 2023

On Using Tools

I've written about using tools before in this blog, but there are times when something comes up that provokes a desire to revisit a topic, to repeat it, or to evolve and develop the thoughts around it. This is one of those posts. 

When I first released RegRipper in 2008, my intention was that once others saw the value in the tool, it would grow organically as practitioners sought to expand it. My thought was that once analysts started using it, they'd see that its real power lies in how easily it can be updated; "easily" by either developing new plugins themselves, or seeking assistance in doing so.

That was the vision, but it's not something that was ever really realized. Yes, over time, some have created their own plugins, and of those, some have shared them. However, for the most part, the "use case" behind RegRipper has been "download and RUNALLTHETHINGS", and that's pretty much it.

On my side, there are a few assumptions I've made with respect to those using RegRipper, specifically around how they were using it. One assumption has been that whoever downloaded and is using the tool has a purposeful, intentional reason for doing so; that they understand their investigative goals, and understand that there's value in using tools like RegRipper to extract information for analysis, to validate other findings and add context, and to use as pivot points into further analysis. 

Another assumption on my part is that if they don't find what they're looking for, don't find something that "helps", or don't understand what they do find, they'll ask. Ask me, ask someone else. 

And finally, I assume that when they find something that requires a plugin to be updated, or a new plugin to be written, they'll do so (copy-paste is a great way to start), or reach out to seek assistance in doing so.

Now, I'm assuming here, because it's proven impossible to engage others in the "community" in a meaningful conversation regarding tool usage, but it appears to me that most people who use tools like RegRipper assume that the author is the expert; that they've done and seen everything, that they know everything, and that they've encapsulated all of that knowledge and experience in a free tool. The thing is, I haven't found that to be the case with most tools, and it is most definitely NOT the case when it comes to RegRipper.

Why would anyone need to update RegRipper? 

Lina recently tweeted about the need for host forensics, and she's 10,000% correct! SIEMs only collect those data sources that are pointed at them, and EDR tools can only collect and alert on so much. As such, there are going to be analysis gaps, gaps that need to be filled in via host forensics. And as we've seen over time, a lot changes about various endpoint platforms (not just Windows). For example, we've been aware of the ubiquitous Run keys and how they're used for persistence; however, there are keys that can be used to disable the Run key values (Note: the keys and values can be created manually...) without modifying the Run key itself. As such, if you're checking the contents of the Run key and stating that whatever is listed in the values was executed, without verifying/validating that information, then is this correct? If you're not checking to see if the values were disabled (this can be done via reg.exe), and if you're not validating execution via the Shell-Core and Application Event Logs, then is the finding correct? I saw the value in validating findings when determining the "window of compromise" during PCI forensic exams, because the finding was used to determine any regulatory fines levied against the merchant.
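
As an illustration of that validation step, here's a minimal Python sketch that lists the current user's Run key values on a live system and checks whether any have been disabled via the StartupApproved\Run key (the mechanism behind the Task Manager "Startup" tab). The interpretation of the binary data (first byte 0x02 = enabled, 0x03 = disabled) is drawn from public research; validate it against your own test data before relying on it, and remember that this still doesn't prove execution...that's what the Event Log sources mentioned above are for.

import winreg

RUN_PATH = r"Software\Microsoft\Windows\CurrentVersion\Run"
APPROVED_PATH = r"Software\Microsoft\Windows\CurrentVersion\Explorer\StartupApproved\Run"

def read_values(hive, path):
    # enumerate all values under a key into a name -> data dict
    values = {}
    try:
        with winreg.OpenKey(hive, path) as key:
            i = 0
            while True:
                try:
                    name, data, _ = winreg.EnumValue(key, i)
                except OSError:
                    break
                values[name] = data
                i += 1
    except OSError:
        pass  # key not present
    return values

if __name__ == "__main__":
    run = read_values(winreg.HKEY_CURRENT_USER, RUN_PATH)
    approved = read_values(winreg.HKEY_CURRENT_USER, APPROVED_PATH)
    for name, cmd in run.items():
        state = "state unknown"
        if name in approved and approved[name]:
            # assumed interpretation of the StartupApproved binary data
            state = "enabled" if approved[name][0] == 0x02 else "disabled"
        print(f"{name}: {cmd} [{state}]")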

My point is that if you're running a tool and expecting it to do everything for you, then maybe there needs to be a re-examination of why the tool is being run in the first place. If you downloaded RegRipper 6 months ago and haven't updated it in any way since then, is it still providing you with the information you need? If you haven't added new plugins based on information you've been seeing during analysis, at what point does the tool cease to be of value? If you look closely at the RegRipper v3.0 distro available on Github, you'll notice that it hasn't been updated in over 2 1/2 years. I uploaded a minor update to the main engine a while back, but the plugins themselves exist as they were in August 2020. Since then, I've been developing an "internal" custom version of RegRipper, complete with MITRE ATT&CK and category mappings, Analysis Tips, etc. I've also started developing plugins that output in JSON format. However, all of these are things that I either proposed in 2019 and got zero feedback on, or that someone close to me asked about. Not a week goes by that I don't see something online, research it, and turn it into a plugin (or two, or five...).

If you're using a tool, any tool (RegRipper, plaso, etc.), do you understand its strengths and weaknesses, do you understand what it does and does not do, or do you just assume that it gives you what you need?

Sunday, February 26, 2023

Devices

This interview regarding one of the victims of the University of Idaho killings having a Bluetooth speaker in her room brings up a very important aspect of digital forensic analysis: technology we know little about is pervasive in our lives. While the interview centers on the alleged killer's smartphone, the same concept applies to Windows systems, and specifically to mobile systems such as laptops and tablets. Very often, there are remnants or artifacts left over as a result of prior activity (user interaction, connected devices, etc.) that we may not be aware of, and in more than a few instances, these artifacts may persist well beyond the deletion of the associated applications.

Something I've mentioned previously here in this blog is that where you look for indications of Bluetooth or other connections may depend upon the drivers and/or applications installed. Some laptops or tablets, for example, may come with Bluetooth chipsets and drivers, and their own control applications, while other systems may have to have an external adapter. Or...and this is a possibility...the internal chipset may have been disabled in favor of an external adapter, such as a USB-connected Bluetooth adapter. As such, we can cover a means for extracting the necessary identifying information, just as Brian did here in his blog in 2014, but that specific information may not apply to other systems. By way of example, participants in this analysis test would have found information about connected Bluetooth devices in an entirely different location. The publicly available RegRipper v3.0 includes three plugins for extracting information about Bluetooth-connected devices from the Registry, one of which is specific to certain Broadcom drivers.
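
For a live-system example, here's a minimal Python sketch that enumerates Bluetooth devices recorded by the Microsoft stack under the BTHPORT\Parameters\Devices key, which is one of the locations the RegRipper plugins mentioned above parse. As noted, systems using third-party (e.g., Broadcom) driver stacks may record this information elsewhere, so treat this as one possible location rather than the only one.

import winreg

DEVICES_PATH = r"SYSTEM\CurrentControlSet\Services\BTHPORT\Parameters\Devices"

def list_bt_devices():
    # subkeys are named for the device MAC address; each may carry a "Name" value
    devices = []
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DEVICES_PATH) as key:
            for i in range(winreg.QueryInfoKey(key)[0]):
                mac = winreg.EnumKey(key, i)
                name = ""
                try:
                    with winreg.OpenKey(key, mac) as dev:
                        raw, _ = winreg.QueryValueEx(dev, "Name")
                        # "Name" is typically REG_BINARY; decode if so
                        name = raw.decode("utf-8", errors="replace").rstrip("\x00") if isinstance(raw, bytes) else str(raw)
                except OSError:
                    pass
                devices.append((mac, name))
    except OSError:
        pass  # key not present (no MS Bluetooth stack, or no paired devices)
    return devices

if __name__ == "__main__":
    for mac, name in list_bt_devices():
        print(f"{mac}  {name or '(no name recorded)'}")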

WiFi
Okay, not what we'd specifically consider "devices", but WiFi connections have long been valuable in determining the location of a system at a point in time, often referred to as geolocation. Windows systems maintain a good deal of information about WiFi access points they've connected to, much like smartphones in the "Bluetooth" section above. We "see" this when we have the system (Windows laptop, or a smartphone) away from a WiFi access point for a period of time, and then return...once we're back within range, if the system is configured to do so, it will automatically reconnect to the access point.

While I've done research into discovering and extracting this information from the endpoint, others have used it to determine the location of systems. I've talked to analysts who've been able to demonstrate that an employee of their company met with a competitor prior to leaving the company and joining the competitor's team. In a few instances, those orgs had DLP software installed on the endpoint, and were able to show that during that time, files were copied to USB devices, or sent off of the system via a personal email account.
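
To illustrate one of the data sources involved, here's a minimal Python sketch that enumerates network profiles recorded under the NetworkList\Profiles key on a live system, decoding the last-connected timestamp from its SYSTEMTIME-style binary layout. The decoding is based on public research; validate it against known data before relying on it.

import struct
import winreg

PROFILES_PATH = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\NetworkList\Profiles"

def decode_systemtime(raw):
    # eight little-endian WORDs: year, month, day-of-week, day, hour, minute, second, milliseconds
    y, mo, _, d, h, mi, s, _ = struct.unpack("<8H", raw[:16])
    return f"{y:04d}-{mo:02d}-{d:02d} {h:02d}:{mi:02d}:{s:02d}"

def list_profiles():
    results = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PROFILES_PATH) as key:
        for i in range(winreg.QueryInfoKey(key)[0]):
            guid = winreg.EnumKey(key, i)
            try:
                with winreg.OpenKey(key, guid) as prof:
                    name, _ = winreg.QueryValueEx(prof, "ProfileName")
                    last, _ = winreg.QueryValueEx(prof, "DateLastConnected")
                    results.append((name, decode_systemtime(last)))
            except OSError:
                continue  # skip profiles missing the expected values
    return results

if __name__ == "__main__":
    for name, last in list_profiles():
        print(f"{name}: last connected {last}")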

USB Devices
Speaking of USB devices...

USB devices connected to Windows systems have long been of interest within the digital forensics community; in 2005, Cory Altheide and I co-authored the first peer-reviewed, published paper on the topic. Since then, there has been extensive writing on the subject. For example, Nicole Ibrahim, formerly of G-C Partners, has written about USB-connected devices and the different artifacts left by their use, based on the device type (thumb drive, external hard drive, smartphone) and the protocols used. I've even written several blog posts in the past year covering artifacts that remain as a result not of USB devices being connected to a Windows system, but of changes in Windows itself (here, and here). Over time, as Windows evolves, the artifacts left behind by different activities can change; we've even seen this between Windows 10 builds. As a result, we need to keep looking at the same things, the same activities, and ensure that our analysis process keeps up as well.
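
For example, here's a minimal Python sketch that enumerates USB storage devices recorded under the USBSTOR key on a live system. Keep in mind that this is only one of several locations touched when a device is connected; on its own, it shows that a device was connected at some point, not when, how often, or by whom.

import winreg

USBSTOR_PATH = r"SYSTEM\CurrentControlSet\Enum\USBSTOR"

def list_usb_storage():
    devices = []
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, USBSTOR_PATH) as key:
            for i in range(winreg.QueryInfoKey(key)[0]):
                device_class = winreg.EnumKey(key, i)   # e.g., Disk&Ven_...&Prod_...
                with winreg.OpenKey(key, device_class) as cls:
                    for j in range(winreg.QueryInfoKey(cls)[0]):
                        serial = winreg.EnumKey(cls, j)
                        friendly = ""
                        try:
                            with winreg.OpenKey(cls, serial) as dev:
                                friendly, _ = winreg.QueryValueEx(dev, "FriendlyName")
                        except OSError:
                            pass
                        devices.append((device_class, serial, friendly))
    except OSError:
        pass  # USBSTOR key not present
    return devices

if __name__ == "__main__":
    for cls, serial, friendly in list_usb_storage():
        print(f"{friendly or cls}  (serial: {serial})")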

To that end, Kathryn Hedley recently shared a very good article on her site, khyrenz.com. She's also shared other great content, such as what USB connections look like with no user logged into the system. While Kathryn's writing focuses specifically on USB devices, she does address the issue of validation by providing insight into additional data sources.

Saturday, February 25, 2023

Why Write?

I shared yet another post on writing recently; I say "yet another" because I've published blog posts on the topic of "writing" several times. But something I haven't really discussed is why we should write, or what we should write about.

In his book, Call Sign Chaos, Jim Mattis said, "If you haven't read hundreds of books, you are functionally illiterate, and you will be incompetent, because your personal experiences alone aren't broad enough to sustain you." While this is true for the warfighter, it is equally (and profoundly) true for other professions, and there's something else to the quote that's not as obvious: it's predicated on other professionals writing. In the book, Mattis described his reading as he moved into a theatre of operations, going back through history to learn what challenges previous commanders had faced, what they'd attempted in order to overcome those challenges, and what they'd learned.

While the focus of his book was on reading and professional development/preparation, the underlying "truth" is that someone...a previous commander, a historian, an analyst, someone...needs to write. This is what we need more of in cybersecurity...yes, there are books available, and lists available online, but what's missing is the real value: going beyond simple lists and instructions to the how and the why, and perhaps more importantly, to what was learned.

So, if you are interested in developing content, what are some things you can write about? Here are some ideas...

Book Reviews
With all of the books out there covering topics in DFIR, one of the things we rarely see is book reviews.

A book review is not a listing of the chapters and what each chapter contains.

What I mean by a book review is how you found it: was it well written and easy to follow? Was there something that could have made it better, perhaps more valuable, and if so, what was it? What impact did the contents have on your daily work? Is there something you'd like to see...perhaps a deeper explanation, more screen captures, or exercises at the end of sections or chapters? 

And, if you found something that could be improved, make clear, explicit recommendations. I've seen folks ask for "more screen captures" without saying of what, or for what reason (i.e., what the goal or impact of doing so would be). 

Conference Talks
Many times, particularly during 'conference season', we'll see messages on social media along the lines of "so-and-so is about to go on stage...", or we'll see a picture of someone on a stage with the message, "so-and-so talking about this-and-that...", but what we don't see is commentary about what was said. So we know a person is going to talk about something, or did talk about something, but we know little beyond that, like how what they said impacted the listener or attendee. This is a great way to develop and share content, and is similar to book reviews...talk about how what you heard (or read) impacted you, or impacted your approach to analysis.

General Engagement 
Speaking of social media, this is a great way to get started with the habit of writing...articulate your thoughts regarding something you see, rather than just clicking "Like", or some other button offered by the platform. 

Monday, February 20, 2023

WEVTX Event IDs

Now and again, we see online content that moves the community forward a step or several. One such article appeared on Medium recently, titled Forensic Traces of Exploiting NTDS. This article begins developing artifact constellations, and walks through the forensic analysis of different means of credential theft on an Active Directory server.

We need to see more of these sorts of "how to investigate..." articles that go beyond just saying, "...look at the <data source>...". Articles like this can be very useful because they help other analysts understand how to go about investigating these and similar issues.

The sole shortcoming of this article is that the research was clearly conducted by someone used to looking at forensic artifacts in a list; each artifact is presented individually, isolated from others, rather than as part of an artifact constellation. Analysts who come from a background such as this tend to approach analysis in this way, because this is how they were taught. 

Further, about halfway through the article we see a reference to "Event ID 400"; the subsequent images show the event source as "Kernel-PNP", but the source isn't specified in the text itself. If you Google "event ID 400", you find event sources such as PowerShell, Microsoft-Windows-TerminalServices-Gateway, Performance Diagnostics, and Veritas Enterprise Vault, and that's just on the first page.

About a third of the way down the article (sorry, the images aren't numbered for easy reference) there's an image with the caption "Event ID 4688". The important thing readers need to understand about this image is that these records do not appear in the Security Event Log by default. For these events to appear, Success auditing for Process Tracking (process creation) needs to be enabled, and there's an additional step, a Registry modification, that needs to be made in order for full command lines to appear in the event records. This is important for analysts to understand, so that they do not expect the records to be present by default. Also, you can parse the Security Registry hive using the RegRipper auditpol.pl plugin to determine the audit configuration for the system, validating what you should expect to see in the Security Event Log.
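
For a quick live-system check of the command line piece, here's a minimal Python sketch that looks for the documented ProcessCreationIncludeCmdLine_Enabled value; the audit policy itself (Success auditing for process creation) still has to be enabled separately, and for an offline Security hive, the auditpol.pl plugin mentioned above is one way to verify that configuration.

import winreg

CMDLINE_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\Audit"

def cmdline_logging_enabled():
    # checks whether full command lines will be included in event ID 4688 records
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CMDLINE_KEY) as key:
            value, _ = winreg.QueryValueEx(key, "ProcessCreationIncludeCmdLine_Enabled")
            return value == 1
    except OSError:
        return False   # value not present; command lines will not be recorded

if __name__ == "__main__":
    print("4688 command-line logging:", "enabled" if cmdline_logging_enabled() else "not enabled")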

When examining the Windows Event Log as a data source during an investigation, what's actually available in the logs depends upon the version of Windows, the installed applications, the configuration of the Security Event Log, and so on. When reading articles such as this online...while profoundly useful...don't assume you're going to see the same log entries on the systems you engage with and examine.

Monday, February 13, 2023

Training and CTFs

The military has a couple of adages...one, "you fight like you train", and another, "the more you sweat in peace, the less you bleed in war." The idea behind these adages is that progressive, realistic training prepares you for the job at hand, which is often performed under "other than optimal" conditions. You start by learning in the classroom, then in the field, and then under austere conditions, so that when you do have to perform the function(s) or task(s) under similar conditions, you're prepared and it's not a surprise. This is also true of law enforcement, as well as other roles and functions. Given the pervasiveness of this style of training and familiarization, I think it's safe to say that it's a highly successful approach.

The way DFIR CTFs, while fun, are being constructed and presented does those in the field a disservice, as it does not encourage analysts to train the way they should fight. In fact, it tends to cement and even encourage bad habits.

Let me say right now that I understand the drive behind CTF challenges, particularly those in the DFIR field. I understand the desire to make something available for others to use to practice, and perhaps rate themselves against, and I do appreciate the work that goes into such things. Honestly, I do, because I know that it isn't easy. 

Let me also say that I understand why CTFs are provided in this manner; it's because this is how many analysts are "taught", and it's because this is how other CTFs are presented. I also understand that presenting challenges in this manner provides for an objective measure against which to score individual participants; the time it takes to complete the challenge, the time between answering subsequent questions, and the number of correct responses are all objective measures that can be handled by a computer program, and really provide little wiggle room. So, we have analysts who "come up" in the industry, taking courses and participating in CTFs that are all structured in a similar manner, and they go on to create their own CTFs, based on that same structure.

However, the issue remains: the way DFIR CTFs are presented, they encourage something much less than what we should be doing IRL. We continue to teach analysts that reviewing individual artifacts in isolation is "sufficient", and there's no direction or emphasis on concepts such as validation, toolmarks, or artifact constellations. In addition, there's no development of incident intelligence to be shared with others, both in the DFIR field and adjacent to it (SOC, detection engineering, CTI, etc.).

Hassan recently posted regarding CTFs and "deliberate practice"; while I agree with his thoughts in principle, CTFs as they're currently built tend to fall short. Yes, CTFs are great because they offer the opportunity to practice, but they fall short in a couple of areas. One in particular is that they aren't necessarily "deliberate practice"; perhaps a better way of saying it is that they're "deliberate practice" in the wrong areas, because we're telling those who participate in these challenges that answering obscure questions, in a manner that isolates that information from the other information needed to "solve" the case, is the standard to strive for, and that should never be the case.

Another way that these DFIR CTFs fall short is that they tend to perpetuate the belief that examiners should look at artifacts one at a time, in isolation from other artifacts (particularly others in the constellation). Given that Windows is an operating system with a lot going on, our old way of viewing artifacts...the way we've always done it...no longer serves us well. It's like trying to watch a rock concert in a stadium by looking through a keyhole. We can no longer open one Windows Event Log file in a GUI viewer, search for something we think might be relevant, close that log file, open another one, and repeat. Regardless of how comfortable we are with this approach, it is terribly insufficient and leaves a great many gaps and unanswered questions in even what appears to be the most rudimentary case.

Let's take a look at an example: this CyberDefenders challenge, as Hassan mentioned CyberDefenders in a comment. The first thing we see is that we have to sign up, and then sign in, to work the challenge, and that none of the case notes from the analysts who solved the CTF are available. The same has been true of other CTFs, including (but not limited to) the 2018 DefCon DFIR CTF. Keeping case notes, and sharing them, is something that analysts should be deliberately practicing.

Second, we see that there are 32 questions to be answered in the CTF, the first of which is, "what is the OS product name?" We already know from one of the tags for the CTF that the image is Windows, so how important is the "OS product name"? This information does not appear to be significant to any of the follow-on questions, and seems to exist solely for the purpose of establishing some sort of objective measure. Further, in over two decades of DFIR work, addressing a wide range of response scenarios (malware, ransomware, PCI, APT, etc.), I don't think I've ever had a customer ask more than 4 or 5 questions...max. In the early days, there was most often just one question customers were interested in:

Is there malware on this system?

As time progressed, many customers wanted to know:

How'd they get in? 
Who are they?
Are they still in my network?
What did they take?

Most often, whether engaging in PCI forensic exams or in "APT" or targeted threat response, those four questions, or some variation thereof, were at the forefront of customers' minds. In over two decades of DFIR work, ranging from individual systems up to the enterprise, I never had a case where a customer asked 32 questions (I've seen CTFs with 51 questions), and I've never had a customer (or a co-worker/teammate) ask me for the LogFile sequence number of an Excel spreadsheet. In fact, I can't remember a single case (none stands out in my mind) where the LogFile sequence number of any file was a component or building block of an overall investigation.

Now, I'm not saying this isn't true for others...honestly, I don't know, as so few in our field actually share what they do. But from my experience, in working my own cases, and working cases with others, none of the questions asked in the CTF were pivotal to the case.

So, What's The Answer?
The answer is that forensic challenges need to be adapted, worked, and "graded" differently. CTFs should be more "deliberate practice", aligned with how DFIR work should be done, and should perpetuate and reinforce good habits. Analysts need to keep and share case notes, being transparent about their analytic goals and thought processes, because this is how we learn overall. And I don't just mean that this is how that analyst, the one who shares these things, learns; no, I mean that this is how we all learn. In his book, Call Sign Chaos, retired Marine General Jim Mattis said that our own "personal experiences alone are not broad enough to sustain us"; while that thought was applied to a warfighter's reading, this portion of the quote applies much more broadly: if we're stuck in our own little bubble, not sharing what we've done and what we know with others, then we're not improving, adapting, and growing in our profession.

If we're looking to provide others with "deliberate practice", then we need to change the way we're providing that opportunity.

Additional Resources
John Asmussen - Case_notes.py
My 2018 DefCon DFIR CTF write-ups (part 1, part 2)