Thursday, June 01, 2023

Events Ripper Update

Working a recent incident, I came across something very unusual. I started by going back to a previous investigation of the same endpoint, conducted a month earlier, and extracting the WEVTX files collected as part of that investigation. So, the WEVTX files were retrieved from the endpoint on 30 Apr, and when I created the timeline, I found that the four most recent time segments were from 1 June 2024...that's right, 2024!

As I was using some of the indicators we already had (file and process names) to pivot into the timeline, I saw that I had Security Event Log records from 2020...now, that is weird! After all, it's not often that I see Security Event Log records going back a week or month, let alone 3 years!

Another indicator was the sessions.pl output from Events Ripper; I had logins lasting 26856 hours (1119 days), and others lasting -16931 hours (over 705 days). Given how the session span is calculated, I knew something was "off" in the Security (and very likely, other) Event Logs, particularly in the records associated with logon and logoff events.
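
To make the math concrete, here's a minimal sketch (in Perl, with made-up epoch time stamps) of how a session span goes negative when the logoff record carries a time stamp earlier than the logon record...which is exactly what a clock change in the middle of a session produces:

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical logon/logoff pair (epoch seconds); the logoff record carries a
# time stamp *earlier* than the logon because the clock was changed in between.
my $logon  = 1682812800;   # 30 Apr 2023 00:00:00Z
my $logoff = 1621857600;   # 24 May 2021 12:00:00Z

my $span = sprintf("%.1f", ($logoff - $logon) / 3600);
print "Session span: $span hours\n";   # negative span = the clock moved during the session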

I knew something was up, but I also knew that finding the "what was up" was also based largely on my experience, and might not be something a new or more junior analyst would be familiar with. After all, if an analyst was to create a timeline (and I'm seeing every day that's a pretty big "if"), and if they were pivoting off of known indicators to build context, then how likely would it be that they had the experience to know that something was amiss?

So, naturally, I wrote an Events Ripper plugin (timechange.pl) to pull Security-Auditing/4616 event records from the Security Event Log and display the available information in a nice table. The plugin collects all of these events, with the exception of sub-second time changes (which can be fairly common), and displays them in a table showing the user, the time changed from, the time changed to, and via which process. I wrote the plugin, and it produced an entry on the next investigation...not one that had much impact on what was going on, as the system clock was updated by a few minutes, but this simply shows me how the use of plugins like this can be very valuable for elevating interesting and important artifacts to the analyst for review without requiring that analyst to have extensive experience.
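
For illustration, a stripped-down version of that kind of check might look something like the following. This is a sketch, not the plugin itself; it assumes the five-field events file format (time|source|system|user|description), with the description beginning with the provider/event ID pair, so adjust the match to your own parser's output.

#!/usr/bin/perl
use strict;
use warnings;

my $events = shift || "events.txt";
open(my $fh, "<", $events) or die "Cannot open $events: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($time, $source, $system, $user, $desc) = split(/\|/, $line, 5);
    next unless (defined $time && $time =~ m/^\d+$/ && defined $desc);
    next unless ($desc =~ m/Security-Auditing\/4616/);
    # The user, the time changed from, the time changed to, and the process
    # are carried in the insertion strings; print the record time plus the
    # raw strings for review.
    print gmtime($time)." Z  ".$desc."\n";
}
close($fh);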

Friday, May 26, 2023

Events Ripper Updates

I updated an Events Ripper plugin recently, and added two new ones...I tend to do this when I see something new, so that I don't have to remember to run a command, check a box on a checklist, or take some other step. If I have to do any of these, I'm not going to remember these steps, so instead, I just create a plugin, drop it into the "plugins" folder, and it gets run every time, for every investigation. What's really cool is that I can re-run Events Ripper after I add additional Windows Event Log files to the mix, or after creating a new plugin (or updating a current one); most often, it's just hitting the up-arrow while in the command prompt, and boom, it's done.

Here's a look at the updates:

bitsclient.pl - I added some filtering capabilities to this plugin, so that known-good URLs (MS, Google, Chrome, etc.) don't clutter the output with noise. There is a lot of legitimate use of BITS on a Windows system, so this log file is likely going to be full of things that aren't a concern for the analyst, and are simply noise, obscuring the signal. I'm sure I'll be updating this again as I see more things that need to be filtered out.
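
The filtering itself is nothing fancy; a minimal sketch of the idea (with an illustrative allow-list, not the plugin's actual list) might look like this:

#!/usr/bin/perl
# A minimal sketch of the kind of allow-list filtering described above; the
# domains below are illustrative only, not the plugin's actual list.
use strict;
use warnings;

my @allow = (
    qr/\.microsoft\.com$/i,
    qr/\.windowsupdate\.com$/i,
    qr/\.google\.com$/i,
    qr/\.gvt1\.com$/i,
);

sub is_noise {
    my ($host) = @_;
    foreach my $re (@allow) {
        return 1 if ($host =~ $re);
    }
    return 0;
}

# Only URLs whose host is not on the allow-list get displayed.
foreach my $url ("http://msedge.b.tlu.dl.delivery.mp.microsoft.com/filestreamingservice/files/abc",
                 "http://badhost.example/payload.bin") {
    my ($host) = $url =~ m{^https?://([^/]+)}i;
    print $url."\n" if (defined $host && !is_noise($host));
}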

posh600.pl - I wanted a means for collecting PowerShell scripts from event ID 600 records in the Windows PowerShell Event Log, so I wrote this plugin. As with other plugins, this will provide pivot points into the timeline, helping to more easily elevate potentially malicious activity to the analyst, leveraging automation to facilitate analysis.

Similar to the bitsclient.pl plugin, I took steps to reduce the volume of information presented to the analyst. Specifically, I've seen on several investigations that there are a LOT of PowerShell scripts run as part of normal operations. So, while the plugin will collect all scripts, it only displays those that appear 5 or fewer times in the log; that is, it shows those that appear least frequently.

Something I really like about this plugin is the use of data structures (via Perl) to manipulate the data, and how they lead to the data being presented. Anyone who's looked at the data in the Windows PowerShell.evtx log file knows that for each script, there are something like 6 records, each with a lot of information that's not entirely useful to a DFIR or SOC analyst. Also, there are infrastructures that use a LOT of PowerShell for normal IT ops, so the question becomes, how do we reduce the data that the analyst needs to wade through, particularly for those analysts that are less experienced? Well, the approach I took was to first collect unique instances of all of the scripts, along with the times at which they were seen. Then, the plugin only displays those that appeared 5 or fewer times in the log (this value is configurable by anyone using the plugin; just change the value of "$cap" on line 42). By displaying each script alongside its time stamps, it's easy for an analyst to quickly 'see' those scripts that were run least frequently, to 'see' what the scripts are doing, and to correlate the data with other sources, validating their findings.
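
For anyone curious about the approach, here's a sketch of the flow; again, this is illustrative, not the plugin's actual code, and the two scripts shown are made up.

#!/usr/bin/perl
# A sketch of the data structure handling described above; this is not the
# plugin's actual code, and the two scripts below are made-up examples.
use strict;
use warnings;

my $cap = 5;          # display scripts seen $cap times or fewer
my %scripts = ();     # hash-of-hashes: $scripts{script}{epoch time} = 1

# ...normally populated while parsing event ID 600 records...
$scripts{'whoami /all'}{1682812800} = 1;
$scripts{'Get-Process | Out-File c:\windows\temp\p.txt'}{1682812860} = 1;

# Convert to a hash-of-arrays, so each unique script carries its sorted time stamps.
my %display = ();
foreach my $s (keys %scripts) {
    my @times = sort {$a <=> $b} keys %{$scripts{$s}};
    $display{$s} = \@times;
}

# Display only the least frequently seen scripts, along with their time stamps.
foreach my $s (sort {scalar(@{$display{$a}}) <=> scalar(@{$display{$b}})} keys %display) {
    next if (scalar(@{$display{$s}}) > $cap);
    print join(",", map {scalar gmtime($_)} @{$display{$s}})."  ".$s."\n";
}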

So far, this technique has proven effective; during a recent investigation into two disparate endpoints that exhibited similar malicious activity, we were able to correlate the PowerShell scripts with RMM access logs to validate the specific RMM tool as the means of access. By viewing the scripts listed in reverse order (soonest first) based on time stamps, we were able to not only correlate the activity from the threat actor's batch file with the logins, but also demonstrate that, on one endpoint, the batch file did not complete. That is, several lines from the batch file, evidenced as PowerShell scripts extracted from the WEVTX log file, did not appear in the data on one endpoint. This demonstrated that the attack did not complete, which is why we detected the first endpoint in our telemetry, but not the second one (because the attack didn't succeed).

As a side note, on the first endpoint, the timeline demonstrated that the coin miner crashed not long after it was started. Interestingly enough, that's what led to the next plugin (below) being created. ;-)

For those who are interested and want to do their own testing, line 42 of the plugin (open it in Notepad++, it won't bite...) lists the value "$cap", which is set to "5". Change that value, and re-run the plugin against your events file; do this as often as you like.

I will say that I'm particularly proud/fond of this plugin because of the use of data structures; a Perl "hash of hashes" to get a list of unique scripts, transitioning to a Perl "hash of arrays" to facilitate least frequency of occurrence analysis and display to the analyst. Not bad for someone who has no formal training in computer science or data structures! It reminds me of the time I heard Martin Roesch, the creator of the snort IDS, talk about how he failed his CS data structures course!

nssm.pl - apparently, nssm.exe logs to the Application Event Log, which helps to identify activity and correlate Windows Event Log records with EDR telemetry. Now, a review of EDR telemetry indicates that nssm.exe has legitimate usage, and is apparently included in a number of packages, so treat the output of the plugin as potential pivot points; nssm.exe isn't "bad" just because it's there.

And, hey, I'm not the only one who finds value in Events Ripper...Dray used it recently, and I didn't even have to pay him (this time)!!

Tuesday, May 16, 2023

Composite Objects and Constellations

Okay, to start off, if you haven't seen Joe Slowik's RSA 2022 presentation, you should stop now and go watch it. Joe does a great job of explaining and demonstrating why IOCs are truly composite objects, that there's much more to an IP address than just it being...well...an IP address. When we start thinking in these terms, in terms of context, the IOCs we see and share can become much more actionable. 

Why does any of this matter? Once, in a DFIR consulting firm far, far away, our team was working PCI forensics investigations, and Visa was sending us monthly lists of IOCs that we had to search for during every case. We'd get three lists...one of file names, one of file paths, and one of hashes. There was no correlation between the various lists, nothing like, "...a file with this name and this hash existing in this folder...". Not at all. Just three lists, without context. Not entirely helpful for us, and any hits we found could be similarly lacking in any meaning or context..."hey, yeah, we found that this folder existed on one system...", but nothing beyond that was asked for, nor required. The point is that an IOC is often more than just what we see at face value...a file has a hash, a time frame that it existed on the system (or was seen on other systems), functionality associated with the file (if it's an executable file), etc. Similarly, an IP address is more than just four dot-separated octets...there's the time frame it was associated with an endpoint, the context with respect to how it was associated with the endpoint (was it the source IP for a login...what type...or lateral movement, was it a C2 address, was it the source of an HTTP request), etc.

Symantec recently posted an article regarding a group identified as "LanceFly", and their use of the Merdoor backdoor. In the article, table 1 references different legitimate applications used in DLL sideloading to load the backdoor; while the application names are listed, there are a couple of potentially important items missing. For example, what folders were used? For each of the legitimate applications used, were they or other products from the vendor used in the environment (i.e., what is the prior prevalence)? Further, there's no mention of persistence, nor how the legitimate application is launched in order to load the Merdoor backdoor. Prior to table 1, the article states that the backdoor itself persists as a Windows service, but there's no mention of how, once the legit application and the sideloaded DLL are placed on the system, the legit app is launched.

This is something I've wondered when I've seen previous cases involving DLL sideloading...specifically, how did the threat actor decide which legitimate application to use? I've seen cases, going back to 2013, where legit apps from McAfee, Kaspersky, and Symantec were used by threat actors to sideload their malicious DLLs, but when I asked the analysts working those cases if the customer used that application within their environment, most often they had no idea.

Why does this matter? If the application used is new to the environment, and the EDR tool used within the infrastructure includes the capability, then you can alert on new applications, those that have never been 'seen' in the environment. Does your organization rely on Windows Defender? Okay, so if you see a Symantec or Kaspersky application for the first time...or rather, your EDR 'sees' it...then this might be something to alert on.

Viewing indicators as composite objects, and as part of constellations, allows us to look to those indicators as something a bit more actionable than just an IP address or a file name/hash. Viewing indicators as composite objects helps add context to what we're seeing, as well as better communicate that context to others. Viewing indicators as one element in a constellation allows us to validate what we're seeing. 

The Windows Registry

When it comes to analyzing and understanding the Windows Registry, where do we go, as an industry, to get the information we need?

Why does this even matter?

Well, an understanding of the Registry can provide insight into the target (admin, malicious insider, cyber criminal, nation-state threat actor) by what they do, what they don't do, and how they go about doing it.

The Registry can be used to control a great deal of functionality and access on endpoints, going beyond just persistence. Various keys and values within the Registry can determine what we can see or not see, what we can do or not do, and how the endpoint behaves when we do it.

For example, let's say a threat actor enables RDP on an endpoint...this is something we see quite often. This could even be a Windows 10 or Windows 11 laptop; that is, it doesn't just have to be a server. When they enable it, do they also create a user account, add it to a group that has remote access, and then hide the new user account from the Welcome Screen? Do they enable Sticky Keys? Regardless of the various settings that they enable or disable, how do they go about doing so? Manually, or via a batch file or script of some kind?
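
As a quick illustration of checking a couple of these settings from acquired hives, here's a minimal sketch using Parse::Win32Registry (the module RegRipper is built on). It assumes offline SYSTEM and SOFTWARE hive files in the current directory, and uses ControlSet001 rather than resolving the current ControlSet via the Select key, so treat it as a starting point rather than a finished plugin.

#!/usr/bin/perl
# Minimal sketch: check whether RDP connections are allowed, and whether any
# accounts have been hidden from the Welcome Screen, in offline hives.
use strict;
use warnings;
use Parse::Win32Registry;

my ($system_hive, $software_hive) = ("SYSTEM", "SOFTWARE");

# RDP: fDenyTSConnections = 0 means Remote Desktop connections are allowed.
my $sys_reg = Parse::Win32Registry->new($system_hive) or die "Cannot open $system_hive";
my $sys = $sys_reg->get_root_key;
if (my $ts = $sys->get_subkey('ControlSet001\Control\Terminal Server')) {
    if (my $val = $ts->get_value('fDenyTSConnections')) {
        printf "fDenyTSConnections = %d\n", $val->get_data();
    }
}

# Accounts hidden from the Welcome Screen: value name = account, data 0 = hidden.
my $soft_reg = Parse::Win32Registry->new($software_hive) or die "Cannot open $software_hive";
my $soft = $soft_reg->get_root_key;
if (my $ul = $soft->get_subkey('Microsoft\Windows NT\CurrentVersion\Winlogon\SpecialAccounts\UserList')) {
    foreach my $val ($ul->get_list_of_values) {
        printf "Hidden from Welcome Screen: %s = %d\n", $val->get_name, $val->get_data;
    }
}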

The settings enabled or disabled, and the manner employed, can tell you something about the target. Are they prepared? Was it likely that they'd conducted some recon and developed some situational awareness of the environment, or as we see with many RaaS offerings, was it more of a "spray-and-pray" approach? If they used sc.exe (or some other means) to disable services, was that list specific and unique to the environment, or was it more of a "wish list" where many of the listed services didn't even exist on the endpoint?

Something that's been seen recently is the LogonType value being created, often as part of a batch file. This is interesting because the value itself appears to apply to Windows XP systems, but it's been seen being created on Windows 10 endpoints, as well as server variants of Windows. The order of the modifications, the timing between the modifications, and the position of the LogonType value within the list of modifications has been consistent across multiple endpoints, owned by unrelated customers. All of this, combined with the fact that the LogonType value apparently has no impact on the endpoints to which it was deployed, indicates that the "threat actor" is deploying this script of settings modifications without consideration for how "noisy" or unique it is.  

Okay, so let's consider persistence mechanisms, some of which can be a bit esoteric. For example, @0gtweet shared an interesting technique on 13 Dec 2022, and John Hammond shared a video of the technique on 12 May 2023. Now, if you take a really close look at it, this really isn't a "persistence" technique, per se, in the traditional sense...because in order to activate it, the threat actor has to have access to the system, or have some other means to run the "query" command. Maybe this could be used as a chained persistence technique; that is, what "persists" is the use of the "query" command, such as in an LNK file in the user's StartUp folder, or in another autoruns/persistence location, so that the "query" command is run, which in turn, runs the command created through the technique described by @0gtweet. 

So, consider this...threat actor compromises an admin account on an endpoint, and modifies the Registry so that the user account's Startup folder is no longer the traditional "Startup" folder (i.e., "%userprofile%\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup"), but is something like "Temp". Then, they modify the "query" key with a value that launches their malware, or a downloader for their malware, and then drop an LNK file to run the new "query" entry in the new "Startup" location whenever that admin user logs in.
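
A quick way to check for that kind of redirection in an offline NTUSER.DAT hive is to look at the "Startup" value beneath the User Shell Folders key; a minimal sketch, again using Parse::Win32Registry:

#!/usr/bin/perl
# Minimal sketch: flag a redirected Startup folder in an offline NTUSER.DAT hive.
use strict;
use warnings;
use Parse::Win32Registry;

my $ntuser = shift || "NTUSER.DAT";
my $reg  = Parse::Win32Registry->new($ntuser) or die "Cannot open $ntuser";
my $root = $reg->get_root_key;

if (my $key = $root->get_subkey('Software\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders')) {
    if (my $val = $key->get_value('Startup')) {
        my $path = $val->get_data();
        print "Startup folder: ".$path."\n";
        # The default points beneath "...\Start Menu\Programs\Startup"; anything
        # else is worth a much closer look.
        print "** Non-default Startup location - investigate **\n"
            unless ($path =~ m/\\Start Menu\\Programs\\Startup$/i);
    }
}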

Now, here's something to think about...set this up and run it in a test environment, and see what the process lineage looks like, and try to figure out from that lineage what happened (i.e., working backwards). Pretty cool, eh?

Speaking of persistence, what about false flags? Say, the threat actor drops some "malware" on an endpoint, and adds a value to the Run key, but disables it. The SOC or DFIR analyst sees the Run key value being set and figures, "ah, gotcha!", and not knowing about other values within the Registry, doesn't understand that the value has been disabled. Just as with a military inspection, you leave something for the inspector to find so that they're satisfied and move on; in this case, the analyst may decide that they've sussed out the bad guy, delete the Run key value and referenced file, and move on...all while the threat actor's real persistence mechanism is still in place.
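
On more recent versions of Windows, the enabled/disabled state of Run key entries is tracked as binary data beneath the Explorer\StartupApproved\Run key. Here's a minimal sketch that simply lists both keys side by side, so the analyst can compare rather than assume (it deliberately avoids interpreting the binary flags):

#!/usr/bin/perl
# Minimal sketch: list Run key values from an offline NTUSER.DAT alongside the
# corresponding StartupApproved\Run entries, where Windows tracks the
# enabled/disabled state. The raw bytes are dumped for review, not interpreted.
use strict;
use warnings;
use Parse::Win32Registry;

my $ntuser = shift || "NTUSER.DAT";
my $reg  = Parse::Win32Registry->new($ntuser) or die "Cannot open $ntuser";
my $root = $reg->get_root_key;

if (my $run = $root->get_subkey('Software\Microsoft\Windows\CurrentVersion\Run')) {
    foreach my $val ($run->get_list_of_values) {
        printf "Run             : %-25s -> %s\n", $val->get_name, $val->get_data;
    }
}

if (my $sa = $root->get_subkey('Software\Microsoft\Windows\CurrentVersion\Explorer\StartupApproved\Run')) {
    foreach my $val ($sa->get_list_of_values) {
        printf "StartupApproved : %-25s -> %s\n", $val->get_name,
            join(" ", map {sprintf "%02x", ord($_)} split(//, $val->get_data));
    }
}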

The point is, there's a good bit within the Registry that controls what access and capabilities the operating system and, to some extent, applications provide, and understanding that helps us understand a bit about the target we're interested in, whether they be a cyber criminal, threat actor, or malicious insider. 

Friday, May 05, 2023

Events Ripper Updates

As you may know, I'm a pretty big proponent for documenting things that we "see" or find during investigations, and then baking those things back into the parsing and decoration process, as a means of automating and retaining corporate knowledge. This means that something I see once can be added to the parsing, decoration, and enrichment process, so that I never have to remember to look for it again. Things I've seen before can be raised up through the "noise" and brought to my attention, along with any references or necessary context. This makes subsequent investigations more efficient, and gets me to where I'm actually doing analysis much sooner.

One of the ways I do this is by creating simple plugins for Events Ripper, a proof-of-concept tool for "mining" Windows Event Log data for pivot points that can be applied to analysis, and in particular timeline analysis. Events Ripper runs against the events file, the intermediate file produced while normalizing Windows Event Log records into a timeline, and extracts pivot points, allowing me to build the picture of what happened, and when, a great deal faster than doing so manually.

The recently created or updated plugins include:

sec4797.pl 
Check for "Microsoft-Windows-Security-Auditing/4797" events, indicating that a user account was checked for a blank password. I'd never seen these events before, but they popped up during a recent investigation, and helped to identify the threat actor's activity, as well as validate the compromised account they were using.

filter.pl 
"Microsoft-Windows-Security-Auditing/5156", and /5158 events; this plugin output is similar to what we see with ShimCache parsers, in that it lists the applications for which the Windows Filtering Platform allows connections, or allows to bind to a local port, respectively. Similar to "Service Control Manager" events illustrating a new service being installed, this plugin may show quite a few legitimate applications, but it's much easier to go through that list and see a few suspicious or malicious applications than it is to manually scroll through the timeline. Searching the timeline for those applications can really help focus the investigation on specific timeframes of activity.

defender.pl 
Windows Defender event IDs 1116, 1117, 2051, and 5007, all in a single plugin, allowing us to look for detections and modifications to Windows Defender. Some modifications to Windows Defender may be legitimate, but in recent investigations, exclusions added to Windows Defender have provided insight into the compromised user account, as well as the folders the threat actor used for staging their tools.
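
A minimal sketch of pulling the exclusion changes out of an events file might look like the following; it assumes the Defender/5007 records carry the changed setting (e.g., a path under "Exclusions\Paths") in the description field, so adjust the match to your parser's output.

#!/usr/bin/perl
use strict;
use warnings;

my $events = shift || "events.txt";
open(my $fh, "<", $events) or die "Cannot open $events: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($time, $source, $system, $user, $desc) = split(/\|/, $line, 5);
    next unless (defined $time && $time =~ m/^\d+$/ && defined $desc);
    # Defender 5007 (configuration change) records that reference exclusions
    # make great pivot points into the timeline.
    next unless ($desc =~ m/Defender\/5007/ && $desc =~ m/Exclusions\\/);
    print gmtime($time)." Z  ".$desc."\n";
}
close($fh);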

msi.pl
Source "MsiInstaller", with event IDs 11707 (successful product installation), 11724, and 1034 (both successful product removal).

scm.pl 
Combined several event IDs (7000, 7009, 7024, 7040, and 7045), all with "Service Control Manager" as the source, into a single plugin. This plugin is not so much the result of recent investigations as it is the desire to optimize validation; a service being created or installed doesn't mean that it successfully runs each time the system is restarted.

appissue.pl 
Combined "Application Hang/1002", "Application Error/1000", and "Windows Error Reporting/1001" events into a single plugin, very often allowing us to see the threat actor's malware failing to function.

Each of the new or updated plugins is the result of something observed or learned during recent investigations, and allows me to find unusual or malicious events to use as pivot points in my analysis.

We can do the same things with RegRipper plugins, Yara or Sigma rules, etc. It simply depends upon your framework and medium.

Thursday, April 27, 2023

Program Execution

By now, I hope you've had a chance to read and consider the posts I've written discussing the need for  validation of findings (third one here). Part of the reason for this series was a pervasive over-reliance on single artifacts as a source of findings that I and others have seen within the community over the past 2+ decades. One of the most often repeated examples of this is relying on ShimCache or AmCache artifacts as evidence of program execution.

ShimCache
ShimCache, or AppCompatCache (the name of the Registry value where the data is found), is often looked to as evidence of program execution, when what it really demonstrates is that the file existed on the system.

From this blog post from Mandiant:

It is important to understand there may be entries in the Shimcache that were not actually executed.

There you go. That's from 2015. And this is why we need to incorporate artifacts such as the ShimCache into an overall constellation, rather than viewing artifacts such as these in isolation. This 13Cubed video provides a clear explanation regarding the various aspects of the ShimCache artifact as it relates to Windows 10; note that the title of the video includes "the most misunderstood artifact".

AmCache
AmCache is another one of those artifacts that is often offered up as "evidence of program execution", as seen in this LinkedIn post. However, the first referenced URL in that post belies the claim that this artifact is "evidence of program execution", as do other statements in the post (i.e., that AmCache is "populated after system shutdown"). From the blog post:

During these tests, it was found that the Amcache hive may have artifacts for executables that weren’t executed at all.

A bit more extensive treatment of the AmCache artifact can be found here. While you may look at the PDF and think, "TL;DR", the short version is that an entry in the AmCache does not explicitly mean, by itself, that the file was executed.

The point is that research demonstrates that, much like the ShimCache artifact, we cannot simply look at an entry and state, "oh, that is evidence of program execution". Even if you don't want to take the time to read and digest either the blog post or the PDF, simply understand that by itself, an AmCache entry does not demonstrate evidence of program execution.

So, again...let's all agree to stop looking just to ShimCache or just to AmCache as evidence of program execution, and instead look to multiple data sources and to artifact constellations to establish whether a program was executed or not.

For some insight as to how ShimCache and AmCache can be used together, check out this blog post from WithSecure.

Keep in mind that even when combining these two artifacts, it still doesn't provide clear indications that the identified executable was launched, and successfully executed. We need to seek other artifacts (Windows Event Log, Registry, etc.) to determine this aspect of the executable.

PCA
Earlier this year, AboutDFIR.com published a blog post regarding a new artifact (new to Windows 11) that appears to demonstrate evidence of program execution. Much like other artifacts (see above), this one has nuances or conditions, in that you cannot look to it to demonstrate execution of all programs; rather, this one seems to apply to either GUI programs, or CLI programs launched via a GUI. This is important to remember, whether you see an application of interest listed in one of the artifacts, or you don't...context matters.

The blog post provides insight into the artifacts, as well as images of the artifacts, and samples you can download and examine yourself. This YouTube video mentions another associated artifact; specifically, Windows Event Log records of interest. Adding the artifacts to a timeline is a pretty trivial exercise; the text-based artifacts are easy to script, and the process for adding Windows Event Logs to a timeline is something that already exists. 
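
As an example of what "easy to script" means here, the following sketch converts a text-based artifact into five-field TLN events; it assumes lines of the form "path|YYYY-MM-DD HH:MM:SS.mmm", as seen in the PcaAppLaunchDic.txt file, so confirm the layout against your own samples before relying on it.

#!/usr/bin/perl
# Minimal sketch: convert a text-based artifact to five-field TLN events
# (time|source|system|user|description) for inclusion in a timeline.
use strict;
use warnings;
use Time::Local qw(timegm);

my $file   = shift || "PcaAppLaunchDic.txt";
my $system = shift || "HOSTNAME";

open(my $fh, "<", $file) or die "Cannot open $file: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($path, $ts) = split(/\|/, $line, 2);
    next unless (defined $ts && $ts =~ m/^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})/);
    my $epoch = timegm($6, $5, $4, $3, $2 - 1, $1 - 1900);
    print $epoch."|PCA|".$system."||Program launch: ".$path."\n";
}
close($fh);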

Monday, April 24, 2023

New Events Ripper Plugins

I recently released four new Events Ripper plugins: mssql.pl, scm7000.pl, scm7024.pl, and apppopup26.pl.

The mssql.pl plugin primarily looks for MS SQL failed login events in the Application Event Log. I'd engaged in a response where we were able to validate the failed login attempts first in the MS SQL error logs, but then I learned that the events are also listed in the Windows Event Log, specifically the Application Event Log, and I wanted to provide that insight to the analyst.

The plugin lists the usernames attempted and the frequency of each, as well as the source IP addresses of the login attempts and their frequency. In one instance, we saw almost 35000 failed login attempts from 4 public IP addresses, three of which were from the same class C subnet. This not only tells us a great deal about the endpoint itself, but also provides significant information that the analyst can use immediately, as well as leverage as pivot points into the timeline. The plugin does not yet list successful MS SQL logins because, by default, that data isn't recorded, and I haven't actually seen such a record.

The plugin also looks for event records indicating settings changes, and lists the settings that changed. Of specific interest is the use of the xp_cmdshell stored procedure. 

So, why does this matter? Not long ago, AhnLab published an article stating that they'd observed attacks against MS SQL servers resulting in the deployment of Trigona ransomware.

The scm7000.pl plugin locates "Service Control Manager/7000" event records, indicating that a Windows service failed to start. This is extremely important when it comes to validation of findings; just because something (i.e., something malicious) is listed as a Windows service does not mean that it launches and runs every time the endpoint is restarted. This is just as important to understand, alongside Windows Error Reporting events, AV events, application crash events, etc. This is why we cannot treat individual events or artifacts in isolation; events are in reality composite objects, and provide (and benefit from) context from "nearby" events.

The scm7024.pl plugin looks for "Service Control Manager/7024" records in the System Event Log, which indicate that a service terminated.

The apppopup26.pl plugin looks for "Application Popup/26" event records in the Application Event Log, and lists the affected applications, providing quick access to pivot points for analysis. If an application of interest to your investigation is listed, the simplest thing to do is pivot into the timeline to see what other events occurred "near" the event in question. Similar to other plugins, this one can provide indications of applications that may have been on the system at one point, and may have been removed.

Events Ripper has so far proven to be an extremely powerful and valuable tool, at least to me. I "see" something, document it, add context, analysis tips, reference, etc., and it becomes part of an automated process. Sharing these plugins means that other analysts can benefit from my experiences, without having to have ever seen these events before.

The tool is described here, with usage information available here, as well as via the command line.

On Validation, pt III

From the first two articles (here, and here) on this topic arises the obvious question...so what? Not validating findings has worked well for many, to the point that the lack of validation is not recognized. After all, who notices that findings were not verified? The peer review process? The manager? The customer? Just how pervasive training materials and processes are that focus solely on single artifacts in isolation should give us a clear understanding that validating findings is not a common practice. That is, if the need for validation is not pervasive in our industry literature, and if someone isn't asking the question, "...but how do you know?", then what leads us to assume that validation is part of what we do?

Consider a statement often seen in ransomware investigation/response reports up until about November 2019; that statement was some version of "...no evidence of data exfiltration was observed...". However, did anyone ask, "...what did you look at?" Was this finding (i.e., "...no evidence of...") validated by examining data sources that would definitely indicate data exfiltration, such as web server logs, or the BITS Client Event Log? Or how about indirect sources, such as unusual processes making outbound network connections? Understanding how findings were validated is not about assigning blame; rather, it's about truly understanding the efficacy of controls, as well as risk. If findings such as "...data was not exfiltrated..." are not validated, what happens when we find out later that it was? More importantly, if you don't understand what was examined, how can you address issues to ensure that these findings can be validated in the future?

When we ask the question, "...how do you know?", the next question might be, "...what is the cost of validation?" And at the same time, we have to consider, "...what is the cost of not validating findings?"

The Cost of Validation
In the previous blog posts, I presented "case studies" or examples of things that should be considered in order to validate findings, particularly in the second article. When considering the 'cost' of validation, what we're asking is, why aren't these steps performed, and what's preventing the analyst from taking the steps necessary to validate the findings?

For example, why would an analyst see a Run key value and not take the steps to validate that it actually executed, including determining if that Run key value was disabled? Or parse the Shell-Core Event Log and perhaps see how many times it may have executed? Or parse the Application Event Log to determine if an attempt to execute the program pointed to resulted in an application crash? In short, why simply state that program execution occurred based on nothing more than observing the Run key value contents? 

Is it because taking those steps is "too expensive" in terms of time or effort, and would negatively impact SLAs, either explicit or self-inflicted? Does it take too long to do so, so much so that the ticket or report would not be issued in what's considered a "timely" manner?

Could you issue the ticket or report in order to meet SLAs, make every attempt to validate your findings, and then issue an updated ticket when you have the information you need?

The Cost of Not Validating
In our industry, an analyst producing a ticket or report based on their analysis is very often well abstracted from the final effects, based on decisions made and resources deployed due to their findings. What this means is that whether in an internal/FTE or consulting role, the SOC or DFIR analyst may not ever know the final disposition of an incident and how that was impacted by their findings. That analyst will likely never see the meeting where someone decides either to do nothing, or to deploy a significant staff presence over a holiday weekend.

Let's consider case study #1 again, the PCI case referenced in the first post. Given that it was a PCI case, it's likely that the bank notified the merchant that they were identified as part of a common point of purchase (CPP) investigation, and required a PCI forensic investigation. The analyst reported their findings, identifying the "window of compromise" as four years, rather than the three weeks it should have been. Many merchants have an idea of the number of transactions they send to the brands on a regular basis...for smaller merchants, it may be a month, and for larger vendors, a week. They also have a sense of the "rhythm" of credit card transactions; some merchants have more transactions during the week and fewer on the weekends. The point is that when the PCI Council needs to decide on a fine, it takes the "window of compromise" into account.

During another incident in the financial sector, a false positive was not validated, and was reported as a true positive. This led to the domain controller being isolated, which ultimately triggered a regulatory investigation.

Consider this...what happens when you tell a customer, "OMGZ!! You have this APT Umpty-Fratz malware running as a Windows service on your domain controller!!", only to later find out that every time the endpoint was restarted, the service failed to start (based on "Service Control Manager/7000" events, or Windows Error Reporting events, application crashes, etc.)? The first message to go out sounds really, REALLY bad, but the validated finding says, "yes, you were compromised, and yes, you do need a DFIR investigation to determine the root cause, but for the moment, it doesn't appear that the persistence mechanism worked."

Conclusion
So, what's the deal? Are you validating findings? What say you?

Sunday, April 16, 2023

On Validation, pt II

My first post on this topic didn't result in a great deal of engagement, but that's okay. I wrote the first post with part II already loaded in the chamber, and I'm going to continue with this topic because, IMHO, it's immensely important. 

I've seen, more times than I care to count, findings and reports going out the door without validation. I saw an analyst declare attribution in the customer's parking lot, as the team was going on-site, only to be proven wrong, with the customer opting to continue the response with another team. Engagements such as this are costly to the consulting team through brand damage and lost revenue, as well as costly to the impacted organization, through delays and additional expenses to reach containment and remediation, all while a threat actor is active on their network.

When I sat down to write the first post, I had a couple more case studies lined up, so here they are...

Case Study #3
Analysts were investigating incidents within an organization, and as part of the response, they were collecting memory dumps from Windows endpoints. They had some information going into the investigations regarding C2 IP addresses, based on work done by other analysts as part of the escalation process, as well as from intel sources and open reporting, so they ran ASCII string searches for the IP addresses against the raw memory dumps. Not getting any hits, they declared in the tickets that there was no evidence of C2 connections.

What was missing from this was the fact that IP addresses are not employed by the operating system and applications as ASCII strings. Yes, you may see an IP address in a string that starts with "HTTP://" or "HTTPS://", but by the time the operating system translates and ingests the IP address for use, it's converted to 4 bytes, and as part of a structure. Tools like Volatility provide the capability to search for certain types of structures that include IP addresses, and bulk_extractor searches for other types of structures, with the end result being a *.pcap file.
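
To make that concrete, here's a small sketch that takes a dotted-quad IP address (a made-up one, for illustration) and prints the byte patterns you'd actually need to search for in a raw memory dump:

#!/usr/bin/perl
# Minimal sketch: convert a dotted-quad IP address to the byte patterns you'd
# actually search for in a memory dump (network byte order, plus the reversed
# order for good measure), rather than searching for the ASCII string.
use strict;
use warnings;
use Socket qw(inet_aton);

my $ip = shift || "192.168.10.55";
my $packed = inet_aton($ip) or die "Invalid IP: $ip";
printf "ASCII string : %s\n", $ip;
printf "Network order: %s\n", join(" ", map {sprintf "%02x", ord($_)} split(//, $packed));
printf "Reversed     : %s\n", join(" ", map {sprintf "%02x", ord($_)} split(//, scalar reverse $packed));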

In this case, as is often the case, analyst findings are part of an overall corporate-wide process, a process that includes further, follow-on findings such as "control efficacy", identifying the effectiveness of various controls and solutions within the security tech stack to address (prevent, detect, and respond to) incidents; simply stating in the ticket that "no evidence of communication with the C2 IP address was found" is potentially incorrect, in addition to not addressing how this was determined. If no evidence of communications from the endpoint was found, then is there any reason to submit a block for the IP address on the firewall? Is there any reason to investigate further to determine if a prevention or detection control failed?

In the book Investigating Windows Systems, one of the case studies involves both an image and a memory dump, where evidence of connections to an IP address was found in the memory dump that was not found in application logs within the image, using the tools mentioned above. What this demonstrates is that it's entirely possible for evidence to be found using different approaches, and that not employing the full breadth of what an analyst has available to them is simply insufficient.

Case Study #4
Let's look at another simple example - as a DFIR analyst, you're examining either data collected from an endpoint, or an acquired image, and you see a Run key value that is clearly malicious; you've seen this one before in open reporting. You see the same path/file location, same file name. 

What do you report?

Do you report, "...the endpoint was infected with <malicious thing>...", or do you validate this finding? 

Do you:
- determine if the file pointed to by the value exists
- determine if the Run key value was disabled  <-- wait, what??
- review the Microsoft-Windows-Shell-Core/Operational Event Log to see if the value was processed
- review the Application Event Log, looking for crash dumps, WER or Application Popup records for the malware
- review the Security Event Log for Process Creation events (if enabled)
- review Sysmon Event Log (if available)
- review the SRUM db for indications of the malware using the network

If not, why? Is it too much of a manual process to do so? Can the playbook not be automated through the means or suite you have available, or via some other means?

But Wait, There's More...
During my time as a DFIR analyst, I've seen command lines used to create Windows services, followed by the "Service Control Manager/7045" record in the System Event Log indicating that a new service was installed. I've also seen those immediately followed by a "Service Control Manager/7009" or "Service Control Manager/7011" record, indicating that the service failed to start, rather than the "Service Control Manager/7036" record you might expect. Something else we need to look for, going beyond simply "a Windows service was installed", is to look for indications of Windows Error Reporting events related to the image executable, application popups, or application crashes.
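
A minimal sketch of that kind of correlation against an events file follows; it simply lists the service installs and the subsequent start failures as pivot points, and assumes the event source/ID pair appears in the description field (adjust the matches to your parser's output).

#!/usr/bin/perl
# Minimal sketch: list "Service Control Manager" 7045 (service installed) and
# 7000/7009/7011 (service failed to start/timed out) records from an events
# file (time|source|system|user|description), so a new service install can be
# checked against subsequent start failures.
use strict;
use warnings;

my $events = shift || "events.txt";
my (@installed, @failed);

open(my $fh, "<", $events) or die "Cannot open $events: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($time, $source, $system, $user, $desc) = split(/\|/, $line, 5);
    next unless (defined $time && $time =~ m/^\d+$/ && defined $desc);
    push(@installed, [$time, $desc]) if ($desc =~ m/Service Control Manager\/7045/);
    push(@failed,    [$time, $desc]) if ($desc =~ m/Service Control Manager\/70(00|09|11)/);
}
close($fh);

# Compare the two lists; a 7045 followed shortly by a 7000/7009/7011 for the
# same service suggests the persistence mechanism didn't actually work.
print "Service installs:\n";
print gmtime($_->[0])." Z  ".$_->[1]."\n" foreach (sort {$a->[0] <=> $b->[0]} @installed);
print "\nService start failures:\n";
print gmtime($_->[0])." Z  ".$_->[1]."\n" foreach (sort {$a->[0] <=> $b->[0]} @failed);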

I've seen malware placed on systems that was detected by AV, but the AV was configured to "take no action" (per AV log messages), so the malware executed successfully. We were able to observe this within the acquired image by validating the impacts on the file system, Registry, Windows Event Log, etc.

I've seen threat actors push malware to multiple systems; in one instance, the threat actor pushed their malware to six systems, but it only successfully executed on four of those systems. On the other two, the Application Event Log contained Windows Error Reporting records indicating that there was an issue with the malware. Further examination failed to reveal the other impacts of the malware that had been observed on the four systems that had been successfully infected.

I worked a PCI case once where the malware placed on the system by the threat actor was detected and quarantined by AV within the first few hours it was on the system, and the threat actor did not return to the system for six weeks. It happened that those six weeks were over the Thanksgiving and Christmas holidays, during a time of peak purchasing. The threat actor returned after Christmas, and placed a new malware executable on the system, one that was not detected by AV, and the incident was detected a week later. In the report, I made it clear that while the threat actor had access to the system, the malware itself was not running and collecting credit card numbers during those six weeks.

Conclusion
In my previous post, I mentioned that Joe Slowik referred to indicators/artifacts as 'composite objects', which is something that, as an industry, we need to understand and embrace. We cannot view artifacts in isolation, but rather we need to consider their nature, which includes both being composite objects, as well as their place within a constellation. We need to truly embrace the significance of an IP address, a Run key value, or any other artifact when conducting and reporting on analysis.

Friday, April 07, 2023

Deriving Value From Open Reporting

There's a good bit of open reporting available online these days, including (but not limited to) the annual reports that tend to be published around this time of year. All of this open reporting amounts to a veritable treasure trove of information, either directly or indirectly, that can be leveraged by SOC and DFIR analysts, as well as detection engineers, to extend protections, as well as detection and response capabilities. 

Sometimes, open reporting will reference incident response activities, and then focus solely on malware reverse engineering. In these cases, information about what would be observed on the endpoint needs to be discerned through indirect means. However, other open reporting, particularly what's available from TheDFIRReport, is much more comprehensive and provides much clearer information regarding the impact of the incident and the threat actor's activities on the endpoint, making it much easier on SOC and DFIR analysts to pursue investigations.

Let's take a look at some of what's shared in a recent write-up of a ransomware incident that started with a "malicious" ISO file. Right away, we get the initial access vector from the title of the write-up! 

Before we jump in, though, we're not going to run through the entire article; the folks at TheDFIRReport have done a fantastic job of documenting what they saw six ways to Sunday, and there's really no need to run through everything in the article! Also, this is not a criticism, nor a critique, and should not be taken as such. Instead, what I'm going to do here is simply expand a bit on a couple of points of the article, nothing more. What I hope you take away from this is that there's a good bit of value within write-ups such as this one, value beyond just the words on paper.

The incident described in the article started with a phishing email, delivering a ZIP archive that contained an ISO file, which in turn contained an LNK file. There's a lot to unravel, just at this point. First off, the email attachment (by default) will have the MOTW attached to it, and MOTW propagation to the ISO file within the archive will depend upon the archival tool used to open it.

Once the archive is opened, the user is presented with the ISO file, and by default, Windows systems allow the user to automatically mount the disk image file by double-clicking it. However, this behavior can be easily modified, for free, while still allowing users to access disk image files programmatically, particularly as part of legitimate business processes. In the referenced Huntress blog post, Dray/@Purp1eW0lf provided Powershell code that you can just copy out of the blog post and execute on your system(s), and users will be prevented from automatically mounting disk image files by double-clicking on them, while still allowing users to access the files programmatically, such as mounting VHD files via the Disk Manager.

Next, Microsoft issued a patch in Nov 2022 that enables MOTW propagation inside mounted disk image files; had the system in this incident been patched, the user would have been presented with a warning regarding launching the LNK file. The section of the article that addresses defense evasion states, "These packages are designed to evade controls such as Mark-of-the-Web restrictions." This is exactly right, and it works...if the archival tool used to open the zip file does not propagate MOTW to the ISO file, then there's nothing to be propagated from the ISO file to the embedded LNK file, even if the patch is installed.

Let's take a breather here for a second...take a knee. We're still at the initial access point of an incident that resulted in the domain-wide deployment of ransomware; we're at the desk of that one user who received the phishing email, and the malicious actions haven't been launched yet...and we've already identified three points at which we could have inhibited (archiver tool, patched system) or obviated (enable programmatic disk image file access only) the rest of the attack chain. I bring this up because many times we hear how much security "costs", and yet, there's a free bit of Powershell that can be copied out of a blog post and applied to all systems, and that would have literally stopped this attack cycle, which, according to the timeline, spanned 5 days, in its tracks. The "cost" of running Dray's free Powershell code versus the "cost" of an infrastructure being encrypted and ransomed...what do those scales look like to you?

Referencing the malicious ISO file, the article demonstrates how the user mounting the disk image file can be detected via the Windows Event Log, stating that the "activity can be tracked with Event 12 from Microsoft-Windows-VHDMP/Operational" Event Log. Later, in the "Execution" section of the article, they state that "Application crashes are recorded in the Windows Application event log under Event ID 1000 and 1001", as a result of...well...the application crashing. Not only can both of these events be extracted as analysis pivot points using Events Ripper, but the application crashes observed in this incident serve to make my point regarding validation, specifically with respect to analysts validating findings.

The article continues illustrating the impact of the attack chain on the endpoint, referencing several other Windows Event Log records, several of which (i.e., "Service Control Manager/7045" events) are also covered/addressed by Events Ripper.

Conclusion
Articles like this one, and others from TheDFIRReport, are extremely valuable to the community. Where a good bit of open reporting will include things like, "...hey, we had 41 Sodinokibi ransomware response engagements in the first half of the year..." but then do an in-depth RE of one sample, with NO host-based impact or artifacts mentioned, articles such as this one do a great job of laying the foundation for artifact constellations, so that analysts can validate findings, and then use that information to help develop protections, detections, and response procedures for future engagements. Sharing this kind of information means that it's much easier to detect incidents like these earlier in the attack cycle, with the goal of obviating file encryption.

Wednesday, April 05, 2023

Unraveling Rorschach

Checkpoint recently shared a write-up on some newly-discovered ransomware dubbed, "Rorschach". The write-up was pretty interesting, and had a good bit of content to unravel, so I thought I'd share the thoughts that had developed while I read and re-read the article.

From the article, the first things that jumped out at me were:

Check Point Research (CPR) and Check Point Incident Response Team (CPIRT) encountered a previously unnamed ransomware strain...

...and...

While responding to a ransomware case...

So, I'm reading this, and at this point, I'm anticipating some content around things like initial access, as well as threat actor "actions on objectives", as they recon and prepare the environment for the ransomware deployment.

However, there isn't a great deal stated in the article about how the ransomware got on the system, nor about how the threat actor gained access to the infrastructure. The article almost immediately dives into the malware execution flow, with no mention of how the system was compromised. We've seen this before; about 3 yrs ago, one IR consulting firm posted a 25-page write-up (which is no longer available) on Sodinokibi ransomware. The write-up started off by saying that during the first half of the year, the firm had responded to 41 Sodinokibi ransomware cases, and then dove into reverse engineering and analysis of one sample, without ever mentioning how the malware got on the system. As you read through Checkpoint's write-up, one of the things they point out (spoiler alert!!) is the speed of the encryption algorithm...if this is something to be concerned about, shouldn't we look to those threat actor activities that we can use to inhibit or obviate the remaining attack chain, before the ransomware is deployed?

Let's take a look at some other interesting statements from the article...

The ransomware is partly autonomous, carrying out tasks that are usually manually performed during enterprise-wide ransomware deployment...

Looking at the description of the actions performed by the ransomware executable, this is something we very often see in RaaS offerings. In June 2020, I read a write-up of a RaaS offering that included commands using "net stop" to halt 156 Windows services, taking something of a "spray-and-pray" approach; there was no advance recon that determined that those 156 services were actually running in the environment. Checkpoint's list of services the ransomware attempts to stop is much shorter, but similarly, there doesn't seem to be any indication that the list is targeted, that it's based on prior recon of the environment. In short, spray-and-pray, take the "shotgun" approach.

However, a downside of this is that while you may be able to detect it (the parent process will be cy.exe, running the "net stop" commands), by that point, it may be too late. You'd need to have a software-based response in place, with rules that state, "if these conditions are met on the endpoint, kill the process on the endpoint." Sending an alert to a SOC will be too late; by the time the alert makes it to the SOC console, files on the endpoint will already be encrypted.

The ransomware was deployed using DLL side-loading of a Cortex XDR Dump Service Tool, a signed commercial security product, a loading method which is not commonly used to load ransomware.

While I can't report seeing this used with ransomware specifically, DLL side-loading via a known good application is a technique that has been used extensively. Even going back a decade or more, I remember seeing legit Kaspersky, McAfee, and Symantec apps dropped in a ProgramData subfolder along with a malicious DLL, and launched as a Windows service, or via a Scheduled Task. The question I had at the time, and one that I still have when I see this sort of tactic used is, does anyone notice the legit program? What I mean is, when I've heard an analyst say that they found PlugX launched via DLL side-loading using a legit Kaspersky app, I've asked, "...were Kaspersky products used in the environment?" Most often, this doesn't seem to be a question that's asked. In the case of the Rorschach ransomware, were Palo Alto software products common in the environment, or was this Cortex tool completely new to the environment? Could something like this be used as a preventive or detective technique? After all, if a threat actor takes a tailored approach to the legit application used, deploying something that is common in the target environment and vulnerable to DLL side-loading, this would indicate a heightened level of situational awareness, rather than just "...I'll use this because I know it works."

At one point in the article, the authors state that the ransomware "...clears the event logs of the affected machines...", and then later state, "Run wevutil.exe to clear the the following Windows event logs: Application, Security, System and Windows Powershell." Okay, so we go from "clears the event logs" (implying that all Windows Event Logs are cleared) to stating that only four specific Windows Event Logs are cleared; that makes a difference. The command to enumerate and clear all Windows Event Logs is a pretty simple one-liner, yet figure 2 in the article clearly shows four separate instances of wevtutil.exe being launched. And why those four Windows Event Logs? Is it because the threat actor knows that their activities will appear in those logs, or is it because the threat actor understands that most analysts focus on those four Windows Event Logs, based on their training and experience? Is the Powershell Event Log cleared because Powershell is used at some point during the initial access or recon/prep phases of the attack, or are these Windows Event Logs cleared simply because the malware author believes that they are the files most often sought by SOC and DFIR analysts?

One article based on Checkpoint's analysis states, "After compromising a machine, the malware erases four event logs (Application, Security, System and Windows Powershell) to wipe its trace", implying that the malware author was aware of the traces left by the malware, and was trying to inhibit response; clearing the Windows Event Logs does not "erase" the event records, but does require extra effort on the part of the responder.

Conclusion
Even though this ransomware/file-encrypting executable was found as a result of at least one response engagement, and the analysis of the malware itself is interesting, there's really very little (if any) information in the article regarding how the threat actor gained access to the environment, or any of the steps taken by the threat actor to recon and prepare the environment prior to deploying the ransomware. From the malware analysis we know a few things that are interesting and useful, but little in the way of what we can do to detect this threat actor early in their attack cycle, allowing defenders to prevent, or detect and respond to, the attack.

Monday, April 03, 2023

On Validation

I've struggled with the concept of "validation" for some time; not the concept in general, but as it applies specifically to SOC and DFIR analysis. I've got a background that includes technical troubleshooting, so "validation" of findings, or the idea of "do you know what you know, or are you just guessing", has been part of my thought processes going back for about...wow...40 years.

Here's an example...when setting up communications during Team Spirit '91 (military exercises in South Korea), my unit had a TA-938 "hot line" with another unit. This is exactly what it sounds like...it was a direct line to that other unit, and if one end was picked up, the other end would automatically ring. Yes, a "Bat phone". Just like that. Late one evening, I was in the "SOC" (our tent with all of the communications equipment) and we got a call that the hot line wasn't working. We checked connections, checked and replaced the batteries in the phone (the TA-938 phones took 2 D cell batteries, both facing the same direction), etc. There were assumptions and accusations thrown about as to why the phone wasn't working, as my team and I worked through the troubleshooting process. We didn't work on assumptions; instead, we checked, rechecked, and validated everything. In the end, we found nothing wrong with the equipment on our end; however, the following day, we did find out what the issue was - at the other end, there was only one Marine in the tent, and that person had left the tent for a smoke break during the time of the attempted calls.

We could have just said, "oh, it's the batteries...", and replaced them...and we'd have had the same issue all over again. Or, we could have just stated, "...the equipment on the other end was faulty/broken...", and we would not have made a friend of the maintenance chief from that unit. There were a lot of assumptions we could have made, conclusions we could have jumped to...and we'd have been wrong. We could have stated findings that were trusted, and resulted in decisions being made, assets and resources being allocated, etc., all for the wrong reason. The end result is that my team and I (especially me, as the officer) would have lost credibility, and the trust and confidence of our fellow team members, and our commanding officer. As it was, validating our findings led to the right decisions being made, which were again validated during the exercise after action meetings.

Okay, so jump forward 32 years to present day...how does this idea of "validation" apply to SOC and DFIR analysis? I mean, this seems like such an obvious thing, right? Of course we validate our findings...but do we, really?

Case Study #1
A while back, I attended a conference during which one of the speakers walked through a PCI investigation they'd worked on. As the speaker walked through their presentation, they talked about how they'd used a single artifact, a ShimCache entry for the malware, to demonstrate program execution. This single artifact was used as the basis of the finding that the malware had been on the system for four years.

For those readers not familiar with PCI forensic investigations, the PCI Council specifies a report format and "dashboard", where the important elements of the report are laid out in a table at the top of the report. One of those elements is "window of compromise", or the time between the original infection and when the breach was identified and remediated. Many merchants track the number of credit card transactions they process on a regular basis, including not only during periods of "regular" spending habits, but also off-peak and peak/holiday seasons, and as a result, the "window of compromise" can give the merchant, the bank, and the brand an approximate number of potentially compromised credit card numbers. As you'd imagine, given any average, the number of compromised credit card numbers would be much greater over a four year span than it would for, say, a three week "window of compromise". 

As you'd expect, analysts submitting reports rarely, if ever, find out the results of their work. I was a PCI forensic analyst for about three and a half years, and neither I nor any of my teammates (that I'm aware of) heard what happened to a merchant after we submitted our reports. Even so, I cannot imagine that a report with a "window of compromise" of four years was entirely favorable.

But that raises the question - was the "window of compromise" really four years? Did the analyst validate their finding using multiple data sources? Something I've seen multiple times is that malware is written to the file system, and then "time stomped", often using time stamps retrieved from a native system file. This way, the $STANDARD_INFORMATION attribute time stamps from the $MFT record for the file appear to indicate that the file is "long lived", and has existed on the system for quite some time. This time stomping occurs before the Application Compatibility functionality of the Windows operating system creates an entry for the file, and the last modification time that's recorded for the entry is the one that's "time stomped". As a result, a breach that occurred in May 2013 and was discovered three weeks later ends up having the malware itself being reported as placed on the system in 2009. What impact this had, or might have had on a merchant, is something that we'll never know.

Misinterpreting ShimCache entries has apparently been a time-honored tradition within the DFIR community. For a brief walk-through (with reference links) of ShimCache artifacts, check out this blog post.

Case Study #2
In the spring of 2021, analysts were reporting, based solely on EDR telemetry, that within their infrastructure threat actors were using the PowerShell Set-MpPreference cmdlet to "disable Windows Defender". This organization, like many others, was tracking such things as control efficacy (the effectiveness of controls) in order to make decisions regarding actions to take, and where and how to allocate resources. However, these analysts were not validating their findings; they were not checking the endpoints themselves to determine if Windows Defender had, in fact, been disabled, and if the threat actor's attempts had actually impacted the endpoints. As it turns out, that organization had a policy at the time of disabling Windows Defender on installation, as they had chosen another option for their security stack. As such, stating in tickets that threat actors were disabling Windows Defender, without validating these findings, led to quite a few questions, and impacted the credibility of the analysts.
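As a quick illustration of what that validation might look like on the endpoint itself (a minimal sketch, assuming the Defender PowerShell module is available and you have access to the endpoint):

# Minimal sketch: validate on the endpoint whether Defender was actually impacted,
# rather than relying on EDR telemetry alone; assumes the Defender module is available.
$status = Get-MpComputerStatus
$prefs  = Get-MpPreference

[PSCustomObject]@{
    AntivirusEnabled          = $status.AntivirusEnabled
    RealTimeProtectionEnabled = $status.RealTimeProtectionEnabled
    DisableRealtimeMonitoring = $prefs.DisableRealtimeMonitoring
    ExclusionPaths            = ($prefs.ExclusionPath -join ';')
}

# Was Defender disabled by policy well before the threat actor ever showed up?
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows Defender' -Name 'DisableAntiSpyware' -ErrorAction SilentlyContinue

Had something like this been run against even one of the endpoints in question, the finding would have looked very different.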

Artifacts As Composite Objects
Joe Slowik spoke at RSA in 2022, describing indicators, or technical observables, as "composite objects". This is an important concept in DFIR and SOC analysis, as well, and not just in CTI. We cannot base our findings on a single artifact, treating it as a discrete, atomic indicator, such as an IP address just being a location, or tied to a system, or a ShimCache entry denoting time of execution. We cannot view a process command line within EDR telemetry, by itself, as evidence of program execution. Rather, we need to recognize that artifacts are, in fact, composite objects; in his talk, Joe references Mandiant's definition of indicators of compromise, which can help us understand and visualize this concept. 

Composite objects are made up of multiple elements. An IP address is not just a location, as the IP address is an observable with context. Where was the IP address observed, when was it used, and how was it used? Was it the source of an RDP login, or a type 3 (network) login? If the IP address was the source of a successful login, what was the username used? Was the IP address the source of a connection seen in web server or VPN logs? Is it the C2 address? 

If we consider a ShimCache entry, we have to remember that (a) the entry itself does NOT explicitly demonstrate program execution, and that (b) the time stamp is mutable. That is, what we see could have been modified before we saw it. For example, we often see analysts hold up a ShimCache entry as evidence of program execution, often as the sole indicator. We have to understand and remember that the time stamp associated with a ShimCache entry is the last modification time for the entry, taken from the $STANDARD_INFORMATION attribute within the MFT. I've seen several instances where the file is placed on the system and then time stomped (the time stamp is easily mutable) before the entry was added to the Application Compatibility database. This is all in addition to understanding that an entry in the ShimCache does NOT mean that the file was executed. Note that the same is true for AmCache entries, as well.
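For illustration, once you've parsed the MFT record for the file in question, a very simple (and by no means definitive) cross-check is to compare the $STANDARD_INFORMATION time stamps against the $FILE_NAME time stamps; the values below are hypothetical, standing in for your MFT parser's output.

# Hedged sketch: flag a possible time stomp by comparing the easily-modified
# $STANDARD_INFORMATION time stamps against the $FILE_NAME time stamps from a
# parsed MFT record. The values here are hypothetical.
$siModified = [datetime]'2009-03-14 08:22:10'   # $SI last modified (what ShimCache records)
$fnCreated  = [datetime]'2013-05-02 14:37:55'   # $FN creation time

if ($siModified -lt $fnCreated) {
    "Possible time stomping: `$SI modification ($siModified) pre-dates `$FN creation ($fnCreated)"
}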

We can validate indicators of compromise by including them in constellations, including them alongside other associated indicators, as doing so increases fidelity and brings valuable context to our analysis. We see this illustrated when performing searches for PCI data within acquired images; if you just search for a string of 16 characters starting with "4", you're going to get a LOT of results. If you look for strings of characters based on a bank ID number (BIN), length of the string, and if it passes the Luhn check, you're still going to get a lot of results, but not as many. If you also search for the characteristics associated with track 1 and track 2 data, your search results are going to be a smaller set, but with much higher fidelity because we've added layers of context. 
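The Luhn check itself is simple enough to sketch out; something like the following (an illustration, not a production card scanner) shows the kind of layering involved:

# Minimal sketch of a Luhn (mod 10) check; a candidate string has to pass this
# *in addition to* matching a known BIN, the expected length, and track 1/track 2
# formatting before it's worth an analyst's attention.
function Test-Luhn {
    param([string]$Number)
    $digits = $Number.ToCharArray() | ForEach-Object { [int][string]$_ }
    [array]::Reverse($digits)
    $sum = 0
    for ($i = 0; $i -lt $digits.Count; $i++) {
        $d = $digits[$i]
        if ($i % 2 -eq 1) {            # double every second digit from the right
            $d *= 2
            if ($d -gt 9) { $d -= 9 }
        }
        $sum += $d
    }
    return (($sum % 10) -eq 0)
}

Test-Luhn '4111111111111111'    # $true - a well-known test number
Test-Luhn '4111111111111112'    # $false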

Cost
So the question becomes, what is the cost of validating something versus not validating it? What is the impact or result of either? This seems on the surface like it's a silly question, maybe even a trick question. I mean, it looks that way when I read back over the question after typing it in, but then I think back to all the times I've seen when something hasn't been validated, and I have to wonder, what prevented the analyst from validating their finding, rather than simply basing their finding on a single artifact, out of context?

Let's look at a simple example...we receive an alert that a program executed, based on SIEM data or EDR telemetry. This alert can be based on elements of the command line, process parentage, or a combination thereof. Let's say that based on a number of factors and reliable sources, we believe that the command line is associated with malicious activity.

What do you report?

Do you report that this malicious thing executed, or do you investigate further to see if the malicious thing really did execute, and executed successfully? How would we go about investigating this, and what data sources would we look to? 

As you're thinking about this, as you're walking through this exercise, something I'd like you to keep in mind is that question: what would prevent you from actually examining those data sources you identify? Is there some "cost" (effort, time, other resources) that prevents you from doing so?
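As one possible answer, assuming process creation auditing (Security event ID 4688) and/or Sysmon is enabled on the endpoint in question, a quick cross-check might look something like the following; "badthing.exe" is a hypothetical file name:

# Hedged sketch: validate an EDR/SIEM alert for "program execution" against the
# endpoint's own event logs; assumes 4688 auditing and/or Sysmon are enabled.
$target = 'badthing.exe'    # hypothetical

# Security Event Log: process creation events
Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4688 } -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -match $target }

# Sysmon, if deployed: event ID 1, process creation
Get-WinEvent -FilterHashtable @{ LogName = 'Microsoft-Windows-Sysmon/Operational'; Id = 1 } -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -match $target }

Whatever you find (or don't find) there is part of what turns "the alert fired" into a validated finding.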

Saturday, March 25, 2023

Password Hash Leakage

If you've been in the security community for even a brief time, or you've taken training associated with a certification in this field, you've likely encountered the concept of password hashes. The "Reader's Digest" version of password hashes is that passwords are subject to a one-way cryptographic algorithm for storage, and that same algorithm is applied to passwords that are input, and a comparison is made for authentication. The basic idea is that the password is not stored in its original form. 

Now, we 'see' password hashes being collected by threat actors all the time; grab a copy of the AD database, or of Registry hives from an endpoint. Or, why bother with hashes, when you can use NPPSpy or enable WDigest?

Or, if you wanted to maintain unauthenticated persistence, you could enable RDP and Sticky Keys.

Okay, so neither of those last two instances involves password hashes, so what if that's what you were specifically interested in? What if you wanted to get password hashes, or continue to receive password hashes, even across password resets? There are more than a few ways to go about doing this, all of which take advantage of available "functionality"; all you have to do is set up a file or document to attempt to connect to a remote, threat actor-controlled resource.

Collecting hashes is nothing new...check out this InSecure.org article from 1997. Further, hashes can be leaked via an interesting variety of routes and applications; take a look at this Securify article from 2018. Also, consider the approach presented in ACE Responder's tweet regarding modifying Remote Desktop Client .rdp files.

One means of enabling hash leaks across password resets is to modify the iconfilename field in specifically placed LNK/Windows shortcut files, which is similar to what is described in this article, except that you set the IconLocation parameter to point to a threat actor-controlled resource. There's even a free framework for creating shortcuts called "LNKBomb" available online.

Outlook has been a target for NTLM hash leakage attacks; consider this Red Team Notes article from 2018. More recently, Microsoft published this blog article explaining CVE-2023-23397, and how to investigate attempts to exploit the vulnerability. This PwnDefend article shares some thoughts as to persisting hash collection via the Registry, enabling the "long game".

So, What?
Okay, so what's the big deal? Why is this something that you even need to be concerned about?

Well, there's been a great deal of discussion regarding the cyber crime economy, and in particular the ransomware economy, for some time now. This is NOT a euphemism; cyber crime is an economy focused on money. In 2016, the Samas ransomware actors were conducting their own operations, cradle to grave; at the time, they targeted Java-based JBoss CMS systems as their initial access points. Over the years, an economy has developed around initial access, to the point where there are specialists, initial access brokers (IABs), who obtain and sell access to systems and infrastructures. Once initial access is achieved, they will determine what access is available, and to which organization, and it would behoove them to retain that access, if possible. Say they sell access, and the threat actor is "noisy", is caught, and the implant or backdoor placed by the IAB (not the initial access point itself) is "burned". NTLM leakage is a means for ensuring later, repeated access, given that one of the response and remediation recommendations is very often a global password change. If one of the routes into the infrastructure used by the IAB requires authentication, then setting up a means for receiving password hashes enables continued access.

What To Do About It
There are a number of ways to address this issue. First, block outbound communications over ports 139 and 445 (because of course you've already blocked inbound communication attempts over those ports!!), and monitor your logs for attempts to do so.
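On individual Windows endpoints, one way to sketch that out (a sketch only; your environment may already handle this at the perimeter or via GPO, and internal SMB traffic will need to be accounted for) is via the built-in firewall cmdlets:

# Sketch: block outbound SMB/NetBIOS from an endpoint so that hash leak attempts
# (LNK icon locations, UNC paths in docs, etc.) can't reach a threat actor-controlled
# host. Test carefully; legitimate SMB traffic to internal file servers may need
# explicit allow rules scoped by remote address.
New-NetFirewallRule -DisplayName 'Block outbound TCP 445' -Direction Outbound -Protocol TCP -RemotePort 445 -Action Block -Profile Any
New-NetFirewallRule -DisplayName 'Block outbound TCP 139' -Direction Outbound -Protocol TCP -RemotePort 139 -Action Block -Profile Any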

Of course, consider using some means of MFA, particularly for higher privilege access.

If your threat hunting allows for access to endpoints (rather than log sources sent to a SIEM) and file shares, searching for LNK files in specific locations and checking their iconfilename attributes is a good hunt, and something you may want to enable on a regular, repeated basis, much like a security patrol.
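A rough sketch of that kind of hunt, using the WScript.Shell COM object to read the icon location from each shortcut (the paths searched here are examples; point it at the startup folders and file shares you actually care about):

# Hedged sketch: hunt for LNK files whose icon location points at a remote (UNC)
# resource. Paths are examples only.
$paths = @(
    "$env:APPDATA\Microsoft\Windows\Start Menu\Programs\Startup",
    '\\fileserver\share'    # hypothetical file share
)
$shell = New-Object -ComObject WScript.Shell

Get-ChildItem -Path $paths -Filter '*.lnk' -Recurse -ErrorAction SilentlyContinue |
    ForEach-Object {
        $lnk = $shell.CreateShortcut($_.FullName)
        if ($lnk.IconLocation -match '^\\\\') {    # icon loaded from a UNC path
            [PSCustomObject]@{
                Shortcut     = $_.FullName
                IconLocation = $lnk.IconLocation
            }
        }
    }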

For SOC detections, look for means by which this activity...either enabling or using these attempts at hash leakage...might be detected.

From a DFIR perspective, my recommendation would be to develop an evidence intake process that includes automated parsing and rendering of data sources prior to presenting the information to the DFIR analyst. Think of this as a means of generating alerts, but instead of going to the SOC console, these "alerts" are enriched and decorated for the DFIR analyst. This process should include parsing of LNK files within specific locations/paths in the acquired evidence, as parsing all LNK files might not be effective, nor timely.

The "Why" Behind Tactics

Very often we'll see mention in open reporting of a threat actor's tactics, be they "new" or just what's being observed, and while we may consider how our technology stack might be used to detect these tactics, or maybe how we'd respond to an incident where we saw these tactics used, how often do we consider why the tactic was used?

To see the "why", we have to take a peek behind the curtain of detection and response, if you will.

If you so much as dip your toe into "news" within the cyber security arena, you've likely seen mention that Emotet has returned after a brief hiatus [here, here]. New tactics observed in association with the deployment of this malware include the fact that the lure document is an old-style MS Word .doc file, which presents a message instructing the user to copy the file to a 'safe' location and reopen it. The lure document itself is in excess of 500MB in size (padded with zeros), and when the macros are executed, a DLL that is similarly zero-padded to over 500MB is downloaded.

Okay, why was this approach taken? Why pad out two files to such a size, albeit with zeros? 

Well, consider this...SOC analysts are usually front-line when responding to incident alerts, and they may have a lot of ground to cover while meeting SLAs during their shift, so they aren't going to have a lot of time to invest in investigations. Their approach to dealing with the .doc or even the DLL file will be to first download them from the endpoint...if they can. That's right...does the technology they're using have limits on file sizes for download, and if so, what does it take to change that limit? Can the change be made in a timely manner such that the analyst can simply reissue the request to download the file, or does the change require some additional action? If additional action is required, it likely won't be followed up on.

Once they have the file, what are they going to do? Parse it? Not likely. Do they have the tools available, and skills for parsing and analyzing old-style/OLE format .doc files? Maybe. But it's easier to just upload the file to an automated analysis framework...if that framework doesn't have a file size limit of its own.

Oh, and remember, all of that space full of zeros means the threat actor can change the padding contents (flip a single "0" to a "1") and change the hash without impacting the functionality of the file. So...yeah.
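If that sounds abstract, it's trivial to demonstrate (a small file is used here for the sake of the demo; the principle is the same at 500MB, and the path is hypothetical):

# Sketch: flipping a single byte in the zero padding changes the hash without
# touching any functional content.
$path  = 'C:\temp\padded.bin'                     # hypothetical
$bytes = New-Object byte[] (10MB)                 # nothing but zeros
[IO.File]::WriteAllBytes($path, $bytes)
(Get-FileHash $path -Algorithm SHA256).Hash

$bytes[5MB] = 0x31                                # flip one byte deep in the padding
[IO.File]::WriteAllBytes($path, $bytes)
(Get-FileHash $path -Algorithm SHA256).Hash       # a completely different hash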

So, what's happening here is that whether or not it's specifically intended, these tactics are targeting analysts, relying on their lack of experience, and targeting response processes within the security stack. Okay, "targeting" implies intent...let's say, "impacting" instead. You have to admit that when looking at these tactics and comparing them to your security stack, in some cases, these are the effects we're seeing; this is what we see happening when we peek behind the curtain.

Consider this report from Sentinel Labs, which mentions the use of the "C:\MS_DATA\" folder by threat actors. Now, consider the approach taken by a SOC analyst who sees this for the first time; given that some SOC analysts are remote, they'll likely turn to Google to learn about this folder, and find that the folder is used by the Microsoft Troubleshooting tool (TSSv2), and at that point, perhaps deem it "safe" or "benign". After all, how many SOCs maintain a central, searchable repository of curated, documented intrusion intel? For those that do, how many analysts on those teams turn to that repository first, every time? 

How about DFIR consulting teams? How many DFIR consulting teams have an automated process for parsing acquired data, and automatically tagging and decorating it based on intrusion intel developed from previous engagements?

In this case, an automated process could parse the MFT and automatically tag the folder with a note for analysts, with tips regarding how to validate the use of TSSv2, and maybe even tag any files found within the folder.
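Even without a full MFT parsing pipeline, the general idea can be sketched very simply; the intel entries and notes below are placeholders for a curated, documented repository, and in a real pipeline this would run against parsed evidence rather than a live file system:

# Hedged sketch: tag paths of interest with analyst notes drawn from intrusion intel.
$intel = @{
    'C:\MS_DATA'  = 'Used by TSSv2, but also seen staged by threat actors - validate legitimate TSSv2 use'
    'C:\PerfLogs' = 'Placeholder example of another folder of interest'
}

foreach ($path in $intel.Keys) {
    if (Test-Path $path) {
        [PSCustomObject]@{
            Path        = $path
            AnalystNote = $intel[$path]
            Contents    = (Get-ChildItem $path -ErrorAction SilentlyContinue).Name -join ', '
        }
    }
}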

When seeing tactics listed in open reporting, it's not just a good idea to consider, "does my security stack detect this?", but to also think about, "what happens if we do?"

Thursday, March 16, 2023

Threat Actors Changing Tactics

I've been reading a bit lately on social media about how cyber security is "hard" and "expensive", and about how threat actors are becoming "increasingly sophisticated". 

The thing is, going back more than 20 yrs, in fact going back to 1997, when I left military active duty and transitioned to the private sector, I've seen something entirely different. 

On 7 Feb 2022, Microsoft announced their plans to change how the Windows platform (OS and applications) handled macros in Office files downloaded from the Internet; they were planning to block them, by default. Okay, so why is that? Well, it turns out that weaponized Office docs (Word documents, Excel spreadsheets, etc.) were popular methods for gaining access to systems. 

As it turns out, even after all of the discussion and activity around this one, single topic, weaponized documents are still in use today. In fact, March 2023 saw the return of Emotet, delivered via an older-style MS Word .doc file that was in excess of 500MB in size. This demonstrates that even with documented incidents and available protections, these attacks will still continue to work, because the necessary steps to help protect organizations are never taken. In addition to using macros in old-style MS Word documents, the actors behind the new Emotet campaigns are also including instructions to the recipient for...essentially...bypassing those protection mechanisms.

Following the Feb 2022 announcement from Microsoft, we saw some threat actors shift to using disk image files to deploy their malware, due in large part to the apparent dearth of security measures (at the time) to protect organizations from such attacks. For example, a BumbleBee campaign was observed using IMG files to help spread malware.

MS later updated Windows to ensure "mark-of-the-web" (MotW) propagation to files embedded within disk image files downloaded from the Internet, so that protection mechanisms were available for some file types, and that at least warnings would be generated for others.

We then saw a shift to the use of macros in MS OneNote files, as apparently these files weren't considered "MS Office files" (wait...what??).

So, in the face of this constant shifting in and evolution of tactics, what are organizations to do to address these issues and protect themselves? 

Well, the solution for the issue of weaponized Office documents existed well prior to the Microsoft announcement in Feb 2022; in fact, MS was simply implementing it where orgs weren't doing so. And the thing is, the solution was absolutely free. Yep. Free, as in "beer". A GPO, or a simple Registry modification. That's it. 
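For reference, the Registry value behind that GPO looks something like the following (shown here for Word under Office 16.0; to the best of my knowledge, Excel and PowerPoint use the same value name under their own Security keys):

# Sketch: the Registry equivalent of the "Block macros from running in Office files
# from the Internet" policy, shown for Word (Office 16.0).
$key = 'HKCU:\Software\Policies\Microsoft\Office\16.0\Word\Security'
New-Item -Path $key -Force | Out-Null
Set-ItemProperty -Path $key -Name 'blockcontentexecutionfrominternet' -Value 1 -Type DWord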

The issue with the use of disk image files is that when they're received and a user double-clicks them, they're automatically mounted and the contents made accessible to the user. The fix for this...disabling the automatic mounting of image files when the user double-clicks them...is similarly free. With two simple Registry modifications, users are prevented from automatically mounting 4 file types - ISO, IMG, VHD, and VHDX. However, this does not prevent users from programmatically accessing these files, such as via a legitimate business process; all it does is prevent the files from being automatically mounted via double-clicking. 
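As it's been described elsewhere, the modification amounts to adding an empty "ProgrammaticAccessOnly" value beneath the "mount" verb for the ISO/IMG and VHD/VHDX file classes; a sketch:

# Sketch: remove double-click mounting of disk image files while leaving programmatic
# access (e.g., Mount-DiskImage) intact, by adding an empty "ProgrammaticAccessOnly"
# value under the mount verb for each file class.
New-PSDrive -Name HKCR -PSProvider Registry -Root HKEY_CLASSES_ROOT -ErrorAction SilentlyContinue | Out-Null

foreach ($class in 'Windows.IsoFile', 'Windows.VhdFile') {
    $key = "HKCR:\$class\shell\mount"
    if (Test-Path $key) {
        Set-ItemProperty -Path $key -Name 'ProgrammaticAccessOnly' -Value ''
    }
}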

And did I mention that it's free?

What about OneNote files? Yeah, what about them?

My point is that we very often say, "...security is too expensive..." and "...threat actors are increasing in sophistication...", but even with changes in tactics, is either statement really true? As an incident responder, over the years, I've seen the boots-on-the-ground details of attacks, and a great many of them could have been prevented or at the very least significantly hampered had a few simple, free modifications been made to the infrastructure.

The Huntress team posted an article recently that includes Powershell code that you can copy-paste and use immediately, and will address all three of the situations/conditions discussed in this blog post.

Sunday, March 12, 2023

On Using Tools

I've written about using tools before in this blog, but there are times when something comes up that provokes a desire to revisit a topic, to repeat it, or to evolve and develop the thoughts around it. This is one of those posts. 

When I first released RegRipper in 2008, my intention was that once others saw the value in the tool, it would grow organically as practitioners found value in it and sought to expand it. My thought was that once analysts started using it, they'd see the value proposition in the tool, and see that its real power comes from the fact that it can easily be updated; "easily" by either developing new plugins, or seeking assistance in doing so.

That was the vision, but it's not something that was ever really realized. Yes, over time, some have created their own plugins, and of those, some have shared them. However, for the most part, the "use case" behind RegRipper has been "download and RUNALLTHETHINGS", and that's pretty much it.

On my side, there are a few assumptions I've made with respect to those using RegRipper, specifically around how they were using it. One assumption has been that whoever downloaded and is using the tool has a purposeful, intentional reason for doing so, that they understand their investigative goals and understand that there's value in using tools like RegRipper to extract information for analysis, to validate other findings and add context, and to use as pivot points into further analysis. 

Another assumption on my part is that if they don't find what they're looking for, don't find something that "helps", or don't understand what they do find, that they'll ask. Ask me, ask someone else. 

And finally, I assume that when they find something that either needs to be updated in a plugin, or a new plugin needs to be written to address something, that they'll do so (copy-paste is a great way to start), or reach out to seek assistance in doing so.

Now, I'm assuming here, because it's proved impossible to engage others in the "community" in a meaningful conversation regarding tool usage, but it appears to me that most people who use tools like RegRipper assume that the author is the expert, that they've done and seen everything, that they know everything, and that they've encapsulated all of that knowledge and experience in a free tool. The thing is, I haven't found that to be the case in most tools, and that is most definitely NOT the case when it comes to RegRipper.

Why would anyone need to update RegRipper? 

Lina recently tweeted about the need for host forensics, and she's 10,000% correct! SIEMs only collect those data sources that are pointed at them, and EDR tools can only collect and alert on so much. As such, there are going to be analysis gaps, gaps that need to be filled in via host forensics. And as we've seen over time, a lot changes about various endpoint platforms (not just Windows). For example, we've been aware of the ubiquitous Run keys and how they're used for persistence; however, there are keys that can be used to disable the Run key values (Note: the keys and values can be created manually...) without modifying the Run key itself. As such, if you're checking the contents of the Run key and stating that whatever is listed in the values was executed, without verifying/validating that information, then is this correct? If you're not checking to see if the values were disabled (this can be done via reg.exe), and if you're not validating execution via the Shell-Core and Application Event Logs, then is the finding correct? I saw the value in validating findings when determining the "window of compromise" during PCI forensic exams, because the finding was used to determine any regulatory fines levied against the merchant.
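As a sketch of what that Run key cross-check might look like on a live endpoint (the interpretation of the StartupApproved binary data below...a first byte of 0x03 meaning "disabled"...is my understanding, and is itself something to validate):

# Hedged sketch: list Run key values alongside the corresponding StartupApproved
# entries, flagging values that appear to have been disabled. Validate against the
# Shell-Core and Application Event Logs before stating anything as a finding.
$run      = Get-ItemProperty 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Run'
$approved = Get-ItemProperty 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Explorer\StartupApproved\Run' -ErrorAction SilentlyContinue

foreach ($name in ($run.PSObject.Properties.Name | Where-Object { $_ -notmatch '^PS' })) {
    $state = 'no StartupApproved entry'
    if ($approved -and $approved.$name) {
        $state = if ($approved.$name[0] -eq 3) { 'disabled' } else { 'enabled' }
    }
    [PSCustomObject]@{ Name = $name; Command = $run.$name; State = $state }
}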

My point is that if you're running a tool and expecting it to do everything for you, then maybe there needs to be a re-examination of why the tool is being run in the first place. If you downloaded RegRipper 6 months ago and haven't updated it in any way since then, is it still providing you with the information you need? If you haven't added new plugins based on information you've been seeing during analysis, at what point does the tool cease to be of value? If you look closely at the RegRipper v3.0 distro available on Github, you'll notice that it hasn't been updated in over 2 1/2 yrs. I uploaded a minor update to the main engine a bit ago, but the plugins themselves exist as they were in August 2020. Since then, I've been developing an "internal" custom version of RegRipper, complete with MITRE ATT&CK and category mappings, Analysis Tips, etc. I've also started developing plugins that output in JSON format. However, all of these are things that either I proposed in 2019 and got zero feedback on, or someone close to me asked about. Not a week goes by when I don't see something online, research it, and it ends up in a plugin (or two, or five...).

If you're using a tool, any tool (RegRipper, plaso, etc.), do you understand its strengths and weaknesses, do you understand what it does and does not do, or do you just assume that it gives you what you need?