Sunday, October 10, 2021

Data Exfiltration, Revisited

I've posted on the topic of data exfiltration before (here, etc.), but it's often a good idea to revisit the topic. After all, it was almost two years ago that we saw the first instance of ransomware threat actors stating publicly that they'd exfiltrated data from systems, using this as a secondary means of extortion. Since then, we've continued to see this tactic used, along with other tertiary means of extortion based on data exfiltration. We've also seen several instances where the threat actor ransom notes have stated that data was exfiltrated, but the public "shaming" sites were noticeably empty.

As long as I've been involved in what was first referred to as "information security" (and later as "cyber security"), data exfiltration has been a concern to one degree or another, even in the absence of clearly-stated and documented analysis goals. With the advent of PCI forensic investigations (circa 2007-ish), "data exfiltration" became a formalized and documented analysis goal for every investigation, whether the merchant asked for it or not. After all, of what value was the collected data if the credit card numbers were extracted from memory and left sitting on the server? Data exfiltration was/is a key component of the crime, and as such, it was often assumed without being clearly identified.

One of the challenges of determining data exfiltration is visibility; systems and networks may simply not be instrumented in a manner that allows us to determine if data exfiltration occurred. By default, Windows systems do not offer a great number of data sources and artifacts that demonstrate data exfiltration, either definitively or indirectly. Those that do exist are very often neither clearly understood nor investigated by those who then state, "...there was no evidence of data exfiltration observed...", in their findings.

Many years ago, I responded to an incident where an employee's home system had been compromised and a keystroke logger installed. The threat actor observed through the logs that the employee had remote access to their work infrastructure, and proceeded to use the same credentials to log into the corporate infrastructure. These were all Windows XP and 2003 systems, so artifacts (logs and other data sources) were limited in comparison to more modern versions of Windows, but we had enough indicators to determine that the threat actor had no idea where they were. The actor conducted searches that (when spelled correctly) were unlikely to prove fruitful...the corporate infrastructure was for a health care provider, and the actor was searching for terms such as "banking" and "password". All access was conducted through RDP, and as such, there were a good number of artifacts populated when the actor accessed files.

At that point, data exfiltration could have occurred through a number of means. The actor could have opened a file, and taken a picture or screen capture of their own desktop...they could have "exfiltrated" the data without actually "moving" it.

Jump forward a few years, and I was working on an APT investigation when EDR telemetry demonstrated that the threat actor had archived files...the telemetry included the password used in the command line. Further investigation led us to a system with a publicly-accessible IIS web server, albeit without any actual formal web sites being served. Web server logs illustrated that the threat actor downloaded zipped archives from that system successfully, and file system metadata indicated that the archive files were deleted once they'd been downloaded. We carved unallocated space and recovered a dozen accessible archives, which we were able to open using the password observed in EDR telemetry. 
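
Web server logs, by the way, are one of the few data sources that can definitively demonstrate exfiltration. Here's a minimal sketch of that kind of log review, written against W3C-format IIS logs; the log path is hypothetical, and sc-bytes only appears if that field is enabled in the IIS logging configuration:

    # Sketch: flag successful requests for archive files in IIS W3C logs.
    # The "#Fields:" header is parsed so column order doesn't matter.
    import glob

    ARCHIVE_EXTS = (".zip", ".rar", ".7z")

    for logfile in glob.glob(r"C:\inetpub\logs\LogFiles\W3SVC1\*.log"):
        fields = []
        for line in open(logfile, encoding="utf-8", errors="replace"):
            if line.startswith("#Fields:"):
                fields = line.split()[1:]   # column names follow "#Fields:"
                continue
            if line.startswith("#") or not fields:
                continue
            row = dict(zip(fields, line.split()))
            uri = row.get("cs-uri-stem", "").lower()
            if uri.endswith(ARCHIVE_EXTS) and row.get("sc-status") == "200":
                # sc-bytes is only present if enabled in the IIS config
                print(row.get("date"), row.get("time"), row.get("c-ip"),
                      uri, row.get("sc-bytes"))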

In another instance, we observed that the threat actor had acquired credentials and was able to access OWA, both internally and externally. What we saw the threat actor do was access OWA from inside the infrastructure, create a draft email, attach the data to be exfiltrated to the email, and then access the email from outside of the infrastructure. At that point, they'd open the draft email, download the attachment, and delete the draft email. 

When I first began writing books, my publisher had an interesting method for transferring manuscript files. They sent me instructions for accessing their FTP site via Windows Explorer (as opposed to the command line), which left remnants on the system well beyond the lifetime of the book itself.
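
Those remnants, by the way, are worth checking for during an exam. Here's a minimal sketch against a copy of a user's NTUSER.DAT hive, using the python-registry module; the Software\Microsoft\FTP\Accounts key path is an assumption to verify against your own test data:

    # Sketch: list FTP hosts/usernames recorded when FTP sites are
    # accessed via Windows Explorer. Key path is an assumption to verify.
    from Registry import Registry

    reg = Registry.Registry("NTUSER.DAT")    # triage copy of the user hive
    try:
        accounts = reg.open("Software\\Microsoft\\FTP\\Accounts")
        for host in accounts.subkeys():
            users = [u.name() for u in host.subkeys()]
            print(host.name(), users, "last write:", host.timestamp())
    except Registry.RegistryKeyNotFoundException:
        print("No Explorer FTP remnants found")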

My point is that there are a number of ways to exfiltrate data from systems, and detection of data exfiltration can be extremely limited without the necessary visibility. However, there are data sources on Windows systems that can provide definitive indications of data exfiltration (e.g., BITS upload jobs, web server logs, email, network connections/pcaps in memory dumps and hibernation files, etc.), as well as potential indications of data exfiltration (e.g., shellbags, SRUM, etc.). These data sources are relatively easy (almost trivial) to check, and in doing so, you'll have a comprehensive approach to addressing the issue.
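
As an example of one of those checks, BITS client events are relatively easy to review once the Microsoft-Windows-Bits-Client/Operational log has been collected. Here's a minimal sketch using the python-evtx module; the event IDs (59/60, transfer started/stopped) and the "url" data element name are assumptions to verify against your own test data, and the job type (download vs. upload) may need to be determined from related job-creation events:

    # Sketch: pull BITS transfer events from a collected .evtx file.
    # Verify event IDs and data element names against test data.
    import re
    from Evtx.Evtx import Evtx

    EVTX = "Microsoft-Windows-Bits-Client%4Operational.evtx"  # triage copy

    with Evtx(EVTX) as log:
        for record in log.records():
            xml = record.xml()
            m = re.search(r"<EventID[^>]*>(\d+)</EventID>", xml)
            if m and m.group(1) in ("59", "60"):
                url = re.search(r'<Data Name="url">([^<]+)</Data>', xml)
                print(m.group(1), url.group(1) if url else "(no url)")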

Friday, October 08, 2021

Tips for DFIR Analysts, pt III

Learn to think critically. Don't take what someone says as gospel, just because they say it. Support findings with data, and clearly communicate the value or significance of something.

Be sure to validate your findings, and never rest your findings on a single artifact. Find an entry for a file in the AmCache? Great. But does that mean it was executed on the system? No, it does not...you need to validate execution with other artifacts in the constellation (EDR telemetry, host-based effects such as an application prefetch file, Registry modifications, etc.).
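
As a concrete example, here's a minimal sketch that lists AmCache entries using the python-registry module; the key path and value name apply to more recent Windows 10 builds (older builds nest entries under Root\File\{volume GUID} instead), and remember, an entry here demonstrates presence, not execution:

    # Sketch: enumerate AmCache entries. Key paths and value names vary
    # across Windows builds; an entry alone does NOT demonstrate execution.
    from Registry import Registry

    reg = Registry.Registry("Amcache.hve")   # triage copy of the hive
    key = reg.open("Root\\InventoryApplicationFile")
    for entry in key.subkeys():
        try:
            path = entry.value("LowerCaseLongPath").value()
        except Registry.RegistryValueNotFoundException:
            path = "(no path value)"
        print(entry.timestamp(), path)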

Have a thorough process, one that you can add to and extend. Why? Because things are always changing, and there's always something new. If you can automate your process, so much the better...you're not losing time to crushing inefficiencies. So what do you need to look for? Well, the Windows Subsystem for Linux has been around for some time, and has even been updated (to WSL2). There are a number of versions of Linux you can install via WSL2, including Parrot OS. As one would expect, there's now malware targeting WSL2 (Lumen Black Lotus Labs, TomsHardware, The Register).
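
A quick check for registered WSL distributions is one example of extending your process; each user's NTUSER.DAT holds one GUID subkey per distro beneath the Lxss key (the key path and value names are worth verifying against your own test systems):

    # Sketch: list installed WSL distributions from a user's NTUSER.DAT.
    from Registry import Registry

    reg = Registry.Registry("NTUSER.DAT")    # triage copy of the user hive
    try:
        lxss = reg.open("Software\\Microsoft\\Windows\\CurrentVersion\\Lxss")
        for distro in lxss.subkeys():
            name = distro.value("DistributionName").value()
            base = distro.value("BasePath").value()
            print(name, "->", base)
    except Registry.RegistryKeyNotFoundException:
        pass                                 # no WSL distros registered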

Learn to communicate clearly and concisely. This includes both the written and spoken form. Consider using the written form to make the spoken form easier to communicate, by first writing out what you want to communicate.

Things are not always what they seem. Just because someone says something is a certain way doesn't make it the case. It's not that they're lying; more often than not, it's that they have a different perspective. Look at it this way...a user will have an issue, and you'll ask them to walk through what they did, to see if you can replicate the issue. You'll see data that indicates that they took a specific action, but they'll say, "I didn't do anything." What they mean is that they didn't do anything unusual or different from what they do on a daily basis.

There can often be different ways to achieve the same goal, different routes to the same ending. For example, Picus Security shared a number of different ways to delete shadow copies, which included resizing the VSC storage to be less than what was needed. From very cursory research, if a VSC does not fit into the available space, it gets deleted. This means that previously created VSCs will likely be deleted, breaking brittle detections that look only for vssadmin.exe being used to directly delete the VSCs (see the sketch below). Interestingly enough, I found this as a result of this tweet asking about a specific size (i.e., 401MB).
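
A detection that keys on the behavior, rather than on one specific command line, is more robust. Here's a minimal sketch over collected process command lines; the patterns are illustrative, not exhaustive (wmic, PowerShell, and COM-based variants also exist):

    # Sketch: flag both the direct and the "resize" variants of shadow
    # copy removal in collected command lines.
    import re

    PATTERNS = [
        r"vssadmin.*delete\s+shadows",                 # direct deletion
        r"vssadmin.*resize\s+shadowstorage.*maxsize",  # starve the storage
        r"wmic.*shadowcopy\s+delete",
    ]

    def flag_vsc_tampering(cmdline):
        return any(re.search(p, cmdline, re.I) for p in PATTERNS)

    # Hypothetical telemetry:
    print(flag_vsc_tampering(
        "vssadmin resize shadowstorage /for=C: /on=C: /maxsize=401MB"))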

Another example of different approaches to the same goal is using sc.exe vs reg.exe to control Windows services. Different approaches may be predicated upon the threat actor's skill set, or may be based on what the threat actor knows about the environment (i.e., situational awareness). Perhaps the route taken was due to the threat actor knowing the blind spots of the security monitoring tools, or of the analysts responding to any identified incidents.

There are also different ways to compromise a system...via email, the browser, any accessible services, even WSL2 (see above).

A long-standing issue within, or challenge of, DFIR has been the signal-to-noise ratio (SNR). In the early days of DFIR, circa Win2000/WinXP systems, the issue had to do with limited data...limited logging, limited tracking of user activity, etc. As a result, there were limited artifacts to tie to any particular activity, making validation (and to an extent, attribution) difficult, at best.

As DFIR has moved to the enterprise and analysts have begun engaging with EDR telemetry, we've also seen surges in host-based artifacts, not only between versions of Windows (XP, to Win7, to Win10) but also across builds within Windows 10...and now Windows 11 is coming out. With more people coming into DFIR, there's been a corresponding surge in the need to understand the nature of these artifacts, particularly within the context of other artifacts. This has all led to a perfect storm: increases in available data (more applications, more Windows Event Logs, more Registry entries, more data sources) and, at the same time, a compounding need to correctly and accurately understand and interpret those artifacts.

This situation can be addressed, but it requires a cultural change. What I mean is that a great deal of the parsing, enrichment, and decoration of available data sources can be automated, but without DFIR analysts baking what they've discovered...new constellations, or elements of constellations...back into the process, the entire effort becomes pointless beyond the initial creation. What allows automation such as this to continue to add value over time is that it is developed, from the beginning, to be expanded: new data sources can be added, but so can new findings.

Hunting isn't just for threat hunters. We most often think of "threat hunting" as using EDR telemetry to look for "badness", but DFIR analysts can do the same thing using an automated approach. Whether full images are acquired or triage data is collected across an enterprise, the data sources can be brought to a central location, parsed, enriched, decorated, and then presented to the analyst with known "badness" tagged for viewing and pivoting. From there, the analyst can delve into the analysis much sooner, with greater context, and develop new findings that are then baked back into the automated process.
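
Here's a minimal sketch of what such a pipeline might look like; the checks and the event layout are hypothetical, and the point is simply that new findings get baked back in as new checks:

    # Sketch: an extensible parse/enrich/decorate pipeline. Each "check"
    # examines a parsed event and returns a tag (or None); analysts bake
    # new findings back in by appending new checks.
    CHECKS = []

    def check(func):
        CHECKS.append(func)      # registration decorator
        return func

    @check
    def vsc_tampering(event):
        if "resize shadowstorage" in event.get("cmdline", "").lower():
            return "possible VSC deletion via storage resize"

    @check
    def defender_disabled(event):
        if event.get("value_name") == "DisableAntiSpyware":
            return "Windows Defender disabled via Registry"

    def decorate(events):
        for event in events:
            event["tags"] = [t for f in CHECKS if (t := f(event))]
            yield event

    # Hypothetical usage, with events parsed from triage data:
    for e in decorate([{"cmdline": "vssadmin resize shadowstorage /maxsize=401MB"}]):
        print(e["tags"])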

Addendum, 9 Oct: Early in the above blog post, I stated, "Find an entry for a file in the AmCache? Great. But does that mean it was executed on the system? No, it does not...you need to validate execution with other artifacts in the constellation...". After I posted a link to this post on LinkedIn, a reader responded with, "...only it does."

However, this tweet states, "Amcache entries are created for executables that were never executed. Executables that were launched and then deleted aren't recorded. Also, Amcache entries aren't created for executables in non-standard locations (e.g., "C:\1\2\") _unless_ they were actually executed." 

Also, this paper states, on the second half of pg 24 of 66, "The appearance of a binary in the File key in AmCache.hve is not sufficient to prove binary execution but does prove the presence of the file on the system." Shortly after that, it does go on to say, "However, when a binary is referenced under the Orphan key, it means that it was actually executed." As such, when an analyst states that they found an entry "in" the AmCache.hve file, it is important to state clearly where it was found...specificity of language is critical. 

Finally, in recent instances I've engaged with analysts who've stated that an entry "in" the AmCache.hve file indicated program execution, and yet, no other artifacts in the constellation (Prefetch file, EDR telemetry, etc.) were found. 

EDR Bypasses

During my time in the industry, I've been blessed to have opportunities to engage with a number of different EDR tools/frameworks at different levels. Mike Tanji offered me a look at Carbon Black before carbonblack.com existed, while it still used an on-prem database. I spent a very good deal of time working directly with Secureworks Red Cloak, and I've seen CrowdStrike Falcon and Digital Guardian's framework up close. I've seen the birth and growth of Sysmon, as well as MS's "internal" Process Tracking (which requires an additional Registry modification to record full command lines). I've also seen Nuix Adaptive Security up close (including seeing it used specifically for threat hunting), which rounds out my exposure. So, I haven't seen all tools by any stretch of the imagination, but more than one or two.
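
As an aside, the Registry modification in question enables command line capture in Security Event ID 4688 (process creation) records; here's a minimal sketch for checking it on a live system (the value path is per Microsoft's documentation; run with appropriate privileges):

    # Sketch: check whether full command lines are recorded in process
    # creation (4688) events on a live Windows system.
    import winreg

    PATH = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\Audit"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PATH) as key:
            val, _ = winreg.QueryValueEx(key,
                "ProcessCreationIncludeCmdLine_Enabled")
            print("Command line capture enabled:", val == 1)
    except FileNotFoundError:
        print("Value not set; 4688 events will not include command lines")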

Vadim Khrykov shared a fascinating tweet thread regarding "EDR bypasses". In the thread, Vadim lists three types of bypasses:

1. Technical capabilities bypass - focusing on telemetry EDR doesn't collect
2. EDR configuration bypass - the EDR is configured to collect less telemetry, often because an "aggressive" config would impact system performance
3. EDR detection logic bypass - EDR collects the telemetry but there is no specific detection to alert on the technique used

Vadim's thread got me to thinking about bypasses I've seen or experienced over the years....

1. Go GUI

Most EDR tools are really good about collecting information about new processes that are created, which makes them very valuable when the threat actor has only command line access to the system, or opts to use the command line. However, a significant blind spot for EDR tools is when GUI tools are used, because in order to access the needed functionality, the threat actor makes selections and pushes buttons, which are not registered by the EDR tools. This is a blind spot, in particular, for EDR tools that cannot 'see' API calls.

This does not apply just to GUI tools, either; EXE and DLL files can either run external commands (which are picked up by EDR tools) or access the same functionality via API calls (which are not).

The overall effect is to defeat analysts who may not be looking at artifact constellations. That is to say, analysts should be validating tool impacts; if an action occurred, what are the impacts of that action on the ecosystem (i.e., Registry modifications, Windows Event Log records, some combination thereof, etc.)? This way, we can see the effects of an action even in the absence of telemetry specific to that action. For example, did a button push lead to a network connection, modify firewall settings, or establish persistence via WMI? We may not know that the button was pushed, but we would still see the artifact constellations (even partial ones) of the impact of that button push.

Take Defender Control v1.6, for example. This is a simple GUI tool with a couple of buttons that allows the user to disable or enable Windows Defender. EDR telemetry will show you when the process is created, but without looking further for Registry modifications and Windows Event Log records, you won't know if the user opened it and then closed it, or actually used it to disable Windows Defender.
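
Here's a minimal sketch of that "looking further", run against a collected SOFTWARE hive; the value names are those commonly reported for tools like Defender Control, and are assumptions to verify through your own testing (the Defender/Operational Event Log, e.g., event ID 5001 for real-time protection being disabled, is the other half of the constellation):

    # Sketch: check a collected SOFTWARE hive for policy values that
    # disable Windows Defender. Verify value names via testing.
    from Registry import Registry

    reg = Registry.Registry("SOFTWARE")      # triage copy of the hive
    CHECKS = [
        ("Policies\\Microsoft\\Windows Defender", "DisableAntiSpyware"),
        ("Policies\\Microsoft\\Windows Defender\\Real-Time Protection",
         "DisableRealtimeMonitoring"),
    ]
    for key_path, value_name in CHECKS:
        try:
            key = reg.open(key_path)
            val = key.value(value_name).value()
            print(key_path, value_name, "=", val,
                  "last write:", key.timestamp())
        except (Registry.RegistryKeyNotFoundException,
                Registry.RegistryValueNotFoundException):
            pass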

2. Situational awareness 

While I was with CrowdStrike, I did a lot of public presentations discussing the value of threat hunting. During these presentations, I included a great deal of telemetry taken directly from the Falcon platform, in part to demonstrate the situational awareness (or lack thereof) of the threat actor. We'd see some who didn't seem to be aware, or care, that we were watching, and we'd see some who were concerned that someone was watching (checking for the existence of Cb on the endpoint). We'd also see threat actors who not only sought out which specific EDR platform was in use, but also reached out remotely to determine other systems on which that platform was not installed, and then moved laterally to those systems.

I know what you're thinking...what's the point of EDR if you don't have 100% coverage? And you're right to think that, but over the years, for a variety of reasons, more than a few organizations impacted by cyber attacks have had limited EDR coverage. This may have to do with Vadim's reason #2, or it may have to do with a basic reluctance to install EDR capability at all (over concerns about the possibility of Vadim's reason #2...).

3. Vadim's #2++

To extend Vadim's #2 a bit, another thing I've seen over the years is customers deploying EDR frameworks on only a very limited subset of systems.

I've also seen where deploying EDR within an impacted organization's environment has been inhibited by...well...the environment, or the staff. I've seen an AD admin refuse to allow EDR to be installed on _his_ systems because we (my team) might remove it at the end of the engagement and leave a backdoor. I've seen users in countries with very strict privacy laws refuse to allow EDR to be installed on _their_ systems.

I've seen EDR installed and run in "learning" mode during an engagement, so that the EDR "learned" that the threat actor's actions were "good".

One of the biggest variants of this "bypass" is an EDR that is sending alerts to the console, but no one is watching. As odd as it may sound, this happens considerably more often than you'd think.

EDR is like any other tool...its value depends heavily upon how it's used or employed. When you look closely at such tools, you begin to see their "blind spots", not just in the sense of things that are not monitored, but also in the sense of how significantly DFIR work enhances visibility into a compromise or incident.