Sunday, March 27, 2022

Scheduled Tasks and Batteries

Krzysztof recently shared another blog post, this one addressing the battery use and battery level of a system, and how that information applies to an investigation.

At first thought, I'm sure a lot of you are asking, "wait...what?", but think about it for a moment. Given the pandemic, a lot of folks are working remote...a LOT. There are a number of firms that are international, with offices in a lot of different countries all over the world, and a great many of those folks are working remotely. Yes, we've always had remote workers and folks working outside of office environments, but the past 2+ years have seen something of a forced explosion in remote workers.

Those remote workers are using laptops.

And it's likely that they're not always connected to a power supply; that is, there will be times when the systems are running on batteries. As such, Krz's blog post is a significant leap forward in the validation of program execution. After all, Krz points out one particular artifact in his blog post, describing it as "one of the few artifact providing process termination." (emphasis added)

So, why does this matter? Well, a couple of years ago (okay, more than "a couple") I was working a PCI forensic examination for an organization (a "merchant") that had been hit with credit card theft. In examining the back office server (where all of the credit card purchases were processed), we found that there was indeed credit card theft malware on the system. We found the original installation date, which was a key component of the examination; this is because one of the dashboard items we had to complete on the report (Visa, then the placeholder for the as-yet-unformed PCI Council, had very structured requirements for reports) was the "window of compromise"...how long was it from the original infection until the theft of data was halted? So, again, we saw the original installation date of the malware in late November of that year, but two days later, we could see that an "on demand" AV scan detected and quarantined the malware. Then, a bit more than six weeks later, the malware was again placed on the system, and this time we tracked repeated AV scans that did not detect this new malware.

We made absolutely sure to clearly note this in the "window of compromise". Why? Because most merchants have a pretty good idea of the numbers of credit cards processed throughout the year, particularly during different seasons (spring break, other holidays, etc.). We were able to clearly demonstrate that during the Thanksgiving and Christmas holiday seasons, the malware was, in fact, not installed and running on the system. This means that during "Black Friday", as well as the run-up to Christmas, the malware was not stealing credit card numbers from this merchant. We needed to make absolutely sure that this was understood, so that when any action was taken or fines were levied against the merchant, this fact was taken into account.

This is just one example of why we need to validate program execution. Krz's blog post clearly demonstrates yet another. Over the past 2+ years, there's been an explosion of remote work, one that has persisted. Employee systems clearly serve as an entry point into organizations, and as Krz pointed out, many threat actors opt to use Scheduled Tasks as their persistence mechanism.

As Krz pointed out, the Task Scheduler UI has options for setting these battery-related conditions; however, the LOLBin schtasks.exe does not provide command line options for enabling or disabling them. So, when schtasks.exe is used to create a scheduled task, these conditions are going to be set to the defaults Krz shared.

As Krz pointed out, there are some ways to change this; for example, if the threat actor has access via the UI (logged in via RDP, or any other remote desktop capability), they can open the Task Scheduler and make the change. Another way is to use the schtasks.exe command line option to create the scheduled task from an XML file. Or, you can use PowerShell, which gives you greater control over the conditions of the scheduled task.

From a threat hunting perspective, look for either the use of schtasks.exe with the "/xml" command line option, or for alternate means of creating scheduled tasks that allow for modification of the conditions. For example, PowerShell's Set-ScheduledJobOption cmdlet includes parameters such as "-RequireNetwork" and "-StartIfOnBattery".

From a DFIR perspective, analysts can either scan all scheduled task XML files for those set to not stop running when the system goes to batteries, or simply open the XML file for the suspicious task and view it manually. Anytime analysts see the Task Scheduler UI being opened (observed during DFIR analysis), they might want to consider taking a closer look at any changes to tasks that may have occurred.
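Scanning the task XML files for these settings can be sketched in a few lines of Python; the sample XML below is illustrative, and treating an absent element as "true" reflects the schtasks.exe defaults Krz described, so verify that assumption against your own test data:

```python
# Sketch: flag scheduled task XML definitions whose settings allow the
# task to start or keep running on battery power. On a live system, task
# XML files live under C:\Windows\System32\Tasks.
import xml.etree.ElementTree as ET

NS = {"t": "http://schemas.microsoft.com/windows/2004/02/mit/task"}

def battery_conditions(task_xml: str) -> dict:
    """Return the battery-related settings from a task's XML definition."""
    root = ET.fromstring(task_xml)
    settings = root.find("t:Settings", NS)
    result = {}
    for tag in ("DisallowStartIfOnBatteries", "StopIfGoingOnBatteries"):
        elem = settings.find(f"t:{tag}", NS) if settings is not None else None
        # Assumption: absent elements are treated as true (the schtasks.exe
        # default behavior) -- validate against your own testing
        result[tag] = (elem.text.strip().lower() == "true") if elem is not None else True
    return result

SAMPLE = (
    '<Task xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">'
    '<Settings>'
    '<DisallowStartIfOnBatteries>false</DisallowStartIfOnBatteries>'
    '<StopIfGoingOnBatteries>false</StopIfGoingOnBatteries>'
    '</Settings></Task>'
)
```

Tasks where both settings come back False have been deliberately configured to survive on battery power, and deserve a closer look.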

Something else to consider during DFIR analysis, particularly when it comes to malware persisting as a scheduled task, is the idle state of the system. I'll just leave that right there, because it applies directly to what Krz described in his blog post, and impacts the validation of program execution in a very similar manner.

Thursday, March 24, 2022

Windows Event Log Evasion Review

Before I kick this blog post off, I'd like to thank Lina L for her excellent work in developing and sharing her research, both on Twitter, as well as in a blog post. Both are thoughtful, cogent, and articulate.

In her blog post, Lina references detection techniques, something that is extremely important for all analysts to understand. What Lina is alluding to is the need for analysts to truly understand their tools, and how they work.

Back around 2007-ish, the team I was on had several members (myself included) certified to work PCI forensic investigations. Our primary tool at the time for scanning acquired images and data for credit card numbers (CCNs) was EnCase (at the time, a Guidance Software product)...I believe version 6.19 or thereabouts. We had a case where JCB and Discover cards were included, but the tool was not finding the CCNs. Troubleshooting and extensive testing revealed that the built-in function, isValidCreditCard(), did not apply to those CCNs. As such, we worked with a trusted resource to write the necessary regexes, and override the function call. While this was slower than using the built-in function, accuracy took precedence over speed.
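I no longer have the exact regexes we used, but the approach can be sketched in Python; the BIN ranges below are the commonly documented ones for Discover and JCB, and are illustrative rather than exhaustive:

```python
# Sketch: regexes for 16-digit Discover and JCB card numbers, plus a Luhn
# (mod 10) check to cut false positives -- the same general approach we
# used to override the built-in isValidCreditCard() function.
import re

# Discover: 6011, 644-649, or 65 prefixes; JCB: 3528-3589 prefixes
DISCOVER = re.compile(r"\b(?:6011|64[4-9]\d|65\d\d)\d{12}\b")
JCB      = re.compile(r"\b35(?:2[89]|[3-8]\d)\d{12}\b")

def luhn_valid(ccn: str) -> bool:
    """Standard Luhn check digit validation."""
    total = 0
    for i, ch in enumerate(reversed(ccn)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_ccns(text: str) -> list:
    """Return Luhn-valid Discover/JCB candidates found in the text."""
    hits = DISCOVER.findall(text) + JCB.findall(text)
    return [c for c in hits if luhn_valid(c)]
```

The Luhn pass is what keeps a scan like this from drowning the analyst in sixteen-digit false positives.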

The point is, as analysts, we need to understand how our tools work, what they do, and what they can and cannot do. This also applies to the data sources we rely on, as well. As such, what I'm going to do in this blog post is expand on some of the items Lina shared in part 3 of her blog post, "Detection Methodology". She did a fantastic job of providing what amounts to elements of an artifact constellation when it comes to the evasion technique that she describes.

Let's take a look at some of the detection methodologies Lina describes:

1. Review event logs for 7045, 7035, 7034, 7036, 7040, 4697 service creation

Most of these event IDs appear in the System Event Log, with a source of "Service Control Manager" (event ID 4697 is found in the Security Event Log) and they all can provide indications of and useful information about a service.

2. Review registry keys for services:

Given the content of Lina's post, this is an excellent source of data. In the face of anti-forensic techniques, also consider extracting deleted keys and values from unallocated space within the hive file, as well. Also, look for unusual LastWrite times for the parent keys, particularly if you're performing a live "triage" response and collecting Registry hives from live systems.

3. Review command line logging for signs of services being created using “sc.exe”, “at.exe”

I'm not sure about how "at.exe" would be used to create a service (maybe we can get Lina to share a bit more about that particular item...), but definitely look for the use of "sc.exe" if you have some sort of command line logging. This can take the form of enabling Process Tracking (along with the associated Registry modification to add the full command line) in the Security Event Log, installing Sysmon, or employing an EDR capability.
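If you do have command line logging in place (4688 records, Sysmon, or EDR telemetry), a hunt for these patterns might look something like the following sketch; the patterns and labels are assumptions to adapt to your own data:

```python
# Sketch: hunt parsed command-line telemetry for service creation via
# sc.exe, or scheduled task creation from an XML file (schtasks /xml).
import re

PATTERNS = [
    ("sc.exe service creation", re.compile(r"\bsc(?:\.exe)?\s+(?:\\\\\S+\s+)?create\b", re.I)),
    ("schtasks XML-based task", re.compile(r"\bschtasks(?:\.exe)?\s+.*?/create\b.*?/xml\b", re.I)),
]

def hunt(command_lines):
    """Yield (label, command_line) for each suspicious match."""
    for cmd in command_lines:
        for label, pat in PATTERNS:
            if pat.search(cmd):
                yield (label, cmd)
```

The first pattern also allows for the optional remote-system form ("sc \\\\host create ..."), which is worth watching for in lateral movement scenarios.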

4. Review artefacts for execution i.e. Shimcache, Prefetch, Amcache

Prefetch files are a great artifact if you're looking for artifacts of execution, but there are a couple of things analysts need to keep in mind. 

First, Prefetch files can contain up to 8 time stamps that refer to when the target file was executed; as such, be sure to extract them all. 
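As a hedged sketch, pulling those eight timestamps might look like the following; the 0x80 offset applies to decompressed version 26/30 (Win8+) Prefetch files, and Windows 10 .pf files are MAM-compressed and must be decompressed first, so validate the offsets against your own data:

```python
# Sketch: extract the (up to) eight last-run FILETIMEs from a Win8+
# format Prefetch file. Offsets are version-dependent.
import struct
from datetime import datetime, timedelta, timezone

WIN_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_dt(ft: int) -> datetime:
    """FILETIME is 100-nanosecond intervals since 1601-01-01 UTC."""
    return WIN_EPOCH + timedelta(microseconds=ft // 10)

def last_run_times(pf_data: bytes, offset: int = 0x80):
    """Return all non-zero last-run timestamps from the 8-slot array."""
    fts = struct.unpack_from("<8Q", pf_data, offset)
    return [filetime_to_dt(ft) for ft in fts if ft]
```

The point of the 8-slot array is exactly the one made above: pulling only the "last run" time can silently discard seven prior executions.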

Second, application prefetching is controlled via a Registry value (EnablePrefetcher), and while that value is enabled by default on workstation versions of Windows, it is not enabled by default on the server versions. Accordingly, this means that it can be disabled on workstations; as such, if you're NOT seeing a Prefetch file when you would expect to, check the value and the LastWrite time of the key. I highly recommend this as a means of validation, because without it, saying, "...the file was not executed..." is just guessing. Think about it...why delete a file or artifact, when you can simply tell Windows to not generate it?

Third, Dr Ali Hadi has done some great work documenting the effect of launching files from ADSs on Prefetch file creation and location. I'm not sure how many folks are aware of this, but it's something to keep in mind, particularly if a threat actor or group has previously demonstrated a proclivity for such things.

Finally, it's easy to say that ShimCache and AmCache entries constitute "artifacts of execution", but that's not completely accurate. While both data sources do constitute indicators of execution, they're just that...indicators...and should not be considered in isolation. While they may be incorporated into artifact constellations that demonstrate program execution, by themselves they do not definitively demonstrate execution.

So, if we create an artifact constellation, incorporating ShimCache, AmCache, Prefetch, command line logging, etc., and all of the constituent elements include time stamps that align, then yes, we do have "artifacts of execution". 

7. Detect the malicious driver on the disk (can also be done without memory forensics by correlating creation timestamps on disk)

This is a good detection technique, but we need to keep things like file system tunneling and time stomping in mind. I've seen cases where the adversary time stomped their malware, and when it was 'recorded' in the ShimCache data, the analyst (mis)interpreted the time stamp as the time of execution. This meant that during a PCI forensic investigation, the "window of compromise" reported to the PCI Council was 4 yrs, rather than the true value of 3 wks. For anyone who's been involved in such an investigation, you very likely fully understand the significance of this "finding" being reported to the Council.

Some Additional Thoughts, Re: Windows Event Log Evasion
Locard's Exchange Principle tells us that when two objects come into contact with each other, material is exchanged between them. We can apply this equally well to the digital realm, and what this ultimately boils down to is that for something bad to happen on a system, something has to happen.

As a result, when it comes to Windows Event Log "evasion", we would want to look for other changes to the system. For example, if I want to hide specific event records within the Security Event Log, there are three approaches I might take. One is to use the LOLBin auditpol.exe to enable ALLTHETHINGS!, and overwhelm the Security Event Log, and subsequently, the analyst. I've been on engagements before where so much was being audited in the Security Event Log that while we had data going back weeks or months in the other logs, the Security Event Logs covered maybe 4 hours. I've also seen much less, about half of that. The overall point here is that the Windows Event Logs are essentially circular buffers, where older event records aren't "pushed off the stack" and into unallocated space...they're overwritten. As such, "older" records aren't something you're going to be able to recover from unallocated space, as you would in the case of cleared Windows Event Logs. As a result, things like type 10 login events (event ID 4624 events) can get overwritten quickly, and will not be recoverable.
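The circular-buffer behavior can be illustrated with a few lines of Python:

```python
# Illustration: once a fixed-size log fills, each new record silently
# overwrites the oldest one -- the overwritten records never land in
# unallocated space, so there is nothing to carve later.
from collections import deque

class EventLog:
    def __init__(self, max_records: int):
        self.records = deque(maxlen=max_records)  # oldest entries drop off

    def write(self, record):
        self.records.append(record)

log = EventLog(max_records=3)
for rec_id in range(1, 6):          # write five records into a 3-slot log
    log.write(f"record {rec_id}")
# Only the newest three survive; records 1 and 2 are gone for good
```

This is exactly why flooding the Security Event Log is an effective evasion: the "deletion" happens as a side effect of normal logging.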

Clearing Windows Event Logs is easy (via wevtutil.exe), but it is also easy to detect, based on artifact constellations such as a Prefetch file for the LOLBin, event IDs 1104 and 102, etc. And, as we saw with a great deal of the work done following the NotPetya attacks in 2017, cleared event records are pretty trivial to recover from unallocated space. I say, "trivial", but I completely understand that for some, something like this would not be trivial at all. While we do have a process available to help us recover the data we need, it isn't easy for some. However, clearing Windows Event Logs is tantamount to a streaker running across the field at the Super Bowl or the World Cup...a way of letting everyone know that something bad is going on.

The second option might be to simply disable logging altogether. You can do this via the Registry, without disabling the Event Log service, and the great thing is that if there's nothing deleted, there's nothing to recover. ;-)  

The third option is a bit more involved, and as such, potentially prone to discovery, but it is pretty interesting. Let's say you're accessing the system via RDP, and you have admin-level access. You can use different techniques to escalate your privileges to System level, but you have to also be aware that when you do, your activities will populate artifacts in a different manner (but this can be a good thing!!). So, start by using wevtutil.exe to back up all but the last 4 hours of the Security Event Log, using a command such as:

wevtutil epl Security Security_export.evtx /q:"*[System[TimeCreated[timediff(@SystemTime) >= 14400000]]]" /ow:True
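A quick check of the threshold in that query (timediff() values are expressed in milliseconds):

```python
# timediff(@SystemTime) is in milliseconds; 14400000 ms works out to four
# hours, so the query exports everything EXCEPT the most recent four
# hours of records.
threshold_ms = 14_400_000
hours = threshold_ms / (1000 * 60 * 60)
print(hours)   # 4.0
```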

Now, this is where the elevated privileges come in...you need to stop the EventLog service (via 'net stop'), but you first have to stop its dependent services. Once everything is stopped, type:

copy /Y Security_export.evtx C:\Windows\system32\winevt\Logs\Security.evtx

Now, restart all the services, in the reverse order that you stopped them, and delete the Security_export.evtx file. You've now got a Security Event Log that is missing the last 4 hours' worth of event records, and the "interesting" thing is that when I tested this, and then dumped the Security Event Log, there was no gap in record sequence numbers. The log apparently picks right up with the next sequence number.

But again, keep in mind that this activity, just like the other options presented, is going to leave traces. For example, if you use PSExec to elevate your privileges to System, you're going to need to accept the EULA the first time you run it. You can use something else, sure...but there's going to be a Prefetch file created, unless you disable prefetching. If you disable Prefetching, this causes a Registry modification, modifying the key LastWrite time. And if you disable Prefetching, what about the "command line logging" Lina mentioned?

Ultimately, the option you decide upon, even if it's one that's not mentioned in either Lina's blog post or this one, is going to depend upon weighing your options regarding the artifacts and traces you leave.

Sunday, March 06, 2022

The (Mis)Use of Artifact Categories, pt II

My previous post on this topic presented my thoughts on how the concept of "artifact categories" was being misused.

My engagement with artifact categories goes back to 2013, when Corey Harrell implemented his thoughts on categories via auto_rip. I saw, and continue to see, the value in identifying artifact categories, but as I alluded to in my previous post, it really seems that the categories are being misused. Where the artifacts should be viewed as providing an indication of the categories and requiring further analysis (including, but not limited to the population of artifact constellations), instead, the artifacts are often misinterpreted as being emphatic statements of the event or condition occurring. For example, while an entry in the ShimCache or AmCache.hve file should indicate that the file existed on the system at one point and may have been executed, too often either one is simply interpreted as "program execution", and the analyst moves on with no other validation. Without validation, these "findings" lead to incorrect statements or understanding of the incident itself.

Program Execution
There was discussion of the "program execution" category in my previous post, along with discussion of the need to validate that the program did, indeed, execute. Often we'll see some indication of a program or process being launched (via EDR telemetry, an entry in the Windows Event Log, etc.) and assume that it completed successfully.

Keeping that in mind, there are some less-than-obvious artifacts we can look to regarding indications of "program execution"; for example, consider the Microsoft-Windows-Shell-Core\Operational.evtx Event Log file.

Some notable event IDs of interest (all with event source "Shell-Core"):

Event ID 9705/9706 - start/finish processing of Run/RunOnce keys

Event ID 9707/9708 - indicates start and stop of process execution, with corresponding PID.

Event ID 62408/62409 - start/finish processing of <process>

Some additional, useful information - from Geoff Chappell's site, we can see that the event ID 9707 records pertain to the "ShellTraceId_Explorer_ExecutingFromRunKey_Info" symbol.

While the events are restricted to what's processed via the Run and RunOnce keys, we may see the records provide us with full process command lines. For example, an entry for an audio processor included "WavesSvc64.exe -Jack", which appears to be the full command line for the process. This is a bit better information than we tend to see when Process Tracking is enabled in the Security Event Log, leading to event ID 4688 (Process Creation) records being generated; many organizations will enable this setting, but then not also enable the corresponding Registry modification (the ProcessCreationIncludeCmdLine_Enabled value) that allows the full command line to be added to the record.

"Program Execution" is not the only category where we should look beyond the basic indicators that different resources provide analysts; for example, devices connected to a Windows system, particularly via USB connections, are another category that would benefit from some clarification and exploration.

USB Devices
Cory Altheide and I published a paper on tracking USB devices across Windows systems in 2005; at the time, Windows XP was the dominant platform. Obviously, that field has changed pretty dramatically in the ensuing 17 years. In 2015, Jason Shaver's NPS master's thesis on the same topic was published, giving us an updated view of the topic. As you'd expect, Jason's thesis is a pretty thorough treatment of the subject, but anyone who's been involved in DFIR analysis of Windows systems is likely aware that there have been a number of build updates to Windows 10 that, over time, have changed how the operating system responds to or records various activities.

In addition, Nicole Ibrahim has done considerable work regarding different types of USB devices and their impact on Windows systems, based on the protocols used. The take-away from Nicole's research is that not all devices are treated the same, just because they're connected via a USB cable. Different devices (i.e., USB thumb drives, smartphones, etc.), while presented via a common interface, often use different underlying protocols, and therefore require different examination routes.

A note on connecting USB devices to Windows systems...Kevin Ripa recently published an article on that topic where he points out that how you connect the device can make a difference. The key here is to be very aware of the process Kevin went through, as well as his assumptions...both of which he was very clear about. For example, the first device he connected wasn't a USB thumb drive; rather, it was a hard drive connected via a USB cable. Then, he connected the same hard drive via a T-300D Super Speed Toaster, and stated, "Now I don't know about you, but I would have expected a device such as this to simply be a pass-through device." I'd have to say, given that the device is described as a stand-alone disk duplicator, I would not have assumed that it would "simply be a pass-through device" at all. As such, I don't agree that the tools are telling lies, but I do agree that if you're connecting a device to a Windows 10 system, you need to be clear about the device, as well as how it's connected.

Back to Nicole's research, the take-away is that not all "USB" devices are the same. For example, a USB thumb drive is not the same as a smartphone, even though both are connected via a USB cable. For USB thumb drives and external hard drives, I've found a good bit of evidence in the Microsoft-Windows-StorageSpaces-Driver\Operational.evtx Event Log, specifically in event ID 207, which contains drive model and serial number information.

For my iPhone, the Microsoft-Windows-WPD-MTPClassDriver\Operational.evtx Event Log contains event ID 1005 records with messages such as "Customizing for <device>", which point to the smartphone. I've connected my iPhone to my computer upon occasion, to copy pictures over or to use iTunes to update my music playlists on the device. These event ID 1005 records correspond to when I've connected that phone to the system, and will likely show other iPhones I've connected, as well - I don't do this too often, so the log should not be too terribly populated.

For other useful Windows Event Logs that can provide additional information that may be of use to your investigation, consider checking out Nasreddinne or Adam's pages, both of which allude to "obscure" or less-commonly-referred-to event logs.

What we have to remember from all this is that the artifact categories provide an indication of that category, a sign post that tells the analyst that further examination is necessary. As such, the indicator is the beginning of the examination, not the end. Because a file name appeared in the ShimCache or AmCache, we should not simply jump ahead and assume that the file was executed; rather, we should look for other indications of program execution (Prefetch file, impact of the file execution on the system, etc.), other artifacts in the constellation, before establishing the finding that the program was, in fact, executed.

DFIR Reporting

A request that's been pretty consistent within the industry over time has had to do with reporting. I'd see a request, some responses, someone might ask for a template, and then the exchange would die off...I assumed that it had moved to DMs or offline. Then you'd see the discussion pop up again later, in some other forum.

I get it...writing is hard. I have the benefit of having had to write throughout my career, but also of putting intentional, dedicated effort into DFIR reporting, in that I had been very purposeful in seeking feedback from my boss, and incorporating that feedback into report writing. I was able to get to the point of having reports approved with minimal (if any) changes pretty quickly. 

As a result, in 2014, Windows Forensic Analysis Toolkit 4/e was published, and in this edition, I included a chapter on reporting. It was (and still is) a general overview addressing a lot of things that folks miss when it comes to technical reporting, going from the importance of spelling and grammar to the nature of an "Executive Summary" and beyond.

About a year ago, Josh Brunty wrote a blog post on Writing DFIR Reports; seven years after my book was published, his post included some of the same thoughts and recommendations I'd shared. It was validating to see that others had had (and shared) similar thoughts regarding reporting. Different words, different experiences, different person, but similar thoughts and direction. Cool.

So why does any of this matter? Not to put too fine a point on it, but it doesn't matter how good or thorough you are, it doesn't matter if you're technically light years beyond the bad guys. If you can't communicate your findings to those to whom it matters, in an actionable manner...who cares? What does any of it matter?

Coworkers and others in the community have long chided me for my insistence on correct spelling and use of terminology (i.e., specificity of language), in some instances saying, "yeah, well you know what I meant...". Okay, but a report or Jira ticket regarding an incident is not intended to be a placeholder for "what you meant", because 6 months or a year from now, you may not remember what you meant. Or, as is more often the case, someone other than you who's looking at the report will get the wrong information, because they're bringing a whole new set of understanding to "what you meant".

As to spelling, let's say you absent-mindedly drop the last digit from the last octet of an IP address...rather than ".245", it's ".24". If you were to search for that IP address across other data sources, you'd still find it, even with the final digit missing. But what happens when you drop that IP address into a block list? Or, what happens if (and I ask because this has happened) the digit is mistakenly dropped from the second octet? All of it fails...searches, blocks, etc.

At the very least, misspelling words simply makes the report or ticket appear unprofessional, particularly given the number of tools out there with built-in spell check. However, misspelling can also lead to material impacts to the information itself, altering the course of the investigation. Misspelling a system name or IP address can be the difference between an AD server and a user's workstation. Misspelling an IP address or domain means that it's not actually blocked, and traffic continues to flow to and from that remote system, unabated. The issue is that when the report or ticket is the means of communication, intention and "you know what I meant" are impossible to convey.
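One way to catch that kind of transcription error before a report or ticket ships is to mechanically cross-check every IP address in the document against the vetted IOC list; a hypothetical sketch:

```python
# Sketch: extract IPv4 addresses from report text and diff them against
# the vetted IOC list. A dropped digit still yields a "valid-looking" IP,
# so format validation alone won't catch it -- only a cross-check will.
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def ioc_mismatches(report_text: str, ioc_list):
    """Return IPs that appear on one side but not the other."""
    found = set(IPV4.findall(report_text))
    vetted = set(ioc_list)
    return {
        "in_report_not_vetted": sorted(found - vetted),
        "vetted_not_in_report": sorted(vetted - found),
    }
```

Any entry in either bucket is worth a second look before the block list goes live.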

When I got started in the private sector information security business back around 1997-1998, I was working with a team that was performing vulnerability assessments. We had a developer on our team who'd developed an Access database of report excerpts based on various findings; we could go into the interface and select the necessary sections (based on our findings) and auto-populate the initial report. We had a couple of instances where someone either wrote up a really good finding, or re-wrote a finding that we already had in a truly exceptional manner, and those entries were updated in the database so that everyone could take advantage of them. Our team was small, and everyone was on-board with the understanding that generating the initial report was not the end of our reporting effort.

However, we had one forensic analyst as part of our "team"...they didn't go on vulnerability assessments with us, they did their "own thing". But when their work was done, it usually took two analysts from the assessment team to unravel the forensic findings and produce a readable report for the customer. This meant that resources were pulled from paying work in order to ensure that the contract for the forensic work made some money and didn't go into the red.

In a world where remote work, and work-from-home (WFH) is becoming much more the norm, being able to clearly, concisely, and accurately communicate is even more important. As such, if you're writing something that someone else is going to pick up later that day or week, and use as the next step for their work, then you want to make sure that you communicate what needs to be communicated in a manner that they can understand and use. 

At the very least, you want to communicate in such a manner that you can come back a year later and understand what you'd meant. I say this, and like with most things in "cyber", we all look at this as a hypothetical, something that may happen, but it's not likely. Until it does. I worked with one team where we said that regularly - someone would continually ask, "what's the standard?" for writing, and we'd say, "...that you can come back to it a year later and understand what was going on, and correctly explain what happened." This was what our boss, a former cop, had said. And no one listened, until an analyst we'll call "Dave" had to review a report that had been written 12 months prior. He couldn't make heads or tails of the report...what exactly had been examined, what was the original analyst looking for, and what did they find? All of this was a mystery, not just to "Dave" but to other analysts who reviewed the report. The reports were all written from the company, to the customer, but we tracked down the original analyst...it was Dave. The customer was raising a legal complaint against our company, and we had no way of figuring out if they had a legitimate complaint or not, because no one, not even Dave, could figure out what had been delivered.