Sunday, April 16, 2023

On Validation, pt II

My first post on this topic didn't result in a great deal of engagement, but that's okay. I wrote the first post with part II already loaded in the chamber, and I'm going to continue with this topic because, IMHO, it's immensely important. 

I've seen, more times than I care to count, findings and reports going out the door without validation. I saw an analyst declare attribution in the customer's parking lot, as the team was going on-site, only to be proven wrong and the customer opting to continue the response with another team. Engagements such as this are costly to the consulting team, through brand damage and lost revenue, as well as to the impacted organization, through delays and additional expense to reach containment and remediation, all while a threat actor is active on their network.

When I sat down to write the first post, I had a couple more case studies lined up, so here they are...

Case Study #3
Analysts were investigating incidents within an organization, and as part of the response, they were collecting memory dumps from Windows endpoints. They had some information going into the investigations regarding C2 IP addresses, based on work done by other analysts as part of the escalation process, as well as from intel sources and open reporting, so they ran ASCII string searches for those IP addresses against the raw memory dumps. When the searches returned no hits, they declared in the tickets that there was no evidence of C2 connections.

What was missing from this approach is the fact that the operating system and applications do not employ IP addresses as ASCII strings. Yes, you may see an IP address in a string that starts with "HTTP://" or "HTTPS://", but by the time the operating system translates and ingests the IP address for use, it's converted to 4 bytes and stored as part of a structure. Tools like Volatility provide the capability to search for certain types of structures that include IP addresses, and bulk_extractor carves other types of network structures, with the end result being a *.pcap file.
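
As a quick illustration, here's a minimal sketch (the dump file name and IP address are placeholders) of the difference between searching a raw memory dump for the ASCII string and searching for the packed, 4-byte form of the same address:

import mmap
import socket

c2_ip = "203.0.113.42"                                 # placeholder; substitute the C2 address of interest
needles = {
    "ASCII string": c2_ip.encode(),
    "packed 4 bytes (network order)": socket.inet_aton(c2_ip),
}

with open("memdump.raw", "rb") as f:                   # placeholder memory dump file name
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    for label, needle in needles.items():
        count, pos = 0, mm.find(needle)
        while pos != -1:
            count += 1
            pos = mm.find(needle, pos + 1)
        print(f"{label}: {count} hit(s)")
    mm.close()

Volatility and bulk_extractor go further by parsing the actual structures rather than just scanning for bytes, but even this simple comparison shows why a string search alone isn't a valid basis for "no evidence of C2".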

In this case, as is often the case, the analysts' findings were part of an overall, corporate-wide process, one that includes follow-on determinations such as "control efficacy": identifying how effectively the various controls and solutions within the security tech stack prevent, detect, and respond to incidents. Simply stating in the ticket that "no evidence of communication with the C2 IP address was found" is potentially incorrect, and it doesn't address how that determination was made. If no evidence of communications from the endpoint was found, then is there any reason to submit a block for the IP address on the firewall? Is there any reason to investigate further to determine whether a prevention or detection control failed?

In the book Investigating Windows Systems, one of the case studies involves both an image and a memory dump; using the tools mentioned above, evidence of connections to an IP address was found in the memory dump that was not found in the application logs within the image. What this demonstrates is that evidence can be found using entirely different approaches, and that not employing the full breadth of what an analyst has available to them leaves the analysis incomplete.

Case Study #4
Let's look at another simple example - as a DFIR analyst, you're examining either data collected from an endpoint or an acquired image, and you see a Run key value that is clearly malicious; you've seen this one before in open reporting. You see the same file path, the same file name.

What do you report?

Do you report, "...the endpoint was infected with <malicious thing>...", or do you validate this finding? 

Do you:
- determine if the file pointed to by the value exists
- determine if the Run key value was disabled  <-- wait, what??
- review the Microsoft-Windows-Shell-Core/Operational Event Log to see if the value was processed
- review the Application Event Log, looking for crash dumps, WER or Application Popup records for the malware
- review the Security Event Log for Process Creation events (if enabled)
- review Sysmon Event Log (if available)
- review the SRUM db for indications of the malware using the network

If not, why not? Is it too much of a manual process to do so? Can the playbook not be automated through the tools or suite you have available, or via some other means?
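
Several of the checks above lend themselves to scripting. Here's a rough sketch, assuming the python-registry module and an image mounted at a placeholder path; the Run key value name is a hypothetical stand-in for whatever appeared in open reporting:

from pathlib import Path
from Registry import Registry

MOUNT = Path("/mnt/image_c")            # placeholder mount point for the acquired image
SUSPECT = "WindowsUpdater"              # placeholder Run key value name from open reporting

software = Registry.Registry(str(MOUNT / "Windows/System32/config/SOFTWARE"))

# Is the value present, and what does it point to?
run_key = software.open("Microsoft\\Windows\\CurrentVersion\\Run")
for val in run_key.values():
    if val.name().lower() == SUSPECT.lower():
        target = val.value()
        print(f"Run value '{val.name()}' -> {target}")

        # Does the file it points to actually exist within the image?
        # (Crude path handling; real command lines need more careful parsing.)
        rel = target.strip('"').replace("C:\\", "").replace("\\", "/")
        print("  file present in image:", (MOUNT / rel).exists())

# Was the value disabled? Entries under the StartupApproved key whose binary
# data begins with an even byte (e.g., 0x02) are enabled; an odd first byte
# (e.g., 0x03) indicates the value was disabled and won't be processed.
try:
    approved = software.open(
        "Microsoft\\Windows\\CurrentVersion\\Explorer\\StartupApproved\\Run")
    for val in approved.values():
        if val.name().lower() == SUSPECT.lower():
            print(f"  StartupApproved first byte: 0x{val.value()[0]:02x}")
except Registry.RegistryKeyNotFoundException:
    pass

Working through the Shell-Core, Application, Security, and Sysmon Event Logs, as well as the SRUM database, rounds out the playbook, but even this much moves "I saw the Run key value" toward a validated finding.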

But Wait, There's More...
During my time as a DFIR analyst, I've seen command lines used to create Windows services, followed by the "Service Control Manager/7045" record in the System Event Log indicating that a new service was installed. I've also seen those immediately followed by a "Service Control Manager/7009" or "Service Control Manager/7011" record, indicating that the service failed to start, rather than the "Service Control Manager/7036" record you might expect. Something else we need to do, going beyond simply reporting that "a Windows service was installed", is look for indications of Windows Error Reporting events related to the image executable, application popups, or application crashes.
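
A rough sketch of lining up the service-install records with the failure-to-start records, assuming the python-evtx module and a System.evtx exported from the image (the file name is a placeholder):

import re
import Evtx.Evtx as evtx

OF_INTEREST = {"7045": "service installed",
               "7009": "timeout waiting for the service to connect",
               "7011": "transaction timeout waiting for the service",
               "7036": "service entered the running/stopped state"}

with evtx.Evtx("System.evtx") as log:          # placeholder export from the image
    for record in log.records():
        xml = record.xml()
        if "Service Control Manager" not in xml:
            continue
        m = re.search(r"<EventID[^>]*>(\d+)</EventID>", xml)
        if m and m.group(1) in OF_INTEREST:
            # Pull whatever strings appear in the EventData for context
            data = re.findall(r"<Data[^>]*>([^<]+)</Data>", xml)
            print(record.timestamp(), m.group(1), OF_INTEREST[m.group(1)], data)

Sorted by time, a 7045 followed closely by a 7009 or 7011 (rather than a 7036) jumps right out, which is the difference between "a service was installed" and "a service was installed but never ran".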

I've seen malware placed on systems that was detected by AV, but the AV was configured to "take no action" (per AV log messages), so the malware executed successfully. We were able to observe this within the acquired image by validating the impacts on the file system, Registry, Windows Event Log, etc.

I've seen threat actors push malware to multiple systems; in one instance, the threat actor pushed their malware to six systems, but it only successfully executed on four of those systems. On the other two, the Application Event Log contained Windows Error Reporting records indicating that there was an issue with the malware. Further examination failed to reveal the other impacts of the malware that had been observed on the four systems that had been successfully infected.

I worked a PCI case once where the malware placed on the system by the threat actor was detected and quarantined by AV within the first few hours it was on the system, and the threat actor did not return to the system for six weeks. As it happened, those six weeks spanned the Thanksgiving and Christmas holidays, a period of peak purchasing. The threat actor returned after Christmas and placed a new malware executable on the system, one that was not detected by AV, and the incident was detected a week later. In the report, I made it clear that while the threat actor had access to the system, the malware itself was not running and collecting credit card numbers during those six weeks.

Conclusion
In my previous post, I mentioned that Joe Slowik referred to indicators/artifacts as 'composite objects', which is something that, as an industry, we need to understand and embrace. We cannot view artifacts in isolation; rather, we need to consider their nature, which includes both their being composite objects and their place within a constellation. We need to truly embrace the significance of an IP address, a Run key value, or any other artifact when conducting and reporting on analysis.
