Tuesday, May 17, 2022

USB Devices Redux

Back in 2005, Cory Altheide and I published the first paper on tracking USB storage devices across Windows systems; at the time, the focus was Windows XP. A lot has happened since then...I know, that's an understatement...as the Windows platform has developed and expanded, initially with Vista, then Windows 7, and even with Windows 10 there have been developments that have come (and gone) just between the various Win10 builds.

With respect to USB devices in particular, not long ago, we (the community) became aware that the Microsoft-Windows-DriverFrameworks-UserMode/Operational Event Log contained quite a bit of information (see this post for event IDs to track) that a digital forensic analyst could use to determine if and when USB devices had been connected to (and disconnected from) the system. This was a pretty profound finding, and very valuable...and then, for some unknown reason, that Windows Event Log was disabled by default. 

Also, in researching information for this topic, I found that the EMDMgmt key in the Software hive, which is associated with ReadyBoost and provided insight into USB-connected devices, is no longer available either. Okay, so one less artifact, one artifact removed from the constellation...now we just need to adapt.

This is really nothing new, to be honest. DFIR analysts need to be adaptable, regardless of whether we're in a consultant or FTE role. If you're a consultant, you're going to see a new environment on every engagement, and there will be new things to deal with and discover. A while back, a teammate discovered that the customer had LANDesk installed on their systems, and the software monitoring component recorded a great deal of information regarding executed processes right there in the Registry. It's not too different in an internal, FTE role, as you're likely going to run across legacy builds, software loads that haven't been/can't be updated for some reason, applications or packages users have installed specifically to meet the needs of their department or of a customer, etc. Further, we've seen various resources come and go; for example, we got used to having Registry hive backups available during investigations, and then we lost access to them; they're no longer available by default. In addition to the Microsoft-Windows-DriverFrameworks-UserMode/Operational Event Log, the Microsoft-Windows-TaskScheduler/Operational Event Log seems to be disabled by default. As such, when we find that a data source or artifact that we're familiar with and used to using is no longer available, we have to invest in determining an alternate means for determining the same or similar information; we have to rebuild those artifact constellations. 

Something else that has developed over time alongside the development of the Windows platform is how USB devices are treated. For example, Nicole Ibrahim described some time ago how some USB connected devices, specifically smartphones, are treated differently based on the protocol used. Nicole's presentation on the topic can be found here.

The overall point is that we can no longer consider all USB-connected devices to be the same, and as such, we may need to look in different locations within the OS, including different locations within the Registry and within different Windows Event Logs, to find the information pursuant to our analysis goals. Pursuant to this, I sat down with my own local system and started going through the Windows Event Logs, via the Event Viewer, one at a time, looking for indications of connected devices. What I found was that records of connections were dispersed across multiple logs, depending upon the type of device connected (i.e., smartphone/digital camera, ext HDD w/ enclosure, thumb drive, etc.).

As a caveat, these event records are not exclusive; that is to say that the individual event source/ID pairs do not pertain solely to USB connected devices. In many cases, the same event source/ID pair was found to contain information specific to the local physical hard drive, as well as to the different volumes on that hard drive. Also, all of these events are for the latest build of Windows 10 only, because that's all I have to test against.

So here's a look at the five resources I found; keep in mind this is limited based on what I have available to test with, but it should serve as a good starting point...

Event Source: WPD-MTPClassDriver
Event ID: 1005

Fig 1: Event ID 1005

Figure 1 illustrates where I'd connected my iPhone to the system to pull pictures; I also have entries for my iPod (yes, I still have an iPod...) where I wanted to transfer music to my iPhone. Due to the protocol used, this is also where we'd likely find digital cameras, as well.

Event Source: DeviceSetupManager
Event ID: 112

Fig 2: Event ID 112

Figure 2 shows where I'd connected a SanDisk Cruzer thumb drive to the computer.

Event Source: StorageSpaces-Driver
Event ID: 207

Fig 3: Event ID 207

I shared figure 3 because this is specifically an external HDD, one of those "wallet" drives with a small form factor/enclosure. This device is small enough that it's powered from the USB connection; I don't have a larger enclosure that requires an additional power source to test against.

Event Source: Partition
Event ID: 1006

Fig 4: Event ID 1006, XML view

Figure 4 is a bit different, because the "friendly view" simply says, "for internal use only". However, taking a look at the XML view, we see that the SanDisk Cruzer thumb drive and it's serial number pops right out!

Event Source: Ntfs
Event ID: 145, 142

Fig 5: Event ID 145

Figure 5 shows the Ntfs/145 event record (in part) from a previous connection of the SanDisk Cruzer thumb drive. Event ID 142 provides additional information, including regarding the volume assignment (C:, D:, F:, etc.), if the volume is bootable, etc., which can be used to tie to shellbags and Windows Portable Devices artifacts in the Registry.

From a forensic perspective, if you're interested in tracking USB devices connected to systems, I'd recommend enabling the Microsoft-Windows-DriverFrameworks-UserMode/Operational Event Log, forwarding those event records off of the system (for processing via a SIEM), as well as threat hunting or some other mechanism to ensure that if the log is disabled again that this is detected and responded to in the appropriate manner.

Note that enabling the Microsoft-Windows-DriverFrameworks-UserMode/Operational Event Log is as straightforward as setting the "Enabled" value in the HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Channels\Microsoft-Windows-DriverFrameworks-UserMode/Operational key to "1" and rebooting the system. As a tip to DFIR analysts who need to perform verification and to threat hunters, the reverse technique (setting the "Enabled" value to "0") can be used disable normally-available Windows Event Logs.

Additional Resources
HEFC Blog - Sunday Funday, Daily Blog #197

Friday, May 13, 2022

Understanding Data Sources and File Formats

Following on the heels of my previous post regarding file formats and sharing the link to the post on LinkedIn, I had some additional thoughts that would benefit greatly from not blasting those thoughts out as comments to the original post, but instead editing and refining them via this medium.

My first thought was, is it necessary for every analyst to have deep, intimate knowledge of file formats? The answer to that is a resounding "no", because it's simply not possible, and not scalable. There are too many possible file formats for analysts to be familiar with; however, if a few knowledgeable analysts, ones who understand the value of the file format information to DFIR, CTI, etc., document the information and are available to act as resources, then that should suffice. With the format and it's value documented and reviewed/updated regularly, this should act as a powerful resource.

What is necessary is that analysts be familiar enough with data sources and file formats to understand when something is amiss, or when something is not presented by the tools, and know enough recognize that fact. From there, simple troubleshooting steps can allow the analyst to develop a thoughtful, reasoned question, and to seek guidance. 

So, why is understanding file formats important for DFIR analysts?

1. Parsing
First, you need to understand how to effectively parse the data. Again, not every analyst needs to be an expert in all file formats - that's simply impossible. But, if you're working on Windows systems, understanding file formats such as the MFT and USN change journal, and how they can be tied together, is important. In fact, it can be critical to correctly answering analysis questions. Many analysts parse these two file separately, but David and Matt's TriForce tool allowed these files (and the $LogFile) to be automatically correlated.

So, do you need to be able to write an OLE parser when so many others already exist? No, not at all. However, we should have enough of an understanding to know that certain tools allow us to parse certain types of files, albeit only to a certain level. We should also have enough understanding of the file format to recognize that not all files that follow the format are necessarily going to have the same content. I know, that sounds somewhat rudimentary, but there are lot of new folks in the DFIR industry who don't have previous experience with older, perhaps less observed file formats. 

Having an understanding of the format also allows us to ask better questions, particularly when it comes to troubleshooting an issue with the parser. Was something missed because the parser did not address or handle something, or is it because our "understanding" of the file format is actually a "misunderstanding"?

2. The "Other" Data
Second, understanding file formats provides insight into what other data the file format may contain, such as deleted data, "slack", and metadata. Several file formats emulate file systems; MS describes OLE files as "a file system within a file", and Registry hives are described as a "hierarchal database". Each of these file formats has their own method for addressing deleted content, as well as managing "slack". Further, many file formats maintain metadata that can be used for a variety of purposes. In 2002, an MSWord document contained metadata that came back to haunt Tony Blair's administration. More recently, Registry hive files were found to contain metadata that identified hives as "dirty", prompting further actions from DFIR analysts. Understanding what metadata may be available is also valuable when that metadata is not present, as observed recently in "weaponized" LNK files delivered by Emotet threat actors, in a change of TTPs.

3. Carving
Third, "file carving" has been an important topic since the early days of forensic analysis, and analysts have been asked to recover deleted files from a number of file systems. Recovering, or "carving" deleted files can be arduous and error prone, and if you understand file formats, it may be much more fruitful to carve for file records, rather than the entire file. For example, understanding the file and record structure of Windows 2000 and XP Event Logs (.evt files) allowed records to be recovered from memory and unallocated space, where carving for entire files would yield limited results, if any. In fact, understanding the record structure allowed for complete records to be extracted from "unallocated space" within the .evt files themselves. I used the successfully where an Event Log file header stated that there were 20 records in the file, but I was able to extract 22 complete records from the file.

Even today, the same holds true for other types of "records", including "records" such as Registry key and value nodes, etc. Rather than looking for file headers and then grabbing the subsequent X number of bytes, we can instead look for the smaller records. I've used this approach to extract deleted keys and values from the unallocated space within a Registry hive file, and the same technique can be used for other data sources, as well.

4. What's "In" The File
Finally, understanding the file format will help understand what should and should not be resident in the file. One example I like to look back on occurred during a PCI forensic investigation; an analyst on our team ran our process for searching for credit card numbers (CCNs) and stated in their draft report that CCNs were found "in" a Registry hive file. As this is not something we'd seen previously, this peaked our curiosity, and some of use wanted to take a closer look. It turned out that what had happened was this...the threat actor had compromised the system, and run their process for locating CCNs. At the time, the malware used would (a) dump process memory from the back office server process that managed CCN authorization and processing to a file, (b) parse the process memory dump with a 'compiled' Perl script that included 7 regex's to locate CCNs, and then (c) write the potential CCNs to a text file. The threat actor then compressed, encrypted, and exfiltrated the output file, deleting the original text file. This deleted text file then became part of unallocated space within the file system, and the sectors that comprised the file were available for reallocation. 

Later, as the Registry hive file "grew" and new data was added, sectors from the file system were added to the logical structure of the file, and some of those sectors were from the deleted text file. So, while the CCNs were found "in" the logical file structure, they were not actually part of the Registry. The CCN search process we used at the time returned the "hit" as well as the offset within the file; a visual inspection of the file via a hex editor illustrated that the CCNs were not part of the Registry structure, as they were not found to be associated with any key or value nodes.

As such, what at first looked like a new threat actor TTP was really just how the file system worked, which had a significant impact on the message that was delivered to Visa, who "ran" the PCI Council at the time.

Tuesday, May 03, 2022

Putting It All Together

It's great when a plan, or a puzzle, comes together, isn't it? 

I'm not just channeling my inner Hannibal Smith...I'm talking about bringing various pieces or elements together to build a cohesive, clear picture, connecting the dots into a cohesive analysis.

To kick this off, Florian had this to say about threat actors moving to using ISO/IMG files as result of Microsoft disabling VBA macros in docs downloaded from the Internet, a change which results in entirely new artifact constellations. After all, a change in TTPs is going to result in changes as to how the system is impacted, and a change in the resultant constellations. So, this sets the stage for our example.

In this case, the first piece of the puzzle is this tweet from Max_Mal_, which points to the BumbleBee campaign (more info from TAG here), described in the Orion Threat Alert. Per the tweet, the infection looks like this:

Zip -> ISO -> LNK -> rundll32.exe (LOLBin) -> Cobalt Strike

This all starts with a zip archive being delivered to or downloaded by the user; however, what's not mentioned or described here are the system impacts. Downloading the archive often (depending upon the process) results in MOTW being "attached" to the zip archive. This tweet thread by Florian Roth includes a couple of resources that discuss MOTW, one of which is an excellent article by Mike Wolfe that provides a really nice explanation and details regarding MOTW. I've been fascinated by NTFS alternate data streams (ADSs) since I first encountered them, in particular how they're used by the OS, as well as by the adversary. As a result, I've been similarly interested in really leveraging MOTW in every way possible.

The other useful component of Florian's thread is this tweet by Nobutaka Mantani regarding MOTW propagation support in archiver software for Windows. This is huge. What it means is that when the ISO file is extracted from the zip archive by one of the software products that supports MOTW propagation, the extracted file "inherits" MOTW, albeit without the same contents as the original MOTW. Rather, the MOTW attached to the ISO file points back to the zip archive. This then gives us a great deal of insight into the origin of the extracted file, even if the zip archive is deleted.

The benefit is that this may provide us with a detection opportunity, something that depends upon the framework and/or approach you're using. For example, it may be a good idea to alert on files being written to suspicious locations (ProgramData folder, user's StartUp folder, etc.) by one of the archiver software packages, where the target file has a MOTW. Or, we can search for such files, via either proactive or DFIR threat hunting, as a means of locating "badness" on systems or within images. Imagine having an automated process that parses the MFT, either from a triage file collection or from an acquired image, and automatically identifies for the analyst all files with MOTW in suspicious locations. MOTW propagation also appears to occur if the downloaded zip archive includes an LNK file (rather than an ISO file), or a batch file, as well. There have been instances where a batch file is extracted from an archive, to the user's StartUp folder, and has an associated MOTW. As such, automating the detection process, via alerts based on EDR telemetry, or via proactive or DFIR hunting, provides for efficiency and consistency in the analysis process.

So, where we once had to deal with weaponized documents, we're now extracting files from an archive, mounting an ISO or IMG file, and accessing the embedded LNK file within the "new volume". All of this results in a completely new artifact constellation, one that we have to understand in order to fully address coming attacks.

Sunday, May 01, 2022

Changes In The Use Of LNK Files

Not long ago, I posted regarding how LNK files can be (ab)used; the post refers to LNK file metadata, and how, if the LNK file is sent by the threat actor, that metadata can be used to learn about the threat actor's environment. I first saw this mentioned by JPCERT in 2016, where they included an interesting graph (figure 1) in their post to illustrate the point.

Tony Lambert recently shared via his blog a change in Emotet TTPs, that the threat actor group had moved to using LNK files as an initial delivery mechanism. In the post, Tony described this as "a really interesting TTP change", and that it was "odd but not unexpected". Tony also shared a link to download a copy of the LNK file, as well as metadata parsed from the LNK sample via EXIFTool. I don't often use EXIFTool for this sort of metadata extraction, and I wanted to take a look for myself...here's what I found:

guid        {00021401-0000-0000-c000-000000000046}
shitemidlist    My Computer/C:\/Windows/system32/cmd.exe
**Shell Items Details (times in UTC)**
 C:0          M:0          A:0         Windows (9)  
 C:0          M:0          A:0         system32 (9)  
 C:0          M:0          A:0         cmd.exe (9)  
commandline    /v:on /c findstr "rSIPPswjwCtKoZy.*" Password2.doc.lnk > "%tmp%\VEuIqlISMa.vbs" & "%tmp%\VEuIqlISMa.vbs"
iconfilename    shell32.dll          
hotkey       0x0               
showcmd      0x1               


GUID/ID pairs:
{46588ae2-4cbc-4338-bbfc-139326986dce}/4   SID: S-1-5-21-1499925678-132529631-3571256938-1001

GUID : {1ac14e77-02e7-4e5d-b744-2eb1ae5198b7}

At first glance, this may not seem odd to anyone; however, for me I can see that a great deal of the metadata that researchers would look to has been stripped from the LNK file.

I'm just going to let that soak in.

Okay, so...what we don't see in the above output, and what I verified visually (via UltraEdit), are time stamps. That's right...the time stamps in the LNK file header, as well as the time stamps in the shell items that comprise the relative path, have all been zero'd out. See the lines in the above output that build the path with the folders for "Windows" and "System32", and the file "cmd.exe"? All of these shell items, by default, include time stamps. However, for this LNK file, they're zero'd out.

Remember that JPCERT article from 2016? None of the items we'd expect to find in an LNK file, those elements of metadata mentioned in the JPCERT article...volume serial number, MAC address, machine ID/NetBIOS name...are available in this LNK file. I'm familiar with what it takes to get to this point, having constructed the smallest functioning LNK file (make_lnk.txt) that I could, and one that is missing these same elements of metadata.

For example, here's a sample from Astaroth, and here are samples of output from LNK files created using various native means. However, all of these LNK files, as well as the LNK files from figs 5 and 6 of the Mandiant APT29 report, contain these items of metadata. The one Tony found does not.

So, many thanks to Tony (LinkedIn, Twitter) for not only pointing out the change in Emotet TTPs, but for also sharing a publicly accessible link to the example LNK file. This was truly a fascinating find!

As a final thought, the lack of metadata does not mean that the same techniques to locate other, similar samples...create a Yara rule and submit as a VT retro-hunt...won't work. In fact, the dearth of the missing metadata (that would be a great title for a blog post series!!) is actually an indicator that we can use for a pretty solid search!