Showing posts with label toolmarks. Show all posts

Saturday, September 03, 2022

LNK Builders

I've blogged a bit...okay, a LOT...over the years on the topic of parsing LNK files, but a subject I really haven't touched on is LNK builders or generators. This is actually an interesting topic because it ties into the cybercrime economy quite nicely. What that means is that there are "initial access brokers", or "IABs", who gain and sell access to systems, and there are "RaaS" or "ransomware-as-a-service" operators who will provide ransomware EXEs and infrastructure, for a price. There are a number of other for-pay services, one of which is LNK builders.

In March, 2020, the Check Point Research team published an article regarding the mLNK builder, which at the time was version 2.2. Reading through the article, you can see that the builder includes a great deal of functionality; there's even a pricing table. Late in July, 2022, Acronis shared a YouTube video describing how version 4.2 of the mLNK builder is available.

In March, 2022, the Google TAG published an article regarding the "Exotic Lily" IAB, describing (among other things) their use of LNK files, and including some toolmarks (drive serial number, machine ID) extracted from LNK metadata. Searching Twitter for "#exoticlily" returns a number of references that may lead to LNK samples embedded in archives or ISO files. 

In June, 2022, Cyble published an article regarding the Quantum LNK builder, which also includes the features and pricing scheme for the builder. The article indicates a possible connection between the Lazarus group and the Quantum LNK builder; similarities in PowerShell scripts may indicate this connection.

In August, 2022, SentinelLabs published an article that mentioned both the mLNK and Quantum builders. This is not to suggest that these are the only LNK builders or generators available, but it does speak to the prevalence of this "*-as-a-service" offering, particularly as some threat actors move away from the use of "weaponized" (via macros) Office documents, and toward the use of archives, ISO/IMG files, and embedded LNK files.

Freeware Options
In addition to creating shortcuts through the traditional means (i.e., right-clicking in a folder, etc.), there are a number of freely available tools that allow you to create malicious LNK files. However, from looking at them, there's little available to indicate that they provide the same breadth of capabilities as the for-pay options listed earlier in this article. Here are some of the options I found:

lnk-generator (circa 2017)
Booby-trapped shortcut (circa 2017) - includes script
LNKUp (circa 2017) - LNK data exfil payload generator
lnk-kisser (circa 2019) - payload generator
pylnk3 (circa 2020) - read/write LNK files in Python
SharpLNKGen-UI (circa 2021) - expert mode includes use of ADSs (GitHub)
Haxx generator (circa 2022) - free download
lnkbomb - Python source, EXE provided
lnk2pwn (circa 2018) - EXE provided
embed_exe_lnk - embed EXE in LNK, sample provided 

Next Steps
So, what's missing in all this is toolmarks; with all these options, what does the metadata from malicious LNK files created using the builders/generators look like? Is it possible that given a sample or enough samples, we can find toolmarks that allow us to understand which builder was used?

Consider this file, for example, which shows the parsed metadata from several samples (most recent sample begins on line 119). The first two samples, from Mandiant's Cozy Bear article, are very similar; in fact, they have the same volume serial number and machine ID. The third sample, beginning on line 91, has a lot of the information we'd look to use for comparison removed from the LNK file metadata; perhaps the description field could be used instead, along with specific offsets and values from the header (given that the time stamps are zero'd out). In fact, besides zero'd out time stamps, there's the SID embedded in the LNK file, which can be used to narrow down a search.
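The comparison described above, looking for samples that share a volume serial number and machine ID, can be sketched in a few lines of Python. This is only a minimal illustration: the field names (`volume_serial`, `machine_id`) and the sample records are hypothetical placeholders, not values from any real LNK file, and the function assumes the metadata has already been parsed into dictionaries.

```python
from collections import defaultdict

def cluster_by_toolmarks(samples, keys=("volume_serial", "machine_id")):
    """Group sample names by the tuple of values found under `keys`.

    Samples missing any of the keys (e.g., scrubbed metadata) are
    collected under None so they can be reviewed separately.
    """
    clusters = defaultdict(list)
    for sample in samples:
        values = tuple(sample.get(k) for k in keys)
        key = values if all(values) else None
        clusters[key].append(sample["name"])
    return dict(clusters)

# Placeholder records; every value here is invented for illustration only.
samples = [
    {"name": "sample1.lnk", "volume_serial": "AAAA-0001", "machine_id": "host-a"},
    {"name": "sample2.lnk", "volume_serial": "AAAA-0001", "machine_id": "host-a"},
    {"name": "sample3.lnk", "volume_serial": None, "machine_id": None},
]
clusters = cluster_by_toolmarks(samples)
```

Samples that land in the `None` bucket are the ones where other toolmarks (description field, header values, an embedded SID) would have to be used instead.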

The final sample is interesting, in that the PropertyStoreDataBlock appears to be well-populated (unlike the previous samples in the file), and contains information that sheds light on the threat actor's development environment.

Perhaps, as time permits, I'll be able to use a common executable (the calculator, Solitaire, etc.), and create LNK files with some of the freeware tools, noting the similarities and differences in metadata/toolmarks. The idea behind this would be to demonstrate the value in exploring file metadata, regardless of the actual file, as a means of understanding the breadth of such things in threat actor campaigns.

Sunday, June 06, 2021

Toolmarks: LNK Files in the news again

As most regular readers of this blog can tell you, I'm a bit of a fan of LNK files...a LNK-o-phile, if you will. I'm not only fascinated by the richness of the structure, but as I began writing a parser for LNK files, I began to see some interesting aspects of intelligence that can be gleaned from LNK files, in particular, those created within a threat actor's development environment, and deployed to targets/infrastructures. First, there are different ways to create LNK files using the Windows API, and what's really cool is that each method has its own unique #toolmarks associated with it!

Second, most often there is a pretty good amount of metadata embedded in the LNK file structure. There are file system time stamps, and often we'll see a NetBIOS system name, a volume S/N, a SID, or other pieces of information that we can use in a VirusTotal retro-hunt in order to build out a significant history of other similar LNK files.

In the course of my research, I was able to create the smallest possible functioning LNK file, albeit with NO (yes, you read that right...) metadata. Well, that's not 100% true...there is metadata within the LNK file. Specifically, the Windows version identifier is still there, and this is something I purposely left. Instead of zero'ing it out, I altered it to an as-yet-unseen value (in this case, 0x0a). You can also alter each version identifier to its own value, rather than keeping them all the same.

Microsoft recently shared some information about NOBELIUM sending LNK files embedded within ISO files, as did the Volexity team. Both discuss aspects of the NOBELIUM campaign; in fact, they do so in a similar manner, but each with different details. For example, the Volexity team specifically states the following (with respect to the LNK file):

It should be noted that nearly all of the metadata from the LNK file has been removed. Typically, LNK files contain timestamps for creation, modification, and access, as well as information about the device on which they were created.

Now, that's pretty cool! As someone who's put considerable effort into understanding the structure of LNK files, and done research into creating the smallest, minimal, functioning LNK file, this was a pretty interesting statement to read, and I wanted to learn more.

Taking a look at the metadata for the reports.lnk file (from fig 4 in the Microsoft blog post, and fig 3 of the Volexity blog post) we see:

guid               {00021401-0000-0000-c000-000000000046}
shitemidlist    My Computer/C:\/Windows/system32/rundll32.exe
**Shell Items Details (times in UTC)**
  C:0                   M:0                   A:0                  Windows  (9)
  C:0                   M:0                   A:0                  system32  (9)
  C:0                   M:0                   A:0                  rundll32.exe  (9)

commandline  Documents.dll,Open
iconfilename   %windir%/system32/shell32.dll
hotkey             0x0
showcmd        0x1

***LinkFlags***
HasLinkTargetIDList|IsUnicode|HasArguments|HasExpString|HasIconLocation

***PropertyStoreDataBlock***
GUID/ID pairs:
{46588ae2-4cbc-4338-bbfc-139326986dce}/4      SID: S-1-5-21-8417294525-741976727-420522995-1001

***EnvironmentVariableDataBlock***
EnvironmentVariableDataBlock: %windir%/system32/explorer.exe

***KnownFolderDataBlock***
GUID  : {1ac14e77-02e7-4e5d-b744-2eb1ae5198b7}
Folder: CSIDL_SYSTEM

While the file system time stamps embedded within the LNK file structure appear to have been zero'd out, a good deal of metadata still exists within the structure itself. For example, the Windows version information (i.e., "9") is still available, as are the contents of several ExtraData blocks. The SID listed in the PropertyStoreDataBlock can be used to search across repositories, looking for other LNK files that contain the same SID. Further, the fact that these blocks still exist in the structure gives us clues as to the process used to create the original LNK file, before the internal structure elements were manipulated.
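To make the "zero'd out time stamps" check concrete, here is a minimal Python sketch that reads only the fixed 76-byte ShellLinkHeader, assuming the layout documented in Microsoft's MS-SHLLINK specification. The function and dictionary names are my own for illustration; this is not how any particular parser does it, and it does not touch the shell items or ExtraData blocks where the SID and version values shown above live.

```python
import struct

# Bit values follow the LinkFlags layout in the MS-SHLLINK specification.
LINK_FLAGS = {
    0x001: "HasLinkTargetIDList",
    0x002: "HasLinkInfo",
    0x004: "HasName",
    0x008: "HasRelativePath",
    0x010: "HasWorkingDir",
    0x020: "HasArguments",
    0x040: "HasIconLocation",
    0x080: "IsUnicode",
    0x100: "ForceNoLinkInfo",
    0x200: "HasExpString",
}

# HeaderSize, LinkCLSID, LinkFlags, FileAttributes, three FILETIMEs
# (creation, access, write), FileSize, IconIndex, ShowCommand, HotKey,
# Reserved1, Reserved2, Reserved3: 76 bytes, little-endian.
HEADER_FORMAT = "<I16sII3QIIIHHII"

def parse_lnk_header(data):
    """Return selected header fields from the first 76 bytes of an LNK file."""
    if len(data) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("data is shorter than a ShellLinkHeader")
    fields = struct.unpack_from(HEADER_FORMAT, data, 0)
    if fields[0] != 0x4C:
        raise ValueError("unexpected HeaderSize; not an LNK header")
    flags = fields[2]
    ctime, atime, mtime = fields[4:7]
    return {
        "flags": [name for bit, name in LINK_FLAGS.items() if flags & bit],
        "timestamps_zeroed": ctime == 0 and atime == 0 and mtime == 0,
        "showcmd": fields[9],
        "hotkey": fields[10],
    }

# Synthetic header built to mirror the flags in the output above.
clsid = bytes.fromhex("0114020000000000c000000000000046")
header = struct.pack(HEADER_FORMAT, 0x4C, clsid, 0x2E1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
info = parse_lnk_header(header)
```

Against a real file, the same function would be called with the first 76 bytes read in binary mode; a `timestamps_zeroed` value of True alongside populated flags is the combination discussed above.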

I'm not sure that this is the first time this sort of thing has happened; after all, the MS blog post makes no mention of metadata being removed from the LNK file, so it's entirely possible that it's happened before but no one's thought that it was important enough to mention. However, items such as ExtraDataBlocks and which elements exist within the structure not only give us clues (toolmarks) as to how the file was created, but the fact that metadata elements were intentionally removed serves as an additional toolmark, and provides insight into the intentions of the actors.

But why use an ISO file? Well, interesting you should ask.  Matt Graeber said:

Adversaries choose ISO/IMG as a delivery vector b/c SmartScreen doesn't apply to non-NTFS volumes

In the ensuing thread, @arekfurt said:

Adversaries can also use the iso trick to put evade MOTW-based macro blocking with Office docs.

Ah, interesting points! The ISO file is downloaded from the Internet, and as such, would likely have a zone identifier ADS associated with it (I say, "likely" because I haven't seen it mentioned as a toolmark), whereas once the ISO file is mounted, the embedded files would not have zone ID ADSs associated with them. So, the decision to use an ISO file was intentional, and not just cool...in fact, it appears to have been intentionally used for defense evasion.
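For reference, the zone identifier ADS is a small INI-style text stream, with a `[ZoneTransfer]` section and a `ZoneId` value (3 being the Internet zone). A minimal sketch of checking its content in Python follows; the function name is my own, and the example text is a hand-written stand-in rather than data from a real download. On an NTFS volume under Windows the stream content can typically be read by opening the path with `:Zone.Identifier` appended.

```python
import configparser

def parse_zone_identifier(text):
    """Return the ZoneId from Zone.Identifier stream content, or None."""
    # interpolation=None so '%' characters in URL values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("ZoneTransfer"):
        return None
    return parser.getint("ZoneTransfer", "ZoneId", fallback=None)

# Hand-written example of stream content for a file from the Internet zone.
example = "[ZoneTransfer]\nZoneId=3\n"
zone = parse_zone_identifier(example)
```

A file copied out of a mounted ISO would simply have no such stream to read, which is the point made above.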

Saturday, April 10, 2021

On #DFIR Analysis, pt II - Describing Artifact Constellations

 I've been putting some serious thought into the topic of a new #DFIR model, and in an effort to extend and expand upon my previous post a bit, I wanted to take the opportunity to document and share some of my latest thoughts.

I've discussed toolmarks and artifact constellations previously in this blog, and how they apply to attribution. In discussing a new #DFIR model, the question that arises is, how do we describe an artifact or toolmark constellation in a structured manner, so that it can be communicated and shared?  

Of course, the next step after that, once we have a structured format for describing these constellations, is automating the sharing and "machine ingestion" of these constellation descriptions. But before we get ahead of ourselves, let's discuss a possible structure a bit more. 

The New #DFIR Model

First off, to orient ourselves, figure 1 illustrates the proposed "new" #DFIR model from my previous blog post. We still have the collect, parse, and enrich/decorate phases prior to the output and data going to the analyst, but in this case, I've highlighted the "enrich/decorate" phase with a red outline, as that is where the artifact constellations would be identified.

Fig 1: New DFIR Model 
We can assume that we would start off by applying some known constellation descriptions to the parsed data during the "enrich/decorate" phase, so the process of identifying a toolmark constellation should also include some means of pulling information from the constellation, as well as "marking" or "tagging" the constellation in some manner, or facilitating some other means of notifying the analyst. From there, the expectation would be that new constellations would be defined and described through analysis, as well as through open sources, and applied to the process.

We're going to start "small" in this case, so that we can build on the structure later. What I mean by that is that we're going to start with just DFIR data; that is, data collected as either a full disk acquisition, or as part of triage response to an identified incident. We're going to start here because the data is fairly consistent across Windows systems at this point, and we can add EDR telemetry and input from a SIEM framework at a later date. So, just for the sake of  this discussion, we're going to start with DFIR data.

Describing Artifact Constellations

Let's start by looking at a common artifact constellation, one for disabling Windows Defender. We know that there are a number of different ways to go about disabling Windows Defender, and that regardless of the size and composition of the artifact constellation they all result in the same MITRE ATT&CK sub-technique. One way to go about disabling Windows Defender is through the use of Defender Control, a GUI-based tool. As this is a GUI-based tool, the threat actor would need to have shell-based access to the system, such as through a local or remote (Terminal Services/RDP) login. Beyond that point, the artifact constellation would look like:
  • UserAssist entry in the NTUSER.DAT indicating Defender Control was launched
  • Prefetch file created for Defender Control (file system/MFT; not for Windows server systems)
  • Registry values added/modified in the Software hive
  • "Microsoft-Windows-Windows Defender%4Operational.evtx" event records generated
Again, this constellation is based solely on DFIR or triage data collected from an endpoint. Notice that I point out that one artifact in the constellation (i.e., the Prefetch file) would not be available on Windows server systems. This tells us that when working with artifact constellations, we need to keep in mind that not all of the artifacts may be available, for a variety of reasons (i.e., version of Windows, system configuration, installed applications, passage of time, etc.). Other artifacts that may be available but are also heavily dependent upon the configuration of the endpoint itself include (but are not limited to) a Security-Auditing/4688 event in the Security Event Log pertaining to Defender Control, indicating the launch of the application, or possibly a Sysmon/1 event pertaining to Defender Control, again indicating the launch of the application. Again, the availability of these artifacts depends upon the specific nature and configuration of the endpoint system.

Another means to achieve the same end, albeit without requiring shell-based access, is with a batch file that modifies the specific Registry values (Defender Control modifies two Registry values) via the native LOLBIN, reg.exe. In this case, the artifact constellation would not need to be preceded (although it may be) by a Security-Auditing/4624 (login) event of either type 2 (console) or type 10 (remote). Further, there would be no expectation of a UserAssist entry (no GUI tool needs to be launched), and the Prefetch file creation/modification would be for reg.exe, rather than Defender Control.  However, the remaining two artifacts in the constellation would likely remain the same.

Fig 2: WinDefend Exclusions
Of course, yet another means for "disabling Windows Defender" could be as simple as adding an exclusion to the tool, in any one or more of the five subkeys illustrated in figure 2. For example, we've seen threat actors create exceptions for any file ending in ".exe", found in specific paths, or any process such as PowerShell.

The point is that while there are different ways to achieve the same end, each method has its own unique toolmark constellation, and the constellations could then be used to apply attribution.  For example, the first method for disabling Windows Defender described above was observed being used by the Snatch ransomware threat actors during several attacks in May/June 2020. Something like this would not be exclusive, of course, as a toolmark constellation could be applied to more than one threat actor or group. After all, most of what we refer to as "threat actor groups" are simply how we cluster IOCs and TTPs, and a toolmark constellation is a cluster of artifacts associated with the conduct of particular activity. However, these constellations can be applied to attribution.

A Notional Description Structure

At this point, a couple of thoughts or ideas jump out at me.  First, the individual artifacts within the constellation can be listed in a fashion similar to what's seen in Yara rules, with similar "strings" based upon the source. Remember, by the time we're to the "enrich/decorate" phase, we've already normalized the disparate data sources into a common structure, perhaps something similar to the five-field TLN format used in (my) timelines. The time field of the structure would allow us to identify artifacts within a specified temporal proximity, and each description field would need to be treated or handled (that is, itself parsed) differently based upon the source field. The source field from the normalized structure could be used in a similar manner as the various 'string' identifiers in Yara (i.e., 'ascii', 'nocase', 'wide', etc.) in that they would identify the specific means by which the description field should be addressed. 
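As a rough illustration of the normalized structure, the five-field TLN format (time, source, system, user, description, separated by pipes) can be read into a small Python record. The class and function names here are my own, and the sample line uses made-up host, user, and source tag values.

```python
from dataclasses import dataclass

@dataclass
class TLNEvent:
    time: int          # Unix epoch seconds
    source: str        # data source tag; decides how description is handled
    system: str
    user: str
    description: str

def parse_tln(line):
    """Split one pipe-delimited TLN line into its five fields."""
    # maxsplit=4 keeps any '|' characters inside the description intact.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        raise ValueError("expected five pipe-delimited fields")
    return TLNEvent(int(parts[0]), *parts[1:])

# Invented example line; not taken from a real timeline.
event = parse_tln("1591000000|REG|HOST1|user1|UserAssist entry | Defender Control")
```

A scanning engine could then dispatch on `event.source` to decide how to match the description, much as a Yara string modifier decides how a string is compared.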

Some elements of the artifact constellation may not be required, and this could easily be addressed through something similar to Yara 'conditions', in that the various artifacts could be grouped with parens, as well as 'and' and 'or', identifying those artifacts that may not be required for the constellation to be effective, although not complete. From the above examples, the Registry values being modified would be "required", as without them, Windows Defender would not be disabled. However, a Prefetch file would not be "required", particularly when the platform being analyzed is a Windows server. This could be addressed through the "condition" statement used in Yara rules, and a desirable side effect of having a "scoring value" would be that an identified constellation would then have something akin to a "confidence rating", similar to what is seen on sites such as VirusTotal (i.e., "this sample was identified as malicious by 32/69 AV engines"). For example, from the above bulleted artifacts that make up the illustrated constellation, the following values might be applied:

  • Required - +1
  • Not required - +1, if present
  • +1 for each of the values, depending upon the value data
  • +1 for each event record
If all elements of the constellation are found within a defined temporal proximity, then the "confidence rating" would be 6/6. All of this could be handled automatically by the scanning engine itself.
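One way the scoring might work is sketched below in Python. This is a simplification under my own assumptions: the artifact names are placeholders, the two Registry values and two event records are treated as required, the UserAssist entry and Prefetch file as optional, and every artifact found is worth +1.

```python
# Placeholder artifact names for the Defender Control constellation.
REQUIRED = {"reg_value_1", "reg_value_2", "event_record_1", "event_record_2"}
OPTIONAL = {"userassist", "prefetch"}

def score_constellation(found, required=REQUIRED, optional=OPTIONAL):
    """Return (hits, total), or None when a required artifact is missing."""
    found = set(found)
    if not set(required) <= found:
        return None
    total = len(required) + len(optional)
    hits = len(required) + len(found & set(optional))
    return hits, total

# All artifacts present, e.g. on a workstation.
full = score_constellation(REQUIRED | OPTIONAL)
# No Prefetch file, e.g. on a Windows server.
server = score_constellation(REQUIRED | {"userassist"})
```

The returned pair maps directly onto the "n/6" style of confidence rating; a `None` result means the constellation did not match at all.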

A notional example constellation description based on something similar to Yara might then look something like the following:

strings:

    $str1 = UserAssist entry for Defender Control
    $str2 = Prefetch file for Defender Control
    $str3 = Windows Defender DisableAntiSpyware value = 1
    $str4 = Windows Defender event ID 5010 generated
    $str5 = Windows Defender DisableRealtimeMonitoring value = 1
    $str6 = Windows Defender event ID 5001 generated

condition:

    ($str1 or $str2) and ($str3 and $str4 and $str5 and $str6);

Again, temporal proximity/dispersion would need to be addressed (most likely within the scanning engine itself), either with an automatic 'value' set, or by providing a user-defined value within the rule metadata. Additionally, the order of the individual artifacts would be important, as well. You wouldn't want to run this rule and in the output find that $str1 was found 8 days after the conditions for $str3 and $str5 being met. Given that the five-field TLN format includes a time stamp as its first field, it would be pretty trivial to compute a temporal "Hamming distance", of sorts, as well as ensure proper sequencing of the artifacts or toolmarks themselves.  That is to say that $str1 should appear prior to $str3, rather than after it, but not so far so as to be unreasonable and create a false positive detection.
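A minimal Python sketch of the proximity and sequencing check might look like the following. It assumes each matched artifact has already been reduced to a name and an epoch time (the first TLN field); the function name, artifact names, and the 60-second window are all arbitrary choices for illustration.

```python
def within_window(events, order, max_seconds):
    """Check that matched artifacts occur in `order` and close together.

    events: dict mapping artifact name -> epoch time of the match
    order:  artifact names in the sequence they are expected to occur
    Artifacts absent from `events` are skipped rather than failed.
    """
    times = [events[name] for name in order if name in events]
    if len(times) < 2:
        return bool(times)
    in_sequence = all(a <= b for a, b in zip(times, times[1:]))
    return in_sequence and (times[-1] - times[0]) <= max_seconds

order = ["userassist", "reg_value", "event_record"]

# Launch, Registry change, and event record within a few seconds.
close = within_window(
    {"userassist": 1000, "reg_value": 1005, "event_record": 1006}, order, 60)

# Launch recorded 8 days after the Registry change: out of sequence.
late = within_window(
    {"userassist": 1005 + 8 * 86400, "reg_value": 1005, "event_record": 1006},
    order, 60)
```

Here `close` passes and `late` fails, which is the false positive the rule is meant to avoid.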

Finally, similar to Yara rules, the rule name would be identified in the output, along with a "confidence rating" of 6/6 for a Windows 10 system (assuming all artifacts in the cluster were available), or 5/6 for Windows Server 2019.

Counter-Forensics

Something else that we need to account for when addressing artifact constellations is counter-forensics, even that which is unintentional, such as the passage of time. Specifically, how do we deal with identifying artifact constellations when artifacts have been removed, such as application prefetching being disabled on Windows 10 (which itself may be part of a different artifact constellation), or files being deleted, or something like CCleaner being run?

Maybe a better question is, do we even need to address this circumstance? After all, the intention here is not to address every possible eventuality or possible circumstance, and we can create artifact constellations for various Windows functionality being disabled (or enabled).