Saturday, July 04, 2026

LNK Files in CTI

There's a good bit of file analysis that goes into CTI reports, including (but not limited to) malware analysis. For some reason, though, not all files appear to be considered worthy of parsing and analysis. We also tend to see in-depth descriptions of the value of LNK files to forensic analysis, particularly when looking at user activity on an endpoint. However, while LNK files remain a popular delivery mechanism for kicking off attacks, not a great deal of effort goes into analysis of these files, nor into recording their metadata for use in detections or threat intel. 

Sure, we see reports that include screen captures of command lines embedded in LNK files, but what we don't see is LNK file metadata truly, fully exploited. The last time I can remember really seeing LNK file metadata incorporated into analysis was the Mandiant write-up on CozyBear from Nov 2018, where figures 5 & 6 illustrate differences between the 2016 and 2018 campaigns by comparing LNK file metadata. 

Figure 1: LNK metadata (Source: TheHackerNews)
A recent article from TheHackerNews described an attack chain that started with a ZIP archive containing an LNK file masquerading as a Hangul Word Processor (HWP) document lure. The article does not number its figures or images, but it does contain the image seen in Figure 1, although the description is not close to the image (you have to read on a bit to find it). This image does provide something of a comparison between two observed LNK files, albeit without the full breadth of metadata. While the image does describe the timestamps as "all zero (wiped)", there's no apparent reference to a machine ID/NetBIOS name field, either as populated or "wiped". Nor is there any mention of Extra Data blocks, whether or not they exist, or whether they are populated.
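For anyone who wants to check a claim like "all zero (wiped)" themselves, the three timestamps sit in the fixed 76-byte ShellLinkHeader at the start of every LNK file. What follows is a minimal Python sketch based on the header layout in Microsoft's [MS-SHLLINK] specification; the function and dictionary key names are illustrative, and it reads only the header, not the rest of the file.

```python
import struct
import uuid
from datetime import datetime, timedelta, timezone

# ShellLinkHeader layout per [MS-SHLLINK]: 76 bytes, little-endian.
# HeaderSize, LinkCLSID, LinkFlags, FileAttributes, CreationTime, AccessTime,
# WriteTime, FileSize, IconIndex, ShowCommand, HotKey, Reserved1-3
HEADER_FMT = "<I16sII3QIiIHHII"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 76 (0x4C)
LNK_CLSID = uuid.UUID("00021401-0000-0000-c000-000000000046").bytes_le
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_dt(ft):
    """Convert a FILETIME (100 ns ticks since 1601-01-01 UTC); 0 means 'not set'."""
    if ft == 0:
        return None
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)


def parse_lnk_header(data):
    """Illustrative sketch: return header metadata, flagging zeroed timestamps."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to be an LNK file")
    (hdr_size, clsid, flags, attrs, ctime, atime, wtime,
     target_size, _icon, show_cmd, _hotkey, _r1, _r2, _r3) = struct.unpack_from(HEADER_FMT, data)
    if hdr_size != HEADER_SIZE or clsid != LNK_CLSID:
        raise ValueError("not a Shell Link header")
    return {
        "link_flags": flags,
        "file_attributes": attrs,
        "created": filetime_to_dt(ctime),
        "accessed": filetime_to_dt(atime),
        "modified": filetime_to_dt(wtime),
        "target_size": target_size,
        "show_command": show_cmd,
        # True when all three embedded timestamps are zero ("wiped")
        "timestamps_zeroed": ctime == atime == wtime == 0,
    }
```

Running it against a sample is just a matter of passing the file's bytes, e.g. `parse_lnk_header(open(path, "rb").read())`, and recording the result alongside the sample hash.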

The point is that there is significant value in tracking LNK file metadata across campaigns, as doing so gives us a better view into threat actor tooling and situational awareness. For example, in the Mandiant comparison of the two CozyBear campaigns (2016, 2018), they used embedded timestamps to support a finding in their analysis. In Figure 1, we see in the comparison between the two LNK files that the timestamps were "zeroed out". By looking further into the available metadata, we can make determinations about the threat actor's tooling, as well as the process they use for developing the LNK files and the lures, providing insight into their situational awareness.
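The machine ID/NetBIOS name field lives in the TrackerDataBlock, one of the optional Extra Data blocks near the end of an LNK file. The minimal sketch below assumes the block can be located by searching for its fixed BlockSize (0x60) and BlockSignature (0xA0000003) values; that is a shortcut. A complete parser would walk the LinkTargetIDList, LinkInfo, and StringData structures to reach the Extra Data section, and a raw byte search could, in principle, match unrelated bytes. The function name and output keys are hypothetical.

```python
import struct
import uuid

# TrackerDataBlock per [MS-SHLLINK]: BlockSize (0x60) + BlockSignature (0xA0000003)
TRACKER_MARKER = struct.pack("<II", 0x60, 0xA0000003)


def parse_tracker_block(data):
    """Heuristic sketch: find the TrackerDataBlock by byte search and pull its fields.

    Returns None if the block is absent (or was stripped from the sample).
    """
    off = data.find(TRACKER_MARKER)
    if off < 0 or off + 0x60 > len(data):
        return None
    # After the 8-byte marker: Length (4), Version (4), MachineID (16, NUL-terminated),
    # Droid (two GUIDs, 32 bytes), DroidBirth (two GUIDs, 32 bytes).
    # MachineID uses the system code page; ASCII with replacement is an assumption.
    machine_id = data[off + 16:off + 32].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    droid_volume = uuid.UUID(bytes_le=data[off + 32:off + 48])
    droid_file = uuid.UUID(bytes_le=data[off + 48:off + 64])
    return {
        "machine_id": machine_id,
        "droid_volume_id": str(droid_volume),
        "droid_file_id": str(droid_file),
    }
```

Recording a None result matters as much as a populated one; whether the block is present, empty, or populated is itself a data point worth tracking across samples.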

But I get it; all of this requires rigor. First, analysts and organizations need to know that this information is available, and then they need to know how to extract it, aggregate it, and track it. Then, findings need to be supported by accumulated data, as part of a review process. 
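Once metadata has been extracted, "aggregate it, and track it" can start as simply as pivoting on one field across samples. The sketch below assumes each sample has already been reduced to a dictionary with a hypothetical campaign label and fields such as machine_id; it reports any value observed in more than one campaign, which is an overlap worth a closer look (and review) before it's treated as a finding.

```python
from collections import defaultdict


def pivot(records, field):
    """Map each observed value of `field` to the set of campaigns it appears in."""
    seen = defaultdict(set)
    for rec in records:
        value = rec.get(field)
        if value is None or value == "":
            continue  # missing/blank, e.g. no TrackerDataBlock in this sample
        seen[value].add(rec["campaign"])
    return seen


def shared_across_campaigns(records, field):
    """Values of `field` seen in more than one campaign, with sorted campaign labels."""
    return {value: sorted(campaigns)
            for value, campaigns in pivot(records, field).items()
            if len(campaigns) > 1}
```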

Rigor in Threat Intel

I'm just going to say it.

IOCs are not "threat intel". 

Lists of IP addresses and domain names, without context, are data points and information, not "intel". Threat intel is based on patterns developed from the accumulation/aggregation of data.

In 2016, I took a look at about half a dozen Samas ransomware engagements, all worked by different IR analysts. All of these analysts were focused on servicing the IR consulting business model; that is, work the engagement, write the report, and deliver it to the customer so that they could move on to the next engagement. However, by looking across multiple engagements, I began to see commonalities and overlaps in threat actor activity, including initial access, as well as other phases of the attack that led up to the ransomware deployment within the impacted infrastructures. By seeing, verifying, and confirming activities that were observed consistently (e.g., initial access), confidence increased in the understanding of that attack phase. This was particularly valuable when, for whatever reason, logs or other artifacts weren't available. 

As a result of our supported observations and findings, we published a blog post describing them, and someone who read it reached out to let us know that they'd looked for some of the initial indicators and were able to prevent their own organization from being ransomed. Later, a good bit of the content from that original 2016 blog post was transitioned to another blog post in 2018, and the company was later purchased by Sophos. 

However, the point remains...we were able to develop an extremely granular understanding of the threat actor's attack chain and timing based on accumulating data across multiple engagements, and then aggregating it into threat intelligence about the threat actor.

Threat intel is based on patterns developed from an accumulation and aggregation of data, so we can use data from multiple incidents to fill in gaps in observations, understanding, and detections. We can better understand a threat actor's capabilities and situational awareness, and develop a better understanding of how the threat actor operates not only in similar environments, but also across multiple disparate environments, and how they "respond" to various "stimuli" or obstacles. We get to see what really goes on when a threat actor gains access to an endpoint or infrastructure, what actions they take, in what sequence, and with what timing, as well as how they respond to challenges, such as when something they were observed doing or using on previous engagements is not available, or when security tooling hampers or completely inhibits their ability to continue their attack. 
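As a rough illustration of using data from multiple incidents to fill in gaps, the sketch below assumes each engagement has been reduced to a hypothetical mapping of attack phase to observed behaviors, with None marking a phase for which no logs or artifacts were available. It counts how many engagements each behavior appears in and, for a phase with no data, ranks what was seen elsewhere. The output is a set of leads to verify, not findings.

```python
from collections import Counter


def phase_patterns(engagements):
    """Per attack phase, count how many engagements each behavior was observed in."""
    patterns = {}
    for phases in engagements.values():
        for phase, behaviors in phases.items():
            if behaviors is None:  # no logs/artifacts covered this phase
                continue
            # set() so a behavior counts once per engagement
            patterns.setdefault(phase, Counter()).update(set(behaviors))
    return patterns


def gap_candidates(engagements, name, phase, min_count=2):
    """Rank behaviors seen in *other* engagements for a phase this one lacks data on."""
    others = {k: v for k, v in engagements.items() if k != name}
    counts = phase_patterns(others).get(phase, Counter())
    return [(b, n) for b, n in counts.most_common() if n >= min_count]
```

Requiring a behavior to show up in at least `min_count` other engagements is an arbitrary threshold here; the point is that confidence comes from repeated, verified observations rather than a single sighting.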

I've seen actors try 3 times to run the "ver" command before succeeding the 4th time. I've seen the same threat actor make multiple attempts to launch their malicious DLL via rundll32.exe, across multiple incidents. I've seen threat actors respond to security tooling deleting their malware by attempting to uninstall applications that aren't even installed on the endpoint.

I've also seen threat actors access an infrastructure, orient themselves, and then step off on their attack chain, installing multiple disparate persistence mechanisms. I've seen threat actors determine what's running on the endpoint before copying over their tooling, and I've seen threat actors simply blind the available tooling with no prior recon, as if they already knew what they were dealing with in the infrastructure.

Rigor
All that being said, we also have to understand that errors compound as we aggregate that data. This is why analysts must take a rigorous approach to populating aggregated data, one that includes review, where analysts need to be able to justify their findings rather than simply have them thrown onto the "pile" and accepted as "fact" or "truth" without question. 

I know, I know...no one wants to hear that "threat intel" requires rigor. I get it. It's much easier to simply state something as "fact" than it is to provide the evidence to support that statement. 

Take data exfil, for example. I've "seen" data exfil a number of times, and proven it. During one incident, the threat actor archived data on one endpoint, where we were able to capture the archival command line (which included the threat actor's archive password), and we found that the threat actor had moved the archives to an endpoint that was running an accessible web server. They copied the files to a web directory, accessed the web server from "outside" and requested the files, then deleted the files from the web server. In this case, we had the captured command line, file names, proof of data exfil in the web server logs, and we imaged the physical disk for the server and recovered the deleted archives. 

During another engagement, a threat actor had accessed a "remote" Linux-based infrastructure, and transitioned to the corporate Windows-based infrastructure, something they weren't supposed to be able to do. The threat actor created archives on a Windows server, and copied them to a Linux system; we had copies of the archives on both endpoints, relevant file system time stamps, and NetFlow showing the transfer. 

But think about it...how many times do we see a threat actor creating archives, or simply that WinZip or 7-Zip was "run", and, based on just that observation, state "...the threat actor exfiltrated data..."? I mean, sure, it's a logical assumption, but are we able to support that assumption with evidence? In a high-stress engagement, where the impacted org is trying to assess risk, it's easy to say, "...data was exfiltrated...", but if that's all you've got, how do you then answer the question, "...to where??"

The same is true for other data, as well. If we see a bunch of failed login attempts (to RDP, MSSQL, etc.) that originate from a particular IP address or workstation name, and then we see a successful login, how is this finding described to a customer? Most often, it's "...failed login attempts originating from <identifier> resulted in a successful login..."; we say this because it's a logical assumption. Yet, I've seen endpoints with thousands of failed login attempts where neither the user name nor the source IP address of the successful login appears in the list of failed login attempts. In addition, the failed login attempts often continue well after the successful login.
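Checking that assumption doesn't take much once the logon events are in hand. The sketch below assumes the events have already been normalized (for example, from Windows Security Event Log event IDs 4625 for failed and 4624 for successful logons) into dictionaries with hypothetical time, user, src, and success keys. For each successful logon, it reports whether the user name and source actually appear among the failures, and how many failures continued afterward.

```python
def assess_logons(events):
    """Test the 'failed attempts led to the successful login' assumption.

    events: dicts with 'time' (any comparable timestamp), 'user', 'src', 'success'.
    """
    failures = [e for e in events if not e["success"]]
    # Windows account names are case-insensitive, so compare lowercased
    failed_users = {e["user"].lower() for e in failures}
    failed_srcs = {e["src"] for e in failures}
    report = []
    for s in (e for e in events if e["success"]):
        report.append({
            "time": s["time"],
            "user": s["user"],
            "src": s["src"],
            "user_in_failures": s["user"].lower() in failed_users,
            "src_in_failures": s["src"] in failed_srcs,
            "failures_after": sum(1 for f in failures if f["time"] > s["time"]),
        })
    return report
```

If both membership checks come back False, the "brute force led to access" narrative is an assumption, and the successful login needs its own explanation.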

But it's an assumption, and as data is aggregated across multiple engagements, assumptions need to be clearly identified as such. Otherwise, they become errors that compound across that data and make their way into the "intel" as fact.
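One lightweight way to keep assumptions from silently becoming "intel" is to make the distinction part of the record itself. In the sketch below, the Finding class and its fields are hypothetical; a finding with no attached evidence is labeled an assumption, and the aggregation step keeps the two groups separate rather than merging them into one pile.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    statement: str
    # e.g., captured command lines, web server log entries, recovered archives, NetFlow
    evidence: list = field(default_factory=list)

    @property
    def is_assumption(self):
        return not self.evidence


def aggregate(findings):
    """Split findings so assumptions are carried forward labeled as such, not as fact."""
    pile = {"supported": [], "assumption": []}
    for f in findings:
        pile["assumption" if f.is_assumption else "supported"].append(f.statement)
    return pile
```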