Monday, February 04, 2019

Data Points And Analysis

In DFIR and "threat intel" analysis, individual data points are very often dismissed out of hand because they are thought to be easily mutable.  We see it all the time, don't we?  We find a data point, and instead of just adding it to our picture of the incident, we ask, "Hey, what about this...?".  Very often, we hear in response, "...hackers change that all the time...", so we drop it.

Why do we do this?  Why do we not include "easily mutable" artifacts in our analysis?

A common example of this is the PE compile time, a time stamp value added to an executable file during the compilation process.  I'm not an expert in compilers or linkers, but this time stamp value is understood throughout the DFIR community to be easily mutable; that is, it doesn't "cost" an adversary much to change this value.  Many of us have seen cases where the PE compile time value, when converted, indicates that the file was compiled in 1980, or even on a date in the future.  This value is thought to be easily mutable precisely because many of us have either seen it changed, or have actually changed it ourselves.  A consequence of this is that when someone brings up, "...hey, this time stamp value says...", the value may be immediately dismissed out of hand.
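For those inclined to check the value rather than dismiss it, the time stamp is trivial to read.  Here's a minimal sketch in Python; the offsets come from the PE/COFF format (e_lfanew at 0x3C, TimeDateStamp 8 bytes past the "PE\0\0" signature), and the error handling is illustrative rather than exhaustive:

```python
import struct
import datetime


def pe_compile_time(data: bytes) -> datetime.datetime:
    """Extract the TimeDateStamp from a PE file's COFF header.

    The DOS header's e_lfanew field (offset 0x3C) points to the
    "PE\x00\x00" signature; the 4-byte TimeDateStamp sits 8 bytes
    after that signature and is a Unix epoch value.
    """
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    ts = struct.unpack_from("<I", data, e_lfanew + 8)[0]
    return datetime.datetime.fromtimestamp(ts, datetime.timezone.utc)
```

A converted value of 1980-or-earlier, or a date in the future, is exactly the kind of anomaly worth noting rather than discarding.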

However, there may be considerable value in including these values in our corpus of viable, relevant data points.  What I mean is, just because a value is understood to be easily mutable, what if it wasn't changed?  Why do we exclude these values from our analysis because they could have been changed, without checking to see whether they actually were?

Consider the FireEye blog post from Nov 2018 regarding the APT29/Cozy Bear phishing campaign; table 1 of the article illustrates an "operational timeline", which is a great idea.  The fourth row in the table shows the time at which the LNK file is thought to have been weaponized; this is a time stamp stored in a shell item, as an MS-DOS date/time value.  The specific value is the last modification time of the "system32" folder, and if you know enough about the format of LNK files, it's not hard at all to modify this time value, so it could be considered "easily mutable".  For example, open the file in binary mode, go to the offset within the file, and overwrite the 16-bit date and time values with 0's.  Boom.  You don't even have to mess with issues of endianness; just write 0's and be done with it.
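As an aside, decoding that MS-DOS date/time pair is straightforward.  A quick sketch, assuming the standard FAT bit layout that shell items use:

```python
import datetime


def decode_dos_datetime(dos_date: int, dos_time: int) -> datetime.datetime:
    """Decode the FAT/MS-DOS date & time pair used in shell items.

    Date bits: years since 1980 (7), month (4), day (5).
    Time bits: hours (5), minutes (6), two-second increments (5).
    """
    year = 1980 + ((dos_date >> 9) & 0x7F)
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = (dos_time >> 11) & 0x1F
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2
    return datetime.datetime(year, month, day, hour, minute, second)
```

Note that the zeroed-out value described above decodes to a month and day of zero, which no legitimate timestamp produces; the "wiped" value is itself a detectable artifact.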

However, in this case, the FireEye folks included the value in their corpus, and found that it had significant value.

Something else you'll hear very often is, "...yeah, we see that all the time...".  Okay, why dismiss it?  Sure, you see it all the time, but in what context?  When you say that you "see it all the time", does that mean you're seeing the same data points across disparate campaigns?

Let's consider Windows shortcut/LNK files again.  Let's say we retrieve the machine ID from the LNK file metadata, and we see "user-pc" again, and again, and again.  We also see the same node ID (or "MAC address") and the same volume serial number across different campaigns.  Are these campaigns all related to the same threat actor group, or different adversaries?  Either way, this would tell us something, wouldn't it?
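Pulling those values out of an LNK file doesn't require heavy tooling, either.  Here's a rough sketch of walking the ExtraData blocks for the TrackerDataBlock, based on the layout in the [MS-SHLLINK] specification; the MAC formatting and the bare-bytes interface are my own choices:

```python
import struct

TRACKER_SIG = 0xA0000003  # TrackerDataBlock signature per MS-SHLLINK


def parse_tracker_block(extra: bytes):
    """Scan an LNK file's ExtraData region for the TrackerDataBlock
    and pull out the NetBIOS machine ID and the node ID (typically a
    MAC address) embedded in the file droid GUID."""
    off = 0
    while off + 8 <= len(extra):
        size, sig = struct.unpack_from("<II", extra, off)
        if size < 8:  # terminal block
            break
        if sig == TRACKER_SIG:
            # 16-byte header, then a 16-byte NUL-padded machine ID
            machine_id = extra[off + 16:off + 32].split(b"\x00")[0].decode("ascii", "replace")
            # the file droid GUID is the second of the two Droid GUIDs;
            # its final six bytes are the node ID
            droid_file = extra[off + 48:off + 64]
            mac = ":".join(f"{b:02x}" for b in droid_file[-6:])
            return machine_id, mac
        off += size
    return None
```

Seeing "user-pc" and the same node ID come back across samples from supposedly unrelated campaigns is precisely the kind of linkage worth tracking.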

The same can be said for other file and document metadata, including that found in phishing campaign lure documents, particularly the OLE format documents.  You see the same metadata across different campaigns?  Great.  Are the campaigns attributed to the same actors?
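Once the metadata has been extracted (with a tool such as olefile, for example), tracking that kind of reuse across campaigns is simple bookkeeping.  A hypothetical sketch, where the field names and campaign labels are illustrative:

```python
from collections import defaultdict


def cluster_by_metadata(samples):
    """Group samples (dicts with 'campaign', 'author', 'last_saved_by',
    and 'create_time' keys) by their metadata tuple; any tuple shared
    across more than one campaign is a candidate link between them."""
    clusters = defaultdict(set)
    for s in samples:
        key = (s.get("author"), s.get("last_saved_by"), s.get("create_time"))
        clusters[key].add(s["campaign"])
    # keep only metadata tuples that span multiple campaigns
    return {k: v for k, v in clusters.items() if len(v) > 1}
```

Whether the overlap points to one actor group or to shared tooling between several, it tells you something either way.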

What about the embedded macros?  Are they obfuscated?  I've seen macros with no obfuscation at all, and I've seen macros with four or five levels of obfuscation, each level being completely different (i.e., base64 encoding, character encoding, differences in string concatenation, etc.).
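The number of layers can itself be a trackable data point.  Here's a crude heuristic sketch for counting nested base64 layers; the stopping condition (checking whether the output still looks like base64) is an assumption on my part and will misfire on short, purely alphanumeric plaintexts:

```python
import base64
import re

# loose check: only base64-alphabet characters, padding, and whitespace
B64_RE = re.compile(rb"^[A-Za-z0-9+/=\s]+$")


def peel_base64(payload: bytes, max_layers: int = 10):
    """Repeatedly base64-decode a blob until it stops looking like
    base64; returns (final_bytes, layer_count), a rough measure of
    the encoding depth of an obfuscated macro payload."""
    layers = 0
    while layers < max_layers and B64_RE.match(payload):
        try:
            decoded = base64.b64decode(payload, validate=True)
        except Exception:
            break
        if not decoded:
            break
        payload = decoded
        layers += 1
    return payload, layers
```

An actor who consistently wraps payloads in, say, four layers with the same ordering of techniques is leaving a fingerprint, just as surely as one who leaves the default machine ID in an LNK file.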

All of these can be useful pieces of information to build out the threat intel picture.  Threat intel analysts need to know what's available, so that they can ask for it if it's not present, and then utilize it and track it.  DFIR analysts need to understand that there's more to answering the IR questions, and a small amount of additional work can yield significant dividends down the road, particularly when shared with analysts from other disciplines.


B!n@ry said...


I agree with you about verifying these mutable artifacts, because they might not have been modified; plus, if we exclude them, we might end up excluding everything. The example you mentioned, of just opening the LNK file in binary mode, going to the time offset, and zeroing it out, would be one! There are so many other ways where anti-X stuff could happen. Therefore, it is not bad to verify instead of immediately excluding.

Also, LNK files and how they could be used for attribution and for linking campaigns, I think that is very useful too. And I would say yes, we could use them for checking whether campaigns belong to the same threat actor group.

Thanks again for another useful post.

Harlan Carvey said...


Thanks for leaving a comment...

> ...LNK files and how they could be used for attribution and for checking campaigns...

With the exception of the work that the FireEye guys have done, I really believe that this is an incredibly untapped resource of information.

Too many times, we make assumptions about data based on our aperture or collection bias. A number of years ago, I attended an ISOI-APT meeting in Ashburn, VA, and presented on a finding regarding a well-known PoisonIvy configuration we'd seen. I asked those in the room who were familiar with this malware, how it was delivered, and 100% of the folks said phishing.

I demonstrated delivery by subverting the user, and having them install it via USB. I also illustrated how the user had tried to "clean up" before returning their system, and we pulled the full malware binary out of a hibernation file.

As such, I do not believe that the full value of file structure metadata, particularly from LNK files (but also from .doc lure documents) has been tapped, nor realized.

Unknown said...

Morning Harlan

I agree that assumptions are often made that are not justified or supported by evidence; establishing that a suspect could do something is often an easier state to achieve than establishing that they have done it.

I often see decisions made based on 'threat intelligence' that said a domain or IP was malicious, but it often lacks context and any form of formal grading that would allow an investigator to assess and apply the intelligence to the investigation, which often leads to inaccurate assumptions.

This is combined with a reliance on tools to 'give' an answer, when how the results were reached may not be fully understood. This was a question I raised at the SANS Digital Forensics Summit a number of years ago: off-the-shelf forensics tools were creating investigators with a lower level of understanding, as the tools did the work for them, and that work may not be validated.

It concerns me that I now see CVs where, to recruiters and companies, digital forensics means the candidate was trained to use FTK or EnCase, and that makes them an expert. I feel it is more about how we think than about the tools available.

I wrote this a number of years ago on Threat Intelligence

Ranting stand down :)