Wednesday, November 29, 2006

Artifact classes

I've been doing some thinking about IR and CF artifacts over the past couple of weeks, and wanted to share my thoughts on something that may be of use, particularly if it's developed a bit...

When approaching many things in life, particularly a case I'm investigating, I tend to classify things (imagine the scene in The Matrix where Agent Smith has Morpheus captive, and tells him that he's classified the human species as a virus*) based on information I've received...incident reports, interviews with the client, etc. By classify, I mean categorizing the incident in my mind...web page defacement, intrusion/compromise, inappropriate use, etc. To some extent, I think we all do this...and the outcome of this is that we tend to look for artifacts that support this classification. If I don't find these artifacts, or the artifacts that I do find do not support my initial classification, then I modify my classification.

A by-product of this is that if I've classified a case as, say, an intrusion, I'm not necessarily going to be looking for something else, such as illicit images, particularly if it hasn't been requested by the client. Doing so would consume more time, and when you're working for a client, you need to optimize your time to meet their needs. After all, they're paying for your time.

Now, what got me thinking is that many times in the public lists (and some that require membership) I'll see questions or comments that indicate that the analyst really isn't all that familiar with either the operating system in the image, or the nature of the incident they're investigating, or both. This is also true (perhaps more so) during incident response activities...not understanding the nature of an issue (intrusion, malware infection, DoS attack, etc.) can many times leave the responder either pursuing the wrong things, or suffering from simple paralysis and not knowing where to begin.

So, understanding how we classify things in our minds can lead us to classifying events and incidents, as well as classifying artifacts, and ultimately mapping between the two. This then helps us decide upon the appropriate course of action, during both live response (i.e., an active attack) and post-mortem activities.
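To make the idea of mapping between incident classes and artifacts a bit more concrete, here is a minimal sketch in Python. The incident class names come from this post; the artifact categories and the function name rank_classifications are purely illustrative assumptions, not a proposed taxonomy. It ranks the classes by how many of their expected artifacts were actually observed, which is roughly the "modify my classification" step described above.

```python
# Hypothetical mapping from incident class to expected artifact categories.
# The class names are from the post; the artifact names are illustrative only.
EXPECTED_ARTIFACTS = {
    "web page defacement": {"modified web content", "web server log entries"},
    "intrusion/compromise": {
        "new user accounts",
        "unexpected autostart entries",
        "unusual network connections",
    },
    "inappropriate use": {"browser history", "user-created files"},
}


def rank_classifications(observed):
    """Return (class, matching artifact count) pairs, best match first."""
    observed = set(observed)
    scores = [
        (name, len(expected & observed))
        for name, expected in EXPECTED_ARTIFACTS.items()
    ]
    # sorted() is stable, so ties keep the order of the mapping above.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    found = ["new user accounts", "unexpected autostart entries"]
    for name, score in rank_classifications(found):
        print(f"{name}: {score} supporting artifact(s)")
```

If the initial classification ends up near the bottom of the ranking, that is a hint to revisit it rather than a verdict.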

My question to the community is this...even given the variables involved (OS, file system, etc.), is there any benefit to developing a framework for classification, to include artifacts, to provide (at the very least) a roadmap for investigating cases?

Addendum, 30 Nov: Based on an exchange going on over on FFN, I'm starting to see some thought being put into this, and it's helping me gel (albeit not crystallize, yet) my thinking, as well. Look at it this way...doctors have a process that they go through to diagnose patients. There are things that occur every time you show up at the doctor's office (height, weight, temperature, blood pressure), and there are those things that the doctor does to diagnose your particular "issue du jour". Decisions are made based on the info the doctor receives from the patient, and courses of action are decided. The doctor will listen to the patient, but also observe the patient's reaction to certain stimuli...because sometimes patients lie, or the doctor may be asking the wrong question.

Continuing with the medical analogy, sometimes it's a doctor that responds, sometimes a nurse or an EMT. Either way, they've all had training, and they all have knowledge of the human body...enough to know what can possibly be wrong and how to react.

Someone suggested that this may not be the right framework to establish...IMHO, at least it's something. Right now we have nothing. Oh, and I get to be Dr. House. ;-)

*It's funny that I should say that...I was interviewed on 15 May 1989 regarding the issue of women at VMI, and I said that they would initially be treated like a virus.


Bill Ethridge said...

I posted some in FFN about classifications, before reading this.

What you are proposing requires almost a cellular approach, moving in a different direction depending on the last "input". I would say you are proposing a Go game instead of chess.


H. Carvey said...


It may take almost a binary tree approach, with different branches taken at each step based on inputs.
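A rough sketch of what a binary tree approach could look like, in Python. Everything here (the Node class, the walk function, the questions, and the evidence keys) is a hypothetical illustration rather than an actual triage procedure. Each node asks one yes/no question about the evidence gathered so far and branches on the answer; leaves are plain strings naming a next step.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Node:
    """One yes/no decision point; leaves are plain strings (next steps)."""
    question: str
    test: Callable[[dict], bool]
    if_yes: Union["Node", str]
    if_no: Union["Node", str]


def walk(node, evidence):
    """Follow branches until a leaf is reached; return (leaf, path taken)."""
    path = []
    while isinstance(node, Node):
        answer = node.test(evidence)
        path.append((node.question, answer))
        node = node.if_yes if answer else node.if_no
    return node, path


# Illustrative tree only; the questions and evidence keys are made up.
TREE = Node(
    "Unexpected outbound traffic?",
    lambda e: e.get("outbound_traffic", False),
    Node(
        "Unknown autostart entry?",
        lambda e: e.get("unknown_autostart", False),
        "examine the file behind the autostart entry",
        "review running processes",
    ),
    "re-check the initial classification",
)

if __name__ == "__main__":
    step, path = walk(TREE, {"outbound_traffic": True})
    print(step)
    print(path)
```

Keeping the path alongside the result means the analyst can document why a given branch was taken.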

Anonymous said...

Interesting....I have actually been thinking about building a framework for doing exactly this. In my case though, I work solely in an environment with heavy network monitoring (all traffic captured, plus classification of some traffic), log aggregation and a good forensics team with the ability to do over-the-wire acquisitions. So if the malware arrives through our network, I can fairly quickly see most of the traffic generated by the bad stuff and kick off an IR process to deal with the physical stuff left on the drive or in memory. What we, and it sounds like others, lack is the ability to run the artifacts through a reliable db that can recognize morphed or intentionally changed malware and then determine what it did to the system. Kind of like a personal, updateable version of virustotal, with some intelligence built in to look at what files had MAC changes within that same couple of seconds and make a decision about where to look next. I've wanted to modify some of the NSRL Perl scripts to include support for ssdeep....could be a good start.

You also mentioned over on FFN that you weren't looking necessarily for a signature based solution. Agreed, but if you do happen to know how a piece of malware works (see need for fuzzy hashing above to identify malware based on artifacts) could you not then extrapolate what other areas to go check and automate to some degree that functionality?
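The "what files had MAC changes at that same couple of seconds" part of the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name files_near, the (path, timestamp) input format, and the two-second default window are all made up here, and the sketch does no fuzzy hashing at all. Given one known-bad file, it lists the other files whose timestamps fall within the window.

```python
def files_near(events, pivot_path, window_seconds=2.0):
    """List paths with a timestamp within window_seconds of the pivot file.

    events: iterable of (path, timestamp) pairs, timestamps in seconds.
    A path may appear more than once (e.g. modified, accessed, created).
    """
    events = list(events)
    pivot_times = [t for p, t in events if p == pivot_path]
    nearby = set()
    for path, stamp in events:
        if path == pivot_path:
            continue
        if any(abs(stamp - pt) <= window_seconds for pt in pivot_times):
            nearby.add(path)
    return sorted(nearby)


if __name__ == "__main__":
    # Made-up timeline for illustration.
    timeline = [
        ("C:/temp/dropper.exe", 1000.0),
        ("C:/temp/payload.dll", 1001.5),
        ("C:/docs/report.doc", 5000.0),
    ]
    print(files_near(timeline, "C:/temp/dropper.exe"))
```

The output is a list of places to look next, not proof that those files are related to the malware.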

H. Carvey said...

What we, and sounds like others lack is the ability to run the artifacts through a reliable db that can recognize morphed or intentionally changed malware and then determine what it did to the system.

Exactly. And creating such a database will be extremely difficult...largely due to the fact that everyone wants it but few are willing to actually develop something like this.

The solution to this is that I'd like to teach people to fish. If you see a guy sitting there mumbling, "Man, I'd really like some fish", the initial reaction is to give him a fish. But that does nothing for him tomorrow. However, if you teach the guy how to use a hook and line, make a net, what bait to use, where to fish, how to observe habits and see when fish come out and where they go...then you're setting him up to take care of himself.

The idea is that 99.9999% of the IT and response population simply wants a fish...a database of signatures, file hashes, malware artifacts, etc. But we all see that with the various malware toolkits that are available, it's trivial to change the malware just enough so that it's not picked up by A/V software until a new signature is developed.

Regardless of what anyone thinks, there are not an infinite number of autostart locations in the file system; in fact, the number is finite. The problem most folks have is that they can't remember them all...but they should be trying to memorize them.
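Since the number of autostart locations is finite, they can be written down and worked through as a checklist. The Python sketch below assumes a Windows system and lists only a handful of well-known Registry locations; it is deliberately not exhaustive, and the names AUTOSTART_LOCATIONS and remaining are hypothetical. It does not read the Registry, it only tracks which locations have not yet been examined.

```python
# A few well-known Windows autostart locations. NOT an exhaustive list;
# it assumes a Windows system and is here only to illustrate the checklist idea.
AUTOSTART_LOCATIONS = (
    r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run",
    r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce",
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run",
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\RunOnce",
    r"HKLM\SYSTEM\CurrentControlSet\Services",
    r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon",
)


def remaining(checked):
    """Return the locations not yet examined (Registry paths ignore case)."""
    done = {path.lower() for path in checked}
    return [loc for loc in AUTOSTART_LOCATIONS if loc.lower() not in done]


if __name__ == "__main__":
    examined = [r"hklm\software\microsoft\windows\currentversion\run"]
    for location in remaining(examined):
        print("still to check:", location)
```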

In the movie "Blade", the lead character said, "Once you understand the nature of a thing, you know what it's capable of." This holds true for operating systems, as well as malware, and can be easily mapped to incidents such as intrusions.

Most folks want to be spoon-fed...and to steal another quote, "There is no spoon."

Anonymous said...

Goodness, Blade and The Matrix quoted in the same post?

If there are no spoons, sounds like there is a good market for selling spoons....personally, I will try to get something like this to work in house. Will post back with thoughts.

Bill Ethridge said...


Been following the posts in FFN, and the lack of them in some of the other forums where you posted this.

The "Dr. House" thing works, along with your taxonomy, and when you combine them with the binary tree approach you get an interesting situation.

We need to progress like House: we think of all the possibilities for the symptoms or other evidence we THINK we see. Then we try an approach based on that artifact and test the outcome. If it fails, we must back up and try the next approach. The key is to be able to try a cure without killing the patient; IOW, we must have a safe retreat to our previous position. We also have to guard against false positives. Sometimes our test approach could cause behavior that would seem to indicate something that in fact doesn't exist.
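The "try a cure without killing the patient" idea can be sketched as trying each approach against a copy, so the original is always there to retreat to. This is a minimal Python illustration; the function name try_approaches, the dictionary used as "state", and the example approaches are all assumptions made for the sketch, and a real investigation would work against a copy of the image rather than a Python dict.

```python
import copy


def try_approaches(state, approaches, succeeded):
    """Try each approach on a copy of state; return the first that works.

    approaches: list of (name, function) pairs; each function modifies the
    copy it is given. succeeded: function that tests the modified copy.
    Returns (name, new_state), or (None, state) if nothing worked.
    """
    for name, apply_approach in approaches:
        trial = copy.deepcopy(state)  # the untouched original is the safe retreat
        apply_approach(trial)
        if succeeded(trial):
            return name, trial
        # Otherwise discard the trial copy and back up to the original.
    return None, state


if __name__ == "__main__":
    # Made-up example: a system with one suspicious service.
    system = {"services": ["spooler", "suspicious"]}
    approaches = [
        ("do nothing", lambda s: None),
        ("remove suspicious service", lambda s: s["services"].remove("suspicious")),
    ]
    name, result = try_approaches(
        system, approaches, lambda s: "suspicious" not in s["services"]
    )
    print(name, result, system)
```

Because each trial starts from a fresh copy, side effects of a failed approach cannot leak into the next test, which also helps with the false positives mentioned above.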