Wednesday, February 03, 2010

Forensic Analysis and Intel Gathering

Continuing in the vein of my previous post: while I do see some benefit to an intelligence rating system being adopted and used in forensic analysis, I'm not entirely sure, particularly at this point, that it's something that can be formalized or used more widely, for two reasons...

First, I don't see this as being particularly widespread. I do think there are analysts out there who would see the value of this, take it upon themselves to adopt it, and incorporate it into what they do. However, I don't necessarily see it becoming part of introductory, or maybe even intermediate-level, training. This might be training conducted internally by organizations that perform analysis or provide analysis services, someplace where it's easier to answer questions and provide more direct guidance to analysts seeing this for the first time. Further, something like this may have already been adopted by analysts who are associated with the intel community in some way.

Second, the rating system is somewhat subjective, and this is where you can get really caught up in non-technical/political issues. For example, take my earlier statement regarding the $STANDARD_INFORMATION and $FILE_NAME attributes; when making a statement like that, I would cite Brian Carrier's excellent book, and perhaps conduct some testing and document my findings. Based on that, I might assign a fairly high level of confidence to the information...but that's me. Or what if the information comes from the Registry...how is that rated? Get a roomful of people, and you'll get a lot of different answers.
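To make the idea a bit more concrete, here's a minimal sketch of what per-source ratings might look like in practice. The scale and the specific ratings assigned below are purely my own illustrative assumptions (reflecting, e.g., the relative ease of "timestomping" $STANDARD_INFORMATION timestamps via documented APIs), not any established standard:

```python
# Hypothetical sketch: an analyst-maintained table of confidence ratings
# per data source. The scale ("high"/"medium") and the specific ratings
# are illustrative assumptions only -- another analyst's table would differ,
# which is exactly the subjectivity problem described above.

SOURCE_CONFIDENCE = {
    "$FILE_NAME": "high",               # timestamps harder to alter via documented APIs
    "$STANDARD_INFORMATION": "medium",  # timestamps easily "timestomped"
    "registry_lastwrite": "medium",     # updated by normal system activity, too
}

def rate(source):
    """Return the analyst-assigned confidence for a data source."""
    return SOURCE_CONFIDENCE.get(source, "unrated")
```

The point isn't the code; it's that any such table has to be backed by citations and testing, and would likely need to be agreed upon before two analysts could compare findings.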

But why is this important? Well, for one, a good deal of forensic analysis has progressed beyond finding files (i.e., pictures, movies, etc.) on systems. Consider issues surrounding web application security...there are vulnerabilities in these systems that allow a knowledgeable attacker to gain access to a system without writing anything to disk; all of the artifacts of the exploit would be in memory. Nothing would necessarily be written to disk until the attacker moved beyond the initial step of gaining access, and at that point, anything written to disk might simply appear to be part of normal system activity.

Consider Least Frequency of Occurrence, or LFO. Pete Silberman was on target when he said that malware has the LFO on a system, and the same sort of thinking applies to intrusions as well. Therefore, we can't expect to find what we're looking for...the initial intrusion vector, indicators of what the intruder did, or even whether the system is compromised...by looking at only one source of data. What we need to do is overlay multiple sources of data, each with its own indicators, and only then will we be able to determine the activity that occurs least frequently. Think of it as finding what we're looking for by looking for the effects of the artifacts being created: we would know that something was dropped into a pond not by seeing it dropped, but by observing the ripples or waves that resulted.
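A minimal sketch of the overlay idea, assuming (purely for illustration) that artifacts from each data source have been normalized into simple records; the file and artifact names below are made up:

```python
from collections import Counter

def least_frequent(events, n=3):
    """Overlay normalized events from multiple data sources and surface
    the artifacts that occur least frequently across all of them --
    the 'ripples in the pond' rather than the splash itself."""
    counts = Counter(e["artifact"] for e in events)
    # Sort ascending by frequency; the rarest artifacts come first.
    return [artifact for artifact, _ in
            sorted(counts.items(), key=lambda kv: kv[1])[:n]]

# Hypothetical normalized events pulled from several sources:
events = [
    {"source": "filesystem", "artifact": "svchost.exe"},
    {"source": "filesystem", "artifact": "svchost.exe"},
    {"source": "prefetch",   "artifact": "svchost.exe"},
    {"source": "registry",   "artifact": "a.exe"},  # appears once -> stands out
]
```

Running `least_frequent(events, 1)` would surface `a.exe`, the artifact seen least often across the overlaid sources. Real tooling would obviously need far more careful normalization, but the principle is the same: rarity only becomes visible once the sources are combined.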

Matt Frazier recently posted to the Mandiant blog regarding sharing indicators of compromise...the graphic in the post is an excellent example that demonstrates multiple sources of data. Looking at the graphic, and understanding that not everything can be included (for the sake of space), I can see file system metadata, metadata from the EXEs themselves, and partial Registry data included as "indicators". In addition to what's there, I might include the Registry key LastWrite times, any Prefetch files, etc., and then look for "nearby" data, such as files being created in the Internet cache or an email attachments directory.
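That "nearby" check amounts to merging timestamped artifacts from the various sources into one timeline and looking within a window around an event of interest. A rough sketch, with entirely hypothetical events and an arbitrarily chosen five-minute window:

```python
from datetime import datetime, timedelta

def nearby(timeline, anchor, window_minutes=5):
    """Return all timeline events within +/- window_minutes of an anchor
    time. The window size is an analyst's judgment call, not a constant."""
    window = timedelta(minutes=window_minutes)
    return [e for e in timeline if abs(e["time"] - anchor) <= window]

# Hypothetical merged timeline drawn from several data sources:
timeline = sorted([
    {"time": datetime(2010, 2, 1, 10, 0), "source": "mft",
     "desc": "suspicious EXE created"},
    {"time": datetime(2010, 2, 1, 10, 2), "source": "registry",
     "desc": "Run key LastWrite updated"},
    {"time": datetime(2010, 2, 1, 14, 0), "source": "prefetch",
     "desc": "unrelated application launch"},
], key=lambda e: e["time"])
```

Here, `nearby(timeline, datetime(2010, 2, 1, 10, 0))` would pull back the EXE creation and the Run key update together, while leaving the unrelated afternoon event out...which is exactly the kind of correlation across sources that turns isolated artifacts into a picture of activity.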

Consider the Trojan Defense. With multiple data sources from the system, as well as potentially from outside it, it stands to reason that the relative confidence level and context of the data...based on individual ratings for each source as well as a cumulative rating for the data as a whole...would be of considerable value, not only to the analyst but to the prosecutor. Or perhaps to the defense. By that I mean that as much as most of us would want to see a bad guy punished, we also would not want to wrongly convict an innocent person.

In summary, I really do see this sort of thought and analysis process as a viable tool. I think many analysts have been using it to one degree or another, but maybe hadn't crystallized it in their minds, or perhaps hadn't vocalized it. I also think that incorporating it into training closer to the initial entry point for analysts and responders would go a long way toward advancing all analysis work. Whether as organizations or as individuals, effort should be directed toward investigating and developing/supporting methods for quickly, efficiently, and accurately collecting the necessary information...I'd hate to see something this valuable fall by the wayside and not be employed, simply because someone thinks it's too hard to learn or use.

2 comments:

darren_q said...

Hi Harlan,

Thankfully the intelligence analysis process is not hard to teach, learn, or apply (at least it wasn't for me), and it has been around a long time. I did my first training in 1994, which was the Anacapa Sciences course, based on a course from the 1970s (wow, still relevant too). Anacapa still does training, and at one stage had an introductory course available for free on its website.

i2 also offers training, but it's focused on their analyst notebook product rather than on general thinking and analysis, even though it does touch on the intelligence cycle.

Otherwise, if you want to fly me over, I'll run some courses, and I'd even throw in digital analysis for free! (I've been involved in intelligence training, investigator training, and forensic analysis training, so I have no problem combining the three.)

Seriously, I'm working on writing up the methodology I use for digital analysis, which is based on the intelligence analysis process (and includes timelines, which were part of the course in the '70s). The problem is I keep adding more and more to it, and it's getting quite lengthy. Perhaps it's not as simple as I think, or I need to simplify things from the level of detail I'm including.

Anyway, more great food for thought, thanks!

regards,
Darren

H. Carvey said...

...the intelligence analysis process is not hard to teach...

Perhaps not, but that's not the totality of the process. Over the years, I've observed analysts and wondered why, in many cases, their view of available data sources is, shall we say, "limited".

...I'm working on writing up the methodology...

Have you considered writing the methodology up succinctly, and then using the additional information as examples and case studies?