Thursday, July 07, 2005

System Analysis

Ever notice how you go to a conference or view something online that talks about "analysis", and all you see/hear/read about is data collection?

I was at a conference a couple of weeks ago, and one of the speakers was giving a presentation on "live data analysis". The speaker did a great job talking about collecting data...but that's not analysis. Or is it?

I have to say at this point, I'm confused. I've been thinking for a while that data collection and analysis are two different things. After all, we see it all around us as two separate actions. The men and women on TV shows like "CSI" do collection...and then they perform their analysis.

At the beginning of every month, I like to drop by the E-Evidence.info site to see what wonderful new papers and presentations are posted. I love reading through some of the stuff that appears on the site. At one point, I even went back through the archives from previous months and years. There's always something interesting posted here.

Yesterday, I ran across a paper about checking Windows boxen for signs of compromise. Reading through the paper, I think it's extremely useful to the right audience, but it doesn't say anything about analysis...it's all about running tools. Some of the descriptions of tools and why they are used are pretty bare, to say the least. But like I said, it's a great paper for the right audience.

I then read through a presentation on the "live analysis" of a Linux system. Again...go here, get this tool, run it. While the presentation does present issues such as Tor networks used for anonymity and privacy (if you're interested in that kind of thing, check out VPM), it really doesn't do much to cover "analysis". Even Dana Epp's 2004 presentation includes the word "analysis" in the title, but the presentation itself only glosses over any actual analysis.

My point is that data collection is easy...it's the analysis that's the hard part, and what we need to start focusing on. The tools are there. Techniques and methodologies for collecting data abound. But I'm not saying that we shouldn't keep presenting and writing on data collection...what I am saying is that if the presentation or paper has the word "analysis" in the title, then analysis should be discussed.

Okay, I know what you're going to say..."hey, dude, chill! There are just too many possible things that someone can look for when doing analysis." And I'd agree with you. But I also think that a really cool way to do presentations is to pick something...something you've seen or done, or something that's of interest to your audience (I know, it's really hard to get information on what others are interested in...)...and go over that in detail. I've decided that for my part, I'm going to start doing more to cover the actual analysis...now that you have the data, what is it telling you...in my papers and presentations. In fact, I submitted an abstract for the DoD CyberCrime Conference for a paper/presentation that does exactly that. I don't know if the proposal has been accepted yet or not, but my intention is to walk through the analysis of a corporate case, specifically the theft of proprietary information. It should be interesting...not only in putting the presentation together, but also in the audience's reaction.

10 comments:

Anonymous said...

I think analysis is closely related to reverse engineering, so you probably should look at reverse engineering presentations.

H. Carvey said...

In some capacities, you're right. But reverse engineering malware doesn't help you figure out (a) whether you actually have an incident, and (b) if you do, what it is and how it occurred.

Also, most of the reveng presentations I've come across are *nix-specific, and contrary to what some believe, there isn't a one-to-one mapping of techniques, for a variety of reasons.

If you have any specific presentations in mind, please feel free to share them...

Anonymous said...

You're right: Knowing what information to gather is good, but knowing how to interpret that data is crucial. That being said, I'm not sure analytical techniques can be written down successfully. A lot of it depends on the examiner's past experience and knowing "what looks right."

Rather than give a step-by-step procedure for analysis, perhaps case studies with highlighted lessons might be better? A write-up of: Here's what I was told, here's what I did, and here's what I found. Lesson learned: "Look for x in the $data."

H. Carvey said...

Jesse,

Exactly. I think you're 100% on target, in that specific analytic steps won't work...b/c if you're able to codify something that way, you can put it in a tool. After all, that's what the if..then statement in programming is good for, right?

However, case studies are where it's at! Here's what I was faced with, here's what I did, and here's what I found. That's great stuff, and people really seem to love it. I got a lot of really good comments from folks regarding chapter 6 of my book, BTW, b/c that's the approach I took.

The only drawback I see is that most folks doing the investigations are probably technical, which means that for the most part, documentation is like holy water on a vampire (salt on a slug??) for them. Many times I'll see a post on a public forum that's billed as a "case study", and I'll ask the author, "given the tools you had on hand, why didn't you do X?" and they'll respond with "I did X, but I didn't see anything relevant." Well, when conducting analysis, the things you *don't* see can be more important than those you *do* see.

If there are some relevant case studies that are worth mentioning or linking to, let me know.

Anonymous said...

Good case study http://www.honeynet.org/scans/

H. Carvey said...

Anonymous,

In a way, they are "case studies". However, some of them may be way too complicated for most Windows administrators...how many MCSEs are required to know PE header formats to obtain their certification?

Also, look at what you see on public lists like SF...someone finds a file, wants to know what it is or what it does, and posts the name of the file only. When following up to get a copy of the file, you most often find out that the system was reloaded and the file in question was lost.

What I was referring to when I mentioned case studies was something along the lines of someone writing up an incident, from cradle to grave...what they found, etc.

Anonymous said...

Someone just pointed me to the previous thread on the MISTI talk and I came here and found this thread, so I'll comment on it.

It is actually tough to find a single definition of analysis. Some people differentiate between analysis and examination (even though the definition of each refers to the other...). Some differentiate between analysis and interpretation.

I refer back to a paper that I did for DFRWS 2002 and IJDE, "Defining Digital Forensic Examination and Analysis Tools Using Abstraction Layers". The main purpose of most of our current analysis tools is to process abstraction layers in data. We import a raw disk image and the tool analyzes the data to produce partitions, files, unallocated space, etc. Some people may also choose to manually perform the abstractions and parse the data structures by hand using a hex editor. Regardless, it is a form of analysis. The investigator still needs to interpret (or analyze if you want to use the word in this context as well) the results presented to him.
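As a rough illustration of one such abstraction layer, the Python sketch below turns the first 512-byte sector of a raw image into a list of partitions by reading a classic MBR partition table. It is only a sketch under stated assumptions: it assumes an MBR-style layout (not GPT), the function name `parse_mbr` and the dictionary keys are made up for this example, and it is not how any particular forensic tool is implemented.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # the partition table starts at byte 446 of the MBR
ENTRY_SIZE = 16      # four 16-byte entries follow


def parse_mbr(sector):
    """Return the non-empty partition entries found in an MBR sector."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("sector does not carry the MBR boot signature")
    partitions = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # status, CHS start, type, CHS end, starting LBA, sector count
        status, _chs_first, ptype, _chs_last, lba, count = struct.unpack(
            "<B3sB3sII", raw
        )
        if ptype == 0:  # type 0 marks an unused entry
            continue
        partitions.append({
            "index": index,
            "bootable": status == 0x80,
            "type": ptype,
            "start_lba": lba,
            "sectors": count,
        })
    return partitions
```

The same bytes could be read by hand in a hex editor; the code just automates that one translation from raw data to "partitions", and the investigator still has to interpret what comes out.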

With "live analysis", the abstraction process is being performed by the suspect system, which can lie. My talk could have been more accurately entitled "Risks of Live Analysis..." (or something). In reality, the only difference between live analysis and dead analysis is the reliability of the data.

So, while running tools on the system can be viewed simply as data extraction, there is actually a lot of analysis that had to go on behind the scenes for you to get the nicely formatted data. So, "live analysis" uses analysis techniques that are running on the suspect system, maybe some analysis techniques on your trusted evidence server, and interpretation about what the results mean.

brian

H. Carvey said...

Brian,

Who pointed you to the MISTI thread? I like to know who's reading the blog... ;-)

I think you make a very important point, in that we have to define what we mean by "analysis". In some cases, tools do analysis for us, as when we acquire data and EnCase or ProDiscover recognizes a particular file system and aids us by presenting it in a format we're familiar with.

What I'm referring to is a presentation where the speaker talks about collecting, say, information about processes: describing the tool used, piping the output over netcat to a waiting server, etc. Fine. So you hash it, but what do you do with it?

Well, you can look at it and see if there are any unusual processes running. Or you can correlate it with other (potentially corroborating) data, and see if there's something in the output of one tool that doesn't appear in the output of others. For example, say you run tlist.exe and openports.exe on a system, and when you view the output of each of the tools, you see a process bound to a listening port that doesn't appear in the output of tlist.exe. This is an example of analysis...determining what the data is telling you about your system.
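A minimal Python sketch of that kind of correlation follows. It assumes the output of each tool has already been parsed into simple structures; the parsing is left out because it depends on the exact output format, and the function name, PIDs, and ports below are made up for illustration.

```python
def unlisted_listeners(process_pids, listeners):
    """Return listeners whose owning PID is absent from the process listing.

    process_pids: iterable of PIDs taken from a process listing.
    listeners: iterable of (pid, port) pairs from a port-to-process mapping.
    """
    known = set(process_pids)
    return sorted((pid, port) for pid, port in listeners if pid not in known)


# Hypothetical, already-parsed data (not real tool output).
process_pids = [4, 612, 1040, 1388]
listeners = [(4, 445), (1040, 135), (2216, 31337)]

for pid, port in unlisted_listeners(process_pids, listeners):
    print(f"PID {pid} is listening on port {port} but is not in the process list")
```

A hit here isn't a verdict. A process may simply have started or exited between the two tool runs, so the disparity is a lead to follow up on, not a conclusion.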

Tools like rkdetect.vbs do this for us, by running WMI and sc.exe queries for service information, and looking for disparities in the output of the tools. In this way, Hacker Defender can be detected.
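The general idea of comparing two views of the same information can be sketched in a few lines of Python. This is not rkdetect.vbs itself: the function name and the service names are hypothetical, the two lists stand in for service names already extracted from two different queries, and folding names to lower case before comparing is an assumption on my part.

```python
def service_disparities(view_a, view_b):
    """Return (only_in_a, only_in_b) for two lists of service names."""
    a = {name.lower() for name in view_a}
    b = {name.lower() for name in view_b}
    return sorted(a - b), sorted(b - a)


# Hypothetical service names from two different queries of the same system.
view_a = ["Dhcp", "Eventlog", "ExampleHiddenSvc", "Spooler"]
view_b = ["dhcp", "eventlog", "spooler"]

only_in_a, only_in_b = service_disparities(view_a, view_b)
print("Seen only in the first view:", only_in_a)
print("Seen only in the second view:", only_in_b)
```

Anything that shows up in one view but not the other is worth a closer look, since something on the system may be filtering one of the queries.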

As a side note, I would agree with your statement about the difference between live and dead analysis, but I would add that b/c of the issue of reliability of the data, we may choose one over the other.

Anonymous said...

Harlan,

Good points. On the network side, traffic analysis would be pretty boring if people thought it only involved firing up Tcpdump, Ethereal, Argus, etc.

I submitted a proposal for DoD Cybercrime too. If we're both accepted, we'll have to chat in person.

H. Carvey said...

Richard,

"...traffic analysis would be pretty boring if people thought it only involved firing up Tcpdump, Ethereal, Argus, etc."

Hold on...you're saying that there's more to traffic analysis than just Ethereal?? ;-)

I hope to see you at DC3. I'll keep an eye on the accepted speakers (if I get accepted, that is...) and if we're both on tap to speak, I'll end up having to get you to sign my copy of your book. You'll have to do a better job than Brian did, though... ;-)