
Thursday, July 11, 2013

Programming and DFIR

I was browsing through an online list recently and came across an older post I'd written that had to do with tools.  In it, I'd made the statement, "Tweaked my browser history parser to add other available data to the events, giving me additional context."  This brought to mind just how valuable even a small modicum of programming skill can be to an analyst.

This statement takes understanding data structures a step further, because we're not simply recognizing that, say, a particular data structure contains a time stamp...in this case, we're modifying code to meet the needs of a specific task.  However, simply understanding basic programming principles can be a very valuable skill for DFIR work in general, as the foundational concepts behind programming teach us a lot about scoping, and programming in practice allows us to move into task automation and, eventually, code customization.

Note
David Cowen has been doing very well with his blog-a-day-for-a-year challenge, and recently posted a blog regarding some DFIR analyst milestones that he outlined.  In that post, David mentions that milestone 11 includes "basic programming".  This could include batch file programming, which is still alive and well, and extremely valuable...just ask Corey Harrell.  Corey's done some great things, such as automating the exploitation of VSCs, through batch files.

Scoping
My programming background goes back to the early '80s, programming BASIC on the Timex-Sinclair 1000 and the Apple IIe.  In high school, I learned some basic Pascal on the TRS-80, and then in college, moved on to BASIC on the same platform.  Then in graduate school, I picked up some C (one course), some M68K assembly, and a LOT of Java and MatLab, to the point that I used both in my thesis.  This may seem like a lot, but none of it was really very extensive.  For example, when I was programming BASIC in college, my programs included one that displayed the Punisher skull on the screen and played the "Peter Gunn" theme in the background, and another that interfaced with a temperature sensor to display fluctuations on the screen.  In graduate school, the C programming course required as part of the MSEE curriculum really didn't have us do much more than open, write to or read from, and then close a file.  Some of the MatLab work was a bit more extensive, as we used it in linear algebra, digital signal processing, and neural network courses.  But we weren't doing DFIR work, nor anything close to it.

The result of all this is not that I became an expert programmer.  Rather, consider something David said in a recent blog post: an understanding of programming helps you put your goals into perspective and reduce the scope of the problem you're trying to solve.  This is the single most valuable aspect of programming experience...being able to look at the goals of a case and break them down into compartmentalized, achievable tasks.  Far too many times, I've seen analysts simply overwhelmed by goals such as "find all bad stuff", and even after going back to the customer for clarification as to what the goals of the case should be, they are still unable to compartmentalize the tasks necessary to complete the examination.

Task Automation
There's a lot that we do that is repetitive...not just within a single case; if you really sit down and think about the things you do during a typical exam, I'm sure you'll come across tasks that you perform over and over again.  One of the questions I've heard at conferences, as well as while conducting training courses, is, "How do I fully exploit VSCs?"  My response is usually, "What do you want to do?"  If your goal is to run all of the tools that you ran against the base portion of the image against the available VSCs, then consider taking a look at what Corey did early in 2012.  From what I've seen, and from my own experience, batch scripting remains one of the most effective means of automating tasks like this, and there is a LOT of information and sample code freely available on the Interwebs for automating an almost infinite number of tasks.
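The "run everything you ran against the base image against each VSC" idea can be sketched in just a few lines.  This is a minimal illustration, not Corey's actual batch script; the tool names and mount paths are hypothetical placeholders.

```python
# A minimal sketch of task automation: pairing every tool with every
# mount point (base image plus each available VSC), mirroring the
# "run everything against everything" approach described above.

def build_commands(tools, mount_points):
    """Build one command line per (mount point, tool) pair."""
    return [f"{tool} {mp}" for mp in mount_points for tool in tools]

if __name__ == "__main__":
    tools = ["parse_evtx.py", "rip.pl"]             # hypothetical tools
    mounts = ["C:\\mount", "C:\\vsc1", "C:\\vsc2"]  # base image + two VSCs
    for cmd in build_commands(tools, mounts):
        print(cmd)
```

In a batch file the same thing is a nested `for` loop; the point is that once the repetitive task is expressed as a loop, adding a tool or a VSC costs one line.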

If batch scripting doesn't provide the necessary flexibility, there are scripting languages (Python, Perl) that might be more suitable, and there are a number of folks in the DFIR community with varying levels of experience using these languages...so don't be afraid to reach out for assistance.

Code Customization
There's a good deal of open source code out there that allows us to do the things we do.  In other cases, a tool we use may not be open source, but open source code exists that allows us to manipulate the tool's output into a format that is more useful and more easily incorporated into our analysis process.  Going back to the intro paragraph of this post, sometimes we may need to tweak some code, even if it's simply to change one small portion of the output from decimal to hex when displaying a number.  Understanding some basic coding lets us not only see what a tool is doing, but also adjust that code when necessary.
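The decimal-to-hex tweak mentioned above is about as small as code customization gets.  Here's a hedged, made-up illustration (the record layout and field name are invented for the example); the same one-character change to a format string is what you'd make in a real tool's output routine.

```python
# A trivial illustration of the kind of output tweak described above:
# switching a field in a tool's output from decimal to hex display.

def format_record(name, attr_type, use_hex=False):
    """Render an attribute type value in decimal or hex."""
    value = f"0x{attr_type:02X}" if use_hex else str(attr_type)
    return f"{name}: {value}"

# 48 decimal is 0x30, the NTFS $FILE_NAME attribute type identifier.
print(format_record("attribute type", 48))                # decimal
print(format_record("attribute type", 48, use_hex=True))  # 0x30
```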

Being able to customize code as needed also means that we can complete our analysis tasks in a much more thorough and timely manner.  After all, for "heavy lifting", or highly repetitive tasks, why not let the computer do most of the work?  Computers are really good at doing the same thing, over and over again, really fast...so why not take advantage of that?

Summary
While there is no requirement within the DFIR community (at large) to be able to write code, programming principles can go a long way toward developing our individual skills, as well as developing each of us into better analysts.  My advice to you is:

Don't be overwhelmed when you see code...try opening the code in a text viewer and just reading it.  Sure, you may not understand Perl or C or Python, but most times, you don't need to understand the actual code to figure out what it's doing.

Don't be afraid to reach out for help and ask a question.  Have a question about some code?  Reach out to the author.  Many times, folks crowdsource their questions, reaching out to the "community" as a whole, and that may work for some.  However, I've had much better success reaching out directly to the coder...I can usually find contact info in the headers of the code they wrote.  Who better to answer a question about some code than the person who wrote it?

Don't be afraid to ask for assistance in writing or modifying code.  From the very beginning (circa 2008), I've had a standing offer to modify RegRipper plugins or create custom plugins...all you gotta do is ask (provide a concise description of what's needed, and perhaps some sample data...).  That's it.  I've found that in most cases, getting an update/modification is as simple as asking.

Make the effort to learn some basic coding, even if it's batch scripting.  Program flow control structures are pretty consistent across languages...a for loop is a for loop.  Understanding programming is valuable for far more than simply being able to write a program.

Monday, June 28, 2010

Skillz

Remember that scene from Napoleon Dynamite where he talks about having "skillz"? Well, analysts have to have skillz, right?

I was working on a malware exam recently...samples had already been provided to another analyst for reverse engineering, and it was my job to analyze acquired images and answer a couple of questions. We knew the name of the malware, and when I was reviewing information about it at various sites (to prepare my analysis plan), I found that when the malware files are installed, their MAC times are copied from kernel32.dll. Okay, no problem, right? I'll just parse the MFT and get the time stamps from the $FILE_NAME attribute.

So I received the images and began my case in-processing.  I got to the point where I extracted the MFT from the image, and the first thing I did was run David Kovar's analyzemft.py against it.  I got concerned after it had run for over an hour and all I had was a 9KB file; I hit Ctrl-C in the command prompt and killed the process.  I then ran Mark Menz's MFTRipperBE against the file, and when I opened the output .csv file and ran a search for the file name, Excel told me it couldn't find the string.  I even tried opening the .csv file in an editor and ran the same search, with the same results.  Nada.

Fortunately, as part of my in-processing, I had verified the file structure with FTK Imager, and then created a ProDiscover v6.5 project and navigated to the appropriate directory. From there, I could select the file within the Content View of the project and see the $FILE_NAME attribute times in the viewer.

I was a bit curious about the issue I'd had with the first two tools, so I ran my own Perl code for parsing the MFT and found an issue in part of the processing.  I don't know if this is the same issue that analyzemft.py encountered, but I made a couple of quick adjustments to my Perl script and was fairly quickly able to get the information I needed.  I can see that the file has $STANDARD_INFORMATION and $FILE_NAME attributes, as well as a $DATA attribute, that the file is allocated (from the flags), and that the MFT sequence number is 2.  Pretty cool.
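One small, self-contained piece of the MFT parsing described above is decoding the time stamps themselves.  The values in both $STANDARD_INFORMATION and $FILE_NAME are 64-bit FILETIMEs: counts of 100-nanosecond intervals since January 1, 1601 UTC.  This sketch (in Python rather than my Perl) shows just that conversion:

```python
# Decode a 64-bit NTFS/Windows FILETIME: 100-nanosecond intervals
# since 1601-01-01 UTC, stored little-endian in the MFT attributes.
import struct
from datetime import datetime, timedelta, timezone

EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(raw8):
    """Convert 8 raw little-endian bytes to a UTC datetime."""
    (ft,) = struct.unpack("<Q", raw8)           # unsigned 64-bit value
    return EPOCH_1601 + timedelta(microseconds=ft // 10)

# 116444736000000000 is the well-known FILETIME for the Unix epoch.
raw = struct.pack("<Q", 116444736000000000)
print(filetime_to_datetime(raw))  # 1970-01-01 00:00:00+00:00
```

The integer division by 10 drops the sub-microsecond precision, which is fine for timeline work; a full parser would also walk the attribute headers to find where these 8-byte fields sit within each entry.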

The points of this post are:

1. If you run a tool and do not find the output that you expect, there's likely a reason for it. Validate your findings with other tools or processes, and document what you do. I've said (and written in my books) that the absence of an artifact where you would expect to find one is itself an artifact.

2. Analysts need to have an understanding of what they're looking at and for, as well as some troubleshooting skills, particularly when it comes to running tools. Note that I did not say "programming" skills. Not everyone can, or wants to, program. However, if you don't have the skills, develop relationships with folks who do. But if you're going to ask someone for help, you need to be able to provide enough information that they can help you.

3. Have multiple tools available to validate your findings, should you need to do so. I ran three tools to get the same piece of information, a need I had documented in my analysis plan prior to receiving the data. One tool hung, another completed without providing the information, and I was able to get what I needed from the third, and then validate it with a fourth. And to be honest, it didn't take me days to accomplish that.

4. The GUI tool that provided the information doesn't differentiate between "MFT Entry Modified" and "File Modified"...I just have two time stamps from the $FILE_NAME attribute called "Modified". So I tweaked my own code to print out the time stamps in MACB format, along with the offset of the MFT entry within the MFT itself. Now, everything I need is documented, so if need be, it can be validated by others.
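The tweak described in point 4 amounts to a one-line output format.  This is a hypothetical sketch (not my actual Perl code; the field names and offset are illustrative) of emitting the four time stamps in MACB order, with the entry's offset within the MFT, so the output is unambiguous and can be validated by others:

```python
# A sketch of MACB-ordered output: Modified, Accessed, Changed
# (MFT entry modified), Born (created), prefixed with the entry's
# byte offset within the MFT file.

def macb_line(offset, modified, accessed, entry_mod, created):
    """One output line: MFT entry offset plus time stamps in MACB order."""
    return f"0x{offset:08X}  M:{modified}  A:{accessed}  C:{entry_mod}  B:{created}"

print(macb_line(0x4000,
                "2010-06-01 10:00:00", "2010-06-02 09:30:00",
                "2010-06-01 10:00:05", "2010-05-28 14:15:00"))
```

Labeling the entry-modified value as "C" rather than a second "Modified" is exactly the differentiation the GUI tool didn't provide.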