Saturday, March 22, 2014

Coding for Digital Forensic Analysis

Over the years, I've seen a couple of questions on the topic of coding for digital forensic analysis.  Many times, these questions tend to devolve into a quasi-religious debate over the programming language used, and quite honestly, that detracts from the discussion as a whole, because regardless of the language used, these questions are very often more about deconstructing or reconstructing data structures, processing logs, or simply obtaining context and meaning from the available mass of data.

Languages
Programming languages abound, and from what I've seen, the one chosen comes down to personal preference, usually based on experience and knowledge of the language.

I started programming BASIC on the Apple IIe back around 1982.  I typed a couple of BASIC programs into a Timex-Sinclair 1000, and then took the required course in BASIC programming my freshman year in college.  In high school, I took the brand new AP Computer Science course, which focused on using PASCAL...I ended up using TurboPASCAL at home to compile my programs and bring them in on a 5.25 in. floppy drive.  In graduate school, I took a C/C++ programming course, but it didn't involve much more than opening a file, writing to it, and then closing it.  I also did some M68000 assembly language programming. We used MatLab a lot in some of the different courses, particularly digital signal processing and neural networks, and I used it to perform some statistical analysis for my thesis.

In 1999, I started teaching myself to program Perl in order to have something to do.  I was working as a consultant at Predictive Systems, and we didn't have a security practice at the time.  I could hear the network operations team talking about how they needed someone who could program Perl, so I started teaching myself what I could so that maybe I could offer some assistance.  From there, I branched out to interacting with Windows systems through the API provided by Dave Roth's Perl modules.  At one point, while working at TDS, I had a fully-functional product that we were using to replace the use of ISS's Internet Scanner, due to the number of false positives and cryptic responses we were receiving.

Since then, I've used Perl for a wide variety of tasks, from IIS web server log processing to RegRipper to decoding binary data structures.

However, I'm NOT suggesting that Perl is the be-all and end-all of programming languages, particularly for DFIR work. Not at all.  All I've done is provided my experience.  Over the years, other programming languages have been found to be extremely useful.  I've seen R used for statistical analysis, and it makes a lot of sense to use this language for that task.  I've also seen a lot of programming in the DFIR space using C# and .NET, and even more using Python.  I've seen folks switch from another language to Python because "everyone is doing it".  I've seen so much of a use of Python, that I've started learning it myself (albeit slowly) in order to better understand the code, and even create my own.  The list of projects written in Python is pretty extensive, so it just makes sense to learn something about this language.

Defining the Problem
The programming language you use to solve a problem is really irrelevant.  I've got bits of Perl code lying around...stuff for parsing certain data structures, printing binary data to the console in hex editor format, etc...so if I need to pull something together quickly, I'm likely going to use a snippet of Perl code that I already have.  I'm sure that I'm no different from any other analyst in that regard.

But when it comes to the particular language or approach I'm using, it depends on what I'm trying to achieve...what are my goals?  If I'm trying to put something together that I'm going to either have a client use, or leave behind after I'm done for the client to use, I may opt for a batch file.  If I need something quickly, but with more flexibility than is offered by a batch file, I may opt for a Perl script.

I've found over time that some of the programming languages used can be difficult to work with...what I mean by that is that some of the tools written and made available by the authors display their results in a GUI, and are closed source.  So, you can't see what the tool is doing, and you can't easily incorporate the output of the tools using techniques like timeline analysis.

Task-Specific Programming
Some folks have asked about learning programming specifically for DFIR work; I'm not sure if there are any.  What it comes down to is, what do you want to do?  If you want to read binary data and parse out data structures based on some definition, then C/C++, C#, Python, Perl, and some other languages work well.  For many, some snippets of code are already available online for these tasks.  If you're trying to process text-based log files, I've found Perl to be well-suited for this task, but if you're more familiar with and comfortable with Python, go for it.

When it really comes down to it, it isn't about which programming language is "the best", it's about what your goal is, and how you want to go about achieving it.

Resources
Python tutorial
80+ Best Free Python Tutorials/books
List of free programming books, Python section

2 comments:

Corey Harrell said...

Harlan,

Excellent point about the how (language) doesn't matter as much as the what (problem). Probably one of the most beneficial things I've done was to teach myself how to script. Knowing how to script has allowed me to accomplish so much; from examining VSCs to data collections. The method I mostly used has been batch scripting with a little bit of Perl. Even simple batch scripts have enabled me to solve big problems in my job.

Shelly said...

I definitely related to this at work this week. I had a problem and my go-to recently has been Python as I was trying to learn something new. However, the more eloquent and straight-forward solution was in Perl. Still have a long way to go with both, but each time I have an issue to solve, its cool to pick up a little more.