Friday, January 14, 2005

Data Reduction and Representation

Data reduction and representation (or reporting) is a huge issue in the Windows world, even in IR. Maybe more so, in some cases. By now, you're probably thinking, what's he talking about? What I'm referring to is the glut of data that is potentially available for troubleshooting and/or incident response purposes.

What kind of data am I talking about? Event Logs, for one. My previous post pointed out ways of reducing the amount of noise and unusable data that goes into the Event Logs, so that you've got something really useful to work with. From there, there are a variety of tools available for parsing and managing the data, from Perl to LogParser. My focus is always on freeware approaches, due to advantages such as being able to implement something with little to no monetary cost...the biggest 'cost' is the time it takes to learn something and develop a new skill.
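As a quick illustration of the Perl route (this is just a sketch, not something from the book...the log name and the output format are whatever you need them to be), the Win32::EventLog module makes walking a log pretty painless:

#!/usr/bin/perl
# evtdump.pl - walk the System Event Log, one line per record
# A minimal sketch using Win32::EventLog; pass a system name on the
# command line to read a remote log, or run as-is for the local box.
use strict;
use Win32::EventLog;

my $server = shift || $ENV{COMPUTERNAME};
my $log = Win32::EventLog->new( 'System', $server )
    or die "Could not open the System Event Log on $server\n";

my ( $count, $oldest );
$log->GetNumber($count);
$log->GetOldest($oldest);

for ( my $i = 0; $i < $count; $i++ ) {
    my %event;
    $log->Read( EVENTLOG_FORWARDS_READ | EVENTLOG_SEEK_READ,
                $oldest + $i, \%event );
    # EventID carries severity/facility bits in the high word; mask
    # them off to get the ID you see in the Event Viewer
    my $id = $event{EventID} & 0xFFFF;
    print scalar localtime( $event{TimeGenerated} ),
          "|$event{Source}|$id\n";
}
$log->Close();

From there, piping that output through a sort, a hash, or a known-good filter is trivial.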

One method of data reduction that I've used in the past is Marcus Ranum's "artificial ignorance". What this amounts to is that instead of using complicated heuristic algorithms to try to determine what's "bad" on a machine, you simply get a list of all the "known good" stuff and have the tool show you everything else. For example, I used to use a simple Perl script to retrieve the contents of the HKLM\..\Run key from all of the systems (workstations, servers, et al) across the enterprise. I'd have a small text file containing known good entries, and I'd have the script filter those entries out of what was being retrieved, leaving only the unusual stuff for me to look at. This works for just about anything...Registry keys, services and device drivers, processes, etc. Using Perl or VBScript to implement WMI calls, and then filtering the output based on a list of stuff you know to be good for your environment, is pretty easy. I'm sure there are commercial products that may help, but as we all know, no two architectures are the same, so what you're really paying for is a nice GUI and a huge database of "known goods", most of which you're not using anyway. Besides, if, as a sysadmin, you can't look at a listing from one of your systems and know, or quickly figure out, what's valid and what's not...well, let's just say that it would be a great learning experience. The same holds true for consultants and security engineers...with the right tools you can easily determine what's 'good' or 'bad' on a system. "AI" is easy to implement and adheres to the KISS principle.
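To make that concrete, here's a minimal sketch of the idea against the local Run key. This isn't the script I mentioned above...the knowngood.txt file, its one-entry-per-line "name=data" format, and the local-only scope are all simplifications you'd adapt to your own environment:

#!/usr/bin/perl
# ai_run.pl - "artificial ignorance" against the HKLM Run key
# A minimal, local-only sketch; knowngood.txt (one "name=data" entry
# per line, # for comments) is a placeholder for your own list.
use strict;
use Win32::TieRegistry ( Delimiter => '/' );

# load the known-good entries into a hash for quick lookups
my %good;
open( FH, '<', 'knowngood.txt' ) or die "knowngood.txt: $!";
while (<FH>) {
    chomp;
    next if ( /^\s*$/ || /^#/ );
    $good{ lc $_ } = 1;
}
close(FH);

my $run = $Registry->{'LMachine/SOFTWARE/Microsoft/Windows/CurrentVersion/Run/'}
    or die "Could not open the Run key: $^E\n";

# Win32::TieRegistry prefixes value names with the delimiter; subkey
# names end with it
foreach my $name ( keys %$run ) {
    next if ( $name =~ m|/$| );
    ( my $val = $name ) =~ s|^/||;
    my $data = $run->GetValue($val);
    # only the entries that aren't in the known-good list get printed
    print "$val -> $data\n" unless ( exists $good{ lc "$val=$data" } );
}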
The vague term "data munging" may apply to other forms of data reduction and representation. When performing IR activities, you will very often have many files to deal with for each machine, each representing the output of a particular tool. In some cases, you may need to correlate the output from several tools, such as those that collect process information. If you run multiple tools (tlist, pslist, WMI scripts, openports/fport/portqry, etc.), then you're going to have to process and analyze all of that data. Well, let me tell you...after having to print out all of those files, lay them out on a big table, and trace through each one with a ruler and a highlighter, I figured I'd write a Perl script to do the correlation (by PID) and presentation for me. I implemented this in the procdmp.pl script provided with my book. Taking it one step further, someone with even a little bit of Perl knowledge (or just the desire) could add "artificial ignorance" to that as well, reducing the data even further for their own infrastructure.
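A bare-bones version of that kind of PID correlation might look something like the following. To be clear, this isn't procdmp.pl; the input filenames are placeholders, and the parsing assumes simplified pslist- and fport-style columns, so you'd adjust the regexes to match the tools you actually run:

#!/usr/bin/perl
# Correlate process info from two tools by PID - a bare-bones sketch.
# pslist.txt and fport.txt are placeholder filenames; the column
# layouts assumed below are simplified versions of each tool's output.
use strict;

my %procs;

open( PS, '<', 'pslist.txt' ) or die "pslist.txt: $!";
while (<PS>) {
    # assumes "name pid ..." columns, as pslist prints the name first
    $procs{$2}{name} = $1 if (/^(\S+)\s+(\d+)/);
}
close(PS);

open( FP, '<', 'fport.txt' ) or die "fport.txt: $!";
while (<FP>) {
    # assumes "pid process -> port proto path" columns
    push( @{ $procs{$1}{ports} }, "$3/$2" )
        if (/^(\d+)\s+\S+\s+->\s+(\d+)\s+(\S+)/);
}
close(FP);

# one line per PID; anything one tool missed shows up as 'unknown' or '-'
foreach my $pid ( sort { $a <=> $b } keys %procs ) {
    my $name  = $procs{$pid}{name} || 'unknown';
    my $ports = $procs{$pid}{ports}
              ? join( ',', @{ $procs{$pid}{ports} } )
              : '-';
    printf "%-6s %-20s %s\n", $pid, $name, $ports;
}

The nice thing about keying everything off of the PID is that discrepancies jump right out at you...a process that shows up in the port listing but not in the process listing is worth a second look.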

Let's say you do some security scanning, either on your own internal (corporate) network, or as part of a contract (you're a consultant). Your tool of choice may be nmap...if so, take a look at fe3d for visualization of the data. Fe3d may not be *the* answer you're looking for, as you may be more interested in that workstation that's running a web server than in having pretty pictures to look at, but fe3d is an option, especially if you want to "wow" your customer or boss.
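If it's that web-serving workstation you're after rather than the eye candy, a few lines of Perl against nmap's "grepable" output will get you there. This is just a quick sketch...scan.gnmap is a placeholder for whatever you passed to nmap's -oG switch:

#!/usr/bin/perl
# Pull hosts with a given port open out of nmap grepable output
# (nmap -oG scan.gnmap ...); defaults to 80/tcp.
use strict;

my $port = shift || 80;
open( NMAP, '<', 'scan.gnmap' ) or die "scan.gnmap: $!";
while (<NMAP>) {
    # grepable output: "Host: <addr> (<name>)  Ports: 80/open/tcp//http///, ..."
    next unless (/^Host:\s+(\S+).*Ports:\s+(.*)$/);
    my ( $host, $ports ) = ( $1, $2 );
    print "$host is listening on $port/tcp\n"
        if ( $ports =~ m|\b$port/open/tcp| );
}
close(NMAP);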

Finally, some thoughts on data reduction...rather than reducing the data at the output end of the process (i.e., after you've already collected it), try doing so at the front end. An example of this is the Event Log configuration steps I mentioned in an earlier blog entry. Another way of handling the data is to know exactly what you're looking for. Personally, when it comes to either incident response or vulnerability assessments, I'd rather collect all that I can, and then reduce it and use only what I need. However, this may not work for everyone. Perhaps looking only for specific things works better for others.
