Saturday, February 02, 2013

BinMode: Parsing Java *.idx files, pt trois

Things have progressed a great deal since I last blogged on this subject.  Specifically, additional information and resources have been added to the ForensicsWiki page on this topic, and Brian has updated his Python code.  Mark Woan has created a .Net console application for parsing these files, as well, and his repo contains a PDF document that delineates the structure of the various versions of these files.

Running my own tool against the Java deployment cache on my system, I don't see much in the way of interesting data; most of what I have on this system is the result of accessing SANS webcasts.  However, parsing the data from the *.idx file that Corey provided, we see the following:

File: d:\cases\781da39f-6b6c0267.idx
content-length: 14226
last-modified: Sun Sep 12 15:15:32 2010 UTC

Server Response:
HTTP/1.1 200 OK
content-length: 14226
last-modified: Sun, 12 Sep 2010 15:15:32 GMT
content-type: text/plain
date: Sun, 12 Sep 2010 22:38:35 GMT
server: Apache/2
deploy-request-content-type: application/x-java-archive

The information displayed at the top of the output, above "Server Response", is from the header of the *.idx file, while the rest of the information is from Section 2 of the file.  For specifics of this data, take a look at the PDF document that Mark provided.  Suffice it to say, this is a great resource, because what you're seeing is extracted from the binary contents of the file.  Yes, the strings for the URL and IP address can be found via a text or keyword search, but an understanding of the data source and the data structure provides valuable context to the search hits.  Even better, a targeted, Sniper Forensics approach to going after the data is something that we can do now because of what we know about the data itself.

So what?  Now that we have this information available, how do we use it in exams?  Perhaps the most obvious approach would be to parse the contents of the *.idx files and check the output against the Malware Domain List, or "MDL".
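The MDL check described above might look something like the following.  This is only a rough sketch in Python, not the code from my tool: it assumes the parsed *.idx data is already available as a list of dictionaries with hypothetical "url" and "ip" keys, and that the MDL has been saved locally as plain text with one host or IP address per line.  The function names and the list format are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased hostname of a URL, or '' if there is none."""
    return (urlsplit(url).hostname or "").lower()


def load_blocklist(lines):
    """Build a set of hosts/IPs from text lines (one entry per line).

    Assumed format: anything after '#' is a comment; blank lines are skipped.
    """
    entries = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            entries.add(entry)
    return entries


def flag_records(records, blocklist):
    """Return the records whose URL host or IP address is in the blocklist."""
    hits = []
    for rec in records:
        # 'url' and 'ip' are hypothetical field names for the parsed data.
        if host_of(rec.get("url", "")) in blocklist or rec.get("ip", "") in blocklist:
            hits.append(rec)
    return hits
```

With something like this in place, each *.idx file that produces a hit becomes a pivot point worth a closer look, while the rest can be set aside for the moment.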

Another method of analysis for this information would be to parse the data and correlate statistics from all of the available *.idx files (URL, IP address, content type, etc.), showing the stats as an overview before digging into the data itself.  Combining the two...MDL check and stats...would be a great way to perform data reduction.  One might incorporate checking against the MDL into a tool that parses the data within *.idx files for inclusion in a timeline, adding the pivot points directly to the timeline itself.  Incorporating this with other data...specifically, the user's web browser history...would allow an analyst to easily 'see' an Initial Infection Vector.
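As a rough illustration of the stats overview, here is a minimal Python sketch.  Again, it assumes the parsed records are dictionaries; the field names ("url", "ip", "content-type") are hypothetical and would need to match whatever your parser actually emits.

```python
from collections import Counter


def summarize(records, fields=("url", "ip", "content-type")):
    """Count how often each value occurs, per field, across all records."""
    stats = {field: Counter() for field in fields}
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value:  # skip missing or empty values
                stats[field][value] += 1
    return stats


def print_overview(stats, top=5):
    """Print the most common values for each field, most frequent first."""
    for field, counter in stats.items():
        print(field)
        for value, count in counter.most_common(top):
            print(f"  {count:5d}  {value}")
```

Values that appear only once or twice in the overview are often the more interesting ones, so it can be worth reading the list from the bottom up as well.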

For me, the first step is to incorporate this information into a timeline...

Addendum: I updated my code recently to provide more than the output that you see above.  The new version includes options for CSV or TLN output.  It also includes a heuristic to help detect potentially malicious Java archives, as opposed to those that may be legit.
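For those not familiar with TLN output, it's a five-field, pipe-delimited format (time|source|system|user|description), with the time as a Unix epoch value in UTC.  The Python sketch below shows one way that a "last-modified" value from the server response might be turned into a TLN line; it is not the code from my tool, and the "JAVA" source tag and the function names are assumptions made for illustration.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def http_date_to_epoch(value):
    """Convert an HTTP date like 'Sun, 12 Sep 2010 15:15:32 GMT' to epoch seconds."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # No usable zone information in the string; treat the time as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def tln_line(epoch, description, source="JAVA", system="", user=""):
    """Build one TLN line: time|source|system|user|description."""
    # A pipe inside the description would break the field layout, so replace it.
    safe = description.replace("|", "/")
    return "|".join([str(epoch), source, system, user, safe])
```

For example, tln_line(http_date_to_epoch("Sun, 12 Sep 2010 15:15:32 GMT"), "last-modified: 781da39f-6b6c0267.idx") produces a line that can be appended to an events file alongside data from other sources.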

