Saturday, January 19, 2013

BinMode: Parsing Java *.idx files

One of the Windows artifacts that I talk about in my training courses is application log files, and I tend to sort of gloss over this topic, simply because there are so many different kinds of log files produced by applications.  Some applications, in particular AV, will write their logs to the Application Event Log, as well as a text file.  I find this to be very useful because the Application Event Log will "roll over" as it gathers more events; most often, the text logs will continue to be written to by the application.  I talk about these logs in general because it's important for analysts to be aware of them, but I don't spend a great deal of time discussing them because we could be there all week talking about them.

With the recent (Jan, 2013) issues regarding a Java 0-day vulnerability, my interest in artifacts of compromise was piqued yet again when I found that someone had released some Python code for parsing Java deployment cache *.idx files.  I located the *.idx files on my own system, opened a couple of them up in a hex editor and began conducting pattern analysis to see if I could identify a repeatable structure.  I found enough information to create a pretty decent parser for the *.idx files to which I have access.

Okay, so the big question: so what?  Who cares?  Well, Corey Harrell had an excellent post on his blog regarding Finding (the) Initial Infection Vector, which I think is something that folks don't do often enough.  Using timeline analysis, Corey identified artifacts that required closer examination; using the right tools and techniques, this information can also be included directly in the timeline (see the Sploited blog post listed in the Resources section below) to provide more context to the timeline activity.

The testing I've been able to do with the code I wrote has been somewhat limited, as I haven't had a system that might be infected come across my desk in a bit, and I don't have access to an *.idx file like what Corey illustrated in his blog post (notice that it includes "pragma" and "cache control" statements).  However, what I really like about the code is that I have access to the data itself, and I can modify the code to meet my analysis needs, much the way I did with the Prefetch file analysis code that I wrote.  For example, I can perform frequency analysis of IP addresses or URLs, server types, etc.  I can perform searches for various specific data elements, or simply run the output of the tool through the find command, just to see if something specific exists.  Or, I can have the code output information in TLN format for inclusion in a timeline.
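As a rough illustration of the kind of follow-on analysis described above, here is a minimal Python sketch of frequency analysis and TLN output.  It is not the parser discussed in this post; it assumes that some parser has already produced one record per *.idx file, and the field names ("url", "ip", "last_modified"), the sample values, and the "JAVA_IDX" source tag are all hypothetical.  The only thing taken as given is the five-field, pipe-delimited TLN layout (time|source|host|user|description), with the time expressed as a Unix epoch value.

```python
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Hypothetical records, standing in for the output of an *.idx parser.
# Field names and values are made up for illustration only.
records = [
    {"url": "http://example.com/applet/a.jar", "ip": "192.0.2.10",
     "last_modified": datetime(2013, 1, 10, 12, 0, 0, tzinfo=timezone.utc)},
    {"url": "http://example.com/applet/b.jar", "ip": "192.0.2.10",
     "last_modified": datetime(2013, 1, 10, 12, 0, 5, tzinfo=timezone.utc)},
    {"url": "http://example.net/x.jar", "ip": "198.51.100.7",
     "last_modified": datetime(2013, 1, 11, 8, 30, 0, tzinfo=timezone.utc)},
]


def host_frequency(recs):
    """Count how often each URL host name appears across the records."""
    return Counter(urlsplit(r["url"]).hostname for r in recs)


def ip_frequency(recs):
    """Count how often each IP address appears across the records."""
    return Counter(r["ip"] for r in recs)


def to_tln(rec, host="", user="", source="JAVA_IDX"):
    """Format one record as a five-field TLN line.

    The source tag is an assumed label, not an established convention.
    """
    epoch = int(rec["last_modified"].timestamp())
    # Escape any pipe in the description so it cannot break the delimiter.
    desc = "IDX last modified - {} ({})".format(rec["url"], rec["ip"])
    desc = desc.replace("|", "%7C")
    return "|".join([str(epoch), source, host, user, desc])


if __name__ == "__main__":
    for name, count in host_frequency(records).most_common():
        print(count, name)
    for rec in records:
        print(to_tln(rec, host="HOST1"))
```

Keeping the record structure separate from the output functions is a design choice; it means the same parsed data can feed a frequency count, a keyword search, or a timeline without touching the parsing code.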

Regardless of what I do with the code itself, I now have automatic access to the data, and I have references included in the script itself; as such, the headers of the script serve as documentation, as well as a reminder of what's being examined, and why.  This bridges the gap between having something I need to check listed in a spreadsheet, and actually checking or analyzing those artifacts.

Resources
ForensicsWiki Page: Java
Sploited blog post: Java Forensics Using TLN Timelines
jIIr: Almost Cooked Up Some Java, Finding Initial Infection Vector

Interested in Windows DF training?  Check it out: Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar.


Anonymous said...

Great post Harlan.

In terms of finding potential pivot points, the malware domain list parser should work if you pipe the output of your idx parser into it and search for any idx files containing URLs similar to those on the MDL.

Other points of interest I've found are looking for things such as:

X-Powered-By: PleskLin

from the output of your urlcache tool (or something similar). This was due to the Plesk vulnerability that resulted in huge numbers of compromised websites.

So within your timeline, if you saw a Plesk-based web server followed by Java cache files/jar files, it could be a potential pivot point to look at more closely.

H. Carvey said...

...points of interest i've found are looking for things such as...

Found where?

I ask, because I'm not seeing any server response lines in the *.idx files I've seen so far that are "X-" comments.


Anonymous said...

Sorry Harlan, I'll clarify.

What I meant was using the output from a tool such as yours, which provides the X-Powered-By field.

If an analyst were to search the urlcache output for something like PleskLin, and the next series of events in the timeline were IDX/jar file creation, then it may justify a deeper look.

This is not always the case, but I have had some quick pivot-point wins in this area.

Corey Harrell said...

A script to automate examining IDX files is the way to go. I have been manually examining them and this method has fit my needs. However, being able to automate the process not only reduces errors but will save time. Nice posts you put together about IDX files.

H. Carvey said...


Thanks. I can see how this would be useful, but I would expand it to include the use of browsers other than IE.

@Corey, thanks.