My last post addressed parsing Java *.idx files, and since I released that post, a couple of resources related to the post have been updated. In particular, Joachim Metz has updated the ForensicsWiki page he started to include more information about the format of the *.idx files, with some information specific to what is thought to be the header of the files.
Also, Corey Harrell was kind enough to share the *.idx file from this blog post with me (click here to see the graphic of what the file "looks like" in Corey's post), and I ran it through the parser to see what I could find:
File: d:\test\781da39f-6b6c0267.idx
Times from header:
------------------------------
time_0: Sun Sep 12 15:15:32 2010 UTC
time_2: Sun Sep 12 22:38:40 2010 UTC
URL: http://xhaito.com/work/builds/exp_files/rox.jar
IP: 91.213.217.31
Server Response:
------------------------------
HTTP/1.1 200 OK
content-length: 14226
last-modified: Sun, 12 Sep 2010 15:15:32 GMT
content-type: text/plain
date: Sun, 12 Sep 2010 22:38:35 GMT
server: Apache/2
deploy-request-content-type: application/x-java-archive
Ah, pretty interesting stuff. The "Times from header" section consists, at this point, of data from those offsets within the header that Joachim has identified as possibly being time stamps; in the code, I have it display only those times that are not zero. What we don't have at the moment is enough information about the structure of the header to identify what the time stamps refer to.
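To make the discussion concrete, here's a minimal sketch of how those header times could be pulled out, assuming (since the header layout is still being documented) that they are stored as 64-bit big-endian Java time stamps, i.e. milliseconds since the Unix epoch. The offsets used below are purely illustrative placeholders, not the actual documented layout:

```python
import datetime
import struct

def java_millis_to_utc(ms):
    # Java time stamps are milliseconds since the Unix epoch
    return datetime.datetime.fromtimestamp(ms / 1000.0, tz=datetime.timezone.utc)

def read_header_times(path, offsets=(8, 16, 24)):
    # The offsets are illustrative only; the actual header structure is
    # still being worked out (see the ForensicsWiki page).
    times = {}
    with open(path, "rb") as f:
        data = f.read(64)
    for i, off in enumerate(offsets):
        (val,) = struct.unpack_from(">q", data, off)  # 64-bit big-endian signed
        if val:  # display only non-zero values, as the parser does
            times["time_%d" % i] = java_millis_to_utc(val)
    return times
```

Once the header structure is nailed down, only the offset list needs to change.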
However, this code can be used to parse *.idx files and help determine what the times refer to. For example, in the output above, "time_0" is equivalent to the "last-modified" field in the server response, and "time_2" falls a few seconds after the "date" field. Incorporating this information into a timeline may prove useful while research continues into what the time stamps represent. What is very useful is that the *.idx files are associated with a specific user profile, so for testing purposes, an analyst should be able to incorporate browser history and *.idx info into a timeline and perhaps "see" what the time stamps refer to...if the analyst controls the entire test environment, to include the web server, even more information may be developed.
Speaking of timelines, Sploited commented on my previous post about developing timeline analysis pivot points from other resources, and mentioned a script for parsing IE history files (urlcache.pl). I would suggest that incorporating a user's web history, as well as running searches against the Malware Domain List, could be extremely helpful in identifying initial infection vectors and entry points.
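As a rough illustration of that kind of check, the sketch below flags URLs whose host matches a list of known-bad domains; the domain set passed in is a stand-in for data an analyst would actually pull from a source such as the Malware Domain List:

```python
from urllib.parse import urlparse

def flag_suspect_urls(urls, bad_domains):
    # bad_domains is a set of known-bad domain names (stand-in data here);
    # a URL is flagged if its host is one of them or a subdomain of one.
    hits = []
    for u in urls:
        host = (urlparse(u).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in bad_domains):
            hits.append(u)
    return hits
```

Running the URLs extracted from *.idx files and browser history through a check like this gives the analyst an immediate list of pivot points to examine in the timeline.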
Interested in Windows DF training? Check it out: Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar. Be sure to check the WindowsIR Training Page for updates.
The Windows Incident Response Blog is dedicated to the myriad information surrounding and inherent to the topics of IR and digital analysis of Windows systems. This blog provides information in support of my books; "Windows Forensic Analysis" (1st thru 4th editions), "Windows Registry Forensics", as well as the book I co-authored with Cory Altheide, "Digital Forensics with Open Source Tools".
Monday, January 21, 2013
Saturday, January 19, 2013
BinMode: Parsing Java *.idx files
One of the Windows artifacts that I talk about in my training courses is application log files, and I tend to gloss over this topic somewhat, simply because there are so many different kinds of log files produced by applications. Some applications, AV products in particular, will write their logs to the Application Event Log as well as to a text file. I find this very useful because the Application Event Log will "roll over" as it gathers more events, while the text logs will most often continue to be written to by the application. I talk about these logs in general because it's important for analysts to be aware of them, but I don't spend a great deal of time discussing them because we could be there all week talking about them.
With the recent (Jan, 2013) issues regarding a Java 0-day vulnerability, my interest in artifacts of compromise was piqued yet again when I found that someone had released some Python code for parsing Java deployment cache *.idx files. I located the *.idx files on my own system, opened a couple of them in a hex editor, and began conducting pattern analysis to see if I could identify a repeatable structure. I found enough information to create a pretty decent parser for the *.idx files to which I have access.
Okay, so the big question is...so what? Who cares? Well, Corey Harrell had an excellent post to his blog regarding Finding (the) Initial Infection Vector, which I think is something that folks don't do often enough. Using timeline analysis, Corey identified artifacts that required closer examination; using the right tools and techniques, this information can also be included directly into the timeline (see the Sploited blog post listed in the Resources section below) to provide more context to the timeline activity.
The testing I've been able to do with the code I wrote has been somewhat limited, as I haven't had a system that might be infected come across my desk in a bit, and I don't have access to an *.idx file like what Corey illustrated in his blog post (notice that it includes "pragma" and "cache control" statements). However, what I really like about the code is that I have access to the data itself, and I can modify the code to meet my analysis needs, much the way I did with the Prefetch file analysis code that I wrote. For example, I can perform frequency analysis of IP addresses or URLs, server types, etc. I can perform searches for various specific data elements, or simply run the output of the tool through the find command, just to see if something specific exists. Or, I can have the code output information in TLN format for inclusion in a timeline.
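For instance, emitting events in the five-field TLN format (time|source|host|user|description) takes only a few lines; the source tag and field values below are illustrative:

```python
import calendar
import datetime

def to_tln(dt, source, host="", user="", desc=""):
    # TLN fields: Unix epoch time | source | host | user | description
    epoch = calendar.timegm(dt.timetuple())  # dt is treated as UTC
    return "%d|%s|%s|%s|%s" % (epoch, source, host, user, desc)
```

Lines in this form can then be dropped straight into a TLN-based timeline alongside events from the user's browser history and other sources.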
Regardless of what I do with the code itself, I now have automatic access to the data, and I have references included in the script itself; as such, the headers of the script serve as documentation, as well as a reminder of what's being examined, and why. This bridges the gap between having something I need to check listed in a spreadsheet and actually checking or analyzing those artifacts.
Resources
ForensicsWiki Page: Java
Sploited blog post: Java Forensics Using TLN Timelines
jIIr: Almost Cooked Up Some Java, Finding Initial Infection Vector
Interested in Windows DF training? Check it out: Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar.