I've blogged about metadata before (here, and here), but it's been a while, and this is a subject worth revisiting every so often. Metadata has long been an issue for users, and a valuable resource for investigators and forensic analysts. There are a number of file types (images, documents) that allow for embedded metadata...this doesn't mean that it's always populated, but I think you'd be surprised how much information is, in fact, leaked via embedded metadata. MS Office documents, PDFs, and JPG images are all known to be capable of carrying a range of embedded metadata.
One example of embedded metadata coming back to bite someone that I've referenced in my books is the Blair issue discussed by the ComputerBytesMan. This particular issue dated back to 2003, and it's clear that an older version of MS Word was used at the time. This version of MS Word used the OLE "structured storage" format; more recent versions of Office documents no longer use this format, but it is used in Jump Lists, Sticky Notes, and IE session restore files.
Metadata has also brought down others. In the spring of 2012, metadata embedded in an image taken with a smartphone was used to track down the hacker "w0rmer".
One of the best tools I've found for collecting metadata from a wide range of file types (images, documents) is Phil Harvey's EXIFTool. This is a command line tool (available for Windows, Mac OS X, and Linux), which means it's easy to script; you can write simple batch files to extract metadata from all files in a folder, or all files of a particular type (JPG, DOC/DOCX, etc.) in a directory structure. If you prefer GUI tools, check out the EXIFToolGUI...simply remove the "(-k)" from the EXIFTool file name and put the GUI application in the same directory, and you're ready to go.
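If you want to see where that metadata actually lives, here's a quick illustration...a minimal Python sketch (not EXIFTool itself, just a demonstration of the concept) that checks a JPG's segment list for the APP1 "Exif" block, which is where camera and smartphone metadata is stored. The synthetic byte strings at the bottom are purely illustrative.

```python
import struct

def has_exif(jpeg_bytes):
    """Scan a JPEG's segment list for an APP1 'Exif' block,
    where camera/phone metadata (including GPS) is stored."""
    if jpeg_bytes[:2] != b"\xff\xd8":               # SOI marker
        return False
    i = 2
    while i + 4 <= len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:
            break
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:                          # SOS: image data follows
            break
        length = struct.unpack(">H", jpeg_bytes[i + 2:i + 4])[0]
        if marker == 0xE1 and jpeg_bytes[i + 4:i + 10] == b"Exif\x00\x00":
            return True
        i += 2 + length
    return False

# Synthetic JPEGs for illustration only:
exif_app1 = b"Exif\x00\x00" + b"\x00" * 4
with_exif = (b"\xff\xd8" + b"\xff\xe1"
             + struct.pack(">H", 2 + len(exif_app1)) + exif_app1 + b"\xff\xd9")
without = b"\xff\xd8\xff\xd9"
print(has_exif(with_exif), has_exif(without))      # True False
```

Real tools like EXIFTool then go on to parse the TIFF structure inside that segment; this just shows where to look.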
For more recent versions of MS Office documents, you might consider using read_open_xml_win.pl.
Removing embedded metadata can be pretty easy without employing any special tools. For example, you can remove embedded metadata from JPG images (the format used on digital cameras and smartphones) by using MS Paint to convert the image to TIFF format, then back to JPG.
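Under the hood, the convert-and-convert-back trick works because the re-encoded file simply never gets the metadata segments written back in. Here's a Python sketch of the direct approach...dropping the APP1-APP15 segments from a JPG byte stream. It's illustrative only (the synthetic input below isn't a real image), so test against real files before trusting it.

```python
import struct

def strip_app_segments(jpeg_bytes):
    """Rebuild a JPEG byte stream with the APP1-APP15 segments
    (where EXIF/XMP metadata lives) removed. Everything from the
    SOS marker onward is copied through untouched."""
    out = bytearray(jpeg_bytes[:2])                 # keep SOI
    i = 2
    while i + 4 <= len(jpeg_bytes) and jpeg_bytes[i] == 0xFF:
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:                          # SOS: stop parsing segments
            break
        length = struct.unpack(">H", jpeg_bytes[i + 2:i + 4])[0]
        seg = jpeg_bytes[i:i + 2 + length]
        if not (0xE1 <= marker <= 0xEF):            # drop APP1..APP15
            out += seg
        i += 2 + length
    out += jpeg_bytes[i:]                           # image data and EOI
    return bytes(out)

# Synthetic JPEG with an APP1/Exif segment, for illustration:
jpeg = (b"\xff\xd8" + b"\xff\xe1" + struct.pack(">H", 12)
        + b"Exif\x00\x00" + b"\x00" * 4 + b"\xff\xd9")
stripped = strip_app_segments(jpeg)
print(b"Exif" in stripped)                          # False
```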
Metadata can be a very valuable resource for investigators. Computer systems may include a number of images or documents from which metadata can be extracted. When examining systems, analysts should be sure to look for smartphone backup files, as images found in these backups may have considerable intelligence value.
Finding images or documents to check for embedded metadata is easy. Start with your own hard drive or file server. Alternatively, you can run Google searches (e.g., "site:domain.com filetype:doc") and find a great many documents available online.
Speaking of metadata, one file type that contains some interesting metadata is XP/2003 .job files. About three years ago, I had written a script to parse these files, and was recently asked to provide a copy of this script. I don't usually do that, as most often I don't hear back as to how well the script ran, if at all...but I decided to make an exception and provide the script this time. It turns out that the script had an issue, and Corey Harrell was nice enough to provide a couple of .job files for testing. As it turns out, when I wrote the script, I hadn't had any .job files that had never been run, and the script was failing because I hadn't dealt with the case where the time fields were all zero. Thanks to Corey, I was able to quickly get that fixed and provide a working copy.
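Whatever the exact layout of the time fields in a .job file, the lesson generalizes: a timestamp of all zeros usually means "never", not a bogus date in 1601. Here's a Python sketch (my script was Perl, but the logic is the same) of FILETIME conversion with that case handled:

```python
from datetime import datetime, timedelta, timezone

EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_dt(ft):
    """Convert a 64-bit FILETIME (100-ns intervals since 1601-01-01
    UTC) to a datetime. A value of zero means 'never'...e.g., a
    scheduled task that has not yet been run...so return None instead
    of a meaningless 1601 timestamp."""
    if ft == 0:
        return None
    return EPOCH_1601 + timedelta(microseconds=ft // 10)

print(filetime_to_dt(0))                        # None: task never run
print(filetime_to_dt(116444736000000000))       # 1970-01-01 00:00:00+00:00
```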
The Windows Incident Response Blog is dedicated to the myriad information surrounding and inherent to the topics of IR and digital analysis of Windows systems. This blog provides information in support of my books; "Windows Forensic Analysis" (1st thru 4th editions), "Windows Registry Forensics", as well as the book I co-authored with Cory Altheide, "Digital Forensics with Open Source Tools".
Monday, April 16, 2012
Friday, November 13, 2009
Some Analysis Coolness

The most recent issue of Hakin9 is available now...my second article on timeline creation and analysis is in this one; it's a hands-on walk-through of using the tools I put together, and use on a regular basis. You know...eat your own dogfood, as it were.
What do I like so much about this analysis method? Well, it's fast, it's relatively easy, and it lets an analyst (i.e., me) see a bunch of stuff all together in one place. It's pretty cool to see things like a remote login, creation of the PSExecSvc service, see that service start, then see a bunch of other files being created...to include the data files created by the malware.
Another thing I like about timeline creation and analysis is this...let's say you've got an analyst (or a team) on-site working an engagement, and they're stuck with something; determining the avenue of infection or compromise...whatever. Now let's assume that it's an engagement involving sensitive data, and they're trying to scope everything AND do collections. You can have those analysts dump the file system metadata, extract selected files from the system or image, zip all of that up and send it to someone for analysis. Not only do you run your analysis in parallel...you're not sending that sensitive data out! That's right, folks...you can increase your response efficiency and effectiveness using off-site staff, without further exposing sensitive data!
The version of the tools used in the article are available for download from the Win4n6 Yahoo group. The tools are all separate, standalone tools for right now because, to be honest, I don't always use them all together. Sometimes, it's good to see activity in a different format...in others, it's good to see a limited subset of activity (say, just your Event Log records) all at once, before moving on. By having separate tools, the analyst can intelligently select what they want added to the timeline in order to build it out.
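The merge itself is trivial once each tool emits events in a common form...here's a Python sketch of the idea. The tuple layout and event text are made up for illustration; they're not the actual output format of my tools.

```python
# Each standalone tool can emit (timestamp, source, description)
# tuples; building the timeline is then just a merge-and-sort.
# The event data below is illustrative only.
fs_events = [
    (1258071000, "FILE", r"MACB C:\WINDOWS\PSExecSvc.exe"),
]
evt_events = [
    (1258070940, "EVT", "Security/528 remote logon (type 10)"),
    (1258071010, "EVT", "System/7035 PSEXESVC service start"),
]

def build_timeline(*sources):
    """Merge event lists from any subset of tools and sort by time,
    so the analyst can add sources to the timeline incrementally."""
    events = [e for src in sources for e in src]
    return sorted(events)

for ts, src, desc in build_timeline(fs_events, evt_events):
    print(ts, src, desc)
```

Because each source is a separate argument, you can build a timeline from just the Event Log records first, then re-run with the file system metadata added.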
File and Document Metadata
When I used to present at LE-oriented conferences more often, I'd talk about a nifty little tool out there called MergeStreams. This is a great little tool that essentially allows you to "hide" an Excel spreadsheet inside a Word document. This only applies to pre-Office 2007 document formats, however. I'll say that again...it only works on versions of MS Office that use the OLE compound document format. What I'd show is someone pasting pictures (i.e., illicit images) into a Word document and then merging those with an Excel spreadsheet. Name the file "myspreadsheet.xls" and you'd see the Excel spreadsheet. Rename the file, giving it a .doc extension, and you'd see the Word document.
While we're talking about Office document metadata, now is a good time to revisit some tools for extracting metadata; for pre-Office 2007 documents that use the OLE structured storage format, I've used the tools from my book, oledmp.pl and wmd.pl quite effectively, and there's OffVis from MS; for Office 2007 documents, try cat_open_xml.pl.
Speaking of files, have you seen this new plugin from Bit9 called FileAdvisor? It's apparently a shell plugin for Windows, so if you find a suspicious file on your system, you can right-click it, and hash it and submit it for analysis. To view results, you'll need to register at the site with your name, email address, and a password. I don't necessarily see this on every user's desktop, but I do see responders and analysts possibly having it installed on a system somewhere.
Memory Parsing/Analysis
Jeff Bryner has put together a Python script called pdfbook for extracting Facebook artifacts from a memory dump. For Windows systems, the script parses memory dumps from pd...I wonder if you could do the same thing using a full memory dump, extracting just the memory used by the process? Jeff has also released yim2text, a Python script for extracting Yahoo chat artifacts. Very cool.
Saturday, January 05, 2008
Metadata, again...
I've blogged about metadata for various file types before, and the other day I saw a question regarding metadata in MS Works documents. That was pretty interesting, so I fired up my 'leet Google h4x0R skillz and entered in metadata + "MS Works" as my search terms, and I ended up finding something called Meta-Extractor from the folks at the National Library of New Zealand. This tool appears to be Java-based, is about a 10.7MB download, and appears to extract metadata from a variety of file formats...to include MS Works! That's interesting...I didn't even know that MS Works docs had metadata! My first real intro into Word metadata involved the Blair doc...and I'm aware that other Office OLE file formats have metadata, as well.
Another such tool is Metagoofil, from DarkNet. I haven't tried this one...but then I haven't had a great deal of need for things like metadata. When I have, I've written my own tools.
One of the more interesting ways to generate some cool metadata is to use MergeStreams to merge an Excel spreadsheet into a Word doc. I used to present on this at LE conferences all the time, along with things like NTFS alternate data streams, and hiding data in the Registry...but it looks like this stuff is just kewl nerd stuff and nothing more...
Friday, May 25, 2007
Prefetch Analysis
I've seen a couple of posts recently on other blogs (here's one from Mark McKinnon) pertaining to the Windows XP Prefetch capability, and I thought I'd throw out some interesting stuff on analysis that I've done with regards to the Prefetch folder.
First off, XP's Prefetch capability is meant to enhance the user eXPerience by helping frequently used applications load faster. Microsoft has a nice writeup on that, and portions of and references to that writeup are included in my book. XP has application prefetching turned on by default, and while Windows 2003 has the capability, only boot prefetching is turned on by default. So, XP systems are rich in data that can help you assess and resolve an incident investigation.
First off, XP can maintain up to 128 Prefetch files...these are files within the Windows\Prefetch directory that end in ".pf". These files contain a bunch of prefetched code, and the second half of each file generally contains a bunch of Unicode strings that point to various modules that were accessed when the application was launched. Also, each Prefetch file contains the run count (the number of times the application has been run) as well as a FILETIME object representing the last time the application was launched, within the file itself (i.e., metadata).
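For the curious, here's a Python sketch of pulling that metadata out of an XP/2003-format .pf file. The 0x78 (last run FILETIME) and 0x90 (run count) offsets are the commonly documented ones for the XP format (version 17); treat them as assumptions and verify against your own samples. The buffer at the bottom is synthetic, just to show the calls.

```python
import struct

def parse_xp_pf(data):
    """Pull the embedded metadata from an XP/2003 Prefetch file:
    last run time (64-bit FILETIME) and run count (DWORD). Offsets
    are those commonly documented for format version 17."""
    if data[4:8] != b"SCCA":
        raise ValueError("not a Prefetch file")
    last_run = struct.unpack_from("<Q", data, 0x78)[0]
    run_count = struct.unpack_from("<I", data, 0x90)[0]
    return last_run, run_count

# Synthetic .pf buffer for illustration only:
buf = bytearray(0x9C)
buf[4:8] = b"SCCA"
struct.pack_into("<Q", buf, 0x78, 116444736000000000)
struct.pack_into("<I", buf, 0x90, 5)
print(parse_xp_pf(bytes(buf)))          # (116444736000000000, 5)
```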
Okay, so how can this information be used during forensics analysis? Remember Harlan's Corollary to the First Law of Computer Forensics? If you acquire an image from a system...say, a user's laptop...and you're told that the user had this laptop for a year or so, and you don't find any .pf files...what does that tell you?
Mark talked about U3 Smart Technology, and some of the Prefetch artifacts left behind by the use of tools like this. Excellent observations, but keep in mind that the Prefetch files aren't specific to a user...they're system-wide. On a multi-user system, you may have to look other places to determine which user launched the application in the first place. Ovie does a great job talking about the UserAssist keys and how they can help you narrow down who did what on the system.
I've looked to the Prefetch folder for assistance with an investigation. In one instance, there was a suspicion that a user had deleted some files and removed software from the system, and attempted to cover his tracks. While it was clear that the user had done some of these things (i.e., removed software, emptied the Recycle Bin, etc.), it was also clear that he hadn't gone through the trouble of running one of those tools that delete everything; most of the artifacts I would look for were still in place (can you guess from my book what those artifacts might have been?). I found a reference to defrag.exe in the Prefetch folder, but nothing to indicate that the user had run the defrag tool (XP's built-in, automatic anti-forensics capabilities are a subject for another post). It turns out that as part of the Prefetch capability, XP runs a limited defrag every 3 days...the Prefetch capability prefetches XP's own prefetch functionality. ;-)
In another instance, I wanted to see if a user had burned anything to CD from the system. I found the installed software (Roxio Sonic), but found no references in any of the user artifacts to actually launching the software. I did, however, find an IMAPI.EXE-XXXXXX.pf file in the Prefetch directory. Interestingly enough, the Unicode strings within the file included a reference to iTunes, which, it appeared, the user used a lot. It turns out that iTunes likes to know where your CD or DVD burner is...I confirmed this on another system on which I knew the user used iTunes, and had not burned any CDs.
So, as a wrap up, some things to look for when you're digging into the Prefetch directory:
- How many .pf files (between 0 and 128) are in the Prefetch directory?
- For each .pf file, get the last run time and the run count. The last run time is a FILETIME object, meaning that it is maintained in UTC format...you may need to adjust using information from the TimeZoneInformation Registry key (i.e., ActiveTimeBias).
- Correlate .pf files and the last run times to UserAssist key entries to tie activity to a specific user, as well as the Event Logs.
- Run strings to get the Unicode strings from the file and see what other modules were accessed when the application was launched.
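On the ActiveTimeBias point, remember that the value is the number of minutes you add to local time to get UTC, so going the other way you subtract. A quick Python illustration (the example value of 300 corresponds to UTC-5):

```python
from datetime import datetime, timedelta

def utc_to_local(utc_dt, active_time_bias_minutes):
    """Apply the bias from the TimeZoneInformation key's
    ActiveTimeBias value. Since UTC = local + bias, the local
    time is UTC minus the bias. ActiveTimeBias is stored in the
    Registry as a signed DWORD, in minutes."""
    return utc_dt - timedelta(minutes=active_time_bias_minutes)

utc = datetime(2007, 5, 25, 18, 0)
print(utc_to_local(utc, 300))           # 2007-05-25 13:00:00
```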
Finally, there is a ProDiscover ProScript on the DVD that ships with my book (in the ch5 directory) that will locate the Prefetch folder (via the Registry) and automatically parse the .pf files, listing the last run time and run count for each. I have since updated that ProScript to display its output in time-sorted order, showing the most recent time first. I've found that this makes analysis a bit easier.
Friday, May 11, 2007
PPT Metadata
I received an email recently asking if I had any tools to extract metadata from PowerPoint presentations. Chapter 5 of my book includes the oledmp.pl Perl script, which grabs OLE information from Office files; this includes Word documents, Excel spreadsheets, and PowerPoint presentations. I've run some tests using this script, and pulled out things like revision number, created and last saved dates, author name, etc.
Pretty interesting stuff. There may be more...maybe based on interest and time, someone can look into this...
Here's an example of the oledmp.pl output from a PPT file (some of the info is masked to protect privacy):
C:\perl>oledmp.pl file.ppt
ListStreams
Stream : ♣DocumentSummaryInformation
Stream : Current User
Stream : ♣SummaryInformation
Stream : Pictures
Stream : PowerPoint Document
Trash Bin Size
BigBlocks 0
SystemSpace 876
SmallBlocks 0
FileEndSpace 1558
Summary Information
subject
lastauth Mary
lastprinted
appname Microsoft PowerPoint
created 09.06.2002, 19:51:48
lastsaved 14.09.2004, 19:08:39
revnum 32
Title Title
authress John Doe
Pictures
Current User
♣SummaryInformation
PowerPoint Document
♣DocumentSummaryInformation
So what does all this mean? Well, we see the various streams that are embedded in the document, and an example of what is extracted from the SummaryInformation stream. Some of this information can be seen by right-clicking on the file in Windows Explorer, choosing Properties, and then choosing the Summary Tab, and then clicking the Advanced button.
Simple modifications to the oledmp.pl script will let you extract the stream tables, as well, showing even more available information.
Monday, September 25, 2006
MetaData and eDiscovery
In yesterday's CyberSpeak podcast, mention was made of issues with Office document metadata and eDiscovery. Several commercially available tools were mentioned, and I wanted to mention that there are freeware tools available.
First off, let me say that the tool I'll mention is one of my own...I'll be up front about that. It's a Perl module that I posted on CPAN, and it ships with a sample script called "testwd.pl". On Windows, if you're using ActiveState's ActivePerl, installation of the module is simple. Download the archive and extract the MSWord.pm file to \perl\site\lib\File. To install the modules it depends on, use the following commands:
ppm install OLE-Storage
ppm install Startup
ppm install Unicode-Map
The sample script pulls out the data in a crude format...the original script that I based this module on (wmd.pl) did a better job of extracting the information in a pretty format. As an example, I'll use the Blair document:
C:\Perl>wmd.pl d:\cd\blair.doc
--------------------
Statistics
--------------------
File = d:\cd\blair.doc
Size = 65024 bytes
Magic = 0xa5ec (Word 8.0)
Version = 193
LangID = English (US)
Document was created on Windows.
Magic Created : MS Word 97
Magic Revised : MS Word 97
--------------------
Last Author(s) Info
--------------------
1 : cic22 : C:\DOCUME~1\phamill\LOCALS~1\Temp\AutoRecovery save of Iraq - security.asd
2 : cic22 : C:\DOCUME~1\phamill\LOCALS~1\Temp\AutoRecovery save of Iraq - security.asd
3 : cic22 : C:\DOCUME~1\phamill\LOCALS~1\Temp\AutoRecovery save of Iraq - security.asd
4 : JPratt : C:\TEMP\Iraq - security.doc
5 : JPratt : A:\Iraq - security.doc
6 : ablackshaw : C:\ABlackshaw\Iraq - security.doc
7 : ablackshaw : C:\ABlackshaw\A;Iraq - security.doc
8 : ablackshaw : A:\Iraq - security.doc
9 : MKhan : C:\TEMP\Iraq - security.doc
10 : MKhan : C:\WINNT\Profiles\mkhan\Desktop\Iraq.doc
--------------------
Summary Information
--------------------
Title : Iraq- ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION AND INTIMIDATION
Subject :
Authress : default
LastAuth : MKhan
RevNum : 4
AppName : Microsoft Word 8.0
Created : 03.02.2003, 09:31:00
Last Saved : 03.02.2003, 11:18:00
Last Printed : 30.01.2003, 21:33:00
--------------------
Document Summary Information
--------------------
Organization : default
Notice the bolded line above...this is extracted from the binary data of the file.
The module extracts the information, it just needs to be prettied up a bit. Another benefit of the module is that it extracts additional information from the OLE contents of the file. First off, it extracts information about the OLE "trash bins", where useful data could be hidden:
Trash Bin Size
BigBlocks 0
SystemSpace 940
SmallBlocks 0
FileEndSpace 1450
Also, the module collects information about the OLE streams within the file:
Stream : ☺CompObj
Stream : WordDocument
Stream : ♣DocumentSummaryInformation
Stream : ObjectPool
Stream : 1Table
Stream : ♣SummaryInformation
At this point, you're probably thinking, "yeah...so?" Well, there's a freeware utility available called MergeStreams that allows you to merge an Excel spreadsheet into a Word document. The resulting file is slightly smaller than the sum of both file sizes, and the file extension is ".doc"...so if you double-click the file, it will open in Word and all of the Word data will be visible. However, if you change the file extension to ".xls" and double-click the file, it will open in Excel, with none of the Word data/information visible. It's still there...it's just not being parsed by Excel.
Why is this important? Well, if I wanted to smuggle information out of an organization, I might put the information in a spreadsheet for easy access and searching, and then merge it into an innocuous Word document and copy it to my thumb drive (or laptop hard drive). If, on the off chance, anyone were to search me or my devices, they'd see the Word document. If they double-clicked it, they'd see the innocuous, boring content I'd put there...and wave me on my merry way. The same could be true for email attachments.
The example that I use that gets the LEOs sitting up in their seats is to take three illicit images and paste them into a Word document. Merge the document with an Excel spreadsheet that may be widely circulated throughout the company...financial forecasts, etc. Only those folks who know that the images are there will know to change the file extension to ".doc" so that they can view the images.
Interesting stuff. Like I said before, if you have a situation like what was mentioned in the podcast (i.e., you have to search a lot of files for specific metadata, such as the last author, or one of the last 10 authors), then something like the Perl module provides the necessary framework; combine it with any number of ways to enumerate the files in question (read the contents of a directory, read the file list from a file, etc.), Perl's regular expressions, and you can output to any format you like (HTML, XML, spreadsheet, database, text file, etc.).
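As a sketch of that framework idea, here's a Python version of the enumeration piece. The extractor and the match test are caller-supplied placeholders...stand-ins for whatever parser (a wmd.pl-style OLE reader, an EXIFTool wrapper, etc.) you plug in; the names are illustrative, not a real API.

```python
import os

def find_by_metadata(root, extensions, extract, predicate):
    """Walk a directory tree, run a caller-supplied metadata
    extractor over files with matching extensions, and yield the
    hits. 'extract' maps a path to a metadata dict; 'predicate'
    decides whether that metadata is a match (e.g., a specific
    last author)."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in extensions:
                path = os.path.join(dirpath, name)
                meta = extract(path)
                if predicate(meta):
                    yield path, meta
```

From there, Perl- or Python-style regular expressions and your output format of choice (HTML, spreadsheet, database, etc.) are just more pluggable pieces.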