Tuesday, January 18, 2005

MetaData

One topic that isn't covered in any great detail is metadata, particularly in files and documents on Windows systems.

The term metadata means "data about data". Searching the web, you'll find that this term has a variety of uses, depending upon the context. Within the context of this blog entry, I'm going to use "metadata" to refer to data associated with a file, either accompanying it or contained within it. This can include file MAC times, NTFS alternate data streams (ADSs), file attributes, etc. Metadata can also include information or data contained in the file, as a part of the file or document itself.
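As a quick illustration of the "accompanying" kind of metadata, here is a minimal Python sketch that reads a file's MAC times. The function name mac_times is my own, and note the assumption flagged in the comments: st_ctime means creation time on Windows but metadata-change time on Unix-like systems.

```python
import os
from datetime import datetime, timezone


def mac_times(path):
    """Return the modified/accessed/created-or-changed times of a path as UTC datetimes."""
    st = os.stat(path)

    def to_utc(seconds):
        # os.stat reports seconds since the Unix epoch
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    return {
        "modified": to_utc(st.st_mtime),
        "accessed": to_utc(st.st_atime),
        # Windows: creation time. Unix-like systems: last metadata change.
        "created_or_changed": to_utc(st.st_ctime),
    }


if __name__ == "__main__":
    for label, stamp in mac_times(os.curdir).items():
        print(f"{label:>20}: {stamp.isoformat()}")
```

Keep in mind that simply reading a file can update its last-accessed time on some systems, so collect these values before opening the file.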

You can use Frank Heyne's LADS tool to retrieve information about ADSs (see my previous post on data hiding for tools to retrieve the information stored within ADSs). LADS will find all ADSs, so if you've right-clicked on a file and filled out the Summary information in the Properties dialog, or you've downloaded files via IE on XP SP2, you're guaranteed to see some ADSs. It's probably the others that you need to worry about, though.
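To make the IE-on-XP-SP2 case concrete: downloaded files get an ADS named Zone.Identifier, a small INI-style text stream with a [ZoneTransfer] section and a ZoneId value (3 is the Internet zone). The Python sketch below is only an illustration, not how LADS works; the function names are my own, and reading the stream with the "file:stream" path syntax assumes a Windows system with an NTFS volume.

```python
import configparser

# Standard Internet Explorer security zone numbers
ZONE_NAMES = {
    0: "Local Machine",
    1: "Local Intranet",
    2: "Trusted Sites",
    3: "Internet",
    4: "Restricted Sites",
}


def parse_zone_identifier(text):
    """Return the ZoneId from the text of a Zone.Identifier stream, or None."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
        if parser.has_option("ZoneTransfer", "ZoneId"):
            return parser.getint("ZoneTransfer", "ZoneId")
    except (configparser.Error, ValueError):
        pass  # malformed stream contents
    return None


def read_zone_identifier(path):
    """Read the Zone.Identifier ADS of a file (assumes Windows + NTFS)."""
    try:
        with open(path + ":Zone.Identifier", "r") as stream:
            return parse_zone_identifier(stream.read())
    except OSError:
        return None  # no such stream, or not an NTFS volume
```

Calling read_zone_identifier on a file that has no such stream (or on a non-NTFS volume) simply returns None, so it is safe to run across a whole directory of downloads.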

Windows Media Files also contain metadata, as seen in the Extracting MetaData from Windows Media Files blog entry.

You can retrieve metadata from image files, particularly those created on digital cameras, from here. If you're interested in a Perl-based approach, check out the Perl EXIFTool script. It's unclear to me exactly how detailed this information is, and I don't think that it can definitively tie a particular image to a specific camera. However, it is interesting to see and, where possible, to add comments to your files.
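For a feel of what is actually stored, here is a minimal Python sketch that pulls just the camera Make and Model strings out of an EXIF block. It is not a substitute for EXIFTool: it reads only the first image file directory (IFD0), the function names are my own, and locating the block by searching for the "Exif" signature is a simplification rather than proper JPEG segment parsing.

```python
import struct

MAKE_TAG, MODEL_TAG = 0x010F, 0x0110  # TIFF/EXIF tag numbers
ASCII_TYPE = 2                        # TIFF field type for ASCII strings


def exif_block(jpeg):
    """Return the TIFF-structured bytes following the 'Exif' signature, or None."""
    pos = jpeg.find(b"Exif\x00\x00")
    return None if pos < 0 else jpeg[pos + 6:]


def read_make_model(tiff):
    """Return a dict with any Make/Model strings found in IFD0."""
    if tiff[:2] == b"II":
        endian = "<"   # Intel byte order (little-endian)
    elif tiff[:2] == b"MM":
        endian = ">"   # Motorola byte order (big-endian)
    else:
        raise ValueError("not a TIFF header")
    magic, ifd_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")

    (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    found = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i   # each IFD entry is 12 bytes
        tag, ftype, n, raw = struct.unpack(endian + "HHI4s", tiff[start:start + 12])
        if tag not in (MAKE_TAG, MODEL_TAG) or ftype != ASCII_TYPE:
            continue
        if n <= 4:
            data = raw[:n]                # short values are stored inline
        else:
            (offset,) = struct.unpack(endian + "I", raw)
            data = tiff[offset:offset + n]  # offset is relative to the TIFF header
        name = "Make" if tag == MAKE_TAG else "Model"
        found[name] = data.split(b"\x00")[0].decode("ascii", "replace")
    return found
```

Usage would look something like read_make_model(exif_block(open("photo.jpg", "rb").read())), where photo.jpg is a placeholder name; check for None from exif_block first, since not every image carries EXIF data.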

To view or modify resources (e.g., icons and dialogs) within executable files on Windows systems, take a look at EXEScope and Resource Hacker.

Something I presented in my book was a Perl script that used the PDF::API2 module to retrieve metadata from PDF documents.

You can use the Win32::File::VersionInfo Perl module on Windows systems to retrieve file and product version information from within executable files. Many commercial companies (Microsoft, Adobe, HP, etc.) include this information in the EXE files, DLLs, and drivers they install on Windows systems. If you're running ActiveState Perl, this module is trivial to install via PPM. I used modules like this when I was performing my analysis of the russiantopz IRC bot.

Metadata embedded in Office documents has been a big issue for quite some time. In fact, my first real introduction to the harm (i.e., embarrassment, information disclosure, or both) that it can cause was from the ComputerBytesMan's page on the topic. This is very interesting stuff. I included a Perl script with my book that can retrieve metadata from Word documents, and with a bit of work, it can be modified to pull similar data from Excel spreadsheets. Richard Smith, aka ComputerBytesMan, used a program he wrote himself to pull the revision history from the document, but you should be able to find the same information using strings.exe. Hint: the information is stored in the file in Unicode.
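To show what "stored in Unicode" means in practice, here is a minimal Python sketch of a Unicode-aware strings search. It is not the script from my book or Richard Smith's program; it just scans raw bytes for runs of printable characters encoded as UTF-16LE (each ASCII character followed by a zero byte), which is how these strings typically appear inside Word files. The function name and the minimum length of 4 are my own choices.

```python
import re


def unicode_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters stored as UTF-16LE."""
    # One printable ASCII byte followed by a zero byte, repeated min_len or more times
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [match.group().decode("utf-16-le") for match in pattern.finditer(data)]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        for text in unicode_strings(fh.read()):
            print(text)
```

Run it against a Word document and look through the output for user names and file paths; those are the sort of revision history entries described above.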

For a while, Microsoft had an article posted that dealt with manually removing this "hidden" data from Word files. Now, if you're using Office XP/2003, you might want to download the "Remove Hidden Data" tool.

Basically, once you go looking, you'd be surprised what information can be found in a variety of file formats.

2 comments:

Anonymous said...

The metadata recorded by a digital camera is almost always insufficient to tie an image to a particular camera. It almost always does, however, tie the image to a particular brand and model of camera, which can be just as useful. If the defendant owns a Foobar camera model 12345 and the image was taken with a Foobar camera model 12345 at the same time the crime occurred, well, there's very little wiggle room.

H. Carvey said...

You're quite correct that EXIF data from images can only be used to tie the image to a particular type of camera...but this information can also be altered.

As a side note, when a digital camera is plugged into a Windows computer, in many cases it is recognized as a USB-connected storage device, and is assigned a drive letter. As such, a Registry entry is created and remains, even after the camera is disconnected. Checking the entries of the key, as well as the LastWrite time of the key, could provide useful information.