Tuesday, May 17, 2005

Data Reduction, revisited

I thought I'd take a moment to revisit the topic of data reduction.

What steps are you using to perform data reduction? What are you doing to sort the wheat from the chaff, as it were?

Some of the data reduction steps I'm aware of include:
  • Hash sets - look for known good or known bad files
  • File signature analysis - look for files whose header information doesn't match up nicely with the file extension
  • File version info - parse binary files for file version info, and flag those that don't have any
  • Keyword searches - depending on the case, look for files/sectors containing certain key words
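
To make the file signature idea a bit more concrete, here is a minimal Python sketch. It assumes a tiny, hand-picked table of header bytes (the `SIGNATURES` dictionary and the `signature_mismatch` name are mine, purely for illustration); a real signature table would be far larger, and this is not meant to show how any particular forensic tool does it.

```python
from pathlib import Path

# Illustrative subset of well-known header bytes ("magic numbers").
# This table is an assumption for the sketch, not a complete list.
SIGNATURES = {
    ".exe": [b"MZ"],
    ".dll": [b"MZ"],
    ".jpg": [b"\xff\xd8\xff"],
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".gif": [b"GIF87a", b"GIF89a"],
    ".pdf": [b"%PDF"],
    ".zip": [b"PK\x03\x04"],
}


def signature_mismatch(path, header=None):
    """Return True when the extension is in the table but the header does not match it."""
    ext = Path(path).suffix.lower()
    expected = SIGNATURES.get(ext)
    if expected is None:
        # Unknown extension: nothing to compare against, so do not flag it.
        return False
    if header is None:
        # Only the first few bytes are needed for the comparison.
        with open(path, "rb") as fh:
            header = fh.read(16)
    return not any(header.startswith(sig) for sig in expected)
```

A file named `report.jpg` that starts with `MZ` would be flagged, which is the sort of thing worth a closer look. Note that the sketch only catches mismatches for extensions it knows about; it says nothing about files with unfamiliar extensions.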

Hash sets can be used to sift through those hundreds or thousands of operating system files...the ones that we know are good, and are therefore not interested in. You can also use hash sets to look for known bad files.
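
Here is a rough Python sketch of that kind of hash set filtering. It assumes MD5 hashes and that the known good and known bad sets have already been loaded as lowercase hex strings; the function names (`md5_of`, `classify`, `reduce_files`) are hypothetical and only for illustration.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(file_hash, known_good, known_bad):
    """Label a hash; known bad wins if a hash somehow appears in both sets."""
    h = file_hash.lower()
    if h in known_bad:
        return "known_bad"
    if h in known_good:
        return "known_good"
    return "unknown"


def reduce_files(paths, known_good, known_bad, hasher=md5_of):
    """Drop known good files; return (flagged known bad, remaining unknown)."""
    flagged, remaining = [], []
    for path in paths:
        label = classify(hasher(path), known_good, known_bad)
        if label == "known_bad":
            flagged.append(path)
        elif label == "unknown":
            remaining.append(path)
        # known_good files are simply dropped from further review
    return flagged, remaining
```

The `remaining` list is what is left to examine by other means, and `flagged` is what you probably want to look at first.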

What else are folks doing?

2 comments:

Anonymous said...

Timestamps are a big one for me, along with the owner of the file and having a good idea of where to look and what to look for: startup locations, user accounts, etc. You mentioned files not having any version info; with Log Parser you can also search for files that have suspicious version information. For example:
---------------------------
/* logparser.exe file:origionalfilename.sql?source="C:\*.*" preserveLastAccTime:ON -i:FS -rtp:-1

This will find files that have been renamed and have a suspicious original file name. Scanning only *.exe files would be quicker, but would miss executables that have been given a different file extension.


*/
SELECT
CreationTime AS CreationTime,
EXTRACT_PATH(TO_LOWERCASE(path)) AS ContentPath,
TO_LOWERCASE(name) AS FileName,
TO_LOWERCASE(originalfilename) AS OriginalName
FROM %source%
WHERE
(originalfilename IS NOT NULL)
AND
(TO_LOWERCASE(name) <> TO_LOWERCASE(originalfilename))
AND
(TO_LOWERCASE(originalfilename) LIKE 'cmd.exe'
OR TO_LOWERCASE(originalfilename) LIKE 'mirc.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%serv%.exe'
OR TO_LOWERCASE(originalfilename) LIKE 'nmap.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%scan%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%hide%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%invis%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%snif%.exe'
OR TO_LOWERCASE(originalfilename) LIKE 'key%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%pass%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%crack%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%brute%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%vnc%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%spy%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%proxy%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%dump%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%smb%.exe'
OR TO_LOWERCASE(originalfilename) LIKE '%trojan%.exe'
)

ORDER BY CreationTime

----------------------------

It might also be best to scan other fields, such as LegalCopyright or Company, because executables seem to have those fields populated more often than the original name. The Microsoft Log Parser Toolkit has some other queries that can be used to detect incidents.

H. Carvey said...

Adam,

Excellent! I'm going to be off-line for about a week, but when I get back, this is one of the myriad of things I'm going to have to take a look at.

Thanks!