Thursday, June 09, 2005

Some help needed with PE headers

The day before yesterday, I started digging into PE headers while looking at some malcode. One of the tools I've been using is PEView, and another is FileAlyzer. Both tools have proven extremely useful for viewing the PE headers, as well as other information about the file, breaking things down further than a simple hex editor does.

Here's my question, though. Just prior to the PE headers (i.e., before the "PE\0\0") is the MS-DOS stub program, and according to everything I've been able to find on the topic, this is put in place by the linker. Evidently, this is a holdover dating back to MS-DOS 2.0, and for some reason is still in use today. This is the section of code where, when you open a PE file in a hex editor, you see something like "This program cannot be run in DOS mode." I've seen variations of this...which leads to the question. Between the notification about running in DOS mode and the PE header, you'll often see lots of binary data. Sometimes (as with netsh.exe on XP Pro, for example), you'll see the letters spelling out "Rich".

Does anyone know what this is?

I'm fairly sure that it's the contents of the stub program added by the linker, but I'd like to get confirmation on that, and perhaps even see if it's possible to tie a PE file to a particular linker or development environment.
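For anyone who wants to look at that region without a GUI tool, here is a minimal Python sketch. It assumes the usual layout (a 64-byte DOS header whose field at offset 0x3C points to "PE\0\0"); the function names are my own, made up for illustration.

```python
import struct

def region_before_pe(data: bytes) -> bytes:
    """Return the bytes between the 64-byte DOS header and the PE signature."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The 32-bit little-endian value at 0x3C is the offset of "PE\0\0".
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("no PE signature at the offset stored at 0x3C")
    return data[0x40:pe_offset]

def has_rich_marker(data: bytes) -> bool:
    """True if the ASCII letters "Rich" appear before the PE signature."""
    return b"Rich" in region_before_pe(data)
```

To try it on a real file, something like `has_rich_marker(open("netsh.exe", "rb").read())` should do; the path is only an example.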

Here's an example of how I'm thinking that this could be used...let's say you've got a case where you're pretty sure that someone developed a program. You've got a copy of the executable (worm, whatever...) and you think that the suspect created it. You may or may not find bits and pieces of code in slack space. But let's say you find a development environment, such as Cygwin or Borland or MS Visual C++. Would it be possible to tie the stub program added by the linker to the development environment, by comparing the MS-DOS stub program in the PE file to the version of "winstub.exe" (or whatever the default is) on the suspect's machine?

Bonus Question: The "magic number" for a Windows executable is "0x5A4D" (or "MZ"). What is the significance of "MZ", and where did it come from?
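As a quick check of how "0x5A4D" and "MZ" relate, here is a small Python sketch: the two ASCII bytes, read as a little-endian 16-bit value, give the magic number.

```python
import struct

# 'M' is 0x4D and 'Z' is 0x5A; little-endian puts the first byte in the low half.
magic = struct.unpack("<H", b"MZ")[0]
print(hex(magic))
```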


Anonymous said...

MZ could stand for Mark Zbikowski, one of the Microsoft developers who created the MZ format. I don't know this for a fact, but several sources via Google point in this direction.

H. Carvey said...

I came across pretty much the same information myself...

Anonymous said...

Yes, it's MarkZ's initials. He told me this himself a couple of years ago, so I trust the source.

Here's how you can examine that stub program: Find a 32-bit exe that's smaller than about half a meg in size and open it with debug.exe:

debug notepad.exe

When you get the debug prompt, CS:IP will point to the beginning of the 16-bit code. Disassemble a bit of it:

17AB:0000 PUSH CS
17AB:0001 POP DS
17AB:0002 MOV DX,000E
17AB:0005 MOV AH,09
17AB:0007 INT 21
17AB:0009 MOV AX,4C01
17AB:000C INT 21
...and so on...

In English, this means:

Move the code segment into the data segment register.
Move the offset of the message to print into the DX register.
Move "09", the number of the DOS "print string" function, into the AH register.
Call interrupt 0x21 to do the print.
Move "4c01" into AX, which will tell interrupt 0x21 to terminate the process (4c) and return 01 as the exit code.
Call interrupt 0x21 again.

Or, in less asstastic English:

Print "You must use Windows!" and then exit.
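The same walk-through can be sketched in Python. The byte values below are the standard 8086 encodings of the seven instructions listed above; the message text is the one quoted in the original post, and the function name is made up for illustration.

```python
# 0E=PUSH CS, 1F=POP DS, BA 0E 00=MOV DX,000E, B4 09=MOV AH,09,
# CD 21=INT 21, B8 01 4C=MOV AX,4C01, CD 21=INT 21
STUB_CODE = bytes.fromhex("0e1fba0e00b409cd21b8014ccd21")
MESSAGE = b"This program cannot be run in DOS mode.$"

def dos_message(stub: bytes) -> str:
    """Follow the MOV DX operand to the $-terminated string it points at."""
    if stub[2] != 0xBA:
        raise ValueError("expected MOV DX,imm16 at offset 2")
    offset = int.from_bytes(stub[3:5], "little")  # operand is little-endian
    end = stub.index(b"$", offset)                # function 09h stops at '$'
    return stub[offset:end].decode("ascii")

print(dos_message(STUB_CODE + MESSAGE))
```

Note that the code is exactly 0x0E bytes long, which is why DX is loaded with 000E: the string sits immediately after the second INT 21.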

Крокодил Гена said...

You don't understand what he's asking. After Int 21h is called to terminate the process comes the $-terminated string, and after the dollar sign are 7 zeroes and then a bunch of garbled data that varies in size and content, but seems to always contain the word "Rich". I guess you could theoretically put anything here, but it seems like compilers (just a guess as to the culprit) are fairly consistent here. I have been COMPLETELY unable to find any mention of this section; it seems like every guide skips from 003Ch over to the offset it points at ("PE\0\0"). In notepad.exe it goes like this:

INT 21H (program ends)
$-terminated string
7 zeroes
Begin mystery section
EC 85 5B A1
A8 E4 35 F2
A8 E4 35 F2
A8 E4 35 F2
6B EB 3A F2
A9 E4 35 F2
6B EB 55 F2
A9 E4 35 F2
6B EB 68 F2
BB E4 35 F2
A8 E4 34 F2
63 E4 35 F2
6B EB 6B F2
A9 E4 35 F2
6B EB 6A F2
BF E4 35 F2
6B EB 6F F2
A9 E4 35 F2
52 69 63 68 ("Rich")
A8 E4 35 F2
End mystery section
A variable number of zeroes (padding, obviously)

Like I said, I haven't seen mention of this section anywhere! I'm desperate to know what generates it and why. I know this blog post is old but if anyone Googles this and knows the answer, please comment in my blog.
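One thing that can be checked directly on the bytes listed above: the dword right after "Rich" (A8 E4 35 F2) repeats all through the block, and XORing the block with it turns the first dword into the ASCII text "DanS". The Python sketch below does that. Reading the remaining dwords as (product id, build number, use count) entries follows community reverse-engineering write-ups, not any Microsoft documentation, so treat those field names as assumptions.

```python
import struct

# The mystery section from notepad.exe, exactly as listed above.
BLOCK = bytes.fromhex(
    "EC855BA1 A8E435F2 A8E435F2 A8E435F2 6BEB3AF2 A9E435F2"
    "6BEB55F2 A9E435F2 6BEB68F2 BBE435F2 A8E434F2 63E435F2"
    "6BEB6BF2 A9E435F2 6BEB6AF2 BFE435F2 6BEB6FF2 A9E435F2"
    "52696368 A8E435F2"
)

def decode_block(block: bytes):
    """XOR everything before "Rich" with the 4-byte key that follows it."""
    end = block.index(b"Rich")
    key = block[end + 4:end + 8]
    plain = bytes(b ^ key[i % 4] for i, b in enumerate(block[:end]))
    entries = []
    # Skip the 16-byte "DanS" + padding prefix; the rest is pairs of dwords.
    for comp_id, count in struct.iter_unpack("<II", plain[16:]):
        # Assumed layout: high word = product id, low word = build number.
        entries.append((comp_id >> 16, comp_id & 0xFFFF, count))
    return key, plain, entries

key, plain, entries = decode_block(BLOCK)
print(plain[:4])
for product, build, count in entries:
    print(f"product=0x{product:02X} build={build} count={count}")
```

This only works when the block starts on the first masked dword, as it does in the listing; on a whole file you would first have to find where the block begins.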

H. Carvey said...

I've found a little information about this section, but nothing beyond that it's added by the compiler/linker.

BTW, how can anyone post to your blog if they don't know what that blog is? Your blogger profile just contains an image, and no links to a blog.

Крокодил Гена said...

Sorry, I changed my settings. I don't log into Blogger often (read: ever).

Крокодил Гена said...

It's added by the Microsoft linker, which is used by every Microsoft compiler, so that's why it's so widespread. If I compile my code with a different linker, that section disappears. I'll try to look for what the information means in the linker documentation and comment when I find out more, unless you've already figured it all out. If they're dates of some kind, it makes date forgery on PEs much more difficult.