Tuesday, December 30, 2014

What It Looks Like: Malware Infection via a Weaponized Document


Okay...I lied.  This is my last blog post of 2014.

A couple of weeks ago, Ronnie posted regarding some analysis of a weaponized document to the PhishMe.com blog.  There is some interesting information in the post, but I commented on Twitter that there was very little post-mortem analysis. In response, Ronnie sent me a copy of the document.  So, I dusted off a Windows 7 VM and took a shot at infecting it by opening the document.

Analysis Platform
32-bit Windows 7 Ultimate SP1, MS Office 2010, with Sysmon installed - VM running in Virtual Box.  As with previous dynamic analysis I've performed, Sysmon provides not only place holders to look for, but also insight into what can be trapped via a process creation monitoring tool.

Process
Run Windows Updates, reboot to a clean clone of the VM, and double-click the document (sitting on the user profile desktop).  The user profile used to access the document had Admin-level privileges, but UAC had not been disabled.  After waiting a few moments after the launch of the document, the application (MS Word) was closed, and the VM shut down cleanly.

I purposely did not run a packet capture tool, as that was something that had been done already.

Analysis
Initial attempts to view the file in a hex editor caused MSE to alert on TrojanDownloader:O97M/Tarbir.  After opening the file, waiting, and shutting down the VM cleanly, I created a timeline using file system, WEVTX, Prefetch, and Registry metadata.  I also created a separate micro-timeline from the USN change journal - I didn't want to overpopulate my main timeline and make it more difficult to analyze.

Also, when I extracted the file from the archive that I received, I named it "file.docx", based on the contents (the structure was not the older-style OLE format).  When I double-clicked the file, MS Word opened but complained that there was something wrong with the file.  I renamed the file to "file.doc", and everything ran in accordance with Ronnie's blog post.

Findings
As expected, all of the files that Ronnie mentioned were created within the VM, in the user's AppData\Local\Temp folder.  Also as expect, the timeline I created was populated by artifacts of the user's access to the file.  Since the "Enable Editing" button had to be clicked in order to enable macros (and run the embedded code), the TrustRecords key was populated with a reference to the file.  Keep in mind that many of the artifacts that were created (JumpList entries, Registry values, etc.) will persist well beyond the removal/deletion of the file and other artifacts.

While I did not capture any of the off-system communication (i.e., download of the malware), Sysmon provided some pretty interesting information.  I looked up the domain in Ronnie's post, and that gave me the IP address "50.63.213[.]1".  I then searched for that IP address in my timeline, and found one entry, from Sysmon...Powershell had reached off of the system (Sysmon/3 event) to that IP address (which itself translates to "p3nlhg346c1346.shr.prod.phx3.secureserver[.]net"), on port 80.  Artifacts of Powershell's off-system communications were the HKLM/Software/Microsoft/Tracing/powershell_RASMANCS and HKLM/Software/Microsoft/Tracing/powershell_RASAPI32 keys being created.

Per Ronnie's blog post, the file "444.exe" is downloaded.  The file is deleted after being copied to "msgss.exe".  The strings within this file (msgss.exe) indicate that it is a Borland Delphi file, and contains the strings "GYRATOR" and "TANNERYWHISTLE" (refer to the icon used for the file).  The PE compile time for the file is 19 Jun 1992 22:22:17 UTC.  The VirusTotal analysis of this file (originally uploaded to VT on 12 Dec) can be found here.

Persistence Mechanism:  User's Run key; the value "OutLook Express" was added to the key, pointing to the msgss.exe file.

An interesting artifact of the infection occurred at the same time that the msgss.exe file was created on the system and the Run key value created so that the malware would persist; the key "HKCU/full" was created.  The key doesn't have any values...it's just the key.

To extend Corey's discussion of Prefetch file contents just a bit, the Prefetch file for WinWord.exe included references to RASMAN.DLL, RASAPI32.DLL, as well as other networking DLLs (W2_32.DLL, WINHTTP.DLL).

Given the off-system communications, I located and extracted the WebCachev01.dat file that contains the IE history for the user, and opened it using ESE DatabaseView.  I found no indication of the host being contacted, via either IP address or name.  Additional testing is required but it would appear that the System.Net.WebClient object used by Powershell does not leave traces in the IE history (i.e., the use of the WinInet API for off-system communications would leave traces in the history).  If that's the case, then from an infrastructure perspective, we need to find another means of detecting this sort of activity, such as through process creation monitoring, the use of web proxies, etc.

Take-Aways
1. Threat intel cannot be based on analysis in isolation.

Okay, I understand that this is just a single document and a single infection, and does not specifically represent an APT-style threat, but the point here is that you can't develop "threat intelligence" by analyzing malware in isolation.  In order to truly develop "threat intelligence", you have to look how the adversary operates within the entire infrastructure eco-system; this includes the network, memory, as well as on the host.

I'm also aware that "APT != malware", and that's absolutely correct.  The findings I've presented here are more indicators than intel, but it should be easy to see not just the value of the analysis, but also how it can be extended.  For example, this analysis might provide the basis for determining how an adversary initially gained access to an infrastructure, i.e., the initial infection vector (IIV).  Unfortunately, due to a number of variables, the IIV is often overlooked, or assumed.  When the IIV is assumed, it's often incorrect.  Determining the IIV can be used to see where modifications can be made within the infrastructure in order to improve prevent, detection, and response.

Looking specifically at the analysis of this weaponized document, Ronnie provided some insight, which I was then able to expand upon, something anyone could have done.  The focus of my analysis was  to look at how the host system was impacted by this malware; I can go back and redo the analysis (re-clone the VM), and run the test again, this time pausing the VM and capturing the memory for analysis via Volatility, and extend the understanding of the impact of this document and malware even further.  Even with just the timeline, the available indicators have been expanded beyond the domain and hash (SHA-256) that was available as of 15 Dec.  By incorporating this analysis, we've effectively moved up the Pyramid of Pain, which is something we should be striving to do.  Also, be sure to check out Aaron's Value of Indicators blog post.

2.  Host analysis significantly extends response capability.

The one big caveat from this analysis is the time delta between "infection" and "response"; due to the nature of the testing, that delta is minimized, and for most environments, is probably unrealistic.  A heavily-used system will likely not have the same wealth of data available, and most systems will very likely not have process creation monitoring (Sysmon).

However, what this analysis does demonstrate is, what is available to the responder should the incident be discovered weeks or months after the initial infection.  One of the biggest misconceptions in incident response is that host-based analysis is expensive and not worth the effort, that it's better to just burn the affected systems down and then rebuild them.  What this analysis demonstrates is that through host analysis, we can find artifacts that persist beyond the deletion/removal of various aspects of the infection.  For example, the file 444.exe was deleted, but the AppCompatCache and Sysmon data provided indications that the file had been executed on the system (the USN change journal data illustrated the creation and subsequent deletion of the file).  And that analysis doesn't have to be expensive, time consuming, or difficult...in fact, it's pretty straightforward and simple, and it provides a wealth of indicators that can be used to scope an incident, even weeks after the initial infection occurred.

3.  Process creation monitoring radically facilitates incident response.

I used Sysmon in this test, which is a pretty good analog for a more comprehensive approach, such as Sysmon + Splunk, or Carbon Black.  Monitoring process creation lets us see command line arguments, parent processes, etc.  By analyzing this sort of activity, we can develop prevention and detection mechanisms.

This also shows us how incident response can be facilitated by the availability of this information.  Ever since my early days of performing IR, I've been asked what, in a perfect world, I'd want to have available to me, and it's always come back to a record of the processes that had been run, as well as the command line options used.  Having this information available in a centralized location would obviate the need to go host-to-host in order to scope the incident, and could be initially facilitated by running searches of the database.

Resources
Lenny Zeltser's Analyzing Malicious Documents Cheat Sheet

3 comments:

Drew said...

>it's always come back to a record of the processes that had been run, as well as the command line options used.

Have a look at WLS. It provides much concise detail about processes, inheritance, and command line options, along with several types of hashing. Very powerful with Splunk.

H. Carvey said...

Drew,

What is "WLS"?

Drew said...

WLS is "Windows Logging System" developed by DOE because they needed more context around process operation than Windows was providing.

http://energy.gov/sites/prod/files/cioprod/documents/Splunkified_-_the_Next_Evolution_of_Log_Analysis_-_Green_and_McCord.pdf