How to do a root cause analysis
This is an interesting question, in part because there's been discussion in the past regarding the need for conducting a root cause analysis, or "RCA".
An excellent resource for this topic is Corey Harrell's jIIr blog; he's written posts on malware RCA and compromise RCA, along with other posts that discuss topics associated with root cause analysis.
How to work with clients who are stressed, want answers now, point fingers, or heads to roll.
I think that like any other type of incident, it depends, and it's simply something that you need to be prepared for.
I remember one engagement that I was attempting to address. The customer who called was not part of the IT staff, and during the initial discussions, it was clear that there was a good deal of stress involved in this incident. At one point, we came down to the customer simply wanting us to get someone on site immediately, and we were trying to determine which site we needed to send someone to...the corporate offices were located in a different city than the data center, and as such, anyone sent might need to fly into a different airport. If the responder flew into the wrong one, they'd have to drive several hours to the correct location, further delaying response. The more the question was asked of the customer, the more frustrated they became, and they just didn't answer the question.
In my experience, the key to surviving trying times such as these is process and documentation. Process provides analysts with a starting point, particularly during stressful times when everything seems like a whirlwind and you're being pulled in different directions. Documenting what you did, and why, can save your butt after the fact, as well.
When I was in the military, like many military units, we'd go on training exercises. During one exercise, we came back to base and during our "hot washup" after-action meeting, one of the operators made the statement that throughout the exercise, "comm sucked", indicating that communications was inadequate. During the next training exercise, we instituted a problem reporting and resolution process, and maintained detailed records in a log book. Operators would call into a central point and the problem would be logged, reported to the appropriate section (we had tactical data systems, radar, and communications sections), and the troubleshooting and resolution of the issue would be logged, as well. After the exercise, we were in our "hot washup" when one of the operators got up and said "comm sucked", at which point we pushed the log book across the table and said, "show us where and why...". The operators changed their tune after that. Without the process and documentation, however, we would have been left with commanders asking us to explain an issue that didn't have any data to back it up. The same thing can occur during an incident response engagement in the private sector.
How to hit the ground running when you arrive at a client with little information.
During my time as an emergency incident responder, this happened often...a customer would call, and want someone on-site immediately. We'd start to ask questions regarding the nature of the incident (the answers helped us determine staffing levels and required skill sets), and all we would hear back was, "Send someone...NOW!"
The key to this is having a process that responders use in order to get started. For instance, I like to have a list of questions available when a customer calls (referred to as a triage worksheet); these are questions that are asked of all customers, and during the triage process the analyst will rely on their experience to ask more probing questions and obtain additional information, as necessary. The responder going on-site is given the completed questionnaire, and one of the first things they do is meet with the customer point of contact (PoC) and go through the questions again, to see if any new information has been developed.
One of the first things I tend to do during this process is ask the PoC to describe the incident, and I'll ask questions regarding the data that was used to arrive at various conclusions. For example, if the customer says that they're suffering from a malware infection, I would ask what they saw that indicated a malware infection...AV alerts or logs, network traffic logged/blocked at the firewall, etc.
Generally speaking, my next step would be to either ask for a network diagram, or work with the PoC to document a diagram of the affected network (or portion thereof) on a white board. This not only provides situational awareness, but allows me to start asking about network devices and available logs.
So, I guess the short answer is, in order to "hit the ground running" under those circumstances, have a process in place for collecting information, and document your steps.
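As a rough illustration of what "a process for collecting information" might look like when written down, here is a minimal Python sketch of a triage worksheet that also keeps a timestamped record of who captured each answer. The question list, the class name, and the field names are all made up for the example; a real worksheet would be tailored to your team and your customers.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical questions for illustration only.
TRIAGE_QUESTIONS = [
    "Which site is affected, and what is the nearest airport?",
    "What was observed that indicated an incident?",
    "What data supports that conclusion (AV alerts, firewall logs, etc.)?",
    "Is a network diagram available?",
]


@dataclass
class TriageWorksheet:
    customer: str
    answers: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def record(self, question, answer, analyst):
        """Store an answer and log when it was captured, and by whom."""
        self.answers[question] = answer
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.log.append(f"{stamp} {analyst}: {question} -> {answer}")

    def open_questions(self):
        """Questions the on-site responder still needs to ask the PoC."""
        return [q for q in TRIAGE_QUESTIONS if not self.answers.get(q)]


# Example: one question answered on the initial call, the rest still open.
ws = TriageWorksheet(customer="Example Corp")
ws.record(TRIAGE_QUESTIONS[1], "AV alert on a file server", analyst="analyst1")
for question in ws.open_questions():
    print("STILL OPEN:", question)
```

The point isn't the code itself; a paper form works just as well. What matters is that the same questions get asked every time, and that the record shows what was known, and when.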
How to communicate during an incident with respect to security and synergy with other IRT members.
As with many aspects of incident response, it depends. It depends on the type and extent of incident, who's involved, etc. Most of all, it depends upon the preparedness of the organization experiencing the incident. I've seen organizations with Nextel phones, and the walkie-talkie functionality was used for communications.
Some organizations will use the Remedy trouble-ticketing system, or something similar. Most organizations will stay off of email altogether, assuming that it has been 'hacked', and may even move to having key personnel meet in a war room. In this way, communications are handled face-to-face, and where applicable, I've found this to be very effective. For example, if someone adds "it's a virus" to an email thread, it may be hard to track that person down and get specific details, particularly when that information is critical to the response. I have been in a war room when someone has made that statement, and then been asked very pointed questions about the data used to arrive at that statement. Those who have that data are willing to share it for the benefit of the entire response team, and those who don't learn an important lesson.
How to detect and deal with timestomping, data wiping, or some other antiforensic [sic] technique.
I'm not really sure how to address this one, in part because I'm not really sure what value I could add to what's already out there. The topic of time stomping, using either timestomp.exe or some other means, such as copying the time stamps from kernel32.dll via the GetFileTime/SetFileTime API calls, and how to detect their use has been addressed at a number of sites, including on the ForensicsWiki, as well as on Chris Pogue's blog.
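For readers who haven't seen those write-ups, one commonly described check is to compare the creation time in a file's $STANDARD_INFORMATION attribute (which the SetFileTime API can modify) against the creation time in its $FILE_NAME attribute (which it does not touch directly). The Python sketch below assumes the MFT has already been parsed by some other tool into a dict per record; the key names `si_created` and `fn_created` are hypothetical, and neither check is definitive on its own, since legitimate activity (copies, moves, installers, archive extraction) can produce the same patterns.

```python
from datetime import datetime


def timestomp_indicators(record):
    """Return the reasons a parsed MFT record deserves a closer look.

    `record` is a hypothetical dict holding two datetime values:
      si_created - creation time from $STANDARD_INFORMATION
      fn_created - creation time from $FILE_NAME
    """
    reasons = []
    si, fn = record["si_created"], record["fn_created"]
    # A $SI creation time earlier than the $FN creation time is worth a look.
    if si < fn:
        reasons.append("$SI creation time precedes $FN creation time")
    # NTFS stores 100-nanosecond resolution; a value landing exactly on a
    # whole second can indicate a time set by a tool rather than the file system.
    if si.microsecond == 0:
        reasons.append("$SI creation time has no sub-second component")
    return reasons


# Example with made-up values.
suspect = {
    "si_created": datetime(2005, 1, 1, 12, 0, 0),
    "fn_created": datetime(2012, 6, 1, 8, 30, 15, 123456),
}
for reason in timestomp_indicators(suspect):
    print(reason)
```

Treat any hit as a pointer to go look at other data sources, not as a finding in itself.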
How to "deal with" data wiping is an interesting question...I suppose that if the issue is one of spoliation, then being able to determine the difference between an automated process, and one launched specifically by a user (and when) may be a critical component of the case.
As far as "some other antiforensic[sic] technique", I would say again, it depends. However, I will say that the use of anti-forensic techniques should never be assumed, simply because one artifact is found, or as the case may be, not found. More than once, I've been in a meeting when someone said, "...it was
How to get a DFIR job, and keep it.
I think that to some degree, any response to this question would be very dependent upon where you're located, and if you're willing to relocate.
My experience has been that applying for jobs found online rarely works, particularly for those sites that link to an automated application/submission process. I've found that it's a matter of who you know, or who knows you. The best way to achieve this level of recognition, particularly in the wider community, is to engage with other analysts and responders, through online meetings, blogging, etc. Be willing to open yourself up to peer review, and ignore the haters, simply because haters gonna hate.
How to make sure management understands and applies your recomendations [sic] after an incident when they're most likely to listen.
Honestly, I have no idea. Our job as analysts and responders is to present facts, and if asked, possibly make recommendations, but there's nothing that I'm aware of that can make sure that management applies those recommendations. After all, look at a lot of the compliance and legislative regulatory requirements that have been published (PCI, HIPAA, NCUA, etc.) and then look at the news. You'll see a number of these bodies setting forth requirements that are not followed.
How to find hidden data; in registry, outside of the partition, ADS, or if you've seen data hidden in the MFT, slackspace, steganography, etc.
Good question...if something is hidden, how do you find it...and by extension, if you follow a thorough, documented process to attempt to detect data hidden by any of these means and don't find anything, does that necessarily mean that the data wasn't there?
Notice that I used the words "process" and "documented" together. This is the most critical part of any analysis...if you don't document what you did, did it really happen?
Let's take a look at each of the items requested, in order:
Registry - my first impression of this is that 'hiding' data in the Registry amounts to creating keys and/or values that an analyst is not aware of. I'm familiar with some techniques used to hide data from RegEdit on a live system, but those tend to not work when you acquire an image of the system and use a tool other than RegEdit, so the data really isn't "hidden", per se. I have seen instances where searches have revealed hits "in" the Registry, and then searching the Registry itself via a viewer has not turned up those same items, but as addressed in Windows Registry Forensics, this data really isn't "hidden", and it's pretty easy to identify if the hits are in unallocated space within the hive file, or in slackspace.
Outside the partition - it depends on where outside the partition you're referring to. I've written tools to start at the beginning of a physical image and look for indications of the use of MBR infectors; while not definitive, doing so did help me narrow the scope of what I was looking at and for. For this one, I'd suggest looking outside the partition as a solution. ;-)
ADS - NTFS alternate data streams really aren't hidden, per se, once you have an image of the system. Some commercial frameworks even highlight ADSs by printing the stream names in red.
MFT - There've been a number of articles written on residual data found in MFT records, specifically associated with files transitioning from resident to non-resident data. I'm not specifically aware of an intruder hiding data in an MFT record...to me, it sounds like something that would not be too persistent unless the intruder had complete control of the system, to a very low level. If someone has seen this used, I would greatly appreciate seeing the data.
Slackspace - there are tools that let you access the contents of slackspace, but one of the things to consider is, if an intruder or user 'hides' something in slackspace, what is the likelihood that the data will remain available and accessible to them, at a later date? After all, the word "hiding" has connotations of accessing the data at a later date...by definition, slackspace may not be available. Choosing a file at random and hiding data in the slackspace associated with that file may not be a good choice; how would you guarantee that the file would not grow, or that the file would not be deleted? This is not to say that someone hasn't purposely hidden data in file slackspace; rather, I'm simply trying to reason through the motivations. If you've seen this technique used, I'd greatly appreciate seeing the data.
Steganography - I generally wouldn't consider looking for this sort of hidden data unless there was a compelling reason to do so, such as searches in the user's web history, tool downloads, and indications of the user actually using tools for this.
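To make the "outside the partition" item above a bit more concrete, here is a minimal Python sketch of the general idea: start at the beginning of a raw physical image, read the partition table, and list any sectors between the MBR and the first partition that aren't all zeros. This is not the tool I mentioned, just an illustration of the approach; it assumes a classic MBR-partitioned disk with 512-byte sectors, and the function names are made up. As with the original tools, a hit is not definitive (boot managers and some disk utilities legitimately use this area), but it narrows down where to look.

```python
import io
import struct

SECTOR = 512  # assumption: 512-byte sectors


def first_partition_lba(mbr):
    """Return the lowest starting LBA found in the MBR partition table."""
    if mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    starts = []
    for i in range(4):
        # Four 16-byte entries begin at offset 446.
        entry = mbr[446 + 16 * i : 446 + 16 * (i + 1)]
        ptype = entry[4]                                # partition type byte
        lba_start = struct.unpack_from("<I", entry, 8)[0]  # little-endian LBA
        if ptype != 0 and lba_start > 0:
            starts.append(lba_start)
    if not starts:
        raise ValueError("no partition entries found")
    return min(starts)


def nonzero_gap_sectors(image):
    """List sector numbers between the MBR and the first partition
    that contain anything other than zeros."""
    image.seek(0)
    mbr = image.read(SECTOR)
    if len(mbr) < SECTOR:
        raise ValueError("image is smaller than one sector")
    hits = []
    # After reading sector 0, sequential reads return sectors 1, 2, ...
    for lba in range(1, first_partition_lba(mbr)):
        data = image.read(SECTOR)
        if not data:
            break
        if any(data):
            hits.append(lba)
    return hits


# Example against a synthetic image: partition starts at sector 63,
# and sector 5 has a single non-zero byte in it.
mbr = bytearray(SECTOR)
mbr[510:512] = b"\x55\xaa"
mbr[446 + 4] = 0x07
struct.pack_into("<I", mbr, 446 + 8, 63)
gap = bytearray(62 * SECTOR)
gap[(5 - 1) * SECTOR] = 0x41
print(nonzero_gap_sectors(io.BytesIO(bytes(mbr) + bytes(gap))))
```

Against a real image you would pass an open file handle (for example, `open(path, "rb")`) instead of the in-memory buffer, and record what you ran and what it returned.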
How to contain an incident.
Once again, the best answer I can give is, it depends. It depends on the type of incident, the infrastructure affected, as well as the culture of the affected organization. I've seen incidents in which the issue has been easy to contain, but I've also been involved in response engagements where we couldn't contain the issue because of cultural issues. I'm aware of times where a customer has asked the response team to monitor the issue, rather than contain it.
Again, many of the topics that the reader listed were more on the "soft" side of skills, and it's important that responders and analysts alike have those skills. In many cases, the way to address this is to have a process in place for responders to use, particularly during stressful times, and to require analysts to maintain documentation of what they do. Yes, I know...no one likes to write, particularly if someone else is going to read it, but you'll wish you had kept it when those times come.