Not long ago, I gave some presentations at a local high school on cybersecurity, and one of the questions that was asked was, "how do I get started in cybersecurity?" Given that my alma mater will establish a minor in cybersecurity this coming fall, I thought that it might be interesting to put some thoughts down, in hopes of generating a discussion on the topic.
So, some are likely going to say that in today's day and age, you can simply Google the answer to the question, because this topic has been discussed many times previously. That's true, but it's as much a blessing as it is a curse; there are many instances in which multiple opinions are shared, and at the end of the thread, there's no real answer to the question. As such, I'm going to share my thoughts and experience here, in hopes that it will start a discussion that others can refer to. I'm hoping to provide some insight to anyone looking to "get in" to cybersecurity, whether you're an upcoming high school or college graduate, or someone looking to make a career transition.
During my career, I've had the opportunity to be a "gatekeeper", if you will. As an incident responder, I was asked to vet resumes that had been submitted in hopes of filling a position on our team. To some degree, it was my job to receive and filter the resumes, passing what I saw as the most qualified candidates on to the next phase. I've also worked with a pretty good number of analysts and consultants over the years.
The world of cybersecurity is pretty big and there are a lot of roads you can follow; there's pen testing, malware reverse engineering, DFIR, policy, etc. There are both proactive and reactive types of work. The key is to pick a place to start. This doesn't mean that you can't do more than one...it simply means that you need to decide where you want to start...and then start. Pick some place, and go from there. You may find that you're absolutely fascinated by what you're learning, or you may decide that where you started simply is not for you. Okay, no problem. Pick a new place and start over.
When it comes to reviewing resumes, I tend to not focus on certifications, nor the actual degree that someone has. Don't get me wrong, there are a lot of great certifications out there. The issue I have with certifications is that when most folks return from the course(s) to obtain the certification, there's nothing that holds them accountable for using what they learned. I've seen analysts go off to a 5 or 6 day training course in DFIR of Windows systems, which cost $5K - $6K (just for the course), and not know how to determine time stomping via the MFT (they compared the file system last modification time to the compile time in the PE header).
I am, however, interested to see that someone does have a degree. This is because having a degree pretty much guarantees a minimum level of education, and it also gives insight into your ability to complete tasks. A four (or even two) year degree is not going to be a party every day, and you're likely going to end up having to do things you don't enjoy.
And why is this important? Well, the (apparently) hidden secret of cybersecurity is that at some point, you're going to have to write. That's right. No matter what level of proficiency you develop at something, it's pretty useless if you can't communicate and share it with others. I'm not just talking about sharing your findings with your teammates and co-workers (hint, "LOL" doesn't count as "communication"), I'm also talking about sharing your work with clients.
Now, I have a good bit of experience with writing throughout my career. I wrote in the military (performance reviews, reports, course materials, etc.), as part of my graduate education (to include my thesis), and I've been writing almost continually since I started in infosec. So...you have to be able to write. A great way to get experience writing is to...well...write. Start a blog. Write something up, and share it with someone you trust to actually read it with a critical eye, not just hand it back to you with a "looks good". Accept that what you write is not going to be perfect, every time, and use that as a learning experience.
Writing helps me organize my thoughts...if I were to just start talking after I completed my analysis, what came out of my mouth would not be nearly as structured, nor as useful, as what I could produce in writing. And writing does not have to be the sole means of communication; I very often find it extremely valuable to write something down first, and then use that as a reference for a conversation, or better yet, a conference presentation.
So, my recommendations for getting started in the cybersecurity field are pretty simple:
1. Pick some place to start. If you have to, reach out to someone for advice/help.
2. Start. If you have to, reach out to someone for advice/help.
3. Write about what you're doing. If you have to, reach out to someone for advice/help.
There are plenty of free resources available that provide access to what you need to get started: online blog posts, podcasts/videos, presentations, books (yes, books online and in the library), etc. There are free images available for download, as part of DFIR challenges (if that's what you're interested in doing). There are places you can go to find out about malware, download samples, or even run samples in virtual environments and emulators. In fact, if you're viewing this blog post online, then you very likely have everything you need to get started. If you're interested in DFIR analysis or malware RE, you do not need to have access to big expensive commercial tools to conduct analysis...that's just an excuse for paralysis.
There is a significant reluctance to share in this "community", and it's not isolated to folks who are new to the field. There are a lot of folks who have worked in this industry for quite a while who will not share experiences or findings. And there is no requirement to share something entirely new that no one's seen before. In fact, there's a good bit of value in sharing something that may have been discussed previously; it shows that you understand it (or are trying to), and it can offer visibility and insight to others ("oh, that thing that was happening five years ago is coming back...like bell bottoms...").
The take-away from all of this is that when you're ready to put your resume out there and apply for a position in cybersecurity, you'll have some experience doing the work, visible writing that a potential employer can validate, and contacts in the field.
Sunday, April 09, 2017
Understanding What The Data Is Telling You
Not long ago, I was doing some analysis of a Windows 2012 system and ran across an interesting entry in the AppCompatCache data:
SYSVOL\Users\Admin\AppData\Roaming\badfile.exe Sat Jun 1 11:34:21 2013 Z
Now, we all know that the time stamp associated with entries in the AppCompatCache is the file system last modification time, derived from the $STANDARD_INFORMATION attribute. So, at this point, all I know about this file is that it existed on the system at some point, and given that it's now 2017, it's more than a bit odd, albeit not impossible, for that to be the correct file system modification date.
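As a quick aside, the time stamps in both the AppCompatCache data and the MFT attributes are stored as 64-bit FILETIME values (100-nanosecond intervals since 1 Jan 1601 UTC), so if you're writing or validating your own parser, the conversion is straightforward. A minimal sketch in Python (the function name is mine, not from any particular tool):

import struct
from datetime import datetime, timedelta, timezone

def filetime_to_datetime(raw8):
    # raw8: 8 bytes, little-endian, as pulled from an AppCompatCache
    # entry or an MFT attribute; the value is the count of 100-ns
    # intervals since 1601-01-01 00:00:00 UTC
    (ft,) = struct.unpack("<Q", raw8)
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ft // 10)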
Next stop, the MFT...I parsed it and found the following:
71516 FILE Seq: 55847 Links: 1
[FILE],[BASE RECORD]
.\Users\Admin\AppData\Roaming\badfile.exe
M: Sat Jun 1 11:34:21 2013 Z
A: Mon Jan 13 20:12:31 2014 Z
C: Thu Mar 30 11:40:09 2017 Z
B: Mon Jan 13 20:12:31 2014 Z
FN: msiexec.exe Parent Ref: 860/48177
Namespace: 3
M: Thu Mar 30 11:40:09 2017 Z
A: Thu Mar 30 11:40:09 2017 Z
C: Thu Mar 30 11:40:09 2017 Z
B: Thu Mar 30 11:40:09 2017 Z
[$DATA Attribute]
File Size = 1337856 bytes
So, this is what "time stomping" of a file looks like, and this also helps validate that the AppCompatCache time stamp is the file system last modification time, extracted from one of the MFT record attributes. At this point, there's nothing to specifically indicate when the file was executed, but we now have a much better idea of when the file appeared on the system. The bad guy most likely used the GetFileTime() and SetFileTime() API calls to perform the time stomping, which we can see by going to the timeline (a sketch for automating the $SI/$FN comparison follows below):
Mon Jan 13 20:12:31 2014 Z
FILE - .A.B [152] C:\Users\Admin\AppData\Roaming\
FILE - .A.B [56] C:\Windows\explorer.exe\$TXF_DATA
FILE - .A.B [1337856] C:\Users\Admin\AppData\Roaming\badfile.exe
FILE - .A.B [2391280] C:\Windows\explorer.exe\
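Once an MFT parser has given you both sets of time stamps, checking for this pattern is easy to automate. A rough sketch (the field names are my own; I'm assuming the $STANDARD_INFORMATION and $FILE_NAME values have already been converted to datetime objects, and keep in mind that this is a heuristic...some legitimate file operations can produce similar patterns):

def check_time_stomping(si, fn):
    # si, fn: dicts of datetime objects keyed "M", "A", "C", "B" for the
    # $STANDARD_INFORMATION and $FILE_NAME attributes, respectively
    flags = []
    for k in ("M", "A", "C", "B"):
        # $SI times are trivially modified via SetFileTime(); $FN times are
        # maintained by the kernel, so $SI predating $FN is suspicious
        if si[k] < fn[k]:
            flags.append("$SI %s precedes $FN %s" % (k, k))
    return flags

In this case, the $SI M, A, and B times (2013 and 2014) fall well before the $FN times (30 Mar 2017), which is exactly the pattern the sketch flags.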
Fortunately, the system I was examining was Windows 2012, and as such, had a well-populated AmCache.hve file, from which I extracted the following:
File Reference: da2700001175c
LastWrite : Thu Mar 30 11:40:09 2017 Z
Path : C:\Users\Admin\AppData\Roaming\badfile.exe
Company Name : Microsoft Corporation
Product Name : Windows Installer - Unicode
File Descr : Windows® installer
Lang Code : 1033
SHA-1 : 0000b4c5e18f57b87f93ba601e3309ec01e60ccebee5f
Last Mod Time : Sat Jun 1 11:34:21 2013 Z
Last Mod Time2: Sat Jun 1 11:34:21 2013 Z
Create Time : Mon Jan 13 20:12:31 2014 Z
Compile Time : Thu Mar 30 09:28:13 2017 Z
From my timeline, as well as from previous experience, the LastWrite time for the key in the AmCache.hve corresponds to the first time that badfile.exe was executed on the system.
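If you want to pull these fields yourself rather than rely on a packaged parser, a hive parsing library will get you most of the way there. Here's a sketch using Willi Ballenthin's python-registry module (my choice of library for illustration, not necessarily what I used; I'm simply dumping whatever values are present rather than hard-coding the numbered value names):

from Registry import Registry

reg = Registry.Registry("AmCache.hve")
root = reg.open("Root\\File")

for volume in root.subkeys():           # one subkey per volume GUID
    for entry in volume.subkeys():      # one subkey per file reference
        # the key LastWrite time is what lined up with first execution
        print(entry.path(), entry.timestamp())
        for v in entry.values():
            print("  %s = %s" % (v.name(), v.value()))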
What's interesting is that the Compile Time value from the AmCache data is, in fact, the compile time extracted from the header of the PE file. Yes, this value is easily modified, as it is simply a bunch of bytes in the file that do not affect the execution of the file itself, but it is telling in this case.
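That value lives at a fixed location in the PE file header, so it's easy to check for yourself against what AmCache recorded. A minimal sketch (offsets are per the published PE/COFF format; the file name is hypothetical):

import struct
from datetime import datetime, timezone

def pe_compile_time(path):
    with open(path, "rb") as f:
        data = f.read(4096)
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature; the COFF
    # TimeDateStamp (seconds since the Unix epoch) sits 8 bytes past it
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    (ts,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return datetime.fromtimestamp(ts, tz=timezone.utc)

print(pe_compile_time("badfile.exe"))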
So, while on the surface it may appear that badfile.exe had been on the system for four years, digging a bit deeper into the data shows that wasn't the case at all.
The take-aways from this are:
1. Do not rely on a single data point (AppCompatCache) to support your findings.
2. Do not rely on the misinterpretation of a single data point as the foundation of your findings. Doing so is more akin to forcing the data to fit your theory of what happened.
3. The key to analysis is to know the platform you're analyzing, and to know your data...not only what is available, but also its context.
4. During analysis, always look to artifact clusters. There will be times when you do not have access to all of the artifacts in the cluster, so you'll want to validate the reliability and fidelity of the artifacts that you do have.
Saturday, April 08, 2017
Understanding File and Data Formats
When I started down my path of studying techniques and methods for computer forensic analysis, I'll admit that I didn't start out using a hex editor...that was a bit daunting and more than a little overwhelming at the time. Sure, I'd heard and read about those folks who did, and could, conduct a modicum of analysis using a hex editor, but at that point, I wasn't seeing "blondes, brunettes, and redheads...". Over time and with a LOT of practice, however, I found that I could pick out certain data types within hex data. For example, within a hex dump of data, over the years my eyes have started picking out repeating patterns of data, as well as specific data types, such as FILETIME objects.
Something that's come out of that is the understanding that knowing the structure or format of specific data types can provide valuable clues and even significant artifacts. For example, understanding the structure of Event Log records (binary format used for Windows NT, 2000, XP, and 2003 Event Logs) has led to the ability to parse for records on a binary level and completely bypass limitations imposed by using the API. The first time I did this, I found several valid records in a *.evt file that the API "said" shouldn't have been there. From there, I have been able to carve unstructured blobs of data for such records.
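The carving works because every record in the binary .evt format carries the same four-byte magic number ("LfLe") immediately after a four-byte record length. A minimal sketch of scanning an unstructured blob for candidate records (this only locates candidates and sanity-checks the length; it doesn't fully parse or validate the records):

def carve_evt_records(blob):
    # each EVENTLOGRECORD begins with a 4-byte length followed by the
    # magic "LfLe"; find the magic, then back up 4 bytes for the length
    hits = []
    idx = blob.find(b"LfLe")
    while idx != -1:
        if idx >= 4:
            length = int.from_bytes(blob[idx - 4:idx], "little")
            # 0x38 is the size of the fixed portion of an EVENTLOGRECORD;
            # the upper bound is just a sanity check
            if 0x38 <= length <= 0x10000 and idx - 4 + length <= len(blob):
                hits.append((idx - 4, length))
        idx = blob.find(b"LfLe", idx + 1)
    return hits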
Back when I was part of the IBM ISS ERS Team, an understanding of the structure of Windows Registry hive files led us to being able to determine the difference between credit card numbers being stored "in" Registry keys and values, and being found in hive file slack space. The distinction was (and still is) extremely important.
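The reason that distinction can be made at all comes down to the hive file format itself: data within each hbin is stored in cells whose four-byte size field is negative when the cell is allocated and positive when it's free, so you can tell whether a hit from a keyword or pattern search falls within live data or within slack/free space. A rough sketch of walking the cells (offsets are per the regf format; error handling is omitted):

def walk_cells(hive_path):
    with open(hive_path, "rb") as f:
        data = f.read()
    offset = 0x1000                     # hbins start after the 4KB regf header
    while offset < len(data) and data[offset:offset + 4] == b"hbin":
        hbin_size = int.from_bytes(data[offset + 8:offset + 12], "little")
        if hbin_size == 0:
            break
        cell = offset + 0x20            # cells follow the 32-byte hbin header
        while cell < offset + hbin_size:
            size = int.from_bytes(data[cell:cell + 4], "little", signed=True)
            if size == 0:
                break                   # malformed cell; bail out of this hbin
            # negative size = allocated (in use), positive size = free
            yield cell, abs(size), size < 0
            cell += abs(size)
        offset += hbin_size

Mapping search hits back to these cell boundaries is what tells you whether the numbers are being stored "in" a key or value, or are simply left over in unallocated space.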
Developing an understanding of data structures and file formats has led to findings such as Willi Ballenthin's EVTXtract, as well as the ability to parse Registry hive files for deleted keys and values, both of which have proven to be extremely valuable during a wide variety of investigations.
Other excellent examples of this include Decalage's parsing of OLE file formats, James Habben's parsing of Prefetch files, and Mari's parsing of data deleted from SQLite databases.
Other examples of what understanding data structures has led to include parsing Windows shortcuts/LNK files that were sent to victims of phishing campaigns. This NViso blog post discusses tracking threat actors through the .lnk file they sent their victims, and this JPCert blog post from 2016 discusses finding indications of an adversary's development environment through the same resource.
Now, I'm not suggesting that every analyst needs to be intimately familiar with file formats, and be able to parse them by hand using just a hex editor. However, I am suggesting that analysts should at least become aware of what is available in various formats (or ask someone), and understand that many of the formats can provide a great deal of data that will assist you in your investigation.