Wednesday, June 28, 2023

Interview Questions

There's been a lot of ink spilled on resume recommendations and interview preparation over the years, and I feel like there's been even more lately, given the number of folks looking to transition into one of the cybersecurity fields, as well as the tech layoffs we've seen since last year.

One of the topics I don't really see being addressed is questions you, the interviewee, can ask of the interviewer when it's your turn. As an interviewee, you're probably preparing yourself for the technical questions you're likely to face, but there are other aspects to the role.

I was once in a role where an organization was trying to start a consulting arm, so they hired a manager and one or two technical folks in each of two offices. Not long after starting, I found that the analyst in the other office was going to conduct a pen test for a customer; that analyst had previously worked at ISS, where they'd installed the RealSecure IDS product for that customer. I won't bore you with the drama, but suffice to say that I was able to get a copy of the "pen test" report; the engagement amounted to nothing more than running ISS's Internet Scanner product against the customer's systems. As our team hadn't generated revenue yet, we didn't have licenses for ISS's products, or anyone else's. As such, the license used to run Internet Scanner was hacked, which the RealSecure product could detect. I gave notice after I found out that management had no intention of addressing the issue.

So, a question to ask interviewers at both the technical and management levels is: you find out that tools were used on an engagement without valid licenses...what do you do?

Other questions you could ask include:

You find out that tools used on an engagement were run outside the bounds of the license agreement; what do you do?

You find out that a DFIR report or SOC ticket (depending upon the role you're interviewing for) submitted to a customer is grossly incorrect; not just "it could be misinterpreted" but what was reported to the customer was just flat out wrong. What do you do?

As the interviewee, the answers to these questions will give you insight into what to expect at the organization, should you accept an offer. 

However, interviewers can also ask these questions, and gain insights themselves.

And yes, these questions are based on my experiences in the field.

Monday, June 26, 2023

Validation - This Time, Tool Validation

I've posted previously on validation, and more recently, on validation of findings. In my recent series of posts, I specifically avoided the topic of tool validation, because while tool validation precedes the validation of findings, and there is some overlap, I thought it was important to separate the two so that the discussion didn't go astray.

Well, now the topic of tool validation within DFIR has popped up again, so maybe it's time to address it yet again.

So, early on in my involvement in the industry, yes, running two or more tools against a data source was considered by some to be a means of tool validation. However, over time and as experience and knowledge grew, the fallacy of this approach became more and more apparent. 

When I first left active duty, one of my first roles in the private sector was performing vulnerability assessments. For the technical aspect (we did interviews and reviewed processes, as well) we used ISS's Internet Scanner, which was pretty popular at the time. When I started, my boss, a retired Army Colonel, told me that it would take me "about 2 to 3 years of running the tool to really understand it." Well, within 6 months, I was already seeing the need to improve upon the tool and started writing an in-house scanner that was a vast improvement over the commercial tool.

The reason for this was that we'd started running into questions about how the tool did its "thing". In short, it would connect to a system, run a query, process the results, and present a "decision" or finding. We started to see disparities between the tool's findings and what was actually on the system, and as we began to investigate, we were unable to get any information regarding how the tool was making its decisions...we couldn't see into the "black box".

For example, we scanned part of an infrastructure, and the tool reported that 21 systems had AutoAdminLogon set. We also scanned this infrastructure with the tool we were developing, and found that one system had AutoAdminLogon set; the other 20 systems had the AutoAdminLogon value, but it was set to "0", and the admin password was not present in the Registry (in plain text). So, technically, AutoAdminLogon was not actually set on 20 of those 21 systems, and the customer knew it. Had we delivered the report based solely on the tool output, we would've had a lot of 'splainin' to do.
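
For illustration, here's a minimal sketch of what a more careful check might look like, using Python's winreg module against the standard Winlogon key; this isn't the in-house scanner we wrote, just the idea behind it:

# Sketch: report autologon as "set" only when the AutoAdminLogon value is
# non-zero AND a plaintext DefaultPassword is present, rather than reporting
# on the mere existence of the value. Assumes it's run locally with admin rights.
import winreg

WINLOGON = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon"

def autologon_configured():
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, WINLOGON) as key:
        try:
            auto, _ = winreg.QueryValueEx(key, "AutoAdminLogon")
        except FileNotFoundError:
            return False                     # value not present at all
        try:
            pwd, _ = winreg.QueryValueEx(key, "DefaultPassword")
        except FileNotFoundError:
            pwd = None
        # AutoAdminLogon is typically a REG_SZ of "0" or "1"
        return str(auto).strip() == "1" and bool(pwd)

if __name__ == "__main__":
    print("AutoAdminLogon effectively set:", autologon_configured())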

When I was working for a consulting company and waiting for a gubmint clearance to come through, I heard a senior analyst tell one of the junior folks that Nessus would determine the version of Windows it was running against by firing off nmap-like scans, and associating the responses it received with the drivers installed in various versions of Windows. I thought this was fascinating and wanted to learn more about it, so I downloaded Nessus and started opening and reading the plugins. It turned out that Nessus determined the version of Windows via a Registry query, which meant that if Nessus was run without admin credentials, it wasn't going to determine the version.
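
Not the actual Nessus plugin, obviously, but a quick sketch of the same idea ("server01" below is a hypothetical host name); the point being that without credentials that allow remote Registry access, this approach simply returns nothing:

# Sketch: determine the Windows version via a remote Registry query; without
# credentials that permit remote Registry access, the query fails and no
# version is reported.
import winreg

def remote_windows_version(host):
    try:
        hive = winreg.ConnectRegistry(r"\\" + host, winreg.HKEY_LOCAL_MACHINE)
    except OSError:
        return None   # no access; the version can't be determined this way
    with winreg.OpenKey(hive, r"SOFTWARE\Microsoft\Windows NT\CurrentVersion") as key:
        product, _ = winreg.QueryValueEx(key, "ProductName")
        build, _ = winreg.QueryValueEx(key, "CurrentBuild")
    return f"{product} (build {build})"

print(remote_windows_version("server01"))   # "server01" is a hypothetical host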

While I was part of the IBM ISS X-Force ERS team, and we were conducting PCI forensic response, Chris and I found that our process for scanning for credit card numbers had a significant flaw. We were using EnCase at the time, which used a built-in function named "IsValidCreditCard()". We had a case where JCB and Discover credit cards had reportedly been used, but we weren't seeing any in the output of our scanning process. So, we obtained test data from the brands, and ran our scan process across just that data, and still got no results. It turned out that even with track 1 and track 2 data, the IsValidCreditCard() function did not determine JCB and Discover cards to be "valid". So, we worked with someone to create a replacement function to override the built-in one; our new function had 7 regular expressions, and even though it was slower than the built-in function, it was far more accurate.
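
I don't have that replacement function any longer, but a rough sketch of the idea looks like this; the patterns below cover only the common JCB and Discover ranges (the real function had seven patterns), with a Luhn check to weed out random digit runs:

# Sketch: brand-specific regexes plus a Luhn check, the same basic approach
# as the replacement function. Simplified; does not cover every BIN range.
import re

PATTERNS = {
    "JCB":      re.compile(r"\b35(2[89]|[3-8][0-9])\d{12}\b"),     # 3528-3589, 16 digits
    "Discover": re.compile(r"\b6(011|4[4-9]\d|5\d{2})\d{12}\b"),   # 6011, 644-649, 65
}

def luhn_ok(number):
    digits = [int(d) for d in number][::-1]
    total = sum(digits[0::2]) + sum(sum(divmod(d * 2, 10)) for d in digits[1::2])
    return total % 10 == 0

def find_cards(text):
    for brand, pat in PATTERNS.items():
        for m in pat.finditer(text):
            if luhn_ok(m.group(0)):
                yield brand, m.group(0)

# "3530111333300000" is a published JCB test number, not a real card
for brand, num in find_cards("track data ... 3530111333300000 ..."):
    print(brand, num)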

Finally, in the first half of 2020, right as the pandemic kicked off, I was working with a DFIR consulting team that used a series of open source tools to collect, parse, and present DFIR data to analysts. MS had published a blog post on human-operated ransomware, and identified a series of attacks where WMI was used for persistence; however, when the team encountered such attacks, they were unable to determine definitively whether WMI was used for persistence. The necessary data source had been collected, but the open source middleware that parsed the data did not include a module for that particular data source. As such, relying on the output of the tool left gaps that weren't recognized.

So, what?
So, what's the point of reliving history? The point is that tool validation is very often not about running two (or more) different tools and seeing which presents "correct" results. After all, different tools may use the same API or the same process under the hood, so what's the difference? Running two tools that do the same thing under the hood isn't really running two different tools, and it's not tool validation. This is particularly true if you do not control the original data source, which is how most DFIR analysis works.

As alluded to in the examples above, tool validation needs to start with the data source, not the tool. What data source are you working with? A database? Great, which "style"? MySQL? SQLite? EDB? How does the tool you're using work? Is it closed source, or open source? If it's open source, and written in Python or Perl or some other interpreted language, what are the versions of the libraries on which the tool is based? I've seen cases where a script didn't throw an error, but presented different data depending on the version of the library it was run against.
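
One simple habit that helps here is to record the interpreter and library versions right alongside the tool's output, so "same tool, different results" has somewhere to start. Just a sketch; "python-registry" below is only an example dependency:

# Sketch: capture interpreter and parsing-library versions alongside output,
# so that differences in results can be traced back to the environment.
import sys
import json
from importlib import metadata

def environment_report(dependencies=("python-registry",)):
    report = {"python": sys.version}
    for dep in dependencies:
        try:
            report[dep] = metadata.version(dep)
        except metadata.PackageNotFoundError:
            report[dep] = "not installed"
    return report

print(json.dumps(environment_report(), indent=2))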

This is also why I strongly recommend that analysts validate their findings. Very often there are direct or indirect artifacts in other data sources that are part of an artifact constellation that serves to validate a finding; if you run your tool and see something "missing" or added, you can then look to the overall constellation and circumstances to determine why that might be. 

Saturday, June 24, 2023

DFIR Core Principles

My copy of "Forensic Discovery"
There are a lot of folks new to the cybersecurity industry, and in particular DFIR, and a lot of folks considering getting into the field. As such, I thought it might be useful to share my view of the core, foundational principles of DFIR, those basic principles I return to again and again during investigations, as well as over the course of time. For me, these principles were developed initially through a process of self-education, reading all I could from those who really stood out in the industry. For example, consider the figure to the right...this is what pages 4 and 5 of my copy of Forensic Discovery by Farmer and Venema look like. The rest of the pages aren't much different. I also have a copy of Eoghan Casey's Handbook of Digital Forensics and Investigations, which is in similar "condition", as are several other books, including my own.

The thing we have to remember about core principles is that they don't change over time; Forensic Discovery was published in 2005, and Casey's Handbook five years later. But those principles haven't changed just because the Windows operating system has evolved, or new devices have been created. In fact, if you look at the index for Farmer and Venema's book, the word "Windows" never appears. My last book was published in 2018, and the first image covered in the book was Windows XP; however, neither of those facts invalidates the value of the book, as it addresses and presents the analytic process, which, at its root, doesn't significantly change.

The principles I'm going to share here do not replace those items discussed through other media; not at all. In fact, these principles depend on, and expand upon, the topics presented in other books.

Principle 1
The first thing you have to understand about computer systems is that nothing happens on a computer system by itself; that is, everything you see is the result of some action.

I know this sounds rudimentary, and I apologize if it sounds overly simplified, but over the course of my career (spanning more than two decades at this point) in various roles in DFIR, one of the biggest obstacles I've encountered when discussing a response with other analysts is conveying that things don't just happen for no reason. Yes, it's entirely possible that any given, random bit on a hard drive may change state due to a fluctuation of some kind, but when it comes to a field in an MFT record (deleted vs. in-use file) or a Registry value changing state (1 to 0, or the reverse), these things do not simply happen by themselves.

Let's say, for example, that a SOC analyst receives an alert that the "UseLogonCredential" value has been set to "1". This is a pretty good detection indicating that something bad has already happened, and that something bad is likely to happen in the very near future, as well. However, this does not just happen...someone needs to access the system (via keyboard or remotely) with the appropriate level of privileges, and then needs to run an application (RegEdit, reg.exe, another program that accesses the appropriate API functions...) in order to make the change.
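
To make that detection concrete, here's a minimal sketch of checking the value in question; the key path below is the standard WDigest location for this setting:

# Sketch: check the WDigest UseLogonCredential value the alert fired on.
# A value of 1 tells WDigest to keep plaintext credentials in memory.
import winreg

WDIGEST = r"SYSTEM\CurrentControlSet\Control\SecurityProviders\WDigest"

def use_logon_credential():
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, WDIGEST) as key:
        try:
            value, _ = winreg.QueryValueEx(key, "UseLogonCredential")
        except FileNotFoundError:
            return None   # value not present (the default on current builds)
        return value

print("UseLogonCredential:", use_logon_credential())

Whether the change was made via RegEdit, reg.exe, or an API call, the value itself looks the same; it's the surrounding artifacts (who was logged on, what process made the change) that tell you how it got there.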

Principle 2
Locard's Exchange Principle is one of Chris Pogue's favorites, to the point where he discusses it in his courses at OSU! This principle states that when two objects come into contact with each other, material is exchanged between them. This applies to the digital realm, as well; when two computers come into "contact", "material" or data regarding the connection and interaction is exchanged between them. Some of this data may be extremely transient, but thanks to the usability features added to operating systems and applications over the years, the fossilization of this data begins pretty quickly. That is to say that some of these artifacts are "stored" or logged, and those log entries can exist for varying amounts of time. For example, a record written to the Security Event Log may be overwritten within a few days (or even hours, depending upon the audit configuration and activity on the endpoint), but records written to other Windows Event Logs may exist for years without the risk of being overwritten. Evidence of activity may also be written to the Registry, where it may exist until explicitly removed.

But the point of this principle is that something, some artifact of the activity, will be created as a user or threat actor interacts with an endpoint, and may continue to exist for a significant period of time.

Principle 3
This brings us to the third principle, direct vs. indirect artifacts. This is something of a reiteration of section 1.7 (Archeology vs Geology) of Farmer & Venema's book; table 1.3 at the bottom of pg 13 essentially says the same thing. However, this principle needs to be extended to address more modern operating systems and applications; that is, when something happens on an endpoint...when a program is executed, or when a user or threat actor interacts with the endpoint in some way...there are artifacts that are created as a direct result of that interaction. For example, a threat actor may copy a file over to the endpoint, writing it to the file system. Then they may execute that program, redirecting the output to a file, again writing to the file system.

Think of this as a video camera pointed directly at the "scene of the crime", recording direct interactions between the threat actor and the target victim.

There are also "indirect" artifacts, which are those artifacts created as a result of the program or threat actor interacting with the ecosystem or "environment". 

A great way to think of indirect artifacts is having video cameras near the scene of a crime, but not pointed directly at the scene itself. There may be a video camera across the street or around the corner, pointed in a different direction, but it captures video of the threat actor arriving in a car, and then leaving several minutes later. You may notice that the back seat of the car seems to be fuller than when it arrived, or the end of the car near the trunk (or "boot") may be lower to the ground, but you do not see exactly which actions occurred that resulted in these apparent changes.

A great thing about both direct and indirect artifacts is "fossilization", something mentioned earlier, and to be honest, borrowed (okay, stolen) from Farmer and Venema. Everything that happens on an endpoint leaves artifacts, and in a great many cases, those artifacts are extremely transient. Simply put, depending upon where those artifacts exist in the order of volatility, they may only exist for a very short period of time. In their book, Farmer and Venema discussed "fossilization" specifically in the context of deleted files on *nix-based file systems. Operating systems have grown and evolved since the book was published, and a great deal of usability features have been added to operating systems and applications, significantly extending this fossilization. As such, while direct artifacts of user or threat actor interaction with an endpoint may not persist for long, fossilization may lead to indirect artifacts existing for days, months, or even years.

For example, let's say a threat actor connects to an endpoint; at that point, there is likely some process in memory, which may not exist for long. That process memory will be allocated, used, and then freed for later use, and given how "noisy" Windows systems are, even when apparently idle, that memory may be reused quickly. However, direct artifacts from the connection will very often be logged, depending upon the means and type of access, the audit and logging configuration of the endpoint, etc. If this process results in the threat actor interacting with the endpoint in some way, direct and indirect artifacts will be logged or "fossilized" on the endpoint, and depending upon the configuration, use, and subsequent interaction with the endpoint, those fossilized artifacts may exist for an extended period of time, even years.

Monday, June 19, 2023

The Need for Innovation in DFIR

Barely a week goes by without yet another post on social media discussing knowledge sharing or "training" in cybersecurity, and in particular, DFIR and Windows forensic analysis. However, many times, these posts aren't "new", per se, but instead share information that is dated.

Now, there's nothing wrong with what many perceive to be "old" or "dated" information, because the simple fact is that core principles don't change over time. However, there are more tactical elements of "analysis" (really, data parsing and presentation for analysis) that may, in fact, change over time. This is particularly true for Windows systems, especially as it applies to the various builds available for Windows 10; analysts saw the creation or availability of a particular artifact (e.g., the "BAM" Registry key) in one build, only to no longer see it populated in another build.

Something else we see is an incorrect or incomplete use of terminology, which in some cases seems to be almost willful. We see this a lot when it comes to the Windows Registry, but that's really fodder for its own post/article. However, there are posts like this one, which purports to share "important Windows directories", yet the first six items listed are files. Further, items 4 through 6 are not "logs". Over the past several months, I've seen that particular list posted multiple times on LinkedIn, and just last week, it somehow made its way into my Twitter feed, unfortunately.

Something else we see quite often are posts that reference the Windows Event Logs and claim to share important records, but present those records based solely on event IDs. The issue here is that event IDs are not unique. While most of us are familiar with event ID 4624, it means one thing when the event source is "Microsoft-Windows-Security-Auditing", and something else entirely when the event source is "Microsoft-Windows-EventSystem".
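
If you're writing your own tooling, the fix is simple: filter on the provider (source) and the event ID together. A minimal sketch, assuming the records were exported as XML (for example, via "wevtutil qe Security /f:xml > security.xml"; the file name is just an example):

# Sketch: match event records on provider *and* event ID, not event ID alone.
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

def matching_events(path, provider, event_id):
    # wevtutil emits a stream of <Event> elements; wrap them in a root node
    with open(path, encoding="utf-8-sig", errors="replace") as f:
        root = ET.fromstring("<Events>" + f.read() + "</Events>")
    for event in root.findall("e:Event", NS):
        system = event.find("e:System", NS)
        prov = system.find("e:Provider", NS).get("Name")
        eid = system.find("e:EventID", NS).text
        if prov == provider and eid == str(event_id):
            yield event

for evt in matching_events("security.xml", "Microsoft-Windows-Security-Auditing", 4624):
    print(ET.tostring(evt, encoding="unicode")[:120])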

So What?
Okay, so what? Who cares? We see this all the time, it's pretty common...no, I take that back, it's really common, everyone's doing it, so what?

Well, I think we can all agree that the bad guys are innovating day-in and day-out, and the only way the "blue" side is going to keep up or even get ahead is by innovating in our own way. As such, the first step is to move beyond the "...this is the way we've always done it..." approach, to use that as a stepping stone or launching point for implementing a new model for DFIR analysis, one that bakes the lessons of the last incident back into the process so that those lessons are applied to subsequent incidents/analysis. 

We see some application of an approach like this for the SOC with the development of SOAR capabilities, and while DFIR can be thought of as an additional tier of the SOC, or as a standalone service, this isn't something that has seen much in the way of development within DFIR. Many (albeit not all) DFIR shops follow the same basic model...collect data, assign it to an analyst to parse and provide their findings. Unfortunately, what we tend to see in consulting firms, and mirrored by some in-house teams, is a 1:3 or even 1:4 ratio between analysts and licensed forensic suites, with each analyst following their own process. Many times, this process is not documented, and is simply run from memory; add to that the disparity in knowledge and experience between analysts working in isolation, and you can see how this model is inefficient and error-prone.

So, let's stop following the old way of doing things. Let's stop referring to Windows Event Log records solely by event ID. Let's stop viewing individual artifacts or data sources in isolation, and instead start correlating artifacts and events, looking at them side-by-side based on time stamps. If we then use a common set of tools, or a common framework for parsing and correlating data sources, we can then build on that process and begin decorating and enriching data, much like what I've been doing with Events Ripper (it's important to note that I'm not the only one who can write Events Ripper plugins; so far, I'm just the only one who does).

This way, we can use automation to build on and share our own experiences, so that others may benefit from those experiences, without having to go through the investigative process themselves. This gets analysts to the point of actually conducting analysis sooner, so that there is more of an opportunity to discover new artifacts and associations, in a much more efficient manner.

Monday, June 05, 2023

Events Ripper Update

Yet again, recent incidents have led to Events Ripper being updated. This time, it's an updated plugin, and a new plugin.

appissue.pl - I updated this plugin based on Josh's finding and Tweet; I can't say that I've ever seen this event before, but when Josh mentioned it, I thought, hey, this is a great way to go about validating activity! Okay, so here's a batch file, and we see commands run via EDR telemetry...but do they succeed?? We may assume that they do, but it's a pretty straightforward process to validate these findings; in the incident that Josh reported, it turns out that the driver being loaded failed because it was blocked. Correlate that event with the other two events that Josh pointed out, and you've got pretty decent evidence indicating that while an infection was attempted and the driver was created within the file system, it's not loading. This gives us some headspace for our investigation, and provides evidence we can report to regulatory oversight bodies, etc.

sec5381.pl - I created this plugin as a result of analysis conducted during a recent investigation into the use of credential theft tools. We'd seen the use of a number of credential theft tools...lazagne, mimikatz, VNCPassView, PasswordFox64, etc. Some of these triggered alerts, so I started by creating a timeline from the EDR telemetry. As our telemetry does a great job of illustrating the process tree, I was able to tie all of these alerts, as well as other processes identified by analysts hunting for additional activity, to the original login session. As I began using the process creation time stamps to pivot into a timeline created from WEVTX data, I began to see several Microsoft-Windows-Security-Auditing/5381 records; in two instances, these events correlated to the use of WebBrowserPassView and IEPV.

What's interesting about this event record is that it includes the logon ID; I ran the following commands:

type events.txt | find "<logonID>" > id_events.txt
parser -f id_events.txt > id_tln.txt

I now had a micro-timeline of just the events associated with the unique logon ID, so I could see the breadth of the activity (i.e., when it started, when it ended, what occurred during that time, etc.). Depending upon the audit configuration of the endpoint, there are a number of other event records that also contain the logon ID, and would be included in our micro-timeline. However, even without those, we can fill in "gaps" via other means, such as just looking at the span of time, from the "oldest" to the most recent event from our "logon ID" search.

By the way, this approach is nothing new. I've used this technique...used simple DOS commands...to navigate, pivot, and get a specific view into timelines for some time now. This is part of the reason why I persist in maintaining the intermediate file ("events.txt") in a flat text file; it's easy to parse and enumerate with simple commands (no SQL queries needed), it's easy to combine multiple timelines, and the files compress really well for storage and shipping. Because they're text-based, they also don't need any special tools to access; doing a search in Notepad++, for example, I can choose the "Count" option to see how many times the search term appears in the file, and I can search backward to find the earliest event associated with that term. 
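
For those who prefer a script over Notepad++, the same "count, earliest, latest" check takes only a few lines. This is just a sketch, and it assumes the intermediate events file uses a pipe-delimited, TLN-style layout with a Unix epoch timestamp in the first field:

# Sketch: count events containing a search term in an events.txt-style file
# and report the span from the earliest to the most recent matching event.
# Assumes lines look like: epoch|source|system|user|description
import sys
from datetime import datetime, timezone

def span(events_file, term):
    hits = []
    with open(events_file, encoding="utf-8", errors="replace") as f:
        for line in f:
            if term.lower() in line.lower():
                hits.append(int(line.split("|", 1)[0]))
    if not hits:
        return f"no events containing '{term}'"
    fmt = lambda t: datetime.fromtimestamp(t, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return f"{len(hits)} events, from {fmt(min(hits))} to {fmt(max(hits))} UTC"

if __name__ == "__main__":
    # usage (hypothetical logon ID): python span.py events.txt 0x3E7
    print(span(sys.argv[1], sys.argv[2]))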

Thursday, June 01, 2023

Events Ripper Update

Working a recent incident, I came across something very unusual. I started by going back to a previous investigation of the same endpoint, conducted about a month earlier, and extracting the WEVTX files collected as part of that investigation. So, the WEVTX files were retrieved from the endpoint on 30 Apr, and when I created the timeline, I found that the four most recent time segments were from 1 June 2024...that's right, 2024!

As I was using some of the indicators we already had (file and process names) to pivot into the timeline, I saw that I had Security Event Log records from 2020...now, that is weird! After all, it's not often that I see Security Event Log records going back a week or a month, let alone three years!

Another indicator was the sessions.pl output from Events Ripper; I had logins lasting 26856 hours (1119 days), and others lasting -16931 hours (over 705 days). Given how the session span is calculated (essentially, the logoff time minus the logon time for a given logon ID), I knew something was "off" in the Security (and very likely, other) Event Logs, particularly in the records associated with logon and logoff events.

I knew something was up, but I also knew that finding the "what was up" was also based largely on my experience, and might not be something a new or more junior analyst would be familiar with. After all, if an analyst was to create a timeline (and I'm seeing every day that it's a pretty big "if"), and if they were pivoting off of known indicators to build context, then how likely would it be that they had the experience to know that something was amiss?

So, naturally, I wrote an Events Ripper plugin (timechange.pl) to pull Security-Auditing/4616 event records from the Security Event Log and display the available information in a nice table. The plugin collects all of these events, with the exception of sub-second time changes (which can be fairly common), and displays them in a table showing the user, the time changed from, the time changed to, and via which process. I wrote the plugin, and it produced an entry on the next investigation...not one that had much impact on what was going on, as the system clock was updated by a few minutes, but this simply shows me how the use of plugins like this can be very valuable for elevating interesting and important artifacts to the analyst for review without requiring that analyst to have extensive experience.
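
The actual plugin is written for Events Ripper, but the idea is easy to sketch on its own. The sketch below assumes records exported as XML (e.g., "wevtutil qe Security /f:xml > security.xml"; the file name is just an example) and uses the standard 4616 EventData field names (SubjectUserName, PreviousTime, NewTime, ProcessName):

# Sketch of the same idea as timechange.pl: list 4616 ("the system time was
# changed") events, skipping sub-second adjustments.
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

def parse_time(ts):
    # times look like 2023-06-01T12:00:00.1234567Z; trim to microsecond precision
    return datetime.strptime(ts[:26].rstrip("Z"), "%Y-%m-%dT%H:%M:%S.%f")

def time_changes(path, min_delta_seconds=1.0):
    with open(path, encoding="utf-8-sig", errors="replace") as f:
        root = ET.fromstring("<Events>" + f.read() + "</Events>")
    for event in root.findall("e:Event", NS):
        if event.find("e:System/e:EventID", NS).text != "4616":
            continue
        data = {d.get("Name"): d.text for d in event.findall("e:EventData/e:Data", NS)}
        old, new = parse_time(data["PreviousTime"]), parse_time(data["NewTime"])
        if abs((new - old).total_seconds()) >= min_delta_seconds:
            print(data.get("SubjectUserName"), old, "->", new, "via", data.get("ProcessName"))

time_changes("security.xml")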