I recently received a more pointed question regarding the use of AI in DFIR, asking if it could be used to develop investigative plans, or to identify both direct and circumstantial evidence of a compromise.
As I started thinking about the first part of the question, my initial reaction was, "...how would you create such a thing?", but then I switched to "why?" and sort of stopped there. Why would you need an AI to develop investigative plans? Is it because analysts aren't creating them? If that's the case, then is this really a problem set for which "AI" is a solution?
About a dozen years ago, I was working at a company where the head of the IR consulting team mandated that analysts create investigative plans. I remember this specifically because the announcement came out on my wife's birthday. Several months later, the staff deemed the mandate a resounding success, but no one could point to a single investigative plan. Even a full six months after the announcement, the mandate was still considered a success, and still no one could point to a single investigative plan.
My point is, if your goal is to create investigative plans and you're looking to AI to "fill the gap" because analysts aren't doing it, then it's possible that this isn't a problem for which AI is a solution.
As to identifying evidence or artifacts of compromise, I don't believe that's necessarily a problem set that needs AI as the solution, either. Why is that? Well, how would the model be trained? Someone would have to go out and identify the artifacts, and then train the model. So why not simply identify and document the artifacts?
There was a recent post on social media regarding investigating WMI event consumers. While the linked resource includes a great deal of very valuable information, it's missing one thing...specific event records within the WMI-Activity/Operational Event Log that apply to event bindings. This information is available (it's event ID 5861) and can be developed further, and my point is that sometimes, automation is a much better solution than, say, something like AI, because what we see as the 'training set' is largely insufficient.
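To illustrate what I mean by automation, here's a minimal sketch of pulling those binding records out of an exported copy of the log. It assumes the python-evtx module is installed (pip install python-evtx), and the log file name is just the default export name; adjust both for your own collection process.

```python
# A minimal sketch, assuming python-evtx is installed (pip install python-evtx).
# The log file name below is hypothetical; point it at your exported copy of
# the Microsoft-Windows-WMI-Activity/Operational Event Log.
from Evtx.Evtx import Evtx

LOG = "Microsoft-Windows-WMI-Activity%4Operational.evtx"

def find_binding_events(evtx_path):
    """Yield the XML of event ID 5861 records, which are logged when a
    FilterToConsumerBinding (an event consumer binding) is registered."""
    with Evtx(evtx_path) as log:
        for record in log.records():
            xml = record.xml()
            # Substring check keeps this simple; a production tool would
            # parse the XML and extract the consumer/filter details.
            if ">5861</EventID>" in xml:
                yield xml

if __name__ == "__main__":
    for event_xml in find_binding_events(LOG):
        print(event_xml)
        print("-" * 72)
```

A few lines of deterministic code like this surface the exact records of interest every time, with no training required.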
What do I mean by that? One of the biggest, most recurring issues I continue to see in DFIR circles is the misrepresentation (sometimes subtle, sometimes gross) of artifacts such as AmCache and ShimCache. Sources like these are very often incomplete and ambiguous, leaving pretty significant gaps in our understanding of the artifacts; if they're what constitutes the 'training set' for an AI/LLM, then where does that leave us when the output of these models is incorrect? And at this point, I'm not even talking about hallucinations, just models being trained on incorrect information.
Expand that beyond individual artifacts to a SOAR-like capability, and the issues and problems simply compound as complexity increases. Then take it another step (or leap) further, going from a SOAR capability within a single organization to something much more complex, such as an MDR or MSSP. Training a model in a single environment is complex enough, but training a model across multiple, often wildly disparate environments increases that complexity by orders of magnitude. Remember, one of the challenges all MDRs face is that what is a truly malicious event in one environment is often a critical business process in another.
Okay, let's take a step back for a moment. What about using AI for other tasks, such as packet analysis? Well, I'm so glad you asked! Richard McKee had that same question, and took a look at passing a packet capture to DeepSeek:
The YouTube video associated with the post can be found here.
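In the same spirit, a lot of the value in packet analysis comes from deterministically summarizing the capture before (or instead of) handing it to a model. Here's a minimal sketch, assuming scapy is installed (pip install scapy); the pcap file name is hypothetical.

```python
# A minimal sketch, assuming scapy is installed (pip install scapy).
# The capture file name is hypothetical; point it at your own pcap.
from collections import Counter

from scapy.all import rdpcap
from scapy.layers.inet import IP, TCP, UDP

PCAP = "capture.pcap"

def summarize(pcap_path, top_n=10):
    """Print a deterministic per-conversation summary of a packet capture."""
    conversations = Counter()
    for pkt in rdpcap(pcap_path):
        if IP not in pkt:
            continue
        proto = "TCP" if TCP in pkt else "UDP" if UDP in pkt else "IP"
        dport = pkt[TCP].dport if TCP in pkt else pkt[UDP].dport if UDP in pkt else 0
        conversations[(pkt[IP].src, pkt[IP].dst, proto, dport)] += 1

    for (src, dst, proto, dport), count in conversations.most_common(top_n):
        print(f"{src} -> {dst} {proto}/{dport}: {count} packets")

if __name__ == "__main__":
    summarize(PCAP)
```

The point isn't that an LLM can't look at packets; it's that a summary like this is repeatable and verifiable, and you can still hand it to a model (or an analyst) if you want the narrative.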
Something else I see mentioned quite a bit is how AI is going to impact DFIR by allowing "bad guys" to uncover zero-day exploits. That's always been an issue; what AI likely changes is that the bad guys will cycle faster on developing and deploying those exploits. However, this is only really an issue for those who aren't prepared; if you don't have an asset inventory (of both systems and applications), haven't done anything to reduce your attack surface, haven't established things like patching and IR procedures...oh, wait. Sorry. Never mind. Yeah, that's going to be an issue for a lot of folks.