MITRE ATT&CK Evaluations Return: More Coverage, More Nuance

MITRE released a new round of MITRE ATT&CK enterprise evaluations today. This round brought big changes. First off, only 11 vendors participated, down from the 19 that participated in 2024. The most notable absences include SentinelOne, Microsoft, and Palo Alto Networks. Overall, it seems that some vendors prioritized their own internal product efforts over the evaluation, likely due to investment in other areas, market and economic dynamics, and changes in the landscape.

Forrester strongly believes in the power of unbiased, third-party evaluations, especially of security products. Security products can sometimes be a black box. Evaluations like these, especially when the data used is shared, make capabilities a little less opaque.

Round 7: Breaking New Ground

This round emulated Scattered Spider, a financially motivated cybercriminal collective, and Mustang Panda, a PRC espionage group.

The MITRE ATT&CK team made big changes to the evaluation infrastructure to better reflect real-world conditions. The environment had more endpoints and subnets, built out into a more complex network topology. Much like last round, when the team expanded coverage to macOS, this year it expanded coverage to the cloud in addition to Windows and Linux devices.

The evaluations also expanded the scope to additional telemetry sources like identity, email, and cloud. For example, some of the emulations included identity compromise through SSO and MFA as well as abuse of cloud services.

MITRE included unmanaged devices in the evaluation, which exposed a blind spot for many providers. Unmanaged devices emulate real-world environments where organizations have BYO devices without managed agents, third-party contractors accessing resources on-premises or remotely, or test networks where endpoints won’t run standard protections.

A nuance worth noting is that the vendor tools used in this round vary widely. In past years, most vendors tested their EDR tool, but in this round, vendors used a variety of modules together. For example, Trend Micro used modules from its Vision One platform, including endpoint security, network security, cloud security, and exposure management. WithSecure used its EPP, XDR, and exposure management capabilities. Cyberani used a combination of SIEM, XDR, TIP, and sandbox analysis, all part of its MDR service.

Detection Tests: Why Are We Still Dealing With Hundreds Of Alerts?

There were two detection tests, emulating Scattered Spider and Mustang Panda. Both leveraged an array of LOLBins, tool downloads, and many different devices across the network. The detection tests also included the Reconnaissance tactic, specifically phishing, which is new for this round and expands the detection window.

Importantly, there’s a clear distinction between the vendors that generated many alerts and those that generated very few, with each one correlated and carrying full context. Vendors like CrowdStrike, Cybereason, and ESET generated only a handful of detections for each scenario. Those that generated very few were not necessarily seeing less. Instead, as is a theme across the industry, vendors are more effectively consolidating related alerts into single cases instead of inundating users with a barrage of disparate alerts. Others, like Sophos and Trend Micro, generated hundreds of alerts. Some of those may be suppressed in the console, as many fall into the medium or low severity categories. Even so, the market is moving toward consolidating alerts into cases, and all vendors in this evaluation should be doing the same.
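To make the idea of consolidating alerts into cases concrete, here is a minimal sketch in Python. It is not any vendor's actual correlation logic: the `Alert` fields, the `consolidate` function, and the 30-minute window are all assumptions chosen for illustration. Real products correlate on far more than host and time (identity, process lineage, cloud resources, and so on).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # All field names are hypothetical, chosen only for this illustration.
    host: str            # device that raised the alert
    timestamp: datetime  # when the alert fired
    severity: str        # e.g. "low", "medium", "high"
    technique: str       # e.g. an ATT&CK technique ID


def consolidate(alerts, window=timedelta(minutes=30)):
    """Group alerts from the same host into cases.

    An alert joins the host's most recent case if it fires within
    `window` of the last alert in that case; otherwise it opens a new case.
    """
    cases = []
    latest_case = {}  # host -> most recent case (a list of alerts)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        case = latest_case.get(alert.host)
        if case is not None and alert.timestamp - case[-1].timestamp <= window:
            case.append(alert)
        else:
            case = [alert]
            cases.append(case)
            latest_case[alert.host] = case
    return cases
```

Under these assumptions, three alerts on one host within a few minutes become a single case for the analyst to open, rather than three separate items in a queue.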

Protection Tests

There were seven tests, one for each stage: credential theft, identity providers, unmanaged to managed devices, initial access malware execution, malware execution and lateral movement, false positives, and AWS compromise.

The goal of the protection tests wasn’t just to show a “stopping of the threat,” but to measure the impact: was the attack stopped before the threat actor had a chance to gain persistence or steal credentials? This shows the importance of not only detecting an attack in progress but also stopping it before it exposes the environment.
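One way to reason about impact is to ask which tactics the adversary completed before the first block. The Python sketch below illustrates that question only. It is not MITRE's scoring method, and the function names, the step format, and the choice of "critical" tactics are assumptions made for this example.

```python
def tactics_before_block(steps, blocked_at):
    """Return the tactics completed before the first blocked step.

    `steps` is an ordered list of (step name, ATT&CK tactic) pairs.
    `blocked_at` is the index of the first blocked step, or None if
    the product never blocked anything.
    """
    completed = steps if blocked_at is None else steps[:blocked_at]
    return [tactic for _, tactic in completed]


def stopped_before_impact(steps, blocked_at,
                          critical=("Persistence", "Credential Access")):
    """True if the block landed before any critical tactic completed."""
    reached = set(tactics_before_block(steps, blocked_at))
    return not reached.intersection(critical)


# Hypothetical attack chain, for illustration only.
example_steps = [
    ("phishing email opened", "Initial Access"),
    ("payload runs", "Execution"),
    ("run key added", "Persistence"),
    ("credentials dumped", "Credential Access"),
]
```

With this example chain, a block at index 2 (the persistence step itself) counts as stopped before impact, while a block at index 3 does not, because persistence was already established by then.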

The MITRE ATT&CK team also included a protection test that measured false positives. In this test, every activity that took place was considered non-malicious and was supposed to be reported as such. If the vendor blocked a particular action, it was a false positive. Ideally, zero security alerts should be generated during that test. Of all the vendors, Cybereason, Cynet, and Sophos blocked activity during that test, and each of those blocks was a false positive.

Test 2, which focused on an adversary manipulating IdP trust relationships, was dropped due to the difficulty of distinguishing legitimate administrative activities from malicious actions. This is why you’ll see no responses for that test if you’re looking at the results.

The Need For Third-Party Testing

Given the many market conversations and the lower-than-average turnout in this round of testing, it’s worth addressing the future of third-party testing like this and its impact on the security community. Many practitioners Forrester speaks with struggle to interpret and understand the results of these evaluations, and for good reason: there’s a lot of data, and the MITRE ATT&CK team hasn’t made a judgment call on which outcomes signal better performance. Even so, tests like these are important, especially when they are given room to evolve.

MITRE ATT&CK made many changes in this round for the better: incorporating cloud, building a more realistic environment to test in, continuing to incorporate noise/false positive tests, and expanding coverage to reconnaissance. Forrester still sees a lot of value in these tests. While not every practitioner has the time or resources to dig through the data, the testing is still important for pushing detection and response vendors forward. The evaluation offers a critical lens into where visibility and prevention fall short, and where each performs most effectively.

If you’re a Forrester client, book an inquiry or guidance session with either of us if you have questions about the results.
