In cybersecurity, anomaly detection promises to identify threats by flagging behavior that is “out of the ordinary.” The practical problem, as you’ve likely experienced, is the flood of false positives that comes with trying to detect malicious command lines: noise, alert fatigue, and wasted time.
The research presented by Sophos at Black Hat USA 2025 offers an interesting twist: don’t use anomaly detection as your sole “hunter”; use it as a source of uncommon benign data to better train your supervised classifiers. The result? Far fewer false positives and more focus on what’s truly malicious.
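The key idea is to supplement, not replace, your supervised models by feeding them anomalous but benign commands. To do this, Sophos combined two components: unsupervised anomaly detection to surface rare command lines, and an LLM to label those anomalies as benign or malicious.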
Counterintuitively, the success doesn’t hinge on anomaly detection finding the malicious items—it’s about identifying “benign rarity” to expand the model’s understanding of what is normal in your environment. This “catalog of complex benigns” is what drastically reduces false positives.
During January 2025, the research processed more than 50 million daily command lines with two ingestion and “featurization” approaches:
Full-scale approach. Advantage: Complete coverage and high granularity.
Reduced-scale approach. Advantage: Much lower cost and simpler deployment.
Both approaches worked, offering options depending on budget and compute time needs.
After featurization, anomalies were identified using three unsupervised methods for robustness:
This ensemble avoids reliance on a single “rarity” criterion.
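To make the ensemble idea concrete, here is a minimal sketch in plain Python. It is not the Sophos implementation: the three scorers (z-score, distance to centroid, nearest-neighbour distance) are illustrative stand-ins, the two-column feature rows are made-up toy values, and the `fraction` and `min_votes` parameters are assumptions chosen for the example.

```python
import math
from statistics import mean, pstdev

def zscore_scores(rows):
    """Largest absolute z-score across features, per row."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [max(abs(x - m) / s for x, m, s in zip(r, mus, sds)) for r in rows]

def centroid_scores(rows):
    """Euclidean distance from each row to the overall centroid."""
    centroid = [mean(c) for c in zip(*rows)]
    return [math.dist(r, centroid) for r in rows]

def knn_scores(rows, k=2):
    """Mean distance from each row to its k nearest neighbours."""
    scores = []
    for i, r in enumerate(rows):
        dists = sorted(math.dist(r, o) for j, o in enumerate(rows) if j != i)
        scores.append(mean(dists[:k]))
    return scores

def top_indices(scores, fraction):
    """Indices of the highest-scoring rows (at least one)."""
    n = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n])

def ensemble_anomalies(rows, fraction=0.1, min_votes=2):
    """Flag rows that at least `min_votes` scorers rank among their top rows."""
    votes = {}
    for scorer in (zscore_scores, centroid_scores, knn_scores):
        for i in top_indices(scorer(rows), fraction):
            votes[i] = votes.get(i, 0) + 1
    return sorted(i for i, v in votes.items() if v >= min_votes)

# Toy feature vectors: nine similar rows and one obvious outlier (index 9).
rows = [
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0],
    [0.8, 1.0], [1.0, 0.8], [1.1, 1.1], [0.9, 0.9], [10.0, 10.0],
]
flagged = ensemble_anomalies(rows)
```

Requiring agreement between scorers is only one way to combine them. Taking the union instead would surface more candidates for labeling, at the cost of more work downstream.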
Figure 1: Cumulative distribution of command lines gathered per day over the test month using the full-scale method. The graph shows all command lines, deduplication by unique command line, and near-deduplication by cosine similarity of command line embeddings (Source: Sophos)
Many anomalies are near-identical variants (e.g., a parameter change). To prevent overweighting a single pattern, the researchers deduplicated candidates using Jina embeddings and cosine similarity, keeping only truly distinct anomalies before labeling.
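The near-deduplication step can be sketched as follows. The research used Jina embeddings; to keep the example runnable without a model, this sketch swaps in a toy character-trigram count vector as the "embedding". The function names and the 0.9 similarity threshold are assumptions for illustration, not values from the research.

```python
import math
from collections import Counter

def trigram_vector(text):
    """Toy stand-in for an embedding: counts of character trigrams."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def near_dedup(commands, threshold=0.9):
    """Greedy pass: keep a command only if no kept command is too similar."""
    kept, kept_vecs = [], []
    for cmd in commands:
        vec = trigram_vector(cmd)
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(cmd)
            kept_vecs.append(vec)
    return kept

commands = [
    "ping -n 4 10.0.0.1",
    "ping -n 4 10.0.0.2",  # same pattern, one parameter changed
    "certutil -hashfile report.pdf SHA256",
]
unique = near_dedup(commands)
```

Here the second `ping` command is dropped as a near-duplicate of the first, while the `certutil` command survives. This greedy comparison is quadratic in the number of kept items, so at real volumes an approximate nearest-neighbour index would likely be needed.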
The o3-mini reasoning LLM labeled each anomaly as benign or malicious. Manual validation later showed near-perfect accuracy on the benign labels across an entire week’s worth of data, which was enough to integrate those benigns directly into training datasets with minimal human intervention.
Operational takeaway: You can expand your “good” dataset without hiring an army of analysts, and with statistical confidence.
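As a rough illustration of that workflow, the sketch below keeps only the anomalies labeled benign and appends them to a training set. The LLM call is represented by a hypothetical `label_fn` callable. The `stub_label` function is a placeholder so the example runs offline; it is not a real detection rule or the prompt used in the research. The `(command, label)` tuple layout, with 0 meaning benign, is also an assumption.

```python
from typing import Callable, Iterable

def collect_benign_anomalies(anomalies: Iterable[str],
                             label_fn: Callable[[str], str]) -> list[str]:
    """Return the anomalies that the labeler marks as 'benign'."""
    return [cmd for cmd in anomalies
            if label_fn(cmd).strip().lower() == "benign"]

def augment_training_set(training, benign_anomalies):
    """Append (command, 0) pairs, skipping duplicates and known commands."""
    seen = {cmd for cmd, _ in training}
    added = [(cmd, 0) for cmd in dict.fromkeys(benign_anomalies)
             if cmd not in seen]
    return training + added

def stub_label(cmd: str) -> str:
    """Placeholder for an LLM call (hypothetical, illustration only)."""
    return "malicious" if "-enc" in cmd else "benign"

training = [("whoami", 0), ("powershell -enc AAAA", 1)]
anomalies = [
    r"robocopy C:\src D:\dst /MIR",
    "powershell -enc BBBB",
    r"robocopy C:\src D:\dst /MIR",  # repeated anomaly
]
benign = collect_benign_anomalies(anomalies, stub_label)
augmented = augment_training_set(training, benign)
```

In practice you would likely keep a manual spot check on a sample of the labels, as the researchers did, before trusting the pipeline to write into your training data.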
Models were evaluated with two benchmarks:
Baselines compared:
Gains from adding “anomaly-derived benigns”:
In every case, noise (false positives) dropped and useful detection increased.
Figure 2: Cumulative distribution of command lines gathered per day over the test month using the reduced-scale method. The reduced-scale curve plateaus more slowly because the sampled data is likely finding more local optima.
The key takeaway from the research is powerful: using anomalies to expand benign datasets—rather than blindly “guessing the bad”—changes the game. With that diverse benign data feeding your classifiers, false positives drop, SOC teams can breathe, and your staff can focus on the critical alerts.
At TecnetOne, as certified Sophos partners, we’re here to help you and your company stay ahead in technology with top-quality security services.