The CAUSALITY Project: Achieving Intrusion Prediction

Causality Engineer, OpenDR

We use a bewildering and growing number of complex methods to identify which CVEs present the greatest technical or business risk. CVE volume increases year over year, and some of our methodologies were developed in prior decades, when CVE volume was a fraction of what it is today. With nearly forty thousand new CVEs per year, many teams are groaning under the load, and backlogs are at historic highs. We can't predict which CVEs are going to go 'hot' in the future, but what if we could? This is the story of the CAUSALITY project. CAUSALITY is an intrusion prediction model that is successfully predicting which CVEs will be added to exploitation watchlists. At the time of this writing, the model has made 33 correct predictions with early warning lead times of between one and four months. These early warnings allow us to actually shift "left of boom" and live our best lives. Every incident response we turn into incident avoidance gives time back to busy DevOps teams while reducing business risk.

Outline:

1. The goal: predict superhot CVEs for use in detection processing or risk avoidance

2. CVE and vuln density growth over the past decade

4. If the growth curve continues, or increases, what does that look like?

5. The needle: less than half a percent of CVEs are watchlisted

6. Limitations of existing human analysis and why it can't get us there; some analysis in a notebook

6a. Exploitability, impact and scores

6b. Severities

6c. Correlations between metrics

6d. Search space reduction with categorical fields

6e. Search space reduction with numeric fields

6f. Comparing numeric metrics between listed and non-listed CVEs

6g. Comparing categorical metrics between listed and non-listed CVEs

7. Existing methodologies reduce the search space to a subset of 33-45% of the population

8. ML-based prediction and labeling

9. ML model results: 82% of watchlisted CVEs in 2024 can be predicted to be in 12% of the population

10. Where to find the prediction output (on GitHub)
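
As a rough illustration of how a figure like the one in item 9 ("82% of watchlisted CVEs ... in 12% of the population") can be computed, here is a minimal sketch. It is not the CAUSALITY model or its evaluation code. The function name capture_at_fraction, the scores, and the labels are all hypothetical and invented for this example. The idea is to rank every CVE by a model score, keep the top fraction of the population, and measure what share of the watchlisted CVEs fall inside that slice.

```python
def capture_at_fraction(scores, labels, fraction):
    """Share of positives (label == 1) found in the top `fraction` of items by score.

    Hypothetical helper for illustration only; not part of CAUSALITY.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    # Rank items from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Number of items kept in the reduced search space (at least one).
    cutoff = max(1, round(len(ranked) * fraction))
    captured = sum(label for _, label in ranked[:cutoff])
    return captured / total_positives


# Toy data, invented for this example: 10 CVEs, 3 of them watchlisted (label 1).
scores = [0.95, 0.90, 0.80, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

# Share of watchlisted CVEs found in the top-scoring 30% of the population.
print(capture_at_fraction(scores, labels, 0.30))
```

The same measurement applies to the rule-based filters in item 7: replace the model score with 1 for CVEs that pass the filter and 0 otherwise, and set the fraction to the share of the population the filter retains.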