March 14, 2024

Inside the prediction of a bug that led to a CVE in Redis

by Mark Greene


Shepherdly actively monitors popular open-source repositories for research purposes. Below we’ll walk you through what information Shepherdly would have been able to provide the maintainers at the time and, more importantly, how surface-level statistics break down traditional engineering intuition.

First, the following analysis is in no way a judgment on the quality of work by these maintainers. We’re selecting an example for analysis to raise awareness of how objective risk measurements can help software engineers decide which changes are more or less deserving of their time.

Small PRs are safer and should go faster, right?

The pull request we’ll be analyzing today is #11766. At first glance, it’s a tightly scoped change: ~100 lines across 7 files, including tests.

Many engineers and engineering metrics would consider this a “small” change and therefore lower risk by default. Furthermore, smaller changes carry a compounding bias that they should be merged quickly. This change followed that trajectory, with a total cycle time of about 1 day.

This change introduced CVE-2023-41056, which was later fixed.

Shepherdly's Take

Our model classified #11766 as high risk with a score of 76/100. The predictors in this case were the size of the change and the number of commits from reviewers.

Since the vast majority of changes within teams are well below this risk level, it’s imperative to focus on mitigations when the risk is that high.
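To make "well below this risk level" concrete, here is a minimal sketch of how a team might compare one change's score against its own history and flag outliers for extra mitigation. This is not Shepherdly's model or API: the function names, the 90th-percentile cutoff, and the list of past scores are all hypothetical, chosen only for illustration. The only value taken from this post is the score of 76.

```python
from bisect import bisect_left


def percentile_rank(history, score):
    """Return the share (0-100) of historical scores strictly below `score`."""
    if not history:
        raise ValueError("history must contain at least one score")
    ordered = sorted(history)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


def needs_extra_mitigation(history, score, cutoff_percentile=90.0):
    """Flag a change whose score sits at or above the chosen percentile.

    The 90.0 default is an assumed cutoff, not a recommended value.
    """
    return percentile_rank(history, score) >= cutoff_percentile


# Hypothetical risk scores for a team's recent changes (illustrative only).
past_scores = [5, 8, 12, 12, 15, 20, 22, 30, 41, 55]

# 76 is the score reported above for #11766.
rank = percentile_rank(past_scores, 76)
flagged = needs_extra_mitigation(past_scores, 76)
```

Ranking against the team's own history, rather than using a fixed number, keeps the flag rare enough that reviewers can afford to act on it each time it fires.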

Traditional Mitigation Intuition Needs Help

A lot of prevailing wisdom about how to mitigate risk within code changes rests on review and testing. That should remain, of course, but software engineering as a profession lacks precision about where risk will manifest before changes ship. Consider a research paper that demonstrated developers were 8x more likely to find vulnerabilities when given a reason to do so.

What this highlights is that there is enormous latent ability within engineers to find and mitigate these issues, but they lack a definitive tool to justify where they should be allocating their time.
