Flaw in R Creates Supply Chain Security Risks

Threat actors could exploit the vulnerability in the programming language to insert malicious code, according to HiddenLayer.

May 2nd, 2024 7:06am by Jeffrey Burt


Developers already under siege by attacks on the software supply chain have another worry after researchers uncovered a high-severity security vulnerability in the popular open source R programming language.

The flaw, discovered by analysts with AI security company HiddenLayer, lets attackers exploit a weakness in R’s deserialization process to execute arbitrary code on a victim’s system, essentially enabling them to run their own code or commands there.

Given the wide use of R in such industries as healthcare, finance, and government and — due to its use with large datasets — its growing presence in the AI and machine learning field, the ripple effects of an attack leveraging the flaw would be far-reaching.

The vulnerability — tracked as CVE-2024-27322 and given a severity rating of 8.8 out of 10 — also represents the latest software supply-chain threat to developers who are seeing bad actors trying everything from planting malicious packages into repositories like GitHub and the Python Package Index (PyPI) to takeover attempts of projects, such as the recent attempts targeting XZ Utils and at least one OpenJS Foundation project.

Cybersecurity firm ReversingLabs said in its 2024 State of Software Supply Chain Security report that the number of security threats circulating through open source package repositories jumped 1,300% between 2020 and 2023, including a 400% increase in threats found on PyPI.

Targeting the Supply Chain

As seen with the high-profile SolarWinds cyberattack in 2020, an attacker that can drop malicious code into the development process can see the reach of that code multiply when organizations downstream inadvertently use the compromised software, a danger that has increased with the growing use of open source software.

“As an attacker, what makes open source software most attractive to me is its ubiquitous use across the internet and the fact that I can find vulnerabilities by inspecting the source code,” Casey Ellis, founder and chief strategy officer at Bugcrowd, told The New Stack. “This gives me a broader and more reliable target space, which creates options. Another attractive aspect of open source software for attackers is that many of these projects are unfunded and maintained solely by volunteers, which means vulnerabilities are more likely to go undiscovered and unfixed.”

The Ubiquity of R

In R’s case, its pervasive nature makes it an attractive target. Not only is it widely used in many industries, but it has more than 2 million users and an active user community with projects like Bioconductor for the bioinformatics field, which has more than 42 million downloads and 18,999 active site members.

In addition, the Comprehensive R Archive Network (CRAN) repository holds more than 20,000 packages, and the R-project website links to another repository, R-Forge, which has more than 2,000 projects and 15,000 registered users.

“All of this is to say that the exploitation of a code execution vulnerability in R can have far-reaching implications across multiple verticals, including but not limited to vital government agencies, medical, and financial institutions,” HiddenLayer security researchers Kasimir Schulz and Kieran Evans wrote in their report.

A Flaw in Deserialization

The vulnerability found by HiddenLayer could be exploited by bad actors to allow them to create a malicious RDS (R Data Serialization) file and execute arbitrary code when the file is loaded and referenced. R uses a serialization format of its own. In serialization, a data structure or object is converted into a format that can be stored locally or sent over a network. Those objects can then be reconstructed — deserialized — for use when needed. The deserialization process is vulnerable to exploitation in certain situations, Schulz and Evans wrote.
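The serialize-then-deserialize round trip described above can be sketched in a few lines. This is an illustration in Python, using its built-in pickle module as a stand-in for R's RDS format; the record contents are made up for the example and nothing here reflects R's actual on-disk layout.

```python
import pickle

# A made-up in-memory object, standing in for any data structure
# a data scientist might want to save or share.
record = {"sample_id": 17, "readings": [98.6, 99.1], "labels": ("a", "b")}

# Serialization: the object becomes a flat sequence of bytes that can be
# written to disk or sent over a network.
blob = pickle.dumps(record)

# Deserialization: the bytes are turned back into an equivalent object.
restored = pickle.loads(blob)

print(type(blob).__name__)   # the serialized form is plain bytes
print(restored == record)    # the rebuilt object has the same contents
print(restored is record)    # but it is a new object, not the original
```

The step to watch is the second one: whoever produced the bytes decides what the loader reconstructs, which is why loading files from untrusted sources is the risky part.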

“This vulnerability can be exploited through the loading of RDS … files or R packages, which are often shared between developers and data scientists,” the researchers wrote. “An attacker can create malicious RDS files or R packages containing embedded arbitrary R code that executes on the victim’s target device upon interaction.”

The flaw ties to two related concepts in R, “lazy evaluation” and “promise objects.” With lazy evaluation, symbols, expressions, or variables are evaluated only when needed, and repeated evaluations are avoided, with the goal of optimizing the allocation of resources and improving system performance. A promise object is what makes this possible: it holds an expression that has not yet been evaluated, and the expression runs the first time its value is needed.

The vulnerability in this process enables an attacker to create a promise object that includes malicious arbitrary code, and, because of lazy evaluation, the code is executed when the user references the symbol associated with the RDS file.
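To make the timing concrete, here is a toy model of a promise written in Python. It is only an analogy under stated assumptions: the `Promise` class, the `expensive` function, and the `log` list are hypothetical names invented for this sketch, and R's real promise objects are implemented inside the interpreter, not like this.

```python
class Promise:
    """Toy stand-in for a promise: holds code that has not run yet."""

    def __init__(self, thunk):
        self._thunk = thunk      # the unevaluated expression
        self._forced = False
        self._value = None

    def force(self):
        # Evaluate on first use only, then reuse the cached result.
        if not self._forced:
            self._value = self._thunk()
            self._forced = True
            self._thunk = None
        return self._value


log = []

def expensive():
    # Records that it ran, so the moment of evaluation is visible.
    log.append("evaluated")
    return 42

p = Promise(expensive)
before = list(log)    # creating the promise has not run anything
first = p.force()     # first reference triggers evaluation
second = p.force()    # later references reuse the stored value

print(before, first, second, log)
```

Nothing runs when the promise is created; the code inside it runs at the first reference. If an attacker controls what that code is, the victim triggers it simply by using the value.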

R’s “package building and sharing capabilities make it flexible and community-driven,” Schulz and Evans wrote. “However, a drawback to this is that not enough scrutiny is being placed on packages being uploaded to repositories, leaving users vulnerable to supply chain attacks.”

Caution Is Required

The team at R patched the vulnerability with the recent R v4.4.0 release, but Bugcrowd’s Ellis said developers need to be careful.

“This is significant because of the potential impact of exploitation, the number of code paths that appear to allow exploitation of the vulnerability, and the fact that R is often a ‘fast-and-loose’ language used in data and ML experimentation within organizations, meaning that the code which can end up in production is less likely to have been through the same kind of scrutiny as other production code,” he said.

Mayuresh Dani, manager of security research at Qualys’ threat research unit, told The New Stack that deserialization of untrusted data has been a known weakness for more than a decade. A presentation at a Black Hat conference in 2011 covered Python .pkl (pickle) files, that language’s format for serialized objects.

“In fact, this is so prevalent that OWASP has assigned CWE-502: Deserialization of Untrusted Data to these types of attacks,” Dani said. “With the boom in AI/ML research, languages such as R that perform statistical analysis on large datasets are gaining importance.”
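The Python pickle case Dani mentions shows the general pattern behind CWE-502. The sketch below is a deliberately harmless demonstration, not a reproduction of the R flaw: the `Payload` class, the `SafeUnpickler` class, and the `safe_loads` helper are names invented here, and the "attacker code" is just an arithmetic expression. The restricted loader follows the approach of overriding `find_class` that Python's pickle documentation describes.

```python
import io
import pickle


class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # A real attacker would substitute something far less benign.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes calls the function named inside them; the result is
# whatever that call returned, not a Payload instance.
result = pickle.loads(blob)


class SafeUnpickler(pickle.Unpickler):
    """Refuses to resolve any function or class named in the data."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data):
    return SafeUnpickler(io.BytesIO(data)).load()


try:
    safe_loads(blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

# Plain data (dicts, lists, numbers, strings) still loads normally.
plain = safe_loads(pickle.dumps({"a": [1, 2]}))

print(result, blocked, plain)
```

A blanket refusal like this only works when the files are expected to contain plain data. The broader lesson matches the researchers' advice: treat serialized files and packages from unknown sources as code, not as inert data.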

Jeffrey Burt has been a journalist for more than three decades, the last 20-plus years covering technology. During more than 16 years with eWEEK and in the years since as a freelance tech journalist, he has covered everything from data...