What Developers Can Grok From the Latest PyPI Package Attack

A malicious PyPI package, named “pytoileur,” was found by researchers at Sonatype after unusual behavior was noticed in the code.

Jun 4th, 2024 7:04am by Todd R. Weiss

A new attack that hides malicious code behind deceptive, large blank spaces within lines of code in a Python Package Index (PyPI) package was recently found by security researchers, and it offers important lessons for developers in avoiding similar attacks and vulnerabilities.

The malicious PyPI code, named “pytoileur,” was found by researchers at Sonatype, a software supply chain optimization vendor, after unusual behavior was noticed in the code, according to a May 29 Sonatype blog post. The pytoileur package hides code “that downloads and installs trojanized Windows binaries capable of surveillance, achieving persistence, and crypto theft,” according to the post. “Our discovery of the malware led us to probe into similar packages that are part of a wider, months-long ‘Cool package’ campaign,” the post continued.

What makes the attack so cunning is that the hackers went out of their way to pad the line with many blank spaces, pushing the errant code far past the right edge of a typical editor window, where it is much harder to detect. Sonatype open source vulnerability security researcher Jeff Thornhill noticed the hidden code on line 17 of the malicious pytoileur package, according to the blog post.
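A detector for this kind of padding can be quite simple. The Python sketch below flags any line where non-whitespace code resumes after a long run of spaces or tabs. The 40-character threshold and the function names `find_hidden_code` and `scan_package` are illustrative assumptions, not anything taken from Sonatype's analysis, and the check will also flag legitimately aligned trailing comments, so hits are prompts for manual review rather than verdicts.

```python
import re
from pathlib import Path

# Assumption: code that resumes after 40+ consecutive spaces/tabs on the same
# line is suspicious. Leading indentation is ignored because the pattern
# requires a non-whitespace character before the gap.
SUSPICIOUS_GAP = re.compile(r"\S[ \t]{40,}\S")


def find_hidden_code(source: str, filename: str = "<string>") -> list[tuple[str, int, int]]:
    """Return (filename, line number, column) for each line where code resumes after a long whitespace gap."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = SUSPICIOUS_GAP.search(line)
        if match:
            # match.end() is the 1-based column of the first character after the gap.
            hits.append((filename, lineno, match.end()))
    return hits


def scan_package(root: str) -> list[tuple[str, int, int]]:
    """Scan every .py file under root, for example an unpacked source distribution."""
    results = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        results.extend(find_hidden_code(text, str(path)))
    return results
```

Pointing `scan_package` at an unpacked package directory before installing it would report each suspicious file, line and column. Turning on word wrap or visible whitespace in an editor exposes the same padding by eye.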

Adding to the deception, the hackers created a misleading Stack Overflow account under the name “EstAYA G” and used it to target community members who were seeking debugging help, urging them to install the malicious package as a supposed fix for their issues, the post continued. Once discovered, the account was suspended.

The malicious package was apparently downloaded 264 times before PyPI administrators were alerted to its presence, the post said.

Better Code and Package Oversight Is Needed Within Enterprises

The relatively quick discovery of the malicious pytoileur package was fortunate, Katie Norton, a DevSecOps and software supply chain security analyst with IDC, told The New Stack. But it also demonstrates that companies need clear policies on how developers can use such packages safely and securely, she added.

“Preventing the downloading and use of [malicious] open source packages has to be an organizational effort and cannot be a responsibility left to individual developers,” said Norton. “In IDC’s ‘DevSecOps Adoption, Techniques, and Tools Survey, 2023,’ respondents identified developer security knowledge as their top organizational challenge in adopting DevSecOps.”

Not only do most developers lack the skills to identify malicious packages, she said, but they would also face a nearly impossible task doing so manually and individually. Sonatype’s 2023 State of the Software Supply Chain report logged 245,032 malicious packages, she said.

A wide range of open source and commercial tools can help in this fight, said Norton. These include policy-based tools such as JFrog Curation, OpenText Open Source Select and Sonatype Firewall, which block developers from using open source packages that do not align with company security policies.
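The policy tools Norton names operate at the repository or proxy level, and their actual rules and interfaces are product-specific. As a rough illustration of the underlying allow/deny idea only, the hypothetical Python check below classifies entries in a requirements file against an organization's approved list and a denylist. The `APPROVED` and `DENIED` sets and the `check_requirements` function are invented for this example; only the name normalization follows a real rule, PEP 503.

```python
import re

# Hypothetical organization policy: packages approved for use,
# plus a denylist of names known to be malicious.
APPROVED = {"requests", "numpy", "django"}
DENIED = {"pytoileur"}

# Leading project name of a simple requirement line such as
# "requests==2.31.0" or "NumPy>=1.26; python_version>='3.9'".
REQ_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def check_requirements(lines):
    """Return (normalized name, verdict) pairs; verdict is 'allowed', 'blocked' or 'needs review'."""
    results = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r or --index-url
        match = REQ_NAME.match(line)
        if not match:
            continue
        name = normalize(match.group(1))
        if name in DENIED:
            verdict = "blocked"
        elif name in APPROVED:
            verdict = "allowed"
        else:
            verdict = "needs review"
        results.append((name, verdict))
    return results
```

Anything on neither list comes back as "needs review" rather than silently passing; that default-deny choice is made here for illustration and is not a description of how any particular product behaves.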

Curated or managed open source catalogs like Tidelift, Chainguard Images, Google Assured Open Source and the VMware by Broadcom Tanzu Application Catalog can also be used to provide developers with safe code for their work, said Norton.

“Rather than going out to a public repository, developers get their open source directly from these catalogs that are pre-vetted and often have SLAs on remediation of vulnerabilities or harden the component,” she said. “This could also be done in-house if the organization has enough resources.”

Other tools, such as Snyk Advisor, Open Source Insights and Trusty by Stacklok, aggregate package metadata and security metrics into searchable databases that can help developers identify the most secure open source package to use, said Norton. “There are browser extensions such as what Socket and Overlay provide that allow developers to see this information while looking up packages in the public registries, rather than searching a database. We have also seen generative AI used here, such as DroidGPT from Endor Labs, where developers can use natural language to ask for suggestions of secure packages to use.”

There are even government-level efforts in play to help improve the safe use of packages and community-based code by developers, said Norton. That includes efforts by the federal Cybersecurity and Infrastructure Security Agency (CISA) to work closely with package repositories to adopt the guidelines from the Principles for Package Repository Security and the Open Source Security Foundation’s (OpenSSF’s) Securing Software Repositories Working Group, she said.

Five package repositories pledged at this year’s Open Source Software Security Summit to work toward meeting CISA’s and the OpenSSF’s guidelines, said Norton. They included the Rust Foundation, the Python Software Foundation, npm (JavaScript), Packagist and Composer (PHP), and Maven Central (Java), she noted. Examples of those security efforts include multifactor authentication (MFA), generation of package provenance via cryptographic signing, and vulnerability scanning.

Software Hygiene Rules Must Always Be Followed

Another analyst, Jack Gold, president and principal analyst of J. Gold Associates, agreed, saying that the only way developers can avoid problematic code is to know with certainty that the code and packages they use are safe.

“The basic problem is simple,” said Gold. “How do you know that the code you are using to build out your program is reliable, safe, and secure? Are programs you download to make your life easier really safe? Is the code repository you are getting things from really able to verify and secure that code? At least with commercial code, you have a company to go after. With stuff like this, there is no recourse and no real way of knowing unless a company like Sonatype screens it for you.”

Ultimately, “the rules of software hygiene always must be followed,” said Gold. “Suspect everything you download from an open source. Never assume that someone who has written something you can use does not have ulterior motives.”

Rob Enderle, principal analyst of Enderle Group, also advises developers to apply much more caution and skepticism to the code they use and to put it through proper vetting.

“Often, we look at attacks like this as isolated events rather than as a new trend, suggesting that there are likely other exploits like this that focus on developers that we have yet to find,” said Enderle. “At the heart of this issue is people entering a collaborative process that are bad actors, suggesting developers should exit any collaboration where the other members of the effort have not been vetted and assured as both trustworthy and trusted.”

What is needed, he said, is aggressive oversight to ensure that the developers involved in these projects are not compromised. “This also suggests that even trusted members need to be randomly checked to make sure they have not become compromised,” Enderle said.

The stakes are high, especially for trust and reliability within the open source community, said Enderle. “The real danger here is that a large enough effective attack could kill open source as a practice, which would have a significant adverse impact on developers being able to do their jobs the way they want to do them,” he said.

The big challenge, he added, is that hackers are always working to hide their trickery from the checks and tools that are trying to find them.

AI Tools Could Be Dramatic Change Agents to Solve These Problems

What could help to battle these kinds of attacks, said Enderle, are advanced AI tools designed to aggressively check every line of code and flag any anomalies. AI tools could also be built to automatically run new code in a sandbox for a fixed period to identify any hostile behavior, which could then be surfaced in code quality reviews, he added.

These constant threats from hackers make it even tougher for developers to build their code efficiently and add intense pressure that can be debilitating to their creativity, said Enderle.

“People cannot constantly work under this kind of pressure, so the fix is to automate these checks as much as possible and fully validate all participants initially, and then randomly check their work to assure a bad actor does not enter the process,” said Enderle.

“This is as much a people problem as it is a code practice problem,” said Enderle. “Fixing the people component should have the greater initial positive impact, but this also needs to be backed up by aggressive and comprehensive code review suggesting an AI-driven quality control process.”

Todd R. Weiss has been covering technology beats since 2000, first as a staff writer for Computerworld and eWEEK, and later as a freelancer for The New Stack, MSSP Alert, Computerworld, TechRepublic, CIO.com, eWEEK, Data Center Knowledge, IT Pro Today,...