As many as 100 malicious artificial intelligence (AI)/machine learning (ML) models have been discovered in the Hugging Face platform.
These include instances where loading a pickle file leads to code execution, software supply chain security firm JFrog said.
“The model’s payload grants the attacker a shell on the compromised machine, enabling them to gain full control over victims’ machines through what is commonly referred to as a ‘backdoor,'” senior security researcher David Cohen said.
“This silent infiltration could potentially grant access to critical internal systems and pave the way for large-scale data breaches or even corporate espionage, impacting not just individual users but potentially entire organizations across the globe, all while leaving victims utterly unaware of their compromised state.”
Specifically, the rogue model initiates a reverse shell connection to 210.117.212[.]93, an IP address that belongs to the Korea Research Environment Open Network (KREONET). Other repositories bearing the same payload have been observed connecting to other IP addresses.
In one case, the authors of the model urged users not to download it, raising the possibility that the publication may be the work of researchers or AI practitioners.
“However, a fundamental principle in security research is refraining from publishing real working exploits or malicious code,” JFrog said. “This principle was breached when the malicious code attempted to connect back to a genuine IP address.”
The findings once again underscore the threat lurking within open-source repositories, which could be poisoned for nefarious activities.
They also come as researchers have devised efficient ways to generate prompts that can be used to elicit harmful responses from large-language models (LLMs) using a technique called beam search-based adversarial attack (BEAST).
In a related development, security researchers have developed what’s known as a generative AI worm called Morris II that’s capable of stealing data and spreading malware through multiple systems.
Morris II, a twist on one of the oldest computer worms, leverages adversarial self-replicating prompts encoded into inputs such as images and text that, when processed by GenAI models, can trigger them to “replicate the input as output (replication) and engage in malicious activities (payload),” security researchers Stav Cohen, Ron Bitton, and Ben Nassi said.
Even more troublingly, the models can be weaponized to deliver malicious inputs to new applications by exploiting the connectivity within the generative AI ecosystem.
The attack technique, dubbed ComPromptMized, shares similarities with traditional approaches like buffer overflows and SQL injections owing to the fact that it embeds the code inside a query and data into regions known to hold executable code.
ComPromptMized impacts applications whose execution flow is reliant on the output of a generative AI service as well as those that use retrieval augmented generation (RAG), which combines text generation models with an information retrieval component to enrich query responses.
The study is not the first, nor will it be the last, to explore the idea of prompt injection as a way to attack LLMs and trick them into performing unintended actions.
Previously, academics have demonstrated attacks that use images and audio recordings to inject invisible “adversarial perturbations” into multi-modal LLMs that cause the model to output attacker-chosen text or instructions.
“The attacker may lure the victim to a webpage with an interesting image or send an email with an audio clip,” Nassi, along with Eugene Bagdasaryan, Tsung-Yin Hsieh, and Vitaly Shmatikov, said in a paper published late last year.
“When the victim directly inputs the image or the clip into an isolated LLM and asks questions about it, the model will be steered by attacker-injected prompts.”
Early last year, a group of researchers at Germany’s CISPA Helmholtz Center for Information Security at Saarland University and Sequire Technology also uncovered how an attacker could exploit LLM models by strategically injecting hidden prompts into data (i.e., indirect prompt injection) that the model would likely retrieve when responding to user input.