OpenClaw: The Most Exciting — and Dangerous — AI Agent of 2026

OpenClaw: The Most Exciting — and Dangerous — AI Agent of 2026
  • 6 CVEs including a one-click RCE chain (CVE-2026-25253) that works even on localhost-bound instances
  • ClawHavoc supply chain attack — 824+ malicious skills in ClawHub, up from 341 when first discovered
  • 42,000+ exposed instances found by Censys, Bitsight, and independent researchers
  • Government warnings from multiple countries
  • The Moltbook token leak (1.5M+ credentials)

In late January 2026, a lobster-themed AI agent took the developer world by storm. OpenClaw — originally named Clawdbot, then briefly Moltbot after Anthropic's trademark team got involved — went from zero to 20,000 GitHub stars in a single day. Mac mini inventory vanished from store shelves. Andrej Karpathy gushed about it publicly. And then the security researchers showed up.

What they found was grim: a one-click remote code execution vulnerability, hundreds of malware-laden plugins on its marketplace, tens of thousands of exposed instances leaking credentials to the open internet, and a companion social network that spilled 1.5 million API tokens because someone forgot two SQL statements. Cisco's AI security team summed it up in a sentence that became the project's unofficial epitaph: from a capability perspective, OpenClaw is groundbreaking; from a security perspective, it's an absolute nightmare.

Here's how we got here, and what it means for anyone running — or thinking about running — autonomous AI agents.


What OpenClaw Actually Does

Peter Steinberger, the Austrian developer behind the popular PSPDFKit document SDK, started OpenClaw as a weekend project in November 2025. The idea was simple: a persistent AI agent that lives on your machine, connects to the messaging apps you already use (WhatsApp, Telegram, Slack, Discord, iMessage), and can actually do things — send emails, manage calendars, browse the web, write code, execute shell commands, and automate workflows.

Unlike cloud-based assistants, OpenClaw runs locally. Your data stays on your hardware. You bring your own LLM keys. The agent retains long-term memory across sessions and can even write its own code to handle tasks it wasn't originally designed for. For anyone who's spent a decade waiting for Siri to become competent, OpenClaw felt like the future arriving all at once.

By mid-February 2026, the project had surpassed 179,000 GitHub stars with 720,000 weekly downloads. On February 15, Steinberger joined OpenAI, and the project began transitioning to a foundation governance model.

But while adoption surged, security researchers were peeling back layers of vulnerability that turned the dream agent into a textbook case study in what happens when power outpaces protection.


The One-Click RCE That Works Even on Localhost

The most severe vulnerability discovered in OpenClaw is tracked as CVE-2026-25253, carrying a CVSS score of 8.8. It was found by Mav Levin, a security researcher at DepthFirst, and patched in version 2026.1.29 on January 30, 2026.

The attack chain is elegant in its simplicity. OpenClaw's browser-based Control UI accepted a gatewayUrl parameter from the URL query string and automatically initiated a WebSocket connection on page load — transmitting the user's stored authentication token without any validation or confirmation. An attacker could craft a malicious link that, when visited, silently exfiltrated the victim's token in milliseconds.

What makes this especially nasty is that it works even against instances bound to localhost. Because the victim's own browser initiates the outbound WebSocket connection, the attack pivots through the browser to bypass network restrictions entirely. Once the token is captured, the attacker connects to the victim's gateway, disables sandboxing via API calls, and achieves full remote code execution on the host machine.

Belgium's Centre for Cybersecurity (CCB) issued an emergency advisory on February 2, 2026, urging immediate patching. Multiple additional CVEs followed in rapid succession, including OS command injection through SSH handler paths (CVE-2026-25157), local file inclusion allowing reads of sensitive system files like SSH private keys (CVE-2026-25475), and unauthenticated local RCE via the WebSocket configuration mechanism.

An initial security audit filed as a GitHub issue on January 25 identified 512 total vulnerabilities across the project, eight of them critical, spanning authentication weaknesses, plaintext credential storage, and dependency issues. Six GitHub Security Advisories were published within three weeks of the project going viral.


ClawHub: When the Plugin Marketplace Becomes a Malware Bazaar

OpenClaw's third-party "skills" marketplace, ClawHub, became the epicenter of a supply-chain attack almost immediately after the project's surge in popularity.

Skills in the OpenClaw ecosystem are not sandboxed scripts. They're folders of executable code with direct filesystem and network access, running under the agent's full privileges. If a skill is malicious, it inherits the same god-mode permissions users grant their agent.

Koi Security researcher Oren Yomtov conducted one of the first comprehensive audits, examining all 2,857 skills on ClawHub and flagging 341 as malicious. Of those, 335 belonged to a single coordinated campaign that researchers dubbed ClawHavoc. The malicious skills disguised themselves as cryptocurrency wallets, YouTube utilities, Polymarket trading bots, and Google Workspace integrations. Many used typosquatting to catch users installing in a hurry.

The kill chain was deceptively low-tech: each skill's documentation instructed users to install a "prerequisite" by pasting a shell command into their terminal. That command decoded a base64-encoded payload that ultimately delivered Atomic macOS Stealer (AMOS) — a commodity information stealer available as malware-as-a-service. All 335 ClawHavoc skills shared the same command-and-control infrastructure.

Security researcher Paul McCarty reported finding malware within two minutes of browsing the marketplace. Snyk's ToxicSkills study, published in early February, scanned nearly 4,000 skills and found that over 36% had at least one security flaw, 13% contained critical-level issues, and 76 were confirmed malicious payloads. Hardcoded secrets appeared in nearly 11% of all ClawHub skills. Bitdefender identified 14 distinct malicious actors, with one user submitting 199 malicious skills via automation in rapid succession.

Cisco's own Skill Scanner found nine vulnerabilities — two of them critical — in the number-one ranked skill on ClawHub, an entry called "What Would Elon Do?" that had been downloaded thousands of times. It silently exfiltrated data to attacker-controlled servers using direct prompt injection to bypass safety guidelines.

OpenClaw has since partnered with VirusTotal to scan all uploaded skills, and Jamieson O'Reilly, the Dvuln founder who originally demonstrated the marketplace's weaknesses by uploading his own malicious proof-of-concept skill, joined the project as lead security advisor.


Tens of Thousands of Instances, Wide Open to the Internet

OpenClaw's early default configuration bound the gateway to 0.0.0.0:18789 — listening on all network interfaces, exposed to the entire internet, with optional (and often skipped) authentication.

Researchers found the consequences almost immediately. Jamieson O'Reilly ran early Shodan scans and found hundreds of open instances within seconds. Censys researcher Silas Cutler tracked growth from roughly 1,000 to over 21,000 exposed instances in under a week. Independent researcher Maor Dayan, using a custom tool called ClawdHunter, discovered over 42,000 publicly exposed instances, with more than 93% exhibiting critical authentication bypass vulnerabilities.

By February 9, SecurityScorecard's STRIKE team reported the number had surged to over 135,000 unique IPs across 82 countries. Of those, more than 12,000 were exploitable via remote code execution. The team also correlated 549 exposed instances with prior breach activity, including infrastructure linked to state-sponsored threat groups.

O'Reilly manually tested several exposed instances and found he could access Anthropic API keys, Telegram bot tokens, Slack OAuth credentials, and months of complete chat history. He could send messages on behalf of users and execute commands with full administrator privileges — all without any authentication challenge.

Perhaps most alarming: honeypot data showed that scanning for OpenClaw instances began on January 26, 2026 — the same day the project hit Hacker News — meaning attackers mobilized within hours of the announcement.


Prompt Injection: The Vulnerability You Can't Patch

Traditional software vulnerabilities can be fixed with code changes. Prompt injection — the class of attack where malicious instructions are hidden inside content an AI agent processes — exploits something more fundamental: the inability of large language models to reliably distinguish trusted instructions from untrusted data.

Zenity Labs demonstrated a devastating proof-of-concept they called "OpenClaw or OpenDoor?" A Google Document containing legitimate-looking enterprise text had a hidden payload embedded deeper in the document. When a user asked OpenClaw to summarize it, the injected instructions steered the agent into creating a new Telegram bot integration controlled by the attacker, modifying OpenClaw's persistent identity file (SOUL.md), and installing a scheduled cron job that periodically re-injected attacker logic — surviving restarts and persisting even after the original integration was removed. The attack ultimately escalated to a full command-and-control implant, all using documented, intended features.

Noma Security found a complementary weakness in group chat scenarios. When OpenClaw operates in a Discord server or Telegram group, it treats instructions from any channel participant as if they came from its owner. An attacker joining a public-facing server with an OpenClaw bot could instruct it to crawl the local filesystem for tokens and credentials, bundle the data, and send it to an external server — all within 30 seconds.

Simon Willison, the researcher who coined the term "prompt injection," describes this as the "lethal trifecta": access to private data, exposure to untrusted content, and the ability to communicate externally. OpenClaw has all three by design. As Aikido.dev's analysis put it: you can make OpenClaw safer by removing its capabilities, but then you've rebuilt ChatGPT with extra steps — it's only useful when it's dangerous.


Moltbook: A Cautionary Tale in Vibe Coding

Adjacent to the OpenClaw ecosystem, a social network called Moltbook launched on January 28, 2026. Built by Matt Schlicht — who publicly boasted about building it entirely with AI without writing a single line of code — the platform functioned as a Reddit clone where only AI agents could post, eventually attracting 1.7 million registered agents.

On January 31, Wiz Security researcher Gal Nagli noticed that the Supabase API key was hardcoded in Moltbook's client-side JavaScript. Normally this would be safe if Row Level Security were properly configured. Moltbook never enabled it. The key granted full unauthenticated read and write access to the entire production database.

The exposed data included approximately 1.5 million API authentication tokens, around 35,000 email addresses, roughly 4,000 private messages (some containing plaintext OpenAI API keys), and verification codes. Write access meant attackers could manipulate content, inject prompts, and impersonate any agent. The fix required two SQL statements that simply didn't exist. The vulnerability was patched within hours of disclosure, but the damage window had been wide open.


The Industry Response

The reaction from governments and industry was swift and unambiguous.

Belgium's CCB issued a formal emergency advisory. China's Ministry of Industry and Information Technology warned through its National Vulnerability Database that OpenClaw deployments carry high security risks. South Korean companies Kakao, Naver, and Karrot Market restricted or blocked OpenClaw across their corporate networks. Meta reportedly banned the tool from its internal systems.

Gartner classified OpenClaw as "insecure by default" and recommended enterprises block its downloads and traffic immediately, along with rotating any corporate credentials the agent may have touched. CrowdStrike published a detailed guide on detecting OpenClaw deployments via its Falcon platform. Sophos assessed that it should be considered a research project that can only be run safely in a disposable sandbox.

Andrej Karpathy, who had initially praised the project as one of the most remarkable things he'd seen in AI, reversed his position bluntly: he called it

a dumpster fire and said he wouldn't recommend anyone run it on their computers.

So Should You Run It?

The uncomfortable answer is: not yet, no matter how much you're willing to isolate it.

OpenClaw delivered on a promise no other tool has matched — a persistent, cross-platform AI agent with genuine autonomy that communicates through the apps people already use. The 720,000 weekly downloads prove the demand is real. The security record proves the default deployment model is untenable.

The consensus among researchers who've studied it most closely is that isolation, not hardening, is the only defense. Running OpenClaw on a dedicated cloud VM — with the gateway bound to loopback only, Docker sandboxing enabled, strict tool allowlists enforced, burner accounts for all connected services, and aggressive firewall rules — transforms the threat model from catastrophic to contained. If the agent is compromised, the blast radius is limited to a disposable machine with no access to your real data, credentials, or network. This also limits is use case to research, severely limiting its real-world value.

But make no mistake about what you're dealing with. This is a tool that, in its default state, combines full filesystem access, shell execution, persistent memory, and integration with your most sensitive communication channels. The lethal trifecta isn't a bug — it's the architecture. And prompt injection, the attack class that makes autonomous agents most dangerous, has no known general solution.

As Colin Shea-Blymyer of Georgetown's Center for Security and Emerging Technology observed:

the more access you give these agents, the more interesting and fun they become — and also the more dangerous. The trick is controlling where that danger lives.

The security landscape around OpenClaw is evolving rapidly. All CVE references and incident details in this post have been cross-referenced against primary sources including the NVD, GitHub Security Advisories, Cisco's AI Threat Research blog, CrowdStrike's advisory, DepthFirst's original disclosure, and the Barrack.ai comprehensive timeline. As always, check for the latest patches and advisories before deploying any autonomous AI agent.