The Bug That Won't Die:
10 Years of the Same Mistake
A decade of deserialization vulnerabilities (and why we keep making them)
There are now multiple publicly available exploit scripts (I forked one on GitHub here) for the React and Next.js vulnerabilities (CVE-2025-55182 and CVE-2025-66478).
The underlying issue is data serialization/deserialization, which brought to mind a blog post I wrote in 2016 on the same problem (at the time, the topic was CVE-2015-4852, a Java object deserialization flaw in Oracle WebLogic that was exploitable through the Apache Commons Collections library).
2 Risk Takeaways
- The exploit pattern repeats because serialization is a straightforward method for transferring data, and developers typically use what works. Coders use different languages and frameworks, yet the same class of vulnerability persists. The upstream opportunity here is for universities to aggressively drive security into all programming courses.
- Everyone is a coder now, and security domain expertise has never been more important. Every business function will include AI-assisted coders, supercharging productivity and efficiency. LLMs don’t need to stop for human input, but understanding internet plumbing, tools, platforms, and security implications is now crucial. The most valuable employees can use AI for 10x+ impact AND catch potential issues as humans become the AI-copilots.
Technical Causation
- Serialization is seductive: It’s the easy path for passing complex objects across trust boundaries (client ↔ server, service ↔ service). Developers reach for it because it “just works” (until it catastrophically doesn’t).
- Framework abstraction hides the danger: Some percentage of Next.js developers using Server Actions are unaware that they’re invoking a custom serialization protocol. They’re calling a function. The risk is invisible until it’s exploited.
- The ecosystem never learns collectively: Java shops learned (painfully) about gadget chains and ObjectInputStream. However, that institutional knowledge didn’t necessarily transfer to Node.js/React developers building RSC implementations a decade later.
The Threat
The attack surface has expanded once again. In 2015, we were tracking exploit chatter on Chinese forums weeks before CVEs were assigned. Now there are double-digit public GitHub repositories with weaponized exploit code within days of disclosure. Agentic workflows will soon compress that window to minutes: time-to-exploitation will shrink to roughly the time it takes a defender to read about a new high-severity vulnerability.
Defender Considerations for CVE-2025-55182 / CVE-2025-66478
- Attackers differentiate vulnerable App Router targets from safe Pages Router sites by checking for window.__next_f vs __NEXT_DATA__. Your asset inventory should already know which flavor you’re running (a fingerprinting sketch follows this list).
- The vulnerability lives in the Flight protocol deserialization. If you’re not using Server Actions, consider disabling them. If you are, the endpoint (Next-Action header targets) is where to focus WAF rules.
- Hunt for anomalous POST requests that carry a Next-Action header and multipart payloads targeting __proto__ or other unusual serialized JSON structures (a hunting sketch also follows this list). The exploit exfiltrates data via base64 in error digests.
- The core issue is in react-server-dom-webpack, react-server-dom-parcel, and react-server-dom-turbopack. Custom RSC implementations outside Next.js are equally exposed.
- RCE means immediate credential harvesting from environment variables, lateral movement via cloud metadata endpoints, and persistence via scheduled tasks or cron jobs. IR playbooks should assume full compromise.
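A minimal fingerprinting sketch for the first bullet, in Python with requests. The markers are the ones described above; treat a match as a triage signal, not proof of exploitability, and note that the target URL is a placeholder:

import requests

def next_router_flavor(url: str) -> str:
    # App Router pages stream RSC data via self.__next_f pushes;
    # Pages Router pages embed a __NEXT_DATA__ script tag instead.
    html = requests.get(url, timeout=10).text
    if "__next_f" in html:
        return "app-router: in scope for CVE-2025-55182 / CVE-2025-66478 triage"
    if "__NEXT_DATA__" in html:
        return "pages-router: out of scope for this CVE pair"
    return "unknown: likely not Next.js"

print(next_router_flavor("https://example.com"))  # hypothetical target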
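And a rough hunting sketch for the Next-Action guidance. The JSON-lines log schema here is an assumption, so map the field names onto whatever your proxy or WAF actually records:

import json
import re

# Common prototype-pollution markers; a starting point, tune to your traffic.
SUSPICIOUS = re.compile(r'__proto__|"constructor"|"prototype"')

def is_suspect(line: str) -> bool:
    # Assumed schema: one JSON object per line with "method", "headers", "body".
    try:
        req = json.loads(line)
    except json.JSONDecodeError:
        return False
    headers = {k.lower() for k in req.get("headers", {})}
    return (
        req.get("method") == "POST"
        and "next-action" in headers                       # Server Action invocation
        and bool(SUSPICIOUS.search(req.get("body", "")))   # pollution markers in payload
    )

# Usage: scan a JSON-lines capture file
# hits = [line for line in open("access.jsonl") if is_suspect(line)]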
For “Vibe Coders”: A Conversation with Claude Opus 4.5
Q: What’s a proper input validation methodology for deserializing data to avoid security failures?
A: Native serialization formats (Java ObjectInputStream, PHP unserialize, Python pickle, .NET BinaryFormatter) reconstruct objects with behavior, which means arbitrary code can execute during deserialization. The fix is using formats that only transfer data, not executable structures.
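To make "objects with behavior" concrete, here is the textbook __reduce__ demonstration; the echo command is a harmless stand-in for attacker code:

import os
import pickle

class Gadget:
    # pickle calls __reduce__ at dump time to record how to rebuild the object;
    # the (callable, args) it returns is invoked at LOAD time, on the loader's machine.
    def __reduce__(self):
        return (os.system, ("echo code ran during deserialization",))

payload = pickle.dumps(Gadget())
pickle.loads(payload)  # the command executes here, before any app logic runs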
Safe alternatives:
1. Data-only interchange formats
- JSON (most common)
- Protocol Buffers (Google)
- FlatBuffers (zero-copy, fast)
- MessagePack (binary JSON, compact)
- CBOR (binary, IETF standard)
These parse into primitive types and structures—no object instantiation, no code execution.
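For example, json.loads can only hand back plain Python values, so there is no constructor for an attacker to hijack:

import json

data = json.loads('{"user": "alice", "roles": ["admin"]}')
print(type(data))   # <class 'dict'>: plain data with no behavior attached
# Possible result types: dict, list, str, int, float, bool, None. Nothing else.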
2. Schema validation on top
- JSON Schema
- Zod / Yup (JavaScript)
- Pydantic (Python)
- marshmallow (Python)
Define exactly what shape the data must have. Reject anything that doesn’t match before your application logic ever sees it (see the Pydantic sketch below).
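A minimal sketch using Pydantic, one of the libraries above; the UserUpdate model and its fields are illustrative, not from any real schema:

from pydantic import BaseModel, ConfigDict, ValidationError

class UserUpdate(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unexpected keys outright
    username: str
    age: int

raw = '{"username": "alice", "age": 30, "__proto__": {"admin": true}}'

try:
    update = UserUpdate.model_validate_json(raw)  # parse + validate in one step
except ValidationError as exc:
    print(exc)  # the extra "__proto__" key is rejected before app logic runs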
3. When you need objects, build them explicitly
Instead of:
obj = pickle.loads(untrusted_data)  # dangerous: code can execute during load
Do:
data = json.loads(untrusted_data) # safe parse
obj = MyClass(**validated_schema(data)) # explicit construction
4. For config/markup
- TOML instead of YAML (no code execution tags)
- JSON5 for human-readable config
- Avoid YAML’s !!python/object or Ruby’s !ruby/object tags, or use safe loaders exclusively (yaml.safe_load()); a short illustration follows.
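A quick illustration with PyYAML: safe_load refuses the object-construction tags outright, so nothing executes:

import yaml

malicious = "!!python/object/apply:os.system ['id']"

try:
    yaml.safe_load(malicious)  # safe_load builds only plain data types
except yaml.constructor.ConstructorError as exc:
    print("rejected:", exc)    # the python/object tag is refused, command never runs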
The one-liner: If your serialization format can reconstruct arbitrary types, it’s a vulnerability. Stick to dumb data formats + explicit object construction.
Q: Do modern LLMs know how to securely validate data by default?
A: Technically, yes; practically, it’s complicated.
What’s in the training data:
- Security research, CVE databases, OWASP guidance
- Countless blog posts about pickle/unserialize dangers
- Secure coding documentation from every major vendor
So if you ask directly—”Is pickle safe for untrusted data?”—any decent LLM will say no.
The problem:
LLMs are also trained on millions of Stack Overflow answers, tutorials, and GitHub repos that use pickle.load(), unserialize(), and yaml.load() because they’re convenient. When someone asks, “How do I save this object to a file and load it later?” the pattern-match often lands on the dangerous-but-common approach.
What I’ve observed:
- Ask for “the fastest way” → often gets insecure code
- Ask for “production-ready” or “secure” → better results
- Ask without security context → coin flip
The real gap:
LLMs don’t reason about security—they predict likely tokens based on training. They won’t spontaneously say “wait, where is this data coming from?” the way an experienced developer might. The threat model isn’t automatic.
Practical advice for developers using LLM assistants:
- Explicitly ask for secure alternatives
- Ask “What are the security implications of this code?”
- Treat generated code like untrusted input—review it
So yes, the knowledge is in there. But so is a decade of insecure examples, and the LLM doesn’t inherently prioritize one over the other unless prompted.
Stay Ahead with Recorded Future
Learn how Recorded Future's Vulnerability Intelligence tracks CVE disclosures, monitors exploit availability across GitHub and underground forums, and prioritizes vulnerabilities based on active threats to your environment. The platform compresses your response window from days to hours.