April was a month. New frontier models in both the US and China, Anthropic generating drama on what felt like a daily cadence, and supply chain attacks getting more creative (and more French). Thank you again to AFC for hand-collecting all of this.
Memory Management (and now Retrieval)
The “how do I handle all these md files” panic is officially universal. RAG and graphRAG are everywhere, but recall isn’t working — building these systems is one thing, getting them to actually work is another.
- For coding, this shows up as context assembly — we wrote about it in Your Agent Is a While Loop.
- Memory Intelligence Agent
- MemU — current favorite for people who haven’t set up a memory system yet
- MEMENTO: Teaching LLMs to Manage Their Own Context
- The main split I keep hearing is Second Brain vs Obsidian. I use both — Obsidian is better for work, especially if you’re juggling many projects.
Self-Learning / Model Self-Help Corner
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning — everyone is big into this lately
- Hermes Agent — what I would consider the OpenClaw killer
- Hindsight: Agent Memory that Learns
Hot take: we keep building memory systems and then watching them fail at recall. The bottleneck isn’t storage, it’s the retrieval step nobody wants to evaluate honestly.
Anthropic Drama Corner
Anthropic literally makes people panic every day to the point that it’s become a meme. Let’s go over the April drama in chronological order:
- March 31 — Claude Code Source Leak
- April 7 — Mythos Preview: Anthropic claims their new model is “too dangerous for public release” — unless you’re in Project Glasswing.
- Required reading: AISI’s take
- April 21 — Mythos Unauthorized Access: Anthropic says there is no proof, but??
- April 23 — Claude Code Quality Postmortem: After 6 weeks of “Claude feels dumber,” Anthropic agrees.
- April 29 — Claude Security drops, which is presumably the outcome of Project Glasswing??
- Related: Cyber Use Case Form launched, reward hacking continues Emotion Concepts paper.
- OpenAI is trying to catch up with their own cyber program.
Side Plot: US Gov vs Anthropic
Bloomberg reported unauthorized users hitting Mythos; Anthropic disputes the claim. If you’re inside Project Glasswing, audit your access logs.
New Model Releases (so many)
US Releases
- GPT-5.4 Image 2 — GPT-5.4 with state-of-the-art image generation from Image 2
- GPT-5.5 — OpenAI’s newest frontier model, SOTA for long-running work across code, data, and tools
- Claude Opus 4.7 — Anthropic’s most capable Opus, built for long-running async agents
- Muse Spark — high on claw eval but nowhere I can actually use it -_-
- Gemma 4 — an amazing tiny model, perfect for local hosting (highly recommended if you want to get into the scene, start here)
Anecdotal verdict on Opus 4.7: stay on 4.6. 4.7 is so frustrating AND costly (1.3x higher). People keep saying Google is off its game, but they keep releasing hits — Anthropic just hoovers up the mindshare because they are the most dramatic.
Chinese Releases
- DeepSeek V4 Pro & V4 Flash — IT’S HERE. Huge jump over V3.2, meeting or surpassing current SOTA across benchmarks.
- DeepSeek seems to be indexing on companionship over coding.
- Guardrails are still incredibly low. DeepSeek remains the winner for red teaming.
- Kimi K2.6 — Moonshot AI’s long-horizon coding model built for sustained agentic work
- Mimo Pro 2.5 — Mimo my beloved. This is my current favorite agentic model right now, and it scores high on claw eval (a transparent benchmark for real-world agents).
Vibehacking Means More Attacks
April was a big one.
- Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain — token squeeze is driving people to unsafe places, surprise, there’s prompt injection.
- Speaking of: prompt injection on webpages has increased 32%, hooray — AI threats in the wild: The current state of prompt injections on the web.
- MCP is designed to be insecure: MCP ‘design flaw’ puts 200k servers at risk. Good thing MCP is dead now.
- At the same time, open source maintainers are drowning thanks to everyone’s panic about vulnerabilities, so: Linux Foundation wants to shield FOSS devs from AI bug slop.
Notable Incidents
- Vercel / Context AI breach — wild supply chain attack that started with Roblox cheats.
- CyberStrikeAI — open-source AI hacking tool compromised 600+ FortiGate firewalls across 55 countries. DeepSeek + Claude.
- Personal favorite: The BuddyBoss Attack: Claude’s Supply-Chain Attack. Aggressively French supply chain attack — watching how the hacker talks to Claude kills me.
Read the BuddyBoss writeup if you ship anything that touches WordPress plugins or AI-assisted dev workflows. The transcript alone is worth the click.
AI Benchmarks Are Unreliable Now
Benchmaxxing is a thing. I don’t really look at benchmarks anymore — they feel like marketing fluff. How models are benchmarking and how they are performing (looking at you, Opus 4.7) is divergent.
- GPT-5.5 scored almost as high as Mythos on CyberGym and took 6 hours to crack.
- AI benchmarks are broken. Here’s what we need instead.
- Reward hacking is a continual problem, especially for Claude: Emotion Concepts and their Function in a Large Language Model.
OpenClaw Status
OpenClaw isn’t really the hype anymore, but:
- State of the Claw — figure that interested me: 1200 vulns disclosed in roughly 2 months. With automated vuln scanning, no wonder OSS repo owners are closing up shop.
- A Karpathy talk also dropped this month.
- OpenClaw and Anthropic are in a never-ending slapfight. Codex is welcoming OpenClaw users, and almost every SOTA provider has their own “claw” now (Kimclaw, Mimoclaw).
- The HERMES billing saga on the Anthropic git is insane (and not the only one).
Design with Claude
Claude Design is out! Surprise — it makes everything look like Claude. Here are some alternatives:
Thank you so much for reading. As always, the news roundup is HAND CURATED by AFC.
// END TRANSMISSION — ALANI-007 //