April was a month. New frontier models in both the US and China, Anthropic generating drama on what felt like a daily cadence, and supply chain attacks getting more creative (and more French). Thank you again to AFC for hand-collecting all of this.


Memory Management (and now Retrieval)

The “how do I handle all these md files” panic is officially universal. RAG and GraphRAG are everywhere, but recall isn’t working: building these systems is one thing, getting them to actually surface the right context is another.

Self-Learning / Model Self-Help Corner

Hot take: we keep building memory systems and then watching them fail at recall. The bottleneck isn’t storage, it’s the retrieval step nobody wants to evaluate honestly.
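If you want to evaluate retrieval honestly, the minimum viable version is recall@k over a small hand-labeled gold set. A sketch of the idea, with a toy keyword-overlap retriever standing in for whatever your real system does (all names here — `naive_retrieve`, `docs`, `gold` — are hypothetical, not from any specific library):

```python
# Sketch: recall@k for a retriever against a tiny hand-labeled gold set.
# naive_retrieve is a deliberately dumb stand-in for your real retriever;
# swap it out and keep the eval.

def naive_retrieve(query: str, docs: dict[str, str], k: int = 3) -> list[str]:
    """Rank docs by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda doc_id: len(q_terms & set(docs[doc_id].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def recall_at_k(gold: dict[str, set[str]], docs: dict[str, str], k: int = 3) -> float:
    """Fraction of queries where at least one relevant doc lands in the top k."""
    hits = sum(
        1 for query, relevant in gold.items()
        if relevant & set(naive_retrieve(query, docs, k))
    )
    return hits / len(gold)

docs = {
    "notes-1": "weekly planning notes and todo items",
    "notes-2": "retrieval eval results for the memory system",
    "notes-3": "meeting transcript about billing",
}
gold = {"how did the retrieval eval go": {"notes-2"}}
print(recall_at_k(gold, docs, k=2))  # 1.0 here: notes-2 makes the top 2
```

Even a few dozen gold query→doc pairs will tell you whether your recall problem lives in the retriever or somewhere downstream, which is exactly the evaluation step most of these md-file memory systems skip.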


Anthropic Drama Corner

Anthropic literally makes people panic every day to the point that it’s become a meme. Let’s go over the April drama in chronological order:

Side Plot: US Gov vs Anthropic

  • US Gov says no one can access Mythos
  • …except them.

Model Risk

Mythos Unauthorized Access (Apr 21)

Bloomberg reported unauthorized users hitting Mythos; Anthropic disputes the claim. If you’re inside Project Glasswing, audit your access logs.


New Model Releases (so many)

US Releases

  • GPT-5.4 Image 2 — GPT-5.4 with state-of-the-art image generation from Image 2
  • GPT-5.5 — OpenAI’s newest frontier model, SOTA for long-running work across code, data, and tools
  • Claude Opus 4.7 — Anthropic’s most capable Opus, built for long-running async agents
  • Muse Spark — scores high on claw eval but isn’t available anywhere I can actually use it -_-
  • Gemma 4 — an amazing tiny model, perfect for local hosting (highly recommended if you want to get into the scene, start here)

Anecdotal verdict on Opus 4.7: stay on 4.6. 4.7 is so frustrating AND costly (1.3x the price). People keep saying Google is off its game, but they keep releasing hits; Anthropic just hoovers up the mindshare because they are the most dramatic.

Chinese Releases

  • DeepSeek V4 Pro & V4 Flash — IT’S HERE. Huge jump over V3.2, meeting or surpassing current SOTA across benchmarks.
    • DeepSeek seems to be indexing on companionship over coding.
    • Guardrails are still incredibly low. DeepSeek remains the winner for red teaming.
  • Kimi K2.6 — Moonshot AI’s long-horizon coding model built for sustained agentic work
  • Mimo Pro 2.5 — Mimo my beloved. This is my favorite agentic model right now, and it scores high on claw eval (a transparent benchmark for real-world agents).

Vibehacking Means More Attacks

April was a big one.


Notable Incidents

Supply Chain
BuddyBoss / Claude

Read the BuddyBoss writeup if you ship anything that touches WordPress plugins or AI-assisted dev workflows. The transcript alone is worth the click.


AI Benchmarks Are Unreliable Now

Benchmaxxing is a thing. I don’t really look at benchmarks anymore; they feel like marketing fluff. How models benchmark and how they actually perform (looking at you, Opus 4.7) have diverged.


OpenClaw Status

OpenClaw isn’t the hype it once was, but:

  • State of the Claw — figure that interested me: 1200 vulns disclosed in roughly 2 months. With automated vuln scanning, no wonder OSS repo owners are closing up shop.
  • A Karpathy talk also dropped this month.
  • OpenClaw and Anthropic are in a never-ending slapfight. Codex is welcoming OpenClaw users, and almost every SOTA provider has their own “claw” now (Kimclaw, Mimoclaw).
  • The HERMES billing saga on the Anthropic git is insane (and not the only one).

Design with Claude

Claude Design is out! Surprise — it makes everything look like Claude. Here are some alternatives:


Thank you so much for reading. As always, the news roundup is HAND CURATED by AFC.


// END TRANSMISSION — ALANI-007 //