Cleaning up your Anki deck with Claude Code

A skill that finds your worst Anki cards and helps you fix them.

This month I officially crossed two years on my Anki streak. 🎉

Not to brag, but (to brag) that makes me 77th out of about 20,000 people who use the Anki leaderboard. You know, if you care about that sort of thing (which I obviously don’t). (Methinks the lady doth protest too much.)

My Anki leaderboard row: rank 77 (A+), a 2-year (751-day) streak, ~102 reviews per day, 88% retention.

So it’s safe to say Anki is a part of my life now. We’ll stay together until we die. Um, until I die. Or maybe superintelligence takes over and puts all the open-source software behind a paywall I can’t afford (that’s my version of an AI-dystopian nightmare).

My terrible cards

Unfortunately, over the years I’ve made some truly terrible cards. Here are three of my worst—each bad for a different reason.

Terrible Card #1:

Front: Searle equated computer programs to what linguistic component?

Back: Syntax.

“Linguistic component”. What was I thinking? What does that even mean? A component of language. Grammar? Morphology? The cue doesn’t come close to pinning down the answer. One fix would be something like “Searle argues that a program has ___, but no semantics.”

Terrible Card #2:

Front: A function $f$ is measurable on the completion $\mathcal{F}^\mu$ of a $\sigma$-algebra $\mathcal{F}$ if and only if… (and sketch the proof).

Back: …it is a.e. equal to an $\mathcal{F}$-measurable function $g$. Proof: the completion contains every set whose symmetric difference with an $\mathcal{F}$-set is negligible; prove the claim for indicators, then simple functions, then pass to the limit with monotone convergence. Done.

For some unknown reason I thought that (a) I could memorize an entire proof on the back of a flashcard and (b) this was a useful thing to do. I was in a dark place. This flashcard either needs its useful techniques extracted into their own cards, or it should be deleted altogether.

And finally, my favourite:

Lord Farquaad from Shrek saying 'pick number 3, my lord'.

Terrible Card #3:

Front: Define strong artificial intelligence in one phrase.

Back: Machines can duplicate human cognition.

Two bad Searle-adjacent cards?! Maybe I was sick that day. This one is particularly insidious because on the surface it seems like a perfectly reasonable card. But there is no “one phrase” that defines strong artificial intelligence, so it’s virtually impossible to grade my answer to this card. Moreover, it doesn’t capture what I actually cared about when writing it: the difference between strong and weak AI. A card like “What is the difference between strong and weak AI?” would already be a marginal improvement, and even then we should pull out exactly what the key difference is and cue that. This card is difficult to repair.

Finding your bad cards

Anyway, when you’re writing cards regularly, you’re bound to write the occasional dud, and you won’t realize it’s a dud until you start reviewing! Your review data is the best indicator of which cards need rewriting.

But rewriting cards is such a pain! Anki’s interface for editing cards in bulk is terrible. (All due respect to the amazing people who maintain Anki and keep it open source: it’s just not built for bulk-editing cards.)

Most people are familiar with the “leech” tag that Anki adds to cards you’ve failed 8 or more times. But Anki actually computes far more useful metrics than that. For example, if you open the card browser, right-click a column header and enable the “retrievability” column, you can sort cards by how likely you are to recall them. When FSRS chooses how long to wait before it surfaces a card again, it combines retrievability with the card’s difficulty and stability.
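To make the retrievability/stability relationship concrete, here’s a minimal sketch of the FSRS forgetting curve. It assumes the FSRS-4.5/FSRS-5 form with a fixed decay of −0.5 (newer FSRS versions learn the decay from your own reviews), and the function names are illustrative rather than Anki’s or FSRS’s actual API. Stability S is the number of days after which retrievability has fallen to 90%; the interval is simply the point where the curve hits your desired retention.

```python
# Sketch of the FSRS-4.5/FSRS-5 forgetting curve. Assumption: a fixed decay of
# -0.5; newer FSRS versions learn the decay from your own review history.
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that retrievability is exactly 0.9 when t == S


def retrievability(elapsed_days: float, stability: float) -> float:
    """Estimated probability of recalling a card `elapsed_days` after its last review."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def next_interval(stability: float, desired_retention: float = 0.9) -> float:
    """Days until retrievability decays to `desired_retention` (the curve, inverted)."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


if __name__ == "__main__":
    s = 10.0  # hypothetical card with a stability of 10 days
    print(round(retrievability(10, s), 3))  # 0.9
    print(round(next_interval(s), 3))       # 10.0
    print(round(next_interval(s, 0.8), 1))  # 24.0
```

Difficulty doesn’t appear in these two functions because FSRS uses it (together with the card’s retrievability at review time) to update stability after each review; the interval then falls out of the updated stability.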

I use the interval to rank cards by a simple cost: how often a card fails you, divided by how long Anki plans to wait before surfacing it again. The worst cards are not just the ones you keep getting wrong; they’re the ones you keep seeing over and over again. For those interested, I’ve included the exact metric at the end of this post.

A skill to clean up your flashcards

Fortunately, while Claude Code / ChatGPT’s Codex are not particularly gifted at writing cards (see the “memory machines” report by Ozzie Kirkby and Andy Matuschak), they are great at analyzing cards, and they have a much nicer UI than Anki! (The skill uses the framework outlined in this report to “score” cards. Even good LLMs do not categorize the cards perfectly, but the framework is useful.)

The Jordan Peterson 'clean up your room' meme, edited so 'room' is crossed out and 'flashcards' is scrawled above it.

I’ve been using Claude Code to talk to my flashcards for a while. In an email to Nate Meyvis (the developer of Zippyflash) back in April, I wrote:

If you’re not using Claude Code to talk to Anki via Ankiconnect, you’re missing out. :)

I stand by it.

I’m not the first person to point an LLM at Anki—there are a few skills out there already, and even a full Anki MCP server. But almost all of them are built around generating cards, which is the one thing LLMs are reliably bad at. I wanted a skill for fixing the cards that you (or your review data) already know are bad.

So I put together a skill to make it easier for YOU to talk to your flashcards with an LLM. (Although this post was written entirely by me, the skill itself wasn’t: it was written by Claude Opus 4.8 + ChatGPT Pro with my prompting. I’ve tested it and am confident that it can be used for its intended purpose, but as with any AI-written tool you should expect the occasional mistake.) The most important thing about the skill is that it keeps you in control of your cards: you decide what gets rewritten, what gets split up, what gets filled out, and what gets deleted. The LLM is under strict instructions not to do anything to your flashcards without explicit approval! Here’s how it works:

1. Install it by opening Claude Code (or Codex) and typing: “Install this skill: https://github.com/djt97/anki-skill”.
2. Open Anki and make sure you have AnkiConnect installed.
3. Open a new chat window and type /anki (in Claude Code) or $anki (in Codex).
4. The LLM will run a “health check” on your flashcards.
   1. As part of this, it will identify the conventions you use (I add a “pink flag” to any card I come across that I want to edit).
   2. It will also ask about your preferences for fixing cards that you’re struggling to remember.
5. Then you work collaboratively to fix those pesky cards that have been causing you problems!
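Step 2 matters because AnkiConnect is what lets an LLM reach your collection at all: it exposes a local HTTP API that accepts JSON requests. The sketch below is not the skill’s code; it’s a minimal illustration of that interaction, assuming AnkiConnect is listening on its default address (http://127.0.0.1:8765) and using API version 6 with the `findCards` and `cardsInfo` actions. The helper names (`build_payload`, `parse_response`, `invoke`, `flagged_card_stats`) are hypothetical, and `flag:5` is Anki’s search syntax for pink-flagged cards.

```python
import json
import urllib.request

ANKICONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address and port


def build_payload(action: str, **params) -> dict:
    """Wrap an action and its parameters in an AnkiConnect (API version 6) request."""
    return {"action": action, "version": 6, "params": params}


def parse_response(response: dict):
    """Return the `result` field, raising if AnkiConnect reported an error."""
    if response.get("error") is not None:
        raise RuntimeError(f"AnkiConnect error: {response['error']}")
    return response["result"]


def invoke(action: str, **params):
    """POST one action to AnkiConnect; Anki must be open with the add-on installed."""
    body = json.dumps(build_payload(action, **params)).encode("utf-8")
    request = urllib.request.Request(
        ANKICONNECT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return parse_response(json.load(reply))


def flagged_card_stats(flag: int = 5) -> list[dict]:
    """Hypothetical helper: review stats for every card with the given flag (5 = pink)."""
    card_ids = invoke("findCards", query=f"flag:{flag}")
    return [
        {key: card[key] for key in ("cardId", "lapses", "reps", "interval")}
        for card in invoke("cardsInfo", cards=card_ids)
    ]
```

With Anki open, calling `flagged_card_stats()` returns one small dictionary per pink-flagged card: exactly the lapses, review count, and interval that the metric at the end of this post works from.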

Below is a short video of me using the skill for the first time in Claude Code.

The metric

For those interested, the cost of a card is:

\[\text{cost} = \frac{\text{lapse rate}}{\text{interval}}\qquad\text{where lapse rate}=\frac{\text{lapses}}{\text{reviews}}\]

(I was concerned that lapse rate and 1/interval would be collinear because “the cards you fail” are going to be exactly “the cards you see most often”. It turns out they capture different things: across my own collection their correlation is only about 0.1, because a card’s interval is driven mostly by how long it’s been in your collection for, whereas its lapse rate is how often you’ve failed it over its whole history.)

I had initially played with more sophisticated metrics combining different statistics that Anki records, until I realized that the interval given by the FSRS algorithm already bakes in how hard it thinks a card is for you (and FSRS is optimized for your own review data). (Anki’s FSRS scheduler estimates each card’s difficulty, stability, and retrievability, and you can sort by them in the card Browser, though I maintain that editing cards in Anki is a pain.) AnkiConnect unfortunately doesn’t expose those estimates, so the skill uses what it can get: lapses, reviews, and the interval set by FSRS.

Of course, if you manually flag the cards you think are troubling (as I do), the exact details of the metric are less relevant to you. There may even be cards you tend to get right, but feel you don’t understand. FSRS’s estimates won’t pick up on those cards, even though you want to edit them, which is why I flag cards manually.

I hope someone finds this useful!