Building a Language-Learning App from Scratch
Terminology
When I say SRS, I mean "spaced repetition system."
The Problem
I've been learning foreign languages for a while. Once you've internalized the basics (grammar, conjugations), what's left is a steep, seemingly insurmountable wall of vocabulary, and climbing it takes up the bulk of your actual learning time. In the past, this climb was much harder: it meant manually reading, clipping sentences, saving words, and creating flashcards.
The standard method was tedious. You had to look up definitions, and even then, finding distinct examples was hard (reverso.net simply showed usages, which could be duplicated or incomplete), and dictionary definitions wouldn't always give you the right "senses." Take the English verb "lead," for example:
He leads the group (standard use case)
This road leads to Rome (leads in the sense of a path leading to a place)
This led to me looking through the library on a Saturday (as in "this resulted in," a more figurative use case)
They were leading 5 to 0 (in the sports context)
She's led a sheltered lifestyle (followed by the particular noun "life" or "lifestyle")
He's been led astray (in a particular phrase, in combination with "astray")
He's been led on! (phrasal verb, led on)
The events that led up to (phrasal verb)
The article leads with an anecdote from the author (in the sense of an article beginning with something)
It's led me to believe that… (in the sense of to make someone do something)
This would have been very difficult for a non-native speaker to gather. But then came AI, which solved many of these problems: now it's much easier to generate such a list. So I thought: what if we had an SRS that was integrated with text reading itself, so you can see which words you've already learned and easily add new ones?
The gap I kept running into was this: Anki gives you great spaced-repetition scheduling, but it disconnects words from context. Duolingo gamifies learning but doesn't let you study words you've actually encountered. I wanted something in between — a tool that lets me build a deck from real texts I read, and then quizzes me on those exact words in context (and could generate other contexts for that word, for comparison purposes).
This is a write-up of what I built, the technical choices that mattered, and what I'd do differently.
What it does
The core loop is simple:
1. You upload a text (an article, a chapter, anything).
2. The app highlights the words in the text that are already in your deck.
3. You can select any word or phrase and add it to your deck, with AI-generated example sentences and usage notes.
4. Every day, the app shows you words due for review. You see the word and its example sentences, try to recall it, then self-rate (Again / Hard / Good / Easy).
5. The SRS engine schedules the word further into the future based on how well you recalled it.
6. You can choose manual review instead when you don't want to follow the SRS schedule.
Supporting features: lemmatized search across your text corpus, multi-language support (French, Spanish, English, German), multi-deck management, and a subscription paywall.
Stack
Backend
Choice: FastAPI + Uvicorn
I wanted automatic request validation without writing it myself — defining a Pydantic model for request bodies and getting 422 errors for free is a better deal than the equivalent in Flask.
The primary alternatives are Django and Flask.
And why Python?
Auth
Choice: Clerk (JWT)
I didn't want to build login, sessions, OAuth flows, or magic links. Clerk handles all of that; I just validate JWTs on the backend and extract the user ID from the claims. Worth the cost for a solo project.
What are the alternatives to JWT?
Billing
Choice: Stripe
The standard choice for subscription payments. The Checkout and Customer Portal flows mean I don't build any payment UI myself.
Database
Choice: SQLite
The app runs on a single machine and the data is one user's vocabulary deck — there's no concurrency story that would justify the operational overhead of Postgres. SQLite with WAL mode handles concurrent reads fine, and the migration system is dead simple.
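As a minimal sketch of the WAL setup mentioned above, using only Python's standard sqlite3 module: the open_db helper name is an assumption made for illustration, not the app's actual code.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Open a SQLite database file and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # In WAL mode, readers are not blocked while a single writer
    # appends to the log. The setting is stored in the database file.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn
```

WAL only applies to file-backed databases; an in-memory database reports the journal mode "memory" instead.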
NLP
Choice: spaCy
spaCy is a natural language processing (NLP) library for Python. Among other things, it can map inflected word forms such as "has," "had," and "having" back to their base form, "have."
The app needs lemmatization — matching "découvert" to the deck entry for "découvrir" — and spaCy does this well across multiple languages with pre-trained models. The tradeoff is operational: the models aren't bundled with the package and have to be downloaded explicitly in the Dockerfile.
LLM
Choice: OpenAI GPT-4o-mini (pluggable)
Good enough for content generation, cheap, and fast. I wrapped the LLM call in a pluggable client interface so the app can swap in Anthropic or a local Ollama instance without changing calling code. During development I used Ollama to avoid burning API credits.
Frontend
Choice: React 19 + Vite + TanStack Query
Vite for fast dev builds, TanStack Query for server state: canonicalization results are cached with staleTime: Infinity, so clicking the same word twice doesn't hit the API again. TanStack Query (formerly React Query) also keeps server state separate from local UI state.
CSS modules, why
React Router
Protected routes
Vite
Tree-shaking
Accessibility
Security
SSR: there is no SSR here. Its main benefits are SEO, a faster first contentful paint, and Open Graph previews, and this app doesn't rely on any of them.
HSTS
REST
Caching
bfcache
Error handling, network resilience
Source maps
Future: Redux, MUI, or Zustand
Deployment
Choice: Fly.io (single container)
Single container, persistent volume for SQLite and the text corpus, auto-stop/start machines to keep costs near zero when idle.
SQL
Choice: plain SQLite
Why?
In the future, one would have to switch to Postgres eventually. Why?
ORMs
How to Use
The app is organized into eight pages, accessible from the sidebar.
Texts — a possible starting point. Upload plain text files (articles, book chapters, transcripts, anything). The app renders the text with your known vocabulary highlighted inline. Click a highlighted word to see its definition and use cases. Select any word or phrase in the text to add it to your deck or attach it as a new use case for an existing word.
See Words — another potential starting point. Your full vocabulary list. Add words manually, edit notes and definitions, manage use cases, or trigger AI generation of use cases for a word. This alternative interface allows you to add word by word, instead of selecting from a text. This is where you build and curate the deck outside of a reading session.
Add Sentences: a third entry point. Enter a sentence and click individual words to canonicalize them (resolve inflected forms to their dictionary entry) and look up their notes and use cases. Useful for processing sentences you encounter outside the app.
Review — the daily SRS queue. Words due today are presented one at a time. You see the word and its use cases, try to recall it, then reveal the notes and self-rate: Again, Hard, Good, or Easy. The SRS engine reschedules each word based on your rating.
Manual Review — the same interface as Review, but loads all words in the deck regardless of due date and does not record ratings. Use this to browse or self-test without affecting the SRS schedule.
Search — lemmatized search across your uploaded texts. Type a word in any form and the app finds every sentence in your corpus where that word (or any inflection of it) appears.
Table — a sortable list of all words with their SRS metadata: due date, interval, ease factor, streak. Useful for getting an overview of where your deck stands.
Settings — deck management (create new decks, switch the active deck) and subscription management (subscribe, view status, open the Stripe customer portal).
The SRS engine
Spaced repetition is based on the insight that you should review something just before you're about to forget it. Review too soon and you're wasting time; review too late and you've already forgotten.
The algorithm I implemented is a variant of SM-2 (SuperMemo 2), the algorithm that Anki's classic scheduler is derived from. Each word has two key numbers: an ease factor (a multiplier representing how easy the word is for you; harder words have a lower one) and an interval (days until next review). After every review:
Again: reset the interval to 1 day, reduce ease factor.
Hard: multiply the interval by 0.85, reduce ease factor slightly.
Good: multiply the interval by the ease factor (standard progression).
Easy: multiply the interval by 1.3 × ease factor, increase ease factor.
The ease factor is clamped at a minimum of 1.3 — words can only get so hard. After a few months, a word you know well might have an interval of 100+ days. You barely think about it.
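The four rules above can be sketched as a small pure function. The interval multipliers and the 1.3 floor come from the text. The ease adjustments (-0.20, -0.15, +0.15), the starting ease of 2.5, the one-day interval floor, and the names CardState and review are assumptions for illustration.

```python
from dataclasses import dataclass

MIN_EASE = 1.3  # floor stated in the text

# Assumed ease adjustments; the text gives only the direction of each change.
EASE_DELTA = {"again": -0.20, "hard": -0.15, "good": 0.0, "easy": 0.15}


@dataclass
class CardState:
    interval: float = 1.0  # days until the next review
    ease: float = 2.5      # assumed starting ease factor


def review(state: CardState, rating: str) -> CardState:
    """Return the new state after one self-rated review."""
    if rating not in EASE_DELTA:
        raise ValueError(f"unknown rating: {rating}")
    if rating == "again":
        interval = 1.0
    elif rating == "hard":
        interval = state.interval * 0.85
    elif rating == "good":
        interval = state.interval * state.ease
    else:  # "easy": uses the ease factor from before this review's increase
        interval = state.interval * 1.3 * state.ease
    ease = max(MIN_EASE, state.ease + EASE_DELTA[rating])
    # Assumed floor: never schedule less than one day ahead.
    return CardState(interval=max(1.0, interval), ease=ease)
```

With these assumptions, a word at a 10-day interval and ease 2.5 moves to 25 days on Good and back to 1 day on Again.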
The state for each word (interval, ease factor, due date, streak, repetitions) lives in an srs_state table. Every review is also logged to a separate reviews table — append-only history — so I can build analytics later without losing data.
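One way this split between current state and append-only history could look, as a sketch: the table names come from the text, while the column names, types, and the record_review helper are assumptions.

```python
import sqlite3
from datetime import date, timedelta

# Assumed columns; only the table names are taken from the write-up.
SCHEMA = """
CREATE TABLE srs_state (
    word_id     INTEGER PRIMARY KEY,
    interval    REAL NOT NULL,
    ease_factor REAL NOT NULL,
    due_date    TEXT NOT NULL,
    streak      INTEGER NOT NULL DEFAULT 0,
    repetitions INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE reviews (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    word_id     INTEGER NOT NULL,
    rating      TEXT NOT NULL,
    reviewed_at TEXT NOT NULL
);
"""


def record_review(conn, word_id, rating, interval, ease, today: date) -> None:
    """Overwrite the word's current state and append one history row."""
    due = (today + timedelta(days=round(interval))).isoformat()
    with conn:  # both writes commit together, or neither does
        cur = conn.execute(
            "UPDATE srs_state SET interval = ?, ease_factor = ?, due_date = ?, "
            "streak = CASE WHEN ? = 'again' THEN 0 ELSE streak + 1 END, "
            "repetitions = repetitions + 1 WHERE word_id = ?",
            (interval, ease, due, rating, word_id),
        )
        if cur.rowcount == 0:  # first review of this word
            conn.execute(
                "INSERT INTO srs_state VALUES (?, ?, ?, ?, ?, 1)",
                (word_id, interval, ease, due, 0 if rating == "again" else 1),
            )
        conn.execute(
            "INSERT INTO reviews (word_id, rating, reviewed_at) VALUES (?, ?, ?)",
            (word_id, rating, today.isoformat()),
        )
```

Because reviews is only ever inserted into, the scheduling logic can change later without losing the raw history.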
I don't think the particulars of the scheduling algorithm matter as much as how the cards are generated, and I haven't changed the algorithm from what Anki does. Still, it's worth understanding how it calculates intervals and the principles it operates on.
NLP: lemmatization and why it matters
When you search for "running" in a text, you usually want to find "run", "runs", "ran" too. This is lemmatization — reducing a word to its dictionary form.
I used spaCy for this. Each language has a separate model (fr_core_news_sm, es_core_news_sm, etc.) that gets loaded lazily on first use and cached in memory. On top of that, I pre-build a lemma map per deck — a dictionary from lemma string to word record — so that highlight and search queries can run fast without re-tokenizing everything.
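# Simplified highlight logic
def highlight(text, deck_id, language):
    """Return character spans for tokens whose lemma is in the deck."""
    nlp = _get_nlp(language)             # cached spaCy pipeline for the deck's language
    lemma_map = _get_lemma_map(deck_id)  # lemma -> {word_id, word}
    doc = nlp(text)
    return [
        {
            "start": token.idx,                # character offset of the token
            "end": token.idx + len(token),     # len(token) is its length in characters
            "word_id": lemma_map[token.lemma_]["word_id"],
        }
        for token in doc
        if token.lemma_ in lemma_map
    ]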
The lemma map gets invalidated whenever a word is added or deleted from the deck. It's rebuilt on next use. Simple and effective.
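The build-lazily, invalidate-on-change pattern described above might look roughly like this. The LemmaMapCache class and the load_words callable are hypothetical names; the app's own _get_lemma_map may be organized differently.

```python
from typing import Callable, Dict, List

WordRecord = Dict[str, object]


class LemmaMapCache:
    """Per-deck lemma -> word record maps, built lazily and dropped on change."""

    def __init__(self, load_words: Callable[[int], List[WordRecord]]):
        # Hypothetical callable that reads a deck's word records from storage.
        self._load_words = load_words
        self._maps: Dict[int, Dict[str, WordRecord]] = {}

    def get(self, deck_id: int) -> Dict[str, WordRecord]:
        if deck_id not in self._maps:  # built on first use, or after invalidation
            self._maps[deck_id] = {
                str(w["lemma"]): w for w in self._load_words(deck_id)
            }
        return self._maps[deck_id]

    def invalidate(self, deck_id: int) -> None:
        """Call after a word is added to or deleted from the deck."""
        self._maps.pop(deck_id, None)
```

Dropping the whole map is coarser than patching a single entry, but it leaves no way for the cache and the deck to drift apart.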
One complication: the spaCy models aren't bundled with the package — you have to download them separately. In the Dockerfile, this meant explicitly running python -m spacy download fr_core_news_sm (and the other three language models) at build time. That's easy to miss, and I did miss it for the non-French models initially.
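An alternative would be semantic retrieval with embeddings and a vector database, as in RAG, although that matches passages by meaning rather than by word form.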
Lemmatization shows up in three places in the app:
Texts — when rendering a text, every token is lemmatized and looked up in the deck's lemma map. If the lemma matches a known word, that token gets highlighted. This means "découvrons" lights up because "découvrir" is in the deck, even though the exact string doesn't appear.
Search — the query is lemmatized before matching. Type "led" and the search finds sentences containing "lead", "leads", "leading", "led" — any form that shares the same lemma. The corpus sentences are also lemmatized at search time, so the match is lemma-to-lemma rather than string-to-string.
Add Sentences — when you click a word in the entered sentence, the app lemmatizes it to find the canonical form before looking it up in the deck. This is the first step of canonicalization (before the spaCy POS logic and Ollama fallback kick in for multi-word phrases).
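The lemma-to-lemma matching used by Search can be sketched independently of spaCy. Here the lemmatizer is passed in as a plain function, and the toy lookup table stands in for a real model; both are assumptions for illustration. A real pipeline lemmatizes tokens in their sentence context rather than one word at a time.

```python
import re
from typing import Callable, Iterable, List

# Toy stand-in for a real lemmatizer, covering only forms of "lead".
TOY_LEMMAS = {"led": "lead", "leads": "lead", "leading": "lead"}


def toy_lemmatize(word: str) -> str:
    return TOY_LEMMAS.get(word, word)


def search(
    query: str, sentences: Iterable[str], lemmatize: Callable[[str], str]
) -> List[str]:
    """Return the sentences containing any word that shares the query's lemma."""
    target = lemmatize(query.lower())
    hits = []
    for sentence in sentences:
        words = re.findall(r"\w+", sentence.lower())
        if any(lemmatize(w) == target for w in words):
            hits.append(sentence)
    return hits
```

Calling search("led", corpus, toy_lemmatize) matches "This road leads to Rome." because both "led" and "leads" reduce to "lead".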
Canonicalization: what form should I look up?
When a user selects a word in a text and wants to add it to their deck, they're probably looking at an inflected form — "découvert" instead of "découvrir". I need to canonicalize it to its dictionary form before adding it to the deck.
I built a two-tier approach:
1. spaCy first: Fast, local, free. For single tokens, return the lemma. For multi-word phrases, strip function words and reflexive pronouns, then pick the most semantically significant word by POS priority (VERB > NOUN > ADJ > ADV).
2. LLM fallback: If spaCy returns nothing useful (common with complex phrases), fall back to a local LLM with a language-aware prompt.
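A sketch of the two tiers, operating on tokens that have already been tagged: Tok mirrors the text, lemma_, and pos_ attributes of a spaCy token, and llm_fallback is a hypothetical callable wrapping the LLM prompt. The function names are assumptions.

```python
from typing import Callable, List, NamedTuple, Optional


class Tok(NamedTuple):
    text: str
    lemma: str
    pos: str  # coarse POS tag such as "VERB", "NOUN", "PRON"


POS_PRIORITY = ["VERB", "NOUN", "ADJ", "ADV"]


def canonicalize_local(tokens: List[Tok]) -> Optional[str]:
    """First tier: pick a lemma from tagged tokens, or None if nothing fits."""
    if len(tokens) == 1:
        return tokens[0].lemma or None
    # Function words and reflexive pronouns carry tags outside the priority
    # list (DET, ADP, PRON, AUX, ...), so scanning by priority skips them.
    for pos in POS_PRIORITY:
        for tok in tokens:
            if tok.pos == pos:
                return tok.lemma
    return None


def canonicalize(tokens: List[Tok], llm_fallback: Callable[[str], str]) -> str:
    """Try the local tier first; call the LLM only when it finds nothing."""
    local = canonicalize_local(tokens)
    if local:
        return local
    return llm_fallback(" ".join(t.text for t in tokens))
```

For "se souvient de", tagged PRON, VERB, ADP, the first tier returns "souvenir" and the LLM is never called.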
Right now only one LLM is used. In the future, I would split this into a cheap LLM for canonicalization and a more expensive, more sophisticated one for other prompts (like sense generation).
AI content generation
Use case generation (/api/generate): Given a word, ask the LLM to produce a comprehensive set of example sentences covering all meaningful senses — like a lexicographer would. Low temperature (0.2), structured JSON output. The prompt is explicit about what to include: core meanings, prepositional constructions, reflexive forms, idiomatic phrases.
I'm using OpenAI GPT-4o-mini — good enough for this task, cheap, fast. But I wrapped the LLM call in a pluggable client interface so the app can use Anthropic's API or a local Ollama instance without changing any calling code. This came in handy during development when I was running everything locally.
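A pluggable interface of this kind can be expressed with typing.Protocol. In the sketch below, the complete method name, the prompt wording, and the use_cases JSON key are assumptions. FakeClient is a stand-in that returns a canned reply, where a real client would call OpenAI, Anthropic, or Ollama.

```python
import json
from typing import List, Protocol


class CompletionClient(Protocol):
    """Anything with a matching complete() method can be plugged in."""

    def complete(self, prompt: str, temperature: float) -> str: ...


def generate_use_cases(
    client: CompletionClient, word: str, language: str
) -> List[str]:
    """Ask the model for example sentences and parse its JSON reply."""
    prompt = (
        f"You are a lexicographer. List example sentences in {language} covering "
        f"every meaningful sense of '{word}': core meanings, prepositional "
        "constructions, reflexive forms, and idiomatic phrases. "
        'Reply with JSON only: {"use_cases": ["..."]}'
    )
    raw = client.complete(prompt, temperature=0.2)  # low temperature, as in the text
    data = json.loads(raw)
    return [str(s) for s in data["use_cases"]]


class FakeClient:
    """Stand-in for local testing; returns a canned JSON reply."""

    def complete(self, prompt: str, temperature: float) -> str:
        return json.dumps(
            {"use_cases": ["He leads the group.", "This road leads to Rome."]}
        )
```

The calling code depends only on the protocol, so swapping providers means passing in a different object.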
Multi-user design and auth
The app is multi-user with a subscription paywall. Auth is handled by Clerk — I just validate JWTs on the backend with python-jose, extract the user ID from claims["sub"], and use that to scope all data.
Data ownership flows through a chain: subscriptions.user_id → decks.user_id → words.deck_id → srs_state.word_id. Endpoints that accept a word_id directly verify ownership before operating — there's an explicit _get_word_for_user() check rather than trusting the client.
Billing is Stripe subscriptions. The flow:
1. User clicks Subscribe → create a Stripe Checkout session → redirect.
2. Payment completes → Stripe fires a webhook → backend writes status = "active" to the subscriptions table.
3. Every API call (except deck endpoints and billing endpoints) checks require_subscription, which chains off JWT verification.
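Steps 2 and 3 can be sketched without any web framework. The sketch assumes the webhook payload has already passed signature verification and been parsed into a dict, and that the user ID was attached to the Checkout session as client_reference_id. The table columns, the handle_checkout_completed name, and the SubscriptionRequired exception are also assumptions.

```python
import sqlite3

# Assumed minimal table; the write-up only names the table and the status value.
SUBSCRIPTIONS_SCHEMA = (
    "CREATE TABLE subscriptions (user_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
)


class SubscriptionRequired(Exception):
    """Raised when a user without an active subscription hits a paid endpoint."""


def handle_checkout_completed(conn: sqlite3.Connection, event: dict) -> None:
    """Mark the paying user as active; ignore every other event type."""
    if event.get("type") != "checkout.session.completed":
        return
    session = event["data"]["object"]
    user_id = session["client_reference_id"]  # assumed to be set at checkout
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO subscriptions (user_id, status) "
            "VALUES (?, 'active')",
            (user_id,),
        )


def require_subscription(conn: sqlite3.Connection, user_id: str) -> str:
    """Return user_id if the subscription is active, else raise."""
    row = conn.execute(
        "SELECT status FROM subscriptions WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None or row[0] != "active":
        raise SubscriptionRequired(user_id)
    return user_id
```

In the app, the user_id passed to require_subscription would come from the verified JWT, which is what "chains off JWT verification" refers to.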
Deployment
Single container on Fly.io. FastAPI serves both the API and the built React frontend. SQLite and the text corpus live on a persistent Fly.io volume at /data.
The multi-stage Dockerfile builds the React frontend with Node first, then copies the dist/ into the Python image. VITE_CLERK_PUBLISHABLE_KEY is a Docker build arg (it's a public key — safe to bake into the bundle at build time).
SQLite on a single volume is not backed up automatically. Litestream → S3 would fix this and is on the to-do list.
What I'd do differently
Text corpus in the database. Right now, texts are stored as .txt files in a per-user directory. It works, but it means backups require copying both the database and the files directory. Moving texts into a texts table would simplify everything.
What I learned
spaCy is powerful but has operational surface area. The models are separate downloads that have to be explicitly included in your Docker image. It's easy to end up with the model working locally but missing in production.
SQLite is underrated for this kind of app. Multi-user, subscriptions, SRS state — all of it works fine in SQLite on a single machine. The migration system I built (numbered .sql files, applied once and tracked in a schema_migrations table) is dead simple and has never given me trouble.
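A migration runner along these lines fits in a few lines of standard-library Python. The schema_migrations table name comes from the text. The filename column, the apply_migrations name, and the assumption that files are zero-padded (001_init.sql, 002_add_notes.sql) so that sorting by name gives the right order are illustrative.

```python
import sqlite3
from pathlib import Path
from typing import List


def apply_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> List[str]:
    """Apply each numbered .sql file once, in filename order."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (filename TEXT PRIMARY KEY)"
    )
    applied = {
        row[0] for row in conn.execute("SELECT filename FROM schema_migrations")
    }
    newly_applied = []
    for path in sorted(migrations_dir.glob("*.sql")):
        if path.name in applied:
            continue  # already ran on an earlier startup
        conn.executescript(path.read_text(encoding="utf-8"))
        conn.execute(
            "INSERT INTO schema_migrations (filename) VALUES (?)", (path.name,)
        )
        conn.commit()
        newly_applied.append(path.name)
    return newly_applied
```

Running it at every startup is safe, because files already listed in schema_migrations are skipped.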
The LLM pluggability paid off. Being able to swap between OpenAI, Anthropic, and local Ollama during development meant I wasn't burning API credits constantly. The CompletionClient protocol is maybe 10 lines of code and saved a lot of friction.
Clerk + Stripe is a pretty good combination. Clerk handles all the auth complexity (OAuth, magic links, session management), and the JWT validation on the backend is straightforward. Stripe's webhook model is reliable once you understand that local testing uses a different webhook secret than production.
What's next
A few things I want to build:
Use-case-informed quizzes: pass the stored example sentences as context into the quiz prompt, so the LLM can generate harder, more targeted questions.
RAG over the text corpus: the lemmatization and corpus infrastructure is in place; adding a vector index would enable semantic retrieval of relevant passages as quiz context.
Analytics: the reviews table has a complete history of every review session. There's a lot of interesting data in there about learning curves and difficult words.
Terraform: using Terraform for the infrastructure.
AI tutor: a chatbot that knows what you've already done and what you still need to work on.