Your Table Keeps Playing. We Take Notes.
Running a tabletop session is already a full-time job. Scribbling notes, looking up rules mid-fight, remembering which NPC owes whom money: none of that should also be on your plate. So we built a Discord bot that sits in your voice channel, listens to the whole session, and hands you back a searchable transcript, live suggestions while you play, and a structured recap when you stop.
Why we built this
Every GM we've talked to describes the same moment at the table. A player asks "wait, didn't the innkeeper say his brother moved to Neverwinter?" and you're suddenly scrolling through last week's notes, or bluffing, or both. Rules questions are worse. Nobody wants to alt-tab out of the fiction for two minutes to find the grappling rules while four people wait.
We already had document search and an AI agent that knows your campaign. But the agent is only useful if you remember to ask it. During an actual session, the GM is too busy being the world to notice they could look something up. What was missing was something that listens on its own and quietly points at the right answer before you finish the question.
The stack, end to end
There are three moving pieces: a transcriber, a suggestions engine, and a note generator. They share the same audio stream but run on different cadences and different models.
Real-time transcription with Mistral Voxtral
The backend exposes a WebSocket endpoint that accepts raw PCM audio (16-bit signed, mono,
16 kHz). Every chunk is forwarded to Voxtral, Mistral's streaming
speech-to-text model (voxtral-mini-transcribe-realtime-2602). Voxtral streams
back four kinds of events: partial word-level deltas, confirmed sentence-level segments
with timestamps, a detected-language signal, and a final done event. Partial deltas arrive
in well under a second, which is the whole point. The AI needs something to react to while
the table is still talking.
The session manager persists segments to the database every two minutes during a live
session, so a three-hour game doesn't depend on the WebSocket surviving for three hours.
When the session ends we also run a pass with the higher-quality batch model
(voxtral-mini-transcribe-2507) on the full audio, chunked into ten-minute
pieces with fifteen-second overlaps and transcribed five chunks at a time. The real-time
transcript is for reactivity; the batch transcript is the one that becomes canon.
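The chunking plan for the batch pass can be sketched in a few lines. This is an illustration only: the names (Chunk, plan_chunks, batches) are hypothetical, and it assumes each ten-minute piece includes its fifteen-second overlap, so a new chunk starts 585 seconds after the previous one.

```python
from dataclasses import dataclass
from typing import List

CHUNK_SECONDS = 10 * 60   # ten-minute pieces
OVERLAP_SECONDS = 15      # each chunk re-covers the last 15 s of the previous one
BATCH_SIZE = 5            # chunks transcribed at a time


@dataclass(frozen=True)
class Chunk:
    start: float  # seconds from the start of the recording
    end: float


def plan_chunks(total_seconds: float) -> List[Chunk]:
    """Split a recording into overlapping chunks (assumed layout, see lead-in)."""
    chunks: List[Chunk] = []
    step = CHUNK_SECONDS - OVERLAP_SECONDS
    start = 0.0
    while start < total_seconds:
        end = min(start + CHUNK_SECONDS, total_seconds)
        chunks.append(Chunk(start, end))
        if end >= total_seconds:
            break  # the last chunk reached the end of the audio
        start += step
    return chunks


def batches(chunks: List[Chunk], size: int = BATCH_SIZE) -> List[List[Chunk]]:
    """Group chunks so that at most `size` are in flight together."""
    return [chunks[i:i + size] for i in range(0, len(chunks), size)]
```

Under these assumptions a three-hour recording becomes 19 chunks, processed as batches of 5, 5, 5, and 4. The overlap exists so that a sentence cut at a chunk boundary appears whole in at least one chunk; how the duplicated text is merged afterwards is not covered here.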
Live suggestions with ministral-3b
Every eight seconds, the recent transcript window (about a minute) plus a rolling summary of the session so far is fed to ministral-3b, a small and very fast Mistral model. It returns zero to three JSON suggestions. Each suggestion is tagged as a question, an entity, or a rule, and shows up in the GM's browser as a clickable card.
- Questions are things the AI thinks the table is about to need answered, like "What's the AC of a bulette?" or "Does the innkeeper know about the heist?"
- Entities are NPCs, locations, or items the agent picked out of the conversation that might be worth pulling up from your campaign documents.
- Rules are rule lookups triggered by phrases like "can I grapple him?" or "what's the save DC?"
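To make the three tags concrete, here is a minimal sketch of how a client might validate the model's output before rendering cards. The wire format is an assumption: a JSON array of objects with hypothetical "kind" and "text" fields. Anything malformed is dropped rather than raised, since a bad suggestion should never interrupt a session.

```python
import json
from dataclasses import dataclass
from typing import List

ALLOWED_KINDS = {"question", "entity", "rule"}
MAX_SUGGESTIONS = 3  # the model returns zero to three suggestions


@dataclass(frozen=True)
class Suggestion:
    kind: str  # "question", "entity", or "rule"
    text: str  # what the clickable card shows


def parse_suggestions(raw: str) -> List[Suggestion]:
    """Parse model output, silently discarding anything malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    cards: List[Suggestion] = []
    for item in data:
        if not isinstance(item, dict):
            continue
        kind, text = item.get("kind"), item.get("text")
        if (isinstance(kind, str) and kind in ALLOWED_KINDS
                and isinstance(text, str) and text.strip()):
            cards.append(Suggestion(kind, text.strip()))
    return cards[:MAX_SUGGESTIONS]
```

An empty list is a perfectly valid result: most eight-second windows contain nothing worth a card.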
Separately, every thirty seconds a larger model (mistral-small-latest) updates
the rolling summary. That's what keeps suggestions coherent as a session goes long. The 3b
model doesn't need to re-read the whole transcript; it just gets the latest summary plus
the last minute of speech. That split is what makes sub-second suggestion latency actually
affordable.
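The two cadences and the one-minute window can be sketched as follows. Everything here is illustrative: the Segment shape, the prompt layout, and the function names are assumptions, and a real scheduler would run on timers rather than a tick counter.

```python
from dataclasses import dataclass
from typing import List

WINDOW_SECONDS = 60    # "about a minute" of recent speech
SUGGEST_EVERY = 8      # seconds between suggestion passes (small model)
SUMMARIZE_EVERY = 30   # seconds between rolling-summary updates (larger model)


@dataclass(frozen=True)
class Segment:
    end: float  # seconds since session start when the segment finished
    text: str


def recent_text(segments: List[Segment], now: float) -> str:
    """Join only the segments that ended within the last WINDOW_SECONDS."""
    cutoff = now - WINDOW_SECONDS
    return " ".join(s.text for s in segments if s.end >= cutoff)


def build_suggestion_input(summary: str, segments: List[Segment], now: float) -> str:
    """The small model sees the summary plus the recent window, never the full transcript."""
    return f"Summary so far:\n{summary}\n\nLast minute:\n{recent_text(segments, now)}"


def due_jobs(tick: int) -> List[str]:
    """Which jobs fire at a given whole second of session time."""
    jobs: List[str] = []
    if tick > 0 and tick % SUGGEST_EVERY == 0:
        jobs.append("suggest")
    if tick > 0 and tick % SUMMARIZE_EVERY == 0:
        jobs.append("summarize")
    return jobs
```

The point of the split is that the input to the small model stays roughly constant in size no matter how long the session runs.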
We picked ministral-3b over a larger model on purpose. A suggestion that arrives thirty seconds late is worse than useless. The table has moved on. A small fast model that nails the obvious cases beats a big slow model that answers the wrong question.
Structured session notes
When the session ends, the full transcript plus the campaign context (name, system, known
cast, exact character spellings) goes to an LLM with a Pydantic schema pinned to the
response. The output is a StructuredNotes object with typed fields,
not free-form markdown:
- A short summary (2–3 sentences) for the "last time on…" recap.
- A long summary (2–3 paragraphs of narrative).
- Key events in chronological order.
- NPCs, locations, combats, loot, and spells that actually came up.
- Plot developments, open threads, and player decisions.
Because it's structured JSON instead of free text, the frontend renders it as a grid of cards, one per category, instead of a wall of markdown. Open threads from session 7 are still visible in session 12. Loot gets tracked per adventure. A model that forgets a field gets caught by Pydantic validation instead of silently producing garbage.
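A schema along these lines shows how validation catches a forgotten field. This is a sketch, not the actual StructuredNotes definition: the field names and the flat lists of strings are assumptions based on the categories listed above, and the real model likely uses richer nested types.

```python
import json
from typing import List

from pydantic import BaseModel, ValidationError


class StructuredNotes(BaseModel):
    # Hypothetical field names; every field is required, so none can be skipped.
    short_summary: str
    long_summary: str
    key_events: List[str]
    npcs: List[str]
    locations: List[str]
    combats: List[str]
    loot: List[str]
    spells: List[str]
    plot_developments: List[str]
    open_threads: List[str]
    player_decisions: List[str]


def parse_notes(raw_json: str) -> StructuredNotes:
    """Validate the model's JSON; raises ValidationError if a field is missing or mistyped."""
    return StructuredNotes(**json.loads(raw_json))
```

A response that omits, say, the loot list fails with a ValidationError naming the missing field, which gives the caller a clear signal to retry instead of rendering an incomplete recap.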
The Discord bot
Almost everyone we talked to was already on Discord. Asking them to also run a desktop recorder (mic plugged in, permissions granted, the right window in focus) was more friction than the feature was worth. So we shipped the whole pipeline as a Discord bot.
You invite LorePanic to your server, link your Discord account to your
LorePanic account with /lp-link, then run /lp-record start from
a voice channel. The bot joins, mutes itself but stays undeafened (so it can still hear the channel), and subscribes to
every speaker in the channel via @discordjs/voice.
Each speaker's audio arrives as Opus packets at 48 kHz stereo. The bot decodes them with
prism-media, mixes all speakers into a single mono stream with additive mixing
and soft clipping (so two people talking at once don't distort the signal), downsamples to 16 kHz,
and chunks the stream into 20 ms frames. Those frames go out over a WebSocket to the same
backend endpoint the browser uses. Authentication is a bot API key plus the Discord user ID
of the GM who started the session, which the backend uses to tie the stream to the right
LorePanic user and campaign.
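The arithmetic of that audio path is easier to see in code. The bot itself runs on Node with @discordjs/voice and prism-media; the Python below is only an illustration of the steps on already-decoded int16 samples. The tanh curve for soft clipping and the three-sample averaging for downsampling are assumptions chosen for brevity, not necessarily what the bot uses.

```python
import math
from typing import List, Sequence

INT16_MAX = 32767
SOURCE_RATE = 48_000
TARGET_RATE = 16_000
FRAME_MS = 20
FRAME_SAMPLES = TARGET_RATE * FRAME_MS // 1000  # 320 samples = 640 bytes of 16-bit PCM


def stereo_to_mono(interleaved: Sequence[int]) -> List[int]:
    """Average interleaved left/right samples into one channel (floor division)."""
    return [(interleaved[i] + interleaved[i + 1]) // 2
            for i in range(0, len(interleaved) - 1, 2)]


def soft_clip(sample: float) -> int:
    """Squash loud sums smoothly toward the int16 limit instead of hard-clipping."""
    return int(round(INT16_MAX * math.tanh(sample / INT16_MAX)))


def mix(speakers: Sequence[Sequence[int]]) -> List[int]:
    """Add mono streams sample by sample; shorter streams count as silence."""
    length = max((len(s) for s in speakers), default=0)
    return [soft_clip(sum(s[i] for s in speakers if i < len(s)))
            for i in range(length)]


def downsample_by_3(samples: Sequence[int]) -> List[int]:
    """48 kHz to 16 kHz by averaging each group of three samples (a crude low-pass)."""
    usable = len(samples) - len(samples) % 3
    return [sum(samples[i:i + 3]) // 3 for i in range(0, usable, 3)]


def frames(samples: Sequence[int]) -> List[List[int]]:
    """Cut into full 20 ms frames; a trailing partial frame is dropped here."""
    usable = len(samples) - len(samples) % FRAME_SAMPLES
    return [list(samples[i:i + FRAME_SAMPLES])
            for i in range(0, usable, FRAME_SAMPLES)]
```

With tanh, quiet speech passes through almost unchanged while two loud speakers summed together approach, but never reach, the int16 limit. A streaming implementation would carry the trailing partial frame over to the next call rather than dropping it.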
The browser-side suggestions panel opens a read-only companion WebSocket to the same session. The GM sees transcript, suggestions, and the rolling summary update live in a tab next to their campaign, while Discord does what it already does well: carry the voice.
Why mix in the bot rather than on the server
We considered sending each speaker as a separate stream and mixing server-side. It would give us per-speaker transcripts "for free." But Voxtral's real-time endpoint expects one stream, credit accounting gets weirder when a session has seven audio channels, and most of the questions the suggestions engine cares about ("what did we just say about the warlock?") don't care who said it. We left per-speaker diarization for a future pass on the batch transcript, where we have more time to spend and a bigger model to spend it on.
What it feels like at the table
You start the session. You forget the bot is there. Around the ninety-second mark, a card appears in the corner of your laptop: "Open the stat block for Sir Vandemar?" because a player just asked how tough Sir Vandemar looks. You click it, the entry slides in, you read off the description. Two minutes later: "Check rule: grappling an ogre". You tap it, the rule appears, you answer, play keeps moving.
At the end of the session you run /lp-record stop. The bot leaves the channel.
A few minutes later you get a notification: session notes are ready. You open the dashboard
and there's a two-sentence recap, a full narrative summary, every NPC that spoke, every
combat that happened, the loot table, the open threads you now have to remember to resolve
next week. You copy the short summary into your Discord #campaign-log channel and
go to bed.
What's next
We're working on a few extensions to this feature:
- Per-speaker transcripts on the batch pass, so the recap can quote players by name.
- Voice commands for the bot itself, e.g. "LorePanic, what did the innkeeper say about his brother?" without breaking immersion to type.
- Automatic campaign-note diffs, where the agent updates your canonical NPC and location pages directly instead of leaving it to the GM.
- In-person mode: a web recorder for groups that play around a physical table, using the same pipeline.
All of this runs on the credit system, so heavy users pay for their own Voxtral and Mistral usage and we don't have to rate-limit anyone into a worse experience. The free tier gives you enough to transcribe a full session with room to spare.
Let a bot take the notes.
Try LorePanic for free