Bring Your Own ChatGPT: A Free Tier That Actually Runs on GPT-5
Many of our users already pay OpenAI twenty dollars a month. For a long time our answer to that was a shrug and a pricing page: LorePanic called the API on our account, we passed the cost through as credits, you paid us too. That stopped feeling right. So we built a tier that routes every LLM call, every transcription, every live suggestion through the user's own ChatGPT Plus (or Pro) subscription, and charges them nothing. This post is about how.
The premise
OpenAI ships a terminal tool called Codex CLI.
When you run codex login, it opens a device-code auth flow against
auth.openai.com, and once you approve it on the web, the CLI makes LLM calls
on your behalf against an endpoint at chatgpt.com/backend-api/codex. No
separate API key. No separate bill. It just uses your ChatGPT subscription.
If a CLI can do it, a SaaS backend can too. The rest of this post is the engineering of sitting in the same seat Codex sits in: a device-auth flow we mirror, a Responses endpoint we proxy to, and a transcription path that forks into two different OpenAI products depending on whether you're doing batch or realtime.
We're not the first to try this. Open-source CLIs like OpenClaw shipped the ChatGPT-subscription OAuth path against Codex endpoints first, and OpenCode now offers a similar BYO-subscription experience through a Codex auth plugin. The twist for us was running the same flow inside a hosted SaaS instead of a local CLI. Tokens live on the server, not on the user's laptop, which changes a lot about refresh, storage, and failure modes.
One upfront caveat: this uses OpenAI's sign-in flow in a way that isn't blessed for third-party products in their public docs. We tell users so on the docs page, and if OpenAI ever pushes back we'll turn the tier off and move those users back to a paid plan. Everything below assumes you've read that and are here for the code, not the lawyering.
Step 1: the device-code dance
The Codex CLI's login is a three-request choreography against two different hosts. We reproduced it server-side so the user never leaves LorePanic.
- POST https://auth.openai.com/api/accounts/deviceauth/usercode with a PKCE code_challenge. Response: a user_code (8 chars), a device_auth_id, an interval, and an expires_at about fifteen minutes out.
- The user opens auth.openai.com/codex/device, signs in to ChatGPT, pastes the code, approves.
- Meanwhile we poll POST /api/accounts/deviceauth/token with the device_auth_id and the PKCE code_verifier. It returns HTTP 200 with an authorization_code once the user approves, or one of a handful of pending / expired / denied error codes until then.
- We hand that authorization_code (plus the verifier) to POST https://auth.openai.com/oauth/token with grant_type=authorization_code, and get back the real bundle: access_token, refresh_token, id_token.
The pending state has three distinct error codes that all mean the same thing
(authorization_pending, deviceauth_authorization_unknown,
deviceauth_authorization_pending), and you have to treat all of them as
"keep polling" or the flow drops on the first tick. Terminal states (expired, denied) have
their own pairs. RFC 8628 describes one shape; OpenAI returns a different one nested under
an error object. We handle both.
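Here's a condensed sketch of that choreography in Python, using httpx. The endpoints, the PKCE exchange, the pending-code set, and the flat-vs-nested error shapes are exactly as described above; the specific request-body keys, and whether they travel as JSON or form fields, are our best guess at the wire format rather than a verbatim excerpt of our client.

```python
import base64
import hashlib
import secrets
import time

import httpx

AUTH = "https://auth.openai.com"
PENDING = {"authorization_pending",
           "deviceauth_authorization_unknown",
           "deviceauth_authorization_pending"}


def pkce_pair() -> tuple[str, str]:
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    challenge = base64.urlsafe_b64encode(
        hashlib.sha256(verifier.encode()).digest()).rstrip(b"=").decode()
    return verifier, challenge


def device_login() -> dict:
    verifier, challenge = pkce_pair()
    start = httpx.post(f"{AUTH}/api/accounts/deviceauth/usercode",
                       json={"code_challenge": challenge,
                             "code_challenge_method": "S256"}).json()
    print("Visit auth.openai.com/codex/device and enter:", start["user_code"])

    while True:                                    # poll until approved, expired, or denied
        time.sleep(start.get("interval", 5))
        resp = httpx.post(f"{AUTH}/api/accounts/deviceauth/token",
                          json={"device_auth_id": start["device_auth_id"],
                                "code_verifier": verifier})
        body = resp.json()
        if resp.status_code == 200 and "authorization_code" in body:
            break
        err = body.get("error")                    # flat string or nested {"code": ...}
        code = err.get("code") if isinstance(err, dict) else err
        if code in PENDING:
            continue                               # all three pending codes mean "keep polling"
        raise RuntimeError(f"device auth ended: {code}")

    # Exchange the approval for the real bundle: access_token, refresh_token, id_token.
    return httpx.post(f"{AUTH}/oauth/token",
                      json={"grant_type": "authorization_code",
                            "code": body["authorization_code"],
                            "code_verifier": verifier}).json()
```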
Pending sessions live in a Postgres table keyed by user id, unique-constrained, and pruned
lazily on every new start(). That means the flow survives a backend restart
and works across workers. Restarting the flow from the UI is also the legal way to recover
from a stuck session: delete the row, POST usercode again.
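The persistence layer is deliberately boring; something like this, with table and column names illustrative and psycopg-style placeholders:

```python
# Illustrative DDL; the real table carries a few more bookkeeping columns.
PENDING_SESSIONS_DDL = """
CREATE TABLE IF NOT EXISTS codex_device_auth_sessions (
    user_id        bigint PRIMARY KEY,        -- unique: one pending flow per user
    device_auth_id text        NOT NULL,
    code_verifier  text        NOT NULL,
    expires_at     timestamptz NOT NULL
);
"""


def reset_pending_session(cur, user_id: int) -> None:
    """Lazy prune on every start(): drop expired rows plus any stuck flow for this user."""
    cur.execute(
        "DELETE FROM codex_device_auth_sessions WHERE user_id = %s OR expires_at < now()",
        (user_id,),
    )
    # ...then POST /deviceauth/usercode and INSERT the fresh row.
```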
Step 2: storing and refreshing the token
The interesting fields don't come from the OAuth response. They come from the
id_token. It's an unsigned-from-our-perspective JWT; we base64-decode the
payload without verifying (OpenAI is the authority, not us) and pull three ChatGPT-specific
claims out of a nested key called https://api.openai.com/auth:
- chatgpt_plan_type. plus, pro, or the empty string. This is the field that tells us the user actually has a subscription we can ride on.
- chatgpt_account_id. Sent on every downstream request as a chatgpt-account-id header. Without it, the Responses endpoint returns 403.
- email. Purely cosmetic; we show it in the "Connected as" line.
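Decoding that is a dozen lines. A sketch, assuming the usual base64url JWT payload encoding; the claim names are the ones above, the dict we return around them is ours:

```python
import base64
import json


def chatgpt_claims(id_token: str) -> dict:
    # We don't verify the signature: OpenAI minted the token and OpenAI consumes it.
    payload_b64 = id_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)              # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    auth = payload.get("https://api.openai.com/auth", {})
    return {
        "plan_type":  auth.get("chatgpt_plan_type", ""),      # "plus", "pro", or ""
        "account_id": auth.get("chatgpt_account_id"),         # chatgpt-account-id header value
        "email":      auth.get("email") or payload.get("email"),
    }
```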
Access tokens, refresh tokens, and id tokens are all encrypted at rest with a key held
separately from the database, and decrypted into memory only at the moment of use. When an
access token is within sixty seconds of expiry we do a refresh
(grant_type=refresh_token against the same /oauth/token endpoint)
and re-persist. A 400 or 401 from that endpoint means the user
either unlinked LorePanic from their OpenAI account or let their subscription lapse. We
mark the row as revoked and the UI asks them to reconnect.
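The refresh itself looks roughly like this. The in-memory connection object and the exception name are illustrative; the endpoint, the sixty-second skew, and the revoked-on-400/401 behaviour are the real rules.

```python
import time
from dataclasses import dataclass

import httpx


class ConnectionRevoked(Exception):
    """OpenAI rejected the refresh: the user unlinked us or their plan lapsed."""


@dataclass
class CodexConnection:        # decrypted, in-memory view of the stored row (illustrative)
    access_token: str
    refresh_token: str
    expires_at: float         # unix seconds


def fresh_access_token(conn: CodexConnection, skew: int = 60) -> str:
    if conn.expires_at - time.time() > skew:
        return conn.access_token                   # still comfortably valid

    resp = httpx.post("https://auth.openai.com/oauth/token",
                      json={"grant_type": "refresh_token",
                            "refresh_token": conn.refresh_token})
    if resp.status_code in (400, 401):
        raise ConnectionRevoked()                  # caller marks the row revoked; UI asks to reconnect
    resp.raise_for_status()

    tokens = resp.json()
    conn.access_token = tokens["access_token"]
    conn.refresh_token = tokens.get("refresh_token", conn.refresh_token)
    conn.expires_at = time.time() + tokens.get("expires_in", 3600)
    return conn.access_token                       # caller re-encrypts and re-persists the row
```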
Step 3: routing chat through Codex's Responses endpoint
The chat agent is the easy part, if only because we didn't want to rewrite LangChain's
streaming-plus-tool-calling state machine. We subclass langchain_openai.ChatOpenAI,
point base_url at https://chatgpt.com/backend-api/codex, set
use_responses_api=True, and pass the user's access_token as the
API key. That gets you 90% of the way.
The other 10% is two quirks the Codex endpoint has vs. OpenAI's public Responses API:
- A top-level instructions field is required. Leave it off and you get a 400. Our default fallback is "You are LorePanic, an AI assistant for tabletop RPG Game Masters."
- Messages with role: "system" are rejected outright with a 400 saying "System messages are not allowed". LangChain, left to its own devices, inlines SystemMessage objects as input items. So we intercept _get_request_payload, pull every SystemMessage out of the input list, concatenate their content, and promote the result to the top-level instructions (see the sketch after this list).
Headers matter a lot. The request must look exactly like Codex CLI: User-Agent
set to Codex's string, originator: codex_cli_rs, a fresh UUID as
session_id per request, OpenAI-Beta: responses=experimental.
Deviate on any of these and Cloudflare returns a challenge page instead of a 4xx, so you
get an HTML body where you expected JSON and your client blows up three layers deep. We
learned that the slow way.
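The fingerprint, as a helper we rebuild per request. The User-Agent value here is a placeholder; in practice we copy the literal string the current Codex CLI sends.

```python
import uuid

CODEX_USER_AGENT = "codex_cli_rs/<version>"    # placeholder: mirror the real CLI's UA string


def codex_headers(access_token: str, account_id: str) -> dict:
    return {
        "Authorization": f"Bearer {access_token}",
        "chatgpt-account-id": account_id,
        "User-Agent": CODEX_USER_AGENT,        # httpx's default UA gets a Cloudflare challenge
        "originator": "codex_cli_rs",
        "session_id": str(uuid.uuid4()),       # fresh UUID per request, so call this per call
        "OpenAI-Beta": "responses=experimental",
    }
```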
One constraint we can't route around: the model slug is fixed. The endpoint accepts
gpt-5.4 and that's it. On the BYO tier the model dropdown in the UI is
disabled. Fair trade.
Step 4: transcription, which is actually two problems
This was the surprise. ChatGPT's backend has a transcription endpoint, but only batch. OpenAI's public API has a totally separate realtime transcription product. On the BYO tier we need both: a full session is three hours of audio, and live suggestions need sub-second turnaround. So we fork.
Batch: POST /backend-api/transcribe
The batch path is a plain multipart upload of a WAV chunk to
chatgpt.com/backend-api/transcribe, same bearer token, same Codex header
fingerprint. The response is {text, asset_pointer, asset_ttl, asset_format}.
Two things to know:
- No segment-level timestamps. Voxtral's batch model gives us per-segment timing; this endpoint doesn't. That means the BYO recap loses the "jump to this moment in the audio" feature. The full narrative transcript is still there. You just can't seek.
- Auth errors are load-bearing. A 401 or 403 from this endpoint isn't a transient network blip; it means the access token died between requests. We raise a specific CodexTranscribeAuthError so the session manager triggers a reconnect flow instead of retrying forever (sketched below).
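The whole batch path fits in one function. A sketch, reusing the codex_headers helper from earlier; the multipart field name and timeout are assumptions, while the URL, the 401/403 handling, and the response fields come straight from the description above.

```python
import httpx


class CodexTranscribeAuthError(Exception):
    """The access token died mid-session: reconnect, don't retry."""


def transcribe_chunk(wav_bytes: bytes, access_token: str, account_id: str) -> dict:
    resp = httpx.post(
        "https://chatgpt.com/backend-api/transcribe",
        headers=codex_headers(access_token, account_id),   # same fingerprint as the chat calls
        files={"file": ("chunk.wav", wav_bytes, "audio/wav")},
        timeout=120,
    )
    if resp.status_code in (401, 403):
        raise CodexTranscribeAuthError(resp.text[:200])
    resp.raise_for_status()
    return resp.json()   # {text, asset_pointer, asset_ttl, asset_format}
```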
Realtime: a different host entirely
For live transcription there's no hidden ChatGPT backend endpoint, so we go to OpenAI's
public Realtime API at api.openai.com. But the user's token is a ChatGPT
access token, not an OpenAI API key. So there's a two-step dance:
- POST https://api.openai.com/v1/realtime/client_secrets with the ChatGPT access token and a body of {"session": {"type": "transcription"}}. Response: an ephemeral key prefixed ek_ that's valid for a single short session.
- Open a WebSocket to wss://api.openai.com/v1/realtime?intent=transcription with that ephemeral key as the Authorization header. The first message we send is a session.update that nails down the transcription model (gpt-4o-transcribe). After that, input_audio_buffer.append frames in, transcription.delta events out. Both steps are sketched after this list.
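In code, roughly the following. The minting endpoint, the ek_ key, the query string, and the model name are as described above; the nesting inside the session.update body is our best guess, and the header keyword argument differs between websockets releases.

```python
import json

import httpx
import websockets   # pip install websockets


async def open_live_transcription(access_token: str):
    # 1. Trade the ChatGPT access token for a short-lived ek_ key.
    resp = httpx.post("https://api.openai.com/v1/realtime/client_secrets",
                      headers={"Authorization": f"Bearer {access_token}"},
                      json={"session": {"type": "transcription"}})
    resp.raise_for_status()
    body = resp.json()
    ephemeral = body.get("value") or body["client_secret"]["value"]   # GA vs. beta shape

    # 2. Open the socket and pin the transcription model in the first message.
    ws = await websockets.connect(
        "wss://api.openai.com/v1/realtime?intent=transcription",
        additional_headers={"Authorization": f"Bearer {ephemeral}"},  # extra_headers on older releases
    )
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {"input_audio_transcription": {"model": "gpt-4o-transcribe"}},
    }))
    return ws   # caller streams input_audio_buffer.append in, reads transcription.delta out
```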
OpenAI's Realtime transcription refuses any audio below 24 kHz, but the GM's browser and
the Discord bot both hand us 16 kHz PCM. Rather than forcing clients to change, we upsample
server-side with audioop.ratecv, carrying its state variable between chunks so
there are no boundary artefacts between the 20 ms frames.
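The resampler is a tiny stateful wrapper around the standard library (audioop is deprecated as of Python 3.13, which is its own looming migration):

```python
import audioop


class PcmUpsampler:
    """16 kHz mono s16le in, 24 kHz out; ratecv state carried across 20 ms frames."""

    def __init__(self, in_rate: int = 16_000, out_rate: int = 24_000):
        self.in_rate, self.out_rate = in_rate, out_rate
        self._state = None                 # ratecv's resampler state, threaded between chunks

    def push(self, frame: bytes) -> bytes:
        out, self._state = audioop.ratecv(
            frame, 2, 1,                   # 2-byte samples, mono
            self.in_rate, self.out_rate,
            self._state,
        )
        return out
```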
We could have used the realtime path for both live and batch by buffering audio and driving one long session, but batch is cheaper and more accurate against the ChatGPT backend, so we keep them split. A session ends, the live socket closes, the batch pass runs against the saved WAV, and the two transcripts are merged. The realtime one becomes scrollback. The batch one becomes canon.
What it feels like to build against an undocumented endpoint
Most of our debugging story is in a directory called prototypes/codex_oauth/
that nobody on the team wants to delete. A dozen tiny scripts (probe.py,
probe_device.py, probe_realtime_session.py,
probe_long.py), each one the smallest reproducible case for some new header
or field we had to discover empirically. They exist because:
- Error responses are non-uniform. One endpoint returns {"error": "code"}, another nests it as {"error": {"code": "..."}}, a third returns a Cloudflare HTML page with a 403. (We normalize all three; see the sketch below.)
- Header fingerprints are load-bearing and undocumented. Changing the User-Agent to httpx/0.27.0 because that's the default turns 200s into 403s.
- The "right" shape drifts between the beta and the GA endpoints. The realtime session minter in particular returns a totally different JSON shape in GA ({"value": "ek_..."}) than in beta ({"client_secret": {"value": "ek_..."}}). We accept both.
None of this is secret. The Codex CLI source is open and we read it. But none of it is
stable either. Every API call in this path has a comment pointing at the prototype script
that validated it, so when OpenAI changes something underneath us, the repro is one
python invocation away from the fix.
What this means for pricing
LorePanic has a credit system. Every API call is metered, we apply a markup over the raw
provider cost (the multiple depends on the call type; transcription is much lower than
chat), and users pay out of a balance that refills on their plan. The BYO tier is the one
place that accounting shuts off entirely. When a user's
preferred_llm_provider is set to codex_oauth, the credit deducter
is a no-op. OpenAI bills them for their Plus subscription; we bill them zero.
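The whole pricing change is a guard clause. Schematically, with the model class and markup table standing in for our real metering layer:

```python
from dataclasses import dataclass


@dataclass
class MeteredUser:                      # illustrative stand-in for our account model
    preferred_llm_provider: str
    balance: float


def deduct_credits(user: MeteredUser, call_type: str, raw_cost: float,
                   markup: dict[str, float]) -> float:
    if user.preferred_llm_provider == "codex_oauth":
        return 0.0                      # BYO tier: OpenAI bills them, we bill zero
    charge = raw_cost * markup[call_type]   # the multiple varies; transcription < chat
    user.balance -= charge
    return charge
```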
That only works because the incremental cost to us for a BYO user is genuinely close to zero. We run some Postgres, we proxy some HTTP, we don't pay a provider per token. If everyone flipped to BYO tomorrow we'd still make payroll. The paid tiers exist for users who don't want to connect an OpenAI account, who want word-by-word latency (Voxtral), who want to pick a cheaper model for the agent, or who want segment timestamps in their batch transcripts.
What's next
- Google / Anthropic equivalents. Neither has a device-code flow against a consumer subscription in the way ChatGPT does, but Gemini Advanced has an API path we're exploring, and Claude.ai's Max plan is moving in a similar direction. If it becomes feasible we'll offer the same "connect your account, pay us nothing" trade.
- Streaming tool calls on the BYO tier. Today the agent on Codex's Responses endpoint streams tokens fine but buffers tool-call arguments until complete, which makes multi-step agent turns feel draggy. We think this is fixable on our side.
- Health-check crons. Right now we notice a revoked token when a call fails. We'd rather notice the night before, with a background sweep that pings each connection cheaply and surfaces "reconnect before your next session" prompts ahead of time.
Already paying OpenAI $20/month? Stop paying us too.
Connect your ChatGPT Plus