Bring Your Own ChatGPT: A Free Tier That Actually Runs on GPT-5
Many of our users already pay OpenAI twenty dollars a month. For a long time our answer to that was a shrug and a pricing page: LorePanic called the API on our account, we passed the cost through as credits, you paid us too. That stopped feeling right. So we built a tier that routes every LLM call, every transcription, every live suggestion through the user's own ChatGPT Plus (or Pro) subscription, and charges them nothing. This post is about how.
The premise
OpenAI ships a terminal tool called Codex CLI.
When you run codex login, it opens a device-code auth flow against
auth.openai.com, and once you approve it on the web, the CLI makes LLM calls
on your behalf against an endpoint at chatgpt.com/backend-api/codex. No
separate API key. No separate bill. It just uses your ChatGPT subscription.
If a CLI can do it, a SaaS backend can too. The rest of this post is the engineering of sitting in the same seat Codex sits in: a device-auth flow we mirror, a Responses endpoint we proxy to, and a transcription path that forks into two different OpenAI products depending on whether you're doing batch or realtime.
We're not the first to try this. Open-source CLIs like OpenClaw shipped the ChatGPT-subscription OAuth path against Codex endpoints first, and OpenCode now offers a similar BYO-subscription experience through a Codex auth plugin. The twist for us was running the same flow inside a hosted SaaS instead of a local CLI. Tokens live on the server, not on the user's laptop, which changes a lot about refresh, storage, and failure modes.
One upfront caveat: this uses OpenAI's sign-in flow in a way that isn't blessed for third-party products in their public docs. We tell users so on the docs page, and if OpenAI ever pushes back we'll turn the tier off and move those users back to a paid plan. Everything below assumes you've read that and are here for the code, not the lawyering.
Step 1: the device-code dance
The Codex CLI's login is a three-request choreography against two different hosts. We reproduced it server-side so the user never leaves LorePanic.
- POST https://auth.openai.com/api/accounts/deviceauth/usercode with a PKCE code_challenge. Response: a user_code (8 chars), a device_auth_id, an interval, and an expires_at about fifteen minutes out.
- The user opens auth.openai.com/codex/device, signs in to ChatGPT, pastes the code, approves.
- Meanwhile we poll POST /api/accounts/deviceauth/token with the device_auth_id and the PKCE code_verifier. It returns HTTP 200 with an authorization_code once the user approves, or one of a handful of pending / expired / denied error codes until then.
- We hand that authorization_code (plus the verifier) to POST https://auth.openai.com/oauth/token with grant_type=authorization_code, and get back the real bundle: access_token, refresh_token, id_token.
The pending state has three distinct error codes that all mean the same thing
(authorization_pending, deviceauth_authorization_unknown,
deviceauth_authorization_pending), and you have to treat all of them as
"keep polling" or the flow drops on the first tick. Terminal states (expired, denied) have
their own pairs. RFC 8628 describes one shape; OpenAI returns a different one nested under
an error object. We handle both.
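Here's a condensed sketch of that choreography in Python, using httpx. The endpoints, the PKCE exchange, the pending-code set, and the flat-vs-nested error shapes are exactly as described above; the specific request-body keys, and whether they travel as JSON or form fields, are our best guess at the wire format rather than a verbatim excerpt of our client.

```python
import base64
import hashlib
import secrets
import time

import httpx

AUTH = "https://auth.openai.com"
PENDING = {"authorization_pending",
           "deviceauth_authorization_unknown",
           "deviceauth_authorization_pending"}


def pkce_pair() -> tuple[str, str]:
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    challenge = base64.urlsafe_b64encode(
        hashlib.sha256(verifier.encode()).digest()).rstrip(b"=").decode()
    return verifier, challenge


def device_login() -> dict:
    verifier, challenge = pkce_pair()
    start = httpx.post(f"{AUTH}/api/accounts/deviceauth/usercode",
                       json={"code_challenge": challenge,
                             "code_challenge_method": "S256"}).json()
    print("Visit auth.openai.com/codex/device and enter:", start["user_code"])

    while True:                                    # poll until approved, expired, or denied
        time.sleep(start.get("interval", 5))
        resp = httpx.post(f"{AUTH}/api/accounts/deviceauth/token",
                          json={"device_auth_id": start["device_auth_id"],
                                "code_verifier": verifier})
        body = resp.json()
        if resp.status_code == 200 and "authorization_code" in body:
            break
        err = body.get("error")                    # flat string or nested {"code": ...}
        code = err.get("code") if isinstance(err, dict) else err
        if code in PENDING:
            continue                               # all three pending codes mean "keep polling"
        raise RuntimeError(f"device auth ended: {code}")

    # Exchange the approval for the real bundle: access_token, refresh_token, id_token.
    return httpx.post(f"{AUTH}/oauth/token",
                      json={"grant_type": "authorization_code",
                            "code": body["authorization_code"],
                            "code_verifier": verifier}).json()
```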
Pending sessions live in a Postgres table keyed by user id, unique-constrained, and pruned
lazily on every new start(). That means the flow survives a backend restart
and works across workers. Restarting the flow from the UI is also the legal way to recover
from a stuck session: delete the row, POST usercode again.
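The persistence layer is deliberately boring; something like this, with table and column names illustrative and psycopg-style placeholders:

```python
# Illustrative DDL; the real table carries a few more bookkeeping columns.
PENDING_SESSIONS_DDL = """
CREATE TABLE IF NOT EXISTS codex_device_auth_sessions (
    user_id        bigint PRIMARY KEY,        -- unique: one pending flow per user
    device_auth_id text        NOT NULL,
    code_verifier  text        NOT NULL,
    expires_at     timestamptz NOT NULL
);
"""


def reset_pending_session(cur, user_id: int) -> None:
    """Lazy prune on every start(): drop expired rows plus any stuck flow for this user."""
    cur.execute(
        "DELETE FROM codex_device_auth_sessions WHERE user_id = %s OR expires_at < now()",
        (user_id,),
    )
    # ...then POST /deviceauth/usercode and INSERT the fresh row.
```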
Step 2: storing and refreshing the token
The interesting fields don't come from the OAuth response. They come from the
id_token. It's an unsigned-from-our-perspective JWT; we base64-decode the
payload without verifying (OpenAI is the authority, not us) and pull three ChatGPT-specific
claims out of a nested key called https://api.openai.com/auth:
- chatgpt_plan_type. plus, pro, or the empty string. This is the field that tells us the user actually has a subscription we can ride on.
- chatgpt_account_id. Sent on every downstream request as a chatgpt-account-id header. Without it, the Responses endpoint returns 403.
- email. Purely cosmetic; we show it in the "Connected as" line.
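Decoding that is a dozen lines. A sketch, assuming the usual base64url JWT payload encoding; the claim names are the ones above, the dict we return around them is ours:

```python
import base64
import json


def chatgpt_claims(id_token: str) -> dict:
    # We don't verify the signature: OpenAI minted the token and OpenAI consumes it.
    payload_b64 = id_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)              # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    auth = payload.get("https://api.openai.com/auth", {})
    return {
        "plan_type":  auth.get("chatgpt_plan_type", ""),      # "plus", "pro", or ""
        "account_id": auth.get("chatgpt_account_id"),         # chatgpt-account-id header value
        "email":      auth.get("email") or payload.get("email"),
    }
```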
Access tokens, refresh tokens, and id tokens are all encrypted at rest with a key held
separately from the database, and decrypted into memory only at the moment of use. When an
access token is within sixty seconds of expiry we do a refresh
(grant_type=refresh_token against the same /oauth/token endpoint)
and re-persist. A 400 or 401 from that endpoint means the user
either unlinked LorePanic from their OpenAI account or let their subscription lapse. We
mark the row as revoked and the UI asks them to reconnect.
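The refresh itself looks roughly like this. The in-memory connection object and the exception name are illustrative; the endpoint, the sixty-second skew, and the revoked-on-400/401 behaviour are the real rules.

```python
import time
from dataclasses import dataclass

import httpx


class ConnectionRevoked(Exception):
    """OpenAI rejected the refresh: the user unlinked us or their plan lapsed."""


@dataclass
class CodexConnection:        # decrypted, in-memory view of the stored row (illustrative)
    access_token: str
    refresh_token: str
    expires_at: float         # unix seconds


def fresh_access_token(conn: CodexConnection, skew: int = 60) -> str:
    if conn.expires_at - time.time() > skew:
        return conn.access_token                   # still comfortably valid

    resp = httpx.post("https://auth.openai.com/oauth/token",
                      json={"grant_type": "refresh_token",
                            "refresh_token": conn.refresh_token})
    if resp.status_code in (400, 401):
        raise ConnectionRevoked()                  # caller marks the row revoked; UI asks to reconnect
    resp.raise_for_status()

    tokens = resp.json()
    conn.access_token = tokens["access_token"]
    conn.refresh_token = tokens.get("refresh_token", conn.refresh_token)
    conn.expires_at = time.time() + tokens.get("expires_in", 3600)
    return conn.access_token                       # caller re-encrypts and re-persists the row
```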
Step 3: routing chat through Codex's Responses endpoint
The chat agent is the easy part, if only because we didn't want to rewrite LangChain's
streaming-plus-tool-calling state machine. We subclass langchain_openai.ChatOpenAI,
point base_url at https://chatgpt.com/backend-api/codex, set
use_responses_api=True, and pass the user's access_token as the
API key. That gets you 90% of the way.
The other 10% is two quirks the Codex endpoint has vs. OpenAI's public Responses API:
- A top-level instructions field is required. Leave it off and you get a 400. Our default fallback is "You are LorePanic, an AI assistant for tabletop RPG Game Masters."
- Messages with role: "system" are rejected outright with a 400 saying "System messages are not allowed". LangChain, left to its own devices, inlines SystemMessage objects as input items. So we intercept _get_request_payload, pull every SystemMessage out of the input list, concatenate their content, and promote the result to the top-level instructions (see the sketch after this list).
Headers matter a lot. The request must look exactly like Codex CLI: User-Agent
set to Codex's string, originator: codex_cli_rs, a fresh UUID as
session_id per request, OpenAI-Beta: responses=experimental.
Deviate on any of these and Cloudflare returns a challenge page instead of a 4xx, so you
get an HTML body where you expected JSON and your client blows up three layers deep. We
learned that the slow way.
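The fingerprint, as a helper we rebuild per request. The User-Agent value here is a placeholder; in practice we copy the literal string the current Codex CLI sends.

```python
import uuid

CODEX_USER_AGENT = "codex_cli_rs/<version>"    # placeholder: mirror the real CLI's UA string


def codex_headers(access_token: str, account_id: str) -> dict:
    return {
        "Authorization": f"Bearer {access_token}",
        "chatgpt-account-id": account_id,
        "User-Agent": CODEX_USER_AGENT,        # httpx's default UA gets a Cloudflare challenge
        "originator": "codex_cli_rs",
        "session_id": str(uuid.uuid4()),       # fresh UUID per request, so call this per call
        "OpenAI-Beta": "responses=experimental",
    }
```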
One constraint we can't route around: the model slug is fixed. The endpoint accepts
gpt-5.4 and that's it. On the BYO tier the model dropdown in the UI is
disabled. Fair trade.
Step 4: transcription, which is actually two problems
This was the surprise. ChatGPT's backend has a transcription endpoint, but only batch. OpenAI's public API has a totally separate realtime transcription product. On the BYO tier we need both: a full session is three hours of audio, and live suggestions need sub-second turnaround. So we fork.
Batch: POST /backend-api/transcribe
The batch path is a plain multipart upload of a WAV chunk to
chatgpt.com/backend-api/transcribe, same bearer token, same Codex header
fingerprint. The response is {text, asset_pointer, asset_ttl, asset_format}.
Two things to know:
- No segment-level timestamps. Voxtral's batch model gives us per-segment timing; this endpoint doesn't. That means the BYO recap loses the "jump to this moment in the audio" feature. The full narrative transcript is still there. You just can't seek.
- Auth errors are load-bearing. A 401 or 403 from this endpoint isn't a transient network blip; it means the access token died between requests. We raise a specific CodexTranscribeAuthError so the session manager triggers a reconnect flow instead of retrying forever (sketched below).
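The whole batch path fits in one function. A sketch, reusing the codex_headers helper from earlier; the multipart field name and timeout are assumptions, while the URL, the 401/403 handling, and the response fields come straight from the description above.

```python
import httpx


class CodexTranscribeAuthError(Exception):
    """The access token died mid-session: reconnect, don't retry."""


def transcribe_chunk(wav_bytes: bytes, access_token: str, account_id: str) -> dict:
    resp = httpx.post(
        "https://chatgpt.com/backend-api/transcribe",
        headers=codex_headers(access_token, account_id),   # same fingerprint as the chat calls
        files={"file": ("chunk.wav", wav_bytes, "audio/wav")},
        timeout=120,
    )
    if resp.status_code in (401, 403):
        raise CodexTranscribeAuthError(resp.text[:200])
    resp.raise_for_status()
    return resp.json()   # {text, asset_pointer, asset_ttl, asset_format}
```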
Realtime: a different host entirely
For live transcription there's no hidden ChatGPT backend endpoint, so we go to OpenAI's
public Realtime API at api.openai.com. But the user's token is a ChatGPT
access token, not an OpenAI API key. So there's a two-step dance:
- POST https://api.openai.com/v1/realtime/client_secrets with the ChatGPT access token and a body of {"session": {"type": "transcription"}}. Response: an ephemeral key prefixed ek_ that's valid for a single short session.
- Open a WebSocket to wss://api.openai.com/v1/realtime?intent=transcription with that ephemeral key as the Authorization header. The first message we send is a session.update that nails down the transcription model (gpt-4o-transcribe). After that, input_audio_buffer.append frames in, transcription.delta events out. Both steps are sketched after this list.
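In code, roughly the following. The minting endpoint, the ek_ key, the query string, and the model name are as described above; the nesting inside the session.update body is our best guess, and the header keyword argument differs between websockets releases.

```python
import json

import httpx
import websockets   # pip install websockets


async def open_live_transcription(access_token: str):
    # 1. Trade the ChatGPT access token for a short-lived ek_ key.
    resp = httpx.post("https://api.openai.com/v1/realtime/client_secrets",
                      headers={"Authorization": f"Bearer {access_token}"},
                      json={"session": {"type": "transcription"}})
    resp.raise_for_status()
    body = resp.json()
    ephemeral = body.get("value") or body["client_secret"]["value"]   # GA vs. beta shape

    # 2. Open the socket and pin the transcription model in the first message.
    ws = await websockets.connect(
        "wss://api.openai.com/v1/realtime?intent=transcription",
        additional_headers={"Authorization": f"Bearer {ephemeral}"},  # extra_headers on older releases
    )
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {"input_audio_transcription": {"model": "gpt-4o-transcribe"}},
    }))
    return ws   # caller streams input_audio_buffer.append in, reads transcription.delta out
```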
OpenAI's Realtime transcription refuses any audio below 24 kHz, but the GM's browser and
the Discord bot both hand us 16 kHz PCM. Rather than forcing clients to change, we upsample
server-side with audioop.ratecv, carrying its state variable between chunks so
there are no boundary artefacts between the 20 ms frames.
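The resampler is a tiny stateful wrapper around the standard library (audioop is deprecated as of Python 3.13, which is its own looming migration):

```python
import audioop


class PcmUpsampler:
    """16 kHz mono s16le in, 24 kHz out; ratecv state carried across 20 ms frames."""

    def __init__(self, in_rate: int = 16_000, out_rate: int = 24_000):
        self.in_rate, self.out_rate = in_rate, out_rate
        self._state = None                 # ratecv's resampler state, threaded between chunks

    def push(self, frame: bytes) -> bytes:
        out, self._state = audioop.ratecv(
            frame, 2, 1,                   # 2-byte samples, mono
            self.in_rate, self.out_rate,
            self._state,
        )
        return out
```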
We could have used the realtime path for both live and batch by buffering audio and driving one long session, but batch is cheaper and more accurate against the ChatGPT backend, so we keep them split. A session ends, the live socket closes, the batch pass runs against the saved WAV, and the two transcripts are merged. The realtime one becomes scrollback. The batch one becomes canon.
What it feels like to build against an undocumented endpoint
Most of our debugging story is in a directory called prototypes/codex_oauth/
that nobody on the team wants to delete. A dozen tiny scripts (probe.py,
probe_device.py, probe_realtime_session.py,
probe_long.py), each one the smallest reproducible case for some new header
or field we had to discover empirically. They exist because:
- Error responses are non-uniform. One endpoint returns {"error": "code"}, another nests it as {"error": {"code": "..."}}, a third returns a Cloudflare HTML page with a 403. (We normalize all three; see the sketch below.)
- Header fingerprints are load-bearing and undocumented. Changing the User-Agent to httpx/0.27.0 because that's the default turns 200s into 403s.
- The "right" shape drifts between the beta and the GA endpoints. The realtime session minter in particular returns a totally different JSON shape in GA ({"value": "ek_..."}) than in beta ({"client_secret": {"value": "ek_..."}}). We accept both.
None of this is secret. The Codex CLI source is open and we read it. But none of it is
stable either. Every API call in this path has a comment pointing at the prototype script
that validated it, so when OpenAI changes something underneath us, the repro is one
python invocation away from the fix.
What this means for pricing
LorePanic has a credit system. Every API call is metered, we apply a markup over the raw
provider cost (the multiple depends on the call type; transcription is much lower than
chat), and users pay out of a balance that refills on their plan. The BYO tier is the one
place that accounting shuts off entirely. When a user's
preferred_llm_provider is set to codex_oauth, the credit deducter
is a no-op. OpenAI bills them for their Plus subscription; we bill them zero.
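The whole pricing change is a guard clause. Schematically, with the model class and markup table standing in for our real metering layer:

```python
from dataclasses import dataclass


@dataclass
class MeteredUser:                      # illustrative stand-in for our account model
    preferred_llm_provider: str
    balance: float


def deduct_credits(user: MeteredUser, call_type: str, raw_cost: float,
                   markup: dict[str, float]) -> float:
    if user.preferred_llm_provider == "codex_oauth":
        return 0.0                      # BYO tier: OpenAI bills them, we bill zero
    charge = raw_cost * markup[call_type]   # the multiple varies; transcription < chat
    user.balance -= charge
    return charge
```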
That only works because the incremental cost to us for a BYO user is genuinely close to zero. We run some Postgres, we proxy some HTTP, we don't pay a provider per token. If everyone flipped to BYO tomorrow we'd still make payroll. The paid tiers exist for users who don't want to connect an OpenAI account, who want word-by-word latency (Voxtral), who want to pick a cheaper model for the agent, or who want segment timestamps in their batch transcripts.
What's next
- Google / Anthropic equivalents. Neither has a device-code flow against a consumer subscription in the way ChatGPT does, but Gemini Advanced has an API path we're exploring, and Claude.ai's Max plan is moving in a similar direction. If it becomes feasible we'll offer the same "connect your account, pay us nothing" trade.
- Streaming tool calls on the BYO tier. Today the agent on Codex's Responses endpoint streams tokens fine but buffers tool-call arguments until complete, which makes multi-step agent turns feel draggy. We think this is fixable on our side.
- Health-check crons. Right now we notice a revoked token when a call fails. We'd rather notice the night before, with a background sweep that pings each connection cheaply and surfaces "reconnect before your next session" prompts ahead of time.
Already paying OpenAI $20/month? Stop paying us too.
Connect your ChatGPT Plus