imbytecat 9e6b4dc7fa feat: add 16:9 and 2K size presets for gpt-image-2
- Add 1536x864, 864x1536, 2048x1152, 1152x2048, 2560x1440 to the size
  dropdown. 2560x1440 is the OpenAI cookbook's recommended upper
  widescreen reliability boundary
- Note that sizes above 2560x1440 are experimental and the max-edge
  rule is strictly <3840 (3840x2160 is out, use 3824x2144 instead)
- 1920x1080 is invalid because 1080 isn't a multiple of 16 — captured
  in AGENTS.md alongside the full constraint set so future agents stop
  proposing it
2026-05-19 00:24:13 +08:00

# AGENTS.md
Bun + Hono server that proxies an OpenAI-compatible image endpoint and serves
a small vanilla TS playground. SSE end-to-end: streams gpt-image partial
previews through, with keepalive comments that survive Cloudflare's 120s
proxy-read timeout.
## Runtime
- Bun, not Node. See `CLAUDE.md` for the full Bun-vs-Node cheatsheet
(prefer `Bun.serve`, `Bun.file`, `bun:test`, `Bun.sql`, etc.). Do not add
`dotenv` — Bun loads `.env` automatically.
- Bun version baseline: `1.3.13` (per `README.md`).
## Config
Required env vars (validated at startup via `requireEnv`; the process exits
if any is missing):
| Var | Example | Purpose |
|---|---|---|
| `BASE_URL` | `https://api.openai.com/v1` | OpenAI-compatible base URL |
| `API_KEY` | `sk-…` | Bearer token sent to upstream |
| `MODEL` | `gpt-image-2` | Model name forwarded to upstream |
`.env.example` is the source of truth for variable names. The real `.env`
is gitignored. Restart the server after changing env vars — they are read
once at module load.
These secrets stay **server-side**. The browser only sends `prompt`,
`size`, and `referenceImages`. Combined with the `0.0.0.0` bind, this
means anyone reachable on the network can spend your upstream quota —
bind to `127.0.0.1` or put auth in front if that matters.
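
As a rough illustration of the startup validation described above, here is a minimal sketch. It is an assumption about the shape of `requireEnv`, not a copy of the real helper in `index.ts`: the signature and the error text are hypothetical, and it throws instead of exiting so it can be exercised in isolation.

```typescript
// Hypothetical sketch: the real `requireEnv` may have a different signature.
type Env = Record<string, string | undefined>;

function requireEnv<K extends string>(
  env: Env,
  names: readonly K[],
): Record<K, string> {
  // Treat unset and empty-string values the same way.
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    // The real server exits the process at this point; throwing keeps the
    // sketch testable.
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const out = {} as Record<K, string>;
  for (const name of names) out[name] = env[name] as string;
  return out;
}

// Possible usage at module load (Bun has already read `.env` by then):
// const { BASE_URL, API_KEY, MODEL } =
//   requireEnv(process.env, ["BASE_URL", "API_KEY", "MODEL"]);
```

Reading the values once into constants matches the note above that env changes need a restart.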
## Commands
- Install: `bun install`
- Dev (HMR): `bun run dev` → `bun --hot ./index.ts`
- Start: `bun run start` → `bun ./index.ts`
- Typecheck: no script defined. Use `bunx tsc --noEmit` (tsconfig already sets
`noEmit: true`, so plain `bunx tsc` works too).
- Tests / lint / formatter: none configured. If adding tests, use `bun test`.
The server binds `0.0.0.0` (see `index.ts`), so it is reachable from other
hosts on the network. Anyone who can reach it can spend the upstream quota
tied to the server's `API_KEY`.
Bun's dev server auto-serves `/.well-known/appspecific/com.chrome.devtools.json`
advertising the project root to Chrome DevTools' "Automatic Workspace Folders".
Sandboxed browsers (Flatpak/Snap) reject the path with
`Unable to add filesystem: <illegal path>`. Disabled via
`development.chromeDevToolsAutomaticWorkspaceFolders: false`.
## Architecture
Three files do everything:
- `index.ts` — Hono app mounted under `Bun.serve` (`fetch: app.fetch`).
- `routes: { "/": index }` serves `index.html` via Bun's HTML bundler;
everything else falls through to Hono.
- `idleTimeout: 255` (max) — `Bun.serve`'s 10s default kills SSE
connections before the first keepalive can fire. The symptom is an
empty EventStream in DevTools and `request timed out after 10 seconds`
in the log.
- `POST /api/generate` uses `streamSSE` from `hono/streaming`. Accepts
`{ prompt, size, referenceImages? }`; `BASE_URL`, `API_KEY`, and
`MODEL` come from env, not the request. Emits:
- `event: partial` → `{ image: dataUrl, index }` for each
`image_generation.partial_image` / `image_edit.partial_image`.
- `event: final` → `{ image: dataUrl }` for `*.completed`.
- `event: done` — empty payload, sent before stream ends.
- `event: error` → `{ message }` for any failure.
- First write is `: connected\n\n` so the browser/EventStream tab
becomes responsive immediately; then a `: keepalive\n\n` raw comment
every 15s.
- Upstream dispatch:
- `referenceImages` present → `POST {baseURL}/images/edits` as
`multipart/form-data` (blobs decoded from data URLs via
`decodeDataUrl` → `Uint8Array<ArrayBuffer>`). Single reference uses
field name `image`; **two or more references use `image[]`** to
match OpenAI's documented array syntax — strict gateways reject
repeated `image` parts with a `duplicate_parameter` 400.
- Otherwise → `POST {baseURL}/images/generations` as JSON.
- Always sends `stream: true, partial_images: 2` first. On a 400 that
mentions `stream` or `partial_images` (see
`isStreamingUnsupportedError`), retries once with `stream: false`
and replays the JSON response as a single `final` event via
`forwardUpstreamJSON`. Any other 4xx/5xx becomes an `error` event.
- `AbortController` wired to `stream.onAbort()` and threaded as `signal`
into every upstream `fetch`. The `catch` branch is suppressed when
`signal.aborted` so closed tabs don't spam the log.
- Targets the **gpt-image series only** (gpt-image-2 default). Do not
reintroduce DALL·E-only fields like `response_format` — gpt-image
always returns `b64_json`.
- `gpt-image-2` size constraints (per the OpenAI cookbook): both edges
multiple of 16, max edge **< 3840**, long/short ≤ 3:1, total pixels
in 655,360–8,294,400, `auto` not supported. Exact 16:9 requires
`16k × 9k` with `k` a multiple of 16 (so `1280×720`, `1536×864`,
`2048×1152`, `2560×1440`, …). Sizes above 2560×1440 are
experimental — the popular 4K target `3840×2160` violates the
`< 3840` rule, round down to `3824×2144` if you need it. Common
misses: `1920×1080` is **not valid** (1080 % 16 ≠ 0).
- `client.ts` — browser entry, loaded via `<script type="module"
src="./client.ts">` in `index.html`. Bun's bundler resolves the import,
inlines `@microsoft/fetch-event-source`, and serves the bundle from
`/_bun/client/index-*.js`. **Inline `<script type="module">` blocks are
not bundled by Bun** — any client JS that imports from `node_modules`
must live in a separate file.
- Uses `fetchEventSource` instead of hand-rolled `fetch` +
`ReadableStream` SSE parsing. It supports POST + body, custom headers,
`signal`, and the `onopen` / `onmessage` / `onerror` callbacks.
- On `done`, the client calls `abort.abort()` to terminate the
`fetchEventSource` loop cleanly — otherwise it would retry forever.
- Text fields (`size`, `prompt`) persist in `localStorage` under the
`aip:<field>` prefix. Reference images stay in-memory only.
- `index.html` — markup + inline CSS only. No JS lives here.
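
The reference-image field naming described under "Upstream dispatch" above could be sketched roughly as follows. `imageFieldName`, `appendReferenceImages`, and the `reference-N.png` filenames are hypothetical names chosen for illustration; the real code in `index.ts` may be organised differently.

```typescript
// Hypothetical helper: one reference uses `image`, two or more use `image[]`.
function imageFieldName(count: number): "image" | "image[]" {
  return count > 1 ? "image[]" : "image";
}

// Appends every reference image under the same field name so that strict
// gateways see either a single `image` part or a proper `image[]` array.
function appendReferenceImages(form: FormData, images: readonly Blob[]): void {
  const field = imageFieldName(images.length);
  images.forEach((blob, i) => {
    // The filename is an assumption; some upstreams want one on file parts.
    form.append(field, blob, `reference-${i}.png`);
  });
}

// Example: two references end up as two `image[]` parts.
const exampleForm = new FormData();
appendReferenceImages(exampleForm, [
  new Blob([new Uint8Array(new ArrayBuffer(1))], { type: "image/png" }),
  new Blob([new Uint8Array(new ArrayBuffer(1))], { type: "image/png" }),
]);
```

Deciding the field name once, from the count, avoids mixing `image` and `image[]` parts in the same request.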
No router, no DB, no auth, no AI SDK. The API key is read from env on the
server and is never supplied by or sent to the browser.
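
The `gpt-image-2` size rules listed under `index.ts` above can be checked mechanically. The sketch below encodes exactly the constraints stated there (multiples of 16, max edge below 3840, long/short at most 3:1, pixel count within the stated range, no `auto`). `checkSize` is a hypothetical helper for illustration, not something this repo is known to contain.

```typescript
type SizeCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical validator for "WIDTHxHEIGHT" strings such as "2560x1440".
function checkSize(size: string): SizeCheck {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) {
    return { ok: false, reason: "expected WIDTHxHEIGHT (`auto` is not supported)" };
  }
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width % 16 !== 0 || height % 16 !== 0) {
    return { ok: false, reason: "both edges must be multiples of 16" };
  }
  if (Math.max(width, height) >= 3840) {
    return { ok: false, reason: "max edge must be < 3840" };
  }
  const pixels = width * height;
  if (pixels < 655_360 || pixels > 8_294_400) {
    return { ok: false, reason: "total pixels must be in 655,360 to 8,294,400" };
  }
  // Safe to divide: the pixel check above already rejected zero-length edges.
  if (Math.max(width, height) / Math.min(width, height) > 3) {
    return { ok: false, reason: "long/short ratio must be at most 3:1" };
  }
  return { ok: true };
}

// checkSize("1920x1080") fails the multiple-of-16 rule (1080 % 16 is 8).
// checkSize("3840x2160") fails the max-edge rule; "3824x2144" passes.
```

Running a check like this before offering a preset in the size dropdown would catch proposals such as `1920x1080` early.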
## TypeScript conventions
`tsconfig.json` is strict with bundler-mode resolution:
- `strict`, `noUncheckedIndexedAccess`, `noImplicitOverride`,
`noFallthroughCasesInSwitch` are on — array/object index access is
`T | undefined` and must be narrowed.
- `verbatimModuleSyntax` + `moduleDetection: "force"` — use `import type` for
type-only imports; every file is a module.
- `allowImportingTsExtensions` is on; `.ts` extensions in imports are fine.
- `lib: ["ESNext", "DOM", "DOM.Iterable"]` — DOM globals are in scope for
`client.ts`. The server file uses Bun globals from `@types/bun`; the
overlap (`fetch`, `Response`, `Blob`, `FormData`) resolves to Web
standards, which is what we want.
- TS 5.7+ split `Uint8Array` into `Uint8Array<ArrayBuffer>` vs
`Uint8Array<SharedArrayBuffer>`. DOM's `Blob`/`BufferSource` requires the
former. Allocate via `new Uint8Array(new ArrayBuffer(n))` (see
`decodeDataUrl`) rather than `new Uint8Array(n)` — the latter widens to
`ArrayBufferLike` and fails to satisfy `BlobPart`.
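
To make the allocation advice above concrete, here is a minimal sketch of a data URL decoder that follows it. `decodeDataUrlSketch` is a hypothetical stand-in: the real `decodeDataUrl` may parse differently, and this version assumes base64 data URLs only.

```typescript
// Hypothetical sketch; handles only `data:<mime>;base64,<payload>` URLs.
function decodeDataUrlSketch(dataUrl: string) {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) throw new Error("expected a base64 data URL");
  const mime = match[1] ?? "application/octet-stream";
  const binary = atob(match[2] ?? "");
  // Allocate over an explicit ArrayBuffer, as recommended above, so the
  // array is backed by a plain ArrayBuffer rather than a wider buffer type.
  const bytes = new Uint8Array(new ArrayBuffer(binary.length));
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return { mime, bytes };
}

// The result can be wrapped for multipart upload:
// const { mime, bytes } = decodeDataUrlSketch(url);
// const part = new Blob([bytes], { type: mime });
```

The return type is left to inference on purpose, so the sketch does not hard-code a typed-array annotation that older TypeScript versions reject.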
## When extending the API
- Routes live on the Hono `app`. For long-running upstream calls, mirror
the existing pattern:
- `return streamSSE(c, async (stream) => { … })`
- `stream.onAbort(() => abortController.abort())` at the top
- `await stream.write(": connected\n\n")` to flush headers immediately
- `setInterval(() => stream.write(": keepalive\n\n").catch(() => {}),
15_000)` and `clearInterval` in `finally`
- `stream.writeSSE({ event, data: JSON.stringify(payload) })` for
application events
- Catch errors and check `signal.aborted` before emitting `error` —
otherwise every closed tab logs noise.
- Send the optimistic request to upstream first; detect the specific 400
via `isStreamingUnsupportedError` and retry with a degraded body rather
than feature-detecting up front.
- Decode incoming data URLs with `decodeDataUrl` and pass the typed
`Uint8Array<ArrayBuffer>` directly as a `Blob` part in `FormData`.
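
The "optimistic request first, degrade on a specific 400" pattern above might look roughly like this. Everything here is an illustrative assumption: `Send`, `isStreamingUnsupported`, and `sendWithFallback` are hypothetical names, the upstream call is injected rather than using `fetch` directly, and the real `isStreamingUnsupportedError` may inspect the response differently.

```typescript
type UpstreamResult = { status: number; text: string };
// Injected transport so the sketch does not depend on a real upstream.
type Send = (body: Record<string, unknown>) => Promise<UpstreamResult>;

// True only for a 400 whose body mentions the streaming parameters.
function isStreamingUnsupported(status: number, text: string): boolean {
  return status === 400 && /stream|partial_images/i.test(text);
}

async function sendWithFallback(
  send: Send,
  body: Record<string, unknown>,
): Promise<{ usedFallback: boolean; result: UpstreamResult }> {
  // Optimistic attempt: ask for streaming with partial previews.
  const first = await send({ ...body, stream: true, partial_images: 2 });
  if (!isStreamingUnsupported(first.status, first.text)) {
    // Success or an unrelated error: the caller handles it as-is.
    return { usedFallback: false, result: first };
  }
  // Retry once with the degraded body; no further retries after this.
  const second = await send({ ...body, stream: false });
  return { usedFallback: true, result: second };
}
```

A real handler would stream the first response rather than buffer it as text; the buffering here only keeps the control flow easy to follow.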