feat: stream gpt-image generation via SSE with keepalive

- /api/generate now responds with text/event-stream end-to-end
- forwards upstream image_generation.* / image_edit.* partial+completed events
- 20s keepalive comments survive Cloudflare's 120s proxy-read timeout
- falls back to non-streaming when upstream rejects stream/partial_images
- drops @ai-sdk/openai-compatible, @ai-sdk/react, ai (unused)
- frontend consumes SSE via fetch+ReadableStream, shows progressive preview
2026-05-18 22:44:31 +08:00
parent 54f13c1097
commit 5af05b2141
5 changed files with 327 additions and 170 deletions
+45 -30
@@ -1,7 +1,8 @@
# AGENTS.md
Bun + TypeScript single-file server that exposes an OpenAI-compatible image
generation endpoint and serves a small vanilla HTML/JS playground.
Bun + TypeScript single-file server that proxies an OpenAI-compatible image
endpoint and serves a small vanilla HTML/JS playground. The whole pipeline is
SSE end-to-end so it survives Cloudflare's 120s proxy-read timeout.
## Runtime
@@ -19,7 +20,7 @@ generation endpoint and serves a small vanilla HTML/JS playground.
`noEmit: true`, so plain `bunx tsc` works too).
- Tests / lint / formatter: none configured. If adding tests, use `bun test`.
The server binds `0.0.0.0` (see `index.ts:61`), so it is reachable from other
The server binds `0.0.0.0` (see `index.ts:175`), so it is reachable from other
hosts on the network when running locally — be mindful when entering API keys.
## Architecture
@@ -27,25 +28,37 @@ hosts on the network when running locally — be mindful when entering API keys.
- `index.ts` — the entire backend. One `Bun.serve` instance with:
- `/` serves `index.html` via Bun's HTML import (`import index from "./index.html"`).
- `POST /api/generate` accepts
`{ baseURL, apiKey, model, prompt, size, referenceImages? }`. It returns
`{ images: string[] }` where each entry is a `data:` URL (base64).
- Two code paths inside the handler:
1. No `referenceImages` → uses `@ai-sdk/openai-compatible` + `generateImage`
from `ai`.
2. `referenceImages` present → hand-rolled `multipart/form-data` POST to
`${baseURL}/images/edits` (see `generateWithReference`). The AI SDK
does not currently expose image edits for OpenAI-compatible providers,
so this path bypasses it on purpose. The edits endpoint is gpt-image
series only (see UI hint in `index.html`).
`{ baseURL, apiKey, model, prompt, size, referenceImages? }` and **always
responds with `text/event-stream`**. Emitted events:
- `event: partial` → `{ image: dataUrl, index }` for each `partial_image`
- `event: final` → `{ image: dataUrl }` for the completed image
- `event: done` — empty payload, sent right before close
- `event: error` → `{ message }` for any failure
- SSE comments `: keepalive` every 20s while waiting for upstream, so
Cloudflare's 120s proxy-read timeout never fires.
- Upstream dispatch:
- `referenceImages` present → `POST {baseURL}/images/edits` as
`multipart/form-data` (image blobs decoded from data URLs).
- Otherwise → `POST {baseURL}/images/generations` as JSON.
- Both calls send `stream: true, partial_images: 2` first. If upstream
returns a 400 mentioning `stream` or `partial_images`,
`isStreamingUnsupportedError` triggers a single retry with
`stream: false` and the response is replayed as one `final` event via
`forwardUpstreamJSON`. Any other 4xx/5xx propagates as `error`.
- Targets the **gpt-image series only** (gpt-image-2 is the default). Do
not reintroduce DALL·E-only fields like `response_format` — gpt-image
always returns `b64_json`.
- `index.html` — self-contained UI: inline CSS, plain DOM JS, no build step.
Text fields (`baseURL`, `apiKey`, `model`, `size`, `prompt`) persist in
`localStorage` under the `aip:<field>` prefix. Reference images are kept
in an in-memory `refImages` array as base64 data URLs and are **not**
persisted — refreshing the page drops them. There is no React code despite
Reads the SSE response via `fetch` + `ReadableStream` (not `EventSource`,
because the API is `POST`). Partials overwrite a single `<img>` so the
preview animates in place. Text fields (`baseURL`, `apiKey`, `model`,
`size`, `prompt`) persist in `localStorage` under the `aip:<field>` prefix.
Reference images are kept in an in-memory `refImages` array as base64 data
URLs and are **not** persisted. There is no React code despite
`react` / `react-dom` / `@types/react*` being in `package.json` — treat
those deps as latent. Do not invent a React frontend unless asked.
- No router, no DB, no auth. API key is supplied per-request by the browser
and never stored server-side.
- No router, no DB, no auth, no AI SDK. API key is supplied per-request by
the browser and never stored server-side.
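
The client side of the event protocol above can be sketched as follows. This is a minimal illustration, not the code in `index.html`: the names `SseEvent`, `parseSse`, and `readSse` are hypothetical, and it assumes the server frames events with `\n` line endings and a blank line between events.

```typescript
// Hypothetical client-side SSE reader; names are illustrative, not taken from index.html.
interface SseEvent {
  event: string; // expected: "partial" | "final" | "done" | "error"
  data: string; // raw payload text; JSON.parse it for partial / final / error
}

// Split buffered text into complete events and return the unfinished tail as `rest`.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const blocks = buffer.split("\n\n");
  const rest = blocks.pop() ?? ""; // the last piece may be an incomplete event
  const events: SseEvent[] = [];
  for (const block of blocks) {
    let event = "message"; // SSE default when no `event:` field is present
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment line, e.g. ": keepalive"
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trim());
    }
    // Skip blocks that held only comments (no data and no named event).
    if (data.length > 0 || event !== "message") {
      events.push({ event, data: data.join("\n") });
    }
  }
  return { events, rest };
}

// Read a streamed POST response chunk by chunk and hand each event to `onEvent`.
async function readSse(res: Response, onEvent: (e: SseEvent) => void): Promise<void> {
  if (!res.body) throw new Error("response has no body");
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    // `stream: true` keeps multi-byte characters split across chunks intact.
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSse(buffer);
    buffer = parsed.rest;
    parsed.events.forEach(onEvent);
  }
}
```

A caller would `fetch("/api/generate", { method: "POST", body })`, pass the response to `readSse`, and set the preview `<img>`'s `src` on every `partial` and `final` event. Keepalive comments never reach `onEvent` because the parser drops them.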
## TypeScript conventions
@@ -60,15 +73,17 @@ hosts on the network when running locally — be mindful when entering API keys.
## When extending the API
- Add new routes inside the `routes` object in `index.ts`; keep the
- Add routes inside the `routes` object in `index.ts`; keep the
`{ POST: async (req) => … }` shape used by `/api/generate`.
- Return JSON with `Response.json(...)`. Validate the request body shape
explicitly — the existing handler asserts required fields and returns 400
before calling the model.
- The AI SDK image type is loose; the current handler casts to
`{ mediaType?: string; base64?: string }`. Mirror that pattern rather than
trusting field presence.
- For anything the AI SDK does not cover (e.g. image edits, masks, variations),
follow `generateWithReference`: build `FormData` with `Blob`s decoded from
the incoming data URLs and `fetch` the upstream endpoint directly with the
caller's `Authorization: Bearer <apiKey>`.
- For any long-running upstream call, mirror the SSE-with-keepalive pattern:
build a `ReadableStream<Uint8Array>`, start a 20s `: keepalive` comment
timer in `start()`, do work inside `try`, always `clearInterval` and
`controller.close()` in `finally`. Helpers `sseEvent` / `sseComment`
already exist.
- Stay defensive about upstream capabilities: many OpenAI-compatible
providers reject unknown params. Send the optimistic request first, then
detect the specific 400 (see `isStreamingUnsupportedError`) and retry with
a degraded body rather than feature-detecting up front.
- Decode incoming data URLs with `decodeDataUrl` (returns `Buffer` + mime)
and pass them as `Blob` parts to `FormData` — same pattern as the edits
path.
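
The SSE-with-keepalive pattern described above might look like the sketch below. It is a minimal sketch under stated assumptions, not the code in `index.ts`: the bodies of `sseEvent` and `sseComment` are guesses at the existing helpers, `sseResponse` is a hypothetical wrapper, and sending `{}` as the `done` payload is an assumption.

```typescript
const encoder = new TextEncoder();

// Assumed shapes; the real `sseEvent` / `sseComment` helpers in index.ts may differ.
function sseEvent(event: string, data: unknown): Uint8Array {
  return encoder.encode(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);
}

function sseComment(text: string): Uint8Array {
  return encoder.encode(`: ${text}\n\n`);
}

type Send = (event: string, data: unknown) => void;

// Hypothetical wrapper: runs `work` while emitting keepalive comments on a timer.
function sseResponse(work: (send: Send) => Promise<void>, keepaliveMs = 20_000): Response {
  let timer: ReturnType<typeof setInterval> | undefined;
  let open = true; // flips to false once the stream is closed or the client cancels

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      // Guarded enqueue so late writes after a client disconnect do not throw.
      const push = (chunk: Uint8Array): void => {
        if (open) controller.enqueue(chunk);
      };
      timer = setInterval(() => push(sseComment("keepalive")), keepaliveMs);
      try {
        await work((event, data) => push(sseEvent(event, data)));
        push(sseEvent("done", {})); // assumption: empty object as the "empty payload"
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        push(sseEvent("error", { message }));
      } finally {
        clearInterval(timer);
        if (open) {
          open = false;
          controller.close();
        }
      }
    },
    cancel() {
      // Client went away: stop the timer and suppress further writes.
      open = false;
      clearInterval(timer);
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```

A route handler could then return `sseResponse(async (send) => { /* call upstream, then send("final", { image }) */ })`. The `open` flag is one way to avoid enqueueing into a stream the client has already cancelled; the actual handler may handle disconnects differently.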