feat: stream gpt-image generation via SSE with keepalive

- /api/generate now responds with text/event-stream end-to-end
- forwards upstream image_generation.* / image_edit.* partial+completed events
- 20s keepalive comments survive Cloudflare's 120s proxy-read timeout
- falls back to non-streaming when upstream rejects stream/partial_images
- drops @ai-sdk/openai-compatible, @ai-sdk/react, ai (unused)
- frontend consumes SSE via fetch+ReadableStream, shows progressive preview
@@ -1,7 +1,8 @@
 # AGENTS.md
 
-Bun + TypeScript single-file server that exposes an OpenAI-compatible image
-generation endpoint and serves a small vanilla HTML/JS playground.
+Bun + TypeScript single-file server that proxies an OpenAI-compatible image
+endpoint and serves a small vanilla HTML/JS playground. The whole pipeline is
+SSE end-to-end so it survives Cloudflare's 120s proxy-read timeout.
 
 ## Runtime
 
@@ -19,7 +20,7 @@ generation endpoint and serves a small vanilla HTML/JS playground.
   `noEmit: true`, so plain `bunx tsc` works too).
 - Tests / lint / formatter: none configured. If adding tests, use `bun test`.
 
-The server binds `0.0.0.0` (see `index.ts:61`), so it is reachable from other
+The server binds `0.0.0.0` (see `index.ts:175`), so it is reachable from other
 hosts on the network when running locally — be mindful when entering API keys.
 
 ## Architecture
@@ -27,25 +28,37 @@ hosts on the network when running locally — be mindful when entering API keys.
 - `index.ts` — the entire backend. One `Bun.serve` instance with:
   - `/` serves `index.html` via Bun's HTML import (`import index from "./index.html"`).
   - `POST /api/generate` accepts
-    `{ baseURL, apiKey, model, prompt, size, referenceImages? }`. It returns
-    `{ images: string[] }` where each entry is a `data:` URL (base64).
-  - Two code paths inside the handler:
-    1. No `referenceImages` → uses `@ai-sdk/openai-compatible` + `generateImage`
-       from `ai`.
-    2. `referenceImages` present → hand-rolled `multipart/form-data` POST to
-       `${baseURL}/images/edits` (see `generateWithReference`). The AI SDK
-       does not currently expose image edits for OpenAI-compatible providers,
-       so this path bypasses it on purpose. The edits endpoint is gpt-image
-       series only (see UI hint in `index.html`).
+    `{ baseURL, apiKey, model, prompt, size, referenceImages? }` and **always
+    responds with `text/event-stream`**. Emitted events:
+    - `event: partial` — `{ image: dataUrl, index }` for each `partial_image`
+    - `event: final` — `{ image: dataUrl }` for the completed image
+    - `event: done` — empty payload, sent right before close
+    - `event: error` — `{ message }` for any failure
+    - SSE comments `: keepalive` every 20s while waiting for upstream, so
+      Cloudflare's 120s proxy-read timeout never fires.
+  - Upstream dispatch:
+    - `referenceImages` present → `POST {baseURL}/images/edits` as
+      `multipart/form-data` (image blobs decoded from data URLs).
+    - Otherwise → `POST {baseURL}/images/generations` as JSON.
+    - Both calls send `stream: true, partial_images: 2` first. If upstream
+      returns a 400 mentioning `stream` or `partial_images`,
+      `isStreamingUnsupportedError` triggers a single retry with
+      `stream: false` and the response is replayed as one `final` event via
+      `forwardUpstreamJSON`. Any other 4xx/5xx propagates as `error`.
+  - Targets the **gpt-image series only** (gpt-image-2 is the default). Do
+    not reintroduce DALL·E-only fields like `response_format` — gpt-image
+    always returns `b64_json`.
 - `index.html` — self-contained UI: inline CSS, plain DOM JS, no build step.
-  Text fields (`baseURL`, `apiKey`, `model`, `size`, `prompt`) persist in
-  `localStorage` under the `aip:<field>` prefix. Reference images are kept
-  in an in-memory `refImages` array as base64 data URLs and are **not**
-  persisted — refreshing the page drops them. There is no React code despite
+  Reads the SSE response via `fetch` + `ReadableStream` (not `EventSource`,
+  because the API is `POST`). Partials overwrite a single `<img>` so the
+  preview animates in place. Text fields (`baseURL`, `apiKey`, `model`,
+  `size`, `prompt`) persist in `localStorage` under the `aip:<field>` prefix.
+  Reference images are kept in an in-memory `refImages` array as base64 data
+  URLs and are **not** persisted. There is no React code despite
   `react` / `react-dom` / `@types/react*` being in `package.json` — treat
   those deps as latent. Do not invent a React frontend unless asked.
-- No router, no DB, no auth. API key is supplied per-request by the browser
-  and never stored server-side.
+- No router, no DB, no auth, no AI SDK. API key is supplied per-request by
+  the browser and never stored server-side.
 
 ## TypeScript conventions
 
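Reviewer sketch (not part of the commit): the `index.html` paragraph in the hunk above describes reading the SSE response with `fetch` + `ReadableStream` instead of `EventSource`. A minimal TypeScript sketch of that client-side parsing could look like the following, assuming the server separates events with a blank line and uses only the `event:` and `data:` fields listed above. `SseMessage`, `createSseParser`, and `readSse` are hypothetical names chosen for illustration; the page's actual script may be organised differently.

```typescript
// One parsed SSE message. The interface name is illustrative, not from the repo.
interface SseMessage {
  event: string;
  data: string;
}

// Returns a stateful function: feed it decoded text chunks and it returns every
// message completed so far. An incomplete tail stays buffered for the next call.
function createSseParser(): (chunk: string) => SseMessage[] {
  let buffer = "";
  return (chunk) => {
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    const messages: SseMessage[] = [];
    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const block = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      let event = "message";
      let sawField = false;
      const dataLines: string[] = [];
      for (const line of block.split("\n")) {
        if (line.startsWith(":")) continue; // SSE comment such as ": keepalive"
        if (line.startsWith("event:")) {
          event = line.slice(6).trim();
          sawField = true;
        } else if (line.startsWith("data:")) {
          dataLines.push(line.slice(5).replace(/^ /, ""));
          sawField = true;
        }
      }
      // Unlike EventSource, data-less events are surfaced too, so an empty
      // `done` event is still delivered to the caller.
      if (sawField) messages.push({ event, data: dataLines.join("\n") });
      boundary = buffer.indexOf("\n\n");
    }
    return messages;
  };
}

// Drains a streaming fetch Response and invokes `onMessage` per SSE message.
async function readSse(
  response: Response,
  onMessage: (message: SseMessage) => void,
): Promise<void> {
  if (!response.body) throw new Error("response has no body to stream");
  const parse = createSseParser();
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // `stream: true` keeps multi-byte characters intact across chunk boundaries.
    for (const message of parse(decoder.decode(value, { stream: true }))) {
      onMessage(message);
    }
  }
}
```

A caller would pass the `Response` from a `POST` to `/api/generate` and, on each `partial` or `final` message, assign `JSON.parse(message.data).image` to the single preview `<img>`. Keepalive comments are dropped by the parser, so they never reach the callback.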
@@ -60,15 +73,17 @@ hosts on the network when running locally — be mindful when entering API keys.
 
 ## When extending the API
 
-- Add new routes inside the `routes` object in `index.ts`; keep the
+- Add routes inside the `routes` object in `index.ts`; keep the
   `{ POST: async (req) => … }` shape used by `/api/generate`.
-- Return JSON with `Response.json(...)`. Validate the request body shape
-  explicitly — the existing handler asserts required fields and returns 400
-  before calling the model.
-- The AI SDK image type is loose; the current handler casts to
-  `{ mediaType?: string; base64?: string }`. Mirror that pattern rather than
-  trusting field presence.
-- For anything the AI SDK does not cover (e.g. image edits, masks, variations),
-  follow `generateWithReference`: build `FormData` with `Blob`s decoded from
-  the incoming data URLs and `fetch` the upstream endpoint directly with the
-  caller's `Authorization: Bearer <apiKey>`.
+- For any long-running upstream call, mirror the SSE-with-keepalive pattern:
+  build a `ReadableStream<Uint8Array>`, start a 20s `: keepalive` comment
+  timer in `start()`, do work inside `try`, always `clearInterval` and
+  `controller.close()` in `finally`. Helpers `sseEvent` / `sseComment`
+  already exist.
+- Stay defensive about upstream capabilities: many OpenAI-compatible
+  providers reject unknown params. Send the optimistic request first, then
+  detect the specific 400 (see `isStreamingUnsupportedError`) and retry with
+  a degraded body rather than feature-detecting up front.
+- Decode incoming data URLs with `decodeDataUrl` (returns `Buffer` + mime)
+  and pass them as `Blob` parts to `FormData` — same pattern as the edits
+  path.
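Reviewer sketch (not part of the commit): the SSE-with-keepalive bullet above can be illustrated roughly as follows. The diff only names the `sseEvent` / `sseComment` helpers, so the signatures below are assumptions, and `sseResponse`, `Emit`, and the `{}` payload on `done` are hypothetical choices made for this example rather than the code in `index.ts`.

```typescript
const encoder = new TextEncoder();

// Illustrative stand-ins for the repo's `sseEvent` / `sseComment` helpers;
// the real signatures are not shown in the diff.
function sseEvent(event: string, data: unknown): Uint8Array {
  return encoder.encode(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);
}

function sseComment(text: string): Uint8Array {
  return encoder.encode(`: ${text}\n\n`);
}

type Emit = (event: string, data: unknown) => void;

// Wraps a long-running job in an SSE response that emits keepalive comments
// until the job settles, then sends `done` (or `error`) and closes.
function sseResponse(
  work: (emit: Emit) => Promise<void>,
  keepaliveMs = 20000,
): Response {
  let open = true;
  let timer: ReturnType<typeof setInterval> | undefined;
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      // Guard every write so nothing is enqueued after the client disconnects.
      const send = (chunk: Uint8Array) => {
        if (open) controller.enqueue(chunk);
      };
      timer = setInterval(() => send(sseComment("keepalive")), keepaliveMs);
      try {
        await work((event, data) => send(sseEvent(event, data)));
        send(sseEvent("done", {}));
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        send(sseEvent("error", { message }));
      } finally {
        clearInterval(timer);
        if (open) controller.close();
        open = false;
      }
    },
    cancel() {
      // Client went away: stop the timer and suppress further writes.
      open = false;
      clearInterval(timer);
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```

A route handler could then return `sseResponse(async (emit) => { /* call upstream, emit("partial", ...), emit("final", ...) */ })`. Keeping the timer cleanup in `finally` and in `cancel()` means the interval is cleared on success, on failure, and on client disconnect.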
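Reviewer sketch (not part of the commit): the "optimistic request first, then retry with a degraded body" guidance could be expressed along these lines for the JSON generations path. The diff names `isStreamingUnsupportedError` but does not show it, so the matching rule here (HTTP 400 whose body mentions `stream` or `partial_images`) is a guess based on the prose. `postWithStreamingFallback` and `FetchLike` are hypothetical names, and the injectable `fetchImpl` parameter exists only to make the sketch testable.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<Response>;

// Assumed detection rule: a 400 whose body mentions one of the streaming params.
function isStreamingUnsupportedError(status: number, bodyText: string): boolean {
  return status === 400 && /stream|partial_images/i.test(bodyText);
}

// Sends the streaming request first; on the specific "unsupported" 400 it
// retries exactly once with `stream: false`. Other failures are thrown.
async function postWithStreamingFallback(
  url: string,
  apiKey: string,
  body: Record<string, unknown>,
  fetchImpl: FetchLike = fetch,
): Promise<{ response: Response; streamed: boolean }> {
  const send = (stream: boolean) =>
    fetchImpl(url, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(
        stream
          ? { ...body, stream: true, partial_images: 2 }
          : { ...body, stream: false },
      ),
    });

  const first = await send(true);
  if (first.ok) return { response: first, streamed: true };

  const errorText = await first.text();
  if (!isStreamingUnsupportedError(first.status, errorText)) {
    throw new Error(`upstream ${first.status}: ${errorText}`);
  }

  const retry = await send(false);
  if (!retry.ok) {
    throw new Error(`upstream ${retry.status}: ${await retry.text()}`);
  }
  return { response: retry, streamed: false };
}
```

The `streamed` flag tells the caller whether to forward upstream SSE events as they arrive or to replay the JSON body as a single `final` event. Retrying only once keeps a misbehaving provider from causing a request loop.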
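Reviewer sketch (not part of the commit): the last bullet of the final hunk describes decoding data URLs and passing them as `Blob` parts to `FormData`. The repo's `decodeDataUrl` is said to return a `Buffer` plus mime type; the sketch below instead uses `atob` and `Uint8Array` to stay runtime-neutral, so it is an approximation of the pattern, not the actual helper. `dataUrlToBlob`, `buildEditForm`, the `image[]` field name, and the generated filenames are all assumptions for illustration.

```typescript
// Converts a base64 data URL such as "data:image/png;base64,...." into a Blob.
function dataUrlToBlob(dataUrl: string): Blob {
  const comma = dataUrl.indexOf(",");
  if (comma === -1) throw new Error("expected a base64 data URL");
  const header = dataUrl.slice(0, comma);
  if (!header.startsWith("data:") || !header.endsWith(";base64")) {
    throw new Error("expected a base64 data URL");
  }
  // Strip the leading "data:" (5 chars) and trailing ";base64" (7 chars).
  const mime = header.slice(5, -7) || "application/octet-stream";
  const bytes = Uint8Array.from(atob(dataUrl.slice(comma + 1)), (ch) =>
    ch.charCodeAt(0),
  );
  return new Blob([bytes], { type: mime });
}

// Builds a multipart body for an edits-style request. The field names follow
// the request shape described in the diff; "image[]" is an assumed part name.
function buildEditForm(
  fields: { model: string; prompt: string; size: string },
  referenceImages: string[],
): FormData {
  const form = new FormData();
  form.append("model", fields.model);
  form.append("prompt", fields.prompt);
  form.append("size", fields.size);
  referenceImages.forEach((dataUrl, index) => {
    const blob = dataUrlToBlob(dataUrl);
    const extension = blob.type.split("/")[1] ?? "bin";
    form.append("image[]", blob, `reference-${index}.${extension}`);
  });
  return form;
}
```

Passing the resulting `FormData` as the `body` of `fetch` lets the runtime set the `multipart/form-data` boundary header itself, so the caller only needs to add `Authorization`.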