
Streaming

Stream model responses in real time.

Streaming allows you to receive model output incrementally as it is generated, rather than waiting for the complete response. This is essential for building responsive chat interfaces and real-time applications.

To enable streaming, set stream: true in your request. The response will be delivered as Server-Sent Events (SSE).

Basic streaming example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ohmygpt.com/v1",
    api_key="<OHMYGPT_API_KEY>",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem about coding."}],
    stream=True
)

for chunk in stream:
    # Each chunk carries an incremental delta; content is None on
    # role-only and final chunks, so check before printing.
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
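A common pattern is to keep the full response as well as printing it incrementally, for logging or for appending to the conversation history. A minimal variant of the loop above:

parts = []
for chunk in stream:
    # Guard against chunks with an empty choices list (e.g. usage-only chunks).
    content = chunk.choices[0].delta.content if chunk.choices else None
    if content:
        print(content, end="", flush=True)
        parts.append(content)

full_text = "".join(parts)

After the loop, full_text holds the assembled assistant message.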

Raw SSE format

If you are not using an SDK, you need to parse the SSE stream manually. Each event has the format:

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"}}],...}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" world"}}],...}

data: [DONE]

The stream ends with data: [DONE].

Manual parsing example

import requests
import json

response = requests.post(
    "https://api.ohmygpt.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer <OHMYGPT_API_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if not line:
        continue  # skip SSE keep-alive blank lines
    line = line.decode('utf-8')
    if line.startswith('data: '):
        data = line[6:]  # strip the 'data: ' prefix
        if data == '[DONE]':
            break  # end of stream
        chunk = json.loads(data)
        content = chunk['choices'][0]['delta'].get('content', '')
        print(content, end='', flush=True)

Stream cancellation

You can cancel a streaming request by aborting the connection. For supported providers, this immediately stops model processing and billing.

import requests
from threading import Event, Thread

def stream_with_cancel(cancel_event: Event):
    response = requests.post(
        "https://api.ohmygpt.com/v1/chat/completions",
        headers={"Authorization": "Bearer <OHMYGPT_API_KEY>"},
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Write a long story."}],
            "stream": True
        },
        stream=True
    )

    for line in response.iter_lines():
        if cancel_event.is_set():
            response.close()
            return
        # Process line...

# Usage
cancel = Event()
thread = Thread(target=stream_with_cancel, args=(cancel,))
thread.start()

# To cancel:
cancel.set()
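If you are using the OpenAI SDK instead of raw requests, closing the stream object aborts the underlying connection; recent openai-python releases also let you use the stream as a context manager, which closes it automatically. A minimal sketch assuming that behavior, with a hypothetical should_stop() cancellation check and the client from the first example:

# Exiting the with block closes the stream and aborts the connection,
# even when we break out of the loop early.
with client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a long story."}],
    stream=True,
) as stream:
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
        if should_stop():  # hypothetical cancellation check
            break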

Stream cancellation stops billing only for providers that support it. For other providers, the model continues processing in the background and you are charged for the full response.

Stream options

Additional parameters for streaming requests:

Parameter                     Type     Description
stream                        boolean  Enable streaming (required for SSE output)
stream_options.include_usage  boolean  Include token usage in the final chunk

Example with usage tracking:

{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

When include_usage is true, the final chunk before [DONE] contains a usage object with prompt_tokens, completion_tokens, and total_tokens. This chunk's choices array is empty, so handle it separately when parsing.
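With the SDK, pass the same option via stream_options and read the usage from the final chunk. A minimal sketch, assuming the endpoint honors stream_options as described above:

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # The usage-only final chunk has no choices, so guard the index.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\nTokens: {chunk.usage.prompt_tokens} prompt, "
              f"{chunk.usage.completion_tokens} completion, "
              f"{chunk.usage.total_tokens} total")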
