Streaming
Stream model responses in real time.
Streaming allows you to receive model output incrementally as it is generated, rather than waiting for the complete response. This is essential for building responsive chat interfaces and real-time applications.
To enable streaming, set `stream: true` in your request. The response will be delivered as Server-Sent Events (SSE).
Basic streaming example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ohmygpt.com/v1",
    api_key="<OHMYGPT_API_KEY>",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem about coding."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

Raw SSE format
If you are not using an SDK, you need to parse the SSE stream manually. Each event has the format:
```
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"}}],...}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" world"}}],...}

data: [DONE]
```

The stream ends with `data: [DONE]`.
Manual parsing example
```python
import requests
import json

response = requests.post(
    "https://api.ohmygpt.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer <OHMYGPT_API_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = line[6:]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            content = chunk["choices"][0]["delta"].get("content", "")
            print(content, end="", flush=True)
```

Stream cancellation
You can cancel a streaming request by aborting the connection. For supported providers, this immediately stops model processing and billing.
```python
import requests
from threading import Event, Thread

def stream_with_cancel(cancel_event: Event):
    response = requests.post(
        "https://api.ohmygpt.com/v1/chat/completions",
        headers={"Authorization": "Bearer <OHMYGPT_API_KEY>"},
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Write a long story."}],
            "stream": True,
        },
        stream=True,
    )
    for line in response.iter_lines():
        if cancel_event.is_set():
            # Closing the response aborts the underlying HTTP connection.
            response.close()
            return
        # Process line...

# Usage
cancel = Event()
thread = Thread(target=stream_with_cancel, args=(cancel,))
thread.start()

# To cancel:
cancel.set()
```

Stream cancellation stops billing only for providers that support it. For other providers, the model continues processing in the background and you are charged for the full response.
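If you stream through the OpenAI SDK rather than raw `requests`, you can abort by closing the stream object instead. A minimal sketch, assuming a recent `openai-python` version whose stream objects expose a `close()` method; the stop-after-50-chunks condition is just an illustrative trigger:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ohmygpt.com/v1",
    api_key="<OHMYGPT_API_KEY>",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a long story."}],
    stream=True,
)

for i, chunk in enumerate(stream):
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
    if i >= 50:  # illustrative cancel condition
        stream.close()  # drops the HTTP connection, aborting the stream
        break
```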
Stream options
Additional parameters for streaming requests:
| Parameter | Type | Description |
|---|---|---|
| `stream` | boolean | Enable streaming (required for SSE output) |
| `stream_options.include_usage` | boolean | Include token usage in the final chunk |
Example with usage tracking:
```json
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```

When `include_usage` is `true`, the final chunk before `[DONE]` contains the `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`.
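A minimal sketch of reading that usage data with the OpenAI SDK. With OpenAI-compatible backends, the usage-bearing final chunk arrives with an empty `choices` list, so guard the delta access:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ohmygpt.com/v1",
    api_key="<OHMYGPT_API_KEY>",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # The final usage-bearing chunk has no choices, so check first.
    if chunk.choices:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
    if chunk.usage:
        print(f"\n{chunk.usage.prompt_tokens} prompt + "
              f"{chunk.usage.completion_tokens} completion = "
              f"{chunk.usage.total_tokens} total tokens")
```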