AI Library

AI and LLM functions for interacting with OpenAI-compatible APIs. This library provides:

  1. AI Client - Create clients and make completions
  2. Tool Registry - Build tool schemas for AI agents
  3. Thinking Extractor - Extract reasoning blocks from AI responses

Available Functions

Function                      Description
Client(base_url, **kwargs)    Create AI client for API interactions
text(response)                Extract text content from response
thinking(response)            Extract thinking blocks from response
extract_thinking(text)        Extract thinking blocks from text string
ToolRegistry()                Create tool registry for building schemas

Creating an AI Client

The first step is to create an AI client instance:

import scriptling.ai as ai

# OpenAI API with defaults
client = ai.Client("", api_key="sk-...")

# With custom settings
client = ai.Client(
    "https://api.openai.com/v1",
    api_key="sk-...",
    max_tokens=2048,
    temperature=0.7
)

# Claude (max_tokens defaults to 4096 if not specified)
client = ai.Client(
    "https://api.anthropic.com",
    provider=ai.CLAUDE,
    api_key="sk-ant-..."
)

# Local LLM (LM Studio, Ollama, etc.)
client = ai.Client("http://127.0.0.1:1234/v1")

Response Helpers

ai.text(response)

Extracts the text content from a completion response, automatically removing any thinking blocks.

Parameters:

  • response (dict): Chat completion response from client.completion()

Returns: str - The response text with thinking blocks removed

Example:

import scriptling.ai as ai

client = ai.Client("", api_key="sk-...")
response = client.completion("gpt-4", "What is 2+2?")

# Get just the text, without thinking blocks
text = ai.text(response)
print(text)  # "4"

ai.thinking(response)

Extracts thinking/reasoning blocks from a completion response.

Parameters:

  • response (dict): Chat completion response from client.completion()

Returns: list - List of thinking block strings (empty if no thinking blocks)

Example:

import scriptling.ai as ai

client = ai.Client("", api_key="sk-...")
response = client.completion("gpt-4", "Explain step by step")

# Get thinking blocks separately
thoughts = ai.thinking(response)
for thought in thoughts:
    print("Reasoning:", thought)

# Get clean text
text = ai.text(response)
print("Answer:", text)

Thinking Extractor

ai.extract_thinking(text)

Extracts thinking/reasoning blocks from AI model responses. Many models include their reasoning in special blocks (like <think>...</think>), which you may want to process separately from the main response.

Supported Formats:

  • XML-style: <think>...</think>, <thinking>...</thinking>
  • OpenAI style: <Thought>...</Thought>
  • Markdown blocks: ```thinking\n...\n```, ```thought\n...\n```
  • Claude style: <antThinking>...</antThinking>

Parameters:

  • text (str): The AI response text to process

Returns: dict - Contains:

  • thinking (list): List of extracted thinking block strings
  • content (str): The cleaned response text with thinking blocks removed

Example:

import scriptling.ai as ai

response_text = """<think>
Let me analyze this step by step.
The user wants to know about Python.
</think>

Python is a high-level programming language known for its readability."""

result = ai.extract_thinking(response_text)

# Access the thinking blocks
for thought in result["thinking"]:
    print("Model reasoning:", thought)

# Get the cleaned response
print("Response:", result["content"])
# Output: "Python is a high-level programming language known for its readability."
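
The other supported formats are handled the same way. For example, a markdown-style thinking block (a sketch; exact whitespace trimming may vary):

text = "```thinking\nCheck the edge cases first.\n```\nAll tests pass."
result = ai.extract_thinking(text)
print(result["thinking"])  # roughly ["Check the edge cases first."]
print(result["content"])   # roughly "All tests pass."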

With Agent Responses:

import scriptling.ai as ai
import scriptling.ai.agent as agent

bot = agent.Agent(client, tools=tools, system_prompt="...")
response = bot.trigger("Explain Python")

# Extract and display thinking separately
result = ai.extract_thinking(response.content)

if result["thinking"]:
    print("=== Model Reasoning ===")
    for thought in result["thinking"]:
        print(thought)
    print()

print("=== Response ===")
print(result["content"])

AI Client Reference

scriptling.ai.Client(base_url, **kwargs)

Creates a new AI client instance for making API calls to supported services.

Parameters:

  • base_url (str): Base URL of the API (defaults to https://api.openai.com/v1 if empty)

  • provider (str, optional): Provider type (defaults to ai.OPENAI). Use constants:

    Constant      Provider
    ai.OPENAI     OpenAI
    ai.CLAUDE     Anthropic Claude
    ai.GEMINI     Google Gemini
    ai.OLLAMA     Ollama
    ai.ZAI        Z AI
    ai.MISTRAL    Mistral

  • api_key (str, optional): API key for authentication

  • max_tokens (int, optional): Default max_tokens for all requests. Claude defaults to 4096 if not set

  • temperature (float, optional): Default temperature for all requests (0.0-2.0)

  • remote_servers (list, optional): List of remote MCP server configs, each a dict with:

    • base_url (str, required): URL of the MCP server
    • namespace (str, optional): Namespace prefix for tools from this server
    • bearer_token (str, optional): Bearer token for authentication

Returns: AIClient - A client instance with methods for API calls

Example:

import scriptling.ai as ai

# OpenAI API with defaults
client = ai.Client("", api_key="sk-...", max_tokens=2048, temperature=0.7)

# Claude (max_tokens defaults to 4096 if not specified)
client = ai.Client(
    "https://api.anthropic.com",
    provider=ai.CLAUDE,
    api_key="sk-ant-...",
    max_tokens=4096,  # Optional, defaults to 4096 for Claude
    temperature=0.7
)

# LM Studio / Local LLM
client = ai.Client("http://127.0.0.1:1234/v1")

# With MCP servers configured
client = ai.Client("http://127.0.0.1:1234/v1", remote_servers=[
    {"base_url": "http://127.0.0.1:8080/mcp", "namespace": "scriptling"},
    {"base_url": "https://api.example.com/mcp", "namespace": "search", "bearer_token": "secret"},
])

Default Parameters:

When you set max_tokens and temperature at the client level, they apply to all requests unless overridden:

# Set defaults at client creation
client = ai.Client("", api_key="sk-...", max_tokens=2048, temperature=0.7)

# Uses client defaults (2048 tokens, 0.7 temperature)
response = client.completion("gpt-4", "Hello!")

# Override per request
response = client.completion("gpt-4", "Hello!", max_tokens=4096, temperature=0.9)

Tool Registry

Build OpenAI-compatible tool schemas for AI agents. See Agent Library for complete agent examples.

ai.ToolRegistry()

Creates a new tool registry.

Example:

import scriptling.ai as ai

client = ai.Client("", api_key="sk-...")

tools = ai.ToolRegistry()
tools.add("read_file", "Read a file", {"path": "string"}, lambda args: os.read_file(args["path"]))
schemas = tools.build()

# Pass tools directly to completion()
response = client.completion("gpt-4", [{"role": "user", "content": "Read file /data/config.txt"}], tools=schemas)

See Agent Library for detailed ToolRegistry documentation and automatic tool handling.

AIClient Class

All client methods are instance methods on the client object returned by ai.Client() or ai.WrapClient().

client.completion(model, messages, **kwargs)

Creates a chat completion using this client’s configuration.

Parameters:

  • model (str): Model identifier (e.g., “gpt-4”, “gpt-3.5-turbo”)
  • messages (str or list): Either a string (user message) or a list of message dicts with “role” and “content” keys
  • system_prompt (str, optional): System prompt to use when messages is a string
  • tools (list, optional): List of tool schema dicts from ToolRegistry.build()
  • temperature (float, optional): Sampling temperature (0.0-2.0)
  • max_tokens (int, optional): Maximum tokens to generate

Returns: dict - Response containing id, choices, usage, etc.

Examples:

import scriptling.ai as ai

client = ai.Client("", api_key="sk-...")

# String shorthand - simple user message
response = client.completion("gpt-4", "What is 2+2?")
print(response.choices[0].message.content)

# String shorthand with system prompt
response = client.completion("gpt-4", "What is 2+2?", system_prompt="You are a helpful math tutor")
print(response.choices[0].message.content)

# Full messages array
response = client.completion("gpt-4", [{"role": "user", "content": "What is 2+2?"}])
print(response.choices[0].message.content)

With Tool Calling:

import scriptling.ai as ai

client = ai.Client("", api_key="sk-...")

# Create tools registry
tools = ai.ToolRegistry()
tools.add("get_time", "Get current time", {}, lambda args: "12:00 PM")
tools.add("read_file", "Read a file", {"path": "string"}, lambda args: os.read_file(args["path"]))

# Build schemas and pass to completion
schemas = tools.build()
response = client.completion("gpt-4", [{"role": "user", "content": "What time is it?"}], tools=schemas)

client.completion_stream(model, messages, **kwargs)

Creates a streaming chat completion using this client’s configuration. Returns a ChatStream object; call its next() method repeatedly until it returns None.

Parameters:

  • model (str): Model identifier (e.g., “gpt-4”, “gpt-3.5-turbo”)
  • messages (str or list): Either a string (user message) or a list of message dicts with “role” and “content” keys
  • system_prompt (str, optional): System prompt to use when messages is a string
  • tools (list, optional): List of tool schema dicts from ToolRegistry.build()
  • temperature (float, optional): Sampling temperature (0.0-2.0)
  • max_tokens (int, optional): Maximum tokens to generate

Returns: ChatStream - A stream object with a next() method

Examples:

# String shorthand - simple user message
client = ai.Client("", api_key="sk-...")
stream = client.completion_stream("gpt-4", "Count to 10")
while True:
    chunk = stream.next()
    if chunk is None:
        break
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="")
print()

# String shorthand with system prompt
stream = client.completion_stream("gpt-4", "Explain quantum physics", system_prompt="You are a physics professor")
# ... iterate as above

# Full messages array
stream = client.completion_stream("gpt-4", [{"role": "user", "content": "Count to 10"}])
# ... iterate as above

With Tool Calling:

import scriptling.ai as ai

client = ai.Client("", api_key="sk-...")

tools = ai.ToolRegistry()
tools.add("get_weather", "Get weather for a city", {"city": "string"}, weather_handler)
schemas = tools.build()

stream = client.completion_stream("gpt-4", [{"role": "user", "content": "What's the weather in Paris?"}], tools=schemas)
# Stream chunks...
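
In a stream, tool-call arguments typically arrive as string fragments spread across chunks. A sketch of accumulating them, assuming OpenAI-style deltas where each fragment carries an index:

calls = {}
while True:
    chunk = stream.next()
    if chunk is None:
        break
    if not (chunk.choices and len(chunk.choices) > 0):
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
    if delta.tool_calls:
        for tc in delta.tool_calls:
            if tc.index not in calls:
                calls[tc.index] = {"name": "", "arguments": ""}
            if tc.function.name:
                calls[tc.index]["name"] = tc.function.name
            if tc.function.arguments:
                calls[tc.index]["arguments"] += tc.function.arguments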

client.ask(model, messages, **kwargs)

Quick completion method that returns text directly, with thinking blocks automatically removed. This is a convenience method for simple queries where you don’t need the full response object.

Parameters:

  • model (str): Model identifier (e.g., “gpt-4”, “gpt-3.5-turbo”)
  • messages (str or list): Either a string (user message) or a list of message dicts
  • system_prompt (str, optional): System prompt to use when messages is a string
  • tools (list, optional): List of tool schema dicts from ToolRegistry.build()
  • temperature (float, optional): Sampling temperature (0.0-2.0)
  • max_tokens (int, optional): Maximum tokens to generate

Returns: str - The response text with thinking blocks removed

Examples:

import scriptling.ai as ai

client = ai.Client("", api_key="sk-...")

# Simple query
answer = client.ask("gpt-4", "What is 2+2?")
print(answer)  # "4"

# With system prompt
answer = client.ask("gpt-4", "Explain quantum physics", system_prompt="You are a physics professor")
print(answer)

# Full messages array
answer = client.ask("gpt-4", [{"role": "user", "content": "Hello!"}])
print(answer)
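
ask() is shorthand for running a completion and stripping thinking blocks yourself; the two forms below are roughly equivalent:

# Convenience form
answer = client.ask("gpt-4", "What is 2+2?")

# Long form
response = client.completion("gpt-4", "What is 2+2?")
answer = ai.text(response)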

client.embedding(model, input)

Creates an embedding vector for the given input text(s) using the specified model.

Parameters:

  • model (str): Model identifier (e.g., “text-embedding-3-small”, “text-embedding-3-large”)
  • input (str or list): Input text(s) to embed - can be a string or list of strings

Returns: dict - Response containing data (list of embeddings with index, embedding, object), model, and usage

Example:

client = ai.Client("", api_key="sk-...")

# Single text embedding
response = client.embedding("text-embedding-3-small", "Hello world")
print(response.data[0].embedding)

# Batch embedding
response = client.embedding("text-embedding-3-small", ["Hello", "World"])
for emb in response.data:
    print(emb.embedding)

# Using embeddings for similarity search
texts = ["cat", "dog", "car", "bicycle"]
response = client.embedding("text-embedding-3-small", texts)

# Query similarity
query_resp = client.embedding("text-embedding-3-small", "vehicle")
query_emb = query_resp.data[0].embedding

# Find most similar via a simple dot product
# (see the cosine similarity sketch below for the general case)
for i, text_emb in enumerate(response.data):
    similarity = sum(a * b for a, b in zip(query_emb, text_emb.embedding))
    print(f"{texts[i]}: {similarity}")

client.models()

Lists all models available for this client configuration.

Returns: list - List of model dicts with id, created, owned_by, etc.

Example:

client = ai.Client("", api_key="sk-...")
models = client.models()
for model in models:
    print(model.id)

client.response_create(model, input, **kwargs)

Creates a response using the OpenAI Responses API (new structured API).

Parameters:

  • model (str): Model identifier (e.g., “gpt-4o”, “gpt-4”)
  • input (str or list): Either a string (user message content) or a list of input items (messages)
  • system_prompt (str, optional): System prompt to use when input is a string
  • background (bool, optional): If true, runs asynchronously and returns immediately with in_progress status

Returns: dict - Response object with id, status, output, usage, etc.

Examples:

# String shorthand - simple user message
client = ai.Client("", api_key="sk-...")
response = client.response_create("gpt-4o", "Hello!")
print(response.output)

# String shorthand with system prompt
response = client.response_create("gpt-4o", "What is AI?", system_prompt="You are a helpful assistant")
print(response.output)

# Background processing
response = client.response_create("gpt-4o", "What is AI?", background=True)
print(response.status)  # "queued" or "in_progress"
# Poll for completion
import time
while response.status in ["queued", "in_progress"]:
    time.sleep(0.5)
    response = client.response_get(response.id)
print(response.status)  # "completed"
print(response.output)

# Full input array (Responses API format)
response = client.response_create("gpt-4o", [
    {"type": "message", "role": "user", "content": "Hello!"}
])
print(response.output)

client.response_get(id)

Retrieves a previously created response by its ID.

Parameters:

  • id (str): Response ID

Returns: dict - Response object with id, status, output, usage, etc.

Example:

client = ai.Client("", api_key="sk-...")
response = client.response_get("resp_123")
print(response.status)

client.response_cancel(id)

Cancels a currently in-progress response.

Parameters:

  • id (str): Response ID to cancel

Returns: dict - Cancelled response object

Example:

client = ai.Client("", api_key="sk-...")
response = client.response_cancel("resp_123")

client.response_delete(id)

Deletes a response by ID, removing it from storage.

Parameters:

  • id (str): Response ID to delete

Returns: None

Example:

client = ai.Client("", api_key="sk-...")
client.response_delete("resp_123")

client.response_compact(id)

Compacts a response by removing intermediate reasoning steps, returning a more concise version with only the final output.

Parameters:

  • id (str): Response ID to compact

Returns: dict - Compacted response object with reasoning removed

Example:

client = ai.Client("", api_key="sk-...")

# Create a response with reasoning
response = client.response_create("gpt-4o", "Solve this complex problem: 2+2")

# Compact it to remove reasoning steps
compacted = client.response_compact(response.id)
print(compacted.output)  # Output without reasoning blocks

ChatStream Class

Streaming methods such as client.completion_stream() return a ChatStream object.

stream.next()

Advances to the next response chunk and returns it.

Returns: dict - The next response chunk, or None if the stream is complete

Example:

import scriptling.ai as ai

client = ai.Client("", api_key="sk-...")
stream = client.completion_stream("gpt-4", [{"role": "user", "content": "Hello!"}])
while True:
    chunk = stream.next()
    if chunk is None:
        break
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="")

Usage Examples

String Shorthand (Simple Queries)

For simple queries, you can pass a string directly instead of building a messages array:

import scriptling.ai as ai

client = ai.Client("", api_key="sk-...")

# Simple user message
response = client.completion("gpt-4", "What is 2+2?")
print(response.choices[0].message.content)

# With system prompt
response = client.completion("gpt-4", "Explain quantum physics", system_prompt="You are a physics professor")
print(response.choices[0].message.content)

# Works with streaming too
stream = client.completion_stream("gpt-4", "Tell me a story")
while True:
    chunk = stream.next()
    if chunk is None:
        break
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="")
print()

Basic Chat Completion

import scriptling.ai as ai

client = ai.Client("", api_key="sk-...")
response = client.completion("gpt-4", [{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)

Conversation with Multiple Messages

client = ai.Client("", api_key="sk-...")

response = client.completion(
    "gpt-4",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
        {"role": "user", "content": "And what about Germany?"}
    ]
)

print(response.choices[0].message.content)
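
To continue the exchange, append the assistant's reply and the next user turn to the same list before calling completion() again:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]
response = client.completion("gpt-4", messages)

# Carry the history forward
messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "And what about Germany?"})
response = client.completion("gpt-4", messages)
print(response.choices[0].message.content)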

Streaming Chat Completion

client = ai.Client("", api_key="sk-...")

stream = client.completion_stream("gpt-4", [{"role": "user", "content": "Count to 10"}])
while True:
    chunk = stream.next()
    if chunk is None:
        break
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="")
print()
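
To collect the full streamed text in one string, accumulate the deltas instead of printing them (a sketch assuming Python-style list append and join):

parts = []
stream = client.completion_stream("gpt-4", [{"role": "user", "content": "Count to 10"}])
while True:
    chunk = stream.next()
    if chunk is None:
        break
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content:
            parts.append(delta.content)
full_text = "".join(parts)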

Using Custom Base URL

# For OpenAI-compatible services like LM Studio, local LLMs, etc.
client = ai.Client("http://127.0.0.1:1234/v1")

response = client.completion("mistralai/ministral-3-3b", [{"role": "user", "content": "Hello!"}])

Using MCP Tools with AI

MCP servers can be configured during client creation using the remote_servers parameter:

import scriptling.ai as ai

client = ai.Client("http://127.0.0.1:1234/v1", remote_servers=[
    {"base_url": "http://127.0.0.1:8080/mcp", "namespace": "scriptling"},
])

# Tools from MCP servers are automatically available to AI calls
response = client.completion("gpt-4", [{"role": "user", "content": "Search for recent news"}])

Error Handling

import scriptling.ai as ai

try:
    client = ai.Client("", api_key="sk-...")
    response = client.completion("gpt-4", [{"role": "user", "content": "Hello!"}])
    print(response.choices[0].message.content)
except Exception as e:
    print("Error:", e)

Message Format

Messages are dictionaries with the following keys:

  • role (str): “system”, “user”, “assistant”, or “tool”
  • content (str): The message content
  • tool_calls (list, optional): Tool calls made by the assistant
  • tool_call_id (str, optional): ID for tool response messages

Example:
message = {
    "role": "user",
    "content": "What is the weather like?"
}
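
Messages that carry tool traffic use the optional keys. The shapes below follow the OpenAI convention and are shown for illustration:

# Assistant message requesting a tool call
assistant_msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_time", "arguments": "{}"}}
    ]
}

# Tool result, linked back via tool_call_id
tool_msg = {
    "role": "tool",
    "tool_call_id": "call_1",
    "content": "12:00 PM"
}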