Instructor

SDK · Structured Output · 8.0 · free · intermediate

Schema-first wrapper for LLM APIs that makes structured extraction, typed agent outputs, and reliable JSON generation easier in production code.

Widely adopted in enterprise

structured-output · pydantic · python
Recommended Fit

Best Use Case

Python developers needing reliable, structured JSON output from LLMs with Pydantic validation.

Instructor Key Features

Pydantic Validation

Get LLM outputs as validated Pydantic models with retry on failure.

Type-safe Responses

Define expected schemas and receive structured JSON output that conforms to them.

Multi-provider Support

Works with OpenAI, Anthropic, Mistral, and other providers.

Partial Streaming

Stream partial structured responses for responsive UIs.

Instructor Top Functions

Turn LLM API calls into validated, typed Python objects

Overview

Instructor is a schema-first SDK that transforms LLM API calls into type-safe, validated structured outputs using Pydantic models. Rather than parsing raw JSON strings and handling validation errors in application code, Instructor wraps your LLM provider's API—OpenAI, Anthropic, Cohere, Ollama, and others—to enforce schemas at the generation level. This means the model commits to your defined structure, dramatically reducing malformed responses and downstream parsing failures in production systems.

The library treats your Pydantic models as the single source of truth for LLM outputs. You define a schema once, and Instructor handles injecting that schema into the prompt, validating the response, retrying with corrective feedback, and streaming partial updates. This approach eliminates the boilerplate of manual JSON validation and error handling that typically bloats extraction pipelines.
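To make the saved boilerplate concrete, here is a minimal stdlib sketch of the hand-rolled parsing and validation an extraction pipeline needs without a schema-first wrapper. The `parse_user` helper and its `name`/`age` fields are hypothetical, standing in for what a single Pydantic `response_model` would declare:

```python
import json

# Hypothetical example: the manual parse-and-validate boilerplate that a
# schema-first wrapper like Instructor replaces with one Pydantic model.
def parse_user(raw: str) -> dict:
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data.get("name"), str):
        raise ValueError("field 'name' must be a string")
    if not isinstance(data.get("age"), int) or data["age"] < 0:
        raise ValueError("field 'age' must be a non-negative integer")
    return data

print(parse_user('{"name": "Jason", "age": 25}'))  # {'name': 'Jason', 'age': 25}
```

Every new extraction target means another function like this, plus retry logic around it; with a schema-first wrapper, the model definition carries all of that.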

Key Strengths

Instructor's core value lies in its tight integration with Pydantic v2, enabling automatic type validation, nested object support, and field-level constraints (min/max length, regex patterns, custom validators). When an LLM produces invalid JSON or violates your schema rules, Instructor automatically retries the request with the validation error embedded in the prompt, teaching the model how to correct itself without requiring manual intervention.
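The retry-with-feedback loop can be sketched in stdlib Python. This is a toy simulation, not Instructor's implementation: `retry_with_feedback`, `fake_llm`, and `validate_age` are all hypothetical names, and the "model" is a stub that corrects itself once it sees the error message:

```python
import json

def retry_with_feedback(generate, validate, max_retries=3):
    """Sketch of the corrective loop: on validation failure, the error text
    is fed back so the next attempt can self-correct."""
    feedback = None
    for _ in range(max_retries):
        raw = generate(feedback)
        try:
            return validate(raw)
        except ValueError as err:
            feedback = f"Your last output was invalid: {err}. Please fix it."
    raise RuntimeError("schema not satisfied after retries")

# Toy 'model': first answer violates the schema, corrected once feedback arrives.
def fake_llm(feedback):
    return '{"age": "twenty"}' if feedback is None else '{"age": 25}'

def validate_age(raw):
    data = json.loads(raw)
    if not isinstance(data.get("age"), int):
        raise ValueError("'age' must be an integer")
    return data

print(retry_with_feedback(fake_llm, validate_age))  # {'age': 25}
```

In Instructor the `validate` step is your Pydantic model and the feedback prompt is generated from the Pydantic validation error, but the control flow is the same idea.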

  • Multi-provider support: OpenAI, Anthropic, Cohere, Ollama, vLLM, Groq, and local models work with the same API surface
  • Streaming partial updates: Receive validated partial objects as tokens arrive, enabling real-time UI updates for long-form content
  • Deterministic output modes: JSON, XML, or tool-use modes ensure compatibility across different LLM architectures
  • Async-first design: Built for concurrent request handling with proper error propagation and retry logic
  • Zero production cost: Completely free and open-source; no per-call fees or premium tier
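The partial-streaming bullet can be illustrated with a toy stdlib sketch. This is not how Instructor does it (it yields live, progressively populated Pydantic model instances); here we just re-parse a growing buffer with naive closing suffixes, and `stream_partials` is a hypothetical helper:

```python
import json

def stream_partials(chunks):
    """Toy sketch of partial streaming: as chunks arrive, try to close the
    buffer and parse it, yielding the best partial object seen so far."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        for suffix in ("", '"}', "}"):  # naive completions, illustration only
            try:
                yield json.loads(buffer + suffix)
                break
            except json.JSONDecodeError:
                continue

chunks = ['{"name": "Ja', 'son"', ', "age": 25}']
partials = list(stream_partials(chunks))
print(partials[-1])  # {'name': 'Jason', 'age': 25}
```

Each intermediate yield is a usable partial object, which is what lets a UI render fields as they arrive instead of waiting for the full response.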

Who It's For

Instructor is purpose-built for Python developers building data extraction pipelines, AI agents, and structured content generation workflows where reliability matters. If you're parsing resumes into candidate objects, extracting structured insights from documents, or building multi-turn agents that must return predictable, typed responses, Instructor eliminates entire categories of runtime errors.

Teams maintaining production systems benefit most: the automatic retry-with-feedback loop reduces manual debugging, while the async runtime scales efficiently with thousands of concurrent requests. The library is especially valuable in fintech, legal tech, and healthcare sectors where structured data quality is non-negotiable.

Bottom Line

Instructor fills a critical gap in the LLM toolchain: it makes reliable structured extraction the default, not the exception. The schema-first design, combined with zero cost, Pydantic integration, and multi-provider flexibility, makes it the go-to choice for Python teams that refuse to ship unreliable JSON parsing code. The learning curve is gentle if you already know Pydantic, and the investment pays off immediately through less error-handling boilerplate and fewer production incidents.

Instructor Pros

  • Completely free and open-source with no per-call charges or premium tier, making it cost-effective for high-volume extraction workloads.
  • Automatic retry-with-feedback loop corrects invalid responses without manual error handling, dramatically reducing malformed JSON in production.
  • Seamless Pydantic v2 integration enables nested objects, field validation (min/max, regex, custom validators), and automatic type coercion without extra boilerplate.
  • Multi-provider support (OpenAI, Anthropic, Cohere, Ollama, vLLM, Groq) with identical API surface means you can swap LLM backends without rewriting extraction logic.
  • Streaming partial updates deliver validated objects token-by-token, enabling real-time UI updates and progressive response handling.
  • Deterministic output modes (JSON, XML, tool-use) adapt to different model architectures and ensure compatibility across providers.
  • Async-first design with built-in concurrency support scales efficiently for handling thousands of concurrent extraction requests.

Instructor Cons

  • Python-only SDK limits adoption in polyglot teams; no official Go, Rust, Node.js, or Java implementations yet.
  • Performance overhead from validation and retry logic adds latency compared to raw API calls; not suitable for ultra-low-latency requirements under 100ms.
  • Relies on injecting schema instructions into the prompt, which may be less reliable with weaker or loosely instruction-tuned models that ignore structured-output instructions.
  • Limited debugging visibility into why a model fails to comply with a schema; error messages can be opaque without verbose logging configuration.
  • Dependency on Pydantic v2 creates tight coupling; models cannot be easily ported to non-Pydantic validation frameworks without refactoring.
  • Streaming mode complexity increases error handling surface area; partial failures mid-stream can be harder to debug than synchronous calls.


Instructor FAQs

Is Instructor free? Are there any hidden costs?
Yes, Instructor is completely free and open-source. You only pay for the underlying LLM API calls (OpenAI, Anthropic, etc.). There are no per-call fees, premium features, or usage limits imposed by Instructor itself. All source code is available on GitHub under the MIT license.
Can I use Instructor with non-OpenAI LLMs?
Absolutely. Instructor supports Anthropic, Cohere, Ollama, vLLM, Groq, and any OpenAI-compatible API endpoint. You patch the client the same way regardless of provider, so switching backends requires only a one-line change. Local models via Ollama are also fully supported for offline extraction.
What happens if the LLM returns invalid JSON that doesn't match my schema?
Instructor automatically catches validation errors and retries the request with the validation error message embedded in a new prompt, teaching the model how to correct itself. This loop continues until the response matches your schema or a maximum retry count is reached. You can configure retry behavior and fallback handling.
Does Instructor work with streaming responses?
Yes. Enable streaming to receive validated partial objects as tokens arrive. Each partial is a live instance of your Pydantic model with fields populated progressively. This is ideal for UI updates, real-time displays, and responsive applications where waiting for the full response is unacceptable.
How does Instructor compare to alternatives like Marvin or OpenAI's JSON mode?
Instructor is more flexible than OpenAI's JSON mode (which only guarantees valid JSON, not schema compliance) and more lightweight than Marvin (which has broader AI agent features). Instructor's core strength is tight Pydantic integration and automatic retry-with-feedback, which neither alternative offers at the same level of abstraction.