Uses an AI model to extract structured data from unstructured text, such as emails, articles, invoices, or PDF content. You define a JSON Schema describing the fields you want; the model reads the source text and fills in the schema. The result is a clean, structured object ready for use in downstream steps.
schema_extractor
A huge amount of valuable information lives in unstructured text: customer emails, support tickets, scanned invoices, meeting notes, news articles. The Schema Extractor node bridges the gap between free-form prose and structured data your workflow can act on.
You provide two things: the source text (from a previous step or the trigger), and a JSON Schema that describes the shape of the output you want. The AI model reads the text, finds the relevant information, and returns a structured object matching your schema.
This is more specialised, and more reliable, than asking a general Agent node to extract data via JSON Mode. The Schema Extractor is purpose-built for extraction tasks, handling edge cases like missing fields, varied formatting, and ambiguous values more robustly than a general agent prompt.
| Field | Status | Description |
|---|---|---|
| Provider | Required | The AI Provider to use for extraction. Any provider with a capable language model works. For best results, use a model known for instruction-following (e.g. GPT-4o, Claude). |
| Source Text | Required | The text to extract from. Supports {{ variable }} references, for example {{ email_trigger.output.body }}, {{ pdf_reader.output.content }}, or {{ agent.output.text }}. |
| Schema | Required | A JSON Schema object that defines the expected output structure. Write this as a valid JSON object in the field. The model will attempt to fill every property defined in the schema from the source text. |
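As a rough illustration of what the Schema field might contain, the sketch below builds a small JSON Schema in Python and serialises it to JSON text. The field names (vendor, total, line_items, and the name and amount keys inside each line item) are assumptions chosen for the example, not names the node requires.

```python
import json

# Hypothetical schema for illustration only; the node does not prescribe
# these field names.
schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

# Serialise to JSON text that could be pasted into the Schema field.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Writing the schema as a Python dict first and serialising it is one way to make sure the pasted text is valid JSON.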
| Field | Type | Description |
|---|---|---|
| Schema fields | varies | Each property defined in your schema becomes a top-level field in the output object. For example, if your schema defines vendor, total, and line_items, those three keys will be present in the output. |
| raw_text | string | The original source text that was passed to the extractor. Useful for debugging or audit trails. |
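To make the output shape concrete, here is a minimal sketch of how an output object could look and how a downstream script might separate the extracted fields from raw_text. The values are invented, and the assumption that the output arrives as a plain dictionary is ours rather than something the documentation states.

```python
# Hypothetical output of a Schema Extractor step, assuming a schema that
# defines vendor, total, and line_items. All values are made up.
output = {
    "vendor": "Acme Ltd",
    "total": 1250.0,
    "line_items": [{"name": "Consulting", "amount": 1250.0}],
    "raw_text": "Invoice from Acme Ltd. Total due: 1250.00",
}


def split_output(output):
    """Separate the schema fields from the raw_text audit copy."""
    fields = {key: value for key, value in output.items() if key != "raw_text"}
    return fields, output.get("raw_text", "")


fields, raw_text = split_output(output)
print(sorted(fields))  # the schema-defined keys only
print(raw_text)        # the original text, e.g. for an audit log
```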
A workflow receives a PDF invoice via a file-watch trigger, extracts text from it, and then uses the Schema Extractor to pull out the key financial fields.
Set Source Text to {{ read_pdf.output.content }}.
Use {{ extract_invoice.output.total }}, {{ extract_invoice.output.vendor_name }}, etc. in downstream steps to create database records, send notifications, or trigger approval workflows.
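The dotted references in this example can be read as paths into each step's output. The sketch below is purely illustrative and is not the platform's implementation: the resolve function, the context dictionary, and the sample values are all hypothetical, and it only shows how a reference such as {{ extract_invoice.output.total }} maps onto a nested object.

```python
import re

# Hypothetical step outputs keyed by step name; a real run would supply these.
context = {
    "extract_invoice": {"output": {"vendor_name": "Acme Ltd", "total": 1250.0}},
}

# Matches {{ step.output.field }} with optional spaces inside the braces.
REFERENCE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def resolve(template, context):
    """Replace each {{ a.b.c }} reference with the value found at that path."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if the path does not exist
        return str(value)
    return REFERENCE.sub(lookup, template)


message = resolve(
    "Invoice from {{ extract_invoice.output.vendor_name }} "
    "for {{ extract_invoice.output.total }}",
    context,
)
print(message)
```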
Use description hints in your schema. Adding a description property to each schema field significantly improves extraction accuracy. The model uses these hints to understand what each field means and where to look for it in the text.
Use required sparingly. If a field might legitimately not appear in every document (e.g. a discount on an invoice), do not mark it as required. Optional fields that are absent will be returned as null, which is easier to handle downstream than a failed extraction.
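A minimal sketch of both tips together, assuming an invoice schema with hypothetical vendor_name, total, and discount fields: every property carries a description, only the fields expected in every document are listed under required, and a downstream helper treats a null discount as a normal case.

```python
import json

# Hypothetical invoice schema. The description strings and field names are
# illustrative; "discount" is deliberately left out of "required".
schema = {
    "type": "object",
    "properties": {
        "vendor_name": {
            "type": "string",
            "description": "Name of the company that issued the invoice.",
        },
        "total": {
            "type": "number",
            "description": "Final amount due, as stated on the invoice.",
        },
        "discount": {
            "type": "number",
            "description": "Discount applied to the invoice, if one is shown.",
        },
    },
    "required": ["vendor_name", "total"],
}


def describe_discount(result):
    """Handle an optional field that may come back as null (None in Python)."""
    discount = result.get("discount")
    if discount is None:
        return "No discount"
    return f"Discount: {discount:.2f}"


print(json.dumps(schema, indent=2))
print(describe_discount({"vendor_name": "Acme Ltd", "total": 1250.0, "discount": None}))
print(describe_discount({"vendor_name": "Acme Ltd", "total": 1250.0, "discount": 50}))
```

Checking for None explicitly, rather than relying on truthiness, keeps a genuine discount of 0 distinct from a missing one.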