Extraction Engines
Adapted from the Griptape AI Framework documentation.
Overview
Extraction Engines extract structured data from unstructured text. They can be used with Structures via Extraction Tasks. Gen AI Builder currently supports two types of Extraction Engines: the CSV Extraction Engine and the JSON Extraction Engine.
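The sketch below shows one way an Extraction Engine might be wired into a Structure through an Extraction Task. It assumes griptape's ExtractionTask accepts the engine via an extraction_engine argument (with an empty args mapping), and it reuses the CSV engine configuration from the section that follows.

```python
from griptape.drivers.prompt.openai import OpenAiChatPromptDriver
from griptape.engines import CsvExtractionEngine
from griptape.structures import Agent
from griptape.tasks import ExtractionTask

# Configure an extraction engine (same CSV engine as in the example below)
csv_engine = CsvExtractionEngine(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-3.5-turbo"),
    column_names=["name", "age", "location"],
)

# Wrap the engine in an Extraction Task and attach it to a Structure
agent = Agent()
agent.add_task(ExtractionTask(extraction_engine=csv_engine, args={}))

# Running the Structure passes its input text through the engine
agent.run("Alice, 28, lives in New York. Bob, 35, lives in California.")
```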
CSV
The CSV Extraction Engine extracts comma-separated values from unstructured text.
```python
from griptape.drivers.prompt.openai import OpenAiChatPromptDriver
from griptape.engines import CsvExtractionEngine

# Initialize the CsvExtractionEngine instance
csv_engine = CsvExtractionEngine(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-3.5-turbo"),
    column_names=["name", "age", "location"],
)

# Define some unstructured data
sample_text = """
Alice, 28, lives in New York.
Bob, 35 lives in California
Charlie is 40
Collin is 28 and lives in San Francisco
"""

# Extract CSV rows using the engine
result = csv_engine.extract_text(sample_text)

for row in result:
    print(row.to_text())
```
```
name,age,location
Alice,28,New York
Bob,35,California
Charlie,40,
Collin,28,San Francisco
```
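Each element of result is an artifact whose to_text() form is a single CSV line, so the rows can be reassembled into a complete file with nothing beyond the standard library. Below is a minimal post-processing sketch, assuming the result variable and the output shown above (header row included):

```python
import csv
import io

# Reassemble the extracted rows into one CSV document
csv_text = "\n".join(row.to_text() for row in result)

# Persist it for downstream use
with open("people.csv", "w") as f:
    f.write(csv_text)

# Or parse it directly with the standard csv module
people = list(csv.DictReader(io.StringIO(csv_text)))
print(people[0]["location"])  # "New York"
```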
JSON
The JSON Extraction Engine extracts JSON-formatted values from unstructured text.
```python
import json

from schema import Literal, Schema

from griptape.artifacts import ListArtifact
from griptape.chunkers import TextChunker
from griptape.drivers.prompt.openai import OpenAiChatPromptDriver
from griptape.engines import JsonExtractionEngine
from griptape.loaders import WebLoader

# Define a schema for extraction
json_engine = JsonExtractionEngine(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-4.1"),
    template_schema=Schema(
        {
            Literal("model", description="Name of an LLM model."): str,
            Literal("notes", description="Any notes of substance about the model."): Schema([str]),
        }
    ).json_schema("ProductSchema"),
)

# Load data from the web
web_data = WebLoader().load("https://en.wikipedia.org/wiki/Large_language_model")

chunks = TextChunker().chunk(web_data)

# Extract data using the engine
result = json_engine.extract_artifacts(ListArtifact(chunks))

for artifact in result:
    print(json.dumps(artifact.value, indent=2))
```
{ "model": "GPT-1", "notes": [ "Introduced in 2018.", "Considered the first LLM despite having only 0.117 billion parameters." ] } { "model": "GPT-2", "notes": [ "Released in 2019.", "Caught widespread attention because OpenAI initially deemed it too powerful to release publicly." ] } { "model": "GPT-3", "notes": [ "Released in 2020.", "Available only via API with no offering of downloading the model to execute locally." ] } { "model": "ChatGPT", "notes": [ "Released in 2022.", "Consumer-facing browser-based model that captured the imaginations of the general population." ] } { "model": "GPT-4", "notes": [ "Released in 2023.", "Praised for increased accuracy and multimodal capabilities.", "OpenAI did not reveal the high-level architecture and the number of parameters." ] } { "model": "OpenAI o1", "notes": [ "Released in 2024.", "Reasoning model that generates long chains of thought before returning a final answer." ] } { "model": "DeepSeek R1", "notes": [ "Released in January 2025.", "671-billion-parameter open-weight model that performs comparably to OpenAI o1 but at a much lower cost." ] } { "model": "BERT", "notes": [ "Introduced in 2018.", "Encoder-only model.", "Usage began to decline in 2023 due to improvements in decoder-only models." ] } { "model": "Mistral 7B and Mixtral 8x7b", "notes": [ "Models by Mistral AI.", "Have the more permissive Apache License." ] } { "model": "Pixtral 12B", "notes": [ "Introduced by Mistral in September 2024.", "First multimodal model by Mistral." ] }
{ "model": "GPT-3.5", "notes": [ "Part of OpenAI's GPT series.", "Used in ChatGPT and Microsoft Copilot." ] } { "model": "GPT-4", "notes": [ "Part of OpenAI's GPT series.", "Praised for increased accuracy and multimodal capabilities.", "Architecture and number of parameters not revealed." ] } ...Output truncated for brevity...