Extraction Engines
Adapted from the Griptape AI Framework documentation.
Overview
Extraction Engines extract structured data from unstructured text. They can be used with Structures via Extraction Tasks. Gen AI Builder currently supports two types of Extraction Engines: the CSV Extraction Engine and the JSON Extraction Engine.
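The sketch below shows one way an Extraction Engine might be wired into a Structure through an Extraction Task. It assumes griptape's ExtractionTask accepts the engine via an extraction_engine argument (with an empty args mapping), and it reuses the CSV engine configuration from the section that follows.

```python
from griptape.drivers.prompt.openai import OpenAiChatPromptDriver
from griptape.engines import CsvExtractionEngine
from griptape.structures import Agent
from griptape.tasks import ExtractionTask

# Configure an extraction engine (same CSV engine as in the example below)
csv_engine = CsvExtractionEngine(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-3.5-turbo"),
    column_names=["name", "age", "location"],
)

# Wrap the engine in an Extraction Task and attach it to a Structure
agent = Agent()
agent.add_task(ExtractionTask(extraction_engine=csv_engine, args={}))

# Running the Structure passes its input text through the engine
agent.run("Alice, 28, lives in New York. Bob, 35, lives in California.")
```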
CSV
The CSV Extraction Engine extracts comma-separated values from unstructured text.
```python
from griptape.drivers.prompt.openai import OpenAiChatPromptDriver
from griptape.engines import CsvExtractionEngine

# Initialize the CsvExtractionEngine instance
csv_engine = CsvExtractionEngine(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-3.5-turbo"),
    column_names=["name", "age", "location"],
)

# Define some unstructured data
sample_text = """
Alice, 28, lives in New York.
Bob, 35 lives in California
Charlie is 40
Collin is 28 and lives in San Francisco
"""

# Extract CSV rows using the engine
result = csv_engine.extract_text(sample_text)

for row in result:
    print(row.to_text())
```
```
name,age,location
Alice,28,New York
Bob,35,California
Charlie,40,
Collin,28,San Francisco
```
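Each element of result is an artifact whose to_text() form is a single CSV line, so the rows can be reassembled into a complete file with nothing beyond the standard library. Below is a minimal post-processing sketch, assuming the result variable and the output shown above (header row included):

```python
import csv
import io

# Reassemble the extracted rows into one CSV document
csv_text = "\n".join(row.to_text() for row in result)

# Persist it for downstream use
with open("people.csv", "w") as f:
    f.write(csv_text)

# Or parse it directly with the standard csv module
people = list(csv.DictReader(io.StringIO(csv_text)))
print(people[0]["location"])  # "New York"
```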
JSON
The JSON Extraction Engine extracts JSON-formatted values from unstructured text.
```python
import json

from schema import Literal, Schema

from griptape.artifacts import ListArtifact
from griptape.chunkers import TextChunker
from griptape.drivers.prompt.openai import OpenAiChatPromptDriver
from griptape.engines import JsonExtractionEngine
from griptape.loaders import WebLoader

# Define a schema for extraction
json_engine = JsonExtractionEngine(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-4.1"),
    template_schema=Schema(
        {
            Literal("model", description="Name of an LLM model."): str,
            Literal("notes", description="Any notes of substance about the model."): Schema([str]),
        }
    ).json_schema("ProductSchema"),
)

# Load data from the web
web_data = WebLoader().load("https://en.wikipedia.org/wiki/Large_language_model")

chunks = TextChunker().chunk(web_data)

# Extract data using the engine
result = json_engine.extract_artifacts(ListArtifact(chunks))

for artifact in result:
    print(json.dumps(artifact.value, indent=2))
```
{ "model": "GPT-1", "notes": [ "Introduced in 2018.", "Considered the first LLM despite having only 0.117 billion parameters." ] } { "model": "GPT-2", "notes": [ "Released in 2019.", "Caught widespread attention because OpenAI initially deemed it too powerful to release publicly." ] } { "model": "GPT-3", "notes": [ "Released in 2020.", "Available only via API with no offering of downloading the model to execute locally." ] } { "model": "ChatGPT", "notes": [ "Released in 2022.", "Consumer-facing browser-based model that captured the imaginations of the general population." ] } { "model": "GPT-4", "notes": [ "Released in 2023.", "Praised for increased accuracy and multimodal capabilities.", "OpenAI did not reveal the high-level architecture and the number of parameters." ] } { "model": "OpenAI o1", "notes": [ "Released in 2024.", "Reasoning model that generates long chains of thought before returning a final answer." ] } { "model": "DeepSeek R1", "notes": [ "Released in January 2025.", "671-billion-parameter open-weight model that performs comparably to OpenAI o1 but at a much lower cost." ] } { "model": "BERT", "notes": [ "Introduced in 2018.", "Encoder-only model.", "Usage began to decline in 2023 due to improvements in decoder-only models." ] } { "model": "Mistral 7B and Mixtral 8x7b", "notes": [ "Models by Mistral AI.", "Have the more permissive Apache License." ] } { "model": "Pixtral 12B", "notes": [ "Introduced by Mistral in September 2024.", "First multimodal model by Mistral." ] }
{ "model": "GPT-3.5", "notes": [ "Part of OpenAI's GPT series.", "Used in ChatGPT and Microsoft Copilot." ] } { "model": "GPT-4", "notes": [ "Part of OpenAI's GPT series.", "Praised for increased accuracy and multimodal capabilities.", "Architecture and number of parameters not revealed." ] } ...Output truncated for brevity...