Chunkers
Overview
Chunkers split arbitrarily long text into chunks of a bounded token length. Each chunker has a tokenizer, a max token count, and a list of default separators used to split text into TextArtifacts. Different chunker types provide separator lists suited to specific text shapes:
- TextChunker: works on most texts.
- PdfChunker: works on text from PDF docs.
- MarkdownChunker: works on markdown text.
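The separator-driven strategy described above can be sketched in plain Python. This is a simplified illustration, not Griptape's actual implementation: it approximates token count by word count instead of using a real tokenizer, and it tries separators in priority order, recursing until every piece fits.

```python
# Simplified sketch of separator-based chunking. Real chunkers count tokens
# with a model tokenizer; here token count is approximated by word count.
def chunk(text: str, max_tokens: int, separators: list[str]) -> list[str]:
    # If the text already fits, keep it as a single chunk.
    if len(text.split()) <= max_tokens:
        return [text]
    # Try separators in priority order (e.g. paragraphs before sentences).
    for sep in separators:
        parts = [p for p in text.split(sep) if p]
        if len(parts) > 1:
            chunks = []
            for part in parts:
                chunks.extend(chunk(part, max_tokens, separators))
            return chunks
    # No separator applies; fall back to a hard split on words.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

text = "one two three. four five six. seven eight nine."
print(chunk(text, 4, ["\n\n", ". "]))
```

The separator order is what distinguishes chunker types: a markdown-oriented chunker would try heading and list separators before sentence boundaries.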
Here is how to use a chunker:
```python
from griptape.chunkers import TextChunker
from griptape.tokenizers import OpenAiTokenizer

TextChunker(
    # set an optional custom tokenizer
    tokenizer=OpenAiTokenizer(model="gpt-4.1"),
    # optionally modify default number of tokens
    max_tokens=100,
).chunk("long text")
```
The most common use of a chunker is to split long text into smaller chunks for insertion into a vector database when doing Retrieval-Augmented Generation (RAG).
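The RAG ingestion flow mentioned above (chunk, embed, store, then retrieve by similarity) can be illustrated with a minimal, self-contained sketch. The toy bag-of-words "embedding" and in-memory list stand in for a real embedding model and vector database; they are placeholders for illustration only.

```python
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline calls an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = (sum(v * v for v in a.values()) ** 0.5) * \
           (sum(v * v for v in b.values()) ** 0.5)
    return dot / norm if norm else 0.0

# Ingest: the chunks a chunker produced, stored as (vector, text) pairs.
chunks = ["chunkers split long text", "vector databases store embeddings"]
store = [(embed(c), c) for c in chunks]

# Query: embed the question and return the most similar chunk.
query = embed("how are embeddings stored")
best = max(store, key=lambda pair: cosine(query, pair[0]))[1]
print(best)  # the chunk about vector databases
```

Keeping chunks within a fixed token budget is what makes this retrieval step work: each chunk must fit, together with the query, inside the downstream model's context window.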
See RagEngine for more information on how to use Chunkers in RAG pipelines.