Chunkers
Adapted from the Griptape AI Framework documentation.
Overview
Chunkers are used to split arbitrarily long text into chunks of a maximum token length. Each chunker has a tokenizer, a max token count, and a list of default separators used to split text into TextArtifacts. Different types of chunkers provide separator lists suited to specific text shapes:
- TextChunker: works on most texts.
- PdfChunker: works on text extracted from PDF documents.
- MarkdownChunker: works on Markdown text.
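To illustrate the idea behind separator-based chunking, here is a minimal, self-contained sketch (not the Griptape implementation): text is recursively split on a prioritized list of separators until every chunk fits the token budget. Whitespace word count stands in for a real tokenizer, and separators are dropped rather than preserved, both simplifications.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer: real chunkers count tokens with a
    # model-specific tokenizer, not a whitespace split.
    return len(text.split())


def chunk(text: str, max_tokens: int,
          separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    # Base case: the text already fits the budget.
    if count_tokens(text) <= max_tokens:
        return [text] if text.strip() else []
    # Try separators in priority order; recurse on the first that applies.
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks: list[str] = []
            for part in parts:
                chunks.extend(chunk(part, max_tokens, separators))
            return chunks
    # No separator applies: fall back to a hard word split.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]


pieces = chunk("one two three four. five six seven eight.", max_tokens=4)
# Every resulting chunk is at most 4 tokens long.
```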
Here is how to use a chunker:
```python
from griptape.chunkers import TextChunker
from griptape.tokenizers import OpenAiTokenizer

TextChunker(
    # Set an optional custom tokenizer.
    tokenizer=OpenAiTokenizer(model="gpt-4.1"),
    # Optionally modify the default number of tokens.
    max_tokens=100,
).chunk("long text")
```
The most common use of a chunker is splitting long text into smaller chunks for insertion into a vector database when doing Retrieval-Augmented Generation (RAG).
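A toy sketch of why chunking matters for RAG (this is illustrative, not the Griptape API): each chunk is embedded and indexed, then the chunk most similar to a query is retrieved. A bag-of-words vector stands in for a real embedding model, and a plain list stands in for a vector database.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in embedding: a real pipeline uses an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Chunks produced by a chunker, indexed with their embeddings.
chunks = ["cats are small mammals", "rust is a systems language"]
index = [(c, embed(c)) for c in chunks]


def retrieve(query: str) -> str:
    # Return the stored chunk most similar to the query.
    q = embed(query)
    return max(index, key=lambda item: cosine(q, item[1]))[0]
```

Because retrieval operates on chunks rather than whole documents, the chunk size directly controls how focused the retrieved context is.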
See RagEngine for more information on how to use Chunkers in RAG pipelines.