Chunkers
Overview
Chunkers split arbitrarily long text into chunks of a bounded token length. Each chunker has a tokenizer, a max token count, and a list of default separators used to split text into TextArtifacts. Different chunker types provide separator lists suited to specific text shapes:
- TextChunker: works on most texts.
- PdfChunker: works on text from PDF docs.
- MarkdownChunker: works on markdown text.
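The separator-driven strategy described above can be sketched in plain Python. This is a simplified illustration, not Griptape's actual implementation: it approximates token count by word count instead of using a real tokenizer, and it tries separators in priority order, recursing until every piece fits.

```python
# Simplified sketch of separator-based chunking. Real chunkers count tokens
# with a model tokenizer; here token count is approximated by word count.
def chunk(text: str, max_tokens: int, separators: list[str]) -> list[str]:
    # If the text already fits, keep it as a single chunk.
    if len(text.split()) <= max_tokens:
        return [text]
    # Try separators in priority order (e.g. paragraphs before sentences).
    for sep in separators:
        parts = [p for p in text.split(sep) if p]
        if len(parts) > 1:
            chunks = []
            for part in parts:
                chunks.extend(chunk(part, max_tokens, separators))
            return chunks
    # No separator applies; fall back to a hard split on words.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

text = "one two three. four five six. seven eight nine."
print(chunk(text, 4, ["\n\n", ". "]))
```

The separator order is what distinguishes chunker types: a markdown-oriented chunker would try heading and list separators before sentence boundaries.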
Here is how to use a chunker:
```python
from griptape.chunkers import TextChunker
from griptape.tokenizers import OpenAiTokenizer

TextChunker(
    # set an optional custom tokenizer
    tokenizer=OpenAiTokenizer(model="gpt-4.1"),
    # optionally modify default number of tokens
    max_tokens=100,
).chunk("long text")
```
The most common use of a chunker is to split long text into smaller chunks for insertion into a vector database when doing Retrieval-Augmented Generation (RAG).
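The RAG ingestion flow mentioned above (chunk, embed, store, then retrieve by similarity) can be illustrated with a minimal, self-contained sketch. The toy bag-of-words "embedding" and in-memory list stand in for a real embedding model and vector database; they are placeholders for illustration only.

```python
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline calls an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = (sum(v * v for v in a.values()) ** 0.5) * \
           (sum(v * v for v in b.values()) ** 0.5)
    return dot / norm if norm else 0.0

# Ingest: the chunks a chunker produced, stored as (vector, text) pairs.
chunks = ["chunkers split long text", "vector databases store embeddings"]
store = [(embed(c), c) for c in chunks]

# Query: embed the question and return the most similar chunk.
query = embed("how are embeddings stored")
best = max(store, key=lambda pair: cosine(query, pair[0]))[1]
print(best)  # the chunk about vector databases
```

Keeping chunks within a fixed token budget is what makes this retrieval step work: each chunk must fit, together with the query, inside the downstream model's context window.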
See RagEngine for more information on how to use Chunkers in RAG pipelines.