Chunkers
Adapted from the Griptape AI Framework documentation.
Overview
Chunkers are used to split arbitrarily long text into chunks of a maximum token length. Each chunker has a tokenizer, a max token count, and a list of default separators used to split text into TextArtifacts. Different types of chunkers provide separator lists suited to specific text shapes:
- TextChunker: works on most texts.
- PdfChunker: works on text extracted from PDF documents.
- MarkdownChunker: works on Markdown text.
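To illustrate the idea behind separator-based chunking, here is a minimal, self-contained sketch (not the Griptape implementation): text is recursively split on a prioritized list of separators until every chunk fits the token budget. Whitespace word count stands in for a real tokenizer, and separators are dropped rather than preserved, both simplifications.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer: real chunkers count tokens with a
    # model-specific tokenizer, not a whitespace split.
    return len(text.split())


def chunk(text: str, max_tokens: int,
          separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    # Base case: the text already fits the budget.
    if count_tokens(text) <= max_tokens:
        return [text] if text.strip() else []
    # Try separators in priority order; recurse on the first that applies.
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks: list[str] = []
            for part in parts:
                chunks.extend(chunk(part, max_tokens, separators))
            return chunks
    # No separator applies: fall back to a hard word split.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]


pieces = chunk("one two three four. five six seven eight.", max_tokens=4)
# Every resulting chunk is at most 4 tokens long.
```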
Here is how to use a chunker:
```python
from griptape.chunkers import TextChunker
from griptape.tokenizers import OpenAiTokenizer

TextChunker(
    # Set an optional custom tokenizer.
    tokenizer=OpenAiTokenizer(model="gpt-4.1"),
    # Optionally modify the default number of tokens.
    max_tokens=100,
).chunk("long text")
```
The most common use of a chunker is splitting long text into smaller chunks for insertion into a vector database when doing Retrieval-Augmented Generation (RAG).
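A toy sketch of why chunking matters for RAG (this is illustrative, not the Griptape API): each chunk is embedded and indexed, then the chunk most similar to a query is retrieved. A bag-of-words vector stands in for a real embedding model, and a plain list stands in for a vector database.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in embedding: a real pipeline uses an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Chunks produced by a chunker, indexed with their embeddings.
chunks = ["cats are small mammals", "rust is a systems language"]
index = [(c, embed(c)) for c in chunks]


def retrieve(query: str) -> str:
    # Return the stored chunk most similar to the query.
    q = embed(query)
    return max(index, key=lambda item: cosine(q, item[1]))[0]
```

Because retrieval operates on chunks rather than whole documents, the chunk size directly controls how focused the retrieved context is.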
See RagEngine for more information on how to use Chunkers in RAG pipelines.