Working with Data Lake in Gen AI Builder
What is the Data Lake
The Data Lake is the foundational object storage layer for Gen AI Builder and the AI Factory. It is where Griptape-powered services store:
- Uploaded files from Data Sources
- Indexed data and embeddings
- Griptape Structures and Tools
- Temporary artifacts used in AI workflows
Without a properly configured Data Lake, core features of Gen AI Builder, including Libraries, Knowledge Bases, and Retrievers, cannot function.
For a deep dive, see: Data Lake explained.
Why use a dedicated Data Lake
- Separation of concerns: Keeps Griptape data isolated from other object storage.
- Performance: Enables storage access patterns optimized for AI workloads.
- Security: Simplifies permission scoping and CORS configuration.
- Reliability: Ensures the AI indexing and retrieval pipelines have guaranteed storage availability.
Best practice: Use a dedicated bucket or container only for Gen AI Builder / Griptape.
When to configure the Data Lake
- When setting up a new Griptape deployment.
- When connecting an external storage backend to Gen AI Builder.
- Before adding Data Sources or creating Knowledge Bases.
Important: The Data Lake must be configured before using Libraries or Knowledge Bases.
How the Data Lake fits into the content pipeline
The Data Lake supports the entire AI Factory pipeline:
Data Sources → Data Lake → Libraries → Knowledge Bases → Retrievers → Assistants → AI Applications
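As a rough illustration of the ordering above, the following Python sketch lists the stages in the order shown and reports which ones sit downstream of the Data Lake. The names `PIPELINE` and `downstream_of` are hypothetical helpers for this example only; they are not part of Gen AI Builder or Griptape.

```python
# Stage order copied from the pipeline line above.
# PIPELINE and downstream_of are illustrative names, not a product API.
PIPELINE = [
    "Data Sources",
    "Data Lake",
    "Libraries",
    "Knowledge Bases",
    "Retrievers",
    "Assistants",
    "AI Applications",
]


def downstream_of(stage):
    """Return every stage that comes after `stage` in the pipeline."""
    return PIPELINE[PIPELINE.index(stage) + 1:]


# Everything listed here relies on the Data Lake being in place first.
print(downstream_of("Data Lake"))
```

Read this way, the list makes the dependency explicit: every stage after the Data Lake builds on content that has passed through it.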
The Data Lake is the persistent storage backend that powers:
- Data Sources of type Data Lake
- Content staging during Library and Knowledge Base creation
- Storage for Griptape Structures and Tools
Getting started
To configure the Data Lake:
- Provision an object storage bucket (S3-compatible or GCS).
- Configure CORS to enable UI interaction.
- Obtain the required credentials and provide them to your deployment.
See Configure the Data Lake for the full step-by-step guide.
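To give a sense of what the CORS step might involve on an S3-compatible bucket, here is a minimal Python sketch that builds a CORS configuration in the JSON shape used by the S3 API. Everything specific in it is an assumption made for illustration: the helper name `build_cors_config`, the origin `https://genai.example.com`, the allowed methods, the exposed `ETag` header, and the max-age value. The linked guide is the authority on the rules your deployment needs, and GCS uses its own CORS format.

```python
import json


def build_cors_config(ui_origin, max_age_seconds=3000):
    """Build an S3-style CORS configuration for a single UI origin.

    Hypothetical helper: the methods and headers below are assumptions,
    not values taken from the Gen AI Builder documentation.
    """
    if not ui_origin.startswith("https://"):
        raise ValueError("expected an https:// origin for the UI")
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [ui_origin],
                # Assumed: the UI reads and uploads objects from the browser.
                "AllowedMethods": ["GET", "HEAD", "PUT", "POST"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag"],
                "MaxAgeSeconds": max_age_seconds,
            }
        ]
    }


# Placeholder origin; replace with the URL your UI is served from.
config = build_cors_config("https://genai.example.com")
print(json.dumps(config, indent=2))
```

A dictionary in this shape is what boto3's `put_bucket_cors` accepts as its `CORSConfiguration` argument, so the same structure could be applied to a bucket once credentials are in place. Restricting `AllowedOrigins` to the single UI origin, rather than using a wildcard, keeps the rule in line with the permission scoping described earlier.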