Working with Data Lake in Gen AI Builder

What is the Data Lake

The Data Lake is the foundational object storage layer for Gen AI Builder and the AI Factory. It is where Griptape-powered services store their data.

Without a properly configured Data Lake, core features of Gen AI Builder — including Libraries, Knowledge Bases, and Retrievers — cannot function.

For a deep dive, see: Data Lake explained.

Separation of concerns: Keeps Griptape data isolated from other object storage.
Performance: Enables optimized patterns for AI workloads.
Security: Simplifies permission scoping and CORS configuration.
Reliability: Ensures the AI indexing and retrieval pipelines have guaranteed storage availability.

Best practice: Use a dedicated bucket or container only for Gen AI Builder / Griptape.
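As a rough illustration of what a dedicated bucket with narrowly scoped permissions and CORS could look like, the sketch below builds two plain configuration documents in Python. It assumes an S3-compatible object store. The bucket name, the allowed origin, the HTTP methods, and the helper names build_cors_config and build_scoped_policy are all hypothetical and are not taken from the Gen AI Builder documentation. Check the Configure the Data Lake guide for the values your deployment actually requires.

```python
import json


def build_cors_config(allowed_origins):
    """Return an S3-style CORS configuration limited to the given origins.

    The methods and headers listed here are illustrative assumptions,
    not the documented requirements of Gen AI Builder.
    """
    return {
        "CORSRules": [
            {
                "AllowedOrigins": list(allowed_origins),
                "AllowedMethods": ["GET", "PUT", "POST", "HEAD"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


def build_scoped_policy(bucket):
    """Return an IAM-style policy that grants access to one bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object reads and writes apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical names used only for this example.
    bucket = "genai-builder-data-lake"
    print(json.dumps(build_cors_config(["https://portal.example.com"]), indent=2))
    print(json.dumps(build_scoped_policy(bucket), indent=2))
```

Because the bucket is used only for Gen AI Builder / Griptape, both documents can stay this small: the policy names a single bucket, and the CORS rule names only the origins that need browser access.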

Important: The Data Lake must be configured before using Libraries or Knowledge Bases.

The Data Lake supports the entire AI Factory pipeline:

Data Sources → Data Lake → Libraries → Knowledge Bases → Retrievers → Assistants → AI Applications
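
The ordering in this pipeline can be read as a dependency chain: a stage is usable only once everything upstream of it is in place. The sketch below models that reading in Python. The stage names come from the pipeline above, while the function names and the idea of tracking a set of "configured" stages are assumptions made for illustration, not part of any Gen AI Builder API.

```python
# Stage names copied from the pipeline above, in order.
PIPELINE = [
    "Data Sources",
    "Data Lake",
    "Libraries",
    "Knowledge Bases",
    "Retrievers",
    "Assistants",
    "AI Applications",
]


def upstream_of(stage):
    """Return every stage that comes before `stage` in the pipeline."""
    return PIPELINE[: PIPELINE.index(stage)]


def missing_prerequisites(stage, configured):
    """Return the upstream stages of `stage` that are not yet configured."""
    return [s for s in upstream_of(stage) if s not in configured]


if __name__ == "__main__":
    # Hypothetical state: data sources exist, but the Data Lake is not set up.
    configured = {"Data Sources"}
    print(missing_prerequisites("Libraries", configured))
```

In this model, asking for Libraries before the Data Lake is configured reports the Data Lake as a missing prerequisite, which mirrors the requirement that the Data Lake be configured first.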

The Data Lake is the persistent storage backend that powers the downstream stages of this pipeline, including Libraries, Knowledge Bases, and Retrievers.

To configure the Data Lake, see Configure the Data Lake for the full step-by-step guide.
