Best practices for Hybrid Knowledge Bases
Who is this for
Platform users creating Hybrid Knowledge Bases to power advanced AI applications combining structured and unstructured data. Typical users include developers, AI architects, data engineers, and content managers working with product catalogs, customer profiles, or mixed datasets.
What you will accomplish
You will learn best practices for designing Hybrid Knowledge Bases in Gen AI Builder:
- How to select structured and unstructured columns
- How to design a schema that supports powerful AI queries
- How to avoid common pitfalls in Hybrid Knowledge Base design
Why design Hybrid Knowledge Bases carefully
- Hybrid Knowledge Bases allow you to combine structured filters (exact matches, ranges, facets) with semantic search (vector-based search).
- A well-designed Hybrid Knowledge Base enables more accurate, flexible, and explainable AI responses.
- Poor schema design can lead to inefficient queries, incomplete search results, or wasted processing.
For background, see:
- Knowledge Bases explained
- AI Factory Concepts
- Retrieval Augmented Generation (RAG)
- Embeddings explained
To see how Hybrid Knowledge Bases are used in Hybrid Manager deployments, visit:
Complexity and time to complete
- Complexity: Moderate — schema design requires understanding your data and use cases.
- Estimated time: 15–30 minutes per Hybrid Knowledge Base.
Designing Hybrid Knowledge Bases
Understand your use case
Start by understanding what types of queries you want your AI application to support.
Examples:
- "Show products under $500 in category 'Outdoor' and rank by relevance to search terms."
- "Find policies for region 'EU' that mention data retention."
Selecting structured columns
Structured columns support:
- Exact match filters (e.g.,
region = 'EU'
) - Range queries (e.g.,
price BETWEEN 100 AND 500
) - Faceted search (e.g., list all
categories
)
Best practices:
- Use short, unambiguous column names that match your Library schema exactly.
- Choose data types carefully:
Text
for identifiers and categories (SKU, region, category)Number
for numeric ranges (price, rating)Date
for date-based filtering (release date, expiration date)Boolean
for simple yes/no filters (isActive, inStock)- Do not include freeform text fields as structured columns — they should go in unstructured columns.
Selecting unstructured columns
Unstructured columns support:
- Full-text semantic search
- Retrieval-Augmented Generation (RAG) queries
Best practices:
- Select columns containing rich text that will benefit from embedding:
- Descriptions
- Reviews
- Policies
- FAQ answers
- Detailed notes
- Avoid columns with redundant or trivial text — this can degrade search quality.
- Be intentional about how many unstructured columns you define:
- Too many dilute search results.
- Too few limit context richness.
Example schema design
Product catalog Hybrid KB
Structured columns:
SKU
(Text)Category
(Text)Price
(Number)AvailableSince
(Date)InStock
(Boolean)
Unstructured columns:
ProductName
Features
DetailedDescription
Common pitfalls
Mismatched column names
- The column names you define must match exactly the column names in your source Library (CSV headers, Google Sheet columns).
- Misspelled names or casing differences will cause ingestion errors.
Poor type selection
- Do not mark numeric fields as Text or vice versa.
- Use Date types correctly to support proper date queries.
Overloading structured columns
- Do not add long freeform text columns as structured — this will degrade filtering performance.
Underusing unstructured columns
- If your unstructured text is spread across multiple columns (e.g., FAQ question, FAQ answer), consider combining them with a Structure before ingestion, or clearly select both as unstructured.
Example scenario
You are designing a Hybrid Knowledge Base for your Product Catalog.
Example configuration:
- Structured Columns:
SKU
(Text)Category
(Text)Price
(Number)AvailableSince
(Date)- Unstructured Columns:
ProductName
DetailedDescription
Resulting queries supported:
- Filter by
Category
andPrice
range. - Sort by
AvailableSince
. - Perform semantic search across product names and descriptions.
Troubleshooting
Incomplete search results
- Verify that unstructured columns were selected and indexed correctly.
- Check that structured filters in queries are not too restrictive.
Filter not working
- Verify that the column was marked as structured and with the correct data type.
- Check that the Library schema and column names match exactly.
Ingestion errors
- Check that all column names match your source Library exactly.
- Verify that the data types of columns align with your intended schema.
Related topics
← Prev
Configure a Web Page data source
↑ Up
Gen AI How-To Guides
Next →
Manage Knowledge Bases in Gen AI Builder
Could this page be better? Report a problem or suggest an addition!