Best practices for Hybrid Knowledge Bases

Who is this for

Platform users creating Hybrid Knowledge Bases to power advanced AI applications combining structured and unstructured data. Typical users include developers, AI architects, data engineers, and content managers working with product catalogs, customer profiles, or mixed datasets.

What you will accomplish

You will learn best practices for designing Hybrid Knowledge Bases in Gen AI Builder:

  • How to select structured and unstructured columns
  • How to design a schema that supports powerful AI queries
  • How to avoid common pitfalls in Hybrid Knowledge Base design

Why design Hybrid Knowledge Bases carefully

  • Hybrid Knowledge Bases allow you to combine structured filters (exact matches, ranges, facets) with semantic search (vector-based search).
  • A well-designed Hybrid Knowledge Base enables more accurate, flexible, and explainable AI responses.
  • Poor schema design can lead to inefficient queries, incomplete search results, or wasted processing.

For background, see:

To see how Hybrid Knowledge Bases are used in Hybrid Manager deployments, visit:

Complexity and time to complete

  • Complexity: Moderate — schema design requires understanding your data and use cases.
  • Estimated time: 15–30 minutes per Hybrid Knowledge Base.

Designing Hybrid Knowledge Bases

Understand your use case

Start by understanding what types of queries you want your AI application to support.

Examples:

  • "Show products under $500 in category 'Outdoor' and rank by relevance to search terms."
  • "Find policies for region 'EU' that mention data retention."

Selecting structured columns

Structured columns support:

  • Exact match filters (e.g., region = 'EU')
  • Range queries (e.g., price BETWEEN 100 AND 500)
  • Faceted search (e.g., list all categories)

Best practices:

  • Use short, unambiguous column names that match your Library schema exactly.
  • Choose data types carefully:
  • Text for identifiers and categories (SKU, region, category)
  • Number for numeric ranges (price, rating)
  • Date for date-based filtering (release date, expiration date)
  • Boolean for simple yes/no filters (isActive, inStock)
  • Do not include freeform text fields as structured columns — they should go in unstructured columns.

Selecting unstructured columns

Unstructured columns support:

  • Full-text semantic search
  • Retrieval-Augmented Generation (RAG) queries

Best practices:

  • Select columns containing rich text that will benefit from embedding:
  • Descriptions
  • Reviews
  • Policies
  • FAQ answers
  • Detailed notes
  • Avoid columns with redundant or trivial text — this can degrade search quality.
  • Be intentional about how many unstructured columns you define:
  • Too many dilute search results.
  • Too few limit context richness.

Example schema design

Product catalog Hybrid KB

Structured columns:

  • SKU (Text)
  • Category (Text)
  • Price (Number)
  • AvailableSince (Date)
  • InStock (Boolean)

Unstructured columns:

  • ProductName
  • Features
  • DetailedDescription

Common pitfalls

Mismatched column names

  • The column names you define must match exactly the column names in your source Library (CSV headers, Google Sheet columns).
  • Misspelled names or casing differences will cause ingestion errors.

Poor type selection

  • Do not mark numeric fields as Text or vice versa.
  • Use Date types correctly to support proper date queries.

Overloading structured columns

  • Do not add long freeform text columns as structured — this will degrade filtering performance.

Underusing unstructured columns

  • If your unstructured text is spread across multiple columns (e.g., FAQ question, FAQ answer), consider combining them with a Structure before ingestion, or clearly select both as unstructured.

Example scenario

You are designing a Hybrid Knowledge Base for your Product Catalog.

Example configuration:

  • Structured Columns:
  • SKU (Text)
  • Category (Text)
  • Price (Number)
  • AvailableSince (Date)
  • Unstructured Columns:
  • ProductName
  • DetailedDescription

Resulting queries supported:

  • Filter by Category and Price range.
  • Sort by AvailableSince.
  • Perform semantic search across product names and descriptions.

Troubleshooting

Incomplete search results

  • Verify that unstructured columns were selected and indexed correctly.
  • Check that structured filters in queries are not too restrictive.

Filter not working

  • Verify that the column was marked as structured and with the correct data type.
  • Check that the Library schema and column names match exactly.

Ingestion errors

  • Check that all column names match your source Library exactly.
  • Verify that the data types of columns align with your intended schema.

Could this page be better? Report a problem or suggest an addition!