Pipelines knowledge bases usage

Creating a new knowledge base configuration

There are two types of knowledge bases: table and volume. Given the different nature of the data sources and the options required for each, there are different functions to create them.

Knowledge base for a table data source

The aidb.create_table_knowledge_base() function is used to create a knowledge base for a table data source. Below you can find a minimal example. See aidb.create_table_knowledge_base reference for a full list of options.

Example: Creating a knowledge base

SELECT aidb.create_table_knowledge_base(
               name => 'test_knowledge_base',
               model_name => 'bert',
               source_table => 'test_source_table',
               source_data_column => 'content',
               source_data_format => 'Text'
       );

If you're using only Postgres tables, skip to Creating the embeddings.

If you're using external data sources, you need to create a volume and create a knowledge base for it, which is explained next.

Knowledge base for a volume data source

Creating a new volume

Before you can create a knowledge base for a volume, you need to create a volume and a PGFS storage location. See volumes for more detail.

Here, we create a PGFS storage location for a public S3 bucket:

SELECT pgfs.create_storage_location('aidb_text_bucket', 's3://aidb-rag-app',
               options => '{"region": "eu-central-1", "skip_signature": "true"}'
       );

The aidb.create_volume function is used to create a volume.

Example: Creating a volume

SELECT aidb.create_volume('aidb_text', 'aidb_text_bucket', '/', 'Text');

The server_name comes from calling PGFS functions to create a storage location pgfs.create_storage_location. The path is the path to the data in the storage location. The last argument indicates the data format; Text or Image.

Creating a knowledge base for a volume

The aidb.create_volume_knowledge_base() function is used to create a knowledge base for a volume data source. Below you can find a minimal example. See aidb.create_volume_knowledge_base reference for a full list of options.

Example: Creating a knowledge base for a volume

SELECT aidb.create_volume_knowledge_base(
               name => 'test_knowledge_base_volume',
               model_name => 'bert',
               source_volume_name => 'aidb_text'
       );

Creating the embeddings

You can use bulk embedding to manually run embedding computation:

SELECT aidb.bulk_embedding('test_knowledge_base');

and for the volume example:

SELECT aidb.bulk_embedding('test_knowledge_base_volume');

Enable live auto-processing (automated embeddings) for any future changes:

SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Live');

You can disable auto-processing as well (It is 'Disabled' by default when creating new knowledge bases):

SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Disabled');

Retrieving

A basic key knowledge base, aidb.retrieve_key is available. It doesn't look up the source data but returns the ID/key of the matching embeddings.

Retrieving the key

aidb.retrieve_key(<knowledge base name>, <query string>, <optional number of results>);

Example: Retrieving the key

SELECT * FROM aidb.retrieve_key('test_knowledge_base', 'shoes', 2);
Output
key  |      distance
-------+--------------------
 43941 | 0.2938963414490189
 19337 | 0.3023805122617119
(2 rows)

You can use this command if you want to do a join/lookup yourself based on the key. For knowledge bases with external (volume) data sources, this capability is especially useful. Usually the application itself wants to do the retrieval from the external data source. Or you might want to push down the actual retrieval to a client application.

Retrieving the text

The retrieve_text function joins the embeddings with the source data and directly returns the results:

aidb.retrieve_text(<knowledge base name>, <query string>, <optional number of results>);

Example

SELECT * FROM aidb.retrieve_text('test_knowledge_base', 'jacket', 2);
Output
key  |                       value                        |      distance
-------+----------------------------------------------------+--------------------
 19337 | United Colors of Benetton Men Stripes Black Jacket | 0.2994317672742334
 55018 | Lakme 3 in 1 Orchid  Aqua Shine Lip Color          | 0.3804609668507203
(2 rows)

Listing the knowledge bases

A view is available that lists all the knowledge_bases. aidb.knowledge_bases also includes some of the knowledge bases configuration:

SELECT * FROM aidb.knowledge_bases;
Output
 id |        name         |     vector_table_name      | vector_table_key_column | vector_table_vector_column |  model_name  | topk | distance_operator | options | source_table_name | source_table_data_column | source_table_data_column_type | source_table_key_column | source_volume_name
----+---------------------+----------------------------+-------------------------+----------------------------+--------------+------+-------------------+---------+-------------------+--------------------------+-------------------------------+-------------------------+--------------------
  2 | test_knowledge_base      | test_knowledge_base_vector      | id                      | embeddings                 | simple_model |    5 | InnerProduct      | {}      | test_source_table | content                  | Text                          | id                      |
  5 | test_knowledge_base_cosa | test_knowledge_base_cosa_vector | id                      | embeddings                 | simple_model |    1 | L2                | {}      | test_source_table | content                  | Text                          | id                      |
  3 | test_knowledge_base_cos  | test_knowledge_base_cos_vector  | id                      | embeddings                 | simple_model |    5 | Cosine            | {}      | test_source_table | content                  | Text                          | id                      |
(3 rows)

We recommend that you select only the columns you're interested in:

SELECT name, source_table_name FROM aidb.knowledge_bases;
Output
        name         | source_table_name
---------------------+-------------------
 test_knowledge_base      | test_source_table
 test_knowledge_base_cos  | test_source_table
 test_knowledge_base_cosa | test_source_table
(3 rows)

Deleting a knowledge base

This command doesn't delete the vector table or anything else. It only deletes the configuration:

SELECT aidb.delete_knowledge_base(<name>);

End-to-end example

This example is an end-to-end example for a table/text knowledge base.


Could this page be better? Report a problem or suggest an addition!