Pipelines knowledge bases reference

This reference documentation for Pipelines knowledge bases includes information on the functions and views available in the aidb extension related to knowledge bases.

Views

aidb.knowledge_bases

Also referenceable as aidb.kbs, the aidb.knowledge_bases view provides a list of all knowledge bases in the database. It includes information about the knowledge base name, the model used, and the source data type.

Column | Type | Description
id | integer |
name | text | Name of the knowledge base.
vector_schema | text | Schema for vector_table.
vector_table | text | Name of the table where the embeddings are stored. Gets newly created if it doesn't exist. Managed by aidb.
vector_key_column | text | Column to use to store the key that references the key in source data when computing embeddings. We recommend using the default and letting aidb manage this table.
vector_data_column | text | Column to store embeddings in. We recommend using the default and letting aidb manage this table.
model_name | text | Name of the model to use for embedding computation and retrievals.
topk | integer | Default number of results to return during a retrieve. Similar to LIMIT in SQL.
distance_operator | aidb.DistanceOperator | During retrieval, the vector operation to use to compare the vectors.
options | jsonb | Unused.
auto_processing | aidb.PipelineAutoProcessingMode | Auto-processing mode.
owner_role | text | The Postgres role who created this pipeline. Background auto-processing will run the pipeline as this role.
batch_size | integer | How many records to process concurrently when running this pipeline.
background_sync_interval | interval | Used for background auto-processing. This is the interval between pipeline executions.
source_type | text | Indicates whether this pipeline uses a table or volume as data source.
source_schema | text | Schema for source_table.
source_table | text | Name of the table used as input for the pipeline. Unused if the knowledge base uses a volume as source.
source_data_column | text | Column name in the source table that Pipelines computes embeddings for. This is also the column that's returned in retrieve operations.
source_data_format | aidb.PipelineDataFormat | Format of the data the knowledge base is working with. Uses type aidb.PipelineDataFormat.
source_key_column | text | Column to use as key for storing the embedding in the vector table. This provides a reference from the embedding to the source data.
source_volume_name | text | Name of the volume to use as a data source. Only applicable to knowledge bases configured with aidb.create_volume_knowledge_base().
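
As an illustration of how these columns can be read together, the following minimal sketch lists each knowledge base with its model, source, and processing settings. It uses only the view and column names listed above and assumes the aidb extension is installed in the current database.

```sql
-- List every knowledge base with its model, source, and auto-processing settings.
SELECT name,
       model_name,
       source_type,
       source_schema,
       source_table,
       source_volume_name,
       auto_processing,
       batch_size
FROM aidb.knowledge_bases
ORDER BY name;

-- aidb.kbs refers to the same view, so a shorter query works as well.
SELECT name, topk, distance_operator
FROM aidb.kbs;
```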

aidb.knowledge_base_stats

Also referenceable as aidb.kbstat, the aidb.knowledge_base_stats view provides current statistics about auto-processing for knowledge base pipelines.

Column | Type | Description
knowledge base | text | Name of the knowledge base.
auto processing | aidb.PipelineAutoProcessingMode | Auto-processing mode.
table: unprocessed rows | bigint | For table knowledge bases: The number of unprocessed rows that are new or might have changed since the last execution.
volume: scans completed | bigint | For volume knowledge bases: The number of full listings of the source volume that were performed so far.
count(source records) | bigint | The number of records in the source of the knowledge base. For volume knowledge bases, this is the number of records seen during the last full scan.
count(embeddings) | bigint | The number of embeddings stored in the destination table.
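
The statistics view can be queried like any other view. The sketch below assumes the column names are exactly as listed above; because they contain spaces and parentheses, they have to be double-quoted when referenced individually. The knowledge base name is the one used in the examples later on this page.

```sql
-- Show auto-processing statistics for all knowledge bases.
SELECT * FROM aidb.knowledge_base_stats;

-- Column names with spaces or parentheses need double quotes.
SELECT "knowledge base",
       "count(source records)",
       "count(embeddings)"
FROM aidb.kbstat
WHERE "knowledge base" = 'test_knowledge_base';
```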

Types

aidb.DistanceOperator

The aidb.DistanceOperator type is an enum that represents the distance operators that can be used during retrieval.

Value | Description
L2 | Euclidean distance
InnerProduct | Inner product
Cosine | Cosine similarity
L1 | L1 distance
Hamming | Hamming distance
Jaccard | Jaccard distance

SQL definition:

CREATE TYPE DistanceOperator AS ENUM (
  'L2',
  'InnerProduct',
  'Cosine',
  'L1',
  'Hamming',
  'Jaccard'
);
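
To check which labels a given installation accepts, the enum can be inspected with standard Postgres facilities. This is a small sketch that assumes the type is reachable as aidb.DistanceOperator, as named in this section.

```sql
-- List all labels of the enum in declaration order.
SELECT enum_range(NULL::aidb.DistanceOperator);

-- Cast a text literal to the enum. A label that is not part of the
-- definition above (for example 'Inner') raises an error.
SELECT 'InnerProduct'::aidb.DistanceOperator;
```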

aidb.PipelineDataFormat

The aidb.PipelineDataFormat type is an enum that represents the data formats that can be used as source data.

Value | Description
Text | Text data
Image | Image data
Pdf | PDF data

SQL definition:

CREATE TYPE PipelineDataFormat AS ENUM (
  'Text',
  'Image',
  'Pdf'
);

aidb.PipelineAutoProcessingMode

The aidb.PipelineAutoProcessingMode type is an enum that defines how auto-processing should behave for a pipeline (for example, a knowledge base or a preparer).

Value | Description
Live | New data is processed immediately while being added (using Postgres triggers)
Background | Continuous processing in the background (using Postgres background workers)
Disabled | No automated processing

SQL definition:

CREATE TYPE PipelineAutoProcessingMode AS ENUM (
  'Live',
  'Background',
  'Disabled'
);

Functions

aidb.create_table_knowledge_base

Creates a knowledge base for a given table.

Parameters

Parameter | Type | Default | Description
name | TEXT | Required | Name of the knowledge base.
model_name | TEXT | Required | Name of the model to use.
source_table | regclass | Required | Name of the table to use as source.
source_data_column | TEXT | Required | Column name in source table to use.
source_data_format | aidb.PipelineDataFormat | Required | Format of data in that column ("Text", "Image", "Pdf").
source_key_column | TEXT | 'id' | Column to use as key to reference the rows.
vector_table | TEXT | NULL |
vector_data_column | TEXT | 'embeddings' |
vector_key_column | TEXT | 'id' |
topk | INTEGER | 1 |
distance_operator | aidb.distanceoperator | 'L2' |
options | JSONB | '{}'::JSONB | Unused
auto_processing | aidb.PipelineAutoProcessingMode | 'Disabled' | Configure auto-processing for this pipeline.
index_type | TEXT | 'vector' | Type of index to use for the vector table.
batch_size | int | 100 | How many records to process concurrently.
background_sync_interval | interval | '30 seconds' | Interval between pipeline executions if background auto-processing is configured.

Index types

  • If index_type is set to vector, the system automatically creates an HNSW index on the vector table based on the distance operator used in the knowledge base. This is the default index type. The vector index type supports at most 2000 dimensions. If you need more dimensions, set index_type to disabled.

    distance_operator | index_type
    L2 | vector_l2_ops
    InnerProduct | vector_ip_ops
    Cosine | vector_cosine_ops
    L1 | vector_l1_ops

  • If index_type is set to ivfflat, the system creates an IVFFlat index on the vector table.

  • If index_type is set to disabled, no index is created.

Example

SELECT aidb.create_table_knowledge_base(
               name => 'test_knowledge_base',
               model_name => 'bert',
               source_table => 'test_source_table',
               source_data_column => 'content',
               source_data_format => 'Text'
       );
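
The optional parameters can be combined in the same call. The following sketch uses the parameter names from the table above to create a knowledge base that compares vectors by cosine distance, returns five results by default, and processes new rows as they arrive. The knowledge base name is hypothetical, and the model, table, and column are assumed to exist as in the previous example.

```sql
-- Hypothetical variant of the example above with non-default settings.
SELECT aidb.create_table_knowledge_base(
               name => 'test_knowledge_base_cosine',
               model_name => 'bert',
               source_table => 'test_source_table',
               source_data_column => 'content',
               source_data_format => 'Text',
               source_key_column => 'id',
               topk => 5,
               distance_operator => 'Cosine',
               auto_processing => 'Live',
               index_type => 'vector'
       );
```

With index_type set to vector and distance_operator set to Cosine, the mapping above indicates that the index would use vector_cosine_ops.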

aidb.create_volume_knowledge_base

Creates a knowledge base for a given PGFS volume.

Parameters

Parameter | Type | Default | Description
name | TEXT | Required | Name of the knowledge base
model_name | TEXT | Required | Name of the model
source_volume_name | TEXT | Required | Name of the volume
vector_table | TEXT | NULL | Name of the vector table
vector_data_column | TEXT | 'embeddings' | Name of the vector column
vector_key_column | TEXT | 'id' | Name of the key column
topk | INTEGER | 1 | Number of results to return
distance_operator | aidb.distanceoperator | 'L2' | Distance operator
options | JSONB | '{}'::JSONB | Options
auto_processing | aidb.PipelineAutoProcessingMode | 'Disabled' | Configure auto-processing for this pipeline.
index_type | TEXT | 'vector' | Type of index to use for the vector table.
batch_size | int | 100 | How many records to process concurrently.
background_sync_interval | interval | '30 seconds' | Interval between pipeline executions if background auto-processing is configured.

Index types

  • If index_type is set to vector, the system automatically creates an HNSW index on the vector table based on the distance operator used in the knowledge base. This is the default index type. The vector index type supports at most 2000 dimensions. If you need more dimensions, set index_type to disabled.

    distance_operator | index_type
    L2 | vector_l2_ops
    InnerProduct | vector_ip_ops
    Cosine | vector_cosine_ops
    L1 | vector_l1_ops

  • If index_type is set to ivfflat, the system creates an IVFFlat index on the vector table.

  • If index_type is set to disabled, no index is created.

Example

SELECT aidb.create_volume_knowledge_base(
               name => 'demo_vol_knowledge_base',
               model_name => 'simple_model',
               source_volume_name => 'demo_bucket_vol'
       );
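
A volume has to exist before a knowledge base can use it. The sketch below strings the two steps together and enables background processing with the parameters from the table above. It assumes a PGFS storage location named demo_bucket and a model named simple_model already exist; the knowledge base name is hypothetical.

```sql
-- Create a volume from an existing PGFS storage location.
SELECT aidb.create_volume('demo_bucket_vol', 'demo_bucket', 'demo_bucket/demo_folder', 'Text');

-- Build a knowledge base on the volume and let a background worker keep it in sync.
SELECT aidb.create_volume_knowledge_base(
               name => 'demo_vol_knowledge_base_bg',
               model_name => 'simple_model',
               source_volume_name => 'demo_bucket_vol',
               auto_processing => 'Background',
               batch_size => 50,
               background_sync_interval => '1 minute'
       );
```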

aidb.set_auto_knowledge_base

Sets the auto-processing mode for this knowledge base.

Parameters

Parameter | Type | Default | Description
knowledge_base_name | TEXT |  | Name of knowledge base for which to enable auto-processing
mode | aidb.PipelineAutoProcessingMode |  | Desired auto-processing mode
batch_size | INTEGER | NULL | How many records to process in one batch in Disabled (manual) or Background processing
background_sync_interval | INTERVAL | NULL | Desired sync interval for background processing

Example

SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Live');
SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Disabled', batch_size => 100);
SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Background');
SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Background', batch_size => 100, background_sync_interval => '10 seconds');
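
One way to confirm that a mode change took effect is to read the settings back from the aidb.knowledge_bases view. This is a minimal sketch that assumes the knowledge base from the examples above exists and that the view reflects the change immediately.

```sql
-- Switch to background processing with explicit batch size and interval.
SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Background',
                                    batch_size => 50,
                                    background_sync_interval => '5 minutes');

-- Read the resulting configuration back from the view.
SELECT name, auto_processing, batch_size, background_sync_interval
FROM aidb.knowledge_bases
WHERE name = 'test_knowledge_base';
```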

aidb.bulk_embedding

Computes the embeddings for the source records in this pipeline and stores them in the vector destination table.

The behavior of this function depends on the configured auto_processing mode:

  • Live and Disabled auto-processing: Directly process all source records.
  • Background auto-processing: Mark all source records for processing but don't perform the operation. The background worker will perform it.

Parameters

Parameter | Type | Default | Description
knowledge_base_name | TEXT |  | Name of knowledge base for which to generate embeddings.
silent | BOOL | false | Disable printing status and progress logs.

Example

aidb=# SELECT aidb.bulk_embedding('kb_volume_image_manual');
Output
INFO:  kb_volume_image_manual: (re)setting state table to process all data...
INFO:  kb_volume_image_manual: Starting... Batch size 10, count(known source records): 0, scans completed: 0, count(embeddings): 0
INFO:  kb_volume_image_manual: Batch iteration finished, count(known source records): 10, scans completed: 0, count(embeddings): 10
[...]
INFO:  kb_volume_image_manual: Batch iteration finished, count(known source records): 177, scans completed: 0, count(embeddings): 177
INFO:  kb_volume_image_manual: finished, count(known source records): 177, scans completed: 1, count(embeddings): 177
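
When the progress messages aren't wanted, for example in a script, the silent parameter can be set and the result checked through the statistics view instead. The sketch below assumes the knowledge base from the example above exists.

```sql
-- Run (or, in Background mode, schedule) the embedding computation without progress logs.
SELECT aidb.bulk_embedding('kb_volume_image_manual', silent => true);

-- Compare source record and embedding counts afterward.
SELECT * FROM aidb.knowledge_base_stats;
```

In Background mode the counts may lag behind, because the call only marks records for processing and the background worker performs the work later.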

aidb.retrieve_key

Retrieves the keys of matching embeddings without looking up the source data.

Parameters

Parameter | Type | Default | Description
knowledge_base_name | TEXT |  | Name of knowledge base to use for retrieval
query_string | TEXT |  | Query string to use for retrieval
number_of_results | INTEGER | 0 | Number of results to return

Example

SELECT * FROM aidb.retrieve_key('test_knowledge_base', 'shoes', 2);
Output
  key  |      distance
-------+--------------------
 43941 | 0.2938963414490189
 19337 | 0.3023805122617119
(2 rows)
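
Because only keys and distances are returned, the caller decides which source columns to fetch. The following sketch joins the keys back to a source table manually. It assumes the knowledge base was built on test_source_table with id as the key column and content as the data column, as in the create example above; both sides of the join are cast to text because the key type isn't stated here.

```sql
-- Look up source rows for the retrieved keys (table and column names are assumptions).
SELECT s.id,
       s.content,
       r.distance
FROM aidb.retrieve_key('test_knowledge_base', 'shoes', 2) AS r
JOIN test_source_table AS s
  ON s.id::text = r.key::text
ORDER BY r.distance;
```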

aidb.retrieve_text

Retrieves the source text data from matching embeddings by joining the embeddings with the source table.

Parameters

Parameter | Type | Default | Description
knowledge_base_name | TEXT |  | Name of knowledge base to use for retrieval
query_string | TEXT |  | Query string to use for retrieval
number_of_results | INTEGER | 0 | Number of results to return

Returns

Column | Type | Description
key | text | Key of the retrieved data
value | text | Value of the retrieved data
distance | double precision | Distance of the retrieved data from the query

Example

SELECT * FROM aidb.retrieve_text('test_knowledge_base', 'jacket', 2);
Output
  key  |                       value                        |      distance
-------+----------------------------------------------------+--------------------
 19337 | United Colors of Benetton Men Stripes Black Jacket | 0.2994317672742334
 55018 | Lakme 3 in 1 Orchid  Aqua Shine Lip Color          | 0.3804609668507203
(2 rows)
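
Since the function returns a regular row set, the results can be filtered further in SQL. The sketch below requests more candidates and keeps only those under a distance threshold. The value 0.35 is an arbitrary illustration, not a recommended setting; of the two rows shown above, only the first falls below it.

```sql
-- Retrieve up to 10 candidates, then keep only the closer matches.
SELECT key, value, distance
FROM aidb.retrieve_text('test_knowledge_base', 'jacket', 10)
WHERE distance < 0.35
ORDER BY distance;
```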

aidb.delete_knowledge_base

Deletes only the knowledge base's configuration from the database.

Parameters

Parameter | Type | Default | Description
knowledge_base_name | TEXT |  | Name of knowledge base to delete

Example

select aidb.delete_knowledge_base('test_knowledge_base');
Output
 delete_knowledge_base
------------------

(1 row)
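
Since only the configuration is deleted, the table holding the embeddings presumably stays in place. If it should go as well, one possible approach is to note its name before deleting the knowledge base and drop it afterward. This is a sketch under that assumption; the table name in the DROP statement is a hypothetical placeholder.

```sql
-- Record where the embeddings are stored before removing the configuration.
SELECT vector_schema, vector_table
FROM aidb.knowledge_bases
WHERE name = 'test_knowledge_base';

SELECT aidb.delete_knowledge_base('test_knowledge_base');

-- Hypothetical table name: substitute the schema and table returned by the first query.
DROP TABLE IF EXISTS public.test_knowledge_base_vector;
```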

aidb.create_volume

Creates a volume from a PGFS storage location for use as a data source in knowledge bases.

Parameters

Parameter | Type | Default | Description
name | TEXT |  | Name of the volume to create
server_name | TEXT |  | Name of the storage location to use for the volume
path | TEXT |  | Path to the volume in the storage location
mime_type | TEXT |  | Type of data in the volume (Text or Image)

Example

select aidb.create_volume('demo_bucket_vol', 'demo_bucket', 'demo_bucket/demo_folder', 'Text');
Output
 create_volume
---------------
(1 row)

aidb.list_volumes

Lists all the volumes that were created in the database.

Example

select * from aidb.list_volumes();

aidb.delete_volume

Deletes a volume from the database.

Parameters

Parameter | Type | Default | Description
volume_name | TEXT |  | Name of the volume to delete

Example

select aidb.delete_volume('demo_bucket_vol');
Output
 delete_volume
-------------
(1 row)
