Preparers auto-processing example

This example uses preparer auto-processing with the ChunkText operation in AI Accelerator.

Tip

This operation transforms the shape of the data, automatically unnesting collections by introducing a part_id column. See the unnesting concept for more detail.

Preparer with table data source

-- Create source test table
CREATE TABLE source_table__1628
(
    id      INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    content TEXT NOT NULL
);

SELECT aidb.create_table_preparer(
    name => 'preparer__1628',
    operation => 'ChunkText',
    source_table => 'source_table__1628',
    source_data_column => 'content',
    destination_table => 'chunked_data__1628',
    destination_data_column => 'chunk',
    source_key_column => 'id',
    destination_key_column => 'id',
    options => '{"desired_length": 150}'::JSONB  -- Configuration for the ChunkText operation
);

SELECT aidb.set_auto_preparer('preparer__1628', 'Live');

INSERT INTO source_table__1628
VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.');
SELECT * FROM chunked_data__1628;
Output
 id | part_id | unique_id |                                                                       chunk
----+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  |       0 | 1.part.0  | This is a significantly longer text example that might require splitting into smaller chunks.
 1  |       1 | 1.part.1  | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters.
 1  |       2 | 1.part.2  | This enables processing or storage of data in manageable parts.
(3 rows)
INSERT INTO source_table__1628
VALUES (2, 'This sentence is short enough for desired_length=150 so it wont be chunked. But this second sentence will be in another chunk since both together are over 150 characters.');
SELECT * FROM chunked_data__1628;
Output
 id | part_id | unique_id |                                                                       chunk
----+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  |       0 | 1.part.0  | This is a significantly longer text example that might require splitting into smaller chunks.
 1  |       1 | 1.part.1  | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters.
 1  |       2 | 1.part.2  | This enables processing or storage of data in manageable parts.
 2  |       0 | 2.part.0  | This sentence is short enough for desired_length=150 so it wont be chunked.
 2  |       1 | 2.part.1  | But this second sentence will be in another chunk since both together are over 150 characters.
(5 rows)
DELETE FROM source_table__1628 WHERE id = 1;
SELECT * FROM chunked_data__1628;
Output
 id | part_id | unique_id |                 chunk
----+---------+-----------+----------------------------------------
 2  |       0 | 2.part.0  | This sentence is short enough for desired_length=150 so it wont be chunked.
 2  |       1 | 2.part.1  | But this second sentence will be in another chunk since both together are over 150 characters.
(2 rows)
SELECT aidb.set_auto_preparer('preparer__1628', 'Disabled');

Could this page be better? Report a problem or suggest an addition!