# Preparers auto-processing example
This example uses preparer auto-processing with the ChunkText operation in AI Accelerator.
**Tip**

This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the unnesting concept for more detail.
## Preparer with table data source
```sql
-- Create the source test table
CREATE TABLE source_table__1628 (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    content TEXT NOT NULL
);

-- Create a preparer that chunks the content column into the destination table
SELECT aidb.create_table_preparer(
    name => 'preparer__1628',
    operation => 'ChunkText',
    source_table => 'source_table__1628',
    source_data_column => 'content',
    destination_table => 'chunked_data__1628',
    destination_data_column => 'chunk',
    source_key_column => 'id',
    destination_key_column => 'id',
    options => '{"desired_length": 150}'::JSONB -- Configuration for the ChunkText operation
);

-- Enable live auto-processing so changes to the source table are processed automatically
SELECT aidb.set_auto_preparer('preparer__1628', 'Live');

INSERT INTO source_table__1628 VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters. This enables processing or storage of data in manageable parts.');
```
```sql
SELECT * FROM chunked_data__1628;
```
Output
```
 id | part_id | unique_id |                                                                        chunk
----+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------
  1 |       0 | 1.part.0  | This is a significantly longer text example that might require splitting into smaller chunks.
  1 |       1 | 1.part.1  | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters.
  1 |       2 | 1.part.2  | This enables processing or storage of data in manageable parts.
(3 rows)
```
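Each chunk is cut near the configured `desired_length` of 150 characters. As a quick sanity check, the following sketch uses only standard Postgres (the `length()` call counts characters in each chunk):

```sql
-- Inspect each chunk's length relative to the configured desired_length (150)
SELECT id, part_id, length(chunk) AS chunk_length
FROM chunked_data__1628
ORDER BY id, part_id;
```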
```sql
INSERT INTO source_table__1628 VALUES (2, 'This sentence is short enough for desired_length=150 so it won''t be chunked. But this second sentence will be in another chunk since both together are over 150 characters.');
```
```sql
SELECT * FROM chunked_data__1628;
```
Output
```
 id | part_id | unique_id |                                                                        chunk
----+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------
  1 |       0 | 1.part.0  | This is a significantly longer text example that might require splitting into smaller chunks.
  1 |       1 | 1.part.1  | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters.
  1 |       2 | 1.part.2  | This enables processing or storage of data in manageable parts.
  2 |       0 | 2.part.0  | This sentence is short enough for desired_length=150 so it won't be chunked.
  2 |       1 | 2.part.1  | But this second sentence will be in another chunk since both together are over 150 characters.
(5 rows)
```
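Because `part_id` records each chunk's position within its source row, the destination rows can be recombined per source row. The following is a minimal sketch using standard Postgres `string_agg`; it's an approximation, since rejoining with single spaces assumes ChunkText split on whitespace boundaries:

```sql
-- Reassemble each source row's text from its chunks, in part order
SELECT id, string_agg(chunk, ' ' ORDER BY part_id) AS reassembled
FROM chunked_data__1628
GROUP BY id
ORDER BY id;
```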
```sql
DELETE FROM source_table__1628 WHERE id = 1;
```
```sql
SELECT * FROM chunked_data__1628;
```
Output
```
 id | part_id | unique_id |                                              chunk
----+---------+-----------+-------------------------------------------------------------------------------------------------
  2 |       0 | 2.part.0  | This sentence is short enough for desired_length=150 so it won't be chunked.
  2 |       1 | 2.part.1  | But this second sentence will be in another chunk since both together are over 150 characters.
(2 rows)
```
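The delete propagated automatically: all three chunks for source row 1 are gone. To confirm the destination still tracks the source, a quick check using only standard Postgres and the tables from this example:

```sql
-- Each remaining source row should still have its chunks in the destination
SELECT s.id, count(c.part_id) AS chunk_count
FROM source_table__1628 s
LEFT JOIN chunked_data__1628 c ON c.id = s.id
GROUP BY s.id
ORDER BY s.id;
```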
```sql
SELECT aidb.set_auto_preparer('preparer__1628', 'Disabled');
```
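Setting the mode to Disabled stops automatic processing of subsequent source changes. To resume later, set the mode back to Live with the same function:

```sql
-- Resume live auto-processing for this preparer
SELECT aidb.set_auto_preparer('preparer__1628', 'Live');
```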