Working with AI data stored in S3-compatible object storage
The following examples demonstrate how to use the aidb functions with S3-compatible object storage. You can run these examples as is, because they use a publicly accessible example S3 bucket. Alternatively, you can prepare your own S3-compatible object storage bucket with some test data and try the steps in this section with that data.
These examples use image data and a matching image encoder model rather than text data. You can also use plain text data on object storage, similar to the examples in Working with AI data in Postgres.
Creating a retriever
Start by creating a retriever that uses images stored on S3-compatible object storage as the source, using the aidb.create_s3_retriever function.
- The retriever_name is used to identify and reference the retriever; set it to image_embeddings for this example.
- The schema_name is the schema where the source table is located.
- The model_name is the name of the embeddings encoder model for similarity data; set it to clip-vit-base-patch32 to use the open encoder model for image data from Hugging Face.
- The data_type is the type of data in the source table, which can be either img or text; set it to img.
- The bucket_name is the name of the S3 bucket where the data is stored; set this to torsten.
- The prefix is the prefix of the objects in the bucket; set this to an empty string because you want all the objects in that bucket.
- The endpoint_url is the URL of the S3 endpoint; set that to https://s3.us-south.cloud-object-storage.appdomain.cloud to access the public example bucket.
This gives the following SQL command:
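A minimal sketch of that command, assuming the function accepts its arguments positionally in the order the parameters are listed above. The schema name public is an assumption, because the parameter list gives no value for it; substitute the schema you actually use.

```sql
SELECT aidb.create_s3_retriever(
  'image_embeddings',       -- retriever_name
  'public',                 -- schema_name (assumed value, not given above)
  'clip-vit-base-patch32',  -- model_name
  'img',                    -- data_type: img or text
  'torsten',                -- bucket_name
  '',                       -- prefix: empty string selects all objects
  'https://s3.us-south.cloud-object-storage.appdomain.cloud'  -- endpoint_url
);
```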
Refreshing the retriever
Next, run the aidb.refresh_retriever function for the retriever you just created.
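As a sketch, assuming aidb.refresh_retriever takes the retriever name as its only argument:

```sql
-- Refresh the retriever created in the previous step
SELECT aidb.refresh_retriever('image_embeddings');
```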
Retrieving data
Finally, run the aidb.retrieve_via_s3 function with the required parameters to retrieve the top K most relevant (most similar) AI data items. Be aware that the object type is currently limited to image and text files. The aidb.retrieve_via_s3 function takes the following parameters:
- The retriever_name is used to identify and reference the retriever; set it to image_embeddings for this example.
- The topk is the number of most relevant data items to retrieve; set this to 1.
- The bucket is the name of the S3 bucket where the data is stored.
- The object is the name of the object in the bucket.
- The endpoint_url is the URL of the S3 endpoint.
With these parameters set, run the aidb.retrieve_via_s3 function to retrieve the single most similar item.
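A minimal sketch of such a call, assuming positional arguments in the order of the parameter list above and reusing the example bucket and endpoint from the retriever setup. The object name example.jpg is a hypothetical placeholder; replace it with the name of an object that actually exists in the bucket.

```sql
SELECT aidb.retrieve_via_s3(
  'image_embeddings',  -- retriever_name
  1,                   -- topk: return only the most similar item
  'torsten',           -- bucket (assumed: same bucket as the retriever source)
  'example.jpg',       -- object (hypothetical placeholder name)
  'https://s3.us-south.cloud-object-storage.appdomain.cloud'  -- endpoint_url
);
```

Raising topk returns more items; the value 1 here matches the setting described in the parameter list.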