Limitations
Blocking operations
Preparer pipelines don't currently support asynchronous operations. All operations are blocking and run in the foreground. When you run bulk data preparation, the operation blocks until processing is complete for all items currently in the database or storage.
Observability
Preparer pipelines currently have limited observability. To monitor them, refer to the Postgres server logs and the output of the commands you run. There's no single pane of glass for monitoring the entire system.
Large documents
Pipelines don't currently handle chunking of large documents.
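Because pipelines don't chunk documents themselves, one possible workaround is to split large documents before they reach the pipeline's source, storing each chunk as its own row. The sketch below assumes a simple word-based splitter with overlap; `chunk_text`, `max_words`, and `overlap` are hypothetical names chosen for illustration and are not part of aidb or pgfs.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words. This is a hypothetical helper,
    not an aidb API; word counts only approximate a model's token limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    document = "one two three four five six seven eight nine ten"
    # Each (index, chunk) pair would become its own row in a hypothetical
    # source table that the pipeline reads.
    for i, chunk in enumerate(chunk_text(document, max_words=4, overlap=1)):
        print(i, chunk)
```

The overlap keeps text that straddles a chunk boundary represented in both neighboring chunks. Choose `max_words` conservatively, since the embedding model's real limit is usually expressed in tokens rather than words.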
Data filtering
Pipelines can limit which documents are embedded by using SQL filters and views based on the content of rows. However, they don't currently support filtering on the content of data in S3 storage; filtering there is limited to subpaths and prefixes.
Load balancing for models
There's currently no load balancing mechanism for model access.
Data formats
Knowledge base pipelines currently support only text and image formats. Other formats, including structured data, video, and audio, aren't supported.
Auto-processing
- Live auto-processing is supported only for text data. Embeddings for image data can be computed manually or by using background auto-processing.
- Background auto-processing is supported only by the knowledge base pipeline, not by the preparer pipeline.
Upgrading
The aidb and pgfs extensions don't currently support Postgres extension upgrades. To upgrade to a new version of either extension, you must drop and re-create it.
Refer to Upgrading Steps for more details and commands.