Configure a Custom data source

Who is this for

Platform users who need to ingest data from proprietary systems, internal databases, or APIs not supported by built-in data source types. Typical users include developers, system integrators, and advanced data engineers who build custom pipelines and Structures.

What you will accomplish

You will configure a Custom data source in Gen AI Builder by selecting a pre-configured PG.AI Structure that acts as a data provider. The output of this Structure will be ingested and made available for indexing into Libraries and Knowledge Bases.

Why use a Custom data source

  • Internal data often resides in proprietary databases, APIs, or specialized data stores.
  • The Custom data source allows you to connect these systems by writing a PG.AI Structure that performs the data retrieval.
  • This approach offers maximum flexibility while ensuring that the resulting content still flows through Gen AI Builder’s ingestion and transformation pipeline.

For background on how this content powers downstream AI use cases, see:

For concepts related to Structures, see:

Complexity and time to complete

  • Complexity: High. You must write or have access to a PG.AI Structure that correctly implements data retrieval logic.
  • Estimated time: 15–60 minutes depending on Structure complexity and target system integration.

Key considerations

Structure design

  • The selected PG.AI Structure must be implemented and tested to act as a data source (data provider Structure).
  • The Structure should return data in a format suitable for downstream processing (typically textual).
  • The Structure should handle retries, error conditions, and paging if applicable.
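
The retry and paging guidance above can be sketched in plain Python. This is a minimal illustration, not the PG.AI Structure API: the `fetch_page` callable, the page shape (`items`, `next_cursor`), and the choice of `ConnectionError` as the retryable error are all assumptions made for the example. How the function is wired into a Structure is not shown.

```python
import time
from typing import Callable, Iterator, Optional

# Hypothetical page shape (an assumption for this sketch):
# {"items": [...], "next_cursor": "<token>" or None}
Page = dict


def fetch_all(
    fetch_page: Callable[[Optional[str]], Page],
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield every item as text, following cursors and retrying failed pages."""
    cursor: Optional[str] = None
    while True:
        # Retry the current page with exponential backoff.
        for attempt in range(1, max_retries + 1):
            try:
                page = fetch_page(cursor)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up so the failure is visible to the caller
                sleep(backoff_seconds * 2 ** (attempt - 1))
        # Emit textual output, which downstream processing typically expects.
        for item in page["items"]:
            yield str(item)
        cursor = page.get("next_cursor")
        if cursor is None:
            break  # no more pages
```

Passing `sleep` as a parameter keeps the backoff testable without real delays. Re-raising after the last attempt is a deliberate choice in this sketch: a failed fetch then stops the run with an error instead of silently returning incomplete data.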

Transform logic

  • You can optionally apply a separate "Transform your data" Structure to further refine, clean, or enrich the content before indexing.

Security and compliance

  • If the Structure connects to sensitive systems, ensure appropriate access controls, authentication, and logging are implemented.
  • Be mindful of privacy regulations and data handling requirements.
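
One common way to apply the authentication and logging points above is to keep credentials out of the code and out of the logs. The sketch below assumes the token is supplied through an environment variable; the variable name `PRODUCT_API_TOKEN`, the bearer-token scheme, and the helper names are hypothetical. How secrets are actually provided to a Structure in your environment may differ.

```python
import logging
import os

logger = logging.getLogger("custom_data_source")


def load_token(env_var: str = "PRODUCT_API_TOKEN") -> str:
    """Read the API token from the environment (hypothetical variable name)."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Missing required environment variable {env_var}")
    return token


def redact(secret: str, visible: int = 4) -> str:
    """Mask all but the last few characters so logs never contain the secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


def auth_headers(token: str) -> dict:
    """Build request headers and log only a redacted form of the token."""
    logger.info("Using API token %s", redact(token))
    return {"Authorization": f"Bearer {token}"}
```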

How to configure a Custom data source

  1. In the interface where you manage data sources, select + Add New Data Source.
  2. Choose Custom Data Source from the available data source types.
  3. Configure the following fields:
     • Name: Provide a clear and unique name. Example: Legacy CRM Connector.
     • Description (optional): Add descriptive context for future reference.
  4. Connect to Custom Data Source:
     • PG.AI Structure: Select the pre-configured PG.AI Structure that implements your data retrieval logic. Example: InternalProductAPIReader
  5. (Optional) Configure advanced options:
     • Scheduled refresh: Enable this option to automatically refresh the data on a schedule.
       • Provide a cron expression. Example: 0 2 * * * (daily at 2 AM).
     • Transform your data: Enable this option to apply an additional PG.AI Structure to transform the data during ingestion.
       • Select an existing Structure from the list.
  6. Select Create to add the Custom data source.
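
The cron expression in the scheduled refresh step uses the standard five fields: minute, hour, day of month, month, and day of week. The sketch below checks an expression before you paste it into the form. It is deliberately limited to `*` and single integers. Whether Gen AI Builder accepts ranges, lists, or step values is not stated here, so the sketch does not cover them.

```python
# Standard five-field cron layout: (field name, minimum, maximum).
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),  # 0 and 7 both mean Sunday in standard crontab
]


def parse_cron(expr: str) -> dict:
    """Parse a simple cron expression; None means 'any value' (the * wildcard)."""
    parts = expr.split()
    if len(parts) != 5:
        raise ValueError(f"Expected 5 fields, got {len(parts)}: {expr!r}")
    result = {}
    for (name, low, high), part in zip(FIELDS, parts):
        if part == "*":
            result[name] = None
            continue
        if not part.isdigit() or not low <= int(part) <= high:
            raise ValueError(f"Invalid {name} field: {part!r}")
        result[name] = int(part)
    return result
```

For the example above, `parse_cron("0 2 * * *")` returns minute 0 and hour 2 with every other field set to any value, which matches "daily at 2 AM".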

Supported content

  • Determined by the output of the selected PG.AI Structure.
  • Typically textual data suitable for AI processing.

Managing and refreshing the data source

Once created, the Custom data source can be viewed and managed through the Data Sources interface.

Actions available:

  • Edit: Modify the data source configuration.
  • Refresh: Manually trigger a data ingestion job.
  • Delete: Remove the data source.

You can also review the data job history to monitor ingestion performance and troubleshoot issues.

Troubleshooting

Structure not listed

  • Verify that your PG.AI Structure is correctly registered and marked as a data source (data provider Structure).
  • Ensure the Structure is accessible in your current project or environment.

Fetching errors or incomplete data

  • Check the Structure’s implementation for errors, authentication issues, or unexpected API changes.
  • Review the data job history for detailed error messages.

Example scenario

You want to ingest product catalog data from an internal REST API into Gen AI Builder.

Example configuration:

  • Name: Product Catalog via Custom API
  • PG.AI Structure: InternalProductAPIReader
  • Scheduled refresh: 0 2 * * * (daily at 2 AM)
  • Transform your data: Apply a Structure that extracts and formats product descriptions.
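
The transform step in this scenario could look roughly like the following. This is only a sketch of the extraction and formatting logic in plain Python. The record fields (`name`, `sku`, `description`) are assumed for illustration and are not taken from a real API, and the Structure wrapper around this logic is omitted.

```python
import re


def format_product(record: dict) -> str:
    """Turn one product record (hypothetical shape) into a clean text passage."""
    name = record.get("name", "").strip()
    sku = record.get("sku", "").strip()
    # Collapse runs of whitespace and newlines into single spaces.
    description = re.sub(r"\s+", " ", record.get("description", "")).strip()
    header = f"{name} (SKU {sku})" if sku else name
    return f"{header}\n{description}"


def transform(records: list[dict]) -> list[str]:
    """Format every record that has a non-empty description; skip the rest."""
    return [format_product(r) for r in records if r.get("description", "").strip()]
```

Skipping records without a description keeps empty entries out of the index. Whether that is the right behavior depends on how the catalog is used downstream.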
