GPU Recommendations for Default NIM Models v1.4.0 (LTS)

Overview

Within Hybrid Manager, there are two primary consumers of AI models:

  • PG.AI Knowledge Base (AIDB Postgres extension) for creating and maintaining AI Knowledge Bases.
  • Langflow for building AI pipelines and agentic flows.

Default NIM Models

Model type        NIM model                                     NVIDIA NIM documented resource requirements
Text completion   nvidia/llama-3.3-nemotron-super-49b-v1        4 × L40S
Text embeddings   nvidia/llama-3.2-nemoretriever-300m-embed-v1  1 × L40S
Image embeddings  nvclip                                        1 × L40S
OCR               paddleocr                                     1 × L40S
Text reranking    llama-3.2-nv-rerankqa-1b-v2                   1 × L40S

Minimum GPU Requirement

Running all of the default models above concurrently requires a minimum of 8 × L40S GPUs (4 + 1 + 1 + 1 + 1, one entry per model).
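
The minimum can be reproduced by summing the per-model GPU counts from the table above. The following is a minimal sketch, not part of Hybrid Manager: the names DEFAULT_NIM_GPUS and total_gpus are hypothetical, and the sketch assumes each model runs on dedicated GPUs with no sharing between models.

```python
# GPU counts per default NIM model, copied from the table above.
# DEFAULT_NIM_GPUS and total_gpus are illustrative names only.
DEFAULT_NIM_GPUS = {
    "nvidia/llama-3.3-nemotron-super-49b-v1": 4,
    "nvidia/llama-3.2-nemoretriever-300m-embed-v1": 1,
    "nvclip": 1,
    "paddleocr": 1,
    "llama-3.2-nv-rerankqa-1b-v2": 1,
}


def total_gpus(requirements):
    """Sum the GPU counts, assuming no GPU is shared between models."""
    return sum(requirements.values())


if __name__ == "__main__":
    print(total_gpus(DEFAULT_NIM_GPUS))  # prints 8
```

If only a subset of the default models is deployed, passing a smaller dictionary gives the corresponding lower bound.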

Cloud Mappings

  • AWS EKS: recommend a node group with 2 × g6e.12xlarge nodes.
  • GCP GKE: recommend a node pool with 2 × a2-highgpu-4g nodes.

Note: GCP does not offer L40S GPUs. The recommended A2 nodes provide A100 GPUs instead, which are supported and documented for the NIM models listed above.
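
The node counts in the cloud mappings follow from dividing the required GPU count by the GPUs available per node and rounding up. The sketch below illustrates that arithmetic. It assumes 4 GPUs per node for both instance types, based on the cloud providers' published instance specifications, which should be verified before sizing a cluster. GPUS_PER_NODE and nodes_needed are hypothetical names.

```python
import math

# GPUs per node for the recommended instance types.
# Assumption: values taken from the providers' published specs; verify before use.
GPUS_PER_NODE = {
    "g6e.12xlarge": 4,   # AWS EKS, L40S GPUs
    "a2-highgpu-4g": 4,  # GCP GKE, A100 GPUs
}


def nodes_needed(required_gpus, instance_type):
    """Return the smallest node count that provides at least required_gpus."""
    return math.ceil(required_gpus / GPUS_PER_NODE[instance_type])


if __name__ == "__main__":
    for instance_type in GPUS_PER_NODE:
        # 8 GPUs are required to run all default models concurrently.
        print(instance_type, nodes_needed(8, instance_type))
```

Rounding up matters when the requirement is not a multiple of the per-node GPU count: under the same assumption, 9 GPUs would need 3 nodes.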