Image Generation

Overview

Image Generation Drivers build and execute API calls to image generation models.

Provide a Driver to a Tool for use by an Agent.
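A minimal sketch of this wiring, assuming the OpenAI Driver described below and a PromptImageGenerationTool that accepts the Driver directly (import paths may vary by griptape version):

from griptape.drivers.image_generation.openai import OpenAiImageGenerationDriver
from griptape.structures import Agent
from griptape.tools import PromptImageGenerationTool

# Wrap the Driver in a Tool, then hand the Tool to an Agent.
agent = Agent(
    tools=[
        PromptImageGenerationTool(
            image_generation_driver=OpenAiImageGenerationDriver(model="dall-e-3"),
        ),
    ],
)

agent.run("Generate an image of a capybara sitting on a rock in the sun.")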

Image Generation Drivers

Gen AI Builder

The Gen AI Builder Image Generation Driver provides access to image generation models hosted by Gen AI Builder.

Currently, the only available model is dall-e-3.

import os
from io import BytesIO

from PIL import Image

from griptape.drivers.image_generation.griptape_cloud import GriptapeCloudImageGenerationDriver

driver = GriptapeCloudImageGenerationDriver(
    api_key=os.environ["GT_CLOUD_API_KEY"],
    model="dall-e-3",
)


image = driver.run_text_to_image(["A capybara sitting on a rock in the sun."])

Image.open(BytesIO(image.value)).show()

Amazon Bedrock

The Amazon Bedrock Image Generation Driver provides multi-model access to image generation models hosted by Amazon Bedrock. This Driver manages calls to the Bedrock API, while the specific Model Drivers below format the API requests and parse the responses.

Stable Diffusion

The Bedrock Stable Diffusion Model Driver provides support for Stable Diffusion models hosted by Amazon Bedrock. This Model Driver supports configurations specific to Stable Diffusion, like style presets, clip guidance presets, and sampler.

This Model Driver supports negative prompts. When provided, the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation.
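As a sketch, a text-to-image request through Bedrock-hosted Stable Diffusion might look like the following; the model ID is illustrative, and the import paths are assumed to follow the pattern used elsewhere on this page:

from griptape.drivers.image_generation.amazon_bedrock import AmazonBedrockImageGenerationDriver
from griptape.drivers.image_generation_model.bedrock_stable_diffusion import (
    BedrockStableDiffusionImageGenerationModelDriver,
)

# The Driver manages the Bedrock API call; the Model Driver formats the
# Stable Diffusion-specific request body, including the style preset.
driver = AmazonBedrockImageGenerationDriver(
    model="stability.stable-diffusion-xl-v1",  # illustrative Bedrock model ID
    image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(
        style_preset="photographic",
    ),
)

image = driver.run_text_to_image(
    ["A capybara sitting on a rock in the sun."],
    negative_prompts=["watermark", "blurry"],
)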

Titan

The Bedrock Titan Image Generator Model Driver provides support for Titan Image Generator models hosted by Amazon Bedrock. This Model Driver supports configurations specific to Titan Image Generator, like quality, seed, and cfg_scale.

This Model Driver supports negative prompts. When provided, the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation.
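A comparable sketch for Titan Image Generator, again with an illustrative model ID and with parameters placed on the Model Driver per the configurations listed above:

from griptape.drivers.image_generation.amazon_bedrock import AmazonBedrockImageGenerationDriver
from griptape.drivers.image_generation_model.bedrock_titan import BedrockTitanImageGenerationModelDriver

driver = AmazonBedrockImageGenerationDriver(
    model="amazon.titan-image-generator-v1",  # illustrative Bedrock model ID
    image_generation_model_driver=BedrockTitanImageGenerationModelDriver(
        quality="premium",
        cfg_scale=7,
        seed=42,  # fix the seed for reproducible generations
    ),
)

image = driver.run_text_to_image(
    ["A capybara sitting on a rock in the sun."],
    negative_prompts=["fog", "rain"],
)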

Azure OpenAI

The Azure OpenAI Image Generation Driver provides access to OpenAI models hosted by Azure. In addition to the configurations provided by the underlying OpenAI Driver, the Azure OpenAI Driver allows configuration of Azure-specific deployment values.
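A sketch of that Azure-specific configuration; the environment variable names are hypothetical, and the import path is assumed to follow the pattern used elsewhere on this page:

import os

from griptape.drivers.image_generation.azure_openai import AzureOpenAiImageGenerationDriver

driver = AzureOpenAiImageGenerationDriver(
    model="dall-e-3",
    azure_deployment=os.environ["AZURE_OPENAI_DALL_E_3_DEPLOYMENT_ID"],  # hypothetical env var
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
)

image = driver.run_text_to_image(["A capybara sitting on a rock in the sun."])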

Leonardo.Ai

The Leonardo Image Generation Driver enables image generation using models hosted by Leonardo.ai.

This Driver supports configurations like model selection, image size, specifying a generation seed, and generation steps. For details on supported configuration parameters, see Leonardo.Ai's image generation documentation.

This Driver supports negative prompts. When provided, the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation.
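For example, a sketch of a Leonardo.Ai generation with negative prompts; the model-ID environment variable is hypothetical, and the parameter names follow the configurations described above:

import os

from griptape.drivers.image_generation.leonardo import LeonardoImageGenerationDriver

driver = LeonardoImageGenerationDriver(
    api_key=os.environ["LEONARDO_API_KEY"],
    model=os.environ["LEONARDO_MODEL_ID"],  # ID of a Leonardo.Ai-hosted model
    image_width=512,
    image_height=512,
    steps=30,
    seed=42,
)

image = driver.run_text_to_image(
    ["A capybara sitting on a rock in the sun."],
    negative_prompts=["blurry", "watermark"],
)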

OpenAI

The OpenAI Image Generation Driver provides access to OpenAI image generation models. Like other OpenAI Drivers, the Image Generation Driver will implicitly load an API key from the OPENAI_API_KEY environment variable if one is not explicitly provided.

This Driver supports image generation configurations like style presets, image quality preference, and image size. For details on supported configuration values, see the OpenAI documentation.
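A sketch using those configurations (the parameter values are illustrative):

from griptape.drivers.image_generation.openai import OpenAiImageGenerationDriver

# With no api_key argument, the Driver falls back to the OPENAI_API_KEY
# environment variable.
driver = OpenAiImageGenerationDriver(
    model="dall-e-3",
    quality="hd",
    style="natural",
    image_size="1024x1024",
)

image = driver.run_text_to_image(["A capybara sitting on a rock in the sun."])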

HuggingFace Pipelines

Info

This driver requires the drivers-image-generation-huggingface extra.

The HuggingFace Pipelines Image Generation Driver enables image generation through locally-hosted models using the HuggingFace Diffusers library. This Driver requires a Pipeline Driver to prepare the appropriate Pipeline.

This Driver requires a model configuration, specifying the model to use for image generation. The value of the model configuration must be one of the following:

  • A model name from the HuggingFace Model Hub, like stabilityai/stable-diffusion-3-medium-diffusers
  • A path to the directory containing a model on the filesystem, like ./models/stable-diffusion-3/
  • A path to a file containing a model on the filesystem, like ./models/sd3_medium_incl_clips.safetensors

The device configuration specifies the hardware device used to run inference. Common values include cuda (for CUDA-enabled GPUs), cpu (for inference on the device's CPU), and mps (for Apple silicon GPUs). For more information, see HuggingFace's documentation on GPU inference.

Stable Diffusion 3 Image Generation Pipeline Driver

Info

The Stable Diffusion 3 Image Generation Pipeline Driver requires the drivers-image-generation-huggingface extra.

The Stable Diffusion 3 Image Generation Pipeline Driver provides a StableDiffusion3Pipeline for text-to-image generations via the HuggingFace Pipelines Image Generation Driver's .try_text_to_image() method. This Driver accepts a text prompt and configurations including the Stable Diffusion 3 model, output image size, generation seed, and inference steps.

Image generation consumes substantial memory. On devices with limited VRAM, it may be necessary to enable the enable_model_cpu_offload or drop_t5_encoder configurations. For more information, see HuggingFace's documentation on reduced memory usage.

from griptape.artifacts import TextArtifact
from griptape.drivers.image_generation.huggingface_pipeline import HuggingFacePipelineImageGenerationDriver
from griptape.drivers.image_generation_pipeline.stable_diffusion_3 import StableDiffusion3ImageGenerationPipelineDriver
from griptape.structures import Pipeline
from griptape.tasks import PromptImageGenerationTask

image_generation_task = PromptImageGenerationTask(
    input=TextArtifact("landscape photograph, verdant, countryside, 8k"),
    image_generation_driver=HuggingFacePipelineImageGenerationDriver(
        model="stabilityai/stable-diffusion-3-medium-diffusers",
        device="cuda",
        pipeline_driver=StableDiffusion3ImageGenerationPipelineDriver(
            height=512,
            width=512,
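            # On low-VRAM devices, consider enable_model_cpu_offload=True
            # or drop_t5_encoder=True (see the memory note above).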
        ),
    ),
)

output_artifact = Pipeline(tasks=[image_generation_task]).run().output

Stable Diffusion 3 Img2Img Image Generation Pipeline Driver

Info

The Stable Diffusion 3 Img2Img Image Generation Pipeline Driver requires the drivers-image-generation-huggingface extra.

The Stable Diffusion 3 Img2Img Image Generation Pipeline Driver provides a StableDiffusion3Img2ImgPipeline for image-to-image generations. This Driver accepts a text prompt, an input image, and configurations including the Stable Diffusion 3 model, output image size, inference steps, generation seed, and the strength of generation applied over the input image.

from griptape.artifacts import TextArtifact
from griptape.drivers.image_generation.huggingface_pipeline import HuggingFacePipelineImageGenerationDriver
from griptape.drivers.image_generation_pipeline.stable_diffusion_3_img_2_img import (
    StableDiffusion3Img2ImgImageGenerationPipelineDriver,
)
from griptape.loaders import ImageLoader
from griptape.structures import Pipeline
from griptape.tasks import VariationImageGenerationTask

prompt_artifact = TextArtifact("landscape photograph, verdant, countryside, 8k")
input_image_artifact = ImageLoader().load("tests/resources/mountain.png")

image_variation_task = VariationImageGenerationTask(
    input=(prompt_artifact, input_image_artifact),
    image_generation_driver=HuggingFacePipelineImageGenerationDriver(
        model="stabilityai/stable-diffusion-3-medium-diffusers",
        device="cuda",
        pipeline_driver=StableDiffusion3Img2ImgImageGenerationPipelineDriver(
            height=1024,
            width=1024,
        ),
    ),
)

output_artifact = Pipeline(tasks=[image_variation_task]).run().output

Stable Diffusion 3 ControlNet Image Generation Pipeline Driver

Info

The Stable Diffusion 3 ControlNet Image Generation Pipeline Driver requires the drivers-image-generation-huggingface extra.

The Stable Diffusion 3 ControlNet Image Generation Pipeline Driver provides a StableDiffusion3ControlNetPipeline for image-to-image generations guided by a control image. This Driver accepts a text prompt, a control image, and configurations including the Stable Diffusion 3 model, the ControlNet model, output image size, generation seed, inference steps, and the degree to which the generation adheres to the control image.

from griptape.artifacts import TextArtifact
from griptape.drivers.image_generation.huggingface_pipeline import HuggingFacePipelineImageGenerationDriver
from griptape.drivers.image_generation_pipeline.stable_diffusion_3_controlnet import (
    StableDiffusion3ControlNetImageGenerationPipelineDriver,
)
from griptape.loaders import ImageLoader
from griptape.structures import Pipeline
from griptape.tasks import VariationImageGenerationTask

prompt_artifact = TextArtifact("landscape photograph, verdant, countryside, 8k")
control_image_artifact = ImageLoader().load("canny_control_image.png")

controlnet_task = VariationImageGenerationTask(
    input=(prompt_artifact, control_image_artifact),
    image_generation_driver=HuggingFacePipelineImageGenerationDriver(
        model="stabilityai/stable-diffusion-3-medium-diffusers",
        device="cuda",
        pipeline_driver=StableDiffusion3ControlNetImageGenerationPipelineDriver(
            controlnet_model="InstantX/SD3-Controlnet-Canny",
            height=768,
            width=1024,
        ),
    ),
)

output_artifact = Pipeline(tasks=[controlnet_task]).run().output
