Image Generation
Overview
Image Generation Drivers build and execute API calls to image generation models.
Provide a Driver to a Tool for use by an Agent:
from griptape.drivers.image_generation.openai import OpenAiImageGenerationDriver
from griptape.structures import Agent
from griptape.tools import PromptImageGenerationTool

driver = OpenAiImageGenerationDriver(
    model="gpt-image-1",
)

agent = Agent(
    tools=[
        PromptImageGenerationTool(image_generation_driver=driver),
    ]
)

agent.run("Generate a watercolor painting of a dog riding a skateboard")
[02/27/25 20:23:29] INFO PromptTask f2efcbd59cb948af88545c67d36645e6
                    Input: Generate a watercolor painting of a dog riding a skateboard
[02/27/25 20:23:31] INFO Subtask f66c6d5fe0dc4e8187a095d13d5fee37
                    Actions: [
                      {
                        "tag": "call_P4IwFKQR80xHnpD0xbUDQ5oZ",
                        "name": "PromptImageGenerationTool",
                        "path": "generate_image",
                        "input": {
                          "values": {
                            "prompt": "A watercolor painting of a dog riding a skateboard, capturing the playful and dynamic motion of the scene, with vibrant colors and fluid brushstrokes typical of watercolor art.",
                            "negative_prompt": "realism, digital art, oil painting"
                          }
                        }
                      }
                    ]
[02/27/25 20:23:45] INFO Subtask f66c6d5fe0dc4e8187a095d13d5fee37
                    Response: Image, format: png, size: 3148011 bytes
[02/27/25 20:23:46] INFO PromptTask f2efcbd59cb948af88545c67d36645e6
                    Output: Here is the watercolor painting of a dog riding a skateboard. The image captures the playful and dynamic motion of the scene with vibrant colors and fluid brushstrokes typical of watercolor art.
Image Generation Drivers
Gen AI Builder
The Gen AI Builder Image Generation Driver provides access to image generation models hosted by Gen AI Builder.
Today, the only accessible model is dall-e-3.
import os
from io import BytesIO

from PIL import Image

from griptape.drivers.image_generation.griptape_cloud import GriptapeCloudImageGenerationDriver

driver = GriptapeCloudImageGenerationDriver(
    api_key=os.environ["GT_CLOUD_API_KEY"],
    model="dall-e-3",
)

image = driver.run_text_to_image(["A capybara sitting on a rock in the sun."])

# image.value holds the raw image bytes; open them with Pillow to display the result.
Image.open(BytesIO(image.value)).show()
Amazon Bedrock
The Amazon Bedrock Image Generation Driver provides multi-model access to image generation models hosted by Amazon Bedrock. This Driver manages API calls to the Bedrock API, while the specific Model Drivers below format the API requests and parse the responses.
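The split between a Driver that owns the API call and a Model Driver that owns the payload can be sketched as follows. This is a minimal illustration, not Griptape's implementation: the names `ModelDriver`, `EchoModelDriver`, `ServiceDriver`, and `fake_invoke` are hypothetical, and the network call is replaced by a local stand-in.

```python
import base64
import json
from dataclasses import dataclass
from typing import Any, Callable, Protocol


class ModelDriver(Protocol):
    """Hypothetical interface: formats requests and parses responses for one model family."""

    def build_request(self, prompt: str) -> dict[str, Any]: ...

    def parse_response(self, response: dict[str, Any]) -> bytes: ...


@dataclass
class EchoModelDriver:
    """Toy model driver used only for this sketch."""

    def build_request(self, prompt: str) -> dict[str, Any]:
        return {"prompt": prompt}

    def parse_response(self, response: dict[str, Any]) -> bytes:
        return base64.b64decode(response["image_b64"])


@dataclass
class ServiceDriver:
    """Owns the API call and delegates payload details to the model driver."""

    model: str
    model_driver: ModelDriver
    invoke: Callable[[str, str], str]  # (model id, JSON body) -> JSON response

    def text_to_image(self, prompt: str) -> bytes:
        body = json.dumps(self.model_driver.build_request(prompt))
        raw = self.invoke(self.model, body)
        return self.model_driver.parse_response(json.loads(raw))


def fake_invoke(model: str, body: str) -> str:
    """Stand-in for a network call: returns the prompt bytes as the 'image'."""
    prompt = json.loads(body)["prompt"]
    return json.dumps({"image_b64": base64.b64encode(prompt.encode()).decode()})


driver = ServiceDriver(model="example-model", model_driver=EchoModelDriver(), invoke=fake_invoke)
image_bytes = driver.text_to_image("a capybara")
```

With this shape, supporting another model family means swapping only the model driver; the code that performs the call stays unchanged.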
Stable Diffusion
The Bedrock Stable Diffusion Model Driver provides support for Stable Diffusion models hosted by Amazon Bedrock. This Model Driver supports configurations specific to Stable Diffusion, like style presets, clip guidance presets, and sampler.
This Model Driver supports negative prompts. When provided, the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation.
from griptape.drivers.image_generation.amazon_bedrock import AmazonBedrockImageGenerationDriver
from griptape.drivers.image_generation_model.bedrock_stable_diffusion import (
    BedrockStableDiffusionImageGenerationModelDriver,
)
from griptape.structures import Agent
from griptape.tools import PromptImageGenerationTool

model_driver = BedrockStableDiffusionImageGenerationModelDriver(
    style_preset="pixel-art",
)

driver = AmazonBedrockImageGenerationDriver(
    image_generation_model_driver=model_driver,
    model="stability.stable-diffusion-xl-v1",
)

agent = Agent(
    tools=[
        PromptImageGenerationTool(image_generation_driver=driver),
    ]
)

agent.run("Generate an image of a dog riding a skateboard")
[02/27/25 20:24:56] INFO PromptTask f8b0a42adcde40609ca82c4e8f004d01
                    Input: Generate an image of a dog riding a skateboard
[02/27/25 20:24:58] INFO Subtask 7bc320b3f5f94352bf6e532bbd2e55dd
                    Actions: [
                      {
                        "tag": "call_lD64RFogb1SibAoDmzlPuErb",
                        "name": "PromptImageGenerationTool",
                        "path": "generate_image",
                        "input": {
                          "values": {
                            "prompt": "A dog riding a skateboard, outdoor setting, dynamic motion, playful atmosphere",
                            "negative_prompt": "no humans, no other animals, no cityscape"
                          }
                        }
                      }
                    ]
[02/27/25 20:25:01] INFO Subtask 7bc320b3f5f94352bf6e532bbd2e55dd
                    Response: Image, format: png, size: 472868 bytes
[02/27/25 20:25:02] INFO PromptTask f8b0a42adcde40609ca82c4e8f004d01
                    Output: Here is the generated image of a dog riding a skateboard.
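To make "negatively-weighted prompts" concrete, here is a rough sketch of how a prompt and a negative prompt might be combined into one request body. The field names `text_prompts`, `weight`, and `style_preset` are assumed to follow the Stable Diffusion XL request schema on Bedrock, the weights are illustrative, and `build_text_prompts` is a hypothetical helper rather than part of the Model Driver's API.

```python
from typing import Any, Optional


def build_text_prompts(prompt: str, negative_prompt: Optional[str] = None) -> list[dict[str, Any]]:
    """Return weighted prompt entries; the negative prompt gets a negative weight."""
    text_prompts: list[dict[str, Any]] = [{"text": prompt, "weight": 1.0}]
    if negative_prompt:
        # A negative weight steers generation away from the described features.
        text_prompts.append({"text": negative_prompt, "weight": -1.0})
    return text_prompts


# Assumed request shape; only the prompt handling is the point of this sketch.
request_body = {
    "text_prompts": build_text_prompts(
        "A dog riding a skateboard",
        negative_prompt="humans, other animals, cityscape",
    ),
    "style_preset": "pixel-art",
}
```

When no negative prompt is supplied, the list contains only the positively weighted entry.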
Titan
The Bedrock Titan Image Generator Model Driver provides support for Titan Image Generator models hosted by Amazon Bedrock. This Model Driver supports configurations specific to Titan Image Generator, like quality, seed, and cfg_scale.
This Model Driver supports negative prompts. When provided, the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation.
from griptape.drivers.image_generation.amazon_bedrock import AmazonBedrockImageGenerationDriver
from griptape.drivers.image_generation_model.bedrock_titan import BedrockTitanImageGenerationModelDriver
from griptape.structures import Agent
from griptape.tools import PromptImageGenerationTool

model_driver = BedrockTitanImageGenerationModelDriver()

driver = AmazonBedrockImageGenerationDriver(
    image_generation_model_driver=model_driver,
    model="amazon.titan-image-generator-v2:0",
)

agent = Agent(
    tools=[
        PromptImageGenerationTool(image_generation_driver=driver),
    ]
)

agent.run("Generate a watercolor painting of a dog riding a skateboard")
[02/27/25 20:24:56] INFO PromptTask bf0a67980d5e4b54b176be694ef013e0
                    Input: Generate a watercolor painting of a dog riding a skateboard
[02/27/25 20:24:58] INFO Subtask af79b42227e24c338913ef3d30a2b8c5
                    Actions: [
                      {
                        "tag": "call_AkukHF346zDFG44pwCgEo9sJ",
                        "name": "PromptImageGenerationTool",
                        "path": "generate_image",
                        "input": {
                          "values": {
                            "prompt": "A watercolor painting of a dog riding a skateboard, vibrant colors, dynamic motion, playful expression, outdoor setting",
                            "negative_prompt": "blurry, abstract, dark colors"
                          }
                        }
                      }
                    ]
[02/27/25 20:25:07] INFO Subtask af79b42227e24c338913ef3d30a2b8c5
                    Response: Image, format: png, size: 490827 bytes
                    INFO PromptTask bf0a67980d5e4b54b176be694ef013e0
                    Output: Here is the watercolor painting of a dog riding a skateboard. Enjoy the vibrant and playful scene!
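For orientation, the sketch below shows how the quality, seed, and cfg_scale configurations and an optional negative prompt might map onto a Titan Image Generator request body. The field names are assumed from the Bedrock Titan request schema, the default values are illustrative, and `build_titan_request` is a hypothetical helper, not the Model Driver's own method.

```python
from typing import Any, Optional


def build_titan_request(
    prompt: str,
    negative_prompt: Optional[str] = None,
    quality: str = "standard",
    cfg_scale: float = 7.0,
    seed: int = 0,
    width: int = 512,
    height: int = 512,
) -> dict[str, Any]:
    """Assemble a text-to-image request body (assumed schema)."""
    text_params: dict[str, Any] = {"text": prompt}
    if negative_prompt:
        # The negative prompt travels as its own field, separate from the main text.
        text_params["negativeText"] = negative_prompt
    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": text_params,
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "quality": quality,
            "cfgScale": cfg_scale,
            "seed": seed,
            "width": width,
            "height": height,
        },
    }


titan_request = build_titan_request(
    "A watercolor painting of a dog riding a skateboard",
    negative_prompt="blurry, abstract, dark colors",
    seed=42,
)
```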
Azure OpenAI
The Azure OpenAI Image Generation Driver provides access to OpenAI models hosted by Azure. In addition to the configurations provided by the underlying OpenAI Driver, the Azure OpenAI Driver allows configuration of Azure-specific deployment values.
import os

from griptape.drivers.image_generation.openai import AzureOpenAiImageGenerationDriver
from griptape.structures import Agent
from griptape.tools import PromptImageGenerationTool

driver = AzureOpenAiImageGenerationDriver(
    model="dall-e-3",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT_2"],
    api_key=os.environ["AZURE_OPENAI_API_KEY_2"],
)

agent = Agent(
    tools=[
        PromptImageGenerationTool(image_generation_driver=driver),
    ]
)

agent.run("Generate a watercolor painting of a dog riding a skateboard")
[02/27/25 20:25:18] INFO PromptTask e321a770f7584ba0839fe643aa88dd73
                    Input: Generate a watercolor painting of a dog riding a skateboard
[02/27/25 20:25:20] INFO Subtask f9ab45351c85477194dd0bd04903318f
                    Actions: [
                      {
                        "tag": "call_pz92Y2pOzUHnuCWcN9EgYPuY",
                        "name": "PromptImageGenerationTool",
                        "path": "generate_image",
                        "input": {
                          "values": {
                            "prompt": "A watercolor painting of a dog riding a skateboard, capturing the fluid and vibrant style of watercolor art. The dog should appear joyful and energetic, with the skateboard in motion, creating a sense of dynamic movement.",
                            "negative_prompt": "Avoid any background elements that distract from the main subject, such as other animals or people."
                          }
                        }
                      }
                    ]
[02/27/25 20:25:40] INFO Subtask f9ab45351c85477194dd0bd04903318f
                    Response: Image, format: png, size: 3163260 bytes
[02/27/25 20:25:41] INFO PromptTask e321a770f7584ba0839fe643aa88dd73
                    Output: Here is the watercolor painting of a dog riding a skateboard, capturing the joyful and energetic essence of the scene.
Leonardo.Ai
The Leonardo Image Generation Driver enables image generation using models hosted by Leonardo.Ai.
This Driver supports configurations like model selection, image size, specifying a generation seed, and generation steps. For details on supported configuration parameters, see Leonardo.Ai's image generation documentation.
This Driver supports negative prompts. When provided, the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation.
import os

from griptape.drivers.image_generation.leonardo import LeonardoImageGenerationDriver
from griptape.structures import Agent
from griptape.tools import PromptImageGenerationTool

driver = LeonardoImageGenerationDriver(
    model=os.environ["LEONARDO_MODEL_ID"],
    api_key=os.environ["LEONARDO_API_KEY"],
    image_width=512,
    image_height=1024,
)

agent = Agent(
    tools=[
        PromptImageGenerationTool(image_generation_driver=driver),
    ]
)

agent.run("Generate a watercolor painting of a dog riding a skateboard")
[02/27/25 20:24:07] INFO PromptTask a3cca1da3eb04a749214790f34974629
                    Input: Generate a watercolor painting of a dog riding a skateboard
[02/27/25 20:24:09] INFO Subtask 67c7fa62fa5d4dd08845633b79d0ab11
                    Actions: [
                      {
                        "tag": "call_6uB9vggFWimmRmJM70IbPws1",
                        "name": "PromptImageGenerationTool",
                        "path": "generate_image",
                        "input": {
                          "values": {
                            "prompt": "watercolor painting of a dog riding a skateboard",
                            "negative_prompt": ""
                          }
                        }
                      }
                    ]
[02/27/25 20:24:15] INFO Subtask 67c7fa62fa5d4dd08845633b79d0ab11
                    Response: Image, format: png, size: 411 bytes
[02/27/25 20:26:23] INFO PromptTask a3cca1da3eb04a749214790f34974629
                    Output: Here is a watercolor painting of a dog riding a skateboard:
                    agent = Agent(tools=[PromptImageGenerationTool(image_generation_driver=driver, off_prompt=True), FileManagerTool()])
                    agent.run("Generate a watercolor painting of a dog riding a skateboard and save it to dog.png")
[02/27/25 20:23:13] INFO PromptTask 1b9decc9c5af4d47afc7b7ca9c984079
                    Input: Generate a watercolor painting of a dog riding a skateboard
[02/27/25 20:23:15] INFO Subtask 097f2d7d4193432b95c5728462105ddb
                    Actions: [
                      {
                        "tag": "call_HAsLy07G0DBT8hN3GWWcLci4",
                        "name": "PromptImageGenerationTool",
                        "path": "generate_image",
                        "input": {
                          "values": {
                            "prompt": "A watercolor painting of a dog riding a skateboard, vibrant colors, dynamic motion, playful expression, outdoor setting",
                            "negative_prompt": "realistic, digital art, oil painting, cartoonish"
                          }
                        }
                      }
                    ]
[02/27/25 20:23:25] INFO Subtask 097f2d7d4193432b95c5728462105ddb
                    Response: Image, format: png, size: 787421 bytes
[02/27/25 20:23:26] INFO PromptTask 1b9decc9c5af4d47afc7b7ca9c984079
                    Output: Here is a watercolor painting of a dog riding a skateboard. Enjoy the vibrant colors and dynamic motion!
HuggingFace Pipelines
Info
This Driver requires the drivers-image-generation-huggingface extra.
The HuggingFace Pipelines Image Generation Driver enables image generation through locally-hosted models using the HuggingFace Diffusers library. This Driver requires a Pipeline Driver to prepare the appropriate Pipeline.
This Driver requires a model configuration, which specifies the model to use for image generation. The value of the model configuration must be one of the following:
- A model name from the HuggingFace Model Hub, like stabilityai/stable-diffusion-3-medium-diffusers
- A path to the directory containing a model on the filesystem, like ./models/stable-diffusion-3/
- A path to a file containing a model on the filesystem, like ./models/sd3_medium_incl_clips.safetensors
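The three accepted forms of the model value can be told apart with a simple filesystem check. The helper below is a hypothetical illustration of that distinction, assuming that any value which does not resolve to an existing file or directory is meant as a Model Hub name; it is not how the Driver itself resolves the value.

```python
import tempfile
from pathlib import Path


def describe_model_source(model: str) -> str:
    """Classify a model value as a local file, a local directory, or a Hub name."""
    path = Path(model)
    if path.is_file():
        return "file"
    if path.is_dir():
        return "directory"
    # Assumption: anything that is not an existing path is treated as a Hub model name.
    return "hub"


# Demonstrate the two local forms using a temporary directory and an empty weights file.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "sd3_medium_incl_clips.safetensors"
    weights.write_bytes(b"")
    kinds = (describe_model_source(tmp), describe_model_source(str(weights)))

hub_kind = describe_model_source("stabilityai/stable-diffusion-3-medium-diffusers")
```

A check like this can catch a mistyped local path early, before it is mistaken for a Hub name and triggers a download attempt.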
The device configuration specifies the hardware device used to run inference. Common values include cuda (for CUDA-enabled GPUs), cpu (for a device's CPU), and mps (for Apple silicon GPUs). For more information, see HuggingFace's documentation on GPU inference.
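If the target hardware is not known in advance, the device value can be chosen at runtime. The following is a minimal sketch, assuming PyTorch is the inference backend; `pick_device` is a hypothetical helper, and it falls back to cpu when PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what the local machine supports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe, so default to the CPU.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # The mps backend may be absent on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
```

The result can then be passed as the device configuration when constructing the Driver.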
Stable Diffusion 3 Image Generation Pipeline Driver
Info
The Stable Diffusion 3 Image Generation Pipeline Driver requires the drivers-image-generation-huggingface extra.
The Stable Diffusion 3 Image Generation Pipeline Driver provides a StableDiffusion3Pipeline for text-to-image generations via the HuggingFace Pipelines Image Generation Driver's .try_text_to_image() method. This Driver accepts a text prompt and configurations including the Stable Diffusion 3 model, output image size, generation seed, and inference steps.
Image generation consumes substantial memory. On devices with limited VRAM, it may be necessary to enable the enable_model_cpu_offload or drop_t5_encoder configurations. For more information, see HuggingFace's documentation on reduced memory usage.
from griptape.artifacts import TextArtifact
from griptape.drivers.image_generation.huggingface_pipeline import HuggingFacePipelineImageGenerationDriver
from griptape.drivers.image_generation_pipeline.stable_diffusion_3 import StableDiffusion3ImageGenerationPipelineDriver
from griptape.structures import Pipeline
from griptape.tasks import PromptImageGenerationTask

image_generation_task = PromptImageGenerationTask(
    input=TextArtifact("landscape photograph, verdant, countryside, 8k"),
    image_generation_driver=HuggingFacePipelineImageGenerationDriver(
        model="stabilityai/stable-diffusion-3-medium-diffusers",
        device="cuda",
        pipeline_driver=StableDiffusion3ImageGenerationPipelineDriver(
            height=512,
            width=512,
        ),
    ),
)

output_artifact = Pipeline(tasks=[image_generation_task]).run().output
Stable Diffusion 3 Img2Img Image Generation Pipeline Driver
Info
The Stable Diffusion 3 Img2Img Image Generation Pipeline Driver requires the drivers-image-generation-huggingface extra.
The Stable Diffusion 3 Img2Img Image Generation Pipeline Driver provides a StableDiffusion3Img2ImgPipeline for image-to-image generations. This Driver accepts a text prompt, an input image, and configurations including the Stable Diffusion 3 model, output image size, inference steps, generation seed, and strength of generation over the input image.
from griptape.artifacts import TextArtifact
from griptape.drivers.image_generation.huggingface_pipeline import HuggingFacePipelineImageGenerationDriver
from griptape.drivers.image_generation_pipeline.stable_diffusion_3_img_2_img import (
    StableDiffusion3Img2ImgImageGenerationPipelineDriver,
)
from griptape.loaders import ImageLoader
from griptape.structures import Pipeline
from griptape.tasks import VariationImageGenerationTask

prompt_artifact = TextArtifact("landscape photograph, verdant, countryside, 8k")
input_image_artifact = ImageLoader().load("tests/resources/mountain.png")

image_variation_task = VariationImageGenerationTask(
    input=(prompt_artifact, input_image_artifact),
    image_generation_driver=HuggingFacePipelineImageGenerationDriver(
        model="stabilityai/stable-diffusion-3-medium-diffusers",
        device="cuda",
        pipeline_driver=StableDiffusion3Img2ImgImageGenerationPipelineDriver(
            height=1024,
            width=1024,
        ),
    ),
)

output_artifact = Pipeline(tasks=[image_variation_task]).run().output
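The strength configuration mentioned above interacts with the number of inference steps. As a rough illustration, the sketch below assumes the convention commonly used by diffusers-style image-to-image pipelines, where strength scales how many denoising steps are applied to the input image; whether this Driver's strength behaves exactly this way is an assumption, and `effective_steps` is a hypothetical helper.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps run for a given strength (assumed convention)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Lower strength keeps more of the input image by skipping the earliest steps.
    return min(int(num_inference_steps * strength), num_inference_steps)


half = effective_steps(50, 0.5)  # half of the schedule is applied
full = effective_steps(50, 1.0)  # the input image is fully re-noised
none = effective_steps(28, 0.0)  # the input image is left untouched
```

Under this assumption, a low strength with few inference steps leaves very little room for the prompt to change the image, so the two values are best tuned together.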
Stable Diffusion 3 ControlNet Image Generation Pipeline Driver
Note
The Stable Diffusion 3 ControlNet Image Generation Pipeline Driver requires the drivers-image-generation-huggingface extra.
The Stable Diffusion 3 ControlNet Image Generation Pipeline Driver (StableDiffusion3ControlNetImageGenerationPipelineDriver) provides a StableDiffusion3ControlNetPipeline for image-to-image generations. This Driver accepts a text prompt, a control image, and configurations including the Stable Diffusion 3 model, ControlNet model, output image size, generation seed, inference steps, and the degree to which the model adheres to the control image.
from griptape.artifacts import TextArtifact
from griptape.drivers.image_generation.huggingface_pipeline import HuggingFacePipelineImageGenerationDriver
from griptape.drivers.image_generation_pipeline.stable_diffusion_3_controlnet import (
    StableDiffusion3ControlNetImageGenerationPipelineDriver,
)
from griptape.loaders import ImageLoader
from griptape.structures import Pipeline
from griptape.tasks import VariationImageGenerationTask

prompt_artifact = TextArtifact("landscape photograph, verdant, countryside, 8k")
control_image_artifact = ImageLoader().load("canny_control_image.png")

controlnet_task = VariationImageGenerationTask(
    input=(prompt_artifact, control_image_artifact),
    image_generation_driver=HuggingFacePipelineImageGenerationDriver(
        model="stabilityai/stable-diffusion-3-medium-diffusers",
        device="cuda",
        pipeline_driver=StableDiffusion3ControlNetImageGenerationPipelineDriver(
            controlnet_model="InstantX/SD3-Controlnet-Canny",
            height=768,
            width=1024,
        ),
    ),
)

output_artifact = Pipeline(tasks=[controlnet_task]).run().output