I'm working on a workflow where my agent needs to generate images not only from text prompts but also from existing images supplied as input. My current setup only handles text-to-image generation.
Could anyone share some insights or point me in the right direction for adding image-to-image generation to an agent?