Here’s a quick rundown of my workflow for the diner scene. If you haven't seen it yet, the link is in the description.
This entire scene started with a single Midjourney image. I liked the aesthetic and wanted to test some different tools for character consistency and lip sync using an Image-to-Video workflow.
(The Toolkit)
Here’s the full list of tools used for this scene:
- Image Gen: Midjourney & Nano Banana
- Image Editing: Adobe Photoshop & Midjourney
- Video Gen: Seedance & Kling
- Lip Sync: Kling & LipSync Pro on Wavespeed
- Audio: Veo3 (for dialogue only)
- Post-Production: Premiere, Topaz Upscaler, & FilmConvert Pro
(The Process)
My starting point is this Midjourney image. I immediately use the edit feature to "Zoom Out" and generate a wide shot of the scene.
Next, I take that source image into Nano Banana to create my different camera angles. My prompts are direct commands like, "Over-the-shoulder shot of the woman," or "Create a waitress pouring coffee."
From there, I bring every image into Photoshop for fine-tuning, ensuring a 16:9 aspect ratio. Each finalized still image becomes the starting point for a video clip.
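If you'd rather check the 16:9 crop with numbers than eyeball it, here's a minimal Python sketch of the arithmetic. It isn't part of the Photoshop step above, and the function name `crop_box_16x9` is made up for illustration. It just works out the largest centered 16:9 box that fits inside an image of a given size.

```python
def crop_box_16x9(width, height):
    """Return (left, top, right, bottom) for the largest centered 16:9 crop.

    Hypothetical helper for illustration only; sizes are in pixels.
    """
    target_w, target_h = 16, 9
    if width * target_h > height * target_w:
        # Image is wider than 16:9: keep the full height, trim the sides.
        new_h = height
        new_w = height * target_w // target_h
    else:
        # Image is 16:9 or taller: keep the full width, trim top and bottom.
        new_w = width
        new_h = width * target_h // target_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


if __name__ == "__main__":
    # A square 1024x1024 image keeps its width and loses rows top and bottom.
    print(crop_box_16x9(1024, 1024))
```

The box uses the same (left, top, right, bottom) order that most image tools expect for a crop, so the numbers can be typed straight into a crop dialog or passed to a script.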
With a folder full of these polished images, I upload them to Seedance to generate motion. My prompts here are simple actions: "The woman talks," "The man listens," "The waitress walks away."
(Post-Production & Final Polish)
I assemble all the Seedance clips in Adobe Premiere. At this stage, I completely ignore lip sync and focus only on the pacing of the visual edit.
Once the scene is assembled, I record and place the final dialogue. For this scene, I used Veo3 for the woman’s lines and recorded the man’s lines myself, though I plan to use real actors in the future for better performances.
With the dialogue timed out in the sequence, I address lip sync as the final step. On a shot-by-shot basis, I use either Kling's built-in lip sync feature or LipSync Pro on Wavespeed, depending on which gives the better result.
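For anyone who wants to track that shot-by-shot pass in a script instead of in their head, here's a rough Python sketch of a shot list. Everything in it is an assumption for illustration: the `Shot` fields, the file names, and the idea of leaving `lip_sync_tool` empty until a tool has been picked. The motion prompts are the ones mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shot:
    still: str                           # hypothetical file name of the source image
    motion_prompt: str                   # prompt used for video generation
    dialogue: Optional[str] = None       # hypothetical audio file, if someone speaks
    lip_sync_tool: Optional[str] = None  # filled in once a tool has been chosen


def shots_awaiting_lip_sync(shots: List[Shot]) -> List[Shot]:
    """Return shots that have dialogue but no lip sync tool chosen yet."""
    return [s for s in shots if s.dialogue is not None and s.lip_sync_tool is None]


# Example shot list; file names are placeholders.
shots = [
    Shot("woman_ots.png", "The woman talks", dialogue="woman_line_01.wav"),
    Shot("man_closeup.png", "The man listens"),
    Shot("waitress_wide.png", "The waitress walks away"),
]

if __name__ == "__main__":
    for shot in shots_awaiting_lip_sync(shots):
        print(shot.still)
```

Only shots with dialogue show up in the to-do list, which matches the idea of leaving lip sync until the edit and the dialogue timing are locked.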
To finish, I add sound design, music, and a grain effect from FilmConvert Pro to give it that cinematic texture. And that's the whole process.