This was indeed a difficult assignment. Details shifted (like the droplets on the bottle), text broke (like the small text underneath the logo), and the proportions in particular do not seem right, since this is supposed to be only a 0.2L bottle. I tried a mixture of seedream 5 lite, nanobanana 2, and gpt image 2 for the frames. Then I tried seedance 2 in both reference-to-video and first-frame-to-video modes.
My question: these generations were done at 720p (to save credits). Would 1080p have had a higher success rate on text preservation?