FIFO-Diffusion: Generating Infinite Videos from Text without Training
Jihwan Kim, Junoh Kang, Jinyoung Choi, Bohyung Han
Abstract
We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without training. This is achieved by iteratively performing diagonal denoising, which concurrently processes a series of consecutive frames with increasing noise levels in a queue; our method dequeues a fully denoised frame at the head while enqueuing a new random noise frame at the tail. However, diagonal denoising is a double-edged sword as the frames near the tail can take advantage of cleaner ones by forward reference but such a strategy induces the discrepancy between training and inference. Hence, we introduce latent partitioning to reduce the training-inference gap and lookahead denoising to leverage the benefit of forward referencing. We have demonstrated the promising results and effectiveness of the proposed methods on existing text-to-video generation baselines.
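The queue mechanics of diagonal denoising can be illustrated with a minimal Python sketch. It only models the bookkeeping described in the abstract: a fixed-length queue whose frames carry increasing noise levels from head to tail, one denoising step per iteration, a dequeue at the head and an enqueue of fresh noise at the tail. The names `toy_denoise_step` and `fifo_generate`, the queue length, the latent size, and the pure-noise initialization are all assumptions made for illustration. The toy denoiser is a hypothetical stand-in for a pretrained video diffusion model and treats frames independently, so it does not capture forward referencing, latent partitioning, or lookahead denoising.

```python
import random
from collections import deque


def toy_denoise_step(latent, level):
    """Hypothetical stand-in for one step of a pretrained diffusion model.

    Scales the latent so that a frame at noise level 1 becomes fully
    "clean" (all zeros here). A real model would denoise all queued
    frames jointly, conditioned on the text prompt.
    """
    scale = (level - 1) / level
    return [x * scale for x in latent]


def fifo_generate(num_frames, queue_len=4, latent_dim=3, seed=0):
    """Produce num_frames outputs from a fixed-size queue (illustrative only)."""
    rng = random.Random(seed)

    def new_noise():
        return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

    # Each entry is (latent, noise_level). The head holds level 1 and the
    # tail holds level queue_len. Assumption: the queue starts as pure noise.
    queue = deque((new_noise(), level) for level in range(1, queue_len + 1))
    outputs = []

    while len(outputs) < num_frames:
        # Diagonal denoising: every frame takes one step at its own level.
        queue = deque((toy_denoise_step(z, lvl), lvl - 1) for z, lvl in queue)

        # The head has reached level 0, so it is dequeued as an output frame.
        clean_latent, _ = queue.popleft()
        outputs.append(clean_latent)

        # A new random-noise frame enters at the tail with the highest level.
        queue.append((new_noise(), queue_len))

    remaining_levels = [lvl for _, lvl in queue]
    return outputs, remaining_levels


frames, levels = fifo_generate(num_frames=10)
print(len(frames), levels)
```

Because the queue returns to the same noise-level layout after every iteration, the loop can in principle run for any number of frames with constant memory, which is the sense in which the generation length is unbounded.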