
Scalable Transformer Flows (STARFlow)

Deep dive into the STARFlow architecture, combining the expressiveness of autoregressive transformers with the efficiency of normalizing flows.

advanced•3 / 5

STARFlow-V: Video Generation


STARFlow-V extends STARFlow from images to video. Video is harder because of temporal consistency: each frame must follow plausibly from the frames before it, not just look good on its own.

The transformer attends both to patches within the same frame (spatial attention) and to patches in other frames (temporal attention).
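One way to picture attention over both axes is as a mask on a flattened sequence of patches. The sketch below is illustrative only and is not taken from STARFlow-V: the function names, the flattening order, and the rule that a patch may see its own frame plus all earlier frames are assumptions chosen to keep the example small.

```python
import numpy as np


def spatiotemporal_mask(num_frames: int, patches_per_frame: int) -> np.ndarray:
    """Boolean mask of shape (N, N), N = num_frames * patches_per_frame.

    mask[i, j] is True when query token i may attend to key token j.
    Assumed rule (illustrative only): a token sees every patch in its own
    frame (spatial) and every patch in earlier frames (temporal).
    """
    # Frame index of each token when frames are flattened one after another.
    frame_id = np.repeat(np.arange(num_frames), patches_per_frame)
    # Allowed when the key's frame is not later than the query's frame.
    return frame_id[None, :] <= frame_id[:, None]


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed positions masked out."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # masked positions become exactly 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy usage: 3 frames of 4 patches each, 8-dimensional token features.
mask = spatiotemporal_mask(num_frames=3, patches_per_frame=4)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(12, 8))
out, weights = masked_attention(tokens, tokens, tokens, mask)
```

In this toy setup, a patch in the first frame attends only to the four patches of that frame, while a patch in the last frame attends to all twelve. A real model might factor spatial and temporal attention into separate layers instead of using one joint mask.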

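Because a flow maps noise to data through an invertible transformation instead of a long denoising schedule, sampling is non-iterative (or at least takes fewer steps than diffusion), so STARFlow-V can generate video faster than comparable diffusion models.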
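To make the flow side of this concrete, here is a toy autoregressive affine flow in NumPy. It is a sketch under stated assumptions, not STARFlow-V's architecture: `conditioner` is a hypothetical stand-in for the transformer, and the explicit loops are written for clarity. What it shows is that sampling is one deterministic pass through an exact inverse, with no denoising schedule.

```python
import numpy as np


def conditioner(prefix: np.ndarray) -> tuple[float, float]:
    """Hypothetical stand-in for the transformer.

    Maps the earlier tokens to a (shift, log_scale) pair for the next token.
    """
    if prefix.size == 0:
        return 0.0, 0.0
    shift = 0.5 * float(prefix.mean())
    log_scale = 0.1 * float(np.tanh(prefix.sum()))
    return shift, log_scale


def forward(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Data -> latent: z_t = (x_t - mu_t) * exp(-alpha_t), conditioned on x_<t."""
    z = np.empty_like(x)
    log_det = 0.0
    for t in range(len(x)):
        mu, alpha = conditioner(x[:t])
        z[t] = (x[t] - mu) * np.exp(-alpha)
        log_det -= alpha  # log |dz_t / dx_t| = -alpha_t
    return z, log_det


def inverse(z: np.ndarray) -> np.ndarray:
    """Latent -> data in one pass: x_t = z_t * exp(alpha_t) + mu_t."""
    x = np.empty_like(z)
    for t in range(len(z)):
        mu, alpha = conditioner(x[:t])  # uses tokens already generated
        x[t] = z[t] * np.exp(alpha) + mu
    return x


# Sampling: draw noise once, then invert the flow once.
rng = np.random.default_rng(0)
noise = rng.normal(size=6)
sample = inverse(noise)

# The transformation is exactly invertible: encoding the sample recovers the noise.
recovered, log_det = forward(sample)
```

Note that the inverse here still walks the tokens in order, so "one pass" means one traversal of the sequence, not one parallel operation. The contrast with diffusion is that no repeated refinement of the whole sample is needed.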
