
Scalable Transformer Flows (STARFlow)

Deep dive into the STARFlow architecture, combining the expressiveness of autoregressive transformers with the efficiency of normalizing flows.


STARFlow-V: Video Generation

STARFlow-V extends STARFlow to video. Video is harder than still images because it requires temporal consistency: frames must evolve coherently over time.

Spatiotemporal Attention

The transformer attends to both spatial patches (within a frame) and temporal patches (across frames).
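One common way to realize this kind of attention is divided space-time attention (as in TimeSformer). Each token first attends to tokens at the same spatial position in other frames, then to the other patches of its own frame. The sketch below assumes that factorization, a frame-major token layout, and identity query/key/value projections. The source does not say whether STARFlow-V factorizes attention this way or uses full joint attention. The `causal` flag, which restricts temporal attention to the current and earlier frames, is an assumption motivated by the autoregressive design. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def spatial_mask(num_frames: int, patches_per_frame: int) -> Mask:
    """Allow token i to attend to token j only when both lie in the same frame.

    Tokens are assumed to be laid out frame-major: index = frame * patches_per_frame + patch.
    """
    n = num_frames * patches_per_frame
    return [
        [i // patches_per_frame == j // patches_per_frame for j in range(n)]
        for i in range(n)
    ]


def temporal_mask(num_frames: int, patches_per_frame: int, causal: bool = True) -> Mask:
    """Allow token i to attend to token j only when both share a spatial position.

    With causal=True (an assumption), j must also come from the same or an earlier frame.
    """
    n = num_frames * patches_per_frame
    mask = []
    for i in range(n):
        frame_i, patch_i = divmod(i, patches_per_frame)
        row = []
        for j in range(n):
            frame_j, patch_j = divmod(j, patches_per_frame)
            row.append(patch_i == patch_j and (not causal or frame_j <= frame_i))
        mask.append(row)
    return mask


def masked_attention(q: Matrix, k: Matrix, v: Matrix, mask: Mask) -> Matrix:
    """Single-head scaled dot-product attention; disallowed pairs get zero weight."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        top = max(scores)  # finite: both masks always let a token attend to itself
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total for c in range(len(v[0]))])
    return out


# Usage: 2 frames x 2 patches, each token a 2-d feature vector.
frames, patches = 2, 2
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
h = masked_attention(tokens, tokens, tokens, temporal_mask(frames, patches))  # across frames
h = masked_attention(h, h, h, spatial_mask(frames, patches))  # within each frame
```

Factorizing attention this way means each token scores only the tokens of its own frame plus the tokens at its own position in other frames, rather than every token in the clip. Under the causal mask, the first frame's tokens can see only themselves in the temporal step, so their outputs there equal their inputs.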

Efficiency

Because flows need no iterative denoising loop (or, at least, fewer network evaluations than a typical diffusion sampler), STARFlow-V can generate video faster than comparable diffusion models.
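The sketch below illustrates why flow sampling has no denoising loop. It uses a single affine autoregressive flow layer, the kind of building block implied by combining autoregressive transformers with normalizing flows. It assumes scalar tokens and a standard normal base distribution. `toy_conditioner` is a hypothetical stand-in for the causal transformer that would predict each position's shift and log-scale. This is not STARFlow-V's actual implementation.

```python
import math
import random
from typing import Callable, List, Tuple

# A conditioner maps the prefix x[:i] to (shift, log_scale) for position i.
Conditioner = Callable[[List[float]], Tuple[float, float]]


def toy_conditioner(prefix: List[float]) -> Tuple[float, float]:
    """Hypothetical stand-in for a causal transformer: any function of the prefix works."""
    if not prefix:
        return 0.0, 0.0
    shift = 0.5 * sum(prefix) / len(prefix)
    log_scale = 0.5 * math.tanh(prefix[-1])  # bounded, so exp() stays well-behaved
    return shift, log_scale


def forward(x: List[float], cond: Conditioner) -> Tuple[List[float], float]:
    """Data -> noise. Every (shift, log_scale) depends only on observed data, so a causal
    transformer could compute all of them in one parallel pass; the loop is for clarity."""
    z, log_det = [], 0.0
    for i, xi in enumerate(x):
        shift, log_scale = cond(x[:i])
        z.append((xi - shift) * math.exp(-log_scale))
        log_det -= log_scale  # triangular Jacobian: log|det| = -(sum of log-scales)
    return z, log_det


def inverse(z: List[float], cond: Conditioner) -> List[float]:
    """Noise -> data. Position i needs the already generated x[:i], so positions are
    produced one after another, but there is no denoising loop or step count to tune."""
    x: List[float] = []
    for zi in z:
        shift, log_scale = cond(x)
        x.append(zi * math.exp(log_scale) + shift)
    return x


def log_likelihood(x: List[float], cond: Conditioner) -> float:
    """Exact log p(x) via the change-of-variables formula with a standard normal base."""
    z, log_det = forward(x, cond)
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return log_base + log_det


# Usage: draw noise once, map it to a sample, then recover that noise exactly.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(6)]
sample = inverse(noise, toy_conditioner)
recovered, _ = forward(sample, toy_conditioner)
```

Sampling here is a fixed, exact inverse computation with no sampler step count, which is the property the efficiency argument relies on. The sketch also shows a trade-off. Within one autoregressive flow layer, the inverse generates positions one at a time, while the forward (likelihood) direction can be evaluated for all positions in parallel.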
