Scalable Transformer Flows (STARFlow)
Deep dive into the STARFlow architecture, combining the expressiveness of autoregressive transformers with the efficiency of normalizing flows.
Learning Goals
What you'll understand and learn
- Analyze the architecture of STARFlow and STARFlow-V
- Understand the concept of Normalizing Flows in generative modeling
- Evaluate the trade-offs between inference speed and generation quality
Practical Skills
Hands-on techniques and methods
- Explain the limitations of standard Autoregressive and Diffusion models
Prerequisites
- Deep Learning Fundamentals
- Understanding of Transformers (Attention Mechanisms)
- Basics of Generative Models (GANs, VAEs, Diffusion)
Advanced Content Notice
This lesson covers advanced AI concepts and techniques. Strong foundational knowledge of AI fundamentals and intermediate concepts is recommended.
Introduction
Generative modeling is currently dominated by two families: Autoregressive Models (like GPT) and Diffusion Models (like Stable Diffusion).
Autoregressive
Excellent at capturing structure; slow because generation proceeds token by token.
Diffusion
Excellent sample quality; slow because of many iterative denoising steps.
STARFlow (Scalable Transformer Autoregressive Flow) introduces a hybrid architecture that aims to combine the best of both worlds: the expressiveness of transformers with the efficiency of Normalizing Flows.
The Architecture
STARFlow integrates a Transformer backbone into a Normalizing Flow framework.
What is a Normalizing Flow?
A Normalizing Flow is a sequence of invertible transformations that maps a simple distribution (like a Gaussian) to a complex data distribution (like images).
Invertible
You can go from Data -> Noise (training) and Noise -> Data (generation) using the same transformation, run forward or in reverse.
Exact Likelihood
Unlike GANs, you can compute the exact likelihood of a data point via the change-of-variables formula (shown below).
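The "Exact Likelihood" property comes from the change-of-variables formula. If an invertible map f sends a data point x to a latent z = f(x) with a simple base density p_Z (e.g., a standard Gaussian), the data likelihood is:

```latex
\log p_X(x) = \log p_Z\bigl(f(x)\bigr) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```

Training maximizes this log-likelihood directly; the Jacobian determinant accounts for how much f locally stretches or compresses volume, and flow architectures are designed so that both terms remain tractable to compute.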
The STARFlow Innovation
Standard flows struggle to model long-range dependencies (global structure in an image). STARFlow solves this by using a Transformer to parameterize the flow's transformations.
Autoregressive Prior
The model predicts the distribution of the next "patch" of the image based on previous patches.
Flow Refinement
Instead of predicting a pixel value directly, it predicts the parameters of an invertible transformation (a shift and scale, for example) that generates the patch; see the sketch below.
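To make "predicting the parameters of a flow transformation" concrete, here is a minimal PyTorch sketch of a single autoregressive affine flow block. This is a toy illustration under stated assumptions, not the published STARFlow code: the module names, sizes, and the tanh stabilizer are all illustrative choices.

```python
# Sketch: one autoregressive affine flow block. A causal Transformer reads the
# patches before position t and predicts the shift and log-scale used to
# transform patch t. (Illustrative; not the official STARFlow implementation.)
import torch
import torch.nn as nn

class ARFlowBlock(nn.Module):
    def __init__(self, patch_dim=64, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dropout=0.0, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.to_params = nn.Linear(d_model, 2 * patch_dim)  # shift + log-scale
        self.start = nn.Parameter(torch.zeros(1, 1, patch_dim))  # "begin" token

    def _params(self, context):
        # context holds, at each position t, only patches with index < t;
        # the causal mask prevents attention from looking ahead.
        h = self.embed(context)
        mask = nn.Transformer.generate_square_subsequent_mask(
            context.size(1)).to(h.device)
        h = self.transformer(h, mask=mask)
        shift, log_scale = self.to_params(h).chunk(2, dim=-1)
        return shift, torch.tanh(log_scale)  # bound scales for stability

    def forward(self, x):
        # Data -> noise (training direction): all patches in parallel.
        B, T, D = x.shape
        context = torch.cat([self.start.expand(B, 1, D), x[:, :-1]], dim=1)
        shift, log_scale = self._params(context)
        z = (x - shift) * torch.exp(-log_scale)
        log_det = -log_scale.sum(dim=(1, 2))  # term in the exact log-likelihood
        return z, log_det

    @torch.no_grad()
    def inverse(self, z):
        # Noise -> data (sampling direction): sequential, like AR decoding.
        B, T, D = z.shape
        x = torch.zeros_like(z)
        for t in range(T):
            context = torch.cat([self.start.expand(B, 1, D), x[:, :t]], dim=1)
            shift, log_scale = self._params(context)
            x[:, t] = z[:, t] * torch.exp(log_scale[:, -1]) + shift[:, -1]
        return x

# Round trip: map 16 patch tokens to noise and exactly back.
block = ARFlowBlock().eval()
x = torch.randn(2, 16, 64)
z, log_det = block(x)
print(torch.allclose(x, block.inverse(z), atol=1e-4))  # True: invertible
```

Note the asymmetry: the forward (training) direction is fully parallel across patches, while sampling inverts one patch at a time, much like language-model decoding. The real model stacks several such blocks and differs from this toy version in many details.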
STARFlow-V: Video Generation
STARFlow-V extends this approach to video. Video is harder because of temporal consistency: frames must evolve coherently over time.
Spatiotemporal Attention
The transformer attends both spatially (across patches within a frame) and temporally (across corresponding patches in different frames); a factorized-attention sketch follows this section.
Efficiency
Because flow sampling requires far fewer network evaluations than diffusion's iterative denoising, STARFlow-V can generate video faster than comparable diffusion models.
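To illustrate spatiotemporal attention, here is a minimal PyTorch sketch of a factorized space-time attention layer: patches first attend within their frame, then each patch location attends across frames. Whether STARFlow-V factorizes attention exactly this way is an assumption here; the shapes and names are illustrative.

```python
# Sketch: factorized spatiotemporal attention over video tokens.
# (Illustrative assumption, not the official STARFlow-V implementation.)
import torch
import torch.nn as nn

class FactorizedSpaceTimeAttention(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, patches_per_frame, d_model)
        B, F, P, D = x.shape
        # Spatial pass: patches attend only within their own frame.
        s = x.reshape(B * F, P, D)
        s, _ = self.spatial(s, s, s)
        x = x + s.reshape(B, F, P, D)
        # Temporal pass: each patch location attends across all frames.
        t = x.permute(0, 2, 1, 3).reshape(B * P, F, D)
        t, _ = self.temporal(t, t, t)
        x = x + t.reshape(B, P, F, D).permute(0, 2, 1, 3)
        return x

# Example: 2 videos, 8 frames, 16 patches per frame, 128-dim tokens.
tokens = torch.randn(2, 8, 16, 128)
print(FactorizedSpaceTimeAttention()(tokens).shape)  # [2, 8, 16, 128]
```

Factorizing keeps attention cost proportional to F·P² + P·F² rather than (F·P)² for full joint space-time attention, which is one common way such models stay affordable as frame counts grow.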
Performance vs. Diffusion
Quality
STARFlow achieves competitive FID (Fréchet Inception Distance) scores, rivaling state-of-the-art diffusion models; a snippet showing how FID is computed follows below.
Speed
It offers a significant speedup in inference time due to the flow-based generation mechanism.
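To reproduce quality comparisons like these yourself, FID can be computed with the torchmetrics library (install with `pip install torchmetrics[image]`). In this sketch, random tensors stand in for real images and model samples; a real evaluation uses tens of thousands of images from each source.

```python
# Sketch: computing FID with torchmetrics. Random uint8 tensors stand in
# for real and generated images here.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)  # 2048-dim Inception features

real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)   # accumulate real-image statistics
fid.update(fake_images, real=False)  # accumulate generated-image statistics
print(f"FID: {fid.compute().item():.2f}")  # lower is better
```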
Conclusion
STARFlow represents a resurgence of Flow-based models. By empowering flows with the scaling capabilities of Transformers, we open a third path for high-fidelity generative AI that doesn't rely on the expensive iterative processes of diffusion.