Scalable Transformer Flows (STARFlow)

Introduction

Generative modeling is currently dominated by two families: Autoregressive Models (like GPT) and Diffusion Models (like Stable Diffusion).

Autoregressive

Strong at modeling structure, but slow at generation because output is produced one token at a time.

Diffusion

Strong sample quality, but slow at generation because sampling runs many iterative denoising steps.

STARFlow (Scalable Transformer Auto-Regressive Flow) introduces a hybrid architecture that aims to combine the best of both worlds: the expressiveness of transformers with the efficiency of Normalizing Flows.

The Architecture

STARFlow integrates a Transformer backbone into a Normalizing Flow framework.

What is a Normalizing Flow?

A Normalizing Flow is a sequence of invertible transformations that maps a simple distribution (like a Gaussian) to a complex data distribution (like images).

Invertible

The same transformation runs in both directions: Data -> Noise for training, and its exact inverse, Noise -> Data, for generation.

Exact Likelihood

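Unlike GANs, a flow gives the exact probability density of a data point. It follows from the change-of-variables formula: the density of the mapped point under the simple distribution, multiplied by the absolute Jacobian determinant of the transformation.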
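
As a minimal sketch of both properties, consider the simplest possible flow: a single affine map on one scalar. The class name AffineFlow and its parameters (shift, log_scale) are made up for illustration and are not STARFlow's API; the point is only that the inverse is exact and that the change-of-variables formula yields an exact log-density.

```python
import math


def standard_normal_logpdf(z: float) -> float:
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


class AffineFlow:
    """Toy one-dimensional flow: z = (x - shift) / exp(log_scale)."""

    def __init__(self, shift: float, log_scale: float) -> None:
        self.shift = shift
        self.log_scale = log_scale

    def forward(self, x: float):
        """Data -> Noise. Also returns log|dz/dx|, the log-Jacobian term."""
        z = (x - self.shift) * math.exp(-self.log_scale)
        log_det = -self.log_scale
        return z, log_det

    def inverse(self, z: float) -> float:
        """Noise -> Data, the exact inverse of forward()."""
        return self.shift + math.exp(self.log_scale) * z

    def log_prob(self, x: float) -> float:
        """Exact log-density via change of variables."""
        z, log_det = self.forward(x)
        return standard_normal_logpdf(z) + log_det


flow = AffineFlow(shift=2.0, log_scale=math.log(3.0))
noise, _ = flow.forward(5.0)
reconstructed = flow.inverse(noise)  # recovers 5.0 up to floating-point rounding
```

For this toy flow, log_prob(x) coincides with the log-density of a Gaussian with mean shift and standard deviation exp(log_scale), which makes an easy sanity check. Real flows stack many such invertible layers and sum their log-Jacobian terms.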

The STARFlow Innovation

Standard flows struggle to model long-range dependencies, such as the global structure of an image. STARFlow addresses this by using a Transformer to parameterize the flow's transformations.

Autoregressive Prior

The model predicts the distribution of the next "patch" of the image based on previous patches.

Flow Refinement

Instead of just predicting a pixel value, it predicts the parameters of a flow transformation that generates the pixel.
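
The two ideas above can be sketched together as an autoregressive affine flow. This is a toy under loud assumptions: each "patch" is a single float, and toy_conditioner is a hand-written stand-in for the Transformer, not the model's actual network. It only shows the mechanism: the parameters of each patch's transformation depend on earlier patches alone, which keeps the whole map invertible.

```python
import math
from typing import List, Sequence, Tuple


def toy_conditioner(context: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical stand-in for the Transformer.

    Maps the previous patches to (shift, log_scale) for the next patch.
    """
    if not context:
        return 0.0, 0.0  # the first patch has no context to condition on
    mean = sum(context) / len(context)
    return 0.5 * mean, 0.1 * math.tanh(mean)


def forward(x: Sequence[float]) -> Tuple[List[float], float]:
    """Data -> Noise. Every step only reads the known data x."""
    z: List[float] = []
    log_det = 0.0
    for i, x_i in enumerate(x):
        shift, log_scale = toy_conditioner(x[:i])
        z.append((x_i - shift) * math.exp(-log_scale))
        log_det -= log_scale  # the Jacobian is triangular, so log-dets add up
    return z, log_det


def inverse(z: Sequence[float]) -> List[float]:
    """Noise -> Data. Patch i needs patches 0..i-1, so this loop is sequential."""
    x: List[float] = []
    for z_i in z:
        shift, log_scale = toy_conditioner(x)
        x.append(shift + math.exp(log_scale) * z_i)
    return x


patches = [0.3, -1.2, 0.8, 2.0]
noise, log_det = forward(patches)
recovered = inverse(noise)  # matches patches up to floating-point rounding
```

Note the asymmetry in this sketch: the forward pass could be computed for all patches at once because the data is already known, while the inverse has to produce patches one after another.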

STARFlow-V: Video Generation

STARFlow-V extends this to video. Video is harder because it requires temporal consistency: frames must follow logically from one another over time.

Spatiotemporal Attention

The transformer attends to both spatial patches (within a frame) and temporal patches (across frames).
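
One way to picture this is as an attention mask over a flattened sequence of patches. The layout below is an assumption for illustration, not a description of STARFlow-V's actual mask: tokens are ordered frame by frame, each token may attend to every patch in its own frame (spatial) and to every patch in earlier frames (temporal), but not to future frames. The function name block_causal_mask is hypothetical.

```python
from typing import List


def block_causal_mask(num_frames: int, patches_per_frame: int) -> List[List[bool]]:
    """Build a boolean mask where mask[i][j] is True if token i may attend to token j.

    Token k belongs to frame k // patches_per_frame.
    """
    n = num_frames * patches_per_frame
    return [
        [(j // patches_per_frame) <= (i // patches_per_frame) for j in range(n)]
        for i in range(n)
    ]


# Two frames with two patches each: tokens 0-1 are frame 0, tokens 2-3 are frame 1.
mask = block_causal_mask(num_frames=2, patches_per_frame=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In this sketch, the first two rows print as "xx.." (frame 0 sees only itself) and the last two as "xxxx" (frame 1 sees both frames).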

Efficiency

Because flows are non-iterative (or require fewer steps than diffusion), STARFlow-V can generate video faster than comparable diffusion models.

Performance vs. Diffusion

Quality

STARFlow achieves competitive FID (Fréchet Inception Distance) scores, rivaling state-of-the-art diffusion models.
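
For readers unfamiliar with the metric, FID compares the mean and covariance of feature vectors extracted from real and generated images: the squared distance between the two means, plus Tr(S_r + S_g - 2 (S_r S_g)^(1/2)) for the two covariances S_r and S_g. Lower is better. The sketch below is a deliberate simplification, assuming diagonal covariances so it needs no matrix square root. Real FID uses Inception network features and full covariance matrices, and the function name diagonal_fid is made up for this example.

```python
import math
from statistics import fmean, variance
from typing import Sequence


def diagonal_fid(real: Sequence[Sequence[float]],
                 generated: Sequence[Sequence[float]]) -> float:
    """Frechet distance between two feature sets, assuming diagonal covariances.

    Under that assumption the trace term reduces, per dimension, to
    (std_real - std_generated) ** 2.
    """
    dims = len(real[0])
    score = 0.0
    for d in range(dims):
        r = [row[d] for row in real]
        g = [row[d] for row in generated]
        mean_gap = fmean(r) - fmean(g)
        std_gap = math.sqrt(variance(r)) - math.sqrt(variance(g))
        score += mean_gap ** 2 + std_gap ** 2
    return score


# Tiny hand-made "features"; real evaluations use thousands of images.
real_features = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
shifted_features = [[a + 1.0, b] for a, b in real_features]
same = diagonal_fid(real_features, real_features)        # identical sets give 0
shifted = diagonal_fid(real_features, shifted_features)  # mean shifted by 1 in one dimension
```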

Speed

It offers a significant speedup in inference time due to the flow-based generation mechanism.

Conclusion

STARFlow represents a resurgence of Flow-based models. By empowering flows with the scaling capabilities of Transformers, we open a third path for high-fidelity generative AI that doesn't rely on the expensive iterative processes of diffusion.
