Stabilize large-model training by restricting weight updates to curated manifolds that align with desired behaviors and safety envelopes.
Stability gains and empirical observations
Training curves show reduced variance when attention weight matrices are constrained to orthogonal (Stiefel) manifolds, and this stability translates into faster convergence.
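One common way to realize such a constraint is a retraction: take an ordinary optimizer step, then map each constrained matrix back onto the Stiefel manifold (orthonormal columns), for example via a QR decomposition. The snippet below is a minimal PyTorch sketch; the training loop and the "attn" naming convention are illustrative assumptions, not the setup behind the curves above.

```python
import torch

@torch.no_grad()
def qr_retract(weight: torch.Tensor) -> None:
    """Retract a weight matrix (rows >= cols) back onto the Stiefel
    manifold of matrices with orthonormal columns, via QR."""
    q, r = torch.linalg.qr(weight)
    # Fix the sign ambiguity of QR so the retraction is deterministic.
    signs = torch.sign(torch.diagonal(r))
    signs[signs == 0] = 1.0
    weight.copy_(q * signs.unsqueeze(0))

def train_step(model, batch, loss_fn, optimizer):
    """Illustrative step: usual update, then retract attention matrices."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch["x"]), batch["y"])
    loss.backward()
    optimizer.step()
    for name, param in model.named_parameters():
        # Assumed naming convention for attention projection matrices.
        if "attn" in name and param.ndim == 2:
            qr_retract(param)
    return loss.item()
```

PyTorch also ships `torch.nn.utils.parametrizations.orthogonal`, which keeps a layer's weight orthogonal by construction rather than by post-step retraction.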
Constraining updates so that intermediate representations of earlier tasks are preserved can mitigate catastrophic forgetting during continual learning, because the optimizer then avoids directions that would erase prior knowledge.
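One mechanism with this effect is gradient projection in the spirit of orthogonal gradient descent (OGD): maintain an orthonormal basis of update directions that mattered for earlier tasks, and subtract those components from every new gradient. The sketch below assumes the basis is small enough to hold in memory and treats the flattened whole-model gradient as a single vector; both are simplifications.

```python
import torch

class OrthogonalProjector:
    """Project new gradients onto the orthogonal complement of a basis
    of directions that mattered for earlier tasks (OGD-style sketch)."""

    def __init__(self):
        self.basis: list[torch.Tensor] = []  # orthonormal, flattened

    @torch.no_grad()
    def add_direction(self, grad_vec: torch.Tensor) -> None:
        # Gram-Schmidt the new direction against the existing basis.
        v = grad_vec.clone()
        for b in self.basis:
            v -= (v @ b) * b
        norm = v.norm()
        if norm > 1e-8:
            self.basis.append(v / norm)

    @torch.no_grad()
    def project(self, grad_vec: torch.Tensor) -> torch.Tensor:
        # Remove components that would overwrite prior-task knowledge.
        for b in self.basis:
            grad_vec -= (grad_vec @ b) * b
        return grad_vec
```

In use, the flattened gradient is projected after `loss.backward()` and written back into the per-parameter `.grad` buffers before `optimizer.step()`, so the optimizer only ever sees the component orthogonal to the stored directions.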
Safety-focused manifolds can cap the amplification of risky behaviors by projecting out gradient components along directions associated with flagged behavioral patterns.
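Concretely, "removing gradient directions" can be read as a nullspace projection: collect directions that a probe or classifier associates with the flagged behavior, orthonormalize them, and subtract their span from each update before it is applied. Everything in the sketch below, including the random placeholder directions and the dimensionality, is hypothetical.

```python
import torch

@torch.no_grad()
def project_out_flagged(grad: torch.Tensor, flagged: torch.Tensor) -> torch.Tensor:
    """Remove the components of `grad` lying in the span of the flagged
    directions. `flagged` is (d, k) with orthonormal columns."""
    return grad - flagged @ (flagged.T @ grad)

# Orthonormalize raw flagged directions (e.g., probe weight vectors).
raw = torch.randn(1000, 3)          # placeholder for probe-derived directions
flagged, _ = torch.linalg.qr(raw)   # (d, k) with orthonormal columns

g = torch.randn(1000)               # placeholder gradient
g_safe = project_out_flagged(g, flagged)

# The projected gradient has no component along any flagged direction.
assert torch.allclose(flagged.T @ g_safe, torch.zeros(3), atol=1e-4)
```

Because the projection is linear, it composes cleanly with the other constraints above: the same gradient can be projected away from flagged directions and then retracted onto an orthogonal manifold in a single step.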