REER Reverse Reasoning Guide

REER treats finding a good reasoning trajectory as a search problem over a vast space of possible thoughts. The goal is to discover a step-by-step reasoning path (called z) that best "explains" a known high-quality output (y) for a given input (x). Quality is measured by perplexity: how surprised the language model is by y after reading x and z. Lower perplexity means the path makes y more probable, so the path is a more coherent explanation of it.
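To make the "surprise" measure concrete, here is a minimal sketch of how perplexity is computed from the per-token log-probabilities a language model assigns to y when conditioned on x and z. The log-probability values below are made up for illustration. In practice they would come from a model's scoring pass, which is not shown here.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a token sequence from its per-token natural-log probabilities.

    PPL = exp(-(1/T) * sum_t log p(y_t | x, z, y_<t))
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_logprob)


# Illustrative (made-up) log-probs of the same output y under two candidate paths z.
logprobs_weak_z = [math.log(0.2), math.log(0.1), math.log(0.25)]
logprobs_strong_z = [math.log(0.6), math.log(0.5), math.log(0.8)]

print(perplexity(logprobs_weak_z))    # higher: y is surprising after this path
print(perplexity(logprobs_strong_z))  # lower: this path makes y more predictable
```

Perplexity is the exponential of the average negative log-probability, so a path under which every token of y becomes more likely always yields a lower score.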

Key components:

Iterative Local Search: Start with a basic initial reasoning path. Split it into segments and refine each segment locally, for example by rewording it or adding a step, while checking whether the overall perplexity improves. Repeat until the path stops improving, relying on small, targeted changes rather than wholesale rewrites.
Perplexity-Guided Refinement: Perplexity acts as a compass. For each tweak, compute the perplexity of the output y given x and the updated path. Keep changes that lower perplexity (a better explanation) and discard those that raise it.
Data Curation: Collect real-world question-answer pairs (for example, writing prompts and their responses). Run the search on each pair to generate a reasoning trajectory. Filter the results for quality using techniques such as context setup and end-result checks. The outcome is a diverse dataset focused on creative domains such as literature and the arts.
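Iterative local search and perplexity-guided refinement can be combined into one short loop. Below is a minimal Python sketch, assuming a trajectory is a list of text segments and that two hypothetical helpers exist: `score(z)`, which would return the perplexity of y given x and z from a language model, and `propose(z, i)`, which would generate candidate rewrites of segment i (rewording it or adding a step). Neither helper is a published REER API, and the toy versions here exist only so the code runs. The loop is a plain greedy, first-improvement local search, not necessarily the exact procedure REER uses.

```python
from typing import Callable, List

Trajectory = List[str]  # a reasoning path z, split into segments


def local_search(
    z: Trajectory,
    score: Callable[[Trajectory], float],             # hypothetical: PPL(y | x, z)
    propose: Callable[[Trajectory, int], List[str]],  # hypothetical: rewrites of segment i
    max_iters: int = 10,
) -> Trajectory:
    best = list(z)  # copy so the caller's trajectory is not mutated
    best_ppl = score(best)
    for _ in range(max_iters):
        improved = False
        for i in range(len(best)):
            for candidate in propose(best, i):
                trial = best[:i] + [candidate] + best[i + 1:]
                trial_ppl = score(trial)
                if trial_ppl < best_ppl:  # keep only edits that lower perplexity
                    best, best_ppl = trial, trial_ppl
                    improved = True
        if not improved:  # no single-segment edit helps: stop
            break
    return best


# Toy stand-ins (hypothetical) so the sketch runs without a model.
TARGET_Y = "a lighthouse keeper finds a letter from her younger self"


def toy_score(z: Trajectory) -> float:
    # Pretend perplexity falls as the path mentions more words from y.
    overlap = len(set(" ".join(z).split()) & set(TARGET_Y.split()))
    return 10.0 / (1 + overlap)


def toy_propose(z: Trajectory, i: int) -> List[str]:
    # Candidate edits: extend segment i with one word from y (a crude "added step").
    return [z[i] + " " + w for w in TARGET_Y.split()]


initial_z = ["think about a story", "decide on an ending"]
refined_z = local_search(initial_z, toy_score, toy_propose)
print(toy_score(initial_z), "->", toy_score(refined_z))
```

Accepting the first improving edit keeps each round cheap. The trade-off is that, like any local search, the loop can settle in a local optimum, which is why the iteration cap is there as a safety bound.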

Formally, REER solves an optimization problem: find the trajectory z that minimizes the perplexity of y given x and z.
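Written out, the objective and the standard definition of perplexity look like this. Here |y| is the number of tokens in y, and p_θ denotes the language model used for scoring; the symbol θ is introduced only for notation.

```latex
z^{*} = \arg\min_{z} \; \mathrm{PPL}(y \mid x, z),
\qquad
\mathrm{PPL}(y \mid x, z) = \exp\!\Big(-\frac{1}{|y|} \sum_{t=1}^{|y|} \log p_{\theta}(y_t \mid x, z, y_{<t})\Big)
```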


REER as a Search Problem