REER Reverse Reasoning Guide

Introduction

REER, or Reverse-Engineered Reasoning, is a way to teach AI models to think deeply and step by step on open-ended tasks such as writing stories or essays. Unlike forward methods, which try to build a reasoning process from scratch, REER starts with a high-quality final answer and works backward to uncover the hidden thinking process that could have led to it. The result is a set of "reasoning trajectories" (detailed paths of thought) that can be used to train AI on creative, unstructured problems.

Why Forward Methods Fail

Forward methods, such as reinforcement learning (trial and error) or distillation (copying from a teacher model), struggle with open-ended tasks. Reinforcement learning needs a clear reward signal to guide improvement, but creative writing has no obvious "right" or "wrong" answer: there is no simple score for a poem or a story. Distillation depends on a much stronger teacher model, and querying it at scale is expensive. Without strong guidance, these approaches cannot reliably explore the vast space of possible ideas, so they often produce shallow or arbitrary outputs.

How Backward Synthesis Works

Backward synthesis flips the process: instead of guessing thoughts to reach an answer, you start with a known good answer (like a well-written essay) and reverse-engineer the reasoning that might explain it. Using computation, you iteratively build a chain of thoughts (planning, exploring ideas, and self-correcting) that makes the final answer feel logical and natural when the AI reads the chain before the answer. The search is gradient-free: it only scores candidate chains with an existing model and never updates the model's weights, which makes it efficient and scalable as a way to create training data.

REER as a Search Problem

REER treats finding a good reasoning trajectory as a search problem in a huge space of possible thoughts. The goal is to discover a step-by-step path (called z) that best "explains" a high-quality output (y) for a given input (x). Quality is measured by perplexity—how surprised the AI is by y after following z. Lower perplexity means the path makes y seem more probable and coherent.
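To make the perplexity score concrete, here is a minimal Python sketch. It assumes the model can report a natural-log probability for each token of y; the function name `perplexity` and the probability values are made up for illustration and do not come from any particular library.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a sequence, given natural-log probabilities of its tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_neg_logprob = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_neg_logprob)

# Hypothetical values: how likely the model finds each of four tokens of y.
without_z = [math.log(0.1)] * 4  # no reasoning path: each token gets p = 0.1
with_z = [math.log(0.5)] * 4     # after a good path z: each token gets p = 0.5

print(perplexity(without_z))  # roughly 10: y is surprising
print(perplexity(with_z))     # roughly 2: y is much less surprising
```

A path z that raises the probability of every token of y lowers the perplexity, which is the signal the search uses.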

Key components:

Iterative Local Search: Start with a basic initial thought path. Then, break it into segments and refine each one locally—tweak words or add steps—while checking if the overall perplexity improves. Repeat until the path is strong, guided by small, targeted changes to avoid getting stuck.
Perplexity-Guided Refinement: Perplexity acts as a compass. For each tweak, compute how well the updated path predicts the final answer. Keep changes that lower perplexity (better explanation) and discard those that raise it.
Data Curation: Collect real-world question-answer pairs (e.g., writing prompts and responses). Run the search on them to generate reasoning trajectories. Filter for quality using techniques like context setup and end-result checks, resulting in a diverse dataset focused on creative areas like literature and arts.

Formally, REER solves an optimization problem: find the trajectory z that minimizes the perplexity of y given x and z.
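Written out, and assuming the usual token-level definition of perplexity, the objective might look like the following. The symbols z*, p (the model's probability), |y| (the number of tokens in y), and y_t (the t-th token) are notation introduced here for illustration.

```latex
z^{*} = \arg\min_{z} \; \mathrm{PPL}(y \mid x, z),
\qquad
\mathrm{PPL}(y \mid x, z)
  = \exp\!\left( -\frac{1}{|y|} \sum_{t=1}^{|y|} \log p\left(y_t \mid x, z, y_{<t}\right) \right)
```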

Steps of REER

1. **Gather Data**: Collect input-output pairs (e.g., a writing prompt and a great response).
2. **Initialize**: Create a simple starting trajectory, like a one-sentence plan.
3. **Refine Iteratively**: Divide the trajectory into parts. For each part, generate variations and score them by perplexity. Replace with the best version if it improves the whole path's score.
4. **Filter and Curate**: Discard low-quality paths (e.g., those not ending coherently). Add diversity by covering genres like stories or essays.
5. **Use for Training**: Feed these trajectories into AI models to teach deep thinking patterns.
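Steps 2 and 3 above can be sketched as a small loop. This is an illustration under stated assumptions, not the method's actual implementation: `toy_perplexity` is a word-overlap stand-in for a real language-model score, and `propose` returns hand-written candidates where a real system would ask a model to rewrite each segment. All names and example strings are hypothetical.

```python
import math

def toy_perplexity(x, z_segments, y):
    """Toy stand-in for PPL(y | x, z). A real setup would use model log-probs.
    Assumption: a word of y gets p = 0.5 if it appears in x or z, else 0.05."""
    context = set((x + " " + " ".join(z_segments)).lower().split())
    logprobs = [math.log(0.5 if w in context else 0.05) for w in y.lower().split()]
    return math.exp(-sum(logprobs) / len(logprobs))

def propose(segment_index):
    """Hypothetical candidate rewrites per segment (an LLM would write these)."""
    pool = {
        0: ["Plan the plot: explorer lost in the jungle", "Think about space travel"],
        1: ["Build tension with isolation and clues"],
        2: ["Conclude with reflection on what discovery means"],
    }
    return pool.get(segment_index, [])

def local_search(x, y, z_segments, score, propose, max_rounds=3):
    """Refine one segment at a time; keep a change only if the score drops."""
    z = list(z_segments)
    best = score(x, z, y)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(z)):
            for candidate in propose(i):
                trial = z[:i] + [candidate] + z[i + 1:]
                trial_score = score(x, trial, y)
                if trial_score < best:  # strictly better explanation of y
                    z, best, improved = trial, trial_score, True
        if not improved:  # no segment could be improved: stop early
            break
    return z, best

x = "Write a short story about a lost explorer"
y = "the explorer lost in the jungle found clues and reflection"
z0 = ["Think of a setting", "Add a hero", "End with treasure"]

initial_score = toy_perplexity(x, z0, y)
z_final, final_score = local_search(x, y, z0, toy_perplexity, propose)
print(initial_score, final_score)
print(z_final)
```

The off-topic candidate ("Think about space travel") is rejected because it makes y less predictable, while the three on-topic rewrites are kept. Swapping in a real model-based scorer would change the numbers but not the accept-or-reject logic.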

Example

Imagine the task: "Write a short story about a lost explorer." A high-quality output y is a vivid tale of adventure and discovery.

Initial Trajectory z: "Think of a jungle setting. Add a hero. End with finding treasure."
Refinement: Perplexity is high (AI finds y surprising). Tweak the first segment: "Plan the plot: Explorer gets lost in Amazon, faces dangers like rivers and animals. Hmm... Alternatively, include a mysterious guide." Recheck perplexity—lower now, as it better leads to y's details.
Final z: "Outline key events: Start with excitement of expedition. Build tension with isolation and clues. Explore alternatives: What if the treasure is knowledge, not gold? Self-correct: Wait, that's too vague—focus on emotional growth. Conclude with reflection." This path now makes y feel like a natural outcome.

Benefits

REER is efficient because it's gradient-free and uses existing good outputs as anchors, avoiding endless trial-and-error. It produces diverse, high-quality reasoning data tailored to creative tasks, helping AI learn planning, alternative exploration, and self-correction. This leads to more thoughtful, human-like outputs without needing massive resources or perfect rewards, making deep reasoning accessible for open-ended generation.
