A framework that uses dual learning and self-supervised feedback to improve model alignment without expensive manual annotation.
Recent experiments have shown that self-supervised methods can match the performance of models trained on thousands of human-labeled samples. For instance, a model trained with self-generated feedback on a coding dataset substantially improved its pass@1 rate on HumanEval without seeing a single human-ranked pair during the alignment phase.
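For readers unfamiliar with the metric, pass@1 is the k = 1 case of pass@k: the probability that at least one of k sampled completions passes a problem's unit tests. The sketch below uses the unbiased estimator commonly reported alongside HumanEval, 1 - C(n - c, k) / C(n, k) for n samples of which c are correct. The text does not say how the experiments computed the score, so this is an illustration under that assumption; the function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: completions sampled for the problem
    c: completions that passed all unit tests
    k: evaluation budget (k = 1 gives pass@1)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, so pass@1 is simply the fraction of sampled completions that pass, averaged over problems. A benchmark-level score would come from calling `mean_pass_at_k` with one `(n, c)` pair per problem.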