Inner Alignment: The new problem introduced by this work - ensuring that a mesa-optimizer's internal objective (the mesa-objective) aligns with the base objective it was trained under.
This leads to pseudo or fake alignment problem.
paper link: https://arxiv.org/pdf/1906.01820