RENT is an unsupervised method for training reasoning LLMs by minimizing entropy. We demonstrate on a variety of datasets and models that RENT improves model performance without using any ground truth ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results