Clip-low increases entropy and clip-high decreases entropy in reinforcement learning of large language models (submitted)

Jaesung R. Park, Junsu Kim, Gyeongman Kim, Jinyoung Jo, Sean Choi, Jaewoong Cho, Ernest K. Ryu

September 2025
