CyberAgentAILab/filtered-dpo

Introducing Filtered Direct Preference Optimization (fDPO), which enhances language model alignment with human preferences by discarding preference-data samples whose quality is lower than that of samples generated by the learning model.
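The filtering idea in the description can be sketched as follows. This is a minimal illustration of the concept only, assuming access to a per-response quality score; the function names and the toy length-based quality proxy are hypothetical, not the repository's actual API.

```python
# Sketch of the fDPO-style filtering step (hypothetical names; the
# quality proxy below is a stand-in, not the repository's method).

def filter_preference_data(dataset, generate, quality):
    """Keep only pairs whose chosen response scores at least as high
    as a fresh sample from the learning model for the same prompt."""
    kept = []
    for prompt, chosen, rejected in dataset:
        model_sample = generate(prompt)
        # Discard pairs whose chosen response is lower-quality than
        # what the model itself generates for the same prompt.
        if quality(prompt, chosen) >= quality(prompt, model_sample):
            kept.append((prompt, chosen, rejected))
    return kept

# Toy demonstration with a length-based quality proxy (hypothetical).
data = [
    ("q1", "a detailed answer", "bad"),
    ("q2", "ok", "bad"),
]
sample = {"q1": "short", "q2": "a much longer model answer"}.get
score = lambda prompt, response: len(response)

print(filter_preference_data(data, sample, score))
# → [('q1', 'a detailed answer', 'bad')]
```

Here the second pair is dropped because its chosen response scores below the model's own sample, matching the "discard lower-quality samples" behavior described above.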

Stars: 16 | Language: Jupyter Notebook