krıtıve

Sambhav Shrestha (sambhavshrestha) · 2h ago
Argument-Quality-Ranking

I evaluated how well several LLMs rank arguments by quality and built my own RoBERTa model, which reaches accuracy similar to GPT-5.5.

Just completed the final project for my grad NLP class. I ran an evaluation to see how good LLMs are at telling whether an argument is actually good or bad. Given two arguments on the same topic, the model predicts which one is higher quality. I trained and evaluated on argument pairs spanning multiple difficulty levels, and benchmarked against GPT-5.5, Llama 3, and Mistral to understand where small fine-tuned models stand relative to frontier LLMs on this task.

Results: RoBERTa v3 nearly matches GPT-5.5 on this task (0.657 vs 0.665). A 125M-parameter model fine-tuned locally is competitive with a frontier API model.

| Model | Link |
|---|---|
| RoBERTa v3 (best) | SambhavSBU/argument-quality-roberta-v3 |
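For anyone curious how a pairwise score like the ones above can be computed, here is a minimal sketch. It is not the project's actual evaluation code: the `ArgumentPair` class, the `pairwise_accuracy` helper, the word-count baseline, and the four toy pairs are all made up for illustration, and a real run would plug in the fine-tuned RoBERTa model or an LLM call as the `predict` function.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ArgumentPair:
    """One evaluation item (hypothetical schema, not the project's dataset format)."""
    topic: str
    argument_a: str
    argument_b: str
    label: str  # "a" or "b": which argument was judged higher quality


def pairwise_accuracy(
    pairs: Iterable[ArgumentPair],
    predict: Callable[[ArgumentPair], str],
) -> float:
    """Fraction of pairs where predict() picks the same argument as the label."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair to compute accuracy")
    correct = sum(1 for pair in pairs if predict(pair) == pair.label)
    return correct / len(pairs)


def longer_argument_baseline(pair: ArgumentPair) -> str:
    """Toy stand-in for a model: prefer the argument with more words."""
    words_a = len(pair.argument_a.split())
    words_b = len(pair.argument_b.split())
    return "a" if words_a >= words_b else "b"


# Invented toy pairs, only here to exercise the function.
toy_pairs = [
    ArgumentPair(
        "school uniforms",
        "Uniforms reduce visible income differences, which studies link to less bullying.",
        "Uniforms are just better.",
        "a",
    ),
    ArgumentPair(
        "remote work",
        "Offices are outdated.",
        "Remote work cuts commuting time, which employees can spend on rest or family.",
        "b",
    ),
    ArgumentPair(
        "nuclear power",
        "Nuclear plants emit little carbon during operation.",
        "Everyone knows nuclear is scary and bad and dangerous and wrong and terrible.",
        "a",  # the shorter argument is the better one, so the baseline misses it
    ),
    ArgumentPair(
        "public transit",
        "Buses are fine.",
        "Frequent buses let people reach jobs without owning a car.",
        "b",
    ),
]

accuracy = pairwise_accuracy(toy_pairs, longer_argument_baseline)
print(f"baseline accuracy: {accuracy:.3f}")
```

Keeping `predict` as a plain function means the same loop can score a fine-tuned classifier, a prompted LLM, or a trivial baseline like this one, which makes side-by-side comparisons straightforward.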

Python, llms