FrontierLab Publications

2025

RAIDEN Benchmark: Evaluating Role-playing Conversational Agents with Measurement-Driven Custom Dialogues

Wu et al. | COLING 2025