About

Dialogue is notoriously hard to evaluate automatically. Past approaches have relied on human evaluation.

Benchmarks

You can find evaluation results in the subtasks. You can also submit evaluation metrics for this task.

Subtasks

Greatest papers with code