3 dataset results for Decision Making AND Texts AND English

LogiQA

LogiQA consists of 8,678 QA instances covering multiple types of deductive reasoning. Results show that state-of-the-art neural models perform far worse than the human ceiling. The dataset can also serve as a benchmark for reinvestigating logical AI in the deep learning NLP setting.

71 PAPERS • NO BENCHMARKS YET

PeerRead

PeerRead is a dataset of scientific peer reviews. The dataset consists of over 14K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR, as well as over 10K textual peer reviews written by experts for a subset of the papers.

32 PAPERS • NO BENCHMARKS YET

Evidence Inference

Evidence Inference is a corpus comprising 10,000+ prompts coupled with full-text articles describing randomized controlled trials (RCTs).

25 PAPERS • NO BENCHMARKS YET
