This resource contains 10.5 million paragraphs with associated statement labels, stored as one paragraph per file with one sentence per line. Each file is placed in a subdirectory named after its annotated class. The statements were extracted from author-annotated environments, and only the first paragraph immediately following each heading was selected. Headings include both structural sections (e.g. Introduction) and scholarly statement annotations (e.g. Definition, Proof, Remark).
The annotated statement dataset is derived from arXMLiv, a machine-readable HTML5 representation of the arXiv corpus of scientific articles.
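The on-disk layout described above (one subdirectory per class, one paragraph per file, one sentence per line) can be read with a short loop. The following is a minimal sketch, not an official loader: the function name `iter_paragraphs` and the root directory name are hypothetical, and the `.txt` extension is assumed from the example paths shown in this description.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_paragraphs(root: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (label, sentences) for every paragraph file under `root`.

    Assumes the layout <root>/<class name>/<file>.txt, where the
    subdirectory name is the statement label and each non-empty line
    of the file is one sentence.
    """
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files at the top level
        for path in sorted(class_dir.glob("*.txt")):
            with path.open(encoding="utf-8") as handle:
                sentences = [line.strip() for line in handle if line.strip()]
            yield class_dir.name, sentences


# Hypothetical usage, assuming the archive was unpacked into "statements/":
#   for label, sentences in iter_paragraphs("statements"):
#       print(label, len(sentences))
```

Because the function is a generator, it reads one file at a time, which matters for a collection of this size.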
Definition with math lexemes (main data, single sentence, linebreaks for readability):
a directed quantum turing automaton is a quadruple
italic_T RELOP_equals OPEN_( caligraphic_H PUNCT_, caligraphic_K PUNCT_, caligraphic_L PUNCT_, italic_tau CLOSE_) PUNCT_,
where
caligraphic_H caligraphic_K and caligraphic_L
are finite dimensional hilbert spaces over the complex field blackboard_C and
italic_tau METARELOP_colon caligraphic_H MULOP_tensor_product caligraphic_K ARROW_rightarrow
caligraphic_H MULOP_tensor_product caligraphic_L
is an isometry in fdhilb
source: definition/1e4a1aea317bbf363c5314fb25eaf72c8a350a1007bb8aafc542e188405b93d5.txt
Same definition without math lexemes (nomath data, single sentence, linebreaks for readability):
a directed quantum turing automaton is a quadruple
where and are finite dimensional hilbert spaces over the complex field and
is an isometry in fdhilb
nomath source: definition/35b170bae4259a5c430846116142d4e4a45097e52daf818b78ea378d94d14a21.txt
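In the example above, the nomath text is what remains of the main text once the math lexemes are dropped. The sketch below reproduces that relationship with a simple heuristic. It assumes that every math lexeme contains an underscore (as in `italic_T`, `RELOP_equals`, `PUNCT_,`) and that plain words do not. That holds for this example but is not a documented property of the dataset, and it is not necessarily how the nomath files were produced. The function name `strip_math_lexemes` is hypothetical.

```python
def strip_math_lexemes(sentence: str) -> str:
    """Drop whitespace-separated tokens that look like math lexemes.

    Heuristic assumption: a token is a math lexeme if and only if it
    contains an underscore. A plain word with an underscore would also
    be dropped.
    """
    return " ".join(tok for tok in sentence.split() if "_" not in tok)


# Part of the definition shown above, taken from the main (math) data.
with_math = (
    "where caligraphic_H caligraphic_K and caligraphic_L "
    "are finite dimensional hilbert spaces over the complex field blackboard_C and"
)
without_math = strip_math_lexemes(with_math)
print(without_math)
# prints: where and are finite dimensional hilbert spaces over the complex field and
```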