im2latex-100k

Introduced by Deng et al. in Image-to-Markup Generation with Coarse-to-Fine Attention

A prebuilt dataset for OpenAI's image-to-LaTeX task. It contains about 100k formulas and their rendered images, split into train, validation and test sets. The formulas were parsed from LaTeX sources provided at http://www.cs.cornell.edu/projects/kddcup/datasets.html (originally from arXiv).

Each image is a PNG of fixed size. The formula is rendered in black and the rest of the image is transparent.

For related tools (e.g. a tokenizer), see https://github.com/Miffyli/im2latex-dataset. For pre-made evaluation scripts and a built im2latex system, see https://github.com/harvardnlp/im2markup.

Newlines in formulas_im2latex.lst are UNIX-style (\n). Reading the file with any other newline convention yields a slightly wrong line count (104563 instead of 103558) and thus breaks the structure used by this dataset. By default, Python 3 opens text files in universal-newlines mode, which also treats \r and \r\n as line breaks. To avoid this, open the file with newline="\n" (e.g. open("formulas_im2latex.lst", newline="\n")).
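The newline caveat above can be sketched as a small loader. This is a minimal sketch, not an official loader for the dataset: the function name read_formulas is hypothetical, and the latin-1 default encoding is an assumption chosen only because it never fails to decode (adjust it if you know the file's actual encoding).

```python
def read_formulas(path, encoding="latin-1"):
    """Return the formulas in `path`, one per line, splitting only on "\n".

    `read_formulas` is a hypothetical helper; the latin-1 default is an
    assumption, picked because it can decode any byte sequence.
    """
    # newline="\n" switches off universal-newlines handling, so a stray
    # "\r" inside a formula stays part of that line instead of starting
    # a new one.
    with open(path, newline="\n", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Usage (assumes the file has been downloaded next to this script):
# formulas = read_formulas("formulas_im2latex.lst")
# print(len(formulas))  # the dataset description states 103558 lines
```

Keeping the line count intact matters because a shifted count breaks the structure the dataset relies on, so a quick check of len(formulas) against the documented 103558 is a cheap sanity test after loading.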

Source: https://zenodo.org/record/56198#.YJjuCGZKgox

Benchmarks

Task	Dataset Variant	Best Model
Optical Character Recognition (OCR)	im2latex-100k	I2L-STRIPS

Tasks

Optical Character Recognition (OCR)

Similar Datasets

DDI-100

I2L-140K

Im2latex-90k

NText

Source: https://arxiv.org/pdf/1609.04938v2.pdf.

License

CC0 1.0 Universal
