InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

nttmdlab-nlp/instructdoc 24 Jan 2024

We study the problem of completing various visual document understanding (VDU) tasks, e.g., question answering and information extraction, on real-world documents through human-written instructions.

document understanding Question Answering +1

112
0.21 stars / hour

Neuro-GPT: Towards A Foundation Model for EEG

wenhui0206/neurogpt 7 Nov 2023

To handle the scarcity and heterogeneity of electroencephalography (EEG) data for Brain-Computer Interface (BCI) tasks, and to harness the power of large publicly available data sets, we propose Neuro-GPT, a foundation model consisting of an EEG encoder and a GPT model.

EEG Motor Imagery

47
0.21 stars / hour

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

hiyouga/llama-efficient-tuning 4 Aug 2023

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.

Abstractive Text Summarization Language Modelling +5

20,913
0.21 stars / hour

Mixed Continuous and Categorical Flow Matching for 3D De Novo Molecule Generation

dunni3/flowmol 30 Apr 2024

In this work, we explore the use of flow matching, a recently proposed generative modeling framework that generalizes diffusion models, for the task of de novo molecule generation.

3D Molecule Generation

17
0.20 stars / hour

Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction

adbar/trafilatura ACL 2021

The tool performs significantly better than other open-source solutions in this evaluation and in external benchmarks.

2,857
0.20 stars / hour

Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging

eth-siplab/ultrainertialposer 30 Apr 2024

Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation.

Pose Estimation

16
0.20 stars / hour

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

hustvl/vim 17 Jan 2024

The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images and it has great potential to be the next-generation backbone for vision foundation models.

object-detection Object Detection +3

2,172
0.19 stars / hour

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

ironjr/streammultidiffusion 14 Mar 2024

The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing.

Text-to-Image Generation

433
0.18 stars / hour

ReFT: Representation Finetuning for Language Models

stanfordnlp/pyreft 4 Apr 2024

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning

649
0.18 stars / hour

Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees

atong01/conditional-flow-matching 18 Sep 2023

Through empirical evaluation across the benchmark, we demonstrate that our approach outperforms deep-learning generation methods in data generation tasks and remains competitive in data imputation.

Imputation

746
0.18 stars / hour