To the best of our knowledge, emotion2vec is the first universal representation model for a wide range of emotion-related tasks, filling a gap in the field.
2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or a single-view image by leveraging multi-view diffusion models.
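One plausible reading of "asymmetric" here is a U-Net with more downsampling than upsampling stages, so the multi-view input maps to a coarser feature grid. The sketch below assumes that reading; the module names, channel counts, and 14-channel output are illustrative and not the paper's actual architecture.

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.SiLU())

class AsymmetricUNet(nn.Module):
    """Two downsampling stages but only one upsampling stage, so the
    output feature map is half the input resolution (hence asymmetric)."""
    def __init__(self, in_ch=3, out_ch=14):
        super().__init__()
        self.down1 = block(in_ch, 64)
        self.down2 = block(64, 128)
        self.down3 = block(128, 256)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.fuse = block(128 + 128, 128)      # skip connection from down2
        self.head = nn.Conv2d(128, out_ch, 1)

    def forward(self, x):                      # x: (B * views, 3, H, W)
        d1 = self.down1(x)                     # H
        d2 = self.down2(self.pool(d1))         # H / 2
        d3 = self.down3(self.pool(d2))         # H / 4
        u = self.up(d3)                        # back up to H / 2 only
        u = self.fuse(torch.cat([u, d2], 1))
        return self.head(u)                    # coarser than the input

views = torch.randn(4 * 6, 3, 256, 256)        # e.g., 6 views per object
features = AsymmetricUNet()(views)             # (24, 14, 128, 128)
```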
We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.
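As a concrete illustration of such a strategy, the sketch below drops a contiguous block of decoder layers from a Hugging Face Llama-style model, assuming the usual `model.model.layers` module layout; the checkpoint name and the pruned indices are illustrative, and real strategies typically select the block whose input and output representations are most similar.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

def prune_layers(model, start: int, n_drop: int):
    """Remove n_drop consecutive decoder blocks beginning at index start."""
    layers = model.model.layers  # nn.ModuleList of decoder blocks
    kept = nn.ModuleList(
        layer for i, layer in enumerate(layers)
        if not (start <= i < start + n_drop)
    )
    model.model.layers = kept
    model.config.num_hidden_layers = len(kept)
    return model

# Drop 8 of the 32 blocks of a 7B model (illustrative indices).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = prune_layers(model, start=20, n_drop=8)
```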
We summarize advances in GR with respect to model training, document identifiers, incremental learning, downstream-task adaptation, multi-modal GR, and generative recommendation, as well as progress in reliable response generation in terms of internal knowledge memorization, external knowledge augmentation, response generation with citations, and personal information assistants.
Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment.
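For reference, the core dDPO update reduces to the standard DPO objective computed on teacher-ranked pairs. A minimal sketch, assuming per-sequence log-probabilities from the policy and a frozen reference model are already available; tensor names and beta are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))) on log-probs."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy batch of summed per-sequence log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
```

Here beta controls how far the policy may drift from the reference model.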
This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long-sequence modeling), medicine (including genomics), chemistry (e.g., drug design), recommendation systems, and time-series analysis, including tabular data.
We study recent research advances that improve large language models through efficient pre-training and scaling, as well as through open datasets and tools.
These forecasters leverage Transformers to model the global dependencies over temporal tokens of time series, with each token formed by multiple variates of the same timestamp.
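A minimal sketch of this temporal tokenization, assuming a multivariate series of shape (batch, time, variates); the dimensions, the linear embedding, and the encoder settings are illustrative:

```python
import torch
import torch.nn as nn

batch, time_steps, num_variates, d_model = 32, 96, 7, 512
series = torch.randn(batch, time_steps, num_variates)

# Each timestamp's vector of variates becomes one temporal token.
embed = nn.Linear(num_variates, d_model)
tokens = embed(series)                                  # (batch, time, d_model)

encoder = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
out = encoder(tokens)   # self-attention models dependencies over temporal tokens
```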
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools.
On top of PagedAttention, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of the KV cache within and across requests to further reduce memory usage.
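A toy sketch of block-based KV cache bookkeeping in this spirit, assuming fixed-size blocks and a per-request block table; all names and the block size are illustrative, not vLLM's actual API:

```python
BLOCK_SIZE = 16  # tokens stored per physical KV block

class BlockAllocator:
    """Pool of fixed-size physical KV cache blocks."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        return self.free.pop()            # grab any free physical block

    def release(self, blocks):
        self.free.extend(blocks)          # reclaim when a request finishes

class Request:
    """Tracks one sequence's logical-to-physical block mapping."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []             # logical index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # A new block is allocated only when the last one fills up, so cache
        # memory grows in BLOCK_SIZE steps and internal waste stays near zero.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1
```

Sharing across requests (e.g., a common prompt prefix) then amounts to pointing several block tables at the same physical blocks.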