Latest Research

PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

magic-research/PLLaVA • arXiv 2024

PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks.

Ranked #1 on Zero-Shot Video Question Answer on TGIF-QA

Tasks: Video-based Generative Performance Benchmarking (Consistency), Video-based Generative Performance Benchmarking (Contextual Understanding), and 4 more

26 Apr 2024

History repeats itself: A Baseline for Temporal Knowledge Graph Forecasting

nec-research/recurrency_baseline_tkg • 25 Apr 2024

Temporal Knowledge Graph (TKG) Forecasting aims at predicting links in Knowledge Graphs for future timesteps based on a history of Knowledge Graphs.

25 Apr 2024

Vision-based robot manipulation of transparent liquid containers in a laboratory setting

danischober/labliquidvision • 25 Apr 2024

Laboratory processes involving small volumes of solutions and active ingredients are often performed manually due to challenges in automation, such as high initial costs, semi-structured environments and protocol variability.

25 Apr 2024

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

x-plug/mplug-docowl • 25 Apr 2024

Charts are important for presenting and explaining complex data relationships.

889

25 Apr 2024

Interpreting Answers to Yes-No Questions in Dialogues from Multiple Domains

wang-zijie/yn-question-multi-domains • 25 Apr 2024

People often answer yes-no questions without explicitly saying yes, no, or similar polar keywords.

25 Apr 2024

AAPL: Adding Attributes to Prompt Learning for Vision-Language Models

Gahyeonkim09/AAPL • 25 Apr 2024

Through our experiments, we have identified important issues in CoOp and CoCoOp: the context learned through traditional image augmentation is biased toward seen classes, negatively impacting generalization to unseen classes.

25 Apr 2024

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

ailab-cvc/seed-bench • 25 Apr 2024

We hope that our work can serve as a valuable addition to existing MLLM benchmarks, providing insightful observations and inspiring further research in the area of text-rich visual comprehension with MLLMs.

239

25 Apr 2024

Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation

hfates/ikr-net • 25 Apr 2024

Yet, there is a gap in the literature to provide a well-generalized deep learning-based solution that performs well on images with unknown and highly complex degradations.

25 Apr 2024

SwarmRL: Building the Future of Smart Active Systems

swarmrl/swarmrl • 25 Apr 2024

This work introduces SwarmRL, a Python package designed to study intelligent active particles.

25 Apr 2024

Continual Learning of Large Language Models: A Comprehensive Survey

beyonderxx/trace • 25 Apr 2024

In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL.

25 Apr 2024
