PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks.
Ranked #1 on Zero-Shot Video Question Answer on TGIF-QA
Video-based Generative Performance Benchmarking (Consistency) Video-based Generative Performance Benchmarking (Contextual Understanding) +4
Temporal Knowledge Graph (TKG) Forecasting aims at predicting links in Knowledge Graphs for future timesteps based on a history of Knowledge Graphs.
Laboratory processes involving small volumes of solutions and active ingredients are often performed manually due to challenges in automation, such as high initial costs, semi-structured environments and protocol variability.
Charts are important for presenting and explaining complex data relationships.
People often answer yes-no questions without explicitly saying yes, no, or similar polar keywords.
Through our experiments, we have identified important issues in CoOp and CoCoOp: the context learned through traditional image augmentation is biased toward seen classes, negatively impacting generalization to unseen classes.
We hope that our work can serve as a valuable addition to existing MLLM benchmarks, providing insightful observations and inspiring further research in the area of text-rich visual comprehension with MLLMs.
Yet, there is a gap in the literature to provide a well-generalized deep learning-based solution that performs well on images with unknown and highly complex degradations.
This work introduces SwarmRL, a Python package designed to study intelligent active particles.
In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL.