Search Results for author: Dan Guo

Found 26 papers, 15 papers with code

A Label-Aware Autoregressive Framework for Cross-Domain NER

1 code implementation Findings (NAACL) 2022 Jinpeng Hu, He Zhao, Dan Guo, Xiang Wan, Tsung-Hui Chang

In doing so, label information contained in the embedding vectors can be effectively transferred to the target domain, and the Bi-LSTM can further model the label relationship among different domains through a pre-train-then-fine-tune setting.

Cross-Domain Named Entity Recognition named-entity-recognition +2

Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding

1 code implementation 21 Mar 2024 Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Xiaojun Chang, Meng Wang

Inspired by the activity-silent and persistent activity mechanisms in human visual perception biology, we design a Unified Static and Dynamic Network (UniSDNet), to learn the semantic association between the video and text/audio queries in a cross-modal environment for efficient video grounding.

Video Grounding

Training A Small Emotional Vision Language Model for Visual Art Comprehension

1 code implementation 17 Mar 2024 Jing Zhang, Liang Zheng, Dan Guo, Meng Wang

This paper develops small vision language models to understand visual art, which, given an art work, aims to identify its emotion category and explain this prediction with natural language.

Language Modelling

Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture

1 code implementation 12 Mar 2024 Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Meng Wang

To this end, we present FD4MM, a new paradigm of Frequency Decoupling for Motion Magnification with a Multi-level Isomorphic Architecture to capture multi-level high-frequency details and a stable low-frequency structure (motion field) in video space.

Motion Magnification Representation Learning

Benchmarking Micro-action Recognition: Dataset, Methods, and Applications

1 code implementation 8 Mar 2024 Dan Guo, Kun Li, Bin Hu, Yan Zhang, Meng Wang

It offers insights into the feelings and intentions of individuals and is important for human-oriented applications such as emotion recognition and psychological assessment.

Action Recognition Benchmarking +1

EulerMormer: Robust Eulerian Motion Magnification via Dynamic Filtering within Transformer

1 code implementation 7 Dec 2023 Fei Wang, Dan Guo, Kun Li, Meng Wang

Then, we introduce a novel dynamic filter that eliminates noise cues and preserves critical features in the motion magnification and amplification generation phases.

Denoising Motion Magnification

Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA

no code implementations 13 Oct 2023 Sheng Zhou, Dan Guo, Jia Li, Xun Yang, Meng Wang

The associations between these repetitive objects are superfluous for answer reasoning; (2) two spatially distant OCR tokens detected in the image frequently have weak semantic dependencies for answer reasoning; and (3) the co-existence of nearby objects and tokens may be indicative of important visual cues for predicting answers.

Graph Learning Object +5

Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding

no code implementations 12 Sep 2023 Jiaxiu Li, Kun Li, Jia Li, Guoliang Chen, Dan Guo, Meng Wang

Compared with the general video grounding task, MTVG focuses on meticulous actions and changes on the face.

Sentence text similarity +1

Exploiting Diverse Feature for Multimodal Sentiment Analysis

no code implementations 25 Aug 2023 Jia Li, Wei Qian, Kun Li, Qi Li, Dan Guo, Meng Wang

Specifically, we achieve the results of 0.8492 and 0.8439 for MuSe-Personalisation in terms of arousal and valence CCC.

Multimodal Sentiment Analysis

Dual-path TokenLearner for Remote Photoplethysmography-based Physiological Measurement with Facial Videos

1 code implementation 15 Aug 2023 Wei Qian, Dan Guo, Kun Li, Xilan Tian, Meng Wang

Specifically, the proposed Dual-TL uses a Spatial TokenLearner (S-TL) to explore associations in different facial ROIs, which promises the rPPG prediction far away from noisy ROI disturbances.

ViGT: Proposal-free Video Grounding with Learnable Token in Transformer

no code implementations 11 Aug 2023 Kun Li, Dan Guo, Meng Wang

First, we employed a sharing feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality.

Feature Correlation regression +1

M&M: Tackling False Positives in Mammography with a Multi-view and Multi-instance Learning Sparse Detector

no code implementations 11 Aug 2023 Yen Nhi Truong Vu, Dan Guo, Ahmed Taha, Jason Su, Thomas Paul Matthews

Deep-learning-based object detection methods show promise for improving screening mammography, but high rates of false positives can hinder their effectiveness in clinical practice.

Object Detection

Data Augmentation for Human Behavior Analysis in Multi-Person Conversations

no code implementations 3 Aug 2023 Kun Li, Dan Guo, Guoliang Chen, Feiyang Liu, Meng Wang

In this paper, we present the solution of our team HFUT-VUT for the MultiMediate Grand Challenge 2023 at ACM Multimedia 2023.

Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification

1 code implementation 20 Jul 2023 Kun Li, Dan Guo, Guoliang Chen, Xinge Peng, Meng Wang

In this paper, we briefly introduce the solution of our team HFUT-VUT for the Micro-gesture Classification in the MiGA challenge at IJCAI 2023.

Action Classification Classification +2

Improving Audio-Visual Video Parsing with Pseudo Visual Labels

no code implementations 4 Mar 2023 Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang

We perform extensive experiments on the LLP dataset and demonstrate that our method can generate high-quality segment-level pseudo labels with the help of our newly proposed loss and the label denoising strategy.

Denoising Pseudo Label

Audio-Visual Segmentation with Semantics

1 code implementation 30 Jan 2023 Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation Semantic Segmentation +1

Global Temporal Difference Network for Action Recognition

no code implementations TMM 2022 Zhao Xie, Jiansong Chen, Kewei Wu, Dan Guo, Richang Hong

In the global aggregation module, the global prior knowledge is learned by aggregating the visual feature sequence of video into a global vector.

Action Recognition

Contrastive Positive Sample Propagation along the Audio-Visual Event Line

1 code implementation 18 Nov 2022 Jinxing Zhou, Dan Guo, Meng Wang

Visual and audio signals often coexist in natural environments, forming audio-visual events (AVEs).

Contrastive Learning Representation Learning

MEGCF: Multimodal Entity Graph Collaborative Filtering for Personalized Recommendation

1 code implementation 14 Oct 2022 Kang Liu, Feng Xue, Dan Guo, Le Wu, Shujie Li, Richang Hong

This paper aims at solving the mismatch problem between MFE and UIM, so as to generate high-quality embedding representations and better model multimodal user preferences.

Collaborative Filtering Image Classification

Joint Multi-grained Popularity-aware Graph Convolution Collaborative Filtering for Recommendation

1 code implementation 10 Oct 2022 Kang Liu, Feng Xue, Xiangnan He, Dan Guo, Richang Hong

In this work, we propose to model multi-grained popularity features and jointly learn them together with high-order connectivity, to match the differentiation of user preferences exhibited in popularity features.

Collaborative Filtering Recommendation Systems

Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers

no code implementations 22 Jul 2022 Jia Li, Jiantao Nie, Dan Guo, Richang Hong, Meng Wang

Here, we regard an expressive face as the comprehensive result of a set of facial muscle movements on one's poker face (i.e., emotionless face), inspired by Facial Action Coding System.

Disentanglement Facial Expression Recognition +1

Audio-Visual Segmentation

1 code implementation 11 Jul 2022 Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation

SAPAG: A Self-Adaptive Privacy Attack From Gradients

no code implementations 14 Sep 2020 Yijue Wang, Jieren Deng, Dan Guo, Chenghong Wang, Xianrui Meng, Hang Liu, Caiwen Ding, Sanguthevar Rajasekaran

Distributed learning such as federated learning or collaborative learning enables model training on decentralized data from users and only collects local gradients, where data is processed close to its sources for data privacy.

Federated Learning Reconstruction Attack

Recurrent Relational Memory Network for Unsupervised Image Captioning

no code implementations 24 Jun 2020 Dan Guo, Yang Wang, Peipei Song, Meng Wang

Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where the existing arts usually adopt GAN (Generative Adversarial Networks) models.

Computational Efficiency Image Captioning +2

Iterative Context-Aware Graph Inference for Visual Dialog

1 code implementation CVPR 2020 Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang

Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts.

Graph Attention Graph Embedding +2
