Search Results for author: Dong Zhang

Found 71 papers, 35 papers with code

Joint Multi-modal Aspect-Sentiment Analysis with Auxiliary Cross-modal Relation Detection

1 code implementation EMNLP 2021 Xincheng Ju, Dong Zhang, Rong Xiao, Junhui Li, Shoushan Li, Min Zhang, Guodong Zhou

Therefore, in this paper, we are the first to jointly perform multi-modal ATE (MATE) and multi-modal ASC (MASC), and we propose a multi-modal joint learning approach with auxiliary cross-modal relation detection for multi-modal aspect-level sentiment analysis (MALSA).

Relation Sentiment Analysis +1

On the Temperature of Machine Learning Systems

no code implementations 19 Apr 2024 Dong Zhang

We consider that the initial potential energy of an ML system is described by the model's loss function, and that the energy adheres to the principle of minimum potential energy.

SpeechAlign: Aligning Speech Generation to Human Preferences

2 code implementations 8 Apr 2024 Dong Zhang, Zhaowei Li, ShiMin Li, Xin Zhang, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

However, the integration of human feedback to align speech outputs to human preferences is often neglected.

Language Modelling

Unleashing Network Potentials for Semantic Scene Completion

1 code implementation 12 Mar 2024 Fengyun Wang, Qianru Sun, Dong Zhang, Jinhui Tang

Semantic scene completion (SSC) aims to predict complete 3D voxel occupancy and semantics from a single-view RGB-D image, and recent SSC methods commonly adopt multi-modal inputs.

Location-guided Head Pose Estimation for Fisheye Image

no code implementations 28 Feb 2024 Bing Li, Dong Zhang, Cheng Huang, Yun Xian, Ming Li, Dah-Jye Lee

A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by perspective projection.

Head Pose Estimation Multi-Task Learning

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

1 code implementation 19 Feb 2024 Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, Hang Yan, Jie Fu, Tao Gui, Tianxiang Sun, Yugang Jiang, Xipeng Qiu

We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music.

Language Modelling Large Language Model

Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection

1 code implementation 14 Feb 2024 Yang Liu, Tongfei Shen, Dong Zhang, Qingying Sun, Shoushan Li, Guodong Zhou

The growing importance of multi-modal humor detection within affective computing correlates with the expanding influence of short-form video sharing on social media platforms.

Humor Detection

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

1 code implementation 10 Feb 2024 Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the rich information in N-best candidates to generate a higher-quality translation result.

Machine Translation Translation

Boundary and Relation Distillation for Semantic Segmentation

no code implementations 24 Jan 2024 Dong Zhang, Pingcheng Dong, Xinting Hu, Long Chen, Kwang-Ting Cheng

Concurrently, the relation distillation transfers implicit relations from the teacher model to the student model using pixel-level self-relation as a bridge, ensuring that the student's mask has strong target region connectivity.

Implicit Relations Knowledge Distillation +2
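The relation-distillation step in the snippet above can be illustrated with a generic sketch, not the authors' code: treat each model's output as per-pixel feature vectors, build an N x N pixel self-relation matrix for each model, and penalize the difference so the student inherits the teacher's pairwise structure. Cosine similarity as the relation, mean squared error as the loss, and the names `self_relation` and `relation_distillation_loss` are illustrative assumptions.

```python
import math


def self_relation(features):
    """Pairwise cosine similarity between pixel feature vectors.

    `features` is a list of N pixel vectors, each a list of C floats.
    Returns an N x N matrix whose (i, j) entry is cos(f_i, f_j).
    Zero vectors get norm 1.0 so they yield 0 similarity instead of a crash.
    """
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in features]
    return [
        [
            sum(a * b for a, b in zip(fi, fj)) / (ni * nj)
            for fj, nj in zip(features, norms)
        ]
        for fi, ni in zip(features, norms)
    ]


def relation_distillation_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher self-relation matrices.

    Only the N x N relation matrices are compared, so the two models may use
    different channel counts C as long as they describe the same N pixels.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("student and teacher must describe the same pixels")
    rs = self_relation(student_feats)
    rt = self_relation(teacher_feats)
    n = len(rs)
    return sum((rs[i][j] - rt[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


# Usage (hypothetical data): a 3-channel teacher and a 2-channel student over 3 pixels.
teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
student = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
print(relation_distillation_loss(student, teacher))  # close to 0: same relational structure
```

Comparing relation matrices rather than raw features is what lets a smaller student with fewer channels be supervised by a larger teacher in this sketch.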

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation

1 code implementation 24 Jan 2024 Dong Zhang, Xin Zhang, Jun Zhan, ShiMin Li, Yaqian Zhou, Xipeng Qiu

It comprises an autoregressive model based on an LLM for semantic information modeling and a non-autoregressive model employing flow matching for perceptual information modeling.

Voice Conversion

InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance

1 code implementation 20 Jan 2024 Pengyu Wang, Dong Zhang, Linyang Li, Chenkun Tan, Xinghao Wang, Ke Ren, Botian Jiang, Xipeng Qiu

With the rapid development of large language models (LLMs), these models are not only used as general-purpose AI assistants but are also customized through further fine-tuning to meet the requirements of different applications.

BoNuS: Boundary Mining for Nuclei Segmentation with Partial Point Labels

1 code implementation 15 Jan 2024 Yi Lin, Zeyu Wang, Dong Zhang, Kwang-Ting Cheng, Hao Chen

To alleviate this problem, in this paper, we propose a weakly-supervised nuclei segmentation method that only requires partial point labels of nuclei.

Multiple Instance Learning Segmentation

GroundingGPT: Language Enhanced Multi-modal Grounding Model

2 code implementations 11 Jan 2024 Zhaowei Li, Qi Xu, Dong Zhang, Hang Song, Yiqing Cai, Qi Qi, Ran Zhou, Junting Pan, Zefeng Li, Van Tu Vu, Zhida Huang, Tao Wang

Beyond capturing global information like other multi-modal models, our proposed model excels at tasks demanding a detailed understanding of local information within the input.

Language Modelling Large Language Model

SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems

1 code implementation 8 Jan 2024 Dong Zhang, Zhaowei Li, Pengyu Wang, Xin Zhang, Yaqian Zhou, Xipeng Qiu

In this paper, we propose SpeechAgents, a multi-modal LLM based multi-agent system designed for simulating human communication.

Language Modelling Large Language Model

Towards SAMBA: Segment Anything Model for Brain Tumor Segmentation in Sub-Saharan African Populations

no code implementations 19 Dec 2023 Mohannad Barakat, Noha Magdy, Jjuuko George William, Ethel Phiri, Raymond Confidence, Dong Zhang, Udunna C Anazodo

This study was conducted on the Brain Tumor Segmentation (BraTS) Challenge Africa (BraTS-Africa) dataset, which provides a valuable resource for addressing challenges specific to resource-limited settings, particularly the African population, and facilitating the development of effective and more generalizable segmentation algorithms.

Brain Tumor Segmentation Segmentation +1

Bridging the Gap: Generalising State-of-the-Art U-Net Models to Sub-Saharan African Populations

no code implementations 19 Dec 2023 Alyssa R. Amod, Alexandra Smith, Pearly Joubert, Confidence Raymond, Dong Zhang, Udunna C. Anazodo, Dodzi Motchon, Tinashe E. M. Mutsvangwa, Sébastien Quetin

We replicated a framework that secured the 2nd position in the 2022 BraTS competition to investigate the impact of dataset composition on model performance and pursued four distinct approaches through training a model with: 1) BraTS-Africa data only (train_SSA, N=60), 2) BraTS-Adult Glioma data only (train_GLI, N=1251), 3) both datasets together (train_ALL, N=1311), and 4) through further training the train_GLI model with BraTS-Africa data (train_ftSSA).

Physics-Informed Neural Network for Discovering Systems with Unmeasurable States with Application to Lithium-Ion Batteries

no code implementations 27 Nov 2023 Yuichi Kajiura, Jorge Espin, Dong Zhang

In particular, instead of having loss terms from each differential equation, this method embeds the dynamics into a loss function that quantifies the error between observed and predicted system outputs.
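The output-error idea in this snippet can be sketched generically, assuming a known parametric right-hand side stands in for the paper's neural network: roll out the dynamics from an initial state that includes an unmeasured component, map each state to the measured output, and score the mismatch against observed data. The names `simulate_outputs` and `output_error_loss`, the forward-Euler integrator, and the toy oscillator are hypothetical choices for illustration, not the method from the paper.

```python
def simulate_outputs(f, g, x0, inputs, dt, params):
    """Roll out dx/dt = f(x, u, params) with forward Euler (hypothetical integrator).

    Returns the predicted output y_k = g(x_k) at every step. Only g(x) is
    compared to data, so states that g ignores are never measured directly.
    """
    x = list(x0)
    outputs = []
    for u in inputs:
        outputs.append(g(x))
        dx = f(x, u, params)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return outputs


def output_error_loss(f, g, x0, inputs, observed, dt, params):
    """Mean squared error between observed and simulated outputs.

    The dynamics enter through the rollout, so there is a single loss on the
    outputs rather than one residual term per differential equation.
    """
    predicted = simulate_outputs(f, g, x0, inputs, dt, params)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Toy system (hypothetical): position x[0] is measured, velocity x[1] is not.
def oscillator(x, u, k):
    return [x[1], -k * x[0] + u]


def measure_position(x):
    return x[0]


x0 = [1.0, 0.0]
inputs = [0.0] * 50
observed = simulate_outputs(oscillator, measure_position, x0, inputs, 0.1, 2.0)

print(output_error_loss(oscillator, measure_position, x0, inputs, observed, 0.1, 2.0))  # 0.0
print(output_error_loss(oscillator, measure_position, x0, inputs, observed, 0.1, 3.0) > 0)  # True
```

In this sketch the unmeasured velocity is still constrained, because a wrong stiffness `k` changes how the velocity evolves and therefore the predicted positions.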

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

3 code implementations 31 Aug 2023 Xin Zhang, Dong Zhang, ShiMin Li, Yaqian Zhou, Xipeng Qiu

Therefore, we propose SpeechTokenizer, a unified speech tokenizer for speech large language models.

Language Modelling Quantization

Synthetic Instance Segmentation from Semantic Image Segmentation Masks

1 code implementation 2 Aug 2023 Yuchen Shen, Dong Zhang, Yuhui Zheng, Zechao Li, Liyong Fu, Qiaolin Ye

SISeg does not require training a semantic and/or instance segmentation model and avoids the need for instance-level image annotations.

Image Segmentation Instance Segmentation +3

Improving Reference-based Distinctive Image Captioning with Contrastive Rewards

no code implementations 25 Jun 2023 Yangjun Mao, Jun Xiao, Dong Zhang, Meng Cao, Jian Shao, Yueting Zhuang, Long Chen

A recent DIC method proposes to generate distinctive captions by comparing the target image with a set of semantically similar reference images, i.e., reference-based DIC (Ref-DIC).

Benchmarking Contrastive Learning +1

DUB: Discrete Unit Back-translation for Speech Translation

1 code implementation 19 May 2023 Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou

The key point is to bridge the modality gap between speech and text so that useful MT techniques can be applied to ST.

Machine Translation Speech-to-Text Translation +1

SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities

1 code implementation 18 May 2023 Dong Zhang, ShiMin Li, Xin Zhang, Jun Zhan, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

Multi-modal large language models are regarded as a crucial step towards Artificial General Intelligence (AGI) and have garnered significant interest with the emergence of ChatGPT.

Language Modelling Large Language Model +2

Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era

no code implementations 4 May 2023 Dong Zhang

Sharing revenue with data providers using such a scoring system would encourage more data owners to participate in the revenue-sharing program.

Discrepancy-Guided Reconstruction Learning for Image Forgery Detection

no code implementations 26 Apr 2023 Zenan Shi, Haipeng Chen, Long Chen, Dong Zhang

In this paper, we propose a novel image forgery detection paradigm for boosting the model's learning capacity on both forgery-sensitive and genuine compact visual patterns.

Image Forgery Detection

Boosting Convolution with Efficient MLP-Permutation for Volumetric Medical Image Segmentation

no code implementations 23 Mar 2023 Yi Lin, Xiao Fang, Dong Zhang, Kwang-Ting Cheng, Hao Chen

Recently, the advent of the vision Transformer (ViT) has brought substantial advancements on 3D dataset benchmarks, particularly in 3D volumetric medical image segmentation (Vol-MedSeg).

Image Segmentation Semantic Segmentation +1

Semantic Scene Completion with Cleaner Self

1 code implementation CVPR 2023 Fengyun Wang, Dong Zhang, Hanwang Zhang, Jinhui Tang, Qianru Sun

SSC is a well-known ill-posed problem because the prediction model has to "imagine" what is behind the visible surface, which is usually represented by a Truncated Signed Distance Function (TSDF).

Vessel-Promoted OCT to OCTA Image Translation by Heuristic Contextual Constraints

1 code implementation 13 Mar 2023 Shuhan Li, Dong Zhang, Xiaomeng Li, Chubin Ou, Lin An, Yanwu Xu, Kwang-Ting Cheng

In this paper, we propose a novel framework, TransPro, that translates 3D Optical Coherence Tomography (OCT) images into exclusive 3D OCTA images using an image translation pattern.

Translation

Protocol selection for second-order consensus against disturbance

no code implementations 10 Dec 2022 Jiamin Wang, Liqi Zhou, Dong Zhang, Jian Liu, Yuanshi Zheng

Noting that both the absolute and relative velocity protocols can solve the second-order consensus problem of multi-agent systems, this paper investigates which of the two protocols has better anti-disturbance capability, measured by the L2 gain from the disturbance to the consensus error.

Centralized Feature Pyramid for Object Detection

1 code implementation 5 Oct 2022 Yu Quan, Dong Zhang, Liyan Zhang, Jinhui Tang

To address this problem, in this paper, we propose a Centralized Feature Pyramid (CFP) for object detection, which is based on a globally explicit centralized feature regulation.

Object object-detection +1

Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions

1 code implementation 21 Sep 2022 Dong Zhang, Yi Lin, Hao Chen, Zhuotao Tian, Xin Yang, Jinhui Tang, Kwang Ting Cheng

Over the past few years, the rapid development of deep learning technologies for computer vision has significantly improved the performance of medical image segmentation (MedISeg).

Data Augmentation Domain Adaptation +3

Graph Reasoning Transformer for Image Parsing

no code implementations 20 Sep 2022 Dong Zhang, Jinhui Tang, Kwang-Ting Cheng

In this paper, we propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern.

Relation

Rethinking the Reference-based Distinctive Image Captioning

1 code implementation 22 Jul 2022 Yangjun Mao, Long Chen, Zhihong Jiang, Dong Zhang, Zhimeng Zhang, Jian Shao, Jun Xiao

Unfortunately, the reference images used by existing Ref-DIC works are easy to distinguish: they resemble the target image only at the scene level and share few common objects, so a Ref-DIC model can trivially generate distinctive captions even without considering the reference images.

Attribute Benchmarking +1

FedMix: Mixed Supervised Federated Learning for Medical Image Segmentation

1 code implementation 4 May 2022 Jeffry Wicaksana, Zengqiang Yan, Dong Zhang, Xijie Huang, Huimin Wu, Xin Yang, Kwang-Ting Cheng

To relax this assumption, in this work, we propose a label-agnostic unified federated learning framework, named FedMix, for medical image segmentation based on mixed image labels.

Federated Learning Image Segmentation +4

Learning to Reduce Information Bottleneck for Object Detection in Aerial Images

1 code implementation 5 Apr 2022 Yuchen Shen, Dong Zhang, Zhihao Song, Xuesong Jiang, Qiaolin Ye

In this letter, we first underline the importance of the neck network in object detection from the perspective of information bottleneck.

object-detection Object Detection In Aerial Images

FaceAtlasAR: Atlas of Facial Acupuncture Points in Augmented Reality

1 code implementation 29 Nov 2021 Menghe Zhang, Jurgen Schulze, Dong Zhang

Acupuncture is a technique in which practitioners stimulate specific points on the body.

Face Alignment

Towards Domain-Independent and Real-Time Gesture Recognition Using mmWave Signal

1 code implementation 11 Nov 2021 Yadong Li, Dongheng Zhang, Jinbo Chen, Jinwei Wan, Dong Zhang, Yang Hu, Qibin Sun, Yan Chen

To enhance the robustness of the system and reduce data collecting efforts, we design a data augmentation framework for mmWave signals based on correlations between signal patterns and gesture variations.

Data Augmentation Gesture Recognition

Cell-Level State of Charge Estimation for Battery Packs Under Minimal Sensing

no code implementations 17 Sep 2021 Dong Zhang, Luis D. Couto, Ross Drummond, Shashank Sripad, Venkatasubramanian Viswanathan

This manuscript presents an algorithm for estimating the state of charge (SOC) of individual lithium-ion (Li-ion) cells in a large-scale battery pack under minimal sensing, where only pack-level voltage and current are measured.

Region-Aware Network: Model Human's Top-Down Visual Perception Mechanism for Crowd Counting

no code implementations 23 Jun 2021 Yuehai Chen, Jing Yang, Dong Zhang, Kun Zhang, Badong Chen, Shaoyi Du

More specifically, we scan the whole input image and its priority maps in the form of column vectors to obtain a relevance matrix that estimates their similarity.

Crowd Counting

Learning Calibrated-Guidance for Object Detection in Aerial Images

1 code implementation 21 Mar 2021 Zongqi Wei, Dong Liang, Dong Zhang, Liyan Zhang, Qixiang Geng, Mingqiang Wei, Huiyu Zhou

Specifically, for a given set of feature maps, CG first computes the feature similarity between each channel and the remaining channels as the intermediary calibration guidance.

Object object-detection +2

Machine Learning based Malicious Payload Identification in Software-Defined Networking

no code implementations 4 Jan 2021 Qiumei Cheng, Chunming Wu, Haifeng Zhou, Dezhang Kong, Dong Zhang, Junchi Xing, Wei Ruan

In this paper, a novel OpenFlow-enabled deep packet inspection (OFDPI) approach is proposed based on the SDN paradigm to provide adaptive and efficient packet inspection.

Networking and Internet Architecture

Dual-SLAM: A framework for robust single camera navigation

no code implementations 23 Sep 2020 Huajian Huang, Wen-Yan Lin, Siying Liu, Dong Zhang, Sai-Kit Yeung

As local pose estimation is ill-conditioned, local pose estimation failures happen regularly, making the overall SLAM system brittle.

Pose Estimation Simultaneous Localization and Mapping

Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling

no code implementations 12 Aug 2020 Haiwei Wu, Lin Zhang, Lin Yang, Xuyang Wang, Jun-Jie Wang, Dong Zhang, Ming Li

This paper introduces our approaches for the Mask and Breathing Sub-Challenges in the Interspeech ComParE Challenge 2020.

Data Augmentation

Feature Pyramid Transformer

1 code implementation ECCV 2020 Dong Zhang, Hanwang Zhang, Jinhui Tang, Meng Wang, Xiansheng Hua, Qianru Sun

Yet, the non-local spatial interactions are not across scales, and thus they fail to capture the non-local contexts of objects (or parts) residing in different scales.

Instance Segmentation object-detection +3

Reconstructing undersampled photoacoustic microscopy images using deep learning

2 code implementations 30 May 2020 Anthony DiSpirito III, Daiwei Li, Tri Vu, Maomao Chen, Dong Zhang, Jianwen Luo, Roarke Horstmeyer, Junjie Yao

One primary technical challenge in photoacoustic microscopy (PAM) is the necessary compromise between spatial resolution and imaging speed.

3D Action Recognition

Direct Quantification for Coronary Artery Stenosis Using Multiview Learning

no code implementations 20 Jul 2019 Dong Zhang, Guang Yang, Shu Zhao, Yanping Zhang, Heye Zhang, Shuo Li

The proposed DMQCA model consists of a multiview module with two attention mechanisms, a key-frame module, and a regression module, to achieve direct and accurate multiple-index estimation.

Multiview Learning regression

Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds

no code implementations ECCV 2018 Haroon Idrees, Muhmmad Tayyab, Kishan Athrey, Dong Zhang, Somaya Al-Maadeed, Nasir Rajpoot, Mubarak Shah

With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals, visual crowd analysis is emerging as a new frontier in computer vision.

Crowd Counting Management +1

Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions

1 code implementation ICCV 2017 Amir Mazaheri, Dong Zhang, Mubarak Shah

Because the source sentence is broken into two fragments, the left fragment (before the blank) and the right fragment (after the blank), traditional recurrent neural networks cannot encode this structure accurately, given the many possible variations in the location and type of the missing word in the source sentence.

Sentence

ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information

no code implementations CVPR 2018 Rodney LaLonde, Dong Zhang, Mubarak Shah

To reduce the large search space, the first stage (ClusterNet) takes in a set of extremely large video frames, combines the motion and appearance information within the convolutional architecture, and proposes regions of objects of interest (ROOBI).

Object object-detection +1

Unsupervised Action Proposal Ranking through Proposal Recombination

no code implementations 3 Apr 2017 Waqas Sultani, Dong Zhang, Mubarak Shah

Given the action proposals in a video, the goal of the proposed work is to generate a few better action proposals that are ranked properly.

Action Detection Action Recognition +1

Two-View Label Propagation to Semi-supervised Reader Emotion Classification

no code implementations COLING 2016 Shoushan Li, Jian Xu, Dong Zhang, Guodong Zhou

In this paper, we propose a two-view label propagation approach to semi-supervised reader emotion classification by exploiting two views, namely source text and response text in a label propagation algorithm.

Classification Emotion Classification +2

Video Fill in the Blank with Merging LSTMs

no code implementations 13 Oct 2016 Amir Mazaheri, Dong Zhang, Mubarak Shah

In the experiments, we have demonstrated the superior performance of the proposed method on the challenging "Movie Fill-in-the-Blank" dataset.

Local feature hierarchy for face recognition across pose and illumination

no code implementations 12 Jul 2016 Xiaoyue Jiang, Dong Zhang, Xiaoyi Feng

Accordingly, we propose an end-to-end face recognition method based on convolutional networks that handles pose and illumination simultaneously by extracting discriminative nonlinear features that are invariant to both.

Face Recognition

A Framework for Human Pose Estimation in Videos

no code implementations 26 Apr 2016 Dong Zhang, Mubarak Shah

A sequence of the best poses is inferred from the abstract body part tracklets through the tree-based optimization.

Pose Estimation

Robust Scene Text Recognition Using Sparse Coding based Features

no code implementations 29 Dec 2015 Da-Han Wang, Hanzi Wang, Dong Zhang, Jonathan Li, David Zhang

For character detection, we use HSC features instead of Histograms of Oriented Gradients (HOG) features.

Scene Text Recognition

Human Pose Estimation in Videos

no code implementations ICCV 2015 Dong Zhang, Mubarak Shah

Using the idea of 'association', the optimal tracklets are generated for each abstract body part to enforce the spatiotemporal constraints between body parts in adjacent frames.

Pose Estimation

Face Verification Using Boosted Cross-Image Features

no code implementations 28 Sep 2013 Dong Zhang, Omar Oreifej, Mubarak Shah

In contrast, we propose to extract cross-image features, i.e., features across the pair of images, which, as we demonstrate, are more discriminative of the similarity and dissimilarity of faces.

Face Detection Face Recognition +1

Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions

no code implementations CVPR 2013 Dong Zhang, Omar Javed, Mubarak Shah

The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video.

Object Optical Flow Estimation +4
