Search Results for author: Shuming Ma

Found 80 papers, 51 papers with code

Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation

1 code implementation • ACL 2022 • Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei

When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12. 3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.

Abstractive Text Summarization Cross-Lingual Abstractive Summarization +5

Paper
Code

Improving Multilingual Neural Machine Translation with Auxiliary Source Languages

1 code implementation • Findings (EMNLP) 2021 • Weijia Xu, Yuwei Yin, Shuming Ma, Dongdong Zhang, Haoyang Huang

Multilingual neural machine translation models typically handle one source language at a time.

Machine Translation Translation

Paper
Code

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

4 code implementations • 27 Feb 2024 • Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs).

Paper
Code

When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology

no code implementations • 6 Dec 2023 • Wenhui Wang, Shuming Ma, Hanwen Xu, Naoto Usuyama, Jiayu Ding, Hoifung Poon, Furu Wei

This technical report presents LongViT, a vision Transformer that can process gigapixel images in an end-to-end manner.

Survival Prediction whole slide images

Paper
Add Code

Auto-ICL: In-Context Learning without Human Supervision

1 code implementation • 15 Nov 2023 • Jinghan Yang, Shuming Ma, Furu Wei

In the era of Large Language Models (LLMs), human-computer interaction has evolved towards natural language, offering unprecedented flexibility.

In-Context Learning

Paper
Code

BitNet: Scaling 1-bit Transformers for Large Language Models

2 code implementations • 17 Oct 2023 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei

The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption.

Language Modelling Quantization

233

Paper
Code

Kosmos-2.5: A Multimodal Literate Model

no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei

We present Kosmos-2. 5, a multimodal literate model for machine reading of text-intensive images.

Reading Comprehension Text Generation

Paper
Add Code

Retentive Network: A Successor to Transformer for Large Language Models

8 code implementations • 17 Jul 2023 • Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.

Language Modelling

18,338

Paper
Code

LongNet: Scaling Transformers to 1,000,000,000 Tokens

3 code implementations • 5 Jul 2023 • Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

Scaling sequence length has become a critical demand in the era of large language models.

18,338

Paper
Code

Kosmos-2: Grounding Multimodal Large Language Models to the World

2 code implementations • 26 Jun 2023 • Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e. g., bounding boxes) and grounding text to the visual world.

Ranked #11 on Visual Question Answering on ViP-Bench

Image Captioning In-Context Learning +8

18,340

Paper
Code

Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus

1 code implementation • 18 May 2023 • Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell

Several recent papers claim human parity at sentence-level Machine Translation (MT), especially in high-resource languages.

Machine Translation Sentence +1

Paper
Code

On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation

2 code implementations • 18 May 2023 • Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang

We conduct experiments on a multilingual machine translation benchmark in 11 languages.

Machine Translation Translation

Paper
Code

Are More Layers Beneficial to Graph Transformers?

1 code implementation • 1 Mar 2023 • Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei

Despite that going deep has proven successful in many neural architectures, the existing graph transformers are relatively shallow.

Paper
Code

Language Is Not All You Need: Aligning Perception with Language Models

1 code implementation • NeurIPS 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.

Image Captioning Language Modelling +4

18,338

Paper
Code

HanoiT: Enhancing Context-aware Translation via Selective Context

no code implementations • 17 Jan 2023 • Jian Yang, Yuwei Yin, Shuming Ma, Liqun Yang, Hongcheng Guo, Haoyang Huang, Dongdong Zhang, Yutao Zeng, Zhoujun Li, Furu Wei

Context-aware neural machine translation aims to use the document-level context to improve translation quality.

Document Level Machine Translation Machine Translation +2

Paper
Add Code

Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers

1 code implementation • 20 Dec 2022 • Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei

We comprehensively compare the behaviors of in-context learning and explicit finetuning on real tasks to provide empirical evidence that supports our understanding.

In-Context Learning Open-Ended Question Answering

3,180

Paper
Code

A Length-Extrapolatable Transformer

5 code implementations • 20 Dec 2022 • Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei

Position modeling plays a critical role in Transformers.

Language Modelling Position

2,920

Paper
Code

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

1 code implementation • 20 Dec 2022 • Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li

Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language understanding and generation in a single model.

Denoising Sentence +1

Paper
Code

Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models

no code implementations • 15 Dec 2022 • Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Furu Wei

Despite the success of multilingual sequence-to-sequence pre-training, most existing approaches rely on document-level monolingual corpora in many different languages, sentence-level bilingual corpora,\footnote{In this paper, we use `bilingual corpora' to denote parallel corpora with `bilingual translation pairs' in many different language pairs, each consisting of two sentences/documents with the same meaning written in different languages.

Abstractive Text Summarization Cross-Lingual Abstractive Summarization +4

Paper
Add Code

TorchScale: Transformers at Scale

1 code implementation • 23 Nov 2022 • Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei

Large Transformers have achieved state-of-the-art performance across many tasks.

Language Modelling Machine Translation +1

2,920

Paper
Code

A Bilingual Parallel Corpus with Discourse Annotations

1 code implementation • 26 Oct 2022 • Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell

The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.

Document Level Machine Translation Machine Translation +2

Paper
Code

CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation

1 code implementation • 13 Oct 2022 • Jian Yang, Shaohan Huang, Shuming Ma, Yuwei Yin, Li Dong, Dongdong Zhang, Hongcheng Guo, Zhoujun Li, Furu Wei

Specifically, the target sequence is first translated into the source language and then tagged by a source NER model.

Cross-Lingual NER Machine Translation +5

Paper
Code

Foundation Transformers

4 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

A big convergence of model architectures across language, vision, speech, and multimodal is emerging.

Language Modelling Machine Translation +1

18,338

Paper
Code

Revamping Multilingual Agreement Bidirectionally via Switched Back-translation for Multilingual Neural Machine Translation

no code implementations • 28 Sep 2022 • Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Wai Lam

Despite the fact that multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT), current methodologies in the field have two shortages: (i) require parallel data between multiple language pairs, which is not always realistic and (ii) optimize the agreement in an ambiguous direction, which hampers the translation performance.

Document Level Machine Translation Document Translation +2

Paper
Add Code

GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation

1 code implementation • 29 Jul 2022 • Jian Yang, Yuwei Yin, Liqun Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Furu Wei, Zhoujun Li

Transformer structure, stacked by a sequence of encoder and decoder network layers, achieves significant development in neural machine translation.

Machine Translation Translation

Paper
Code

UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation

1 code implementation • 11 Jul 2022 • Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Shuangzhi Wu, Hongcheng Guo, Zhoujun Li, Furu Wei

Most translation tasks among languages belong to the zero-resource translation problem where parallel corpora are unavailable.

Machine Translation NMT +1

Paper
Code

HLT-MT: High-resource Language-specific Training for Multilingual Neural Machine Translation

1 code implementation • 11 Jul 2022 • Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Zhoujun Li, Furu Wei

Nonetheless, multilingual training is plagued by language interference degeneration in shared parameters because of the negative interference among different translation directions, especially on high-resource languages.

Machine Translation Translation

Paper
Code

Language Models are General-Purpose Interfaces

1 code implementation • 13 Jun 2022 • Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

Ranked #2 on Image Captioning on nocaps val

Causal Language Modeling Few-Shot Learning +6

18,338

Paper
Code

On the Representation Collapse of Sparse Mixture of Experts

2 code implementations • 20 Apr 2022 • Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei

We also present a comprehensive analysis on the representation and routing behaviors of our models.

Clustering Language Modelling

18,338

Paper
Code

StableMoE: Stable Routing Strategy for Mixture of Experts

1 code implementation • ACL 2022 • Damai Dai, Li Dong, Shuming Ma, Bo Zheng, Zhifang Sui, Baobao Chang, Furu Wei

We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i. e., the target expert of the same input may change along with training, but only one expert will be activated for the input during inference.

Language Modelling Machine Translation

Paper
Code

DeepNet: Scaling Transformers to 1,000 Layers

6 code implementations • 1 Mar 2022 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei

In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.

Translation

48,169

Paper
Code

Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt

1 code implementation • 23 Feb 2022 • Lianzhe Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Houfeng Wang

To collocate with the unified prompt, we propose a new initialization method for the target label word to further improve the model's transferability across languages.

Zero-Shot Cross-Lingual Transfer

Paper
Code

A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model

no code implementations • 26 Jan 2022 • Xin Sun, Tao Ge, Shuming Ma, Jingjing Li, Furu Wei, Houfeng Wang

Synthetic data construction of Grammatical Error Correction (GEC) for non-English languages relies heavily on human-designed and language-specific rules, which produce limited error-corrected patterns.

Grammatical Error Correction Language Modelling +3

Paper
Add Code

PAEG: Phrase-level Adversarial Example Generation for Neural Machine Translation

no code implementations • COLING 2022 • Juncheng Wan, Jian Yang, Shuming Ma, Dongdong Zhang, Weinan Zhang, Yong Yu, Zhoujun Li

While end-to-end neural machine translation (NMT) has achieved impressive progress, noisy input usually leads models to become fragile and unstable.

Machine Translation NMT +1

Paper
Add Code

SMDT: Selective Memory-Augmented Neural Document Translation

no code implementations • 5 Jan 2022 • Xu Zhang, Jian Yang, Haoyang Huang, Shuming Ma, Dongdong Zhang, Jinlong Li, Furu Wei

Existing document-level neural machine translation (NMT) models have sufficiently explored different context settings to provide guidance for target generation.

Document Level Machine Translation Document Translation +4

Paper
Add Code

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

no code implementations • WMT (EMNLP) 2021 • Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.

Machine Translation Translation

Paper
Add Code

Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation

1 code implementation • 16 Oct 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei

Abstractive Text Summarization Cross-Lingual Abstractive Summarization +5

Paper
Code

Multilingual Agreement for Multilingual Neural Machine Translation

no code implementations • ACL 2021 • Jian Yang, Yuwei Yin, Shuming Ma, Haoyang Huang, Dongdong Zhang, Zhoujun Li, Furu Wei

Although multilingual neural machine translation (MNMT) enables multiple language translations, the training process is based on independent multilingual objectives.

Machine Translation Translation

Paper
Add Code

XLM-E: Cross-lingual Language Model Pre-training via ELECTRA

3 code implementations • ACL 2022 • Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Bo Zheng, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei

In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training.

Ranked #1 on Zero-Shot Cross-Lingual Transfer on XTREME

Language Modelling Translation +1

18,338

Paper
Code

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

2 code implementations • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).

Abstractive Text Summarization Machine Translation +5

18,337

Paper
Code

Smart-Start Decoding for Neural Machine Translation

no code implementations • NAACL 2021 • Jian Yang, Shuming Ma, Dongdong Zhang, Juncheng Wan, Zhoujun Li, Ming Zhou

Most current neural machine translation models adopt a monotonic decoding order of either left-to-right or right-to-left.

Machine Translation Translation

Paper
Add Code

How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?

no code implementations • Findings (ACL) 2021 • Weijia Xu, Shuming Ma, Dongdong Zhang, Marine Carpuat

While non-autoregressive (NAR) models are showing great promise for machine translation, their use is limited by their dependence on knowledge distillation from autoregressive models.

Knowledge Distillation Machine Translation +1

Paper
Add Code

MT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs

1 code implementation • EMNLP 2021 • Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang Xian-Ling Mao, Heyan Huang, Furu Wei

Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts, which has shown promising results on many cross-lingual tasks.

Abstractive Text Summarization Machine Translation +7

128

Paper
Code

Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders

1 code implementation • EMNLP 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Li Dong, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei

In this paper, we focus on a zero-shot cross-lingual transfer task in NMT.

Machine Translation NMT +2

Paper
Code

BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation

2 code implementations • NAACL 2022 • Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Jian Yang, Haoyang Huang, Rico Sennrich, Ryan Cotterell, Mrinmaya Sachan, Ming Zhou

Standard automatic metrics, e. g. BLEU, are not reliable for document-level MT evaluation.

Document Level Machine Translation Machine Translation +2

Paper
Code

XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

no code implementations • 31 Dec 2020 • Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei

Multilingual machine translation enables a single model to translate between different languages.

Language Modelling Machine Translation +2

Paper
Add Code

A Simple and Effective Unified Encoder for Document-Level Machine Translation

no code implementations • ACL 2020 • Shuming Ma, Dong-dong Zhang, Ming Zhou

Most of the existing models for document-level machine translation adopt dual-encoder structures.

Document Level Machine Translation Machine Translation +1

Paper
Add Code

Improving Neural Machine Translation with Soft Template Prediction

no code implementations • ACL 2020 • Jian Yang, Shuming Ma, Dong-dong Zhang, Zhoujun Li, Ming Zhou

Although neural machine translation (NMT) has achieved significant progress in recent years, most previous NMT models only depend on the source text to generate translation.

Machine Translation NMT +1

Paper
Add Code

Multimodal Matching Transformer for Live Commenting

no code implementations • 7 Feb 2020 • Chaoqun Duan, Lei Cui, Shuming Ma, Furu Wei, Conghui Zhu, Tiejun Zhao

In this work, we aim to improve the relevance between live comments and videos by modeling the cross-modal interactions among different modalities.

Text Generation

Paper
Add Code

Group, Extract and Aggregate: Summarizing a Large Amount of Finance News for Forex Movement Prediction

no code implementations • WS 2019 • Deli Chen, Shuming Ma, Keiko Harimoto, Ruihan Bao, Qi Su, Xu sun

In this work, we propose a BERT-based Hierarchical Aggregation Model to summarize a large amount of finance news to predict forex movement.

Extractive Summarization Stock Market Prediction

Paper
Add Code

Recursive Graphical Neural Networks for Text Classification

no code implementations • 18 Sep 2019 • Wei Li, Shuheng Li, Shuming Ma, Yancheng He, Deli Chen, Xu sun

Graph is a natural structure to describe the complicated relation between tokens.

General Classification text-classification +1

Paper
Add Code

Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation

1 code implementation • ACL 2019 • Shuming Ma, Pengcheng Yang, Tianyu Liu, Peng Li, Jie zhou, Xu sun

We propose a novel model to separate the generation into two stages: key fact prediction and surface realization.

Table-to-Text Generation

Paper
Code

A Deep Reinforced Sequence-to-Set Model for Multi-Label Classification

1 code implementation • ACL 2019 • Pengcheng Yang, Fuli Luo, Shuming Ma, Junyang Lin, Xu sun

In this way, we can reduce the dependence of the model on the label order, as well as capture high-order correlations between labels.

General Classification Multi-Label Classification

Paper
Code

Phrase-level Self-Attention Networks for Universal Sentence Encoding

no code implementations • EMNLP 2018 • Wei Wu, Houfeng Wang, Tianyu Liu, Shuming Ma

As a result, the memory consumption can be reduced because the self-attention is performed at the phrase level instead of the sentence level.

Multi-class Classification Natural Language Inference +4

Paper
Add Code

Unsupervised Machine Commenting with Neural Variational Topic Model

no code implementations • 13 Sep 2018 • Shuming Ma, Lei Cui, Furu Wei, Xu sun

To fully exploit the unpaired data, we completely remove the need for parallel data and propose a novel unsupervised approach to train an automatic article commenting model, relying on nothing but unpaired articles and comments.

Retrieval

Paper
Add Code

LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts

3 code implementations • 13 Sep 2018 • Shuming Ma, Lei Cui, Damai Dai, Furu Wei, Xu sun

We introduce the task of automatic live commenting.

Retrieval

124

Paper
Code

A Deep Reinforced Sequence-to-Set Model for Multi-Label Text Classification

no code implementations • 10 Sep 2018 • Pengcheng Yang, Shuming Ma, Yi Zhang, Junyang Lin, Qi Su, Xu sun

However, the Seq2Seq model is not suitable for the MLTC task in essence.

General Classification Multi Label Text Classification +2

Paper
Add Code

Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification

1 code implementation • EMNLP 2018 • Junyang Lin, Qi Su, Pengcheng Yang, Shuming Ma, Xu sun

We propose a novel model for multi-label text classification, which is based on sequence-to-sequence learning.

General Classification Multi Label Text Classification +2

154

Paper
Code

Identifying High-Quality Chinese News Comments Based on Multi-Target Text Matching Model

no code implementations • 22 Aug 2018 • Deli Chen, Shuming Ma, Pengcheng Yang, Xu sun

In this work, we introduce a novel task: high-quality comment identification (HQCI), which aims to automatically assess the quality of online comments.

Informativeness Text Matching

Paper
Add Code

A Neural Question Answering Model Based on Semi-Structured Tables

no code implementations • COLING 2018 • Hao Wang, Xiaodong Zhang, Shuming Ma, Xu sun, Houfeng Wang, Mengxiang Wang

Then the system measures the relevance between each question and candidate table cells, and choose the most related cell as the source of answer.

Knowledge Graphs Multiple-choice +1

Paper
Add Code

SGM: Sequence Generation Model for Multi-label Classification

1 code implementation • COLING 2018 • Pengcheng Yang, Xu sun, Wei Li, Shuming Ma, Wei Wu, Houfeng Wang

Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.

Classification General Classification +1

429

Paper
Code

Deconvolution-Based Global Decoding for Neural Machine Translation

1 code implementation • COLING 2018 • Junyang Lin, Xu sun, Xuancheng Ren, Shuming Ma, Jinsong Su, Qi Su

A great proportion of sequence-to-sequence (Seq2Seq) models for Neural Machine Translation (NMT) adopt Recurrent Neural Network (RNN) to generate translation word by word following a sequential order.

Ranked #9 on Machine Translation on IWSLT2015 English-Vietnamese

Machine Translation NMT +1

Paper
Code

Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization

1 code implementation • ACL 2018 • Shuming Ma, Xu sun, Junyang Lin, Houfeng Wang

In this work, we supervise the learning of the representation of the source content with that of the summary.

Abstractive Text Summarization

136

Paper
Code

Bag-of-Words as Target for Neural Machine Translation

1 code implementation • ACL 2018 • Shuming Ma, Xu sun, Yizhong Wang, Junyang Lin

However, most of the existing neural machine translation models only use one of the correct translations as the targets, and the other correct sentences are punished as the incorrect sentences in the training stage.

Machine Translation Sentence +1

Paper
Code

Automatic Academic Paper Rating Based on Modularized Hierarchical Convolutional Neural Network

1 code implementation • ACL 2018 • Pengcheng Yang, Xu sun, Wei Li, Shuming Ma

As more and more academic papers are being submitted to conferences and journals, evaluating all these papers by professionals is time-consuming and can cause inequality due to the personal factors of the reviewers.

Paper
Code

Global Encoding for Abstractive Summarization

4 code implementations • ACL 2018 • Junyang Lin, Xu sun, Shuming Ma, Qi Su

To tackle the problem, we propose a global encoding framework, which controls the information flow from the encoder to the decoder based on the global information of the source context.

Ranked #29 on Text Summarization on GigaWord

Abstractive Text Summarization

273

Paper
Code

A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification

no code implementations • 3 May 2018 • Shuming Ma, Xu sun, Junyang Lin, Xuancheng Ren

Text summarization and sentiment classification both aim to capture the main ideas of the text but at different levels.

Abstractive Text Summarization Classification +3

Paper
Add Code

Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation

1 code implementation • NAACL 2018 • Shuming Ma, Xu sun, Wei Li, Sujian Li, Wenjie Li, Xuancheng Ren

The existing sequence-to-sequence model tends to memorize the words and the patterns in the training dataset instead of learning the meaning of the words.

Abstractive Text Summarization Paraphrase Generation +2

Paper
Code

Decoding-History-Based Adaptive Control of Attention for Neural Machine Translation

no code implementations • 6 Feb 2018 • Junyang Lin, Shuming Ma, Qi Su, Xu sun

ACA learns to control the attention by keeping track of the decoding history and the current information with a memory vector, so that the model can take the translated contents and the current information into consideration.

Machine Translation NMT +1

Paper
Add Code

Complex Structure Leads to Overfitting: A Structure Regularization Decoding Method for Natural Language Processing

no code implementations • 25 Nov 2017 • Xu Sun, Weiwei Sun, Shuming Ma, Xuancheng Ren, Yi Zhang, Wenjie Li, Houfeng Wang

The decoding of the complex structure model is regularized by the additionally trained simple structure model.

Structured Prediction

Paper
Add Code

Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data?

1 code implementation • COLING 2018 • Yi Zhang, Xu sun, Shuming Ma, Yang Yang, Xuancheng Ren

In our work, we first design a new model called "high order LSTM" to predict multiple tags for the current token which contains not only the current tag but also the previous several tags.

Chunking NER +1

Paper
Code

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

3 code implementations • 17 Nov 2017 • Xu Sun, Xuancheng Ren, Shuming Ma, Bingzhen Wei, Wei Li, Jingjing Xu, Houfeng Wang, Yi Zhang

Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which will reduce the computational cost both in the training and decoding, and potentially accelerate decoding in real-world applications.

110

Paper
Code

Label Embedding Network: Learning Label Representation for Soft Training of Deep Networks

1 code implementation • ICLR 2018 • Xu Sun, Bingzhen Wei, Xuancheng Ren, Shuming Ma

We propose a method, called Label Embedding Network, which can learn label representation (label embedding) during the training process of deep networks.

Paper
Code

A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification

1 code implementation • 6 Oct 2017 • Shuming Ma, Xu sun

In this work, our goal is to improve semantic relevance between source texts and simplified texts for text summarization and text simplification.

Paper
Code

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

2 code implementations • ICML 2017 • Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang

In back propagation, only a small subset of the full gradient is computed to update the model parameters.

110

Paper
Code

Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization

1 code implementation • ACL 2017 • Shuming Ma, Xu sun, Jingjing Xu, Houfeng Wang, Wenjie Li, Qi Su

In this work, our goal is to improve semantic relevance between source texts and summaries for Chinese social media summarization.

Paper
Code

A Generic Online Parallel Learning Framework for Large Margin Models

no code implementations • 2 Mar 2017 • Shuming Ma, Xu sun

To speed up the training process, many existing systems use parallel technology for online learning algorithms.

Paper
Add Code

Lock-Free Parallel Perceptron for Graph-based Dependency Parsing

no code implementations • 2 Mar 2017 • Xu Sun, Shuming Ma

To deal with this problem, we propose a parallel algorithm called parallel perceptron.

Dependency Parsing

Paper
Add Code

A New Recurrent Neural CRF for Learning Non-linear Edge Features

no code implementations • 14 Nov 2016 • Shuming Ma, Xu sun

Conditional Random Field (CRF) and recurrent neural models have achieved success in structured prediction.

Chinese Word Segmentation Chunking +3

Paper
Add Code

Towards Easier and Faster Sequence Labeling for Natural Language Processing: A Search-based Probabilistic Online Learning Framework (SAPO)

4 code implementations • 29 Mar 2015 • Xu Sun, Shuming Ma, Yi Zhang, Xuancheng Ren

We show that this method with fast training and theoretical guarantee of convergence, which is easy to implement, can support search-based optimization and obtain top accuracy.

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.