PartialFormer: Modeling Part Instead of Whole

23 Oct 2023 · Tong Zheng, Bei Li, Huiwen Bao, Weiqiao Shan, Tong Xiao, Jingbo Zhu

The design choices in Transformer feed-forward neural networks (FFNs) have resulted in significant computational and parameter overhead. In this work, we emphasize the importance of the hidden dimension in designing lightweight FFNs, a factor often overlooked in previous architectures. Guided by this principle, we introduce PartialFormer, a parameter-efficient Transformer architecture that uses multiple smaller FFNs to reduce parameters and computation while maintaining essential hidden dimensions. These smaller FFNs are integrated into a multi-head attention mechanism to enable effective collaboration. We also propose a tailored head scaling strategy to enhance PartialFormer's capabilities. Furthermore, we present a residual-like attention calculation to improve depth scaling within PartialFormer. Extensive experiments on nine translation tasks and one abstractive summarization task validate the effectiveness of PartialFormer. Our code will be available at https://github.com/zhengkid/PartialFormer.
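The abstract does not give layer sizes, so the following is only a rough, hypothetical illustration of why several narrow FFNs can cut parameters while each one keeps a comparatively wide hidden layer. The Python sketch counts parameters for one standard position-wise FFN versus H independent FFNs that each act on a d_model/H slice of the representation. The slicing scheme, the function names (`ffn_params`, `partial_ffn_params`), and the sizes (d_model = 512, d_ff = 2048, H = 8, per-FFN hidden width 512) are all assumptions chosen for illustration, not PartialFormer's actual configuration, and the sketch does not model the attention integration, head scaling, or residual-like attention described above.

```python
# Hypothetical illustration only: parameter counts of one position-wise FFN
# versus H smaller FFNs, each applied to a d_model / H slice of the input.
# The slicing scheme and all sizes are assumptions, not the paper's config.


def ffn_params(d_in: int, d_hidden: int, bias: bool = True) -> int:
    """Parameters of a two-layer FFN mapping d_in -> d_hidden -> d_in."""
    weights = 2 * d_in * d_hidden          # up-projection + down-projection
    biases = (d_hidden + d_in) if bias else 0
    return weights + biases


def partial_ffn_params(d_model: int, num_heads: int, d_hidden: int,
                       bias: bool = True) -> int:
    """Total parameters of num_heads independent FFNs, each d_model/num_heads wide."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    return num_heads * ffn_params(d_head, d_hidden, bias)


if __name__ == "__main__":
    d_model, d_ff, heads = 512, 2048, 8    # assumed Transformer-base-like sizes
    full = ffn_params(d_model, d_ff)
    # Assumed choice: each small FFN keeps a hidden width equal to d_model,
    # even though its input is only d_model / heads = 64 wide.
    partial = partial_ffn_params(d_model, heads, d_hidden=d_model)
    print(full, partial)                   # prints: 2099712 528896
```

Under these assumed sizes the eight narrow FFNs use roughly a quarter of the standard FFN's parameters, while each still expands its 64-dimensional slice to a 512-dimensional hidden layer. Different slice widths or hidden sizes would change the ratio.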

Code

zhengkid/partialformer (official)

Tasks

Abstractive Text Summarization

Machine Translation

Translation

Datasets

WMT 2014

Results from the Paper

Ranked #23 on Machine Translation on WMT2014 English-German

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Machine Translation	WMT2014 English-German	PartialFormer	BLEU score	29.56	#23
Machine Translation	WMT2014 English-German	PartialFormer	Number of Parameters	68M	#10

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer