Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Neural sequence-to-sequence models, particularly the Transformer, are the state of the art in machine translation. Yet these neural networks are very sensitive to architecture and hyperparameter settings...
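The sensitivity to hyperparameter settings mentioned above is typically addressed by tuning; the methods listed for this paper include random search. The following is an illustrative sketch (not the paper's code) of random search over a few Transformer hyperparameters; the search space, names, and the stand-in scoring function are all hypothetical, and in practice the score would come from training a model and measuring dev-set BLEU.

```python
# Illustrative sketch: random search over a hypothetical Transformer
# hyperparameter space. The score_fn here is a stand-in; a real run
# would train a model per configuration and return dev-set BLEU.
import random

SEARCH_SPACE = {
    "num_layers": [2, 4, 6],
    "hidden_size": [256, 512, 1024],
    "dropout": [0.1, 0.2, 0.3],
}

def sample_config(rng):
    # Draw one value uniformly at random for each hyperparameter.
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(score_fn, num_trials=10, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(num_trials):
        cfg = sample_config(rng)
        score = score_fn(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical objective: favors smaller, more regularized models.
toy_score = lambda cfg: -cfg["num_layers"] * cfg["hidden_size"] * (1 - cfg["dropout"])
best, score = random_search(toy_score, num_trials=20)
print(best)
```

Random search is a common baseline for this kind of tuning because it is trivially parallel and makes no assumptions about the shape of the objective.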

PDF Abstract (WS 2019)

Datasets


No datasets are listed for this paper.

Results from the Paper


No results are listed for this paper.

Methods used in the Paper


METHOD                         TYPE
Random Search                  Hyperparameter Search
Residual Connection            Skip Connections
Adam                           Stochastic Optimization
ReLU                           Activation Functions
Dropout                        Regularization
Multi-Head Attention           Attention Modules
BPE                            Subword Segmentation
Dense Connections              Feedforward Networks
Label Smoothing                Regularization
Softmax                        Output Functions
Layer Normalization            Normalization
Scaled Dot-Product Attention   Attention Mechanisms
Transformer                    Transformers
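Several entries in the table above are standard Transformer components. As a reference point, here is a minimal NumPy sketch of one of them, scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; this is a generic illustration of the technique, not code from the paper, and the toy inputs are invented.

```python
# Minimal sketch of scaled dot-product attention (generic illustration,
# not the paper's implementation).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_queries, n_keys) similarity scores
    # Numerically stable row-wise softmax over the keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted sum of value vectors

# Toy example: 3 queries and 3 keys/values of dimension 4.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The 1/√d_k scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into near-one-hot, vanishing-gradient territory.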