HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

12 Jul 2020 · Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Achieving state-of-the-art performance on natural language understanding tasks typically relies on fine-tuning a fresh model for every task. Consequently, this approach leads to a higher overall parameter cost, along with higher technical maintenance for serving multiple models...
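The abstract is cut off before the method is described, but the title points to grid-wise decomposable hyper projections: a single shared projection modulated per task by a gating grid that decomposes into small factors. The sketch below is a hypothetical illustration of that general idea, not the paper's implementation — the factor vectors `l` and `r`, the block-tiling via a Kronecker product, and the sigmoid gate are all illustrative assumptions.

```python
import numpy as np

def grid_gate(l, r, block):
    """Illustrative grid-wise gate (assumed form, not the paper's exact method):
    a coarse grid from an outer product of two small task-specific vectors,
    tiled block-wise up to the full weight shape, then squashed to (0, 1)."""
    coarse = np.outer(l, r)                          # (d_in/block, d_out/block)
    full = np.kron(coarse, np.ones((block, block)))  # tile to (d_in, d_out)
    return 1.0 / (1.0 + np.exp(-full))               # sigmoid gate

rng = np.random.default_rng(0)
d_in, d_out, block = 8, 16, 4

# One shared projection W; each task contributes only tiny gating vectors,
# so serving many tasks does not require one fine-tuned model per task.
W = rng.standard_normal((d_in, d_out))
l_task = rng.standard_normal(d_in // block)   # per-task row factor (assumed)
r_task = rng.standard_normal(d_out // block)  # per-task column factor (assumed)

W_task = W * grid_gate(l_task, r_task, block)  # task-conditioned projection
```

Under this sketch, the per-task parameter count is `d_in/block + d_out/block` instead of a full `d_in × d_out` matrix per task, which is the kind of parameter saving the abstract motivates.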

