Search Results for author: Dezhou Shen

FoundationLayerNorm: Scaling BERT and GPT to 1,000 Layers

Mainstream BERT/GPT models contain only 10 to 20 layers, and there is little literature on training deep BERT/GPT models.

Large-scale Transformer models have significantly promoted the recent development of natural language processing applications.

In recent years, driven by Asian film industries such as those of China and India, the global box office has maintained steady growth.
