Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)
