Implementing attention mechanisms, multi-head attention, the Transformer architecture, and more from scratch in TensorFlow.
Mar 10, 2021