Introduction to Attention Mechanism and Transformers

Transformers have demonstrated excellent capabilities, overcoming challenges in areas such as NLP, text-to-image generation, and image completion when given large datasets, large model sizes, and sufficient compute. Talking about transformers nowadays is as casual as talking about CNNs, MLPs, or linear regression. Why not take a closer look at this state-of-the-art architecture? In this post, we'll introduce the Sequence-to-Sequence (Seq2Seq) paradigm, explore the attention mechanism, and provide a detailed, step-by-step explanation of the components that make up transformer architectures.

Date: February 17, 2025 · Estimated Reading Time: 10 min · Author: Oriol Alàs Cercós