[Learning notes] reading "Attention is all you need" paper
**Encoder:** the part we studied with RNNs that reads the input sequence and "digests" it; it is responsible for creating and updating the hidden state. In a way it's like a person reading something and keeping the "gist" of it in mind.

**Decoder:** the part responsible for taking that "gist" (the hidden state, a mathematical vector) and using it to produce an output.

**Attention mechanism:** plays the role the hidden state plays in the RNN case, I guess.

**Dispensing with:** getting rid of.

**Pros of transformers:**
- Better results
- Parallelization
- Less time to train

**"Speedometer" reading for AI translation ability — BLEU (Bilingual Evaluation Understudy):** a math formula used as a metric to grade a machine's translation by comparing it with a translation written by a professional (a human, of course).
- 0.0 would mean there was no matching and the model produced a horribly wrong result.
- 100.0 (or 1.0) would mean a perfect match, but of course this can never really be the case, since we can say the exact same thing using lots of different words.
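To make the attention note above concrete, here is a minimal pure-Python sketch of scaled dot-product attention, the core operation the paper builds on (softmax of query-key scores, scaled by the square root of the key dimension, used to weight the values). This is an illustrative toy, not the paper's full multi-head implementation; the function names are my own.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query vector: score it against every key (dot product,
    scaled by sqrt of key dimension), turn the scores into weights with
    softmax, and return the weighted sum of the value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, component by component.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

When a query lines up strongly with one key, the softmax weights concentrate on that key and the output is close to the corresponding value vector — which is the "look back at the relevant part of the input" intuition.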
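The BLEU description above can be sketched too. This is a simplified single-sentence, single-reference version (clipped n-gram precisions combined by geometric mean, times a brevity penalty); real BLEU is normally computed over a whole corpus, often with multiple references and smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU in [0, 1]:
    geometric mean of clipped n-gram precisions (n = 1..max_n)
    multiplied by a brevity penalty for short candidates."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        total = sum(cand.values())
        if total == 0:
            return 0.0
        # Clip each n-gram's count by how often it appears in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else \
        math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean
```

An identical candidate and reference score 1.0; a translation sharing no words scores 0.0 — and a perfectly good translation that happens to use different words also scores low, which is exactly the limitation noted above.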