🧀 BigCheese.ai

Writing an LLM from scratch, part 8 – trainable self-attention

Giles Thomas' latest blog post continues his journey through the book 'Build a Large Language Model (from Scratch)' by Sebastian Raschka, covering section 3.4, 'Implementing self-attention with trainable weights'. The post offers his perspective on how self-attention works in LLMs: the inputs are projected into separate query, key, and value spaces, attention scores are computed between tokens, and those scores are used to build context vectors that capture each token's meaning in context.
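
For reference, the trainable self-attention described here boils down to the standard scaled dot-product formulation; the sketch below uses the conventional symbols Q, K, V, W_Q, W_K, W_V and d_k rather than any notation taken from the post itself.

```latex
% Trainable projections of the input embeddings X into query, key, and value spaces
Q = X W_Q, \qquad K = X W_K, \qquad V = X W_V
% Scores are scaled, softmax-normalized, and used to mix the value vectors into context vectors
\text{context} = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```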

  • Self-attention projects the inputs into separate query, key, and value spaces.
  • Expressing these projections as matrix multiplications keeps the operations simple and efficient.
  • Context vectors are computed by weighting the value vectors with the attention scores.
  • The softmax function normalizes the attention scores so that the weights for each token sum to 1.
  • The projections are carried out by trainable weight matrices that are learned during training (see the sketch after this list).
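
To make the bullet points concrete, here is a minimal sketch of single-head self-attention with trainable weights in PyTorch. The class and parameter names (SelfAttentionSketch, d_in, d_out) are illustrative assumptions, not code from the post or the book.

```python
import torch
import torch.nn as nn


class SelfAttentionSketch(nn.Module):
    """Single-head self-attention with trainable projection matrices (illustrative sketch)."""

    def __init__(self, d_in, d_out):
        super().__init__()
        # Trainable weight matrices: each projects an input vector into
        # the query, key, or value space.
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        # x: (num_tokens, d_in) -- one sequence of token embeddings
        queries = self.W_query(x)
        keys = self.W_key(x)
        values = self.W_value(x)

        # Attention scores: dot product of every query with every key,
        # computed in a single matrix multiplication.
        attn_scores = queries @ keys.T  # (num_tokens, num_tokens)

        # Scale by sqrt(d_out) and normalize each row with softmax so the
        # attention weights for each token sum to 1.
        attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)

        # Context vectors: attention-weighted sums of the value vectors.
        return attn_weights @ values  # (num_tokens, d_out)


# Tiny usage example with random embeddings.
torch.manual_seed(123)
x = torch.rand(6, 3)                    # 6 tokens, 3-dimensional embeddings
sa = SelfAttentionSketch(d_in=3, d_out=2)
print(sa(x).shape)                      # torch.Size([6, 2])
```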