Transformer models: Encoder-Decoders

Understanding Encoder-Decoder Architecture

This video explains the encoder-decoder architecture and how it works. It also highlights the importance of understanding encoders and decoders as standalone models before studying the encoder-decoder model.

Encoder

  • The encoder takes words as inputs and computes a numerical representation for each word passed through it.
  • The numerical representation holds information about the meaning of the sequence.

Decoder

  • The decoder is used in a manner that we haven't seen before.
  • We pass the outputs of the encoder directly to it, along with an initial sequence (such as a start-of-sequence token).
  • Using the encoder outputs alongside its usual sequence input, the decoder attempts to decode the sequence.

Encoder-Decoder Magic

  • The encoder accepts a sequence as input, computes a numerical representation of it, and sends that representation over to the decoder.
  • The decoder decodes the encoder's output, using it alongside its usual sequence input.
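The hand-off above can be sketched in a few lines of toy Python. The `encode` and `decode` functions here are hypothetical stand-ins, not real transformer code; the point is only the data flow: the encoder produces one feature vector per input word, and the decoder receives those vectors together with its own sequence input (a start token).

```python
# Toy sketch of the encoder-decoder hand-off.
# (Illustrative stand-ins only -- real transformers use learned
# attention layers, not these hand-written functions.)

def encode(words):
    # Stand-in for the encoder: one "feature vector" per input word.
    return [[float(len(w)), float(i)] for i, w in enumerate(words)]

def decode(features, start_token):
    # Stand-in for the decoder: it uses BOTH the encoder's features
    # and its own sequence input (here, just the start token).
    return f"{start_token}: decoded {len(features)} feature vectors"

features = encode(["Welcome", "to", "NYC"])
print(decode(features, "<s>"))  # -> <s>: decoded 3 feature vectors
```

Note that the decoder never sees the raw words: it only sees the encoder's numerical representation plus its own sequence input.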

Auto-regressive Decoding

  • Once we have both the feature vector and an initial generated word, we no longer need the encoder.
  • As we have seen before, the decoder can act in an auto-regressive manner; the word it has just output can now be used as an input.

Translation Language Modeling Example

  • We use an example of translating "Welcome to NYC" from English to French using a transformer model trained for that task explicitly.
  • We use a start-of-sequence token to ask the decoder to output the first word, "Bienvenue", meaning "Welcome".
  • We then use "Bienvenue" as the input sequence for the decoder. This, alongside the feature vector, allows the decoder to predict the second word, "à", which is "to" in English.
  • Finally, we ask the decoder to predict a third word; it predicts "NYC", which is correct.
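The three decoding steps above can be sketched as a greedy auto-regressive loop. A small lookup table stands in for the trained decoder here (the table and the loop are illustrative assumptions, not the real model); each step feeds the sequence generated so far back in as input, while the encoder's feature vector stays fixed.

```python
# Toy auto-regressive decoding of "Welcome to NYC" -> French.
# A lookup table stands in for the trained decoder (hypothetical).
FEATURES = "features(Welcome to NYC)"  # stand-in for the fixed encoder output

NEXT_WORD = {  # hypothetical decoder: sequence so far -> next word
    ("<s>",): "Bienvenue",
    ("<s>", "Bienvenue"): "à",
    ("<s>", "Bienvenue", "à"): "NYC",
    ("<s>", "Bienvenue", "à", "NYC"): "</s>",
}

def translate():
    sequence = ["<s>"]  # start-of-sequence token
    while True:
        # The real decoder would attend to FEATURES here as well.
        word = NEXT_WORD[tuple(sequence)]
        if word == "</s>":  # end-of-sequence: stop generating
            break
        sequence.append(word)  # the output becomes the next input
    return " ".join(sequence[1:])

print(translate())  # -> Bienvenue à NYC
```

The loop makes the auto-regressive property concrete: each generated word is appended to the decoder's input before the next prediction.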

Importance of Encoder-Decoder Architecture

  • Encoder-decoders are special because they're able to handle sequence-to-sequence tasks like translation.
  • The encoder and decoder often do not share weights. Therefore, we have an entire block (the encoder) that can be trained to understand the sequence and extract relevant information.

Sequence-to-Sequence Transformers

This section discusses the use of sequence-to-sequence transformers in natural language processing tasks such as translation and summarization.

Translation

  • Transformers can be used for translating sequences of words from one language to another.
  • A decoder can generate translations in an auto-regressive manner.

Summarization

  • Transformers are useful for summarizing long sequences of text.
  • The encoder and decoder can have different context lengths, allowing for a smaller context for the summarized sequence.
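A tiny sketch of the point about differing context lengths (the numbers below are illustrative assumptions, not values from any specific model): a summarization model can accept a long input in the encoder while only generating a short output in the decoder.

```python
# Toy illustration: for summarization, the encoder can have a long
# input context while the decoder has a much smaller output context.
ENCODER_MAX_TOKENS = 1024  # hypothetical long input context
DECODER_MAX_TOKENS = 128   # hypothetical short summary context

def check_lengths(input_tokens, summary_tokens):
    # A long document must fit the encoder; the (much shorter)
    # summary must fit the decoder.
    return (len(input_tokens) <= ENCODER_MAX_TOKENS
            and len(summary_tokens) <= DECODER_MAX_TOKENS)

document = ["tok"] * 900  # long article
summary = ["tok"] * 50    # much shorter summary
print(check_lengths(document, summary))  # -> True
```

Because the two halves are separate blocks, their context windows can be sized independently for the task at hand.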

Encoder-Decoder Models

  • There are many types of encoder-decoder models available in the transformers library.
  • Specific encoders and decoders can be chosen based on their performance on specific tasks.

Video description

A general, high-level introduction to Encoder-Decoder (sequence-to-sequence) models using the Transformer architecture: what they are and when you should use them. This video is part of the Hugging Face course: http://huggingface.co/course

Related videos:
  • The Transformer architecture: https://youtu.be/H39Z_720T5s
  • Encoder models: https://youtu.be/MUqNwgPjJvQ
  • Decoder models: https://youtu.be/d_ixlCubqQw

To understand what happens inside the Transformer network on a deeper level, we recommend the following blog posts by Jay Alammar:
  • The Illustrated Transformer: https://jalammar.github.io/illustrated-transformer/
  • The Illustrated GPT-2: https://jalammar.github.io/illustrated-gpt2/
  • Understanding Attention: https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/

For a code-oriented perspective, we recommend The Annotated Transformer, by Harvard NLP: https://nlp.seas.harvard.edu/2018/04/03/attention.html

Have a question? Check out the forums: https://discuss.huggingface.co/c/course/20
Subscribe to our newsletter: https://huggingface.curated.co/