• How do transformers work? The introduction of the transformer architecture addresses the preceding shortcomings of RNNs and CNNs. Transformers use an attention mechanism, which allows the model to focus on different parts of the input when generating each word in the output. Simply put, the attention mechanism measures how words interrelate in a sentence, paragraph,…
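The math behind this is compact enough to sketch. The following is a minimal NumPy illustration of scaled dot-product self-attention, using toy random vectors in place of learned embeddings and omitting the multiple heads and projection matrices a real transformer uses:

```python
# A minimal NumPy sketch of scaled dot-product self-attention (illustrative only;
# real transformers use learned projections and multiple attention heads).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted mix of V's rows; the weights say how
    strongly each position attends to every other position."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # similarity of every query to every key
    weights = softmax(scores, axis=-1)     # soft attention weights, each row sums to 1
    return weights @ V, weights

# Toy example: 4 "tokens", each represented by a 3-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
output, weights = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V
print(weights.round(2))   # 4x4 matrix: how much each token attends to the others
```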
• Benefits of transformers: As mentioned earlier, transformers are a type of neural network architecture that replaces traditional RNNs and CNNs with an entirely attention-based mechanism. But how does the attention mechanism work? It works by calculating “soft” weights for each word in the context window, and it does so in parallel in the transformer…
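To make the "soft weights computed in parallel" point concrete, the sketch below (again with toy NumPy arrays standing in for real embeddings and a deliberately simplistic RNN-style cell) contrasts a sequential loop, which must walk through positions one at a time, with attention, which scores every pair of positions in a single matrix product:

```python
# Illustrative contrast between sequential (RNN-style) processing and the
# parallel computation of attention weights over a whole context window.
import numpy as np

seq_len, d = 6, 8
rng = np.random.default_rng(1)
X = rng.normal(size=(seq_len, d))      # embeddings for 6 tokens

# RNN-style: each step depends on the previous hidden state -> inherently sequential.
W = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
for t in range(seq_len):               # cannot be parallelized across positions
    h = np.tanh(X[t] + h @ W)

# Attention-style: all pairwise scores at once, in one matrix product.
scores = X @ X.T / np.sqrt(d)          # (6, 6) score matrix, no loop over positions
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # "soft" weights: each row sums to 1
context = weights @ X                  # every position's output computed in parallel
print(weights.sum(axis=-1))            # all ones: a soft (probabilistic) weighting
```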
• Conversation prompts and completions – under the covers: Prompts, or the input entered by you or an application/service, play a crucial role in NLP and LLMs by facilitating the interaction between humans and language models. If you have had any experience with GenAI, you may have already entered a prompt into an online service…
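As a concrete, if miniature, example of that interaction, the snippet below sends a prompt to a small, publicly available model through the Hugging Face transformers pipeline; the model choice and generation settings here are illustrative assumptions, not ones prescribed by the text:

```python
# A small example of the prompt -> completion loop using the Hugging Face
# `transformers` library; "gpt2" is just a convenient, publicly available
# stand-in for the larger LLMs discussed in the text.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Explain what an attention mechanism does in one sentence:"
completion = generator(prompt, max_new_tokens=40, do_sample=True, temperature=0.7)

print(completion[0]["generated_text"])  # the prompt followed by the model's continuation
```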
• In the preceding simplified transformer architecture, the user interaction is the input/output shown in the white boxes. The larger gray box is the entirety of the processing that takes place without user interaction. Some of the phases in the prompt and completion sequence in the preceding image include the following: As you can see in our…
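A rough code-level view of what happens inside that gray box, using gpt2 purely as a stand-in model and a single greedy decoding step as a simplification, might look like this:

```python
# A rough sketch of the hidden phases: tokenize the prompt, run the transformer,
# pick the next token, and decode it back to text. The model choice and the
# single greedy step are simplifying assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Transformers are"
inputs = tokenizer(prompt, return_tensors="pt")      # phase 1: text -> token IDs

with torch.no_grad():
    logits = model(**inputs).logits                  # phase 2: transformer forward pass

next_id = int(logits[0, -1].argmax())                # phase 3: choose the next token
print(tokenizer.decode([next_id]))                   # phase 4: token ID -> text
```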
• LLMs landscape, progression, and expansion: We could write many chapters on how modern LLMs have leveraged the transformer model architecture, along with its explosive expansion and the numerous models being created on an almost daily basis. However, in this last section, let’s distill the usage of LLMs and their progression thus far and also…
• AutoGen: At the time of writing, significant work is being done by Microsoft Research on the next major breakthrough: autonomous agents, or AutoGen. AutoGen hopes to take LLMs and the evolution of the transformer model architecture to the next level. The Microsoft AutoGen framework is an open source platform for building multi-agent systems using…
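For a flavor of what building on AutoGen looks like, here is a minimal two-agent sketch written against the pyautogen 0.2-style API; the model name, API-key handling, and configuration details are assumptions and may differ in your environment or in newer AutoGen releases:

```python
# A minimal two-agent sketch (pyautogen 0.2-style API); configuration values
# are placeholders, not recommendations.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",        # fully autonomous: no human in the loop
    code_execution_config=False,     # disable local code execution for this sketch
)

# The user proxy sends the task; the two agents then converse to solve it.
user_proxy.initiate_chat(
    assistant,
    message="Summarize why multi-agent systems built on LLMs are useful.",
)
```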
• What is fine-tuning and why does it matter? Issues inherent in general LLMs such as GPT-3 include their tendency to produce outputs that are false, toxic, or negative in sentiment. This is attributed to the training of LLMs, which focuses on predicting subsequent words from vast internet text rather than safely accomplishing the user’s…
• Fine-tuning applications: Fine-tuning can be applied to a wide range of natural language processing tasks, including the following: The aforementioned fine-tuning tasks are the most popular ones. This is a rapidly evolving field, and more tasks are emerging and can be found on Hugging Face (source: https://huggingface.co/docs/transformers/training) and Azure’s Machine Learning Studio (Model…
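A condensed version of the fine-tuning workflow from the Hugging Face tutorial linked above looks roughly like the following; the dataset, base model, and hyperparameters are illustrative choices rather than recommendations:

```python
# A condensed fine-tuning sketch following the Hugging Face training tutorial
# (https://huggingface.co/docs/transformers/training); dataset, model, and
# hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("yelp_review_full")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
small_train = tokenized["train"].shuffle(seed=42).select(range(1000))
small_eval = tokenized["test"].shuffle(seed=42).select(range(1000))

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5)

args = TrainingArguments(output_dir="fine_tune_demo",
                         num_train_epochs=1,
                         per_device_train_batch_size=8)

trainer = Trainer(model=model, args=args,
                  train_dataset=small_train, eval_dataset=small_eval)
trainer.train()   # updates the pre-trained weights on the task-specific data
```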
• Pre-training process: Pre-training is the initial phase of training a language model. During this phase, the model learns from a massive amount of text data, often referred to as the “pre-training corpus.” The goal of pre-training is to help the model learn grammar, syntax, context, and even some world knowledge from the text. The…
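The objective the model optimizes during this phase can be shown in a few lines. The toy sketch below illustrates causal (next-token) language modeling, with a tiny made-up vocabulary and randomly initialized layers standing in for the transformer stack:

```python
# A toy illustration of the causal (GPT-style) pre-training objective:
# predict each next token and minimize cross-entropy. The vocabulary and
# the random layers are assumptions purely for illustration.
import torch
import torch.nn.functional as F

vocab = {"the": 0, "model": 1, "learns": 2, "from": 3, "text": 4}
tokens = torch.tensor([[0, 1, 2, 3, 4]])            # "the model learns from text"

embed = torch.nn.Embedding(len(vocab), 16)
lm_head = torch.nn.Linear(16, len(vocab))

hidden = embed(tokens)                              # stand-in for the transformer stack
logits = lm_head(hidden)                            # (batch, seq_len, vocab_size)

# Next-token prediction: logits at position t are scored against the token at t+1.
loss = F.cross_entropy(logits[:, :-1].reshape(-1, len(vocab)),
                       tokens[:, 1:].reshape(-1))
print(float(loss))                                  # pre-training minimizes this loss
```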
• Techniques for fine-tuning models: In this section, we’ll discuss two fine-tuning methods: the traditional full fine-tuning approach and advanced techniques such as parameter-efficient fine-tuning (PEFT), which applies optimizations that achieve results comparable to full fine-tuning with greater efficiency and lower memory and computational costs. Full fine-tuning: Full fine-tuning refers to the approach where all parameters/weights…
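The practical difference between the two approaches shows up in how many parameters are trainable. The sketch below uses gpt2 and the peft library's LoRA implementation as one possible PEFT technique; the target modules and LoRA hyperparameters are assumptions that vary by model architecture:

```python
# A brief sketch contrasting full fine-tuning with PEFT (here, LoRA via the
# `peft` library). Target modules and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Full fine-tuning: every parameter stays trainable (the default behavior).
total = sum(p.numel() for p in model.parameters())
print(f"full fine-tuning trains all {total:,} parameters")

# PEFT/LoRA: freeze the base model and train only small low-rank adapters.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["c_attn"], task_type="CAUSAL_LM")
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()   # typically well under 1% of the total
```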