Large Language Models (LLMs)
We show how a large language model can be adapted to your specific requirements.
Overview
Using a small example, we demonstrate how large language models can be adapted to individual requirements. To this end, we have written a series of articles. Below, you will find an overview of the topics and can explore each article in more detail.
Text Preprocessing
In our first blog post, we show how one of the most important available language models, LLaMA by Meta, was trained and what different model sizes are available. We also provide an overview of the distribution and volume of the training data. Since the training data for the LLaMA model is not publicly available, we explain how such a dataset can be created and which comparable datasets are freely accessible.
Finally, we show how to preprocess internal company documents (HTML, EPUBs, and PDFs) for use in later training of a language model.
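To give a flavor of what this preprocessing can look like, here is a minimal sketch in Python. It assumes the documents live in a local `documents/` folder and uses BeautifulSoup, pypdf, and ebooklib as illustrative libraries; the article itself may rely on different tools.

```python
# Minimal sketch: extracting plain text from HTML, EPUB, and PDF files.
# Library choices and the "documents/" folder are illustrative assumptions.
from pathlib import Path

from bs4 import BeautifulSoup              # pip install beautifulsoup4
from pypdf import PdfReader                # pip install pypdf
from ebooklib import epub, ITEM_DOCUMENT   # pip install ebooklib


def html_to_text(path: Path) -> str:
    soup = BeautifulSoup(path.read_text(encoding="utf-8"), "html.parser")
    return soup.get_text(separator="\n", strip=True)


def pdf_to_text(path: Path) -> str:
    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def epub_to_text(path: Path) -> str:
    book = epub.read_epub(str(path))
    chapters = [
        BeautifulSoup(item.get_content(), "html.parser").get_text("\n", strip=True)
        for item in book.get_items_of_type(ITEM_DOCUMENT)
    ]
    return "\n".join(chapters)


if __name__ == "__main__":
    extractors = {".html": html_to_text, ".pdf": pdf_to_text, ".epub": epub_to_text}
    for doc in Path("documents").iterdir():
        if doc.suffix in extractors:
            text = extractors[doc.suffix](doc)
            print(doc.name, "->", len(text), "characters")
```

The extracted plain text can then be cleaned and split into training samples in a separate step.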
Learn more in our → article
Pretraining
The second blog post revisits one of the most important available language models, LLaMA by Meta, and discusses how it was trained and what different model sizes exist. We also examine the cost of such training and the resulting CO₂ emissions and electricity consumption.
Additionally, we explain how a tokenizer works and provide an interactive comparison between a German and an English tokenizer. We then use the preprocessed text of Goethe's 'Faust' to train such a language model and demonstrate the effect this has.
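As a small illustration of such a comparison, the sketch below tokenizes the same German sentence with an English-centric and a German GPT-2 tokenizer from Hugging Face. The model names are illustrative examples, not necessarily the ones used in the article.

```python
# Minimal sketch: the same German sentence tokenized by an English-centric
# and a German tokenizer (model names are illustrative examples)
from transformers import AutoTokenizer

sentence = "Habe nun, ach! Philosophie, Juristerei und Medizin durchaus studiert."

for name in ["gpt2", "dbmdz/german-gpt2"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    tokens = tokenizer.tokenize(sentence)
    # An English-centric tokenizer typically splits German words into far
    # more sub-word pieces than a tokenizer trained on German text.
    print(f"{name}: {len(tokens)} tokens -> {tokens}")
```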
Learn more in our → article
Translation
In another article, we explain why we want to fine-tune a language model and what different applications are possible. We also give a brief overview of the datasets available for fine-tuning such a model and explain how they can be created and translated. In addition, we discuss the costs involved in translating such a dataset.
Next, we show how to use a freely available translation model from Hugging Face and what considerations are necessary during translation. We have also published this translated dataset on Hugging Face.
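As a rough sketch of how such a translation can be set up with the Hugging Face `transformers` library, the snippet below uses the freely available Helsinki-NLP/opus-mt-en-de model as an example; the model and dataset actually used in the article may differ.

```python
# Minimal sketch: translating English instruction data to German with a
# freely available Hugging Face model (model choice is an example)
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

examples = [
    "Explain the difference between supervised and unsupervised learning.",
    "Write a short poem about autumn.",
]

# One practical consideration: long records can exceed the translation
# model's context window, so datasets are often split into sentences or
# chunks before translation and reassembled afterwards.
for text in examples:
    result = translator(text, max_length=512)
    print(result[0]["translation_text"])
```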
Learn more in our → article
Fine-tuning
In a further step, we show how the previously translated dataset can be used to adapt the language model. We also demonstrate the different use cases in which such language models can be applied.
As a first step, we perform full fine-tuning of a BLOOM language model pre-trained on German. With only 6.4B parameters, the model is relatively small and can therefore be trained on consumer-grade hardware.
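The following sketch outlines what such a full fine-tuning run can look like with the Hugging Face Trainer. The checkpoint name, dataset file, and hyperparameters are illustrative assumptions, not the exact setup from the article.

```python
# Minimal sketch: full fine-tuning of a German BLOOM model on an instruction
# dataset with the Hugging Face Trainer (names and settings are assumptions)
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "malteos/bloom-6b4-clp-german"  # assumption: a German BLOOM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumption: a JSON dataset with a "text" column of instruction/response pairs.
dataset = load_dataset("json", data_files="instructions_de.json")["train"]
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bloom-6b4-finetuned",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # mlm=False turns the collator into a causal-LM collator that copies the
    # input ids as labels for next-token prediction.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```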
Furthermore, we show how larger language models can be fine-tuned using a so-called QLoRA adaptation. We explain how it works and illustrate the training process with a code example. With QLoRA fine-tuning, we are able to adapt a language model with 30B parameters. We have published these language models on Hugging Face.
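The key idea is to load the frozen base model in 4-bit precision and train only small low-rank adapter matrices on top of it. A minimal sketch with the `peft` and `bitsandbytes` libraries is shown below; the 30B checkpoint and the LoRA hyperparameters are illustrative assumptions.

```python
# Minimal sketch: QLoRA setup with a 4-bit quantized base model and LoRA
# adapters (checkpoint and hyperparameters are illustrative assumptions)
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "huggyllama/llama-30b"  # assumption: any ~30B causal LM

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)

# Freeze the quantized base weights and attach small trainable LoRA matrices.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: LLaMA-style attention layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Training then proceeds as in the previous sketch, e.g. with the Trainer.
```

Because only the adapter weights are updated while the quantized base model stays frozen, the memory footprint remains far below what full fine-tuning of a 30B model would require.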
Learn more in our → article
Evaluation
In our final article, we examine the strengths and weaknesses of various freely available language models in comparison to commercial language models from Aleph Alpha. It becomes clear that these models can compete with Aleph Alpha’s offerings and often deliver better performance. In addition, interacting with the adapted models is easier, as no special prompt templates or stopping criteria are required.
In all examined use cases, the open-source language models performed better and, in the case of the MPT model, were even able to generate functional Python code.
Learn more in our → article
