Exploring Pre-Quantized Large Language Models
Over the last year, we have seen the Wild West of Large Language Models (LLMs). The pace at which new technologies and models were released was astounding! As a result, there are now many different standards and ways of working with LLMs.
In this article, we will explore one such topic: loading your local LLM through several (quantization) standards. With sharding, quantization, and different saving and compression strategies, it is not easy to know which method is suitable for you.
Throughout the examples, we will use Zephyr 7B, a fine-tuned variant of Mistral 7B that was trained with Direct Preference Optimization (DPO).
🔥 TIP: After each example of loading an LLM, it is advised to restart your notebook to prevent OutOfMemory errors. Loading multiple LLMs requires significant RAM/VRAM. You can reset memory by deleting the models and resetting your cache like so:
# Delete any models previously created
del model, tokenizer, pipe

# Empty VRAM cache
import torch
torch.cuda.empty_cache()
You can also follow along with the Google Colab Notebook to make sure everything works as intended.
The most straightforward, and vanilla, way of loading your LLM is through 🤗 Transformers. HuggingFace has created a large suite of packages that allow us to do amazing things with LLMs!
We will start by installing HuggingFace Transformers from its main branch, along with a few supporting packages, so that newer models are supported:
# Latest HF transformers version for Mistral-like models
pip install git+https://github.com/huggingface/transformers.git
pip install accelerate bitsandbytes xformers
After installation, we can use the following pipeline to easily load our LLM:
from torch import bfloat16
from transformers import pipeline

# Load in your LLM without any compression tricks
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=bfloat16,
    device_map="auto"
)
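With the pipeline loaded, we can generate our first piece of text. The snippet below is a minimal, illustrative sketch: the messages, the sampling parameters, and the use of the tokenizer's chat template are my own choices rather than anything prescribed by the model, but they show how the pipe object we just created can be prompted.
# Illustrative prompt; Zephyr is a chat model, so we build the prompt
# from role/content messages rather than a raw string
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Tell me a joke about Large Language Models."},
]

# The tokenizer's chat template formats the messages into the prompt layout the model expects
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Generate a response; the sampling parameters here are illustrative defaults
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])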