Unlock the Power of AI for Your European SME
Are you ready to take your business to the next level with cutting-edge AI technology? In this article, we’ll guide you through the process of adding a new model architecture to llama.cpp, a popular open-source library for running large language models locally. Whether you’re a seasoned developer or just starting out, our step-by-step approach will help you navigate the technical aspects and unlock the full potential of AI for your European SME.
Step 1: Convert Your Model to GGUF
The first step in adding a new model architecture is to convert the model to GGUF, the file format that llama.cpp loads. This is done in Python with a convert script built on the gguf library (for Hugging Face models, convert_hf_to_gguf.py), which reads the model configuration, tokenizer, and tensor names and data, and writes them out as GGUF metadata and tensors. For HF models, you register the architecture by applying the ModelBase.register decorator to a new TextModel or MmprojModel subclass.
Let’s take a look at a minimal example of such a subclass:
```python
@ModelBase.register("MyModelForCausalLM")
class MyModel(TextModel):
    model_arch = gguf.MODEL_ARCH.MYMODEL
```
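In practice the subclass usually needs a little more than those three lines. The sketch below is illustrative rather than definitive: it assumes the set_gguf_parameters and modify_tensors hooks exposed by the TextModel base class in convert_hf_to_gguf.py, and the rope_theta handling and the qkv_proj rename are placeholders for whatever your model actually needs, not part of any real architecture.

```python
# Lives inside convert_hf_to_gguf.py alongside the other model classes
# (hedged sketch -- the overridden logic is purely illustrative).

@ModelBase.register("MyModelForCausalLM")
class MyModel(TextModel):
    model_arch = gguf.MODEL_ARCH.MYMODEL

    def set_gguf_parameters(self):
        # Let the base class write the common hyperparameters (context length,
        # embedding size, block count, ...), then add anything model-specific.
        super().set_gguf_parameters()
        self.gguf_writer.add_rope_freq_base(self.hparams.get("rope_theta", 10000.0))

    def modify_tensors(self, data_torch, name, bid):
        # Rename or reshape tensors that the generic name mapping cannot handle.
        # Here we pretend the checkpoint stores a fused projection under a
        # non-standard name (purely illustrative).
        if name.endswith(".qkv_proj.weight"):
            name = name.replace(".qkv_proj.weight", ".query_key_value.weight")
        return [(self.map_tensor_name(name), data_torch)]
```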
You’ll also need to define the layout of the GGUF tensors in gguf-py/gguf/constants.py. This involves adding an enum entry in MODEL_ARCH, the human-friendly architecture name in MODEL_ARCH_NAMES, and the list of GGUF tensors the architecture uses in MODEL_TENSORS.
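For orientation, the three additions look roughly like the following. Treat this as a hedged sketch of edits to the existing definitions in gguf-py/gguf/constants.py rather than a standalone file: the MYMODEL entries are placeholders, and the exact tensor list depends entirely on your architecture.

```python
# Additions to gguf-py/gguf/constants.py (MYMODEL is a placeholder name)

class MODEL_ARCH(IntEnum):
    # ... existing architectures ...
    MYMODEL = auto()                      # new enum entry

MODEL_ARCH_NAMES: dict[MODEL_ARCH, str] = {
    # ... existing entries ...
    MODEL_ARCH.MYMODEL: "mymodel",        # human-friendly name stored in the GGUF metadata
}

MODEL_TENSORS: dict[MODEL_ARCH, list[MODEL_TENSOR]] = {
    # ... existing entries ...
    MODEL_ARCH.MYMODEL: [                 # every GGUF tensor the architecture uses
        MODEL_TENSOR.TOKEN_EMBD,
        MODEL_TENSOR.OUTPUT_NORM,
        MODEL_TENSOR.OUTPUT,
        MODEL_TENSOR.ATTN_NORM,
        MODEL_TENSOR.ATTN_QKV,
        MODEL_TENSOR.ATTN_OUT,
        MODEL_TENSOR.FFN_NORM,
        MODEL_TENSOR.FFN_UP,
        MODEL_TENSOR.FFN_DOWN,
    ],
}
```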
Step 2: Define Your Model Architecture in llama.cpp
Once you’ve converted your model to GGUF, it’s time to teach llama.cpp itself about the architecture. On the C++ side this means adding a new llm_arch enum value and the matching architecture and tensor-name tables in the llama.cpp sources, so the loader knows which GGUF tensors to expect and what they are called.
On the Python side, the original (Hugging Face) tensor names also have to be mapped to their GGUF equivalents in gguf-py/gguf/tensor_mapping.py. For example, for the attention normalization layer the mapping looks like this:
```python
block_mappings_cfg: dict[MODEL_TENSOR, tuple[str, ...]] = {
    # Attention norm
    MODEL_TENSOR.ATTN_NORM: (
        "gpt_neox.layers.{bid}.input_layernorm",  # gptneox
        # ... equivalent names used by other architectures ...
    ),
    # ... mappings for the remaining tensors ...
}
```
Step 3: Build the GGML Graph Implementation
With your model architecture defined, it’s time to build the GGML graph implementation. This part is written in C++ inside llama.cpp: you construct the compute graph for your architecture (embeddings, attention, feed-forward, normalization and so on) out of ggml operations, following the pattern of the existing graph builders for other models, such as build_llama.
Don’t worry if this sounds daunting – we’ve got you covered! With our step-by-step guide, you’ll be able to navigate even the most complex technical aspects with ease.
Step 4: Test Your Model
Once you’ve added your new model architecture, it’s essential to test it thoroughly. This involves running the model on the different backends you care about, including CUDA, Metal, and the CPU, and checking that the outputs are consistent.
To streamline this process, we recommend using VORLUX AI’s comprehensive testing framework, which includes automated testing tools and expert support.
Key Takeaways
- Adding a new model architecture to llama.cpp starts with converting the model to GGUF using a Python convert script.
- Defining the model architecture involves registering a TextModel (or MmprojModel) subclass with ModelBase.register, declaring the architecture and tensor layout in gguf-py, and adding the matching definitions to the llama.cpp C++ sources.
- Building the GGML graph implementation is done in C++ using ggml operations.
Get Started with VORLUX AI
Ready to unlock the full potential of AI for your European SME? Contact us today to learn more about our comprehensive AI solutions, including expert support, automated testing tools, and cutting-edge technology. Don’t miss out on this opportunity to transform your business – get started with VORLUX AI today!