LLM Module: how to use another LLM model?

erictiquet

Hi Everyone,

I am new to the module.
I want to switch the LLM model, since new ones are available
(e.g. llama3.2-1b-prefill-ax630c or qwen2.5-1.5b-ax630c).
In short, I can't manage to load any other model.

Any suggestions, or a pointer to the proper documentation? (I did not find any topic about changing the model via Arduino.)

What have I done so far?

Logged into the LLM module via serial
Ran "ip a" to get its IP address
Connected via ssh root@ip (through the Ethernet companion board)
Copied my SSH public key to the module, then connected via SSH with it
Successfully installed other models via apt-get install xxx
Rebooted (just in case)

Then I tested by sending serial text (from an M5Stack Core Grey running a simple app that forwards Serial to Serial2).
The sequence:

Reset:

{ "request_id": "11212155", "work_id": "sys", "action": "reset" }
{"created":1746310691,"data":"None","error":{"code":0,"message":"llm server restarting ..."},"object":"None","request_id":"11212155","work_id":"sys"}
{"request_id": "0","work_id": "sys","created": 1746310696,"error":{"code":0, "message":"reset over"}}
Then...

Load model:

{ "request_id": "3", "work_id": "llm", "action": "setup","object": "llm.setup", "data": { "model": "qwen2.5-1.5b-ax630c", "response_format": "llm.utf-8.stream", "input": "llm.utf-8", "enoutput": true, "max_token_len": 256, "prompt": "You are a knowledgeable assistant capable of answering various questions and providing information." } }
{"created":1746310710,"data":"None","error":{"code":-5,"message":"Model loading failed."},"object":"None","request_id":"3","work_id":"llm"}

But it works with:

{ "request_id": "3", "work_id": "llm", "action": "setup", "object": "llm.setup", "data": { "model": "qwen2.5-0.5B-prefill-20e", "response_format": "llm.utf-8.stream", "input": "llm.utf-8", "enoutput": true, "max_token_len": 256, "prompt": "You are a knowledgeable assistant capable of answering various questions and providing information." } }
{"created":1746310813,"data":"None","error":{"code":0,"message":""},"object":"None","request_id":"3","work_id":"llm.1004"}

kuriko

@erictiquet
have you checked this:
https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html

erictiquet

Hello Kuriko, Everyone,

Found the solution ;)

The best help came from:

chat.m5stack.com (keep up the good work on making it stable)
ChatGPT, to write a small Arduino .ino sketch that displays a web page (the code is below)
and the following page for the LLM JSON syntax (that's what is exchanged in the dialog):
https://github.com/m5stack/StackFlow/blob/main/doc/projects_llm_framework_doc/llm_llm_en.md

** First, install the new models; they should appear in /opt/m5stack/data/

Connect via SSH (as you would to a normal Linux server):

This requires plugging in the RJ45 cable, then either accessing the debug port via serial and typing "ip a" to get the IP address, or looking the address up on your DHCP server.
To be safe and to make the work easier, I suggest uploading your SSH key to the LLM module (ssh-copy-id).
The default login is root@<your ip> with password "123456"; change it once your key is loaded successfully.

** Then install the new models, for example:
apt-get install llm-model-llama3.2-1b-prefill-ax630c llm-model-qwen2.5-1.5b-p256-ax630c

In /opt/m5stack/data you should then see something like:

root@m5stack-LLM:/# ls -la /opt/m5stack/data
total 68
drwxrwxr-x 17 root root 4096 May 4 07:23 .
drwxrwxr-x 7 root root 4096 Feb 20 21:24 ..
drwxrwxr-x 2 root root 4096 Dec 5 17:03 audio
drwxrwxr-x 3 1000 1000 4096 May 4 04:48 llama3.2-1B-prefill-ax630c
drwxrwxr-x 2 root root 4096 Dec 5 17:03 melotts_zh-cn
drwxrwxr-x 2 root root 4096 May 4 07:25 models
drwxrwxr-x 2 1000 1000 4096 May 4 05:50 qwen2.5-0.5B-prefill-20e
drwxr-xr-x 3 1000 1000 4096 May 4 04:49 qwen2.5-1.5B-p256-ax630c
drwxrwxr-x 2 root root 4096 Dec 5 17:03 sherpa-ncnn-streaming-zipformer-20M-2023-02-17
drwxrwxr-x 2 root root 4096 Dec 5 17:03 sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23
drwxrwxr-x 2 root root 4096 Dec 5 17:03 sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01
drwxrwxr-x 2 root root 4096 Dec 5 17:03 sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01
drwxrwxr-x 2 root root 4096 Dec 5 17:03 single_speaker_english_fast
drwxrwxr-x 2 root root 4096 Dec 5 17:03 single_speaker_fast
drwxrwxr-x 2 root root 4096 Dec 5 17:03 yolo11n
drwxrwxr-x 2 root root 4096 Dec 5 17:03 yolo11n-pose
drwxrwxr-x 2 root root 4096 Dec 5 17:03 yolo11n-seg
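Judging from this listing, the folder name does not always match the package name: the package llm-model-llama3.2-1b-prefill-ax630c ended up as the folder llama3.2-1B-prefill-ax630c (upper-case "B"). Since the model is selected by its folder name, a small lookup can help. Below is a minimal sketch in standard C++ (no Arduino types, so it can be tested on a PC); resolveModelName is a hypothetical helper, and both the "llm-model-" prefix stripping and the case-insensitive comparison are assumptions drawn from this one listing, not documented behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lower-case an ASCII string so names can be compared case-insensitively.
std::string toLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Given the folder names found in /opt/m5stack/data and a requested model or
// apt package name, return the matching folder name, or "" if none matches.
// Assumption: package name == "llm-model-" + folder name, ignoring case.
std::string resolveModelName(const std::vector<std::string>& installed,
                             std::string requested) {
    const std::string prefix = "llm-model-";
    if (requested.rfind(prefix, 0) == 0) {
        requested.erase(0, prefix.size());  // drop the package prefix
    }
    for (const auto& name : installed) {
        if (name == requested) return name;  // an exact match wins
    }
    const std::string wanted = toLowerAscii(requested);
    for (const auto& name : installed) {
        if (toLowerAscii(name) == wanted) return name;
    }
    return "";
}
```

With the folders listed above, resolveModelName(folders, "llm-model-llama3.2-1b-prefill-ax630c") gives "llama3.2-1B-prefill-ax630c", while the name tried in the first post, qwen2.5-1.5b-ax630c, matches nothing (the installed folder is qwen2.5-1.5B-p256-ax630c), which is consistent with the "Model loading failed." reply.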

Watch the MMC space with the "df" command: with 2 more models installed, usage reaches 74% of the available space.
(How to use an SD card for additional storage space is another topic to tackle.)

root@m5stack-LLM:/# df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/root       29289340 21660980   7611976  74% /
tmpfs             490876        0    490876   0% /dev/shm
tmpfs             196352      876    195476   1% /run
tmpfs               5120        0      5120   0% /run/lock
tmpfs             490876        0    490876   0% /tmp
/dev/mmcblk1p1  30554112     3424  30550688   1% /mnt/mmcblk1p1
tmpfs              98172        0     98172   0% /run/user/0
root@m5stack-LLM:/#
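To check the 74% figure: df counts in 1K blocks, and, assuming the module's df is the GNU coreutils one (likely on a system managed with apt-get), Use% is Used / (Used + Available) rounded up, so the 1K-blocks total is not used directly. The 1% shown for /run, where only 876 of 196352 blocks are used, is consistent with rounding up. A minimal C++ sketch of that arithmetic; dfUsePercent and kibToGiB are hypothetical names.

```cpp
#include <cstdint>

// Percentage shown in df's "Use%" column (GNU coreutils behavior, assumed):
// used / (used + available), rounded up to the next whole percent.
// Returns -1 when there is nothing to report (used + available == 0).
int dfUsePercent(std::uint64_t usedKiB, std::uint64_t availableKiB) {
    const std::uint64_t total = usedKiB + availableKiB;
    if (total == 0) return -1;
    const std::uint64_t scaled = usedKiB * 100;
    return static_cast<int>(scaled / total + (scaled % total != 0 ? 1 : 0));
}

// Convert df's 1K blocks to GiB for a more readable free-space figure.
double kibToGiB(std::uint64_t kib) {
    return static_cast<double>(kib) / (1024.0 * 1024.0);
}
```

For /dev/root, dfUsePercent(21660980, 7611976) gives 74, and kibToGiB(7611976) is about 7.26 GiB still free for further models.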

Then, to use a model, set "model" to exactly the name of the folder it was installed in,
e.g. llama3.2-1B-prefill-ax630c (note the upper-case "B" in the folder name, while the package name uses "1b").
To play with the model, you can use the .ino web page and enter the following JSON:

{
"request_id": "2",
"work_id": "llm",
"action": "setup",
"object": "llm.setup",
"data": {
"model": "llama3.2-1B-prefill-ax630c",
"response_format": "llm.utf-8.stream",
"input": "llm.utf-8",
"enoutput": true,
"max_token_len": 256,
"prompt": "You are a helpful AI assistant."
}
}
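If you would rather build this request in code than type it, here is a minimal sketch in standard C++ (the .ino sketch's language, but without Arduino types so it can be tested on a PC). makeSetupRequest and jsonEscape are hypothetical helper names; the field names and values are copied from the JSON above, and the model argument should be the exact folder name from /opt/m5stack/data.

```cpp
#include <string>

// Escape a value for use inside a JSON string literal: quotes, backslashes
// and the common control characters \n, \r and \t.
std::string jsonEscape(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// Build a compact, single-line "setup" request with the same fields as above.
std::string makeSetupRequest(const std::string& requestId,
                             const std::string& model,
                             const std::string& prompt,
                             int maxTokenLen = 256) {
    return std::string("{\"request_id\":\"") + jsonEscape(requestId) +
           "\",\"work_id\":\"llm\",\"action\":\"setup\",\"object\":\"llm.setup\","
           "\"data\":{\"model\":\"" + jsonEscape(model) +
           "\",\"response_format\":\"llm.utf-8.stream\",\"input\":\"llm.utf-8\","
           "\"enoutput\":true,\"max_token_len\":" + std::to_string(maxTokenLen) +
           ",\"prompt\":\"" + jsonEscape(prompt) + "\"}}";
}
```

On the Core, the resulting string could be passed to LLM.println(...) (via c_str()) in the same way handleSend() forwards the text typed in the web page.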

You should receive a reply like:

{"created":1746846795,"data":"None","error":{"code":0,"message":""},"object":"None","request_id":"2","work_id":"llm.1004"}
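Picking the work_id out of this reply, and checking error.code (0 here, -5 for the failed setup in the first post), can also be done in code. A naive sketch in standard C++, assuming the compact formatting of these replies (no space after the colon); extractStringField and extractErrorCode are hypothetical names, and a proper JSON library such as ArduinoJson would be the robust choice on the Core.

```cpp
#include <cstdlib>
#include <string>

// Naive extraction of a string field such as "work_id" from a flat JSON reply.
// Does not handle escaped quotes inside the value or whitespace after ':'.
std::string extractStringField(const std::string& json, const std::string& key) {
    const std::string needle = "\"" + key + "\":\"";
    const std::size_t start = json.find(needle);
    if (start == std::string::npos) return "";
    const std::size_t valueStart = start + needle.size();
    const std::size_t end = json.find('"', valueStart);
    if (end == std::string::npos) return "";
    return json.substr(valueStart, end - valueStart);
}

// Read error.code (0 means success in the replies shown in this thread).
// Returns `fallback` when the field is missing.
int extractErrorCode(const std::string& json, int fallback = -9999) {
    const std::string needle = "\"code\":";
    const std::size_t start = json.find(needle);
    if (start == std::string::npos) return fallback;
    return std::atoi(json.c_str() + start + needle.size());
}
```

For the reply above, extractStringField(reply, "work_id") yields "llm.1004", which is the value to put in the inference request.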

Take the work_id value at the end of that reply (here "llm.1004"; "llm.xxxx" below stands for yours) and send a prompt:

{
"request_id": "2",
"work_id": "llm.xxxx",
"action": "inference",
"object": "llm.utf-8.stream",
"data": {
"delta": "What's ur name?",
"index": 0,
"finish": true
}
}
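The inference request can be generated the same way. Another minimal sketch in standard C++: makeInferenceRequest and jsonEscape are hypothetical helpers, the field values are copied from the example above, and workId is whatever setup returned (llm.1004 in this thread). Sending the whole question as one chunk with index 0 and finish true mirrors the example; splitting longer input into several chunks is not covered here.

```cpp
#include <string>

// Escape quotes, backslashes and \n, \r, \t for a JSON string literal.
std::string jsonEscape(const std::string& in) {
    std::string out;
    for (char c : in) {
        if (c == '"' || c == '\\') { out += '\\'; out += c; }
        else if (c == '\n') out += "\\n";
        else if (c == '\r') out += "\\r";
        else if (c == '\t') out += "\\t";
        else out += c;
    }
    return out;
}

// Build the single-chunk "inference" request shown above.
std::string makeInferenceRequest(const std::string& requestId,
                                 const std::string& workId,
                                 const std::string& question) {
    return std::string("{\"request_id\":\"") + jsonEscape(requestId) +
           "\",\"work_id\":\"" + jsonEscape(workId) +
           "\",\"action\":\"inference\",\"object\":\"llm.utf-8.stream\","
           "\"data\":{\"delta\":\"" + jsonEscape(question) +
           "\",\"index\":0,\"finish\":true}}";
}
```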

Then you will see something like:

{"created":1746846972,"data":{"delta":"I'm an","finish":false,"index":0},"error":{"code":0,"message":""},"object":"llm.utf-8.stream","request_id":"2","work_id":"llm.1004"}
{"created":1746846973,"data":{"delta":" artificial intelligence model","finish":false,"index":1},"error":{"code":0,"message":""},"object":"llm.utf-8.stream","request_id":"2","work_id":"llm.1004"}
{"created":1746846974,"data":{"delta":" known as L","finish":false,"index":2},"error":{"code":0,"message":""},"object":"llm.utf-8.stream","request_id":"2","work_id":"llm.1004"}
{"created":1746846974,"data":{"delta":"lama. L","finish":false,"index":3},"error":{"code":0,"message":""},"object":"llm.utf-8.stream","request_id":"2","work_id":"llm.1004"}
{"created":1746846975,"data":{"delta":"lama stands for","finish":false,"index":4},"error":{"code":0,"message":""},"object":"llm.utf-8.stream","request_id":"2","work_id":"llm.1004"}
{"created":1746846976,"data":{"delta":" \"Large Language","finish":false,"index":5},"error":{"code":0,"message":""},"object":"llm.utf-8.stream","request_id":"2","work_id":"llm.1004"}
{"created":1746846976,"data":{"delta":" Model Meta AI","finish":false,"index":6},"error":{"code":0,"message":""},"object":"llm.utf-8.stream","request_id":"2","work_id":"llm.1004"}
{"created":1746846977,"data":{"delta":".\"","finish":false,"index":7},"error":{"code":0,"message":""},"object":"llm.utf-8.stream","request_id":"2","work_id":"llm.1004"}
{"created":1746846977,"data":{"delta":"","finish":true,"index":8},"error":{"code":0,"message":""},"object":"llm.utf-8.stream","request_id":"2","work_id":"llm.1004"}
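The streamed reply arrives as one JSON line per chunk; concatenating the delta fields until the chunk with "finish":true gives the full answer. A minimal C++ sketch under these assumptions: each element of lines is one complete reply line, chunks arrive in index order, and only the simple escapes are decoded. extractDelta, isFinalChunk and assembleAnswer are hypothetical names; on the Core, a real JSON parser would be more robust.

```cpp
#include <string>
#include <vector>

// Pull the text of "delta" out of one streamed reply line, decoding
// \" \\ \/ \n \r \t. Other escapes (e.g. \uXXXX) are kept verbatim.
std::string extractDelta(const std::string& line) {
    const std::string needle = "\"delta\":\"";
    std::size_t i = line.find(needle);
    if (i == std::string::npos) return "";
    std::string out;
    for (i += needle.size(); i < line.size(); ++i) {
        const char c = line[i];
        if (c == '"') break;                     // end of the string value
        if (c == '\\' && i + 1 < line.size()) {  // escape sequence
            const char next = line[++i];
            if (next == 'n') out += '\n';
            else if (next == 'r') out += '\r';
            else if (next == 't') out += '\t';
            else if (next == '"' || next == '\\' || next == '/') out += next;
            else { out += '\\'; out += next; }   // leave unknown escapes as-is
        } else {
            out += c;
        }
    }
    return out;
}

// True when the line carries "finish":true, i.e. the last chunk.
bool isFinalChunk(const std::string& line) {
    return line.find("\"finish\":true") != std::string::npos;
}

// Join the deltas of a streamed answer up to the chunk marked finish:true.
std::string assembleAnswer(const std::vector<std::string>& lines) {
    std::string answer;
    for (const auto& line : lines) {
        answer += extractDelta(line);
        if (isFinalChunk(line)) break;
    }
    return answer;
}
```

On the Core, the lines could come from lastSerialMessage in the .ino sketch, split on newlines, assuming each reply ends with a newline as the log suggests.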

et voilà.

erictiquet

The .ino code I used:

#include <Arduino.h>
#include <M5Unified.h>
#include <WiFi.h>
#include <WebServer.h>

// ⚙️ WiFi configuration
const char* ssid = "your ssid";
const char* password = "your password";

// UART2 for the LLM module
HardwareSerial LLM(2); // GPIO16 = RX, GPIO17 = TX

WebServer server(80);
String lastSerialMessage = "";

// 🔐 Simple escaping to avoid HTML display problems
String htmlEscape(String text) {
  text.replace("&", "&amp;");  // must run first so later entities are not double-escaped
  text.replace("<", "&lt;");
  text.replace(">", "&gt;");
  text.replace("\"", "&quot;");
  text.replace("'", "&#39;");
  return text;
}

// 💻 Dynamic HTML page (%LAST_MESSAGE% is replaced with the escaped serial log)
String getHTMLPage() {
  String html = R"rawliteral(
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>M5Core Web Serial</title>
</head>
<body>
<h1>M5Core Web Serial</h1>
<textarea id="msg" rows="15" cols="70" placeholder="Your message here..."></textarea><br>
<button onclick="sendMessage()">Send</button>
<button onclick="clearOutput()">Clear</button>
<p><strong>Serial response:</strong></p>
<pre id="lastMessage" style="background:#eee; padding:10px; border:1px solid #ccc;">%LAST_MESSAGE%</pre>

  <script>
    function sendMessage() {
      const msg = document.getElementById("msg").value;
      fetch("/send", {
        method: "POST",
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
        body: "msg=" + encodeURIComponent(msg)
      }).then(response => response.text())
        .then(text => {
          document.getElementById("lastMessage").innerText = text;
        });
    }

    function clearOutput() {
      fetch("/clear").then(r => r.text()).then(txt => {
        document.getElementById("lastMessage").innerText = "";
      });
    }

    setInterval(() => {
      fetch("/last").then(r => r.text()).then(txt => {
        document.getElementById("lastMessage").innerText = txt;
      });
    }, 2000);
  </script>
</body>
</html>

)rawliteral";
  html.replace("%LAST_MESSAGE%", htmlEscape(lastSerialMessage));
  return html;
}

void handleRoot() {
  server.send(200, "text/html", getHTMLPage());
}

void handleSend() {
  if (server.hasArg("msg")) {
    String msg = server.arg("msg");
    LLM.println(msg);  // forward the JSON typed in the web page to the LLM module
    lastSerialMessage = "Sent: " + msg;
    server.send(200, "text/plain", "Sent: " + msg);
  } else {
    server.send(400, "text/plain", "Missing 'msg' argument");
  }
}

void handleLast() {
  server.send(200, "text/plain", lastSerialMessage);
}

void handleClear() {
  lastSerialMessage = "";
  server.send(200, "text/plain", "Cleared");
}

void setup() {
  M5.begin();
  M5.Lcd.setTextSize(2);
  M5.Lcd.println("Initializing...");

  Serial.begin(115200);                   // USB serial to the PC
  LLM.begin(115200, SERIAL_8N1, 16, 17);  // RX2, TX2 to the LLM module

  WiFi.begin(ssid, password);
  M5.Lcd.print("Connecting to WiFi");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    M5.Lcd.print(".");
  }

  M5.Lcd.println("\nConnected");
  M5.Lcd.println(WiFi.localIP());

  server.on("/", handleRoot);
  server.on("/send", HTTP_POST, handleSend);
  server.on("/last", handleLast);
  server.on("/clear", handleClear);
  server.begin();
  M5.Lcd.println("Web server running!");
}

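void loop() {
  server.handleClient();

  // PC -> LLM module: forward everything typed in the USB serial monitor
  while (Serial.available()) {
    char c = Serial.read();
    LLM.write(c);
    Serial.print(c);  // local echo
  }

  // LLM module -> PC, web page buffer and LCD; drain the whole UART buffer
  // each pass so long streamed replies are not cut off
  while (LLM.available()) {
    char c = LLM.read();
    Serial.print(c);
    lastSerialMessage += c;
    M5.Lcd.print(c);

    // Keep only the last 20000 characters to bound memory use
    if (lastSerialMessage.length() > 20000) {
      lastSerialMessage = lastSerialMessage.substring(lastSerialMessage.length() - 20000);
    }
  }
}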

erictiquet