LLM via Termux?

2 minute read

So after recently getting Termux installed on my phone to support various blogging endeavours, my next question was naturally:

“How easy would it be to run an LLM in Termux?”

How to run an LLM in Termux

As it turns out, it was both quick and easy. Note that YMMV depending on your phone’s hardware specifications.

Step 0: Dependencies

There are a few things we're going to need:

  • Termux
  • wget: pkg install wget
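
If you're working from a fresh Termux install, it's worth refreshing the package index before installing anything. A minimal sketch of the setup, using the standard Termux pkg commands:

# Refresh Termux's package lists and upgrade anything stale,
# then install wget for downloading files over HTTPS
pkg update && pkg upgrade
pkg install wget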

Step 1: Download llamafile

llamafile is a very easy way to run LLMs, so to get started, download the latest release from the project page:

# 0.8.9 at the time of writing
wget https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.9/llamafile-0.8.9
chmod +x llamafile-0.8.9
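
As a quick sanity check, you can ask the binary to print its version; assuming the release supports the usual --version flag, a failure here usually means an incomplete download:

# Should print the llamafile version; if it errors, re-download the binary
./llamafile-0.8.9 --version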

Step 2: Download an LLM

As we’re on a phone, I’ve opted to use a lightweight model: Microsoft’s excellent Phi 3 Mini 4k Instruct.

# -O keeps the ?download=true query string out of the saved filename
wget -O Phi-3-mini-4k-instruct-q4.gguf 'https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf?download=true'
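
The q4 quantisation of this model weighs in at a couple of gigabytes, so it's worth confirming the download completed before moving on:

# Check the model file is present and roughly the expected size (~2-3 GB)
ls -lh Phi-3-mini-4k-instruct-q4.gguf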

Step 3: Run it up

With llamafile in place and an LLM downloaded, we're ready to go:

./llamafile-0.8.9 -m Phi-3-mini-4k-instruct-q4.gguf

Once that loads, your system default browser should automatically open with an interface for interacting with the LLM: http://127.0.0.1:8080/
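
If you'd rather stay in the terminal, the same local server can be queried with curl. A minimal sketch, assuming llamafile exposes the llama.cpp-style /completion endpoint on the default port (and that curl is installed via pkg install curl):

# POST a one-off prompt to the local server and print the JSON response
curl http://127.0.0.1:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Explain Termux in one sentence.", "n_predict": 64}'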

What does Phi 3 think?

So before posting, I decided to ask Phi 3 what it thought of this blog post. The conversation is included below, and ran entirely locally on my phone:

User: Please review this blog post: <Contents of this post removed for brevity>

Llama: This blog post looks great and provides a clear step-by-step guide on how to run a Large Language Model (LLM) like Microsoft's Phi 3 Mini 4k Instruct using the llamafile toolkit, specifically tailored for mobile devices. The instructions are well organized, making it easy for readers to follow along and set up their own LLM environment on a phone.

Here is some feedback that could further enhance your blog post:

1. Include screenshots or visuals during each step of the process to make it more user-friendly and visually engaging, especially since you're targeting mobile users who might be new to this topic.

2. Provide information about potential limitations when running LLM on a phone compared to using desktop computers with higher computational power. This will help readers set realistic expectations for the performance of their models in different environments.

3. Mention alternative lightweight language models that users can consider if they are not specifically interested in Microsoft's Phi 3 Mini 4k Instruct, to give them options beyond this model alone.

4. Include a brief explanation or resources for readers who may want more advanced usage of the LLM once it is set up on their phone (e.g., integrating with other applications).

5. Discuss any security concerns that might arise when running an LLM like Phi 3 Mini 4k Instruct on a mobile device, and suggest best practices for keeping users' data safe during the process.

Overall, your blog post is informative and provides valuable insights into setting up a personal AI environment using llamafile toolkit; these suggested improvements will help you make it even more comprehensive and helpful to readers in different use cases and contexts.

Honestly, pretty good feedback - particularly from such a small model.

For completeness, here's the performance I was getting on an S21 Ultra while generating the above:

386 tokens predicted, 803 ms per token, 1.25 tokens per second