llama.cpp
Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
Convert Hugging Face Model to GGUF
pip install -r requirements.txt
python convert_hf_to_gguf.py --help
# The MPS backend on Apple M1 does not support bf16, so export the weights as f16
python convert_hf_to_gguf.py ~/Documents/MODELS/Qwen2-0.5B --outfile ~/Documents/MODELS/qwen2-0.5b-fp16.gguf --outtype f16
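Optional sanity check before running the model: a GGUF file begins with the four ASCII bytes "GGUF", so reading the first four bytes is a cheap way to confirm the conversion wrote something plausible. This is only a minimal sketch. `is_gguf` is a hypothetical helper, not part of llama.cpp, and the path is assumed to be the `--outfile` used above. It checks the magic bytes only, not that the rest of the file is valid.

```shell
# Hypothetical helper (not part of llama.cpp): succeeds if the file
# starts with the 4-byte ASCII magic "GGUF".
is_gguf() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Path assumed from the conversion command above.
MODEL="$HOME/Documents/MODELS/qwen2-0.5b-fp16.gguf"

if is_gguf "$MODEL"; then
    echo "looks like a GGUF file: $MODEL"
else
    echo "missing or not a GGUF file: $MODEL"
fi
```

A missing file and a file with the wrong header both take the else branch, so the check is safe to run before the conversion has finished or succeeded.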
Run the model
./llama-cli -m ~/Documents/MODELS/qwen2-0.5b-fp16.gguf -p "Hi, who are you?" -n 128