Llama.cpp

=Build llama.cpp=
<syntaxhighlight lang="bash">
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
</syntaxhighlight>
=Convert Hugging Face Model to GGUF=
<syntaxhighlight lang="bash">
pip install -r requirements.txt
python convert_hf_to_gguf.py --help
#M1 MPS does not support bf16
python convert_hf_to_gguf.py ~/Documents/MODELS/Qwen2-0.5B --outfile ~/Documents/MODELS/qwen2-0.5b-fp16.gguf --outtype f16
</syntaxhighlight>
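The f16 output roughly halves the size of an f32 checkpoint (M1's MPS backend cannot use bf16, hence f16 above). As a rough sketch, the bytes-per-weight arithmetic for a 0.5B-parameter model looks like this; the q4_0 figure assumes llama.cpp's 18-byte block per 32 weights, and the totals ignore metadata and per-tensor overhead, so real file sizes will differ somewhat:

```python
# Back-of-the-envelope GGUF size estimate for a 0.5B-parameter model.
# Bytes per weight: f32 = 4, f16 = 2; q4_0 packs 32 weights into an
# 18-byte block (32 nibbles + an fp16 scale) = 0.5625 bytes/weight.
PARAMS = 0.5e9

bytes_per_weight = {"f32": 4.0, "f16": 2.0, "q4_0": 18 / 32}

for fmt, bpw in bytes_per_weight.items():
    gb = PARAMS * bpw / 1e9
    print(f"{fmt}: ~{gb:.2f} GB")
```

This is why the f16 GGUF for Qwen2-0.5B lands around 1 GB, and why 4-bit quantization (next section) is attractive even for small models.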
=Run the model=
<syntaxhighlight lang="bash">
./llama-cli -m ~/Documents/MODELS/qwen2-0.5b-fp16.gguf -p "Hi, who are you?" -n 128
</syntaxhighlight>
=Quantize=
<ref>https://cloud.aigonna.com/2024/03/25/llama-cpp%E9%87%8F%E5%8C%96/</ref>
<syntaxhighlight lang="bash">
# Quantize the f16 GGUF to 4-bit (Q4_K_M); the output filename is illustrative
./llama-quantize ~/Documents/MODELS/qwen2-0.5b-fp16.gguf ~/Documents/MODELS/qwen2-0.5b-q4_k_m.gguf Q4_K_M
</syntaxhighlight>


[[Category:Deep Learning]]

Latest revision as of 06:15, 19 July 2024
