(实战)推理 Ray

实战 #

环境 #

modelscope GPU

实战1 #

  • 脚本[1]

  • 遇到的异常[2]

实战2 #

  • 脚本
### 变更模型名字

### import 'modelscope' package
  • 异常[11]

实战3[20] #

  • 脚本
    vllm 0.2.3 -> 报异常
    vllm 0.3.3 -> 报另一个异常

实战4 #

  • 脚本 [30]

  • 异常 [31]

# 运行这个命令报异常
python -m vllm.entrypoints.openai.api_server --trust-remote-code --served-model-name gpt-4 --model mistralai/Mixtral-8x7B-Instruct-v0.1 --gpu-memory-utilization 1 --tensor-parallel-size 8 --port 8000

monitor[40] #

Ray Dashboard[41] #

Ray logging #

Loki grafana

Built-in Ray Serve metrics #

Prometheus

参考 #

实战1 #

  1. Serve a Large Language Model with vLLM

  2. Invalid device id when using pytorch dataparallel! 运行时碰到的异常

实战2 #

  1. examples/offline_inference_distributed.py

  2. 报错:RuntimeError: CUDA error: no kernel image is available for execution on the device

实战3 #

  1. Ray vLLM Interence

1xx. GitHub - ray-project/langchain-ray: Examples on how to use LangChain and Ray git

实战4 #

  1. 在甲骨文云上用 Ray +Vllm 部署 Mixtral 8*7B 模型_mixtral 8x7b 部署-CSDN博客

  2. 报错:RuntimeError: CUDA error: no kernel image is available for execution on the device-CSDN博客

monitor #

  1. Monitor Your Application

  2. Ray Dashboard