Inference Frameworks [1]

- Inference execution engines (server)
  vLLM, TensorRT, DeepSpeed
- Inference execution engines (PC / edge / mobile)
  llama.cpp
  mlc-llm
  ollama
- Inference servers
  Triton Server, Ray
- Chat servers [2]
  FastChat, Xinference, ModelScope SWIFT

References
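1xx. Exploring LLM Application Development in One Article (18): Model Deployment and Inference (Framework Tools: Triton Server, RayLLM, OpenLLM)
1xx. Exploring LLM Application Development in One Article (16): Model Deployment and Inference (Framework Tools: TGI, vLLM, TensorRT-LLM, DS-MII)
1xx. An Overview of LLM Inference Frameworks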