(原理)AI平台

1-genai-platform.png

Step 2. Put in Guardrails #

Input guardrails

Output guardrails

Guardrail tradeoffs

Step 3. Add Model Router and Gateway #

Router

Gateway

Step 4. Reduce Latency with Cache #

Prompt cache

Exact cache

Semantic cache

Observability #

Metrics #

  • Time to First Token (TTFT): The time it takes for the first token to be generated.
  • Time Between Tokens (TBT): The interval between each token generation.
  • Tokens Per Second (TPS): The rate at which tokens are generated.
  • Time Per Output Token (TPOT): The time it takes to generate each output token.
  • Total Latency: The total time required to complete a response.

参考 #

Building A Generative AI Platform

【译】构建生成式 AI 平台概述

聊聊AI应用架构演进