Architecture Overview #
Application concepts [1] #
- Task - A remote function invocation.
- Object - An application value.
- Actor - A stateful worker process (an instance of a @ray.remote class).
- Driver - The program root, or the “main” program.
- Job - The collection of tasks, objects, and actors originating (recursively) from the same driver, and their runtime environment.
Design [1] #
- Components
  - One or more worker processes
  - A raylet
    - scheduler
    - object store
  - head node
    - Global Control Service (GCS)
    - driver process(es)
    - cluster-level services
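The component list can be pictured as a small data model. The sketch below is plain Python and does not use Ray itself. It assumes the layout described in the Ray architecture whitepaper: every node runs worker processes and a raylet (scheduler plus object store), and one node is designated the head node and additionally hosts the GCS, driver process(es), and cluster-level services. All class, function, and process names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Raylet:
    """Per-node process made up of a scheduler and an object store."""
    scheduler: str = "scheduler"
    object_store: str = "object store"


@dataclass
class HeadServices:
    """Extra processes hosted only on the head node (assumed layout)."""
    gcs: str = "Global Control Service (GCS)"
    drivers: List[str] = field(default_factory=list)
    cluster_services: List[str] = field(default_factory=list)


@dataclass
class Node:
    name: str
    workers: List[str]
    raylet: Raylet = field(default_factory=Raylet)
    head: Optional[HeadServices] = None

    @property
    def is_head(self) -> bool:
        return self.head is not None


def build_cluster(num_nodes: int, workers_per_node: int) -> List[Node]:
    """Build a toy cluster description; the first node becomes the head node."""
    if num_nodes < 1 or workers_per_node < 1:
        raise ValueError("need at least one node and one worker per node")
    nodes = [
        Node(
            name=f"node-{i}",
            workers=[f"worker-{i}-{w}" for w in range(workers_per_node)],
        )
        for i in range(num_nodes)
    ]
    # Placeholder process names, purely illustrative.
    nodes[0].head = HeadServices(
        drivers=["driver-0"],
        cluster_services=["cluster-service-0"],
    )
    return nodes


if __name__ == "__main__":
    for node in build_cluster(num_nodes=3, workers_per_node=2):
        role = "head" if node.is_head else "worker node"
        print(node.name, role, node.workers)
```

Modeling the head services as an optional field reflects that the head node is an ordinary node with extra processes, not a different kind of machine.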
Spark vs. Ray[10] #
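- Overall, the main difference between Ray and Spark lies in their level of abstraction. Spark abstracts and restricts parallelism: it does not let users write truly parallel applications, which gives the framework more control. Ray operates at a much lower level; it gives users more flexibility but is harder to program. In short, Ray reveals and exposes parallelism, while Spark abstracts and hides it.
- Architecturally, Spark follows the BSP (bulk synchronous parallel) model and is free of side effects, whereas Ray is essentially an RPC framework plus an actor framework plus an object store.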
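References #
1xx. Large-Scale Offline Inference Based on Ray, ByteDance (字节跳动基于 Ray 的大规模离线推理)
1xx. Ray Design Patterns (use View -> Mode)
1xx. A Powerful Tool for Large-Model Training and Deployment: An Introduction to the Principles of the Open-Source Distributed Computing Framework Ray (大模型训练部署利器–开源分布式计算框架Ray原理介绍)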
Spark vs. Ray #
Internal #
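1xx. Introduction to the Ray Distributed Computing Framework (Ray 分布式计算框架介绍)
1xx. An Interpretation of the Ray 1.0 Architecture (Ray 1.0 架构解读)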