Introduction
This article uses the Gateway API Inference Extension as the ext_proc server for Envoy.
Starting the ext_proc Server
The server is based on the Gateway API Inference Extension:
https://github.com/kubernetes-sigs/gateway-api-inference-extension.git
First, clone the code locally:
git clone https://github.com/kubernetes-sigs/gateway-api-inference-extension.git
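Enter the repository and roll back to an earlier commit, which contains the most basic implementation:
cd gateway-api-inference-extension
git reset --hard 90c2b645c1515e760e511d00ed9f6e5324084acc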

Change to the examples/poc/ext-proc directory.
Comment out the Kubernetes-related code in main.go.

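Start the service by passing the Pod IP and Pod name directly. The Pod name is arbitrary; the endpoint below is simply a DeepSeek service started in a vLLM container.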
go run main.go --podIPs 128.128.0.14:8000 --pods pod1
You will see the server periodically fetching metrics from the specified IP.
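To make the periodic metrics fetch more concrete, here is a minimal Go sketch of reading a gauge out of Prometheus text-format output, which is the format vLLM serves on its /metrics endpoint. It is not the code from the POC: the function name parseGauge and the sample payload are hypothetical, and the two metric names are assumed to be the ones your vLLM version exposes. In a real setup the body would come from an HTTP GET against something like http://128.128.0.14:8000/metrics.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleMetrics is a hypothetical excerpt of a vLLM /metrics response.
const sampleMetrics = `# HELP vllm:num_requests_waiting Number of requests waiting to be processed.
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"} 3.0
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc{model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"} 0.25
`

// parseGauge returns the value of the first sample named name.
// It is a simplified parser: labels are ignored and timestamps are skipped.
func parseGauge(body, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line, HELP or TYPE comment
		}
		metric, rest := "", ""
		if i := strings.IndexByte(line, '{'); i >= 0 {
			j := strings.LastIndexByte(line, '}')
			if j < i {
				continue // malformed label block
			}
			metric, rest = line[:i], line[j+1:]
		} else if k := strings.IndexByte(line, ' '); k >= 0 {
			metric, rest = line[:k], line[k+1:]
		} else {
			continue
		}
		fields := strings.Fields(rest)
		if metric != name || len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		return v, true
	}
	return 0, false
}

func main() {
	for _, name := range []string{"vllm:num_requests_waiting", "vllm:gpu_cache_usage_perc"} {
		if v, ok := parseGauge(sampleMetrics, name); ok {
			fmt.Printf("%s = %g\n", name, v)
		}
	}
}
```

A scheduler could use values like these (queue length, KV cache usage) to decide which Pod should receive the next request.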

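Configuring Envoy
Add the ext_proc configuration:
1. Add an HTTP filter of type ext_proc, placed before the router filter.
2. Add a cluster that points to the ext_proc service started above.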
admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: original_destination_cluster
          http_filters:
          - name: envoy.filters.http.ext_proc
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
              grpc_service:
                envoy_grpc:
                  cluster_name: ext_proc
              failure_mode_allow: true
              processing_mode:
                request_header_mode: SEND
                response_header_mode: SEND
                request_body_mode: BUFFERED_PARTIAL
                response_body_mode: BUFFERED_PARTIAL
                request_trailer_mode: SEND
                response_trailer_mode: SEND
              message_timeout: 1000s
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: ext_proc
    type: STATIC
    connect_timeout: 86400s
    http2_protocol_options: {}
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: ext_proc
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 10.42.16.26
                port_value: 9002
          load_balancing_weight: 1
  - name: original_destination_cluster
    type: ORIGINAL_DST
    connect_timeout: 1000s
    lb_policy: CLUSTER_PROVIDED
    circuit_breakers:
      thresholds:
      - max_connections: 40000
        max_pending_requests: 40000
        max_requests: 40000
    original_dst_lb_config:
      use_http_header: true
      #http_header_name: x-gateway-destination-endpoint
      http_header_name: target-pod
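With use_http_header enabled, the ORIGINAL_DST cluster takes the upstream address from the target-pod header, and my understanding is that Envoy expects that value to be a literal IP:port rather than a hostname or Pod name. The Go sketch below shows one way an ext_proc server could sanity-check a candidate value before setting the header. The function name validTarget is hypothetical and this check is not part of the POC.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validTarget reports whether v is a literal IP:port with a non-zero port,
// which is the form assumed here for the target-pod header value.
func validTarget(v string) bool {
	ap, err := netip.ParseAddrPort(v)
	return err == nil && ap.Port() != 0
}

func main() {
	candidates := []string{
		"128.128.0.14:8000", // Pod IP and port passed via --podIPs
		"pod1",              // Pod name only, not routable by ORIGINAL_DST
		"128.128.0.14",      // port missing
	}
	for _, c := range candidates {
		fmt.Printf("%-20s valid=%v\n", c, validTarget(c))
	}
}
```

This is why the --podIPs flag earlier carries both the IP and the port: that string is what ends up as the routing destination.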
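Restart the Envoy service.
Testing
You do not need to add the target-pod header to the request yourself; Envoy adds it automatically through the ext_proc filter.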
curl http://128.128.0.13:10000/v1/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
"prompt": "San Francisco is a",
"max_tokens": 7,
"temperature": 0
}' -v
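For callers that are Go programs rather than curl, the same request can be built with net/http. This is only a sketch: the type completionRequest and the function newCompletionRequest are hypothetical names, and the base URL is the Envoy listener address used in this article. Note that the client sets no target-pod header.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// completionRequest mirrors the JSON body used in the curl command.
type completionRequest struct {
	Model       string  `json:"model"`
	Prompt      string  `json:"prompt"`
	MaxTokens   int     `json:"max_tokens"`
	Temperature float64 `json:"temperature"`
}

// newCompletionRequest builds a POST to <baseURL>/v1/completions.
// The routing header is deliberately left out; the gateway is expected to add it.
func newCompletionRequest(baseURL string, body completionRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/completions", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCompletionRequest("http://128.128.0.13:10000", completionRequest{
		Model:       "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
		Prompt:      "San Francisco is a",
		MaxTokens:   7,
		Temperature: 0,
	})
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println("client set target-pod:", req.Header.Get("target-pod") != "")
	// To actually send it through Envoy: resp, err := http.DefaultClient.Do(req)
}
```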

References
In-Depth Analysis of the Envoy External Processing Filter (ext_proc) - Jimmy Song
External Processing Filter (proto) - envoy 1.36.0-dev-0c7818 documentation
