Ambient Mesh: Can Sidecar-less Istio Make Applications Faster?
Ambient mode is the new sidecar-less data plane introduced in Istio in 2022. When ambient mode reached Beta status in May this year, I watched users kick the tires and run load tests to understand the performance implications after adding their applications to the mesh. Inspired by Quentin Joly’s blog about the incredible performance of Istio in ambient mode and similar feedback from other users in the community that sometimes applications are slightly faster in ambient mode, I decided to validate these results myself.
Test Environment
I used a Kubernetes cluster with three worker nodes, each with 256GB RAM and a 32-core CPU.
Istio uses a few tools to make consistent benchmarking easy. First, we use a load testing tool called Fortio, which runs at a specified number of requests per second (RPS), records a histogram of execution time and calculates percentiles — e.g., P99, the response time where 99% of the requests took less than that number.
We also provide a sample app called Bookinfo, which includes microservices written in Python, Java, Node.js and Ruby.
Each of the Bookinfo deployments has two replicas, which are evenly distributed across the three worker nodes. Using a pod anti-affinity rule, I made sure that Fortio was placed on a different node than the details service.
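To make the percentile figures used throughout this post concrete, here is a minimal sketch of a nearest-rank percentile calculation. It is an illustration only: Fortio derives its percentiles from a histogram, so its exact method may differ, and the latency values below are hypothetical rather than measurements from these tests.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # 1-based rank; multiply before dividing to limit rounding error.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds (not data from this article).
latencies_ms = [0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1.0, 1.2, 2.7]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
print(f"P50={p50}ms P90={p90}ms P99={p99}ms")
```

Note how a single slow request dominates P99 in a small sample, which is one reason the P99 column varies more between runs than the average or P90.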
Initial Test Result
I installed the Bookinfo application from the Istio v1.22.3 release. Using the Fortio tool to drive load to individual Bookinfo services (for example, details) or the full Bookinfo app, I noticed near-zero latency impact after adding everything to the ambient mesh. Most of the time, the increase in the average or P90 latency is within 0-5%. I also noticed that the details service is consistently slightly faster in Istio ambient mode, just like Quentin reported in his blog.
Load Testing the Details Service
I did the same test as Quentin, sending 100 RPS via 10 connections to the details service, and collected results for no mesh and ambient.
No Mesh: 100 RPS to the details service.

Ambient: 100 RPS to the details service.
| Fortio to details | Average | P50 | P75 | P90 | P99 | Differences |
| No Mesh run 1 | 0.89ms | 0.64ms | 0.74ms | 0.85ms | 2.67ms | 11% slower on average and 5% slower for P90 |
| Ambient run 1 | 0.80ms | 0.6ms | 0.71ms | 0.81ms | 1.4ms | |
| No Mesh run 2 | 0.86ms | 0.65ms | 0.75ms | 0.86ms | 1.71ms | 6% slower on average and 4% slower for P90 |
| Ambient run 2 | 0.81ms | 0.61ms | 0.72ms | 0.83ms | 1.56ms | |
| No Mesh run 3 | 0.90ms | 0.65ms | 0.76ms | 0.88ms | 1.92ms | 10% slower on average and 5% slower for P90 |
| Ambient run 3 | 0.82ms | 0.63ms | 0.72ms | 0.84ms | 1.5ms | |
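The percentages in the Differences column can be reproduced with simple arithmetic. The sketch below assumes the difference is expressed relative to the faster (ambient) value, which is the reading that matches the run 1 figures; the function name is my own.

```python
def percent_slower(slower_ms, faster_ms):
    """Return how much larger slower_ms is than faster_ms, in percent."""
    return (slower_ms - faster_ms) / faster_ms * 100


# (no mesh, ambient) values taken from run 1 in the table above.
run1_average = percent_slower(0.89, 0.80)
run1_p90 = percent_slower(0.85, 0.81)

print(f"average: {run1_average:.0f}% slower, P90: {run1_p90:.0f}% slower")
```

With differences this small in absolute terms (under 0.1ms), rounding in the reported latencies can shift the percentage by a point or so.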
Why Are Apps Sometimes Faster in the Ambient Mesh?
We’ve been taught that service meshes add latency. Quentin’s results, replicated here, show a case where a workload is faster when running through a service mesh. What is happening?
First Theory
When your applications are in the ambient mesh, the load requests travel first through a lightweight local node proxy called ztunnel, then to the destination ztunnel, and onward to the service. The details service uses HTTP/1.1 with the WEBrick library in Ruby, and we have seen poor connection management and keep-alive behavior in older or poorly configured HTTP libraries. My first hypothesis was that when the client and server are on different nodes, proxying through client and server ztunnels could actually be faster if the applications are not using efficient HTTP/2 connections. Ztunnel uses connection pooling and HTTP CONNECT to establish secure tunnels between nodes, leveraging parallelism and HTTP/2 stream multiplexing under load.
However, this theory has some challenges. Why have I only observed this consistently with the details service but not any other Bookinfo services?
Researching further, I discovered that our Fortio load tool has connection keep-alive enabled by default. With 10 connections from Fortio to the details service and the details service (using the WEBrick Ruby library) respecting the connection keep-alive settings, the connections can be reused effectively without ambient.
Load Testing With Connection Close
Next, I ran the same load test with the `Connection: close` header set. This forcibly disables any HTTP connection pooling, which is a good way to test this hypothesis.
curl -v -d '{"metadata": {"url":"http://details:9080/details/0", "c":"10", "qps": "100", "n": "2000", "async":"on", "save":"on"}}' \
  "localhost:8081/fortio/rest/run?jsonPath=.metadata" -H "Connection: close"
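To see what `Connection: close` changes at the transport level, here is a minimal, self-contained sketch using only the Python standard library. It starts a throwaway local HTTP/1.1 server (a stand-in, not the details service), sends the same number of requests with and without the header, and counts how many TCP connections the server accepted. The handler, the helper name and the request count are all assumptions made for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

connections_opened = 0
lock = threading.Lock()


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows keep-alive on the server side

    def setup(self):
        # setup() runs once for every accepted TCP connection.
        global connections_opened
        with lock:
            connections_opened += 1
        super().setup()

    def do_GET(self):
        body = b'{"id":0}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def run_requests(port, count, close_each):
    """Send `count` GETs; return the number of TCP connections the server saw."""
    global connections_opened
    with lock:
        connections_opened = 0
    headers = {"Connection": "close"} if close_each else {}
    conn = None
    for _ in range(count):
        if conn is None:
            conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", "/details/0", headers=headers)
        conn.getresponse().read()
        if close_each:
            # The server closes after responding, so open a fresh connection.
            conn.close()
            conn = None
    if conn is not None:
        conn.close()
    with lock:
        return connections_opened


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

reused = run_requests(port, 20, close_each=False)
closed = run_requests(port, 20, close_each=True)

server.shutdown()
server.server_close()
print(f"keep-alive: {reused} connection(s), Connection: close: {closed} connection(s)")
```

Every request in the second run pays for a new TCP handshake, which is consistent with the higher latencies in the connection-close results for both no mesh and ambient.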

No Mesh: Fortio to the details service 100 RPS 10 connections with connection close.

Ambient: Fortio to the details service 100 RPS 10 connections with connection close.
| Fortio to details | Average | P50 | P75 | P90 | P99 | Differences |
| No Mesh | 1.90ms | 1.72ms | 2.28ms | 2.77ms | 3.98ms | |
| Ambient | 2.06ms | 2.15ms | 2.65ms | 2.94ms | 4ms | 8% slower for average & 6% slower for P90 |
Second Theory
I noticed a performance-related PR from John Howard for the details and productpage services of the Bookinfo application in our new Istio v1.23 release. For the details service, the PR enabled the TCP_NODELAY flag for the details WEBrick server, which removes an unnecessary delay (up to 40ms) from the response time of the details service. For the productpage service, the PR enabled keep-alive on incoming requests, which reuses existing incoming connections and thus improves performance.
With the newly updated details deployment that includes the fix, I repeated the same tests, sending 100 RPS via 10 connections to the details service. The results for no mesh and ambient are really close, so I ran each of the tests three times to ensure the results are consistent. Below are screenshots of the first run for each scenario:
No Mesh: Fortio to the new details service 100 RPS 10 connections.

Ambient: Fortio to the new details service 100 RPS 10 connections.
| Run | Fortio to details | Average | P50 | P75 | P90 | P99 | Differences |
| 1 | No Mesh | 0.76ms | 0.58ms | 0.69ms | 0.81ms | 1.56ms | 5% slower on average and P90. 25% slower on P99 |
| 1 | Ambient | 0.72ms | 0.57ms | 0.66ms | 0.76ms | 1.24ms | |
| 2 | No Mesh | 0.72ms | 0.59ms | 0.7ms | 0.82ms | 1.6ms | 3% slower on P90 and 18% slower on P99 |
| 2 | Ambient | 0.76ms | 0.59ms | 0.69ms | 0.8ms | 1.37ms | 5% slower on average |
| 3 | No Mesh | 0.77ms | 0.58ms | 0.7ms | 0.8ms | 1.49ms | 1% slower on average and 8% slower on P99 |
| 3 | Ambient | 0.76ms | 0.59ms | 0.69ms | 0.81ms | 1.38ms | 1% slower on P90 |
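For readers unfamiliar with the TCP_NODELAY flag mentioned in this section: it disables Nagle's algorithm, which otherwise may hold back small writes while waiting for earlier data to be acknowledged. The sketch below shows how the option is set on a plain socket in Python. It is not the code from the PR (the details service is written in Ruby); it only illustrates the socket option itself.

```python
import socket


def nodelay_before_and_after():
    """Create a TCP socket, enable TCP_NODELAY, and report the option
    value before and after the change."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        before = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
        # A non-zero value disables Nagle's algorithm for this socket.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        after = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
    finally:
        sock.close()
    return before, after


before, after = nodelay_before_and_after()
print(f"TCP_NODELAY before: {before}, after: {after}")
```

A server that writes its response headers and body in separate small writes is the typical case where this option matters, because the second write may otherwise be delayed.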
Third Theory
Continuing to review the test results from Table 3, why would there be similar latency between no mesh and ambient when there are extra hops to ztunnel pods and significant benefits provided by ambient, such as mTLS and L4 observability between Fortio and the details service? For the P99 case, why would the details service in ambient mode be consistently faster? Ztunnel provides great read/write buffer management with HTTP/2 multiplexing, which could effectively minimize or sometimes even eliminate the overhead added by the extra hops through the client and the server ztunnel pods. I decided to measure this at the syscall level by logging in to the Kubernetes worker nodes of Fortio and the details service and attaching strace to their PIDs, while filtering out the irrelevant traces:
strace -fp {pid} -e trace=write,writev,read,recvfrom,sendto,readv
The strace output from the details service is similar for the no-mesh and ambient cases:
…
read(9, "GET /details/0 HTTP/1.1\r\nHost: d"..., 8192) = 118
write(9, "HTTP/1.1 200 OK\r\nContent-Type: a"..., 180) = 180
write(9, "{\"id\":0,\"author\":\"William Shakes"..., 178) = 178
write(2, "192.168.239.19 - - [13/Aug/2024:"..., 80) = 80
…
Output 1: No mesh or ambient — attach strace to the details service’s PID.
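When comparing traces like this across scenarios, it helps to tally the syscalls rather than read them line by line. Below is a minimal sketch of such a tally. It assumes the default strace line format (optionally prefixed with `[pid N]` when `-f` is used) and simply ignores lines it does not recognize, such as resumed or unfinished calls. The sample lines are the ones shown in Output 1.

```python
import re
from collections import Counter

# Matches e.g. 'read(9, ...' or '[pid 1234] write(9, ...'.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def count_syscalls(strace_lines):
    """Count syscalls by name in an iterable of strace output lines."""
    counts = Counter()
    for line in strace_lines:
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    r'read(9, "GET /details/0 HTTP/1.1\r\nHost: d"..., 8192) = 118',
    r'write(9, "HTTP/1.1 200 OK\r\nContent-Type: a"..., 180) = 180',
    r'write(9, "{\"id\":0,\"author\":\"William Shakes"..., 178) = 178',
    r'write(2, "192.168.239.19 - - [13/Aug/2024:"..., 80) = 80',
]

counts = count_syscalls(sample)
print(dict(counts))
```

In practice the input would come from a file captured with strace's `-o` option, read line by line, instead of the inline sample.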
The strace outputs from the Fortio service for no-mesh vs ambient are different. In the no-mesh case, we see Fortio executed two reads, one for the HTTP headers and another for the body.
…
read(13, "HTTP/1.1 200 OK\r\nContent-Type: a"..., 4096) = 180
read(13, "{\"id\":0,\"author\":\"William Shakes"..., 4096) = 178
…
write(19, "GET /details/0 HTTP/1.1\r\nHost: d"..., 118) = 118
…
Output 2: No mesh — attach strace to Fortio’s PID.
In the ambient case we consistently see just one read for both the headers and the body.
…
read(19, "HTTP/1.1 200 OK\r\nContent-Type: a"..., 4096) = 358
…
write(19, "GET /details/0 HTTP/1.1\r\nHost: d"..., 118) = 118
…
Output 3: Ambient mesh - attach strace to Fortio’s PID.
Why would this happen? It makes sense that the write calls are unchanged since they are entirely based on the application behavior which is not changed in this case. Ambient coalesces these multiple application writes and converts them into a single network write and by implication a single read in the peer.
In the test scenario above I observed a 60% reduction in total syscalls by the Fortio service with ambient enabled. This is very substantial and explains the majority of the improvement in latency and ~25% CPU reduction of the Fortio pod at peak time with ambient. The reduction in syscalls is more than offsetting the cost of mTLS and the other features of ztunnel. I expect this pattern to be quite common in enterprises with some HTTP libraries and applications doing a better job of buffering and flushing and some not so much. Often this will correlate with the age of applications and the SDKs they were built on.
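The coalescing effect can be modeled in a few lines. The sketch below is a conceptual illustration only, not ztunnel's implementation: a hypothetical `CoalescingWriter` buffers the application's two small writes (180 bytes of headers and 178 bytes of body, the sizes seen in the strace output) and forwards them as one 358-byte write, matching the single read observed by Fortio in ambient mode.

```python
class CountingSink:
    """Stands in for a socket and records every write it receives."""

    def __init__(self):
        self.writes = []

    def write(self, data):
        self.writes.append(bytes(data))
        return len(data)


class CoalescingWriter:
    """Buffers small writes and forwards them as a single write on flush()."""

    def __init__(self, sink):
        self._sink = sink
        self._buffer = bytearray()

    def write(self, data):
        self._buffer += data
        return len(data)

    def flush(self):
        if self._buffer:
            self._sink.write(self._buffer)
            self._buffer = bytearray()


headers = b"H" * 180  # placeholder for the 180-byte response headers
body = b"B" * 178     # placeholder for the 178-byte response body

# Without coalescing: each application write reaches the sink separately.
direct = CountingSink()
direct.write(headers)
direct.write(body)

# With coalescing: both writes are merged before reaching the sink.
coalesced = CountingSink()
writer = CoalescingWriter(coalesced)
writer.write(headers)
writer.write(body)
writer.flush()

print(len(direct.writes), "writes vs", len(coalesced.writes), "write of",
      len(coalesced.writes[0]), "bytes")
```

Since TCP is a byte stream, one write on the sender does not strictly guarantee one read on the receiver, but for payloads this small it is the likely outcome.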

No mesh and ambient runs: Fortio to the details service 100 QPS 10 connections.
What About the Entire Bookinfo Application?
With the newly updated details and productpage deployments, I started with sending 1000 RPS via 100 connections to the Bookinfo application, and observed great results for no mesh and ambient.
No Mesh: Fortio to the new Bookinfo app 1000 RPS 100 connections.

| Fortio to Bookinfo | Average | P50 | P75 | P90 | P99 | Average Differences |
| No Mesh | 1.39ms | 1.32ms | 1.42ms | 1.67ms | 2.19ms | |
| Ambient | 1.40ms | 1.34ms | 1.48ms | 1.68ms | 2.94ms | Less than 1% slower for average and P90 |
| Fortio to Bookinfo | Average | P50 | P75 | P90 | P99 | Average Differences |
| No Mesh | 6.35ms | 4.68ms | 7.44ms | 11.4ms | 36.63ms | |
| Ambient | 6.74ms | 4.9ms | 7.79ms | 12.12ms | 41.14ms | 6% slower |

Ambient: Fortio to the new Bookinfo app 4000 RPS 400 connections.

| Fortio to Bookinfo | Average | P50 | P75 | P90 | P99 | Average Differences |
| No Mesh | 1.54ms | 1.33ms | 1.54ms | 2.25ms | 3.98ms | |
| Ambient | 1.58ms | 1.37ms | 1.57ms | 2.33ms | 4.9ms | 3% slower on average and 4% slower on P90 |