
Velvet AI gateway latency benchmarks

When building AI features, latency is a critical metric to optimize. Velvet’s proxy latency is nominal, and with our caching feature enabled, response times improve by more than 50%.

Latency is nominal, plus a 50% improvement with caching

Latency is the delay between a user action and the system response. When building on AI models, there are additional factors to consider - including inference speed, token generation, and prompt construction. Read OpenAI’s docs on latency optimization to learn more.

Velvet operates as a proxy, so it’s critical that we don’t add unnecessary latency to requests. We ran an experiment to test average and p99 latency.
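The measurement loop itself is simple: time each request against a given endpoint and aggregate the results. Below is a minimal TypeScript sketch of that loop (the model, prompt, and endpoint handling are illustrative assumptions, not our exact harness):

```typescript
// Minimal latency probe: time n chat completions against one endpoint.
// baseUrl is either api.openai.com or your gateway URL (placeholder here).
async function measureLatencies(
  baseUrl: string,
  apiKey: string,
  n = 100
): Promise<number[]> {
  const latencies: number[] = [];
  for (let i = 0; i < n; i++) {
    const start = performance.now();
    await fetch(`${baseUrl}/v1/chat/completions`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "gpt-4o-mini", // assumed model for a concise completion
        messages: [{ role: "user", content: "Say hello in one word." }],
      }),
    });
    latencies.push(performance.now() - start); // wall-clock ms per request
  }
  return latencies;
}
```

Running the same loop against the OpenAI API directly and against the gateway, then differencing the aggregate statistics, produces the deltas reported below.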

In summary, we found that Velvet’s gateway latency is nominal - between 200ms and 300ms per request on average, with minimums as low as 85ms. With caching, response times improve by 50% or more on concise chat completions (with a larger benefit on longer completions). Velvet’s latency should be imperceptible to end users.

Velvet’s latency benchmarks

We benchmarked Velvet’s gateway latency relative to industry standards.

Test conditions

  • Network: a crowded coffee shop connection with 40-100ms of loaded latency
  • 100 requests per test
  • Concise chat completion example
  • No gaming of results: these are first-shot attempts across 3 scenarios

Definitions

  • Latency: Delay between a user action and the system response
  • Response caching: Return a stored response to a repeated request, without additional inference cost
  • p99: 99% of requests will be faster than the given number
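For reference, the percentile figures below can be reproduced with a simple nearest-rank calculation over the recorded latencies (a sketch; we're assuming nearest-rank rather than an interpolated method):

```typescript
// Nearest-rank percentile: the value that p% of samples fall at or below.
// percentile(latencies, 99) returns the p99 latency in milliseconds.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}
```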

TLDR

The average latency delta for a chat completion between OpenAI and Velvet is 208ms, with a p99 delta of 231ms. Caching decreases response times by more than 50%.

Average latency, no cache

  • Min delta: 85ms
  • Mean delta: 208ms

Average latency, cached

  • Min delta: -127ms
  • Mean delta: -349ms

Negative deltas mean the cached gateway response returned faster than the direct OpenAI request.

Percentile deltas between OpenAI and gateway, no cache

  • p99: 231.347ms
  • p95: 299.669ms
  • p90: 216.082ms

Percentile deltas between OpenAI and gateway, cached

  • p99: -644.526ms (50% decrease)
  • p95: -516.373ms (55.54% decrease)
  • p90: -519.085ms (56.99% decrease)

Enable caching to optimize latency

As the benchmark results show, introducing caching can meaningfully reduce both latency and costs. If you use Velvet, enabling caching is easy: add a 'velvet-cache-enabled' header set to 'true'.
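For example, with the OpenAI Node SDK you can point requests at your gateway and attach the header via defaultHeaders (a sketch; the base URL below is a placeholder for the gateway URL from your Velvet workspace):

```typescript
import OpenAI from "openai";

// Route requests through the Velvet gateway (placeholder URL below)
// and opt into response caching with the velvet-cache-enabled header.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://<your-velvet-gateway-url>/v1", // from your workspace settings
  defaultHeaders: { "velvet-cache-enabled": "true" },
});

const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
```

Identical requests are then served from the cache instead of triggering new inference.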

Read our article on caching to learn more.

Want to get set up with Velvet? Read our documentation and create a workspace to get started.

