HTTP Caching
HTTP Caching#
The HTTP cache stores a response associated with a request and reuses the stored response for subsequent requests.
- No need to deliver the request to the origin server, closer the client and cache, the faster the response
- The typical example: browser itself stores a cache for browser requests
- The origin server does not need to parse and route the request, restore the session based on the cookie, query the DB for results, or render the template engine - reduces the load on the server.
Proper operation of the cache is critical to the health of the system.
Types of Caches#
The IETF HTTP working group RFC (Request for Comments) on HTTP Caching states 2 types of caches:
- Private caches
- Shared caches
Private Caches#
- cache tied to a specific client - typically the browser
- not shared so can store a personalised response
- May cause information leakage if shared
- Personalized contents are usually controlled by cookies
Must specify:
Cache-Control: private
Note: if the response has an
Authorization
header, it cannot be stored in the private cache (or a shared cache, unless public is specified)
Shared Cache#
The shared cache is located between the client and the server and can store responses that can be shared among users
2 types:
- proxy caches
- managed caches
Proxy caches#
- Reduce traffic outside the network
- Not managed by service developer - controlled by HTTP headers
- Some proxies are old and do not respect headers
Kitchen sink headers:
Cache-Control: no-store, no-cache, max-age=0, must-revalidate, proxy-revalidate
In recent years, as HTTPS has become more common and client/server communication has become encrypted, proxy caches in the path can only tunnel a response and can’t behave as a cache, in many cases. In this case there is no need to worry about outdated proxy cache.
On the other hand, if a TLS bridge proxy decrypts all communications in a person-in-the-middle manner by installing a certificate from a CA (certificate authority) managed by the organization on the PC, and performs access control, etc. — it is possible to see the contents of the response and cache it.
Managed caches#
Managed caches are explicitly deployed by service developers to offload the origin server and to deliver content efficiently
Example:
- reverse proxies
- CDNs
- service workers with Cache API
In most cases they are managed with the Cache-Control
header
To opt out of private or proxy cache:
Cache-Control: no-store
Example:
- Varnish Cache uses VCL (Varnish Configuration Language, a type of DSL) logic to handle cache storage
- Service workers and Cache API allow for control with javascript
Heuristic caching#
Automatic caching for certain characteristics
Example - a response that ahs not been updated in a long while:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1024
Date: Tue, 22 Feb 2022 22:22:22 GMT
Last-Modified: Tue, 22 Feb 2021 22:22:22 GMT
Despite no max-age
header the heuristics will determine it to be cached.
More interesting info in MDN: HTTP Caching
- The
ETag
- Cache Busting for CSS and JS