This whole article refers to Contao 4.9 and higher. Previous versions also have caching mechanisms, but they are not nearly as efficient, so we have not documented how the caching works for the older versions.
The greatest performance gain in any application can be achieved by not having to start it at all. In other words: We want the output that Contao generates to be saved and delivered the next time it is requested, without having to boot Contao at all.
To do so, Contao relies on the Hypertext Transfer Protocol (HTTP) - the backbone of the Internet.
Why should Contao reinvent the wheel when smart people thought about caching HTTP responses already when HTTP/1.1 was introduced back in 1999?
Back to the basics, then. To understand how HTTP caching can be configured in Contao, let’s take a quick look at how HTTP, and therefore the entire Internet and the communication of many other devices, works:
In principle, any other intermediaries (so-called “proxies”) can be connected in between, which is not only conceivable but also very common:
These proxies can perform any tasks, including but not limited to
To allow the client (in our case mostly the browser), the server and intermediaries to communicate with each other, every HTTP request and every HTTP response can be enhanced with metadata. Many of these so-called “HTTP headers” are standardized, but developers are free to define and use additional ones. A typical request could look like this:
GET /about-us.html HTTP/1.1
Accept-Language: de,en
Host: www.contao.org
In this case the client would like to request the resource that is at /about-us.html and it tells the server that it understands the languages en (English) and de (German), whereas it prefers de over en. It also requests the response from the host www.contao.org.
Remember, the communication between client and server uses TCP/IP and one server can serve any number of domains, so it is important to tell the server which one you’d like.
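To make the structure of such a request more tangible, here is a minimal Python sketch that splits the example request into its parts. The function name `parse_request` is purely illustrative and the parsing is deliberately simplified (no body, no folded headers); real servers do considerably more.

```python
from typing import Dict, Tuple


def parse_request(raw: str) -> Tuple[str, str, str, Dict[str, str]]:
    """Split a raw HTTP/1.1 request head into method, path, version and headers."""
    lines = raw.strip().splitlines()
    # The first line is the request line: "<method> <path> <version>"
    method, path, version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so we normalize them to lowercase
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


RAW_REQUEST = """GET /about-us.html HTTP/1.1
Accept-Language: de,en
Host: www.contao.org
"""

method, path, version, headers = parse_request(RAW_REQUEST)
languages = [tag.strip() for tag in headers["accept-language"].split(",")]
```

After running this, `path` holds the requested resource, `headers["host"]` the domain the client wants to talk to, and `languages` the language list in the order the client sent it.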
A response to this could look like this:
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 42
Cache-Control: max-age=3600
<html>
...
</html>
Here, the server tells us that everything was fine (200 OK). The response contains UTF-8 encoded HTML and is 42 bytes long. In addition, this response may be cached for 3600 seconds, i.e. one hour.
There are a number of HTTP headers that are relevant for HTTP caching. To explain them all would go beyond the scope of this documentation. The topic is also already extremely well documented on MDN, so you can learn more there if you are interested.
The regular user of Contao is mainly interested in the Cache-Control header and its three most important attributes:
private or public
Tells a client whether the response is private or public. The browser, for example, may cache both private and public responses, since it does not pass on the response. A proxy, on the other hand, must not cache private responses, because it passes the response on. The two values cannot be combined. Something that can be cached publicly can also be cached privately, so a public response implicitly means that multiple clients share the same cache entry. This is why we also speak of a “shared cache” in this case.
max-age
Contains the number of seconds any client may cache this response.
s-maxage
Contains the number of seconds a public client (i.e. a shared cache) is allowed to cache this response. This attribute is only used if the cache duration should be different for public and private clients.
For better understanding, let’s look at a few examples:
HTTP header | Interpretation |
---|---|
Cache-Control: max-age=3600 | Only private clients may cache this response for one hour. |
Cache-Control: max-age=3600, private | Only private clients may cache this response for one hour. |
Cache-Control: max-age=3600, public | Both public and private clients may cache this response for one hour. |
Cache-Control: max-age=3600, s-maxage=7200, public | Both public and private clients may cache this response. Private ones for one hour, public ones for two hours. |
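The interpretation in the table above can be expressed as a small Python sketch. It follows this article's reading of the header (a response without `public` is treated as private) and is not the code of any real cache. The function names `parse_cache_control` and `cache_lifetimes` are made up for this example.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Turn 'max-age=3600, public' into {'max-age': '3600', 'public': None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() if sep else None
    return directives


def cache_lifetimes(value: str) -> Dict[str, int]:
    """Return the allowed lifetime in seconds per cache type (0 = must not cache)."""
    directives = parse_cache_control(value)
    max_age = int(directives.get("max-age") or 0)
    if "public" not in directives:
        # Assumption taken from the table: without "public" only private clients may cache
        return {"private": max_age, "shared": 0}
    # s-maxage overrides max-age for shared caches only
    shared = int(directives.get("s-maxage") or max_age)
    return {"private": max_age, "shared": shared}


one_hour_private = cache_lifetimes("max-age=3600")
mixed = cache_lifetimes("max-age=3600, s-maxage=7200, public")
```

Here `one_hour_private` allows one hour for private clients and nothing for shared caches, while `mixed` allows one hour for private and two hours for shared caches, matching the first and last rows of the table.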
And that is why you can find exactly these cache duration settings in the Contao page settings.
The following configuration translates to Cache-Control: max-age=1800, s-maxage=3600, public:
The Contao Managed Edition is shipped with a cache proxy that is written in PHP too and sits directly in front of Contao. This means that every response that Contao generates is sent through our caching proxy before it is delivered to the client and is cached (or not) according to the HTTP headers.
It is important to understand that the included proxy knows absolutely nothing about Contao. Although it is also written in PHP, it is completely independent. Everything it does is based on the headers of the HTTP requests and responses it receives from the client and Contao.
The big advantage we gain by using HTTP standards is the free choice of the caching proxy. Let’s say the number of visitors gets higher and higher and PHP slowly starts reaching its limits. Maybe you will want to try out a more powerful caching proxy which was explicitly designed for caching such as Varnish?
However, this would go too far at this point.
Good to know for you: Whatever settings you have configured in Contao, it follows the HTTP standards and just works out-of-the-box for you. And if one day, the requirements become more complex, Contao will not let you down!
This section is about what can be stored in the shared cache. We don’t talk about the private cache anymore, i.e. your personal browser cache, because we want to make sure that as many of our visitors as possible don’t have to wait for Contao to generate the response but can benefit from the shared cache instead.
We already know the most important criterion: The Cache-Control: public response header. If this header is missing, the shared cache can never cache this response. But there are other criteria as well:
- The HTTP status code must be either 200 OK, 301 Moved Permanently or 404 Not Found (there are a number of other statuses, but they are not really relevant for us here).
- Cache-Control must not contain no-store. This value prevents any caching.
- You need to specify the cache duration, i.e. max-age and optionally s-maxage in the Cache-Control header (there are additional headers here as well, but they are not relevant for us either).
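Taken together, these criteria can be sketched as a single check. This is only an illustration of the rules listed above, not the logic of the actual cache proxy. The function name is hypothetical, and accepting either `max-age` or `s-maxage` as "a cache duration" is an assumption of this sketch.

```python
# Status codes named in the list above; real caches know a few more
CACHEABLE_STATUS_CODES = {200, 301, 404}


def may_store_in_shared_cache(status: int, cache_control: str) -> bool:
    """Apply the criteria from the list above to one response."""
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if status not in CACHEABLE_STATUS_CODES:
        return False
    if "public" not in directives:
        return False  # without "public" the shared cache must stay out
    if "no-store" in directives:
        return False  # no-store prevents any caching
    # Assumption: any explicit duration counts as "cache duration specified"
    return "max-age" in directives or "s-maxage" in directives


cacheable = may_store_in_shared_cache(200, "max-age=1800, s-maxage=3600, public")
```

With the page settings example from above, `cacheable` is `True`; dropping `public` or adding `no-store` flips the result.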
In case of the included Contao cache proxy, you can easily check if caching works. All responses will have a Contao-Cache header that can take on three values:
miss
The Contao cache could not find a cache entry. Contao is started and generates the response, but it cannot be stored in the cache.
miss/store
The Contao cache could not find a cache entry. Contao is started, generates the response and it is stored accordingly.
fresh
The Contao cache has found the cache entry and the response is served directly from the cache. Ideally, you should have already noticed the speed. In case of Contao-Cache: fresh there is also the HTTP Age header. It tells you for how many seconds the cache entry has already existed. An Age: 120 means that this entry was created two minutes ago.
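If you inspect response headers in a script rather than in the browser, the three values can be translated into plain language with a few lines of Python. This is a minimal sketch: the function name and the wording of the messages are invented for this example, only the header names and values come from the description above.

```python
from typing import Mapping


def describe_contao_cache(headers: Mapping[str, str]) -> str:
    """Explain the Contao-Cache header of a response in plain language."""
    # Header names are case-insensitive, so compare them in lowercase
    normalized = {name.lower(): value for name, value in headers.items()}
    status = normalized.get("contao-cache")
    if status == "fresh":
        age = normalized.get("age", "unknown")
        return f"served from the cache, entry age: {age} seconds"
    if status == "miss/store":
        return "generated by Contao and stored in the cache"
    if status == "miss":
        return "generated by Contao, not stored in the cache"
    return "no Contao-Cache header found"


message = describe_contao_cache({"Contao-Cache": "fresh", "Age": "120"})
```

For the example headers, `message` reports a cache hit with an entry age of 120 seconds, i.e. two minutes.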
If the Contao Managed Edition runs in debug mode, the cache proxy is disabled entirely.
As we have seen, there are some basic requirements for HTTP caching to work. In addition to the requirements mentioned above, there are other special reasons for Contao to enforce Cache-Control: private and thus exclude a response from the shared cache:
If the request contains an Authorization HTTP header
The Authorization header contains standard authentication details. For Contao this means that a module or content element may listen to this header and potentially deliver user-specific data. To ensure that private data is not stored in the shared cache and potentially delivered to a different visitor, Contao forces Cache-Control: private. Contao will also tell you about this: The Contao-Response-Private-Reason header contains authorization in this case.
It is very popular to protect installations using Basic Authentication during development. This means that the request contains an Authorization header and therefore the cache is always disabled. Keep this in mind when testing cache settings.
If the request contains a PHP session cookie or the response sets it
The most obvious case here is of course the PHP session, i.e. when a user logs into the back end or a member logs into the front end. In this case all responses are always private and the Contao-Response-Private-Reason header contains session-cookie.
If the response contains a cookie
If the response contains a cookie, it means that the developer wanted to personalize the response. The current response therefore potentially already contains personal data. The Contao-Response-Private-Reason header in this case contains response-cookies and includes a list of the affected cookie names.
If the request contains a cookie
This is by far the most likely, but also the most complex case. All request cookies can, but do not have to, disable the cache.
The rule is quite straightforward: Every cookie can potentially identify a visitor and therefore the application (i.e. Contao) could deliver personalized content (shopping carts, personal recommendations, logins etc.). Remember: The proxy is completely separate from the application, may even be located on another server in another part of the world and therefore has no way of knowing which cookies are relevant for the application. We have to tell it.
Thus, a cache proxy is not allowed to deliver anything from the cache if the request contains a cookie. Let’s repeat: Every single cookie implicitly means that the reverse proxy is bypassed. If we want to have a response generated from the cache, we have to make sure that not a single cookie finds its way to the cache proxy.
However, there is a large number of cookies. Here are a few examples:
- _ga_* cookies by Google Analytics
- _pk_* cookies by Matomo
- __cfduid cookie
- cookiebar_accepted cookie, to know if the cookie policy has been accepted

The list is quite long and the attentive reader has already discovered a fundamental difference between the PHP session cookie and the _ga_* cookie: One of them is really relevant for the application (i.e. the PHP session) and the other one is responsible for client-side tracking (i.e. within the browser) of the user and is completely irrelevant for Contao.
In other words, if a request contains only a _ga_* cookie, we can still deliver the response from the cache, because this cookie does not generate any personal content. A smart cache proxy could therefore only consider relevant cookies and throw away all the others before deciding whether a response can be delivered from the cache.
To get the highest possible hit rate when caching out-of-the-box, Contao maintains a list of irrelevant cookies within the Contao cache proxy. These cookies are removed from the request before cache lookup and therefore do not reach the cache proxy or (in case of a cache miss) Contao itself.
More advanced users can override this list, see the section about configuring the Contao Cache Proxy.
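The idea of removing irrelevant cookies before the cache lookup can be sketched as follows. Note the assumptions: the patterns in `DENY_PATTERNS` are a hypothetical excerpt based on the examples in this article, not the list actually shipped with Contao, and treating every entry as a regular expression that must match the whole cookie name is also an assumption of this sketch.

```python
import re
from typing import Dict, List

# Hypothetical excerpt for illustration only, NOT the list shipped with Contao
DENY_PATTERNS: List[str] = [r"_ga_.*", r"_pk_.*", r"__cfduid"]


def parse_cookie_header(value: str) -> Dict[str, str]:
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies: Dict[str, str] = {}
    for pair in value.split(";"):
        name, _, cookie_value = pair.strip().partition("=")
        if name:
            cookies[name] = cookie_value
    return cookies


def strip_irrelevant_cookies(cookies: Dict[str, str],
                             deny_patterns: List[str] = DENY_PATTERNS) -> Dict[str, str]:
    """Drop every cookie whose name fully matches one of the deny patterns."""
    return {
        name: value
        for name, value in cookies.items()
        if not any(re.fullmatch(pattern, name) for pattern in deny_patterns)
    }


tracking_only = strip_irrelevant_cookies(parse_cookie_header("_ga_ABC=1; _pk_id.1=2"))
with_session = strip_irrelevant_cookies(parse_cookie_header("PHPSESSID=abc; _ga_ABC=1"))
```

`tracking_only` ends up empty, so such a request could be answered from the cache. `with_session` still contains the PHP session cookie, so the cache has to be bypassed.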
In the case of request cookies, the Contao-Private-Response-Reason header contains request-cookies plus a list of all the cookies that have not been removed.
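The four reasons described above can be summarized in a short decision function. This is a simplified sketch under several assumptions: the order of the checks simply follows the order of this article, the session cookie is assumed to be called `PHPSESSID`, irrelevant request cookies are assumed to have been removed already, and the function only returns the reason keyword (the real header additionally lists cookie names, whose exact format is not reproduced here).

```python
from typing import List, Mapping, Optional

SESSION_COOKIE = "PHPSESSID"  # assumption: PHP default name, configurable in php.ini


def cookie_names(cookie_header: str) -> List[str]:
    """Extract the cookie names from a Cookie request header value."""
    return [pair.strip().split("=", 1)[0] for pair in cookie_header.split(";") if pair.strip()]


def private_reason(request_headers: Mapping[str, str],
                   response_cookie_names: List[str]) -> Optional[str]:
    """Return the reason keyword for a private response, or None if none applies."""
    request = {name.lower(): value for name, value in request_headers.items()}
    if "authorization" in request:
        return "authorization"
    request_cookies = cookie_names(request.get("cookie", ""))
    if SESSION_COOKIE in request_cookies or SESSION_COOKIE in response_cookie_names:
        return "session-cookie"
    if response_cookie_names:
        return "response-cookies"
    if request_cookies:
        return "request-cookies"
    return None


reason = private_reason({"Authorization": "Basic dXNlcjpwYXNz"}, [])
```

In this example `reason` is `authorization`, which is exactly the situation of a development site protected with Basic Authentication.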
The Contao Cache Proxy can be customized to a certain degree, just like Contao itself. Since Contao provides internal lists of irrelevant cookies, it has good default settings, but you can tune it further for performance by adjusting environment variables.
Are you wondering why Contao is configured by editing the config.yaml and the Contao Cache Proxy using environment variables?
The config.yaml for Contao itself is application configuration. The included Contao Cache Proxy, however, needs to know its settings before Contao is even booted. Booting is exactly what we want to prevent.
Thus, environment variables are the best choice for configuring our proxy.
The following environment variables allow you to further optimize the cache proxy:
COOKIE_ALLOW_LIST
In Contao 4.9 this environment variable is called COOKIE_WHITELIST.
This is a special environment variable related to the default caching proxy which is shipped with the Contao Managed Edition.
Contao disables any HTTP caching as soon as there is either a Cookie or an Authorization header present in the request. That is because these headers can potentially authenticate a user and thus cause personalized content to be generated, in which case we never want to serve any content from the cache.
Unfortunately, the web is full of different cookies, most of which are completely irrelevant to the application itself and are only used in JavaScript (although there are better alternatives such as LocalStorage, SessionStorage or IndexedDB). You will find that e.g. Google Analytics, Matomo, Facebook etc. all set cookies your application (Contao in this case) is not interested in at all. However, because the HTTP cache has to decide whether to serve a response from the cache or not before the application is even started, there’s no way it can know which cookies are relevant and which ones are not.
So, we have to tell it.
The Contao Managed Edition ships with a list of irrelevant cookies that are ignored by default to increase the hit rate, but if you want to optimize it even more, you can disable the default list by providing an explicit list of cookies you need.
These are the cookies you know are relevant to the application. If one of them is present, the cache must be bypassed.
By default, Contao only uses the PHP session ID cookie to authenticate users and members, the CSRF cookie to protect visitors from CSRF attacks when submitting forms, the trusted devices cookie for two-factor authentication and the remember me cookie to automatically log in users if desired.
So in most cases, the following configuration will score the maximum cache hits, but you may have to allow additional cookies of extensions you installed:
COOKIE_ALLOW_LIST=PHPSESSID,csrf_https-contao_csrf_token,csrf_contao_csrf_token,trusted_device,REMEMBERME
The name of the PHP session cookie is configurable through the php.ini so you might want to check if it’s PHPSESSID for you too. Moreover, the CSRF cookie is different for http and https for security reasons. If you serve your website over http, note that the cookie name will be csrf_http-contao_csrf_token.
However, protecting your users from CSRF attacks but letting them submit the form via unsecured http connections is not really a valid use case.
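To illustrate what an allow list means for an incoming request, here is a minimal Python sketch that reads COOKIE_ALLOW_LIST and keeps only the cookies named in it. The function names are hypothetical, and treating each entry as a regular expression that must match the whole cookie name is an assumption of this sketch, not a statement about how the Contao Cache Proxy is implemented.

```python
import os
import re
from typing import Dict, List, Mapping


def load_cookie_allow_list(env: Mapping[str, str] = os.environ) -> List[str]:
    """Read the comma-separated COOKIE_ALLOW_LIST variable into a list."""
    raw = env.get("COOKIE_ALLOW_LIST", "")
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


def keep_allowed_cookies(cookies: Dict[str, str], allow_list: List[str]) -> Dict[str, str]:
    """Keep only cookies whose name fully matches an allow list entry."""
    return {
        name: value
        for name, value in cookies.items()
        if any(re.fullmatch(entry, name) for entry in allow_list)
    }


# Example environment using the configuration shown above
example_env = {
    "COOKIE_ALLOW_LIST": "PHPSESSID,csrf_https-contao_csrf_token,"
                         "csrf_contao_csrf_token,trusted_device,REMEMBERME"
}
allow_list = load_cookie_allow_list(example_env)
remaining = keep_allowed_cookies({"PHPSESSID": "abc", "_ga_ABC": "1"}, allow_list)
```

Only `PHPSESSID` survives in `remaining`. A request that carried nothing but the tracking cookie would be left without cookies and could be served from the cache.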
COOKIE_REMOVE_FROM_DENY_LIST
This feature is available in Contao 4.10 and later.
In case you don’t want to manage the whole COOKIE_ALLOW_LIST because you are unsure what your application needs but you want to disable one or more of the existing entries on the deny list that is managed by Contao, you can specify this using:
COOKIE_REMOVE_FROM_DENY_LIST=__utm.+,AMP_TOKEN
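Conceptually, this setting subtracts entries from the default deny list. The following sketch shows that idea in Python. The default list used here is a hypothetical excerpt, and comparing entries as exact strings is an assumption made for illustration.

```python
from typing import List


def effective_deny_list(default_deny_list: List[str], remove_setting: str) -> List[str]:
    """Return the deny list without the entries named in the setting."""
    to_remove = {entry.strip() for entry in remove_setting.split(",") if entry.strip()}
    return [entry for entry in default_deny_list if entry not in to_remove]


# Hypothetical excerpt of a default deny list, for illustration only
default_deny_list = ["__utm.+", "AMP_TOKEN", "_pk_.*"]
deny_list = effective_deny_list(default_deny_list, "__utm.+,AMP_TOKEN")
```

After this, `deny_list` only contains `_pk_.*`. Cookies matching the removed entries are no longer stripped and will therefore reach the application again.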
QUERY_PARAMS_ALLOW_LIST
This feature is available in Contao 4.10 and later.
For the very same reason we strip irrelevant cookies, we also strip irrelevant query parameters. You might, for example, be familiar with the typical ?utm_*=<randomtoken> query parameters that are added to links to your website. Because they change the URL every single time, they also generate new cache entries every single time, eventually maybe even flooding your cache.
As with the irrelevant cookies, Contao also manages a list of irrelevant query parameters which, again, you may completely override by providing a list of allowed query parameters if you know all the query parameters your application ever needs. This is highly unlikely, which is why there is also QUERY_PARAMS_REMOVE_FROM_DENY_LIST.
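Why stripping query parameters helps becomes clear when you look at the URL as the cache key. The sketch below normalizes a URL with Python's standard library. The patterns in `QUERY_DENY_PATTERNS` are a hypothetical excerpt based on the parameters mentioned in this article, not the list shipped with Contao.

```python
import re
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical excerpt for illustration only, NOT the list shipped with Contao
QUERY_DENY_PATTERNS: List[str] = [r"utm_.+", r"fbclid"]


def normalize_url(url: str, deny_patterns: List[str] = QUERY_DENY_PATTERNS) -> str:
    """Remove irrelevant query parameters so equivalent URLs share one cache entry."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not any(re.fullmatch(pattern, name) for pattern in deny_patterns)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


first = normalize_url("https://example.com/news.html?utm_source=mail&page=2")
second = normalize_url("https://example.com/news.html?page=2&fbclid=abc123")
```

Both `first` and `second` normalize to the same URL with only `page=2` left, so the two requests would share a single cache entry instead of creating two.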
QUERY_PARAMS_REMOVE_FROM_DENY_LIST
This feature is available in Contao 4.10 and later.
As with COOKIE_REMOVE_FROM_DENY_LIST, you can use QUERY_PARAMS_REMOVE_FROM_DENY_LIST to remove an entry from the default deny list shipped with Contao. If you e.g. need the Facebook click identifier (fbclid) in your server side code, you may update your list like so:
QUERY_PARAMS_REMOVE_FROM_DENY_LIST=fbclid
If you do so, make sure to disable caching, e.g. by setting Cache-Control: no-store on this response if fbclid is present, as otherwise you are back to having thousands of cache entries in your cache proxy.
At this point we need to explain yet another concept. As a user of Contao, you won’t even notice this, but it is an important part of the excellent caching framework of Contao and therefore a great sales argument.
A key difference between the shared cache and private cache is that the location of the shared cache is known and can be accessed. So we can actively manage the shared cache and invalidate single cache entries or the whole cache. We cannot do this for the private cache.
So if you make changes to the content, you want to always make sure that the shared cache is emptied and visitors will receive the latest version of your website. To do so, use the “Purge the shared cache” option in the “Maintenance” menu.
But Contao would not be Contao if you had to do it yourself every time you applied a change, right?
Contao is equipped with a framework that allows developers to work with “cache tagging”. When the response is generated, it is tagged so that the cache proxy can store those tags as metadata alongside the cache entry. Based on this information, entries related to certain cache tags can then be invalidated.
Every response generated by Contao contains a lot of cache tags. A response could thus look like this:
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 42
Cache-Control: max-age=3600
X-Cache-Tags: contao.db.tl_page.18, contao.db.tl_layout.16, contao.db.tl_content.42, contao.db.tl_content.10
<html>
...
</html>
The number of tags can grow to as many as you require. In this example, Contao has tagged the response with four cache tags and as you might have noticed, these tags contain the information that the response relates to the page with ID 18, uses the page layout with ID 16, and that the content elements with ID 42 and 10 are used on it.
And here’s the kicker: When you edit one of these elements in the Contao back end, Contao automatically invalidates all cache entries that are related to this element! For example, if the content element 42 was used in a news item, all cache entries of the detail page and potential list views would be deleted automatically. If you change the page layout 16 and select an additional CSS file to be loaded, all responses generated with this page layout will be invalidated automatically without any further action required.
Pretty smart, isn’t it?
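The principle behind tag-based invalidation fits into a few lines of Python. The class below is a toy in-memory model invented for this article. It has nothing to do with the real storage of the Contao Cache Proxy, but it shows how storing the tags alongside each entry makes targeted invalidation possible.

```python
from typing import Dict, Iterable, List, Optional, Set


class TaggedCache:
    """Toy cache that remembers which tags belong to which entry."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}    # URL -> cached response body
        self._tags: Dict[str, Set[str]] = {}  # URL -> tags of that response

    def store(self, url: str, body: str, tags: Iterable[str]) -> None:
        self._entries[url] = body
        self._tags[url] = set(tags)

    def get(self, url: str) -> Optional[str]:
        return self._entries.get(url)

    def invalidate_tags(self, tags: Iterable[str]) -> List[str]:
        """Delete every entry that carries at least one of the given tags."""
        wanted = set(tags)
        stale = [url for url, entry_tags in self._tags.items() if entry_tags & wanted]
        for url in stale:
            del self._entries[url]
            del self._tags[url]
        return sorted(stale)


cache = TaggedCache()
cache.store("/news/detail.html", "<html>detail</html>",
            ["contao.db.tl_page.18", "contao.db.tl_content.42"])
cache.store("/news.html", "<html>list</html>",
            ["contao.db.tl_page.17", "contao.db.tl_content.42"])
cache.store("/contact.html", "<html>contact</html>", ["contao.db.tl_page.19"])

# Editing content element 42 invalidates both pages that use it
removed = cache.invalidate_tags(["contao.db.tl_content.42"])
```

After the call, `removed` lists the detail page and the list view, while the contact page is still served from the cache. The page IDs 17 and 19 are made up for the example.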
Basically, this means that we can set the cache duration for the shared cache relatively high, since the invalidation always takes place when we change anything. You should still be careful and test actively, because there may still be cases where the automatic invalidation of entries does not work as desired. For example, the developer of an extension or the Contao core itself might have failed to add the correct cache tags. However, such cases can certainly be improved. Just contact the person responsible.
You will never see the X-Cache-Tags header in your browser because the Contao Cache Proxy removes it. It contains no relevant information for the client and would only cause unnecessary data traffic.
By the way: Invalidating cache entries also works via HTTP requests because the cache proxy does not necessarily have to be on the same server as Contao. Most cache proxies support receiving PURGE requests (yes, you are allowed to invent your own HTTP methods in addition to the usual ones such as GET, POST etc.) and the invalidation logic is completely abstracted in Contao so that you can connect any cache proxy programmatically if needed.
This way, any cache proxy can be taught to support the cache tag based invalidation logic of Contao!
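To show that a custom HTTP method is nothing special on the client side, here is a small Python sketch that builds (but does not send) a PURGE request with the standard library. The proxy URL is a placeholder, and passing the tags in an `X-Cache-Tags` request header is an assumption of this example. How a particular cache proxy expects invalidation requests depends on its own configuration.

```python
from typing import Iterable, Optional
from urllib.request import Request


def build_purge_request(url: str, tags: Optional[Iterable[str]] = None) -> Request:
    """Create a PURGE request object; nothing is sent over the network here."""
    headers = {}
    if tags:
        # Assumption: the proxy reads the tags to invalidate from this header
        headers["X-Cache-Tags"] = ", ".join(tags)
    return Request(url, headers=headers, method="PURGE")


# Placeholder URL of a cache proxy, purely for illustration
purge = build_purge_request("http://proxy.example.com/", ["contao.db.tl_content.42"])
```

The resulting object could be passed to `urllib.request.urlopen()` to actually send it. Whether the proxy accepts it depends on its access rules, since purging is usually restricted to trusted clients.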