Contao ships with a built-in site search. It’s good enough for many use cases. In fact, it’s actually pretty smart and deals properly with word splitting across all languages (so it also recognizes e.g. Japanese Katakana or Kanji), supports highlighting and provides some basic relevance sorting.
However, searching is such a complex topic that you will likely never find a search engine that does exactly what you would like to achieve out of the box. There’s always at least some tweaking required to achieve what you really need. Here’s an non-exhaustive list of features you might be looking for but are not supported out of the box:
Some of these features require special software such as tools providing linguistic analysis and thus Contao will never be able to provide these features as long as a basic MySQL server is all that’s required for a simple search engine.
That’s why Contao ships with a search indexer abstraction, giving you the freedom to disable the core features and connect it with your favourite search engine such as Algolia or Elasticsearch.
This feature is available in Contao 4.9 and later.
Indexing in Contao happens in two different ways which may or may not be combined:
You can trigger the built-in website crawler (which is based on Escargot) by either rebuilding the search index manually in the back end or having a cron job that triggers
$ vendor/bin/contao-console contao:crawl
There are numerous options to this command so be sure to run it using --help
to learn about them.
Make sure to set the domain, see [Command Line](https://docs.contao.org/dev/framework/cron/#command-line).
By visiting the pages one after the other, the SearchIndexListener
will listen to the Symfony kernel.terminate
event and index the generated response content on every request.
Indexing using the SearchIndexListener
has the advantage that it happens behind the scenes without you having to worry
about it. It doesn’t require any cron job and it makes sure the search index is always updated so that when your editors
edit a page, it’s indexed as soon as the first visitor visits the modified page. Moreover, if a request is not successful
(meaning the HTTP status code is not in the 2xx
range), it will also delete that URI from the search index.
However, it also comes with the disadvantage that it happens on every single page request which may affect the
performance of your website.
Although the kernel.terminate
event is triggered after the response has been sent to the browser, this only works
if you are running PHP using php-fpm
. In case of mod_php
or fcgi
this will happen before the response is sent
to the browser, which may extend the time needed to finish the response.
This is why you can configure the behaviour of the SearchIndexListener
as follows:
# config/config.yaml
contao:
search:
listener:
index: true # Configure whether you want to update the index entry on every request
delete: false # Configure whether you want to delete an entry if a request is not successful
As you can see, by setting both values to false
you can disable the listener completely and rely on e.g. your regular
crawl cron job to do the work and cleaning up orphan indexed entries yourself.
So far, we’ve only looked at what triggers the search indexing process but not the indexers themselves.
The search indexers are responsible to index a given Contao\CoreBundle\Search\Document
.
As mentioned in the intro section of this page, Contao ships with a default search indexer which indexes the data
within the MySQL database that holds all the other content.
If you build your own search indexer, you might want to disable the core search indexer:
# config/config.yaml
contao:
search:
default_indexer:
enable: false
Registering your own search indexer can be done by implementing the Contao\CoreBundle\Search\Indexer\IndexerInterface
and registering your service using the contao.search_indexer
service tag:
# config/services.yaml
services:
App\Search\ExampleSearchIndexer:
tags:
- { name: 'contao.search_indexer' }
The IndexerInterface
itself consists of three methods you have to implement:
index(Document $document)
- called when a document should be indexed. It’s your responsibility to check if it even
needs an update.
delete(Document $document)
- called when a document should be deleted from the index.
clear()
- called when the whole search index shall be cleared.
The Document
gives you access to everything you might need from an HTTP response, most notably:
In addition to this, it provides helpers to work with JSON-LD. The JSON Linked Data format is very suitable to enhance your HTML responses with additional metadata such as schema.org information that is also interpreted by popular search engines such as Google, Bing, DuckDuckGo etc.
So let’s say you would like to provide a search module that is able to filter for product prices. First of all, we need to enhance our product detail page with JSON LD. You may invent your own schema but why not reusing what’s already standardized by schema.org and thus also supported by other search engines?
<h1>My product</h1>
<p class="description">My product is super nice!</p>
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Product",
"description": "My product is super nice!",
"name": "My product",
"offers": {
"@type": "Offer",
"availability": "http://schema.org/InStock",
"price": "55.00",
"priceCurrency": "USD"
}
}
</script>
The great thing about this is, not only do other search engines now know that our product costs USD 55.00 but as this
is now part of our response body, our custom IndexerInterface
implementation has access to it.
You can extract it from $document->getBody()
but Contao wouldn’t be Contao if it wasn’t already prepared for that
use case:
$jsonLdScriptsData = $document->extractJsonLdScripts('https://schema.org', 'Product');
The variable $jsonLdScriptsData
now contains an already decoded array of all JSON LD scripts with @context
equal to
https://schema.org
and @type
equal to Product
. You can conveniently extract the product price and index it along
with all the other data you need.
Within the back end, you can disable search indexing for specific pages in the settings of each page. In some cases, you
might want to dynamically disable indexing under certain conditions for arbitrary pages. To disable indexing, you need
to set the noSearch
property of the PageModel
object of the current page to true
, before the page’s content is added
to the index.
There are multiple ways to achieve this. One way is to use the generatePage hook:
// src/EventListener/GeneratePageListener.php
namespace App\EventListener;
use Contao\CoreBundle\DependencyInjection\Attribute\AsHook;
use Contao\PageModel;
#[AsHook('generatePage')]
class GeneratePageListener
{
public function __invoke(PageModel $pageModel): void
{
if (/* … */) {
$pageModel->noSearch = true;
}
}
}
Starting with Contao 4.6 you could also fetch the pageModel
from the request attributes (if present) in your own
kernel
event listener.