Improve the Search Index


Writing the right query is half the battle (see the QueryBuilder API). The other crucial aspect of search in AEM is indexing. Often, even a well-written query with the correct predicates executes slowly and traverses the repository because no suitable index backs it. Such a request is expensive for the system and can contribute to slower page responses. Out of the box, Oak does not index content by default the way Jackrabbit 2 did, so custom indexes have to be created where they are needed. If you see warnings like the one below in your logs, it is a sign that an index is missing for that query. You then need to create indexes using the tools and approaches described below.

*WARN* Traversed 10000 nodes with filter Filter(query=select ...) consider creating an index or changing the query
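To get a quick overview of which queries trigger this warning, you can scan the log file for it. The following is only a minimal sketch: the regular expression is modelled on the sample line above, the function name find_traversals is made up for illustration, and the exact wording of the warning may differ between Oak versions.

```python
import re

# Pattern modelled on the sample warning above (an assumption: the exact
# wording and punctuation can vary between Oak versions).
TRAVERSAL_RE = re.compile(
    r"Traversed (\d+) nodes with filter Filter\(query=(.*?)\)[;\s]*"
    r"consider creating an index"
)


def find_traversals(lines):
    """Return (node_count, filter_text) for every traversal warning found."""
    hits = []
    for line in lines:
        match = TRAVERSAL_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


if __name__ == "__main__":
    sample = [
        "*WARN* Traversed 10000 nodes with filter Filter(query=select ...) "
        "consider creating an index or changing the query",
        "*INFO* some unrelated log line",
    ]
    for count, query in find_traversals(sample):
        print(count, query)
```

In practice you would pass the lines of your error.log instead of the sample list and sort the result by node count to find the most expensive queries first.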

Through this post, I would like to give an insight into indexing fundamentals and the various tools with which you can improve the Oak indexes for your application with more understanding.

 

How to Approach Indexing?

Index Manager / Reindexing (Tools > Operations > Diagnosis > Index Manager)

This tool was first introduced in ACS AEM Commons and is now available in the AEM instance itself. It is pretty straightforward: you select the indexes you want to reindex using a checkbox. Just keep in mind that reindexing is a heavy operation, so avoid doing it directly on your running production instances. I would suggest doing it once, completely, at the initial stages before the launch of a website. You can see the properties which are defined for indexing in this link. In subsequent builds or content changes, if there is a need, you can reindex from this tool.


You may see log entries like this as a confirmation:

18.10.2017 15:41:13.953 *INFO* [aysnc-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Indexing report
    - /oak:index/cqPageLucene*(774)

Query Performance (Tools > Operations > Diagnosis > Query Performance)

This tool is quite helpful for understanding the queries running in your AEM instance. In the link, you will be able to see Slow Queries and Popular Queries. You should be able to distinguish the queries coming from application code from the queries coming from AEM itself. I would suggest revisiting the queries coming from application code and optimizing them as much as possible.

Explain Query (Tools > Operations > Diagnosis > Query Performance)

This tool should be used for the application queries which are found to be slow or popular, as our aim should be to optimize the queries before creating missing indexes. Once you paste the query, select its language in the Explain Query form and execute it, you will see a detailed analysis of the query. Look for the Indexes Used section: it tells you which indexes serve this query. The next step should be to check the index definition for that query in order to create appropriate indexes.


Query Debugger

I use this tool almost every day. It should be used to optimize the queries before going for any indexing. Indexing is necessary and important, but it won't help much if you are not optimizing the queries. Look for various predicates and examples to improve the queries.
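As an illustration of what a narrowed-down predicate set can look like, here is a small sketch that builds a QueryBuilder request URL. The host, the content path and the template path are hypothetical placeholders; the predicates used (path, type, property, property.value, p.limit) are standard QueryBuilder predicates, and the same key=value pairs can be pasted into the Query Debugger.

```python
from urllib.parse import urlencode


def querybuilder_url(host, predicates):
    """Build a QueryBuilder JSON servlet URL from a dict of predicates."""
    return f"{host}/bin/querybuilder.json?{urlencode(predicates)}"


# Hypothetical example: restrict by path and node type first, then by a
# property, and cap the result size instead of fetching everything.
predicates = {
    "path": "/content/mysite",                           # hypothetical path
    "type": "cq:Page",
    "property": "jcr:content/cq:template",
    "property.value": "/apps/mysite/templates/article",  # hypothetical value
    "p.limit": "20",
}

url = querybuilder_url("http://localhost:4502", predicates)

if __name__ == "__main__":
    print(url)
```

Restricting by path and type before adding property predicates is usually the cheapest way to cut down the number of nodes a query has to look at.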

Oak Index Definition Generator

This tool is quite handy for painting a picture of the index definition needed for your query. You can paste XPath, SQL or SQL2 queries in there, and the node definition for the required index will be generated. The next step should be to replicate that node and property hierarchy in your oak:index structure.
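To give an idea of the kind of structure the generator produces, the sketch below builds a Lucene index definition as a nested dictionary. This is not the generator's actual output, only an approximation under assumptions: the helper name lucene_index_definition, the node type and the property list are made up for illustration, while the property names (type, async, compatVersion, evaluatePathRestrictions, indexRules, propertyIndex) follow the Oak Lucene index documentation.

```python
import json


def lucene_index_definition(node_type, property_names):
    """Return a nested dict mirroring an oak:index Lucene definition node."""
    properties = {
        # Child node names may not contain ':' or '/', so derive a safe name.
        name.replace(":", "_").replace("/", "_"): {
            "jcr:primaryType": "nt:unstructured",
            "name": name,
            "propertyIndex": True,
        }
        for name in property_names
    }
    return {
        "jcr:primaryType": "oak:QueryIndexDefinition",
        "type": "lucene",
        "async": "async",
        "compatVersion": 2,
        "evaluatePathRestrictions": True,
        "indexRules": {
            "jcr:primaryType": "nt:unstructured",
            node_type: {
                "jcr:primaryType": "nt:unstructured",
                "properties": {
                    "jcr:primaryType": "nt:unstructured",
                    **properties,
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical input: index the cq:template property of page content nodes.
    definition = lucene_index_definition("cq:PageContent", ["cq:template"])
    print(json.dumps(definition, indent=2))
```

Each key of the dictionary corresponds to either a property or a child node that you would create under /oak:index for your custom index.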


Logging

Sometimes, to identify the culprit, we have to go to the logs. To check for query problems and indexing mismatches, enable DEBUG logging for these loggers:

  • org.apache.jackrabbit.oak.plugins.index
  • org.apache.jackrabbit.oak.query
  • com.day.cq.search
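One way to set this up is a dedicated Sling logger configuration, so the debug output goes to its own file. The sketch below only prints the settings you would enter for such a logger (factory PID org.apache.sling.commons.log.LogManager.factory.config); the log file name is a hypothetical choice, and how you deploy the configuration (Web Console or a config file in your code base) depends on your project.

```python
import json

# The three loggers listed above.
LOGGERS = [
    "org.apache.jackrabbit.oak.plugins.index",
    "org.apache.jackrabbit.oak.query",
    "com.day.cq.search",
]


def logger_config(log_file="logs/query-debug.log"):
    """Return Sling logger settings; the default file name is hypothetical."""
    return {
        "org.apache.sling.commons.log.level": "debug",
        "org.apache.sling.commons.log.file": log_file,
        "org.apache.sling.commons.log.names": LOGGERS,
    }


if __name__ == "__main__":
    print(json.dumps(logger_config(), indent=2))
```

DEBUG logging for these packages is verbose, so it is best switched off again once the analysis is done.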

JMX MBean

The JMX console at /system/console/jmx can be used to analyze the statistics of your existing indexes. Sometimes you will find that a particular index takes up too much space and may not be targeted at your use case. You can look at these two MBeans.

The first MBean gives you statistics on the Lucene indexes in your environment. You can keep a check on IndexSize for the various Lucene index paths. If the size is very large for a particular index, queries served by that index are likely to take more time. In that case, explore restricting the path with the evaluatePathRestrictions property. More such design considerations and optimizations can be found in the Adobe documentation.


The second MBean is used to monitor property indexes in your application. You can check the paths for the various property indexes defined in AEM. It can also be used to reindex a property index using the method startPropertyIndexAsyncReindex(). Try to avoid property indexes where you can, as they are updated as soon as the content changes. Unless a property index is absolutely necessary, go for Lucene indexes.

In the end, I would like to say that your first step should always be optimizing the query, and only then optimizing the indexes. I hope this blog post helps you make your systems faster than before.

Thanks.
