Autoindex JCR performance question

davidrowe · October 2, 2012, 7:21am

I’m currently looking to see if I can find (and hopefully) fix any major performance bottlenecks in an implementation - using the excellent timing macros that were recently posted to this forum.

On many pages of our implementation we have three autoindexes that populate a sidebar in our pages. An example of this is an autoindex that finds news items that are related to the page I’m generating.

I’ve tried using the reverse slot content finder but that was painfully slow. Instead I’m using a more straightforward autoindex query which looks as follows

select rx:sys_contentid, jcr:path from rx:news where sh_local_expertise=23606 or sh_global_expertise=23606 and rx:news_sort_date>'2009-10-02' and rx:sys_lang='en-gb' order by rx:news_sort_date desc, rx:displaytitle asc

Using the timing macros as follows…

#markTime("> query ${params.query}")
#initslot(“AutoIndex” $params)
#markTime("< query")

I can see that the #initslot macro is taking around 2500ms to run each time. Most pages have three similar autoindexes on their sidebars and this is on most pages of the site. The overhead of processing these three autoindexes on each page adds up to around 10 seconds every time the page is rendered. This is far from ideal.

I wondered if velocity was slowing things down so I ran the query in the JCR search test debugger and it was taking around the same time to run.

Does anyone have any thoughts on what I could look at here to improve things (other than remove the autoindexes altogether!). Perhaps I might need to think about not running the autoindexes during preview and only process them during publishing - but I’d ideally like to give users a prompt and accurate preview of the published page as much as possible.

Any thoughts would be most welcome.

Rileyw · October 9, 2012, 9:48am

David,

You should talk with Percussion about the Dynamic Delivery Tier. They demonstrated the product at a recent User Group Summit. It may provide this implementation with a solution.

jitendra · October 9, 2012, 11:23am

We created a reverse slot lookup extension (implementing IPSSlotContentFinder) that queries the database directly. This is orders of magnitude faster than the default way of getting things (granted our reverse slot has 7k + items http://www.vtnews.vt.edu/articles/location/blacksburg-va-res.html), but YMMV (and by orders i mean a whole lot faster…I don’t have the numbers off the top of my head but I think it went from 10+ minutes to 3 minutes to render that page).

I have been toying with the idea of doing something similar to the regular auto slot because it currently suffers from a bug where the folder location is not honored in the results (ie if a content item is in multiple folders and you index one of the folders, then the link generated to the item may point to the other folder, not the one you index), but I would rather like to think that this is a product level bug that should be fixed (yes, it has been reported to TS…a year or so ago…). I have not analyzed the performance of auto indexers yet, but clearly a direct sql query would be faster than a jcr one…

Fair warning, Percussion can change the tables on you (which is why jcr queries would be the “official” way to do things) and your sql query may use database specific functions… but as this is a discussion about performance, this is one “non-official” and “unsupported” way to make things faster

Chris_Leger · October 10, 2012, 11:45am

As Riley points out, the work we’re doing around Dynamic Delivery Tier will present another optional approach, but it’s still in beta. When it’s ready, you’ll be able to query a web service dynamically to construct lists, and so forth, from a static page. It will cut the time needed to publish an item or items drastically.

As Jit says, going directly to the DB will be faster, but the approach is unsupported, in that we can’t provide any kind of assurance that the schema won’t change on you at some point along the upgrade path. Still…

Other considerations

I assume you’ve compared your result from the timing macros with the time given by the Query Debugger, to isolate the query performance? If there’s a marked difference, the performance could stem in part from the processing going on in your slot iterator.

Doing a data comparison as you are will result in a table scan. You could try adding an index to your date field(s,) but keep in mind that any subsequent changes to your Content Type(s) may result in the loss of those indices.

There are conditions under which the order of criteria in your WHERE clause can have an impact… you could try shuffling them around.

davidrowe · November 1, 2012, 6:35am

Thanks for the comments posted here. I had a good look through our database and by turning on SQL debugging got to see some of the queries that are taking time to run on the database.

From here I’ve created a number of new indexes on tables (and even found some content tables without primary keys!) and have ensured that there’s indexes on some of the columns that we’re using for date checking. Just looking at the execution plan information that SQL Server Management Studio gives you before and after adding the indexes shows that big improvements can be made.

The additional indexes have made a big improvement to the performance of our system and is a great step in the right direction.

Thanks once again (and I like the idea about a custom slot lookup extension).