How to fix that burning smell coming from your Sitecore Search Index

A few weeks ago, we decided to go through our Sitecore log files to clean up small bugs and unnecessary logging. After quite successfully eliminating a lot of the “noise” entries in our log files, we noticed that there were A LOT of search index update messages. Below is just a 2 second slice of it:

ManagedPoolThread #6 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (6 pending).
ManagedPoolThread #6 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #8 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (8 pending).
ManagedPoolThread #8 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #15 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (8 pending).
ManagedPoolThread #15 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #12 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (8 pending).
ManagedPoolThread #12 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (9 pending).
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (10 pending).
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (1 pending).
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #19 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (2 pending).
ManagedPoolThread #19 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #10 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (4 pending).
ManagedPoolThread #10 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #7 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (4 pending).
ManagedPoolThread #7 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #13 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'master' (2 pending).
ManagedPoolThread #3 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (6 pending).
ManagedPoolThread #3 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.

Now it WOULD fluctuate. There were periods where there was less updating going on but it still seemed like too many for the amount of content publishing that was happening. I should add that we were also seeing general sluggishness in the Sitecore system around these periods as well. Publishing operations would take much longer than normal for users.

At this point, I decided to crack open the code and see where that burning smell was coming from. After quite a bit of digging, I found a few issues. (NOTE: Issues 1 & 2 were submitted to Sitecore support and deemed bugs, they did provide a patch assembly which I can share if you need it.)

ISSUE #1 – Concurrent Index Updates

When items are modified, the search indexes are automatically updated. The code eventually calls IndexManager.UpdateIndexAsync which creates the UpdateIndex job. Before creating the job, this code first checks to see if that job is already running (JobManager.IsJobRunning).

Image

The problem with this is that it doesn’t check to see if the job is “Queued”. So even if there are 10 queued jobs with that name, it will add this one as well. This can happen if a lot of people are updating items or if branches of items are being published (they are “saved” to the web DB).

Sitecore support provided a patch assembly along with this suggestion:

The issue with multiple queued job is fixed by this patch, but you should also consider the following setting in order to control the update job frequency:
<setting name=”Indexing.UpdateJobThrottle” value=”00:00:01>
This setting controls the update job “waiting time” – during this timespan, the single queued update job will stay in pool waiting to be executed.

ISSUE #2 – Unnecessary Item Indexing

Another smaller problem I found is that after the update index job has completed, it saves the “Created” date of the last HistoryEntry record that it processed.. however the precision is only down to the second. Such as: “20140417T170427”. The next time the update index job runs it grabs all history records including any with a created date in the same second as the HistoryEntry from the last index update (this can be more than just 1.. I’ve seen at least 6). This results in a lot of wasted processing to re-index something that was already re-indexed a few seconds ago.

ISSUE #3 – Indexing ALL Item Versions

Did you know that the default Sitecore Search Index indexes ALL versions of an item?

So when you save an item that have over 100+ versions, they are all re-indexed. If you use workflow, you’ve probably seen how the number of versions an item has can really swell. I’m sure there is a use-case for indexing all versions of an item, however we decided that it was just added overhead that we could do without.

I created a custom DatabaseCrawler which extends Sitecore.Search.Crawlers.DatabaseCrawler in order to add in functionality to only index the last “N” item versions. The default value of “N” being 1. I also made configurable by adding a “Index.IndexLastNVersions” setting to the web.config.

NOTE: If you are on Sitecore 7.+ there is a better solution.

Here’s a snippet:

protected override void AddItem(Item item, IndexUpdateContext context)
{
    Assert.ArgumentNotNull(item, "item");
    Assert.ArgumentNotNull(context, "context");
    if (this.IsMatch(item))
    {
        foreach (Sitecore.Globalization.Language language in item.Languages)
        {
            Item item2;
            try
            {
                Switcher<bool, DatabaseCacheDisabler>.Enter(Settings.Indexing.DisableDatabaseCaches);
                item2 = item.Database.GetItem(item.ID, language, Sitecore.Data.Version.Latest);
            }
            finally
            {
                Switcher<bool, DatabaseCacheDisabler>.Exit();
            }
            if (item2 != null)
            {
                Item[] versions;
                try
                {
                    Switcher<bool, DatabaseCacheDisabler>.Enter(Settings.Indexing.DisableDatabaseCaches);
                    versions = item2.Versions.GetVersions(false);
                }
                finally
                {
                    Switcher<bool, DatabaseCacheDisabler>.Exit();
                }

                // PRM - set max versions to index
                var lastNVersions = Sitecore.Configuration.Settings.GetIntSetting("Index.IndexLastNVersions", 1);
                if (versions.Length > lastNVersions)
                    versions = versions.OrderByDescending(x => x.Version.Number).Take(lastNVersions).ToArray();

                foreach (Item item3 in versions)
                {
                    var message = string.Format("update index [{0}][{1}][root:{2}]: {4}#{3}", new object[] { this.Index.Name, this.Database, this.IndexRootItem.Paths.Path, item3.Version.Number, item3.Paths.Path });
                    Sitecore.Diagnostics.Log.Debug(message, this);
                    this.IndexVersion(item3, item2, context);
                }
            }
        }
    }
}

ISSUE #4 – Workflow AutoPublish is “deep” by default

If you use workflow, chances are pretty good that you are using the AutoPublish workflow action that came with the “Sample Workflow”. By default, this workflow action has a parameter “deep=1”. This parameter determines whether to just publish the context item on which the workflow state has changed (deep=0) or to publish the context item and ALL its descendants (deep=1). Also… to clarify… this is NOT “smart publish”, where items are only published if they have changed… this is “Republish” (everything).

So now imagine that you have a site that uses workflow (with AutoPublish deep=1). And then imagine that this site has been around for quite a while and has lots of items. Then consider a top level item being changed and approved through workflow. Assuming you haven’t made any of the changes/fixes above, the next thing you can expect is to start smelling something burning. Because that AutoPublish action is now re-publishing (AND re-indexing) all items under that top level item.

In any case, we set “deep=0”.

Happy coding!

Advertisements

About Paul Martin

I enjoy rock climbing, playing guitar, writing code...
This entry was posted in Indexing, Lucene, Search, Sitecore and tagged , , , . Bookmark the permalink.

12 Responses to How to fix that burning smell coming from your Sitecore Search Index

  1. Tyler Evert says:

    Did Sitecore say they would include the fix for issue #1 in the standard product? Any idea what version that would’ve come in with? And did you have any success with their workaround?

    We’re on version 7.2.2, and using a couple of the new treelist features in that specific update. I’m reluctant to use whatever patched DLLs they gave you, in case it rolls us back.

    Thanks!

    • Paul Martin says:

      Hi Tyler. Sitecore support said, “[we] have registered this issue as a bug in the current version of the Sitecore CMS and will notify you when it gets fixed”. That was on Apr 25, 2014 and I’ve heard nothing since then. The fix they supplied replaced the IndexingManager and IndexingProvider classes. As for implementing the fix they provided… we never did. The issue wasn’t bad enough to merit putting a temporary fix in place.. a fix that might break other features down the road. Unless you think this issue is really hindering your sites performance (maybe you have a lot of custom indexes), I’d say this particular issue is a relatively small performance leak that can wait for the proper fix. Hope that helps!

      • Tyler Evert says:

        Thanks Paul!

        Yeah, patching binaries is probably not a feasible route for us. We’ll probably spend some more investigating ways to make our custom crawlers more efficient instead.

  2. Menno Visser says:

    Hello Paul,

    We having the same issue as you had with the multiple queued jobs.

    Can you share the solution with us?

    Best regards,
    Menno

  3. Danny says:

    Hi Martin, I have bumped into some similar indexing issues. Just wondering which version of Sitecore might you be using when you encountered the problem? Thank you.

  4. Gabbar Singh says:

    What was the bug number?

    • Paul Martin says:

      Good question. I just checked and unfortunately they did not supply a bug number and we have not been notified (that I’m aware of) of a fix. In case you are in contact with Sitecore support and want to reference my ticket it was ticket #408874.

  5. Owen Wattley says:

    Hi Paul,

    Thanks for posting this. I’m running Sitecore 7.2 rev. 140526 and I’m investigating some indexing issues (many, many, queued index operations). I believe issue #1 you have talked about could be related to the symptoms I’m seeing but I don’t know if it has already been fixed in my version of Sitecore. Would you be able to send me the patched DLL so I can take a look at the code Sitecore supplied to compare against the base case I’m already running in 7.2?

    Thanks very much.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s