A few weeks ago, we decided to go through our Sitecore log files to clean up small bugs and unnecessary logging. After quite successfully eliminating a lot of the “noise” entries in our log files, we noticed that there were A LOT of search index update messages. Below is just a 2 second slice of it:
ManagedPoolThread #6 17:00:00 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (6 pending).
ManagedPoolThread #6 17:00:00 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #8 17:00:00 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (8 pending).
ManagedPoolThread #8 17:00:00 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #15 17:00:00 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (8 pending).
ManagedPoolThread #15 17:00:00 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #12 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (8 pending).
ManagedPoolThread #12 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #4 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (9 pending).
ManagedPoolThread #4 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #4 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (10 pending).
ManagedPoolThread #4 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #4 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (1 pending).
ManagedPoolThread #4 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #19 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (2 pending).
ManagedPoolThread #19 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #10 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (4 pending).
ManagedPoolThread #10 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #7 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (4 pending).
ManagedPoolThread #7 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #13 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'master' (2 pending).
ManagedPoolThread #3 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (6 pending).
ManagedPoolThread #3 17:00:01 INFO Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
Now it WOULD fluctuate. There were periods where there was less updating going on but it still seemed like too many for the amount of content publishing that was happening. I should add that we were also seeing general sluggishness in the Sitecore system around these periods as well. Publishing operations would take much longer than normal for users.
At this point, I decided to crack open the code and see where that burning smell was coming from. After quite a bit of digging, I found a few issues. (NOTE: Issues 1 & 2 were submitted to Sitecore support and deemed bugs, they did provide a patch assembly which I can share if you need it.)
ISSUE #1 – Concurrent Index Updates
When items are modified, the search indexes are automatically updated. The code eventually calls IndexManager.UpdateIndexAsync which creates the UpdateIndex job. Before creating the job, this code first checks to see if that job is already running (JobManager.IsJobRunning).

The problem with this is that it doesn’t check to see if the job is “Queued”. So even if there are 10 queued jobs with that name, it will add this one as well. This can happen if a lot of people are updating items or if branches of items are being published (they are “saved” to the web DB).
Sitecore support provided a patch assembly along with this suggestion:
The issue with multiple queued job is fixed by this patch, but you should also consider the following setting in order to control the update job frequency:
<setting name=”Indexing.UpdateJobThrottle” value=”00:00:01>
This setting controls the update job “waiting time” – during this timespan, the single queued update job will stay in pool waiting to be executed.
ISSUE #2 – Unnecessary Item Indexing
Another smaller problem I found is that after the update index job has completed, it saves the “Created” date of the last HistoryEntry record that it processed.. however the precision is only down to the second. Such as: “20140417T170427”. The next time the update index job runs it grabs all history records including any with a created date in the same second as the HistoryEntry from the last index update (this can be more than just 1.. I’ve seen at least 6). This results in a lot of wasted processing to re-index something that was already re-indexed a few seconds ago.
ISSUE #3 – Indexing ALL Item Versions
Did you know that the default Sitecore Search Index indexes ALL versions of an item?
So when you save an item that have over 100+ versions, they are all re-indexed. If you use workflow, you’ve probably seen how the number of versions an item has can really swell. I’m sure there is a use-case for indexing all versions of an item, however we decided that it was just added overhead that we could do without.
I created a custom DatabaseCrawler which extends Sitecore.Search.Crawlers.DatabaseCrawler in order to add in functionality to only index the last “N” item versions. The default value of “N” being 1. I also made configurable by adding a “Index.IndexLastNVersions” setting to the web.config.
NOTE: If you are on Sitecore 7.+ there is a better solution.
Here’s a snippet:
protected override void AddItem(Item item, IndexUpdateContext context)
{
Assert.ArgumentNotNull(item, "item");
Assert.ArgumentNotNull(context, "context");
if (this.IsMatch(item))
{
foreach (Sitecore.Globalization.Language language in item.Languages)
{
Item item2;
try
{
Switcher<bool, DatabaseCacheDisabler>.Enter(Settings.Indexing.DisableDatabaseCaches);
item2 = item.Database.GetItem(item.ID, language, Sitecore.Data.Version.Latest);
}
finally
{
Switcher<bool, DatabaseCacheDisabler>.Exit();
}
if (item2 != null)
{
Item[] versions;
try
{
Switcher<bool, DatabaseCacheDisabler>.Enter(Settings.Indexing.DisableDatabaseCaches);
versions = item2.Versions.GetVersions(false);
}
finally
{
Switcher<bool, DatabaseCacheDisabler>.Exit();
}
// PRM - set max versions to index
var lastNVersions = Sitecore.Configuration.Settings.GetIntSetting("Index.IndexLastNVersions", 1);
if (versions.Length > lastNVersions)
versions = versions.OrderByDescending(x => x.Version.Number).Take(lastNVersions).ToArray();
foreach (Item item3 in versions)
{
var message = string.Format("update index [{0}][{1}][root:{2}]: {4}#{3}", new object[] { this.Index.Name, this.Database, this.IndexRootItem.Paths.Path, item3.Version.Number, item3.Paths.Path });
Sitecore.Diagnostics.Log.Debug(message, this);
this.IndexVersion(item3, item2, context);
}
}
}
}
}
ISSUE #4 – Workflow AutoPublish is “deep” by default
If you use workflow, chances are pretty good that you are using the AutoPublish workflow action that came with the “Sample Workflow”. By default, this workflow action has a parameter “deep=1”. This parameter determines whether to just publish the context item on which the workflow state has changed (deep=0) or to publish the context item and ALL its descendants (deep=1). Also… to clarify… this is NOT “smart publish”, where items are only published if they have changed… this is “Republish” (everything).
So now imagine that you have a site that uses workflow (with AutoPublish deep=1). And then imagine that this site has been around for quite a while and has lots of items. Then consider a top level item being changed and approved through workflow. Assuming you haven’t made any of the changes/fixes above, the next thing you can expect is to start smelling something burning. Because that AutoPublish action is now re-publishing (AND re-indexing) all items under that top level item.
In any case, we set “deep=0”.
Happy coding!