Custom FieldType metadata voodoo

When you create a custom field type in Sitecore, you often want access to the context Item on which the field is being displayed. Or, trickier still, perhaps you want to know the Field ID or Source of the field that your custom field type is displaying. By default/design, field types are left in the dark about this kind of metadata. Sitecore just passes the field type code the value of that field for display and then grabs the value back upon saving.

Well, in typical ‘Field of Dreams’ fashion, “if you build it, they will come”…

It turns out that if you simply add a public property to your custom field type class (such as ItemID), that property will get set by Sitecore. VOODOO!!

Well, really it's the work of the Sitecore.Shell.Applications.ContentEditor.EditorFormatter class in Sitecore.Client.dll. Specifically, if you look at the SetProperties method, you can see that Sitecore uses reflection to set each of the properties shown below, provided your class exposes them. Cool.

 


public static void SetProperties(System.Web.UI.Control editor, Editor.Field field, bool readOnly)
{
    Assert.ArgumentNotNull(editor, "editor");
    Assert.ArgumentNotNull(field, "field");
    ReflectionUtil.SetProperty(editor, "ID", field.ControlID);
    ReflectionUtil.SetProperty(editor, "ItemID", field.ItemField.Item.ID.ToString());
    ReflectionUtil.SetProperty(editor, "ItemVersion", field.ItemField.Item.Version.ToString());
    ReflectionUtil.SetProperty(editor, "ItemLanguage", field.ItemField.Item.Language.ToString());
    ReflectionUtil.SetProperty(editor, "FieldID", field.ItemField.ID.ToString());
    ReflectionUtil.SetProperty(editor, "Source", field.ItemField.Source);
    ReflectionUtil.SetProperty(editor, "ReadOnly", readOnly);
    ReflectionUtil.SetProperty(editor, "Disabled", readOnly);
}

In addition, exposing a public “Value” property gets you the field value, but most samples and documentation out there already cover that bit.
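To make the mechanism concrete, here is a minimal, self-contained sketch of the "set it if it exists" idea. It does not use Sitecore at all: MyCustomField and PropertySetter.TrySetProperty are hypothetical names, and TrySetProperty is only a plain System.Reflection stand-in for what ReflectionUtil.SetProperty appears to do, not its actual implementation.

```csharp
using System;
using System.Reflection;

// Hypothetical custom field control; only the public property names matter.
public class MyCustomField
{
    public string ItemID { get; set; }
    public string FieldID { get; set; }
    public string Source { get; set; }
    public string Value { get; set; }
}

public static class PropertySetter
{
    // Stand-in (assumption) for ReflectionUtil.SetProperty: assigns the value
    // only if the target declares a public, writable property with that name.
    public static bool TrySetProperty(object target, string name, object value)
    {
        PropertyInfo property = target.GetType()
            .GetProperty(name, BindingFlags.Public | BindingFlags.Instance);

        if (property == null || !property.CanWrite)
        {
            return false; // the class did not opt in to this piece of metadata
        }

        property.SetValue(target, value, null);
        return true;
    }
}
```

In this sketch, a class that declares ItemID receives it, while a name it does not declare (ItemVersion, say) is simply skipped, which is why adding a property is all the "opt in" you need.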

Happy coding!

Posted in Field Editor, Field Types, Sitecore

LowerCaseKeywordAnalyzer not lowercasing… or is it?

I recently ran into an issue where it appeared that the LowerCaseKeywordAnalyzer was NOT making a custom field’s value lower case in our Lucene index. (Sitecore 7.5)

My custom field was defined as follows:

          <fieldMap>
            <fieldNames hint="raw:AddFieldByFieldName">
              <field fieldName="previousurl" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
              </field>
            </fieldNames>
          </fieldMap>

When viewing the field value in LUKE (the Lucene index viewer), I saw different values depending on where I looked. For example, if my item had a “previousurl” field value of “/Some/Old/URL”, I expected to see “/some/old/url” in the index.

The overview tab (available fields) –> Click “previousurl” –> Click “Show top terms”

[Image: LukeTopTerms]

Good! As expected. But then I double-clicked on that previousurl top term (which switches to the “Document” tab), clicked “First Doc”, and saw this:

[Image: LukeDocField]

This was quite confusing and seemed to indicate that perhaps Lucene was NOT storing the lower case value as I was expecting. I took all this information and submitted a ticket to Sitecore support. As usual, they were very responsive and helpful. At first though, they were just as baffled as I was, so it took some time for them to find the answer… which made me feel a little better 🙂

Their conclusion was as follows (Thanks to Alexander Sharamok):

Analyzers just apply their settings/corrections/filters to terms. E.g. when Sitecore indexes fields, it takes into account fields analyzers and creates terms for indexed fields according to analyzers…

storageType doesn’t affect search in any case. Mostly that is set to ‘YES’ if you want to use values from index somewhere instead of getting these values from databases.

So basically, the “/Some/Old/URL” value I was seeing on the document tab in LUKE was the “stored value”, since I had set storageType=“YES”. By design, this is the original (un-analyzed) field value from the item. Lucene does not use that value for searching; searches run against the analyzed (lower-cased) terms.
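The stored-versus-indexed distinction is easy to lose, so here is a toy illustration. This is NOT Lucene: ToyIndex and its members are hypothetical, and the "analysis" is just a lower-casing of the whole value as a single term, which is my assumption of what a lower-case keyword analyzer amounts to for this field.

```csharp
using System;
using System.Collections.Generic;

// Toy single-field index showing the two storage paths side by side.
public class ToyIndex
{
    // Analyzed term -> document ids (what searches run against).
    private readonly Dictionary<string, List<int>> terms =
        new Dictionary<string, List<int>>();

    // Document id -> original value (what a stored field keeps).
    private readonly List<string> stored = new List<string>();

    public int Add(string value)
    {
        int docId = stored.Count;
        stored.Add(value);                       // stored value: left untouched

        string term = value.ToLowerInvariant();  // analyzed term: lower-cased
        List<int> docs;
        if (!terms.TryGetValue(term, out docs))
        {
            docs = new List<int>();
            terms[term] = docs;
        }
        docs.Add(docId);
        return docId;
    }

    // Exact lookup against the analyzed terms only.
    public IList<int> Search(string term)
    {
        List<int> docs;
        return terms.TryGetValue(term, out docs) ? docs : new List<int>();
    }

    public string GetStoredValue(int docId)
    {
        return stored[docId];
    }
}
```

Adding “/Some/Old/URL” to this toy index makes it findable under “/some/old/url” only, while GetStoredValue still hands back the original mixed-case string, which mirrors the two different views I saw in LUKE.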

This new-found knowledge forced me to take a closer look at my own code and sure enough the case-sensitivity was a bug in my own code. (hand-smack-head)

I did at least learn something new about Lucene and LUKE… and perhaps this will help you too.

Happy Coding!

Posted in Indexing, Lucene, Sitecore

How to fix that burning smell coming from your Sitecore Search Index

A few weeks ago, we decided to go through our Sitecore log files to clean up small bugs and unnecessary logging. After quite successfully eliminating a lot of the “noise” entries in our log files, we noticed that there were A LOT of search index update messages. Below is just a two-second slice of it:

ManagedPoolThread #6 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (6 pending).
ManagedPoolThread #6 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #8 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (8 pending).
ManagedPoolThread #8 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #15 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (8 pending).
ManagedPoolThread #15 17:00:00 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #12 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (8 pending).
ManagedPoolThread #12 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (9 pending).
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (10 pending).
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (1 pending).
ManagedPoolThread #4 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #19 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (2 pending).
ManagedPoolThread #19 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #10 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (4 pending).
ManagedPoolThread #10 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #7 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (4 pending).
ManagedPoolThread #7 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.
ManagedPoolThread #13 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'master' (2 pending).
ManagedPoolThread #3 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Starting update of index for the database 'web' (6 pending).
ManagedPoolThread #3 17:00:01 INFO  Sitecore.Data.Managers.IndexingProvider Update of index for the database 'web' done.

Now it WOULD fluctuate. There were periods where there was less updating going on but it still seemed like too many for the amount of content publishing that was happening. I should add that we were also seeing general sluggishness in the Sitecore system around these periods as well. Publishing operations would take much longer than normal for users.

At this point, I decided to crack open the code and see where that burning smell was coming from. After quite a bit of digging, I found a few issues. (NOTE: Issues 1 & 2 were submitted to Sitecore support and deemed bugs. They provided a patch assembly, which I can share if you need it.)

ISSUE #1 – Concurrent Index Updates

When items are modified, the search indexes are automatically updated. The code eventually calls IndexManager.UpdateIndexAsync which creates the UpdateIndex job. Before creating the job, this code first checks to see if that job is already running (JobManager.IsJobRunning).


The problem with this is that it doesn’t check to see if the job is “Queued”. So even if there are 10 queued jobs with that name, it will add this one as well. This can happen if a lot of people are updating items or if branches of items are being published (they are “saved” to the web DB).

Sitecore support provided a patch assembly along with this suggestion:

The issue with multiple queued job is fixed by this patch, but you should also consider the following setting in order to control the update job frequency:
<setting name="Indexing.UpdateJobThrottle" value="00:00:01" />
This setting controls the update job “waiting time” – during this timespan, the single queued update job will stay in pool waiting to be executed.
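To illustrate the gap described in Issue #1, here is a minimal sketch using a toy job pool. None of this is Sitecore's JobManager or the contents of the patch (which I am not reproducing here); ToyJobPool, ToyJob and both method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public enum JobState { Queued, Running, Finished }

public class ToyJob
{
    public string Name;
    public JobState State;
}

// Toy job pool: jobs are only ever added as Queued in this sketch.
public class ToyJobPool
{
    public readonly List<ToyJob> Jobs = new List<ToyJob>();

    // Behaviour described above: only a *running* job blocks a new one,
    // so queued duplicates pile up.
    public bool StartIfNotRunning(string name)
    {
        if (Jobs.Exists(j => j.Name == name && j.State == JobState.Running))
        {
            return false;
        }
        Jobs.Add(new ToyJob { Name = name, State = JobState.Queued });
        return true;
    }

    // Guarded variant: a queued job with the same name also blocks a new one.
    public bool StartIfNotQueuedOrRunning(string name)
    {
        if (Jobs.Exists(j => j.Name == name && j.State != JobState.Finished))
        {
            return false;
        }
        Jobs.Add(new ToyJob { Name = name, State = JobState.Queued });
        return true;
    }
}
```

Calling StartIfNotRunning("UpdateIndex") three times in a row leaves three queued jobs, while the guarded variant leaves one. That is the kind of duplication that showed up in our logs as a flood of update messages.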

ISSUE #2 – Unnecessary Item Indexing

Another, smaller problem I found is that after the update index job has completed, it saves the “Created” date of the last HistoryEntry record that it processed. However, the precision is only down to the second, such as “20140417T170427”. The next time the update index job runs, it grabs all history records, including any with a created date in the same second as the HistoryEntry from the last index update (this can be more than just one; I’ve seen at least 6). This results in a lot of wasted processing to re-index something that was already re-indexed a few seconds ago.
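The effect of the second-level precision can be reproduced in a few lines. This is only a sketch: HistoryDemo and its methods are hypothetical, and the “created >= saved timestamp” filter is my assumption about how the history lookup behaves, based on the symptoms described above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class HistoryDemo
{
    // Second-precision format matching the example value "20140417T170427".
    private const string Format = "yyyyMMdd'T'HHmmss";

    // Saving drops everything below one second.
    public static string Save(DateTime created)
    {
        return created.ToString(Format, CultureInfo.InvariantCulture);
    }

    public static DateTime Load(string saved)
    {
        return DateTime.ParseExact(saved, Format, CultureInfo.InvariantCulture);
    }

    // Entries the next run would pick up, assuming an inclusive (>=) filter.
    public static List<DateTime> PendingSince(IEnumerable<DateTime> history, string saved)
    {
        DateTime since = Load(saved);
        return history.Where(created => created >= since).ToList();
    }
}
```

If three history entries were created at 17:04:27.100, 17:04:27.400 and 17:04:27.900 and the job saves the timestamp of the last one, the saved value is “20140417T170427” and all three entries qualify again on the next run, even though they were already processed.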

ISSUE #3 – Indexing ALL Item Versions

Did you know that the default Sitecore Search Index indexes ALL versions of an item?

So when you save an item that has more than 100 versions, they are all re-indexed. If you use workflow, you’ve probably seen how the number of versions an item has can really swell. I’m sure there is a use-case for indexing all versions of an item, however we decided that it was just added overhead that we could do without.

I created a custom DatabaseCrawler which extends Sitecore.Search.Crawlers.DatabaseCrawler in order to index only the last “N” versions of an item, with “N” defaulting to 1. I also made it configurable by adding an “Index.IndexLastNVersions” setting to the web.config.

NOTE: If you are on Sitecore 7+ there is a better solution.

Here’s a snippet:

protected override void AddItem(Item item, IndexUpdateContext context)
{
    Assert.ArgumentNotNull(item, "item");
    Assert.ArgumentNotNull(context, "context");
    if (this.IsMatch(item))
    {
        foreach (Sitecore.Globalization.Language language in item.Languages)
        {
            Item item2;
            try
            {
                Switcher<bool, DatabaseCacheDisabler>.Enter(Settings.Indexing.DisableDatabaseCaches);
                item2 = item.Database.GetItem(item.ID, language, Sitecore.Data.Version.Latest);
            }
            finally
            {
                Switcher<bool, DatabaseCacheDisabler>.Exit();
            }
            if (item2 != null)
            {
                Item[] versions;
                try
                {
                    Switcher<bool, DatabaseCacheDisabler>.Enter(Settings.Indexing.DisableDatabaseCaches);
                    versions = item2.Versions.GetVersions(false);
                }
                finally
                {
                    Switcher<bool, DatabaseCacheDisabler>.Exit();
                }

                // PRM - set max versions to index
                var lastNVersions = Sitecore.Configuration.Settings.GetIntSetting("Index.IndexLastNVersions", 1);
                if (versions.Length > lastNVersions)
                    versions = versions.OrderByDescending(x => x.Version.Number).Take(lastNVersions).ToArray();

                foreach (Item item3 in versions)
                {
                    var message = string.Format("update index [{0}][{1}][root:{2}]: {4}#{3}", new object[] { this.Index.Name, this.Database, this.IndexRootItem.Paths.Path, item3.Version.Number, item3.Paths.Path });
                    Sitecore.Diagnostics.Log.Debug(message, this);
                    this.IndexVersion(item3, item2, context);
                }
            }
        }
    }
}

ISSUE #4 – Workflow AutoPublish is “deep” by default

If you use workflow, chances are pretty good that you are using the AutoPublish workflow action that came with the “Sample Workflow”. By default, this workflow action has a parameter “deep=1”. This parameter determines whether to just publish the context item on which the workflow state has changed (deep=0) or to publish the context item and ALL its descendants (deep=1). Also… to clarify… this is NOT “smart publish”, where items are only published if they have changed… this is “Republish” (everything).

So now imagine that you have a site that uses workflow (with AutoPublish deep=1). And then imagine that this site has been around for quite a while and has lots of items. Then consider a top level item being changed and approved through workflow. Assuming you haven’t made any of the changes/fixes above, the next thing you can expect is to start smelling something burning. Because that AutoPublish action is now re-publishing (AND re-indexing) all items under that top level item.

In any case, we set “deep=0”.

Happy coding!

Posted in Indexing, Lucene, Search, Sitecore

IndexSearchContext.Search converts TermQuery to PrefixQuery

As part of upgrading our sites to Sitecore 6.6, I had to update our search/indexing code because 6.6 uses a newer version of Lucene.NET. I took this as an opportunity to switch over to using Sitecore.Search instead of native Lucene.NET. It should be easier, right? Well, sort of. They did wrap a lot of that pesky, ugly-looking query building logic up safe behind the walls of Sitecore.Search, but unfortunately I needed functionality that was not exposed (at least not that I found).

Specifically, I needed to create a two clause query like this:

+itempath:/sitecore/content/home* +previousurl:/soe/academics/graduate/gec

In Lucene this means, find a match where the “itempath” field starts with “/sitecore/content/home” AND where the “previousurl” field matches “/soe/academics/graduate/gec”.

The IndexSearchContext class has helper methods to create these query objects (CreateBooleanQuery, CreatePrefixQuery, CreateTermQuery), but I just created them directly. From that point you need to call the Search method on the IndexSearchContext object:

using (IndexSearchContext context = this.Index.CreateSearchContext())
{
     var hits = context.Search((Query)MyBooleanQuery, maxResults);
     return hits.FetchResults(0, Math.Min(hits.Length, maxResults));
}

Here’s where it gets interesting (or frustrating depending on your point of view). Internally, when you call this method Sitecore rewrites your query using the “FullTextRewriteStrategy”.

Okay.

Unfortunately, one side effect of this is that my TermQuery gets rewritten as a PrefixQuery.

Not okay.

private class FullTextRewriteStrategy : IndexSearchContext.RewriteStrategy
{
    // Methods
    public override Query Rewrite(Query query)
    {
        Assert.ArgumentNotNull(query, "query");
        TermQuery query2 = query as TermQuery;
        if (query2 != null)
        {
            Term term = query2.GetTerm();
            term = new Term(term.Field(), term.Text().ToLowerInvariant());
            if (term.Field() != BuiltinFields.Content)
            {
                return IndexSearchContext.StaticQueryFactory.CreatePrefixQuery(term.Field(), term.Text().ToLowerInvariant(), query2.GetBoost());
            }
            return IndexSearchContext.StaticQueryFactory.CreateBooleanQuery(true, query2.GetBoost(), new object[] { new PrefixQuery(term), IndexSearchContext.StaticQueryFactory.CreatePrefixQuery(BuiltinFields.Name, term.Text().ToLowerInvariant(), 3f) });
        }
        PrefixQuery query3 = query as PrefixQuery;
        if (query3 == null)
        {
            return query;
        }
        Term prefix = query3.GetPrefix();
        if (prefix.Field() == BuiltinFields.Content)
        {
            return IndexSearchContext.StaticQueryFactory.CreateBooleanQuery(true, query3.GetBoost(), new object[] { query3, IndexSearchContext.StaticQueryFactory.CreatePrefixQuery(BuiltinFields.Name, prefix.Text().ToLowerInvariant(), 3f) });
        }
        return IndexSearchContext.StaticQueryFactory.CreatePrefixQuery(prefix.Field(), prefix.Text().ToLowerInvariant(), query3.GetBoost());
    }
}

I’m not sure if this was intended behavior or not but in order to get around it, I just wrap my query with a “PreparedQuery”. Like this:

using (IndexSearchContext context = this.Index.CreateSearchContext())
{
     var hits = context.Search(new PreparedQuery(MyBooleanQuery), maxResults);
     return hits.FetchResults(0, Math.Min(hits.Length, maxResults));
}

Internally, this still rewrites your query but uses the LowerCaseRewriteStrategy instead.. which keeps your TermQuery a TermQuery.
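Why does the rewrite matter? A tiny sketch of the two matching semantics, without Lucene: MatchDemo is a hypothetical helper, and the second URL in the example below is made up purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchDemo
{
    // TermQuery-style match: the indexed term must equal the query text.
    public static List<string> TermMatch(IEnumerable<string> terms, string text)
    {
        return terms.Where(t => string.Equals(t, text, StringComparison.Ordinal)).ToList();
    }

    // PrefixQuery-style match: the indexed term only has to start with the text.
    public static List<string> PrefixMatch(IEnumerable<string> terms, string text)
    {
        return terms.Where(t => t.StartsWith(text, StringComparison.Ordinal)).ToList();
    }
}
```

With indexed values “/soe/academics/graduate/gec” and “/soe/academics/graduate/gec/apply”, a term match on “/soe/academics/graduate/gec” returns only the first, while a prefix match returns both. For a previous-URL lookup that means the wrong item can come back, which is why I needed my TermQuery to stay a TermQuery.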

Happy Coding!

UPDATE: Well, well.. I missed this: http://stackoverflow.com/questions/15323527/how-to-search-for-an-exact-word-phrase-using-sitecore-advanced-database-crawler

Posted in Lucene, Sitecore

showModalDialog returnValue is undefined in Google Chrome.. in Sitecore

[Image: DatasetTemplateField]

I have a custom field type which has its own custom editor dialog window (much like the Rich Text editor window). When users want to edit the field, they click the “Show Editor” button above the field. The modal dialog editor window opens:

[Image: DatasetTemplateEditorModal]

They make their changes and then close the window, which saves the new field value back to the Content Editor field. No problem!

Enter… Chrome!

This works great in IE, however there were reports of problems in Google Chrome. The issue was that the new value from the modal dialog was not updating the field value in Content Editor after the user clicked “OK”. Weird…

Well, after a lot of digging, I found the reason.. actually a few reasons. But before I get into that.. a quick look at how the dialog saves the value back to Content Editor.

In the “OK” button postback, I use ClientScriptManager.RegisterStartupScript to send back some JavaScript to run. That JavaScript is basically:

window.returnValue = "the new value of my field";

I had created this modal dialog a few years ago, and at the time I guess I wasn’t overly concerned if it would work in any other browsers besides IE.

Using Chrome’s “Developer Tools” (F12), I was able to see that the value (in window.returnValue) returned to the Content Editor page was “undefined”. It seems this is in fact a well-known issue with Chrome. Luckily, there were plenty of workarounds, such as this one by Brian Pedersen:

  if (window.opener) {
    window.opener.returnValue = "your return value";
  }
  window.returnValue = "your return value";
  self.close();

Unfortunately, this didn’t work for me in Sitecore. When running Content Editor in Chrome, it uses /sitecore/shell/controls/Gecko.js to handle showing modal dialogs… (as opposed to /sitecore/shell/controls/InternetExplorer.js when using IE).

From Gecko.js:

scBrowser.prototype.showModalDialog = function(url, arguments, features, request) {
  window.top.returnValue = null;

  showModalDialog(url, arguments, features);

  var result = window.top.returnValue;

  if (request != null) {
    // When we close modal dialog, many commands expect 'undefined' (not 'null') result (since IE returns 'undefined' )
    request.dialogResult = result == null ? undefined : result;
  }

  return result;
}

As you can see, it’s looking for window.top.returnValue instead of window.returnValue. So, I changed Brian’s code just slightly and voilà! It worked:

  if (window.opener) {
    window.opener.top.returnValue = "your return value";
  }
  window.returnValue = "your return value";
  self.close();

Happy Coding!

Posted in Field Editor, Field Types, Sitecore

Execute a Sitecore Scheduled Task from the ribbon

A few weeks ago I deployed the “Sitecore Shell Wax” shared source module to the Marketplace. This week, I’ve added a little more wax to that shell!

Currently, if you want to run a Scheduled Task immediately (not waiting for the scheduled execution time) you need to either write some code to kick it off or install one of the shared source modules that allows it such as Scheduled Task Utils.

That’s all fine, but I was thinking it’d be nice to have something more intuitive and immediately available.

Like this:

[Image: Schedule Execute Now]

This functionality has been added to version 1.1 of the “Sitecore Shell Wax” module.

Happy coding!

Posted in Module, Shared Source, Sitecore, Tasks

Introducing the ‘Sitecore Version Pruner’ shared source module!

We’ve been using Sitecore workflow for quite a while now. One of the great things about workflow is that each time a user wants to make a change to a published content item, it automatically creates a new item version for you. The only problem is that eventually, you start really accumulating A LOT of versions. We have some items with 100 or more versions! As discussed on SDN, in order to prevent performance issues it is advisable to delete old/unused versions.

There are several options available, as outlined by John West, to remove old versions. They each have their pros/cons. The ‘VersionManager’ module has the advantage that it backs up the removed versions before deleting them by serializing the entire item to an XML file. This actually writes ALL versions to the XML file. In order to keep track of which versions were deleted, the file name encodes the versions that were deleted.

We were all set to use VersionManager but ran into a snag. When a user creates a new version, VersionManager attempts to delete the old versions in excess of the currently set maximum. The problem was that our users do not have ‘delete’ rights.

This led me to start investigating a different approach. After some lengthy comment exchanging with John West on his post about ‘Remove Old Versions…’, I settled on creating a scheduled task that would include the serialization from VersionManager but would run in the background.

And so I created the ‘Sitecore Version Pruner’ module.


As detailed on the Sitecore Marketplace page, the module..

“is a scheduled task that removes old/unused item versions from your content tree. Removed version(s) may be preserved as serialized XML and/or (BETA) copied to the archive database.”

It basically meets the same requirements as VersionManager except:

  1. It runs as a background task. No need for users to have ‘delete’ rights
  2. It uses the Sitecore Rules Engine to define the “pruning” rules (credit to John West for the idea)
  3. In addition to serialization, it also provides an option to archive the deleted versions to the archive database. Read more about this feature on the about tab here.

I have to give a lot of the credit to John West and Ivan Buzyka (VersionManager module author). I basically took their solutions and combined them.

Happy coding!

Posted in Shared Source, Sitecore, Tasks