Search not returning all results when the user is in some locales

Hi,

version 7.2

Most of our content, including all our images, was loaded in an English locale.

When users are logged into another locale they can’t always find the images they’re searching for.

It doesn’t happen for every single image but for quite a few – e.g. searching “tower” in German returns no results but is fine in English.

At the moment my users get by with a combination of switching locale and browsing through the folders to find images.

Does anyone know why this is happening? Are there separate Lucene indices for separate locales? (This just comes to mind because that’s how our web site works.) Or is Lucene configured to apply different stemming rules for different languages? (I can see how -er might trigger stemming rules in some languages but not others. I’ll ask my users whether they have any examples with proper names.)

thanks,
Steve

Hi Steve,

Rhythmyx does use the Lucene Snowball Analyzer (this is a stemming analyzer) for the following locales:

“Danish”, “Dutch”, “English”, “Finnish”, “French”, “German”, “Italian”, “Kp”, “Lovins”, “Norwegian”, “Porter”, “Portuguese”, “Russian”, “Spanish”, “Swedish”

This will load stop words lists for the specific locale from under rx_resources/search by default.

For “Chinese”, “Japanese”, or “Korean” the CJKAnalyzer is used.

If the locale can’t be determined at all for some reason, the system will fall back to the generic WhitespaceAnalyzer.
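To make the selection rules above concrete, here is a minimal sketch in plain Java. It is not Rhythmyx code: the class and method names are made up for illustration, it only returns the analyzer's name as a string, and it assumes that any language not in the two lists is treated the same as an undetermined locale.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of the locale-to-analyzer rules described above (hypothetical names). */
final class AnalyzerChooser {
    // Language names handled by the Snowball analyzer, as listed in this post.
    private static final Set<String> SNOWBALL = new HashSet<>(Arrays.asList(
        "Danish", "Dutch", "English", "Finnish", "French", "German", "Italian", "Kp",
        "Lovins", "Norwegian", "Porter", "Portuguese", "Russian", "Spanish", "Swedish"));

    // Languages handled by the CJK analyzer.
    private static final Set<String> CJK = new HashSet<>(Arrays.asList(
        "Chinese", "Japanese", "Korean"));

    /** Returns the analyzer name for a language; null means "could not be determined". */
    static String choose(String language) {
        if (language != null && SNOWBALL.contains(language)) {
            return "SnowballAnalyzer";
        }
        if (language != null && CJK.contains(language)) {
            return "CJKAnalyzer";
        }
        // Assumption: unlisted languages fall back the same way as unknown ones.
        return "WhitespaceAnalyzer";
    }

    public static void main(String[] args) {
        for (String lang : new String[] {"English", "German", "Japanese", null}) {
            System.out.println(lang + " -> " + choose(lang));
        }
    }
}
```

The point to notice is that English and German both land on the Snowball analyzer but with different language names, so they apply different stemming rules to the same word.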

This functionality can be overridden on the Search tab of the System Administration tool, where you can configure your own locale / analyzers. The online help for this section is actually pretty good.

-n

Thanks,

I’ve done an experiment:

We have a content item with the system title “Saatchi Gallery Kadar Attia NS”, this is in the English locale.

Searching for the word “Saatchi” in English, German or Spanish returns this item. Searching in French or Italian does not.
Searching for the word “Kadar” in English, French or German returns this item. Searching in Spanish or Italian does not.
Searching for the word “Attia” in English or German returns this item. Searching in French, Spanish or Italian does not.

None of these are stop words, and since each search term is an exact match for a word in the title, stemming shouldn’t come into it.

So great if you’re adding this image to an English or German page. Tough luck if it’s an Italian page. And better guess right which keyword to use if it’s a French or Spanish page.

I’ve looked at the Search tab of the System Administration tool. Can it only be used to specify new analyzers, rather than to switch between the built-in ones? When I select a locale, the analyzer dropdown is empty.

Hi Steve,

You should be able to use any Analyzer that is on the class path. It looks like with 7.2 this would be any of the Analyzers included in these jars:

[ul]
[li]lucene-analyzers-2.2.0.jar[/li]
[li]lucene-snowball-2.2.0.jar[/li]
[li]lucene-wordnet-2.2.0.jar[/li]
[li]lucene-core-2.2.0.jar[/li]
[/ul]

Also I did a quick scan of the docs.

There is this paragraph in the Internationalizing / Localizing doc. I wonder if an advanced search across multiple locales could be another workaround for your users? Basically, guide them to select all locales when searching.

By default, a search query is performed on the user’s log-on Locale, and the text analyzer associated with
the Locale also analyzes the term entered in Search for. In an advanced search, the user can select
multiple Locales to search on. If multiple Locales are chosen, separate searches are performed for each
Locale using the appropriate analyzer, and the results are combined.
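As a rough illustration of why selecting several locales might help, here is a toy model in plain Java. Everything in it is hypothetical: it does not use Lucene or Rhythmyx, the locale codes, class name and the "strip a trailing -er" rule are stand-ins for whatever the real analyzers do, and it does not model whether the real search also restricts results to items stored in the selected locale. It only shows the general mechanism: if a word is indexed with one analyzer and the query is run through another, the two terms may not match, while searching with each locale's analyzer and combining the results finds the item.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Toy model of per-locale analysis and a combined multi-locale search. */
final class MultiLocaleSearchSketch {
    // Hypothetical "analyzers": lower-case the term, then apply a made-up rule.
    static final Map<String, UnaryOperator<String>> ANALYZERS = new HashMap<>();
    static {
        ANALYZERS.put("en", t -> t.toLowerCase());
        ANALYZERS.put("de", t -> stripSuffix(t.toLowerCase(), "er"));
    }

    /** Removes the suffix when at least three characters would remain. */
    static String stripSuffix(String term, String suffix) {
        boolean longEnough = term.length() > suffix.length() + 2;
        return term.endsWith(suffix) && longEnough
            ? term.substring(0, term.length() - suffix.length())
            : term;
    }

    // Analyzed term -> titles of the items containing it.
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Indexes every word of the title using the item's own locale analyzer. */
    void add(String itemLocale, String title) {
        for (String word : title.split("\\s+")) {
            String term = ANALYZERS.get(itemLocale).apply(word);
            index.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(title);
        }
    }

    /** Runs one lookup per selected locale and combines the results. */
    Set<String> search(String query, List<String> locales) {
        Set<String> hits = new LinkedHashSet<>();
        for (String locale : locales) {
            String term = ANALYZERS.get(locale).apply(query);
            hits.addAll(index.getOrDefault(term, Collections.<String>emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        MultiLocaleSearchSketch s = new MultiLocaleSearchSketch();
        s.add("en", "Clock Tower");
        System.out.println(s.search("tower", Arrays.asList("de")));
        System.out.println(s.search("tower", Arrays.asList("de", "en")));
    }
}
```

In this toy, the item is indexed under "tower", the single-locale "de" query looks up "tow" and finds nothing, and the combined search succeeds because the "en" pass looks up "tower".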

The other thing I can think of is re-indexing that item from the Admin Console:

search index item id

-n