Items not found by editors' search

Much as we like Rhythmyx in other aspects, we have had repeated problems with editors being unable to find items that they have created when searching in Content Explorer. Typically, no search will find the affected items, although in one instance we could search by title but not by content ID; usually we teach content ID as the first method to use.

Obviously, this means that managed links and variants cannot be inserted. As one would expect, the same result happens whether the LH menu New Search in CE is used or whether the search is from within an editing session in EditLive!

Sometimes we have had problems with the indexing queue not being processed quickly enough; editors, of course, tend to work on one item and then on another that links to it, and hence need the first to be indexed by the time they have started to edit the second. Check the following day and the item can be found - but by then the editor has lost track.

We’d be interested in others’ experience here.

Regards

I thought we were the only people that happened to. Yes, it seems to happen often. Today I can find yesterday’s new content items through Active Assembly, but nothing shows up when I search through the Active Assembly Table Editor.

It sounds like 2 issues are going on here. Some of the behaviors you are describing sound like possible bugs:
“Could find an item by title, but not id” - this should never happen
“I can find an item via AA, but not the same item using the Table editor.” - this is not correct behavior either.

There is a delay between inserting an item and it being available for search, but that delay should only be about 5 seconds (unless you are inserting PDFs or other binary files, which can take a lot longer to index). If you are seeing times much longer than this, there may be something else going on.

Possibly multiple Rx installations running on the same network and using the same search ports? (We had problems with search until the port numbers were changed.)

That can cause odd behavior, as the search request may be processed by different servers: one has the expected item and the others don’t, which gives inconsistent results.

We don’t have multiple installations, so that’s not the issue in our case.

Paul Howard is right about multiple issues. We have had various search issues over time, and my post reflected that. Thus, being able to find items by title but not by content ID is not a current issue. Equally, in the past, indexing has seemingly failed altogether. In summary then, because of past experience we are concerned by the vulnerability of indexing, which is a key process, and with indexing not happening in a timely way, which is the case again now but not necessarily for the same reasons as previously.

Paul Howard’s indication of a five second index time is consistent with our experience, and in my view that in itself is a problem. Typical editor behaviour is to create an item, and then immediately to create/edit another that depends on the first (say an image, then a page that includes it, or a page, then another that links to it or uses one of its variants). Thus, any item must always be fully-indexed in less than the time taken to start work on another. In some cases, we are probably talking about a thirty-second maximum. Any failure to meet this timescale results in loss of editor confidence in the system, and involves editors in understanding under-the-hood processes from which Rhythmyx is supposed to protect them.

That’s fine if you’ve got one editor on the system and nothing delays things, as five seconds is plenty quick enough. However, on one of our sites we are now running imports from an external data source. That puts a couple of thousand items, each with four events, so eight thousand entries in the queue, and at about five seconds each that means an eleven-hour delay. We can run the import off-peak, but we are supporting 24/7 international editing. We can also work to make the import more efficient (changed items only instead of a full import), but even ten items are going to push us over the thirty-second target above.
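For anyone wanting to check the arithmetic, here is a rough sketch of the backlog estimate. It assumes the queue is drained one entry at a time at about five seconds per entry, and that every imported item produces four queue events; both are simplifications taken from the figures in this thread, not measured behaviour of the indexer. The function name is made up for illustration.

```python
def backlog_seconds(items, events_per_item, seconds_per_entry=5.0):
    """Estimate how long the index queue takes to drain.

    Assumes entries are processed strictly one after another, each
    taking roughly seconds_per_entry (an assumption, not a measurement).
    """
    return items * events_per_item * seconds_per_entry


full_import = backlog_seconds(2000, 4)   # 8000 queue entries
small_import = backlog_seconds(10, 4)    # 40 queue entries

print(f"full import: {full_import / 3600:.1f} hours")
print(f"ten changed items: {small_import:.0f} seconds")
```

Under these assumptions the full import comes out at about eleven hours, and even the ten-item case is well past a thirty-second target.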

We’ve investigated the current issue more since I first posted, and found a substantial number of corrupted items in the PSX_SEARCHINDEXQUEUE table, so we have cleared that and restarted, and we are currently running a full reindex. We’re also aware of the need to make sure that the RXSERVER user has appropriate privileges. However, in view of the above, it seems likely that even a fully-operating system will not meet editors’ reasonable expectations.
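If it helps others keep an eye on the queue, a minimal sketch of a depth check follows. Only the table name PSX_SEARCHINDEXQUEUE comes from this thread. The script uses an in-memory sqlite3 database purely as a stand-in so that it runs anywhere; against a real repository you would pass in a connection from whatever DB-API driver matches your database, and the single column created in the demo is hypothetical, not the real table layout.

```python
import sqlite3

QUEUE_TABLE = "PSX_SEARCHINDEXQUEUE"  # table name as given in the thread


def queue_depth(conn):
    """Return the number of rows currently waiting in the index queue."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {QUEUE_TABLE}")
    return cur.fetchone()[0]


def demo():
    # Stand-in database: the one-column layout below is hypothetical and
    # exists only so the COUNT(*) query has something to run against.
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {QUEUE_TABLE} (ENTRY_ID INTEGER)")
    conn.executemany(
        f"INSERT INTO {QUEUE_TABLE} VALUES (?)", [(i,) for i in range(3)]
    )
    return queue_depth(conn)


print(demo())
```

Logging this count every few minutes during an import would at least show whether the queue is draining or growing.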

In summary, five seconds is an enormous time to index one item!

I really hate to beat this drum, but developer installs would also count as multiple installations. So even a Rhythmyx instance installed on your local workstation, if installed with the default port numbers, would cause problems with the Convera engine. (We had two dev installs, a test and a production, and search was very buggy until the ports were changed.) I can’t imagine an Rx solution being developed and tested on a production server.

We usually recommend that development machines be installed without the search engine.

However, there are things you cannot test (e.g. complex saved searches) without the full text search engine, and so sometimes you need it.

Dave

Well, I’m not using the default port on my development machine, and I’m still having the same problem I reported below.

It is the port used by Convera that matters, not the port where Rhythmyx listens.

We’re still not convinced that this is relevant to our issues, but for completeness our live and development/staging machines are on different subnets. Are you really saying that Convera will attempt to index across those? We could understand better within the subnet.

The machines can communicate. Is Convera by default using 9993, the Rhythmyx search port?

I’d restate that, in my view, a five-second-per-item index time is inherently vulnerable to failing because it is too close to the turnaround needed by editors working on successive and inter-related items. Is there any tuning that can be done?

Convera uses port 9993 on our production server and port 10003 on our development server.

Hope this helps…
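For anyone wanting to check their own network for a clash, here is a small sketch that reports which machines accept a TCP connection on a given port. It assumes the search port is reachable over plain TCP, and a successful connection only shows that something is listening there, not that it is Convera. The function name is made up for illustration.

```python
import socket


def hosts_listening(hosts, port, timeout=1.0):
    """Return the hosts that accept a TCP connection on the given port."""
    found = []
    for host in hosts:
        try:
            # create_connection raises OSError on refusal, timeout,
            # or a host name that cannot be resolved.
            with socket.create_connection((host, port), timeout=timeout):
                found.append(host)
        except OSError:
            pass
    return found
```

Calling hosts_listening with your own production, staging and workstation host names and port 9993 would list every machine answering on that port; more than one answer suggests the kind of overlap described above.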

Just to confirm, the fact that I’m using different ports on each server should minimize search problems?

It sure did for us.

[QUOTE=drossall;2105]We’re still not convinced that this is relevant to our issues, but for completeness our live and development/staging machines are on different subnets. Are you really saying that Convera will attempt to index across those? We could understand better within the subnet.

The machines can communicate. Is Convera by default using 9993, the Rhythmyx search port?
[/QUOTE]

David,

I don’t think that subnets have any effect. If the machines can “see” each other, they will attempt to share indexing, which is known to cause problems.

Dave