Search-by-contentid failing

Hi there,

On our production machine, searches are failing, but only when you search by contentid. So if you search by content type, for example, you get results. If you do a search by contentid, you get an error that says that the search engine cannot connect to the server. The stack trace of the error looks like a TCP connection problem. But that’s confusing, since it only happens when you search by contentid.

Do you know what causes this behavior?

Also, is there a way to fix it without restarting Rhythmyx? We are restarting Rhythmyx now, but we would like to know if there is another way, so that if it happens again, we can fix it without interrupting people’s work.

Forgot to mention: we did try restarting just the search engine. But that didn’t fix the problem.

Thanks.

2009-02-02 09:46:55,748 ERROR [PSSearchHandler] void com.percussion.server.webservices.PSSearchHandler.searchAction(com.percussion.server.PSRequest,org.w3c.dom.Document) throws com.percussion.error.PSException
com.percussion.search.convera.PSConveraSearchException: The search engine ‘Convera’ failed to connect to the server. The specific problem is: server failed.
at com.percussion.search.convera.PSQueryContext.openQuery(Native Method)
at com.percussion.search.convera.PSQueryContext.o00000(Unknown Source)
at com.percussion.search.convera.PSSearchQueryImpl.super(Unknown Source)
at com.percussion.search.convera.PSSearchQueryImpl.performSearch(Unknown Source)
at com.percussion.server.webservices.PSSearchHandler.super(Unknown Source)
at com.percussion.server.webservices.PSSearchHandler.search(Unknown Source)
at com.percussion.server.webservices.PSSearchHandler.searchAction(Unknown Source)
at sun.reflect.GeneratedMethodAccessor942.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.percussion.server.webservices.PSWebServicesBaseHandler.processAction(Unknown Source)
at com.percussion.server.webservices.PSWebServicesRequestHandler.processRequest(Unknown Source)
at com.percussion.servlets.PSAppServlet.service(Unknown Source)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at com.percussion.webdav.PSWebDavRequestFilter.doFilter(Unknown Source)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at com.percussion.servlets.PSSecurityFilter.doFilter(Unknown Source)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:81)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at org.jboss.web.tomcat.security.CustomPrincipalValve.invoke(CustomPrincipalValve.java:39)
at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:159)
at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:59)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:744)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
at org.apache.tomcat.util.net.MasterSlaveWorkerThread.run(MasterSlaveWorkerThread.java:112)
at java.lang.Thread.run(Unknown Source)

Turns out that restarting didn’t work. I turned on ‘debug search’ via the console and ran a failed search. Here is the output:

2009-02-02 11:00:08,232 INFO [PSSearchEngineImpl] ‘status’ completed with return code = 0
2009-02-02 11:00:08,232 INFO [PSSearchEngineImpl] ‘status’ console output:
exec v7.0.3 at 127.0.0.1:9993 (localhost.localdomain)
Program | Interface | State | Status
-------------------+-----------+--------------+---------------------------------
cqdh | cqdh | not running |
cqns | cqnameser+| active |
cqxref | cqxref | not running |
nfserv | netfile | not running |
cqindex_ce101 | cqindex | not running |
cqindex_ce101;229 | cqindex | not running |
cqindex_ce101;435 | cqindex | not running |
cqindex_ce303 | cqindex | not running |
cqindex_ce313 | cqindex | not running |
cqindex_ce314 | cqindex | not running |
cqindex_ce315 | cqindex | not running |
cqindex_ce320 | cqindex | not running |
cqindex_ce322 | cqindex | not running |
cqindex_ce322;1 | cqindex | not running |
cqindex_ce322;2 | cqindex | not running |
cqindex_ce342 | cqindex | not running |
cqindex_ce378 | cqindex | not running |
cqindex_ce396 | cqindex | not running |
cqindex_ce396;2 | cqindex | not running |
cqindex_ce396;3 | cqindex | not running |
cqindex_ce396;4 | cqindex | not running |
cqindex_ce401 | cqindex | not running |
cqindex_ce403 | cqindex | not running |
cqindex_ce403;1 | cqindex | not running |
cqindex_ce405 | cqindex | not running |
cqindex_ce407 | cqindex | not running |
cqsched_1 | cqsched | active |
cqquery_1 | cqquery | active |
cqserv_1 | cqserv | active |
cqsched_2 | cqsched | active |
cqquery_2 | cqquery | active |
cqserv_2 | cqserv | active |
cqsched_3 | cqsched | active |
cqquery_3 | cqquery | exited | time1: Mon Feb 2 10:49:15 2009
| | | process terminated - signal 6
| | | time2: Mon Feb 2 10:49:05 2009
| | | process terminated - signal 6
cqserv_3 | cqserv | active |
2009-02-02 11:00:08,542 ERROR [PSSearchHandler] void com.percussion.server.webservices.PSSearchHandler.searchAction(com.percussion.server.PSRequest,org.w3c.dom.Document) throws com.percussion.error.PSException
com.percussion.search.convera.PSConveraSearchException: The search engine ‘Convera’ failed to connect to the server. The specific problem is: server failed.
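For anyone who wants to spot the dead process in a long status listing without reading every row, here is a minimal Python sketch. It assumes the console output keeps the pipe-separated layout shown above; `parse_status` and `exited_programs` are hypothetical helper names, not part of Rhythmyx or Convera. It only flags rows whose state is `exited`, since whether `not running` is normal for the cqindex rows is not something this thread establishes. (On Linux, signal 6 is SIGABRT, so cqquery_3 aborted rather than being shut down cleanly.)

```python
def parse_status(text):
    """Return (program, interface, state) tuples from pasted status output."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip log lines, the banner, and the separator (fewer than 3 cells),
        # continuation rows (empty first cell), and the header row.
        if len(cells) < 3 or not cells[0] or cells[0] == "Program":
            continue
        rows.append((cells[0], cells[1], cells[2]))
    return rows


def exited_programs(text):
    """Names of programs whose State column reads 'exited'."""
    return [prog for prog, _iface, state in parse_status(text) if state == "exited"]


# A few rows copied from the status output above.
SAMPLE = """\
exec v7.0.3 at 127.0.0.1:9993 (localhost.localdomain)
Program | Interface | State | Status
cqns | cqnameser+| active |
cqsched_3 | cqsched | active |
cqquery_3 | cqquery | exited | time1: Mon Feb 2 10:49:15 2009
| | | process terminated - signal 6
cqserv_3 | cqserv | active |
"""

if __name__ == "__main__":
    print(exited_programs(SAMPLE))
```

Run against the full listing, this should point straight at cqquery_3, which is the only row in an `exited` state.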

We had a similar situation. We could observe that the search index was corrupt by looking at the *.err logs in Rhythmyx/sys_search/rware/rx/logs.

Turned out that there was an orphaned process trying to write to the same .dat file as the currently running process.

Ultimately, we ended up bouncing the physical hardware, although shutting down Rhythmyx and then killing any remaining processes probably would have been sufficient.

Not saying that’ll fix your problem, but it may be a start.

We’ve had similar problems. I do not have a solution, but the following points may be useful:
1. The reason it fails on content ID searches, but not when searching on other fields, is probably that a content ID search covers every content type and item on the system. In my experience, if one item or type fails for some reason, the entire search fails.
2. You can restart the search engine without restarting the server. Go to http://server:port/Rhythmyx/admin/console.jsp, log in, then enter restart search. Other commands that may be useful include:
   - show status search
   - search index item contentid
   - search index type contenttypeid
   - search index recreate contenttypeid

Thanks for the tips.

I took a look at the logs, and indeed there is a data file, cqquery_3.dat, that it can’t open. And yesterday, we rebuilt the search indexes, and it looks like that particular data file never changed, while the other ones did:

Feb 2 17:10 cqquery_1.dat
Feb 3 09:13 cqquery_2.dat
Jan 30 15:51 cqquery_3.dat
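The same comparison can be scripted, which may help if this recurs. The sketch below is only an illustration: `stale_files` is a hypothetical helper, the `cqquery_*.dat` pattern is taken from the listing above, and the 24-hour threshold is an arbitrary assumption. It reports files whose modification time trails the newest matching file by more than the threshold. The `demo` function recreates the three timestamps above in a temporary directory rather than touching a real install.

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path


def stale_files(directory, pattern="cqquery_*.dat", max_lag_seconds=24 * 3600):
    """Names of matching files that lag the newest one by more than the threshold."""
    files = sorted(Path(directory).glob(pattern))
    if not files:
        return []
    mtimes = {path: path.stat().st_mtime for path in files}
    newest = max(mtimes.values())
    return [path.name for path in files if newest - mtimes[path] > max_lag_seconds]


def demo():
    """Rebuild the listing above in a temp directory and check it."""
    stamps = {
        "cqquery_1.dat": datetime(2009, 2, 2, 17, 10),
        "cqquery_2.dat": datetime(2009, 2, 3, 9, 13),
        "cqquery_3.dat": datetime(2009, 1, 30, 15, 51),
    }
    with tempfile.TemporaryDirectory() as tmp:
        for name, when in stamps.items():
            path = Path(tmp) / name
            path.touch()
            ts = when.timestamp()
            os.utime(path, (ts, ts))  # set both access and modification time
        return stale_files(tmp)


if __name__ == "__main__":
    print(demo())
```

To use it for real, pass the directory that actually holds the .dat files; that path is not given here, so it is left to the reader.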

I don’t see any suspicious processes hanging around though. They are all children of the current Rhythmyx process, and they look the same as the ones on our dev server (which doesn’t have the problem).

–April

Whenever I see that error, the first thing I do is look at the .err logs as noted by Darrell. This may give a better indicator of why it failed and provide a pointer of where to look next.
As far as restarting, there is the technique noted by Andrew. However, if one of the Convera processes is ‘stuck’ and won’t shut down, this won’t work. In that case, you can manually kill execd and all cq* processes (kill the whole tree; execd starts all the other processes). Then use ‘restart search’. This can be done while the server is running.
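
Before killing anything, it can help to list exactly which processes would be affected. The Python sketch below is a rough illustration under a few assumptions: a Unix-like host where `ps -eo pid,ppid,comm` is available, and process names that show up in the `comm` column as `execd` or starting with `cq` (they may be truncated or named differently on your system). `convera_processes` and `list_running` are hypothetical helper names. The sketch only lists candidates; it deliberately does not kill them.

```python
import subprocess


def convera_processes(ps_output):
    """Pick execd and cq* rows out of `ps -eo pid,ppid,comm` output."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header row and anything that is not "pid ppid name".
        if len(parts) != 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        pid, ppid, name = int(parts[0]), int(parts[1]), parts[2].strip()
        if name == "execd" or name.startswith("cq"):
            matches.append((pid, ppid, name))
    return matches


def list_running():
    """Run ps and return the matching (pid, ppid, name) tuples."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True, text=True, check=True,
    )
    return convera_processes(result.stdout)


# Made-up sample output; the PIDs are illustrative only.
SAMPLE_PS = """\
  PID  PPID COMMAND
    1     0 init
 4100     1 java
 4210  4100 execd
 4215  4210 cqns
 4220  4210 cqquery_1
 4300     1 sshd
"""

if __name__ == "__main__":
    for pid, ppid, name in convera_processes(SAMPLE_PS):
        print(pid, ppid, name)
```

The parent PID column is included so you can check whether a match really hangs off the current execd or is an orphan left over from an earlier run.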