Publish aborting for unknown reason

I am seeing examples like this in my logs:

2013-03-20 17:37:51,214 WARN [PSPublishingJob] Edition ‘Decision_Support_Cert’ (394) has been aborted by publish queue timed out. The job is still waiting for 196 unprocessed items.

What kinds of issues would cause a publish to fail in this way? Database slowness, maybe? This is on a Windows Server 2003 machine running 7.0.3. The database backend is MSSQL on a separate server. Any tips for troubleshooting? The failures happen one to five times a day, clustered close together, usually in the early morning (this example is from a rare afternoon failure), but they don’t align with any maintenance schedule that I am aware of.

Thanks,

-Jason

There was a rare scenario in 6.7 where publishing could deadlock, so I believe a publishing timeout was introduced in a patch to kill the job if it timed out. This shouldn’t be happening during normal operation though.

Vaguely, this is how it works internally:

  • Content Lists are run to select content.
  • Items are added to a message queue.
  • Worker threads are started to pop items off the queue and assemble content. This is where the timeout check is.
  • Content is assembled into a temp file.
  • When all items are popped off the queue, content is delivered using the Delivery Handler specified on the job’s Delivery Type.

1.) Check patch level and make sure it is current. There were a couple of bugs related to this error that were patched. 7.2 with the latest patch has many publishing fixes.

2.) Check Rhythmyx\AppServer\server\rx\deploy\rxapp.ear\rxapp.war\WEB-INF\config\spring\server-beans.xml

The timeout values can be set with the properties shown here. The queue timeout defaults to 10 minutes. Changing it requires a restart.

<bean class="com.percussion.services.utils.general.PSServiceConfigurationBean" id="sys_beanConfiguration">
		<!-- Properties that can be set here:
			(defaults to 0. 
				It is the max number of content nodes cached by the assembly service)  
			<property name="maxCachedContentNodeSize" value="0"/> 
			(defaults to 600 minutes or 10 hours.
				It is the timeout that a publishing job to wait for the job specific status updates from the processed publishing work)  
			<property name="publishJobTimeout" value="600"/>
			(defaults to 10 minutes
				It is the timeout that a publishing job to wait for the notification of any publishing job status updates (from the queue). This is acting like a heartbeat of the publishing queue.  
			<property name="publishQueueTimeout" value="10"/>
		-->
      <property name="quartzProperties">
         <props>
				<!-- The properties specified here will override the properties defined
					 in the 'quartzProperties' property of the 'sys_quartzScheduler' bean
					 found in beans.xml.
				-->
            <prop key="org.quartz.jobStore.isClustered">false</prop>
            <prop key="org.quartz.scheduler.instanceName">SystemMaster</prop>
         </props>
      </property>
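
Note that in the excerpt above the timeout properties sit inside an XML comment, so they have no effect until one is copied out of the comment and made a child of the bean. A minimal sketch of what that could look like, assuming you want to raise the queue timeout; the value 30 is an arbitrary illustration, not a recommendation, and everything else in the bean should stay as shipped:

```xml
<bean class="com.percussion.services.utils.general.PSServiceConfigurationBean" id="sys_beanConfiguration">
   <!-- Illustrative value only: wait up to 30 minutes for queue status updates (default is 10) -->
   <property name="publishQueueTimeout" value="30"/>
   <property name="quartzProperties">
      <props>
         <prop key="org.quartz.jobStore.isClustered">false</prop>
         <prop key="org.quartz.scheduler.instanceName">SystemMaster</prop>
      </props>
   </property>
   <!-- Any other properties already present in this bean stay unchanged -->
</bean>
```

Raising the timeout only buys the job more time. If the queue is genuinely stalled, the job will still abort, just later, so treat this as a diagnostic step rather than a fix.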

3.) Make sure nothing like a virus scanner or Windows Search is scanning the Rhythmyx install. For this particular problem, note that the message queue used by publishing is stored in the Rhythmyx\AppServer\server\rx\data directory. This is a temporary database that the product uses for queues.

4.) Try clearing the temp dirs. Shut down the server, check the Java processes to make sure it is really down, then clear the contents of these directories (delete only the contents, not the directories themselves):


Rhythmyx\AppServer\server\rx\work
Rhythmyx\AppServer\server\rx\tmp
Rhythmyx\AppServer\server\rx\data

Restart the server and see if that resolves the problem.

5.) Check the publishing thread counts in Rhythmyx\AppServer\server\rx\deploy\rxapp.ear\rxapp.war\WEB-INF\config\user\spring\publisher-beans.xml
Try throttling this back to the default, with concurrentConsumers at 2. This will require a restart.


<!-- Publishing handler setup for the assembly and status queues -->
	<bean id="sys_publishMessageHandlerContainerQ"
		  class="org.springframework.jms.listener.DefaultMessageListenerContainer">
		  <property name="concurrentConsumers" value="2"/>
		  <property name="connectionFactory" ref="sys_jmsConnectionFactory" />
		  <property name="destination" ref="sys_publishQueueDestination" />
		  <property name="messageListener" ref="sys_publishQueueListener" />
	</bean>	

6.) Turn on Debug Logging in Rhythmyx\AppServer\server\rx\conf\log4j.xml

Debug logging is VERY chatty, so you only want to set this while trying to catch the problem.

Change:

<logger name="com.percussion">
      <level value="INFO" />
      <appender-ref ref="RXFILE"/>
   </logger>

to

<logger name="com.percussion">
      <level value="DEBUG" />
      <appender-ref ref="RXFILE"/>
   </logger>

A restart is not required after changing this file. If you are tailing the log, you will see the system detect the change.
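
If DEBUG on all of com.percussion is too noisy, one option is to leave that logger at INFO and add a narrower logger for the publishing code only. A rough sketch, assuming the publishing classes (such as the PSPublishingJob seen in the warning) live under com.percussion.rx.publisher. That package name is an assumption on my part, so check it against your install before relying on it:

```xml
<!-- Assumed package name; verify it against your install before using -->
<logger name="com.percussion.rx.publisher">
   <level value="DEBUG" />
   <!-- No appender-ref here: output is inherited from the parent
        com.percussion logger, so it still ends up in RXFILE -->
</logger>
```

Adding an appender-ref to the narrower logger as well would normally write each message twice, which is why it is left out here.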

If all of that fails, I would log a ticket with support and include your debug-level logs.

-n