Incremental publishing picking up too many items

I have another incremental publishing puzzler - hoping that someone on this forum will have some insight or ideas for me.

We just moved our production Rhythmyx 6.5.2 server to a new virtual machine (we had been running on RHEL 4, and have moved to CentOS 5). Ever since the move, one of our incremental content lists has been picking up too many items - instead of picking up 1 or 2 modified items, it’s picking up 1665 items, many of which do not even have recent modification dates. We did not change the server name, or anything about any content list/edition, or publishing configuration. We are using the same repository and file system.

I did notice that when we brought the server down for the move, it was in the middle of publishing this content list. I’m wondering if we left a file or some log entry in an inconsistent state, resulting in these items not having the right status when the incremental publishing query picks them up.

I did try deleting all the rows in the RXSITEITEMS table for this content list (i.e. wiping out the publishing history), and then re-running. The interesting thing there was that when I first ran the content list, it picked up 2318 items and published them, but on successive runs of that content list, it consistently picks up the same 1665 items. In other words, there are some items that the content list is handling correctly (i.e. they haven’t changed, and aren’t picked up), and some items that it is NOT handling correctly (i.e. they haven’t changed but they ARE picked up).
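One way to isolate the misbehaving items is to compare the content IDs picked up by two consecutive runs. The following is a minimal sketch, assuming the IDs from each run have been exported to plain lists by some means; the function name and the sample IDs are hypothetical and nothing here is a Rhythmyx API.

```python
def split_runs(first_run_ids, second_run_ids):
    """Partition content IDs by whether they were picked up again on the next run."""
    first, second = set(first_run_ids), set(second_run_ids)
    return {
        "repeated": sorted(first & second),  # picked up on both runs (suspect)
        "settled": sorted(first - second),   # published once, then left alone
        "new": sorted(second - first),       # only on the later run
    }

# Hypothetical content IDs, for illustration only.
result = split_runs([101, 102, 103, 104], [102, 104, 105])
print(result)
```

The "repeated" group is the one worth inspecting for a shared content type, template, or relationship.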

And there are no publishing errors.

Any takers for this puzzle?

Thanks for any and all ideas -

Is there any correlation between the items specified in the generated content list and their content types? Is there one content type completely excluded from the list? Is there one content type comprising all 1665 items? This could point to an issue in the content type / template configuration. Perhaps you have a template with a slot that is defined with the sys_AutoSlotContentFinder. This would cause any content item associated with that slot to publish on every incremental.

Do those 1665 content items happen to have any one piece of embedded content in common?

Does the content list preview also generate 1665 items in its list? I’d imagine it does, but worth checking.

From what I recall, content lists are generated in memory. Only the results appear in the database, so I don’t think this would be an issue of a file or log entry in an inconsistent state.

The content list preview does generate the same set of items that are included in the actual publishing. All of the items have the same content type - the JCR query limits the result based on content type. And those items don’t have any related item in common, nor does the template include any auto slots.

If you were to create a new content list, does this new content list generate the same number of items for publish? This would help you figure out if it is something wrong with the content list generator or not.

I have created a new content list (not by copying the problematic one - I created the new one from scratch). It is showing the same behavior - picking up the items even if they haven’t changed. So there is something specific to these items that is not working correctly - it’s not specific to the content list.

After our server moved to the new virtual machine, the first attempt to run this content list failed - I tracked down the server.log entry. It retrieved 0 items, and then failed with a “Session is closed!” error:

2011-03-09 14:07:41,511 INFO [] Getting content list data took <Stopwatch stopped elapsed 37,138.59 ms> for 0 items
2011-03-09 14:07:41,589 ERROR [] Content list failure
org.hibernate.SessionException: Session is closed!
at org.hibernate.impl.AbstractSessionImpl.errorIfClosed(
at org.hibernate.impl.SessionImpl.flush(
at org.springframework.orm.hibernate3.SpringSessionSynchronization.beforeCommit(
at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(
at $Proxy105.executeContentList(Unknown Source)
at Source)
at javax.servlet.http.HttpServlet.service(
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
at org.apache.catalina.core.ApplicationFilterChain.doFilter(
at org.apache.catalina.core.ApplicationDispatcher.invoke(
at org.apache.catalina.core.ApplicationDispatcher.doInclude(
at org.apache.catalina.core.ApplicationDispatcher.include(
at com.percussion.utils.servlet.PSServletUtils.callServlet(Unknown Source)
atÔO0000(Unknown Source)
at Source)
at com.percussion.publisher.server.PSPubLookupRequest.processRequest(Unknown Source)
at Source)
2011-03-09 14:07:41,603 INFO [STDOUT] com.percussion.publisher.server.PSPublisherLookupRequestException: Content is not allowed in prolog.
2011-03-09 14:07:41,603 INFO [STDOUT] at com.percussion.publisher.server.PSPubLookupRequest.processRequest(Unknown Source)
2011-03-09 14:07:41,603 INFO [STDOUT] at Source)
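To check whether failures like this one recur on later runs, the ERROR entries can be pulled out of server.log with a short script. This is a rough sketch, assuming every entry follows the "timestamp LEVEL [logger] message" layout seen in the excerpt above; the function name is made up for illustration.

```python
import re

# Matches lines such as "2011-03-09 14:07:41,589 ERROR [] Content list failure".
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<logger>[^\]]*)\] (?P<msg>.*)$"
)

def find_errors(lines):
    """Return (timestamp, message) for every ERROR-level entry."""
    errors = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") == "ERROR":
            errors.append((m.group("ts"), m.group("msg")))
    return errors

# Sample lines copied from the log excerpt in this thread.
sample = [
    "2011-03-09 14:07:41,511 INFO [] Getting content list data took "
    "<Stopwatch stopped elapsed 37,138.59 ms> for 0 items",
    "2011-03-09 14:07:41,589 ERROR [] Content list failure",
    "org.hibernate.SessionException: Session is closed!",
]
errors = find_errors(sample)
print(errors)
```

Against the real file, the same function can be fed an open file handle, for example `find_errors(open("server.log"))`, where the path depends on the installation. Stack trace lines do not match the pattern and are skipped.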

I assume that you have compared this particular incremental vs the other ones that work? (For us this means that the incremental checkbox is checked and the generator is the sys_SearchGenerator). I also assume that you have tried a full publish of the content type (with a full publish content list) and then tried an incremental?

With those questions answered, you could probably move on to testing particular content items: create a new content item that will be picked up by the incremental. Does this new item always publish in the incremental? If it doesn’t, then you know that there is a problem with the previous items (probably related to the move while this content list was running). At this point, I would be looking for cache files used by the publisher. I don’t know if such a thing exists, but if it did, I would clear them out ;)

Thanks for the suggestions. Yes, I have compared this incremental with the others that work, and also tried a full followed by an incremental.

I agree about looking for cache files used by the publisher! I’ve been looking around the file system, but haven’t found anything yet. Today I tried a Linux restart, in case there was some weird file or file locking issue, but alas, still picking up too many content items.

This issue seems to have something to do with the incremental process picking up all “auto indexed” items.
You need to check with your site developers to see whether any content types were changed to be “auto indexed”. Contact Percussion support if you have questions about “auto indexed” content types.

The incremental process always picks up items with “auto indexed” content types, regardless of their last modified time and last published time.

I assume from your comment that you tried the new content item test and it didn’t show up on a subsequent incremental after it was published?

(Hmm - thought I replied a couple of days ago, but guess I didn’t hit the Save button!)

Thanks for the suggestions - a couple of responses:

  • The content type in question is not Auto Indexed - and in any case, we didn’t change any code or configuration when we moved to the new VM.
  • New content items have been created and successfully published by this content list. They are being picked up by the content list, but once they are published, they continue to be picked up by the content list, even if they have not changed. So, the content list is growing as items are created.

Well, we found a solution to this, although it is still a mystery why this problem appeared when we moved from one VM to another.

The content type in question, Seminar, did not have any templates with auto slots. However, the Seminar items have related Seminar Instance items, and that content type (Seminar Instance) did have a template with auto slots. I noticed that only the Seminars with related Seminar Instances were being included in the repetitive re-publishing, and followed the trail from there. Once I removed the auto slots from the Seminar Instance template that had them, the Seminars went back to publishing correctly, only getting picked up when they have been modified.
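For anyone chasing the same symptom, the check I did by hand can be sketched as a toy model. It assumes you already know which content types have templates with auto slots and which items are related to which; the class, the function, and the sample data are all hypothetical and nothing here calls Rhythmyx.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    content_id: int
    content_type: str
    related: list = field(default_factory=list)  # directly related Item objects

def auto_slot_types_in_reach(item, auto_slot_types):
    """Content types with auto-slot templates among the item and its direct relations."""
    candidates = [item] + item.related
    return sorted({i.content_type for i in candidates
                   if i.content_type in auto_slot_types})

# Hypothetical data modelled on this thread.
auto_slot_types = {"Seminar Instance"}
instance = Item(2, "Seminar Instance")
with_instance = Item(1, "Seminar", related=[instance])
standalone = Item(3, "Seminar")

print(auto_slot_types_in_reach(with_instance, auto_slot_types))
print(auto_slot_types_in_reach(standalone, auto_slot_types))
```

The point of the model is that looking only at the item's own content type is not enough; the related items' templates have to be checked as well.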

I have no idea why this problem arose after our move to a new VM!