Publishing run content lists XML file

chrisclancy · November 3, 2011, 9:07am

Hi,

Is there a straightforward way of producing an XML file after a site publishing run which will include all of the content on the site that has just been published? We’d also like to exclude certain content types from this list but we could take the full XML file and remove other content afterwards if that is easier. An alternative would be to produce the XML directly from the RX database but that seems like it may be complicated.
The point of this is to give our search provider an XML file showing which pages have been updated, so that it doesn’t need to trawl through the whole site daily.

Thanks

Chris

darrell.wells · November 3, 2011, 2:43pm

Assuming you’re on Rhythmyx 6.7 or CM System 7.x (which I haven’t seen the guts of yet, but I assume the needed infrastructure hasn’t changed), you could set up a post-edition task to run your script to produce the XML doc. Or, if you only want to run it once a day, once a week, or on some other schedule, set it up as a Quartz scheduled job.

Regarding the actual creation of your XML document, that sounds like a perfect opportunity to leverage an XML application (created via Workbench). You’d need to define the DTD (probably given to you by the search provider). A DTD is always the starting point for an XML application. Then, you set up the XML application to use a SQL query to populate your XML. The database tables you’d need to join are PSX_PUBLICATION_STATUS (because you don’t want results from editions that aren’t done publishing/unpublishing), PSX_PUBLICATION_DOC, and PSX_PUBLICATION_SITE_ITEM. The joins on those tables should be relatively easy to decipher. If you need assistance with that, just ping back.

Straightforward? Yes. Trivial? No. Are the tools required to create such a report already on the server? Yes, but you need to put it all together.

Alternatively, you could write a JSP to build your XML doc by using a SQL query and iterating through the results. I built something like this at Virginia Tech to provide a Google site map document. I don’t feel comfortable posting it without first checking with them regarding intellectual property rights. Now, you’re asking… why didn’t I follow my own advice about building an XML app? As I recall it (from 1.5 years ago), we were building the site map functionality as a Content Explorer UI Action Menu. And while the UI Action Menus have the Visibility settings and various usage contexts, we wanted to ensure that one community could not hit the map application URL directly and query the site map of another community’s site. Therefore, we needed to implement authorization controls. That seemed easier to do in the JSP (since we’d already implemented it elsewhere). If you don’t need that kind of an ACL, then the XML app would probably be a cleaner implementation path.
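To give a rough idea of the "iterate through the results and build the XML" part of the JSP route, here is a minimal sketch in plain Java. It is not the Virginia Tech code. The `PublishedItem` class, its fields, and the `published`/`item` element names are all made up for illustration; in practice the rows would come from your query against the publication tables and the element names from whatever DTD the search provider hands you. It also shows one way to drop unwanted content types while writing.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PublishedContentXml {

    /** One row of a hypothetical query result: a page that was published. */
    public static final class PublishedItem {
        final int contentId;
        final String contentType;
        final String location;

        public PublishedItem(int contentId, String contentType, String location) {
            this.contentId = contentId;
            this.contentType = contentType;
            this.location = location;
        }
    }

    /**
     * Writes one <item> element per row, skipping rows whose content type
     * is in excludedTypes. Element and attribute names are assumptions.
     */
    public static String toXml(List<PublishedItem> items, Set<String> excludedTypes)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("published");
        for (PublishedItem item : items) {
            if (excludedTypes.contains(item.contentType)) {
                continue; // content type the search provider should not see
            }
            xml.writeStartElement("item");
            xml.writeAttribute("contentid", Integer.toString(item.contentId));
            xml.writeAttribute("type", item.contentType);
            xml.writeCharacters(item.location); // the writer escapes &, <, >
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample rows standing in for the real query results.
        List<PublishedItem> rows = List.of(
                new PublishedItem(101, "page", "/news/index.html"),
                new PublishedItem(302, "image", "/img/logo.png"));
        System.out.println(toXml(rows, Set.of("image")));
    }
}
```

In a JSP you would fill the list from a JDBC `ResultSet` instead of hard-coding it, and write to the response rather than a `StringWriter`. Filtering in the loop is just the simplest option; excluding content types in the SQL `WHERE` clause would work as well.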

chrisclancy · November 4, 2011, 11:07am

Thanks Darrell, will take a look at this shortly.