Error when uploading PDF

Hi

I’ve implemented the sys_textExtraction exit on a Rhythmyx 5.71 system, and all is fine except that some PDFs upload with the following error message and no text is extracted.

The text extraction failed with result code ‘1’ for file type ‘Adobe Acrobat (PDF)’. The message was: Inso error #9: file is corrupt.

It’s a bit strange as I can open the PDF file OK. I can also upload the PDF to a 6.5.2 system and the text is extracted fine.

Any clues?

Cheers
James

What version of “PDF” is this? The text extraction utilities are rather old (as they come to us through Convera) and I’m not sure what versions they support.

Dave

Hi Dave

I’ve opened the PDF in a text editor and the first line is

%PDF-1.6

I’ve managed to upload the Rhythmyx 6.5.2 documentation but this is

%PDF-1.4
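
For anyone who would rather not open a binary file in a text editor, the same check can be scripted. This is only a rough sketch, not part of Rhythmyx: the function name `pdf_header_version` is made up for illustration, and it assumes the `%PDF-x.y` marker sits within the first 1024 bytes of the file.

```python
import re


def pdf_header_version(path):
    """Return the version from a PDF header such as '%PDF-1.6', or None.

    Hypothetical helper for illustration only. It assumes the header
    marker appears within the first 1024 bytes of the file.
    """
    # Read in binary mode: PDFs are not plain text and may contain
    # bytes that fail to decode.
    with open(path, "rb") as f:
        head = f.read(1024)

    # Look for the "%PDF-<major>.<minor>" marker in the leading bytes.
    match = re.search(rb"%PDF-(\d+\.\d+)", head)
    return match.group(1).decode("ascii") if match else None
```

Calling `pdf_header_version("some.pdf")` on the two files mentioned above should report the same values seen in the text editor. Note that the header version only says which PDF version the file claims; it does not show which features the file actually uses.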

Cheers
James

The extraction code has not changed between 5.7 and 6.5.2, so if it works on one, you should be able to get it to work on the other.
There are two PDF extractors, and each has different limitations. For example, if I remember correctly, one of them doesn’t handle multi-column PDFs. I suspect the ‘corrupt’ error is misleading: the document is OK, but the converter can’t read it (as Dave mentioned, this is older technology). I would verify that you are using the same PDF extractor in the version that is not working as in the working version.

Hi Paul

I would verify that you are using the same PDF extractor in the version that is not working as in the working version.

How do I find this information?

Cheers
James

Check where you’ve configured the sys_textExtraction exit and see what you’ve supplied for the last (9th) parameter. The parameter name is “PDFConversion”, and the value can be either “SINGLE” or “MULTI”. If no value is specified, then it defaults to “SINGLE”.
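
The defaulting rule described above can be pictured roughly as follows. This is an illustrative sketch only, not the exit's actual code: `resolve_pdf_conversion` is a hypothetical name, and both the exact-case matching and the rejection of unknown values are assumptions, since the thread does not say how the exit treats them.

```python
# The two values the thread names for the "PDFConversion" parameter.
VALID_MODES = ("SINGLE", "MULTI")


def resolve_pdf_conversion(value=None):
    """Return the effective PDFConversion mode for a supplied value.

    Hypothetical helper: it only mirrors the rule stated in the thread,
    namely that an unspecified value falls back to "SINGLE".
    """
    # A missing or blank value means "not specified", so use the default.
    if value is None or value.strip() == "":
        return "SINGLE"

    # Assumption: anything other than the two documented values is an error.
    if value not in VALID_MODES:
        raise ValueError("PDFConversion must be SINGLE or MULTI, got %r" % value)

    return value
```

So clearing the ninth parameter has the same effect as setting it to “SINGLE” explicitly.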

Hi Jay

It was set to MULTI. I’ve now removed this so it defaults to SINGLE and I can now upload the PDF.

Cheers
James