Extracting PDF metadata on upload

olivertwood · February 19, 2010, 1:07pm

I’m extracting metadata from PDF documents when they are uploaded to a content editor, and storing the metadata in fields. I’ve written a Java exit to do this, and I want sys_fileinfo to populate the usual _extension and _filesize fields.

The exit does its extraction quite efficiently, but the binary file itself is not being stored after the exit runs: it always shows as null in the database, and sys_fileinfo gives it a size of 0.

Any thoughts as to why this should be? Am I not closing something that I should be?

I attach my source code for reference.

thanks,

Oliver

dbenua · February 22, 2010, 10:30am

Oliver,

When you release a PSPurgableTempFile, it automatically purges itself, and PSPurgableFileInputStream.close() calls release() on the underlying temp file. So closing that stream in your exit purges the uploaded file, and the binary is null because it has already been released by the time the code that stores it is called.
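The failure mode can be reproduced outside the CMS with a small sketch. It does not use the real Rhythmyx classes: PurgeableTempFile and PurgeableFileInputStream below are hypothetical stand-ins that only assume the behaviour described above (release() deletes the file, and the stream's close() calls release()). The "later step" is simulated by checking the file size after the metadata read.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PurgeDemo {

    /** Hypothetical stand-in for PSPurgableTempFile: deletes itself when released. */
    static class PurgeableTempFile {
        private final Path path;

        PurgeableTempFile(byte[] content) throws IOException {
            path = Files.createTempFile("upload", ".pdf");
            Files.write(path, content);
        }

        Path path() { return path; }

        void release() throws IOException { Files.deleteIfExists(path); }

        /** Size a later step (e.g. the one storing the binary) would see; 0 if purged. */
        long sizeForLaterStep() throws IOException {
            return Files.exists(path) ? Files.size(path) : 0L;
        }
    }

    /** Hypothetical stand-in for PSPurgableFileInputStream: close() also releases the file. */
    static class PurgeableFileInputStream extends FileInputStream {
        private final PurgeableTempFile file;

        PurgeableFileInputStream(PurgeableTempFile file) throws IOException {
            super(file.path().toFile());
            this.file = file;
        }

        @Override
        public void close() throws IOException {
            super.close();
            file.release(); // side effect: the temp file is purged here
        }
    }

    /** Reads the PDF header as a metadata exit might, then reports what a later step sees. */
    static long sizeAfterExtraction(boolean usePurgingStream) throws IOException {
        PurgeableTempFile upload =
                new PurgeableTempFile("%PDF-1.4\n...".getBytes(StandardCharsets.US_ASCII));
        InputStream in = usePurgingStream
                ? new PurgeableFileInputStream(upload)   // closing this purges the upload
                : Files.newInputStream(upload.path());   // closing this leaves the file alone
        try (InputStream stream = in) {
            byte[] header = new byte[5];
            int n = stream.read(header);
            System.out.println("Header: "
                    + new String(header, 0, Math.max(n, 0), StandardCharsets.US_ASCII));
        }
        long size = upload.sizeForLaterStep();
        upload.release(); // clean up the demo file either way
        return size;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Closing the purging stream: size = " + sizeAfterExtraction(true));  // 0
        System.out.println("Closing a plain stream:     size = " + sizeAfterExtraction(false)); // 12
    }
}
```

The point of the sketch is that the size of 0 comes from the close() side effect, not from the metadata read itself; reading through a stream that does not own the temp file leaves it in place for whatever runs next.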

Rather than copying the code from sys_fileInfo, it should be possible to use the standard sys_fileinfo extension and write your code to look only for the PDF properties. This would be a little less efficient at runtime, but it would make your code simpler to maintain.
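As a rough illustration of that split, here is a sketch with two independent steps: a generic file-info step that fills the _extension and _filesize fields, and a PDF-only step that reads the bytes and adds its own field without ever releasing or deleting the file. Everything here is hypothetical: the field prefix "item", the "_pdftitle" field, and the method names are made up, and neither method is the Rhythmyx exit API. The regex only finds an uncompressed, literal-string /Title entry with no escaped parentheses; a real exit would more likely use a PDF library such as Apache PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfFieldsDemo {

    /** Stand-in for the standard file-info step: fills only the generic fields. */
    static void fileInfo(String field, Path file, Map<String, Object> fields) throws IOException {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        fields.put(field + "_extension", dot >= 0 ? name.substring(dot + 1) : "");
        fields.put(field + "_filesize", Files.size(file));
    }

    // Naive match for an uncompressed Info entry such as: /Title (Quarterly report)
    private static final Pattern TITLE = Pattern.compile("/Title\\s*\\(([^)]*)\\)");

    /** Hypothetical PDF-only step: reads the bytes, never deletes or releases the file. */
    static void pdfProperties(String field, Path file, Map<String, Object> fields) throws IOException {
        String raw = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        Matcher m = TITLE.matcher(raw);
        if (m.find()) {
            fields.put(field + "_pdftitle", m.group(1));
        }
    }

    public static void main(String[] args) throws IOException {
        Path pdf = Files.createTempFile("report", ".pdf");
        Files.write(pdf, "%PDF-1.4\n1 0 obj << /Title (Quarterly report) >> endobj\n"
                .getBytes(StandardCharsets.ISO_8859_1));
        Map<String, Object> fields = new LinkedHashMap<>();
        fileInfo("item", pdf, fields);      // standard step runs first
        pdfProperties("item", pdf, fields); // custom step only adds PDF properties
        System.out.println(fields);
        Files.delete(pdf);
    }
}
```

Because the PDF step only reads, the order of the two steps does not matter and the uploaded binary is still there for the code that stores it.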

Dave

olivertwood · February 24, 2010, 4:28am

Thanks Dave.

I ended up doing just that! It all works smoothly now.

Cheers

Oliver