When your correctly configured portal tool is not working

published Mar 18, 2009

Case in point: portal_transforms has a pdf_to_text transform, but when a PDF is indexed the transform is not found, so SearchableText contains none of the text from the PDF file.

For a customer at Zest Software I am migrating a site from Plone 2.5 to Plone 3.1. In the migrated site I uploaded a PDF file, and none of its contents ended up in the SearchableText index. In a fresh Plone Site in the same Zope instance this did work. In the portal_transforms tool the pdf_to_text transform was correctly registered, the mimetypes_registry looked okay, and the pdftotext binary was available on the system. So everything looked fine, but it did not actually work. What was going on?
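One of the checks above, whether the pdftotext binary can be found, is easy to script. This is a minimal sketch using only the Python standard library; the helper name find_binary is made up for illustration. It assumes a modern Python (shutil.which does not exist in the Python 2.4 that Plone 3 runs on) and should be run as the same user that runs Zope, so that the same PATH applies.

```python
import shutil


def find_binary(name="pdftotext"):
    """Return the full path of `name` if it is on the PATH, else None."""
    return shutil.which(name)


path = find_binary()
if path is None:
    print("pdftotext not found on PATH")
else:
    print("pdftotext found at", path)
```

A result of None here would be a much simpler explanation for an empty SearchableText than the one described in this post, so it is worth ruling out first.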

Well, it turned out that the portal_transforms tool visible in the ZMI was not the object actually being used. The code called getToolByName, which gave back not this tool but a utility registered under the same name, and that utility did not have the pdf_to_text transform. So I went to the Components tab in the ZMI of the Plone Site root, removed the portal_transforms utility from the XML listed there, and applied the changes. This made the pdf_to_text transform available again. Problem solved.
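The behaviour described above can be illustrated with a toy model. This is not the real Products.CMFCore code: ToySite, get_tool_by_name and the stand-in ComponentLookupError are invented for this sketch, and the registry is keyed by name for simplicity. It only mimics what I observed: a registered utility wins over the tool you see in the ZMI.

```python
class ComponentLookupError(LookupError):
    """Stand-in for the lookup error raised when no utility is registered."""


class ToySite:
    """A toy site: tools stored in the site plus a separate utility registry."""

    def __init__(self):
        self.tools = {}      # what you see as objects in the ZMI
        self.utilities = {}  # what the Components tab lists

    def get_utility(self, name):
        try:
            return self.utilities[name]
        except KeyError:
            raise ComponentLookupError(name)


def get_tool_by_name(site, name):
    """Prefer a registered utility; fall back to the tool stored in the site."""
    try:
        return site.get_utility(name)
    except ComponentLookupError:
        return site.tools[name]


site = ToySite()
# The tool was removed and re-added, so it has the full set of transforms.
site.tools["portal_transforms"] = {"pdf_to_text", "html_to_text"}
# The utility registration still points at an older, incomplete object.
site.utilities["portal_transforms"] = {"html_to_text"}

print("pdf_to_text" in get_tool_by_name(site, "portal_transforms"))  # False

# Dropping the utility registration makes the lookup fall back to the tool.
del site.utilities["portal_transforms"]
print("pdf_to_text" in get_tool_by_name(site, "portal_transforms"))  # True
```

In the toy model, as in my site, nothing is wrong with the tool itself. The lookup simply never reaches it while the stale registration exists.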

Note that this is the first time I have edited the XML on the Components tab, so be careful if you do this. It may have adverse effects that I have not noticed yet, and I can imagine that typos are dangerous here.

So how did this go wrong? I did not explore this further, so I can only guess. I think that during the migration from 2.5 to 3.1 the pdftotext binary was not available, or there was some other reason why the transform did not work. The utility was added during the migration, so it missed this transform. I then removed the portal_transforms tool and added it again to get the missing transforms. From that point on the utility and the tool were no longer linked to each other. Again, this is a guess.

So, the conclusion of all this: if your portal tool does not work the way you think it should, check the Components tab and see whether a utility is registered under the same name. Removing it there may help. Note: keep a backup of your Data.fs when you do this, and try it locally first rather than on a production website.
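If the XML on the Components tab is long, a small script can help spot the registration for a given tool. The sketch below assumes the XML uses utility elements with interface and object attributes. The sample XML and its interface names are a hypothetical excerpt written for illustration, not copied from my site. Paste in the real XML from your own Components tab instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of the Components tab XML, for illustration only.
SAMPLE_XML = """\
<componentregistry>
  <utilities>
    <utility interface="Products.PortalTransforms.interfaces.IPortalTransformsTool"
             object="portal_transforms"/>
    <utility interface="Products.MimetypesRegistry.interfaces.IMimetypesRegistryTool"
             object="mimetypes_registry"/>
  </utilities>
</componentregistry>
"""


def utilities_for_tool(xml_text, tool_id):
    """Return the interfaces of utilities whose object attribute is tool_id."""
    root = ET.fromstring(xml_text)
    return [
        node.get("interface")
        for node in root.iter("utility")
        if node.get("object") == tool_id
    ]


print(utilities_for_tool(SAMPLE_XML, "portal_transforms"))
```

An empty list means no utility points at that tool id, in which case the problem lies elsewhere. The script only reads the XML. The actual removal still has to be done by hand on the Components tab, with the same caution as above.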