[Catalyst] Plain text extraction

Anton Katsarov tony at katsarov.org
Tue Dec 8 20:58:36 GMT 2009


Hello, all,

I have an issue that is not exactly related to Catalyst. I'm working on
a Catalyst application which needs to do text search inside files in
various formats, so I am trying to convert all files to plain text. It
needs to support PDF, spreadsheets and rich text (MS Word and OpenOffice
formats). So far I've managed to extract text only from the
spreadsheets, using Spreadsheet::Read. I tried CAM::PDF for PDFs and it
works fine for Latin content. Unfortunately I need to extract text from
Cyrillic files too, and for those I get only strange characters.

Does anyone have experience with this? I'd appreciate any help.

Thanks in advance, and sorry for bothering you.

Best regards,

-- 
Anton Katsarov <tony at katsarov.org>



