[Catalyst] Plain text extraction [update]

Anton Katsarov tony at katsarov.org
Tue Dec 8 21:32:14 GMT 2009


At 22:58 +0200 on 08.12.2009 (Tue), Anton Katsarov wrote:
> Hello, all,
> 
> I have an issue not exactly related to Catalyst. I'm working on a
> Catalyst application which needs to do a text search inside various
> formats. So I am trying to convert all files to plain text. It needs to
> support PDF, spreadsheets and rich text (MS Word and OpenOffice
> formats). Up till now I've managed to extract only the spreadsheets
> using Spreadsheet::Read. I tried CAM::PDF for PDFs and it works fine for
> Latin content. Unfortunately I need to extract Cyrillic files too, but I
> get only strange characters.
> 
> Does anyone have experience with this? I'd appreciate any help.
> 
> Thanks in advance and sorry for bothering.
> 
> Best regards,
> 

I used SWISH::Filter for PDFs and DOCs, so only the ODT format still
needs to be converted.
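
For the ODT part, one possible approach is sketched below. It relies on
the fact that an ODT file is a ZIP archive whose body text lives in
content.xml (UTF-8 encoded, so Cyrillic comes through intact). The
function name odt_to_text is made up for illustration, and the sketch is
in Python using only its standard library; the same idea should carry
over to Perl with a ZIP reader such as Archive::Zip plus an XML parser.
It only collects paragraphs and headings, which is probably enough for a
search index but not a faithful conversion.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OpenDocument for text elements (text:p, text:h, ...).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = ("{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS)


def odt_to_text(path):
    """Return the paragraphs and headings of an ODT file as plain text.

    `path` may be a file name or a binary file-like object.
    Known limitations of this sketch: tabs, explicit line breaks and
    repeated spaces (text:tab, text:line-break, text:s) are dropped,
    and a paragraph nested inside another one is emitted twice.
    """
    with zipfile.ZipFile(path) as odt:
        root = ET.fromstring(odt.read("content.xml"))

    lines = []
    for elem in root.iter():
        if elem.tag in PARAGRAPH_TAGS:
            # itertext() also picks up text inside nested spans and links.
            lines.append("".join(elem.itertext()))
    return "\n".join(lines)
```

Calling odt_to_text("report.odt") (the file name is only an example)
returns one line of text per paragraph or heading.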

Regards




