[Catalyst]  Plain text extraction [update]
    Anton Katsarov 
    tony at katsarov.org
       
    Tue Dec  8 21:32:14 GMT 2009
    
    
  
At 22:58 +0200 on 08.12.2009 (Tue), Anton Katsarov wrote:
> Hello, all,
> 
> I have an issue not exactly related to Catalyst. I'm working on a
> Catalyst application which needs to do a text search inside various
> formats. So I am trying to convert all files to plain text. It needs to
> support PDF, spreadsheets and rich text (MS Word and OpenOffice
> formats). Up till now I've managed to extract only the spreadsheets
> using Spreadsheet::Read. I tried CAM::PDF for PDFs and it works fine for
> Latin content. Unfortunately I need to extract Cyrillic files too, but I
> get only strange characters.
> 
> Does anyone have experience with this? I'd appreciate any help.
> 
> Thanks in advance and sorry for bothering.
> 
> Best regards,
> 
I used SWISH::Filter for PDFs and DOCs, so only the ODT format still
needs to be converted.
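
For the remaining ODT case, here is a minimal sketch of one possible
approach. It is written in Python rather than Perl and is not taken from
this thread. It relies on the fact that an ODT file is a ZIP archive whose
body text is stored in content.xml, so the standard library is enough and
Cyrillic text comes out as ordinary Unicode. The function name odt_to_text
is hypothetical, chosen only for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace that OpenDocument uses for text elements.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Paragraphs (text:p) and headings (text:h) carry the visible text.
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def odt_to_text(path):
    """Return the text of an ODT file, one line per paragraph or heading.

    Assumptions of this sketch: the file is a well-formed, unencrypted ODT.
    Special whitespace elements (text:s, text:tab, text:line-break) are
    ignored, and text in a paragraph nested inside another paragraph
    (for example in a text box) would appear twice.
    """
    with zipfile.ZipFile(path) as odt:
        xml_bytes = odt.read("content.xml")

    # Parsing from bytes lets the parser honour the declared encoding.
    root = ET.fromstring(xml_bytes)

    lines = []
    for elem in root.iter():
        if elem.tag in BLOCK_TAGS:
            # itertext() also collects text inside child spans and links.
            lines.append("".join(elem.itertext()))
    return "\n".join(lines)
```

The resulting string could then be passed to whatever indexer handles the
text already extracted from the other formats.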
Regards
    
    