Ruby Forum Ferret > How to index PDF

Posted by Sébastien Mizrahi (slum)
on 11.08.2008 17:01
Hello,

I'm trying to index PDF files, so far without success.
Could anyone explain how it works?

Thank you.
Posted by Nathan Li (nasi)
on 12.08.2008 05:54
You have to convert the PDF to plain text first, using a library or an external tool, and then index that text.
Posted by Sébastien Mizrahi (slum)
on 12.08.2008 08:49
Thank you for your quick answer :)
Do you have the name of a library I should use, and a small tutorial?
Posted by neongrau __ (neongrau)
on 26.09.2008 13:56
I use the command-line tool "pdftotext" for this, which I put into
lib/bin inside my app.

Add a method to your model and add it to your indexed fields.

e.g.

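require 'shellwords'

# Returns the plain text of the PDF so it can be indexed like any other field.
def text
  path = 'path/to/your/file.pdf'
  # -q suppresses messages; the trailing "-" makes pdftotext write to stdout.
  `#{RAILS_ROOT}/lib/bin/pdftotext -q #{Shellwords.escape(path)} -`
end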
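
A slightly more defensive variant of the same idea, as a rough sketch only: it calls pdftotext through Open3 instead of backticks, so the path never passes through a shell, and it returns an empty string when the tool is missing or fails. The helper name pdf_to_text and its binary parameter are made up for illustration and are not part of Ferret or of the post above. If you use acts_as_ferret, the model method would presumably still be listed in its :fields option as usual.

```ruby
require 'open3'

# Hypothetical helper (not from the original post): runs pdftotext on a PDF
# and returns the extracted text, or an empty string if extraction fails.
# `binary` is an assumed parameter; point it at lib/bin/pdftotext in your app.
def pdf_to_text(path, binary = 'pdftotext')
  # Passing the arguments as a list bypasses the shell, so paths with
  # spaces or quotes need no escaping.
  # -q silences messages; the final "-" sends the text to stdout.
  output, status = Open3.capture2(binary, '-q', path, '-')
  status.success? ? output : ''
rescue SystemCallError
  # The binary could not be started (for example, it does not exist).
  ''
end

# In the model, the indexed method would then delegate to the helper:
#   def text
#     pdf_to_text('path/to/your/file.pdf', "#{RAILS_ROOT}/lib/bin/pdftotext")
#   end
```

Returning an empty string keeps indexing going when one PDF is broken; if you would rather notice such failures, raise or log there instead.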


ralf