On Wed, Jul 20, 2011 at 3:17 PM, raphael812 <[email protected]> wrote:
> Hello everyone,
>
> I am quite new to Lucene and I am using the book Lucene in Action to learn.
> I need help extracting the body content of an HTML page using Tika. The
> implementation from the book extracts only the HTML's metadata, not the main
> body content, which is what I need. Is it possible to extract body content
> from HTML pages and PDFs, and if so, how?
> Thanks for your help.

Hey,
this seems to be a Tika / extraction-specific question. You should
try asking it on the Tika list; I bet you'll get a quick
response there!

simon
>
> Raphael
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-with-Lucene-tp3185409p3185409.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
>
