Indexing with Lucene

raphael812 Wed, 20 Jul 2011 06:17:53 -0700

Hello everyone,

I am quite new to Lucene and am using the book Lucene in Action to learn it.
I need help extracting the body content of an HTML page using Tika. The
implementation from the book extracts only the HTML's metadata, not the main
body content, which is what I need. Is it possible to extract the body
content from HTML and PDF files, and if so, how?
Thanks for your help.


Raphael

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-with-Lucene-tp3185409p3185409.html
Sent from the Lucene - General mailing list archive at Nabble.com.
