Hello everyone, I am quite new to Lucene and I am using the book Lucene in Action to learn it. I need help extracting the body content of an HTML page using Tika. The implementation from the book extracts only the HTML's metadata, not the main body content, which is what I need. Is it possible to extract the body content from HTML pages and PDFs, and if so, how? Thanks for your help.
Raphael
