On Wed, Jul 20, 2011 at 3:17 PM, raphael812 <[email protected]> wrote:
> Hello everyone,
>
> I am quite new to lucene and i am using the book lucene in action to learn.
> I need help in extracting the body content of a html page using tika. The
> implementation from the book only extracts the html's metadata not the main
> body content which i need. Is it possible to extract body content from htmls
> and pdfs and how.
> Thanks for you help.

Hey, this seems to be a Tika / extraction-specific question. You should try
asking on the Tika list; I bet you'll get a quick response there!

simon

>
> Raphael
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-with-Lucene-tp3185409p3185409.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
>
