Re: Indexing with Lucene

Raphael Osamede Omoregbee Wed, 20 Jul 2011 19:59:50 -0700

On 20/07/11 22:32, Simon Willnauer wrote:

On Wed, Jul 20, 2011 at 3:17 PM, raphael812<[email protected]>  wrote:

Hello everyone,


I am quite new to Lucene and I am using the book Lucene in Action to learn.
I need help extracting the body content of an HTML page using Tika. The
implementation from the book only extracts the HTML's metadata, not the main
body content, which is what I need. Is it possible to extract body content from
HTML pages and PDFs, and if so, how?
Thanks for your help.

Hey,
  this seems to be a Tika / extraction-specific question. You should
try asking it on the Tika list; I bet you will get a quick
response there!

simon

Raphael

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-with-Lucene-tp3185409p3185409.html
Sent from the Lucene - General mailing list archive at Nabble.com.

Hello all,

I tried searching an index I created, but it gives me the following error in NetBeans 6.9:

Exception in thread "main" org.apache.lucene.index.CorruptIndexException: Unknown format version: -11
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:249)
        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:202)
        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:63)
        at Searcher.search(Searcher.java:66)
        at Searcher.main(Searcher.java:59)

The trouble is that I am able to search that same index from the command line. Does anyone have an idea why this is so? It was working in NetBeans some weeks ago, and now it throws this error.

Thanks for the help.
