RE: Solr Highlighting, word index

2007-12-10 Thread Owens, Martin
est Regards, Martin Owens -Original Message- From: Binkley, Peter [mailto:[EMAIL PROTECTED] Sent: Wed 12/5/2007 4:07 PM To: solr-user@lucene.apache.org Subject: RE: Solr Highlighting, word index We're doing a similar process using term vectors to look up the bounding-box data in

RE: Solr Highlighting, word index

2007-12-05 Thread Binkley, Peter
380. We're looking at using Lucene's new payload functionality. Peter -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 2:19 PM To: solr-user@lucene.apache.org Subject: Re: Solr Highlighting, word index On 5-Dec-07, at 1:02 PM,

Re: Solr Highlighting, word index

2007-12-05 Thread Mike Klaas
On 5-Dec-07, at 1:02 PM, Owens, Martin wrote: Thanks Mike, So in essence I need to write a new RequestHandler plugin which takes the query string, tokenises it, then performs some kind of action against the index to return results which I should then be able to get the termVectors from?

RE: Solr Highlighting, word index

2007-12-05 Thread Owens, Martin
You do not necessarily need two requests; instead, you can override or modify the request handler you are using (StandardRequestHandler, DisMaxRequestHandler) to return the information. You'll have to process the Query to extract the terms (like HighlightingUtils does), then get the TermVe

Re: Solr Highlighting, word index

2007-12-05 Thread Mike Klaas
On 3-Dec-07, at 10:58 AM, Owens, Martin wrote: You can tell Lucene to store token offsets using TermVectors (configurable via schema.xml). Then you can customize the request handler to return the token offsets (and/or positions) by retrieving the TVs. I think that is the best plan of actio
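
For readers unfamiliar with what "token offsets (and/or positions)" means here, the following is a minimal plain-Java sketch, not Lucene or Solr code. It assumes a naive letters-and-digits tokenizer with lowercasing; the class and method names (TermOffsetSketch, Token, tokenize, matches) are hypothetical and only illustrate the kind of per-term data a term vector with positions and offsets would carry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TermOffsetSketch {

    /** Hypothetical token record: term text, word index, and character offsets. */
    public static final class Token {
        public final String term;
        public final int position;    // 0-based ordinal of the word in the text
        public final int startOffset; // inclusive character offset
        public final int endOffset;   // exclusive character offset

        Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits text on anything that is not a letter or digit (an assumed, simplistic analyzer). */
    public static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip separators between words.
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one word.
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(term, position++, start, i));
            }
        }
        return tokens;
    }

    /** Returns every occurrence of queryTerm, with its word index and offsets. */
    public static List<Token> matches(String text, String queryTerm) {
        String wanted = queryTerm.toLowerCase(Locale.ROOT);
        List<Token> hits = new ArrayList<>();
        for (Token t : tokenize(text)) {
            if (t.term.equals(wanted)) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Solr highlighting needs the word index, not the text.";
        for (Token t : matches(text, "the")) {
            System.out.println(t.term + " position=" + t.position
                    + " offsets=" + t.startOffset + "-" + t.endOffset);
        }
    }
}
```

In a real setup the positions would come from the index's own analyzer rather than from re-tokenizing the text, so the word index only lines up with external data if both sides split words the same way.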

Re: Solr Highlighting, word index

2007-12-05 Thread Ryan McKinley
Owens, Martin wrote: Hello everyone, We're working to replace the old Linux version of dtSearch with Lucene/Solr, using the HTTP requests for our Perl side and Java for the indexing. The functionality that is causing the most problems is the highlighting since we're not storing the text in so

RE: Solr Highlighting, word index

2007-12-03 Thread Owens, Martin
> You can tell Lucene to store token offsets using TermVectors > (configurable via schema.xml). Then you can customize the request > handler to return the token offsets (and/or positions) by retrieving > the TVs. I think that is the best plan of action, how do I create a custom request h

Re: Solr Highlighting, word index

2007-11-30 Thread Erick Erickson
It's good you already have the data because if you somehow got it from some sort of calculations I'd have to tell my product manager that the feature he wanted that I told him couldn't be done with our data was possible after all ... About page breaks: Another approach to paging is to index a spe

Re: Solr Highlighting, word index

2007-11-30 Thread Mike Klaas
On 30-Nov-07, at 1:02 PM, Owens, Martin wrote: Hello everyone, We're working to replace the old Linux version of dtSearch with Lucene/Solr, using the HTTP requests for our Perl side and Java for the indexing. The functionality that is causing the most problems is the highlighting since

RE: Solr Highlighting, word index

2007-11-30 Thread Owens, Martin
> Or I'm just completely off base here. A little. We already have the locations for each word on every OCR; we just need the word index to feed into the existing program. Best Regards, Martin Owens
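
As a rough illustration of that last step, here is a hedged Java sketch of turning word indexes into stored word locations. Everything in it is assumed for illustration: the Box type, the WordBoxLookup class, and the idea that the per-page locations sit in a list ordered by word index. The thread does not describe how the existing program actually stores or consumes them.

```java
import java.util.ArrayList;
import java.util.List;

public class WordBoxLookup {

    /** Hypothetical bounding box for one OCR word, in page pixel coordinates. */
    public static final class Box {
        public final int x, y, width, height;

        public Box(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Looks up the box for each hit. pageBoxes is assumed to be ordered so that
     * element i is the location of the i-th word (0-based) on the page.
     * Indexes outside the list are skipped rather than treated as errors.
     */
    public static List<Box> boxesFor(List<Box> pageBoxes, List<Integer> wordIndexes) {
        List<Box> result = new ArrayList<>();
        for (int index : wordIndexes) {
            if (index >= 0 && index < pageBoxes.size()) {
                result.add(pageBoxes.get(index));
            }
        }
        return result;
    }
}
```

The lookup only works if the indexer and the OCR output count words identically, which is the correlation problem raised elsewhere in this thread.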

Re: Solr Highlighting, word index

2007-11-30 Thread Erick Erickson
Oh, good luck on this! I've had similar issues and have just thrown up my hands. How do you expect to be able to correlate a word in the index with the bounding box in the OCR? I'm not sure this is a solved problem unless your OCR is *very* regular and clean. Even if you can calculate the ordinal p