Best Regards, Martin Owens
-Original Message-
From: Binkley, Peter [mailto:[EMAIL PROTECTED]
Sent: Wed 12/5/2007 4:07 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Highlighting, word index
We're doing a similar process using term vectors to look up the
bounding-box data in
380. We're looking at using
Lucene's new payload functionality.
Peter
-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 05, 2007 2:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Highlighting, word index
On 5-Dec-07, at 1:02 PM, Owens, Martin wrote:
Thanks Mike. So in essence I need to write a new RequestHandler
plugin which takes the query string, tokenises it, then performs
some kind of action against the index to return results which I
should then be able to get the termVectors from?
You do not necessarily need two requests; instead, you can override
or modify the request handler you are using (StandardRequestHandler,
DisMaxRequestHandler) to return the information. You'll have to
process the Query to extract the terms (like HighlightingUtils does),
then get the TermVe
On 3-Dec-07, at 10:58 AM, Owens, Martin wrote:
You can tell lucene to store token offsets using TermVectors
(configurable via schema.xml). Then you can customize the request
handler to return the token offsets (and/or positions) by retrieving
the TVs.
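To make the idea concrete, here is a rough sketch in plain Python of the kind of data a term vector with positions and offsets carries, and how the query terms could be looked up in it. This is not the Lucene or Solr API: build_term_vector, lookup and the regex tokenizer are hypothetical stand-ins, and a real analyzer would tokenize differently.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for an analyzer: lowercased runs of word characters.
TOKEN = re.compile(r"\w+")

def build_term_vector(text):
    """Map each term to a list of (position, start_offset, end_offset).

    position is the ordinal word index; the offsets are character
    offsets into the original text.
    """
    vector = defaultdict(list)
    for position, match in enumerate(TOKEN.finditer(text)):
        term = match.group().lower()
        vector[term].append((position, match.start(), match.end()))
    return dict(vector)

def lookup(vector, query):
    """Return (term, position, start, end) for every query term hit,
    ordered by word position."""
    hits = []
    for term in TOKEN.findall(query.lower()):
        for position, start, end in vector.get(term, []):
            hits.append((term, position, start, end))
    return sorted(hits, key=lambda hit: hit[1])

doc = "The quick brown fox jumps over the lazy dog"
tv = build_term_vector(doc)
hits = lookup(tv, "the fox")
for hit in hits:
    print(hit)
```

The position field is the word index that an external highlighter could consume; the offsets are only needed if the caller also wants character ranges.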
I think that is the best plan of actio
Owens, Martin wrote:
Hello everyone,
We're working to replace the old Linux version of dtSearch with Lucene/Solr,
using the http requests for our perl side and java for the indexing.
The functionality that is causing the most problems is the highlighting since
we're not storing the text in so
> You can tell lucene to store token offsets using TermVectors
> (configurable via schema.xml). Then you can customize the request
> handler to return the token offsets (and/or positions) by retrieving
> the TVs.
I think that is the best plan of action, how do I create a custom request
h
It's good you already have the data because if you somehow got it from
some sort of calculations I'd have to tell my product manager that
the feature he wanted that I told him couldn't be done with our data
was possible after all ...
About page breaks:
Another approach to paging is to index a spe
On 30-Nov-07, at 1:02 PM, Owens, Martin wrote:
Hello everyone,
We're working to replace the old Linux version of dtSearch with
Lucene/Solr, using the http requests for our perl side and java for
the indexing.
The functionality that is causing the most problems is the
highlighting since
> Or I'm just completely off base here.
A little. We already have the locations for each word on every OCR; we just
need the word index to feed into the existing program.
Best Regards, Martin Owens
Oh, good luck on this! I've had similar issues and have just thrown up my
hands. How do you expect to be able to correlate a word in the index
with the bounding box in the OCR? I'm not sure this is a solved problem
unless your OCR is *very* regular and clean. Even if you can calculate
the ordinal p