Similarity.lengthNorm() is a callback from Lucene that gives you the information you seek. Of course, the trick is still how to use it. Perhaps you can describe a bit more about why you need that length.
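For reference, here is a self-contained sketch (plain Java, no Lucene dependency) of the relationship lengthNorm gives you. In DefaultSimilarity the norm is 1/sqrt(numTokens) when no boosts are applied, so the raw norm can be inverted back to an approximate length; the poor resolution Mike mentions comes from Lucene quantizing that float into a single byte per document, which this sketch deliberately leaves out:

```java
// Sketch: recovering an approximate field length from the value
// DefaultSimilarity.lengthNorm() produces (assuming no boosts).
// Real Lucene additionally encodes the norm into one byte, which is
// where the resolution loss comes from; that step is omitted here.
public class LengthFromNorm {

    // lengthNorm as defined by DefaultSimilarity, boost-free:
    // 1 / sqrt(number of tokens in the field)
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Invert it: length is approximately 1 / norm^2
    static int approxLength(float norm) {
        return Math.round(1.0f / (norm * norm));
    }

    public static void main(String[] args) {
        for (int len : new int[] {10, 100, 1000}) {
            System.out.println(len + " -> " + approxLength(lengthNorm(len)));
        }
    }
}
```

With the full-precision float the round trip is exact for these lengths; once boosts are multiplied into the norm, the inversion is no longer possible, which matches Mike's point (b).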

On Sep 4, 2009, at 11:34 AM, mike.schultz wrote:


For various statistics I collect from an index, it's important for me to know
the length (measured in tokens) of a document field.  I can get that
information to some degree from the "norms" for the field, but a) the
resolution isn't that great, and b) more importantly, if boosts are used
it's almost impossible to recover lengths from them.

Here are two ideas I was thinking about; maybe someone can comment on them.

1) Use copyField to copy the field in question, fieldA, to an additional
field, fieldALength, which has an extra filter that counts the tokens and
outputs only a single token representing the length of the field.  This has
the disadvantage of retokenizing basically the whole document (because the
field in question is basically the body).  Plus I would think littering the
term space with these tokens might be bad for performance, though I'm not sure.
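Idea 1 above can be sketched as follows. Lucene's actual TokenFilter API is different (and the class and method names here are made up for illustration); the stream is modeled as a plain Iterator<String> just to show the shape of the filter:

```java
import java.util.Iterator;

// Sketch of idea 1: a terminal filter that swallows the whole token
// stream and emits exactly one synthetic token encoding the count.
// Iterator<String> stands in for Lucene's real TokenStream API.
public class LengthTokenFilter implements Iterator<String> {
    private final Iterator<String> input;
    private boolean emitted = false;

    public LengthTokenFilter(Iterator<String> input) {
        this.input = input;
    }

    public boolean hasNext() {
        return !emitted;  // exactly one token ever comes out
    }

    public String next() {
        int count = 0;
        while (input.hasNext()) {  // drain and count the real tokens
            input.next();
            count++;
        }
        emitted = true;
        // The single token that lands in the term space, e.g. "len_42".
        return "len_" + count;
    }
}
```

For example, wrapping a three-token stream yields the single token "len_3" — which is exactly the term-space clutter the original mail worries about, one extra term per distinct length.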

2) Add a filter to the field in question which again counts the tokens.
This filter lets the regular tokens be indexed as usual but somehow gets the
token count into a stored field of the document.  This has the advantage of
not having to retokenize the field, and instead of littering the term space,
the count becomes per-document data.  Can this be done?  Maybe using a
ThreadLocal to temporarily store the count?
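The ThreadLocal variant of idea 2 might look something like this. Again this is only a shape sketch, not Lucene's real TokenFilter API (Iterator<String> stands in for the token stream, and the class name is invented); the indexing code would read LAST_COUNT after analysis finishes and write it into a stored field:

```java
import java.util.Iterator;

// Sketch of idea 2: a pass-through filter that lets every token flow
// to the index unchanged, counts them, and publishes the final count
// in a ThreadLocal for the indexing code to pick up afterwards.
public class CountingTokenFilter implements Iterator<String> {
    // One slot per indexing thread, so concurrently analyzed
    // documents don't clobber each other's counts.
    public static final ThreadLocal<Integer> LAST_COUNT =
        ThreadLocal.withInitial(() -> 0);

    private final Iterator<String> input;
    private int count = 0;

    public CountingTokenFilter(Iterator<String> input) {
        this.input = input;
    }

    public boolean hasNext() {
        boolean more = input.hasNext();
        if (!more) {
            // Stream exhausted: publish the count for this thread.
            LAST_COUNT.set(count);
        }
        return more;
    }

    public String next() {
        count++;
        return input.next();  // pass the token through unchanged
    }
}
```

The caveat is timing: the consumer only sees LAST_COUNT after the stream has been fully drained (i.e. after hasNext() has returned false), so whatever copies the count into the stored field must run after analysis of that field completes on the same thread.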

Thanks.

--
View this message in context: 
http://www.nabble.com/capturing-field-length-into-a-stored-document-field-tp25297690p25297690.html
Sent from the Solr - User mailing list archive at Nabble.com.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search