On Thu, Nov 26, 2015 at 3:32 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> If we had a hashing method String->long and guaranteed that there would
> be no collisions (or we accepted the occasional faulty result), then we
> could avoid the segment->global map as well as the centralized term
> server. To my knowledge, this has not yet been attempted.
I've thought about that before, but another problem with that approach
is how to map back to the actual term value (a string->long hash isn't
reversible). A naive approach would index the hash and also store the
original string values in docvalues. After finding the top K hashes,
you could look up a document containing each hash to get a docid, and
then use the string docvalues to retrieve the term (or store the
original string as a payload instead). That's a lot of overhead.
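A minimal in-memory sketch of the naive approach described above, assuming each document carries both an indexed 64-bit hash and its original string (standing in for string docvalues). Everything here is hypothetical: the class `HashFacetSketch`, the `Doc` type, `topK` and `lookupValue` are illustrative names, not Lucene or Solr APIs, and 64-bit FNV-1a is swapped in as an arbitrary String->long hash. Counting works on hashes alone, so per-segment results merge without a segment->global ordinal map; the cost shows up in step 3, where every top-K hash needs an extra document lookup to recover its string.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashFacetSketch {
    // 64-bit FNV-1a (an assumption; any decent 64-bit hash would do).
    // Not reversible, and collisions are possible but rare.
    static long fnv1a64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Hypothetical per-document data: the indexed hash plus the original
    // string kept alongside it (standing in for string docvalues).
    static final class Doc {
        final long hash;
        final String value;
        Doc(String value) { this.value = value; this.hash = fnv1a64(value); }
    }

    static List<String> topK(List<List<Doc>> segments, int k) {
        // Step 1: count by hash across all segments; no ordinal mapping needed.
        Map<Long, Integer> counts = new HashMap<>();
        for (List<Doc> segment : segments)
            for (Doc d : segment)
                counts.merge(d.hash, 1, Integer::sum);

        // Step 2: sort hashes by descending count (ties broken by hash value).
        List<Map.Entry<Long, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : Long.compare(a.getKey(), b.getKey());
        });

        // Step 3: map each top hash back to a string via a document holding it.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++)
            result.add(lookupValue(segments, entries.get(i).getKey()));
        return result;
    }

    // Find any document with this hash and read its stored string. A linear
    // scan here; in a real index this would be a per-hash term lookup.
    static String lookupValue(List<List<Doc>> segments, long hash) {
        for (List<Doc> segment : segments)
            for (Doc d : segment)
                if (d.hash == hash) return d.value;
        return null;
    }

    public static void main(String[] args) {
        List<List<Doc>> segments = List.of(
            List.of(new Doc("apple"), new Doc("banana"), new Doc("apple")),
            List.of(new Doc("apple"), new Doc("cherry"), new Doc("banana")));
        System.out.println(topK(segments, 2));
    }
}
```

The reverse mapping in step 3 is exactly the overhead in question: one extra lookup per returned bucket, plus storing the original strings a second time.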

-Yonik
