On Thu, Nov 26, 2015 at 3:32 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> If we had a hashing method String->long and guaranteed that there would
> be no collisions (or we accepted the occasional faulty result), then we
> could avoid the segment->global map as well as the centralized term
> server. To my knowledge, this has not yet been attempted.

I've thought about that before, but another problem with that approach is how to map back to the actual term value (a string->long hash isn't reversible). A naive approach would index the hash and also store the original string values in docvalues. Then, after you find the top K hashes, you can look up a document containing each hash to get a docid, and use the string docvalues on that doc to recover the term (or store it as a payload). That's a lot of overhead.

-Yonik
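
For illustration only, the naive approach described above could be sketched roughly like this. It is a toy in-memory model, not Lucene or Solr code: the `Shard` class, `term_hash`, and `top_k_terms` are hypothetical names, BLAKE2b truncated to 64 bits is just one possible String->long hash, and each document is assumed to carry a single facet value.

```python
import hashlib
from collections import Counter


def term_hash(term: str) -> int:
    """Map a term to a 64-bit integer. Collisions are possible, just unlikely."""
    digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class Shard:
    """Toy stand-in for one shard (hypothetical, not a Lucene/Solr API)."""

    def __init__(self, terms):
        # One facet value per document; the docid is the list position.
        self.string_values = list(terms)  # plays the role of string docvalues
        self.hash_values = [term_hash(t) for t in self.string_values]
        # Plays the role of the indexed hash field: hash -> some docid with it.
        self.first_doc = {}
        for docid, h in enumerate(self.hash_values):
            self.first_doc.setdefault(h, docid)

    def count_hashes(self) -> Counter:
        """Facet counts keyed by hash, so no segment->global term map is needed."""
        return Counter(self.hash_values)

    def resolve(self, h: int):
        """Find a doc containing the hash, then read its stored string value."""
        docid = self.first_doc.get(h)
        return None if docid is None else self.string_values[docid]


def top_k_terms(shards, k):
    """Merge per-shard hash counts, take the top K, then map hashes to terms."""
    merged = Counter()
    for shard in shards:
        merged.update(shard.count_hashes())
    results = []
    for h, count in merged.most_common(k):
        # The extra lookup per winning hash is the overhead mentioned above.
        term = next(
            (t for t in (s.resolve(h) for s in shards) if t is not None), None
        )
        results.append((term, count))
    return results
```

Only the K winning hashes are resolved back to strings, but every document still has to carry both the hash and the original string, which is where the storage overhead comes from.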