Is there any difference in the relevancy score between a document that
was added directly to an index and the same document that ended up in
the index through a merge?
In other words, I'd like to build my index in pieces (since people in
different cities will be working on parts of it), but I want the search
results to be as if it were one index.
My first thought was to keep the indexes separate and use multicore
shards to search both of them. I decided against that for two reasons:
1) It is slower.
2) The relevancy scores are wrong, since word frequencies differ
greatly between the two indexes.
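To illustrate the second point, here is a minimal sketch. It assumes
scoring uses Lucene's classic TF-IDF similarity, where
idf = 1 + ln(numDocs / (docFreq + 1)). The document counts are
hypothetical figures chosen only to show the effect, not measurements
from my indexes.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Inverse document frequency as in Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical figures: (documents containing the term, total documents).
index_a = (5, 1000)    # the term is rare in the first index
index_b = (400, 1000)  # the same term is common in the second index

idf_a = classic_idf(*index_a)
idf_b = classic_idf(*index_b)

# Statistics a single combined index would have for the same term.
idf_combined = classic_idf(index_a[0] + index_b[0], index_a[1] + index_b[1])

print(f"index A: {idf_a:.3f}  index B: {idf_b:.3f}  combined: {idf_combined:.3f}")
```

If each index scores with its own statistics, the same term gets a
different weight depending on which index a document lives in, so the
scores are not directly comparable when the result lists are combined.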
My second thought is to have the people work on separate indexes, and
merge them together just before going to production. That would
definitely solve the first problem, but I don't know if it solves the
second.
I also don't know how to test that myself. I could build the index both
ways, run a search, and compare the results, but how conclusive that is
depends on the particular words I search for. Is there a way to dump
everything about a particular document, so that I can compare the two
indexes? Are there other tools available that would help?
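A rough sketch of the comparison I have in mind, assuming each search
returns a ranked list of documents with "id" and "score" fields (for
example from a query with fl=id,score; adding debugQuery=true should
also return a per-document score explanation). The function name and
the sample data are hypothetical.

```python
def compare_results(docs_a, docs_b, tolerance=1e-6):
    """Return the ids whose rank or score differs between two result lists.

    Each value is a pair of (rank, score) tuples, one per list, with None
    standing in for a document that is missing from that list.
    """
    ranked_a = {d["id"]: (i, d["score"]) for i, d in enumerate(docs_a)}
    ranked_b = {d["id"]: (i, d["score"]) for i, d in enumerate(docs_b)}
    diffs = {}
    for doc_id in sorted(set(ranked_a) | set(ranked_b)):
        a = ranked_a.get(doc_id)
        b = ranked_b.get(doc_id)
        if a is None or b is None:
            diffs[doc_id] = (a, b)  # present in only one of the lists
        elif a[0] != b[0] or abs(a[1] - b[1]) > tolerance:
            diffs[doc_id] = (a, b)  # different rank or different score
    return diffs


# Hypothetical results for the same query against the two builds.
direct_build = [{"id": "doc1", "score": 2.5}, {"id": "doc2", "score": 1.25}]
merged_build = [{"id": "doc1", "score": 2.5}, {"id": "doc2", "score": 1.0}]

diffs = compare_results(direct_build, merged_build)
for doc_id, (a, b) in diffs.items():
    print(doc_id, "direct:", a, "merged:", b)
```

Running this over a list of queries that mixes rare and common words
would make the test less dependent on any single search.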
Thanks for any insight.
- relevancy and merging Paul Rosen