Hi Peter,

Thanks for the recommendation. I believe we are thinking along the same lines, but I wanted to check. Are you suggesting something different from my #5 (below), or are we essentially suggesting the same thing?

On Wed, Oct 24, 2012 at 1:20 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> Could you index your 'phrase tags' as single tokens? Then your phrase
> queries become simple TermQuerys.
>>
>> 5) *This is my current favorite*: stop tokenizing/analyzing these
>> terms and just use KeywordTokenizer. Most of these phrases are
>> pre-vetted, and it may be possible to clean/process any others before
>> creating the docs. My main worry here is that, currently, if I
>> understand correctly, a document with the phrase "brazilian pop" would
>> still be returned as a match to a seed document containing only the
>> phrase "brazilian" (not the other way around, but that is not
>> necessary), however, with KeywordTokenizer, this would no longer be
>> the case. If I switched from the current dubious tokenize/stem/etc...
>> and just used Keyword, would this allow queries like "this used to be
>> a long phrase query" to match documents that have "this used to be a
>> long phrase query" as one of the multivalued values in the field
>> without having to pull term positions? (and thus significantly speed
>> up performance).
>>
>> Thanks again,
>> Aaron
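
For concreteness, the trade-off described in #5 can be sketched in plain Java. This is only a toy model and not Lucene code: the class PhraseIndexSketch and its methods are hypothetical names, and the whitespace split merely stands in for whatever the real tokenize/stem chain does. It assumes each field value is lowercased and, in keyword mode, indexed as exactly one token.

```java
import java.util.*;

// Toy model of the two indexing strategies discussed above.
// NOT Lucene API: every name here is hypothetical and for illustration only.
final class PhraseIndexSketch {
    // token -> ids of the documents whose field values produced that token
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final boolean keywordMode;

    PhraseIndexSketch(boolean keywordMode) {
        this.keywordMode = keywordMode;
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Keyword mode: the whole value is a single token.
    // Otherwise: split on whitespace (a stand-in for real analysis).
    private List<String> tokens(String value) {
        String n = normalize(value);
        return keywordMode
                ? Collections.singletonList(n)
                : Arrays.asList(n.split("\\s+"));
    }

    // Index every value of a multivalued field for one document.
    void add(int docId, String... values) {
        for (String value : values) {
            for (String token : tokens(value)) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A single-token lookup: no term positions are consulted.
    Set<Integer> termQuery(String term) {
        return postings.getOrDefault(normalize(term), Collections.emptySet());
    }
}
```

In this model, with keyword mode on, a document indexed with the values "brazilian pop" and "this used to be a long phrase query" is found by a single lookup of the full phrase, but a lookup of "brazilian" alone no longer returns it. With keyword mode off, the "brazilian" lookup returns it, which is the partial-match behaviour that #5 would give up.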