Hi Peter,

Thanks for the recommendation. I believe we are thinking along the
same lines, but I wanted to check. Are you suggesting something
different from my #5 (below), or are we essentially suggesting the
same thing?

On Wed, Oct 24, 2012 at 1:20 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> Could you index your 'phrase tags' as single tokens? Then your phrase
> queries become simple TermQuerys.

>>
>> 5) *This is my current favorite*: stop tokenizing/analyzing these
>> terms and just use KeywordTokenizer. Most of these phrases are
>> pre-vetted, and it may be possible to clean/process any others before
>> creating the docs. My main worry here is that, currently, if I
>> understand correctly, a document with the phrase "brazilian pop" would
>> still be returned as a match to a seed document containing only the
>> phrase "brazilian" (not the other way around, but that is not
>> necessary), however, with KeywordTokenizer, this would no longer be
>> the case. If I switched from the current dubious tokenize/stem/etc...
>> and just used Keyword, would this allow queries like "this used to be
>> a long phrase query" to match documents that have "this used to be a
>> long phrase query" as one of the multivalued values in the field
>> without having to pull term positions? (and thus significantly speed
>> up performance).
>>
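
To make the trade-off in #5 concrete, here is a toy sketch in plain
Python. It is not Lucene's API: build_index and term_query are
hypothetical helpers, and the whitespace split is only a stand-in for
the real tokenize/stem chain. It assumes each multivalued field value
is either kept whole (roughly what KeywordTokenizer would do) or split
into words.

```python
from collections import defaultdict


def build_index(docs, keyword=True):
    """Toy inverted index: token -> set of doc ids (hypothetical helper)."""
    index = defaultdict(set)
    for doc_id, values in docs.items():
        for value in values:
            normalized = value.strip().lower()
            # keyword=True keeps the whole value as one token;
            # keyword=False splits on whitespace as a stand-in for analysis.
            tokens = [normalized] if keyword else normalized.split()
            for token in tokens:
                index[token].add(doc_id)
    return index


def term_query(index, term):
    """Single-term lookup; no term positions are consulted."""
    return index.get(term.strip().lower(), set())


docs = {
    "doc1": ["brazilian pop", "this used to be a long phrase query"],
    "doc2": ["brazilian"],
}

keyword_index = build_index(docs, keyword=True)
token_index = build_index(docs, keyword=False)

# Whole-value tokens: the long phrase is found with one term lookup.
print(term_query(keyword_index, "this used to be a long phrase query"))

# Whole-value tokens: "brazilian" no longer reaches "brazilian pop".
print(term_query(keyword_index, "brazilian"))

# Word tokens: "brazilian" still reaches both documents.
print(term_query(token_index, "brazilian"))
```

In this sketch the whole-value index answers the long phrase with a
single lookup, but the seed term "brazilian" only finds doc2, which is
the partial-match behavior I am worried about losing.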

Thanks again,
     Aaron
