I've been thinking about this too, and haven't come up with any GREAT way. But
there are several possible ways that will do different things, good or bad,
depending on the nature of your data and exactly what you want to do. So here
are some ideas I've been thinking about, but not a ready-made solution for you.
One thing first: the statement about a "copy field to copy all dismax terms into
one big field" doesn't exactly make sense. copyField is something that happens
at index time, whereas dismax is only something that is used at query time.
Since it's only used at query time, just because you are using dismax for your
main search, doesn't mean you have to use dismax for your autocomplete query.
The autocomplete query, that returns the things you're going to display in your
auto-complete list, can be set up however you want. (we are talking about an
auto-complete list, not a "Google Instant" style autocomplete, right? The
latter would introduce even more issues).
So, do you want the autocomplete to only match on the _entire query_ as
entered, or do you want an autocomplete for each word? For instance, if I
enter "dog walking", should the autocomplete be autocompleting "dog walking" as
a whole, or should it be autocompleting "walking" by the time I've typed in
"dog walking"? It's easier to set up to autocomplete on the whole phrase.
Next, though, you probably want autocomplete to complete on partial words, not
just complete words. "Dog wal" should autocomplete to "dog walking". That
introduces an extra kink too. But let's assume we want that.
So, one idea. At index time, populate a field that will be used exclusively for
auto-completing. Make this field actually _non-tokenizing_, probably a Text
type but with the KeywordTokenizer (i.e., the non-tokenizing tokenizer, heh).
So if you're indexing "dog walking", then the token in the field is actually
"dog walking", not ["dog","walking"]. Next, normalize it by removing
punctuation (because we probably don't want to consider punctuation for
auto-completing), and maybe normalizing whitespace by collapsing any adjacent
whitespace to a single space, and removing whitespace at beginning and end. So
" dog walking " will index as "dog walking". (This matters more at
query time than at index time, but it's less confusing to do the same
normalization at both points.) That can be done with a pattern-replace filter,
e.g. Solr's PatternReplaceCharFilterFactory.
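To make the normalization step concrete, here is a rough Python sketch of what the analyzer would be doing to the text. This is only a simulation for illustration, not Solr configuration; the function name and the exact regular expressions are assumptions, and your own rules about what counts as punctuation may differ.

```python
import re

def normalize_for_autocomplete(text):
    """Hypothetical helper: mimic the normalization described above."""
    # Drop punctuation: anything that is neither a word character nor whitespace.
    no_punct = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of whitespace to a single space, then trim both ends.
    return re.sub(r"\s+", " ", no_punct).strip()

print(repr(normalize_for_autocomplete("  dog   walking!  ")))  # 'dog walking'
```

The same function would be applied to both the indexed value and the user's input, so that both sides are compared in the same normalized form.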
But now we've also got to n-gram expand it. So if the term being indexed is
"dog walking", we actually want to store ALL these terms in the index:
"d"
"do"
"dog"
"dog "
"dog w"
"dog wa"
etc
I.e., n-grams, but only expanded out from the front. I believe you can use the
EdgeNGramFilterFactory for this (at index time only; you don't want this one in
your query-time analyzers). Although I haven't actually tried the
EdgeNGramFilterFactory with a non-tokenized field, I think it should work. This
will expand the size of your index, hopefully not to a problematic degree.
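As a sketch of what front-anchored n-gram expansion produces, and why an exact match against those grams gives prefix completion, here is a small Python simulation. It is not Solr code: the function name is made up, and the min_len/max_len parameters are only an assumed stand-in for the minimum and maximum gram size settings on the Solr filter.

```python
def edge_ngrams(term, min_len=1, max_len=25):
    """Hypothetical helper: return every prefix of `term` from min_len up to max_len characters."""
    longest = min(len(term), max_len)
    return [term[:n] for n in range(min_len, longest + 1)]

grams = edge_ngrams("dog walking")
print(grams[:6])           # ['d', 'do', 'dog', 'dog ', 'dog w', 'dog wa']

# Treat the grams as the terms stored in the index for this one value.
indexed_terms = set(grams)
print("dog wal" in indexed_terms)   # True: a prefix of the whole phrase matches
print("walk" in indexed_terms)      # False: nothing anchored mid-phrase is stored
```

Note that the maximum gram length caps how much of a long phrase can still be matched, so that setting trades index size against how far into the phrase completion keeps working.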
Now, to actually do the auto-complete. At query time, take the whole thing the
user has entered, and issue a query, with whatever fq's you want too, but use
the "field" type query parser (NOT "dismax" or "lucene", because we don't want
the query parser to pre-tokenize on whitespace, but not "raw" because we DO
want to go through the query-time field analyzers), restricted to this
autocomplete field you've created. One way to do this is: << q={!field
f=my_autocomplete_field}the user's query >> (url-encoded, naturally).
That's pretty much it, I think that should work, depending on the requirements
of 'work'. Although I haven't tried it yet.
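For what it's worth, building that request might look something like the following Python sketch. The base URL, the field name my_autocomplete_field, and the occupation:Assistant filter are all assumptions for illustration (the filter is modeled on the example in the original question); only the {!field f=...} local-params syntax comes from the description above.

```python
from urllib.parse import urlencode

def autocomplete_url(base_url, field, user_input, filter_queries=()):
    """Hypothetical helper: build a select URL using the "field" query parser."""
    # The whole user input goes after the local params, untokenized.
    params = [("q", "{!field f=%s}%s" % (field, user_input))]
    # Carry over whatever fq's are active in the current search context.
    params += [("fq", fq) for fq in filter_queries]
    # urlencode takes care of the URL-encoding of braces, spaces, colons, etc.
    return base_url + "?" + urlencode(params)

url = autocomplete_url(
    "http://localhost:8983/solr/select",   # assumed Solr location
    "my_autocomplete_field",
    "dog wal",
    ["occupation:Assistant"],              # assumed filter field
)
print(url)
```

Because the fq's are ordinary filter queries, the suggestions come back already restricted to the same subset of documents as the main search.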
Now, if you want the user's query to auto-complete match in the middle of your
terms, things get a lot more complicated. I.e., if you want "walk" to
auto-complete to "dog walking" too. This won't do that. Also, if you want
some kind of stemming to happen in auto-complete, this won't do that either.
And also, if you want to auto-complete not the entire phrase the user has typed
in, but each whitespace-separated word as they type it, this won't do THAT
either. Trying to get all those things to work becomes even more complicated
-- especially with the requirement that you want to be able to apply the 'fq's
from your current search context to the auto-complete. I haven't entirely
thought through a possible way to do all that.
But hopefully this gives you some clues to think about it.
Jonathan
________________________________________
From: David Yang [[email protected]]
Sent: Friday, September 10, 2010 11:14 AM
To: [email protected]
Subject: Autocomplete with Filter Query
Hi,
Is there any way to provide autocomplete while filtering results?
Suppose I had a bunch of people and each person has multiple
occupations. When I select 'Assistant' in a filter box, it would be nice
if autocomplete only provides assistant names, instead of all names. The
other issue is that I use DisMax to do my search (name, title, phone
number etc) - so it might be more complex to do autocomplete. I could
have a copy field to copy all dismax terms into one big field.
Cheers,
David