RE: Autocomplete with Filter Query

Jonathan Rochkind Fri, 10 Sep 2010 08:41:28 -0700

I've been thinking about this too, and haven't come up with any GREAT way. But 
there are several possible ways, that will do different things, good or bad, 
depending on the nature of your data and exactly what you want to do.  So here 
are some ideas I've been thinking about, but not a ready made solution for you.


One thing first, the statement about "copy field to copy all dismax terms into 
one big field." doesn't exactly make sense. Copyfield is something that happens 
at index time, whereas dismax is only something that is used at query time.  
Since it's only used at query time, just because you are using dismax for your 
main search, doesn't mean you have to use dismax for your autocomplete query.   
The autocomplete query, that returns the things you're going to display in your 
auto-complete list, can be set up however you want.  (we are talking about an 
auto-complete list, not a "Google Instant" style autocomplete, right?  The 
latter would introduce even more issues). 

So, do you want the autocomplete to only match on the _entire query_ as 
entered, or do you want an autocomplete for each word?  For instance, if I 
enter "dog walking", should the autocomplete be autocompleting "dog walking" as 
a whole, or should it be autocompleting "walking" by the time I've typed in 
"dog walking"?  It's easier to set up to autocomplete on the whole phrase. 

Next, though, you probably want autocomplete to complete on partial words, not 
just complete words. "Dog wal" should autocomplete to "dog walking". That 
introduces an extra kink too. But let's assume we want that. 

So one idea. At index time, populate a field that will be used exclusively for 
auto-completing. Make this field actually _non-tokenizing_, probably a Text 
type but with the KeywordTokenizer (ie, the non-tokenizing tokenizer, heh).   
So if you're indexing "dog walking", then the token in the field is actually 
"dog walking", not ["dog","walking"].   Next, normalize it by removing 
punctuation (because we probably don't want to consider punctuation for 
auto-completing), and maybe normalizing whitespace by collapsing any adjacent 
whitespace to a single space, and removing whitespace at beginning and end. So 
"   dog     walking   " will index as "dog walking". (This matters more at 
query time than at index time, but it's less confusing to do the same
normalization at both points).  That can be done with a pattern-replace filter
(PatternReplaceFilterFactory, I believe).
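
To make the normalization concrete, here is a rough Python sketch of the same steps (strip punctuation, collapse runs of whitespace, trim the ends). It is only an illustration of the intended behavior, not Solr configuration; the function name normalize and the choice of ASCII punctuation are my own assumptions.

```python
import re
import string

# Assumption: "punctuation" means ASCII punctuation only.
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")
_SPACES = re.compile(r"\s+")


def normalize(text):
    """Drop punctuation, collapse whitespace runs, and trim both ends."""
    no_punct = _PUNCT.sub("", text)
    return _SPACES.sub(" ", no_punct).strip()


print(normalize("   dog     walking   "))  # prints: dog walking
```

Whatever rules you pick, the point is that the index-time and query-time analyzers should apply the same ones.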

But now we've also got to n-gram expand it.  So if the term being indexed is 
"dog walking", we actually want to store ALL these terms in the index:
"d"
"do"
"dog"
"dog "
"dog w"
"dog wa"
etc

Ie, n-grams, but only expanded out from the front.  I believe you can use the 
EdgeNGramFilterFactory for this (at index time only, this one you don't want in 
your query-time analyzers).  Although I haven't actually tried the 
EdgeNGramFilterFactory with a non-tokenized field, I think it should work. This 
will expand the size of your index, hopefully not to a problematic degree. 
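
As a sketch of what that expansion produces, here is the front-anchored n-gram idea in plain Python. This is not the Solr filter itself; edge_ngrams, min_len and max_len are hypothetical names, loosely modeled on the filter's minimum and maximum gram size settings.

```python
def edge_ngrams(term, min_len=1, max_len=None):
    """Return every prefix of term whose length is in [min_len, max_len]."""
    if max_len is None:
        max_len = len(term)
    upper = min(max_len, len(term))
    return [term[:n] for n in range(min_len, upper + 1)]


grams = edge_ngrams("dog walking")
print(grams[:6])   # ['d', 'do', 'dog', 'dog ', 'dog w', 'dog wa']
print(len(grams))  # 11, one entry per character of the phrase
```

A phrase of N characters yields N terms, which is where the index growth comes from; capping max_len is one way to limit it, at the cost of no matches once the user types past the cap.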

Now, to actually do the auto-complete. At query time, take the whole thing the 
user has entered, and issue a query, with whatever fq's you want too, but use 
the "field" type query parser (NOT "dismax" or "lucene", because we don't want 
the query parser to pre-tokenize on whitespace, but not "raw" because we DO 
want to go through the query-time field analyzers), restricted to this 
autocomplete field you've created. One way to do this is:  << q={!field 
f=my_autocomplete_field}the user's query >> (url-encoded, naturally). 
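
For illustration, a request like that could be assembled as below. The base URL, the occupation:Assistant filter and the helper name autocomplete_url are all assumptions made up for the example; only the {!field f=...} syntax and the fq parameter come from the discussion above.

```python
from urllib.parse import urlencode


def autocomplete_url(base_url, field, user_input, filter_queries=()):
    """Build a URL-encoded query using the "field" query parser plus any fq's."""
    params = [("q", "{!field f=%s}%s" % (field, user_input))]
    params.extend(("fq", fq) for fq in filter_queries)
    return base_url + "?" + urlencode(params)


url = autocomplete_url(
    "http://localhost:8983/solr/select",  # hypothetical Solr endpoint
    "my_autocomplete_field",
    "dog wal",
    ["occupation:Assistant"],             # hypothetical filter from the search context
)
print(url)
```

The fq values are whatever filters are active in the user's current search, passed through unchanged.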

That's pretty much it, I think that should work, depending on the requirements 
of 'work'.  Although I haven't tried it yet. 
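
To show how the pieces fit together, here is a toy in-memory model of the whole approach: whole-phrase prefixes stored at "index time", an exact lookup of the typed text at "query time", and a filter applied to the hits. It is a sketch of the idea only and says nothing about how Solr implements it; the class, the tag-based filter and the sample names are all invented for the example.

```python
from collections import defaultdict


class ToyAutocompleteIndex:
    """Edge n-grams over whole phrases, with an optional tag filter."""

    def __init__(self):
        self.docs = {}                     # doc id -> (phrase, set of tags)
        self.by_prefix = defaultdict(set)  # prefix -> ids of docs having it

    def add(self, doc_id, phrase, tags):
        normalized = " ".join(phrase.split())  # collapse and trim whitespace
        self.docs[doc_id] = (normalized, set(tags))
        for n in range(1, len(normalized) + 1):
            self.by_prefix[normalized[:n]].add(doc_id)

    def suggest(self, typed, required_tag=None):
        key = " ".join(typed.split())
        hits = self.by_prefix.get(key, set())
        return sorted(
            self.docs[d][0]
            for d in hits
            if required_tag is None or required_tag in self.docs[d][1]
        )


index = ToyAutocompleteIndex()
index.add(1, "Ann Smith", ["Assistant"])
index.add(2, "Anna Jones", ["Manager"])
print(index.suggest("An"))               # ['Ann Smith', 'Anna Jones']
print(index.suggest("An", "Assistant"))  # ['Ann Smith']
```

Note the toy version is case-sensitive and matches only from the start of the phrase, which is the same limitation described next.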

Now, if you want the user's query to auto-complete match in the middle of your 
terms, things get a lot more complicated. Ie, if you want "walk" to 
auto-complete to "dog walking" too.  This won't do that.  Also, if you want 
some kind of stemming to happen in auto-complete, this won't do that either. 
And also, if you want to auto-complete not the entire phrase the user has typed 
in, but each whitespace-separated word as they type it, this won't do THAT 
either.  Trying to get all those things to work becomes even more complicated 
-- especially with the requirement that you want to be able to apply the 'fq's 
from your current search context to the auto-complete.  I haven't entirely 
thought through a possible way to do all that. 

But hopefully this gives you some clues to think about it. 

Jonathan
________________________________________
From: David Yang [dy...@nextjump.com]
Sent: Friday, September 10, 2010 11:14 AM
To: solr-user@lucene.apache.org
Subject: Autocomplete with Filter Query

Hi,

Is there any way to provide autocomplete while filtering results?
Suppose I had a bunch of people and each person has multiple
occupations. When I select 'Assistant' in a filter box, it would be nice
if autocomplete only provides assistant names, instead of all names. The
other issue is that I use DisMax to do my search (name, title, phone
number etc) - so it might be more complex to do autocomplete. I could
have a copy field to copy all dismax terms into one big field.

Cheers,

David
