Re: Autocomplete with Filter Query

Peter Karich Fri, 10 Sep 2010 13:23:06 -0700

Hi there,

I don't know if my idea is perfect but it seems to work ok in my
twitter-search prototype:
http://www.jetwick.com
(keep in mind it is a vhost and only one fat index, no sharding, etc...
so performance isn't perfect ;-))


That said, type in 'so' and you will get 'soldier', 'solar', ... but
the suggestions are context sensitive:
e.g. search for 'timetabling' (shameless self-promotion), then type
'so' in the query box again and you will get a different suggestion: 'solr'.
The same context dependency (should ;-)) work for the filters on
the left side.

How does it work?

I am indexing only the relevant terms from the tweet into a special
'tag' field (removing a lot of noise words, whitespace and strange chars).
Then I am doing a faceted search on that field with
f.tag.facet.prefix=<query> and I get context sensitive suggestions,
because I can use fq as well.
Now if the query contains at least two terms I am splitting the query:
the last term goes to the facet prefix parameter and the first term(s)
go to q.

e.g. 'michael ja' => q=michael&f.tag.facet.prefix=ja and I will get back
'jackie', 'jackson'.
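
A rough sketch of the splitting in Python, in case it helps. This is not
the jetwick code: the function names, the 'lang:en' filter, the facet
limit and the lowercasing of the prefix are all assumptions made for
illustration. Only the field name 'tag' and the q / facet prefix / fq
split come from the description above.

```python
from urllib.parse import urlencode

TAG_FIELD = "tag"  # field name from the description above


def split_query(user_input):
    """Split typed text into (context terms, prefix being completed)."""
    terms = user_input.split()
    if not terms:
        return "", ""
    # The last term is the one still being typed; earlier terms are the context.
    # Lowercasing assumes the tag field is indexed in lowercase (an assumption).
    return " ".join(terms[:-1]), terms[-1].lower()


def suggest_params(user_input, filter_queries=()):
    """Build Solr request parameters for a facet-prefix suggestion request."""
    context, prefix = split_query(user_input)
    params = [
        ("q", context or "*:*"),  # no context terms yet: match all documents
        ("rows", "0"),            # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", TAG_FIELD),
        (f"f.{TAG_FIELD}.facet.prefix", prefix),
        ("facet.mincount", "1"),
        ("facet.limit", "10"),    # hypothetical number of suggestions
    ]
    # Active filters narrow the suggestions to the current search context.
    params.extend(("fq", fq) for fq in filter_queries)
    return params


# 'lang:en' is a made-up filter, used only to show where fq goes.
print(urlencode(suggest_params("michael ja", ["lang:en"])))
```

The suggestions are then the facet values returned for the 'tag' field,
ordered by count within the current q and fq.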

Regards,
Peter.


PS: jetwick even has a Google Instant-like feature: when you are
selecting a suggestion it will update the results ...
Google Instant is too disruptive in my opinion (results moving up and
down because of the changing number of suggestions),
but I am working on a less disruptive solution which doesn't hide the
first results.

> Cool idea! 
> I was suggesting a copy field because I want to provide autocomplete on
> any field that the dismax can search on - eg if dismax searches both
> name and phone, then when they start typing name or phone, I want it to
> give autocompletion there 
>
> So to get your idea clear are you suggesting a field like this:
>
> <field name="Autocomplete" multiValued="true" type="myngramsplitter"/>
> <copyField source="name" dest="Autocomplete"/>
> <copyField source="phone" dest="Autocomplete"/>
>
> And searching like this: 
> solr/core/select?q=Autocomplete:(dog wal)&fq=UserSelectedFilter
>
> On a related note: how do you deal with no exact ngram match, but some
> relevant ngrams? E.g. user types "dog wam" and it finds no ngrams with
> "dog wam" but there are ngrams for "dog wal" (for dog walking) - this is
> probably not too relevant though since mostly prefix suggestion should
> be enough.
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
> Sent: Friday, September 10, 2010 11:41 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Autocomplete with Filter Query
>
> I've been thinking about this too, and haven't come up with any GREAT
> way. But there are several possible ways, that will do different things,
> good or bad, depending on the nature of your data and exactly what you
> want to do.  So here are some ideas I've been thinking about, but not a
> ready made solution for you. 
>
> One thing first, the statement about "copy field to copy all dismax
> terms into one big field." doesn't exactly make sense. Copyfield is
> something that happens at index time, whereas dismax is only something
> that is used at query time.  Since it's only used at query time, just
> because you are using dismax for your main search, doesn't mean you have
> to use dismax for your autocomplete query.   The autocomplete query,
> that returns the things you're going to display in your auto-complete
> list, can be set up however you want.  (we are talking about an
> auto-complete list, not a "Google Instant" style autocomplete, right?
> The latter would introduce even more issues). 
>
> So, do you want the autocomplete to only match on the _entire query_ as
> entered, or do you want an autocomplete for each word?  For instance, if
> I enter "dog walking", should the autocomplete be autocompleting "dog
> walking" as a whole, or should it be autocompleting "walking" by the
> time I've typed in "dog walking"?  It's easier to set up to autocomplete
> on the whole phrase. 
>
> Next, though, you probably want autocomplete to complete on partial
> words, not just complete words. "Dog wal" should autocomplete to "dog
> walking". That introduces an extra kink too. But let's assume we want
> that. 
>
> So one idea. At index time, populate a field that will be used
> exclusively for auto-completing. Make this field actually
> _non-tokenizing_, probably a Text type but with the KeywordTokenizer
> (ie, the non-tokenizing tokenizer, heh).   So if you're indexing "dog
> walking", then the token in the field is actually "dog walking", not
> ["dog","walking"].   Next, normalize it by removing punctuation (because
> we probably don't want to consider punctuation for auto-completing), and
> maybe normalizing whitespace by collapsing any adjacent whitespace to a
> single space, and removing whitespace at beginning and end. So "   dog
> walking   " will index as "dog walking". (This matters more at query
> time than index time, but it is less confusing to do the same
> normalization at both points).  That can be done with a pattern-replace
> filter such as PatternReplaceFilterFactory.
>
> But now we've also got to n-gram expand it.  So if the term being
> indexed is "dog walking", we actually want to store ALL these terms in
> the index:
> "d"
> "do"
> "dog"
> "dog "
> "dog w"
> "dog wa"
> etc
>
> Ie, n-grams, but only expanded out from the front.  I believe you can
> use the EdgeNGramFilterFactory for this (at index time only, this one
> you don't want in your query-time analyzers).  Although I haven't
> actually tried the EdgeNGramFilterFactory with a non-tokenized field, I
> think it should work. This will expand the size of your index, hopefully
> not to a problematic degree. 
>
> Now, to actually do the auto-complete. At query time, take the whole
> thing the user has entered, and issue a query, with whatever fq's you
> want too, but use the "field" type query parser (NOT "dismax" or
> "lucene", because we don't want the query parser to pre-tokenize on
> whitespace, but not "raw" because we DO want to go through the
> query-time field analyzers), restricted to this autocomplete field
> you've created. One way to do this is:  << q={!field
> f=my_autocomplete_field}the user's query >> (url-encoded, naturally). 
>
> That's pretty much it, I think that should work, depending on the
> requirements of 'work'.  Although I haven't tried it yet. 
>
> Now, if you want the user's query to auto-complete match in the middle
> of your terms, things get a lot more complicated. Ie, if you want "walk"
> to auto-complete to "dog walking" too.  This won't do that.  Also, if
> you want some kind of stemming to happen in auto-complete, this won't do
> that either. And also, if you want to auto-complete not the entire
> phrase the user has typed in, but each white-space-separated word as
> they type it, this won't do THAT either.  Trying to get all those things
> to work becomes even more complicated -- especially with the requirement
> that you want to be able to apply the 'fq's from your current search
> context to the auto-complete.  I haven't entirely thought through a
> possible way to do all that. 
>
> But hopefully this gives you some clues to think about it. 
>
> Jonathan
> ________________________________________
> From: David Yang [dy...@nextjump.com]
> Sent: Friday, September 10, 2010 11:14 AM
> To: solr-user@lucene.apache.org
> Subject: Autocomplete with Filter Query
>
> Hi,
>
>
>
> Is there any way to provide autocomplete while filtering results?
> Suppose I had a bunch of people and each person has multiple
> occupations. When I select 'Assistant' in a filter box, it would be nice
> if autocomplete only provides assistant names, instead of all names. The
> other issue is that I use DisMax to do my search (name, title, phone
> number etc) - so it might be more complex to do autocomplete. I could
> have a copy field to copy all dismax terms into one big field.
>
>
>
> Cheers,
>
>
>
> David
>
