I believe that a great many Solr installations actually need a query expansion such as the one you describe below, with each textual field indexed in multiple forms: string, straight (whitespace/ideograms), stemmed, and phonetic.
Thanks to edismax, I think you could do the following expansion:
- 2.0 for a string match (same field only, complete value)
- 1.8 for a straight phrase match (same field only, using a slop)
- 1.5 for a straight match in the bag of words
- 1.3 for a stemmed match in the bag of words
- 1.1 for a phonetic match in the bag of words

I think you can do that with edismax: I am sure about the parameter
distribution, just not sure about the pf usage; this might need two
straight fields, which is quite cheap.

As others indicated, having the intelligence to recognize the terms
(e.g. Kate should be in name), or some user indication to do so, can make
things more precise but is rarely done.

Please note that this is just a suggestion. In particular, the parameters
really need some testing and adjustment, I think.

Paul

> Erick Erickson <mailto:erickerick...@gmail.com>
> 1 November 2015 07:40
> Yeah, that's actually a tough one. You have no control over what the
> user types, you have to try to guess what they meant.
>
> To do that right, you really have to have some meta-data besides what
> the user typed in, i.e. recognize "kate" and "winslet" are proper names
> and "movies" is something else and break up the query appropriately
> behind the scenes.
>
> edismax might help here. You could copyField for everything into a
> bag_of_words field then boost the name field quite high relative to the
> bag_of_words field. That way, and _assuming_ that the bag_of_words
> field had all three words, then the user at least gets something.
>
> You can also do some tricks with edismax and the "pf" parameters. That
> option automatically takes the input and makes a phrase out of it
> against the field, so you get better scores for, say, the name field if
> it contains the phrase "kate winslet". It doesn't help with the "kate
> winslet movies" case, though.
>
> On Sat, Oct 31, 2015 at 11:11 PM, Daniel Valdivia
>
> Daniel Valdivia <mailto:h...@danielvaldivia.com>
> 1 November 2015 07:11
> Perhaps
>
> q=name:("Kate AND Winslet")
>
> q=name:("Kate Winslet")
>
> Yangrui Guo <mailto:guoyang...@gmail.com>
> 1 November 2015 06:21
> Thanks for the reply. Putting the name: before the terms did the work. I
> just wanted to generalize the search query because users might be
> interested in querying Kate Winslet herself or her movies. If the user
> enters the query string "Kate Winslet movie", the query
> q=name:(Kate AND Winslet AND movie) will return nothing.
>
> Yangrui Guo
>
> On Saturday, October 31, 2015, Erick Erickson <erickerick...@gmail.com>
>
> Erick Erickson <mailto:erickerick...@gmail.com>
> 1 November 2015 05:27
> There are a couple of anomalies here.
>
> 1> kate AND winslet
> What does the query look like if you add &debug=true to the statement
> and look at the "parsed_query" section of the return? My guess is you
> typed "q=name:kate AND winslet" which parses as "q=name:kate AND
> default_search_field:winslet" and are getting matches you don't
> expect. You need something like "q=name:(kate AND winslet)" or
> "q=name:kate AND name:winslet". Note that if you're using edismax it's
> more complicated, but that should still honor the intent.
>
> 2> I have no idea why searching for "Kate Winslet" in quotes returns
> anything. I wouldn't expect it to unless you mean you typed in "q=kate
> winslet", which is searching against your default field, not the name
> field.
>
> Best,
> Erick
>
> Yangrui Guo <mailto:guoyang...@gmail.com>
> 1 November 2015 04:52
> Hi, today I found an interesting aspect of Solr. I imported IMDB data
> into Solr. IMDB puts the last name before the first name in its person
> name field, e.g. "Winslet, Kate". When I search for "Winslet Kate" with
> quotation marks I get the exact result. However, if I search for
> "Kate Winslet" or Kate AND Winslet, Solr seems to return all results
> containing either Kate or Winslet, which is similar to
> "Winslet Kate"~999999. From the user's perspective I certainly want Solr
> to treat Kate Winslet the same as Winslet Kate. Is there any way to make
> Solr score higher for terms in the same field?
>
> Yangrui
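
As a rough sketch of the boost distribution suggested at the top of this thread, the edismax request parameters could be assembled as follows. The field names (name_string, name_straight, name_stemmed, name_phonetic) are hypothetical, standing for one copy of the name field per analysis form, and the slop value is arbitrary. Putting the string field into pf for the complete-value match is an assumption, not something confirmed in the thread; looking at the parsed query with debug=true, as recommended above, would show whether it behaves as intended.

```python
from urllib.parse import urlencode

# Hypothetical field names: one copy of the name field per analysis form.
# Boost values are the ones suggested in the thread and need tuning.
QF_BOOSTS = {
    "name_straight": 1.5,  # straight match, bag of words
    "name_stemmed": 1.3,   # stemmed match, bag of words
    "name_phonetic": 1.1,  # phonetic match, bag of words
}
PF_BOOSTS = {
    "name_string": 2.0,    # assumption: whole input matched against the complete value
    "name_straight": 1.8,  # phrase match on the straight field, with slop
}


def boost_spec(boosts):
    """Render {field: boost} as the 'field^boost field^boost' syntax used by qf/pf."""
    return " ".join(f"{field}^{boost}" for field, boost in boosts.items())


def build_params(user_query, slop=2):
    """Build edismax request parameters for a raw user query string."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": boost_spec(QF_BOOSTS),
        "pf": boost_spec(PF_BOOSTS),
        "ps": str(slop),   # phrase slop applied to the pf fields
        "debug": "true",   # lets you inspect the parsed query
    }


if __name__ == "__main__":
    # Query string to append to the select handler URL of your own core.
    print(urlencode(build_params("Kate Winslet")))
```

The resulting query string would be appended to the select URL of whatever core holds the IMDB data; the weights are starting points only and need testing and adjustment.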