As for performance, you should take a look at the difference between:
- disjunction of K sorted arrays: O(n*k*log(k)) in Lucene, where *k* is the number of disjunction clauses and *n* is the average posting list size (I just learned this today from an expert Lucene committer)
- conjunction of K sorted arrays: I am not 100% sure about the complexity and would need to check the actual algorithm, but I suspect there is little or no difference (I would be glad if someone here could point to resources or share their knowledge).

Basically, when dealing with the union or intersection of sorted arrays, the algorithms that solve the two problems are quite comparable in terms of performance.
I would say that the performance difference is irrelevant, but I would like someone to contradict me.

Cheers

2015-07-15 17:34 GMT+01:00 Steven White <swhite4...@gmail.com>:

> Hi Erick,
>
> I understand there are variables that will impact ranking. However, if I
> leave my edismax setting as is and simply switch from AND to OR as the
> default Boolean, now if a user types "apples oranges" (without quotes) will
> the ranking be the same as when I had AND? Will the performance be the
> same as when I had AND as the default?
>
> Thanks
>
> Steve
>
> On Wed, Jul 15, 2015 at 12:26 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > This is really an apples/oranges comparison. They're essentially different
> > queries, and scores aren't comparable across different queries.
> >
> > If you're asking "if doc 1 and doc 2 are returned by defaulting to AND or
> > OR, are they in the same position relative to each other?" then I'm pretty
> > sure the answer is "you can't count on it". You'll match on different fields
> > depending on what the default is, and with boosting you just don't know.
> >
> > Best,
> > Erick
> >
> > On Wed, Jul 15, 2015 at 9:14 AM, Steven White <swhite4...@gmail.com>
> > wrote:
> > > By the way, using OR as the default, other than returning more results as
> > > more words are entered, the ranking and performance of the search remains
> > > the same right?
> > >
> > > Steve
> > >
> > > On Wed, Jul 15, 2015 at 12:12 PM, Steven White <swhite4...@gmail.com>
> > > wrote:
> > >
> > >> Thank you all. Looks like OR is a better choice vs. AND.
> > >>
> > >> Charles: I don't understand what you mean by the "spellcheck component".
> > >> Do you mean OR works best with spell checker?
> > >>
> > >> Steve
> > >>
> > >> On Wed, Jul 15, 2015 at 11:07 AM, Reitzel, Charles <
> > >> charles.reit...@tiaa-cref.org> wrote:
> > >>
> > >>> A common approach to this problem is to include the spellcheck component
> > >>> and, if there are corrections, include a "Did you mean ..." link in the
> > >>> results page.
> > >>>
> > >>> -----Original Message-----
> > >>> From: Walter Underwood [mailto:wun...@wunderwood.org]
> > >>> Sent: Wednesday, July 15, 2015 10:36 AM
> > >>> To: solr-user@lucene.apache.org
> > >>> Subject: Re: Which default Boolean operator to set, AND or OR?
> > >>>
> > >>> The AND default has one big problem. If the user misspells a single word,
> > >>> they get no results. About 10% of queries are misspelled, so that means a
> > >>> lot more failures.
> > >>>
> > >>> wunder
> > >>> Walter Underwood
> > >>> wun...@wunderwood.org
> > >>> http://observer.wunderwood.org/ (my blog)
> > >>>
> > >>>
> > >>> On Jul 15, 2015, at 7:21 AM, Jack Krupansky <jack.krupan...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> > It is simply precision (AND) vs. recall (OR) - the former tries to
> > >>> > limit the total result count, while the latter tries to focus on
> > >>> > relevancy of the top results even if the total result count is higher.
> > >>> >
> > >>> > Recall is good for discovery and browsing, where you sort of know what
> > >>> > you generally want, but not exactly with any great precision.
> > >>> >
> > >>> > Recall will include results that almost meet the query terms, but
> > >>> > maybe some are missing.
> > >>> >
> > >>> > Precision will guarantee and insist that all query terms are present.
> > >>> >
> > >>> > One great example for recall is a plagiarism query - enter all the
> > >>> > terms for a passage and then find documents that most closely
> > >>> > approximate the passage without being necessarily exact matches. IOW,
> > >>> > the plagiarizer changes a word here and there.
> > >>> >
> > >>> > -- Jack Krupansky
> > >>> >
> > >>> > On Wed, Jul 15, 2015 at 8:16 AM, Steven White <swhite4...@gmail.com>
> > >>> > wrote:
> > >>> >
> > >>> >> Hi Everyone,
> > >>> >>
> > >>> >> Out-of-the box, Solr (Lucene?) is set to use OR as the default
> > >>> >> Boolean operator. Can someone tell me the advantages / disadvantages
> > >>> >> of using OR or AND as the default?
> > >>> >>
> > >>> >> I'm leaning toward AND as the default because the more words a user
> > >>> >> types, the narrower the result set should be.
> > >>> >>
> > >>> >> Thanks
> > >>> >>
> > >>> >> Steve
> > >>> >>
> > >>>
> > >>>
> > >>> *************************************************************************
> > >>> This e-mail may contain confidential or privileged information.
> > >>> If you are not the intended recipient, please notify the sender
> > >>> immediately and then delete it.
> > >>>
> > >>> TIAA-CREF
> > >>> *************************************************************************
> > >>>
> > >>>
> > >>
> >

--
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"
William Blake - Songs of Experience -1794 England
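
P.S. To make the union vs. intersection comparison at the top of this mail a bit more concrete, here is a minimal Python sketch of the two operations on sorted doc-id lists. This is NOT Lucene's actual implementation, only an illustration under my own assumptions: the function names (union_sorted, intersect_sorted) and the sample posting lists are made up for the example. The union pushes every posting through a heap of size k, which is where a cost like n*k*log(k) comes from; the intersection repeatedly advances every list to the largest current doc id. I make no claim here about the exact complexity of the intersection.

```python
import bisect
import heapq


def union_sorted(postings):
    """Disjunction (OR): k-way merge of sorted doc-id lists, duplicates removed."""
    merged = []
    # heapq.merge keeps one cursor per list in a heap of size k,
    # so each emitted doc id costs O(log k).
    for doc_id in heapq.merge(*postings):
        if not merged or merged[-1] != doc_id:
            merged.append(doc_id)
    return merged


def intersect_sorted(postings):
    """Conjunction (AND): doc ids that appear in every sorted list."""
    if not postings or any(len(p) == 0 for p in postings):
        return []
    cursors = [0] * len(postings)
    matches = []
    while True:
        # The largest doc id currently under any cursor is the only
        # possible next match, so every list is advanced up to it.
        candidate = max(p[c] for p, c in zip(postings, cursors))
        all_match = True
        for i, p in enumerate(postings):
            # Binary search from the current cursor (hypothetical stand-in
            # for whatever skipping a real index would do).
            c = bisect.bisect_left(p, candidate, cursors[i])
            if c == len(p):
                return matches  # one list is exhausted: no more matches
            cursors[i] = c
            if p[c] != candidate:
                all_match = False
        if all_match:
            matches.append(candidate)
            for i, p in enumerate(postings):
                cursors[i] += 1
                if cursors[i] == len(p):
                    return matches


# Made-up posting lists for three query terms.
example_postings = [[1, 3, 5, 7, 9], [3, 4, 5, 9, 10], [2, 3, 9, 11]]
print("OR :", union_sorted(example_postings))
print("AND:", intersect_sorted(example_postings))
```

Note that the intersection can stop as soon as one list runs out, while the union always has to visit every posting, so in practice I would still measure both on real queries rather than trust my guess above.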