Toke, the search query will contain 4-5 words on average (excluding stopwords).
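With 4-5 terms, the query explosion discussed in this thread stays small. It needs one clause per non-empty subset of the query terms, i.e. 2^n - 1 clauses: 15 for four terms, 31 for five. Below is a minimal Python sketch of building such a query by pairing each subset with Mike's `word_count` idea. It makes several assumptions. There is a hypothetical indexed `word_count` field that holds the number of *distinct* words per document. Bare terms go to the default search field, as in Mike's example. The terms need no query-syntax escaping. `build_subset_query` is a hypothetical helper, not a Solr API.

```python
from itertools import combinations


def build_subset_query(terms, count_field="word_count"):
    """Hypothetical helper: OR together one clause per non-empty subset.

    Each clause requires every term in the subset AND a document word
    count equal to the subset size. A document therefore matches only if
    its set of words equals some subset of the query, i.e. every word of
    the document appears in the query. Assumes `count_field` stores the
    number of distinct words in each document.
    """
    # Deduplicate while preserving order so no subset is generated twice.
    unique = list(dict.fromkeys(terms))
    clauses = []
    for size in range(1, len(unique) + 1):
        for subset in combinations(unique, size):
            required = " AND ".join(subset)
            clauses.append(f"({required} AND {count_field}:{size})")
    return " OR ".join(clauses)


query = build_subset_query(["samsung", "andriod", "GPS"])
print(query.count(" OR ") + 1)  # 7 clauses for 3 terms (2^3 - 1)
```

If `word_count` instead counted every token, a document with a repeated word (e.g. "samsung samsung") would be missed, because no subset has the matching size.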
Mike, I don't care about the result count. Excluding the terms on the client side may be a good idea. Is there any way to alter scoring so that the docs containing only the searched-for terms are shown first? Can I use term frequency to do this kind of thing?

--
Thanks
Varun Gupta

On Wed, Oct 27, 2010 at 7:13 PM, Mike Sokolov <soko...@ifactory.com> wrote:
> Yes, I missed that requirement (as Steven also pointed out in a private
> e-mail). I now agree that the combinatorics are required.
>
> Another possibility to consider (if the queries are large, which actually
> seems unlikely) is to use the default behavior where all terms are optional,
> sort by relevance, and truncate the result list on the client side after
> the first doc containing an unwanted term. I *think* the scoring should rank
> docs containing only the searched-for terms first, although if there are a
> lot of repeated terms maybe not? Also, result counts will be screwy.
>
> -Mike
>
>
> On 10/27/2010 09:34 AM, Toke Eskildsen wrote:
>
>> That does not work either, as it requires that all the terms in the query
>> are present in the document. The original poster did not state this
>> requirement. On the contrary, his examples were mostly single-word
>> matches, implying an OR-search at the core.
>>
>> The query explosion still seems like the only working idea. Maybe Varun
>> could comment on the maximum number of terms that his queries will
>> contain?
>>
>> Regards,
>> Toke Eskildsen
>>
>> On Wed, 2010-10-27 at 15:02 +0200, Mike Sokolov wrote:
>>
>>> Right - my point was to combine this with the previous approaches to
>>> form a query like:
>>>
>>> samsung AND android AND GPS AND word_count:3
>>>
>>> in order to exclude documents containing additional words. This would
>>> avoid the combinatoric explosion problem others had alluded to earlier.
>>> Of course this would fail because android is "mis-" spelled :)
>>>
>>> -Mike
>>>
>>> On 10/27/2010 08:45 AM, Steven A Rowe wrote:
>>>
>>>> I'm pretty sure the word-count strategy won't work.
>>>>
>>>>> If I search with the text "samsung andriod GPS", search results
>>>>> should only contain "samsung", "GPS", "andriod" and "samsung andriod".
>>>>
>>>> Using the word-count strategy, a document containing "samsung andriod
>>>> PDQ" would be a hit, but Varun doesn't want it, because it contains a word
>>>> that is not in the query.
>>>>
>>>> Steve
>>>>
>>>>> -----Original Message-----
>>>>> From: Michael Sokolov [mailto:soko...@ifactory.com]
>>>>> Sent: Wednesday, October 27, 2010 7:44 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: RE: How do I this in Solr?
>>>>>
>>>>> You might try adding a field containing the word count and making sure
>>>>> that it matches the query's word count?
>>>>>
>>>>> This would require you to tokenize the query and document yourself,
>>>>> perhaps.
>>>>>
>>>>> -Mike
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
>>>>>> Sent: Tuesday, October 26, 2010 11:26 PM
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Re: How do I this in Solr?
>>>>>>
>>>>>> Thanks everybody for the inputs.
>>>>>>
>>>>>> Looks like Steven's solution is the closest one, but it will lead
>>>>>> to performance issues when the query string has many terms.
>>>>>>
>>>>>> I will try to implement the two filters suggested by Steven
>>>>>> and see how the performance holds up.
>>>>>>
>>>>>> --
>>>>>> Thanks
>>>>>> Varun Gupta
>>>>>>
>>>>>> On Wed, Oct 27, 2010 at 8:04 AM, scott chu
>>>>>> <scott....@udngroup.com> wrote:
>>>>>>
>>>>>>> I think you have to write a "yet exact match" handler yourself (I mean
>>>>>>> "yet" because it's not quite the exact match we normally know). Steve's
>>>>>>> answer is quite near your request. You can do further work based on
>>>>>>> his solution.
>>>>>>>
>>>>>>> As a last step, I'd suggest you strip all whitespace from the query
>>>>>>> string and from each query result, and only return those results
>>>>>>> whose length equals the query string's length.
>>>>>>>
>>>>>>> For example, given:
>>>>>>> *query string = "Samsung with GPS"
>>>>>>> *query results:
>>>>>>> result 1 = "Samsung has lots of mobile with GPS"
>>>>>>> result 2 = "with GPS Samsung"
>>>>>>> result 3 = "GPS mobile with vendors, such as Sony, Samsung"
>>>>>>>
>>>>>>> they become:
>>>>>>> *query string = "SamsungwithGPS" (length = 14)
>>>>>>> *query results:
>>>>>>> result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
>>>>>>> result 2 = "withGPSSamsung" (length = 14)
>>>>>>> result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)
>>>>>>>
>>>>>>> so result 2 matches your request.
>>>>>>>
>>>>>>> This way, you avoid a lot of work dealing with case sensitivity and
>>>>>>> word-order rearrangement. Furthermore, you can refine it, for example
>>>>>>> by removing other whitespace characters.
>>>>>>>
>>>>>>> Scott @ Taiwan
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "Varun Gupta" <varun.vgu...@gmail.com>
>>>>>>> To: <solr-user@lucene.apache.org>
>>>>>>> Sent: Tuesday, October 26, 2010 9:07 PM
>>>>>>> Subject: How do I this in Solr?
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a lot of small documents (each containing 1 to 15 words) indexed
>>>>>>>> in Solr. For the search query, I want the search results to contain
>>>>>>>> only those documents that satisfy this criterion: "All of the words of
>>>>>>>> the search result document are present in the search query".
>>>>>>>>
>>>>>>>> For example:
>>>>>>>> If I have the following documents indexed: "nokia n95", "GPS",
>>>>>>>> "android", "samsung", "samsung andriod", "nokia andriod", "mobile with GPS"
>>>>>>>>
>>>>>>>> If I search with the text "samsung andriod GPS", search results
>>>>>>>> should only contain "samsung", "GPS", "andriod" and "samsung andriod".
>>>>>>>>
>>>>>>>> Is there a way to do this in Solr?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks
>>>>>>>> Varun Gupta
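Below is a minimal Python sketch of the criterion from the original question. It could serve as a client-side reference check on whatever Solr returns. Lowercasing and whitespace splitting are assumptions that stand in for Solr's analyzer. `words`, `is_subset_match`, `squashed_length` and `squashed_length_match` are hypothetical helpers. The last one mirrors Scott's strip-whitespace-and-compare-lengths idea, which shows where that heuristic diverges from the actual requirement.

```python
def words(text):
    """Lowercase and split on whitespace (a stand-in for Solr's analyzer)."""
    return set(text.lower().split())


def is_subset_match(doc, query):
    """Varun's criterion: every word of the document appears in the query."""
    return words(doc) <= words(query)


def squashed_length(text):
    """Length of the text after removing all whitespace."""
    return len("".join(text.split()))


def squashed_length_match(doc, query):
    """Scott's heuristic: equal length once whitespace is stripped."""
    return squashed_length(doc) == squashed_length(query)


docs = ["nokia n95", "GPS", "android", "samsung", "samsung andriod",
        "nokia andriod", "mobile with GPS"]
print([d for d in docs if is_subset_match(d, "samsung andriod GPS")])
# ['GPS', 'samsung', 'samsung andriod']

# The length heuristic accepts Scott's result 2, but it also accepts a
# document containing a word that is not in the query.
print(squashed_length_match("with GPS Samsung", "Samsung with GPS"))  # True
print(squashed_length_match("Samsung GPS kits", "Samsung with GPS"))  # True
print(is_subset_match("Samsung GPS kits", "Samsung with GPS"))        # False
```

The "android" document is not returned because the query spells it "andriod", as Mike pointed out.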