Toke, the search query will contain 4-5 words on average (excluding
stopwords).

Mike, I don't care about the result count. Excluding the terms on the client
side may be a good idea. Is there any way to alter scoring so that the
docs containing only the searched-for terms are shown first? Can I use term
frequency to do this kind of thing?
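
To make the client-side idea concrete, here is a minimal sketch in Python. It
assumes the results come back from a default OR query sorted by score, and it
uses plain lowercase/whitespace splitting as a stand-in for the real analyzer.
The function names (tokens, only_query_terms, filter_results) are made up for
illustration and are not part of Solr or any client library.

```python
def tokens(text):
    """Lowercase and split on whitespace (an assumed stand-in for the analyzer)."""
    return text.lower().split()


def only_query_terms(doc_text, query_terms):
    """True if every word in the document also appears in the query."""
    return set(tokens(doc_text)) <= query_terms


def filter_results(query, ranked_docs, stop_at_first_miss=False):
    """Keep docs made up only of query words.

    With stop_at_first_miss=True the list is truncated at the first doc
    containing an unwanted term, which relies on the ranking putting all
    wanted docs first.
    """
    query_terms = set(tokens(query))
    kept = []
    for doc in ranked_docs:
        if only_query_terms(doc, query_terms):
            kept.append(doc)
        elif stop_at_first_miss:
            break
    return kept


# Hypothetical ranked result list built from the documents in the original post.
docs = ["samsung andriod", "samsung", "GPS", "andriod",
        "nokia andriod", "mobile with GPS", "nokia n95", "android"]
print(filter_results("samsung andriod GPS", docs))
```

Filtering every returned doc does not depend on the ranking at all, while the
truncating variant only works if the scoring really does put the wanted docs
first, which is the part I am unsure about.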

--
Thanks
Varun Gupta

On Wed, Oct 27, 2010 at 7:13 PM, Mike Sokolov <soko...@ifactory.com> wrote:

> Yes I missed that requirement (as Steven also pointed out in a private
> e-mail).  I now agree that the combinatorics are required.
>
> Another possibility to consider (if the queries are large, which actually
> seems unlikely) is to use the default behavior where all terms are optional,
> sort by relevance, and truncate the result list on the client side after
> some unwanted term is found.  I *think* the scoring should find only docs
> with the searched-for terms first, although if there are a lot of repeated
> terms maybe not? Also result counts will be screwy.
>
> -Mike
>
>
> On 10/27/2010 09:34 AM, Toke Eskildsen wrote:
>
>> That does not work either as it requires that all the terms in the query
>> are present in the document. The original poster did not state this
>> requirement. On the contrary, his examples were mostly single-word
>> matches, implying an OR-search at the core.
>>
>> The query-explosion still seems like the only working idea. Maybe Varun
>> could comment on the maximum numbers of terms that his queries will
>> contain?
>>
>> Regards,
>> Toke Eskildsen
>>
>> On Wed, 2010-10-27 at 15:02 +0200, Mike Sokolov wrote:
>>
>>
>>> Right - my point was to combine this with the previous approaches to
>>> form a query like:
>>>
>>> samsung AND android AND GPS AND word_count:3
>>>
>>> in order to exclude documents containing additional words. This would
>>> avoid the combinatoric explosion problem others had alluded to earlier.
>>> Of course this would fail because android is "mis-" spelled :)
>>>
>>> -Mike
>>>
>>> On 10/27/2010 08:45 AM, Steven A Rowe wrote:
>>>
>>>
>>>> I'm pretty sure the word-count strategy won't work.
>>>>
>>>>
>>>>
>>>>
>>>>> If I search with the text "samsung andriod GPS", search results
>>>>> should only contain "samsung", "GPS", "andriod" and "samsung andriod".
>>>>>
>>>>>
>>>>>
>>>> Using the word-count strategy, a document containing "samsung andriod
>>>> PDQ" would be a hit, but Varun doesn't want it, because it contains a word
>>>> that is not in the query.
>>>>
>>>> Steve
>>>>
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Michael Sokolov [mailto:soko...@ifactory.com]
>>>>> Sent: Wednesday, October 27, 2010 7:44 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: RE: How do I this in Solr?
>>>>>
>>>>> You might try adding a field containing the word count and making sure
>>>>> that matches the query's word count?
>>>>>
>>>>> This would require you to tokenize the query and document yourself,
>>>>> perhaps.
>>>>>
>>>>> -Mike
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
>>>>>> Sent: Tuesday, October 26, 2010 11:26 PM
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Re: How do I this in Solr?
>>>>>>
>>>>>> Thanks everybody for the inputs.
>>>>>>
>>>>>> Looks like Steven's solution is the closest one but will lead
>>>>>> to performance issues when the query string has many terms.
>>>>>>
>>>>>> I will try to implement the two filters suggested by Steven
>>>>>> and see how the performance matches up.
>>>>>>
>>>>>> --
>>>>>> Thanks
>>>>>> Varun Gupta
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 27, 2010 at 8:04 AM, scott chu (???)
>>>>>> <scott....@udngroup.com>wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I think you have to write a "yet exact match" handler yourself (I mean
>>>>>>> "yet" because it's not quite the exact match we normally know). Steve's
>>>>>>> answer is quite near your request. You can do further work based on his
>>>>>>> solution.
>>>>>>>
>>>>>>> As the last step, I suggest you strip all blanks from the query
>>>>>>> string and from each query result, and only return those results
>>>>>>> that have the same string length as the query string.
>>>>>>>
>>>>>>> For example, given:
>>>>>>> *query string = "Samsung with GPS"
>>>>>>> *query results:
>>>>>>> result 1 = "Samsung has lots of mobile with GPS"
>>>>>>> result 2 = "with GPS Samsung"
>>>>>>> result 3 = "GPS mobile with vendors, such as Sony, Samsung"
>>>>>>>
>>>>>>> they become:
>>>>>>> *query string = "SamsungwithGPS" (length = 14)
>>>>>>> *query results:
>>>>>>> result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
>>>>>>> result 2 = "withGPSSamsung" (length = 14)
>>>>>>> result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)
>>>>>>>
>>>>>>> so result 2 matches your request.
>>>>>>>
>>>>>>> In this way, you can avoid a load of case-sensitivity and
>>>>>>> word-order-rearrangement work. Furthermore, you can do refined work,
>>>>>>> such as removing whitespace characters, etc.
>>>>>>>
>>>>>>> Scott @ Taiwan
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message ----- From: "Varun Gupta"
>>>>>>> <varun.vgu...@gmail.com>
>>>>>>>
>>>>>>> To:<solr-user@lucene.apache.org>
>>>>>>> Sent: Tuesday, October 26, 2010 9:07 PM
>>>>>>>
>>>>>>> Subject: How do I this in Solr?
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a lot of small documents (each containing 1 to 15 words) indexed
>>>>>>>> in Solr. For the search query, I want the search results to contain
>>>>>>>> only those documents that satisfy this criterion: "All of the words of
>>>>>>>> the search result document are present in the search query"
>>>>>>>>
>>>>>>>> For example:
>>>>>>>> If I have the following documents indexed: "nokia n95", "GPS",
>>>>>>>> "android", "samsung", "samsung andriod", "nokia andriod",
>>>>>>>> "mobile with GPS"
>>>>>>>>
>>>>>>>> If I search with the text "samsung andriod GPS", search results
>>>>>>>> should only contain "samsung", "GPS", "andriod" and "samsung andriod".
>>>>>>>>
>>>>>>>> Is there a way to do this in Solr?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks
>>>>>>>> Varun Gupta
