Re: Don't snowball depending on terms

Erick Erickson Wed, 30 Nov 2011 06:30:57 -0800

Ahhh, I hate making a new implementation match all of the old behavior, but
sometimes ya' just got no choice.


I *swear* that there's a JIRA with an approach to creating a filter for
this situation, but I can't find it....

Best
Erick

On Wed, Nov 30, 2011 at 9:19 AM, Robert Brown <r...@intelcompute.com> wrote:
> Thanks Erick,
>
> This is a required feature since we're swapping out an existing search
> engine for Solr - users have saved searches that need to behave the
> same.
>
> I'll look into the edismax stuff, that's the handler we're using
> anyway.
>
>
>
> ---
>
> IntelCompute
> Web Design & Local Online Marketing
>
> http://www.intelcompute.com
>
> On Wed, 30 Nov 2011 09:12:11 -0500, Erick Erickson
> <erickerick...@gmail.com> wrote:
>> First, watch the syntax <G>....
>>
>> q=+(stemmed:perl^2 OR stemmed:java^3) +unstemmed:"development manager"^5
>> The boost goes on the term, not the field name. It's a bit confusing,
>> since the dismax stuff does put the boost on the field name, but
>> that's not how the queries themselves are formed.
>>
>> BTW, have you looked at edismax queries? You can distribute your terms
>> across the fields, applying whatever boost you want, and have the query
>> input be pretty simple. It takes a bit to get your head around what
>> edismax does, but it's worth it....
>>
>> But before you go there.... You've presented no evidence that this is
>> desirable. What is the use-case here? You say "users may want"... Well,
>> why do the work unless they *do* want this capability? I'd strongly
>> advise that you just forget about this feature unless and until there's
>> a demonstrated need. Here's a blog post I wrote at Lucid. Long-winded,
>> but I'm like that sometimes....
>>
>> http://www.lucidimagination.com/blog/2011/11/03/stop-being-so-agreeable/
>>
>> Best
>> Erick
>>
>>
>> On Wed, Nov 30, 2011 at 8:50 AM, Robert Brown <r...@intelcompute.com> wrote:
>>> Boosts can be included there too, can't they?
>>>
>>> So is this valid?
>>>
>>> q=+(stemmed^2:perl or stemmed^3:java) +unstemmed^5:"development manager"
>>>
>>> Is it possible to have different boosts on the same field, btw?
>>>
>>> We currently search across 5 fields anyway, so my queries are gonna
>>> start getting messy.  :-/
>>>
>>>
>>> ---
>>>
>>> IntelCompute
>>> Web Design & Local Online Marketing
>>>
>>> http://www.intelcompute.com
>>>
>>> On Wed, 30 Nov 2011 08:08:41 -0500, Erick Erickson
>>> <erickerick...@gmail.com> wrote:
>>>> You can't have multiple "q" clauses (as opposed to "fq" clauses).
>>>> You could form something like
>>>> q=unstemmed:perl or java&fq=stemmed:manager
>>>> or
>>>> q=+(unstemmed:perl or java) +stemmed:manager
>>>>
>>>> BTW, this fragment of the query probably doesn't do
>>>> what you expect:
>>>> unstemmed:perl or java
>>>> would be parsed as
>>>> unstemmed:perl OR default_search_field:java
>>>>
>>>> FWIW
>>>> Erick
>>>>
>>>> On Wed, Nov 30, 2011 at 7:39 AM, Rob Brown <r...@intelcompute.com> wrote:
>>>>> I guess I could do a bit of pre-processing: look for any words that are
>>>>> quoted, and search in a different field for those.
>>>>>
>>>>> How is a query like this formulated?
>>>>>
>>>>> q=unstemmed:perl or java&q=stemmed:manager
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> IntelCompute
>>>>> Web Design and Online Marketing
>>>>>
>>>>> http://www.intelcompute.com
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Tomas Zerolo <tomas.zer...@axelspringer.de>
>>>>> Reply-to: solr-user@lucene.apache.org
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Don't snowball depending on terms
>>>>> Date: Wed, 30 Nov 2011 08:49:37 +0100
>>>>>
>>>>> On Tue, Nov 29, 2011 at 01:53:44PM -0500, François Schiettecatte wrote:
>>>>>> It won't, and depending on how your analyzer is set up, the terms are
>>>>>> most likely stemmed at index time.
>>>>>>
>>>>>> You could create a separate field for unstemmed terms though, or use a
>>>>>> less aggressive stemmer such as EnglishMinimalStemFilterFactory.
>>>>>
>>>>> This is surprising to me. Snowball introduces new homonyms, meaning it
>>>>> will lump e.g. "management" and "manage" into one index entry. Thus,
>>>>> I'd expect a handful of "false positives" (but usually not too many).
>>>>>
>>>>> That's a "lossy index" (loosely speaking) and could be fixed by
>>>>> post-filtering (instead of introducing another index, which in
>>>>> most cases would seem a waste of resources).
>>>>>
>>>>> Is there no way in SOLR of filtering the results *after* the index
>>>>> scan? I'd be disappointed!
>>>>>
>>>>> Regards
>>>>> -- tomás
>>>>>
>>>
>
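
For anyone picking this thread up later: the pre-processing Rob describes
(quoted phrases go to the unstemmed field, everything else to the stemmed
field) might look roughly like the sketch below. The field names `stemmed`
and `unstemmed` are the ones used in the thread; the helper name
`build_query`, the choice to OR the bare terms while requiring each phrase,
and the lack of any escaping of special query characters are all assumptions
made for illustration, not anything Solr provides.

```python
import re

# Hypothetical helper, not part of Solr. It splits user input into quoted
# phrases and bare terms and builds a Lucene-syntax query string.
# Assumption: input contains no special query characters that need escaping.
_PHRASE = re.compile(r'"([^"]+)"')

def build_query(user_input, stemmed="stemmed", unstemmed="unstemmed"):
    phrases = _PHRASE.findall(user_input)
    # Drop the quoted parts, then split what is left on whitespace.
    remainder = _PHRASE.sub(" ", user_input)
    # Ignore literal and/or words typed by the user; operators are added below.
    terms = [t for t in remainder.split() if t.upper() not in ("OR", "AND")]

    clauses = []
    if terms:
        # Every term gets its own field prefix, so nothing falls through
        # to the default search field.
        ored = " OR ".join(f"{stemmed}:{t}" for t in terms)
        clauses.append(f"+({ored})")
    for phrase in phrases:
        clauses.append(f'+{unstemmed}:"{phrase}"')
    return " ".join(clauses)

print(build_query('perl or java "development manager"'))
```

Note that the operator is written as uppercase OR and that each term carries
its field name, which sidesteps the default-field surprise Erick points out
above.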

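As a rough illustration of the edismax suggestion: with edismax the boosts
live in the `qf` parameter as space-separated `field^boost` entries, and the
user's input stays simple. The sketch below only builds the request
parameters; the boost values 2 and 5 are borrowed from the example queries in
the thread, and the helper name `edismax_params` is made up. Keep in mind
that edismax spreads every term across all the `qf` fields, so on its own it
would not restrict quoted phrases to the unstemmed field.

```python
from urllib.parse import urlencode

def edismax_params(user_query, field_boosts):
    """Build Solr request parameters for an edismax query (hypothetical helper)."""
    # qf takes space-separated "field^boost" entries.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"defType": "edismax", "q": user_query, "qf": qf}

params = edismax_params('perl java "development manager"',
                        {"stemmed": 2, "unstemmed": 5})
# URL-encoded form, ready to append to a select handler URL of your choice.
print(urlencode(params))
```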