Re: Question about Edismax - Solr 4.0

Sandeep Mestry Fri, 17 May 2013 08:28:54 -0700

Hello Jack,

Thanks for pointing the issues out and for your valuable suggestion. My
preliminary tests were okay on search but I will be doing more testing to
see if this has impacted any other searches.

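For anyone who finds this thread later, here is a rough Python sketch of why the space mattered, as I understand Jack's explanation quoted below. It is a toy model only, not Solr's code: whitespace_tokenize, toy_word_delimiter and analyze are names I made up for illustration, and the real WordDelimiterFilter does much more (case-change splitting, catenation and so on).

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace only, so punctuation stays attached to its token."""
    return text.split()


def toy_word_delimiter(token, preserve_original):
    """Toy stand-in for the word delimiter filter (an assumption, not Solr code).

    Keeps the alphanumeric runs of the token and, when preserve_original is
    true, also keeps the untouched original token.
    """
    parts = re.findall(r"[A-Za-z0-9]+", token)
    terms = [token] if preserve_original else []
    for part in parts:
        if part not in terms:
            terms.append(part)
    return terms


def analyze(text, preserve_original):
    """Return the list of terms produced for each whitespace-separated token."""
    return [toy_word_delimiter(t, preserve_original)
            for t in whitespace_tokenize(text)]


print(analyze(",10", True))    # [[',10', '10']]
print(analyze(", 10", True))   # [[','], ['10']]
print(analyze(", 10", False))  # [[], ['10']]
```

With preserveOriginal on, ", 10" produces a standalone "," term next to "10". With q.op=AND as in my handler, that comma term has to match as well, and it will not unless a title was indexed with a standalone comma. With preserveOriginal off, the comma produces no term and only "10" is searched.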

Thanks once again and have a nice sunny weekend,
Sandeep

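P.S. On the URL encoding point raised further down: a small sketch of building the query string with Python's standard urllib.parse.urlencode, so the space inside the filter query is always encoded. The parameter values are the ones from my request quoted below; the variable names are only for illustration.

```python
from urllib.parse import urlencode

# A list of pairs (rather than a dict) lets the fq parameter repeat.
params = [
    ("q", "countryside"),
    ("rows", "20"),
    ("qt", "assdismax"),
    ("fq", "(title:(, 10))"),
    ("fq", "collection:assets"),
]

# urlencode percent-encodes reserved characters and sends the space as "+".
query_string = urlencode(params)
print(query_string)
```

The space between the comma and 10 goes out as "+", so it reaches Solr as part of the fq value and is not lost in transit.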

On 17 May 2013 05:35, Jack Krupansky <j...@basetechnology.com> wrote:

> Ah... I think your issue is the preserveOriginal=1 on the query analyzer
> as well as the fact that you have all of these catenatexx="1" options on
> the query analyzer - I indicated that you should remove them all.
>
> The problem is that the whitespace analyzer leaves the leading comma in
> place, and the preserveOriginal="1" also generates an extra token for the
> term, with the comma in place. But, with the space, the comma and "10" are
> separate terms and get analyzed independently.
>
> The query results probably indicate that you don't have that exact
> combination of the term and leading punctuation - or that there is no
> standalone comma in your input data.
>
> Try the following replacement for the query-time WDF:
>
>
> <filter class="solr.WordDelimiterFilterFactory"
> stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="0" catenateNumbers="0" catenateAll="0"
> splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="0" />
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Sandeep Mestry
> Sent: Thursday, May 16, 2013 5:50 PM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Question about Edismax - Solr 4.0
>
> Hi Jack,
>
> Thanks for your response again and for helping me out to get through this.
>
> The URL is definitely encoded for spaces and it looks like below. As I
> mentioned in my previous mail, I can't add it to query parameter as that
> searches on multiple fields.
>
> The title field is defined as below:
> <field name="title" type="text_wc" indexed="true" stored="false"
> multiValued="true"/>
>
> q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,
> 10%29%29&fq=collection:assets
>
> <requestHandler name="assdismax" class="solr.SearchHandler">
> <lst name="defaults">
> <str name="defType">edismax</str>
> <str name="echoParams">explicit</str>
> <float name="tie">0.01</float>
> <str name="qf">title^10 description^5 annotations^3 notes^2
> categories</str>
> <str name="pf">title</str>
> <int name="ps">0</int>
> <str name="q.alt">*:*</str>
> <str name="fl">*,score</str>
> <str name="mm">100%</str>
> <str name="q.op">AND</str>
> <str name="sort">score desc</str>
> <str name="facet">true</str>
> <str name="facet.limit">-1</str>
> <str name="facet.mincount">1</str>
> <str name="facet.field">uniq_subtype_id</str>
> <str name="facet.field">component_type</str>
> <str name="facet.field">genre_type</str>
> </lst>
> <lst name="appends">
> <str name="fq">collection:assets</str>
> </lst>
> </requestHandler>
>
> The term 'countryside' needs to be searched against multiple fields
> including titles, descriptions, annotations, categories, notes but the UI
> also has a feature to limit results by providing a title field.
>
>
> I can see that the filter queries are always parsed by LuceneQueryParser
> however I'd expect it to generate the parsed_filter_queries debug output in
> every situation.
>
> I have tried it as the main query with both edismax and lucene defType and
> it gives me correct output and correct results.
> But there is some problem when this is used as a filter query, as the
> parser is not able to parse a comma followed by a space.
>
> Thanks again Jack, please let me know in case you need more inputs from my
> side.
>
> Best Regards,
> Sandeep
>
> On 16 May 2013 18:03, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> Could you show us the full query URL - spaces must be encoded in URL query
>> parameters.
>>
>> Also show the actual field XML - you omitted that.
>>
>> Try the same query as a main query, using both defType=edismax and
>> defType=lucene.
>>
>> Note that the filter query is parsed using the Lucene query parser, not
>> edismax, independent of the defType parameter. But you don't have any
>> edismax features in your fq anyway.
>>
>> But you can stick {!edismax} in front of the query to force edismax to be
>> used for the fq, although it really shouldn't change anything:
>>
>> Also, catenate is fine for indexing, but will mess up your queries at
>> query time, so set them to "0" in the query analyzer
>>
>> Also, make sure you have autoGeneratePhraseQueries="true" on the field
>> type, but that's not the issue here.
>>
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Sandeep Mestry
>> Sent: Thursday, May 16, 2013 12:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Question about Edismax - Solr 4.0
>>
>>
>> Thanks Jack for your reply..
>>
>> The problem is, I'm finding results for fq=title:(,10) but not for
>> fq=title:(, 10) - apologies if that was not clear from my first mail.
>> I have already mentioned the debug analysis in my previous mail.
>>
>> Additionally, the title field is defined as below:
>> <fieldType name="text_wc" class="solr.TextField"
>> positionIncrementGap="100">
>>            <analyzer type="index">
>>                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>                <filter class="solr.WordDelimiterFilterFactory"
>> stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
>> catenateWords="1" catenateNumbers="1" catenateAll="1"
>> splitOnCaseChange="1"
>> splitOnNumerics="0" preserveOriginal="1" />
>>                <filter class="solr.LowerCaseFilterFactory"/>
>>            </analyzer>
>>            <analyzer type="query">
>>                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>                <filter class="solr.WordDelimiterFilterFactory"
>> stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
>> catenateWords="1" catenateNumbers="1" catenateAll="1"
>> splitOnCaseChange="1"
>> splitOnNumerics="0" preserveOriginal="1" />
>>                <filter class="solr.LowerCaseFilterFactory"/>
>>            </analyzer>
>>        </fieldType>
>>
>> I have set the catenate options to 1 for all types.
>> I can understand ',' being ignored when it is on its own (title:(, 10)), but
>> - Why is Solr not searching for 10 in that case, just as it did when the
>> query was (title:(,10))?
>> - And why did the other filter queries (collection:assets) not show up in
>> the debug section?
>>
>>
>> Thanks,
>> Sandeep
>>
>>
>> On 16 May 2013 13:57, Jack Krupansky <j...@basetechnology.com> wrote:
>>
>>> You haven't indicated any problem here! What is the symptom that you
>>> actually think is a problem?
>>>
>>> There is no comma operator in any of the Solr query parsers. Comma is just
>>> another character that may or may not be included or discarded depending on
>>> the specific field type and analyzer. For example, a white space analyzer
>>> will keep commas, but the standard analyzer or the word delimiter filter
>>> will discard them. If "title" were a "string" type, all punctuation would
>>> be preserved, including commas and spaces (but spaces would need to be
>>> escaped or the term text enclosed in parentheses.)
>>>
>>> Let us know what your symptom is though, first.
>>>
>>> I mean, the filter query looks perfectly reasonable from an abstract
>>> perspective.
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Sandeep Mestry
>>> Sent: Thursday, May 16, 2013 6:51 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Question about Edismax - Solr 4.0
>>>
>>> -- *Edismax and Filter Queries with Commas and spaces* --
>>>
>>>
>>> Dear Experts,
>>>
>>> This appears to be a bug, please suggest if I'm wrong.
>>>
>>> If I search with the following filter query,
>>>
>>> 1) fq=title:(, 10)
>>>
>>> - I get no results.
>>> - The debug output does NOT show the section containing
>>> parsed_filter_queries
>>>
>>> If I carry out a search with the filter query,
>>>
>>> 2) fq=title:(,10) - (No space between , and 10)
>>>
>>> - I get results and the debug output shows the parsed filter queries
>>> section as,
>>> <arr name="filter_queries">
>>> <str>(titles:(,10))</str>
>>> <str>(collection:assets)</str>
>>>
>>> As you can see above, I'm also passing in other filter queries
>>> (collection:assets) which appear correctly but they do not appear in case 1
>>> above.
>>>
>>> I can't make this part of the query parameter as that needs to be
>>> searched against multiple fields.
>>>
>>> Can someone suggest a fix in this case, please? I'm using Solr 4.0.
>>>
>>> Many Thanks,
>>> Sandeep
>>>
>>>
>>>
>>
>
