Re: Problems with WordDelimiterFilterFactory

Christian Zambrano Thu, 08 Oct 2009 15:46:29 -0700

Bern,

The only way that could be happening is if you are not using the field type you described in your original e-mail. The WordDelimiterFilterFactory token filter should take care of the hyphen.


On 10/08/2009 05:30 PM, Bernadette Houghton wrote:

Thanks for this, Patrick. If I remove one of the hyphens, Solr doesn't throw
the error, but it still doesn't find the right record. I see from marklo's
analysis page that Solr is still parsing it with a hyphen. Changing this part
of our schema.xml -

         <filter class="solr.PatternReplaceFilterFactory"
                 pattern="([^a-z])" replacement="" replace="all"
         />

To

         <filter class="solr.PatternReplaceFilterFactory"
                 pattern="([^a-z])" replacement=" " replace="all"
         />

i.e. replacing non-alpha chars with a space, looks like it may handle that 
aspect.
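
A rough way to see the difference between the two replacement values outside Solr is sketched below. It assumes the filter sees the whole lowercased value as a single token, and it uses Python's re module as a stand-in for the Java regex (the character class behaves the same for this input). The function name pattern_replace is made up for the illustration.

```python
import re

# Same character class as the schema.xml filter: anything outside a-z.
NON_ALPHA = re.compile(r"([^a-z])")


def pattern_replace(value, replacement):
    """Replace every non a-z character, mimicking replace="all" (hypothetical helper)."""
    return NON_ALPHA.sub(replacement, value)


lowered = "asia -- civilization"
print(repr(pattern_replace(lowered, "")))   # 'asiacivilization'
print(repr(pattern_replace(lowered, " ")))  # 'asia    civilization'
```

With the empty replacement the two words are fused into one; with a space they stay apart, separated by four spaces (one for each replaced character).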

Regards
Bern

-----Original Message-----
From: Patrick Jungermann [mailto:[email protected]]
Sent: Friday, 9 October 2009 9:03 AM
To: [email protected]
Subject: Re: Problems with WordDelimiterFilterFactory

Hi Bern,

the problem is the character sequence "--". A query is not allowed to
contain two or more consecutive minus characters. Remove one minus
character and the query will be parsed without problems.

Because of this parsing problem, I'd recommend cleaning up the query
before submitting it to the Solr server, replacing each sequence of
minus characters with a single one.
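
A minimal sketch of such a cleanup step, assuming it runs in the client application before the query string is sent to Solr. The function name collapse_minus_runs is hypothetical, and Python is used only for illustration.

```python
import re


def collapse_minus_runs(query):
    """Replace each run of two or more '-' characters with a single '-'."""
    return re.sub(r"-{2,}", "-", query)


print(collapse_minus_runs("(Asia -- Civilization AND status_i:(2))"))
# (Asia - Civilization AND status_i:(2))
```

This only avoids the parse error caused by "--"; it does not by itself decide how the remaining single hyphen is analysed.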


Regards, Patrick



Bernadette Houghton wrote:

Sorry, the last line was truncated -

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse
'(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 7.
Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ...
<WILDTERM> ... "[" ... "{" ... <NUMBER> ...

-----Original Message-----
From: Bernadette Houghton [mailto:[email protected]]
Sent: Friday, 9 October 2009 8:22 AM
To: '[email protected]'
Subject: RE: Problems with WordDelimiterFilterFactory

Here's the query and the error -

Oct 09 08:20:17  [debug] [196] Solr query string:    (Asia -- Civilization AND status_i:(2))
Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc
Oct 09 08:20:17  [error] Error on searching: "400" Status: org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- Civilization AND status_i:(2)) ': Encount

Bern

-----Original Message-----
From: Christian Zambrano [mailto:[email protected]]
Sent: Thursday, 8 October 2009 12:48 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Problems with WordDelimiterFilterFactory

Bern,

I am interested in the Solr query, in other words, the query that your
system sends to Solr.

Thanks,


Christian

On Oct 7, 2009, at 5:56 PM, Bernadette Houghton <[email protected]> wrote:

Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:30000601

Either scroll down and click one of the "television broadcasting --
asia" links, or type it in the Quick Search box.


TIA

bern

-----Original Message-----
From: Christian Zambrano [mailto:[email protected]]
Sent: Thursday, 8 October 2009 9:43 AM
To: [email protected]
Subject: Re: Problems with WordDelimiterFilterFactory

Could you please provide the exact URL of a query where you are
experiencing this problem?
e.g. (not URL-encoded): q=fieldName:"hot and cold: temperatures"
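
For reference, the URL-encoded form of that example parameter can be produced as sketched below. This uses Python's standard library purely as an illustration; fieldName is the placeholder from the example above, and build_query_string is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_query_string(query):
    """Return the URL-encoded 'q' parameter for a raw query string."""
    return urlencode({"q": query})


print(build_query_string('fieldName:"hot and cold: temperatures"'))
# q=fieldName%3A%22hot+and+cold%3A+temperatures%22
```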

On 10/07/2009 05:32 PM, Bernadette Houghton wrote:

We are having some issues with our solr parent application not
retrieving records as expected.

For example, if the input query includes a colon (e.g. hot and
cold: temperatures), the relevant record (which contains a colon in
the same place) does not get retrieved; if the input query does not
include the colon, all is fine. The same happens if the user searches
for a query containing hyphens, e.g. "asia - civilization", with the
qualifier that "asia-civilization" (no spaces either side of the
hyphen) works fine, whereas "asia - civilization" (spaces either side
of the hyphen) doesn't work.
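
One possible factor, stated here as an assumption rather than a diagnosis: characters such as ":" and "-" are part of the Lucene query parser syntax, so user input that reaches the parser unescaped may be read as syntax rather than as text. The sketch below backslash-escapes the query parser's special characters on the client side. The helper name escape_query_text is hypothetical, and Python is used only for illustration.

```python
import re

# Special characters of the Lucene query parser syntax:
# + - && || ! ( ) { } [ ] ^ " ~ * ? : \
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\]|&&|\|\|)')


def escape_query_text(text):
    """Prefix each special character (or && / ||) with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_query_text("hot and cold: temperatures"))
# hot and cold\: temperatures
print(escape_query_text("asia - civilization"))
# asia \- civilization
```

Escaping only keeps the parser from treating these characters as operators; whether the record is then found still depends on how the field's analyzer handles the resulting tokens.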

Our schema.xml contains the following -

     <fieldType name="text" class="solr.TextField"
                positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <!-- in this example, we will only use synonyms at query time
         <filter class="solr.SynonymFilterFactory"
                 synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
         -->
         <filter class="solr.ISOLatin1AccentFilterFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords.txt"/>
         <filter class="solr.WordDelimiterFilterFactory"
                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
                 catenateNumbers="1" catenateAll="0"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory"
                 protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.ISOLatin1AccentFilterFactory"/>
         <filter class="solr.SynonymFilterFactory"
                 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords.txt"/>
         <filter class="solr.WordDelimiterFilterFactory"
                 generateWordParts="1" generateNumberParts="1" catenateWords="0"
                 catenateNumbers="0" catenateAll="0"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory"
                 protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
     </fieldType>

Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: [email protected]
Email: [email protected]
Website: http://www.deakin.edu.au
Deakin University CRICOS Provider Code 00113B (Vic)
