Re: Search differences between solr 1.4.0 and 3.6.1

Jack Krupansky Wed, 28 Nov 2012 14:32:37 -0800

One change was that the default for autoGeneratePhraseQueries went from true to false. That means that RoC, which your word delimiter filter splits on the case change into Ro and C, is now queried as Ro OR C rather than as the phrase "Ro C".

Simply add autoGeneratePhraseQueries=true to your field type - no need to re-index.
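
As a rough sketch of where that attribute goes, assuming the field type is named "text" (the name, class and positionIncrementGap are not shown in the thread and are only placeholders for whatever schema.xml already declares):

```xml
<!-- Hypothetical field type declaration; only autoGeneratePhraseQueries is the point here. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <!-- existing <analyzer type="index"> and <analyzer type="query"> stay unchanged -->
</fieldType>
```

The attribute only affects how the query parser combines the tokens produced from one query term, which is why the indexed data does not need to change.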


-- Jack Krupansky

-----Original Message-----
From: Frederico Azeiteiro

Sent: Wednesday, November 28, 2012 12:31 PM
To: [email protected]
Subject: RE: Search differences between solr 1.4.0 and 3.6.1

Also, I'm having issues with searching for "RoC". It returns thousands of matches on 3.6.1 against just a few on Solr 1.4.0.

Looking at the analysis output I see no differences...

Should I add "RoC" to the protected keywords, or can I tweak something in the schema to achieve exact "RoC" matches?



-----Original Message-----
From: Frederico Azeiteiro [mailto:[email protected]]
Sent: Wednesday, November 28, 2012 5:19 PM
To: [email protected]
Subject: RE: Search differences between solr 1.4.0 and 3.6.1

Ok, I'll test that and let you know.

Is there some test I can easily do to confirm that it was really a side effect of the bug?


____________________________________________
Frederico Azeiteiro
Developer



-----Original Message-----
From: Jack Krupansky [mailto:[email protected]]
Sent: Wednesday, November 28, 2012 1:39 PM
To: [email protected]
Subject: Re: Search differences between solr 1.4.0 and 3.6.1

You need to add the generateNumberParts=1 attribute - assuming you actually want the number generated.

The fact that your schema worked in 1.4 was probably simply a side effect of this bug:

https://issues.apache.org/jira/browse/SOLR-1706
"wrong tokens output from WordDelimiterFilter depending upon options"
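
A minimal sketch of that change, based on the query-side filter from the original message below with only generateNumberParts flipped to "1". Whether the index-side filter should get the same change is an assumption to verify on the analysis page; editing the index analyzer only affects documents indexed afterwards, so it would require re-indexing.

```xml
<!-- Query analyzer filter from the original schema, with number parts enabled. -->
<filter class="solr.WordDelimiterFilterFactory"
        protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
        catenateNumbers="0" catenateWords="0" generateNumberParts="1"
        generateWordParts="1"/>
```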

-- Jack Krupansky

-----Original Message-----
From: Frederico Azeiteiro
Sent: Monday, November 26, 2012 9:06 AM
To: [email protected]
Subject: Search differences between solr 1.4.0 and 3.6.1

Hi,

While updating our Solr to 3.6.1 I noticed some differences in results when using search strings that combine letters and numbers.


For a text field defined as:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <charFilter class="solr.MappingCharFilterFactory"
              mapping="mapping-ISOLatin1Accent.txt"/>
  <filter class="solr.WordDelimiterFilterFactory"
          protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
          catenateNumbers="1" catenateWords="1" generateNumberParts="0"
          generateWordParts="1" stemEnglishPossessive="0"/>
</analyzer>

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" ignoreCase="true"
          expand="true" synonyms="synonyms.txt"/>
  <filter class="solr.WordDelimiterFilterFactory"
          protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
          catenateNumbers="0" catenateWords="0" generateNumberParts="0"
          generateWordParts="1"/>
</analyzer>

Searching for the string GAMES12 returns a lot of results on 3.6.1 that are not returned on 1.4.0.

It looks like WordDelimiterFilterFactory is behaving differently in 3.6.1: the numeric part of the keyword is being ignored and the search is performed using only GAMES.




Analysis output for 1.4.0:

org.apache.solr.analysis.WordDelimiterFilterFactory

{protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0, catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}


term position      1      2
term text          GAMES  12
term type          word   word
source start,end   0,5    5,7
payload





And for 3.6.1:



org.apache.solr.analysis.WordDelimiterFilterFactory

{protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0, catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1, catenateAll=0, catenateNumbers=0}


position        1
term text       GAMES
startOffset     0
endOffset       5
type            word
positionLength  1





Is this something that can be modified/fixed to return the same results?



Thank you.



Regards,

Frederico
