Re: [MASSMAIL]Re: High fieldNorm values causing really odd results

Chris Hostetter Thu, 14 May 2015 16:13:39 -0700

: Sorry for leaving the Solr version out in my previous email, I'm using 
: Solr 4.10.3 running on Centos7, with the following JRE: Oracle 
: Corporation OpenJDK 64-Bit Server VM (1.7.0_75 24.75-b04)


I can't reproduce this using Solr 4.10.3 (or 4.10.4 - misread your email the 
first time)

Are you certain you didn't *build* this index with a different Similarity 
configured? Or did you perhaps build it with an older version of Solr that 
might have had a bug in it?

Here's what i tried...

applied this patch to the example configs based on the fieldType you 
specified...

hossman@tray:~/lucene/lucene_solr_4_10_3_tag$ svn diff
Index: solr/example/solr/collection1/conf/schema.xml
===================================================================
--- solr/example/solr/collection1/conf/schema.xml       (revision 1679472)
+++ solr/example/solr/collection1/conf/schema.xml       (working copy)
@@ -46,6 +46,21 @@
 -->
 
 <schema name="example" version="1.5">
+
+        <fieldType name="hoss_type" class="solr.TextField" sortMissingLast="true">
+            <analyzer>
+                <charFilter class="solr.HTMLStripCharFilterFactory"/>
+                <tokenizer class="solr.StandardTokenizerFactory"/>
+                <filter class="solr.ASCIIFoldingFilterFactory"/>
+                <filter class="solr.StopFilterFactory"
+                    ignoreCase="true" words="stopwords.txt"/>
+                <filter class="solr.LowerCaseFilterFactory"/>
+                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
+            </analyzer>
+        </fieldType>
+
+        <field name="hoss_test" type="hoss_type" stored="true" indexed="true" multiValued="true"/>
+  
   <!-- attribute "name" is the name of this schema and is only used for display purposes.
        version="x.y" is Solr's version number for the schema syntax and 
        semantics.  It should not normally be changed by applications.

...started up Solr with "java -jar start.jar", then wrote & ran this script to 
generate a doc whose field has the number of unique terms you mentioned, and 
indexed it...

hossman@tray:~/tmp$ cat make-big-field.pl
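#!/usr/bin/perl
use strict;
use warnings;

# Emit a Solr <add> document whose hoss_test field holds 119669 unique terms.
print qq{<add><doc><field name="id">hoss</field><field name="hoss_test">\n};
for (1..119669) {
    print "term${_} ";
}
print qq{</field></doc></add>\n};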
hossman@tray:~/tmp$ perl make-big-field.pl > tmp.xml
hossman@tray:~/tmp$ curl -X POST -H 'Content-Type: application/xml' \
    --data-binary @tmp.xml \
    "http://localhost:8983/solr/collection1/update?commit=true"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">877</int></lst>
</response>


Then confirmed i got a very small fieldNorm when querying against this 
field...

hossman@tray:~/tmp$ curl \
    'http://localhost:8983/solr/collection1/select?q=hoss_test:term1&debug=results&wt=json&indent=true&fl=id&omitHeader=true'
{
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"hoss"}]
  },
  "debug":{
    "explain":{
      "hoss":"\n7.491524E-4 = (MATCH) weight(hoss_test:term1 in 0) [DefaultSimilarity], result of:\n  7.491524E-4 = fieldWeight in 0, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    0.30685282 = idf(docFreq=1, maxDocs=1)\n    0.0024414062 = fieldNorm(doc=0)\n"}}}
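
For anyone wondering where the numbers in that explain output come from, here 
is a rough sketch that recomputes them outside of Solr. It assumes 
DefaultSimilarity computes the length norm as 1/sqrt(numTerms) with no 
index-time boost, that all 119669 tokens are counted, and that the norm is 
squeezed into a single byte the way Lucene 4.x's SmallFloat.floatToByte315 / 
byte315ToFloat do it. The Python function names below are hypothetical ports 
written for illustration, not Lucene APIs, so treat the result as a sanity 
check rather than a reference implementation.

```python
import math
import struct


def to_float32(x):
    """Round a Python float to 32-bit precision, as a Java float would be."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def float_to_byte315(f):
    """Assumed port of SmallFloat.floatToByte315: lossy one-byte encoding."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> (24 - 3)
    zero = (63 - 15) << 3
    if small <= zero:
        # Underflow: zero/negative values map to 0, tiny positives to 1.
        return 0 if bits <= 0 else 1
    if small >= zero + 0x100:
        # Overflow: clamp to the largest encodable value.
        return 255
    return small - zero


def byte315_to_float(b):
    """Assumed port of SmallFloat.byte315ToFloat: decode the stored byte."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


num_terms = 119669  # unique terms generated by make-big-field.pl

# Length norm before encoding (assumes no boost and no discounted overlaps).
length_norm = to_float32(1.0 / math.sqrt(num_terms))

# What ends up in the index is one byte; decoding it loses precision.
stored_byte = float_to_byte315(length_norm)
field_norm = byte315_to_float(stored_byte)

# Classic TF-IDF pieces for a single match in a one-document index.
tf = math.sqrt(1.0)                          # tf(freq=1.0)
idf = to_float32(1.0 + math.log(1 / (1 + 1)))  # idf(docFreq=1, maxDocs=1)
score = tf * idf * field_norm

print("lengthNorm before encoding:", length_norm)
print("stored byte:", stored_byte)
print("fieldNorm after decoding:", field_norm)
print("idf:", idf)
print("score:", score)
```

Under those assumptions the raw norm of roughly 0.00289 decodes to 
0.00244140625, which lines up with the fieldNorm shown above, and the product 
tf * idf * fieldNorm lines up with the 7.491524E-4 score. A fieldNorm that is 
wildly larger than this for a field of similar length would point at a 
different Similarity (or different norms) having been used at index time.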


-Hoss
http://www.lucidworks.com/
