On Thu, May 29, 2008 at 9:44 PM, Tim Christensen <[EMAIL PROTECTED]> wrote:
> Yonik,
>
> Thank you for the response. You are correct, regular (non-accessory)
> products are boosted '2.0' at index time. However, both items, the non-iPod
> item and the iPod, would have received the initial boost on the same fields,
> since they are both non-accessory items.
>
> Is your comment still relevant in that context?

Yes. There's a bug somewhere that ended up boosting that document or field
much more than normal. The first thing is to determine whether it's in your
indexing code or in Solr. Is there a way for you to verify the exact data you
sent to Solr for that document (the exact XML, if that is what you are sending)?

-Yonik

> Tim
>
> On May 29, 2008, at 7:30 PM, Yonik Seeley wrote:
>
>> Field norms of un-boosted fields are normally less than 1 (it's a
>> factor that weights larger fields less).
>> The index-time boost is also multiplied into this factor, though.
>> Given that your first doc had a huge norm, it looks like the document
>> or field was boosted at index time?
>>
>> -Yonik
>>
>> On Thu, May 29, 2008 at 9:22 PM, Tim Christensen <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> This is my first post. I have been working with Lucene for about 4 weeks
>>> and Solr for just about 10 days. We are going to convert our site search
>>> over to Solr as soon as we figure out some of the nuances.
>>>
>>> As I was testing out the synonyms feature to decide how we could best use
>>> it, I searched for iPod (I know it is an example, but we actually sell
>>> them). I was shocked when the search results were nothing close to an
>>> iPod.
>>>
>>> Looking closer, I could see that the description had the word iPod in it,
>>> just once.
>>> With debug on, that fact is confirmed (this is the first result):
>>>
>>> <str name="id=502999430,internal_docid=6247">
>>> 152529.23 = (MATCH) fieldWeight(search_text:ipod in 6247), product of:
>>>   1.0 = tf(termFreq(search_text:ipod)=1)
>>>   3.7238584 = idf(docFreq=522)
>>>   40960.0 = fieldNorm(field=search_text, doc=6247)
>>> </str>
>>>
>>> Here is an explainOther for an actual iPod SKU (in the same search):
>>>
>>> <str name="otherQuery">id:650085488</str>
>>> <lst name="explainOther">
>>> <str name="id=650085488,internal_docid=6985">
>>> 1.0473351 = (MATCH) fieldWeight(search_text:ipod in 6985), product of:
>>>   3.0 = tf(termFreq(search_text:ipod)=9)
>>>   3.7238584 = idf(docFreq=522)
>>>   0.09375 = fieldNorm(field=search_text, doc=6985)
>>> </str>
>>>
>>> Apart from the higher term frequency, the only difference is 'fieldNorm',
>>> which I do not understand in the context of relevancy. Does this have to
>>> do with omitNorms in some way?
>>>
>>> On a related note, I also tried the dismax query with the following line
>>> in it:
>>>
>>> <str name="qf">search_text^0.5 brand^10.0 keywords^5.0 title^20.0
>>> sub_title^1.5 model^2.0 attribute^1.1</str>
>>>
>>> As an experiment I boosted the title a lot, since this is where the term
>>> iPod appears most. It had no effect; in fact, it was not even working.
>>> The title was not being used at all, just the search_text, even though I
>>> have it indexed.
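The explain output above is simply the product tf * idf * fieldNorm. The sketch below recomputes both scores from the numbers in that output. It is written in Python only for brevity; `tf` and `field_weight` are hypothetical helper names, and tf = sqrt(freq) is assumed from Lucene's DefaultSimilarity (it matches the 3.0 reported for termFreq=9). It makes plain that the 40960.0 fieldNorm, not tf or idf, decides the ranking.

```python
import math


def tf(freq: int) -> float:
    # DefaultSimilarity-style term frequency: square root of the raw count
    return math.sqrt(freq)


def field_weight(freq: int, idf: float, field_norm: float) -> float:
    # fieldWeight in the explain output is the product tf * idf * fieldNorm
    return tf(freq) * idf * field_norm


IDF_IPOD = 3.7238584  # idf(docFreq=522), copied from the explain output

# Doc 6247: a single occurrence of "ipod", but fieldNorm = 40960.0
odd_doc = field_weight(1, IDF_IPOD, 40960.0)
# Doc 6985 (the real iPod SKU): nine occurrences, fieldNorm = 0.09375
ipod_doc = field_weight(9, IDF_IPOD, 0.09375)

print(f"doc 6247: {odd_doc:.2f}")
print(f"doc 6985: {ipod_doc:.7f}")
print(f"ratio: {odd_doc / ipod_doc:.0f}x")
```

It prints roughly 152529.24 and 1.0473352, which match the explain values up to Lucene's 32-bit float rounding, and a ratio of about 145636. Nine occurrences only triple the iPod SKU's tf (1.0 to 3.0), nowhere near enough to offset a norm that is over 400,000 times larger.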
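Yonik's point that an un-boosted norm is normally below 1, and that the index-time boost is multiplied into it, can be checked with a small model. This is a sketch assuming Lucene 2.x's DefaultSimilarity (lengthNorm = 1/sqrt(number of terms)) and the one-byte SmallFloat.floatToByte315 encoding Lucene uses to store norms. The function names are hypothetical, and the code mirrors that encoding rather than calling Lucene. Because a norm is kept in a single byte, only boost * lengthNorm survives, truncated to a 3-bit mantissa, which is why explain values such as 0.09375 and 40960.0 come out as "round" numbers.

```python
import math
import struct

# 3 mantissa bits, zero exponent 15 (the "315" in floatToByte315)
FZERO = (63 - 15) << 3


def float_bits(f: float) -> int:
    # Raw IEEE-754 single-precision bits as a signed 32-bit int
    # (the equivalent of Java's Float.floatToRawIntBits)
    return struct.unpack(">i", struct.pack(">f", f))[0]


def bits_float(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def encode_norm(f: float) -> int:
    """Lossy 8-bit encoding modelled on SmallFloat.floatToByte315 (returns 0..255)."""
    bits = float_bits(f)
    small = bits >> (24 - 3)
    if small <= FZERO:
        return 0 if bits <= 0 else 1  # underflow: zero, or the smallest positive value
    if small >= FZERO + 0x100:
        return 255  # overflow: clamp to the largest representable value
    return small - FZERO


def decode_norm(b: int) -> float:
    """Inverse of encode_norm, modelled on SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return bits_float(bits)


def field_norm(num_terms: int, boost: float = 1.0) -> float:
    # Hypothetical helper: boost * 1/sqrt(num_terms), round-tripped through one byte
    raw = boost * (1.0 / math.sqrt(num_terms))
    return decode_norm(encode_norm(raw))


print(field_norm(100))             # unboosted 100-term field: 0.09375
print(field_norm(100, boost=2.0))  # same field with a 2.0 index-time boost: 0.1875
```

Under this model lengthNorm is never above 1, so a single 2.0 index-time boost caps the norm at 2.0, far below the 40960.0 seen for doc 6247; some much larger multiplier would have to be reaching the search_text norm.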
>>>
>>> Here are the relevant schema parts:
>>>
>>> <field name="id" type="string" indexed="true" stored="true" required="true" />
>>> <field name="brand" type="string" indexed="true" stored="true" />
>>> <field name="model" type="string" indexed="true" stored="true" />
>>> <field name="manufacturer_model" type="string" indexed="true" stored="true" />
>>> <field name="keywords" type="string" indexed="true" stored="false" />
>>> <field name="title" type="string" indexed="true" stored="true" />
>>> <field name="sub_title" type="string" indexed="true" stored="true" />
>>> <field name="attribute" type="string" indexed="true" stored="true"
>>>        multiValued="true" />
>>> <field name="type" type="string" indexed="true" stored="true" />
>>> <field name="description_category" type="string" indexed="true" stored="true" />
>>> <field name="description" type="string" indexed="true" stored="true" />
>>> <field name="brand_id" type="string" indexed="false" stored="true" />
>>> <field name="code" type="string" indexed="false" stored="true" />
>>> <field name="color" type="string" indexed="true" stored="true" />
>>> <field name="description_category_id" type="string" indexed="false" stored="true" />
>>> <field name="display_price" type="sfloat" indexed="false" stored="true" />
>>> <field name="line_item_price" type="sfloat" indexed="true" stored="true" />
>>> <field name="main_category" type="string" indexed="true" stored="true" />
>>> <field name="main_category_id" type="string" indexed="false" stored="true" />
>>> <field name="regular_price" type="sfloat" indexed="false" stored="true" />
>>> <field name="sku" type="string" indexed="true" stored="true" />
>>> <field name="type_id" type="string" indexed="false" stored="true" />
>>> <field name="upc" type="string" indexed="true" stored="true" />
>>> <field name="size" type="string" indexed="true" stored="true" />
>>> <field name="search_text" type="text" indexed="true" stored="false"
>>>        multiValued="true" termVectors="true"/>
>>>
>>> <defaultSearchField>search_text</defaultSearchField>
>>>
>>> <copyField source="brand" dest="search_text"/>
>>> <copyField source="model" dest="search_text"/>
>>> <copyField source="manufacturer_model" dest="search_text"/>
>>> <copyField source="keywords" dest="search_text"/>
>>> <copyField source="title" dest="search_text"/>
>>> <copyField source="sub_title" dest="search_text"/>
>>> <copyField source="attribute" dest="search_text"/>
>>> <copyField source="description_category" dest="search_text"/>
>>> <copyField source="type" dest="search_text"/>
>>> <copyField source="description" dest="search_text"/>
>>> <copyField source="main_category" dest="search_text"/>
>>> <copyField source="sku" dest="search_text"/>
>>> <copyField source="upc" dest="search_text"/>
>>>
>>> Thanks to all who are willing to take a look at this and help.
>>>
>>> ----------------------------------------------------
>>> Tim Christensen
>>> Director Media & Technology
>>> Vann's Inc.
>>> 406-203-4656
>>>
>>> [EMAIL PROTECTED]
>>>
>>> http://www.vanns.com
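Following Yonik's suggestion to check the exact XML sent to Solr, a small script can list every boost attribute in an update message before it is posted. This is a sketch assuming the documents go through Solr's XML update format (`<add><doc boost="..."><field name="..." boost="...">`). `find_boosts` and the `SAMPLE` payload are hypothetical: only the id is taken from this thread, while the boost values and field text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the id is from the thread, boosts and text are invented
SAMPLE = """<add>
  <doc boost="2.0">
    <field name="id">502999430</field>
    <field name="title" boost="2.0">Example accessory title</field>
    <field name="description">... mentions iPod once ...</field>
  </doc>
</add>"""


def find_boosts(update_xml: str):
    """Return (doc_id, field_name or None, boost) for every boost in an <add> message."""
    root = ET.fromstring(update_xml)
    found = []
    for doc in root.iter("doc"):
        id_field = doc.find("field[@name='id']")
        doc_id = id_field.text if id_field is not None else None
        # Document-level boost (field_name is None)
        if "boost" in doc.attrib:
            found.append((doc_id, None, float(doc.attrib["boost"])))
        # Field-level boosts, in document order
        for field in doc.findall("field"):
            if "boost" in field.attrib:
                found.append((doc_id, field.get("name"), float(field.attrib["boost"])))
    return found


for doc_id, field, boost in find_boosts(SAMPLE):
    target = "document" if field is None else f"field {field}"
    print(f"id={doc_id}: {target} boost={boost}")
```

Running it against the real payload for id 502999430, and comparing that with the one for the iPod SKU 650085488, would show whether an unexpected boost is already present in the XML (pointing at the indexing code) or whether the XML looks normal (pointing at Solr).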