If you can bring up your Solr setup on a public machine, then I'm sure a lot
of debugging can be done. Without that, I think what you should look at is
the tf-idf scores of terms like "camry". Usually idf is the deciding factor
in which results show up at the top (tf should be 1 for your data).
Enable &debugQuery=true and look at the explain section to see how the score
is being calculated.
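For reference, this is how Lucene 4.x's default similarity (DefaultSimilarity, i.e. classic TF-IDF) combines the factors you will see in the explain output. numDocs = 181 comes from your data set; the two docFreq values in the example are made up for illustration.

```latex
\begin{aligned}
\mathrm{tf}(t,d) &= \sqrt{\mathrm{freq}(t,d)} \\
\mathrm{idf}(t) &= 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1} \\
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\,\mathrm{queryNorm}(q)
  \sum_{t \in q} \mathrm{tf}(t,d)\,\mathrm{idf}(t)^2\,\mathrm{boost}(t)\,\mathrm{norm}(t,d) \\
\mathrm{idf}\big|_{\mathrm{docFreq}=1} &= 1 + \ln(181/2) \approx 5.51 \\
\mathrm{idf}\big|_{\mathrm{docFreq}=60} &= 1 + \ln(181/61) \approx 2.09
\end{aligned}
```

Because idf enters squared, a term found in only one document contributes roughly (5.51/2.09)^2, about 7 times, as much as one found in 60 documents at equal tf, boost and norm. That is one way a rare token can lift a result above closer matches. boost(t) is where per-field boosts from mlt.qf end up.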

You should try giving different boosts to the class, type, drive, and size
fields to control the results.
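As a starting point, here is a minimal sketch of a dedicated MoreLikeThis handler for solrconfig.xml that boosts those four fields. The handler name /mlt and the boost values are assumptions to tune against the debugQuery explain output, not known-good settings. mlt.mintf is lowered to 1 because Solr's default of 2 would discard any term that occurs only once in the source document, which is the case for every field here.

```xml
<!-- Hypothetical handler; boost values are illustrative only -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <!-- fields from which the "interesting terms" are taken -->
    <str name="mlt.fl">class,type,drive,size</str>
    <!-- per-field boosts applied to the generated query -->
    <str name="mlt.qf">class^4 type^3 size^2 drive^1</str>
    <!-- each value occurs once per document, so do not require tf >= 2 -->
    <str name="mlt.mintf">1</str>
  </lst>
</requestHandler>
```

With that in place, a request such as http://localhost:8983/solr/cars/mlt?q=id:2&fl=id,make,model,score should return the cars most similar to document 2, and changing the mlt.qf weights lets you compare rankings side by side.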


On Sun, Mar 31, 2013 at 8:52 PM, dc tech <dctech1...@gmail.com> wrote:

> I am running some experiments with MoreLikeThis and the results seem
> rather odd. I am doing something wrong but just cannot figure out what.
> Basically, the similarity results are decent, but not great.
>
> *Issue 1 = Quality*
> Toyota Camry: finds Altima (good), but the next one is Camry Hybrid,
> whereas it should have found Accord.
> I have normalized the data into a simi field which has only the attributes
> that I care about.
> Without the simi field, I could not get mlt.qf boosts to work well enough
> to return results.
>
> *Issue 2*
> Some fields do not work at all. For instance, text+simi (in mlt.fl) works,
> whereas just simi does not.
> So there is some weirdness that I am just not understanding.
>
> Would be grateful for your guidance!
>
>
> Here is the setup:
> *1. SOLR Version*
> solr-spec 4.2.0.2013.03.06.22.32.13
> solr-impl 4.2.0 1453694   rmuir - 2013-03-06 22:32:13
> lucene-spec 4.2.0
> lucene-impl 4.2.0 1453694 -  rmuir - 2013-03-06 22:25:29
>
> *2. Machine Information*
> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23 19.0-b09)
> Windows 7 Home 64 Bit with 4 GB RAM
>
> *3. Sample Data*
> I created this 'dummy' data of cars, the idea being that it would be
> sufficient and simple enough to generate similarity results and to
> understand how it works.
> There are 181 rows in the data set (I have attached it for reference in
> CSV format).
>
> *4. SCHEMA*
> *Field Definitions*
>    <field name="id" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>    <field name="make" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>    <field name="model" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>    <field name="class" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>    <field name="type" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>    <field name="drive" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>    <field name="comment" type="text_general" indexed="true" stored="true" termVectors="true" multiValued="true"/>
>    <field name="size" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> *Copy Fields*
> <copyField   source="make"     dest="make_en"   />  <!-- Search  -->
> <copyField   source="model"     dest="model_en"   />  <!-- Search  -->
> <copyField   source="class"     dest="class_en"   />  <!-- Search  -->
> <copyField   source="type"     dest="type_en"   />  <!-- Search  -->
> <copyField   source="drive"     dest="drive_en"   />  <!-- Search  -->
> <copyField   source="comment"     dest="comment_en"   />  <!-- Search  -->
> <copyField   source="size"     dest="size_en"   />  <!-- Search  -->
> <copyField   source="id"     dest="text"   />  <!-- Glob  -->
> <copyField   source="make"     dest="text"   />  <!-- Glob  -->
> <copyField   source="model"     dest="text"   />  <!-- Glob  -->
> <copyField   source="class"     dest="text"   />  <!-- Glob  -->
> <copyField   source="type"     dest="text"   />  <!-- Glob  -->
> <copyField   source="drive"     dest="text"   />  <!-- Glob  -->
> <copyField   source="comment"     dest="text"   />  <!-- Glob  -->
> <copyField   source="size"     dest="text"   />  <!-- Glob  -->
> <copyField   source="size"     dest="text"   />  <!-- Glob  -->
> <copyField   source="class"     dest="simi_en"   />  <!-- similarity  -->
> <copyField   source="type"     dest="simi_en"   />  <!-- similarity  -->
> <copyField   source="drive"     dest="simi_en"   />  <!-- similarity  -->
> <copyField   source="size"     dest="simi_en"   />  <!-- similarity  -->
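One thing worth checking for Issue 2: the copyField lines above write to simi_en, but the requests in section 5 ask for mlt.fl=simi. If no field literally named simi exists, MLT has no terms to build a query from, which would match the 0-results case, while simi,text still works through text. A minimal sketch, assuming you want to keep the name simi (the alternative is to switch the requests to mlt.fl=simi_en):

```xml
<!-- Hypothetical: declare the similarity field under the exact name used by mlt.fl / mlt.qf -->
<field name="simi" type="text_general" indexed="true" stored="true"
       termVectors="true" multiValued="true"/>

<!-- Point the copyFields at "simi" instead of "simi_en" -->
<copyField source="class" dest="simi"/>
<copyField source="type"  dest="simi"/>
<copyField source="drive" dest="simi"/>
<copyField source="size"  dest="simi"/>
```

The field has to be multiValued because four copyField sources feed it, and the data needs to be reindexed after the schema change.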
>
> Note that the "simi" field ends up with values made of class, type, drive
> and size, such as:
> - Luxury SUV 4WD Large
> - Standard Sedan Front Family
>
>
> *5. MLT Setup*
> a. mlt.fl = *text*, mlt.qf = *text*: works, but the results are obviously
> not good (make is not a good similarity indicator).
>
> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=text&mlt.qf=text
>
> b. mlt.fl = *simi*, mlt.qf = *simi*: does not work at all (0 results)
>
> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi&mlt.qf=simi
>
> c. mlt.fl = *simi,text*, mlt.qf = *simi^10 text^.1*: works with decent
> results in most cases
>
> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi,text&mlt.qf=simi^10%20text^.01
> Works for getting similarity for Acura MDX (Luxury SUV 4WD Large),
> but for Toyota Camry it finds hybrid family cars (Prius) ahead of Honda.
>
>