Thanks David - I suppose it is an AWS question and thank you for the pointers. 

As a further input to the MLT question - it does seem that 3.6 behavior is 
different from 4.2 - the issue seems to be more in terms of the raw query that 
is generated. 
I will do some more research and report back with details. 

David Parks <davidpark...@yahoo.com> wrote:

>Isn't this an AWS security groups question? You should probably post this 
>question on the AWS forums, but for the moment, here's the basic reading 
>material - go set up your EC2 security groups and lock down your systems.
>
>       http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html
>
>If you just want to password protect Solr here are the instructions:
>
>       http://wiki.apache.org/solr/SolrSecurity
>
>But I most certainly would not leave it open to the world even with a password 
>(note that the basic password authentication sends passwords in clear text if 
>you're not using HTTPS, best lock the thing down behind a firewall).
>
>Dave
>
>
>-----Original Message-----
>From: DC tech [mailto:dctech1...@gmail.com] 
>Sent: Tuesday, April 02, 2013 1:02 PM
>To: solr-user@lucene.apache.org
>Subject: Re: MoreLikeThis - Odd results - what am I doing wrong?
>
>OK - so I have my SOLR instance running on AWS. 
>Any suggestions on how to safely share the link?  Right now, the whole SOLR 
>instance is totally open. 
>
>
>
>Gagandeep singh <gagan.g...@gmail.com> wrote:
>
>>Add &debugQuery=true&mlt=true and look at the scores for the MLT query, not 
>>a sample query. You can use Amazon EC2 to bring up your Solr; you 
>>should be able to get a micro instance on the free trial.
>>
>>
>>On Mon, Apr 1, 2013 at 5:10 AM, dc tech <dctech1...@gmail.com> wrote:
>>
>>> I did try the raw query against the *simi* field and those seem to 
>>> return results in the order expected.
>>> For instance, Acura MDX has  ( large, SUV, 4WD   Luxury) in the simi field.
>>> Running a query with those words against the simi field returns the 
>>> expected models (X5, Audi Q5, etc) and then the subsequent documents 
>>> have decreasing relevance. So the basic query mechanism seems to be fine.
>>>
>>> The issue just seems to be with MoreLikeThis component and handler.
>>> I can post the index on a public SOLR instance - any suggestions? (or
>>> for hosting)
>>>
>>>
>>> On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh <gagan.g...@gmail.com> wrote:
>>>
>>> > If you can bring up your Solr setup on a public machine then I'm sure a
>>> > lot of debugging can be done. Without that, I think what you should look
>>> > at is the tf-idf scores of terms like "camry" etc. Usually idf is the
>>> > deciding factor in which results show at the top (tf should be 1 for
>>> > your data).
>>> > Enable &debugQuery=true and look at the explain section to see how the
>>> > score is calculated.
>>> >
>>> > You should try giving different boosts to class, type, drive, size 
>>> > to control the results.
>>> >
>>> >
>>> > On Sun, Mar 31, 2013 at 8:52 PM, dc tech <dctech1...@gmail.com> wrote:
>>> >
>>> >> I am running some experiments on more like this and the results 
>>> >> seem rather odd - I am doing something wrong but just cannot figure out 
>>> >> what.
>>> >> Basically, the similarity results are decent - but not great.
>>> >>
>>> >> *Issue 1  = Quality*
>>> >> Toyota Camry : finds Altima (good) but then next one is Camry 
>>> >> Hybrid whereas it should have found Accord.
>>> >> I have normalized the data into a simi field which has only the 
>>> >> attributes that I care about.
>>> >> Without the simi field, I could not get mlt.qf boosts to work well
>>> >> enough to return results.
>>> >>
>>> >> *Issue 2*
>>> >> Some fields do not work at all. For instance, text+simi (in mlt.fl)
>>> >> works whereas just simi does not.
>>> >> So there is some weirdness that I am just not understanding.
>>> >>
>>> >> Would be grateful for your guidance !
>>> >>
>>> >>
>>> >> Here is the setup:
>>> >> *1. SOLR Version*
>>> >> solr-spec 4.2.0.2013.03.06.22.32.13
>>> >> solr-impl 4.2.0 1453694   rmuir - 2013-03-06 22:32:13
>>> >> lucene-spec 4.2.0
>>> >> lucene-impl 4.2.0 1453694 -  rmuir - 2013-03-06 22:25:29
>>> >>
>>> >> *2. Machine Information*
>>> >> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23
>>> >> 19.0-b09)
>>> >> Windows 7 Home 64 Bit with 4 GB RAM
>>> >>
>>> >> *3. Sample Data *
>>> >> I created this 'dummy' data of cars - the idea being that these would
>>> >> be sufficient and simple to generate similarity and understand how it
>>> >> would work.
>>> >> There are 181 rows in the data set (I have attached it for
>>> >> reference in CSV format)
>>> >>
>>> >> [image: Inline image 1]
>>> >>
>>> >> *4. SCHEMA*
>>> >> *Field Definitions*
>>> >>    <field name="id" type="string" indexed="true" stored="true"
>>> >> termVectors="true" multiValued="false"/>
>>> >>    <field name="make" type="string" indexed="true" stored="true"
>>> >> termVectors="true" multiValued="false"/>
>>> >>    <field name="model" type="string" indexed="true" stored="true"
>>> >> termVectors="true" multiValued="false"/>
>>> >>    <field name="class" type="string" indexed="true" stored="true"
>>> >> termVectors="true" multiValued="false"/>
>>> >>    <field name="type" type="string" indexed="true" stored="true"
>>> >> termVectors="true" multiValued="false"/>
>>> >>    <field name="drive" type="string" indexed="true" stored="true"
>>> >> termVectors="true" multiValued="false"/>
>>> >>    <field name="comment" type="text_general" indexed="true" stored="true"
>>> >> termVectors="true" multiValued="true"/>
>>> >>    <field name="size" type="string" indexed="true" stored="true"
>>> >> termVectors="true" multiValued="false"/>
>>> >>
>>> >> *Copy Fields*
>>> >> <copyField   source="make"     dest="make_en"   />  <!-- Search  -->
>>> >> <copyField   source="model"     dest="model_en"   />  <!-- Search  -->
>>> >> <copyField   source="class"     dest="class_en"   />  <!-- Search  -->
>>> >> <copyField   source="type"     dest="type_en"   />  <!-- Search  -->
>>> >> <copyField   source="drive"     dest="drive_en"   />  <!-- Search  -->
>>> >> <copyField   source="comment"     dest="comment_en"   />  <!-- Search  -->
>>> >> <copyField   source="size"     dest="size_en"   />  <!-- Search  -->
>>> >> <copyField   source="id"     dest="text"   />  <!-- Glob  -->
>>> >> <copyField   source="make"     dest="text"   />  <!-- Glob  -->
>>> >> <copyField   source="model"     dest="text"   />  <!-- Glob  -->
>>> >> <copyField   source="class"     dest="text"   />  <!-- Glob  -->
>>> >> <copyField   source="type"     dest="text"   />  <!-- Glob  -->
>>> >> <copyField   source="drive"     dest="text"   />  <!-- Glob  -->
>>> >> <copyField   source="comment"     dest="text"   />  <!-- Glob  -->
>>> >> <copyField   source="size"     dest="text"   />  <!-- Glob  -->
>>> >> <copyField   source="size"     dest="text"   />  <!-- Glob  -->
>>> >> <copyField   source="class"     dest="simi_en"   />  <!-- similarity  -->
>>> >> <copyField   source="type"     dest="simi_en"   />  <!-- similarity  -->
>>> >> <copyField   source="drive"     dest="simi_en"   />  <!-- similarity  -->
>>> >> <copyField   source="size"     dest="simi_en"   />  <!-- similarity  -->
>>> >>
>>> >> Note that the "simi" field ends up with values built from class, type,
>>> >> drive and size:
>>> >> - Luxury SUV 4WD Large
>>> >> - Standard Sedan Front Family
>>> >>
>>> >>
>>> >> *5. MLT Setup*
>>> >> a. mlt.FL  = *text* QF=*text*  Works but results are obviously not 
>>> >> good (make is not a good similarity indicator)
>>> >>
>>> >>
>>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=text&mlt.qf=text
>>> >>
>>> >> b. mlt.FL  = *simi* QF=*simi*  Does not work at all (0 results)
>>> >>
>>> >>
>>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi&mlt.qf=simi
>>> >>
>>> >> c.  mlt.FL  = *simi,text * QF=*simi^10 text^.1*   Works with decent
>>> >> results in most cases
>>> >>
>>> >>
>>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi,text&mlt.qf=simi^10%20text^.01
>>> >> Works for getting similarity for Acura MDX (Luxury SUV 4WD Large).
>>> >> But for Toyota Camry it finds hybrid family cars (Prius) ahead of
>>> >> Honda.
>>> >>
>>> >>
>>> >
>>>
>
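
On the tf-idf point raised earlier in the thread: with tf fixed at 1, the ordering is driven largely by idf, so a term that appears in few of the 181 rows counts for much more than one that appears in most of them. Below is a small sketch of that effect using the formula from Lucene's DefaultSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)). The term labels and document frequencies are invented for illustration and are not taken from my data; field norms, boosts and coord are ignored here.

```python
import math


def classic_idf(doc_freq, num_docs):
    """idf as in Lucene's DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 181  # size of the sample data set from the original post

# Hypothetical document frequencies, for illustration only.
EXAMPLE_TERMS = [("rare_term", 8), ("medium_term", 60), ("common_term", 110)]

if __name__ == "__main__":
    for term, df in EXAMPLE_TERMS:
        print(f"{term:12s} df={df:3d} idf={classic_idf(df, NUM_DOCS):.3f}")
```

If one attribute value is much rarer than the others, a match on it alone can outweigh matches on several common values, which may be part of why the ordering looks odd.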
