Thanks David - I suppose it is an AWS question and thank you for the pointers.
As a further input to the MLT question - it does seem that 3.6 behavior is different from 4.2 - the issue seems to be more in terms of the raw query that is generated. I will do some more research and revert back with details.

David Parks <davidpark...@yahoo.com> wrote:

>Isn't this an AWS security groups question? You should probably post this
>question on the AWS forums, but for the moment, here's the basic reading
>material - go set up your EC2 security groups and lock down your systems.
>
>
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html
>
>If you just want to password protect Solr here are the instructions:
>
> http://wiki.apache.org/solr/SolrSecurity
>
>But I most certainly would not leave it open to the world even with a password
>(note that the basic password authentication sends passwords in clear text if
>you're not using HTTPS, best lock the thing down behind a firewall).
>
>Dave
>
>
>-----Original Message-----
>From: DC tech [mailto:dctech1...@gmail.com]
>Sent: Tuesday, April 02, 2013 1:02 PM
>To: solr-user@lucene.apache.org
>Subject: Re: MoreLikeThis - Odd results - what am I doing wrong?
>
>OK - so I have my SOLR instance running on AWS.
>Any suggestions on how to safely share the link? Right now, the whole SOLR
>instance is totally open.
>
>
>
>Gagandeep singh <gagan.g...@gmail.com> wrote:
>
>>say &debugQuery=true&mlt=true and see the scores for the MLT query, not
>>a sample query. You can use Amazon ec2 to bring up your solr, you
>>should be able to get a micro instance for free trial.
>>
>>
>>On Mon, Apr 1, 2013 at 5:10 AM, dc tech <dctech1...@gmail.com> wrote:
>>
>>> I did try the raw query against the *simi* field and those seem to
>>> return results in the order expected.
>>> For instance, Acura MDX has ( large, SUV, 4WD Luxury) in the simi field.
>>> Running a query with those words against the simi field returns the
>>> expected models (X5, Audi Q5, etc) and then the subsequent documents
>>> have decreasing relevance. So the basic query mechanism seems to be fine.
>>>
>>> The issue just seems to be with MoreLikeThis component and handler.
>>> I can post the index on a public SOLR instance - any suggestions? (or
>>> for hosting)
>>>
>>>
>>> On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh <gagan.g...@gmail.com> wrote:
>>>
>>> > If you can bring up your solr setup on a public machine then I'm sure a lot
>>> > of debugging can be done. Without that, I think what you should look at is
>>> > the tf-idf scores of the terms like "camry" etc. Usually idf is the
>>> > deciding factor in which results show at the top (tf should be 1 for your
>>> > data).
>>> > Enable &debugQuery=true and look at the explain section to see how the
>>> > score is getting calculated.
>>> >
>>> > You should try giving different boosts to class, type, drive, size
>>> > to control the results.
>>> >
>>> >
>>> > On Sun, Mar 31, 2013 at 8:52 PM, dc tech <dctech1...@gmail.com> wrote:
>>> >
>>> >> I am running some experiments on more like this and the results
>>> >> seem rather odd - I am doing something wrong but just cannot figure out
>>> >> what.
>>> >> Basically, the similarity results are decent - but not great.
>>> >>
>>> >> *Issue 1 = Quality*
>>> >> Toyota Camry : finds Altima (good) but then next one is Camry
>>> >> Hybrid whereas it should have found Accord.
>>> >> I have normalized the data into a simi field which has only the
>>> >> attributes that I care about.
>>> >> Without the simi field, I could not get mlt.qf boosts to work well enough
>>> >> to return results
>>> >>
>>> >> *Issue 2*
>>> >> Some fields do not work at all. For instance, text+simi (in mlt.fl) works
>>> >> whereas just simi does not.
>>> >> So some weirdness that I am just not understanding.
>>> >>
>>> >> Would be grateful for your guidance !
>>> >>
>>> >>
>>> >> Here is the setup:
>>> >> *1. SOLR Version*
>>> >> solr-spec 4.2.0.2013.03.06.22.32.13
>>> >> solr-impl 4.2.0 1453694 rmuir - 2013-03-06 22:32:13
>>> >> lucene-spec 4.2.0
>>> >> lucene-impl 4.2.0 1453694 - rmuir - 2013-03-06 22:25:29
>>> >>
>>> >> *2. Machine Information*
>>> >> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23
>>> >> 19.0-b09)
>>> >> Windows 7 Home 64 Bit with 4 GB RAM
>>> >>
>>> >> *3. Sample Data *
>>> >> I created this 'dummy' data of cars - the idea being that these would be
>>> >> sufficient and simple to generate similarity and understand how it
>>> >> would work.
>>> >> There are 181 rows in the data set (I have attached it for
>>> >> reference in CSV format)
>>> >>
>>> >> [image: Inline image 1]
>>> >>
>>> >> *4. SCHEMA*
>>> >> *Field Definitions*
>>> >> <field name="id" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>>> >> <field name="make" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>>> >> <field name="model" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>>> >> <field name="class" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>>> >> <field name="type" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>>> >> <field name="drive" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>>> >> <field name="comment" type="text_general" indexed="true" stored="true" termVectors="true" multiValued="true"/>
>>> >> <field name="size" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>>> >>
>>> >> *Copy Fields*
>>> >> <copyField source="make" dest="make_en" /> <!-- Search -->
>>> >> <copyField source="model" dest="model_en" /> <!-- 
Search -->
>>> >> <copyField source="class" dest="class_en" /> <!-- Search -->
>>> >> <copyField source="type" dest="type_en" /> <!-- Search -->
>>> >> <copyField source="drive" dest="drive_en" /> <!-- Search -->
>>> >> <copyField source="comment" dest="comment_en" /> <!-- Search -->
>>> >> <copyField source="size" dest="size_en" /> <!-- Search -->
>>> >> <copyField source="id" dest="text" /> <!-- Glob -->
>>> >> <copyField source="make" dest="text" /> <!-- Glob -->
>>> >> <copyField source="model" dest="text" /> <!-- Glob -->
>>> >> <copyField source="class" dest="text" /> <!-- Glob -->
>>> >> <copyField source="type" dest="text" /> <!-- Glob -->
>>> >> <copyField source="drive" dest="text" /> <!-- Glob -->
>>> >> <copyField source="comment" dest="text" /> <!-- Glob -->
>>> >> <copyField source="size" dest="text" /> <!-- Glob -->
>>> >> <copyField source="size" dest="text" /> <!-- Glob -->
>>> >> *<copyField source="class" dest="simi_en" /> <!-- similarity -->*
>>> >> *<copyField source="type" dest="simi_en" /> <!-- similarity -->*
>>> >> *<copyField source="drive" dest="simi_en" /> <!-- similarity -->*
>>> >> *<copyField source="size" dest="simi_en" /> <!-- similarity -->*
>>> >>
>>> >> Note that the "simi" field ends up with values like class, type,
>>> >> drive and size:
>>> >> - Luxury SUV 4WD Large
>>> >> - Standard Sedan Front Family
>>> >>
>>> >>
>>> >> *5. MLT Setup*
>>> >> a. mlt.FL = *text* QF=*text* Works but results are obviously not
>>> >> good (make is not a good similarity indicator)
>>> >>
>>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=text&mlt.qf=text
>>> >>
>>> >> b. mlt.FL = *simi* QF=*simi* Does not work at all (0 results)
>>> >>
>>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi&mlt.qf=simi
>>> >>
>>> >> c. 
mlt.FL = *simi,text * QF=*simi^10 text^.1* Works with decent
>>> >> results in most cases
>>> >>
>>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi,text&mlt.qf=simi^10%20text^.01
>>> >>
>>> >> Works for getting similarity for Acura MDX (Luxury SUV 4WD Large)
>>> >> But for Toyota Camry - it finds hybrid family cars (Prius) ahead of Honda.
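
For anyone reproducing the three MLT requests from the original post, here is a minimal sketch of building them with proper URL encoding and with debugQuery=true switched on, as suggested earlier in the thread. The base URL, the core name "cars" and the field names come from the quoted URLs; the helper name mlt_url and its arguments are hypothetical, and the boost values are only the ones mentioned in the post.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumption: host, port and core name are copied from the URLs in the post.
BASE = "http://localhost:8983/solr/cars/select"


def mlt_url(doc_id, mlt_fl, mlt_qf, debug=True, **extra):
    """Build a MoreLikeThis request URL for one seed document.

    Hypothetical helper: it only assembles query parameters and does
    not contact Solr.
    """
    params = {
        "q": f"id:{doc_id}",
        "mlt": "true",
        "fl": "text",          # same fl as in the quoted URLs
        "mlt.fl": mlt_fl,      # fields used to find similar documents
        "mlt.qf": mlt_qf,      # per-field boosts for the generated query
    }
    if debug:
        params["debugQuery"] = "true"  # ask Solr to explain the scores
    params.update(extra)               # any further request parameters
    return BASE + "?" + urlencode(params)


# The three setups (a, b, c) described in the post.
for fl, qf in [("text", "text"), ("simi", "simi"), ("simi,text", "simi^10 text^.1")]:
    url = mlt_url(2, fl, qf)
    print(url)
    # Decode again to confirm the boosts survive the encoding intact.
    print(parse_qs(urlsplit(url).query)["mlt.qf"])
```

Extra parameters can be passed through a dict, for example mlt_url(2, "simi", "simi", **{"mlt.mintf": "1", "mlt.mindf": "1"}). mlt.mintf and mlt.mindf are standard MoreLikeThis parameters, but whether lowering them changes the results for case (b) is an untested guess, not something established in this thread.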