Re: Design optimal Solr Schema

2014-12-11 Thread tomas.kalas
Thanks for the help, but as Alex wrote, I used the synonym filter and it does
what I want. For example, when I add the synonym entry Hello, Hi and the
sentence is "Hello how are you", a query for "Hi how are you" finds it too.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4173690.html
Sent from the Solr - User mailing list archive at Nabble.com.


Alternative synonymum

2014-12-11 Thread tomas.kalas
Hello, I want to search within transcripts of phone conversations. The
machine that transcribes the conversations to text produces alternatives for
some words. For example, take the sentence:
Hello how are you. 

1. Segment 
Hello  
Halo
Hollow

2. Segment
How
Bow


When I want to search for, for example, "Halo how are you", I use the
synonym filter.

For Hello, I set the alternatives Halo, Hollow, ...

It works, but if the same word appears in a later segment with different
alternatives, for example How, Know, and I add those to the synonym filter
on a new line too, then the word How ends up with all the alternatives (How,
Know, Bow), and if I search for "Hello Know" it also finds sentences whose
alternatives do not actually contain Know.

In this case it finds the example sentence "Hello how are you". That
sentence has the alternative bow for the word how, but the value know from
the other segment's alternatives gets applied to it as well.

Is it possible to handle this case per segment? I know which specific words
are in segment 1 and could use those in the synonyms, and the same word at a
later position would belong to a different segment number.

Thanks, I hope you understand what I mean.
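To make the collision concrete, here is a sketch of a flat synonyms.txt for this example (the word lists are only illustrative):

```
# segment 1 alternatives
Hello, Halo, Hollow
# segment 2 alternatives for How
How, Bow
# the same word How with other alternatives, from another segment
How, Know
```

At analysis time the two How lines are merged into one equivalence class (How, Bow, Know), so a query for Know also matches a document whose transcript only offered Bow; the synonym filter has no notion of segment numbers.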




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Alternative-synonymum-tp4173694.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Length norm not functioning in solr queries.

2014-12-11 Thread S.L
Mikhail,

Thank you for confirming this; however, Ahmet's proposal seems simpler for
me to implement.

On Wed, Dec 10, 2014 at 5:07 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:
>
> S.L,
>
> I briefly skimmed Lucene50NormsConsumer.writeNormsField(); my conclusion
> is: if you supply your own similarity, one that avoids squashing the float
> into a byte in Similarity.computeNorm(FieldInvertState), you get exactly
> this value back from Similarity.decodeNormValue(long).
> You may be surprised, but this is exactly what is done in
> PreciseDefaultSimilarity in TestLongNormValueSource. I think you can just
> use it.
>
> On Wed, Dec 10, 2014 at 12:11 PM, S.L  wrote:
>
> > Hi Ahmet,
> >
> > Is there already an implementation of the suggested work around ? Thanks.
> >
> > On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> > wrote:
> >
> > > Hi,
> > >
> > > The default length norm is not the best option for differentiating very
> > > short documents, like product names.
> > > Please see:
> > > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> > >
> > > I suggest you create an additional integer field that holds the number
> > > of tokens. You can populate it via an update processor and then penalise
> > > (using function queries) according to that field. This way you have more
> > > fine-grained and flexible control over it.
> > >
> > > Ahmet
> > >
> > >
> > >
> > > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > > wrote:
> > > Hi ,
> > >
> > > Mikhail, thanks. I looked at the explain output and this is what I see
> > > for the two different documents in question: they have identical scores
> > > even though document 2 has a shorter productName field. I do not see any
> > > lengthNorm-related information in the explain output.
> > >
> > > Also, I am not exactly clear on what needs to be looked at in the API.
> > >
> > > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > > productName&ps=1&pf2= productName&pf3=
> > > productName&stopwords=true&lowercaseOperators=true
> > >
> > > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > > Unlocked *
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > > mkhlud...@griddynamics.com> wrote:
> > >
> > > > It's worth looking into  to check particular scoring values. But the
> > > > most likely suspect is the loss of precision when float norms are
> > > > stored in byte values. See the javadoc for
> > > > DefaultSimilarity.encodeNormValue(float).
> > > >
> > > >
> > > > On Mon, Dec 8, 2014 at 5:49 PM, S.L 
> wrote:
> > > >
> > > > > I have two documents doc1 and doc2 and each one of those has a
> field
> > > > called
> > > > > phoneName.
> > > > >
> > > > > doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White
> > (Verizon)
> > > > > Smartphone Factory Unlocked"
> > > > >
> > > > > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > > > >
> > > > > Here if I search for
> > > > >
> > > > >
> > > >
> > >
> >
> q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true
> > > > >
> > > > > Doc1 and Doc2 both have the same score, but since the field
> > > > > phoneName in doc2 is shorter I would expect it to score higher;
> > > > > instead both have an identical score of 9.961212.
> > > > >
> > > > > The phoneName field is defined as follows. As you can see, nowhere
> > > > > am I specifying omitNorms=true, yet the behavior seems to be that
> > > > > the length norm is not functioning at all. Can someone let me know
> > > > > what the issue is here?
> > > > >
> > > > >  > indexed="true"
> > > > > stored="true" required="tr

Highlighting integer field

2014-12-11 Thread Pawel Rog
Hi,
Is it possible to highlight an int (TrieIntField) or long (TrieLongField)
field in Solr?

--
Paweł


Re: Length norm not functioning in solr queries.

2014-12-11 Thread S.L
Ahmet,

Thank you. As the configurations in SolrCloud are uploaded to ZooKeeper,
are there any special steps that need to be taken to make this work in
SolrCloud?

On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan 
wrote:
>
> Hi,
>
> Or even better, you can use your new field for tie break purposes. Where
> scores are identical.
> e.g. sort=score desc, wordCount asc
>
> Ahmet
>
>
> On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan 
> wrote:
> Hi,
>
> You mean update processor factory?
>
> Here is augmented (wordCount field added) version of your example :
>
> doc1:
>
> phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> Smartphone Factory Unlocked"
> wordCount: 11
>
> doc2:
>
> phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> wordCount: 9
>
>
> The first task is simply calculating the wordCount values. You can do it
> in your indexing code, or elsewhere.
> I quickly skimmed the existing update processors but couldn't find a stock
> implementation.
> CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is
> all about multivalued fields.
>
> I guess a simple JavaScript that splits on whitespace and returns the
> resulting array size would do the trick:
> StatelessScriptUpdateProcessorFactory
>
>
> At this point you have an int field named wordCount.
> boost=div(1,wordCount) should work, or you can come up with a more
> sophisticated formula.
>
> Ahmet
>
>
> On Wednesday, December 10, 2014 11:12 AM, S.L 
> wrote:
> Hi Ahmet,
>
> Is there already an implementation of the suggested work around ? Thanks.
>
>
> On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > The default length norm is not the best option for differentiating very
> > short documents, like product names.
> > Please see:
> > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> >
> > I suggest you create an additional integer field that holds the number of
> > tokens. You can populate it via an update processor and then penalise
> > (using function queries) according to that field. This way you have more
> > fine-grained and flexible control over it.
> >
> > Ahmet
> >
> >
> >
> > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > wrote:
> > Hi ,
> >
> > Mikhail, thanks. I looked at the explain output and this is what I see
> > for the two different documents in question: they have identical scores
> > even though document 2 has a shorter productName field. I do not see any
> > lengthNorm-related information in the explain output.
> >
> > Also, I am not exactly clear on what needs to be looked at in the API.
> >
> > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > productName&ps=1&pf2= productName&pf3=
> > productName&stopwords=true&lowercaseOperators=true
> >
> > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > Unlocked *
> >
> >
> >- *100%* 10.649221 sum of the following:
> >   - *10.58%* 1.1270299 sum of the following:
> >  - *2.1%* 0.22383358 productName:iphon
> >  - *3.47%* 0.36922288 productName:"4 s"
> >  - *5.01%* 0.53397346 productName:"16 gb"
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >   - *27.79%* 2.959255 sum of the following:
> >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >
> >
> > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> >
> >
> >- *100%* 10.649221 sum of the following:
> >   - *10.58%* 1.1270299 sum of the following:
> >  - *2.1%* 0.22383358 productName:iphon
> >  - *3.47%* 0.36922288 productName:"4 s"
> >  - *5.01%* 0.53397346 productName:"16 gb"
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >   - *27.79%* 2.959255 sum of the following:
> >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >
> >
> >
> >
> >
> > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > It's worth looking into  to check particular scoring values. But the
> > > most likely suspect is the loss of precision when float norms are
> > > stored in byte values. See the javadoc for
> > > DefaultSimilarity.encodeNormValue(float).
> > >
> > >
> > > On Mon, Dec 8, 2014 at 5:49 PM, S.L  wrote:
> > >
> > > > I have two documents doc1 and doc2 and each one of those has a field
> > > called
> > > > phoneName.
> > > >
> > > > doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White
> (Verizon)
> > > > Smartphone Factory Unlocked"
> > > >
> > > > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > > >
> > > > Here if I search for
> > > >
> > > >
> > >
> >
> q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true
> > > >
> > > > Doc1 and Do

Re: Priority in search an synonyms

2014-12-11 Thread Antoine REBOUL
Hello,

First of all thank you for your answers !

In my schema.xml file:
- I created this field :

 


- this field is populated via a "copyField":



I wonder if the following statement is required :
ebc_libelle

I test my results with the following settings :
http://IP:8983/solr/select/?qf=tmp_libelle
^75%20ebc_libelle^5&pf=ebc_libelle&q=Castorama&start=0&rows=100&indent=on&defType=edismax&sort=score%20asc

The problem I have now is that the synonyms declared for the ebc_libelle
field do not show up.


The field ebc_libelle is analyzed/indexed as follows :
   





















  




Best Regards.

*Antoine Reboul*
Responsable Comparateurs / Plateforme emailing
Plebicom -  eBuyClub - Cashstore - Checkdeal

PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
Tel  : 04 72 85 81 49
Fax : 04 78 83 39 74

2014-12-10 16:40 GMT+01:00 Alexandre Rafalovitch :

> This might be written just for you:
>
> http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/
>
> Merchant would be same as title = short text
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 10 December 2014 at 10:00, Antoine REBOUL
>  wrote:
> > hello,
> >
> > I have a question , I do not know if there is a solution ...
> >
> > I index and search a field named "Libel".
> > I use a synonyms file.
> >
> > I have, for example, the following line in my synonyms file:
> > "ipad => Apple, Priceminister, Amazon"
> >
> > Searching for iPad gives me Apple, Priceminister and Amazon (the expected
> > result).
> > But when I search for "Apple", I want the merchant Apple to be returned
> > first.
> > This is not the case; in fact, it is Amazon who gets first place.
> >
> > Sorry for my poor English , I'm using a translator.
> >
> > Best Regards.
> >
> > *Antoine Reboul*
> > Responsable Comparateurs / Plateforme emailing
> > Plebicom -  eBuyClub - Cashstore - Checkdeal
> >
> > PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
> > Tel  : 04 72 85 81 49
> > Fax : 04 78 83 39 74
>


Histogram Facet and Aggregation Solr

2014-12-11 Thread Ankit Jain
Hi All,

We have a use case where we want to build a histogram over 10-minute time
periods and then, within each 10-minute time frame, facet on some field.

We are currently using Solr version 4.7.2.

Please suggest how we can nest a facet within a histogram.

-- 
Thanks,
Ankit Jain


Re: Priority in search an synonyms

2014-12-11 Thread Ahmet Arslan
Hi Antoine,

By saying "The problem I have now is that ebc_libelle synonyms reported for the
field are not show", do you mean that you have synonym entries for the word
Castorama, and that documents containing those synonym entries do not show up
in the first 100 documents?

If yes, play with the boost values (5 versus 75) and tweak them until you have
a satisfactorily diverse result set.

By the way, I think filling the first/initial result set (whenever possible)
with exact matches is a good thing.
I believe the user types her query for a reason. If exact-match documents are
too few, then other techniques (stemming, synonyms, etc.) should kick in.
Please note that this approach makes sense for search applications where
precision is more valuable than recall.

Ahmet
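
As a sketch of that boosting idea (the field names here are assumptions, not
Antoine's actual schema), one common setup copies the label into an exact
field analyzed without synonyms and a synonym-expanded field, then weights
them differently in edismax:

```
q=Apple
defType=edismax
qf=libelle_exact^75 libelle_syn^5
```

Documents that match only via synonym expansion then rank below documents
containing the literal word Apple.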





On Thursday, December 11, 2014 12:20 PM, Antoine REBOUL 
 wrote:
Hello,

First of all thank you for your answers !

In my schema.xml file:
- I created this field :

 


- this field is populated via a "copyField":



I wonder if the following statement is required :
ebc_libelle

I test my results with the following settings :
http://IP:8983/solr/select/?qf=tmp_libelle
^75%20ebc_libelle^5&pf=ebc_libelle&q=Castorama&start=0&rows=100&indent=on&defType=edismax&sort=score%20asc

The problem I have now is that the synonyms declared for the ebc_libelle
field do not show up.


The field ebc_libelle is analyzed/indexed as follows :
   





















  




Best Regards.

*Antoine Reboul*
Responsable Comparateurs / Plateforme emailing
Plebicom -  eBuyClub - Cashstore - Checkdeal

PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
Tel  : 04 72 85 81 49
Fax : 04 78 83 39 74


2014-12-10 16:40 GMT+01:00 Alexandre Rafalovitch :

> This might be written just for you:
>
> http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/
>
> Merchant would be same as title = short text
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 10 December 2014 at 10:00, Antoine REBOUL
>  wrote:
> > hello,
> >
> > I have a question , I do not know if there is a solution ...
> >
> > I index and search a field named "Libel".
> > I use a synonyms file.
> >
> > I have, for example, the following line in my synonyms file:
> > "ipad => Apple, Priceminister, Amazon"
> >
> > Searching for iPad gives me Apple, Priceminister and Amazon (the expected
> > result).
> > But when I search for "Apple", I want the merchant Apple to be returned
> > first.
> > This is not the case; in fact, it is Amazon who gets first place.
> >
> > Sorry for my poor English , I'm using a translator.
> >
> > Best Regards.
> >
> > *Antoine Reboul*
> > Responsable Comparateurs / Plateforme emailing
> > Plebicom -  eBuyClub - Cashstore - Checkdeal
> >
> > PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
> > Tel  : 04 72 85 81 49
> > Fax : 04 78 83 39 74
>


Re: Length norm not functioning in solr queries.

2014-12-11 Thread Ahmet Arslan
Hi,

No special steps need to be taken for a cloud setup. Please note that for
both solutions, re-indexing is mandatory.

Ahmet



On Thursday, December 11, 2014 12:15 PM, S.L  wrote:
Ahmet,

Thank you. As the configurations in SolrCloud are uploaded to ZooKeeper,
are there any special steps that need to be taken to make this work in
SolrCloud?


On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan 
wrote:
>
> Hi,
>
> Or even better, you can use your new field for tie break purposes. Where
> scores are identical.
> e.g. sort=score desc, wordCount asc
>
> Ahmet
>
>
> On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan 
> wrote:
> Hi,
>
> You mean update processor factory?
>
> Here is augmented (wordCount field added) version of your example :
>
> doc1:
>
> phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> Smartphone Factory Unlocked"
> wordCount: 11
>
> doc2:
>
> phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> wordCount: 9
>
>
> The first task is simply calculating the wordCount values. You can do it
> in your indexing code, or elsewhere.
> I quickly skimmed the existing update processors but couldn't find a stock
> implementation.
> CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is
> all about multivalued fields.
>
> I guess a simple JavaScript that splits on whitespace and returns the
> resulting array size would do the trick:
> StatelessScriptUpdateProcessorFactory
>
>
> At this point you have an int field named wordCount.
> boost=div(1,wordCount) should work, or you can come up with a more
> sophisticated formula.
>
> Ahmet
>
>
> On Wednesday, December 10, 2014 11:12 AM, S.L 
> wrote:
> Hi Ahmet,
>
> Is there already an implementation of the suggested work around ? Thanks.
>
>
> On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > The default length norm is not the best option for differentiating very
> > short documents, like product names.
> > Please see:
> > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> >
> > I suggest you create an additional integer field that holds the number of
> > tokens. You can populate it via an update processor and then penalise
> > (using function queries) according to that field. This way you have more
> > fine-grained and flexible control over it.
> >
> > Ahmet
> >
> >
> >
> > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > wrote:
> > Hi ,
> >
> > Mikhail, thanks. I looked at the explain output and this is what I see
> > for the two different documents in question: they have identical scores
> > even though document 2 has a shorter productName field. I do not see any
> > lengthNorm-related information in the explain output.
> >
> > Also, I am not exactly clear on what needs to be looked at in the API.
> >
> > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > productName&ps=1&pf2= productName&pf3=
> > productName&stopwords=true&lowercaseOperators=true
> >
> > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > Unlocked *
> >
> >
> >- *100%* 10.649221 sum of the following:
> >   - *10.58%* 1.1270299 sum of the following:
> >  - *2.1%* 0.22383358 productName:iphon
> >  - *3.47%* 0.36922288 productName:"4 s"
> >  - *5.01%* 0.53397346 productName:"16 gb"
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >   - *27.79%* 2.959255 sum of the following:
> >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >
> >
> > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> >
> >
> >- *100%* 10.649221 sum of the following:
> >   - *10.58%* 1.1270299 sum of the following:
> >  - *2.1%* 0.22383358 productName:iphon
> >  - *3.47%* 0.36922288 productName:"4 s"
> >  - *5.01%* 0.53397346 productName:"16 gb"
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >   - *27.79%* 2.959255 sum of the following:
> >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >
> >
> >
> >
> >
> > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > It's worth looking into  to check particular scoring values. But the
> > > most likely suspect is the loss of precision when float norms are
> > > stored in byte values. See the javadoc for
> > > DefaultSimilarity.encodeNormValue(float).
> > >
> > >
> > > On Mon, Dec 8, 2014 at 5:49 PM, S.L  wrote:
> > >
> > > > I have two documents doc1 and doc2 and each one of those has a field
> > > called
> > > > phoneName.
> > > >
> > > > doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White
> (Verizon)
> > > > Smartphone Factory Unlocked"
> > > >
> > > > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > > >
> > > > Here if I search for

Suspicious message with attachment

2014-12-11 Thread help
The following message addressed to you was quarantined because it likely 
contains a virus:

Subject: Inconsistent doc value across two nodes - very simple test - what's 
the expected behavior?
From: Gili Nachum 

However, if you know the sender and are expecting an attachment, please reply 
to this message, and we will forward the quarantined message to you.


Re: Design optimal Solr Schema

2014-12-11 Thread Alexandre Rafalovitch
Tomas,

You have a difficult use case. You seem to be in a speech-recognition
domain, and you want to be able to search the transcribed text with
references back to timing. It's an interesting problem, but not an easy
one, and certainly not something anyone can answer all at once.

The issue here is the representation of that text. You want it both
per-word (so you have timing) and as flowing text (so you can find it).
And then you also have the problem of how to express all this from the PHP
client.


But here are things you need to think about:
1) Do you have groups in your word sequence? You say you want to find "how
are you", but what about "there ah how", which would still be adjacent in
the stream but spans the end of one sentence and the start of another? If
you do want to find any sequence of consecutive words, you need to index
them together, and you end up with one very long document. If not, you need
to decide how you are going to break your continuous text into groups
(based on SILENCE, timing, or something else).

2) Then you have the association of a multi-word sequence with time. You
say "Good morning to you" is at 5.25, but that's not quite possible, as
each word has its own duration. Does it mean the word "Good" starts at
5.25? If someone finds "Morning to you", will it still return 5.25, or
5.28? This design decision will affect how you index it.

3) And what happens if the matched text occurs twice, like "Chao" meaning
hello and "Chao" meaning goodbye? If you want two separate documents
returned, that implies two documents in Solr. So this goes hand in hand
with (1) above.

4) Then you have the whole highlighting issue, which I am not even going to
start on, except to note that the text being highlighted needs to be in one
field, so that has an impact too.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
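
One way the segmentation and timing decisions above often play out (a sketch
only; the field names are invented for illustration) is one Solr document
per segment, carrying its own timing:

```
{
  "id": "call42_seg7",
  "callId": "call42",
  "startTime": 5.25,
  "endTime": 6.10,
  "text": "hello how are you",
  "altText": ["halo how are you", "hollow bow are you"]
}
```

A match then identifies the segment, whose startTime answers the timing
question, at the cost that phrases can never match across segment
boundaries.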


On 11 December 2014 at 03:33, tomas.kalas  wrote:
> Thanks for the help, but as Alex wrote, I used the synonym filter and it
> does what I want. For example, when I add the synonym entry Hello, Hi and
> the sentence is "Hello how are you", a query for "Hi how are you" finds it
> too.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4173690.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting integer field

2014-12-11 Thread Tomoko Uchida
Hi Pawel,

Essentially, highlighting is a feature that shows the fragments of documents
that match the user's query.
With it, users can find the occurrences of their query in long documents and
understand their results better.

For tint or tlong fields (or other non-text field types), "fragments"
usually have no meaning.

So, excuse me, I cannot understand your intent.
If you describe your need in a little more detail, I or other fellows may be
able to help you.

Regards,
Tomoko
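
If the goal is simply to echo the matched numeric value back, one workaround
(a sketch; the field names are assumptions) is to copy the number into a
string field and target that field with the query:

```
<field name="price"     type="tlong"  indexed="true" stored="true"/>
<field name="price_str" type="string" indexed="true" stored="true"/>
<copyField source="price" dest="price_str"/>
```

with hl.fl=price_str in the request. Highlighting only produces snippets for
fields the query actually matched, so the query (or a clause of it) must
search price_str.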

2014-12-11 19:12 GMT+09:00 Pawel Rog :

> Hi,
> Is it possible to highlight an int (TrieIntField) or long (TrieLongField)
> field in Solr?
>
> --
> Paweł
>


Re: Design optimal Solr Schema

2014-12-11 Thread tomas.kalas
Oh no, I meant to answer in the topic where you helped me with the synonym
filter:

http://lucene.472066.n3.nabble.com/Alternative-searching-td4172339.html

but I had this topic open too while checking my answer in Google Translate,
and I copied it here.

Now my task has changed: I no longer have to search by specific time, only
by phrase, but with alternatives. The synonym filter is a good idea, but if
a specific word has different alternatives in different places, that is the
problem I am now dealing with. I asked about it in this topic:
http://lucene.472066.n3.nabble.com/Alternative-synonymum-td4173694.html

Sorry for chaos.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4173748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is it possible in Solr to have document field value, based on context during query time, by request parameter ?

2014-12-11 Thread Nenko Ivanov

The Use Case:

A very large, sharded index of articles with various categorization fields,
pre-populated with algorithmically estimated values (simple types, mostly
integers). The index is accessed by multiple "clients", and each client can
override an article property in his own context, for example the sentiment
score for a specific article.

If article X's sentiment is overridden by client A, the value persists in
permanent storage for article X, alongside the default algorithmic value.


When client A queries the index, the document value of article X for
sentiment has to match his overridden value in filter queries and in facet
counts.


Client B sees the default estimated sentiment value for article X.

Currently the simplest solution is to duplicate the content for each client,
but that is not an option at this index scale.



Some background:

The above effect was partly achieved a few years ago for experimental
purposes, based on this tutorial:
http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html
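
That tutorial builds on Solr's ExternalFileField; a minimal sketch of a
per-client override field (the names are invented for illustration) looks
like:

```
<fieldType name="extSentiment" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>

<field name="sentiment_clientA" type="extSentiment"/>
```

Values come from a file named external_sentiment_clientA in the index data
directory, one id=value line per overridden document. Note that
ExternalFileField values are usable in function queries (and, via
{!frange}, in filter queries), but not in regular facets, which is why it
covers this use case only partly.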



--
Nenko Ivanov


Re: Length norm not functioning in solr queries.

2014-12-11 Thread S.L
Yes, I understand that reindexing is necessary; however, for some reason I
was not able to invoke the JS script from the update processor, so I ended
up using a Java-only solution at index time.

Thanks.
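
For readers who want to try the script route, here is a minimal sketch of
what such a StatelessScriptUpdateProcessorFactory script could look like
(the field names phoneName/wordCount are taken from this thread; the hook
names follow Solr's script-processor contract, and the whitespace-splitting
rule is an assumption, not a fixed tokenization):

```javascript
// Sketch of a script (e.g. wordcount.js) for StatelessScriptUpdateProcessorFactory.

// Pure helper: count whitespace-separated tokens in a string.
function countWords(text) {
  if (!text) return 0;
  var trimmed = String(text).trim();
  if (trimmed === "") return 0;
  return trimmed.split(/\s+/).length;
}

// Solr calls this for every document being added.
function processAdd(cmd) {
  var doc = cmd.solrDoc; // a SolrInputDocument
  var name = doc.getFieldValue("phoneName");
  if (name != null) {
    doc.setField("wordCount", countWords(name));
  }
}

// No-op hooks for the rest of the script-processor contract.
function processDelete(cmd) {}
function processCommit(cmd) {}
function finish() {}
```

The script would be wired into an update chain in solrconfig.xml, and the
collection re-indexed for the new field to take effect.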

On Thu, Dec 11, 2014 at 7:18 AM, Ahmet Arslan 
wrote:
>
> Hi,
>
> No special steps need to be taken for a cloud setup. Please note that for
> both solutions, re-indexing is mandatory.
>
> Ahmet
>
>
>
> On Thursday, December 11, 2014 12:15 PM, S.L 
> wrote:
> Ahmet,
>
> Thank you , as the configurations in SolrCloud are uploaded to zookeeper ,
> are there any special steps that need to be taken to make this work in
> SolrCloud ?
>
>
> On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan 
> wrote:
> >
> > Hi,
> >
> > Or even better, you can use your new field for tie break purposes. Where
> > scores are identical.
> > e.g. sort=score desc, wordCount asc
> >
> > Ahmet
> >
> >
> > On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan <
> iori...@yahoo.com>
> > wrote:
> > Hi,
> >
> > You mean update processor factory?
> >
> > Here is augmented (wordCount field added) version of your example :
> >
> > doc1:
> >
> > phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> > Smartphone Factory Unlocked"
> > wordCount: 11
> >
> > doc2:
> >
> > phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > wordCount: 9
> >
> >
> > First task is simply calculate wordCount values. You can do it in your
> > indexing code, or other places.
> > I quickly skimmed existing update processors but I couldn't find stock
> > implementation.
> > CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is
> > all about multivalued fields.
> >
> > I guess, A simple javascript that splits on whitespace and returns the
> > produced array size would do the trick :
> > StatelessScriptUpdateProcessorFactory
> >
> >
> >
> > At this point you have a int field named word count.
> > boost=div(1,wordCount) should work. Or you can came up with more
> > sophisticated math formula.
> >
> > Ahmet
> >
> >
> > On Wednesday, December 10, 2014 11:12 AM, S.L  >
> > wrote:
> > Hi Ahmet,
> >
> > Is there already an implementation of the suggested work around ? Thanks.
> >
> >
> > On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> > wrote:
> >
> > > Hi,
> > >
> > > The default length norm is not the best option for differentiating very
> > > short documents, like product names.
> > > Please see:
> > > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> > >
> > > I suggest you create an additional integer field that holds the number
> > > of tokens. You can populate it via an update processor and then penalise
> > > (using function queries) according to that field. This way you have more
> > > fine-grained and flexible control over it.
> > >
> > > Ahmet
> > >
> > >
> > >
> > > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > > wrote:
> > > Hi ,
> > >
> > > Mikhail, thanks. I looked at the explain output and this is what I see
> > > for the two different documents in question: they have identical scores
> > > even though document 2 has a shorter productName field. I do not see any
> > > lengthNorm-related information in the explain output.
> > >
> > > Also, I am not exactly clear on what needs to be looked at in the API.
> > >
> > > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > > productName&ps=1&pf2= productName&pf3=
> > > productName&stopwords=true&lowercaseOperators=true
> > >
> > > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > > Unlocked *
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > > mkhlud...@griddynamics.com> wrote:
> > >
> > > > It's worth looking into  to check particular scoring values.
> > B
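
The approach Ahmet describes can be sketched as a script for
StatelessScriptUpdateProcessorFactory. This is only a sketch: the field names
(productName, wordCount) and the script file name are assumptions, and the
chain still has to be wired into solrconfig.xml.

```javascript
// wordcount.js -- hypothetical script for StatelessScriptUpdateProcessorFactory.
// Field names (productName, wordCount) are assumptions; adjust to your schema.

// Pure helper: number of whitespace-separated tokens in a string.
function countWords(s) {
  var t = (s || '').replace(/^\s+|\s+$/g, '');
  return t.length === 0 ? 0 : t.split(/\s+/).length;
}

// Called by Solr for every added document.
function processAdd(cmd) {
  var doc = cmd.solrDoc;
  var name = doc.getFieldValue('productName');
  if (name !== null && name !== undefined) {
    doc.setField('wordCount', countWords(String(name)));
  }
}

// Remaining lifecycle hooks are no-ops.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}
```

The script would be referenced from an updateRequestProcessorChain in
solrconfig.xml (a processor of class solr.StatelessScriptUpdateProcessorFactory
with script=wordcount.js), with the chain selected via update.chain on the
update handler. At query time, boost=div(1,wordCount) then penalises longer
product names, as suggested above.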

Re: Design optimal Solr Schema

2014-12-11 Thread Alexandre Rafalovitch
Ok. Make sure to post in the right topics. People get super confused
when the conversation thread changes.

Maybe ignore these last couple of messages and post the new one as
appropriate (separately or in another thread). That way the right people
will see it.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 11 December 2014 at 09:16, tomas.kalas  wrote:
> Oh no, I meant to answer in this topic, where you helped me with the synonym
> filter:
>
> http://lucene.472066.n3.nabble.com/Alternative-searching-td4172339.html
>
> but I had opened this topic too, and I checked my answer in Google
> Translate and copied it here.
>
> Now the task has changed: I no longer have to search by a specific time, but
> only within a phrase, with alternatives. The synonym filter is a good idea,
> but when the same word has different alternatives in different segments, that
> is the problem I am now dealing with. I asked in this topic:
> http://lucene.472066.n3.nabble.com/Alternative-synonymum-td4173694.html
>
> Sorry for the chaos.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4173748.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is it possible in Solr to have document field value, based on context during query time, by request parameter ?

2014-12-11 Thread Alexandre Rafalovitch
So, what did not work for you with the External File Field approach?
What is the next gap you are trying to close?

You seem to be aware of the possible extension points for Solr, so you
are not looking for just a pointer to custom search components or
whatever.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 11 December 2014 at 09:20, Nenko Ivanov  wrote:
> The Use Case:
>
> A very large, sharded index of articles with different categorization
> fields, pre-populated with algorithmically estimated values (simple types,
> mostly Integer values). The index is accessed by multiple “clients” and each
> client can override an article property based on its context, for example the
> sentiment score for a specific article.
>
> If article X's sentiment is overridden by Client A, the value persists in
> permanent storage for article X, with the context-specific value stored
> alongside the default algorithmic value.
>
> When Client A queries index, the document value of article X for sentiment
> has to match his overridden value in filter queries or in facet counts.
>
> Client B sees default estimated value for sentiment for article X.
>
> Currently the simplest solution is to duplicate content for each client, but
> that is not an option because of the index scale.
>
>
> Some background:
>
> The above effect was partly achieved a few years ago, for experimental
> purposes, based on this tutorial -
> http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html
>
>
> --
> Nenko Ivanov
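
For reference, the External File Field approach mentioned above looks roughly
like the following in schema.xml. This is a sketch, not a drop-in config: the
field and file names are examples.

```xml
<!-- schema.xml: per-client sentiment kept outside the index (names are examples) -->
<fieldType name="sentimentFile" class="solr.ExternalFileField" keyField="id" defVal="0"/>
<field name="sentiment_clientA" type="sentimentFile"/>
```

The values then live in a file named external_sentiment_clientA* under the
core's data directory, one `id=value` line per document. An external file field
is read through function queries, so filtering on the overridden value takes a
function-query form, e.g. something like fq={!frange l=1 u=1}field(sentiment_clientA),
rather than a plain field query, and facet counts would have to be expressed as
facet.query clauses over the same function.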


RE: WordBreakSolrSpellChecker Usage

2014-12-11 Thread Dyer, James
My first guess here, seeing it works some of the time but not others, is 
that these values are too low:

spellcheck.count = 5
spellcheck.maxCollationTries = 5

You know spellcheck.count is too low if the suggestion you want is not in the 
"suggestions" part of the response, but increasing it makes it get included.

You know that spellcheck.maxCollationTries is too low if it exists in 
"suggestions" but it is not getting suggested in the "collation" section.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Wednesday, December 10, 2014 12:43 PM
To: solr-user@lucene.apache.org
Subject: Fwd: WordBreakSolrSpellChecker Usage

If I have my search component setup like this
https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have an
entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?

This doesn't seem to be the case, but it works for "Blackstone" with "Black
stone". Any ideas on what I might be doing wrong?


Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Tom Burton-West
Thanks Eric,

That is helpful.  We already have a process that works similarly.  Each
thread/process that sends a document to Solr waits until it gets a response
in order to make sure that the document was indexed successfully (we log
errors and retry docs that don't get indexed successfully), however we run
20-100 of these processes, depending on throughput (i.e. we send documents
to Solr for indexing as fast as we can until they start queuing up on the
Solr end.)

Is there a way to use CUSS with XML documents?

ie my second question:
> A related question, is how to use ConcurrentUpdateSolrServer with XML
> documents
>
> I have very large XML documents, and the examples I see all build
documents
> by adding fields in Java code.  Is there an example that actually reads
XML
> files from the file system?

Tom


Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Erick Erickson
I don't think so, it uses SolrInputDocuments and
lists thereof. So if you parse the xml and then
put things in SolrInputDocuments..

Or something like that.

Erick

On Thu, Dec 11, 2014 at 9:43 AM, Tom Burton-West  wrote:
> Thanks Eric,
>
> That is helpful.  We already have a process that works similarly.  Each
> thread/process that sends a document to Solr waits until it gets a response
> in order to make sure that the document was indexed successfully (we log
> errors and retry docs that don't get indexed successfully), however we run
> 20-100 of these processes, depending on throughput (i.e. we send documents
> to Solr for indexing as fast as we can until they start queuing up on the
> Solr end.)
>
> Is there a way to use CUSS with XML documents?
>
> ie my second question:
>> A related question, is how to use ConcurrentUpdateSolrServer with XML
>> documents
>>
>> I have very large XML documents, and the examples I see all build
> documents
>> by adding fields in Java code.  Is there an example that actually reads
> XML
>> files from the file system?
>
> Tom


Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Michael Della Bitta

Tom:

ConcurrentUpdateSolrServer isn't magic or anything. You could pretty 
trivially write something that takes batches of your XML documents, 
combines them into a single update (multiple <doc> tags in the <add> 
section), and sends them up to Solr, and achieve some of the same speed 
benefits.


If you use it, the JavaBin-based serialization in CUSS is lighter as a 
wire format, though: 
http://lucene.apache.org/solr/4_10_2/solr-solrj/org/apache/solr/client/solrj/impl/BinaryRequestWriter.html


The only thing you have to worry about (in both the CUSS and the home-grown 
case) is that a single bad document in a batch fails the whole batch. It's up 
to you to fall back to writing the documents individually so the rest of the 
batch makes it in.


Michael

On 12/11/14 11:04, Erick Erickson wrote:

I don't think so, it uses SolrInputDocuments and
lists thereof. So if you parse the xml and then
put things in SolrInputDocuments..

Or something like that.

Erick

On Thu, Dec 11, 2014 at 9:43 AM, Tom Burton-West  wrote:

Thanks Eric,

That is helpful.  We already have a process that works similarly.  Each
thread/process that sends a document to Solr waits until it gets a response
in order to make sure that the document was indexed successfully (we log
errors and retry docs that don't get indexed successfully), however we run
20-100 of these processes, depending on throughput (i.e. we send documents
to Solr for indexing as fast as we can until they start queuing up on the
Solr end.)

Is there a way to use CUSS with XML documents?

ie my second question:

A related question, is how to use ConcurrentUpdateSolrServer with XML
documents

I have very large XML documents, and the examples I see all build

documents

by adding fields in Java code.  Is there an example that actually reads

XML

files from the file system?

Tom
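
The home-grown batching described above amounts to posting a single <add>
element containing multiple <doc> elements. A minimal sketch of the wire
format (field names are examples):

```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="title_t">first document</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title_t">second document</field>
  </doc>
</add>
```

This is POSTed to /update with an XML content type; as noted above, if any one
<doc> in the batch is bad the whole request fails, hence the fallback to
individual adds.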




Inconsistent doc value across two nodes - very simple test - what's the expected behavior?

2014-12-11 Thread Gili Nachum
I know Solr's CAP properties are CP, but I don't see that happening in a very
basic test - am I doing something wrong?

With two Solr nodes, I index doc1 to both, stop node2, update doc1, stop
node1, start node2, start node1, and I get two different versions of the
doc depending on which replica I query.
I would expect node2 to update itself.
Attaching Solr logs from both nodes.

*Config*
Solr 4.7.2 / Jetty.
SolrCloud on two nodes and 3 ZK nodes, all running on localhost.
single collection: single shard with two replicas.

*Reproducing:*
start node1 9.148.58.114:8983
start node2 9.148.58.114:8984
Cluster state: node1 leader. node2 active.

index value 'A' (id="change me").
query and expect 'A' -> success

Stop node2
Cluster state: node1 leader. node2 gone.
query and expect 'A' -> success

Update document value from 'A'->'B'
query and expect 'B' -> success

Stop node1
then
Start node2.
Cluster state: node1 gone. node2 down.

*104510 [coreZkRegister-1-thread-1] INFO
org.apache.solr.cloud.ShardLeaderElectionContext Waiting until we see more
replicas up for shard shard1: total=2 found=1 timeoutin=5.27665925E14ms*

wait 3m.

*184679 [coreZkRegister-1-thread-1] INFO
org.apache.solr.cloud.ShardLeaderElectionContext  I am the new leader:
http://9.148.58.114:8984/solr/quick-results-collection_shard1_replica2/

shard1*
Cluster state: node1 gone. node2 leader.

query and expect 'A' (old value) -> success

start node1
Cluster state: node1 active. node2 leader.

*Inconsistency: *
*Querying node1 always returns 'B'. *
http://localhost:8983/solr/quick-results-collection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
*Querying node2 always returns 'A'. *
http://localhost:8984/solr/quick-results-collection_shard1_replica2/select?q=*%3A*&wt=json&indent=true


Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Mikhail Khludnev
Agree with Erick.

However, I suppose you can try to provide your own RequestWriter and let
it stream XML. Btw, what's in them? How does Solr handle them right now? Why
don't you want to start with a test?

On Thu, Dec 11, 2014 at 7:04 PM, Erick Erickson 
wrote:

> I don't think so, it uses SolrInputDocuments and
> lists thereof. So if you parse the xml and then
> put things in SolrInputDocuments..
>
> Or something like that.
>
> Erick
>
> On Thu, Dec 11, 2014 at 9:43 AM, Tom Burton-West 
> wrote:
> > Thanks Eric,
> >
> > That is helpful.  We already have a process that works similarly.  Each
> > thread/process that sends a document to Solr waits until it gets a
> response
> > in order to make sure that the document was indexed successfully (we log
> > errors and retry docs that don't get indexed successfully), however we
> run
> > 20-100 of these processes,depending on  throughput (i.e. we send
> documents
> > to Solr for indexing as fast as we can until they start queuing up on the
> > Solr end.)
> >
> > Is there a way to use CUSS with XML documents?
> >
> > ie my second question:
> >> A related question, is how to use ConcurrentUpdateSolrServer with XML
> >> documents
> >>
> >> I have very large XML documents, and the examples I see all build
> > documents
> >> by adding fields in Java code.  Is there an example that actually reads
> > XML
> >> files from the file system?
> >
> > Tom
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Yonik Seeley
On Wed, Dec 10, 2014 at 6:09 PM, Erick Erickson  wrote:
> So CUSS will do something like this:
> 1> assemble a packet for Solr
> 2> pass off the actual transmission
>  to Solr to a thread and immediately
>  go back to <1>.
>
> Basically, CUSS is doing async processing.

The more important part about what it's doing is the *streaming*.
CUSS is like batching documents without waiting for all of the
documents in the batch.
When you add a document, it immediately writes it to a stream where
solr can read it off and index it.  When you add a second document,
it's immediately written to the same stream (or at least one of the
open streams), as part of the same update request.  No separate HTTP
request, no separate update request.

The number of threads parameter for CUSS actually maps to the number
of open connections to Solr (and hence the number of concurrently
streaming update requests).

So to Solr (server side), it looks like a single update request
(assuming 1 thread) with a batch of multiple documents... but it was
never actually "batched" on the client side.

-Yonik


Help with a Join Query

2014-12-11 Thread Darin Amos
Hello,

I am trying to execute a join query that I am not 100% sure how to execute. 
Let's say I have a bunch of parent and child documents, and every one of my 
child documents has a single-value field “color”. 

If I want to search all parents that have a “red” child, this is very easy:

{!join from=parent to=id}color:red

However, if I want to return only parents that have both a red AND a blue item 
it gets tricky. 

This query would return parents that have red OR blue
{!join from=parent to=id}color:red OR color:blue

And this query would return nothing since no child had both colors.
{!join from=parent to=id}color:red AND color:blue

Any suggestions? My thinking is I might require some kind of custom query.

Thanks!

Darin

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Alexandre Rafalovitch
On 11 December 2014 at 11:40, Yonik Seeley  wrote:
> So to Solr (server side), it looks like a single update request
> (assuming 1 thread) with a batch of multiple documents... but it was
> never actually "batched" on the client side.

Does Solr also index them one-by-one as it parses them off the -
chunked - stream? Or does it wait for the end of the "batch"?

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Yonik Seeley
On Thu, Dec 11, 2014 at 11:52 AM, Alexandre Rafalovitch
 wrote:
> On 11 December 2014 at 11:40, Yonik Seeley  wrote:
>> So to Solr (server side), it looks like a single update request
>> (assuming 1 thread) with a batch of multiple documents... but it was
>> never actually "batched" on the client side.
>
> Does Solr also index them one-by-one as it parses them off the -
> chunked - stream?

Yes, indexing is streaming (a document at a time is read off the
stream and then immediately indexed).

-Yonik


Re: WordBreakSolrSpellChecker Usage

2014-12-11 Thread Matt Mongeau
Is there a suggested value for this? I bumped them up to 20 and still
nothing seems to have changed.

On Thu, Dec 11, 2014 at 9:42 AM, Dyer, James 
wrote:

> My first guess here, is seeing it works some of the time but not others,
> is that these values are too low:
>
> spellcheck.count = 5
> spellcheck.maxCollationTries = 5
>
> You know spellcheck.count is too low if the suggestion you want is not in
> the "suggestions" part of the response, but increasing it makes it get
> included.
>
> You know that spellcheck.maxCollationTries is too low if it exists in
> "suggestions" but it is not getting suggested in the "collation" section.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> Sent: Wednesday, December 10, 2014 12:43 PM
> To: solr-user@lucene.apache.org
> Subject: Fwd: WordBreakSolrSpellChecker Usage
>
> If I have my search component setup like this
> https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have an
> entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?
>
> This doesn't seem to be the case, but it works for "Blackstone" with "Black
> stone". Any ideas on what I might be doing wrong?
>


Re: Solr Error when making GeoPrefixTree polygon filter search

2014-12-11 Thread mathaix
Thank you. That was the issue. 
I am running Solr with Jetty. Is there a recommended way to include
those jars in the Jetty configuration?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Error-when-making-GeoPrefixTree-polygon-filter-search-tp4173629p4173807.html
Sent from the Solr - User mailing list archive at Nabble.com.


Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-11 Thread shamik
Hi, 

  I'm trying to use AutoPhrasingTokenFilterFactory, which seems to be a 
great solution to our phrase query issues, but it doesn't seem to work as 
described in the blog: 

https://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

The token filter is working as expected during index time, where it's 
preserving the phrases as a single token based on the text file. Here's my 
field definition: 


On analyzing, I can see the phrase "seat cushions" (defined in 
autophrases.txt) is being indexed as "seat", "seat cushions" and "cushion". 

The problem is during the query time. As per the blog, the request handler 
needs to use a custom query parser to achieve the result. Here's my entry 
in solrconfig. 




velocity
browse
layout
Solritas

explicit
10
text
autophrasingParser




autophrases.txt


But if I query "seat cushions" using this request handler, it seems to 
be treating the query as two separate terms and returning all results 
matching "seat" and "cushion". Not sure what I'm missing here. I'm using 
Solr 4.10. 

The other question I had is whether 
"com.lucidworks.analysis.AutoPhrasingQParserPlugin" supports the edismax 
features; edismax is my default parser. 

I'll appreciate if anyone provide their feedback. 

-Thanks 
Shamik



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808.html
Sent from the Solr - User mailing list archive at Nabble.com.
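
A rough sketch of the kind of configuration the LucidWorks blog post describes
is shown below. Class and parameter names are taken from that post and should
be verified against the jar actually built; field names are examples.

```xml
<!-- schema.xml: apply the filter at index time only -->
<fieldType name="text_autophrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory"
            phrases="autophrases.txt" includeTokens="true" replaceWhitespaceWith="_"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: the query parser that rejoins phrases before parsing -->
<queryParser name="autophrasingParser"
             class="com.lucidworks.analysis.AutoPhrasingQParserPlugin">
  <str name="phrases">autophrases.txt</str>
  <str name="replaceWhitespaceWith">_</str>
</queryParser>
```

One thing worth checking when a phrase query falls apart into single terms is
that the index-side replaceWhitespaceWith setting and the query parser's
setting agree; otherwise the rejoined query token never matches what was
indexed.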


Re: Help with a Join Query

2014-12-11 Thread Kydryavtsev Andrey
How about something like 

({!join from=parent to=id}color:red) AND ({!join from=parent to=id}color:blue) ?

11.12.2014, 19:48, "Darin Amos" :
> Hello,
>
> I am trying to execute a join query that I am not 100% sure how to execute. 
> Lets say I have a bunch of parent and child documents and every one of my 
> child documents has a single value field “color”.
>
> If I want to search all parents that have a “red” child, this is very easy:
>
> {!join from=parent to=id}color:red
>
> However, if I want to return only parents that have both a red AND a blue 
> item it gets tricky.
>
> This query would return parents that have red OR blue
> {!join from=parent to=id}color:red OR color:blue
>
> And this query would return nothing since no child had both colors.
> {!join from=parent to=id}color:red AND color:blue
>
> Any suggestions? My thinking is I might require some kind of custom query.
>
> Thanks!
>
> Darin


RE: WordBreakSolrSpellChecker Usage

2014-12-11 Thread Dyer, James
Matt,

There is no exact number here, but I would think most people would want "count" 
to be maybe 10-20.  Increasing this incurs a very small performance penalty for 
each term it generates suggestions for, but you probably won't notice a 
difference.  For "maxCollationTries", 5 is a reasonable number but you might 
see improved collations if this is also perhaps 10.  With this one, you get a 
much larger performance penalty, but only when it needs to try more combinations 
to return the "maxCollations".  In your case you have this at 5 also, right?  I 
would reduce this to the maximum number of re-written queries your application 
or users is actually going to use.  In a lot of cases, 1 is the right number 
here.  This would improve performance for you in some cases.

Possibly the reason “Rock point” > “Rockpoint” is failing is because you have 
"maxChanges" set to 10.  This tells it you are willing for it to break a word 
into 10 separate parts, or to combine up to 10 adjacent words into 1.  Having 
taken a quick glance at the code, I think what is happening is it is trying 
things like "r ock p oint" and "r o ck p o int", etc and never getting to your 
intended result.  In a typical scenario I would set "maxChanges" to 1-3, and 
often 1 is probably the most appropriate value here.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Thursday, December 11, 2014 11:34 AM
To: solr-user@lucene.apache.org
Subject: Re: WordBreakSolrSpellChecker Usage

Is there a suggested value for this? I bumped them up to 20 and still
nothing seems to have changed.

On Thu, Dec 11, 2014 at 9:42 AM, Dyer, James 
wrote:

> My first guess here, is seeing it works some of the time but not others,
> is that these values are too low:
>
> spellcheck.count = 5
> spellcheck.maxCollationTries = 5
>
> You know spellcheck.count is too low if the suggestion you want is not in
> the "suggestions" part of the response, but increasing it makes it get
> included.
>
> You know that spellcheck.maxCollationTries is too low if it exists in
> "suggestions" but it is not getting suggested in the "collation" section.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> Sent: Wednesday, December 10, 2014 12:43 PM
> To: solr-user@lucene.apache.org
> Subject: Fwd: WordBreakSolrSpellChecker Usage
>
> If I have my search component setup like this
> https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have an
> entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?
>
> This doesn't seem to be the case, but it works for "Blackstone" with "Black
> stone". Any ideas on what I might be doing wrong?
>
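
James's numbers can be sketched as a config fragment; the spellchecker name
and field are examples, and the values reflect the advice above:

```xml
<!-- solrconfig.xml: request handler defaults per the advice above -->
<str name="spellcheck.count">10</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">1</str>

<!-- searchComponent: keep maxChanges small for WordBreakSolrSpellChecker -->
<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">name</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">1</int>
</lst>
```

With maxChanges at 1, "Rock point" can only be combined into one word
("Rockpoint") rather than being split and recombined into many fragments.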


Re: Help with a Join Query

2014-12-11 Thread Darin Amos
Thanks,

That looks like a viable option, I could do something like the following:

q={!join from=parent to=id}
&fq={!join from=parent to=id}color:red
&fq={!join from=parent to=id}color:blue

With all these joins happening like this, what kind of performance concern is 
this? I would guess it would start to cause a lot of work.

Thanks

Darin



> On Dec 11, 2014, at 1:04 PM, Kydryavtsev Andrey  wrote:
> 
> How about something like 
> 
> ({!join from=parent to=id}color:red) AND ({!join from=parent 
> to=id}color:blue) ?
> 
> 11.12.2014, 19:48, "Darin Amos" :
>> Hello,
>> 
>> I am trying to execute a join query that I am not 100% sure how to execute. 
>> Lets say I have a bunch of parent and child documents and every one of my 
>> child documents has a single value field “color”.
>> 
>> If I want to search all parents that have a “red” child, this is very easy:
>> 
>> {!join from=parent to=id}color:red
>> 
>> However, if I want to return only parents that have both a red AND a blue 
>> item it gets tricky.
>> 
>> This query would return parents that have red OR blue
>> {!join from=parent to=id}color:red OR color:blue
>> 
>> And this query would return nothing since no child had both colors.
>> {!join from=parent to=id}color:red AND color:blue
>> 
>> Any suggestions? My thinking is I might require some kind of custom query.
>> 
>> Thanks!
>> 
>> Darin



Re: Help with a Join Query

2014-12-11 Thread Kydryavtsev Andrey


11.12.2014, 21:24, "Darin Amos" :
> Thanks,
>
> That looks like a viable option, I could do something like the following:
>
> q={!join from=parent to=id}
> &fq={!join from=parent to=id}color:red
> &fq={!join from=parent to=id}color:blue
>
> With all these joins happening like this, what kind of performance concern is 
> this? I would guess this would start to cause a lot of work.
>
> Thanks
>
> Darin
>>  On Dec 11, 2014, at 1:04 PM, Kydryavtsev Andrey  wrote:
>>
>>  How about something like
>>
>>  ({!join from=parent to=id}color:red) AND ({!join from=parent 
>> to=id}color:blue) ?
>>
>>  11.12.2014, 19:48, "Darin Amos" :
>>>  Hello,
>>>
>>>  I am trying to execute a join query that I am not 100% sure how to 
>>> execute. Lets say I have a bunch of parent and child documents and every 
>>> one of my child documents has a single value field “color”.
>>>
>>>  If I want to search all parents that have a “red” child, this is very easy:
>>>
>>>  {!join from=parent to=id}color:red
>>>
>>>  However, if I want to return only parents that have both a red AND a blue 
>>> item it gets tricky.
>>>
>>>  This query would return parents that have red OR blue
>>>  {!join from=parent to=id}color:red OR color:blue
>>>
>>>  And this query would return nothing since no child had both colors.
>>>  {!join from=parent to=id}color:red AND color:blue
>>>
>>>  Any suggestions? My thinking is I might require some kind of custom query.
>>>
>>>  Thanks!
>>>
>>>  Darin


Re: Help with a Join Query

2014-12-11 Thread Kydryavtsev Andrey
In my experience, "query time join" has relatively poor performance. 
If you can cache these joins effectively (not too many unique color values in 
requests, and the cache isn't invalidated), it's OK. If not, it may be worth 
trying "block join" instead - 
http://blog.griddynamics.com/2013/09/solr-block-join-support.html

11.12.2014, 21:40, "Kydryavtsev Andrey" :
> 11.12.2014, 21:24, "Darin Amos" :
>>  Thanks,
>>
>>  That looks like a viable option, I could do something like the following:
>>
>>  q={!join from=parent to=id}
>>  &fq={!join from=parent to=id}color:red
>>  &fq={!join from=parent to=id}color:blue
>>
>>  With all these joins happening like this, what kind of performance concern 
>> is this? I would guess this would start to cause a lot of work.
>>
>>  Thanks
>>
>>  Darin
>>>   On Dec 11, 2014, at 1:04 PM, Kydryavtsev Andrey  
>>> wrote:
>>>
>>>   How about something like
>>>
>>>   ({!join from=parent to=id}color:red) AND ({!join from=parent 
>>> to=id}color:blue) ?
>>>
>>>   11.12.2014, 19:48, "Darin Amos" :
>>>> Hello,
>>>>
>>>> I am trying to execute a join query that I am not 100% sure how to
>>>> execute. Let's say I have a bunch of parent and child documents, and every
>>>> one of my child documents has a single-value field “color”.
>>>>
>>>> If I want to search all parents that have a “red” child, this is very
>>>> easy:
>>>>
>>>> {!join from=parent to=id}color:red
>>>>
>>>> However, if I want to return only parents that have both a red AND a
>>>> blue item it gets tricky.
>>>>
>>>> This query would return parents that have red OR blue:
>>>> {!join from=parent to=id}color:red OR color:blue
>>>>
>>>> And this query would return nothing since no child had both colors:
>>>> {!join from=parent to=id}color:red AND color:blue
>>>>
>>>> Any suggestions? My thinking is I might require some kind of custom
>>>> query.
>>>>
>>>> Thanks!
>>>>
>>>> Darin
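
The block-join alternative suggested above would express the same two-color
constraint roughly as follows, assuming the children are indexed nested inside
their parent documents and the parents carry a marker field (field names are
assumptions):

```
q={!parent which="doc_type:parent"}color:red
&fq={!parent which="doc_type:parent"}color:blue
```

Unlike {!join}, {!parent} matches within a single index block, so each clause
constrains to parents having at least one red (respectively blue) child
without the per-request join cost.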


Re: Inconsistent doc value across two nodes - very simple test - what's the expected behavior?

2014-12-11 Thread Shalin Shekhar Mangar
Hi Gili,

Great question!

A write in Solr, by default, is only guaranteed to exist in one place, i.e.
the leader, and the safety valves that we have to preserve these writes are:

1. The leaderVoteWait time for which leader election is suspended until
enough live replicas are available
2. The two-way peer-sync between leader candidate and other replicas

The other safety valve is on the client side with the "min_rf" parameter
introduced by SOLR-5468 in Solr 4.9. If you set this param to 2 while
making the request then Solr will return the number of replicas to which it
could successfully send the update. Then depending on the response you can
make a decision to retry the update at a later time assuming it is
idempotent. This kinda puts the onus of ensuring consistency on the client
side, which is not ideal but better than nothing. See SOLR-5468 for more
discussion on this topic.

In your particular example, none of these safeties are invoked because you
start node2 while node1 was down and node2 goes ahead with leader election
after the wait period. Also since node1 was down during leader election,
peer sync doesn't happen and then node2 becomes the leader.

When node1 comes back online and joins as a replica, it recovers from the
leader using peer-sync (which returns the newest 100 updates) and finds
that there's nothing newer on the leader. However, there are no checks to
make sure that the replica doesn't have a newer update itself, which is why
you end up with the inconsistent replica. If there were a lot of updates on
node2 (more than 100) while node1 was down, in which case peer-sync isn't
applicable, then it would have done a replication recovery and this
inconsistency would have been resolved.

So yeah we have a valid consistency bug such that we have inconsistent
replicas in a steady state. I wonder if the right way is to bump min_rf to
a higher value or peer-sync both ways during replica recovery. I'll need to
think more on this.


On Thu, Dec 11, 2014 at 4:21 PM, Gili Nachum  wrote:

> I know Solr's CAP properties are CP, but I don't see that happening in a very
> basic test - am I doing something wrong?
>
> With two Solr nodes, I index doc1 to both, stop node2, update doc1, stop
> node1, start node2, start node1, and I get two different versions of the
> doc depending on which replica I query.
> I would expect node2 to update itself.
> Attaching Solr logs from both nodes.
>
> *Config*
> Solr 4.7.2 / Jetty.
> SolrCloud on two nodes and 3 ZK nodes, all running on localhost.
> single collection: single shard with two replicas.
>
> *Reproducing:*
> start node1 9.148.58.114:8983
> start node2 9.148.58.114:8984
> Cluster state: node1 leader. node2 active.
>
> index value 'A' (id="change me").
> query and expect 'A' -> success
>
> Stop node2
> Cluster state: node1 leader. node2 gone.
> query and expect 'A' -> success
>
> Update document value from 'A'->'B'
> query and expect 'B' -> success
>
> Stop node1
> then
> Start node2.
> Cluster state: node1 gone. node2 down.
>
> *104510 [coreZkRegister-1-thread-1] INFO
> org.apache.solr.cloud.ShardLeaderElectionContext Waiting until we see more
> replicas up for shard shard1: total=2 found=1 timeoutin=5.27665925E14ms*
>
> wait 3m.
>
> *184679 [coreZkRegister-1-thread-1] INFO
> org.apache.solr.cloud.ShardLeaderElectionContext  I am the new leader:
> http://9.148.58.114:8984/solr/quick-results-collection_shard1_replica2/
> 
> shard1*
> Cluster state: node1 gone. node2 leader.
>
> query and expect 'A' (old value) -> success
>
> start node1
> Cluster state: node1 active. node2 leader.
>
> *Inconsistency: *
> *Querying node1 always returns 'B'. *
>
> http://localhost:8983/solr/quick-results-collection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
> *Querying node2 always returns 'A'. *
>
> http://localhost:8984/solr/quick-results-collection_shard1_replica2/select?q=*%3A*&wt=json&indent=true
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread solr-user
my apologies for the lack of clarity

our internal name for the project to upgrade solr from 4.0 to 4.10.2 is
"helios" and so we named our test folder "heliosearch".  I was not even
aware of the github project Heliosearch, and nothing we are doing is related
to it.

to simplify things for this post, we set things up so that we have one
solr instance but two cores: coreX contains the collection1 files/folders
as per the downloaded solr 4.10.2 package, while coreA uses the same
collection1 files/folders but with schema.xml and solrconfig.xml changes to
meet our needs

so file and foldername-wise, here is what we did:

1. C:\SOLR\solr-4.10.2.zip\solr-4.10.2\example renamed to
C:\SOLR\helios-4.10.2\Master
2. renamed example\solr\collection1 to example\solr\coreX; no files modified
here
3. copied example\solr\coreX to example\solr\coreA
4. modified the coreA schema to match our current production schema; ie our
field names, etc
5. modified the coreA solrconfig.xml to meet our needs (see below)

here are the solrconfig.xml changes we made to coreA

1. 
2. 4
3. false
4. false
5. commented out autoCommit section
6. commented out autoSoftCommit section
7. commented out the  section
8. 4
9. 
10.  contains geocluster
11. commented out these sections:
  
 
  
 
  
  
  
  
  

here are the schema.xml changes we made to our copy of the downloaded solr
4.10.2 package (aside from replacing the example fields provided in the
downloaded solr 4.10.2):

1. 
2. removed the example fields provided in the downloaded solr 4.10.2
3. deleted various field types we don't use in our current schemas
4. added fieldtypes that are in our current solr 4.0 instances
5. added various fieldtypes that are in our current solr 4.0 instances
6. re-added the "text" field as apparently required:

also note that we are using java "1.7.0_67" and jetty-8.1.10.v20130312

all in all, I don't see anything that we have done that would keep the cores
from being discovered.

hope that helps.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173831.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread solr-user
small correction;  coreX (the one with the unmodified schema.xml and
solrconfig.xml) IS seen by solr and appears on the solr admin page, but
coreA (which has our modified schema and solrconfig) is found by solr but is
not shown in the solr admin page:

1494 [main] INFO  org.apache.solr.core.CoresLocator  – Looking for core
definitions underneath C:\SOLR\helios-4.10.2\Master\solr
1502 [main] INFO  org.apache.solr.core.CoresLocator  – Found core coreA in
C:\SOLR\helios-4.10.2\Master\solr\coreA\
1502 [main] INFO  org.apache.solr.core.CoresLocator  – Found core coreX in
C:\SOLR\helios-4.10.2\Master\solr\coreX\
1503 [main] INFO  org.apache.solr.core.CoresLocator  – Found 2 core
definitions





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173832.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Help with a Join Query

2014-12-11 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Maybe you can try using an AND condition in a single join, something like 

q={!join from=parent to=id}(Id:xxx AND (Color:red OR Color:Blue)); I don't 
think this will cause a big performance issue.

Thanks

Ravi

-Original Message-
From: Darin Amos [mailto:dari...@gmail.com] 
Sent: Thursday, December 11, 2014 1:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Help with a Join Query

Thanks,

That looks like a viable option, I could do something like the following:

q={!join from=parent to=id} &fq={!join from=parent 
to=id}color:red &fq={!join from=parent to=id}color:blue

With all these joins happening like this, what kind of performance concern is 
this? I would guess this would start to cause a lot of work.

Thanks

Darin



> On Dec 11, 2014, at 1:04 PM, Kydryavtsev Andrey  wrote:
> 
> How about something like
> 
> ({!join from=parent to=id}color:red) AND ({!join from=parent 
> to=id}color:blue) ?
> 
> 11.12.2014, 19:48, "Darin Amos" :
>> Hello,
>> 
>> I am trying to execute a join query that I am not 100% sure how to execute. 
>> Lets say I have a bunch of parent and child documents and every one of my 
>> child documents has a single value field “color”.
>> 
>> If I want to search all parents that have a "red" child, this is very easy:
>> 
>> {!join from=parent to=id}color:red
>> 
>> However, if I want to return only parents that have both a red AND a blue 
>> item it gets tricky.
>> 
>> This query would return parents that have red OR blue {!join 
>> from=parent to=id}color:red OR color:blue
>> 
>> And this query would return nothing since no child had both colors.
>> {!join from=parent to=id}color:red AND color:blue
>> 
>> Any suggestions? My thinking is I might require some kind of custom query.
>> 
>> Thanks!
>> 
>> Darin
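
The two-filter-query approach discussed above works because each {!join} produces
its own set of parent ids and Solr intersects the filter queries, whereas AND
inside a single join asks for one child that matches both colors at once. A
minimal Python sketch of just the set logic (not Solr code, field names illustrative):

```python
# Toy child documents, each pointing at a parent id.
children = [
    {"id": "c1", "parent": "p1", "color": "red"},
    {"id": "c2", "parent": "p1", "color": "blue"},
    {"id": "c3", "parent": "p2", "color": "red"},
]

def join_parents(child_query):
    """Collect parent ids of children matching the predicate,
    mimicking {!join from=parent to=id}<child_query>."""
    return {c["parent"] for c in children if child_query(c)}

# Single join with AND inside: no individual child is both red and blue.
both_on_one_child = join_parents(
    lambda c: c["color"] == "red" and c["color"] == "blue")

# Two joins (one fq per color), intersected like Solr's filter logic.
red_parents = join_parents(lambda c: c["color"] == "red")
blue_parents = join_parents(lambda c: c["color"] == "blue")
both = red_parents & blue_parents

print(both_on_one_child)  # set()
print(both)               # {'p1'}
```

p1 has a red child and a blue child, so only it survives the intersection.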



Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread Alexandre Rafalovitch
And the XML is valid, lib references in solrconfig.xml point to the
right libraries (if any), you don't have duplicate definitions of
types, you don't have missing definitions of types? And you didn't
disable the admin handler?

And it's not just admin that's failing to find the core, right? If you
use command line to ping it for a basic search, do you get anything?

I am really grasping at straws here. You seem to be very organized
with that and any errors (for the stuff I mentioned above) SHOULD be
clear and visible. I'd start bisecting the problem:
1) Admin and/or command-line problem
2) Does filesystem monitoring during startup (
http://technet.microsoft.com/en-us/sysinternals/bb896645 ) show any
unexpected filesystem access
3) Can you cut your changes in half (e.g. not remove anything) and
still see the problem

Regards,
   Alex.
P.s. When you do figure it out, let us know. Just for the future
troubleshooting generations.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 11 December 2014 at 14:14, solr-user  wrote:
> small correction;  coreX (the one with the unmodified schema.xml and
> solrconfig.xml) IS seen by solr and appears on the solr admin page, but
> coreA (which has our modified schema and solrconfig) is found by solr but is
> not shown in the solr admin page:
>
> 1494 [main] INFO  org.apache.solr.core.CoresLocator  – Looking for core
> definitions underneath C:\SOLR\helios-4.10.2\Master\solr
> 1502 [main] INFO  org.apache.solr.core.CoresLocator  – Found core coreA in
> C:\SOLR\helios-4.10.2\Master\solr\coreA\
> 1502 [main] INFO  org.apache.solr.core.CoresLocator  – Found core coreX in
> C:\SOLR\helios-4.10.2\Master\solr\coreX\
> 1503 [main] INFO  org.apache.solr.core.CoresLocator  – Found 2 core
> definitions
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173832.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread Chris Hostetter
: can you please include the *exact* solrconfig.xml & schema.xml you are 
: using for coreA ... you've given us an overview of what you changed, but 
: that's not enough for anyone to actually try and reproduce your problem.

if it helps (since the list doesn't allow attachments) feel free to open a 
bug in jira, and attach a zip file of your entire solr home dir showing 
the problem.


-Hoss
http://www.lucidworks.com/


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread Chris Hostetter

: coreA (which has our modified schema and solrconfig) is found by solr but is
: not shown in the solr admin page:

can you please include the *exact* solrconfig.xml & schema.xml you are 
using for coreA ... you've given us an overview of what you changed, but 
that's not enough for anyone to actually try and reproduce your problem.

if we can't reproduce it, it's impossible to diagnose it and offer 
suggestions/workarounds/fixes...

https://wiki.apache.org/solr/UsingMailingLists


-Hoss
http://www.lucidworks.com/


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread solr-user
yes, have triple checked the schema and solrconfig XML; various tools have
indicated the XML is valid

no missing types or dupes, and have not disabled the admin handler

as mentioned in my most recent response, I can see the coreX core (the
renamed and unmodified collection1 core from the downloaded package) and
query it with no issues, but coreA (which has our specific schema and
solrconfig changes) is not showing in the admin interface and cannot be
queried (I get a 404)

both cores are located in the same solr folder.

appreciate the suggestions; looks like I will need to gradually move my
schema and core changes towards the collection1 content and see where things
start working; will take a while...sigh

will let you know what I find out.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173839.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread solr-user
Chris, will get the schema and solrconfig ready for uploading.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173840.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Inconsistent doc value across two nodes - very simple test - what's the expected behavior?

2014-12-11 Thread Shalin Shekhar Mangar
I opened https://issues.apache.org/jira/browse/SOLR-6837

Probably best to have further conversations on the Jira issue.

On Thu, Dec 11, 2014 at 6:46 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Hi Gili,
>
> Great question!
>
> A write in Solr, by default, is only guaranteed to exist in 1 place i.e.
> the leader and the safety valves that we have to preserve these writes are:
>
> 1. The leaderVoteWait time for which leader election is suspended until
> enough live replicas are available
> 2. The two-way peer-sync between leader candidate and other replicas
>
> The other safety valve is on the client side with the "min_rf" parameter
> introduced by SOLR-5468 in Solr 4.9. If you set this param to 2 while
> making the request then Solr will return the number of replicas to which it
> could successfully send the update. Then depending on the response you can
> make a decision to retry the update at a later time assuming it is
> idempotent. This kinda puts the onus of ensuring consistency on the client
> side which is not ideal but better than nothing. See SOLR-5468 for more
> discussion on this topic.
>
> In your particular example, none of these safeties are invoked because you
> start node2 while node1 was down and node2 goes ahead with leader election
> after the wait period. Also since node1 was down during leader election,
> peer sync doesn't happen and then node2 becomes the leader.
>
> When node1 comes back online and joins as a replica, it recovers from the
> leader using peer-sync (which returns the newest 100 updates) and finds
> that there's nothing newer on the leader. However, there are no checks to
> make sure that the replica doesn't have a newer update itself which is why
> you end up with the inconsistent replica. If there were a lot of updates on
> node2 (more than 100) while node1 was down, in which case peer-sync isn't
> applicable, then it would have done a replication recovery and this
> inconsistency would have been resolved.
>
> So yeah we have a valid consistency bug such that we have inconsistent
> replicas in a steady state. I wonder if the right way is to bump min_rf to
> a higher value or peer-sync both ways during replica recovery. I'll need to
> think more on this.
>
>
> On Thu, Dec 11, 2014 at 4:21 PM, Gili Nachum  wrote:
>
>> I know Solr CAP properties are CP, but I don't see it happening over a
>> very
>> basic test - doing something wrong?
>>
>> With two Solr nodes, I index doc1 to both, stop node2, update doc1, stop
>> node1, start node2, start node1, and I get two different versions of the
>> doc depending on which replica I query.
>> I would expect node2 to update to itself.
>> Attaching Solr logs from both nodes.
>>
>> *Config*
>> Solr 4.7.2 / Jetty.
>> SolrCloud on two nodes, and 3 ZK, all running on localhost.
>> single collection: single shard with two replicas.
>>
>> *Reproducing:*
>> start node1 9.148.58.114:8983
>> start node2 9.148.58.114:8984
>> Cluster state: node1 leader. node2 active.
>>
>> index value 'A' (id="change me").
>> query and expect 'A' -> success
>>
>> Stop node2
>> Cluster state: node1 leader. node2 gone.
>> query and expect 'A' -> success
>>
>> Update document value from 'A'->'B'
>> query and expect 'B' -> success
>>
>> Stop node1
>> then
>> Start node2.
>> Cluster state: node1 gone. node2 down.
>>
>> *104510 [coreZkRegister-1-thread-1] INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext Waiting until we see more
>> replicas up for shard shard1: total=2 found=1 timeoutin=5.27665925E14ms*
>>
>> wait 3m.
>>
>> *184679 [coreZkRegister-1-thread-1] INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext  I am the new leader:
>> http://9.148.58.114:8984/solr/quick-results-collection_shard1_replica2/
>> 
>> shard1*
>> Cluster state: node1 gone. node2 leader.
>>
>> query and expect 'A' (old value) -> success
>>
>> start node1
>> Cluster state: node1 active. node2 leader.
>>
>> *Inconsistency: *
>> *Querying node1 always returns 'B'. *
>>
>> http://localhost:8983/solr/quick-results-collection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
>> *Querying node2 always returns 'A'. *
>>
>> http://localhost:8984/solr/quick-results-collection_shard1_replica2/select?q=*%3A*&wt=json&indent=true
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.
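
The client-side pattern that SOLR-5468's min_rf parameter enables can be sketched
roughly as follows. Note that solr_update below is a hypothetical stand-in, not a
real client API: a real implementation would POST the document with min_rf set and
read the achieved replication factor ("rf") from the response.

```python
def solr_update(doc, min_rf):
    # Hypothetical stub standing in for a real Solr client call.
    # Here it always reports that only 1 replica acknowledged the write,
    # simulating the degraded cluster discussed in this thread.
    return {"rf": 1}

def update_with_retry(doc, min_rf=2, max_attempts=3):
    """Retry an idempotent update until enough replicas acknowledge it."""
    for attempt in range(max_attempts):
        achieved = solr_update(doc, min_rf)["rf"]
        if achieved >= min_rf:
            return True   # enough replicas acknowledged the write
        # A real client would back off here before retrying.
    return False          # caller should requeue the update or alert

print(update_with_retry({"id": "doc1", "value": "B"}))  # False
```

As Shalin notes, this puts the burden on the client, but it lets the application
detect writes that reached fewer replicas than desired.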


Re: Solr Error when making GeoPrefixTree polygon filter search

2014-12-11 Thread david.w.smi...@gmail.com
As in the layout shipped with Solr?  Try putting the JTS ‘jar’ in lib/ext
and let us know if that worked.  I think it will but I forget.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Dec 11, 2014 at 12:40 PM, mathaix  wrote:
>
> Thank you. That was the issue.
> I am running Solr with Jetty. Is there a recommended way for including
> those jars in the jetty configuration?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Error-when-making-GeoPrefixTree-polygon-filter-search-tp4173629p4173807.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


different fields for user-supplied phrases in edismax

2014-12-11 Thread Michael Sokolov
I'd like to supply a different set of fields for phrases than for bare 
terms.  Specifically, we'd like to treat phrases as more "exact" - 
probably turning off stemming and generally having a tighter analysis 
chain.  Note: this is *not* what's done by configuring "pf" which 
controls fields for the auto-generated phrases.  What we want to do is 
provide our users more precise control by explicit use of " "


Is there a way to do this by configuring edismax?  I don't think there 
is; and if you agree, a follow-up question - if I want to extend the 
EDismax parser, does anybody have advice as to the best way in?  I'm 
looking at:


Query getFieldQuery(String field, String val, int slop)

and altering getAliasedQuery() to accept an aliases parameter, which 
would be a different set of aliases for phrases ...


does that make sense?

-Mike
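
Until something like this is configurable in edismax, one client-side workaround
is to split the raw user query into quoted phrases and bare terms, then target
each group at different fields (stemmed fields for terms, stricter "exact"
fields for phrases). A rough sketch with hypothetical field names:

```python
import re

def route_query(q, term_fields=("body",), phrase_fields=("body_exact",)):
    """Split a raw query into quoted phrases and bare terms, then
    scope each group to its own set of fields."""
    phrases = re.findall(r'"([^"]+)"', q)          # user-supplied phrases
    terms = re.sub(r'"[^"]+"', " ", q).split()     # whatever is left
    parts = [f'{f}:{t}' for t in terms for f in term_fields]
    parts += [f'{f}:"{p}"' for p in phrases for f in phrase_fields]
    return " ".join(parts)

print(route_query('running "exact phrase"'))
# body:running body_exact:"exact phrase"
```

This avoids touching the parser at the cost of doing the query rewriting in the
application, and it does not handle nested quoting or query operators.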


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread Michael Sokolov
Have you rebooted the machine? (last refuge of the clueless, but often 
works) ...


On 12/11/14 2:50 PM, solr-user wrote:

yes, have triple checked the schema and solrconfig XML; various tools have
indicated the XML is valid

no missing types or dupes, and have not disabled the admin handler

as mentioned in my most recent response, I can see the coreX core (the
renamed and unmodified collection1 core from the downloaded package) and
query it with no issues, but coreA (whch has our specific schema and
solrconfig changes) is not showing in the admin interface and cannot be
queried (I get a 404)

both cores are located in the same solr folder.

appreciate the suggestions; looks like I will need to gradually move my
schema and core changes towards the collection1 content and see where things
start working; will take a while...sigh

will let you know what I find out.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173839.html
Sent from the Solr - User mailing list archive at Nabble.com.




Multi Lingual Suggester Solr 4.8

2014-12-11 Thread alaa.abuzaghleh
I am trying to create a suggester handler using Solr 4.8. Everything works fine, but
when I try to get suggestions in a different language (Arabic or Japanese,
for example) I get results in mixed languages: even when I search only
using Japanese, I get Arabic with it too. The following is my schema.xml


[schema.xml omitted: the XML markup was stripped by the mailing-list archive; only the element text "id" survives]

and this is my SolrConfig 






4.8



${solr.core0.data.dir:}






${solr.core0.data.dir:}






true





 
   explicit
   10
   id
   




explicit
edismax
10
full_name,job_tree, company, city, 
state, country,
first_name, last_name, id
full_name_suggest^60 
full_name_ngram^100.0 job_suggest^30
job_ngram^50.0 
full_name_edge^100.0 job_edge^50.0
true
full_name
   

f

Re: different fields for user-supplied phrases in edismax

2014-12-11 Thread Ahmet Arslan
Hi Mike,

If I am not wrong, you are trying to simulate Google's behaviour:
if you use quotes, Google returns exact matches. I think that makes perfect 
sense and would be a valuable addition. I remember some folks have requested 
this behaviour on the list.

Ahmet



On Thursday, December 11, 2014 10:50 PM, Michael Sokolov 
 wrote:
I'd like to supply a different set of fields for phrases than for bare 
terms.  Specifically, we'd like to treat phrases as more "exact" - 
probably turning off stemming and generally having a tighter analysis 
chain.  Note: this is *not* what's done by configuring "pf" which 
controls fields for the auto-generated phrases.  What we want to do is 
provide our users more precise control by explicit use of " "

Is there a way to do this by configuring edismax?  I don't think there 
is, and then if you agree, a followup question - if I want to extend the 
EDismax parser, does anybody have advice as to the best way in?  I'm 
looking at:

Query getFieldQuery(String field, String val, int slop)

and altering getAliasedQuery() to accept an aliases parameter, which 
would be a different set of aliases for phrases ...

does that make sense?

-Mike


Re: different fields for user-supplied phrases in edismax

2014-12-11 Thread alaa.abuzaghleh


explicit
edismax
10
full_name,job_tree, company, city, 
state, country,
first_name, last_name, id
full_name_suggest^60 
full_name_ngram^100.0 job_suggest^30
job_ngram^50.0 
full_name_edge^100.0 job_edge^50.0
true
full_name
   

full_name asc
full_name asc



The configuration above lets me search first by name; if there is no result
it falls back to searching by job. Hopefully this helps you. I would like
to let you know that it does not work well if you have Japanese, Arabic, or
Chinese.

this is the result of a query searching for users whose name is alaa or who
work as a developer
http://localhost:9090/solr/people/suggest?q=alaa%20developer&wt=json&indent=true

{
  "responseHeader":{
"status":0,
"QTime":27,
"params":{
  "indent":"true",
  "q":"alaa developer",
  "wt":"json"}},
  "grouped":{
"full_name":{
  "matches":4,
  "groups":[{
  "groupValue":"alaa",
  "doclist":{"numFound":1,"start":0,"docs":[
  {
"job_tree":"CTO(Chief Technology Officer) ",
"last_name":"Abuzaghleh",
"state":"California",
"country":"United States",
"city":"North Hollywood",
"id":"a2757538-9f16-42d8-907a-199c11787d09",
"company":"letspeer.com",
"full_name":"Alaa Abuzaghleh",
"first_name":"Alaa"}]
  }},
{
  "groupValue":"user1",
  "doclist":{"numFound":1,"start":0,"docs":[
  {
"job_tree":"Web Developer",
"last_name":"user1",
"state":"Amman",
"country":"Jordan",
"city":"Aljameaa",
"id":"78bd8079-666f-4e09-ab4f-aed796040c93",
"company":"BT-AT",
"full_name":"user1 user1",
"first_name":"user1"}]
  }},
{
  "groupValue":"user4",
  "doclist":{"numFound":1,"start":0,"docs":[
  {
"job_tree":"Mobile App Developer",
"last_name":"user4",
"state":"",
"country":"",
"city":"",
"id":"9e50c5b1-49cc-444a-a752-8b8ebe04b6f6",
"company":"Apple ",
"full_name":"user4 user4",
"first_name":"user4"}]
  }},
{
  "groupValue":"z3ra",
  "doclist":{"numFound":1,"start":0,"docs":[
  {
"job_tree":"",
"last_name":"z3ra",
"state":"",
"country":"",
"city":"",
"id":"2a82735d-cce0-400e-826b-b78f6bb56115",
"company":"",
"full_name":"usAlaa z3ra",
"first_name":"usAlaa"}]
  }}]}}}

You can go to the multilingual issue in the same place where you posted your
issue and look at the schema configuration.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/different-fields-for-user-supplied-phrases-in-edismax-tp4173862p4173886.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting integer field

2014-12-11 Thread Pawel
Hi,
Thanks for the response. It is quite important to me, for example, to highlight
a multivalued field with many int or long tokens.

--
Paweł

On Thu, Dec 11, 2014 at 3:08 PM, Tomoko Uchida  wrote:
>
> Hi Pawel,
>
> Essentially, highlighting is a feature to show "fragments of documents"
> that match user queries.
> With that, he/she can find occurrence of their query in long documents and
> can understand their results well.
>
> For tint or tlong fields (or other non-text field types), "fragments"
> usually have no meaning.
>
> So, excuse me, I cannot understand your intent.
> If you specify your need a little bit more, I or other fellows may be able
> to help you.
>
> Regards,
> Tomoko
>
> 2014-12-11 19:12 GMT+09:00 Pawel Rog :
>
> > Hi,
> > Is it possible to highlight int (TrieLongField) or long (TrieLongField)
> > field in Solr?
> >
> > --
> > Paweł
> >
>


Re: Highlighting integer field

2014-12-11 Thread Michael Sokolov
So the short answer to your original question is "no." Highlighting is 
designed to find matches *within* a tokenized (text) field only.  That 
is difficult because text gets processed and there are all sorts of 
complications, but for integers it should be pretty easy to match the 
values in the document and those in the query in the client, i.e. without 
help from Solr?


-Mike

On 12/11/14 6:19 PM, Pawel wrote:

Hi,
Thanks for response. It is quite important to me for example to highlight
multivalued field with many int or long tokens.

--
Paweł

On Thu, Dec 11, 2014 at 3:08 PM, Tomoko Uchida 
wrote:

Hi Pawel,

Essentially, highlighting is a feature to show "fragments of documents"
that match user queries.
With that, he/she can find occurrence of their query in long documents and
can understand their results well.

For tint or tlong fields (or other non-text field types), "fragments"
usually have no meaning.

So, excuse me, I cannot understand your intent.
If you specify your need a little bit more, I or other fellows may be able
to help you.

Regards,
Tomoko

2014-12-11 19:12 GMT+09:00 Pawel Rog :


Hi,
Is it possible to highlight int (TrieLongField) or long (TrieLongField)
field in Solr?

--
Paweł
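
Mike's suggestion - match whole integer values against the query on the client
side, since Solr's highlighter only works on tokenized text - can be sketched in
a few lines. The function name and markup here are illustrative, not a Solr API:

```python
def highlight_ints(field_values, query_values, pre="<em>", post="</em>"):
    """Wrap values of a multivalued int field that appear in the query;
    leave the rest untouched. Returns strings ready for display."""
    hits = set(query_values)
    return [f"{pre}{v}{post}" if v in hits else str(v) for v in field_values]

print(highlight_ints([10, 42, 7, 42], [42]))
# ['10', '<em>42</em>', '7', '<em>42</em>']
```

Since integer values match exactly (no stemming or tokenization), this covers
the multivalued-field case without any server-side highlighting support.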





To understand SolrCloud configurations

2014-12-11 Thread E S J
Hello Team,

I would like to clarify where to place schema.xml in a SolrCloud set-up.

My Solr cloud set-up , 3 nodes, 3 shards and 3 replications, 3 ZooKeeper

What I have done is,
1. Took solr.war from the solr default download (
solr-4.10.2/example/webapps/solr.war - 4.10.2) and placed it in the
 /webapps/ folder.

2. Took the Solr home from the solr default download ( solr-4.10.2/example/solr/)
and used it as solr.home
(copied the collection folder as well, along with solr.xml)

3. Started 3 solr nodes and zookeeper instances (after correct
configuration)

4. Register solr configurations of ZooKeeper using,
zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
-cmd upconfig -confdir /collection1/conf -confname default

5. Create 3 Shard's and 3 Replicas :
http://solr1.internal:7003/solr/admin/collections?action=CREATE&name=c-ins&replicationFactor=3&numShards=3&collection.configName=default&maxShardsPerNode=3&wt=json&indent=2


   After that I can see the following folder structure in Solr node1's
 directory (I can see a similar structure on my other 2 solr nodes):
-rw-r--r-- solr.xml
drwxrwxr-x c-ins_shard1_replica1
drwxrwxr-x c-ins_shard2_replica1
drwxrwxr-x c-ins_shard3_replica1
drwxr-xr-x collection1


I've done some XML document indexing and it's working fine; the ZooKeepers are
also working fine. My questions are:

1. Like to know what I have done is correct ?
2. Where to place the schema.xml's and other configurations. Because for
the moment it's are under collection1/conf folder and collection1 is not an
active collection for me. ( i'm using only c-ins core)


Appreciate your time on this.

Thanks - Elike


Browse interface

2014-12-11 Thread tharpa
Is it possible to boost a query using the browse interface?  How would one do
this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Browse-interface-tp4173897.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: To understand SolrCloud configurations

2014-12-11 Thread Erick Erickson
bq: 1. Like to know what I have done is correct ?
Looks fine to me.

bq: 2. Where to place the schema.xml's and other configurations. Because for
the moment it's are under collection1/conf folder and collection1 is not an
active collection for me. ( i'm using only c-ins core)

I think you're a bit confused here. The configuration stuff is NOT
"under collection1/conf" as far as SolrCloud is concerned, it's in
Zookeeper in /configs/default, take a look at the admin>>cloud page,
click the /configs entry and I think you'll see a "defaults" node.

As far as SolrCloud is concerned, that's where your configs live. The
fact that they exist in /collection1/conf on your local
machine is totally irrelevant. Tomorrow, you could issue an upconfig
and use something like ".-confdir mytotallynewdirectory/conf
-confname default" and SolrCloud would happily overwrite your configs
in the Zookeeper "default" node with the new ones, _and_ distribute
them to all the Solr nodes when they were restarted.

So where your configs "should" live is in some kind of version control..

HTH,
Erick

On Thu, Dec 11, 2014 at 6:19 PM, E S J  wrote:
> Hello Team,
>
> I would like to get clarified where to place schema.xml on SolrCloud set-up.
>
> My Solr cloud set-up , 3 nodes, 3 shards and 3 replications, 3 ZooKeeper
>
> What I have done is,
> 1. Taken a solr.war from solr default download (
> solr-4.10.2/example/webapps/solr.war  -  4.10.2) and placed
>  /webapps/ folder.
>
> 2. Taken Solr home from solr default download ( solr-4.10.2/example/solr/)
> and placed on solr.home
> (Copied Collection folder as well along with solr.xml)
>
> 3. Started 3 solr nodes and zookeepr instances ( after correct
> configuration)
>
> 4. Register solr configurations of ZooKeeper using,
> zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
> -cmd upconfig -confdir /collection1/conf -confname default
>
> 5. Create 3 Shard's and 3 Replicas :
> http://solr1.internal:7003/solr/admin/collections?action=CREATE&name=c-ins&replicationFactor=3&numShards=3&collection.configName=default&maxShardsPerNode=3&wt=json&indent=2
>
>
>After that I can see following folder structure  in Solr node1's
>  directory ( Can see similar structure on my other 2 solr nodes)
> -rw-r--r-- solr.xml drwxrwxr-x c-ins_shard1_replica1 drwxrwxr-x
> c-ins_shard2_replica1 drwxrwxr-x c-ins_shard3_replica1 drwxr-xr-x
> collection1
>
>
> I've done some xml docuemnt indexing and it's working fine, Zoo-keepers are
> also working fine, My Questions are,
>
> 1. Like to know what I have done is correct ?
> 2. Where to place the schema.xml's and other configurations. Because for
> the moment it's are under collection1/conf folder and collection1 is not an
> active collection for me. ( i'm using only c-ins core)
>
>
> Appreciate your time on this.
>
> Thanks - Elike


Re: To understand SolrCloud configurations

2014-12-11 Thread E S J
Thanks Erick, I understand your explanation.
Quick question: do the configurations sit under /configs/default because
-confname was specified as default when I executed the following command? Can
I specify -confname as c-ins?

zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
-cmd upconfig -confdir /collection1/conf -confname default

Also, I noticed that the available options for -confname are default or
schemaless; that is why I specified default.

Thanks,
Elike

On 12 December 2014 at 14:23, Erick Erickson 
wrote:
>
> bq: 1. Like to know what I have done is correct ?
> Looks fine to me.
>
> bq: 2. Where to place the schema.xml's and other configurations. Because
> for
> the moment it's are under collection1/conf folder and collection1 is not an
> active collection for me. ( i'm using only c-ins core)
>
> I think you're a bit confused here. The configuration stuff is NOT
> "under collection1/conf" as far as SolrCloud is concerned, it's in
> Zookeeper in /configs/default, take a look at the admin>>cloud page,
> click the /configs entry and I think you'll see a "defaults" node.
>
> As far as SolrCloud is concerned, that's where your configs live. The
> fact that they exist in /collection1/conf on your local
> machine is totally irrelevant. Tomorrow, you could issue an upconfig
> and use something like "-confdir mytotallynewdirectory/conf
> -confname default" and SolrCloud would happily overwrite your configs
> in the Zookeeper "default" node with the new ones, _and_ distribute
> them to all the Solr nodes when they were restarted.
>
> So where your configs "should" live is in some kind of version
> control.
>
> HTH,
> Erick
>
> On Thu, Dec 11, 2014 at 6:19 PM, E S J  wrote:
> > Hello Team,
> >
> > I would like to get clarified where to place schema.xml on SolrCloud
> set-up.
> >
> > My Solr cloud set-up , 3 nodes, 3 shards and 3 replications, 3 ZooKeeper
> >
> > What I have done is,
> > 1. Taken a solr.war from solr default download (
> > solr-4.10.2/example/webapps/solr.war  -  4.10.2) and placed
> >  /webapps/ folder.
> >
> > 2. Taken Solr home from solr default download (
> solr-4.10.2/example/solr/)
> > and placed on solr.home
> > (Copied Collection folder as well along with solr.xml)
> >
> > 3. Started 3 Solr nodes and ZooKeeper instances (after correct
> > configuration)
> >
> > 4. Register solr configurations of ZooKeeper using,
> > zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
> > -cmd upconfig -confdir /collection1/conf -confname default
> >
> > 5. Create 3 Shard's and 3 Replicas :
> >
> http://solr1.internal:7003/solr/admin/collections?action=CREATE&name=c-ins&replicationFactor=3&numShards=3&collection.configName=default&maxShardsPerNode=3&wt=json&indent=2
> >
> >
> >After that I can see following folder structure  in Solr node1's
> >  directory ( Can see similar structure on my other 2 solr
> nodes)
> > -rw-r--r-- solr.xml drwxrwxr-x c-ins_shard1_replica1 drwxrwxr-x
> > c-ins_shard2_replica1 drwxrwxr-x c-ins_shard3_replica1 drwxr-xr-x
> > collection1
> >
> >
> > I've done some XML document indexing and it's working fine; the ZooKeepers
> > are also working fine. My questions are:
> >
> > 1. Like to know what I have done is correct ?
> > 2. Where should the schema.xml and other configurations be placed? Because
> > for the moment they are under the collection1/conf folder, and collection1
> > is not an active collection for me. (I'm using only the c-ins core.)
> >
> >
> > Appreciate your time on this.
> >
> > Thanks - Elike
>


Re: Details on why ConcurrentUpdateSolrServer is recommended for maximum index performance

2014-12-11 Thread Shawn Heisey
On 12/11/2014 9:19 AM, Michael Della Bitta wrote:
> Only thing you have to worry about (in both the CUSS and the home grown
> case) is a single bad document in a batch fails the whole batch. It's up
> to you to fall back to writing them individually so the rest of the
> batch makes it in.

With CUSS, your program will never know that the batch failed, so your
code won't know that it must retry documents individually.  All requests
return with an apparent success even before the data is sent to Solr,
and there's no way for exceptions thrown during the background indexing
to be caught by user code.

If your program must know whether your updates were indexed successfully
by catching an exception when there's a problem, you'll need to write
your own multi-threaded indexing application using an instance of
HttpSolrServer.
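As a rough sketch of the batch-with-individual-fallback pattern Michael described (language-agnostic pseudocode in Python; the `send` callable is a hypothetical stand-in for a real client call such as HttpSolrServer.add, not an actual SolrJ API):

```python
def index_with_fallback(docs, send, batch_size=100):
    """Send docs in batches; when a batch fails, retry its documents
    one at a time so a single bad document does not sink the whole batch.

    `send` is a hypothetical stand-in for a real client call (e.g. an
    HTTP add request) and is expected to raise an exception on failure.
    Returns the documents that could not be indexed at all.
    """
    failed = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        try:
            send(batch)
        except Exception:
            # The batch failed as a whole; fall back to individual adds
            # to isolate the bad document(s) and index the rest.
            for doc in batch:
                try:
                    send([doc])
                except Exception:
                    failed.append(doc)
    return failed
```

The point is that the fallback logic must live in your own code: with a blocking client you see the exception and can retry, whereas CUSS swallows it in a background thread.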

I filed an issue on this and built an imperfect patch.  The patch can
only tell you that there was a problem during indexing; it doesn't know
which document or even which batch had the problem.

https://issues.apache.org/jira/browse/SOLR-3284

Thanks,
Shawn



Re: To understand SolrCloud configurations

2014-12-11 Thread Shawn Heisey
On 12/11/2014 6:31 PM, E S J wrote:
> Thanks Eric, I understand your explanation.
> Quick question, Are configurations sits under /configs/defaults because
> -configname specified as default when I execute the following command? Can
> I specify -configname as /c-ins/
> 
> zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
> -cmd upconfig -confdir /collection1/conf -confname default
> 
> Also I noticed that available options for -configname is default or
> schemaless, that is why I specified as default.

The confname can be anything you want it to be.  You should not include
any slash characters in it, though ... make it c-ins, not /c-ins/.
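For example (reusing the ZooKeeper hosts and config directory from your earlier command; the CREATE call mirrors the syntax used earlier in this thread):

```shell
# Upload the config set under the name "c-ins" (no slashes in the name)
zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183 \
  -cmd upconfig -confdir /collection1/conf -confname c-ins

# Reference that config set by name when creating the collection
curl "http://solr1.internal:7003/solr/admin/collections?action=CREATE&name=c-ins&numShards=3&replicationFactor=3&maxShardsPerNode=3&collection.configName=c-ins"
```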

Where do you see information telling you it can be default or
schemaless?  That sounds completely wrong to me, so I'd like to know
what needs to be fixed.

Here's part of what zkcli itself says if you run it with no options:

 -n,--confname  for upconfig, linkconfig: name of the config set

Thanks,
Shawn



Re: To understand SolrCloud configurations

2014-12-11 Thread E S J
Thanks. I thought the only options were default or schemaless because when we
run bin/solr -e cloud, we get a prompt like this:

To begin, how many Solr nodes would you like to run in your local cluster?
(specify 1-4 nodes) [2] 3
Ok, let's start up 3 Solr nodes for your example SolrCloud cluster.

Please enter the port for node1 [8983]
8983
Please enter the port for node2 [7574]
7574
Please enter the port for node3 [8984]
8984
Cloning /home/j2ee/solr-4.10.2/example into /home/j2ee/solr-4.10.2/node1
Cloning /home/j2ee/solr-4.10.2/example into /home/j2ee/solr-4.10.2/node2
Cloning /home/j2ee/solr-4.10.2/example into /home/j2ee/solr-4.10.2/node3

Starting up SolrCloud node1 on port 8983 using command:

solr start -cloud -d node1 -p 8983


Waiting to see Solr listening on port 8983 [\]
Started Solr server on port 8983 (pid=29712). Happy searching!


Starting node2 on port 7574 using command:

solr start -cloud -d node2 -p 7574 -z localhost:9983


Waiting to see Solr listening on port 7574 [\]
Started Solr server on port 7574 (pid=29935). Happy searching!


Starting node3 on port 8984 using command:

solr start -cloud -d node3 -p 8984 -z localhost:9983


Waiting to see Solr listening on port 8984 [\]
Started Solr server on port 8984 (pid=30559). Happy searching!

Now let's create a new collection for indexing documents in your 3-node
cluster.

Please provide a name for your new collection: [gettingstarted]
gettingstarted
How many shards would you like to split gettingstarted into? [2] 3
3
How many replicas per shard would you like to create? [2] 3
3
*Please choose a configuration for the gettingstarted collection, available
options are: default or schemaless [default]*



On 12 December 2014 at 16:03, Shawn Heisey  wrote:
>
> On 12/11/2014 6:31 PM, E S J wrote:
> > Thanks Eric, I understand your explanation.
> > Quick question, Are configurations sits under /configs/defaults because
> > -configname specified as default when I execute the following command?
> Can
> > I specify -configname as /c-ins/
> >
> > zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
> > -cmd upconfig -confdir /collection1/conf -confname default
> >
> > Also I noticed that available options for -configname is default or
> > schemaless, that is why I specified as default.
>
> The confname can be anything you want it to be.  You should not include
> any slash characters in it, though ... make it c-ins, not /c-ins/.
>
> Where do you see information telling you it can be default or
> schemaless?  That sounds completely wrong to me, so I'd like to know
> what needs to be fixed.
>
> Here's part of what zkcli itself says if you run it with no options:
>
>  -n,--confname  for upconfig, linkconfig: name of the config set
>
> Thanks,
> Shawn
>
>


Re: To understand SolrCloud configurations

2014-12-11 Thread Shawn Heisey
On 12/11/2014 8:09 PM, E S J wrote:
> Thanks, I thought only option is default or schemaless because , When we
> run bin/solr -e cloud you will get prompt like ,



> *Please choose a configuration for the gettingstarted collection, available
> options are: default or schemaless [default]*

I have almost zero experience with the bin/solr script. It's very very
new and has undergone quite a lot of change in the upcoming 5.0 version
... so I'm waiting for the dust to settle before I try to understand it
and make suggestions about how to improve it.  It doesn't even exist in
the Solr versions that I use.

The "cloud" example for the bin/solr script puts everything on one
physical node.  When you do this for real, you are going to want each of
those nodes to be physically separate machines ... hopefully the
bin/solr script will be able to easily accommodate a production
SolrCloud installation.

Thanks,
Shawn



Documents with SOLR function "sort" are NOT sorted by score

2014-12-11 Thread eakarsu

I am having difficulty with my sort function. With the following sort, the
documents are not sorted by score, as you can see. Why is the sort function
not able to sort them properly?
I would appreciate a prompt answer.


This is my sort function.

sort=map(and(termfreq(CustomersFavourite,852708),exists($exactqq)),1,1,1,0)
desc,map(and(termfreq(CustomersPurchased,852708),exists($exactqq)),1,1,1,0)
desc,map(exists($exactqq),1,1,NumberOfClicks,0)
desc,map(exists($exactqq),1,1,Amount,0)
desc,map(and(termfreq(InPromotion_925,true),exists($exactqq)),1,1,1,0)
desc,map(exists($exactqq),1,1,OrderCount,0) desc
exactqq={!edismax}(ProductModelNameExact:xyz OR ProductModelName_TR:xyz)

But as you can see, the results are not sorted by score:

"response": {
"numFound": 139,
"start": 0,
"maxScore": 0.6251737,
"docs": [
  {
"score": 0.28109676
  },
  {
"score": 0.25829598
  },
  {
"score": 0.36092186
  },
  {
"score": 0.6251737
  },
  {
"score": 0.1379621
  },
  {
"score": 0.14090014
  },
  {
"score": 0.1379621
  },
  {
"score": 0.14090014
  },
  {
"score": 0.50190175
  },
  {
"score": 0.1379621
  },
  {
"score": 0.12398934
  },
  {
"score": 0.1379621
  },
  {
"score": 0.12398934
  },
  {
"score": 0.4989637
  },
  {
"score": 0.12585841
  },
  {
"score": 0.12585841
  },



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Documents-with-SOLR-function-sort-are-NOT-sorted-by-score-tp4173928.html
Sent from the Solr - User mailing list archive at Nabble.com.


Join in SOLR

2014-12-11 Thread Rajesh
I'm using Solr 4.10. While importing through DIH, I've configured 3 separate
entities. I'm facing some problems with indexing and retrieval.

1) How can I set the unique key, as the 3 entities will have different
fields?
2) Is there a join query with which I can join all 3 tables?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-in-SOLR-tp4173930.html
Sent from the Solr - User mailing list archive at Nabble.com.


[Help] tab-delimited gz file indexing steps

2014-12-11 Thread Sithik
Team,
I have a compressed text file (gz) which holds tab-delimited data. Is it
possible for me to index this file directly, without doing any preprocessing
to uncompress the file on my own? If so, can you please tell me the
steps/config changes I am supposed to follow?

BTW, I am using Solr 4.10.

Thanks in advance

-Sithik


Re: Join in SOLR

2014-12-11 Thread Tomoko Uchida
Hi,

I cannot guess what 'entities' means in your context, but do you want some
kind of RDB-like join functionality on Solr?
Basically, Solr is not "relational". So first, you should consider
denormalizing your RDB tables into one table/view (or issuing a SQL JOIN query
in DIH) to import the data into Solr.
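As a concrete sketch of that denormalize-with-DIH approach (the data source URL, table, and column names below are hypothetical, and your Solr schema must define the target fields):

```xml
<!-- data-config.xml: one denormalized entity instead of three separate ones -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="product"
            query="SELECT p.id, p.name, c.label AS category, s.qty AS stock
                   FROM product p
                   JOIN category c ON c.id = p.category_id
                   JOIN stock s ON s.product_id = p.id">
      <field column="id"       name="id"/>
      <field column="name"     name="name"/>
      <field column="category" name="category"/>
      <field column="stock"    name="stock"/>
    </entity>
  </document>
</dataConfig>
```

With the join done in SQL, each Solr document carries the fields of all three tables and a single uniqueKey (here the product id).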

If you *really* need an RDB-like join on Solr, you could look at Solr's
join feature. (That makes your system more complicated.)
https://wiki.apache.org/solr/Join

Regards,
Tomoko

2014-12-12 14:36 GMT+09:00 Rajesh :
>
> I'm using Solr 4.10. While importing through DIH, I've configured 3
> separate
> entities. I'm facing some problems for indexing and retrieval.
>
> 1) How can I give the unique key, as the 3 entities will have different
> fields.
> 2) Is there a join query, from which I can join all the 3 tables.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Join-in-SOLR-tp4173930.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>