Re: Design optimal Solr Schema

2014-12-11 Thread tomas.kalas
Thanks for the help, but as Alex wrote, I used the synonym filter and it does
what I want. For example, when I add the synonym entry Hello, Hi and the
sentence is "Hello how are you", a query for "Hi how are you" finds it too.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4173690.html
Sent from the Solr - User mailing list archive at Nabble.com.


Alternative synonymum

2014-12-11 Thread tomas.kalas
Hello, I want to search within transcripts of phone conversations. The
machine that transcribes the conversations to text produces alternatives for
some words. For example, take the sentence:
Hello how are you. 

1. Segment 
Hello  
Halo
Hollow

2. Segment
How
Bow


When I want to search for, for example, "Halo how are you", I use the
synonym filter.

For Hello, I set the alternatives Halo, Hollow, ...

It works, but if the same word appears in a later segment with different
alternatives, for example How, Know, and I add those to the synonym filter
on a new line too, then the word How ends up with all the alternatives (How,
Know, Bow), and if I search for "Hello Know" it also finds sentences whose
alternatives do not actually contain Know.

In this case it finds the example sentence "Hello how are you". That
sentence has the alternative bow for the word how, but the value know from
the other segment's alternatives gets applied to it as well.

Is it possible to handle this case per segment? I know which specific words
are in segment 1 and could use those in the synonyms, and the same word at a
later position would belong to a different segment number.

Thanks, I hope you understand what I mean.
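To make the collision concrete, here is a sketch of a flat synonyms.txt for this example (the word lists are only illustrative):

```
# segment 1 alternatives
Hello, Halo, Hollow
# segment 2 alternatives for How
How, Bow
# the same word How with other alternatives, from another segment
How, Know
```

At analysis time the two How lines are merged into one equivalence class (How, Bow, Know), so a query for Know also matches a document whose transcript only offered Bow; the synonym filter has no notion of segment numbers.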




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Alternative-synonymum-tp4173694.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Length norm not functioning in solr queries.

2014-12-11 Thread S.L
Mikhail,

Thank you for confirming this; however, Ahmet's proposal seems simpler for
me to implement.

On Wed, Dec 10, 2014 at 5:07 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:
>
> S.L,
>
> I briefly skimmed Lucene50NormsConsumer.writeNormsField(); my conclusion
> is: if you supply your own similarity, one that avoids squashing the float
> into a byte in Similarity.computeNorm(FieldInvertState), you get exactly
> this value back from Similarity.decodeNormValue(long).
> You may be surprised, but this is exactly what is done in
> PreciseDefaultSimilarity in TestLongNormValueSource. I think you can just
> use it.
>
> On Wed, Dec 10, 2014 at 12:11 PM, S.L  wrote:
>
> > Hi Ahmet,
> >
> > Is there already an implementation of the suggested work around ? Thanks.
> >
> > On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> > wrote:
> >
> > > Hi,
> > >
> > > The default length norm is not the best option for differentiating very
> > > short documents, like product names.
> > > Please see:
> > > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> > >
> > > I suggest you create an additional integer field that holds the number
> > > of tokens. You can populate it via an update processor and then penalise
> > > (using function queries) according to that field. This way you have more
> > > fine-grained and flexible control over it.
> > >
> > > Ahmet
> > >
> > >
> > >
> > > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > > wrote:
> > > Hi ,
> > >
> > > Mikhail, thanks. I looked at the explain output and this is what I see
> > > for the two different documents in question: they have identical scores
> > > even though document 2 has a shorter productName field. I do not see any
> > > lengthNorm-related information in the explain output.
> > >
> > > Also, I am not exactly clear on what needs to be looked at in the API.
> > >
> > > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > > productName&ps=1&pf2= productName&pf3=
> > > productName&stopwords=true&lowercaseOperators=true
> > >
> > > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > > Unlocked *
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > > mkhlud...@griddynamics.com> wrote:
> > >
> > > > It's worth looking into  to check particular scoring values. But the
> > > > most likely suspect is the loss of precision when float norms are
> > > > stored in byte values. See the javadoc for
> > > > DefaultSimilarity.encodeNormValue(float).
> > > >
> > > >
> > > > On Mon, Dec 8, 2014 at 5:49 PM, S.L 
> wrote:
> > > >
> > > > > I have two documents doc1 and doc2 and each one of those has a
> field
> > > > called
> > > > > phoneName.
> > > > >
> > > > > doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White
> > (Verizon)
> > > > > Smartphone Factory Unlocked"
> > > > >
> > > > > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > > > >
> > > > > Here if I search for
> > > > >
> > > > >
> > > >
> > >
> >
> q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true
> > > > >
> > > > > Doc1 and Doc2 both have the same score, but since the field
> > > > > phoneName in doc2 is shorter I would expect it to score higher;
> > > > > instead both have an identical score of 9.961212.
> > > > >
> > > > > The phoneName field is defined as follows. As you can see, nowhere
> > > > > am I specifying omitNorms=true, yet the behavior seems to be that
> > > > > the length norm is not functioning at all. Can someone let me know
> > > > > what the issue is here?
> > > > >
> > > > >  > indexed="true"
> > > > > stored="true" required="tr

Highlighting integer field

2014-12-11 Thread Pawel Rog
Hi,
Is it possible to highlight an int (TrieIntField) or long (TrieLongField)
field in Solr?

--
Paweł


Re: Length norm not functioning in solr queries.

2014-12-11 Thread S.L
Ahmet,

Thank you. As the configurations in SolrCloud are uploaded to ZooKeeper,
are there any special steps that need to be taken to make this work in
SolrCloud?

On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan 
wrote:
>
> Hi,
>
> Or even better, you can use your new field for tie break purposes. Where
> scores are identical.
> e.g. sort=score desc, wordCount asc
>
> Ahmet
>
>
> On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan 
> wrote:
> Hi,
>
> You mean update processor factory?
>
> Here is augmented (wordCount field added) version of your example :
>
> doc1:
>
> phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> Smartphone Factory Unlocked"
> wordCount: 11
>
> doc2:
>
> phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> wordCount: 9
>
>
> The first task is simply calculating the wordCount values. You can do it
> in your indexing code, or elsewhere.
> I quickly skimmed the existing update processors but couldn't find a stock
> implementation.
> CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is
> all about multivalued fields.
>
> I guess a simple JavaScript that splits on whitespace and returns the
> resulting array size would do the trick:
> StatelessScriptUpdateProcessorFactory
>
>
> At this point you have an int field named wordCount.
> boost=div(1,wordCount) should work, or you can come up with a more
> sophisticated formula.
>
> Ahmet
>
>
> On Wednesday, December 10, 2014 11:12 AM, S.L 
> wrote:
> Hi Ahmet,
>
> Is there already an implementation of the suggested work around ? Thanks.
>
>
> On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > The default length norm is not the best option for differentiating very
> > short documents, like product names.
> > Please see:
> > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> >
> > I suggest you create an additional integer field that holds the number of
> > tokens. You can populate it via an update processor and then penalise
> > (using function queries) according to that field. This way you have more
> > fine-grained and flexible control over it.
> >
> > Ahmet
> >
> >
> >
> > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > wrote:
> > Hi ,
> >
> > Mikhail, thanks. I looked at the explain output and this is what I see
> > for the two different documents in question: they have identical scores
> > even though document 2 has a shorter productName field. I do not see any
> > lengthNorm-related information in the explain output.
> >
> > Also, I am not exactly clear on what needs to be looked at in the API.
> >
> > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > productName&ps=1&pf2= productName&pf3=
> > productName&stopwords=true&lowercaseOperators=true
> >
> > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > Unlocked *
> >
> >
> >- *100%* 10.649221 sum of the following:
> >   - *10.58%* 1.1270299 sum of the following:
> >  - *2.1%* 0.22383358 productName:iphon
> >  - *3.47%* 0.36922288 productName:"4 s"
> >  - *5.01%* 0.53397346 productName:"16 gb"
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >   - *27.79%* 2.959255 sum of the following:
> >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >
> >
> > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> >
> >
> >- *100%* 10.649221 sum of the following:
> >   - *10.58%* 1.1270299 sum of the following:
> >  - *2.1%* 0.22383358 productName:iphon
> >  - *3.47%* 0.36922288 productName:"4 s"
> >  - *5.01%* 0.53397346 productName:"16 gb"
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >   - *27.79%* 2.959255 sum of the following:
> >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >
> >
> >
> >
> >
> > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > It's worth looking into  to check particular scoring values. But the
> > > most likely suspect is the loss of precision when float norms are
> > > stored in byte values. See the javadoc for
> > > DefaultSimilarity.encodeNormValue(float).
> > >
> > >
> > > On Mon, Dec 8, 2014 at 5:49 PM, S.L  wrote:
> > >
> > > > I have two documents doc1 and doc2 and each one of those has a field
> > > called
> > > > phoneName.
> > > >
> > > > doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White
> (Verizon)
> > > > Smartphone Factory Unlocked"
> > > >
> > > > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > > >
> > > > Here if I search for
> > > >
> > > >
> > >
> >
> q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true
> > > >
> > > > Doc1 and Do

Re: Priority in search an synonyms

2014-12-11 Thread Antoine REBOUL
Hello,

First of all thank you for your answers !

In my schema.xml file:
- I created this field :

 


- this field is populated via a "copyField":



I wonder if the following statement is required :
ebc_libelle

I test my results with the following settings :
http://IP:8983/solr/select/?qf=tmp_libelle
^75%20ebc_libelle^5&pf=ebc_libelle&q=Castorama&start=0&rows=100&indent=on&defType=edismax&sort=score%20asc

The problem I have now is that the synonyms declared for the ebc_libelle
field do not show up.


The field ebc_libelle is analyzed/indexed as follows :
   





















  




Best Regards.

*Antoine Reboul*
Responsable Comparateurs / Plateforme emailing
Plebicom -  eBuyClub - Cashstore - Checkdeal

PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
Tel  : 04 72 85 81 49
Fax : 04 78 83 39 74

2014-12-10 16:40 GMT+01:00 Alexandre Rafalovitch :

> This might be written just for you:
>
> http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/
>
> Merchant would be same as title = short text
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 10 December 2014 at 10:00, Antoine REBOUL
>  wrote:
> > hello,
> >
> > I have a question , I do not know if there is a solution ...
> >
> > I index and search a field named "Libel".
> > I use a synonyms file.
> >
> > I have, for example, the following line in my synonyms file:
> > "ipad => Apple, Priceminister, Amazon"
> >
> > Searching for iPad gives me Apple, Priceminister and Amazon (the expected
> > result).
> > But when I search for "Apple", I want the merchant Apple to be returned
> > first.
> > This is not the case; in fact, it is Amazon who gets first place.
> >
> > Sorry for my poor English , I'm using a translator.
> >
> > Best Regards.
> >
> > *Antoine Reboul*
> > Responsable Comparateurs / Plateforme emailing
> > Plebicom -  eBuyClub - Cashstore - Checkdeal
> >
> > PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
> > Tel  : 04 72 85 81 49
> > Fax : 04 78 83 39 74
>


Histogram Facet and Aggregation Solr

2014-12-11 Thread Ankit Jain
Hi All,

We have a use case where we want to build a histogram over 10-minute time
periods and then, within each 10-minute time frame, facet on some field.

We are currently using Solr version 4.7.2.

Please suggest how we can nest a facet within a histogram.

-- 
Thanks,
Ankit Jain


Re: Priority in search an synonyms

2014-12-11 Thread Ahmet Arslan
Hi Antoine,

By saying "The problem I have now is that ebc_libelle synonyms reported for the
field are not show", do you mean that you have synonym entries for the word
Castorama, and that documents containing those synonym entries do not show up
in the first 100 documents?

If yes, play with the boost values (5 versus 75) and tweak them until you have
a satisfactorily diverse result set.

By the way, I think filling the first/initial result set (whenever possible)
with exact matches is a good thing.
I believe the user types her query for a reason. If exact-match documents are
too few, then other techniques (stemming, synonyms, etc.) should kick in.
Please note that this approach makes sense for search applications where
precision is more valuable than recall.

Ahmet
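
As a sketch of that boosting idea (the field names here are assumptions, not
Antoine's actual schema), one common setup copies the label into an exact
field analyzed without synonyms and a synonym-expanded field, then weights
them differently in edismax:

```
q=Apple
defType=edismax
qf=libelle_exact^75 libelle_syn^5
```

Documents that match only via synonym expansion then rank below documents
containing the literal word Apple.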





On Thursday, December 11, 2014 12:20 PM, Antoine REBOUL 
 wrote:
Hello,

First of all thank you for your answers !

In my schema.xml file:
- I created this field :

 


- this field is populated via a "copyField":



I wonder if the following statement is required :
ebc_libelle

I test my results with the following settings :
http://IP:8983/solr/select/?qf=tmp_libelle
^75%20ebc_libelle^5&pf=ebc_libelle&q=Castorama&start=0&rows=100&indent=on&defType=edismax&sort=score%20asc

The problem I have now is that the synonyms declared for the ebc_libelle
field do not show up.


The field ebc_libelle is analyzed/indexed as follows :
   





















  




Best Regards.

*Antoine Reboul*
Responsable Comparateurs / Plateforme emailing
Plebicom -  eBuyClub - Cashstore - Checkdeal

PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
Tel  : 04 72 85 81 49
Fax : 04 78 83 39 74


2014-12-10 16:40 GMT+01:00 Alexandre Rafalovitch :

> This might be written just for you:
>
> http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/
>
> Merchant would be same as title = short text
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 10 December 2014 at 10:00, Antoine REBOUL
>  wrote:
> > hello,
> >
> > I have a question , I do not know if there is a solution ...
> >
> > I index and search a field named "Libel".
> > I use a synonyms file.
> >
> > I have, for example, the following line in my synonyms file:
> > "ipad => Apple, Priceminister, Amazon"
> >
> > Searching for iPad gives me Apple, Priceminister and Amazon (the expected
> > result).
> > But when I search for "Apple", I want the merchant Apple to be returned
> > first.
> > This is not the case; in fact, it is Amazon who gets first place.
> >
> > Sorry for my poor English , I'm using a translator.
> >
> > Best Regards.
> >
> > *Antoine Reboul*
> > Responsable Comparateurs / Plateforme emailing
> > Plebicom -  eBuyClub - Cashstore - Checkdeal
> >
> > PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
> > Tel  : 04 72 85 81 49
> > Fax : 04 78 83 39 74
>


Re: Length norm not functioning in solr queries.

2014-12-11 Thread Ahmet Arslan
Hi,

No special steps need to be taken for a cloud setup. Please note that for
both solutions, re-indexing is mandatory.

Ahmet



On Thursday, December 11, 2014 12:15 PM, S.L  wrote:
Ahmet,

Thank you. As the configurations in SolrCloud are uploaded to ZooKeeper,
are there any special steps that need to be taken to make this work in
SolrCloud?


On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan 
wrote:
>
> Hi,
>
> Or even better, you can use your new field for tie break purposes. Where
> scores are identical.
> e.g. sort=score desc, wordCount asc
>
> Ahmet
>
>
> On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan 
> wrote:
> Hi,
>
> You mean update processor factory?
>
> Here is augmented (wordCount field added) version of your example :
>
> doc1:
>
> phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> Smartphone Factory Unlocked"
> wordCount: 11
>
> doc2:
>
> phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> wordCount: 9
>
>
> The first task is simply calculating the wordCount values. You can do it
> in your indexing code, or elsewhere.
> I quickly skimmed the existing update processors but couldn't find a stock
> implementation.
> CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is
> all about multivalued fields.
>
> I guess a simple JavaScript that splits on whitespace and returns the
> resulting array size would do the trick:
> StatelessScriptUpdateProcessorFactory
>
>
> At this point you have an int field named wordCount.
> boost=div(1,wordCount) should work, or you can come up with a more
> sophisticated formula.
>
> Ahmet
>
>
> On Wednesday, December 10, 2014 11:12 AM, S.L 
> wrote:
> Hi Ahmet,
>
> Is there already an implementation of the suggested work around ? Thanks.
>
>
> On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > The default length norm is not the best option for differentiating very
> > short documents, like product names.
> > Please see:
> > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> >
> > I suggest you create an additional integer field that holds the number of
> > tokens. You can populate it via an update processor and then penalise
> > (using function queries) according to that field. This way you have more
> > fine-grained and flexible control over it.
> >
> > Ahmet
> >
> >
> >
> > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > wrote:
> > Hi ,
> >
> > Mikhail, thanks. I looked at the explain output and this is what I see
> > for the two different documents in question: they have identical scores
> > even though document 2 has a shorter productName field. I do not see any
> > lengthNorm-related information in the explain output.
> >
> > Also, I am not exactly clear on what needs to be looked at in the API.
> >
> > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > productName&ps=1&pf2= productName&pf3=
> > productName&stopwords=true&lowercaseOperators=true
> >
> > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > Unlocked *
> >
> >
> >- *100%* 10.649221 sum of the following:
> >   - *10.58%* 1.1270299 sum of the following:
> >  - *2.1%* 0.22383358 productName:iphon
> >  - *3.47%* 0.36922288 productName:"4 s"
> >  - *5.01%* 0.53397346 productName:"16 gb"
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >   - *27.79%* 2.959255 sum of the following:
> >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >
> >
> > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> >
> >
> >- *100%* 10.649221 sum of the following:
> >   - *10.58%* 1.1270299 sum of the following:
> >  - *2.1%* 0.22383358 productName:iphon
> >  - *3.47%* 0.36922288 productName:"4 s"
> >  - *5.01%* 0.53397346 productName:"16 gb"
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >   - *27.79%* 2.959255 sum of the following:
> >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >
> >
> >
> >
> >
> > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > It's worth looking into  to check particular scoring values. But the
> > > most likely suspect is the loss of precision when float norms are
> > > stored in byte values. See the javadoc for
> > > DefaultSimilarity.encodeNormValue(float).
> > >
> > >
> > > On Mon, Dec 8, 2014 at 5:49 PM, S.L  wrote:
> > >
> > > > I have two documents doc1 and doc2 and each one of those has a field
> > > called
> > > > phoneName.
> > > >
> > > > doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White
> (Verizon)
> > > > Smartphone Factory Unlocked"
> > > >
> > > > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > > >
> > > > Here if I search for

Suspicious message with attachment

2014-12-11 Thread help
The following message addressed to you was quarantined because it likely 
contains a virus:

Subject: Inconsistent doc value across two nodes - very simple test - what's 
the expected behavior?
From: Gili Nachum 

However, if you know the sender and are expecting an attachment, please reply 
to this message, and we will forward the quarantined message to you.


Re: Design optimal Solr Schema

2014-12-11 Thread Alexandre Rafalovitch
Tomas,

You have a difficult use case. You seem to be in a speech-recognition
domain, and you want to be able to search the transcribed text with
references back to timing. It's an interesting problem, but not an easy
one, and certainly not something anyone can answer all at once.

The issue here is the representation of that text. You want it both
per-word (so you have timing) and as flowing text (so you can find it).
And then you also have the problem of how to express all this from the PHP
client.


But here are things you need to think about:
1) Do you have groups in your word sequence? You say you want to find "how
are you", but what about "there ah how", which would still be adjacent in
the stream but spans the end of one sentence and the start of another? If
you do want to find any sequence of consecutive words, you need to index
them together, and you end up with one very long document. If not, you need
to decide how you are going to break your continuous text into groups
(based on SILENCE, timing, or something else).

2) Then you have the association of a multi-word sequence with time. You
say "Good morning to you" is at 5.25, but that's not quite possible, as
each word has its own duration. Does it mean the word "Good" starts at
5.25? If someone finds "Morning to you", will it still return 5.25, or
5.28? This design decision will affect how you index it.

3) And what happens if the matched text occurs twice, like "Chao" meaning
hello and "Chao" meaning goodbye? If you want two separate documents
returned, that implies two documents in Solr. So this goes hand in hand
with (1) above.

4) Then you have the whole highlighting issue, which I am not even going to
start on, except to note that the text being highlighted needs to be in one
field, so that has an impact too.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
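
One way the segmentation and timing decisions above often play out (a sketch
only; the field names are invented for illustration) is one Solr document
per segment, carrying its own timing:

```
{
  "id": "call42_seg7",
  "callId": "call42",
  "startTime": 5.25,
  "endTime": 6.10,
  "text": "hello how are you",
  "altText": ["halo how are you", "hollow bow are you"]
}
```

A match then identifies the segment, whose startTime answers the timing
question, at the cost that phrases can never match across segment
boundaries.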


On 11 December 2014 at 03:33, tomas.kalas  wrote:
> Thanks for the help, but as Alex wrote, I used the synonym filter and it
> does what I want. For example, when I add the synonym entry Hello, Hi and
> the sentence is "Hello how are you", a query for "Hi how are you" finds it
> too.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4173690.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting integer field

2014-12-11 Thread Tomoko Uchida
Hi Pawel,

Essentially, highlighting is a feature that shows the fragments of documents
that match the user's query.
With it, users can find the occurrences of their query in long documents and
understand their results better.

For tint or tlong fields (or other non-text field types), "fragments"
usually have no meaning.

So, excuse me, I cannot understand your intent.
If you describe your need in a little more detail, I or other fellows may be
able to help you.

Regards,
Tomoko
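
If the goal is simply to echo the matched numeric value back, one workaround
(a sketch; the field names are assumptions) is to copy the number into a
string field and target that field with the query:

```
<field name="price"     type="tlong"  indexed="true" stored="true"/>
<field name="price_str" type="string" indexed="true" stored="true"/>
<copyField source="price" dest="price_str"/>
```

with hl.fl=price_str in the request. Highlighting only produces snippets for
fields the query actually matched, so the query (or a clause of it) must
search price_str.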

2014-12-11 19:12 GMT+09:00 Pawel Rog :

> Hi,
> Is it possible to highlight an int (TrieIntField) or long (TrieLongField)
> field in Solr?
>
> --
> Paweł
>


Re: Design optimal Solr Schema

2014-12-11 Thread tomas.kalas
Oh no, I meant to answer in the topic where you helped me with the synonym
filter:

http://lucene.472066.n3.nabble.com/Alternative-searching-td4172339.html

but I had this topic open too while checking my answer in Google Translate,
and I copied it here.

Now my task has changed: I no longer have to search by specific time, only
by phrase, but with alternatives. The synonym filter is a good idea, but if
a specific word has different alternatives in different places, that is the
problem I am now dealing with. I asked about it in this topic:
http://lucene.472066.n3.nabble.com/Alternative-synonymum-td4173694.html

Sorry for chaos.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4173748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is it possible in Solr to have document field value, based on context during query time, by request parameter ?

2014-12-11 Thread Nenko Ivanov

The Use Case:

A very large, sharded index of articles with various categorization fields,
pre-populated with algorithmically estimated values (simple types, mostly
integers). The index is accessed by multiple "clients", and each client can
override an article property in his own context, for example the sentiment
score for a specific article.

If article X's sentiment is overridden by client A, the value persists in
permanent storage for article X, alongside the default algorithmic value.


When client A queries the index, the document value of article X for
sentiment has to match his overridden value in filter queries and in facet
counts.


Client B sees the default estimated sentiment value for article X.

Currently the simplest solution is to duplicate the content for each client,
but that is not an option at this index scale.



Some background:

The above effect was partly achieved a few years ago for experimental
purposes, based on this tutorial:
http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html
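
That tutorial builds on Solr's ExternalFileField; a minimal sketch of a
per-client override field (the names are invented for illustration) looks
like:

```
<fieldType name="extSentiment" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>

<field name="sentiment_clientA" type="extSentiment"/>
```

Values come from a file named external_sentiment_clientA in the index data
directory, one id=value line per overridden document. Note that
ExternalFileField values are usable in function queries (and, via
{!frange}, in filter queries), but not in regular facets, which is why it
covers this use case only partly.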



--
Nenko Ivanov


Re: Length norm not functioning in solr queries.

2014-12-11 Thread S.L
Yes, I understand that reindexing is necessary; however, for some reason I
was not able to invoke the JS script from the update processor, so I ended
up using a Java-only solution at index time.

Thanks.
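
For readers who want to try the script route, here is a minimal sketch of
what such a StatelessScriptUpdateProcessorFactory script could look like
(the field names phoneName/wordCount are taken from this thread; the hook
names follow Solr's script-processor contract, and the whitespace-splitting
rule is an assumption, not a fixed tokenization):

```javascript
// Sketch of a script (e.g. wordcount.js) for StatelessScriptUpdateProcessorFactory.

// Pure helper: count whitespace-separated tokens in a string.
function countWords(text) {
  if (!text) return 0;
  var trimmed = String(text).trim();
  if (trimmed === "") return 0;
  return trimmed.split(/\s+/).length;
}

// Solr calls this for every document being added.
function processAdd(cmd) {
  var doc = cmd.solrDoc; // a SolrInputDocument
  var name = doc.getFieldValue("phoneName");
  if (name != null) {
    doc.setField("wordCount", countWords(name));
  }
}

// No-op hooks for the rest of the script-processor contract.
function processDelete(cmd) {}
function processCommit(cmd) {}
function finish() {}
```

The script would be wired into an update chain in solrconfig.xml, and the
collection re-indexed for the new field to take effect.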

On Thu, Dec 11, 2014 at 7:18 AM, Ahmet Arslan 
wrote:
>
> Hi,
>
> No special steps need to be taken for a cloud setup. Please note that for
> both solutions, re-indexing is mandatory.
>
> Ahmet
>
>
>
> On Thursday, December 11, 2014 12:15 PM, S.L 
> wrote:
> Ahmet,
>
> Thank you , as the configurations in SolrCloud are uploaded to zookeeper ,
> are there any special steps that need to be taken to make this work in
> SolrCloud ?
>
>
> On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan 
> wrote:
> >
> > Hi,
> >
> > Or even better, you can use your new field for tie break purposes. Where
> > scores are identical.
> > e.g. sort=score desc, wordCount asc
> >
> > Ahmet
> >
> >
> > On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan <
> iori...@yahoo.com>
> > wrote:
> > Hi,
> >
> > You mean update processor factory?
> >
> > Here is augmented (wordCount field added) version of your example :
> >
> > doc1:
> >
> > phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> > Smartphone Factory Unlocked"
> > wordCount: 11
> >
> > doc2:
> >
> > phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > wordCount: 9
> >
> >
> > First task is simply calculate wordCount values. You can do it in your
> > indexing code, or other places.
> > I quickly skimmed existing update processors but I couldn't find stock
> > implementation.
> > CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is
> > all about multivalued fields.
> >
> > I guess, A simple javascript that splits on whitespace and returns the
> > produced array size would do the trick :
> > StatelessScriptUpdateProcessorFactory
> >
> >
> >
> > At this point you have a int field named word count.
> > boost=div(1,wordCount) should work. Or you can came up with more
> > sophisticated math formula.
> >
> > Ahmet
> >
> >
> > On Wednesday, December 10, 2014 11:12 AM, S.L  >
> > wrote:
> > Hi Ahmet,
> >
> > Is there already an implementation of the suggested work around ? Thanks.
> >
> >
> > On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> > wrote:
> >
> > > Hi,
> > >
> > > The default length norm is not the best option for differentiating very
> > > short documents, like product names.
> > > Please see:
> > > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> > >
> > > I suggest you create an additional integer field that holds the number
> > > of tokens. You can populate it via an update processor and then penalise
> > > (using function queries) according to that field. This way you have more
> > > fine-grained and flexible control over it.
> > >
> > > Ahmet
> > >
> > >
> > >
> > > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > > wrote:
> > > Hi ,
> > >
> > > Mikhail, thanks. I looked at the explain output and this is what I see
> > > for the two different documents in question: they have identical scores
> > > even though document 2 has a shorter productName field. I do not see any
> > > lengthNorm-related information in the explain output.
> > >
> > > Also, I am not exactly clear on what needs to be looked at in the API.
> > >
> > > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > > productName&ps=1&pf2= productName&pf3=
> > > productName&stopwords=true&lowercaseOperators=true
> > >
> > > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > > Unlocked *
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > > mkhlud...@griddynamics.com> wrote:
> > >
> > > > It's worth looking into  to check particular scoring values.
> > B
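
The approach Ahmet describes can be sketched as a script for
StatelessScriptUpdateProcessorFactory. This is only a sketch: the field names
(productName, wordCount) and the script file name are assumptions, and the
chain still has to be wired into solrconfig.xml.

```javascript
// wordcount.js -- hypothetical script for StatelessScriptUpdateProcessorFactory.
// Field names (productName, wordCount) are assumptions; adjust to your schema.

// Pure helper: number of whitespace-separated tokens in a string.
function countWords(s) {
  var t = (s || '').replace(/^\s+|\s+$/g, '');
  return t.length === 0 ? 0 : t.split(/\s+/).length;
}

// Called by Solr for every added document.
function processAdd(cmd) {
  var doc = cmd.solrDoc;
  var name = doc.getFieldValue('productName');
  if (name !== null && name !== undefined) {
    doc.setField('wordCount', countWords(String(name)));
  }
}

// Remaining lifecycle hooks are no-ops.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}
```

The script would be referenced from an updateRequestProcessorChain in
solrconfig.xml (a processor of class solr.StatelessScriptUpdateProcessorFactory
with script=wordcount.js), with the chain selected via update.chain on the
update handler. At query time, boost=div(1,wordCount) then penalises longer
product names, as suggested above.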

Re: Design optimal Solr Schema

2014-12-11 Thread Alexandre Rafalovitch
Ok. Make sure to post in the right topics. People get super confused
when the conversation thread changes.

Maybe ignore these last couple of messages and post the new one as
appropriate (separately or in another thread). That way the right people
will see it.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 11 December 2014 at 09:16, tomas.kalas  wrote:
> Oh no, I meant to answer in this topic, where you helped me with the synonym
> filter:
>
> http://lucene.472066.n3.nabble.com/Alternative-searching-td4172339.html
>
> but I had opened this topic too, and I checked my answer in Google
> Translate and copied it here.
>
> Now the task has changed: I no longer have to search by a specific time, but
> only within a phrase, with alternatives. The synonym filter is a good idea,
> but when the same word has different alternatives in different segments, that
> is the problem I am now dealing with. I asked in this topic:
> http://lucene.472066.n3.nabble.com/Alternative-synonymum-td4173694.html
>
> Sorry for the chaos.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4173748.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is it possible in Solr to have document field value, based on context during query time, by request parameter ?

2014-12-11 Thread Alexandre Rafalovitch
So, what did not work for you with the External File Field approach?
What is the next gap you are trying to close?

You seem to be aware of the possible extension points for Solr, so you
are not looking for just a pointer to custom search components or
whatever.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 11 December 2014 at 09:20, Nenko Ivanov  wrote:
> The Use Case:
>
> A very large, sharded index of articles with different categorization
> fields, pre-populated with algorithmically estimated values (simple types,
> mostly Integer values). The index is accessed by multiple “clients” and each
> client can override an article property based on its context, for example the
> sentiment score for a specific article.
>
> If article X's sentiment is overridden by Client A, the value persists in
> permanent storage for article X, with the context-specific value stored
> alongside the default algorithmic value.
>
> When Client A queries index, the document value of article X for sentiment
> has to match his overridden value in filter queries or in facet counts.
>
> Client B sees default estimated value for sentiment for article X.
>
> Currently the simplest solution is to duplicate content for each client, but
> that is not an option because of the index scale.
>
>
> Some background:
>
> The above effect was partly achieved a few years ago, for experimental
> purposes, based on this tutorial -
> http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html
>
>
> --
> Nenko Ivanov
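
For reference, the External File Field approach mentioned above looks roughly
like the following in schema.xml. This is a sketch, not a drop-in config: the
field and file names are examples.

```xml
<!-- schema.xml: per-client sentiment kept outside the index (names are examples) -->
<fieldType name="sentimentFile" class="solr.ExternalFileField" keyField="id" defVal="0"/>
<field name="sentiment_clientA" type="sentimentFile"/>
```

The values then live in a file named external_sentiment_clientA* under the
core's data directory, one `id=value` line per document. An external file field
is read through function queries, so filtering on the overridden value takes a
function-query form, e.g. something like fq={!frange l=1 u=1}field(sentiment_clientA),
rather than a plain field query, and facet counts would have to be expressed as
facet.query clauses over the same function.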


RE: WordBreakSolrSpellChecker Usage

2014-12-11 Thread Dyer, James
My first guess here, seeing it works some of the time but not others, is 
that these values are too low:

spellcheck.count = 5
spellcheck.maxCollationTries = 5

You know spellcheck.count is too low if the suggestion you want is not in the 
"suggestions" part of the response, but increasing it makes it get included.

You know that spellcheck.maxCollationTries is too low if it exists in 
"suggestions" but it is not getting suggested in the "collation" section.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Wednesday, December 10, 2014 12:43 PM
To: solr-user@lucene.apache.org
Subject: Fwd: WordBreakSolrSpellChecker Usage

If I have my search component setup like this
https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have an
entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?

This doesn't seem to be the case, but it works for "Blackstone" with "Black
stone". Any ideas on what I might be doing wrong?


Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Tom Burton-West
Thanks Eric,

That is helpful.  We already have a process that works similarly.  Each
thread/process that sends a document to Solr waits until it gets a response
in order to make sure that the document was indexed successfully (we log
errors and retry docs that don't get indexed successfully), however we run
20-100 of these processes, depending on throughput (i.e. we send documents
to Solr for indexing as fast as we can until they start queuing up on the
Solr end.)

Is there a way to use CUSS with XML documents?

ie my second question:
> A related question, is how to use ConcurrentUpdateSolrServer with XML
> documents
>
> I have very large XML documents, and the examples I see all build
documents
> by adding fields in Java code.  Is there an example that actually reads
XML
> files from the file system?

Tom


Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Erick Erickson
I don't think so, it uses SolrInputDocuments and
lists thereof. So if you parse the xml and then
put things in SolrInputDocuments..

Or something like that.

Erick

On Thu, Dec 11, 2014 at 9:43 AM, Tom Burton-West  wrote:
> Thanks Eric,
>
> That is helpful.  We already have a process that works similarly.  Each
> thread/process that sends a document to Solr waits until it gets a response
> in order to make sure that the document was indexed successfully (we log
> errors and retry docs that don't get indexed successfully), however we run
> 20-100 of these processes, depending on throughput (i.e. we send documents
> to Solr for indexing as fast as we can until they start queuing up on the
> Solr end.)
>
> Is there a way to use CUSS with XML documents?
>
> ie my second question:
>> A related question, is how to use ConcurrentUpdateSolrServer with XML
>> documents
>>
>> I have very large XML documents, and the examples I see all build
> documents
>> by adding fields in Java code.  Is there an example that actually reads
> XML
>> files from the file system?
>
> Tom


Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Michael Della Bitta

Tom:

ConcurrentUpdateSolrServer isn't magic or anything. You could pretty 
trivially write something that takes batches of your XML documents, 
combines them into a single update (multiple <doc> tags in the <add> 
section), and sends them up to Solr, and achieve some of the same speed 
benefits.


If you use it, the JavaBin-based serialization in CUSS is lighter as a 
wire format, though: 
http://lucene.apache.org/solr/4_10_2/solr-solrj/org/apache/solr/client/solrj/impl/BinaryRequestWriter.html


The only thing you have to worry about (in both the CUSS and the home-grown 
case) is that a single bad document in a batch fails the whole batch. It's up 
to you to fall back to writing the documents individually so the rest of the 
batch makes it in.


Michael

On 12/11/14 11:04, Erick Erickson wrote:

I don't think so, it uses SolrInputDocuments and
lists thereof. So if you parse the xml and then
put things in SolrInputDocuments..

Or something like that.

Erick

On Thu, Dec 11, 2014 at 9:43 AM, Tom Burton-West  wrote:

Thanks Eric,

That is helpful.  We already have a process that works similarly.  Each
thread/process that sends a document to Solr waits until it gets a response
in order to make sure that the document was indexed successfully (we log
errors and retry docs that don't get indexed successfully), however we run
20-100 of these processes, depending on throughput (i.e. we send documents
to Solr for indexing as fast as we can until they start queuing up on the
Solr end.)

Is there a way to use CUSS with XML documents?

ie my second question:

A related question, is how to use ConcurrentUpdateSolrServer with XML
documents

I have very large XML documents, and the examples I see all build

documents

by adding fields in Java code.  Is there an example that actually reads

XML

files from the file system?

Tom
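
The home-grown batching described above amounts to posting a single <add>
element containing multiple <doc> elements. A minimal sketch of the wire
format (field names are examples):

```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="title_t">first document</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title_t">second document</field>
  </doc>
</add>
```

This is POSTed to /update with an XML content type; as noted above, if any one
<doc> in the batch is bad the whole request fails, hence the fallback to
individual adds.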




Inconsistent doc value across two nodes - very simple test - what's the expected behavior?

2014-12-11 Thread Gili Nachum
I know Solr's CAP properties are CP, but I don't see that happening in a very
basic test - am I doing something wrong?

With two Solr nodes, I index doc1 to both, stop node2, update doc1, stop
node1, start node2, start node1, and I get two different versions of the
doc depending on which replica I query.
I would expect node2 to update itself.
Attaching Solr logs from both nodes.

*Config*
Solr 4.7.2 / Jetty.
SolrCloud on two nodes and 3 ZK nodes, all running on localhost.
single collection: single shard with two replicas.

*Reproducing:*
start node1 9.148.58.114:8983
start node2 9.148.58.114:8984
Cluster state: node1 leader. node2 active.

index value 'A' (id="change me").
query and expect 'A' -> success

Stop node2
Cluster state: node1 leader. node2 gone.
query and expect 'A' -> success

Update document value from 'A'->'B'
query and expect 'B' -> success

Stop node1
then
Start node2.
Cluster state: node1 gone. node2 down.

*104510 [coreZkRegister-1-thread-1] INFO
org.apache.solr.cloud.ShardLeaderElectionContext Waiting until we see more
replicas up for shard shard1: total=2 found=1 timeoutin=5.27665925E14ms*

wait 3m.

*184679 [coreZkRegister-1-thread-1] INFO
org.apache.solr.cloud.ShardLeaderElectionContext  I am the new leader:
http://9.148.58.114:8984/solr/quick-results-collection_shard1_replica2/

shard1*
Cluster state: node1 gone. node2 leader.

query and expect 'A' (old value) -> success

start node1
Cluster state: node1 active. node2 leader.

*Inconsistency: *
*Querying node1 always returns 'B'. *
http://localhost:8983/solr/quick-results-collection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
*Querying node2 always returns 'A'. *
http://localhost:8984/solr/quick-results-collection_shard1_replica2/select?q=*%3A*&wt=json&indent=true


Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Mikhail Khludnev
Agree with Erick.

However, I suppose you can try to provide your own RequestWriter and let
it stream XML. Btw, what's in them? How does Solr handle them right now? Why
don't you want to start with a test?

On Thu, Dec 11, 2014 at 7:04 PM, Erick Erickson 
wrote:

> I don't think so, it uses SolrInputDocuments and
> lists thereof. So if you parse the xml and then
> put things in SolrInputDocuments..
>
> Or something like that.
>
> Erick
>
> On Thu, Dec 11, 2014 at 9:43 AM, Tom Burton-West 
> wrote:
> > Thanks Eric,
> >
> > That is helpful.  We already have a process that works similarly.  Each
> > thread/process that sends a document to Solr waits until it gets a
> response
> > in order to make sure that the document was indexed successfully (we log
> > errors and retry docs that don't get indexed successfully), however we
> run
> > 20-100 of these processes,depending on  throughput (i.e. we send
> documents
> > to Solr for indexing as fast as we can until they start queuing up on the
> > Solr end.)
> >
> > Is there a way to use CUSS with XML documents?
> >
> > ie my second question:
> >> A related question, is how to use ConcurrentUpdateSolrServer with XML
> >> documents
> >>
> >> I have very large XML documents, and the examples I see all build
> > documents
> >> by adding fields in Java code.  Is there an example that actually reads
> > XML
> >> files from the file system?
> >
> > Tom
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Yonik Seeley
On Wed, Dec 10, 2014 at 6:09 PM, Erick Erickson  wrote:
> So CUSS will do something like this:
> 1> assemble a packet for Solr
> 2> pass off the actual transmission
>  to Solr to a thread and immediately
>  go back to <1>.
>
> Basically, CUSS is doing async processing.

The more important part about what it's doing is the *streaming*.
CUSS is like batching documents without waiting for all of the
documents in the batch.
When you add a document, it immediately writes it to a stream where
solr can read it off and index it.  When you add a second document,
it's immediately written to the same stream (or at least one of the
open streams), as part of the same update request.  No separate HTTP
request, no separate update request.

The number of threads parameter for CUSS actually maps to the number
of open connections to Solr (and hence the number of concurrently
streaming update requests).

So to Solr (server side), it looks like a single update request
(assuming 1 thread) with a batch of multiple documents... but it was
never actually "batched" on the client side.

-Yonik


Help with a Join Query

2014-12-11 Thread Darin Amos
Hello,

I am trying to execute a join query that I am not 100% sure how to execute. 
Let's say I have a bunch of parent and child documents, and every one of my 
child documents has a single-value field “color”. 

If I want to search all parents that have a “red” child, this is very easy:

{!join from=parent to=id}color:red

However, if I want to return only parents that have both a red AND a blue item 
it gets tricky. 

This query would return parents that have red OR blue
{!join from=parent to=id}color:red OR color:blue

And this query would return nothing since no child had both colors.
{!join from=parent to=id}color:red AND color:blue

Any suggestions? My thinking is I might require some kind of custom query.

Thanks!

Darin

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Alexandre Rafalovitch
On 11 December 2014 at 11:40, Yonik Seeley  wrote:
> So to Solr (server side), it looks like a single update request
> (assuming 1 thread) with a batch of multiple documents... but it was
> never actually "batched" on the client side.

Does Solr also index them one-by-one as it parses them off the -
chunked - stream? Or does it wait for the end of the "batch"?

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Yonik Seeley
On Thu, Dec 11, 2014 at 11:52 AM, Alexandre Rafalovitch
 wrote:
> On 11 December 2014 at 11:40, Yonik Seeley  wrote:
>> So to Solr (server side), it looks like a single update request
>> (assuming 1 thread) with a batch of multiple documents... but it was
>> never actually "batched" on the client side.
>
> Does Solr also index them one-by-one as it parses them off the -
> chunked - stream?

Yes, indexing is streaming (a document at a time is read off the
stream and then immediately indexed).

-Yonik


Re: WordBreakSolrSpellChecker Usage

2014-12-11 Thread Matt Mongeau
Is there a suggested value for this? I bumped them up to 20 and still
nothing seems to have changed.

On Thu, Dec 11, 2014 at 9:42 AM, Dyer, James 
wrote:

> My first guess here, is seeing it works some of the time but not others,
> is that these values are too low:
>
> spellcheck.count = 5
> spellcheck.maxCollationTries = 5
>
> You know spellcheck.count is too low if the suggestion you want is not in
> the "suggestions" part of the response, but increasing it makes it get
> included.
>
> You know that spellcheck.maxCollationTries is too low if it exists in
> "suggestions" but it is not getting suggested in the "collation" section.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> Sent: Wednesday, December 10, 2014 12:43 PM
> To: solr-user@lucene.apache.org
> Subject: Fwd: WordBreakSolrSpellChecker Usage
>
> If I have my search component setup like this
> https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have an
> entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?
>
> This doesn't seem to be the case, but it works for "Blackstone" with "Black
> stone". Any ideas on what I might be doing wrong?
>


Re: Solr Error when making GeoPrefixTree polygon filter search

2014-12-11 Thread mathaix
Thank you. That was the issue. 
I am running Solr with Jetty. Is there a recommended way to include
those jars in the Jetty configuration?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Error-when-making-GeoPrefixTree-polygon-filter-search-tp4173629p4173807.html
Sent from the Solr - User mailing list archive at Nabble.com.


Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-11 Thread shamik
Hi, 

  I'm trying to use AutoPhrasingTokenFilterFactory, which seems to be a 
great solution to our phrase query issues, but it doesn't seem to work as 
described in the blog: 

https://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

The token filter is working as expected during index time, where it's 
preserving the phrases as a single token based on the text file. Here's my 
field definition: 


On analyzing, I can see the phrase "seat cushions" (defined in 
autophrases.txt) is being indexed as "seat", "seat cushions" and "cushion". 

The problem is during the query time. As per the blog, the request handler 
needs to use a custom query parser to achieve the result. Here's my entry 
in solrconfig. 




velocity
browse
layout
Solritas

explicit
10
text
autophrasingParser




autophrases.txt


But if I query "seat cushions" using this request handler, it seems to 
be treating the query as two separate terms and returning all results 
matching "seat" and "cushion". Not sure what I'm missing here. I'm using 
Solr 4.10. 

The other question I had is whether 
"com.lucidworks.analysis.AutoPhrasingQParserPlugin" supports the edismax 
features; edismax is my default parser. 

I'll appreciate if anyone provide their feedback. 

-Thanks 
Shamik



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808.html
Sent from the Solr - User mailing list archive at Nabble.com.
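
A rough sketch of the kind of configuration the LucidWorks blog post describes
is shown below. Class and parameter names are taken from that post and should
be verified against the jar actually built; field names are examples.

```xml
<!-- schema.xml: apply the filter at index time only -->
<fieldType name="text_autophrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory"
            phrases="autophrases.txt" includeTokens="true" replaceWhitespaceWith="_"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: the query parser that rejoins phrases before parsing -->
<queryParser name="autophrasingParser"
             class="com.lucidworks.analysis.AutoPhrasingQParserPlugin">
  <str name="phrases">autophrases.txt</str>
  <str name="replaceWhitespaceWith">_</str>
</queryParser>
```

One thing worth checking when a phrase query falls apart into single terms is
that the index-side replaceWhitespaceWith setting and the query parser's
setting agree; otherwise the rejoined query token never matches what was
indexed.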


Re: Help with a Join Query

2014-12-11 Thread Kydryavtsev Andrey
How about something like 

({!join from=parent to=id}color:red) AND ({!join from=parent to=id}color:blue) ?

11.12.2014, 19:48, "Darin Amos" :
> Hello,
>
> I am trying to execute a join query that I am not 100% sure how to execute. 
> Lets say I have a bunch of parent and child documents and every one of my 
> child documents has a single value field “color”.
>
> If I want to search all parents that have a “red” child, this is very easy:
>
> {!join from=parent to=id}color:red
>
> However, if I want to return only parents that have both a red AND a blue 
> item it gets tricky.
>
> This query would return parents that have red OR blue
> {!join from=parent to=id}color:red OR color:blue
>
> And this query would return nothing since no child had both colors.
> {!join from=parent to=id}color:red AND color:blue
>
> Any suggestions? My thinking is I might require some kind of custom query.
>
> Thanks!
>
> Darin


RE: WordBreakSolrSpellChecker Usage

2014-12-11 Thread Dyer, James
Matt,

There is no exact number here, but I would think most people would want "count" 
to be maybe 10-20.  Increasing this incurs a very small performance penalty for 
each term it generates suggestions for, but you probably won't notice a 
difference.  For "maxCollationTries", 5 is a reasonable number but you might 
see improved collations if this is also perhaps 10.  With this one, you get a 
much larger performance penalty, but only when it needs to try more combinations 
to return the "maxCollations".  In your case you have this at 5 also, right?  I 
would reduce this to the maximum number of re-written queries your application 
or users is actually going to use.  In a lot of cases, 1 is the right number 
here.  This would improve performance for you in some cases.

Possibly the reason “Rock point” > “Rockpoint” is failing is because you have 
"maxChanges" set to 10.  This tells it you are willing for it to break a word 
into 10 separate parts, or to combine up to 10 adjacent words into 1.  Having 
taken a quick glance at the code, I think what is happening is it is trying 
things like "r ock p oint" and "r o ck p o int", etc and never getting to your 
intended result.  In a typical scenario I would set "maxChanges" to 1-3, and 
often 1 is probably the most appropriate value here.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Thursday, December 11, 2014 11:34 AM
To: solr-user@lucene.apache.org
Subject: Re: WordBreakSolrSpellChecker Usage

Is there a suggested value for this? I bumped them up to 20 and still
nothing seems to have changed.

On Thu, Dec 11, 2014 at 9:42 AM, Dyer, James 
wrote:

> My first guess here, is seeing it works some of the time but not others,
> is that these values are too low:
>
> spellcheck.count = 5
> spellcheck.maxCollationTries = 5
>
> You know spellcheck.count is too low if the suggestion you want is not in
> the "suggestions" part of the response, but increasing it makes it get
> included.
>
> You know that spellcheck.maxCollationTries is too low if it exists in
> "suggestions" but it is not getting suggested in the "collation" section.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> Sent: Wednesday, December 10, 2014 12:43 PM
> To: solr-user@lucene.apache.org
> Subject: Fwd: WordBreakSolrSpellChecker Usage
>
> If I have my search component setup like this
> https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have an
> entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?
>
> This doesn't seem to be the case, but it works for "Blackstone" with "Black
> stone". Any ideas on what I might be doing wrong?
>
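
James's numbers can be sketched as a config fragment; the spellchecker name
and field are examples, and the values reflect the advice above:

```xml
<!-- solrconfig.xml: request handler defaults per the advice above -->
<str name="spellcheck.count">10</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">1</str>

<!-- searchComponent: keep maxChanges small for WordBreakSolrSpellChecker -->
<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">name</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">1</int>
</lst>
```

With maxChanges at 1, "Rock point" can only be combined into one word
("Rockpoint") rather than being split and recombined into many fragments.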


Re: Help with a Join Query

2014-12-11 Thread Darin Amos
Thanks,

That looks like a viable option, I could do something like the following:

q={!join from=parent to=id}
&fq={!join from=parent to=id}color:red
&fq={!join from=parent to=id}color:blue

With all these joins happening like this, what kind of performance concern is 
this? I would guess it would start to cause a lot of work.

Thanks

Darin



> On Dec 11, 2014, at 1:04 PM, Kydryavtsev Andrey  wrote:
> 
> How about something like 
> 
> ({!join from=parent to=id}color:red) AND ({!join from=parent 
> to=id}color:blue) ?
> 
> 11.12.2014, 19:48, "Darin Amos" :
>> Hello,
>> 
>> I am trying to execute a join query that I am not 100% sure how to execute. 
>> Lets say I have a bunch of parent and child documents and every one of my 
>> child documents has a single value field “color”.
>> 
>> If I want to search all parents that have a “red” child, this is very easy:
>> 
>> {!join from=parent to=id}color:red
>> 
>> However, if I want to return only parents that have both a red AND a blue 
>> item it gets tricky.
>> 
>> This query would return parents that have red OR blue
>> {!join from=parent to=id}color:red OR color:blue
>> 
>> And this query would return nothing since no child had both colors.
>> {!join from=parent to=id}color:red AND color:blue
>> 
>> Any suggestions? My thinking is I might require some kind of custom query.
>> 
>> Thanks!
>> 
>> Darin



Re: Help with a Join Query

2014-12-11 Thread Kydryavtsev Andrey


11.12.2014, 21:24, "Darin Amos" :
> Thanks,
>
> That looks like a viable option, I could do something like the following:
>
> q={!join from=parent to=id}
> &fq={!join from=parent to=id}color:red
> &fq={!join from=parent to=id}color:blue
>
> With all these joins happening like this, what kind of performance concern is 
> this? I would guess this would start to cause a lot of work.
>
> Thanks
>
> Darin
>>  On Dec 11, 2014, at 1:04 PM, Kydryavtsev Andrey  wrote:
>>
>>  How about something like
>>
>>  ({!join from=parent to=id}color:red) AND ({!join from=parent 
>> to=id}color:blue) ?
>>
>>  11.12.2014, 19:48, "Darin Amos" :
>>>  Hello,
>>>
>>>  I am trying to execute a join query that I am not 100% sure how to 
>>> execute. Lets say I have a bunch of parent and child documents and every 
>>> one of my child documents has a single value field “color”.
>>>
>>>  If I want to search all parents that have a “red” child, this is very easy:
>>>
>>>  {!join from=parent to=id}color:red
>>>
>>>  However, if I want to return only parents that have both a red AND a blue 
>>> item it gets tricky.
>>>
>>>  This query would return parents that have red OR blue
>>>  {!join from=parent to=id}color:red OR color:blue
>>>
>>>  And this query would return nothing since no child had both colors.
>>>  {!join from=parent to=id}color:red AND color:blue
>>>
>>>  Any suggestions? My thinking is I might require some kind of custom query.
>>>
>>>  Thanks!
>>>
>>>  Darin


Re: Help with a Join Query

2014-12-11 Thread Kydryavtsev Andrey
In my experience, "query time join" has relatively poor performance. 
If you can cache these joins effectively (not too many unique color values in 
requests, and the cache isn't invalidated), it's OK. If not, it may be worth 
trying "block join" instead - 
http://blog.griddynamics.com/2013/09/solr-block-join-support.html

11.12.2014, 21:40, "Kydryavtsev Andrey" :
> 11.12.2014, 21:24, "Darin Amos" :
>>  Thanks,
>>
>>  That looks like a viable option, I could do something like the following:
>>
>>  q={!join from=parent to=id}
>>  &fq={!join from=parent to=id}color:red
>>  &fq={!join from=parent to=id}color:blue
>>
>>  With all these joins happening like this, what kind of performance concern 
>> is this? I would guess this would start to cause a lot of work.
>>
>>  Thanks
>>
>>  Darin
>>>   On Dec 11, 2014, at 1:04 PM, Kydryavtsev Andrey  
>>> wrote:
>>>
>>>   How about something like
>>>
>>>   ({!join from=parent to=id}color:red) AND ({!join from=parent 
>>> to=id}color:blue) ?
>>>
>>>   11.12.2014, 19:48, "Darin Amos" :
>>>> Hello,
>>>>
>>>> I am trying to execute a join query that I am not 100% sure how to
>>>> execute. Let's say I have a bunch of parent and child documents, and every
>>>> one of my child documents has a single-value field “color”.
>>>>
>>>> If I want to search all parents that have a “red” child, this is very
>>>> easy:
>>>>
>>>> {!join from=parent to=id}color:red
>>>>
>>>> However, if I want to return only parents that have both a red AND a
>>>> blue item it gets tricky.
>>>>
>>>> This query would return parents that have red OR blue:
>>>> {!join from=parent to=id}color:red OR color:blue
>>>>
>>>> And this query would return nothing since no child had both colors:
>>>> {!join from=parent to=id}color:red AND color:blue
>>>>
>>>> Any suggestions? My thinking is I might require some kind of custom
>>>> query.
>>>>
>>>> Thanks!
>>>>
>>>> Darin
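
The block-join alternative suggested above would express the same two-color
constraint roughly as follows, assuming the children are indexed nested inside
their parent documents and the parents carry a marker field (field names are
assumptions):

```
q={!parent which="doc_type:parent"}color:red
&fq={!parent which="doc_type:parent"}color:blue
```

Unlike {!join}, {!parent} matches within a single index block, so each clause
constrains to parents having at least one red (respectively blue) child
without the per-request join cost.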


Re: Inconsistent doc value across two nodes - very simple test - what's the expected behavior?

2014-12-11 Thread Shalin Shekhar Mangar
Hi Gili,

Great question!

A write in Solr, by default, is only guaranteed to exist in one place, i.e.
the leader, and the safety valves that we have to preserve these writes are:

1. The leaderVoteWait time for which leader election is suspended until
enough live replicas are available
2. The two-way peer-sync between leader candidate and other replicas

The other safety valve is on the client side with the "min_rf" parameter
introduced by SOLR-5468 in Solr 4.9. If you set this param to 2 while
making the request then Solr will return the number of replicas to which it
could successfully send the update. Then depending on the response you can
make a decision to retry the update at a later time assuming it is
idempotent. This kinda puts the onus of ensuring consistency on the client
side, which is not ideal but better than nothing. See SOLR-5468 for more
discussion on this topic.

In your particular example, none of these safeties are invoked because you
start node2 while node1 was down and node2 goes ahead with leader election
after the wait period. Also since node1 was down during leader election,
peer sync doesn't happen and then node2 becomes the leader.

When node1 comes back online and joins as a replica, it recovers from the
leader using peer-sync (which returns the newest 100 updates) and finds
that there's nothing newer on the leader. However, there are no checks to
make sure that the replica doesn't have a newer update itself, which is why
you end up with the inconsistent replica. If there were a lot of updates on
node2 (more than 100) while node1 was down, in which case peer-sync isn't
applicable, then it would have done a replication recovery and this
inconsistency would have been resolved.

So yeah we have a valid consistency bug such that we have inconsistent
replicas in a steady state. I wonder if the right way is to bump min_rf to
a higher value or peer-sync both ways during replica recovery. I'll need to
think more on this.


On Thu, Dec 11, 2014 at 4:21 PM, Gili Nachum  wrote:

> I know Solr's CAP properties are CP, but I don't see that happening in a very
> basic test - am I doing something wrong?
>
> With two Solr nodes, I index doc1 to both, stop node2, update doc1, stop
> node1, start node2, start node1, and I get two different versions of the
> doc depending on which replica I query.
> I would expect node2 to update itself.
> Attaching Solr logs from both nodes.
>
> *Config*
> Solr 4.7.2 / Jetty.
> SolrCloud on two nodes and 3 ZK nodes, all running on localhost.
> single collection: single shard with two replicas.
>
> *Reproducing:*
> start node1 9.148.58.114:8983
> start node2 9.148.58.114:8984
> Cluster state: node1 leader. node2 active.
>
> index value 'A' (id="change me").
> query and expect 'A' -> success
>
> Stop node2
> Cluster state: node1 leader. node2 gone.
> query and expect 'A' -> success
>
> Update document value from 'A'->'B'
> query and expect 'B' -> success
>
> Stop node1
> then
> Start node2.
> Cluster state: node1 gone. node2 down.
>
> *104510 [coreZkRegister-1-thread-1] INFO
> org.apache.solr.cloud.ShardLeaderElectionContext Waiting until we see more
> replicas up for shard shard1: total=2 found=1 timeoutin=5.27665925E14ms*
>
> wait 3m.
>
> *184679 [coreZkRegister-1-thread-1] INFO
> org.apache.solr.cloud.ShardLeaderElectionContext  I am the new leader:
> http://9.148.58.114:8984/solr/quick-results-collection_shard1_replica2/
> 
> shard1*
> Cluster state: node1 gone. node2 leader.
>
> query and expect 'A' (old value) -> success
>
> start node1
> Cluster state: node1 active. node2 leader.
>
> *Inconsistency: *
> *Querying node1 always returns 'B'. *
>
> http://localhost:8983/solr/quick-results-collection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
> *Querying node2 always returns 'A'. *
>
> http://localhost:8984/solr/quick-results-collection_shard1_replica2/select?q=*%3A*&wt=json&indent=true
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread solr-user
my apologies for the lack of clarity

our internal name for the project to upgrade solr from 4.0 to 4.10.2 is
"helios" and so we named our test folder "heliosearch".  I was not even
aware of the github project Heliosearch, and nothing we are doing is related
to it.

to simplify things for this post, we set things up so that we have one
solr instance but two cores: coreX contains the collection1 files/folders
as per the downloaded solr 4.10.2 package, while coreA uses the same
collection1 files/folders but with schema.xml and solrconfig.xml changes to
meet our needs

so file and foldername-wise, here is what we did:

1. C:\SOLR\solr-4.10.2.zip\solr-4.10.2\example renamed to
C:\SOLR\helios-4.10.2\Master
2. renamed example\solr\collection1 to example\solr\coreX; no files modified
here
3. copied example\solr\coreX to example\solr\coreA
4. modified the coreA schema to match our current production schema; ie our
field names, etc
5. modified the coreA solrconfig.xml to meet our needs (see below)

here are the solrconfig.xml changes we made to coreA

1. 
2. 4
3. false
4. false
5. commented out autoCommit section
6. commented out autoSoftCommit section
7. commented out the  section
8. 4
9. 
10.  contains geocluster
11. commented out these sections:
  
 
  
 
  
  
  
  
  

here are the schema.xml changes we made to our copy of the downloaded solr
4.10.2 package (aside from replacing the example fields provided in the
downloaded solr 4.10.2):

1. 
2. removed the example fields provided in the downloaded solr 4.10.2
3. deleted various field types we don't use in our current schemas
4. added fieldtypes that are in our current solr 4.0 instances
5. added various fieldtypes that are in our current solr 4.0 instances
6. re-added the "text" field as apparently required:

also note that we are using java "1.7.0_67" and jetty-8.1.10.v20130312

all in all, I don't see anything that we have done that would keep the cores
from being discovered.

hope that helps.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173831.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread solr-user
small correction;  coreX (the one with the unmodified schema.xml and
solrconfig.xml) IS seen by solr and appears on the solr admin page, but
coreA (which has our modified schema and solrconfig) is found by solr but is
not shown in the solr admin page:

1494 [main] INFO  org.apache.solr.core.CoresLocator  – Looking for core
definitions underneath C:\SOLR\helios-4.10.2\Master\solr
1502 [main] INFO  org.apache.solr.core.CoresLocator  – Found core coreA in
C:\SOLR\helios-4.10.2\Master\solr\coreA\
1502 [main] INFO  org.apache.solr.core.CoresLocator  – Found core coreX in
C:\SOLR\helios-4.10.2\Master\solr\coreX\
1503 [main] INFO  org.apache.solr.core.CoresLocator  – Found 2 core
definitions





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173832.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Help with a Join Query

2014-12-11 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Maybe you can try using an AND condition in a single join, something like 

q={!join from=parent to=id}(Id:xxx AND (Color:red OR Color:Blue)); I don't 
think this will cause a big performance issue.

Thanks

Ravi

-Original Message-
From: Darin Amos [mailto:dari...@gmail.com] 
Sent: Thursday, December 11, 2014 1:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Help with a Join Query

Thanks,

That looks like a viable option, I could do something like the following:

q={!join from=parent to=id} &fq={!join from=parent 
to=id}color:red &fq={!join from=parent to=id}color:blue

With all these joins happening like this, what kind of performance concern is 
this? I would guess this would start to cause a lot of work.

Thanks

Darin



> On Dec 11, 2014, at 1:04 PM, Kydryavtsev Andrey  wrote:
> 
> How about something like
> 
> ({!join from=parent to=id}color:red) AND ({!join from=parent 
> to=id}color:blue) ?
> 
> 11.12.2014, 19:48, "Darin Amos" :
>> Hello,
>> 
>> I am trying to execute a join query that I am not 100% sure how to execute. 
>> Lets say I have a bunch of parent and child documents and every one of my 
>> child documents has a single value field “color”.
>> 
>> If I want to search all parents that have a "red" child, this is very easy:
>> 
>> {!join from=parent to=id}color:red
>> 
>> However, if I want to return only parents that have both a red AND a blue 
>> item it gets tricky.
>> 
>> This query would return parents that have red OR blue {!join 
>> from=parent to=id}color:red OR color:blue
>> 
>> And this query would return nothing since no child had both colors.
>> {!join from=parent to=id}color:red AND color:blue
>> 
>> Any suggestions? My thinking is I might require some kind of custom query.
>> 
>> Thanks!
>> 
>> Darin
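
The two-filter-query approach discussed above works because each {!join} produces
its own set of parent ids and Solr intersects the filter queries, whereas AND
inside a single join asks for one child that matches both colors at once. A
minimal Python sketch of just the set logic (not Solr code, field names illustrative):

```python
# Toy child documents, each pointing at a parent id.
children = [
    {"id": "c1", "parent": "p1", "color": "red"},
    {"id": "c2", "parent": "p1", "color": "blue"},
    {"id": "c3", "parent": "p2", "color": "red"},
]

def join_parents(child_query):
    """Collect parent ids of children matching the predicate,
    mimicking {!join from=parent to=id}<child_query>."""
    return {c["parent"] for c in children if child_query(c)}

# Single join with AND inside: no individual child is both red and blue.
both_on_one_child = join_parents(
    lambda c: c["color"] == "red" and c["color"] == "blue")

# Two joins (one fq per color), intersected like Solr's filter logic.
red_parents = join_parents(lambda c: c["color"] == "red")
blue_parents = join_parents(lambda c: c["color"] == "blue")
both = red_parents & blue_parents

print(both_on_one_child)  # set()
print(both)               # {'p1'}
```

p1 has a red child and a blue child, so only it survives the intersection.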



Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread Alexandre Rafalovitch
And the XML is valid, lib references in solrconfig.xml point to the
right libraries (if any), you don't have duplicate definitions of
types, you don't have missing definitions of types? And you didn't
disable the admin handler?

And it's not just admin that's failing to find the core, right? If you
use command line to ping it for a basic search, do you get anything?

I am really grasping at straws here. You seem to be very organized
with that and any errors (for the stuff I mentioned above) SHOULD be
clear and visible. I'd start bisecting the problem:
1) Admin and/or command-line problem
2) Does filesystem monitoring during startup (
http://technet.microsoft.com/en-us/sysinternals/bb896645 ) show any
unexpected filesystem access
3) Can you cut your changes in half (e.g. not remove anything) and
still see the problem

Regards,
   Alex.
P.s. When you do figure it out, let us know. Just for the future
troubleshooting generations.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 11 December 2014 at 14:14, solr-user  wrote:
> small correction;  coreX (the one with the unmodified schema.xml and
> solrconfig.xml) IS seen by solr and appears on the solr admin page, but
> coreA (which has our modified schema and solrconfig) is found by solr but is
> not shown in the solr admin page:
>
> 1494 [main] INFO  org.apache.solr.core.CoresLocator  – Looking for core
> definitions underneath C:\SOLR\helios-4.10.2\Master\solr
> 1502 [main] INFO  org.apache.solr.core.CoresLocator  – Found core coreA in
> C:\SOLR\helios-4.10.2\Master\solr\coreA\
> 1502 [main] INFO  org.apache.solr.core.CoresLocator  – Found core coreX in
> C:\SOLR\helios-4.10.2\Master\solr\coreX\
> 1503 [main] INFO  org.apache.solr.core.CoresLocator  – Found 2 core
> definitions
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173832.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread Chris Hostetter
: can you please include the *exact* solrconfig.xml & schema.xml you are 
: using for coreA ... you've given us an overview of what you changed, but 
: that's not enough for anyone to actually try and reproduce your problem.

if it helps (since the list doesn't allow attachments) feel free to open a 
bug in jira, and attach a zip file of your entire solr home dir showing 
the problem.


-Hoss
http://www.lucidworks.com/


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread Chris Hostetter

: coreA (which has our modified schema and solrconfig) is found by solr but is
: not shown in the solr admin page:

can you please include the *exact* solrconfig.xml & schema.xml you are 
using for coreA ... you've given us an overview of what you changed, but 
that's not enough for anyone to actually try and reproduce your problem.

if we can't reproduce it, it's impossible to diagnose it and offer 
suggestions/workarounds/fixes...

https://wiki.apache.org/solr/UsingMailingLists


-Hoss
http://www.lucidworks.com/


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread solr-user
yes, have triple checked the schema and solrconfig XML; various tools have
indicated the XML is valid

no missing types or dupes, and have not disabled the admin handler

as mentioned in my most recent response, I can see the coreX core (the
renamed and unmodified collection1 core from the downloaded package) and
query it with no issues, but coreA (which has our specific schema and
solrconfig changes) is not showing in the admin interface and cannot be
queried (I get a 404)

both cores are located in the same solr folder.

appreciate the suggestions; looks like I will need to gradually move my
schema and core changes towards the collection1 content and see where things
start working; will take a while...sigh

will let you know what I find out.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173839.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread solr-user
Chris, will get the schema and solrconfig ready for uploading.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173840.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Inconsistent doc value across two nodes - very simple test - what's the expected behavior?

2014-12-11 Thread Shalin Shekhar Mangar
I opened https://issues.apache.org/jira/browse/SOLR-6837

Probably best to have further conversations on the Jira issue.

On Thu, Dec 11, 2014 at 6:46 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Hi Gili,
>
> Great question!
>
> A write in Solr, by default, is only guaranteed to exist in 1 place i.e.
> the leader and the safety valves that we have to preserve these writes are:
>
> 1. The leaderVoteWait time for which leader election is suspended until
> enough live replicas are available
> 2. The two-way peer-sync between leader candidate and other replicas
>
> The other safety valve is on the client side with the "min_rf" parameter
> introduced by SOLR-5468 in Solr 4.9. If you set this param to 2 while
> making the request then Solr will return the number of replicas to which it
> could successfully send the update. Then depending on the response you can
> make a decision to retry the update at a later time assuming it is
> idempotent. This kinda puts the onus of ensuring consistency on the client
> side which is not ideal but better than nothing. See SOLR-5468 for more
> discussion on this topic.
>
> In your particular example, none of these safeties are invoked because you
> start node2 while node1 was down and node2 goes ahead with leader election
> after the wait period. Also since node1 was down during leader election,
> peer sync doesn't happen and then node2 becomes the leader.
>
> When node1 comes back online and joins as a replica, it recovers from the
> leader using peer-sync (which returns the newest 100 updates) and finds
> that there's nothing newer on the leader. However, there are no checks to
> make sure that the replica doesn't have a newer update itself which is why
> you end up with the inconsistent replica. If there were a lot of updates on
> node2 (more than 100) while node1 was down, in which case peer-sync isn't
> applicable, then it would have done a replication recovery and this
> inconsistency would have been resolved.
>
> So yeah we have a valid consistency bug such that we have inconsistent
> replicas in a steady state. I wonder if the right way is to bump min_rf to
> a higher value or peer-sync both ways during replica recovery. I'll need to
> think more on this.
>
>
> On Thu, Dec 11, 2014 at 4:21 PM, Gili Nachum  wrote:
>
>> I know Solr CAP properties are CP, but I don't see it happening over a
>> very
>> basic test - doing something wrong?
>>
>> With two Solr nodes, I index doc1 to both, stop node2, update doc1, stop
>> node1, start node2, start node1, and I get two different versions of the
>> doc depending on which replica I query.
>> I would expect node2 to update to itself.
>> Attaching Solr logs from both nodes.
>>
>> *Config*
>> Solr 4.7.2 / Jetty.
>> SolrCloud on two nodes, and 3 ZK, all running on localhost.
>> single collection: single shard with two replicas.
>>
>> *Reproducing:*
>> start node1 9.148.58.114:8983
>> start node2 9.148.58.114:8984
>> Cluster state: node1 leader. node2 active.
>>
>> index value 'A' (id="change me").
>> query and expect 'A' -> success
>>
>> Stop node2
>> Cluster state: node1 leader. node2 gone.
>> query and expect 'A' -> success
>>
>> Update document value from 'A'->'B'
>> query and expect 'B' -> success
>>
>> Stop node1
>> then
>> Start node2.
>> Cluster state: node1 gone. node2 down.
>>
>> *104510 [coreZkRegister-1-thread-1] INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext Waiting until we see more
>> replicas up for shard shard1: total=2 found=1 timeoutin=5.27665925E14ms*
>>
>> wait 3m.
>>
>> *184679 [coreZkRegister-1-thread-1] INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext  I am the new leader:
>> http://9.148.58.114:8984/solr/quick-results-collection_shard1_replica2/
>> 
>> shard1*
>> Cluster state: node1 gone. node2 leader.
>>
>> query and expect 'A' (old value) -> success
>>
>> start node1
>> Cluster state: node1 active. node2 leader.
>>
>> *Inconsistency: *
>> *Querying node1 always returns 'B'. *
>>
>> http://localhost:8983/solr/quick-results-collection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
>> *Querying node2 always returns 'A'. *
>>
>> http://localhost:8984/solr/quick-results-collection_shard1_replica2/select?q=*%3A*&wt=json&indent=true
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.
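
The client-side pattern that SOLR-5468's min_rf parameter enables can be sketched
roughly as follows. Note that solr_update below is a hypothetical stand-in, not a
real client API: a real implementation would POST the document with min_rf set and
read the achieved replication factor ("rf") from the response.

```python
def solr_update(doc, min_rf):
    # Hypothetical stub standing in for a real Solr client call.
    # Here it always reports that only 1 replica acknowledged the write,
    # simulating the degraded cluster discussed in this thread.
    return {"rf": 1}

def update_with_retry(doc, min_rf=2, max_attempts=3):
    """Retry an idempotent update until enough replicas acknowledge it."""
    for attempt in range(max_attempts):
        achieved = solr_update(doc, min_rf)["rf"]
        if achieved >= min_rf:
            return True   # enough replicas acknowledged the write
        # A real client would back off here before retrying.
    return False          # caller should requeue the update or alert

print(update_with_retry({"id": "doc1", "value": "B"}))  # False
```

As Shalin notes, this puts the burden on the client, but it lets the application
detect writes that reached fewer replicas than desired.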


Re: Solr Error when making GeoPrefixTree polygon filter search

2014-12-11 Thread david.w.smi...@gmail.com
As in the layout shipped with Solr?  Try putting the JTS ‘jar’ in lib/ext
and let us know if that worked.  I think it will but I forget.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Dec 11, 2014 at 12:40 PM, mathaix  wrote:
>
> Thank you. That was the issue.
> I am running Solr with Jetty. Is there a recommended way for including
> those jars in the jetty configuration?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Error-when-making-GeoPrefixTree-polygon-filter-search-tp4173629p4173807.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


different fields for user-supplied phrases in edismax

2014-12-11 Thread Michael Sokolov
I'd like to supply a different set of fields for phrases than for bare 
terms.  Specifically, we'd like to treat phrases as more "exact" - 
probably turning off stemming and generally having a tighter analysis 
chain.  Note: this is *not* what's done by configuring "pf" which 
controls fields for the auto-generated phrases.  What we want to do is 
provide our users more precise control by explicit use of " "


Is there a way to do this by configuring edismax?  I don't think there 
is; and if you agree, a follow-up question - if I want to extend the 
EDismax parser, does anybody have advice as to the best way in?  I'm 
looking at:


Query getFieldQuery(String field, String val, int slop)

and altering getAliasedQuery() to accept an aliases parameter, which 
would be a different set of aliases for phrases ...


does that make sense?

-Mike
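
Until something like this is configurable in edismax, one client-side workaround
is to split the raw user query into quoted phrases and bare terms, then target
each group at different fields (stemmed fields for terms, stricter "exact"
fields for phrases). A rough sketch with hypothetical field names:

```python
import re

def route_query(q, term_fields=("body",), phrase_fields=("body_exact",)):
    """Split a raw query into quoted phrases and bare terms, then
    scope each group to its own set of fields."""
    phrases = re.findall(r'"([^"]+)"', q)          # user-supplied phrases
    terms = re.sub(r'"[^"]+"', " ", q).split()     # whatever is left
    parts = [f'{f}:{t}' for t in terms for f in term_fields]
    parts += [f'{f}:"{p}"' for p in phrases for f in phrase_fields]
    return " ".join(parts)

print(route_query('running "exact phrase"'))
# body:running body_exact:"exact phrase"
```

This avoids touching the parser at the cost of doing the query rewriting in the
application, and it does not handle nested quoting or query operators.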


Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread Michael Sokolov
Have you rebooted the machine? (last refuge of the clueless, but often 
works) ...


On 12/11/14 2:50 PM, solr-user wrote:

yes, have triple checked the schema and solrconfig XML; various tools have
indicated the XML is valid

no missing types or dupes, and have not disabled the admin handler

as mentioned in my most recent response, I can see the coreX core (the
renamed and unmodified collection1 core from the downloaded package) and
query it with no issues, but coreA (whch has our specific schema and
solrconfig changes) is not showing in the admin interface and cannot be
queried (I get a 404)

both cores are located in the same solr folder.

appreciate the suggestions; looks like I will need to gradually move my
schema and core changes towards the collection1 content and see where things
start working; will take a while...sigh

will let you know what I find out.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173839.html
Sent from the Solr - User mailing list archive at Nabble.com.




Multi Lingual Suggester Solr 4.8

2014-12-11 Thread alaa.abuzaghleh
I am trying to create a suggester handler using Solr 4.8. Everything works fine, but
when I try to get suggestions in a different language (Arabic or Japanese,
for example) I get results in mixed languages: even when I search only
using Japanese, I get Arabic with it too. The following is my schema.xml


[schema.xml omitted: the XML markup was stripped by the mailing-list archive; only the element text "id" survives]

and this is my SolrConfig 






4.8



${solr.core0.data.dir:}






${solr.core0.data.dir:}






true





 
   explicit
   10
   id
   




explicit
edismax
10
full_name,job_tree, company, city, 
state, country,
first_name, last_name, id
full_name_suggest^60 
full_name_ngram^100.0 job_suggest^30
job_ngram^50.0 
full_name_edge^100.0 job_edge^50.0
true
full_name
   

f

Re: different fields for user-supplied phrases in edismax

2014-12-11 Thread Ahmet Arslan
Hi Mike,

If I am not wrong, you are trying to simulate Google's behaviour:
if you use quotes, Google returns exact matches. I think that makes perfect 
sense and would be a valuable addition. I remember some folks have requested 
this behaviour on the list.

Ahmet



On Thursday, December 11, 2014 10:50 PM, Michael Sokolov 
 wrote:
I'd like to supply a different set of fields for phrases than for bare 
terms.  Specifically, we'd like to treat phrases as more "exact" - 
probably turning off stemming and generally having a tighter analysis 
chain.  Note: this is *not* what's done by configuring "pf" which 
controls fields for the auto-generated phrases.  What we want to do is 
provide our users more precise control by explicit use of " "

Is there a way to do this by configuring edismax?  I don't think there 
is, and then if you agree, a followup question - if I want to extend the 
EDismax parser, does anybody have advice as to the best way in?  I'm 
looking at:

Query getFieldQuery(String field, String val, int slop)

and altering getAliasedQuery() to accept an aliases parameter, which 
would be a different set of aliases for phrases ...

does that make sense?

-Mike


Re: different fields for user-supplied phrases in edismax

2014-12-11 Thread alaa.abuzaghleh


explicit
edismax
10
full_name,job_tree, company, city, 
state, country,
first_name, last_name, id
full_name_suggest^60 
full_name_ngram^100.0 job_suggest^30
job_ngram^50.0 
full_name_edge^100.0 job_edge^50.0
true
full_name
   

full_name asc
full_name asc



The configuration above lets me search first by name; if there is no result
it falls back to searching by job. Hopefully this helps you. I would like
to let you know that it does not work well if you have Japanese, Arabic, or
Chinese.

this is the result of a query searching for users whose name is alaa or who
work as a developer
http://localhost:9090/solr/people/suggest?q=alaa%20developer&wt=json&indent=true

{
  "responseHeader":{
"status":0,
"QTime":27,
"params":{
  "indent":"true",
  "q":"alaa developer",
  "wt":"json"}},
  "grouped":{
"full_name":{
  "matches":4,
  "groups":[{
  "groupValue":"alaa",
  "doclist":{"numFound":1,"start":0,"docs":[
  {
"job_tree":"CTO(Chief Technology Officer) ",
"last_name":"Abuzaghleh",
"state":"California",
"country":"United States",
"city":"North Hollywood",
"id":"a2757538-9f16-42d8-907a-199c11787d09",
"company":"letspeer.com",
"full_name":"Alaa Abuzaghleh",
"first_name":"Alaa"}]
  }},
{
  "groupValue":"user1",
  "doclist":{"numFound":1,"start":0,"docs":[
  {
"job_tree":"Web Developer",
"last_name":"user1",
"state":"Amman",
"country":"Jordan",
"city":"Aljameaa",
"id":"78bd8079-666f-4e09-ab4f-aed796040c93",
"company":"BT-AT",
"full_name":"user1 user1",
"first_name":"user1"}]
  }},
{
  "groupValue":"user4",
  "doclist":{"numFound":1,"start":0,"docs":[
  {
"job_tree":"Mobile App Developer",
"last_name":"user4",
"state":"",
"country":"",
"city":"",
"id":"9e50c5b1-49cc-444a-a752-8b8ebe04b6f6",
"company":"Apple ",
"full_name":"user4 user4",
"first_name":"user4"}]
  }},
{
  "groupValue":"z3ra",
  "doclist":{"numFound":1,"start":0,"docs":[
  {
"job_tree":"",
"last_name":"z3ra",
"state":"",
"country":"",
"city":"",
"id":"2a82735d-cce0-400e-826b-b78f6bb56115",
"company":"",
"full_name":"usAlaa z3ra",
"first_name":"usAlaa"}]
  }}]}}}

You can go to the multilingual issue in the same place where you posted your
issue and look at the schema configuration.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/different-fields-for-user-supplied-phrases-in-edismax-tp4173862p4173886.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting integer field

2014-12-11 Thread Pawel
Hi,
Thanks for the response. It is quite important to me, for example, to highlight
a multivalued field with many int or long tokens.

--
Paweł

On Thu, Dec 11, 2014 at 3:08 PM, Tomoko Uchida  wrote:
>
> Hi Pawel,
>
> Essentially, highlighting is a feature to show "fragments of documents"
> that match user queries.
> With that, he/she can find occurrence of their query in long documents and
> can understand their results well.
>
> For tint or tlong fields (or other non-text field types), "fragments"
> usually have no meaning.
>
> So, excuse me, I cannot understand your intent.
> If you specify your need a little bit more, I or other fellows may be able
> to help you.
>
> Regards,
> Tomoko
>
> 2014-12-11 19:12 GMT+09:00 Pawel Rog :
>
> > Hi,
> > Is it possible to highlight int (TrieLongField) or long (TrieLongField)
> > field in Solr?
> >
> > --
> > Paweł
> >
>


Re: Highlighting integer field

2014-12-11 Thread Michael Sokolov
So the short answer to your original question is "no." Highlighting is 
designed to find matches *within* a tokenized (text) field only.  That 
is difficult because text gets processed and there are all sorts of 
complications, but for integers it should be pretty easy to match the 
values in the document and those in the query in the client, i.e. without 
help from Solr?


-Mike

On 12/11/14 6:19 PM, Pawel wrote:

Hi,
Thanks for response. It is quite important to me for example to highlight
multivalued field with many int or long tokens.

--
Paweł

On Thu, Dec 11, 2014 at 3:08 PM, Tomoko Uchida 
wrote:

Hi Pawel,

Essentially, highlighting is a feature to show "fragments of documents"
that match user queries.
With that, he/she can find occurrence of their query in long documents and
can understand their results well.

For tint or tlong fields (or other non-text field types), "fragments"
usually have no meaning.

So, excuse me, I cannot understand your intent.
If you specify your need a little bit more, I or other fellows may be able
to help you.

Regards,
Tomoko

2014-12-11 19:12 GMT+09:00 Pawel Rog :


Hi,
Is it possible to highlight int (TrieLongField) or long (TrieLongField)
field in Solr?

--
Paweł
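
Mike's suggestion - match whole integer values against the query on the client
side, since Solr's highlighter only works on tokenized text - can be sketched in
a few lines. The function name and markup here are illustrative, not a Solr API:

```python
def highlight_ints(field_values, query_values, pre="<em>", post="</em>"):
    """Wrap values of a multivalued int field that appear in the query;
    leave the rest untouched. Returns strings ready for display."""
    hits = set(query_values)
    return [f"{pre}{v}{post}" if v in hits else str(v) for v in field_values]

print(highlight_ints([10, 42, 7, 42], [42]))
# ['10', '<em>42</em>', '7', '<em>42</em>']
```

Since integer values match exactly (no stemming or tokenization), this covers
the multivalued-field case without any server-side highlighting support.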





To understand SolrCloud configurations

2014-12-11 Thread E S J
Hello Team,

I would like to clarify where to place schema.xml in a SolrCloud set-up.

My Solr cloud set-up , 3 nodes, 3 shards and 3 replications, 3 ZooKeeper

What I have done is,
1. Took solr.war from the solr default download (
solr-4.10.2/example/webapps/solr.war - 4.10.2) and placed it in the
 /webapps/ folder.

2. Took the Solr home from the solr default download ( solr-4.10.2/example/solr/)
and used it as solr.home
(copied the collection folder as well, along with solr.xml)

3. Started 3 solr nodes and zookeeper instances (after correct
configuration)

4. Register solr configurations of ZooKeeper using,
zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
-cmd upconfig -confdir /collection1/conf -confname default

5. Create 3 Shard's and 3 Replicas :
http://solr1.internal:7003/solr/admin/collections?action=CREATE&name=c-ins&replicationFactor=3&numShards=3&collection.configName=default&maxShardsPerNode=3&wt=json&indent=2


   After that I can see the following folder structure in Solr node1's
 directory (I can see a similar structure on my other 2 solr nodes):
-rw-r--r-- solr.xml
drwxrwxr-x c-ins_shard1_replica1
drwxrwxr-x c-ins_shard2_replica1
drwxrwxr-x c-ins_shard3_replica1
drwxr-xr-x collection1


I've done some XML document indexing and it's working fine; the ZooKeepers are
also working fine. My questions are:

1. Like to know what I have done is correct ?
2. Where to place the schema.xml's and other configurations. Because for
the moment it's are under collection1/conf folder and collection1 is not an
active collection for me. ( i'm using only c-ins core)


Appreciate your time on this.

Thanks - Elike


Browse interface

2014-12-11 Thread tharpa
Is it possible to boost a query using the browse interface?  How would one do
this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Browse-interface-tp4173897.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: To understand SolrCloud configurations

2014-12-11 Thread Erick Erickson
bq: 1. Like to know what I have done is correct ?
Looks fine to me.

bq: 2. Where to place the schema.xml's and other configurations. Because for
the moment it's are under collection1/conf folder and collection1 is not an
active collection for me. ( i'm using only c-ins core)

I think you're a bit confused here. The configuration stuff is NOT
"under collection1/conf" as far as SolrCloud is concerned, it's in
Zookeeper in /configs/default, take a look at the admin>>cloud page,
click the /configs entry and I think you'll see a "defaults" node.

As far as SolrCloud is concerned, that's where your configs live. The
fact that they exist in /collection1/conf on your local
machine is totally irrelevant. Tomorrow, you could issue an upconfig
and use something like ".-confdir mytotallynewdirectory/conf
-confname default" and SolrCloud would happily overwrite your configs
in the Zookeeper "default" node with the new ones, _and_ distribute
them to all the Solr nodes when they were restarted.

So where your configs "should" live is in some kind of version control..

HTH,
Erick

On Thu, Dec 11, 2014 at 6:19 PM, E S J  wrote:
> Hello Team,
>
> I would like to get clarified where to place schema.xml on SolrCloud set-up.
>
> My Solr cloud set-up , 3 nodes, 3 shards and 3 replications, 3 ZooKeeper
>
> What I have done is,
> 1. Taken a solr.war from solr default download (
> solr-4.10.2/example/webapps/solr.war  -  4.10.2) and placed
>  /webapps/ folder.
>
> 2. Taken Solr home from solr default download ( solr-4.10.2/example/solr/)
> and placed on solr.home
> (Copied Collection folder as well along with solr.xml)
>
> 3. Started 3 solr nodes and zookeepr instances ( after correct
> configuration)
>
> 4. Register solr configurations of ZooKeeper using,
> zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
> -cmd upconfig -confdir /collection1/conf -confname default
>
> 5. Create 3 Shard's and 3 Replicas :
> http://solr1.internal:7003/solr/admin/collections?action=CREATE&name=c-ins&replicationFactor=3&numShards=3&collection.configName=default&maxShardsPerNode=3&wt=json&indent=2
>
>
>After that I can see following folder structure  in Solr node1's
>  directory ( Can see similar structure on my other 2 solr nodes)
> -rw-r--r-- solr.xml drwxrwxr-x c-ins_shard1_replica1 drwxrwxr-x
> c-ins_shard2_replica1 drwxrwxr-x c-ins_shard3_replica1 drwxr-xr-x
> collection1
>
>
> I've done some xml docuemnt indexing and it's working fine, Zoo-keepers are
> also working fine, My Questions are,
>
> 1. Like to know what I have done is correct ?
> 2. Where to place the schema.xml's and other configurations. Because for
> the moment it's are under collection1/conf folder and collection1 is not an
> active collection for me. ( i'm using only c-ins core)
>
>
> Appreciate your time on this.
>
> Thanks - Elike


Re: To understand SolrCloud configurations

2014-12-11 Thread E S J
Thanks Erick, I understand your explanation.
Quick question: do the configurations sit under /configs/default because
-confname was specified as default when I executed the following command? Can
I specify -confname as c-ins?

zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
-cmd upconfig -confdir /collection1/conf -confname default

Also, I noticed that the available options for -confname are default or
schemaless; that is why I specified default.

Thanks,
Elike

On 12 December 2014 at 14:23, Erick Erickson 
wrote:
>
> bq: 1. Like to know what I have done is correct ?
> Looks fine to me.
>
> bq: 2. Where to place the schema.xml's and other configurations. Because
> for
> the moment it's are under collection1/conf folder and collection1 is not an
> active collection for me. ( i'm using only c-ins core)
>
> I think you're a bit confused here. The configuration stuff is NOT
> "under collection1/conf" as far as SolrCloud is concerned, it's in
> Zookeeper in /configs/default, take a look at the admin>>cloud page,
> click the /configs entry and I think you'll see a "defaults" node.
>
> As far as SolrCloud is concerned, that's where your configs live. The
> fact that they exist in /collection1/conf on your local
> machine is totally irrelevant. Tomorrow, you could issue an upconfig
> and use something like "-confdir mytotallynewdirectory/conf
> -confname default" and SolrCloud would happily overwrite your configs
> in the Zookeeper "default" node with the new ones, _and_ distribute
> them to all the Solr nodes when they were restarted.
>
> So where your configs "should" live is in some kind of version
> control.
>
> HTH,
> Erick
>
> On Thu, Dec 11, 2014 at 6:19 PM, E S J  wrote:
> > Hello Team,
> >
> > I would like to get clarified where to place schema.xml on SolrCloud
> set-up.
> >
> > My Solr cloud set-up , 3 nodes, 3 shards and 3 replications, 3 ZooKeeper
> >
> > What I have done is,
> > 1. Taken a solr.war from solr default download (
> > solr-4.10.2/example/webapps/solr.war  -  4.10.2) and placed
> >  /webapps/ folder.
> >
> > 2. Taken Solr home from solr default download (
> solr-4.10.2/example/solr/)
> > and placed on solr.home
> > (Copied Collection folder as well along with solr.xml)
> >
> > 3. Started 3 Solr nodes and ZooKeeper instances (after correct
> > configuration)
> >
> > 4. Register solr configurations of ZooKeeper using,
> > zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
> > -cmd upconfig -confdir /collection1/conf -confname default
> >
> > 5. Create 3 Shard's and 3 Replicas :
> >
> http://solr1.internal:7003/solr/admin/collections?action=CREATE&name=c-ins&replicationFactor=3&numShards=3&collection.configName=default&maxShardsPerNode=3&wt=json&indent=2
> >
> >
> >After that I can see following folder structure  in Solr node1's
> >  directory ( Can see similar structure on my other 2 solr
> nodes)
> > -rw-r--r-- solr.xml drwxrwxr-x c-ins_shard1_replica1 drwxrwxr-x
> > c-ins_shard2_replica1 drwxrwxr-x c-ins_shard3_replica1 drwxr-xr-x
> > collection1
> >
> >
> > I've done some XML document indexing and it's working fine; the ZooKeepers
> > are also working fine. My questions are:
> >
> > 1. Like to know what I have done is correct ?
> > 2. Where should the schema.xml and other configurations be placed? Because
> > for the moment they are under the collection1/conf folder, and collection1
> > is not an active collection for me. (I'm using only the c-ins core.)
> >
> >
> > Appreciate your time on this.
> >
> > Thanks - Elike
>


Re: Details on why ConcurrentUpdateSolrServer is recommended for maximum index performance

2014-12-11 Thread Shawn Heisey
On 12/11/2014 9:19 AM, Michael Della Bitta wrote:
> Only thing you have to worry about (in both the CUSS and the home grown
> case) is a single bad document in a batch fails the whole batch. It's up
> to you to fall back to writing them individually so the rest of the
> batch makes it in.

With CUSS, your program will never know that the batch failed, so your
code won't know that it must retry documents individually.  All requests
return with an apparent success even before the data is sent to Solr,
and there's no way for exceptions thrown during the background indexing
to be caught by user code.

If your program must know whether your updates were indexed successfully
by catching an exception when there's a problem, you'll need to write
your own multi-threaded indexing application using an instance of
HttpSolrServer.
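As a rough sketch of the batch-with-individual-fallback pattern Michael described (language-agnostic pseudocode in Python; the `send` callable is a hypothetical stand-in for a real client call such as HttpSolrServer.add, not an actual SolrJ API):

```python
def index_with_fallback(docs, send, batch_size=100):
    """Send docs in batches; when a batch fails, retry its documents
    one at a time so a single bad document does not sink the whole batch.

    `send` is a hypothetical stand-in for a real client call (e.g. an
    HTTP add request) and is expected to raise an exception on failure.
    Returns the documents that could not be indexed at all.
    """
    failed = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        try:
            send(batch)
        except Exception:
            # The batch failed as a whole; fall back to individual adds
            # to isolate the bad document(s) and index the rest.
            for doc in batch:
                try:
                    send([doc])
                except Exception:
                    failed.append(doc)
    return failed
```

The point is that the fallback logic must live in your own code: with a blocking client you see the exception and can retry, whereas CUSS swallows it in a background thread.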

I filed an issue on this and built an imperfect patch.  The patch can
only tell you that there was a problem during indexing; it doesn't know
which document or even which batch had the problem.

https://issues.apache.org/jira/browse/SOLR-3284

Thanks,
Shawn



Re: To understand SolrCloud configurations

2014-12-11 Thread Shawn Heisey
On 12/11/2014 6:31 PM, E S J wrote:
> Thanks Eric, I understand your explanation.
> Quick question, Are configurations sits under /configs/defaults because
> -configname specified as default when I execute the following command? Can
> I specify -configname as /c-ins/
> 
> zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
> -cmd upconfig -confdir /collection1/conf -confname default
> 
> Also I noticed that available options for -configname is default or
> schemaless, that is why I specified as default.

The confname can be anything you want it to be.  You should not include
any slash characters in it, though ... make it c-ins, not /c-ins/.
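For example (reusing the ZooKeeper hosts and config directory from your earlier command; the CREATE call mirrors the syntax used earlier in this thread):

```shell
# Upload the config set under the name "c-ins" (no slashes in the name)
zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183 \
  -cmd upconfig -confdir /collection1/conf -confname c-ins

# Reference that config set by name when creating the collection
curl "http://solr1.internal:7003/solr/admin/collections?action=CREATE&name=c-ins&numShards=3&replicationFactor=3&maxShardsPerNode=3&collection.configName=c-ins"
```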

Where do you see information telling you it can be default or
schemaless?  That sounds completely wrong to me, so I'd like to know
what needs to be fixed.

Here's part of what zkcli itself says if you run it with no options:

 -n,--confname  for upconfig, linkconfig: name of the config set

Thanks,
Shawn



Re: To understand SolrCloud configurations

2014-12-11 Thread E S J
Thanks. I thought the only options were default or schemaless because when we
run bin/solr -e cloud, we get a prompt like this:

To begin, how many Solr nodes would you like to run in your local cluster?
(specify 1-4 nodes) [2] 3
Ok, let's start up 3 Solr nodes for your example SolrCloud cluster.

Please enter the port for node1 [8983]
8983
Please enter the port for node2 [7574]
7574
Please enter the port for node3 [8984]
8984
Cloning /home/j2ee/solr-4.10.2/example into /home/j2ee/solr-4.10.2/node1
Cloning /home/j2ee/solr-4.10.2/example into /home/j2ee/solr-4.10.2/node2
Cloning /home/j2ee/solr-4.10.2/example into /home/j2ee/solr-4.10.2/node3

Starting up SolrCloud node1 on port 8983 using command:

solr start -cloud -d node1 -p 8983


Waiting to see Solr listening on port 8983 [\]
Started Solr server on port 8983 (pid=29712). Happy searching!


Starting node2 on port 7574 using command:

solr start -cloud -d node2 -p 7574 -z localhost:9983


Waiting to see Solr listening on port 7574 [\]
Started Solr server on port 7574 (pid=29935). Happy searching!


Starting node3 on port 8984 using command:

solr start -cloud -d node3 -p 8984 -z localhost:9983


Waiting to see Solr listening on port 8984 [\]
Started Solr server on port 8984 (pid=30559). Happy searching!

Now let's create a new collection for indexing documents in your 3-node
cluster.

Please provide a name for your new collection: [gettingstarted]
gettingstarted
How many shards would you like to split gettingstarted into? [2] 3
3
How many replicas per shard would you like to create? [2] 3
3
*Please choose a configuration for the gettingstarted collection, available
options are: default or schemaless [default]*



On 12 December 2014 at 16:03, Shawn Heisey  wrote:
>
> On 12/11/2014 6:31 PM, E S J wrote:
> > Thanks Eric, I understand your explanation.
> > Quick question, Are configurations sits under /configs/defaults because
> > -configname specified as default when I execute the following command?
> Can
> > I specify -configname as /c-ins/
> >
> > zkcli.sh -zkhost zoo1.internal:2183,zoo2.internal:2183,zoo3.internal:2183
> > -cmd upconfig -confdir /collection1/conf -confname default
> >
> > Also I noticed that available options for -configname is default or
> > schemaless, that is why I specified as default.
>
> The confname can be anything you want it to be.  You should not include
> any slash characters in it, though ... make it c-ins, not /c-ins/.
>
> Where do you see information telling you it can be default or
> schemaless?  That sounds completely wrong to me, so I'd like to know
> what needs to be fixed.
>
> Here's part of what zkcli itself says if you run it with no options:
>
>  -n,--confname  for upconfig, linkconfig: name of the config set
>
> Thanks,
> Shawn
>
>


Re: To understand SolrCloud configurations

2014-12-11 Thread Shawn Heisey
On 12/11/2014 8:09 PM, E S J wrote:
> Thanks, I thought only option is default or schemaless because , When we
> run bin/solr -e cloud you will get prompt like ,



> *Please choose a configuration for the gettingstarted collection, available
> options are: default or schemaless [default]*

I have almost zero experience with the bin/solr script. It's very very
new and has undergone quite a lot of change in the upcoming 5.0 version
... so I'm waiting for the dust to settle before I try to understand it
and make suggestions about how to improve it.  It doesn't even exist in
the Solr versions that I use.

The "cloud" example for the bin/solr script puts everything on one
physical node.  When you do this for real, you are going to want each of
those nodes to be physically separate machines ... hopefully the
bin/solr script will be able to easily accommodate a production
SolrCloud installation.

Thanks,
Shawn



Documents with SOLR function "sort" are NOT sorted by score

2014-12-11 Thread eakarsu

I am having difficulty with my sort function. With the following sort, the
documents are not sorted by score, as you can see. Why is the sort function
not able to sort them properly?
I would appreciate a prompt answer.


This is my sort function.

sort=map(and(termfreq(CustomersFavourite,852708),exists($exactqq)),1,1,1,0)
desc,map(and(termfreq(CustomersPurchased,852708),exists($exactqq)),1,1,1,0)
desc,map(exists($exactqq),1,1,NumberOfClicks,0)
desc,map(exists($exactqq),1,1,Amount,0)
desc,map(and(termfreq(InPromotion_925,true),exists($exactqq)),1,1,1,0)
desc,map(exists($exactqq),1,1,OrderCount,0) desc
exactqq={!edismax}(ProductModelNameExact:xyz OR ProductModelName_TR:xyz)

But as you can see, the results are not sorted by score:

"response": {
"numFound": 139,
"start": 0,
"maxScore": 0.6251737,
"docs": [
  {
"score": 0.28109676
  },
  {
"score": 0.25829598
  },
  {
"score": 0.36092186
  },
  {
"score": 0.6251737
  },
  {
"score": 0.1379621
  },
  {
"score": 0.14090014
  },
  {
"score": 0.1379621
  },
  {
"score": 0.14090014
  },
  {
"score": 0.50190175
  },
  {
"score": 0.1379621
  },
  {
"score": 0.12398934
  },
  {
"score": 0.1379621
  },
  {
"score": 0.12398934
  },
  {
"score": 0.4989637
  },
  {
"score": 0.12585841
  },
  {
"score": 0.12585841
  },



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Documents-with-SOLR-function-sort-are-NOT-sorted-by-score-tp4173928.html
Sent from the Solr - User mailing list archive at Nabble.com.


Join in SOLR

2014-12-11 Thread Rajesh
I'm using Solr 4.10. While importing through DIH, I've configured 3 separate
entities. I'm facing some problems with indexing and retrieval.

1) How can I set the unique key, as the 3 entities will have different
fields?
2) Is there a join query with which I can join all 3 tables?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-in-SOLR-tp4173930.html
Sent from the Solr - User mailing list archive at Nabble.com.


[Help] tab-delimited gz file indexing steps

2014-12-11 Thread Sithik
Team,
I have a compressed text file (gz) which holds tab-delimited data. Is it
possible for me to index this file directly, without doing any preprocessing
to uncompress the file on my own? If so, can you please tell me the
steps/config changes I am supposed to follow?

BTW, I am using Solr 4.10.

Thanks in advance

-Sithik


Re: Join in SOLR

2014-12-11 Thread Tomoko Uchida
Hi,

I cannot guess what 'entities' means in your context, but do you want some
kind of RDB-like join functionality on Solr?
Basically, Solr is not "relational". So first, you should consider
denormalizing your RDB tables into one table/view (or issuing a SQL JOIN query
in DIH) to import the data into Solr.
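As a concrete sketch of that denormalize-with-DIH approach (the data source URL, table, and column names below are hypothetical, and your Solr schema must define the target fields):

```xml
<!-- data-config.xml: one denormalized entity instead of three separate ones -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="product"
            query="SELECT p.id, p.name, c.label AS category, s.qty AS stock
                   FROM product p
                   JOIN category c ON c.id = p.category_id
                   JOIN stock s ON s.product_id = p.id">
      <field column="id"       name="id"/>
      <field column="name"     name="name"/>
      <field column="category" name="category"/>
      <field column="stock"    name="stock"/>
    </entity>
  </document>
</dataConfig>
```

With the join done in SQL, each Solr document carries the fields of all three tables and a single uniqueKey (here the product id).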

If you *really* need an RDB-like join on Solr, you could look at Solr's
join feature. (That makes your system more complicated.)
https://wiki.apache.org/solr/Join

Regards,
Tomoko

2014-12-12 14:36 GMT+09:00 Rajesh :
>
> I'm using Solr 4.10. While importing through DIH, I've configured 3
> separate
> entities. I'm facing some problems for indexing and retrieval.
>
> 1) How can I give the unique key, as the 3 entities will have different
> fields.
> 2) Is there a join query, from which I can join all the 3 tables.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Join-in-SOLR-tp4173930.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>