Spread SolrCloud across two locations

2017-05-23 Thread Jan Høydahl
Hi,

A customer has two locations (a few km apart) with super-fast networking 
in-between, so for day-to-day operation they view all VMs in both locations as 
one pool of servers. They typically spin up redundant servers for various 
services in each zone, and if a zone should fail, the other will just continue 
working.

How can we best support such a setup with Cloud and Zookeeper? 
They do not need (or want) CDCR since latency and bandwidth are no problem, and 
CDCR is active-passive only, so it would anyway require manual intervention to 
catch up if indexing were temporarily switched to the passive DC.
If it were not for ZK I would set up one Cloud cluster and make sure each shard 
was replicated across zones, and all would be fine.
But ZK really requires a third location in order to tolerate the loss of an 
entire location/zone.
All solutions I can think of involve manual intervention: re-configuring ZK 
followed by a restart of the surviving Solr nodes in order to point to the 
“new” ZK.

How have you guys solved such setups?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



RE: Spread SolrCloud across two locations

2017-05-23 Thread Markus Jelsma
I would probably start by renting a VM at a third location to run Zookeeper.

Markus
 


Re: Rule-based Replica Placement not working with Solr 6.5.1

2017-05-23 Thread Bernd Fehling
After some analysis it turns out that they compare apples with oranges :-(

Inside "tryAPermutationOfRules" the rule is called with rules.get() and
the next step is calling rule.compare(), but they don't compare the nodes
against the rule (or rules). They compare the nodes against each other.

E.g. server1:8983, server2:7574, server1:7574,...
What do you think will happen if comparing server1:8983 against server2:7574 
(and so on)???
It will _NEVER_ match!!!

Regards
Bernd


Am 23.05.2017 um 08:54 schrieb Bernd Fehling:
> No, that is way off, because:
> 1. you have no "tag" defined.
>shard and replica can be omitted and they will default to wildcard,
>but a "tag" must be defined.
> 2. replica must be an integer or a wildcard.
> 
> Regards
> Bernd
> 
> Am 23.05.2017 um 01:17 schrieb Damien Kamerman:
>> If you want all the replicas for shard1 on the same port then I think the
>> rule is: 'shard:shard1,replica:port:8983'
>>
>> On 22 May 2017 at 18:47, Bernd Fehling 
>> wrote:
>>
>>> I tried many settings with "Rule-based Replica Placement" on Solr 6.5.1
>>> and came to the conclusion that it is not working at all.
>>>
>>> My test setup is 6 nodes on 3 servers (port 8983 and 7574 on each server).
>>>
>>> The call to create a new collection is
>>> "http://localhost:8983/solr/admin/collections?action=CREATE&name=boss&;
>>> collection.configName=boss_configs&numShards=3&replicationFactor=2&
>>> maxShardsPerNode=1&rule=shard:shard1,replica:<2,port:8983"
>>>
>>> With "rule=shard:shard1,replica:<2,port:8983" I expect that shard1 has
>>> only nodes with port 8983 _OR_ it should fail due to "strict mode" because
>>> the fuzzy operator "~" is not set.
>>>
>>> The result of the call is:
>>> shard1 --> server2:7574 / server1:8983
>>> shard2 --> server1:7574 / server3:8983
>>> shard3 --> server2:8983 / server3:7574
>>>
>>> The expected result should be (at least!!!) shard1 --> server_x:8983 /
>>> server_y:8983
>>> where "_x" and "_y" can be anything between 1 and 3 but must be different.
>>>
>>> I think the problem is somewhere in "class ReplicaAssigner" with
>>> "tryAllPermutations"
>>> and "tryAPermutationOfRules".
>>>
>>> Regards
>>> Bernd
>>>
>>


Re: different length/size of unique 'id' field value in a collection.

2017-05-23 Thread Rick Leir
Derek,
If your algorithm is guaranteed to always provide unique id's then fine. I say 
'incorrectly' because, after a few years in software development, I have seen 
bugs in even the most careful code. A bug causing ID collisions could be hard to 
track down. Solr can generate unique ID's for you, and you can index your 
product ID's in normal fields, so that is my preference. Just a preference.
Cheers -- Rick
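
For reference, a minimal sketch of letting Solr generate the unique ID,
assuming the uniqueKey field is named "id" (the chain name is a placeholder):

    <updateRequestProcessorChain name="generate-id" default="true">
      <processor class="solr.UUIDUpdateProcessorFactory">
        <str name="fieldName">id</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

Documents indexed without an "id" then get a random UUID, and the supplier and
product IDs can stay as ordinary indexed fields.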

On May 22, 2017 10:07:36 PM EDT, Derek Poh  wrote:
>Hi Rick
>
>My apologies, I did not make myself clear on the value of the fields. They
>are numbers.
>I used 'ts1', 'sup1' and 'pdt1' for simplicity and for ease of
>understanding instead of the actual numbers.
>
>You mentioned this design has the potential for (in error cases)
>concatenating id's incorrectly. Could you explain more on this?
>
>On 5/22/2017 6:12 PM, Rick Leir wrote:
>> On 2017-05-22 02:25 AM, Derek Poh wrote:
>>> Hi
>>>
>>> Due to the source data structure, I need to concatenate the values of
>>> 2 fields ('supplier_id' and 'product_id') to form the unique 'id' of
>>> each document.
>>> However there are cases where some documents only have the 'supplier_id'
>>> field.
>>> This will result in some documents with a longer/larger 'id' field
>>> (have both 'supplier_id' and 'product_id') and some with a
>>> shorter/smaller 'id' field value (has only 'supplier_id').
>>>
>>> Please refer to the simplified representation of the records below.
>>> The 3rd record only has a supplier id.
>>> ts1 sup1 pdt1
>>> ts1 sup1 pdt2
>>> ts1 sup2
>>> ts1 sup3 pdt3
>>> ts1 sup4 pdt5
>>> ts1 sup4 pdt6
>>>
>>> I understand the unique 'id' is used during indexing to check whether
>>> a document already exists. Create if it does not exist, else update
>>> if it exists.
>>>
>>> Are there any implications if the unique 'id' field value is of
>>> different size/length among documents of a collection?
>> No
>>> Is it advisable to have such design?
>> Derek
>> You need unique ID's. This design has the potential for (in error
>> cases) concatenating id's incorrectly. It might be better to have ID's
>> which are just a number. That said, my current project has ID's which
>> are not just a number, YMMV.
>> cheers -- Rick
>>>
>>> Derek
>>
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: the problem on CDCR of solrCloud

2017-05-23 Thread alessandro.benedetti
What doesn't work?
Can you specify exactly what is happening?
Can you add stack traces as evidence of something bad happening?
We can then try to help!



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io


Re: the problem on CDCR of solrCloud

2017-05-23 Thread Rick Leir
Hi 魏晓峰
What in particular is not working? Cheers  -- Rick

On May 22, 2017 11:00:24 PM EDT, "魏晓峰"  wrote:
>hello, my name is weixiaofeng. I'm from China, and I'm a Java
>developer. Recently we have been using Solr to implement search over big
>data, and we ran into trouble with the CDCR (Cross Data Center
>Replication) module of SolrCloud.
>
>The goal of the project is to replicate data to multiple Data Centers
>to support Near Real Time Searching by immediately forwarding updates
>between nodes in the cluster on a per-shard basis.
>
>but it doesn't work. I don't know how to solve this problem and I'm very
>anxious about it, so can you help me? thank you!

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Rule-based Replica Placement not working with Solr 6.5.1

2017-05-23 Thread Noble Paul
Did you try the rule
shard:shard1,port:8983

This ensures that all replicas of shard1 are allocated on nodes with port 8983.

If it doesn't, it's a bug. Please open a ticket.
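
Applied to Bernd's original CREATE call, that rule would look something like
this (a sketch; the other parameters as in his earlier mail):

    http://localhost:8983/solr/admin/collections?action=CREATE&name=boss&collection.configName=boss_configs&numShards=3&replicationFactor=2&maxShardsPerNode=1&rule=shard:shard1,port:8983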




-- 
-
Noble Paul


SOLR Index and Schema.xml file corruption

2017-05-23 Thread LAD, SAGAR
Hi SOLR team,
We are using SOLR 4.6.0 with Sitecore CMS 7.2.

It is observed that the search indexes and sometimes the schema.xml file get 
corrupted: a schema.xml field tag got an extra forward slash, and that resulted 
in SOLR stopping.
We have <schemaFactory class="ClassicIndexSchemaFactory"/> configured, 
therefore only manual update is allowed.
manual update is allowed.

Please guide us in this issue and let us know any detail required.


Regards
Sagar Lad | Capgemini Technology Services India Limited | Pune


Re: Rule-based Replica Placement not working with Solr 6.5.1

2017-05-23 Thread Bernd Fehling
Yes, I tried that already.
Sure, it assigns 2 nodes with port 8983 to shard1 (e.g. 
server1:8983, server2:8983).
But due to there being no replica rule (which defaults to wildcard) I also get
shard3 --> server2:8983, server2:7574
shard2 --> server1:7574, server3:8983

The result is 3 replicas on server2, two of them on the same node of server2, 
but _no_ replica on node server3:7574.

I also tried to really nail it down with the rules:
rule=shard:shard1,replica:<2,sysprop.rack:1&
rule=shard:shard2,replica:<2,sysprop.rack:2&
rule=shard:shard3,replica:<2,sysprop.rack:3

The nodes were started with the correct -Drack=x property, but no luck.

From debugging I can see that the code is written in an "over complicated" way, 
probably to catch all possibilities (core, node, port, ip_x, ...), but it fails 
to really try all permutations while obeying the rules.

I will open a ticket for this.

Regards
Bernd
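
For reference, a sketch of how such nodes are typically started so that the
sysprop.rack tag is visible to the rule engine (ports and ZK address are
placeholders):

    bin/solr start -c -p 8983 -z zk1:2181 -Drack=1
    bin/solr start -c -p 7574 -z zk1:2181 -Drack=1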



Re: Spread SolrCloud across two locations

2017-05-23 Thread Jan Høydahl
I.e. tell the customer that in order to have automatic failover and recovery in 
a 2-location setup we require at least one ZK instance in a separate third 
location. Kind of a tough requirement, but necessary to safeguard against split 
brain during a network partition. 

If a third location is not an option, how would you set up ZK for manual 
reconfiguration? 
Two ZK in DC1 and one in DC2 would give you automatic recovery in case DC2 
falls out, but if DC1 falls out, WRITE would be disabled, and to resume writes 
in DC2 only, one would need to stop Solr + ZK, reconfigure ZK in DC2 as 
standalone (or set up two more) and then start Solr again with only one ZK.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
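
For reference, a sketch of those manual steps on the surviving DC2 node,
assuming a single remaining ZK at dc2-zk1 (hostnames are placeholders):

    # reconfigure the DC2 ZK as standalone: remove the server.N ensemble
    # lines from zoo.cfg, then restart it
    bin/zkServer.sh restart

    # restart the surviving Solr nodes against the lone ZK
    bin/solr stop -all
    bin/solr start -c -z dc2-zk1:2181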




High CPU when use grouping group.ngroups=true

2017-05-23 Thread Nguyen Manh Tien
Hi All,

I recently switched from Solr field collapse/expand to grouping to collapse
search results.
All seems good, but CPU is always high (80-100%) when I set the param
group.ngroups=true.

We set ngroups=true to get the number of groups so that we can paginate search
results correctly.
Due to the CPU issue we need to turn it off.

Is ngroups=true an expensive feature? Is there any way to prevent the CPU issue
and still have correct pagination?

Thanks,
Tien
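
For context, a minimal sketch of the kind of request involved, assuming the
grouping field is named groupId (the field name is a placeholder):

    q=*:*&group=true&group.field=groupId&group.ngroups=true&rows=10&start=0

group.ngroups is what reports the total number of groups, which is needed to
compute the number of result pages.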


Re: High CPU when use grouping group.ngroups=true

2017-05-23 Thread Erick Erickson
How many unique values in your group field? For high-cardinality
fields there's quite a bit of bookkeeping that needs to be done.

Have you tried profiling to see where the CPU time is being spent?

Best,
Erick



Re: High CPU when use grouping group.ngroups=true

2017-05-23 Thread Nguyen Manh Tien
The collapse field is a high-cardinality field. I haven't profiled yet but
will do it.

Thanks,
Tien



Re: SOLR Index and Schema.xml file corruption

2017-05-23 Thread Erick Erickson
If you have classic schema factory configured, then Solr will not write the
schema.xml file out. So either something's strange with SiteCore or someone
inadvertently hand-edited the schema. I suggest contacting the SiteCore
people to see how it would get that way.

You should be able to shut Solr/SiteCore down, hand-edit all your
schema.xml files to fix the problem and start things back up though.

Best,
Erick
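
For reference, the "manual update only" setting Sagar describes corresponds to
the classic schema factory in solrconfig.xml:

    <schemaFactory class="ClassicIndexSchemaFactory"/>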



Re: Spread SolrCloud across two locations

2017-05-23 Thread Susheel Kumar
Hi Jan, FYI - since last year I have been running a Solr 6.0 cluster in
one of our lower envs with 6 shards/replicas in dc1 & 6 shards/replicas in dc2
(each shard replicated across data centers), with 3 ZK in dc1 and 2 ZK in dc2.
(I didn't have a 3rd data center available for ZK, so I went with only 2 data
centers in the above configuration.) So far no issues: it has been running
fine, indexing, replicating data, serving queries etc. So in my test, setting
up a single cluster across two zones/data centers works without any issue when
there is no or very minimal latency (in my case around 30ms one way).

Thanks,
Susheel



Re: Spread SolrCloud across two locations

2017-05-23 Thread Erick Erickson
Susheel:

The issue is that if, for any reason at all, the connection between
dc1 and dc2 is broken, there will be no indexing on dc2 since the Solr
servers there will not sense ZK quorum. You'll have to do something
manual to reconfigure.

That's not a flaw in your setup, just the way things work ;).

Putting one of the ZKs on a third DC would change that...

Best,
Erick



Re: Spread SolrCloud across two locations

2017-05-23 Thread Susheel Kumar
Agreed, Erick. Since this setup is in our test env, we haven't really invested
in adding another DC, but for Prod, sure, we will go with a DC3 if we do go
with this setup.



Re: Spread SolrCloud across two locations

2017-05-23 Thread Shawn Heisey

On 5/23/2017 10:12 AM, Susheel Kumar wrote:
> Hi Jan, FYI - Since last year, I have been running a Solr 6.0 cluster
> in one of lower env with 6 shards/replica in dc1 & 6 shard/replica in
> dc2 (each shard replicated cross data center) with 3 ZK in dc1 and 2
> ZK in dc2. (I didn't have the availability of 3rd data center for ZK
> so went with only 2 data center with above configuration) and so far
> no issues. Its been running fine, indexing, replicating data, serving
> queries etc. So in my test, setting up single cluster across two
> zones/data center works without any issue when there is no or very
> minimal latency (in my case around 30ms one way).


With that setup, if dc2 goes down, you're all good, but if dc1 goes 
down, you're not.


There aren't enough ZK servers in dc2 to maintain quorum when dc1 is 
unreachable, and SolrCloud is going to go read-only.  Queries would most 
likely work, but you would not be able to change the indexes at all.


ZooKeeper with N total servers requires int((N/2)+1) servers to be 
operational to maintain quorum.  This means that with five total 
servers, three must be operational and able to talk to each other, or ZK 
cannot guarantee that there is no split-brain, so quorum is lost.


ZK in two data centers will never be fully fault-tolerant. There is no 
combination of servers that will work properly.  You must have three 
data centers for a geographically fault-tolerant cluster.  Solr would be 
optional in the third data center.  ZK must be installed in all three.


Thanks,
Shawn
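
A minimal zoo.cfg sketch of the geometry Shawn describes -- five ZK servers
across three data centers (hostnames and paths are placeholders). Losing any
single data center still leaves three of five servers, so quorum survives:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=zk1.dc1.example.com:2888:3888
    server.2=zk2.dc1.example.com:2888:3888
    server.3=zk1.dc2.example.com:2888:3888
    server.4=zk2.dc2.example.com:2888:3888
    server.5=zk1.dc3.example.com:2888:3888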



Re: solrcloud replicas not in sync

2017-05-23 Thread Webster Homer
We see a pretty consistent issue where the replicas show in the admin
console as not current, indicating that our auto commit isn't committing. In
one case we loaded the data to the source, CDCR replicated it to the
targets, and we see the source and the target as having current = false. It
is searchable, so the soft commits are happening. We turned off data loading
to investigate this issue, and the replicas are still not current after 3
days, so there should have been ample time to catch up.
This is our autoCommit:

  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>${solr.autoCommit.maxTime:30}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

This is our autoSoftCommit:

  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:15000}</maxTime>
  </autoSoftCommit>

Neither property (solr.autoCommit.maxTime nor solr.autoSoftCommit.maxTime) is
set.

We also have an updateChain that calls
solr.IgnoreCommitOptimizeUpdateProcessorFactory to ignore client commits.
Could that be the cause of our issue?

  <updateRequestProcessorChain name="..." default="true">
    <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
      <int name="statusCode">200</int>
    </processor>
    <!-- ... other processors ... -->
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

We did add a date field to all our collections that defaults to NOW, so I
can see that no new data was added, but the replicas don't seem to get the
commit. I assume this is something in our configuration (see above).

Is there a way to determine when the last commit occurred?

I believe that the one replica got out of sync due to an admin running an
optimize while cdcr was still running.
That was one collection, but it looks like we are missing commits on most
of our collections.

Any help would be greatly appreciated!

Thanks,
Webster Homer
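
One way to answer the last-commit question: the Luke handler reports the
searcher's view of each core, including a lastModified time and the same
"current" flag the admin console shows, e.g. (host and core name are
placeholders):

    http://SOLR_NODE:8983/solr/collection1_shard1_replica1/admin/luke?numTerms=0&wt=json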

On Mon, May 22, 2017 at 4:12 PM, Erick Erickson 
wrote:

> You can ping individual replicas by addressing to a specific replica
> and setting distrib=false, something like
>
>>  http://SOLR_NODE:port/solr/collection1_shard1_replica1/query?distrib=false&q=..
>
> But one thing to check first is that you've committed. I'd:
> 1> turn off indexing on the source cluster.
> 2> wait until the CDCR had caught up (if necessary).
> 3> issue a hard commit on the target
> 4> _then_ see if the counts were what is expected.
>
> Due to the fact that autocommit settings can fire at different clock
> times even for replicas on the same shard, it's easier to track
> whether it's a transient issue. The other thing I've seen people do is
> have a timestamp on the docs set to NOW (there's an update processor
> that can do this). Then when you check for consistency you can use
> fq=timestamp:[* TO NOW - (some interval significantly longer than your
> autocommit interval)].
>
> bq: Is there a way to recover when a shard has inconsistent replicas.
> If I use the delete replica API call to delete one of them and then use add
> replica to create it from scratch will it auto-populate from the other
> replica in the shard?
>
> Yes. Whenever you ADDREPLICA it'll catch itself up from the leader
> before becoming active. It'll have to copy the _entire_ index from the
> leader, so you'll see network traffic spike.
>
> Best,
> Erick
>
> On Mon, May 22, 2017 at 1:41 PM, Webster Homer 
> wrote:
> > I have a solrcloud collection with 2 shards and 4 replicas. The replicas
> > for shard 1 have different numbers of records, so different queries will
> > return different numbers of records.
> >
>> > I am not certain how this occurred, it happened in a collection that
>> > was a cdcr target.
>> >
>> > Is there a way to limit a search to a specific replica of a shard? We
>> > want to understand the differences.
>> >
>> > Is there a way to recover when a shard has inconsistent replicas?
>> > If I use the delete replica API call to delete one of them and then use
>> > add replica to create it from scratch will it auto-populate from the other
> > replica in the shard?
> >
> > Thanks,
> > Webster

Re: Indexing word with plus sign

2017-05-23 Thread Fundera Developer
I have also tried this option, by using a PatternReplaceFilterFactory, like 
this:

<filter class="solr.PatternReplaceFilterFactory" pattern="i\+d" replacement="investigación y desarrollo"/>
but it gets processed AFTER the Tokenizer, so when it executes there is no 
longer an "i+d" token, but two "i" and "d" independent tokens.

Is there a way I could make the filter execute before the Tokenizer? I have 
tried to place it first in the Analyzer definition like this:

<analyzer>
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="i\+d" replacement="investigación y desarrollo"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
</analyzer>

But I had no luck.

Are there any other approaches I could be missing?

Thanks!


El 22/05/17 a las 20:50, Rick Leir escribió:

Fundera,
You need a regex which matches a '+' with non-blank chars before and after. It 
should not replace a  '+' preceded by white space, that is important in Solr. 
This is not a perfect solution, but might improve matters for you.
Cheers -- Rick

On May 22, 2017 1:58:21 PM EDT, Fundera Developer 
 wrote:


Thank you Zahid and Erik,

I was going to try the CharFilter suggestion, but then I doubted. I see
the indexing process, and how the appearance of 'i+d' would be handled,
but, what happens at query time? If I use the same filter, I could
remove '+' chars that are added by the user to identify compulsory
tokens in the search results, couldn't I? However, if I do not use the
CharFilter I would not be able to match the 'i+d' search tokens...

Thanks all!



El 22/05/17 a las 16:39, Erick Erickson escribió:

You can also use any of the other tokenizers. WhitespaceTokenizer for
instance. There are a couple that use regular expressions. Etc. See:
https://cwiki.apache.org/confluence/display/solr/Tokenizers

Each one has it's considerations. WhitespaceTokenizer won't, for
instance, separate out punctuation so you might then have to use a
filter to remove those. Regex's can be tricky to get right ;). Etc

Best,
Erick

On Mon, May 22, 2017 at 5:26 AM, Muhammad Zahid Iqbal

wrote:


Hi,


Before applying tokenizer, you can replace your special symbols with
some
phrase to preserve it and after tokenized you can replace it back.

For example:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement="xxx" />
Thanks,
Zahid iqbal

On Mon, May 22, 2017 at 12:57 AM, Fundera Developer <
funderadevelo...@outlook.com>
wrote:



Hi all,

I am a bit stuck at a problem that I feel must be easy to solve. In
Spanish it is usual to find the term 'i+d'. We are working with Solr
5.5,
and StandardTokenizer splits 'i' and 'd' and sometimes, as we have in
the
index documents both in Spanish and Catalan, and in Catalan it is
frequent
to find 'i' as a word, when a user searches for 'i+d' it gets Catalan
documents as results.

I have tried to use the SynonymFilter, with something like:

i+d => investigacionYdesarrollo

But it does not seem to change anything.

Is there a way I could set an exception to the Tokenizer so that it
does
not split this word?

Thanks in advance!







RE: Using the Data Import Handler with SQLite

2017-05-23 Thread Dheeraj Kumar Karnati
Hi Zac,
  I think you have added the entity closing tag twice; that might be
causing the issue. It has been a long time, so I am not sure whether you are
still working on it or not.





Re: Indexing word with plus sign

2017-05-23 Thread Erick Erickson
You need to distinguish between

PatternReplaceCharFilterFactory

and

PatternReplaceFilterFactory

The first one is applied to the entire input _before_ tokenization.
The second is applied _after_ tokenization to individual tokens, by
that time it's too late.

It's an easy thing to miss.

And at query time you'll have to be careful to keep the + sign from
being interpreted as an operator.
Best,
Erick
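
A minimal sketch of the char filter variant for the 'i+d' case discussed in
this thread (pattern and replacement are illustrative):

    <analyzer>
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="i\+d" replacement="investigacionYdesarrollo"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>

At query time the same replacement applies before tokenization, but a raw +
typed by the user must still be escaped (i\+d) or the Lucene query parser will
read it as the mandatory-clause operator.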



Re: solrcloud replicas not in sync

2017-05-23 Thread Erick Erickson
This is all quite strange. Optimize (BTW, it's rarely
necessary/desirable on an index that changes, despite its name)
shouldn't matter here. CDCR forwards the raw documents to the target
cluster.

Ample time indeed. With a soft commit of 15 seconds, that's your
window (with some slop for how long CDCR takes).

If you do a search and sort by your timestamp descending, what do you
see on the target cluster? And when you are indexing and CDCR is
running, your target cluster solr logs should show updates coming in.
Mostly checking if the data is even getting to the target cluster
here.

Also check the tlogs on the source cluster. By "check" here I just
mean "are they reasonable size", and "reasonable" should be very
small. The tlogs are the "queue" that CDCR uses to store docs before
forwarding to the target cluster, so this is just a sanity check. If
they're huge, then CDCR is not forwarding anything to the target
cluster.

It's also vaguely possible that
IgnoreCommitOptimizeUpdateProcessorFactory is interfering; if so, it's
a bug and should be reported as a JIRA. If you remove that on the
target cluster, does the behavior change?

I'm mystified here as you can tell.

Best,
Erick
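
For reference, the update processor Erick mentioned earlier for stamping
documents at index time is typically wired up like this (chain and field names
are placeholders):

    <updateRequestProcessorChain name="add-timestamp">
      <processor class="solr.TimestampUpdateProcessorFactory">
        <str name="fieldName">timestamp</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

Consistency checks can then use something like
fq=timestamp:[* TO NOW-1HOUR].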


Re: Indexing word with plus sign

2017-05-23 Thread Walter Underwood
Years ago at Netflix, I had to deal with a DVD from a band named “+/-”. I gave 
up and translated that to “plusminus” at index and query time.

http://plusmin.us/ 

Luckily, “.hack//Sign” and other related dot-hack anime matched if I just 
deleted all the punctuation. And everyone searched for "[•REC]²” as “rec2”. The 
middot is supposed to be red. Movie studios are clueless about searchable 
strings.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Indexing word with plus sign

2017-05-23 Thread Fundera Developer
Thanks Walter!!

For the sake of curiosity, do you remember which Tokenizer you were using in 
that case?

Thanks!


El 23/05/17 a las 20:02, Walter Underwood escribió:

Years ago at Netflix, I had to deal with a DVD from a band named “+/-“. I gave 
up and translated that to “plusminus” at index and query time.

http://plusmin.us/ 

Luckily, “.hack//Sign” and other related dot-hack anime matched if I just 
deleted all the punctuation. And everyone searched for "[•REC]²” as “rec2”. The 
middot is supposed to be red. Movie studios are clueless about searchable 
strings.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




On May 23, 2017, at 10:41 AM, Erick Erickson 
 wrote:

You need to distinguish between

PatternReplaceCharFilterFactory

and

PatternReplaceFilterFactory

The first one is applied to the entire input _before_ tokenization.
The second is applied _after_ tokenization to individual tokens; by
that time it's too late.

It's an easy thing to miss.

And at query time you'll have to be careful to keep the + sign from
being interpreted as an operator.
Best,
Erick
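
A minimal sketch of the charFilter variant Erick describes, using the pattern
and replacement from earlier in the thread (the regex escaping is an
assumption):

    <analyzer>
      <!-- runs on the raw character stream, before the tokenizer sees it -->
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="i\+d" replacement="investigación y desarrollo"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>

Because the replacement happens before tokenization, StandardTokenizer never
sees the '+' and the replacement words are tokenized normally.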

Re: Indexing word with plus sign

2017-05-23 Thread Walter Underwood
That was on Solr 1.3, so I’m pretty sure it was the whitespace tokenizer.

The synonym substitution for “+/-" was done in client code and indexing code, 
outside of Solr. We also sanitized queries to remove all query syntax 
characters. 

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
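
As an illustration of that kind of query sanitizing (a sketch only -- the
exact character set stripped at Netflix is an assumption), in Java:

    class QuerySanitizer {
        // Strip Lucene/Solr query syntax from raw user input: the boolean
        // operators first, then the reserved single characters
        // + - ! ( ) { } [ ] ^ " ~ * ? : \ /
        static String sanitize(String q) {
            return q.replaceAll("&&|\\|\\|", " ")
                    .replaceAll("[+\\-!(){}\\[\\]^\"~*?:\\\\/]", " ")
                    .trim();
        }
    }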



JSON Facet API : numBuckets and count clarification

2017-05-23 Thread Varun Thacker
Here is my current understanding of how these counts work:

numBuckets : Is supposed to tell the user how many unique buckets were seen
in the facet calculation. Given that we currently don't do refinements, this
number can be equal to or less than the actual number of unique buckets.

count : The total number of values that were taken into consideration for
the facet calculation.
For single-valued fields this would be equal to the number of matching
documents.

And these would behave the same way for the sub-facets as well.
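
For instance (a sketch; the field name here is made up), a terms facet that
asks for numBuckets looks like:

    json.facet={
      categories : {
        type  : terms,
        field : cat_s,
        numBuckets : true
      }
    }

The response then carries a "numBuckets" value next to the bucket list for
that facet.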


Re: Disable All kind of caching in Solr/Lucene

2017-05-23 Thread Pushkar Raste
What version are you on? There was a bug where, if you set a cache size of 0,
Solr would still create a cache of size 2 (or maybe just 1). It was fixed
under https://issues.apache.org/jira/browse/SOLR-9886?filter=-2



On Apr 3, 2017 9:26 AM, "Nilesh Kamani"  wrote:

> @Yonik even though the code change is in the SolrIndexer class, it has nothing
> to do with the index itself.
> After fetching docIds, I am filtering them on one more criterion. (It is very
> weird code.)
>
> I tried q={!cache=false}, but it is not working. Subsequent searches complete
> in under 2 milliseconds.
>
> Does anybody have more insight on this?
>
> On Fri, Mar 31, 2017 at 2:17 PM, Yonik Seeley  wrote:
>
> > On Fri, Mar 31, 2017 at 1:53 PM, Nilesh Kamani 
> > wrote:
> > > @Alexandre - Could you please point me to reference doc to remove
> default
> > > cache settings ?
> > >
> > > @Yonik - The code change is in Solr Indexer to sort the results.
> >
> > OK, so to test indexing performance, there are no caches to worry
> > about (as long as you have autowarmCount=0 on all caches, as is the
> > case with the Solr example configs).
> >
> > To test sorted query performance (I assume you're sorting the index to
> > accelerate certain sorted queries), if you can't make the queries
> > unique, then add
> > {!cache=false} to the query
> > example: q={!cache=false}*:*
> > You could also add a random term on a non-existent field to change the
> > query and prevent unwanted caching...
> > example: q=*:* does_not_exist_s:149475394
> >
> > -Yonik
> >
>
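
For reference, a sketch of explicitly zero-sized caches in solrconfig.xml (the
stock cache names; the class attributes are the usual defaults, not taken from
this thread):

    <query>
      <filterCache      class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
      <queryResultCache class="solr.LRUCache"     size="0" initialSize="0" autowarmCount="0"/>
      <documentCache    class="solr.LRUCache"     size="0" initialSize="0" autowarmCount="0"/>
    </query>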


Re: Rule-based Replica Placement not working with Solr 6.5.1

2017-05-23 Thread Damien Kamerman
I'm not sure I fully understand what you're trying to do but this is what I
do to ensure replicas are not on the same rack:

rule=shard:*,replica:<2,sysprop.rack:*
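
Spelled out against the CREATE call quoted further down in this thread (the
collection and config names are the original poster's), that would be
something like:

http://localhost:8983/solr/admin/collections?action=CREATE&name=boss&collection.configName=boss_configs&numShards=3&replicationFactor=2&maxShardsPerNode=1&rule=shard:*,replica:<2,sysprop.rack:*

with each node started with its own -Drack=... system property, and the rule
value URL-encoded if your HTTP client requires it.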

On 23 May 2017 at 22:37, Bernd Fehling 
wrote:

> Yes, I tried that already.
> Sure, it assigns 2 nodes with port 8983 to shard1 (e.g.
> server1:8983,server2:8983).
> But because the replica rule defaults to a wildcard, I also get
> shard3 --> server2:8983,server2:7574
> shard2 --> server1:7574,server3:8983
>
> The result is 3 replicas on server2 and also 2 replicas on one node of
> server2
> but _no_ replica on node server3:7574.
>
> I also tried to really nail it down with the rule:
> rule=shard:shard1,replica:<2,sysprop.rack:1&
> rule=shard:shard2,replica:<2,sysprop.rack:2&
> rule=shard:shard3,replica:<2,sysprop.rack:3
>
> The nodes were started with the correct -Drack=x property, but no luck.
>
> From debugging I can see that the code is written in an "over-complicated" way,
> probably to catch all possibilities (core, node, port, ip_x, ...), but it falls
> short of really trying all permutations and obeying the rules.
>
> I will open a ticket for this.
>
> Regards
> Bernd
>
> Am 23.05.2017 um 14:09 schrieb Noble Paul:
> > did you try the rule
> > shard:shard1,port:8983
> >
> > this ensures that all replicas of shard1 are allocated on the node w/
> port 8983.
> >
> > if it doesn't, it's a bug. Please open a ticket
> >>> Am 23.05.2017 um 01:17 schrieb Damien Kamerman:
>  If you want all the replicas for shard1 on the same port then I think
> the
>  rule is: 'shard:shard1,replica:port:8983'
> 
>  On 22 May 2017 at 18:47, Bernd Fehling  de>
>  wrote:
> 
> > I tried many settings with "Rule-based Replica Placement" on Solr
> 6.5.1
> > and came to the conclusion that it is not working at all.
> >
> > My test setup is 6 nodes on 3 servers (port 8983 and 7574 on each
> server).
> >
> > The call to create a new collection is
> > "http://localhost:8983/solr/admin/collections?action=
> CREATE&name=boss&
> > collection.configName=boss_configs&numShards=3&replicationFactor=2&
> > maxShardsPerNode=1&rule=shard:shard1,replica:<2,port:8983"
> >
> > With "rule=shard:shard1,replica:<2,port:8983" I expect that shard1
> has
> > only nodes with port 8983 _OR_ it should fail due to "strict mode"
> because
> > the fuzzy operator "~" is not set.
> >
> > The result of the call is:
> > shard1 --> server2:7574 / server1:8983
> > shard2 --> server1:7574 / server3:8983
> > shard3 --> server2:8983 / server3:7574
> >
> > The expected result should be (at least!!!) shard1 --> server_x:8983
> /
> > server_y:8983
> > where "_x" and "_y" can be anything between 1 and 3 but must be
> different.
> >
> > I think the problem is somewhere in "class ReplicaAssigner" with
> > "tryAllPermutations"
> > and "tryAPermutationOfRules".
> >
> > Regards
> > Bernd
> >
> 
>


solr 6 at scale

2017-05-23 Thread Nawab Zada Asad Iqbal
Hi all,

I am planning to upgrade my solr.4.x installation to a recent stable
version. Should I get the latest 6.5.1 bits or will a little older release
be better in terms of stability?
I am curious if there is a way to see solr.6.x adoption in large companies. I
have talked to a few people and they are also stuck on older major versions.

Anyone using solr.6.x for multi-terabytes index size: how did you decide
which version to upgrade to?


Regards
Nawab


Re: solr 6 at scale

2017-05-23 Thread Walter Underwood
We are running 6.5.1 in a 16 node cluster, four shards and four replicas. It is 
performing brilliantly.

Our index is 18 million documents, but we have very heavy queries. Students are 
searching for homework help, so they paste in the entire problem. We truncate 
queries at 40 terms to limit the load, but we have a LOT of long queries. Our 
average query time is nicely under 500 milliseconds.

I strongly recommend that you benchmark your data with your prod queries. 
JMeter can replay access logs.

Versions after 6.4.0 and before 6.5.1 have performance issues because of the 
metrics reporting. Use 6.5.1.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Disable All kind of caching in Solr/Lucene

2017-05-23 Thread Nilesh Kamani
Thanks Pushkar. I will upgrade to the latest Solr version and check whether it
works now.




Re: Solr in NAS or Network Shared Drive

2017-05-23 Thread Shawn Heisey
On 5/19/2017 8:33 AM, Ravi Kumar Taminidi wrote:
> Hello,  Scenario: Currently we have 2 Solr Servers running in 2 different 
> servers (linux), Is there any way can we make the Core to be located in NAS 
> or Network shared Drive so both the solrs using the same Index.
>
> Let me know if any performance issues, our size of Index is appx 1GB.

I think it's a very bad idea to try to share indexes between multiple
Solr instances.  You can override the locking and get it to work, and
you may be able to find advice on the Internet about how to do it.  I
can tell you that it's outside the design intent for both Lucene and
Solr.  Lucene works aggressively to *prevent* multiple processes from
sharing an index.

In general, network storage is not a good idea for Solr.  There's added
latency for accessing any data, and frequently the filesystem won't
support the kind of locking that Lucene wants to use, but the biggest
potential problem is disk caching.  Solr/Lucene is absolutely reliant on
disk caching in the Solr server's local memory for good performance.  If
the network filesystem cannot be cached by the client that has mounted
the storage, which I believe is the case for most network filesystem
types, then you're reliant on disk caching in the network server(s). 
For VERY large indexes, which is really the only viable use case I can
imagine for network storage, it is highly unlikely that the network
server(s) will have enough memory to effectively cache the data.

Solr has explicit support for HDFS storage, but as I understand it, HDFS
includes the ability for a client to allocate memory that gets used
exclusively for caching on the client side, which allows HDFS to
function like a local filesystem in ways that I don't think NFS can. 
Getting back to my advice about not sharing indexes -- even with
SolrCloud on HDFS, multiple replicas generally do NOT share an index.

A 1GB index is very small, so there's no good reason I can think of to
involve network storage.  I would strongly recommend local storage, and
you should abandon any attempt to share the same index data between more
than one Solr instance.

Thanks,
Shawn



Re: solr 6 at scale

2017-05-23 Thread Erick Erickson
I'll quibble a little with Walter and say that 6.4.2 fixes the perf
problem in 6.4.0 and 6.4.1. Which doesn't change his recommendation at
all, I'd go with 6.5.1.

Best,
Erick



How to handle nested documents in solr (SolrJ)

2017-05-23 Thread prasad chowdary
Dear All,

I have a requirement that I need to index the documents in Solr using Java
code.

Each document contains a sub-document, like below (it's just for
understanding my question).


student id : 123
student name : john
marks : 
   maths: 90
   English :95

student id : 124
student name : rack
marks : 
   maths: 80
   English :96

etc...

So, as shown above, each document contains one child document, i.e. marks.

Actually I don't need any joins or anything. My requirement is:

If I query "English:95", it should return the complete document, i.e. the child
along with the parent, like below:

student id : 123
student name : john
marks : 
   maths: 90
   English :95

And also, if I query "student id : 123", it should return the whole document,
same as above.

Currently I am able to get the child along with the parent for a child match by
using the extendedResults option.

But I am not able to get the child for a parent match.

Please help me out in this regard. Thanks in advance.
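
One common way to get both behaviors is block-join indexing plus the [child]
doc transformer. A sketch in SolrJ (the field names, the type_s marker field,
and the collection URL are illustrative assumptions, not from the original
post):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class NestedDocs {
        public static void main(String[] args) throws Exception {
            SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/students").build();

            // Parent and child are indexed together as one block.
            SolrInputDocument parent = new SolrInputDocument();
            parent.addField("id", "123");
            parent.addField("name_s", "john");
            parent.addField("type_s", "student");   // marks this doc as a parent

            SolrInputDocument marks = new SolrInputDocument();
            marks.addField("id", "123-marks");
            marks.addField("maths_i", 90);
            marks.addField("english_i", 95);
            parent.addChildDocument(marks);

            client.add(parent);
            client.commit();

            // Child match -> whole document: block-join parent query, with
            // [child] re-attaching the children to each returned parent.
            SolrQuery byChild = new SolrQuery("{!parent which=type_s:student}english_i:95");
            byChild.setFields("*", "[child parentFilter=type_s:student]");
            System.out.println(client.query(byChild).getResults());

            // Parent match -> whole document: query the parent directly and
            // expand its children the same way.
            SolrQuery byParent = new SolrQuery("id:123");
            byParent.setFields("*", "[child parentFilter=type_s:student]");
            System.out.println(client.query(byParent).getResults());

            client.close();
        }
    }

Note that querying english_i:95 without the {!parent ...} wrapper would return
the bare child document instead of the whole student record.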







