Re: SOLR as nosql database store

2017-05-10 Thread Bharath Kumar
Thanks Shawn and Rick for your suggestions. We will surely look at these
options.

On Tue, May 9, 2017 at 4:39 AM, Shawn Heisey  wrote:

> On 5/9/2017 12:58 AM, Bharath Kumar wrote:
> > Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas,
> will that not serve as backup when something goes wrong? Also we use latest
> solr 6 and from the documentation of solr, the indexing performance has
> been good. The reason is that we are using MySQL as the primary data store
> and the performance might not be optimal if we write data at a very rapid
> rate. Already we index almost half the fields that are in MySQL in solr.
>
> A replica is protection against data loss in the event of hardware
> failure, but there are classes of problems that it cannot protect against.
>
> Although Solr (Lucene) does try *really* hard to never lose data that it
> hasn't been asked to delete, it is not designed to be a database.  It's
> a search engine.  Solr doesn't offer the same kinds of guarantees about
> the data it contains that software like MySQL does.
>
> I personally don't recommend trying to use Solr as a primary data store,
> but if that's what you really want to do, then I would suggest that you
> have two complete Solr installs, with multiple replicas on both.  One of
> them will be used for searching and have a configuration you're already
> familiar with, the other will be purely for data storage -- only certain
> fields like the uniqueKey will be indexed, but every other field will be
> stored only.
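A minimal schema.xml sketch of such a storage-only setup might look like this
(field names other than the uniqueKey are just illustrative assumptions):

    <!-- only the uniqueKey is searchable; everything else is retrievable only -->
    <field name="id"    type="string" indexed="true"  stored="true" required="true"/>
    <field name="title" type="string" indexed="false" stored="true"/>
    <field name="body"  type="string" indexed="false" stored="true"/>
    <uniqueKey>id</uniqueKey>

With nothing but the uniqueKey indexed, the storage install does very little
analysis work at index time and serves purely as the rebuild source described
in the next paragraph.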
>
> Running with two separate Solr installs will allow you to optimize one
> for searching and the other for data storage.  The searching install
> will be able to rebuild itself from the data storage install when that
> is required.  If better performance is needed for the rebuild, you have
> the option of writing a multi-threaded or multi-process program that
> reads from one and writes to the other.
>
> Thanks,
> Shawn
>
>


-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: SOLR as nosql database store

2017-05-10 Thread Mike Drob
> The searching install will be able to rebuild itself from the data
> storage install when that is required.

Is this a use case for CDCR?

Mike

On Tue, May 9, 2017 at 6:39 AM, Shawn Heisey  wrote:

> On 5/9/2017 12:58 AM, Bharath Kumar wrote:
> > Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas,
> will that not serve as backup when something goes wrong? Also we use latest
> solr 6 and from the documentation of solr, the indexing performance has
> been good. The reason is that we are using MySQL as the primary data store
> and the performance might not be optimal if we write data at a very rapid
> rate. Already we index almost half the fields that are in MySQL in solr.
>
> A replica is protection against data loss in the event of hardware
> failure, but there are classes of problems that it cannot protect against.
>
> Although Solr (Lucene) does try *really* hard to never lose data that it
> hasn't been asked to delete, it is not designed to be a database.  It's
> a search engine.  Solr doesn't offer the same kinds of guarantees about
> the data it contains that software like MySQL does.
>
> I personally don't recommend trying to use Solr as a primary data store,
> but if that's what you really want to do, then I would suggest that you
> have two complete Solr installs, with multiple replicas on both.  One of
> them will be used for searching and have a configuration you're already
> familiar with, the other will be purely for data storage -- only certain
> fields like the uniqueKey will be indexed, but every other field will be
> stored only.
>
> Running with two separate Solr installs will allow you to optimize one
> for searching and the other for data storage.  The searching install
> will be able to rebuild itself from the data storage install when that
> is required.  If better performance is needed for the rebuild, you have
> the option of writing a multi-threaded or multi-process program that
> reads from one and writes to the other.
>
> Thanks,
> Shawn
>
>


Re: distribution of leader and replica in SolrCloud

2017-05-10 Thread Rick Leir

Bernd,

Yes, cloud, ahhh. As you say, the world changed.  Do you have any hint 
from the cloud provider as to which physical machine your virtual server 
is on? If so, you can hopefully distribute your replicas across physical 
machines. This is not just for reliability: in a sharded system, each 
query will cause activity in several virtual servers and you would 
prefer that they are on separate physical machines, not competing for 
resources. Maybe, for Solr, you should choose a provider which can lease 
you the whole physical machine. You would prefer a 256G machine over 
several shards on 64G virtual machines.


And many cloud providers assume that servers are mostly idle, so they 
cram too many server containers into a machine. Then, very occasionally, 
you get OOM even though you did not exceed your advertised RAM. This is 
a topic for some other forum, where should I look?


With AWS you can choose to locate your virtual machine in US-west-Oregon 
or US-east-i-forget or a few other locations, but that is a very coarse 
division. Can you choose physical machine?


With Google, it might be dynamic?
cheers -- Rick


On 2017-05-09 03:44 AM, Bernd Fehling wrote:

I would call your solution more of a workaround, like any similar solution of
this kind.
The issue SOLR-6027 has now been open for 3 years and the world has changed.
Instead of racks full of blades, where you had many dedicated bare-metal servers,
you now have huge machines with 256GB RAM and many CPUs. Virtualization has
taken place.
To get some independence from the physical hardware under these conditions, you
have to spread the shards across several physical machines with virtual servers.
From my point of view it is a good solution to have 5 virtual 64GB servers
on 5 different huge physical machines and start 2 instances on each virtual
server.
If I split each 64GB virtual server into two 32GB virtual servers there would
be no gain. We still would not have 10 huge machines (no added protection
against hardware failure) and we would have to admin and control 10 virtual
servers instead of 5 (plus the ZooKeeper servers).

It is state of the art that you don't have to care about the servers within
the cloud. This is the main point of a cloud.
The leader should always be aware of who the members of its cloud are, how to
reach them (IP address), and how the users of the cloud (collections) are
distributed across the cloud.

It would be great if a solution of issue SOLR-6027 would lead to some kind of
"automatic mode" for server distribution, without any special configuring.

Regards,
Bernd


Am 08.05.2017 um 17:47 schrieb Erick Erickson:

Also, you can specify custom placement rules, see:
https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement

But Shawn's statement is the nub of what you're seeing, by default
multiple JVMs on the same physical machine are considered separate
Solr instances.

Also note that if you want to, you can specify a nodeSet when you
create the nodes, and in particular the special value EMPTY. That'll
create a collection with no replicas and you can ADDREPLICA to
precisely place each one if you require that level of control.
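For illustration, those two approaches might look like this as Collections API
calls (the collection, shard and node names are made up):

    # rule form: keep fewer than 2 replicas of any shard on any one node
    /admin/collections?action=CREATE&name=mycoll&numShards=2&replicationFactor=2&rule=shard:*,replica:<2,node:*

    # nodeSet form: create with no replicas, then place each one explicitly
    /admin/collections?action=CREATE&name=mycoll&numShards=2&createNodeSet=EMPTY
    /admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=server1:8983_solr

Note that these rules talk only about nodes and replica counts; they have no
notion of which replica will end up as the leader.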

Best,
Erick

On Mon, May 8, 2017 at 7:44 AM, Shawn Heisey  wrote:

On 5/8/2017 5:38 AM, Bernd Fehling wrote:

boss -- shard1 - server2:7574
| |-- server2:8983 (leader)

The reason that this happened is because you've got two nodes running on
every server.  From SolrCloud's perspective, there are ten distinct
nodes, not five.

SolrCloud doesn't notice the fact that different nodes are running on
the same server(s).  If your reaction to hearing this is that it
*should* notice, you're probably right, but in a typical use case, each
server should only be running one Solr instance, so this would never happen.

There is only one case I can think of where I would recommend
running multiple instances per server, and that is when the required
heap size for a single instance would be VERY large.  Running two
instances with smaller heaps can yield better performance.

See this issue:

https://issues.apache.org/jira/browse/SOLR-6027

Thanks,
Shawn





Re: Solr Query Limits

2017-05-10 Thread Alexandre Rafalovitch
How many values are you trying to pass in? And in which format? And
what issues are you facing? There are too many variables here to give
generic advice.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 10 May 2017 at 02:33, Adnan Shaikh  wrote:
> Hello,
>
> Thanks Alexandre for the update.
>
> Please help me to understand the other part of the query as well , if there
> is any limit to how many values we can pass for a key.
>
> Thanks,
> Mohammad Adnan Shaikh
>
> On May 9, 2017, at 8:05 PM, Alexandre Rafalovitch 
> wrote:
>
> I am not aware of any limits in Solr itself. However, if you are using
> a GET request to do the query, you may be running into browser
> limitations regarding URL length.
>
> It may be useful to know that Solr can accept the query parameters in
> the POST body as well.
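As a small illustration, here is how that looks with SolrJ -- a sketch only,
with the collection URL and query string as placeholders (HttpSolrClient.Builder
assumes a 6.x SolrJ; older versions use the constructor instead):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PostQuery {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
          // imagine a query with thousands of values in the value list
          SolrQuery q = new SolrQuery("id:(1 OR 2 OR 3)");
          // METHOD.POST sends the parameters in the request body, not the URL
          QueryResponse rsp = client.query(q, SolrRequest.METHOD.POST);
          System.out.println("hits: " + rsp.getResults().getNumFound());
        }
      }
    }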
>
> Regards,
>   Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 9 May 2017 at 10:19, Adnan Shaikh  wrote:
>
> Hello Team,
>
> Have a query  pertaining to how many values are we able to pass in a Solr
> query.
>
> Can we please find out if:
>
> 1. There is a limit to the number of characters that we can pass in a
> Solr query field?
> 2. Is there a limit to how many values we can pass for the one key?
>
> Thanks,
> Mohammad Adnan Shaikh


Need help in understanding solr clustering component

2017-05-10 Thread yauza
I was looking (in the process of making my own) into Solr's default clustering
component for Carrot2. In the ClusteringComponent class there are 2 methods
where the clustering algorithms are called:

In the overridden process method:

    SolrDocumentList solrDocList = SolrPluginUtils.docListToSolrDocumentList(
        results.docList, rb.req.getSearcher(), engine.getFieldsToLoad(rb.req), docIds);
    Object clusters = engine.cluster(rb.getQuery(), solrDocList, docIds, rb.req);
    rb.rsp.add("clusters", clusters);

And once again in the finishStage method:

    Map docIds = null;
    Object clusters = engine.cluster(rb.getQuery(), solrDocList, docIds, rb.req);
    rb.rsp.add("clusters", clusters);

Now my question is: the process method works not on the complete query result
but on each shard's results, while finishStage runs once after all the results
have been aggregated. So why do we call the clustering algorithm twice and add
the output to the response both times? Am I missing something?
Won't it create too many labels if, in the worst case, none of the cluster
labels match?


P.S. Please correct me if I am wrong.





Re: distribution of leader and replica in SolrCloud

2017-05-10 Thread Bernd Fehling
Hi Rick,

yes, I have distributed 5 virtual servers across 5 physical machines.
So each virtual server is on a separate physical machine.

Splitting each virtual server (64GB RAM) into two (32GB RAM), which would then
give 10 virtual servers across 5 physical machines, is not an option
because there is no gain against hardware failure of a physical machine.

So I would rather go with two Solr instances per 64GB virtual server as a first try.

Currently I'm still trying to work out the Rule-based Replica Placement.
There seems to be no way to express whether a node is a "leader" or has the
role "leader".

Do you know how to create a rule like:
--> "do not create the replica on the same host where its leader exists"

Regards,
Bernd


Am 10.05.2017 um 10:54 schrieb Rick Leir:
> Bernd,
> 
> Yes, cloud, ahhh. As you say, the world changed.  Do you have any hint from 
> the cloud provider as to which physical machine your virtual server
> is on? If so, you can hopefully distribute your replicas across physical 
> machines. This is not just for reliability: in a sharded system, each
> query will cause activity in several virtual servers and you would prefer 
> that they are on separate physical machines, not competing for
> resources. Maybe, for Solr, you should choose a provider which can lease you 
> the whole physical machine. You would prefer a 256G machine over
> several shards on 64G virtual machines.
> 
> And many cloud providers assume that servers are mostly idle, so they cram 
> too many server containers into a machine. Then, very occasionally,
> you get OOM even though you did not exceed your advertised RAM. This is a 
> topic for some other forum, where should I look?
> 
> With AWS you can choose to locate your virtual machine in US-west-Oregon or 
> US-east-i-forget or a few other locations, but that is a very coarse
> division. Can you choose physical machine?
> 
> With Google, it might be dynamic?
> cheers -- Rick
> 
> 
> On 2017-05-09 03:44 AM, Bernd Fehling wrote:
>> I would name your solution more a work around as any similar solution of 
>> this kind.
>> The issue SOLR-6027 is now 3 years open and the world has changed.
>> Instead of racks full of blades where you had many dedicated bare metal 
>> servers
>> you have now huge machines with 256GB RAM and many CPUs. Virtualization has 
>> taken place.
>> To get under these conditions some independance from the physical hardware 
>> you have
>> to spread the shards across several physical machines with virtual servers.
>> From my point of view it is a good solution to have 5 virtual 64GB servers
>> on 5 different huge physical machines and start 2 instances on each virtual 
>> server.
>> If I would split up each 64GB virtual server into two 32GB virtual server 
>> there would
>> be no gain. We don't have 10 huge machines (no security win) and we have to 
>> admin
>> and control 10 virtual servers instead of 5 (plus zookeeper servers).
>>
>> It is state of the art that you don't have to care about the servers within
>> the cloud. This is the main sense of a cloud.
>> The leader should always be aware who are the members of his cloud, how to 
>> reach
>> them (IP address) and how are the users of the cloud (collections) 
>> distributed
>> across the cloud.
>>
>> It would be great if a solution of issue SOLR-6027 would lead to some kind of
>> "automatic mode" for server distribution, without any special configuring.
>>
>> Regards,
>> Bernd
>>
>>
>> Am 08.05.2017 um 17:47 schrieb Erick Erickson:
>>> Also, you can specify custom placement rules, see:
>>> https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
>>>
>>> But Shawn's statement is the nub of what you're seeing, by default
>>> multiple JVMs on the same physical machine are considered separate
>>> Solr instances.
>>>
>>> Also note that if you want to, you can specify a nodeSet when you
>>> create the nodes, and in particular the special value EMPTY. That'll
>>> create a collection with no replicas and you can ADDREPLICA to
>>> precisely place each one if you require that level of control.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, May 8, 2017 at 7:44 AM, Shawn Heisey  wrote:
 On 5/8/2017 5:38 AM, Bernd Fehling wrote:
> boss -- shard1 - server2:7574
> | |-- server2:8983 (leader)
 The reason that this happened is because you've got two nodes running on
 every server.  From SolrCloud's perspective, there are ten distinct
 nodes, not five.

 SolrCloud doesn't notice the fact that different nodes are running on
 the same server(s).  If your reaction to hearing this is that it
 *should* notice, you're probably right, but in a typical use case, each
 server should only be running one Solr instance, so this would never 
 happen.

 There is only one instance where I can think of where I would recommend
 running multiple instances per server, and that is when the required
 heap size for a single instance would be V

RE: 6.5.1. cloud went partially down

2017-05-10 Thread Markus Jelsma
I am not sure this is directly related, but we also sometimes see clients losing
connections on 6.5.1. This, together with the problem described below, is unique
to 6.5.1; I have not seen this many cloud issues in such a short time for a very
long time.

2017-05-09 21:30:36.661 ERROR (Document compiler) [c:logs s:shard1 r:core_node1 
x:logs_shard1_replica1] o.a.s.c.s.i.CloudSolrClient Request to collection 
search failed due to (0) java.lang.IllegalStateException: Connection pool shut 
down, retry? 0

Clients appear unable to recover from this problem. The cloud the clients are 
connecting to is up and doing fine.

Any ideas?

Thanks,
Markus

 
 
-Original message-
> From:Markus Jelsma 
> Sent: Monday 8th May 2017 11:35
> To: solr-user 
> Subject: 6.5.1. cloud went partially down
> 
> Hi,
> 
> Multiple 6.5.1 clouds / collections went down this weekend around the same
> time; they share the same ZK quorum. The nodes stayed up but did not rejoin
> the cluster (could not find or connect to ZK).
> 
> This is what the log told us:
> 
> 2017-05-06 18:58:34.893 WARN  
> (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.ConnectionManager Watcher 
> org.apache.solr.common.cloud.ConnectionManager@4f97bdad name: ZooKe
> eperConnection 
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
>  got event WatchedEvent state:Disconnected type:None path:null path: null 
> type: None
> 2017-05-06 18:58:34.893 WARN  
> (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.ConnectionManager zkClient has disconnected
> 2017-05-06 18:58:35.001 WARN  
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr 
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) 
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3] 
> o.a.s.c.c.ConnectionManager Watcher 
> org.apache.solr.common.cloud.ConnectionManager@c226cc name: 
> ZooKeeperConnection 
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
>  got event WatchedEvent state:Disconnected type:None path:null path: null 
> type: None
> 2017-05-06 18:58:35.010 WARN  
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr 
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) 
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3] 
> o.a.s.c.c.ConnectionManager zkClient has disconnected
> 2017-05-06 18:58:45.360 WARN  
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.ConnectionManager Watcher 
> org.apache.solr.common.cloud.ConnectionManager@4f97bdad name: 
> ZooKeeperConnection 
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
>  got event WatchedEvent state:Expired type:None path:null path: null type: 
> None
> 2017-05-06 18:58:45.360 WARN  
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired. 
> Attempting to reconnect to recover relationship with ZooKeeper...
> 2017-05-06 18:58:45.380 WARN  
> (OverseerStateUpdate-97740792370385619-idx6.example.org:8983_solr-n_000558)
>  [   ] o.a.s.c.Overseer Solr cannot talk to ZK, exiting Overseer main queue 
> loop
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /overseer/queue
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
> at 
> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:339)
> at 
> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:336)
> at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
> at 
> org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:336)
> at 
> org.apache.solr.cloud.DistributedQueue.fetchZkChildren(DistributedQueue.java:308)
> at 
> org.apache.solr.cloud.DistributedQueue.firstChild(DistributedQueue.java:285)
> at 
> org.apache.solr.cloud.DistributedQueue.firstElement(DistributedQueue.java:393)
> at 
> org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:159)
> at 
> org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:137)
> at 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:180)
> at java.lang.Thread.run(Thread.java:745)
> 2017-05-06 18:58:45.381 WARN  
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
> 2017-05-06 18:58:45.382 ERROR (OverseerExitThread) [   ] o.a.s.c.Overseer 
> could not read the data
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = 

Re: distribution of leader and replica in SolrCloud

2017-05-10 Thread Rick Leir
Myself, I am still in the old camp. For critical machines, I want to know that 
it is my machine, with my disks, and what software is installed exactly. But 
maybe the cloud provider's fast network is more important? Cheers--Rick

On May 10, 2017 6:13:27 AM EDT, Bernd Fehling  
wrote:
>Hi Rick,
>
>yes I have distributed 5 virtual server accross 5 physical machines.
>So each virtual server is on a separate physical machine.
>
>Splitting each virtual server (64GB RAM) into two (32GB RAM), which
>then
>will be 10 virtual server accross 5 physical machines, is no option
>because there is no gain against hardware failure of a physical
>machine.
>
>So I rather go with two Solr instances per 64GB virtual server as first
>try.
>
>Currently I'm still trying to solve the Rule-based Replica Placement.
>There seams to be no way to report if a node is a "leader" or has the
>"role="leader".
>
>Do you know how to create a rule like:
>--> "do not create the replica on the same host where his leader
>exists"
>
>Regards,
>Bernd
>
>
>Am 10.05.2017 um 10:54 schrieb Rick Leir:
>> Bernd,
>> 
>> Yes, cloud, ahhh. As you say, the world changed.  Do you have any
>hint from the cloud provider as to which physical machine your virtual
>server
>> is on? If so, you can hopefully distribute your replicas across
>physical machines. This is not just for reliability: in a sharded
>system, each
>> query will cause activity in several virtual servers and you would
>prefer that they are on separate physical machines, not competing for
>> resources. Maybe, for Solr, you should choose a provider which can
>lease you the whole physical machine. You would prefer a 256G machine
>over
>> several shards on 64G virtual machines.
>> 
>> And many cloud providers assume that servers are mostly idle, so they
>cram too many server containers into a machine. Then, very
>occasionally,
>> you get OOM even though you did not exceed your advertised RAM. This
>is a topic for some other forum, where should I look?
>> 
>> With AWS you can choose to locate your virtual machine in
>US-west-Oregon or US-east-i-forget or a few other locations, but that
>is a very coarse
>> division. Can you choose physical machine?
>> 
>> With Google, it might be dynamic?
>> cheers -- Rick
>> 
>> 
>> On 2017-05-09 03:44 AM, Bernd Fehling wrote:
>>> I would name your solution more a work around as any similar
>solution of this kind.
>>> The issue SOLR-6027 is now 3 years open and the world has changed.
>>> Instead of racks full of blades where you had many dedicated bare
>metal servers
>>> you have now huge machines with 256GB RAM and many CPUs.
>Virtualization has taken place.
>>> To get under these conditions some independance from the physical
>hardware you have
>>> to spread the shards across several physical machines with virtual
>servers.
>>> From my point of view it is a good solution to have 5 virtual 64GB
>servers
>>> on 5 different huge physical machines and start 2 instances on each
>virtual server.
>>> If I would split up each 64GB virtual server into two 32GB virtual
>server there would
>>> be no gain. We don't have 10 huge machines (no security win) and we
>have to admin
>>> and control 10 virtual servers instead of 5 (plus zookeeper
>servers).
>>>
>>> It is state of the art that you don't have to care about the servers
>within
>>> the cloud. This is the main sense of a cloud.
>>> The leader should always be aware who are the members of his cloud,
>how to reach
>>> them (IP address) and how are the users of the cloud (collections)
>distributed
>>> across the cloud.
>>>
>>> It would be great if a solution of issue SOLR-6027 would lead to
>some kind of
>>> "automatic mode" for server distribution, without any special
>configuring.
>>>
>>> Regards,
>>> Bernd
>>>
>>>
>>> Am 08.05.2017 um 17:47 schrieb Erick Erickson:
 Also, you can specify custom placement rules, see:

>https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement

 But Shawn's statement is the nub of what you're seeing, by default
 multiple JVMs on the same physical machine are considered separate
 Solr instances.

 Also note that if you want to, you can specify a nodeSet when you
 create the nodes, and in particular the special value EMPTY.
>That'll
 create a collection with no replicas and you can ADDREPLICA to
 precisely place each one if you require that level of control.

 Best,
 Erick

 On Mon, May 8, 2017 at 7:44 AM, Shawn Heisey 
>wrote:
> On 5/8/2017 5:38 AM, Bernd Fehling wrote:
>> boss -- shard1 - server2:7574
>> | |-- server2:8983 (leader)
> The reason that this happened is because you've got two nodes
>running on
> every server.  From SolrCloud's perspective, there are ten
>distinct
> nodes, not five.
>
> SolrCloud doesn't notice the fact that different nodes are running
>on
> the same server(s).  If your reaction to hearing this is that it
> *sho

Re: SOLR as nosql database store

2017-05-10 Thread Shawn Heisey
On 5/10/2017 2:15 AM, Mike Drob wrote:
>> The searching install will be able to rebuild itself from the data
>> storage install when that is required.
>
> Is this a use case for CDCR?

Does CDCR require an identical schema between locations?  If not, then I
think CDCR can keep a searching install up to date by copying
transaction logs, but I don't think it would be able to do the initial
population.

I'm pretty sure that index creation would have to be done from scratch
by indexing.  The source could be the storage install, but you'd have to
use DIH or a custom program to take care of it.
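For what it's worth, a bare-bones single-threaded sketch of such a custom
program with SolrJ and cursorMark paging (host names, core names, the uniqueKey
field and the batch size are all assumptions; the multi-threaded or
multi-process version mentioned earlier would be layered on top of this):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class RebuildSearchIndex {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient storage =
                 new HttpSolrClient.Builder("http://storagehost:8983/solr/storage").build();
             HttpSolrClient search =
                 new HttpSolrClient.Builder("http://searchhost:8983/solr/search").build()) {

          SolrQuery q = new SolrQuery("*:*");
          q.setRows(1000);
          q.setSort(SolrQuery.SortClause.asc("id"));  // cursorMark needs a sort on the uniqueKey

          String cursor = CursorMarkParams.CURSOR_MARK_START;
          while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = storage.query(q);

            List<SolrInputDocument> batch = new ArrayList<>();
            for (SolrDocument d : rsp.getResults()) {
              SolrInputDocument doc = new SolrInputDocument();
              for (String f : d.getFieldNames()) {
                if (!"_version_".equals(f)) {         // skip the internal version field
                  doc.addField(f, d.getFieldValue(f));
                }
              }
              batch.add(doc);
            }
            if (!batch.isEmpty()) {
              search.add(batch);
            }

            String next = rsp.getNextCursorMark();
            if (next.equals(cursor)) {
              break;                                  // no more results
            }
            cursor = next;
          }
          search.commit();
        }
      }
    }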

Thanks,
Shawn



Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Karl-Philipp Richter
Hi,
Do developers and power users (which are famous on the mailing list(s))
support the `solr` tag on stackoverflow.com? There's no definite answer,
I know, but someone might do an educated guess.

-Kalle





Re: Solr Query Limits

2017-05-10 Thread Shawn Heisey
On 5/10/2017 12:33 AM, Adnan Shaikh wrote:
> Thanks Alexandre for the update.
>
> Please help me to understand the other part of the query as well , if there 
> is any limit to how many values we can pass for a key.

The limit is not the number of values, but the size of the request in bytes.

A typical GET request line in HTTP looks like this:

GET /foo/bar?param1=foo&param2=bar HTTP/1.1

The size of this request is typically limited by webservers to 8192
bytes.  The Jetty that powers Solr has this as the default limit.  This
limit can be increased, but you probably don't want to go beyond about 32K.

Increasing the HTTP header size limit isn't the way to get REALLY large
requests through.  For that, you want a POST request, where the
parameters will be in the request body instead of on the request line
itself.

The default limit in Solr for a POST body is 2 megabytes -- 2097152
bytes.  This can be increased with configuration in solrconfig.xml.
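For reference, the two limits live in Jetty's config and in solrconfig.xml
respectively. A sketch with the usual default values (treat the exact file
locations and property names as assumptions for your particular version):

    <!-- server/etc/jetty.xml: maximum size of the request line plus headers -->
    <Set name="requestHeaderSize">
      <Property name="solr.jetty.request.header.size" default="8192" />
    </Set>

    <!-- solrconfig.xml, inside <requestDispatcher>: POST body limit in KB -->
    <requestParsers multipartUploadLimitInKB="2048000"
                    formdataUploadLimitInKB="2048" />

The 8192 and 2048 values match the defaults mentioned above (8K for the request
line and headers, 2 MB for a form-data POST body).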

Thanks,
Shawn



Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Shawn Heisey
On 5/10/2017 6:31 AM, Karl-Philipp Richter wrote:
> Do developers and power users (which are famous on the mailing
> list(s)) support the `solr` tag on stackoverflow.com? There's no
> definite answer, I know, but someone might do an educated guess. 

I don't seek out questions on SO, but if one happens to come my way that
I can answer, there's a good chance I will post.  Most of the time I see
SO posts via some other medium, though -- like this list or the #solr
IRC channel.  That kind of exposure makes it a little bit less likely
that I will respond on SO.

Thanks,
Shawn



How to Speed Up Solr ResponseWriter

2017-05-10 Thread Prashobh Chandran
Hi,

   Currently we are using the Solr 5.3.1 engine, and I'm getting results in JSON
format from the engine. But it's taking a long time to get the results, so I need
to speed up the Solr response writer. Is there any way?

Please reply asap...



Regards,
Prasobh


Solrcloud collection restore puts 2 replicas on the same node

2017-05-10 Thread Webster Homer
I am running Solr 6.2 on a 4 node cluster

Each collection has 2 shards and a replication factor of 2

Normally when I create a collection I see a replica on each node, which is
what I would expect.

However when I restore a backup to a new collection I see that one node has
two replicas on it. They are  from different shards, but one of my nodes
doesn't get a replica. Is there a way to force the restore to use different
nodes if it can?

I saw a similar issue on a two node test cloud where the restore created
the two replicas on the same node.

-- 




Re: SOLR as nosql database store

2017-05-10 Thread Walter Underwood
CDCR doesn’t rebuild it so much as copy it.

To change the schema, you’ll need to reindex.

I’ve worked on two NoSQL databases (Objectivity and MarkLogic) and I’ve worked 
on Solr. They are utterly different designs, intended to do different things.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 10, 2017, at 5:24 AM, Shawn Heisey  wrote:
> 
> On 5/10/2017 2:15 AM, Mike Drob wrote:
>>> The searching install will be able to rebuild itself from the data
>>> storage install when that is required.
>> 
>> Is this a use case for CDCR?
> 
> Does CDCR require an identical schema between locations?  If not, then I
> think CDCR can keep a searching install up to date by copying
> transaction logs, but I don't think it would be able to do the initial
> population.
> 
> I'm pretty sure that index creation would have to be done from scratch
> by indexing.  The source could be the storage install, but you'd have to
> use DIH or a custom program to take care of it.
> 
> Thanks,
> Shawn
> 



Re: Solrcloud collection restore puts 2 replicas on the same node

2017-05-10 Thread Erick Erickson
Possibly https://issues.apache.org/jira/browse/SOLR-9527?

On Wed, May 10, 2017 at 7:34 AM, Webster Homer  wrote:
> I am running Solr 6.2 on a 4 node cluster
>
> Each collection has 2 shards and a replication factor of 2
>
> Normally when I create a collection I see a replica on each node, which is
> what I would expect.
>
> However when I restore a backup to a new collection I see that one node has
> two replicas on it. They are  from different shards, but one of my nodes
> doesn't get a replica. Is there a way to force the restore to use different
> nodes if it can?
>
> I saw a similar issue on a two node test cloud where the restore created
> the two replicas on the same node.
>
> --
>
>


Re: How to Speed Up Solr ResponseWriter

2017-05-10 Thread Erick Erickson
You need to describe your problem more fully. The response writer is
rarely a bottleneck, so I'm guessing there are things you aren't
telling us. Are you returning thousands of rows? Are the documents
huge? Details matter.

Best,
Erick

On Wed, May 10, 2017 at 5:34 AM, Prashobh Chandran  wrote:
> Hi,
>
>Currently we are using solr 5.3.1 engine, Im getting json format results
> from engine. But it's taking time to getting results, So i need to speed up
> solr response writer. Is there anyway?
>
> Please reply asap...
>
>
>
> Regards,
> Prasobh


Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Erick Erickson
Personally I have all I can do to keep up with this list and the dev
list and, you know, do my day job ;)

I've seen quite a few references to SO for Solr questions, and the
times I've perused them I've been impressed by the answers. I just
don't have the time.

On Wed, May 10, 2017 at 6:01 AM, Shawn Heisey  wrote:
> On 5/10/2017 6:31 AM, Karl-Philipp Richter wrote:
>> Do developers and power users (which are famous on the mailing
>> list(s)) support the `solr` tag on stackoverflow.com? There's no
>> definite answer, I know, but someone might do an educated guess.
>
> I don't seek out questions on SO, but if one happens to come my way that
> I can answer, there's a good chance I will post.  Most of the time I see
> SO posts via some other medium, though -- like this list or the #solr
> IRC channel.  That kind of exposure makes it a little bit less likely
> that I will respond on SO.
>
> Thanks,
> Shawn
>


Re: Search substring in field

2017-05-10 Thread Emir Arnautovic

Hi,

Solr works on top of data structure called inverted index 
. You can misuse it and do 
not invert your documents and use regex or wildcards to find matches, 
but that is not the way to use it - it'll be significantly slower.


Solr does support subset of regex and syntax for that is field:/regex/

Solr also supports wildcards: * and ?

In any case you have to be aware that it matches tokens and you have to 
setup your analysis properly to make it work (at least need to lowercase 
if want to make it case insensitive).



On 09.05.2017 19:15, jnobre wrote:

Hello,

Thanks for your response.

I realize the concept, but I do not know which one to use in my case. Not
exactly the difference between the analyzes.

1- At this moment I search for
"source": * "hello word" * or url =
http://:8983/solr/AWP10/select?Indent=on&q=source:*%22hello%20world%22*&wt=json
If you index source as string (single token) you can search with 
wildcards, but you have to escape spaces - source: *hello\ word*

or can use regex - source:/.*hello word.*/
If you index it as text, it will be tokenized and it will have tokens 
"hello" and "word" and then you can use phrase query - source: "hello 
word" - this is recommended way.


For example, one line of the answer:
"source":
["http://www.gravatar.com/avatar/ad516503a11cd5ca435acc9bb6523536?s=32";]

The expression does not appear and even then the line is returned.
you can use debugQuery=true to see how query is parsed - the one you 
sent uses match all on default field.


2 - My idea was to identify a url in the middle of a string with regex, for
example, as it does in Java:
Eur-lex.europa.eu eur-lex.europa.eu eur-lex.europa.eu Eur-lex.europa.eu
eur-lex.europa.eu
I do not know what the syntax is for entering regex in the search.
The proper way is to use analysis to split the URL into tokens and then to
search for an exact match (a schema sketch follows the list). The analysis could include:

1. changing / with space
2. white space tokenizer
3. removing 'www.'
4. ignoring http
...
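A rough schema.xml sketch of such an analysis chain (the type name and the
regexes are illustrative assumptions, not a tested recipe):

    <fieldType name="url_tokens" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- split on '/' and ':' so scheme, host and path segments become tokens -->
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="[/:]+" replacement=" "/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- strip a leading 'www.' from host tokens -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="^www\." replacement="" replace="first"/>
      </analyzer>
    </fieldType>

Bare 'http'/'https' tokens could additionally be dropped with a stop filter if
they get in the way.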


3- I can use the multiplication function, but not the search syntax to
evaluate its return.
Again, if you always query the product of the same fields, you might want to
create a field containing that value (e.g. a field named prod) and then use a
range query - prod:[10 TO 20]


If you have two numeric fields (e.g. a and b) you can filter out docs
using frange in a filter query:

  fq={!frange l=10 u=20}product(a, b)
if you need to return that value you need to add it to fl:
  fl=*,prod:product(a,b)
this will return all stored fields and product as 'prod'.

HTH,
Emir









--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Walter Underwood
I just checked, and it has been 3.5 years since I’ve answered anything about 
solr on Stack Overflow.

It’s been 30 minutes since I answered something here.

I have contributed some answers in the amateur radio group. Stack Overflow has 
a bad
tendency to get stuck on the earliest “might be right” answer, even if it is 
wrong. Very
frustrating. This happens a lot with questions about antennas.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 10, 2017, at 7:47 AM, Erick Erickson  wrote:
> 
> Personally I have all I can do to keep up with this list and the dev
> list and, you know, do my day job ;)
> 
> I've seen quite a few references to SO for Solr questions, and the
> times I've perused them the answers I've been impressed. Just don't
> have time.
> 
> On Wed, May 10, 2017 at 6:01 AM, Shawn Heisey  wrote:
>> On 5/10/2017 6:31 AM, Karl-Philipp Richter wrote:
>>> Do developers and power users (which are famous on the mailing
>>> list(s)) support the `solr` tag on stackoverflow.com? There's no
>>> definite answer, I know, but someone might do an educated guess.
>> 
>> I don't seek out questions on SO, but if one happens to come my way that
>> I can answer, there's a good chance I will post.  Most of the time I see
>> SO posts via some other medium, though -- like this list or the #solr
>> IRC channel.  That kind of exposure makes it a little bit less likely
>> that I will respond on SO.
>> 
>> Thanks,
>> Shawn
>> 



Re: Solrcloud collection restore puts 2 replicas on the same node

2017-05-10 Thread Webster Homer
Yes that looks like the issue I'm seeing. When will 6.6 be released?

On Wed, May 10, 2017 at 9:42 AM, Erick Erickson 
wrote:

> Possibly https://issues.apache.org/jira/browse/SOLR-9527?
>
> On Wed, May 10, 2017 at 7:34 AM, Webster Homer 
> wrote:
> > I am running Solr 6.2 on a 4 node cluster
> >
> > Each collection has 2 shards and a replication factor of 2
> >
> > Normally when I create a collection I see a replica on each node, which
> is
> > what I would expect.
> >
> > However when I restore a backup to a new collection I see that one node
> has
> > two replicas on it. They are  from different shards, but one of my nodes
> > doesn't get a replica. Is there a way to force the restore to use
> different
> > nodes if it can?
> >
> > I saw a similar issue on a two node test cloud where the restore created
> > the two replicas on the same node.
> >
> > --
> >
> >
>

-- 




Re: distribution of leader and replica in SolrCloud

2017-05-10 Thread Erick Erickson
Bernd:

Short form: Worrying about which node is the leader is wasting your
time. Details below:

Why do you care what nodes the leaders are on? There has to be some
concern you have about co-locating the leaders on the same node or you
wouldn't be spending the time on it. Please articulate that concern.
This really sounds like an XY problem: you're asking about X (how to
ensure leader location) when you're concerned about Y. What's the Y?

Worrying about placing leaders is really a waste of time at the scale
you're talking. Which replica is the leader changes anyway when nodes
go up and down or depending on the order you start your Solr
instances. There's no way to specify it with rule based node placement
during collection creation because it's not worth the effort.

You may be misunderstanding what the leader does. Its biggest
responsibility is that it forwards the raw documents to the other
replicas when indexing. The reason updates are sent to the leader is
to ensure ordering (i.e. if the same doc is sent to two nodes in the
cloud for indexing by two different clients at the same time, one of
them has to "win" deterministically). But other than this trivial
check (and forwarding the docs of course, but that's almost entirely
I/O), the leader is just like any other replica in the shard.

The leader does _not_ index the document and then forward the results
to the followers; all replicas in a shard index each document
independently.

When querying, the leader has no additional duties at all, it's just
another replica serving queries.

So all the added duties of a leader amount to is forwarding the raw
docs to the followers, collecting the responses and returning the
status of the indexing request to the caller. During querying, the
leader may or may not participate.

I doubt you'll even be able to measure the increased load on a node
even if all the leaders are located on it. As far as cluster
robustness is concerned, again where the leaders are placed is
irrelevant. The only time I've seen any problems with all the leaders
being on a single physical node, there were several hundred shards.

Best,
Erick

On Wed, May 10, 2017 at 5:15 AM, Rick Leir  wrote:
> Myself, I am still in the old camp. For critical machines, I want to know 
> that it is my machine, with my disks, and what software is installed exactly. 
> But maybe the cloud provider's fast network is more important? Cheers--Rick
>
> On May 10, 2017 6:13:27 AM EDT, Bernd Fehling 
>  wrote:
>>Hi Rick,
>>
>>yes I have distributed 5 virtual server accross 5 physical machines.
>>So each virtual server is on a separate physical machine.
>>
>>Splitting each virtual server (64GB RAM) into two (32GB RAM), which
>>then
>>will be 10 virtual server accross 5 physical machines, is no option
>>because there is no gain against hardware failure of a physical
>>machine.
>>
>>So I rather go with two Solr instances per 64GB virtual server as first
>>try.
>>
>>Currently I'm still trying to solve the Rule-based Replica Placement.
>>There seams to be no way to report if a node is a "leader" or has the
>>"role="leader".
>>
>>Do you know how to create a rule like:
>>--> "do not create the replica on the same host where his leader
>>exists"
>>
>>Regards,
>>Bernd
>>
>>
>>Am 10.05.2017 um 10:54 schrieb Rick Leir:
>>> Bernd,
>>>
>>> Yes, cloud, ahhh. As you say, the world changed.  Do you have any
>>hint from the cloud provider as to which physical machine your virtual
>>server
>>> is on? If so, you can hopefully distribute your replicas across
>>physical machines. This is not just for reliability: in a sharded
>>system, each
>>> query will cause activity in several virtual servers and you would
>>prefer that they are on separate physical machines, not competing for
>>> resources. Maybe, for Solr, you should choose a provider which can
>>lease you the whole physical machine. You would prefer a 256G machine
>>over
>>> several shards on 64G virtual machines.
>>>
>>> And many cloud providers assume that servers are mostly idle, so they
>>cram too many server containers into a machine. Then, very
>>occasionally,
>>> you get OOM even though you did not exceed your advertised RAM. This
>>is a topic for some other forum, where should I look?
>>>
>>> With AWS you can choose to locate your virtual machine in
>>US-west-Oregon or US-east-i-forget or a few other locations, but that
>>is a very coarse
>>> division. Can you choose physical machine?
>>>
>>> With Google, it might be dynamic?
>>> cheers -- Rick
>>>
>>>
>>> On 2017-05-09 03:44 AM, Bernd Fehling wrote:
 I would name your solution more a work around as any similar
>>solution of this kind.
 The issue SOLR-6027 is now 3 years open and the world has changed.
 Instead of racks full of blades where you had many dedicated bare
>>metal servers
 you have now huge machines with 256GB RAM and many CPUs.
>>Virtualization has taken place.
 To get under these conditions some independance from the physical
>>hardware you 

Re: Solrcloud collection restore puts 2 replicas on the same node

2017-05-10 Thread Erick Erickson
bq: When will 6.6 be released?

Real Soon Now. The release process has started, the first RC will
probably be cut sometime next week. After that, the process will take
3-4 days. Any issues found will reset that "3-4 days" as another RC is
spun.

Best,
Erick

On Wed, May 10, 2017 at 8:04 AM, Webster Homer  wrote:
> Yes that looks like the issue I'm seeing. When will 6.6 be released?
>
> On Wed, May 10, 2017 at 9:42 AM, Erick Erickson 
> wrote:
>
>> Possibly https://issues.apache.org/jira/browse/SOLR-9527?
>>
>> On Wed, May 10, 2017 at 7:34 AM, Webster Homer 
>> wrote:
>> > I am running Solr 6.2 on a 4 node cluster
>> >
>> > Each collection has 2 shards and a replication factor of 2
>> >
>> > Normally when I create a collection I see a replica on each node, which
>> is
>> > what I would expect.
>> >
>> > However when I restore a backup to a new collection I see that one node
>> has
>> > two replicas on it. They are  from different shards, but one of my nodes
>> > doesn't get a replica. Is there a way to force the restore to use
>> different
>> > nodes if it can?
>> >
>> > I saw a similar issue on a two node test cloud where the restore created
>> > the two replicas on the same node.
>> >
>> > --
>> >
>> >
>>
>
> --
>
>


Replicating the master node for a fail over scenario

2017-05-10 Thread Dominik Niziński
Hello,

we've been successfully using Solr in our application for a few months now.
Recently we were asked whether there is a possibility of having multiple
master nodes (in case of a disaster happening in one of the server
locations).
Basically, what we want is to have a few master nodes running at the same
time, which could then automatically pick up the work if one is not
responding. For now we have decided to go with a complete master node
replica (basically at machine level) running the whole time, ready to be
plugged into the live system via DNS configuration if something goes wrong
with the "real" master.

Latest stackoverflow post about that is nearly 6 years old (
http://stackoverflow.com/questions/6362484/apache-solr-failover-support-in-master-slave-setup)
so I was wondering if something has changed since then or are there any new
solutions to the issue.

Kind Regards,
Dominik


Re: Replicating the master node for a fail over scenario

2017-05-10 Thread Erick Erickson
This is really what SolrCloud was built for, particularly CDCR (Cross
Data Center Replication) for remote DCs.

For the master/slave situation there's nothing automatic, it's a
roll-your-own type thing. People have done things like:

1> Any replica can be "promoted" to master with configuration changes
(see the config sketch after these options). So in the disaster case,
have a mechanism whereby you can re-index from "some time ago". Say your
poll interval is X and at time T your master dies. Promote one of your
slaves to master (simple config changes) and re-index anything that's
changed since, say, T-(X + some margin just to be sure). Say the poll
interval is 1 hour. If I can re-index from 2 hours before the master went
south, I have all my data. True, you will be serving stale data for "a
while", but this is sometimes acceptable.

2> Have your client index to two nodes. The trick is that the "backup"
isn't doing anything interesting, i.e. no slave is polling it. If the
master fails, reconfigure the slaves to point to the machine that's
live.

3> Just consider the two data centers to be completely independent as
far as Solr is concerned and replicate your system-of-record to the
second DC. Each DC indexes and (perhaps) serves searches
independently.
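For option 1>, the "promotion" amounts to which side of the replication handler
a node carries in solrconfig.xml; a minimal sketch (host, core and file names
are placeholders):

    <!-- on the master -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- on a slave; point masterUrl at whichever node is currently the master -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
        <str name="pollInterval">00:01:00</str>
      </lst>
    </requestHandler>

Promoting a slave then means giving it the master-side config and repointing the
remaining slaves' masterUrl at it.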

But really, SolrCloud is built for HA/DR (admittedly with some added
complexity). If the simple approaches I've outlined don't work and
HA/DR is that important, you might want to consider it.

Best,
Erick

On Wed, May 10, 2017 at 8:26 AM, Dominik Niziński  wrote:
> Hello,
>
> we're successfully using solr in our application for a few months now.
> Recently we've got asked if there is a possibility of having multiple
> master nodes (in case of some disaster happening in one of server
> locations).
> Basically what we want to do is having a few master nodes running at the
> same time which then could automatically pick up the work if one is not
> responding. For now we have decided to go with a complete master node
> replication (basically on machine level) running the whole time and ready
> to be plugged in to the live system via DNS configuration if something
> wrong happens to the "real" master.
>
> Latest stackoverflow post about that is nearly 6 years old (
> http://stackoverflow.com/questions/6362484/apache-solr-failover-support-in-master-slave-setup)
> so I was wondering if something has changed since then or are there any new
> solutions to the issue.
>
> Kind Regards,
> Dominik


Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Karl-Philipp Richter
Hi,

Am 10.05.2017 um 17:03 schrieb Walter Underwood:
> I have contributed some answers in the amateur radio group. Stack Overflow 
> has a bad
> tendency to get stuck on the earliest “might be right” answer, even if it is 
> wrong. Very
> frustrating. This happens a lot with questions about antennas.
I can imagine this, but it's hard to follow without an example - not that one
is necessary, because there's no need to discuss the ups and downs of
stackexchange (!= stackoverflow) sites here. If a Q&A frustrates you - and
you're active on mailing lists - then you must have stumbled over a lot
of pretty rare issues ;)

-Kalle





Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Karl-Philipp Richter
Hi,

Am 10.05.2017 um 15:01 schrieb Shawn Heisey:
> I don't seek out questions on SO, but if one happens to come my way that
> I can answer, there's a good chance I will post.  Most of the time I see
> SO posts via some other medium, though -- like this list or the #solr
> IRC channel.  That kind of exposure makes it a little bit less likely
> that I will respond on SO.
Would you (all) consider it useful to cross-post an SO question on this
list? And if yes, to leave it as just the URL, in order to benefit from the
editing features of a Q&A site (which is the reason they exist in the first
place and - more controversially - deprecate mailing lists to a large
extent, regardless of the latter's still large popularity based on habit)?

-Kalle





Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Alexandre Rafalovitch
I think I am the only person answering both SO Solr-tag and mailing-list
questions comparatively frequently. Less now than before, but I still
track SO by subscribing to the solr tag newsletter. There are some other
strong users answering the SO tag, but I don't think they are on the
mailing list. Or they are lurking here.

As an unrelated but fun fact, the ATOM DIH example coming up in Solr
6.6 (instead of broken RSS example) populates the collection from that
same SO Solr tag. So, we may get more attention then.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 10 May 2017 at 08:31, Karl-Philipp Richter  wrote:
> Hi,
> Do developers and power users (who are well known on the mailing list(s))
> support the `solr` tag on stackoverflow.com? There's no definite answer,
> I know, but someone might make an educated guess.
>
> -Kalle
>


Solr 6.5.0 sql select * issue because of invalid "score" field?

2017-05-10 Thread kringe

Hello,


Running solr 6.5.0


I have a collection called TestIndex with a schema that has a few fields that 
are all, among other things, set up as docValues fields so I can perform 
unlimited sql queries against the collection.


After ingesting 10 documents, I try to use the Solr admin UI to perform an SQL
query that selects all the data and streams it back.


The following is the error I get from Solr when trying to execute my query:



{
  "result-set":{
    "docs":[{
      "EXCEPTION":"Failed to execute sqlQuery 'select * from TestIndex' against JDBC connection 'jdbc:calcitesolr:'.\nError while executing SQL \"select * from TestIndex\": java.io.IOException: score is not a valid field for unlimited queries.",
      "EOF":true,
      "RESPONSE_TIME":329}]}}



The thing is that my collection does not have a "score" field in it. From
googling around it would seem that Solr returns a score field by default, but my
reading suggests that I should only get it if I requested it in my query
(which I did not).


Is this a bug, or am I expected to set up some configuration somewhere to
specify that I do not want score returned in my SQL query?
If this is not a bug, how am I supposed to run a very simple "select * from ..."
SQL query if Solr automatically includes a score field that is not
valid for unlimited queries?


Thanks
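
For reference, two query forms that may sidestep the implicit score field
(a sketch only, not verified against 6.5.0 -- the field names are placeholders):

  -- list the docValues fields explicitly instead of using *
  select id, fieldA, fieldB from TestIndex
  -- or keep select *, but make it a limited query so score is allowed
  select * from TestIndex limit 100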


Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Alexandre Rafalovitch
On 10 May 2017 at 11:50, Karl-Philipp Richter  wrote:
> Would you (all) consider it useful to cross-post a SO question on this
> list?

I think most of the time it would be more efficient to post
Solr-only questions on this mailing list in the first place. But
people who find SO somehow do not find the User Group list. I try to
drive them to the list when appropriate.

Importantly, SO also has a lot of Solr client questions that this list
does not specialize in (e.g. Ruby/Python framework-level questions).
They probably would not get many answers here, but they also don't have
their own dedicated community either. So, SO is where they go.

I guess I could say that I answer SO questions because they are asked
there (and I focus on the newbies), not because I think SO has caught on
as a good Solr forum.

Regards,
   Alex.



http://www.solr-start.com/ - Resources for Solr users, new and experienced


Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Walter Underwood
Sure, here is an example. The accepted answer doesn’t really answer the 
question. Mine finally got an equal number of votes, but is not accepted. 
Essentially, this is voting on physics, which is not a good way to find 
engineering solutions.

https://ham.stackexchange.com/questions/337/why-do-concurrent-fm-signals-not-mix-together/427
 


Also, the back and forth to clarify the question is harder to do at Stack 
Overflow.

Finally, I only visit forum sites when I absolutely have to. I've had all the 
discussions coming to one place since the early 1980’s, with Usenet. Visiting 
one site per topic is a crazy waste of time.

I maintained the internal forums and Notesfiles software at HP for about ten 
years (before the WWW), so I’m pretty aware of discussions that don’t work 
right.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 10, 2017, at 8:49 AM, Karl-Philipp Richter  wrote:
> 
> Hi,
> 
> Am 10.05.2017 um 17:03 schrieb Walter Underwood:
>> I have contributed some answers in the amateur radio group. Stack Overflow 
>> has a bad
>> tendency to get stuck on the earliest “might be right” answer, even if it is 
>> wrong. Very
>> frustrating. This happens a lot with questions about antennas.
> I can imagine this, but it's hard to follow without an example - not
> necessary because there's no need to discuss the ups and downs of
> stackexchange (!= stackoverflow) sites. If a Q&A frustates you - and
> you're active on mailing lists - then you must have stubled over a lot
> of pretty rare issues ;)
> 
> -Kalle
> 



Create core with bin/solr where BasicAuth is setup

2017-05-10 Thread bay chae
Hi,

I have basic auth enabled in Solr and can create a core with 'curl --user ...'
and through the web interface with a username and password entered.

I can create a core:

bin/solr create -c bore 

with this in solr.in.sh:

SOLR_AUTH_TYPE="basic"
SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:SolrRocks"

But say I don't want to store a plaintext password in solr.in.sh and would
rather pass the credentials on the command line when creating a core:

bin/solr create -c bore -Dbasicauth=solr:SolrRocks

Then I get the following error:

ERROR: Unrecognized or misplaced argument: -Dbasicauth=solr:SolrRocks!

I have tried other placements without success.

Could anyone help with this off the top of their head?
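
One approach that may work (a sketch, assuming bin/solr picks these variables
up from the environment when they are not set in solr.in.sh) is to supply the
credentials only for the one invocation:

  # sketch: the credentials live only in this command's environment
  SOLR_AUTH_TYPE="basic" \
  SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:SolrRocks" \
  bin/solr create -c bore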

Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Alexandre Rafalovitch
Just FYI.

The average number of answers per Solr question on SO is probably less than one.
Whether they get accepted even when answered is a different issue.

The questions do seem to have more back-and-forth (in comments)
than other topics, though. That's something the Mailing List is much better
suited for.

Finally, you can have SO Solr questions coming into your inbox. Just
subscribe to the tag, as explained in
https://meta.stackoverflow.com/questions/254318/how-to-subscribe-to-tags

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 10 May 2017 at 12:18, Walter Underwood  wrote:
> Sure, here is an example. The accepted answer doesn’t really answer the 
> question. Mine finally got an equal number of votes, but is not accepted. 
> Essentially, this is voting on physics, which is not a good way to find 
> engineering solutions.
>
> https://ham.stackexchange.com/questions/337/why-do-concurrent-fm-signals-not-mix-together/427
>  
> 
>
> Also, the back and forth to clarify the question is harder to do at Stack 
> Overflow.
>
> Finally, I only visit forum sites when I absolutely have to. I've had all the 
> discussions coming to one place since the early 1980’s, with Usenet. Visiting 
> one site per topic is a crazy waste of time.
>
> I maintained the internal forums and Notesfiles software at HP for about ten 
> years (before the WWW), so I’m pretty aware of discussions that don’t work 
> right.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On May 10, 2017, at 8:49 AM, Karl-Philipp Richter  wrote:
>>
>> Hi,
>>
>> Am 10.05.2017 um 17:03 schrieb Walter Underwood:
>>> I have contributed some answers in the amateur radio group. Stack Overflow 
>>> has a bad
>>> tendency to get stuck on the earliest “might be right” answer, even if it 
>>> is wrong. Very
>>> frustrating. This happens a lot with questions about antennas.
>> I can imagine this, but it's hard to follow without an example - not
>> necessary because there's no need to discuss the ups and downs of
>> stackexchange (!= stackoverflow) sites. If a Q&A frustates you - and
>> you're active on mailing lists - then you must have stubled over a lot
>> of pretty rare issues ;)
>>
>> -Kalle
>>
>


Re: SOLR as nosql database store

2017-05-10 Thread Bharath Kumar
Yes Mike, we have CDCR replication as well.

On Wed, May 10, 2017 at 1:15 AM, Mike Drob  wrote:

> > The searching install will be able to rebuild itself from the data
> storage install when that
> is required.
>
> Is this a use case for CDCR?
>
> Mike
>
> On Tue, May 9, 2017 at 6:39 AM, Shawn Heisey  wrote:
>
> > On 5/9/2017 12:58 AM, Bharath Kumar wrote:
> > > Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas,
> > will that not serve as backup when something goes wrong? Also we use
> latest
> > solr 6 and from the documentation of solr, the indexing performance has
> > been good. The reason is that we are using MySQL as the primary data
> store
> > and the performance might not be optimal if we write data at a very rapid
> > rate. Already we index almost half the fields that are in MySQL in solr.
> >
> > A replica is protection against data loss in the event of hardware
> > failure, but there are classes of problems that it cannot protect
> against.
> >
> > Although Solr (Lucene) does try *really* hard to never lose data that it
> > hasn't been asked to delete, it is not designed to be a database.  It's
> > a search engine.  Solr doesn't offer the same kinds of guarantees about
> > the data it contains that software like MySQL does.
> >
> > I personally don't recommend trying to use Solr as a primary data store,
> > but if that's what you really want to do, then I would suggest that you
> > have two complete Solr installs, with multiple replicas on both.  One of
> > them will be used for searching and have a configuration you're already
> > familiar with, the other will be purely for data storage -- only certain
> > fields like the uniqueKey will be indexed, but every other field will be
> > stored only.
> >
> > Running with two separate Solr installs will allow you to optimize one
> > for searching and the other for data storage.  The searching install
> > will be able to rebuild itself from the data storage install when that
> > is required.  If better performance is needed for the rebuild, you have
> > the option of writing a multi-threaded or multi-process program that
> > reads from one and writes to the other.
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: SOLR as nosql database store

2017-05-10 Thread Bharath Kumar
Thanks Walter and Mike. In our use case we have the same schema on both the
source and target sites. The idea is to see whether we can avoid MySQL
replication on the target site for a particular table in our MySQL schema.
Currently we index some of the fields of that table in Solr; we want to move
all of the fields to Solr, indexing some of them and making the others stored-only.
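
For reference, a sketch of what such field definitions might look like in the
schema (the field names are placeholders): an indexed and stored field for a
column we search on, and a stored-only field for a column we only need to retrieve:

  <!-- sketch: searchable column -->
  <field name="product_name" type="text_general" indexed="true" stored="true"/>
  <!-- sketch: stored-only column, returned with the doc but not searchable -->
  <field name="raw_payload" type="string" indexed="false" stored="true"/>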

On Wed, May 10, 2017 at 10:09 AM, Bharath Kumar 
wrote:

> Yes Mike we have CDCR replication as well.
>
> On Wed, May 10, 2017 at 1:15 AM, Mike Drob  wrote:
>
>> > The searching install will be able to rebuild itself from the data
>> storage install when that
>> is required.
>>
>> Is this a use case for CDCR?
>>
>> Mike
>>
>> On Tue, May 9, 2017 at 6:39 AM, Shawn Heisey  wrote:
>>
>> > On 5/9/2017 12:58 AM, Bharath Kumar wrote:
>> > > Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas,
>> > will that not serve as backup when something goes wrong? Also we use
>> latest
>> > solr 6 and from the documentation of solr, the indexing performance has
>> > been good. The reason is that we are using MySQL as the primary data
>> store
>> > and the performance might not be optimal if we write data at a very
>> rapid
>> > rate. Already we index almost half the fields that are in MySQL in solr.
>> >
>> > A replica is protection against data loss in the event of hardware
>> > failure, but there are classes of problems that it cannot protect
>> against.
>> >
>> > Although Solr (Lucene) does try *really* hard to never lose data that it
>> > hasn't been asked to delete, it is not designed to be a database.  It's
>> > a search engine.  Solr doesn't offer the same kinds of guarantees about
>> > the data it contains that software like MySQL does.
>> >
>> > I personally don't recommend trying to use Solr as a primary data store,
>> > but if that's what you really want to do, then I would suggest that you
>> > have two complete Solr installs, with multiple replicas on both.  One of
>> > them will be used for searching and have a configuration you're already
>> > familiar with, the other will be purely for data storage -- only certain
>> > fields like the uniqueKey will be indexed, but every other field will be
>> > stored only.
>> >
>> > Running with two separate Solr installs will allow you to optimize one
>> > for searching and the other for data storage.  The searching install
>> > will be able to rebuild itself from the data storage install when that
>> > is required.  If better performance is needed for the rebuild, you have
>> > the option of writing a multi-threaded or multi-process program that
>> > reads from one and writes to the other.
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>>
>
>
>
> --
> Thanks & Regards,
> Bharath MV Kumar
>
> "Life is short, enjoy every moment of it"
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Recommended index-size per core

2017-05-10 Thread S G
Hi,

Is there a recommendation on the size of index that one should host per
core?
The idea is to come up with an *initial* shard/replica setting for a load test,
and then arrive at a good cluster size based on that testing.


*Example: *

Num documents: 100 million
Average document size: 1kb
So total space required:  100 gb

Indexable fields per document: 5 strings, average field-size: 100 chars
So total index space required for all docs: 50gb (assuming all unique words)


*Rough estimates for an initial size:*

A 50gb index is best served if all of it is in memory.
And JVMs perform best if their max heap is between 15-20gb.
So a starting point for num-shards: 50gb/20gb ~ 3

Now, if the whole index is in memory per core, then replicas can serve queries
with much higher throughput.
So we can begin with 2 replicas per shard.

*Questions:*

Are there any other factors that we can consider *initially* to make our
calculations more precise?
Note that the goal of the exercise is not to get rid of load testing, only
to start with a close-enough cluster setting so that load testing can
finish faster.

Thanks
SG


RE: Underlying file changed by an external force

2017-05-10 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
> You need to look at all of your core.properties files and see if any of them 
> point to the same data directory.

All the core.properties files are each in their own directory with no overlap.

> Second: if you issue a "kill -9" you can leave write locks lingering.

We manage our Solr instances with supervisor, which can send a "kill -9" if 
"kill -6" does not suffice; but the problem tends to manifest itself at times 
other than startup.

The Solr version is 5.4.1, in case that is relevant.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, May 04, 2017 3:20 PM
To: solr-user 
Subject: Re: Underlying file changed by an external force

You need to look at all of your core.properties files and see if any
of them point to the same data directory.

Second: if you issue a "kill -9" you can leave write locks lingering.

Best,
Erick

On Thu, May 4, 2017 at 11:00 AM, Oakley, Craig (NIH/NLM/NCBI) [C]
 wrote:
> We have been having problems with different collections on different 
> SolrCloud clusters, all seeming to be related to the write.lock file with 
> stack traces similar to the following. Are there any suggestions what might 
> be the cause and what might be the solution? Thanks
>
>
> org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an 
> external force at 2017-04-13T20:43:08.630152Z, 
> (lock=NativeFSLock(path=/data/solr/biosample/dba_test_shard1_replica1/data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive valid],ctime=2017-04-13T20:43:08.630152Z))
>
>at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:179)
>
>at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.deleteFile(LockValidatingDirectoryWrapper.java:37)
>
>at 
> org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:732)
>
>at 
> org.apache.lucene.index.IndexFileDeleter.deletePendingFiles(IndexFileDeleter.java:503)
>
>at 
> org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:448)
>
>at 
> org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2099)
>
>at 
> org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2041)
>
>at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1083)
>
>at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1125)
>
>at 
> org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:131)
>
>at 
> org.apache.solr.update.DefaultSolrCoreState.changeWriter(DefaultSolrCoreState.java:183)
>
>at 
> org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:207)
>
>at org.apache.solr.core.SolrCore.reload(SolrCore.java:472)
>
>at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:849)
>
>at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:768)
>
>at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:230)
>
>at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:184)
>
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
>
>at 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)
>
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:438)
>
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223)
>
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
>
>at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>
>at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>
>at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>
>at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>
>at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>
>at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>
>at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>
>at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>
>at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>
>at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>
>at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrap

solrcloud collections restore documentation is confusing about the restored configset

2017-05-10 Thread Webster Homer
Looking at the SolrCloud restore API, I am confused about the Solr
configuration.
What configuration gets loaded into the restored collection? The one in
ZooKeeper or the one from the backup?

Say I have a collection, BAZ, which has a configuration BAZ.config.
Now I create a backup of BAZ.
I make changes to the BAZ configuration and load updated data into BAZ.

I then discover a problem with the updated BAZ, so I restore BAZ to BAZ-Old.

I didn't see a new config in ZooKeeper named BAZ-Old.
Did the restore replace BAZ.config with the config from the backup? Or
is BAZ-Old now using the modified config that BAZ is using?

The RESTORE documentation says that you can specify a config name, but the
config must already be in ZooKeeper.

So what is the real story, and can the documentation be made clearer?

It seems to me that the restore should create a new configset in ZooKeeper
from the backed-up configset and use the new collection name as the name of
the configset.



Re: solrcloud collections restore documentation is confusing about the restored configset

2017-05-10 Thread Shawn Heisey
On 5/10/2017 12:26 PM, Webster Homer wrote:
> Looking at the solrcloud restore API I am confused about the solr
> configuration

> It seems to me that the restore should create a new configset in Zookeeper
> from the backed up configset and use the new collection name as the name of
> the config set.

This is what the documentation says:

==
The collection created will be of the same number of shards and replicas
as the original collection, preserving routing information, etc.
Optionally, you can override some parameters documented below. While
restoring, if a configSet with the same name exists in ZooKeeper then
Solr will reuse that, or else it will upload the backed up configSet in
ZooKeeper and use that.
==

Here's what I *HOPE* this means: The configuration name is saved as well
as the actual configuration.  If a configuration with the same name as
the backed up configuration exists in zookeeper already, it will be used
without modification, but if that config doesn't exist, the backed up
configuration will be uploaded to zookeeper with the original config name.

You seem to be assuming that the configuration will have the same name
as the collection, and that's not what I would assume.  I wonder which
of us is right.

You can check the configname being used by the restored collection by
clicking on Cloud, then Tree, opening the "collections" folder, and
clicking on the restored collection.  It will be on the right side,
below the table.
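
If you prefer the command line, the same information should be visible in the
collection's znode (a sketch, assuming the stock zkcli script and a local
ZooKeeper):

  # sketch: prints the JSON stored for the collection, including "configName"
  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
    -cmd get /collections/BAZ-Old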

Thanks,
Shawn



Re: Recommended index-size per core

2017-05-10 Thread Toke Eskildsen
S G  wrote:
> *Rough estimates for an initial size:*
> 
> 50gb index is best served if all of it is in memory.

Assuming you need low latency and/or high throughput, yes. I mention this 
because in many cases the requirements for the number of simultaneous users and 
response times are known (at least roughly) up front, and sometimes there is no 
need to speculate about high performance.

> And JVMs perform the best if their max-heap is between 15-20gb

We stay below 32GB if possible, but the gist is the same: Avoid large heaps.

> So a starting point for num-shards: 50gb/20gb ~ 3

Sorry, I think you have misunderstood something here. The JVM heap is not used 
for caching the index data directly (although it holds derived data). What you 
need is free memory on your machine for OS disk-caching.

The ideal JVM size is extremely dependent on how you index, query and adjust 
the filter-cache (secondarily the other caches, but the filter-cache tends to 
be the large one).  A heap of 10GB might very well be fine for handling your 
whole 50GB index. If that is on a 64GB machine, the remaining 54GB of RAM 
(minus the other stuff that is running) ought to ensure a fully cached index.

- Toke Eskildsen


Re: Underlying file changed by an external force

2017-05-10 Thread Erick Erickson
bq: All the core.properties files are each in their own directory with
no overlap

Not quite what I was asking. By definition, all core.properties are in
their own directory. In fact Solr stops looking down the tree when it
finds the first directory with core.properties in it and immediately
moves on to the next sibling directory.

_Inside_ the core.properties files, are there any dataDir properties
pointing to the same place as in any other core.properties? Note that
dataDir properties usually aren't even present unless you did
something special, so don't be surprised if there's nothing there.
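
For example, a core.properties that could collide with another core's index
would look something like this (a sketch; the dataDir line is the one to look for):

  # sketch: an explicit dataDir shared by two cores is the problem case
  name=dba_test_shard1_replica1
  dataDir=/data/solr/some_shared_index_dir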

Best,
Erick

On Wed, May 10, 2017 at 10:56 AM, Oakley, Craig (NIH/NLM/NCBI) [C]
 wrote:
>> You need to look at all of your core.properties files and see if any of them 
>> point to the same data directory.
>
> All the core.properties files are each in their own directory with no overlap.
>
>> Second: if you issue a "kill -9" you can leave write locks lingering.
>
> We manage our Solr instances with supervisor, which can send a "kill -9" if 
> "kill -6" does not suffice; but the problem tends to manifest itself at some 
> time other than startup
>
> The Solr version is 5.4.1, in case that is relevant.
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, May 04, 2017 3:20 PM
> To: solr-user 
> Subject: Re: Underlying file changed by an external force
>
> You need to look at all of your core.properties files and see if any
> of them point to the same data directory.
>
> Second: if you issue a "kill -9" you can leave write locks lingering.
>
> Best,
> Erick
>
> On Thu, May 4, 2017 at 11:00 AM, Oakley, Craig (NIH/NLM/NCBI) [C]
>  wrote:
>> We have been having problems with different collections on different 
>> SolrCloud clusters, all seeming to be related to the write.lock file with 
>> stack traces similar to the following. Are there any suggestions what might 
>> be the cause and what might be the solution? Thanks
>>
>>
>> org.apache.lucene.store.AlreadyClosedException: Underlying file changed by 
>> an external force at 2017-04-13T20:43:08.630152Z, 
>> (lock=NativeFSLock(path=/data/solr/biosample/dba_test_shard1_replica1/data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>>  exclusive valid],ctime=2017-04-13T20:43:08.630152Z))
>>
>>at 
>> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:179)
>>
>>at 
>> org.apache.lucene.store.LockValidatingDirectoryWrapper.deleteFile(LockValidatingDirectoryWrapper.java:37)
>>
>>at 
>> org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:732)
>>
>>at 
>> org.apache.lucene.index.IndexFileDeleter.deletePendingFiles(IndexFileDeleter.java:503)
>>
>>at 
>> org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:448)
>>
>>at 
>> org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2099)
>>
>>at 
>> org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2041)
>>
>>at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1083)
>>
>>at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1125)
>>
>>at 
>> org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:131)
>>
>>at 
>> org.apache.solr.update.DefaultSolrCoreState.changeWriter(DefaultSolrCoreState.java:183)
>>
>>at 
>> org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:207)
>>
>>at org.apache.solr.core.SolrCore.reload(SolrCore.java:472)
>>
>>at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:849)
>>
>>at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:768)
>>
>>at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:230)
>>
>>at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:184)
>>
>>at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
>>
>>at 
>> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)
>>
>>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:438)
>>
>>at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223)
>>
>>at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
>>
>>at 
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>>
>>at 
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>>
>>at 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>>
>>at 
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>>
>>at 
>> org.eclipse.jetty.server.session.SessionHandler.do

Re: Automatic conversion to Range Query

2017-05-10 Thread Chris Hostetter
: I'm facing a issue when i'm querying the Solr
: my query is "xiomi Mi 5 -white [64GB/ 3GB]"
...
: +(((Synonym(nameSearch:xiaomi nameSearch:xiomi)) (nameSearch:mi)
: (nameSearch:5) -(Synonym(nameSearch:putih
: nameSearch:white))*(nameSearch:[64gb/ TO 3gb])*)~4)
...
: Now due to automatic conversion of query  to Range query i'm not able
: to find the result
...
: Solr Version-6.4.2
: Parser- edismax

That's really surprising to me -- but I can reproduce what you're 
describing ... not sure if the "implicit" assumption that you wanted a 
range query is intentional or a bug -- but it's certainly weird, so I've 
filed a jira: https://issues.apache.org/jira/browse/LUCENE-7821

FWIW: It's not actually anything special about edismax that's causing that 
to be parsed as a range query -- it seems that the underlying grammar 
(used by both the lucene & edismax solr QParsers) treats the "TO" as 
optional in a range query, so the remaining 2 "terms" inside the square 
brackets are considered the low/high ... if you'd had more than 2 terms 
(ie: "foo [64gb/ 3gb bar]") it wouldn't have parsed as a range query -- 
which means edismax would have fallen back to retrying to parse it with 
automatic escaping.
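
As a workaround until that is resolved, escaping or phrase-quoting the
brackets should keep the parser from treating them as a range (a sketch,
untested):

  q=xiomi Mi 5 -white \[64GB/ 3GB\]
  q=xiomi Mi 5 -white "[64GB/ 3GB]"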



-Hoss
http://www.lucidworks.com/


SolrSpellChecker not showing suggestions when the first character of a word is wrong

2017-05-10 Thread aruninfo100
Hi All,

I am trying to do spell check with Solr. I am able to get suggestions when
a word is incorrectly spelled.
E.g. word entered (incorrectly): *maintaan*
I am getting *maintain* as a suggestion, but if I provide *naintain*, it
doesn't provide any suggestions.

*solrConfig:*

 
text_general

default
spell_text
solr.DirectSolrSpellChecker
internal
0.5


wordbreak
solr.WordBreakSolrSpellChecker
spell_text
true
true
5
5




  
default
wordbreak
true
true
5
2
5
true
true
5
3
 true
 true
  
  
spellcheck
  


Kindly help me on this.

Thanks and Regards,
Arun



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrSpellChecker-not-showing-suggestions-when-the-first-character-of-a-word-is-wrong-tp4334554.html
Sent from the Solr - User mailing list archive at Nabble.com.
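
One likely cause (an educated guess, not confirmed here): solr.DirectSolrSpellChecker
only considers candidate corrections that share a prefix with the misspelled term,
and its minPrefix parameter defaults to 1, so a wrong first character rules out
every suggestion. A possible tweak to the "default" spellchecker definition:

  <!-- sketch: allow corrections even when the first character is wrong;
       this can be noticeably slower on large dictionaries -->
  <int name="minPrefix">0</int>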


newbie question re solr.PatternReplaceFilterFactory

2017-05-10 Thread Michael Tobias
I am sure this is very simple but I cannot get the pattern right.

How can I use solr.PatternReplaceFilterFactory to remove all words in brackets 
from being indexed?

e.g. [ignore this]

thanks

Michael



Re: newbie question re solr.PatternReplaceFilterFactory

2017-05-10 Thread Erick Erickson
First, use PatternReplaceCharFilterFactory. The difference is that
PatternReplaceCharFilterFactory works on the entire input, whereas
PatternReplaceFilterFactory works only on the tokens emitted by the
tokenizer. A concrete example using WhitespaceTokenizerFactory: given the input

this [is some ] text

PatternReplaceFilterFactory would see 5 tokens, "this", "[is", "some",
"]", and "text". So it would be very hard to do what you want with it.

PatternReplaceCharFilterFactory will see the entire input as one
string and operate on it, _then_ send it through the tokenizer.

And also don't be fooled by the fact that the _stored_ data will still
contain the removed words. So when you get the doc back from solr
you'll see the original input, brackets and all. In the above example,
if you returned the field you'd still see

this [is some ] text

when the doc matched. This doc would be found when searching for
"this" or "text", but _not_ when searching for "is" or "some".

You want some pattern like the charFilter sketched below.
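
A sketch of such a charFilter, placed before the tokenizer in the analyzer
chain (one possible pattern, untested -- the regex removes any [...] span,
brackets included; adjust to taste):

  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="\[[^\]]*\]" replacement=""/>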

Best,
Erick

On Wed, May 10, 2017 at 6:08 PM, Michael Tobias  wrote:
> I am sure this is very simple but I cannot get the pattern right.
>
> How can I use solr.PatternReplaceFilterFactory to remove all words in 
> brackets from being indexed?
>
> eg [ignore this]
>
> thanks
>
> Michael
>