Re: Multiple solr instances per host vs Multiple cores in same solr instance

2018-08-28 Thread Bernd Fehling

Yes, I tested many cases.
As I already mentioned: 3 servers as a 3x3 SolrCloud cluster.
- 12 million data records from our big single index
- always the same queries (SWD, German keyword norm data)
- Apache JMeter 3.1 for the load (separate server)
- Haproxy 1.6.11 with roundrobin (separate server)
- no autowarming in Solr
- always, with any setup, one first (cold) run (to see how the system behaves
  with empty caches)
- afterwards two (warm) runs with caches filled by the first and second run
- all this with preferLocalShards set to true and false
- and all this with single-instance multicore and multi-instance multinode.
That was a lot of testing, starting, stopping, loading test data...
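
For reference, the round-robin load balancing described above can be sketched as a Haproxy configuration like this (host names and ports are hypothetical, not Bernd's actual setup):

```text
frontend solr_front
    bind *:8983
    default_backend solr_back

backend solr_back
    balance roundrobin
    server solr1 host1:8983 check
    server solr2 host2:8983 check
    server solr3 host3:8983 check
```

Each JMeter request then lands on the next server in turn, spreading the query load evenly across the cluster.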

The difference between single instance and multi instance was that the
single instance per server got 12GB Java heap (because it had to handle 3
cores), while each multi-instance process got 4GB Java heap (because each
instance had to handle just 1 core).

There was no real difference in CPU/memory utilization, but note that I used
different heap sizes for single instance and multi instance (see above).
The response time with multi instance, however, is much better and gives
higher performance: between 30 and 60 QPS, multi instance is about 1.5 times
better than single instance, in my test case with my test data, and so on.
But the Cloud setup is much more complex.

preferLocalShards really gives an advantage in a 3x3 or 5x5 SolrCloud, but I
don't know how it would compare to, say, 5x3 (5 servers, 5 shards, 3 replicas).

Servers in total:
- 3 VM servers on 3 different XEN hosts connected with 2 Gigabit networks
  (the discs were not SSD as in our production system, just 15k rpm spinning
  discs)
- 3 ZooKeeper instances, one on each server but as separate processes (not
  the Solr-internal ones)
- 1 extra server for Haproxy
- 1 extra server for Apache JMeter

It's hard to tell where the bottleneck is, at least not at 60 QPS and with
spinning discs. SSDs as storage and separate physical server boxes will
increase performance.

I think what matters is how complex your indexed data, your query and your
query analysis are. My query is not very easy: rows=100, facet.limit=100,
9 facet.fields and a boost with bq. If you use rows=10 and facet=false
without bq you will get higher performance.
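
As a concrete sketch, a request with parameters like those described would look roughly like this (collection, field and boost values are hypothetical):

```text
http://localhost:8983/solr/mycollection/select?q=some+keyword
    &rows=100
    &facet=true&facet.limit=100
    &facet.field=subject&facet.field=language   (… 9 facet.field parameters in total)
    &bq=doctype:book^2.0
```

Every facet field adds per-shard counting work, and a bq adds a second scoring pass, which is why trimming these parameters raises QPS.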

Regards
Bernd


Am 27.08.2018 um 22:45 schrieb Wei:

Thanks Bernd.  Do you have preferLocalShards=true in both cases? Do you
notice CPU/memory utilization difference between the two deployments? How
many servers did you use in total?  I am curious what's the bottleneck for
the one instance and 3 cores configuration.

Thanks,
Wei

On Mon, Aug 27, 2018 at 1:45 AM Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:


My tests with many combinations (instance, node, core) on a 3 server
cluster
with SolrCloud pointed out that highest performance is with multiple solr
instances and shards and replicas placed by rules so that you get advantage
from preferLocalShards=true.

The disadvantage is the handling of the system, which means setup, starting
and stopping, setting up the shards and replicas with rules, and so on.

I tested with 3x3 SolrCloud (3 shards, 3 replicas).
A 3x3 system with one instance and 3 cores per host could handle up to
30 QPS. A 3x3 system with multiple instances (different ports, single core
and shard per instance) could handle 60 QPS on the same hardware with the
same data.

Also, the single-instance-per-server setup has spikes in the response time
graph which are not seen with a multi-instance setup.

Tested about 2 months ago with SolrCloud 6.4.2.

Regards,
Bernd


Am 26.08.2018 um 08:00 schrieb Wei:

Hi,

I have a question about the deployment configuration in solr cloud.  When
we need to increase the number of shards in solr cloud, there are two
options:

1.  Run multiple solr instances per host, each with a different port and
hosting a single core for one shard.

2.  Run one solr instance per host, and have multiple cores (shards) in the
same solr instance.

Which would be better performance-wise? For the first option I think the JVM
heap for each solr instance can be smaller, but deployment is more
complicated. Are there any differences in CPU utilization?

Thanks,
Wei







Re: Issue with adding an extra Solr Slave

2018-08-28 Thread Emir Arnautović
Hi Zafar,
How do you access admin console? Through ELB or you see this behaviour when 
accessing admin console of a new slave? Do you see any replication related 
errors in new slave’s logs? Did you check connectivity of a new slave and 
master nodes?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
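
For reference, the slave-side replication section being discussed typically looks like this in solrconfig.xml for Solr 5.x (the master host and core name below are hypothetical placeholders, not Zafar's actual values):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- must point at the master's core; each slave polls this URL -->
    <str name="masterUrl">http://master-host:8983/solr/mycore</str>
    <!-- poll every 60 seconds for index changes -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Note that masterUrl must resolve from each slave; if a new slave cannot reach the master, replication silently stalls and the core may behave inconsistently.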



> On 27 Aug 2018, at 16:52, Zafar Khurasani  
> wrote:
> 
> Hi,
> 
> I'm running Solr 5.3 in one of our applications. Currently, we have one Solr 
> Master and one Solr slave running on AWS EC2 instances. I'm trying to add an 
> additional Solr slave. I'm using an Elastic Load Balancer (ELB) in front of my 
> Slaves. I see the following error in the logs after adding the second slave,
> 
> 
> java version "1.8.0_121"
> 
> Solr version: 5.3.0 1696229
> 
> 
> org.apache.solr.common.SolrException: Core with core name [xxx-xxx-] does 
> not exist.
>at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:770)
>at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:240)
>at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:194)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>at 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:675)
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:443)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>at org.eclipse.jetty.server.Server.handle(Server.java:499)
>at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>at java.lang.Thread.run(Thread.java:745)
> 
> 
> Also, when I hit the Solr Admin UI, I can only see my core intermittently; I 
> have to refresh the page multiple times for it to appear.  What's the 
> right way to add a slave to my existing setup?
> 
> FYI - the Solr Replication section in solrconfig.xml is exactly the same for 
> both the Slaves.
> 
> Thanks,
> Zafar Khurasani
> 



LIBLINEAR model lacks weight(s) when training for SolrFeatures in LTR

2018-08-28 Thread Zheng Lin Edwin Yeo
Hi,

I am using Solr 7.4.0, and using LIBLINEAR to do the training for the LTR
model based on this example:
https://github.com/bloomberg/lucene-solr/blob/master-ltr/solr/contrib/ltr/example/README.md

However, I found that when I want to train with a Solr filter query using the
SolrFeature class, I get the following error saying that the model lacks
weight(s):

Exception: Status: 400 Bad Request
Response: {
  "responseHeader":{
"status":400,
"QTime":1},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.ltr.model.ModelException"],
"msg":"org.apache.solr.ltr.model.ModelException: Model myModel lacks
weight(s) for [category]",

This is how I define it in my feature JSON file:

  {
"store" : "myFeatures",
"name" : "category",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
"fq": ["{!terms f=category}book"]
}
  }
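
For comparison, the model JSON that references such a feature is expected to carry a weight entry for it. A minimal linear-model sketch follows (the model class and weight value are illustrative assumptions, not Edwin's actual model):

```json
{
  "store": "myFeatures",
  "name": "myModel",
  "class": "org.apache.solr.ltr.model.LinearModel",
  "features": [
    { "name": "category" }
  ],
  "params": {
    "weights": {
      "category": 0.5
    }
  }
}
```

The ModelException above is raised when a feature listed under "features" has no matching entry in "params"/"weights".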

What could be the reason that causes this, and how can we resolve this
issue?

Regards,
Edwin


Solr LTS and EOL

2018-08-28 Thread Dan Untenzu
Hey,

I would like to get some feedback about LTS & EOL timeframes in Solr.

The Solr website states that "6.4.x" is an LTS version and "7.x" is the
current major version (https://lucene.apache.org/solr/community.html).

Question 1: Shouldn't it say "6.x", since version 6.6.5 is the latest
release of the 6.x branch?

Question 2: How long is the LTS timeframe - 6 / 12 / 36 months? When is
EOL of version 6.x?

It would be nice to have some roadmap/timeframe on the download or
community page. Right now an admin cannot tell whether they should
prefer the LTS over the major version, because maybe EOL of version 6 is
just next week.

Dan

-- 
Dan Untenzu
Certified TYPO3 Integrator
webit! Gesellschaft für neue Medien mbH
Schandauer Straße 34 | 01309 Dresden | Germany
Telefon +49 351 46766-24 | Telefax +49 351 46766-66
unte...@webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
USt-ID DE 193477690
Geschäftsführer Sven Haubold


Re: Solr LTS and EOL

2018-08-28 Thread Shawn Heisey

On 8/28/2018 2:59 AM, Dan Untenzu wrote:

I would like to get some feedback about LTS & EOL timeframes in Solr.

The Solr website states that "6.4.x" is an LTS version and "7.x" is the
current major version (https://lucene.apache.org/solr/community.html).

Question 1: Shouldn't it say "6.x", since version 6.6.5 is the latest
release of the 6.x branch?

Question 2: How long is the LTS timeframe - 6 / 12 / 36 months? When is
EOL of version 6.x?

It would be nice to have some roadmap/timeframe on the download or
community page. Right now an admin cannot tell whether they should
prefer the LTS over the major version, because maybe EOL of version 6 is
just next week.


Here's the long-winded version of how things are done:

I have never heard of any specific timeframes, and I have never before 
heard of any release being designated LTS.  Releases are not made on a 
set schedule.  Because of that, there is not a specific number of months 
that each release gets supported.


The current stable branch is 7.x.  Solr 5.x and earlier are effectively 
dead -- changes will not be made.  The previous major version, Solr 6.x 
(specifically, the 6.6.x branch), is in maintenance mode, which 
basically means that there's a much higher standard for whether a 
problem gets fixed in that branch than there is for the stable branch.


Problems in a 7.x version will only be tackled if they are problems in 
the *current* 7.x release.  As of right now, that is 7.4.0.  So if you 
find an issue tomorrow in version 7.2.1, a fix will only be found in the 
next release -- 7.5.0.  If enough problems of the right kind are found 
after a minor (7.x.0) release, there may be point releases in that minor 
version, but normally once a new minor release is made, a previous minor 
release in the current major version will not be supplemented with point 
releases.


Problems in 6.x must be problems in the current 6.x release, currently 
6.6.5, and they must be either MAJOR bugs with no workaround, or a 
problem that is extremely trivial to fix -- a patch that is very 
unlikely to introduce NEW bugs.  If a new 6.x version is released, it 
will be a new point release on the last minor version -- 6.6.x.


When 8.0 gets released, 6.x is dead and the latest minor release branch 
for 7.x goes to maintenance mode.  There is no specific date planned for 
any release.  A release is made when one of the committers decides it's 
time and volunteers to be the release manager.


The community page needs a bit of an overhaul so it says what I just 
told you.


As for which release you should run ... typically that's the latest 
release.  All releases are considered stable unless they are very 
specifically labeled ALPHA or BETA.  Only two releases so far have ever 
had those designations -- 4.0-ALPHA and 4.0-BETA.


I personally would avoid a new major version until a few minor releases 
are made -- so I would have no plans to run 8.0, but I might run 8.2 or 8.3.


Thanks,
Shawn



Solr Indexing error

2018-08-28 Thread kunhu0...@gmail.com
Hello All,

I need help with an error related to Solr indexing. We are using Solr 6.6.3 and
Nutch crawler 1.14. While indexing data to Solr we see errors like the one below:

possible analysis error: Document contains at least one immense term in
field="content" (whose UTF8 encoding is longer than the max length 32766),
all of which were skipped.  Please correct the analyzer to not produce such
terms.  The prefix of the first immense term is: '[84, 69, 82, 77, 83, 32,
79, 70, 32, 85, 83, 69, 10, 69, 102, 102, 101, 99, 116, 105, 118, 101, 32,
68, 97, 116, 101, 58, 32, 74]...', original message: bytes can be at most
32766 in length; got 40638. Perhaps the document has an indexed string field
(solr.StrField) which is too large.

Can anyone please help?






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr Indexing error

2018-08-28 Thread Shawn Heisey

On 8/28/2018 6:03 AM, kunhu0...@gmail.com wrote:

possible analysis error: Document contains at least one immense term in
field="content" (whose UTF8 encoding is longer than the max length 32766),


It's telling you exactly what is wrong.

The field named "content" is probably using a field class with no 
analysis, or using the Keyword Tokenizer, so the whole field gets treated 
as a single term.  The length of that field for at least one of your 
documents is longer than 32766 bytes (the limit is in bytes, and a UTF-8 
character can be more than a single byte).  Lucene has a limit on term 
length, and your input exceeded that length.


If you change the field type for content to something that's analyzed 
(split into words, basically) then this problem would likely go away.
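
A minimal sketch of the kind of schema change Shawn suggests (the type name and analysis chain here are assumptions; adjust to your own schema):

```xml
<!-- analyzed type: splits text into words, so no single term exceeds the 32766-byte limit -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- "content" switched from solr.StrField to the analyzed type -->
<field name="content" type="text_general" indexed="true" stored="true"/>
```

After changing the field type, the collection needs to be reindexed for the change to take effect.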


Thanks,
Shawn



Re: Solr LTS and EOL

2018-08-28 Thread Dan Untenzu
Hey Shawn,

thanks a lot for your clarification, all questions answered.

Your message should indeed find its way onto the community page.

Thanks.

Dan

Am 28.08.2018 um 13:18 schrieb Shawn Heisey:


Solr Cloud not routing to PULL replicas

2018-08-28 Thread Ash Ramesh
Hi again,

We are currently using Solr 7.3.1 and have an 8-shard collection. All our
TLOGs are on separate machines and our PULLs on others. Since not all shards
are on the same machine, requests will be distributed. However, we are
seeing that most of the 'distributed' parts of the requests are being
routed to the TLOG machines. This is evident as the TLOGs are saturated at
80%+ CPU while the PULL machines are sitting at 25%, even though the load
balancer only routes to the PULL machines. I know we can use
'preferLocalShards', but that still doesn't solve the problem.

Is there something we have configured incorrectly? We are currently rushing
to upgrade to 7.4.0 so we can take advantage of
'shards.preference=replica.location:local,replica.type:PULL' parameter. In
the meantime, we would like to know if there is a reason for this behavior
and if there is anything we can do to avoid it.
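
For reference, the 7.4 parameter mentioned is passed per request (or set as a default on the request handler); a request sketch with hypothetical host and collection names:

```text
http://host:8983/solr/mycollection/select
    ?q=*:*
    &shards.preference=replica.location:local,replica.type:PULL
```

With this preference, the node coordinating a distributed query routes shard sub-requests to PULL replicas where available, instead of picking any replica at random.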

Thank you & regards,

Ash

-- 
P.S. We've launched a new blog to share the latest ideas and case studies
from our team. Check it out here: product.canva.com
Empowering the world to design
Also, we're hiring. Apply here!







Re: “solr.data.dir” can only config a single directory

2018-08-28 Thread Erick Erickson
Patches welcome.

On Mon, Aug 27, 2018, 23:03 zhenyuan wei  wrote:

> But this is not a common way to do it; I mean, nobody wants to ADDREPLICA
> after the collection was created.
>
> Erick Erickson  于2018年8月28日周二 下午1:24写道:
>
> > Every _replica_ can point to a different disk. When you do an
> > ADDREPLICA, then you can supply whatever path to the data
> > you desire. And you can have as many replicas per Solr instance
> > as makes sense.
> >
> > Best,
> > Erick
> > On Mon, Aug 27, 2018 at 8:48 PM zhenyuan wei  wrote:
> > >
> > > @Christopher Schultz
> > > So you mean that one 4TB disk is the same as four 1TB disks?
> > > HDFS, Cassandra and ES can do so; multiple data paths may maximize
> > > indexing throughput in some cases.
> > >
> > >
> > > Christopher Schultz  于2018年8月28日周二
> > 上午11:16写道:
> > >
> > > >
> > > > Shawn,
> > > >
> > > > On 8/27/18 22:37, Shawn Heisey wrote:
> > > > > On 8/27/2018 8:29 PM, zhenyuan wei wrote:
> > > > >> I found that “solr.data.dir” can only be configured with a single
> > > > >> directory. I think it should be possible to configure multiple dirs,
> > > > >> such as ”solr.data.dir:/mnt/disk1,/mnt/disk2,/mnt/disk3", due to
> > > > >> single-disk overload or capacity limitations. Any reason why this is
> > > > >> not supported?
> > > > >
> > > > > Nobody has written the code to support it.  It would very likely
> > > > > not be easy code to write.  Supporting one directory for that
> > > > > setting is pretty easy ... it would require changing a LOT of
> > > > > existing code to support more than one.
> > > >
> > > > Also, there are better ways to do this:
> > > >
> > > > - - multi-node Solr with sharding
> > > > - - LVM or similar with multi-disk volumes
> > > > - - ZFS surely has something for this
> > > > - - buy a bigger disk (disk is cheap!)
> > > > - - etc.
> > > >
> > > > - -chris
> > > >
> >
>
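
Erick's suggestion of placing each replica on its own disk via ADDREPLICA can be sketched with the Collections API; the collection, shard, node and path below are hypothetical placeholders:

```text
http://localhost:8983/solr/admin/collections?action=ADDREPLICA
    &collection=mycollection
    &shard=shard1
    &node=host1:8983_solr
    &property.dataDir=/mnt/disk2/mycollection_shard1_replica1
```

Repeating this per replica with a different dataDir spreads one Solr instance's replicas across several disks, approximating the multi-data-path behavior zhenyuan asked about.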


Re: Solr Cloud not routing to PULL replicas

2018-08-28 Thread Tomás Fernández Löbbe
Hi Ash,
Do you see all shard queries going to the TLOG replicas, or only “most” (i.e.,
are some going to the PULL replicas)? You can confirm this by looking in
the logs for queries with the “isShard=true” parameter. Are the PULL replicas
active? (Since you are using a load balancer, I’m guessing you are not using
CloudSolrClient for queries.)
Did you look at metrics other than CPU utilization? For example, do the
“/select” request metrics (or whatever handler path you are using) confirm
the issue (high on the TLOG replicas and low on the PULL replicas)?

Can you share a query from your logs (the main query and the shard queries
if possible)?

Tomás


On Tue, Aug 28, 2018 at 6:22 AM Ash Ramesh  wrote:

>


Best way to train model for Solr LTR

2018-08-28 Thread Zheng Lin Edwin Yeo
Hi,

I am using Solr 7.4.0, and I would like to find out what is the best way to
train the model for the Solr Learning to Rank (LTR)?

So far these are the ways that I found:
- LibLinear, LibSVM
- LambdaMART
- RankLib
- NDCG

Regards,
Edwin


Re: Solr LTS and EOL

2018-08-28 Thread Jan Høydahl
Although I wrote that paragraph on the community page, it was never the 
intention to give the impression of a formal LTS system; it was more meant as 
an analogy to make the release policy for that branch easier to understand. So 
we should probably avoid the term LTS altogether. What about LTP, "Long Term 
Patching", avoiding the word "Support"? :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 28. aug. 2018 kl. 15:20 skrev Dan Untenzu :



Re: Multiple solr instances per host vs Multiple cores in same solr instance

2018-08-28 Thread Erick Erickson
Bernd:

If you only knew how many times I've had the conversation "No, I can't
tell you what's best, you have to test with _your_ data on _your_
hardware with _your_ queries"  ;)

I suspect, but have no real proof, that GC is the biggest difference.
Solr has what we call "the laggard problem": since one replica from each
shard _must_ respond (twice) before the query returns, the slowest
replica to respond governs the total response time for any individual
query. But that's a guess. The CPU utilization might give a clue, but
if it is GC then some of the CPU cycles are being used for GC, so that
isn't definitive.

Best,
Erick
On Tue, Aug 28, 2018 at 12:37 AM Bernd Fehling
 wrote:
>
> Yes, I tested many cases.
> As I already mentioned 3 Server as 3x3 SolrCloud cluster.
> - 12 Mio. data records from our big single index
> - always the same queries (SWD, german keyword norm data)
> - Apache jmeter 3.1 for the load (separate server)
> - Haproxy 1.6.11 with roundrobin (separate server)
> - no autowarming in solr
> - always with any setup, one first (cold) run (to see how the system behaves 
> with empty caches)
> - afterwards two (warm) runs with filled caches from first and second run
> - all this with preferLocalShards set to true and false
> - and all this with single instance multicore and multi instance multinode.
> That was a lot of testing, starting, stopping, loading test data...
>
> The difference between single instance and multi instance was that
> single instance per server got 12GB JAVA heap (because it had to handle 3 
> cores)
> and multi instance got 4GB JAVA heap per instance (because each instance had 
> to handle just 1 core).
>
> No real difference in CPU/memory utilization, but I used different
> heap size between single instance and multi instance (see above).
> But the response time with multi instance is much better and gives higher 
> performance.
> Between 30 and 60 QPS multi instance is about 1.5 times better than single 
> instance
> in my test case with my test data ... and so on, but the Cloud is much more 
> complex.
>
> preferLocalShards really gives advantage in 3x3 or 5x5 SolrCloud but I don't
> know how it would compare to say 5x3 (5 server, 5 shards, 3 replicas).
>
> Servers in total:
> - 3 VM server on 3 different XEN hosts connected with 2 Gigabit Networks
>(the discs were not SSD as in our production system, just 15rpm spinning 
> discs)
>3 zookeeper, one on each server but separate instances (not the solr 
> internal ones)
> - 1 extra server for haproxy
> - 1 extra server for Apache jmeter
>
> It's hard to tell where the bottleneck is, at least not with 60QPS and with 
> spinning discs.
> SSD as storage and separate physical server boxes will increase performance.
>
> I think the matter is how complex is your data in the index, your query and 
> query analysis.
> My query not very easy, rows=100, facet.limit=100, 9 facet.fields and a boost 
> with bq.
> If you have rows=10 and facet=false without bq you will get higher 
> performance.
>
> Regards
> Bernd
>
>
> Am 27.08.2018 um 22:45 schrieb Wei:
> > Thanks Bernd.  Do you have preferLocalShards=true in both cases? Do you
> > notice CPU/memory utilization difference between the two deployments? How
> > many servers did you use in total?  I am curious what's the bottleneck for
> > the one instance and 3 cores configuration.
> >
> > Thanks,
> > Wei
> >
> > On Mon, Aug 27, 2018 at 1:45 AM Bernd Fehling <
> > bernd.fehl...@uni-bielefeld.de> wrote:
> >
> >> My tests with many combinations (instance, node, core) on a 3 server
> >> cluster
> >> with SolrCloud pointed out that highest performance is with multiple solr
> >> instances and shards and replicas placed by rules so that you get advantage
> >> from preferLocalShards=true.
> >>
> >> The disadvantage is the handling of the system, which means setup,
> >> starting
> >> and stopping, setting up the shards and replicas with rules and so on.
> >>
> >> I tested with 3x3 SolrCloud (3 shards, 3 replicas).
> >> A 3x3 system with one instance and 3 cores per host could handle up to
> >> 30QPS.
> >> A 3x3 system with multi instance (different ports, single core and shard
> >> per
> >> instance) could handle 60QPS on same hardware with same data.
> >>
> >> Also, the single instance per server setup has spikes in the response time
> >> graph
> >> which are not seen with a multi instance setup.
> >>
> >> Tested about 2 months ago with SolrCloud 6.4.2.
> >>
> >> Regards,
> >> Bernd
> >>
> >>
> >> Am 26.08.2018 um 08:00 schrieb Wei:
> >>> Hi,
> >>>
> >>> I have a question about the deployment configuration in solr cloud.  When
> >>> we need to increase the number of shards in solr cloud, there are two
> >>> options:
> >>>
> >>> 1.  Run multiple solr instances per host, each with a different port and
> >>> hosting a single core for one shard.
> >>>
> >>> 2.  Run one solr instance per host, and have multiple cores(shards) in
> >> the
> >>> same solr instance.
> >>>
> >>> Which would be better per

RE: Issue with adding an extra Solr Slave

2018-08-28 Thread Zafar Khurasani
Hi Emir,

I access the admin console through the ELB. I do NOT see any replication errors 
in the new Slave's logs. I also double-checked to make sure the connectivity 
between the master and slaves exists. The only error I see in the new Slave's 
log is what I shared originally.

Thanks,
Zafar.



-Original Message-
From: Emir Arnautović [mailto:emir.arnauto...@sematext.com] 
Sent: Tuesday, August 28, 2018 2:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Issue with adding an extra Solr Slave

Hi Zafar,
How do you access admin console? Through ELB or you see this behaviour when 
accessing admin console of a new slave? Do you see any replication related 
errors in new slave’s logs? Did you check connectivity of a new slave and 
master nodes?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch 
Consulting Support Training - http://sematext.com/



> On 27 Aug 2018, at 16:52, Zafar Khurasani  
> wrote:
> 
> Hi,
> 
> I'm running Solr 5.3 in one of our applications. Currently, we have 
> one Solr Master and one Solr slave running on AWS EC2 instances. I'm 
> trying to add an additional Solr slave. I'm using an Elastic 
> LoadBalancer (ELB) in front of my Slaves. I see the following error in 
> the logs after adding the second slave,
> 
> 
> java version "1.8.0_121"
> 
> Solr version: 5.3.0 1696229
> 
> 
> org.apache.solr.common.SolrException: Core with core name [xxx-xxx-] does 
> not exist.
>at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:770)
>at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:240)
>at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:194)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>at 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:675)
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:443)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>at org.eclipse.jetty.server.Server.handle(Server.java:499)
>at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>at java.lang.Thread.run(Thread.java:745)
> 
> 
> Also, when I hit the Solr Admin UI, I'm able to see my core infrequently. I 
> have to refresh the page multiple times to be able to see it.  What's the 
> right way to add a slave to my existing setup?
> 
> FYI - the Solr Replication section in solrconfig.xml is exactly the same for 
> both the Slaves.
> 
> Thanks,
> Zafar Khurasani
> 



Issues enabling zk 3.4.10 ACLs for Solr 7.2

2018-08-28 Thread Ana Maria
  Hello, I am working on a project implementing ZooKeeper and SolrCloud on a 
cluster with 3 servers. I need to secure my ZooKeeper nodes so that they can 
only communicate among themselves. I tried implementing ACLs according to the 
documentation 
(https://lucene.apache.org/solr/guide/7_2/zookeeper-access-control.html), but I 
am still able to update a file on the cluster from another server outside the 
cluster, which means the ACLs are not working properly. Here are the changes I 
made:

solr-7.2.1/server/solr/solr.xml (solrcloud section):

<solrcloud>
  <str name="host">${host:}</str>
  <int name="hostPort">${jetty.port:8983}</int>
  <str name="hostContext">${hostContext:solr}</str>
  <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  <int name="zkClientTimeout">${zkClientTimeout:3}</int>
  <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:60}</int>
  <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:6}</int>
  <!-- <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str> -->
  <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider}</str>
  <!-- <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str> -->
  <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider}</str>
</solrcloud>
/solr-7.2.1/server/scripts/cloud-scripts/zkcli.sh:
# Settings for ZK ACL
SOLR_ZK_CREDS_AND_ACLS="-DzkACLProvider=org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider
 \
  
-DzkCredentialsProvider=org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider
 \
  -DzkDigestUsername=admin -DzkDigestPassword=CHANGE"
#  -DzkDigestReadonlyUsername=readonly-user 
-DzkDigestReadonlyPassword=CHANGEME-READONLY-PASSWORD"

/solr-7.2.1/bin/solr.in.sh:
# Settings for ZK ACL
SOLR_ZK_CREDS_AND_ACLS="-DzkACLProvider=org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider
 \
  
-DzkCredentialsProvider=org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider
 \
  -DzkDigestUsername=admin -DzkDigestPassword=CHANGE"
#  -DzkDigestReadonlyUsername=readonly-user 
-DzkDigestReadonlyPassword=CHANGEME-READONLY-PASSWORD"
SOLR_OPTS="$SOLR_OPTS $SOLR_ZK_CREDS_AND_ACLS"
I would appreciate some input as to enabling ACLs and securing the zookeeper 
cluster.
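One quick way to check whether the ACLs actually took effect is to inspect the znodes directly with ZooKeeper's own CLI; the install path and ZK host below are examples:

```shell
# Example path and host; adjust to your installation.
ZKCLI=/opt/zookeeper-3.4.10/bin/zkCli.sh
ZKHOST=zk1:2181
echo "$ZKCLI -server $ZKHOST"
# Inside the zkCli shell, inspect the ACL on Solr's root znode:
#   getAcl /solr
# With working ACLs you should see digest entries for the admin (and readonly)
# users instead of 'world,anyone: cdrwa'.
```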
Thank you,
Ana

Re: Issue with adding an extra Solr Slave

2018-08-28 Thread Emir Arnautović
Hi Zafar,
Slaves are separate nodes, so accessing the admin console through the ELB does 
not make much sense: different requests will go to different nodes, which is why 
you sometimes see cores and other times it is empty. Since it is empty, it seems 
that you did not define the core(s) on this new slave. The replication handler 
is defined at the core level, so I am not sure what you mean when you say the 
solrconfig.xml files are the same on both servers.

What you need to do is create the new core on the new slave. Make sure the 
replication handler is properly configured and that the master is reachable (try 
pinging the master's replication handler from the slave). Then issue the fetch 
index command for the new slave 
(http://slave_host:port/solr/core_name/replication?command=fetchindex). 
When checking in the admin console, use the slave's IP, not the ELB.
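The steps above can be sketched as commands; the host and core names (slave2, master, mycore) are placeholders:

```shell
# Placeholder hosts/core. Step 1: create the core on the new slave
# (the instanceDir with solrconfig.xml/schema must already exist there).
CREATE_URL="http://slave2:8983/solr/admin/cores?action=CREATE&name=mycore&instanceDir=mycore"
# Step 2: check that the slave can reach the master's replication handler.
PING_URL="http://master:8983/solr/mycore/replication?command=indexversion"
# Step 3: trigger the initial index copy from the master.
FETCH_URL="http://slave2:8983/solr/mycore/replication?command=fetchindex"
for u in "$CREATE_URL" "$PING_URL" "$FETCH_URL"; do echo "curl \"$u\""; done
```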

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 28 Aug 2018, at 21:03, Zafar Khurasani  
> wrote:
> 
> Hi Emir,
> 
> I access the admin console through the ELB. I do NOT see any replication 
> errors in the new Slave's logs. I also double-checked to make sure the 
> connectivity between the master and slaves exists. The only error I see in the 
> new Slave's log is what I shared originally.
> 
> Thanks,
> Zafar.
> 
> 
> 
> -Original Message-
> From: Emir Arnautović [mailto:emir.arnauto...@sematext.com] 
> Sent: Tuesday, August 28, 2018 2:55 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Issue with adding an extra Solr Slave
> 
> Hi Zafar,
> How do you access admin console? Through ELB or you see this behaviour when 
> accessing admin console of a new slave? Do you see any replication related 
> errors in new slave’s logs? Did you check connectivity of a new slave and 
> master nodes?
> 
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 27 Aug 2018, at 16:52, Zafar Khurasani  
>> wrote:
>> 
>> Hi,
>> 
>> I'm running Solr 5.3 in one of our applications. Currently, we have 
>> one Solr Master and one Solr slave running on AWS EC2 instances. I'm 
>> trying to add an additional Solr slave. I'm using an Elastic 
>> LoadBalancer (ELB) in front of my Slaves. I see the following error in 
>> the logs after adding the second slave,
>> 
>> 
>> java version "1.8.0_121"
>> 
>> Solr version: 5.3.0 1696229
>> 
>> 
>> org.apache.solr.common.SolrException: Core with core name [xxx-xxx-] 
>> does not exist.
>>   at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:770)
>>   at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:240)
>>   at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:194)
>>   at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>>   at 
>> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:675)
>>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:443)
>>   at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
>>   at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>>   at 
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>>   at 
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>>   at 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>>   at 
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>>   at 
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>>   at 
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>>   at 
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>>   at 
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>>   at 
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>>   at 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>>   at 
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>>   at 
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>>   at 
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>>   at org.eclipse.jetty.server.Server.handle(Server.java:499)
>>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>>   at 
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>>   at 
>> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>>   at 
>> org.eclipse.jetty.util.thread.Q

Re: Spring Content Error in Plugin

2018-08-28 Thread Zimmermann, Thomas
In case anyone else runs into this, I tracked it down. I had to force Maven to 
explicitly include all of its dependent jars in the plugin jar, using the 
assembly plugin in the POM like so:


<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.5.3</version>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

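With that in place, the lib directive mentioned in the original post would point at the assembled jar; the file name below is illustrative (by default the assembly plugin appends the descriptorRef to the artifact name):

```xml
<!-- Illustrative name; match it to your actual assembly output in /solr/lib. -->
<lib path="/solr/lib/my-update-processor-1.0-jar-with-dependencies.jar"/>
```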
Cheers!

TZ

From: Tom <tzimmerm...@techtarget.com>
Date: Monday, August 27, 2018 at 11:32 PM
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Subject: Spring Content Error in Plugin

Hi,

We have a custom java plugin that leverages the UpdateRequestProcessorFactory 
to push data to multiple cores when a single core is written to. We are 
building the plugin with maven, deploying it to /solr/lib and sourcing the jar 
via a lib directive in our solr config. It currently works correctly in our 
Solr 5.x cluster.

In Solr 7, when attempting to create the core, the plugin fails with the long 
stack trace later in this post, but it seems to boil down to Solr not finding 
the Spring Context jar (Caused by: java.lang.ClassNotFoundException: 
org.springframework.context.ConfigurableApplicationContext).

The class is imported in the file, and Maven has a dependency to bring it in. The 
Maven build works perfectly: the dependency is resolved and the jar is generated.

Any ideas on a starting point for tracking this down? I’ve dug through a bunch 
of Stack Overflow threads with the same issue, but none directly tied to Solr, 
and had no luck.

Thanks!

POM


<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>org.springframework.context</artifactId>
  <version>3.2.2.RELEASE</version>
</dependency>


Error

ERROR - 2018-08-28 03:15:54.253; [c:vignette_de s:shard1 r:core_node5 
x:vignette_de_shard1_replica_n2] org.apache.solr.handler.RequestHandlerBase; 
org.apache.solr.common.SolrException: Error CREATEing SolrCore 
'vignette_de_shard1_replica_n2': Unable to create core 
[vignette_de_shard1_replica_n2] Caused by: 
org.springframework.context.ConfigurableApplicationContext

at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1084)

at 
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:94)

at 
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:380)

at 
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:395)

at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)

at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)

at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)

at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)

at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)

at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)

at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)

at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)

at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)

at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)

at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)

at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)

at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)

at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)

at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)

at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)

at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)

at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)

at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)

at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)

at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)

at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)

at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)

at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)

at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)

at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)

at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)

at org.eclipse.jetty.server.Server.handle(Server.java:531)

at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)

at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.jav

missing jmx stats for num_docs and max_doc

2018-08-28 Thread Zehua Liu
Hi,

We are running a 7.4.0 Solr cluster with 3 tlog replicas and a few pull
replicas. There is one collection divided into 8 shards; each tlog node has all
8 shards, and each pull node has either shards 1-4 or shards 5-8.

When using JMX to collect num_docs metrics via Datadog, we found that the
metrics for some shards are missing. For example, on one tlog node we saw
num_docs stats only for shards 3/4/5/8, and on another only for shards
1/2/3/4/5/8. max_doc coverage seems better, but it is also missing for some shards.

This only happens to the tlog instances so far. Restarting the solr process
does not help.

Did anyone encounter this before? What should I do next to continue to
troubleshoot this?
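One possible next step is to read the same counters over HTTP through the Metrics API and compare them with what JMX reports; the host below is a placeholder:

```shell
# Placeholder host; SEARCHER.searcher.numDocs / maxDoc are the core-level gauges
# behind the num_docs / max_doc beans.
M='http://localhost:8983/solr/admin/metrics?group=core&prefix=SEARCHER.searcher'
echo "curl \"$M\""
# If a shard's metric shows up here but not in JMX, the problem is on the JMX
# reporting side; if it is missing here too, the core itself is not reporting it.
```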


Thanks,
Zehua


Re: “solr.data.dir” can only config a single directory

2018-08-28 Thread zhenyuan wei
Pretty cool. I have created an issue to put this discussion into practice:
https://issues.apache.org/jira/browse/SOLR-12713

Best,
TinsWzy
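The ADDREPLICA approach Erick describes below can be sketched with the Collections API; all names and paths here are hypothetical:

```shell
# Hypothetical names; each replica gets its own dataDir on a different disk.
BASE='http://localhost:8983/solr/admin/collections'
ADD="$BASE?action=ADDREPLICA&collection=mycoll&shard=shard1"
ADD="$ADD&node=host1:8983_solr&dataDir=/mnt/disk2/mycoll_shard1_replica2"
echo "curl \"$ADD\""
```

Repeating this per shard with a different dataDir spreads one node's replicas across several disks without multi-directory support in solr.data.dir.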

Erick Erickson  于2018年8月28日周二 下午11:51写道:

> Patches welcome.
>
> On Mon, Aug 27, 2018, 23:03 zhenyuan wei  wrote:
>
> > But this is not a common way to do so; I mean, nobody wants to ADDREPLICA
> > after the collection was created.
> >
> > Erick Erickson  于2018年8月28日周二 下午1:24写道:
> >
> > > Every _replica_ can point to a different disk. When you do an
> > > ADDREPLICA, then you can supply whatever path to the data
> > > you desire. And you can have as many replicas per Solr instance
> > > as makes sense.
> > >
> > > Best,
> > > Erick
> > > On Mon, Aug 27, 2018 at 8:48 PM zhenyuan wei 
> wrote:
> > > >
> > > > @Christopher Schultz
> > > > So you mean that one 4TB disk is the same as four 1TB disks?
> > > > HDFS, Cassandra, and ES can do so; multiple data paths may maximize
> > > > indexing throughput in some cases. Click the links
> > > > 
> > > > for some explanation.
> > > >
> > > >
> > > > Christopher Schultz  于2018年8月28日周二
> > > 上午11:16写道:
> > > >
> > > > >
> > > > > Shawn,
> > > > >
> > > > > On 8/27/18 22:37, Shawn Heisey wrote:
> > > > > > On 8/27/2018 8:29 PM, zhenyuan wei wrote:
> > > > > >> I found the  “solr.data.dir” can only config a single directory.
> > > > > >> I think it is necessary to be config  multi dirs,such as
> > > > > >> ”solr.data.dir:/mnt/disk1,/mnt/disk2,/mnt/disk3" , due to one
> > > > > >> disk overload or capacity limitation.  Any reason to support why
> > > > > >> not do so?
> > > > > >
> > > > > > Nobody has written the code to support it.  It would very likely
> > > > > > not be easy code to write.  Supporting one directory for that
> > > > > > setting is pretty easy ... it would require changing a LOT of
> > > > > > existing code to support more than one.
> > > > >
> > > > > Also, there are better ways to do this:
> > > > >
> > > > > - - multi-node Solr with sharding
> > > > > - - LVM or similar with multi-disk volumes
> > > > > - - ZFS surely has something for this
> > > > > - - buy a bigger disk (disk is cheap!)
> > > > > - - etc.
> > > > >
> > > > > - -chris
> > > > >
> > >
> >
>