Re: off-heap OOM

2020-05-01 Thread Mikhail Khludnev
> java.lang.OutOfMemoryError: unable to create new native thread
This usually means a code flaw, but there is a workaround: trigger heap GC.
It happens when the app creates threads instead of pooling them properly and no
GC occurs, so java Thread objects linger in the heap in a stopped state, each of
them still holding a native thread handle; sooner or later the system runs out
of native threads. In that case reducing the heap size frees the native threads
and lets the app recycle them. But you are right, it's better to disable it.
Also, check the docker host log; there is a specific error message for java
under docker.
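
To illustrate the pooling point above, a minimal Java sketch (not the CDCR code;
the class name, loop counts and the fake doWork are made up): a thread per task
occupies one native thread for as long as the task runs, so unbounded creation
can hit the OS limit, while a fixed pool recycles a handful of native threads.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadRecycleDemo {

    public static void main(String[] args) throws InterruptedException {
        // Anti-pattern: a new Thread per task. Every started Thread occupies one
        // native thread for as long as its task runs, so unbounded creation like
        // this can end in "unable to create new native thread".
        for (int i = 0; i < 10_000; i++) {
            new Thread(ThreadRecycleDemo::doWork).start();
        }

        // Proper pooling: a fixed pool reuses eight native threads no matter
        // how many tasks are queued.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 10_000; i++) {
            pool.submit(ThreadRecycleDemo::doWork);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private static void doWork() {
        try {
            Thread.sleep(60_000); // stand-in for a long-running task
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}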

On Fri, May 1, 2020 at 3:55 AM Raji N  wrote:

> It used to occur every 3 days ,we reduced heap and it started
> occurring every 5 days .  From the logs we can't get much. Some times we
> see "unable to create  new native thread" in the logs and many times no
> exceptions .
> When it says "unable to create native thread" error , we got below
> exceptions as we use cdcr. To eliminate cdcr from this issue , we disabled
> CDCR also. But we still get OOM.
>
>  WARN  (cdcr-update-log-synchronizer-93-thread-1) [   ]
> o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
>
> java.lang.OutOfMemoryError: unable to create new native thread
>
>at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
>
>at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211]
>
>at
>
> org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)
> ~[httpclient-4.5.3.jar:4.5.3]
>
>at
>
> org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219)
> ~[httpclient-4.5.3.jar:4.5.3]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpSolrClient.(HttpSolrClient.java:200)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139)
> [solr-core-7.6.0.jar:7.6.0-SNAPSHOT
> 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30
> 14:02:46]
>
>at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [?:1.8.0_211]
>
>at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> [?:1.8.0_211]
>
>at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> [?:1.8.0_211]
>
>at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> [?:1.8.0_211]
>
>at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_211]
>
>at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_211]
>
> Thanks,
> Raji
> On Thu, Apr 30, 2020 at 12:24 AM Mikhail Khludnev  wrote:
>
> > Raji, how that "OOM for solr occur in every 5 days." exactly looks like?
> > What is the error message? Where it's occurring exactly?
> >
> > On Thu, Apr 30, 2020 at 1:30 AM Raji N  wrote:
> >
> > > Thanks so much Jan. Will try your suggestions , yes we are also running
> > > solr inside docker.
> > >
> > > Thanks,
> > > Raji
> > >
> > > On Wed, Apr 29, 2020 at 1:46 PM Jan Høydahl 
> > wrote:
> > >
> > > > I have seen the same, but only in Docker.
> > > > I think it does not relate to Solr’s off-heap usage for filters and
> > other
> > > > data structures, but rather how Docker treats memory-mapped files as
> > > > virtual memory.
> > > > As you know, when using MMapDirectoryFactory, you actually let Linux
> > > > handle the loading and unloading of the index files, and Solr will
> > access
> > > > them as if they were in a huge virtual memory pool. Naturally the
> index
> > > > f

Re: off-heap OOM

2020-05-01 Thread Raji N
Thanks for your reply. Sure, will take a look at the docker host log. But even
when we got the "unable to create new native thread" error, the heap dump taken
within the hour before the OOM (we have hourly heap generation) did not have
more than 150 to 160 threads. So it doesn't look like it happens due to running
out of threads; rather, we suspect it happens because there is no native memory
left.

Thanks,
Raji

On Fri, May 1, 2020 at 12:13 AM Mikhail Khludnev  wrote:

> > java.lang.OutOfMemoryError: unable to create new native thread
> Usually mean code flaw, but there is a workaround to trigger heap GC.
> It happens when app creates threads instead of proper pooling, and no GC
> occurs, so java Thread objects hangs in heap in stopped state, but every of
> them holds a native thread handler; and system run out of native threads
> sooner or later. So, in this case reducing heap size, frees native thread
> and app is able to recycle them. But you are right, it's rather better to
> disable it.
> Also, check docker host log, there's a specific error message for java
> under docker.
>
> On Fri, May 1, 2020 at 3:55 AM Raji N  wrote:
>
> > It used to occur every 3 days ,we reduced heap and it started
> > occurring every 5 days .  From the logs we can't get much. Some times we
> > see "unable to create  new native thread" in the logs and many times no
> > exceptions .
> > When it says "unable to create native thread" error , we got below
> > exceptions as we use cdcr. To eliminate cdcr from this issue , we
> disabled
> > CDCR also. But we still get OOM.
> >
> >  WARN  (cdcr-update-log-synchronizer-93-thread-1) [   ]
> > o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
> >
> > java.lang.OutOfMemoryError: unable to create new native thread
> >
> >at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
> >
> >at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211]
> >
> >at
> >
> >
> org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)
> > ~[httpclient-4.5.3.jar:4.5.3]
> >
> >at
> >
> >
> org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219)
> > ~[httpclient-4.5.3.jar:4.5.3]
> >
> >at
> >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >at
> >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >at
> >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >at
> >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >at
> >
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient.(HttpSolrClient.java:200)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >at
> >
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >at
> >
> >
> org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139)
> > [solr-core-7.6.0.jar:7.6.0-SNAPSHOT
> > 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30
> > 14:02:46]
> >
> >at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > [?:1.8.0_211]
> >
> >at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> > [?:1.8.0_211]
> >
> >at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> > [?:1.8.0_211]
> >
> >at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> > [?:1.8.0_211]
> >
> >at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > [?:1.8.0_211]
> >
> >at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > [?:1.8.0_211]
> >
> > Thanks,
> > Raji
> > On Thu, Apr 30, 2020 at 12:24 AM Mikhail Khludnev 
> wrote:
> >
> > > Raji, how that "OOM for solr occur in every 5 days." exactly looks
> like?
> > > What is the error message? Where it's occurring exactly?
> > >
> > > On Thu, Apr 30, 20

How to use percentage in collection specific policy

2020-05-01 Thread saicharan.k...@spglobal.com
Hello,

I was trying to use a collection-specific policy with a sysprop rule and a
percentage for replicas:

{
  "set-policy": {
    "generalpolicy": [
      {
        "replica": "33%",
        "shard": "#EACH",
        "sysprop.key": "general"
      }
    ]
  }
}

But when I try to create a collection:

http://search-solr2-av.midevcld.spglobal.com:8983/solr/admin/collections?action=CREATE&name=generalcollection&numShards=6&replicationFactor=3&policy=generalpolicy&maxShardsPerNode=6

it reports a violation on the sysprop.key rule.

Thanks,
Charan





Re: Use TopicStream as percolator

2020-05-01 Thread SOLR4189
Hi everyone,

I wrote a Solr Update Processor that wraps the Luwak library and implements Saved
Searches à la the Elasticsearch Percolator.

https://github.com/SOLR4189/solcolator

for anyone who wants to use it.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Use TopicStream as percolator

2020-05-01 Thread Charlie Hull
Great! I ran Flax, where we created Luwak, up to last year when we 
merged with OSC, so this is great to see.


Did you know we donated Luwak to Lucene recently? 
https://issues.apache.org/jira/browse/LUCENE-8766


It would be great to work this up into a Solr contrib module.

Charlie
..
Berlin Buzzwords, MICES and Haystack come together for an awesome merged 
online search conference! Check out www.haystackconf.com for news


On 01/05/2020 09:56, SOLR4189 wrote:

Hi everyone,

I wrote SOLR Update Processor that wraps Luwak library and implements Saved
Searches a la ElasticSearch Percolator.

https://github.com/SOLR4189/solcolator

for anyone who wants to use.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com



Re: off-heap OOM

2020-05-01 Thread Mikhail Khludnev
I don't know exactly, but couldn't it be hitting a host-wide thread limit?
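
If you want to rule that out, here is a small Linux-only Java sketch (the class
name is mine) that compares the kernel's global thread ceiling with the JVM's
live thread count; ulimit -u and any container pid limits are worth checking too.

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ThreadLimitCheck {
    public static void main(String[] args) throws IOException {
        // Kernel-wide ceiling on the number of native threads (Linux only).
        String threadsMax = new String(
                Files.readAllBytes(Paths.get("/proc/sys/kernel/threads-max")),
                StandardCharsets.UTF_8).trim();
        // Live threads inside this JVM right now.
        int jvmThreads = ManagementFactory.getThreadMXBean().getThreadCount();
        System.out.println("kernel.threads-max = " + threadsMax);
        System.out.println("live threads in this JVM = " + jvmThreads);
    }
}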

On Fri, May 1, 2020 at 11:02 AM Raji N  wrote:

> Thanks for your  reply . Sure will take a look at the docker host log.  But
> even when we got "unable to create new native thread" error , the heap dump
> taken within hour before (we have hourly heap generation) the OOM did not
> have more than 150 to 160 threads. So it doesn't look like it happens due
> to running out of threads. Rather suspecting it happens because there is no
> native memory?.
>
> Thanks,
> Raji
>
> On Fri, May 1, 2020 at 12:13 AM Mikhail Khludnev  wrote:
>
> > > java.lang.OutOfMemoryError: unable to create new native thread
> > Usually mean code flaw, but there is a workaround to trigger heap GC.
> > It happens when app creates threads instead of proper pooling, and no GC
> > occurs, so java Thread objects hangs in heap in stopped state, but every
> of
> > them holds a native thread handler; and system run out of native threads
> > sooner or later. So, in this case reducing heap size, frees native thread
> > and app is able to recycle them. But you are right, it's rather better to
> > disable it.
> > Also, check docker host log, there's a specific error message for java
> > under docker.
> >
> > On Fri, May 1, 2020 at 3:55 AM Raji N  wrote:
> >
> > > It used to occur every 3 days ,we reduced heap and it started
> > > occurring every 5 days .  From the logs we can't get much. Some times
> we
> > > see "unable to create  new native thread" in the logs and many times no
> > > exceptions .
> > > When it says "unable to create native thread" error , we got below
> > > exceptions as we use cdcr. To eliminate cdcr from this issue , we
> > disabled
> > > CDCR also. But we still get OOM.
> > >
> > >  WARN  (cdcr-update-log-synchronizer-93-thread-1) [   ]
> > > o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
> > >
> > > java.lang.OutOfMemoryError: unable to create new native thread
> > >
> > >at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
> > >
> > >at java.lang.Thread.start(Thread.java:717)
> ~[?:1.8.0_211]
> > >
> > >at
> > >
> > >
> >
> org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)
> > > ~[httpclient-4.5.3.jar:4.5.3]
> > >
> > >at
> > >
> > >
> >
> org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219)
> > > ~[httpclient-4.5.3.jar:4.5.3]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient.(HttpSolrClient.java:200)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139)
> > > [solr-core-7.6.0.jar:7.6.0-SNAPSHOT
> > > 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30
> > > 14:02:46]
> > >
> > >at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > > [?:1.8.0_211]
> > >
> > >at
> > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> > > [?:1.8.0_211]
> > >
> > >at
> > >
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> > > [?:1.8.0_211]
> > >
> > >at
> > >
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> > > [?:1.8.0_211]
> > >
> > >at
> > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > > [

Re: SolrCloud degraded during backup and batch CSV update

2020-05-01 Thread matthew sporleder
If the errors coincide with garbage collection then potentially, yes.
You should never pause longer than your ZooKeeper timeout (on either side).


On Thu, Apr 30, 2020 at 11:03 PM Ganesh Sethuraman
 wrote:
>
> Any other JVM settings change possible?
>
> On Tue, Apr 28, 2020, 10:15 PM Sethuraman, Ganesh
>  wrote:
>
> > Hi
> >
> > We are using SolrCloud 7.2.1 with 3 node Zookeeper ensemble. We have 92
> > collection each on avg. having 8 shards and 2 replica with 2 EC2 nodes,
> > with JVM size of 18GB (G1 GC). We need your help with the Issue we faced
> > today: The issue is SolrCloud server went into a degraded collections (for
> > few collections) when the Solr backup and the Solr batch CSV update load
> > happened at the same time as backup. The CSV data load was about ~5 GB per
> > shard/replica. We think this happened after zkClient disconnect happened as
> > noted below.  We had to restart Solr to bring it back to normal.
> >
> >
> >   1.  Is it not suggested to run backup and Solr batch CSV update large
> > load at the same time?
> >   2.  In the past we have seen two CSV batch update load in parallel
> > causes issues, is this also not suggested (this issue is not related to
> > that)?
> >   3.  Do you think we should increase Zookeeper timeout?
> >   4.  How do we know if  we need to up the JVM Max memory, and by how much?
> >   5.  We also see that once the Solr goes into degraded collection and
> > recovery failed, it NEVER get back to normal, even after when there is no
> > load. Is this a bug?
> >
> > The GC information and Solr Log below
> >
> >
> > https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjAvMDQvMjkvLS0wMl9zb2xyX2djLmxvZy56aXAtLTEtNDAtMzE=&channel=WEB
> >
> >
> > 2020-04-27 07:34:07.322 WARN
> > (zkConnectionManagerCallback-6-thread-1-processing-n:mysolrsever.com:6010_solr-SendThread(zoo-prd-n1:2181))
> > [   ] o.a.z.ClientCnxn Client session timed out, have not heard from server
> > in 10775ms for sessionid 0x171a6fb51310008
> > 
> > 2020-04-27 07:34:07.426 WARN
> > (zkConnectionManagerCallback-6-thread-1-processing-n:mysolrsever.com:6010_solr-EventThread)
> > [   ] o.a.s.c.c.ConnectionManager zkClient has disconnected
> >
> >
> >
> >
> > SOLR Log Below (Curtailed WARN log)
> > 
> > 2020-04-27 07:26:45.402 WARN
> > (recoveryExecutor-4-thread-697-processing-n:mysolrsever.com:6010_solr
> > x:mycollection_shard13_replica_n48 s:shard13 c:mycollection r:core_node51)
> > [c:mycollection s:shard13 r:core_node51 x:mycollection_shard13_replica_n48]
> > o.a.s.h.IndexFetcher Error in fetching file: _1kr_r.liv (downloaded 0 of
> > 587 bytes)
> > java.io.EOFException
> >   at
> > org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:168)
> >   at
> > org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:160)
> >   at
> > org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1579)
> >   at
> > org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1545)
> >   at
> > org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1526)
> >   at
> > org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1008)
> >   at
> > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:566)
> >   at
> > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:345)
> >   at
> > org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:420)
> >   at
> > org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:225)
> >   at
> > org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:626)
> >   at
> > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:308)
> >   at
> > org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:292)
> >   at
> > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> >   at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >   at
> > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
> >   at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >   at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >   at java.lang.Thread.run(Thread.java:748)
> > 2020-04-27 07:26:45.405 WARN
> > (recoveryExecutor-4-thread-697-processing-n:mysolrsever.com:6010_solr
> > x:mycollection_shard13_replica_n48 s:shard13 c:mycollection r:core_node51)
> > [c:mycollection s:shard13 r:core_node51 x:mycollection_shard13_replica_n48]
> > o.a.s.h.IndexFetc

Re: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-05-01 Thread Mike Drob
Jhonny,

Are you planning on reporting the issue to snowball, or would you prefer
one of us take care of it?
If you do report it, please share the link to the issue or mail archive
back here so that we know when it is resolved and can update our
dependencies.

Thanks,
Mike

On Thu, Apr 30, 2020 at 5:37 PM Jhonny Lopez 
wrote:

> Yes, sounds like worth it.
>
> Thanks guys!
>
> -Original Message-
> From: Mike Drob 
> Sent: jueves, 30 de abril de 2020 5:30 p. m.
> To: solr-user@lucene.apache.org
> Subject: Re: Possible issue with Stemming and nouns ended with suffix 'ion'
>
>
>
>
> Is this worth filing a bug/suggestion to the folks over at
> snowballstem.org?
>
> On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
>
> > I agree with Erick. I think that's just how the cookie crumbles when
> > stemming. If you have some time on your hands, you can integrate
> > OpenNLP with your Solr instance and start using the lemmas of tokens
> > instead of the stems. In this case, I believe if you were to lemmatize
> > both "identify" and "identification," they would both condense to
> "identify."
> >
> > Best,
> > Audrey
> >
> > On 4/30/20, 3:54 PM, "Erick Erickson"  wrote:
> >
> > They are being stemmed to two different tokens, “identif” and
> > “identifi”. Stemming is algorithmic and imperfect and in this case
> > you’re getting bitten by that algorithm. It looks like you’re using
> > PorterStemFilter, if you want you can look up the exact algorithm, but
> > I don’t think it’s a bug, just one of those little joys of English...
> >
> > To get a clearer picture of exactly what’s being searched, try
> > adding &debug=query to your query, in particular looking at the parsed
> > query that’s returned. That’ll tell you a bunch. In this particular
> > case I don’t think it’ll tell you anything more, but for future…
> >
> > Best,
> > Erick
> >
> > On, and un-checking the ‘verbose’ box on the analysis page removes
> > a lot of distraction, the detailed information is often TMI ;)
> >
> > > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <
> > jhonny.lo...@publicismedia.com> wrote:
> > >
> > > Sure, rewriting the message with links for images:
> > >
> > >
> > > We’re facing an issue with stemming in solr. Most of the cases
> > are working correctly, for example, if we search for bidding, solr
> > brings results for bidding, bid, bids, etc. However, with nouns ended
> with ‘ion’
> > suffix, stemming is not working. Even when analyzers seems to have
> > correct stemming of the word, the results are not reflecting that. One
> > example. If I search ‘identifying’, this is the output:
> > >
> > > Analyzer (image link):
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__1drv.ms_u_s-21AlRTlFq8tQbShd4-2DCp40Cmc0QioS0A-3Fe-3D1f3GJp&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=8Xt1N2A4ODj--DlLb242c8JMnJr6nWQIwcKjiDiA__s&s=U-Wmu118X5bfNxDnADO_6ompf9kUxZYHj1DZM2lG4jo&e=
> > >
> > > A clip of results:
> > > "haschildren_b":false,
> > >"isbucket_text_s":"0",
> > >"sectionbody_t":"\n\n\nIn order to identify 1st price
> > auctions, leverage the proprietary tools available or manually pull a
> > log file report to understand the trends and gauge auction spread
> > overtime to assess the impact of variable auction
> dynamics.\n\n\n\n\n\n\n",
> > >"parsedupdatedby_s":"sitecorecarvaini",
> > >"sectionbody_t_en":"\n\n\nIn order to identify 1st price
> > auctions, leverage the proprietary tools available or manually pull a
> > log file report to understand the trends and gauge auction spread
> > overtime to assess the impact of variable auction
> dynamics.\n\n\n\n\n\n\n",
> > >"hide_section_b":false
> > >
> > >
> > > As you can see, it has used the stemming correctly and brings
> > results for other words based in the root, in this case “Identify”.
> > >
> > > However, if I search for “Identification”, this is the output:
> > >
> > > Analyzer (imagelink):
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__1drv.ms_u_s-21AlRTlFq8tQbShd49RpiQObzMgSjVhA&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=8Xt1N2A4ODj--DlLb242c8JMnJr6nWQIwcKjiDiA__s&s=5RlkLH-90sYc4nyIgnPO9MsBlyh7iWSOphEVdjUvTIE&e=
> > >
> > >
> > > Even with proper stemming, solr is only bringing results for the
> > word identification (or identifications) but nothing else.
> > >
> > > The queries are over the same field that has the Porter Stemming
> > Filter applied 

Indexing Korean

2020-05-01 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
 Hi All,

My team would like to index Korean, but it looks like Solr OOTB does not have 
explicit support for Korean. If any of you have schema pipelines you could 
share for your Korean documents, I would love to see them! I'm assuming I would 
just use some combination of the OOTB CJK factories.

Best,
Audrey



RE: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-05-01 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Unless I'm misunderstanding the bug in question, there is no bug. What you are
observing is simply how things get stemmed...

Best,
Audrey

On 4/30/20, 6:37 PM, "Jhonny Lopez"  wrote:

Yes, sounds like worth it.

Thanks guys!

-Original Message-
From: Mike Drob 
Sent: jueves, 30 de abril de 2020 5:30 p. m.
To: solr-user@lucene.apache.org
Subject: Re: Possible issue with Stemming and nouns ended with suffix 'ion'




Is this worth filing a bug/suggestion to the folks over at snowballstem.org?

On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld - 
audrey.lorberf...@ibm.com  wrote:

> I agree with Erick. I think that's just how the cookie crumbles when
> stemming. If you have some time on your hands, you can integrate
> OpenNLP with your Solr instance and start using the lemmas of tokens
> instead of the stems. In this case, I believe if you were to lemmatize
> both "identify" and "identification," they would both condense to 
"identify."
>
> Best,
> Audrey
>
> On 4/30/20, 3:54 PM, "Erick Erickson"  wrote:
>
> They are being stemmed to two different tokens, “identif” and
> “identifi”. Stemming is algorithmic and imperfect and in this case
> you’re getting bitten by that algorithm. It looks like you’re using
> PorterStemFilter, if you want you can look up the exact algorithm, but
> I don’t think it’s a bug, just one of those little joys of English...
>
> To get a clearer picture of exactly what’s being searched, try
> adding &debug=query to your query, in particular looking at the parsed
> query that’s returned. That’ll tell you a bunch. In this particular
> case I don’t think it’ll tell you anything more, but for future…
>
> Best,
> Erick
>
> On, and un-checking the ‘verbose’ box on the analysis page removes
> a lot of distraction, the detailed information is often TMI ;)
>
> > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <
> jhonny.lo...@publicismedia.com> wrote:
> >
> > Sure, rewriting the message with links for images:
> >
> >
> > We’re facing an issue with stemming in solr. Most of the cases
> are working correctly, for example, if we search for bidding, solr
> brings results for bidding, bid, bids, etc. However, with nouns ended 
with ‘ion’
> suffix, stemming is not working. Even when analyzers seems to have
> correct stemming of the word, the results are not reflecting that. One
> example. If I search ‘identifying’, this is the output:
> >
> > Analyzer (image link):
> >
> 
https://urldefense.proofpoint.com/v2/url?u=https-3A__1drv.ms_u_s-21AlRTlFq8tQbShd4-2DCp40Cmc0QioS0A-3Fe-3D1f3GJp&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=8Xt1N2A4ODj--DlLb242c8JMnJr6nWQIwcKjiDiA__s&s=U-Wmu118X5bfNxDnADO_6ompf9kUxZYHj1DZM2lG4jo&e=
> >
> > A clip of results:
> > "haschildren_b":false,
> >"isbucket_text_s":"0",
> >"sectionbody_t":"\n\n\nIn order to identify 1st price
> auctions, leverage the proprietary tools available or manually pull a
> log file report to understand the trends and gauge auction spread
> overtime to assess the impact of variable auction 
dynamics.\n\n\n\n\n\n\n",
> >"parsedupdatedby_s":"sitecorecarvaini",
> >"sectionbody_t_en":"\n\n\nIn order to identify 1st price
> auctions, leverage the proprietary tools available or manually pull a
> log file report to understand the trends and gauge auction spread
> overtime to assess the impact of variable auction 
dynamics.\n\n\n\n\n\n\n",
> >"hide_section_b":false
> >
> >
> > As you can see, it has used the stemming correctly and brings
> results for other words based in the root, in this case “Identify”.
> >
> > However, if I search for “Identification”, this is the output:
> >
> > Analyzer (imagelink):
> >
> 
https://urldefense.proofpoint.com/v2/url?u=https-3A__1drv.ms_u_s-21AlRTlFq8tQbShd49RpiQObzMgSjVhA&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=8Xt1N2A4ODj--DlLb242c8JMnJr6nWQIwcKjiDiA__s&s=5RlkLH-90sYc4nyIgnPO9MsBlyh7iWSOphEVdjUvTIE&e=
> >
> >
> > Even with proper stemming, solr is only bringing results for the
> word identification (or identifications) but nothing else.
> >
> > The queries are over the same field that

Re: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-05-01 Thread Mike Drob
This is how things get stemmed *now*, but I believe there is an open question
as to whether that is how they *should* be stemmed. Specifically, the case
appears to be that -ify words do not stem to the same root as their -ification
forms; this applies to much more than identify/identification. Also justify,
fortify, notify, and many, many others.

$ grep ification /usr/share/dict/words | wc -l
 328

I am by no means an expert on stemming, and if the folks at snowball decide
to tell us that this change is bad or hard because it would overstem some
other words, then I'll happily accept that. But I definitely want to use
their expertise rather than relying on my own.

Mike

On Fri, May 1, 2020 at 10:35 AM Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:

> Unless I'm misunderstanding the bug in question, there is no bug. What you
> are observing is simply just how things get stemmed...
>
> Best,
> Audrey
>
> On 4/30/20, 6:37 PM, "Jhonny Lopez" 
> wrote:
>
> Yes, sounds like worth it.
>
> Thanks guys!
>
> -Original Message-
> From: Mike Drob 
> Sent: jueves, 30 de abril de 2020 5:30 p. m.
> To: solr-user@lucene.apache.org
> Subject: Re: Possible issue with Stemming and nouns ended with suffix
> 'ion'
>
>
>
>
> Is this worth filing a bug/suggestion to the folks over at
> snowballstem.org?
>
> On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
>
> > I agree with Erick. I think that's just how the cookie crumbles when
> > stemming. If you have some time on your hands, you can integrate
> > OpenNLP with your Solr instance and start using the lemmas of tokens
> > instead of the stems. In this case, I believe if you were to
> lemmatize
> > both "identify" and "identification," they would both condense to
> "identify."
> >
> > Best,
> > Audrey
> >
> > On 4/30/20, 3:54 PM, "Erick Erickson" 
> wrote:
> >
> > They are being stemmed to two different tokens, “identif” and
> > “identifi”. Stemming is algorithmic and imperfect and in this case
> > you’re getting bitten by that algorithm. It looks like you’re using
> > PorterStemFilter, if you want you can look up the exact algorithm,
> but
> > I don’t think it’s a bug, just one of those little joys of English...
> >
> > To get a clearer picture of exactly what’s being searched, try
> > adding &debug=query to your query, in particular looking at the
> parsed
> > query that’s returned. That’ll tell you a bunch. In this particular
> > case I don’t think it’ll tell you anything more, but for future…
> >
> > Best,
> > Erick
> >
> > On, and un-checking the ‘verbose’ box on the analysis page
> removes
> > a lot of distraction, the detailed information is often TMI ;)
> >
> > > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <
> > jhonny.lo...@publicismedia.com> wrote:
> > >
> > > Sure, rewriting the message with links for images:
> > >
> > >
> > > We’re facing an issue with stemming in solr. Most of the cases
> > are working correctly, for example, if we search for bidding, solr
> > brings results for bidding, bid, bids, etc. However, with nouns
> ended with ‘ion’
> > suffix, stemming is not working. Even when analyzers seems to have
> > correct stemming of the word, the results are not reflecting that.
> One
> > example. If I search ‘identifying’, this is the output:
> > >
> > > Analyzer (image link):
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__1drv.ms_u_s-21AlRTlFq8tQbShd4-2DCp40Cmc0QioS0A-3Fe-3D1f3GJp&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=8Xt1N2A4ODj--DlLb242c8JMnJr6nWQIwcKjiDiA__s&s=U-Wmu118X5bfNxDnADO_6ompf9kUxZYHj1DZM2lG4jo&e=
> > >
> > > A clip of results:
> > > "haschildren_b":false,
> > >"isbucket_text_s":"0",
> > >"sectionbody_t":"\n\n\nIn order to identify 1st price
> > auctions, leverage the proprietary tools available or manually pull a
> > log file report to understand the trends and gauge auction spread
> > overtime to assess the impact of variable auction
> dynamics.\n\n\n\n\n\n\n",
> > >"parsedupdatedby_s":"sitecorecarvaini",
> > >"sectionbody_t_en":"\n\n\nIn order to identify 1st price
> > auctions, leverage the proprietary tools available or manually pull a
> > log file report to understand the trends and gauge auction spread
> > overtime to assess the impact of variable auction

Re: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-05-01 Thread Walter Underwood
The Porter/Snowball stemmer is an evolved version of a forty-year-old hack.
It is neat that it works at all, but don't expect too much. I think it is too
aggressive for search use.

What does KStem do with this? That is based on better linguistic models.
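
For a quick empirical check, a self-contained Lucene sketch (class name and
sample words are mine; it needs lucene-core and lucene-analyzers-common on the
classpath) that runs the same tokens through PorterStemFilter and KStemFilter
and prints whatever each emits:

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.en.KStemFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StemCompare {

    public static void main(String[] args) throws IOException {
        String text = "identify identifying identification justification";
        System.out.println("Porter: " + stem(text, true));
        System.out.println("KStem:  " + stem(text, false));
    }

    // Whitespace-tokenizes the lower-cased text, runs it through the chosen
    // stemmer and returns the emitted terms, much like the Analysis page does.
    private static String stem(String text, boolean porter) throws IOException {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text.toLowerCase()));
        TokenStream stream = porter ? new PorterStemFilter(tokenizer) : new KStemFilter(tokenizer);
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        StringBuilder out = new StringBuilder();
        stream.reset();
        while (stream.incrementToken()) {
            out.append(term).append(' ');
        }
        stream.end();
        stream.close();
        return out.toString().trim();
    }
}

Earlier in the thread Porter reportedly gave "identifi" vs "identif"; this lets
you see what KStem does with the same words without rebuilding a Solr schema.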

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 1, 2020, at 8:45 AM, Mike Drob  wrote:
> 
> This is how things get stemmed *now*, but I believe there is an open
> question as to whether that is how they *should* be stemmed. Specifically,
> the case appears to be -ify words not stemming to the same as -ification -
> this applies to much more than identify/identification. Also, justify,
> fortify, notify, many many others.
> 
> $ grep ification /usr/share/dict/words | wc -l
> 328
> 
> I am by no means an expert on stemming, and if the folks at snowball decide
> to tell us that this change is bad or hard because it would overstem some
> other words, then I'll happily accept that. But I definitely want to use
> their expertise rather than relying on my own.
> 
> Mike
> 
> On Fri, May 1, 2020 at 10:35 AM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
> 
>> Unless I'm misunderstanding the bug in question, there is no bug. What you
>> are observing is simply just how things get stemmed...
>> 
>> Best,
>> Audrey
>> 
>> On 4/30/20, 6:37 PM, "Jhonny Lopez" 
>> wrote:
>> 
>>Yes, sounds like worth it.
>> 
>>Thanks guys!
>> 
>>-Original Message-
>>From: Mike Drob 
>>Sent: jueves, 30 de abril de 2020 5:30 p. m.
>>To: solr-user@lucene.apache.org
>>Subject: Re: Possible issue with Stemming and nouns ended with suffix
>> 'ion'
>> 
>> 
>> 
>> 
>>Is this worth filing a bug/suggestion to the folks over at
>> snowballstem.org?
>> 
>>On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld -
>> audrey.lorberf...@ibm.com  wrote:
>> 
>>> I agree with Erick. I think that's just how the cookie crumbles when
>>> stemming. If you have some time on your hands, you can integrate
>>> OpenNLP with your Solr instance and start using the lemmas of tokens
>>> instead of the stems. In this case, I believe if you were to
>> lemmatize
>>> both "identify" and "identification," they would both condense to
>> "identify."
>>> 
>>> Best,
>>> Audrey
>>> 
>>> On 4/30/20, 3:54 PM, "Erick Erickson" 
>> wrote:
>>> 
>>>They are being stemmed to two different tokens, “identif” and
>>> “identifi”. Stemming is algorithmic and imperfect and in this case
>>> you’re getting bitten by that algorithm. It looks like you’re using
>>> PorterStemFilter, if you want you can look up the exact algorithm,
>> but
>>> I don’t think it’s a bug, just one of those little joys of English...
>>> 
>>>To get a clearer picture of exactly what’s being searched, try
>>> adding &debug=query to your query, in particular looking at the
>> parsed
>>> query that’s returned. That’ll tell you a bunch. In this particular
>>> case I don’t think it’ll tell you anything more, but for future…
>>> 
>>>Best,
>>>Erick
>>> 
>>>On, and un-checking the ‘verbose’ box on the analysis page
>> removes
>>> a lot of distraction, the detailed information is often TMI ;)
>>> 
 On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <
>>> jhonny.lo...@publicismedia.com> wrote:
 
 Sure, rewriting the message with links for images:
 
 
 We’re facing an issue with stemming in solr. Most of the cases
>>> are working correctly, for example, if we search for bidding, solr
>>> brings results for bidding, bid, bids, etc. However, with nouns
>> ended with ‘ion’
>>> suffix, stemming is not working. Even when analyzers seems to have
>>> correct stemming of the word, the results are not reflecting that.
>> One
>>> example. If I search ‘identifying’, this is the output:
 
 Analyzer (image link):
 
>>> 
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__1drv.ms_u_s-21AlRTlFq8tQbShd4-2DCp40Cmc0QioS0A-3Fe-3D1f3GJp&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=8Xt1N2A4ODj--DlLb242c8JMnJr6nWQIwcKjiDiA__s&s=U-Wmu118X5bfNxDnADO_6ompf9kUxZYHj1DZM2lG4jo&e=
 
 A clip of results:
 "haschildren_b":false,
   "isbucket_text_s":"0",
   "sectionbody_t":"\n\n\nIn order to identify 1st price
>>> auctions, leverage the proprietary tools available or manually pull a
>>> log file report to understand the trends and gauge auction spread
>>> overtime to assess the impact of variable auction
>> dynamics.\n\n\n\n\n\n\n",
   "parsedupdatedby_s":"sitecorecarvaini",
   "sectionbody_t_en":"\n\n\nIn order to identify 1st price
>>> auctions, leverag

RE: Indexing Korean

2020-05-01 Thread Markus Jelsma
Hello,

Although it is not mentioned in Solr's language analysis page in the manual, 
Lucene has had support for Korean for quite a while now.

https://lucene.apache.org/core/8_5_0/analyzers-nori/index.html
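
For example, a minimal sketch against that module (the field name and sample
sentence are only illustrative; it needs lucene-analyzers-nori on the classpath):

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ko.KoreanAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class NoriDemo {
    public static void main(String[] args) throws IOException {
        // KoreanAnalyzer wires up the nori tokenizer plus its part-of-speech and
        // reading-form filters. The sample sentence roughly means
        // "Solr is a search engine".
        Analyzer analyzer = new KoreanAnalyzer();
        try (TokenStream stream = analyzer.tokenStream("body_txt_ko", "솔라는 검색 엔진입니다")) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                System.out.println(term.toString());
            }
            stream.end();
        }
    }
}

If I remember right, the _default configset in recent Solr releases also ships a
text_ko field type built on the same nori tokenizer, which is probably the
easiest starting point.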

Regards,
Markus

 
 
-Original message-
> From:Audrey Lorberfeld - audrey.lorberf...@ibm.com 
> Sent: Friday 1st May 2020 17:34
> To: solr-user@lucene.apache.org
> Subject: Indexing Korean
> 
>  Hi All,
> 
> My team would like to index Korean, but it looks like Solr OOTB does not have 
> explicit support for Korean. If any of you have schema pipelines you could 
> share for your Korean documents, I would love to see them! I'm assuming I 
> would just use some combination of the OOTB CJK factories
> 
> Best,
> Audrey
> 
> 


Solr 8.5.1 Using Port 10001 doesn't work in Dashboard

2020-05-01 Thread Phill Campbell
Today I installed Solr 8.5.1 to replace an 8.2.0 installation.
It is a clean install, not a migration; there was no data that I needed to keep.

I run Solr (Solr Cloud Mode) on ports starting with 10001. I have been doing 
this since Solr 5x releases.

In my experiment I have 1 shard with replication factor of 2.

http://10.xxx.xxx.xxx:10001/solr/#/ 

http://10.xxx.xxx.xxx:10002/solr/#/ 

If I go to the “10001” instance, the URL changes and is messed up, and no matter
which link in the dashboard I click it shows the same information.
So, yes, Solr is running and the dashboard comes up.

The URL changes and looks like this:

http://10.xxx.xxx.xxx:10001/solr/#!/#%2F

However, on port 10002 the URL stays like this and shows the proper UI in the
dashboard:

http://10.xxx.xxx.xxx:10002/solr/#/ 

To make sure something wasn’t interfering with port 10001 I re-installed my 
previous Solr installation and it works fine.

What is this “#!” (Hash bang) stuff in the URL?
How can I run on port 10001?

Probably something obvious, but I just can’t see it.

For every link from the dashboard:
:10001/solr/#!/#%2F~logging
:10001/solr/#!/#%2F~cloud
:10001/solr/#!/#%2F~collections
:10001/solr/#!/#%2F~java-properties
:10001/solr/#!/#%2F~threads
:10001/solr/#!/#%2F~cluster-suggestions



From “10002” I see everything fine.
:10002/solr/#/~cloud

Shows the following:

Host
10.xxx.xxx.xxx
Linux 3.10.0-1127.el7.x86_64, 2cpu
Uptime: unknown
Memory: 14.8Gb
File descriptors: 180/100
Disk: 49.1Gb used: 5%
Load: 0

Node
10001_solr
Uptime: 2h 10m
Java 1.8.0_222
Solr 8.5.1
---
10002_solr
Uptime: 2h 9m
Java 1.8.0_222
Solr 8.5.1


If I switch my starting port from 10001 to 10002 both instances work. (10002, 
and 10003)
If I switch my starting port from 10001 to 10101 both instances work. (10101, 
and 10102)

Any help is appreciated.

Re: Solr 8.5.1 Using Port 10001 doesn't work in Dashboard

2020-05-01 Thread Phill Campbell
The browser is Chrome. I forgot to state that before.
That got me thinking, so I ran it from Firefox.
Everything seems to be fine there!

Interesting. Since this is my development environment I do not run any plugins 
on any of my browsers.

> On May 1, 2020, at 2:41 PM, Phill Campbell  
> wrote:
> 
> Today I installed Solr 8.5.1 to replace an 8.2.0 installation.
> It is a clean install, not a migration, there was no data that I needed to 
> keep.
> 
> I run Solr (Solr Cloud Mode) on ports starting with 10001. I have been doing 
> this since Solr 5x releases.
> 
> In my experiment I have 1 shard with replication factor of 2.
> 
> http://10.xxx.xxx.xxx:10001/solr/#/ 
> 
> http://10.xxx.xxx.xxx:10002/solr/#/ 
> 
> If I go to the “10001” instance the URL changes and is messed up and no 
> matter which link in the dashboard I click it shows the same information.
> So, use Solr is running, the dashboard comes up.
> 
> The URL changes and looks like this:
> 
> http://10.xxx.xxx.xxx:10001/solr/#!/#%2F
> 
> However, on port 10002 it stays like this and show the proper UI in the 
> dashboard:
> 
> http://10.xxx.xxx.xxx:10002/solr/#/ 
> 
> To make sure something wasn’t interfering with port 10001 I re-installed my 
> previous Solr installation and it works fine.
> 
> What is this “#!” (Hash bang) stuff in the URL?
> How can I run on port 10001?
> 
> Probably something obvious, but I just can’t see it.
> 
> For every link from the dashboard:
> :10001/solr/#!/#%2F~logging
> :10001/solr/#!/#%2F~cloud
> :10001/solr/#!/#%2F~collections
> :10001/solr/#!/#%2F~java-properties
> :10001/solr/#!/#%2F~threads
> :10001/solr/#!/#%2F~cluster-suggestions
> 
> 
> 
> From “10002” I see everything fine.
> :10002/solr/#/~cloud
> 
> Shows the following:
> 
> Host
> 10.xxx.xxx.xxx
> Linux 3.10.0-1127.el7.x86_64, 2cpu
> Uptime: unknown
> Memory: 14.8Gb
> File descriptors: 180/100
> Disk: 49.1Gb used: 5%
> Load: 0
> 
> Node
> 10001_solr
> Uptime: 2h 10m
> Java 1.8.0_222
> Solr 8.5.1
> ---
> 10002_solr
> Uptime: 2h 9m
> Java 1.8.0_222
> Solr 8.5.1
> 
> 
> If I switch my starting port from 10001 to 10002 both instances work. (10002, 
> and 10003)
> If I switch my starting port from 10001 to 10101 both instances work. (10101, 
> and 10102)
> 
> Any help is appreciated.



RE: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-05-01 Thread Jhonny Lopez
I tried KStem but some other scenarios were broken, so I reverted it. However,
that might have been due to a misconfiguration. I will try it once more.

Thanks.

  Jhonny Lopez
  Technical Architect
  Avenida Calle 26 No. 92 - 32, Edificio BTS3
  APDO. 128-1255 Bogota
  T: +57 300 6805461
  jhonny.lo...@prodigious.com
  www.prodigious.com



-Mensaje original-
De: Walter Underwood 
Enviado el: viernes, 1 de mayo de 2020 11:24 a.m.
Para: solr-user@lucene.apache.org
Asunto: Re: Possible issue with Stemming and nouns ended with suffix 'ion'




The Porter/Snowball stemmer is an evolved version of a forty year old hack.
It is neat that it works at all, but don’t expect too much. I think it is too 
aggressive for search use.

What does KStem do with this? That is based on better linguistic models.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 1, 2020, at 8:45 AM, Mike Drob  wrote:
>
> This is how things get stemmed *now*, but I believe there is an open
> question as to whether that is how they *should* be stemmed.
> Specifically, the case appears to be -ify words not stemming to the
> same as -ification - this applies to much more than
> identify/identification. Also, justify, fortify, notify, many many others.
>
> $ grep ification /usr/share/dict/words | wc -l
> 328
>
> I am by no means an expert on stemming, and if the folks at snowball
> decide to tell us that this change is bad or hard because it would
> overstem some other words, then I'll happily accept that. But I
> definitely want to use their expertise rather than relying on my own.
>
> Mike
>
> On Fri, May 1, 2020 at 10:35 AM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
>
>> Unless I'm misunderstanding the bug in question, there is no bug.
>> What you are observing is simply just how things get stemmed...
>>
>> Best,
>> Audrey
>>
>> On 4/30/20, 6:37 PM, "Jhonny Lopez" 
>> wrote:
>>
>>Yes, sounds like worth it.
>>
>>Thanks guys!
>>
>>-Original Message-
>>From: Mike Drob 
>>Sent: jueves, 30 de abril de 2020 5:30 p. m.
>>To: solr-user@lucene.apache.org
>>Subject: Re: Possible issue with Stemming and nouns ended with
>> suffix 'ion'
>>
>>
>>
>>
>>Is this worth filing a bug/suggestion to the folks over at
>> snowballstem.org?
>>
>>On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld -
>> audrey.lorberf...@ibm.com  wrote:
>>
>>> I agree with Erick. I think that's just how the cookie crumbles when
>>> stemming. If you have some time on your hands, you can integrate
>>> OpenNLP with your Solr instance and start using the lemmas of tokens
>>> instead of the stems. In this case, I believe if you were to
>> lemmatize
>>> both "identify" and "identification," they would both condense to
>> "identify."
>>>
>>> Best,
>>> Audrey
>>>
>>> On 4/30/20, 3:54 PM, "Erick Erickson" 
>> wrote:
>>>
>>>They are being stemmed to two different tokens, “identif” and
>>> “identifi”. Stemming is algorithmic and imperfect and in this case
>>> you’re getting bitten by that algorithm. It looks like you’re using
>>> PorterStemFilter, if you want you can look up the exact algorithm,
>> but
>>> I don’t think it’s a bug, just one of those little joys of English...
>>>
>>>To get a clearer picture of exactly what’s being searched, try
>>> adding &debug=query to your query, in particular looking at the
>> parsed
>>> query that’s returned. That’ll tell you a bunch. In this particular
>>> case I don’t think it’ll tell you anything more, but for future…
>>>
>>>Best,
>>>Erick
>>>
>>>On, and un-checking the ‘verbose’ box on the analysis page
>> removes
>>> a lot of distraction, the detailed information is often TMI ;)
>>>
 On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <
>>> jhonny.lo...@publicismedia.com> wrote:

 Sure, rewriting the message with links for images:


 We’re facing an issue with stemming in solr. Most of the cases
>>> are working correctly, for example, if we search for bidding, solr
>>> brings results for bidding, bid, bids, etc. However, with nouns
>> ended with ‘ion’
>>> suffix, stemming is not working. Even when analyzers seems to have
>>> correct stemming of the word, the results are not reflecting that.
>> One
>>> example. If I search ‘identifying’, this is the output:

 An

Re: Solr 8.5.1 Using Port 10001 doesn't work in Dashboard

2020-05-01 Thread Sylvain James
Hi Phil,

I encountered something similar recently, and after switching to Firefox
all URLs were fine.
Maybe an encoding side effect.
It seems to me that a new Solr UI is in development; maybe this issue will
be fixed with the release of that UI.

Sylvain


Le ven. 1 mai 2020 à 22:52, Phill Campbell 
a écrit :

> The browser is Chrome. I forgot to state that before.
> That got me to thinking and so I ran it from Fire Fox.
> Everything seems to be fine there!
>
> Interesting. Since this is my development environment I do not run any
> plugins on any of my browsers.
>
> > On May 1, 2020, at 2:41 PM, Phill Campbell 
> wrote:
> >
> > Today I installed Solr 8.5.1 to replace an 8.2.0 installation.
> > It is a clean install, not a migration, there was no data that I needed
> to keep.
> >
> > I run Solr (Solr Cloud Mode) on ports starting with 10001. I have been
> doing this since Solr 5x releases.
> >
> > In my experiment I have 1 shard with replication factor of 2.
> >
> > http://10.xxx.xxx.xxx:10001/solr/#/  >
> >
> > http://10.xxx.xxx.xxx:10002/solr/#/  >
> >
> > If I go to the “10001” instance the URL changes and is messed up and no
> matter which link in the dashboard I click it shows the same information.
> > So, use Solr is running, the dashboard comes up.
> >
> > The URL changes and looks like this:
> >
> > http://10.xxx.xxx.xxx:10001/solr/#!/#%2F
> 
> >
> > However, on port 10002 it stays like this and show the proper UI in the
> dashboard:
> >
> > http://10.xxx.xxx.xxx:10002/solr/#/  >
> >
> > To make sure something wasn’t interfering with port 10001 I re-installed
> my previous Solr installation and it works fine.
> >
> > What is this “#!” (Hash bang) stuff in the URL?
> > How can I run on port 10001?
> >
> > Probably something obvious, but I just can’t see it.
> >
> > For every link from the dashboard:
> > :10001/solr/#!/#%2F~logging
> > :10001/solr/#!/#%2F~cloud
> > :10001/solr/#!/#%2F~collections
> > :10001/solr/#!/#%2F~java-properties
> > :10001/solr/#!/#%2F~threads
> > :10001/solr/#!/#%2F~cluster-suggestions
> >
> >
> >
> > From “10002” I see everything fine.
> > :10002/solr/#/~cloud
> >
> > Shows the following:
> >
> > Host
> > 10.xxx.xxx.xxx
> > Linux 3.10.0-1127.el7.x86_64, 2cpu
> > Uptime: unknown
> > Memory: 14.8Gb
> > File descriptors: 180/100
> > Disk: 49.1Gb used: 5%
> > Load: 0
> >
> > Node
> > 10001_solr
> > Uptime: 2h 10m
> > Java 1.8.0_222
> > Solr 8.5.1
> > ---
> > 10002_solr
> > Uptime: 2h 9m
> > Java 1.8.0_222
> > Solr 8.5.1
> >
> >
> > If I switch my starting port from 10001 to 10002 both instances work.
> (10002, and 10003)
> > If I switch my starting port from 10001 to 10101 both instances work.
> (10101, and 10102)
> >
> > Any help is appreciated.
>
>


Re: Solr 8.5.1 Using Port 10001 doesn't work in Dashboard

2020-05-01 Thread Phill Campbell
Unless someone knows something concrete, I am going to move forward and assume 
that it is Google Chrome.
Thank you Sylvain.

> On May 1, 2020, at 3:42 PM, Sylvain James  > wrote:
> 
> Hi Phil,
> 
> I encountered something similar recently, and after switched to Firefox,
> all urls were fine.
> May be a encoding side effect.
> It seems to me that a new solr ui is in development. May be this issue will
> be fixed for the release of this ui.
> 
> Sylvain
> 
> 
> Le ven. 1 mai 2020 à 22:52, Phill Campbell  >
> a écrit :
> 
>> The browser is Chrome. I forgot to state that before.
>> That got me to thinking and so I ran it from Fire Fox.
>> Everything seems to be fine there!
>> 
>> Interesting. Since this is my development environment I do not run any
>> plugins on any of my browsers.
>> 
>>> On May 1, 2020, at 2:41 PM, Phill Campbell >> >
>> wrote:
>>> 
>>> Today I installed Solr 8.5.1 to replace an 8.2.0 installation.
>>> It is a clean install, not a migration, there was no data that I needed
>> to keep.
>>> 
>>> I run Solr (Solr Cloud Mode) on ports starting with 10001. I have been
>> doing this since Solr 5x releases.
>>> 
>>> In my experiment I have 1 shard with replication factor of 2.
>>> 
>>> http://10.xxx.xxx.xxx:10001/solr/#/  
>>> 
>>> 
>>> 
>>> http://10.xxx.xxx.xxx:10002/solr/#/  
>>> 
>>> 
>>> 
>>> If I go to the “10001” instance the URL changes and is messed up and no
>> matter which link in the dashboard I click it shows the same information.
>>> So, use Solr is running, the dashboard comes up.
>>> 
>>> The URL changes and looks like this:
>>> 
>>> http://10.xxx.xxx.xxx:10001/solr/#!/#%2F 
>>> 
>> > >
>>> 
>>> However, on port 10002 it stays like this and show the proper UI in the
>> dashboard:
>>> 
>>> http://10.xxx.xxx.xxx:10002/solr/#/  
>>> 
>>> 
>>> 
>>> To make sure something wasn’t interfering with port 10001 I re-installed
>> my previous Solr installation and it works fine.
>>> 
>>> What is this “#!” (Hash bang) stuff in the URL?
>>> How can I run on port 10001?
>>> 
>>> Probably something obvious, but I just can’t see it.
>>> 
>>> For every link from the dashboard:
>>> :10001/solr/#!/#%2F~logging
>>> :10001/solr/#!/#%2F~cloud
>>> :10001/solr/#!/#%2F~collections
>>> :10001/solr/#!/#%2F~java-properties
>>> :10001/solr/#!/#%2F~threads
>>> :10001/solr/#!/#%2F~cluster-suggestions
>>> 
>>> 
>>> 
>>> From “10002” I see everything fine.
>>> :10002/solr/#/~cloud
>>> 
>>> Shows the following:
>>> 
>>> Host
>>> 10.xxx.xxx.xxx
>>> Linux 3.10.0-1127.el7.x86_64, 2cpu
>>> Uptime: unknown
>>> Memory: 14.8Gb
>>> File descriptors: 180/100
>>> Disk: 49.1Gb used: 5%
>>> Load: 0
>>> 
>>> Node
>>> 10001_solr
>>> Uptime: 2h 10m
>>> Java 1.8.0_222
>>> Solr 8.5.1
>>> ---
>>> 10002_solr
>>> Uptime: 2h 9m
>>> Java 1.8.0_222
>>> Solr 8.5.1
>>> 
>>> 
>>> If I switch my starting port from 10001 to 10002 both instances work.
>> (10002, and 10003)
>>> If I switch my starting port from 10001 to 10101 both instances work.
>> (10101, and 10102)
>>> 
>>> Any help is appreciated.