[ANNOUNCE] Apache Solr 8.5.1 released

2020-04-16 Thread Ignacio Vera
## 16 April 2020, Apache Solr™ 8.5.1 available


The Lucene PMC is pleased to announce the release of Apache Solr 8.5.1.


Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document handling, and geospatial search. Solr is highly
scalable, providing fault tolerant distributed search and indexing, and
powers the search and navigation features of many of the world's largest
internet sites.


Solr 8.5.1 is available for immediate download at:


  


### This release contains no changes over 8.5.0 for Solr.


Solr 8.5.1 also includes bugfixes from the corresponding Apache Lucene
release:


  


Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases. It is possible that the mirror you are using may not
have replicated the release yet. If that is the case, please try another
mirror. This also applies to Maven access.


How to implement spellcheck for custom suggest component?

2020-04-16 Thread aTan
Hello.
I'm new to Solr and would be thankful for advice on the following case:
we have a Suggest API running in production on Solr 6, and we currently
cannot change its response or query parameters. That's why the SpellCheck
component can't be used (the parameter is custom, not 'q' or 'spellcheck.q').
I've tried to search for a solution, but many threads end without any
clear answer.

To my understanding there are two main approaches.
1. Combine default filters to emulate spellcheck behavior.
Question: which combination might give a good enough result?
Advantage: very easy to integrate.
Disadvantage: the quality and flexibility will not be very good.
2. Implement a custom filter with advanced spellcheck functionality, using
some open-source library.
Advantage: the quality will be much higher.
Disadvantage: reinventing the wheel, and even adding a custom filter to
production is currently quite complicated.
3. Something else... open for suggestions :)

The expected behavior:
myrequestparam.q=iphon
suggest: iphone, iphone 8...

myrequestparam.q=iphonn
suggest: iphone, iphone 8...

If both cases are possible and a corrected suggestion is highly likely
alongside the original one, maybe include it in the list with a lower
weight. But the response should be a single, merged list.

Thanks.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

2020-04-16 Thread raji
Hi,
Was any solution found for this issue? We are using Solr 7.6, and
sometimes we see a lot of QTP threads with the following stack trace:

sun.misc.Unsafe.park(Native method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:653)
org.eclipse.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48)
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:717)
java.lang.Thread.run(Thread.java:748)
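
Worth noting: that stack is an idle Jetty worker parked in a timed poll on the pool's job queue (QueuedThreadPool.idleJobPoll), so TIMED_WAITING there usually just means spare capacity, not a leak. A rough sketch in Python (not Jetty's actual code) of the same pattern, where idle workers block on a timed queue poll:

```python
import queue
import threading
import time

jobs = queue.Queue()

def worker():
    # Like Jetty's idleJobPoll: block with a timeout waiting for work;
    # while blocked the thread shows up as TIMED_WAITING in a dump.
    while True:
        try:
            job = jobs.get(timeout=0.2)  # timed poll on the job queue
        except queue.Empty:
            continue                     # still idle; poll again
        if job is None:                  # shutdown sentinel
            return
        job()

threads = [threading.Thread(target=worker, daemon=True) for _ in range(4)]
for t in threads:
    t.start()

results = []
jobs.put(lambda: results.append("done"))
time.sleep(0.5)                          # let an idle worker pick it up
for _ in threads:
    jobs.put(None)                       # stop all workers
for t in threads:
    t.join()

print(results)  # the "idle" pool still processed the job
```

All four workers spend most of their time in the timed poll, exactly like the 1500+ TIMED_WAITING QTP threads in the report.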

Thanks,
Raji



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Unable to RESTORE collections via Collections API

2020-04-16 Thread Jan Høydahl
Thanks for sharing your workaround!

Agreed, that error message is extremely unhelpful :)
Would you care to report it as a bug in JIRA?

Jan

> On 15 Apr 2020, at 21:54, Eugene Livis wrote:
> 
> In case this helps somebody in the future, given how completely unhelpful
> the Solr error message is: it turns out the problem was occurring because
> the updateLog was disabled in solrconfig.xml. I enabled the updateLog the
> following way, and the "restore" operation started working:
> 
> <updateLog>
>   <str name="dir">${solr.ulog.dir:}</str>
>   <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
> </updateLog>
> 
> 
> On Wed, Apr 8, 2020 at 5:17 PM Eugene Livis  wrote:
> 
>> Hello,
>> 
>> I have been unsuccessfully trying to find a way to restore a collection
>> using the Collections API for the last several days. *I would
>> greatly appreciate any help as I am now stuck.* I am using Solr 8.2 in
>> cloud mode. To simplify things, I have only a single Solr node on my local
>> machine (though I have also tried multi-node setups). I am able to
>> successfully create collections, index documents, and run searches against
>> the index. The general gist of what I am trying to do is to be able to back
>> up an existing collection, delete that collection, and then restore the
>> collection from backup when needed. Our use case requires creating
>> thousands of collections, which I believe is strongly discouraged in Solr
>> Cloud (is that correct?), so this backup-delete-restore mechanism is our
>> way of reducing the number of collections in Solr at any given time. Just
>> FYI, in our use case it is ok for searches to take a couple of minutes to
>> produce results.
>> 
>> So far I have not been able to restore a single collection using the
>> collection RESTORE API. I am able to create backups by running the
>> following HTTP command in Solr admin console:
>> 
>> 
>> http://localhost:8983/solr/admin/collections?action=BACKUP&name=test1_backup2&collection=test1_20200407_170438_20200407_170441&location=C:\TEST\DELETE\BACKUPS
>> 
>> 
>> The backup appears to be successfully created. It contains a snapshot from
>> "shard1", which is the only shard in the cluster. It also contains the
>> "zk_backup" directory, which contains my Solr config and
>> "collection_state.json" file. And it contains a "backup.properties" file.
>> Everything looks good to me.
>> 
>> However, when I attempt to restore the collection using the following HTTP
>> command:
>> 
>> 
>> http://localhost:8983/solr/admin/collections?action=RESTORE&name=test1_backup2&location=C:\TEST\DELETE\BACKUPS&collection=test1_backup_NEW
>> 
>> 
>> I get the same extremely vague error messages in Solr logs:
>> 
>> 
>> *RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing
>> SolrCore 'test1_backup_NEW_shard1_replica_n1': Unable to create core
>> [test1_backup_NEW_shard1_replica_n1] Caused by: nullERROR
>> (qtp1099855928-24) [c:test1_backup_NEW   ] o.a.s.s.HttpSolrCall
>> null:org.apache.solr.common.SolrException: ADDREPLICA failed to create
>> replica   *
>> 
>> Below is the full Solr log of the restore command:
>> 
>> 2020-04-08 21:10:56.753 INFO  (qtp1099855928-24) [   ]
>> o.a.s.h.a.CollectionsHandler Invoked Collection Action :restore with params
>> name=test1_backup2&action=RESTORE&location=C:\TEST\DELETE\BACKUPS&collection=test1_backup_NEW
>> and sendToOCPQueue=true
>> 2020-04-08 21:10:56.783 INFO
>> (OverseerThreadFactory-9-thread-1-processing-n:localhost:8983_solr)
>> [c:test1_backup_NEW   ] o.a.s.c.a.c.RestoreCmd Using existing config
>> AutopsyConfig
>> 2020-04-08 21:10:56.783 INFO
>> (OverseerThreadFactory-9-thread-1-processing-n:localhost:8983_solr)
>> [c:test1_backup_NEW   ] o.a.s.c.a.c.RestoreCmd Starting restore into
>> collection=test1_backup_NEW with backup_name=test1_backup2 at
>> location=file:///C:/TEST/DELETE/BACKUPS/
>> 2020-04-08 21:10:56.784 INFO
>> (OverseerThreadFactory-9-thread-1-processing-n:localhost:8983_solr)
>> [c:test1_backup_NEW   ] o.a.s.c.a.c.CreateCollectionCmd Create collection
>> test1_backup_NEW
>> 2020-04-08 21:10:56.900 WARN
>> (OverseerThreadFactory-9-thread-1-processing-n:localhost:8983_solr)
>> [c:test1_backup_NEW   ] o.a.s.c.a.c.CreateCollectionCmd It is unusual to
>> create a collection (test1_backup_NEW) without cores.
>> 2020-04-08 21:10:56.908 INFO
>> (OverseerStateUpdate-72057883336638464-localhost:8983_solr-n_03) [
>>  ] o.a.s.c.o.SliceMutator Update shard state invoked for collection:
>> test1_backup_NEW with message: {
>>  "shard1":"construction",
>>  "collection":"test1_backup_NEW",
>>  "operation":"updateshardstate"}
>> 2020-04-08 21:10:56.908 INFO
>> (OverseerStateUpdate-72057883336638464-localhost:8983_solr-n_03) [
>>  ] o.a.s.c.o.SliceMutator Update shard state shard1 to construction
>> 2020-04-08 21:10:56.925 INFO
>> (OverseerThreadFactory-9-thread-1-processing-n:localhost:8983_solr)
>> [c:test1_backup_NEW   ] o.a.s.c.a.c.RestoreCmd Adding replica for
>> shard=shard1
>> collection=DocCollection(test1_backup_

404 response from Schema API

2020-04-16 Thread Mark H. Wood
I need to ask Solr 4.10 for the name of the unique key field of a
schema.  So far, no matter what I've done, Solr is returning a 404.

This works:

  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/select'

This gets a 404:

  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey'

So does this:

  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema'

We normally use the ClassicIndexSchemaFactory.  I tried switching to
ManagedIndexSchemaFactory but it made no difference.  Nothing is
logged for the failed requests.

Ideas?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: facets & docValues

2020-04-16 Thread Revas
Hi Erick, you are correct, we have only about 1.8M documents so far, and
turning on indexing for the facet fields helped improve the timings of the
facet query (which has sub-facets and facet queries) a lot. So do docValues
help at all for sub-facets and facet queries? Our tests revealed a further
query-time improvement when we turned off docValues. Is that the right
approach?

Currently we have only 1 shard, and we are thinking of scaling by
increasing the number of shards when we see a deterioration in query time.
Any suggestions?

Thanks.


On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson 
wrote:

> In a word, “yes”. I also suspect your corpus isn’t very big.
>
> I think the key is the facet queries. Now, I’m talking from
> theory rather than diving into the code, but querying on
> a docValues=true, indexed=false field is really doing a
> search. And searching on a field like that is effectively
> analogous to a table scan. Even if somehow an internal
> structure would be constructed to deal with it, it would
> probably be on the heap, where you don’t want it.
>
> So the test would be to take the queries out and measure
> performance, but I think that’s the root issue here.
>
> Best,
> Erick
>
> > On Apr 14, 2020, at 11:51 PM, Revas  wrote:
> >
> > We have faceting fields that have been defined as indexed=false,
> > stored=false and docValues=true
> >
> > However we use a lot of subfacets  using  json facets and facet ranges
> > using facet.queries. We see that after every soft-commit our performance
> > worsens and performs ideal between commits
> >
> > how is that docValue fields are affected by soft-commit and do we need to
> > enable indexing if we use subfacets and facet query to improve
> performance?
> >
> > Tha
>
>


Re: facets & docValues

2020-04-16 Thread Erick Erickson
DocValues should help when faceting over fields, i.e. facet.field=blah.

I would expect docValues to help with sub facets too, but I don’t know
the code well enough to say definitively one way or the other.

The empirical approach would be to set “uninvertible=false” (Solr 7.6+) and
turn docValues off. What that means is that if any operation tries to uninvert
the index on the Java heap, you’ll get an exception like:
"can not sort on a field w/o docValues unless it is indexed=true,
uninvertible=true and the type supports Uninversion"

See SOLR-12962

Speed is only one issue. The entire point of docValues is to not “uninvert”
the field on the heap. This used to lead to very significant memory
pressure. So when turning docValues off, you run the risk of 
reverting back to the old behavior and having unexpected memory
consumption, not to mention slowdowns when the uninversion
takes place.
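
To make the “uninvert” point concrete, here is a toy sketch (plain Python, nothing like Lucene’s actual data structures) of why faceting needs a document→value view: the inverted index maps term→docs, which is the wrong shape for faceting, and “uninverting” rebuilds the columnar mapping on the heap at query time — which is exactly what docValues precomputes on disk at index time:

```python
from collections import Counter

# Toy documents with a single facet field "color".
docs = {0: "red", 1: "blue", 2: "red", 3: "green"}

# Inverted index: term -> list of doc ids (good for search).
inverted = {}
for doc_id, value in docs.items():
    inverted.setdefault(value, []).append(doc_id)

# "Uninverting": rebuild doc id -> value from the inverted index.
# This is the per-query heap work that docValues avoids by storing
# the columnar doc -> value mapping on disk at index time.
uninverted = {}
for term, doc_ids in inverted.items():
    for doc_id in doc_ids:
        uninverted[doc_id] = term

# Faceting is then a simple scan over the columnar view.
facet_counts = Counter(uninverted.values())
print(facet_counts.most_common())  # → [('red', 2), ('blue', 1), ('green', 1)]
```

The rebuilt `uninverted` map is identical to the original doc→value data; the cost is that, without docValues, this rebuild lands on the heap and is thrown away and redone after each (soft) commit.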

Also, unless your documents are very large, this is a tiny corpus. It can be
quite hard to get realistic numbers, the signal gets lost in the noise.

You should only shard when your individual query times exceed your
requirement. Say you have a 95%tile requirement of 1 second response time.

Let’s further say that you can meet that requirement with 50 queries/second,
but when you get to 75 queries/second your response time exceeds your 
requirements. Do NOT shard at this point. Add another replica instead.
Sharding adds inevitable overhead and, as a general rule, should only be
considered when you can’t get adequate response time even under fairly
light query loads.

Best,
Erick

> On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> 
> Hi Erick, You are correct, we have only about 1.8M documents so far and
> turning on the indexing on the facet fields helped improve the timings of
> the facet query a lot which has (sub facets and facet queries). So does
> docValues help at all for sub facets and facet query, our tests
> revealed further query time improvement when we turned off the docValues.
> is that the right approach?
> 
> Currently we have only 1 shard and  we are thinking of scaling by
> increasing the number of shards when we see a deterioration on query time.
> Any suggestions?
> 
> Thanks.
> 
> 
> On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson 
> wrote:
> 
>> In a word, “yes”. I also suspect your corpus isn’t very big.
>> 
>> I think the key is the facet queries. Now, I’m talking from
>> theory rather than diving into the code, but querying on
>> a docValues=true, indexed=false field is really doing a
>> search. And searching on a field like that is effectively
>> analogous to a table scan. Even if somehow an internal
>> structure would be constructed to deal with it, it would
>> probably be on the heap, where you don’t want it.
>> 
>> So the test would be to take the queries out and measure
>> performance, but I think that’s the root issue here.
>> 
>> Best,
>> Erick
>> 
>>> On Apr 14, 2020, at 11:51 PM, Revas  wrote:
>>> 
>>> We have faceting fields that have been defined as indexed=false,
>>> stored=false and docValues=true
>>> 
>>> However we use a lot of subfacets  using  json facets and facet ranges
>>> using facet.queries. We see that after every soft-commit our performance
>>> worsens and performs ideal between commits
>>> 
>>> how is that docValue fields are affected by soft-commit and do we need to
>>> enable indexing if we use subfacets and facet query to improve
>> performance?
>>> 
>>> Tha
>> 
>> 



Re: 404 response from Schema API

2020-04-16 Thread Erick Erickson
Assuming isw6_3 is your collection name, you have
“solr” and “isw6_3” reversed in the URL.

Should be something like:
https://toolshed.wood.net:8443/solr/isw6_3/schema/uniquekey

If that’s not the case you need to mention your collection. But in
either case your collection name comes after /solr/.

Best,
Erick

> On Apr 16, 2020, at 12:07 PM, Mark H. Wood  wrote:
> 
> I need to ask Solr 4.10 for the name of the unique key field of a
> schema.  So far, no matter what I've done, Solr is returning a 404.
> 
> This works:
> 
>  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/select'
> 
> This gets a 404:
> 
>  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey'
> 
> So does this:
> 
>  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema'
> 
> We normally use the ClassicIndexSchemaFactory.  I tried switching to
> ManagedIndexSchemaFactory but it made no difference.  Nothing is
> logged for the failed requests.
> 
> Ideas?
> 
> -- 
> Mark H. Wood
> Lead Technology Analyst
> 
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu



Re: 404 response from Schema API

2020-04-16 Thread Mark H. Wood
On Thu, Apr 16, 2020 at 02:00:06PM -0400, Erick Erickson wrote:
> Assuming isw6_3 is your collection name, you have
> “solr” and “isw6_3” reversed in the URL.

No.  Solr's context is '/isw6_3/solr' and the core is 'statistics'.

> Should be something like:
> https://toolshed.wood.net:8443/solr/isw6_3/schema/uniquekey
> 
> If that’s not the case you need to mention your collection. But in
> either case your collection name comes after /solr/.

Thank you.  I think that's what I have now.
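
For anyone else bitten by this: the Schema API path hangs off the core, underneath whatever servlet context Solr is deployed at — usually just "solr", but custom deployments (like "/isw6_3/solr" in this thread) differ. A small hypothetical helper, written for this note, showing both layouts:

```python
def schema_url(host: str, context: str, core: str,
               resource: str = "uniquekey") -> str:
    """Build a Schema API URL.

    context is Solr's servlet context path: usually "solr", but custom
    deployments may use something like "isw6_3/solr".
    """
    parts = [host.rstrip("/"), context.strip("/"), core, "schema", resource]
    return "/".join(parts)

# Default deployment: context is just "solr".
print(schema_url("https://toolshed.wood.net:8443", "solr", "isw6_3"))
# → https://toolshed.wood.net:8443/solr/isw6_3/schema/uniquekey

# Custom context, as in this thread: core "statistics" under "/isw6_3/solr".
print(schema_url("https://toolshed.wood.net:8443", "isw6_3/solr", "statistics"))
# → https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey
```

Either way the shape is context, then core, then /schema/... — only the context segment varies between deployments.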

> > On Apr 16, 2020, at 12:07 PM, Mark H. Wood  wrote:
> > 
> > I need to ask Solr 4.10 for the name of the unique key field of a
> > schema.  So far, no matter what I've done, Solr is returning a 404.
> > 
> > This works:
> > 
> >  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/select'
> > 
> > This gets a 404:
> > 
> >  curl 
> > 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey'
> > 
> > So does this:
> > 
> >  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema'
> > 
> > We normally use the ClassicIndexSchemaFactory.  I tried switching to
> > ManagedIndexSchemaFactory but it made no difference.  Nothing is
> > logged for the failed requests.
> > 
> > Ideas?
> > 
> > -- 
> > Mark H. Wood
> > Lead Technology Analyst
> > 
> > University Library
> > Indiana University - Purdue University Indianapolis
> > 755 W. Michigan Street
> > Indianapolis, IN 46202
> > 317-274-0749
> > www.ulib.iupui.edu
> 
> 

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Is Banana deprecated?

2020-04-16 Thread S G
Hello,

I still see development activity on it:
https://github.com/lucidworks/banana/pull/355

So is it something recommended for production use?

Regards,
SG