Re: 'velocity' does not exist . Do an 'create-queryresponsewriter' , if you want to create it

2020-05-19 Thread Erik Hatcher
You also need to make sure the velocity writer and its dependencies are loaded 
via <lib> directives in solrconfig.xml.
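
For illustration only, here is a minimal sketch of what a Config API call that registers the writer might look like once the jars are loadable. The host and the collection name `mycollection` are assumptions, not taken from the original report; `add-queryresponsewriter` is the Config API command for creating a response writer, and the class name is the one shown in the log below. The request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical host and collection name; adjust to your setup.
CONFIG_URL = "http://localhost:8983/solr/mycollection/config"


def build_add_writer_request(name="velocity",
                             cls="solr.VelocityResponseWriter"):
    """Build (but do not send) a Config API POST that adds a response writer."""
    payload = {"add-queryresponsewriter": {"name": name, "class": cls}}
    return urllib.request.Request(
        CONFIG_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_add_writer_request()
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

If the class still cannot be loaded, the command is expected to fail with the same "Error loading class" message, which points back at the jar loading rather than at the payload.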

> On May 19, 2020, at 02:30, Prakhar Kumar wrote:
> 
> Hello Team,
> 
> I am using Solr 8.5.0 and here is the full log for the error which I am
> getting:
> 
> SolrConfigHandler Error checking plugin :  =>
> org.apache.solr.common.SolrException: Error loading class
> 'solr.VelocityResponseWriter'
> @40005ec3702b3710a43c at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:570)
> @40005ec3702b3710a824 org.apache.solr.common.SolrException: Error
> loading class 'solr.VelocityResponseWriter'
> @40005ec3702b3710ac0c at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:570)
> ~[?:?]
> @40005ec3702b3710f25c at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:501)
> ~[?:?]
> @40005ec3702b3710f644 at
> org.apache.solr.core.SolrCore.createInstance(SolrCore.java:824) ~[?:?]
> @40005ec3702b3710f644 at
> org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:880) ~[?:?]
> @40005ec3702b3710fa2c at
> org.apache.solr.handler.SolrConfigHandler$Command.verifyClass(SolrConfigHandler.java:601)
> ~[?:?]
> @40005ec3702b371105e4 at
> org.apache.solr.handler.SolrConfigHandler$Command.updateNamedPlugin(SolrConfigHandler.java:565)
> ~[?:?]
> @40005ec3702b371105e4 at
> org.apache.solr.handler.SolrConfigHandler$Command.handleCommands(SolrConfigHandler.java:502)
> ~[?:?]
> @40005ec3702b3711196c at
> org.apache.solr.handler.SolrConfigHandler$Command.handlePOST(SolrConfigHandler.java:363)
> ~[?:?]
> @40005ec3702b3711196c at
> org.apache.solr.handler.SolrConfigHandler$Command.access$100(SolrConfigHandler.java:161)
> ~[?:?]
> @40005ec3702b37111d54 at
> org.apache.solr.handler.SolrConfigHandler.handleRequestBody(SolrConfigHandler.java:139)
> ~[?:?]
> @40005ec3702b3711213c at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
> ~[?:?]
> @40005ec3702b3711290c at
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2596) ~[?:?]
> @40005ec3702b37112cf4 at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802) ~[?:?]
> @40005ec3702b37112cf4 at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579) ~[?:?]
> @40005ec3702b371130dc at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)
> ~[?:?]
> @40005ec3702b37115404 at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)
> ~[?:?]
> @40005ec3702b371157ec at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> ~[jetty-servlet-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b371157ec at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> ~[jetty-servlet-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711678c at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711678c at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> ~[jetty-security-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b37116b74 at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b37117344 at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711772c at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b37117b14 at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b371182e4 at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b37119284 at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b37119284 at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> ~[jetty-servlet-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711966c at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711a224 at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711a60c at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711a9f4 at
> org.eclipse.jetty.server.hand

Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

2020-05-19 Thread VILA Jean-Louis
Dear all,

We have started upgrading a huge SolrCloud cluster from 5.4.1 to the latest 
version, 8.5.1.
Context:
. Ubuntu 16.04, 64-bit, Oracle JVM 8 (update 101) and now OpenJDK 8 (update 252)
. We can't reindex the documents because the originals don't exist anymore, so 
we have no choice other than upgrading the indexes.

Our upgrade strategy is based on the IndexUpgrader tool.
5.4.1 -> 5.5.5 : OK
5.5.5 -> 6.6.6 : OK
6.6.6 -> 7.7.3 : OK
7.7.3 -> 8.5.1 : fails. Here is the error from the 8.5.1 IndexUpgrader:

Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException: 
Format version is not supported (resource 
BufferedChecksumIndexInput(MMapIndexInput(path="/data2/solr/nodes/node1/solr/insight_dw_shard3_replica_n69/data/index/segments_2nz0"))):
 This index was initially created with Lucene 6.x while the current version is 
8.5.1 and Lucene only supports reading the current and previous major 
versions.. This version of Lucene only supports indexes created with release 
7.0 and later.
at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:318)
at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:432)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:429)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:680)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:632)
at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:434)
at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:285)
at org.apache.lucene.index.IndexUpgrader.upgrade(IndexUpgrader.java:158)
at org.apache.lucene.index.IndexUpgrader.main(IndexUpgrader.java:78)

But when I check the index with the 7.7.3 check tool, the segment appears to be 7.7.3!
0.00% total deletions; 50756501 documents; 0 deleteions
Segments file=segments_2nz0 numSegments=1 version=7.7.3 
id=ay2stfke7hwy9gippl8k77tdd userData={commitTimeMSec=1589314850951}
  1 of 1: name=_2rr9t maxDoc=50756501
version=7.7.3
id=9pubpiwgt38rzyxr7litvgcu5
codec=Lucene70
compound=false
numFiles=10
size (MB)=338,143.905
diagnostics = {os=Linux, java.vendor=Oracle Corporation, 
java.version=1.8.0_101, java.vm.version=25.101-b13, lucene.version=7.7.3, 
mergeMaxNumSegments=1, os.arch=amd64, java.runtime.version=1.8.0_101-b13, 
source=merge, mergeFactor=2, os.version=3.13.0-147-generic, 
timestamp=1589484981711}
no deletions
test: open reader.OK [took 2.779 sec]

From what I have read in various threads, once a segment is "marked as a 
Lucene v6 index", the mark persists across upgrades, so we are stuck on 
version 7.7.3.

What are my options?

Many many thanks for your help,
Jean-Louis



Jean-Louis Vila, PhD
Directeur technique
Sword SAS

d +33 4 72 85 37 60
m+33 6 17 81 14 69
t  +33 4 72 85 37 40
e 
jean-louis.v...@sword-group.com

9 avenue Charles de Gaulle
69771, Saint Didier au Mont d'Or
France
www.sword-group.com



Shingles behavior

2020-05-19 Thread Radu Gheorghe
Hello Solr users,

I’m quite puzzled about how shingles work. The way tokens are analysed looks 
fine to me, but the query seems too restrictive.

Here’s the sample use-case. I have three documents:

mona lisa smile
mona lisa
mona

I have a shingle filter set up like this (both index- and query-time):

>  maxShingleSize="4"/>

When I query for “Mona Lisa smile” (no quotes), I expect to get all three 
documents back, in that order, because the first document matches all the terms:

mona
mona lisa
mona lisa smile
lisa
lisa smile
smile

And the second one matches only some, and the third document only matches one.
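
To make the expectation concrete, here is a rough sketch of the shingling in plain Python. It is a simplified model, not the actual Solr/Lucene shingle filter: it ignores token positions and assumes unigrams are emitted alongside the shingles, which is what the term list above suggests. The function name `shingles` is made up for this example.

```python
def shingles(tokens, max_shingle_size=4, output_unigrams=True):
    """Return word n-grams (shingles), grouped by start position."""
    min_size = 1 if output_unigrams else 2
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_shingle_size + 1):
            if start + size > len(tokens):
                break  # not enough tokens left for a longer shingle
            out.append(" ".join(tokens[start:start + size]))
    return out


docs = ["mona lisa smile", "mona lisa", "mona"]
query_terms = set(shingles("mona lisa smile".split()))
for doc in docs:
    shared = query_terms & set(shingles(doc.split()))
    print(f"{doc!r}: {len(shared)} shared term(s)")
```

Under this model the three documents share 6, 3 and 1 terms with the query respectively, which is the ranking I would expect from an "any of the terms" match.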

Instead, I only get the first document back. That’s because the query expects 
all the “words” to match:

> "parsedquery":"+DisjunctionMaxQuery((((+shingle_field:mona 
> +usage_query_view_tags:lisa +shingle_field:smile) (+shingle_field:mona 
> +shingle_field:lisa smile) (+shingle_field:mona lisa +shingle_field:smile) 
> shingle_field:mona lisa smile)))",

The query above is generated by the Edismax query parser, when I’m using 
“shingle_field” as “df”.

Is there a way to get “any of the words” to match? I’ve tried all the options I 
can think of:
- different query parsers
- q.op=OR
- mm=0 (or 1 or 0% or 10% or…)

Nothing seems to change the parsed query from the above.

I’ve compared this to the behaviour of Elasticsearch. There, I get “OR” by 
default, and minimum_should_match works as expected. The only difference I see 
between the two, on the analysis side, is that tokens start at 0 in 
Elasticsearch and at 1 in Solr. I doubt that’s the problem, because I see that 
the default “text_en”, for example, also starts at position 1.

Is it just a bug that mm doesn’t work in the context of shingles? Or is there a 
workaround?

Thanks and best regards,
Radu

REINDEXCOLLECTION not working on an alias

2020-05-19 Thread Bjarke Buur Mortensen
Hi list,

I seem to be unable to get REINDEXCOLLECTION to work on a collection alias
(running Solr 8.2.0). The documentation seems to state that that should be
possible:
https://lucene.apache.org/solr/guide/8_2/collection-management.html#reindexcollection
"name
Source collection name, may be an alias. This parameter is required."

If I run on my alias (qa_supplier_products):
curl "http://localhost:8983/solr/admin/collections?action=REINDEXCOLLECTION&name=qa_supplier_products&numShards=1&cmd=start"
I get an error:
"org.apache.solr.common.SolrException: Unable to copy documents from
qa_supplier_products to .rx_qa_supplier_products_6:
{\"result-set\":{\"docs\":[\n
 {\"DaemonOp\":\"Deamon:.rx_qa_supplier_products_6 started on
.rx_qa_supplier_products_0_shard1_replica_n1\"

If I instead point to the underlying collection, everything works fine. Now
I have an alias pointing to an alias, which works, but ideally I would like
to just have my main alias point to the newly reindexed collection.
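
As a rough sketch of the two Collections API calls involved (reindex the underlying collection, then repoint the alias with CREATEALIAS), the URLs could be built like this. The collection names `qa_supplier_products_v1` and `qa_supplier_products_v2` are hypothetical placeholders, and the URLs are only constructed here, not requested.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/admin/collections"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    return f"{BASE}?{urlencode({'action': action, **params})}"


# Reindex the underlying collection rather than the alias.
reindex_url = collections_api_url(
    "REINDEXCOLLECTION",
    name="qa_supplier_products_v1",  # hypothetical underlying collection
    numShards=1,
    cmd="start",
)

# Repoint the main alias at the collection that should serve queries.
alias_url = collections_api_url(
    "CREATEALIAS",
    name="qa_supplier_products",
    collections="qa_supplier_products_v2",  # hypothetical new collection
)

print(reindex_url)
print(alias_url)
```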

Can anybody help me out here?

Thanks,
/Bjarke


Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

2020-05-19 Thread Erick Erickson
This will not work. Lucene has never promised that this upgrade path would work. 
The “one major version back-compat” rule means that Lucene X has special handling 
for X-1, but for X-2, all bets are off. Starting with Solr 6, a marker is written 
into the segments recording the version of Lucene the segment was written with. 
That marker is preserved through all merges/upgrades/whatever.

Starting with Lucene 8, if any segment has a marker for Lucene 6 (or no marker 
at all for earlier versions), then Lucene will refuse to open the index.
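
A toy model of the rule just described, in Python. This is not Lucene's code; the function names and the constant are invented for illustration, and `None` stands for an index with no marker at all.

```python
from typing import Optional

# Per the description above, Lucene 8 rejects a marker for 6 or a missing marker.
MIN_SUPPORTED_MARKER = 7


def upgrade_in_place(created_marker: Optional[int]) -> Optional[int]:
    """Upgrades and merges rewrite the files but carry the marker over unchanged."""
    return created_marker


def lucene8_will_open(created_marker: Optional[int]) -> bool:
    """Return True only for indexes whose marker says 7 or later."""
    return created_marker is not None and created_marker >= MIN_SUPPORTED_MARKER


marker = 6                          # index first written by a 6.x release
marker = upgrade_in_place(marker)   # e.g. an upgrade run under 7.x
print(lucene8_will_open(marker))    # False
print(lucene8_will_open(7))         # True
```

The point of the sketch is that no number of upgrade passes changes the marker, so the check fails however recent the segment files themselves are.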

IndexUpgraderTool and the like simply cannot synthesize the new index format. 
The most succinct explanation I’ve seen is from Robert Muir:

“I think the key issue here is Lucene is an index not a database. Because it is 
a lossy index and does not retain all of the user's data, its not possible to 
safely migrate some things automagically. In the norms case IndexWriter needs 
to re-analyze the text ("re-index") and compute stats to get back the value, so 
it can be re-encoded. The function is y = f(x) and if x is not available its 
not possible, so lucene can't do it.”

So you’ll have to re-index your corpus with Solr 8 I’m afraid.

 Best,
Erick



Re: Fetch related documents from Custom Function

2020-05-19 Thread mganeshs
Solr experts, is there an easy way to read other Solr documents from within a
custom Solr function?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

2020-05-19 Thread VILA Jean-Louis
Many thanks for your answer, Erick.

Indeed, I had read in many different threads that this migration path is not 
guaranteed. What is strange is that there is no formal documentation of this 
limitation, because clearly we can't migrate to v8 if the indexes are not 
"pure" v7 indexes. I understand the reason (y = f(x)), but at least a simple 
note in the documentation stating that Lucene 6 segments can't be upgraded to 
Lucene 8 would be appreciated.

Moreover, the check tool just shows a v7.7.3 index, with no mention of the 
"real" segment version, which is v6! So refusing to open v7 Lucene indexes 
that were upgraded from v6 is quite brutal, and the rule that we can migrate 
from the previous major version is not completely true :-(
I'll stay on v7.7.3.

Thanks again,
Jean-Louis


Corrupted .cfs file

2020-05-19 Thread nettadalet
I get the following exception:
Caused by: org.apache.lucene.index.CorruptIndexException: length should be
104004663 bytes, but is 104856631 instead
(resource=MMapIndexInput(path="path_to_index\index\_jlp.cfs"))

What may be the cause of this?
How can the length of the .cfs file change so that it becomes corrupted?
Can I simply delete this .cfs file and then synchronize the index against
the database, so that only the missing files are indexed, instead of
reindexing all the files?

Thanks in advance.





Large query size in Solr 8.3.0

2020-05-19 Thread vishal patel

Which query parser is used if my query length is large?
My query is 
https://drive.google.com/file/d/1P609VQReKM0IBzljvG2PDnyJcfv1P3Dz/view


Regards,
Vishal Patel


Re: Large query size in Solr 8.3.0

2020-05-19 Thread Vincenzo D'Amore
Hi, I don't think the query size affects which parser is chosen. I
remember there is a maximum number of boolean clauses (maxBooleanClauses),
but that is a slightly different thing.
If the query is too large, you may get an HTTP error (bad request? I
don't remember exactly). In that case, just change the HTTP method (use
POST instead of GET).
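
A minimal sketch of the GET-to-POST switch, assuming a local Solr and a collection called `mycollection` (both hypothetical, as is the `id` field used to pad the query). The request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical host and collection name.
SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_post_query(q, **params):
    """Build (but do not send) a POST request carrying the query in the body."""
    body = urllib.parse.urlencode({"q": q, **params}).encode("utf-8")
    return urllib.request.Request(SELECT_URL, data=body, method="POST")


# A long query that would make for an unwieldy GET URL.
long_query = " OR ".join(f"id:{i}" for i in range(500))
req = build_post_query(long_query, rows=10)
print(req.get_method(), len(req.data), "bytes in body")
```

Moving the parameters into the body only avoids URL length limits; the maxBooleanClauses limit still applies whichever HTTP method is used.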



-- 
Vincenzo D'Amore


Normalized score

2020-05-19 Thread Venu
Hi
Is it possible to normalize the per-field scores before applying the boosts?

Let's say two documents match my search criteria on the query fields *title* and
*description*, using the DisMax parser with individual boosts:

q=cookie&qf=title^2 description^1

Let's say these are the TF-IDF scores for the documents:
Doc1:
title - 2, description - 4
Doc2:
title - 2.5, description - 3

The idea is to normalize by the max value before applying the boost:

Doc1:
title - (2/4) * 2 (boost of title), description - (4/4) * 1 (boost of
description)
Doc2:
title - (2.5/4) * 2, description - (3/4) * 1
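
The arithmetic above, worked through in plain Python. This only illustrates the calculation being asked about; it is not an existing Solr feature, and the variable names are made up for the example.

```python
boosts = {"title": 2.0, "description": 1.0}
raw = {
    "Doc1": {"title": 2.0, "description": 4.0},
    "Doc2": {"title": 2.5, "description": 3.0},
}

# Normalize by the largest raw score seen across all fields and documents.
max_score = max(s for fields in raw.values() for s in fields.values())

normalized = {
    doc: {f: (s / max_score) * boosts[f] for f, s in fields.items()}
    for doc, fields in raw.items()
}
for doc, fields in normalized.items():
    print(doc, fields)
# Doc1 {'title': 1.0, 'description': 1.0}
# Doc2 {'title': 1.25, 'description': 0.75}
```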

Is this possible, or do I need to use re-ranking/LTR here?

Could you please advise whether this is doable?






Re: REINDEXCOLLECTION not working on an alias

2020-05-19 Thread Joel Bernstein
I believe the issue is that, under the covers, this feature uses the
"topic" streaming expression, which was just reported not to work with
aliases. This is something that will get fixed, but for the current release
there isn't a workaround for this issue.


Joel Bernstein
http://joelsolr.blogspot.com/




Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

2020-05-19 Thread Walter Underwood
Hmm, might be able to hack this with optimize (forced merge).

First, you would have to add enough extra documents to force a rewrite of all 
segments. That might be as many documents as are already in the index. You 
could set a “fake:true” field and filter them out with an fq. Or make sure they 
have no searchable text.

After adding all those, run optimize. This should rewrite all the segments in 
the new format.

Finally, delete all the extra documents. Might want to do another optimize 
after that.

No guarantee that this desperate hack will work.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


RE: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

2020-05-19 Thread VILA Jean-Louis
Thanks Walter, but I can't imagine that it will work: if it could, then the 
IndexUpgrader would work too, and that is not the case ☹
Because of the format, a v6 index can't be rewritten whatever process you 
use (add replica, optimize, etc.).
The only option I have is a full reindex! With 260,000,000 docs / 3 TB of 
indexes and specific preprocessing, it will take a very, very long time.


-Original Message-
From: Walter Underwood  
Sent: mardi 19 mai 2020 17:43
To: solr-user@lucene.apache.org
Subject: Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Hmm, might be able to hack this with optimize (forced merge).

First, you would have to add enough extra documents to force a rewrite of all 
segments. That might be as many documents as are already in the index. You 
could set a “fake:true” field and filter them out with an fq. Or make sure they 
have no searchable text.

After adding all those, run optimize. This should rewrite all the segments in 
the new format.

Finally, delete all the extra documents. Might want to do another optimize 
after that.

No guarantee that this desperate hack will work.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 19, 2020, at 6:21 AM, VILA Jean-Louis 
>  wrote:
> 
> Many thanks for your answers Erik. 
> 
> Effectively, I've read this into many different threads that the migration 
> path will not be guaranteed but, what's strange is that there's no formal 
> information on this impossibility because clearly we can't migrate to v8 if 
> indexes are not "pure" v7 indexes. I understand reason (y =f(x)) but al least 
> a simple documentation about the fact that a Lucene 6 segments can't be 
> upgrade into Lucene 8 would be appreciate.
> 
> More, the check tool just shows v7.7.3 index and there is no mention 
> about "real" segment version which v6! So forbid to open v7 lucene 
> indexes upgraded from v6, is quiet brutal and the rule about that we 
> can migrate only from previous major version is not completely true 
> :-( I'll stay into v7.7.3
> 
> Thanks again,
> Jean-Louis
> 
> -Original Message-
> From: Erick Erickson 
> Sent: Tuesday, May 19, 2020 15:00
> To: solr-user@lucene.apache.org
> Subject: Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6
> 
> This will not work. Lucene has never promised this upgrade path would work, 
> the “one major version back-compat” means that Lucene X has special handling 
> for X-1, but for X-2, all bets are off. Starting with Solr 6, a marker is 
> written into the segments recording the version of Lucene the segment was 
> written with. That marker is preserved through all merges/upgrades/whatever.
> 
> Starting with Lucene 8, if any segment has a marker for Lucene 6 (or no 
> marker at all for earlier versions), then Lucene will refuse to open the 
> index.
> 
> IndexUpgraderTool and the like simply cannot synthesize the new index format, 
> the most succinct explanation I’ve seen is from Robert Muir:
> 
> “I think the key issue here is Lucene is an index not a database. Because it 
> is a lossy index and does not retain all of the user's data, its not possible 
> to safely migrate some things automagically. In the norms case IndexWriter 
> needs to re-analyze the text ("re-index") and compute stats to get back the 
> value, so it can be re-encoded. The function is y = f(x) and if x is not 
> available its not possible, so lucene can't do it.”
> 
> So you’ll have to re-index your corpus with Solr 8 I’m afraid.
> 
> Best,
> Erick
> 
> 
>> On May 19, 2020, at 4:19 AM, VILA Jean-Louis 
>>  wrote:
>> 
>> Dear all,
>> 
>> We start to upgrade a huge SolrCloud cluster from 5.4.1 to lastest version 
>> 8.5.1.
>>   Context :
>> . Ubuntu 16.04, 64b, JVM Oracle 8 101 and now OpenJDK 8 252 . We 
>> can't reindex documents because old ones doesn't exist anymore, so no other 
>> choices than upgrading indexes.
>> 
>> Our upgrading strategy is based on indexUpgrader Tool.
>>   5.4.1 -> 5.5.5 : Ok
>>   5.5.5 -> 6.6.6 : Ok
>>   6.6.6 -> 7.7.3 : ok
>>   Unable to upgrade 7.7.3 to 8.5.1 : here my problem using 
>> 8.5.1, indexUpgrader :
>> 
>> Exception in thread "main" 
>> org.apache.lucene.index.IndexFormatTooOldException: Format version is not 
>> supported (resource 
>> BufferedChecksumIndexInput(MMapIndexInput(path="/data2/solr/nodes/node1/solr/insight_dw_shard3_replica_n69/data/index/segments_2nz0"))):
>>  This index was initially created with Lucene 6.x while the current version 
>> is 8.5.1 and Lucene only supports reading the current and previous major 
>> versions.. This version of Lucene only supports indexes created with release 

Re: Unbalanced shard requests

2020-05-19 Thread Wei
Hi Phill,

Which RAM config are you referring to, the JVM heap size? How is that related
to the load balancing if each node has the same configuration?

Thanks,
Wei

On Mon, May 18, 2020 at 3:07 PM Phill Campbell
 wrote:

> In my previous report I was configured to use as much RAM as possible.
> With that configuration it seemed it was not load balancing.
> So, I reconfigured and redeployed to use 1/4 the RAM. What a difference
> for the better!
>
> 10.156.112.50   load average: 13.52, 10.56, 6.46
> 10.156.116.34   load average: 11.23, 12.35, 9.63
> 10.156.122.13   load average: 10.29, 12.40, 9.69
>
> Very nice.
> My tool that tests records RPS. In the “bad” configuration it was less
> than 1 RPS.
> NOW it is showing 21 RPS.
>
>
> http://10.156.112.50:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes
> <
> http://10.156.112.50:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes
> >
> {
>   "responseHeader":{
> "status":0,
> "QTime":161},
>   "metrics":{
> "solr.core.BTS.shard1.replica_n2":{
>   "QUERY./select.requestTimes":{
> "count":5723,
> "meanRate":6.8163888639859085,
> "1minRate":11.557013215119536,
> "5minRate":8.760356217628159,
> "15minRate":4.707624230995833,
> "min_ms":0.131545,
> "max_ms":388.710848,
> "mean_ms":30.300492048215947,
> "median_ms":6.336654,
> "stddev_ms":51.527164088667035,
> "p75_ms":35.427943,
> "p95_ms":140.025957,
> "p99_ms":230.533099,
> "p999_ms":388.710848
>
>
>
> http://10.156.122.13:10004/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes
> <
> http://10.156.122.13:10004/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes
> >
> {
>   "responseHeader":{
> "status":0,
> "QTime":11},
>   "metrics":{
> "solr.core.BTS.shard2.replica_n8":{
>   "QUERY./select.requestTimes":{
> "count":6469,
> "meanRate":7.502581801189549,
> "1minRate":12.211423085368564,
> "5minRate":9.445681397767322,
> "15minRate":5.216209798637846,
> "min_ms":0.154691,
> "max_ms":701.657394,
> "mean_ms":34.2734699171445,
> "median_ms":5.640378,
> "stddev_ms":62.27649205954566,
> "p75_ms":39.016371,
> "p95_ms":156.997982,
> "p99_ms":288.883028,
> "p999_ms":538.368031
>
>
> http://10.156.116.34:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes
> <
> http://10.156.116.34:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes
> >
> {
>   "responseHeader":{
> "status":0,
> "QTime":67},
>   "metrics":{
> "solr.core.BTS.shard3.replica_n16":{
>   "QUERY./select.requestTimes":{
> "count":7109,
> "meanRate":7.787524673806184,
> "1minRate":11.88519763582083,
> "5minRate":9.893315557386755,
> "15minRate":5.620178363676527,
> "min_ms":0.150887,
> "max_ms":472.826462,
> "mean_ms":32.184282366621204,
> "median_ms":6.977733,
> "stddev_ms":55.729908615189196,
> "p75_ms":36.655011,
> "p95_ms":151.12627,
> "p99_ms":251.440162,
> "p999_ms":472.826462
>
>
> Compare that to the previous report and you can see the improvement.
> So, note to myself. Figure out the sweet spot for RAM usage. Use too much
> and strange behavior is noticed. While using too much all the load focused
> on one box and query times slowed.
> I did not see any OOM errors during any of this.
>
> Regards
>
>
>
> > On May 18, 2020, at 3:23 PM, Phill Campbell wrote:
> >
> > I have been testing 8.5.2 and it looks like the load has moved but is
> > still on one machine.
> >
> > Setup:
> > 3 physical machines.
> > Each machine hosts 8 instances of Solr.
> > Each instance of Solr hosts one replica.
> >
> > Another way to say it:
> > Number of shards = 8. Replication factor = 3.
> >
> > Here is the cluster state. You can see that the leaders are well
> > distributed.
> >
> > {"TEST_COLLECTION":{
> >"pullReplicas":"0",
> >"replicationFactor":"3",
> >"shards":{
> >  "shard1":{
> >"range":"8000-9fff",
> >"state":"active",
> >"replicas":{
> >  "core_node3":{
> >"core":"TEST_COLLECTION_shard1_replica_n1",
> >"base_url":"http://10.156.122.13:10007/solr",
> >"node_name":"10.156.122.13:10007_solr",
> >"state":"active",
> >"type":"NRT",
> >"force_set_state":"false"},
> >  "core_node5":{
> >"core":"TEST_COLLECTION_shard1_replica_n2",
> >"base_url":"http://10.156.112.50:10002/solr",
> >"node_name":"10.156.112.50:10002_solr",
> >"state":"active",
> >"type":"NRT",
> >"force_set_state":"false",
> >"leader":"true"},
> >  "core_node7":{
> >"core":"TEST_COLLECTION_

Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

2020-05-19 Thread Erick Erickson
Jean-Louis:

One explanation is here: 
https://lucene.apache.org/solr/guide/8_5/indexupgrader-tool.html, but then 
again the reference guide is very long, and I’m not sure how to make it more 
findable. Or, for that matter, whether it should be part of the 
IndexUpgraderTool section or not. Please feel free to suggest (even better, 
submit a patch) if you can think of a place it’d be more easily findable. It’s 
always useful to have someone with fresh eyes weigh in.

Optimize won’t work. Under the covers, optimize is just a merge. It uses the 
exact same low-level merging code that background merging uses, including 
preserving the markers in the segment files. That’s why the Lucene devs use 
“forceMerge” rather than “optimize”; the latter is easy to interpret as 
something that does more than it really does.

This is also the same code that IndexUpgraderTool uses, for that matter. 
IndexUpgraderTool is, really, just a forceMerge down to one segment, which is 
all optimize is (assuming you specify maxSegments=1).
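
To make the marker behaviour concrete, here is a toy model in Python. It is 
not Lucene code: Segment, force_merge and can_open are hypothetical names 
invented for this sketch, and the real check in Lucene is more involved. It 
only illustrates the rule described above, namely that a merge rewrites the 
files in the current format but carries the oldest creation marker along.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    name: str
    created_major: int  # the "marker": major version that originally created the data
    written_major: int  # major version that last wrote the segment files


def force_merge(segments: List[Segment], running_major: int) -> List[Segment]:
    """Model of forceMerge/optimize/IndexUpgrader down to one segment.

    The files are rewritten by the running version, but the creation
    marker is preserved: the oldest marker among the inputs wins.
    """
    oldest = min(s.created_major for s in segments)
    return [Segment("merged", created_major=oldest, written_major=running_major)]


def can_open(segments: List[Segment], running_major: int) -> bool:
    """Model of the check: only the current and previous major are accepted."""
    return all(s.created_major >= running_major - 1 for s in segments)


# An index created with 6.x, then "upgraded" with the 7.x tooling.
index = [Segment("_0", 6, 6), Segment("_1", 6, 6)]
index = force_merge(index, running_major=7)

print(index[0].written_major)             # files are now in the 7.x format
print(index[0].created_major)             # but the marker still says 6
print(can_open(index, running_major=7))   # 7 accepts a 6.x marker
print(can_open(index, running_major=8))   # 8 does not
```

In this model no number of merges can raise created_major, which is why 
optimize and IndexUpgraderTool end up in the same place.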

Best,
Erick

> On May 19, 2020, at 11:42 AM, Walter Underwood  wrote:
> 
> Hmm, might be able to hack this with optimize (forced merge).
> 
> First, you would have to add enough extra documents to force a rewrite of all 
> segments. That might be as many documents as are already in the index. You 
> could set a “fake:true” field and filter them out with an fq. Or make sure 
> they have no searchable text.
> 
> After adding all those, run optimize. This should rewrite all the segments in 
> the new format.
> 
> Finally, delete all the extra documents. Might want to do another optimize 
> after that.
> 
> No guarantee that this desperate hack will work.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On May 19, 2020, at 6:21 AM, VILA Jean-Louis 
>>  wrote:
>> 
>> Many thanks for your answers Erik. 
>> 
>> Effectively, I've read this into many different threads that the migration 
>> path will not be guaranteed but, what's strange is that there's no formal 
>> information on this impossibility because clearly we can't migrate to v8 if 
>> indexes are not "pure" v7 indexes. I understand reason (y =f(x)) but al 
>> least a simple documentation about the fact that a Lucene 6 segments can't 
>> be upgrade into Lucene 8 would be appreciate.
>> 
>> More, the check tool just shows v7.7.3 index and there is no mention about 
>> "real" segment version which v6! So forbid to open v7 lucene indexes 
>> upgraded from v6, is quiet brutal and the rule about that we can migrate 
>> only from previous major version is not completely true :-(
>> I'll stay into v7.7.3
>> 
>> Thanks again,
>> Jean-Louis
>> 
>> -Original Message-
>> From: Erick Erickson  
>> Sent: Tuesday, May 19, 2020 15:00
>> To: solr-user@lucene.apache.org
>> Subject: Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6
>> 
>> This will not work. Lucene has never promised this upgrade path would work, 
>> the “one major version back-compat” means that Lucene X has special handling 
>> for X-1, but for X-2, all bets are off. Starting with Solr 6, a marker is 
>> written into the segments recording the version of Lucene the segment was 
>> written with. That marker is preserved through all merges/upgrades/whatever.
>> 
>> Starting with Lucene 8, if any segment has a marker for Lucene 6 (or no 
>> marker at all for earlier versions), then Lucene will refuse to open the 
>> index.
>> 
>> IndexUpgraderTool and the like simply cannot synthesize the new index 
>> format, the most succinct explanation I’ve seen is from Robert Muir:
>> 
>> “I think the key issue here is Lucene is an index not a database. Because it 
>> is a lossy index and does not retain all of the user's data, its not 
>> possible to safely migrate some things automagically. In the norms case 
>> IndexWriter needs to re-analyze the text ("re-index") and compute stats to 
>> get back the value, so it can be re-encoded. The function is y = f(x) and if 
>> x is not available its not possible, so lucene can't do it.”
>> 
>> So you’ll have to re-index your corpus with Solr 8 I’m afraid.
>> 
>> Best,
>> Erick
>> 
>> 
>>> On May 19, 2020, at 4:19 AM, VILA Jean-Louis 
>>>  wrote:
>>> 
>>> Dear all,
>>> 
>>> We start to upgrade a huge SolrCloud cluster from 5.4.1 to lastest version 
>>> 8.5.1.
>>>  Context :
>>> . Ubuntu 16.04, 64b, JVM Oracle 8 101 and now OpenJDK 8 252 . We can't 
>>> reindex documents because old ones doesn't exist anymore, so no other 
>>> choices than upgrading indexes.
>>> 
>>> Our upgrading strategy is based on indexUpgrader Tool.
>>>  5.4.1 -> 5.5.5 : Ok
>>>  5.5.5 -> 6.6.6 : Ok
>>>  6.6.6 -> 7.7.3 : ok
>>>  Unable to upgrade 7.7.3 to 8.5.1 : here my problem using 
>>> 8.5.1, indexUpgrader :
>>> 
>>> Exception in thread "main" 
>>> org.apache.lucene.index.IndexFormatTooOldException: Format version is not 
>>> supported (resource 
>>> BufferedChecksum

Re: Corrupted .cfs file

2020-05-19 Thread Erick Erickson
Usually this is caused by one of
1> the file on disk getting corrupted, i.e. the disk going bad.
2> the disk getting full at some point and writing a partial segment

No, you cannot delete the cfs file and re-index only the documents
that were in it because you have no way of knowing exactly what
those documents are. Segments are merged in the background as
part of normal indexing, so figuring out what docs were in the
segment isn’t really possible. (OK, it’s determinate, but there are
so many variables that it might as well be impossible).

CheckIndex -fix (called -exorcise in recent Lucene versions) will remove the
corrupted segments, leaving holes in your index. You can’t just delete the cfs
file yourself because the segments file, which tells Lucene what segments are
current, references it. But CheckIndex will take care of both parts for you.

If you really can’t re-index everything, you could certainly use a
streaming expression to get a list of all the IDs in the index, compare
that against your DB and only index the difference, but whether that’s
more work than just reindexing anyway I don’t know.

You don’t say whether you’re using SolrCloud or not, but if you are _and_
if you have more than one replica, just DELETEREPLICA on the bad one and
use ADDREPLICA to put it back. It’ll sync with the leader automatically.
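
For the “compare against your DB” idea, the comparison itself is just a set 
difference. A minimal Python sketch, assuming you have already pulled the ID 
lists from both sides (how you fetch them is up to you; plan_resync and the 
sample IDs are made up for illustration):

```python
from typing import Iterable, Set, Tuple


def plan_resync(db_ids: Iterable[str], index_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, orphaned).

    missing  = IDs in the database but not in the index (need indexing)
    orphaned = IDs in the index but no longer in the database (candidates for delete)
    """
    db, idx = set(db_ids), set(index_ids)
    return db - idx, idx - db


# Hypothetical sample data: the index lost doc2 and doc4 with the bad segment.
db_ids = ["doc1", "doc2", "doc3", "doc4"]
index_ids = ["doc1", "doc3", "doc9"]

missing, orphaned = plan_resync(db_ids, index_ids)
print(sorted(missing))   # ['doc2', 'doc4']
print(sorted(orphaned))  # ['doc9']
```

This holds both ID sets in memory, which is fine for millions of short IDs 
but may need a sorted merge instead for very large indexes.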

Best,
Erick

> On May 19, 2020, at 9:33 AM, nettadalet  wrote:
> 
> I get the following exception:
> Caused by: org.apache.lucene.index.CorruptIndexException: length should be
> 104004663 bytes, but is 104856631 instead
> (resource=MMapIndexInput(path="path_to_index\index\_jlp.cfs"))
> 
> What may be the cause of this?
> How can the length of the .cfs file change so it become corrupted?
> Can I simply delete this .cfs file and then synchronized the index against
> the database, so only the missing files will be indexed, instead of
> reindexing all the files?
> 
> Thanks in advance.
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

2020-05-19 Thread VILA Jean-Louis
Erick

I am only suggesting a dedicated page on the upgrade path. Reading the page 
about IndexUpgraderTool, we understand well that we can’t upgrade in one step 
and that 6->7->8 must be done in sequence, but nowhere is it specified that, 
from Lucene 6 on, the segments are marked v6 forever. 
Naively, by transitivity, the upgrade path 6->7->8 seems quite natural. From my 
point of view, we should say “since Lucene 6, a version is compatible with 
indexes created by the previous major version”, not “upgrading”. The term is 
ambiguous.
Things must be made clear. I understand the problem :-)
Jean-Louis

> On May 19, 2020, at 19:03, Erick Erickson wrote:
> 
> Jean-Louis:
> 
> One explication is here: 
> https://lucene.apache.org/solr/guide/8_5/indexupgrader-tool.html,
>  but then again the reference guide is very long, I’m not sure how to make it 
> more findable. Or, for that matter, whether it should be part of the 
> IndexUpgraderTool section or not. Please feel free to suggest (even better, 
> submit a patch) if you can think of a place it’d be more easily findable. 
> It’s always useful to have someone with fresh eyes weigh in.
> 
> Optimize won’t work. Under the covers, optimize is just a merge. It uses the 
> exact same low-level merging code that background merging uses, including 
> preserving the markers in the segment files. That’s why the Lucene devs use 
> “forceMerge” rather than “optimize”, the latter is easy to interpret as 
> something that does more than it really does.
> 
> This is also the same code that IndexUpgraderTool uses too for that matter. 
> IndexUpgraderTool is, really, just a forceMerge down to one segment, which is 
> all optimize is (assuming you specify maxSegments=1).
> 
> Best,
> Erick
> 
>> On May 19, 2020, at 11:42 AM, Walter Underwood  wrote:
>> 
>> Hmm, might be able to hack this with optimize (forced merge).
>> 
>> First, you would have to add enough extra documents to force a rewrite of 
>> all segments. That might be as many documents as are already in the index. 
>> You could set a “fake:true” field and filter them out with an fq. Or make 
>> sure they have no searchable text.
>> 
>> After adding all those, run optimize. This should rewrite all the segments 
>> in the new format.
>> 
>> Finally, delete all the extra documents. Might want to do another optimize 
>> after that.
>> 
>> No guarantee that this desperate hack will work.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On May 19, 2020, at 6:21 AM, VILA Jean-Louis 
>>>  wrote:
>>> 
>>> Many thanks for your answers Erik. 
>>> 
>>> Effectively, I've read this into many different threads that the migration 
>>> path will not be guaranteed but, what's strange is that there's no formal 
>>> information on this impossibility because clearly we can't migrate to v8 if 
>>> indexes are not "pure" v7 indexes. I understand reason (y =f(x)) but al 
>>> least a simple documentation about the fact that a Lucene 6 segments can't 
>>> be upgrade into Lucene 8 would be appreciate.
>>> 
>>> More, the check tool just shows v7.7.3 index and there is no mention about 
>>> "real" segment version which v6! So forbid to open v7 lucene indexes 
>>> upgraded from v6, is quiet brutal and the rule about that we can migrate 
>>> only from previous major version is not completely true :-(
>>> I'll stay into v7.7.3
>>> 
>>> Thanks again,
>>> Jean-Louis
>>> 
>>> -Original Message-
>>> From: Erick Erickson  
>>> Sent: Tuesday, May 19, 2020 15:00
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6
>>> 
>>> This will not work. Lucene has never promised this upgrade path would work, 
>>> the “one major version back-compat” means that Lucene X has special 
>>> handling for X-1, but for X-2, all bets are off. Starting with Solr 6, a 
>>> marker is written into the segments recording the version of Lucene the 
>>> segment was written with. That marker is preserved through all 
>>> merges/upgrades/whatever.
>>> 
>>> Starting with Lucene 8, if any segment has a marker for Lucene 6 (or no 
>>> marker at all for earlier versions), then Lucene will refuse to open the 
>>> index.
>>> 
>>> IndexUpgraderTool and the like simply cannot synthesize the new index 
>>> format, the most succinct explanation I’ve seen is from Robert Muir:
>>> 
>>> “I think the key issue here is Lucene is an index not a database. Because 
>>> i

Getting terms in descending order

2020-05-19 Thread Gajjar, Jigar
Hello,

We have a requirement to get terms in both ascending and descending order.
We are using qt=/terms, but this only gives terms in ascending order when I set 
terms.sort=index.

Is there a way or workaround to get terms in descending order?

We are also providing terms.lower and terms.upper to restrict the data.


Thanks,
Jigar Gajjar


when to use docvalue

2020-05-19 Thread matthew sporleder
I have quite a few numeric / meta-data type fields in my schema and
pretty much only use them in fq=, sort=, and friends.  Should I always
use docValues on these if I never plan to q=search: on them?  Are there
any drawbacks?

Thanks,
Matt


Re: when to use docvalue

2020-05-19 Thread Erick Erickson
Yes. You should also index them….

Here’s the way I think of it. 

A question like “For term X, which docs contain that value?” means 
indexed=true. This is a search.

A question like “Does doc X have value Y in field Z?” means docValues=true.

What’s the difference? Well, the first one is to get the result set. The second 
is for, given a result set, count/sort/whatever.

fq clauses are searches, so indexed=true. 

Sorting, faceting, grouping and function queries are “for each doc in the 
result set, what values does field Y contain?”, so docValues=true.

Maybe that made things clear as mud, but it’s the way I think of it ;)
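
If it helps, here are the two kinds of question as a toy Python sketch. This 
is only a mental model with plain dicts, not how Lucene stores anything; the 
docs, the price field and the variable names are all made up.

```python
from collections import defaultdict

# Three hypothetical documents with a single numeric field.
docs = {1: {"price": 10}, 2: {"price": 25}, 3: {"price": 10}}

# indexed=true: an inverted structure, (field, value) -> set of doc ids.
inverted = defaultdict(set)
for doc_id, fields in docs.items():
    inverted[("price", fields["price"])].add(doc_id)

# docValues=true: a column, doc id -> value.
doc_values = {doc_id: fields["price"] for doc_id, fields in docs.items()}

# "Which docs contain price 10?" is a search, answered by the inverted side.
result_set = sorted(inverted[("price", 10)])
print(result_set)  # [1, 3]

# "For each doc in the result set, what is its price?" is answered by the column.
print([doc_values[d] for d in result_set])  # [10, 10]

# Sorting all docs by price descending only ever needs the column.
by_price_desc = sorted(docs, key=lambda d: doc_values[d], reverse=True)
print(by_price_desc[0])  # 2
```

A field used in both fq= and sort= needs both lookups, hence both settings.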

Best,
Erick




> On May 19, 2020, at 4:00 PM, matthew sporleder  wrote:
> 
> I have quite a few numeric / meta-data type fields in my schema and
> pretty much only use them in fq=, sort=, and friends.  Should I always
> use DocValue on these if i never plan to q=search: on them?  Are there
> any drawbacks?
> 
> Thanks,
> Matt



Re: when to use docvalue

2020-05-19 Thread matthew sporleder
You can use indexed AND docValues together?  For some reason I thought they 
were mutually exclusive.

On Tue, May 19, 2020 at 5:36 PM Erick Erickson  wrote:
>
> Yes. You should also index them….
>
> Here’s the way I think of it.
>
> For questions “For term X, which docs contain that value?” means index=true. 
> This is a search.
>
> For questions “Does doc X have value Y in field Z”, means docValues=true.
>
> what’s the difference? Well, the first one is to get the result set. The 
> second is for, given a result set,
> count/sort/whatever.
>
> fq clauses are searches, so index=true.
>
> sorting, faceting, grouping and function queries  are “for each doc in the 
> result set, what values does field Y contain?”
>
> Maybe that made things clear as mud, but it’s the way I think of it ;)
>
> Best,
> Erick
>
>
>
> fq clauses are searches. Indexed=true is for searching.
>
> sort
>
> > On May 19, 2020, at 4:00 PM, matthew sporleder  wrote:
> >
> > I have quite a few numeric / meta-data type fields in my schema and
> > pretty much only use them in fq=, sort=, and friends.  Should I always
> > use DocValue on these if i never plan to q=search: on them?  Are there
> > any drawbacks?
> >
> > Thanks,
> > Matt
>


Re: Getting terms in descending order

2020-05-19 Thread Erick Erickson
In a word, “no”. The Terms component is intended
to look forward through the terms list.

You could always specify terms.limit=-1 and only display
the last N of the returned list, but the list may be very long.
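
A rough client-side sketch of that workaround in Python. The terms.* 
parameters are the ones already mentioned in this thread; the base URL, the 
field name and the helper names are placeholders, and the flat 
[term, count, term, count, ...] list is an assumption about how your JSON 
response is laid out, so check it against what your handler actually returns.

```python
from urllib.parse import urlencode


def terms_url(base, field, lower=None, upper=None):
    """Build a /terms request that returns every term in index (ascending) order."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.sort": "index",
        "terms.limit": -1,  # no limit: the list may be very long
        "wt": "json",
    }
    if lower is not None:
        params["terms.lower"] = lower
    if upper is not None:
        params["terms.upper"] = upper
    return f"{base}/terms?{urlencode(params)}"


def last_n_descending(flat_terms, n):
    """Turn [term, count, term, count, ...] into the last n (term, count) pairs, reversed."""
    pairs = list(zip(flat_terms[0::2], flat_terms[1::2]))
    return pairs[::-1][:n]


# Hypothetical collection and field names.
url = terms_url("http://localhost:8983/solr/mycollection", "name", lower="a", upper="m")

# Hypothetical response fragment for the field, in ascending index order.
flat = ["apple", 3, "banana", 5, "cherry", 1, "date", 2]
print(last_n_descending(flat, 2))  # [('date', 2), ('cherry', 1)]
```

Narrowing the range with terms.lower and terms.upper keeps the returned list, 
and therefore the client-side work, as small as possible.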

Best,
Erick


> On May 19, 2020, at 3:58 PM, Gajjar, Jigar  wrote:
> 
> Hello,
> 
> We have requirement to get terms using ascending and descending order.
> We are using qt=/terms but this only gives terms in ascending order if I set 
> terms.sort=index
> 
> Is there a way or workound to get terms in descending order.
> 
> We are also providing terms.lower and terms.upper to restrict data.
> 
> 
> Thanks,
> Jigar Gajjar



Re: when to use docvalue

2020-05-19 Thread Erick Erickson
They are _absolutely_ able to be used together. Background:

“In the bad old days”, there were no docValues. So whenever you needed
to facet/sort/group/use function queries, Solr (well, Lucene) had to take
the inverted structure resulting from “indexed=true” and “uninvert” it on the
Java heap.

docValues essentially does the “uninverting” at index time and puts 
that structure in a separate file for each segment. So rather than uninvert
the index on the heap, Lucene can just read it in from disk in MMapDirectory
(i.e. OS) memory space.

The downside is that your index will be bigger when you do both, that is, the 
size on disk will be bigger. But it’ll be much faster to load, much faster to
autowarm, and will move the structures necessary to do faceting/sorting/etc.
into OS memory, which is managed far more efficiently than Java’s
garbage-collected heap.

And frankly I don’t think the increased size on disk is a downside. You’ll have
to have the memory anyway, and having it used on the OS memory space is
so much more efficient than on Java’s heap that it’s a win-win IMO.

Oh, and if you never sort/facet/group/use function queries, then the 
docValues structures are never even read into MMapDirectory space.

So yes, freely do both.
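
For the curious, “uninverting” looks roughly like this as a toy Python 
sketch. It assumes a single-valued field and uses made-up data and names; the 
real Lucene structures are far more compact, but the shape of the work is the 
same: every posting for the field has to be walked once to build the column.

```python
from collections import Counter


def uninvert(inverted, max_doc):
    """Build a doc -> value column from a term -> [doc ids] structure.

    Without docValues this has to happen at search time, on the heap.
    With docValues the equivalent column is written once at index time.
    """
    column = [None] * max_doc
    for term, doc_ids in inverted.items():
        for doc_id in doc_ids:
            column[doc_id] = term
    return column


# Hypothetical inverted structure for a "color" field over four docs.
inverted = {"blue": [0, 3], "green": [2], "red": [1]}

column = uninvert(inverted, max_doc=4)
print(column)  # ['blue', 'red', 'green', 'blue']

# Faceting over a result set is then just a per-doc lookup and a count.
result_set = [0, 1, 3]
facets = Counter(column[d] for d in result_set)
print(facets["blue"], facets["red"])  # 2 1
```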

Best,
Erick


> On May 19, 2020, at 5:41 PM, matthew sporleder  wrote:
> 
> You can index AND docvalue?  For some reason I thought they were exclusive
> 
> On Tue, May 19, 2020 at 5:36 PM Erick Erickson  
> wrote:
>> 
>> Yes. You should also index them….
>> 
>> Here’s the way I think of it.
>> 
>> For questions “For term X, which docs contain that value?” means index=true. 
>> This is a search.
>> 
>> For questions “Does doc X have value Y in field Z”, means docValues=true.
>> 
>> what’s the difference? Well, the first one is to get the result set. The 
>> second is for, given a result set,
>> count/sort/whatever.
>> 
>> fq clauses are searches, so index=true.
>> 
>> sorting, faceting, grouping and function queries  are “for each doc in the 
>> result set, what values does field Y contain?”
>> 
>> Maybe that made things clear as mud, but it’s the way I think of it ;)
>> 
>> Best,
>> Erick
>> 
>> 
>> 
>> fq clauses are searches. Indexed=true is for searching.
>> 
>> sort
>> 
>>> On May 19, 2020, at 4:00 PM, matthew sporleder  wrote:
>>> 
>>> I have quite a few numeric / meta-data type fields in my schema and
>>> pretty much only use them in fq=, sort=, and friends.  Should I always
>>> use DocValue on these if i never plan to q=search: on them?  Are there
>>> any drawbacks?
>>> 
>>> Thanks,
>>> Matt
>> 



Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

2020-05-19 Thread Erick Erickson
Jean-Louis:

One of the great advantages of open source is that it allows people to look at 
a problem with “fresh eyes” and add to the project in a way that helps other 
people who aren’t steeped in the arcana of Lucene/Solr. So it’d be great if you 
could go ahead and make a patch and JIRA to put this information in a place 
that makes the most sense to someone coming in fresh.

And I fully appreciate that “it’s in the reference guide” isn’t adequate; it’s 
over 1,300 pages last I knew. So putting this information somewhere that 
someone like yourself is likely to find it is the best option…

If you create a JIRA and patch, use “@erick” in the comment and I’ll see it and 
we can incorporate the info.

Best,
Erick.

> On May 19, 2020, at 2:57 PM, VILA Jean-Louis 
>  wrote:
> 
> Erick
> 
> I just suggest a dedicated page to upgrade path because reading the page 
> about indexUpgraderTool, we understand well that we can’t upgrade in one 
> phase but 6->7->8 must be made and nowhere it is specified that from Lucene 
> 6, the segments are marked V6 for ever. 
> Naively, by transitivity, the upgrade path 6>7>8 is quiet natural. From my 
> point of view, we must speak about “since Lucene 6, version is compatible 
> previous major version of an index” not upgrading. The term is ambiguous.
> The thinks must be clear, I understand the problem :-)
> Jean louis 
> 
>> On May 19, 2020, at 19:03, Erick Erickson wrote:
>> 
>> Jean-Louis:
>> 
>> One explication is here: 
>> https://lucene.apache.org/solr/guide/8_5/indexupgrader-tool.html,
>>  but then again the reference guide is very long, I’m not sure how to make 
>> it more findable. Or, for that matter, whether it should be part of the 
>> IndexUpgraderTool section or not. Please feel free to suggest (even better, 
>> submit a patch) if you can think of a place it’d be more easily findable. 
>> It’s always useful to have someone with fresh eyes weigh in.
>> 
>> Optimize won’t work. Under the covers, optimize is just a merge. It uses the 
>> exact same low-level merging code that background merging uses, including 
>> preserving the markers in the segment files. That’s why the Lucene devs use 
>> “forceMerge” rather than “optimize”, the latter is easy to interpret as 
>> something that does more than it really does.
>> 
>> This is also the same code that IndexUpgraderTool uses too for that matter. 
>> IndexUpgraderTool is, really, just a forceMerge down to one segment, which 
>> is all optimize is (assuming you specify maxSegments=1).
>> 
>> Best,
>> Erick
>> 
>>> On May 19, 2020, at 11:42 AM, Walter Underwood  
>>> wrote:
>>> 
>>> Hmm, might be able to hack this with optimize (forced merge).
>>> 
>>> First, you would have to add enough extra documents to force a rewrite of 
>>> all segments. That might be as many documents as are already in the index. 
>>> You could set a “fake:true” field and filter them out with an fq. Or make 
>>> sure they have no searchable text.
>>> 
>>> After adding all those, run optimize. This should rewrite all the segments 
>>> in the new format.
>>> 
>>> Finally, delete all the extra documents. Might want to do another optimize 
>>> after that.
>>> 
>>> No guarantee that this desperate hack will work.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>> On May 19, 2020, at 6:21 AM, VILA Jean-Louis 
>>>>  wrote:
>>>> 
>>>> Many thanks for your answers Erik. 
>>>> 
>>>> Effectively, I've read this into many different threads that the migration 
>>>> path will not be guaranteed but, what's strange is that there's no formal 
>>>> information on this impossibility because clearly we can't migrate to v8 
>>>> if indexes are not "pure" v7 indexes. I understand reason (y =f(x)) but al 
>>>> least a simple documentation about the fact that a Lucene 6 segments can't 
>>>> be upgrade into Lucene 8 would be appreciate.
>>>> 
>>>> More, the check tool just shows v7.7.3 index and there is no mention about 
>>>> "real" segment version which v6! So forbid to open v7 lucene indexes 
>>>> upgraded from v6, is quiet brutal and the rule about that we can migrate 
>>>> only from previous major version is not completely true :-(
>>>> I'll stay into v7.7.3
>>>> 
>>>> Thanks again,
>>>> Jean-Louis
>>>> 
>>>> -Original Message-
>>>> From: Erick Erickson  
>>>> Sent: Tuesday, May 19, 2020 15:00
>>>> To: solr-user@lucene.apache.org
>>>> Sub