Partial updates on a collection with router.field lead to duplicate documents

2020-11-06 Thread Zhivko Donev
Hi All,

I believe that this is a bug on Solr's side but want to be sure before filing
a JIRA ticket.
Setup:
Solr Cloud 8.3
Collection with 2 shards, 2 replicas, router = compositeId,
router.field=routerField_s

I am adding a document and then updating it as follows:

{
"id":"1",
"routerField_s":"1"
}
-
/update?_route_=1
[{
"id":"1",
"routerField_s":"1",
"test_s":{"set":"1"}
}]
--
/update?_route_=1
[{
"id":"1",
"routerField_s":"1",
"test_s":{"set":"2"}
}]
--
/update?_route_=1
[{
"id":"1",
"routerField_s":"1",
"test_s":{"set":"3"}
}]

When I query the collection for the document with id:1 and limit = 10, all
seems to be fine. However, if I query with limit 1, the response says
numFound=4 (indicating duplicate documents in the index).
Moreover, if I query the added field test_s for a particular value, I get
matches for all of the updated values: 1, 2 and 3.

If I execute the update without the _route_ param, everything seems to work
properly - can someone confirm this?
The same behaviour can be observed if I have the following for the
routerField_s:
"routerField_s":{"set":"1"}

If I try to update with just the _route_ param and "id" inside the update
body, the request is rejected, stating that "routerField_s" is missing and no
shard can be identified. This seems like expected behaviour.
At a bare minimum, I believe the documentation for updating parts of a
document should be extended with examples of how to handle cases like this.
Ideally, I would expect Solr to reject any request containing both the
_route_ param and a "routerField_s" value, as well as any request that uses
{"set":"value"} on "routerField_s" itself.

And a final question - do I have any other options for fixing the duplicated
index besides the two below? (A rough SolrJ sketch of option 1 follows.)
1. Delete the documents by query "id:{corrupted_id}", then add the document
again
2. Do a full reload into a new collection and switch to using it.
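
For option 1, this is roughly what I have in mind with SolrJ (a minimal
sketch; the ZooKeeper address and the collection name "myCollection" are
placeholders for my setup):

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class FixDuplicates {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("localhost:9983"),
                Optional.empty()).build()) {
            client.setDefaultCollection("myCollection");

            // Delete-by-query is broadcast to every shard, so it also removes
            // the copies that ended up on the wrong shard.
            client.deleteByQuery("id:1");
            client.commit();

            // Re-add the document WITHOUT the _route_ parameter and let the
            // compositeId router place it by routerField_s.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            doc.addField("routerField_s", "1");
            doc.addField("test_s", "3");
            client.add(doc);
            client.commit();
        }
    }
}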

Any thoughts will be much appreciated.


Phrase query gets no hits when stopwords and FlattenGraphFilterFactory are used

2020-11-06 Thread Edward Turner
Hi all,

We are experiencing some unexpected behaviour with phrase queries, which we
believe might be related to the FlattenGraphFilterFactory and stopwords.

Brief description: when performing a phrase query,
"Molecular cloning and evolution of the" => we get the expected hits
"Molecular cloning and evolution of the genes" => we get no hits
(unexpected behaviour)

I think it's worthwhile adding the analyzers we use to help you see what
we're doing:
--- Analyzers ---
[The analyzer XML was stripped by the mail archive. As described below, the
chain includes a StopFilterFactory on both sides and a
FlattenGraphFilterFactory on the index side.]
--- End of Analyzers ---

--- Stopwords ---
We use the following stopwords:
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not,
of, on, or, such, that, the, their, then, there, these, they, this, to,
was, will, with, which
--- End of Stopwords ---

--- Analysis Admin page output ---
To see what's going on when we index and query, I created a gist with an
image of the (non-verbose) output of the analysis admin page, for both index
and query, on the text "Molecular cloning and evolution of the genes":
https://gist.github.com/eddturner/81dbf409703aad402e9009b13d42e43c#file-analysis-admin-png

Hopefully this link works, and you can see that the resulting terms and
positions are identical until the FlattenGraphFilterFactory step in the
"index" phase.

Final stage of index analysis:
(1)molecular (2)cloning (3) (4)evolution (5) (6)genes

Final stage of query analysis:
(1)molecular (2)cloning (3) (4)evolution (5) (6) (7)genes

The empty positions are (presumably) because of stopwords. If so, the
position difference ("genes" at 6 on the index side vs. 7 on the query side)
would explain why the exact phrase no longer matches.
--- End of Analysis Admin page output ---

Main question:
Could someone explain why the FlattenGraphFilterFactory changes the
position of the "genes" token? From what we see, this happens after a "the"
(but we've not checked exhaustively, and we continue to test).

Perhaps, we are doing something wrong in our analysis setup?

Any help would be much appreciated -- getting phrase queries to work is an
important use-case of ours.

Kind regards and thank you in advance,
Edd

Edward Turner


Can't connect to Solr with JDBC URL

2020-11-06 Thread Vincent Bossuet
Hi all :)

I'm trying to connect to Solr with JDBC, but I always get
"java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
localhost:9983/ within 15000 ms" (or another port, depending on which JDBC
URL I test).

Here is what I did:

   - I installed Solr 7.7.2 (following the install doc), i.e. download,
   extract, start without options (bin/solr start). This is the version of
   Solr I have at work, so I installed the same one to test on localhost
   first.

   - I added a 'test' collection and the example XML documents; I can see
   them at the query URL.

   - Then I installed DbVisualizer and added the driver and a connection, as
   explained in its documentation. The only differences I noticed: in the
   screenshot of the jars to import, the versions are different, and there is
   one extra jar in the Solr archive (commons-math3-3.6.1.jar). Also, the
   JDBC URL is shown both with and without a '/' in the middle (i.e.
   jdbc:solr://localhost:9983?collection=test or
   jdbc:solr://localhost:9983/?collection=test). I don't know if that is
   important...

   - I tried this on both an Ubuntu VM and Windows 10.

So, everything seems to be installed correctly, as in the documentation, but
when I click on 'connect' I always get a timeout. Every website where I found
some info talks about a URL with port 9983; I tried other possibilities (just
in case) but with no success:

   - jdbc:solr://localhost:9983?collection=test
   - jdbc:solr://127.0.0.1:9983?collection=test
   - jdbc:solr://localhost:9983/?collection=test
   - jdbc:solr://localhost:9983/solr?collection=test
   - jdbc:solr://localhost:8983/?collection=test
   - jdbc:solr://localhost:8983?collection=test
   - jdbc:solr://localhost:8983/solr?collection=test
   - jdbc:solr://localhost:2181?collection=test
   - jdbc:solr://localhost:2181/?collection=test
   - jdbc:solr://localhost:2181/solr?collection=test
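
In case it helps, here is the kind of minimal standalone Java test I can run
to take DbVisualizer out of the picture (a sketch assuming solr-solrj and its
dependencies are on the classpath and the 'test' collection exists):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrJdbcTest {
    public static void main(String[] args) throws Exception {
        // The Solr JDBC driver talks to ZooKeeper, not to Solr's HTTP port;
        // 9983 is the embedded ZooKeeper that "bin/solr start -c" (cloud
        // mode) brings up.
        String url = "jdbc:solr://localhost:9983?collection=test";

        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id FROM test LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("id"));
            }
        }
    }
}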

If you have an idea, thanks for your help!

Vincent


JVM Memory Issue with Solr 8.7.0

2020-11-06 Thread Thomas Heldmann
Dear All,

After the release of Solr 8.7.0, I wanted to test the new version on my
notebook. It has the following specifications: Windows 10 64-bit, 16 GB RAM,
Amazon Corretto 11 64-bit, 50 GB free disk space. I downloaded solr-8.7.0.zip
and unzipped it into a local folder. In order to start Solr in cloud mode and
to use the Blob Store API, I start it with the following command:

C:\Users\...\SolrCloud\solr-8.7.0\bin>solr start -cloud
-Denable.runtime.lib=true

So far everything works fine, I am able to access the Solr GUI via
http://localhost:8983/solr and the JVM Memory usage is about 200 MB.

Since the configset which I want to load into Solr requires a big jar file
with synonym files as well as commons-lang-2.6.jar, I created a folder
C:\Users\...\SolrCloud\solr-8.7.0\server\solr\lib and copied these two jar
files into it. Then I uploaded the configset to ZooKeeper using the following
command:

solr zk upconfig -d ... -z localhost:9983 -n ...

Now I create the collection via the Solr GUI. In earlier Solr versions, JVM
memory usage increased for a few seconds after creating the collection, then
decreased again, and no Java errors occurred. But with Solr 8.7.0, Solr uses
the entire JVM heap it has by default (512 MB), the browser hangs, my
notebook becomes extremely slow, and on the Windows command line I get a
java.lang.OutOfMemoryError. My first thought was that 512 MB of heap might be
too little, so I stopped Solr, activated the "set SOLR_JAVA_MEM" line in the
bin\solr.in.cmd file, set -Xmx to 1024m and restarted Solr. But Solr again
claimed the entire JVM heap. I increased -Xmx further, but that did not help
either.

From the CHANGES.txt I learned that a Circuit Breaker infrastructure and a
JVM heap usage tracking circuit breaker implementation were introduced with
Solr 8.7.0. I am not using a circuit breaker in my solrconfig.xml. Is it
possible that the issue described above occurs because I am not using a
circuit breaker? If this is not the case, has anything else changed from Solr
8.6.3 to Solr 8.7.0 that might cause this issue? Or is there a problem with
Solr on Windows 10 or with Amazon Corretto?

As I already said, the procedure described above has worked well for all
Solr versions since 6.6.1, with no java.lang.OutOfMemoryError after creating
the collection.

Best regards,
Thomas Heldmann