Re: Vector Scoring Plugin for Solr : Dot Product and Cosine Similarity

2020-06-22 Thread Vincenzo D'Amore
Hi Edward,

thank you for the detailed description. I'm going through it to understand
what to do next.
The "rough" survey written by Trey Grainger is really valuable. I'm using
it as guidance because, starting today, I have a week or two to spend
developing this, and the result could be useful to the community.
I'm also evaluating https://github.com/o19s/hangry
But given the lack of time (in the end there are fewer than ten days), I need to
concentrate my efforts on only one project.
Again, any further suggestions are appreciated.

On Sat, Jun 20, 2020 at 2:34 PM Edward Ribeiro 
wrote:

> Hi Vincenzo,
>
> The vector search support in Solr is a work-in-progress with a lot of
> discussions scattered among some JIRA issues.
>
> Start here:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-12890
>
> Other discussions (some outdated):
>
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-13500
>
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-14397
>
> Even though cosine similarity can be used to check similarity among
> vectors, this solution is not scalable. The state of the art approach for a
> large number of vectors has been to use Approximate Nearest Neighbors (ANN)
> algorithms like HNSW graph. So, the related support to ANN algorithms in
> Lucene has also been discussed:
>
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-9004
>
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-9136
>
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-9322
>
>
> Best,
> Edward
>
> On Fri, Jun 19, 2020 at 21:02, Vincenzo D'Amore wrote:
>
> > Hi all,
> >
> > I've started to look at image similarity. For each image in my documents I
> > have a couple of vectors that represent color and shape similarity.
> >
> > As a starting point, a colleague suggested this project:
> >
> > https://github.com/moshebla/solr-vector-scoring
> >
> > but the project seems old and unmaintained. On the other hand, looking
> > at the Solr documentation I see there is support for dot product and
> > cosine similarity:
> >
> > https://lucene.apache.org/solr/guide/7_5/vector-math.html#dot-product-and-cosine-similarity
> >
> > So Solr can calculate the similarity between two vectors.
> > But it is not clear how to use this native feature when searching. Am I
> > missing something? Any help or suggestion would be appreciated.
> >
> > Best regards,
> > Vincenzo
> >
> >
> > --
> > Vincenzo D'Amore
> >
>


-- 
Vincenzo D'Amore
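
For reference, the two scores discussed in this thread, and the reason scoring every stored vector does not scale, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, document ids and vectors are made up, and it does not show how the solr-vector-scoring plugin or Solr's vector-math functions are implemented.

```python
import math

def dot_product(a, b):
    """Sum of the element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector norms."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)

def brute_force_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the best k.

    `vectors` maps a (hypothetical) document id to its vector. The work
    grows linearly with the number of stored vectors, which is the
    scalability concern that ANN indexes such as HNSW try to address.
    """
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Every query touches every vector here, so the cost per query is proportional to the collection size times the vector dimension.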


solr.common.SolrException: Unable to create core

2020-06-22 Thread Chris Larsson
Trying to track down an issue I am seeing with Solr 8.5.1 running on CentOS
8.2 and Java 14.0.1 (OpenJDK). My test system was running fine until I
updated the OS packages and rebooted, at which point it started throwing
the following error:

2020-06-19 20:23:37.877 INFO  (main) [   ] o.e.j.s.Server Started @9695ms
2020-06-19 20:23:37.928 INFO  (coreLoadExecutor-9-thread-1) [   x:testcore]
o.a.s.s.IndexSchema [testcore] Schema name=testcore core
2020-06-19 20:23:38.597 INFO  (coreLoadExecutor-9-thread-1) [   x:testcore]
o.a.s.s.IndexSchema Loaded schema testcore core/1.1 with uniqueid field Urn
2020-06-19 20:23:38.700 INFO  (coreLoadExecutor-9-thread-1) [   x:testcore]
o.a.s.c.CoreContainer Creating SolrCore 'testcore' using configuration from
instancedir /opt/solr/server/solr/cores/testcore, trusted=true
2020-06-19 20:23:38.749 INFO  (coreLoadExecutor-9-thread-1) [   x:testcore]
o.a.s.m.r.SolrJmxReporter JMX monitoring for 'solr.core.testcore' (registry
'solr.core.testcore') enabled at server:
com.sun.jmx.mbeanserver.JmxMBeanServer@202b0582
2020-06-19 20:23:38.769 INFO  (coreLoadExecutor-9-thread-1) [   x:testcore]
o.a.s.c.SolrCore [[testcore] ] Opening new SolrCore at
[/opt/solr/server/solr/cores/testcore],
dataDir=[/opt/solr/server/solr/cores/testcore/data/]
2020-06-19 20:23:39.124 INFO  (coreLoadExecutor-9-thread-1) [   x:testcore]
o.a.s.c.SolrCore [testcore]  CLOSING SolrCore
org.apache.solr.core.SolrCore@7ef79721
2020-06-19 20:23:39.124 INFO  (coreLoadExecutor-9-thread-1) [   x:testcore]
o.a.s.m.SolrMetricManager Closing metric reporters for
registry=solr.core.testcore, tag=SolrCore@7ef79721
2020-06-19 20:23:39.125 INFO  (coreLoadExecutor-9-thread-1) [   x:testcore]
o.a.s.m.r.SolrJmxReporter Closing reporter
[org.apache.solr.metrics.reporters.SolrJmxReporter@41d9b174: rootName =
null, domain = solr.core.testcore, service url = null, agent id = null] for
registry solr.core.testcore / com.codahale.metrics.MetricRegistry@59cd1690
2020-06-19 20:23:39.133 ERROR (coreContainerWorkExecutor-2-thread-1) [   ]
o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup =>
org.apache.solr.common.SolrException: Unable to create core [testcore]
at
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1327)
org.apache.solr.common.SolrException: Unable to create core [testcore]
at
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1327)
~[solr-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 -
ivera - 2020-04-08 09:01:41]
at
org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:802)
~[solr-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 -
ivera - 2020-04-08 09:01:41]
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202)
~[metrics-core-4.1.2.jar:4.1.2]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
~[solr-solrj-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 -
ivera - 2020-04-08 09:01:44]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
~[?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.apache.solr.common.SolrException: Bad or unsupported
pattern: java.time.format.DateTimeFormatter$ClassicFormat@4e663e55
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1072)
~[solr-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 -
ivera - 2020-04-08 09:01:41]
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:901)
~[solr-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 -
ivera - 2020-04-08 09:01:41]
at
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1306)
~[solr-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 -
ivera - 2020-04-08 09:01:41]
... 7 more
Caused by: org.apache.solr.common.SolrException: Bad or unsupported
pattern: java.time.format.DateTimeFormatter$ClassicFormat@4e663e55
at
org.apache.solr.update.processor.ParseDateFieldUpdateProcessorFactory.validateFormatter(ParseDateFieldUpdateProcessorFactory.java:217)
~[solr-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 -
ivera - 2020-04-08 09:01:41]
at
org.apache.solr.update.processor.ParseDateFieldUpdateProcessorFactory.init(ParseDateFieldUpdateProcessorFactory.java:189)
~[solr-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 -
ivera - 2020-04-08 09:01:41]
at org.apache.solr.core.PluginBag.initInstance(PluginBag.java:106)
~[solr-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 -
ivera - 2020-04-08 09:01:41]

After tracking down the RPM package likely to cause this issue it was
determined that updating tzdata-java fro

Simulate "this IndexReader is closed" ?

2020-06-22 Thread Richard Goodman
Hi there,

I've spent time integrating the Solr Prometheus exporter into our Solr
environment. While doing this, I came across an issue: when fetching the
core-level metrics, I was getting exceptions.

Digging into this further, I realised the problem is actually on the Solr side,
in particular in the metrics that come from the following group within
each core:

SEARCHER.*


An example of the output I was getting:

{
  "responseHeader":{
"status":500,
"QTime":44},
  "error":{
"msg":"this IndexReader is closed",
"trace":"org.apache.lucene.store.AlreadyClosedException: this
IndexReader is closed\n\tat
org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:257)\n\tat
org.apache.lucene.index.StandardDirectoryReader.getVersion(StandardDirectoryReader.java:339)\n\tat
org.apache.lucene.index.FilterDirectoryReader.getVersion(FilterDirectoryReader.java:127)\n\tat
org.apache.lucene.index.FilterDirectoryReader.getVersion(FilterDirectoryReader.java:127)\n\tat
org.apache.solr.search.SolrIndexSearcher.lambda$initializeMetrics$13(SolrIndexSearcher.java:2268)


I changed my metric calls to include "&regex=^(?!SEARCHER).*" and the
results came through *(minus the SEARCHER metrics)*.
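
To make the effect of that pattern concrete, here is a small Python sketch of what the negative lookahead keeps and drops, and how the parameter looks once URL-encoded. The sample metric names and the `group=core` parameter are assumptions for illustration; Solr evaluates the pattern with Java's regex engine, which treats this particular expression the same way.

```python
import re
from urllib.parse import urlencode

# The pattern from the message: match any name that does not start with SEARCHER.
PATTERN = r"^(?!SEARCHER).*"

def keep_metric(name):
    """Return True if the metric name passes the negative-lookahead filter."""
    return re.match(PATTERN, name) is not None

# Illustrative metric names only; a real core reports many more.
sample_names = [
    "SEARCHER.searcher.numDocs",
    "QUERY./select.requests",
    "INDEX.sizeInBytes",
]
kept = [name for name in sample_names if keep_metric(name)]

# URL-encoded query string for the metrics endpoint (group=core is assumed).
query_string = urlencode({"group": "core", "regex": PATTERN})
```

Encoding the pattern avoids characters such as `^`, `(` and `?` being misread when the URL is built by hand.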

This was enough to unblock me from getting the rest of the metrics.
However, I want to revisit this and see what I can do because, from my point of
view, this is a bug in Solr: it breaks the entire metrics API *(for
example, if you just hit /solr/admin/metrics while the IndexReader is closed,
it returns this message and no metrics are collected or displayed)*.

My problem is that I'm not entirely sure how to replicate this error, and I was
hoping to find some guidance. I saw that the file
"org/apache/solr/search/SolrIndexSearcher.java" contains these metrics, but
I got a bit lost at first glance.

I would be really grateful if anyone has information that could help me:
1. Replicate the issue
2. Understand what exactly it means when the IndexReader is closed

Kind Regards,
Richard Goodman


Index file on Windows fileshare..

2020-06-22 Thread Fiz N
Hello Solr experts,

I am using a standalone installation of Solr 8.5 on a Windows machine.

1) I want to index all types of files under different directories in the
file share.

2) I need to index the absolute path of each file and store it in a Solr field.
I need that information so that the end user can click and open the file (pop-up).

Could you please tell me how to go about this?
This is for a POC; once we finalize the solution, we will move
ahead with a stable approach.

Thanks
Fiz Nadian.


Re: Index file on Windows fileshare..

2020-06-22 Thread Erick Erickson
Consider running Tika in a client and indexing the docs to Solr. 
At that point, you have total control over what’s indexed.

Here’s a skeletal program to get you started:
https://lucidworks.com/post/indexing-with-solrj/

Best,
Erick
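
As a rough illustration of the client-side approach, the sketch below walks a directory tree and builds one document per file with its absolute path. It is a minimal sketch under assumptions: the field names `path_s` and `file_name_s` are hypothetical (they rely on a `*_s` dynamic string field being defined in the schema), text extraction with Tika is left out, and sending the documents to Solr is not shown.

```python
import os

def build_path_docs(root_dir):
    """Walk root_dir and return one dict per file, keyed by absolute path."""
    docs = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            abs_path = os.path.abspath(os.path.join(dirpath, name))
            docs.append({
                "id": abs_path,          # the absolute path doubles as the unique key
                "path_s": abs_path,      # hypothetical stored field for the click-to-open link
                "file_name_s": name,     # hypothetical field for display
            })
    return docs
```

Because the client builds each document, it decides exactly which fields are stored, which is the control the reply above refers to.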

> On Jun 22, 2020, at 1:21 PM, Fiz N  wrote:
> 
> Hello Solr experts,
> 
> I am using a standalone installation of Solr 8.5 on a Windows machine.
> 
> 1) I want to index all types of files under different directories in the
> file share.
> 
> 2) I need to index the absolute path of each file and store it in a Solr field.
> I need that information so that the end user can click and open the file (pop-up).
> 
> Could you please tell me how to go about this?
> This is for a POC; once we finalize the solution, we will move
> ahead with a stable approach.
> 
> Thanks
> Fiz Nadian.



Almost all nodes in SolrCloud died suddenly

2020-06-22 Thread Tran Van Hoan

Dear all,

I have a SolrCloud 8.2.0 cluster with 6 instances on 6 servers (64 GB RAM each); each
instance has Xmx = Xms = 30 GB.

Today most nodes in the cluster went down twice: at 8:00 AM (5 of 6 nodes
down) and at 1:00 PM (2 of 6 nodes down). Yesterday, one node went down.
Most metrics did not increase much, except the thread count.

Performance one week ago: [charts not included]

Performance 12 hours ago: [charts not included]

In the admin UI, some nodes showed as dead and some took too long to respond. The
log files contain a very large number of entries (log level WARN). Here are the logs
that appear across the cluster:

Logs before servers 4 and 6 went down

- Server 4 before it died:

   + o.a.s.h.RequestHandlerBase java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 12/12 ms

   + org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://server6:8983/solr/mycollection_shard3_replica_n5/select
    at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:406)
    at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274)
    at org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
    at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
Caused by: java.util.concurrent.TimeoutException
    at org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
    at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:397)
    ... 12 more

 

+ o.a.s.s.HttpSolrCall invalid return code: -1

+ o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp: 1592803662746 
, received timestamp: 1592803796152 , TTL: 12  

+ o.a.s.s.PKIAuthenticationPlugin Decryption failed , key must be wrong => 
java.security.InvalidKeyException: No installed provider supports this key: 
(null)

+  o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling 
SolrCmdDistributor$Req: cmd=delete{,commitWithin=-1}; node=ForwardNode: 
http://server6:8983/solr/mycollection_shard3_replica_n5/ to 
http://server6:8983/solr/mycollection_shard3_replica_n5/ => 
java.util.concurrent.TimeoutException

+ o.a.s.s.HttpSolrCall 
null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
 Async exception during distributed update: null

 

Server 2: 

 + Max requests queued per destination 3000 exceeded for 
HttpDestination[http://server4:8983]@7d7ec93c,queue=3000,pool=MultiplexConnectionPool@73b938e3[c=4/4,b=4,m=0,i=0]

 +  Max requests queued per destination 3000 exceeded for 
HttpDestination[http://server5:8983]@7d7ec93c,queue=3000,pool=MultiplexConnectionPool@73b938e3[c=4/4,b=4,m=0,i=0]

 

+ Timeout occured while waiting response from server at: 
http://server4:8983/solr/mycollection_shard6_replica_n23/select

+ Timeout occured while waiting response from server at: 
http://server6:8983/solr/mycollection_shard2_replica_n15/select

+   o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: 
org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server at: null

Caused by: org.apache.solr.client.solrj.SolrServerException: IOException 
occured when talking to server at: null

Caused by: java.nio.channels.ClosedChannelException

 

Server 6:

 + org.apache.solr.client.solrj.SolrServerException: Timeout occured while 
waiting response from server at: 
http://server6:8983/solr/mycollection_shard2_replica_n15/select

 + + org.apache.solr.client.solrj.SolrServerException: Timeout occured while 
waiting response from server at: Timeout occured while waiting response from 
server at: http://server4:8983/mycollection_shard6_replica_n23/select

 

I tried searching Google but did not find any clue :( Could you help me find
the cause? Thank you!
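
One detail that can be read directly from the PKIAuthenticationPlugin warning in the logs is the gap between the two timestamps it reports. The short Python sketch below only does that arithmetic; whether the gap comes from requests being delayed or from clock differences between servers cannot be told from the log alone, so treat any interpretation as an assumption.

```python
from datetime import datetime, timezone

# Values copied from the "Invalid key request timestamp" log line (milliseconds since epoch).
request_ts_ms = 1592803662746
received_ts_ms = 1592803796152

# Time between the request being stamped and being received by the other node.
gap_ms = received_ts_ms - request_ts_ms

def to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

print("gap: %d ms (about %.0f s)" % (gap_ms, gap_ms / 1000))
print("request stamped at:", to_utc(request_ts_ms).isoformat())
print("request received at:", to_utc(received_ts_ms).isoformat())
```

A gap of roughly two minutes between the two values is worth comparing against the timeouts and thread counts seen at the same moment on both servers.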


 



Sorting in other collection in Solr 8.5.1

2020-06-22 Thread vishal patel
Hi

I am upgrading to Solr 8.5.1. I have created 2 shards, each with one replica.
I have created 2 collections: form and actionscomment. Form-related data is
stored in the form collection, and the actions on those forms are stored
in the actionscomment collection.
There are 10 lakh (1 million) documents in form and 50 lakh (5 million) documents in
actionscomment.

form schema.xml






actionscomment schema.xml










We build the form listing from the form and actionscomment collections. The
form listing page shows only 250 records. Its columns are
id, title, form created date and action names. The id, title and form created
date come from the form collection, and the action names come from the
actionscomment collection. We want to offer sorting on all
columns. Sorting by id, title and form created date is easy because they are in
the same collection.

For sorting by action name, I execute 2 queries. First I query the
actionscomment collection sorted by the title field and get the list of form_ids;
then I query the form collection with those form_ids. But the results do not come
back in the proper order. Sometimes I get so many form ids that the second query
becomes very long.
How can I get data from the form collection in the same order as the form id list
returned from actionscomment?
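
One possible client-side workaround, sketched here in Python under assumptions, is to fetch the form documents in any order and then re-order them in the application using the form_id list from the first query. The helper names and the `id` field are illustrative, and the `chunked` helper only shows one way to keep each second-stage query short.

```python
def reorder_by_ids(ordered_form_ids, form_docs, id_field="id"):
    """Return form_docs in the order given by ordered_form_ids.

    A form can have several actions, so the id list may repeat a form_id;
    only the first occurrence decides the position. Ids with no matching
    document are skipped.
    """
    by_id = {doc[id_field]: doc for doc in form_docs}
    seen = set()
    result = []
    for form_id in ordered_form_ids:
        if form_id in by_id and form_id not in seen:
            seen.add(form_id)
            result.append(by_id[form_id])
    return result

def chunked(ids, size):
    """Yield successive slices of ids so each follow-up query stays short."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

With this approach the second query needs no sort clause at all; the order is imposed after the documents come back.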

Regards,
Vishal Patel



Re: Query takes more time in Solr 8.5.1 compare to 6.1.0 version

2020-06-22 Thread vishal patel
Is there any other option?

From: Mikhail Khludnev 
Sent: Sunday, May 24, 2020 3:24 AM
To: solr-user 
Subject: Re: Query takes more time in Solr 8.5.1 compare to 6.1.0 version

Unfortunately {!terms} doesn't let one ^boost terms.
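
For readers following along, a terms-parser clause for a long id list can be assembled as in the Python sketch below. It is an illustration only: the ids are made up, `msg_id` is the field named later in this thread, and placing the clause in `fq` is an assumption that fits the point above, since a filter does not contribute to scoring and so cannot carry boosts.

```python
from urllib.parse import urlencode

def terms_filter(field, values):
    """Build a {!terms} clause: a comma-separated list of exact values for one field."""
    return "{!terms f=%s}%s" % (field, ",".join(str(v) for v in values))

# Hypothetical ids; the real query in this thread has around 3000 of them.
clause = terms_filter("msg_id", [101, 102, 103])

# URL-encoded parameters for a select request, with the clause as a filter query.
params = urlencode({"q": "*:*", "fq": clause})
```

If per-id boosts are required, this form does not help, which is the limitation noted above.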

On Sat, May 23, 2020 at 10:13 AM vishal patel 
wrote:

> Hi Jason
>
> Thanks for the reply.
>
> I have checked Jay's query using the "terms" query parser and it is really
> helpful to us. When executed with the "terms" query parser, it returns within
> 500 milliseconds even though grouping is applied.
> Jay's query:
> https://drive.google.com/file/d/1bavCqwHfJxoKHFzdOEt-mSG8N0fCHE-w/view
>
> I want to apply the same thing in my query, but a boost is applied to my
> field "msg_id". Grouping is also used in my query.
> I am also upgrading to Solr 8.5.1.
>
>
> My query is:
> https://drive.google.com/file/d/1Op_Ja292Bcnv0Ijxw6VdAxvGlfsdczmS/view
>
> The above query takes 30 seconds. How can I use the "terms" query parser
> in my query?
>
> Regards,
> Vishal Patel
> 
> From: Jason Gerlowski 
> Sent: Friday, May 22, 2020 2:59 AM
> To: solr-user@lucene.apache.org 
> Subject: Re: Query takes more time in Solr 8.5.1 compare to 6.1.0 version
>
> Hi Jay,
>
> I can't speak to why you're seeing a performance change between 6.x
> and 8.x.  What I can suggest though is an alternative way of
> formulating the query: you might get different performance if you run
> your query using Solr's "terms" query parser:
>
> https://lucene.apache.org/solr/guide/8_5/other-parsers.html#terms-query-parser
>  It's not guaranteed to help, but there's a chance it'll work for you.
> And knowing whether or not it helps might point others here towards
> the cause of your slowdown.
>
> Even if "terms" performs better for you, it's probably worth
> understanding what's going on here of course.
>
> Are all other queries running comparably?
>
> Jason
>
> On Thu, May 21, 2020 at 10:25 AM jay harkhani 
> wrote:
> >
> > Hello,
> >
> > Please see the details below.
> >
> > >Did you create Solrconfig.xml for the collection from scratch after
> > >upgrading and reindexing?
> > Yes, we created the collection from scratch and also reindexed.
> >
> > >Was it based on the latest template?
> > Yes, it was based on the latest template.
> >
> > >What happens if you reexecute the query?
> > No visible difference, only a minor change of a few milliseconds.
> >
> > >Are there other processes/containers running on the same VM?
> > No
> >
> > >How much heap and how much total memory do you have?
> > My heap and total memory are the same as with Solr 6.1.0: 5 GB of heap and
> > 25 GB of total memory. In my view there is no memory-related issue.
> >
> > >Maybe you also need to increase the corresponding caches in the config.
> > We are not using caches in either version.
> >
> > Both versions have the same configuration.
> >
> > Regards,
> > Jay Harkhani.
> >
> > 
> > From: Jörn Franke 
> > Sent: Thursday, May 21, 2020 7:05 PM
> > To: solr-user@lucene.apache.org 
> > Subject: Re: Query takes more time in Solr 8.5.1 compare to 6.1.0 version
> >
> > Did you create Solrconfig.xml for the collection from scratch after
> > upgrading and reindexing? Was it based on the latest template?
> > If not, then please try this. Maybe you also need to increase the
> > corresponding caches in the config.
> >
> > What happens if you reexecute the query?
> >
> > Are there other processes/containers running on the same VM?
> >
> > How much heap and how much total memory do you have? You should only have a
> > minor fraction of the memory as heap and most of it "free" (this means it
> > is used for file caches).
> >
> >
> >
> > > On 21.05.2020 at 15:24, vishal patel <vishalpatel200...@outlook.com> wrote:
> > >
> > > Is anyone looking into this issue?
> > > I have the same issue.
> > >
> > > Regards,
> > > Vishal Patel
> > >
> > >
> > >
> > > 
> > > From: jay harkhani 
> > > Sent: Wednesday, May 20, 2020 7:39 PM
> > > To: solr-user@lucene.apache.org 
> > > Subject: Query takes more time in Solr 8.5.1 compare to 6.1.0 version
> > >
> > > Hello,
> > >
> > > I recently upgraded Solr from 6.1.0 to 8.5.1 and came across
> > > an issue. A query that has many ids (around 3000) and applies grouping
> > > takes much longer to execute: 677 ms in Solr 6.1.0 versus 26090 ms in
> > > Solr 8.5.1. For these measurements we had the same Solr schema and the
> > > same number of records in both Solr versions.
> > >
> > > Please see the details below for the query, logs and thread dump (generated
> > > from Solr Admin while executing the query).
> > >
> > > Query:
> > > https://drive.google.com/file/d/1bavCqwHfJxoKHFzdOEt-mSG8N0fCHE-w/view
> > >
> > > Logs and thread dump stack trace
> > > Solr 8.5.1:
> > > https://drive.google.com/file/d/149IgaMdLomTjkngKHrwd80OSEa1eJbBF/view
> > > Solr 6.1.0:
> > > https://drive.google.com/file/d/13v1u__fM8nHfyvA0Mnj30IhdffW6xhwQ/view
> > >
> > > To analyse further more w