Re: Backups with only 1 machine having access to remote storage?

2020-02-21 Thread Koen De Groote
Hello Aroop,

I am doing this via the commands described here:
https://lucene.apache.org/solr/guide/7_6/making-and-restoring-backups.html
The setup is using solr cloud. The backup is written to an NFS mount.
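Concretely, the call is the Collections API BACKUP action, along these lines (host, collection, backup name, and mount path here are placeholders, not my real values):

```shell
# Sketch of the BACKUP call; all names/paths below are placeholders.
SOLR_HOST="http://localhost:8983"
COLLECTION="mycollection"
BACKUP_NAME="nightly"
BACKUP_LOCATION="/mnt/nfs/solr-backups"   # must be visible at this same path on EVERY node

BACKUP_URL="${SOLR_HOST}/solr/admin/collections?action=BACKUP&name=${BACKUP_NAME}&collection=${COLLECTION}&location=${BACKUP_LOCATION}"
echo "${BACKUP_URL}"
# The actual call would be: curl "${BACKUP_URL}"
```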

I now see the text: "SolrCloud Backup/Restore requires a shared file system
mounted at the same path on all nodes, or HDFS."

Honestly, I must have missed it. I do not recall it being there before.

I guess that answers my question.

Thank you for contacting me back about this anyway.

Kind regards,
Koen De Groote




On Fri, Feb 21, 2020 at 12:43 AM Aroop Ganguly
 wrote:

> Hi Koen
>
> Which backup mechanism are you using ?
> HDFS backup setup is a lot more sophisticated, and backup repository
> settings made in the solr.xml manage lots of these things.
> The node from which you issue the command has no bearing on the
> target collection's data that you are trying to back up.
> Backup will reach the designated destination, with all the data from your
> collection.
>
> That's why knowing your setup and settings for backup would help in
> advising you better.
>
> Thanks
> Aroop
>
> > On Feb 20, 2020, at 8:25 AM, Koen De Groote 
> wrote:
> >
> > Hello all,
> >
> > I've recently set up backups, using solr 7.6
> >
> > My setup has 3 replicas per collection and several collections. Not all
> > collections or replicas are present on all hosts.
> >
> > That being said, I run the backup command from 1 particular host and only
> > that host has access to the mount on which the backup data will be
> written.
> >
> > This means that the host writing the backup data doesn't have all the
> data
> > on its local filesystem.
> >
> > Is this a problem?
> >
> > By which I mean: will data not present on that host be retrieved over the
> > network?
> >
> > What happens in this case?
> >
> > Kind regards,
> > Koen De Groote
>
>


Re: Backups with only 1 machine having access to remote storage?

2020-02-21 Thread Koen De Groote
Hello Houston,

Indeed, upon reading the documentation again, I now see this text, which I
must have missed before: SolrCloud Backup/Restore requires a shared file
system mounted at the same path on all nodes, or HDFS.

My bad. Though I think that text could stand to be even more prominent.

Thanks for contacting me about this.

Kind regards,
Koen De Groote



On Thu, Feb 20, 2020 at 7:04 PM Houston Putman 
wrote:

> From my experience, you need all nodes to have access to the shared
> storage.
> Solr will pick which nodes should write each shard's data, and you do not
> have a lot of control over which nodes are selected.
> This is why in the documentation it says that the backup must be written to
> NFS or HDFS.
> Solr won't try to retrieve the other replicas over the network.
>
> I think you will actually get an error back when not every node is able to
> see the path where the backup should be written.
> But even if you don't receive an error, the backup will not work.
>
> - Houston
>
>
>
> On Thu, Feb 20, 2020 at 11:26 AM Koen De Groote <
> koen.degro...@limecraft.com>
> wrote:
>
> > Hello all,
> >
> > I've recently set up backups, using solr 7.6
> >
> > My setup has 3 replicas per collection and several collections. Not all
> > collections or replicas are present on all hosts.
> >
> > That being said, I run the backup command from 1 particular host and only
> > that host has access to the mount on which the backup data will be
> written.
> >
> > This means that the host writing the backup data doesn't have all the
> data
> > on its local filesystem.
> >
> > Is this a problem?
> >
> > By which I mean: will data not present on that host be retrieved over the
> > network?
> >
> > What happens in this case?
> >
> > Kind regards,
> > Koen De Groote
> >
>


graphq query delete: this IndexWriter is closed ??

2020-02-21 Thread Jochen Barth

Dear reader,

still using solr 8.1.1 because of this: 
https://issues.apache.org/jira/browse/SOLR-13738


tried to delete approx. 25,000 Solr docs (each ca. 1 kB)

using this query:

curl http://serv7:8982/solr/Suchindex/update -H "Content-type: text/xml" \
  --data-binary '<delete><query>+id:d-nb.info* -rnd_d:[0 TO 10] -id:*#* -_query_:"{!graph from=id to=parent_ids}class_s:meta"</query></delete>'


now

2020-02-21 13:47:03.523 ERROR (qtp548482954-59) [   x:Suchindex] 
o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: this 
IndexWriter is closed
Caused by: org.apache.lucene.store.AlreadyClosedException: this 
IndexWriter is closed
Caused by: java.lang.ClassCastException: class 
org.apache.lucene.search.IndexSearcher cannot be cast to class 
org.apache.solr.search.SolrIndexSearcher 
(org.apache.lucene.search.IndexSearcher and 
org.apache.solr.search.SolrIndexSearcher are in unnamed module of loader 
org.eclipse.jetty.webapp.WebAppClassLoader @c1fca1e)


Ooops... even commit does not work.

Did rollback. Helps.

Did delete without the -_query_:"..." part, works.
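As a sketch of a possible workaround (untested against this index, ids below are placeholders): run the {!graph} clause as a normal search with fl=id to collect the matching ids, then delete by id instead of by query, which avoids the delete-by-query path entirely:

```shell
# Sketch: delete by id instead of by {!graph} delete-by-query.
# The ids would first be collected via a plain search; these are placeholders.
IDS='"d-nb.info/1","d-nb.info/2"'
DELETE_BODY="{\"delete\":[${IDS}]}"
echo "${DELETE_BODY}"
# Actual call would be:
# curl "http://serv7:8982/solr/Suchindex/update?commit=true" \
#   -H "Content-type: application/json" --data-binary "${DELETE_BODY}"
```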

Kind regards.

Jochen

--
Jochen Barth * Universitätsbibliothek Heidelberg, IT * Telefon 06221 54-2580



Re: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2020-02-21 Thread Erick Erickson
People are certainly interested. You’re running on the bleeding edge of
technology, you’re very brave ;).

I’m not quite sure how to interpret “memory utilization stays around 18%”:
18% of total physical RAM, or of heap? I’m assuming the former…

I’m curious, how did CMS and G1GC fail? It’s perfectly understandable if
the failures were due to stop-the-world GC pauses; they can lead to timeouts
which can cause replicas to be put into recovery, or Zookeeper to think
the node died etc… In extreme cases this means that the entire cluster goes 
down.
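If you want to see the pauses themselves, JDK 9+ unified GC logging makes them visible; a sketch of a solr.in.sh addition (log path and rotation values are placeholders):

```shell
# Hypothetical solr.in.sh fragment: log GC and safepoint pauses to a
# rotating file. Path, file count, and size below are placeholders.
GC_TUNE="${GC_TUNE:-} -Xlog:gc*,safepoint:file=/var/solr/logs/gc.log:time,uptime:filecount=9,filesize=20M"
echo "${GC_TUNE}"
```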

Best,
Erick

> On Feb 20, 2020, at 5:31 PM, tbarkley29  wrote:
> 
> We are currently running performance tests with Solr 8.2/OpenJDK11/ZGC. We've
> run multiple successful 12-hour tests and are currently running 24-hour
> tests. There are three nodes which are 4 cores and 28GB memory, JVM is 16GB.
> We are getting max ~780 Page Per Second with max of ~8,000 users/min. CPU
> utilization stays around 80% and memory utilization stays around 18%. We
> were trying various configurations with G1GC which were unsuccessful after
> about 8 hours. We also tried with CMS which failed within an hour or so.
> Queries used in test were taken from Splunk from production traffic. These
> performance tests are still ongoing and I'd be open to providing JVM metrics
> if interested.
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Is this a bug? Wildcard with PatternReplaceFilterFactory

2020-02-21 Thread Mike Phillips


Attempting to normalize left and right single and double quotes for searches

‘   Left single quotation mark    '    Single quote
’   Right single quotation mark   '    Single quote
“   Left double quotation mark    "    Double quotes
”   Right double quotation mark   "    Double quotes


<fieldType name="text_general" class="solr.TextField"
    positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1" catenateWords="1"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="‘" replacement="'"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="’" replacement="'"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="“" replacement="&quot;"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="”" replacement="&quot;"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1" catenateWords="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="‘" replacement="'"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="’" replacement="'"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="“" replacement="&quot;"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="”" replacement="&quot;"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The wildcard seems to NOT utilize the PatternReplaceFilterFactory

Rod’s  finds fields Rod's and Rod’s that are now in the index as rod's

but *Rod’s* finds nothing because the index now only contains rod's



Sometimes when we restart or following a hard reboot of Solr we get multiple replicas

2020-02-21 Thread jason.randall
Our cluster is set up so that each daily log shard has 3 replicas. It should
create one replica for each of the sites in the cluster and then host one of
the replicas on each site. We have days in which we get only 1 or 2 replicas
instead of 3. We have days in which we get multiple replicas. The replicas
are often randomly dispersed amongst the nodes in the cluster. So we might
have 5 hosted on node 2 and 3 on node 3 but maybe none on node 1 for that
day. Can you provide possible scenarios as to why this may happen? And how
to resolve the issue?
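As a first diagnostic step (host name is a placeholder), the Collections API can show exactly where each day's replicas landed:

```shell
# Sketch: CLUSTERSTATUS lists every shard's replicas and their host nodes,
# so replica counts per node can be verified. Host below is a placeholder.
SOLR_HOST="http://localhost:8983"
STATUS_URL="${SOLR_HOST}/solr/admin/collections?action=CLUSTERSTATUS"
echo "${STATUS_URL}"
# A surplus replica can then be dropped with action=DELETEREPLICA, and a
# missing one recreated with action=ADDREPLICA (optionally pinning node=...).
```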



Re: Is this a bug? Wildcard with PatternReplaceFilterFactory

2020-02-21 Thread Erick Erickson
Why do you say “…that are now in the index as rod’s”? You have 
WordDelimiterGraphFilterFactory, which breaks things up. When I put your field 
definition in the schema and use the analysis page, it turns “rod’s” into the 
following 4 tokens:

rod’s
rods
rod
s

And querying on field:”*Rod’s*” works just fine. I’m using 8.x, and when I add 
“&debug=query” to the URL, I see: 
{
  "responseHeader": {
    "status": 0,
    "QTime": 10,
    "params": { "q": "eoe:\"*Rod's*\"", "debug": "query" }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      { "id": "1", "eoe": "Rod's", "_version_": 1659176849231577088 }
    ]
  },
  "debug": {
    "rawquerystring": "eoe:\"*Rod's*\"",
    "querystring": "eoe:\"*Rod's*\"",
    "parsedquery": "SynonymQuery(Synonym(eoe:*rod's* eoe:rod))",
    "parsedquery_toString": "Synonym(eoe:*rod's* eoe:rod)",
    "QParser": "LuceneQParser"
  }
}

What do you see?

Best,
Erick

> On Feb 21, 2020, at 12:57 PM, Mike Phillips  
> wrote:
> 
> Rod’s  finds fields Rod's and Rod’s that are now in the index as rod's
> 
> but *Rod’s* finds nothing because the index now only contains rod's



Hi team,

2020-02-21 Thread Sankar Panda
Can anyone tell me how the OpenNLP model files are configured in SolrCloud?
Thanks
Sankar Panda


Re: Is this a bug? Wildcard with PatternReplaceFilterFactory

2020-02-21 Thread Mike Phillips
It looks like the debug result you are showing me is the result for
Rod's, not Rod’s. But in answer to your question, this is why I think
"Rod’s finds fields Rod's and Rod’s that are now in the index as rod's":


The analysis page shows Rod’s gets stored in the index as:
rod's rods rod s

Field Value (Index): Rod’s    (Analyse Fieldname / FieldType: _text_)

Stage                                   Tokens
WT   (WhitespaceTokenizer)              Rod’s
SF   (StopFilter)                       Rod’s
WDGF (WordDelimiterGraphFilter)         Rod’s | Rods | Rod | s
FGF  (FlattenGraphFilter)               Rod’s | Rods | Rod | s
PRF  (PatternReplace ‘ -> ')            Rod’s | Rods | Rod | s
PRF  (PatternReplace ’ -> ')            Rod's | Rods | Rod | s
PRF  (PatternReplace “ -> ")            Rod's | Rods | Rod | s
PRF  (PatternReplace ” -> ")            Rod's | Rods | Rod | s
LCF  (LowerCaseFilter)                  rod's | rods | rod | s

(The raw bytes show the apostrophe changing from U+2019 [e2 80 99] to
ASCII ' [27] at the second PatternReplace stage.)

This is what we were trying to achieve with the
<filter class="solr.PatternReplaceFilterFactory" pattern="’" replacement="'"/>


The problem is that when using the wildcard *Rod’s* we get no hits:

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": { "q": "*Rod’s*", "debugQuery": "on", "_": "1582315262594" }
  },
  "response": { "numFound": 0, "start": 0, "docs": [] },
  "debug": {
    "rawquerystring": "*Rod’s*",
    "querystring": "*Rod’s*",
    "parsedquery": "_text_:*rod’s*",
    "parsedquery_toString": "_text_:*rod’s*",
    "explain": {},
    "QParser": "LuceneQParser",
    ...
  }
}
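For what it's worth, wildcard and prefix terms are analyzed by the field's "multiterm" analysis chain rather than the full query analyzer, which would explain the curly quote surviving in the parsed query. One sketch of a fix (the exact filter list here is an assumption, not taken from this thread) is to declare the quote normalization in an explicit multiterm analyzer inside the same fieldType:

```xml
<!-- Sketch only: apply the quote normalization to wildcard/prefix terms
     as well. The filter list below is an assumption. -->
<analyzer type="multiterm">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="‘" replacement="'"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="’" replacement="'"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```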







On 2/21/2020 11:52 AM, Erick Erickson wrote:

Why do you say “…that are now in the index as rod’s”? You have 
WordDelimiterGraphFilterFactory, which breaks things up. When I put your field 
definition in the schema and use the analysis page, turns “rod’s” into  the 
following 4 tokens:

rod’s
rods
rod
s

And querying on field:”*Rod’s*” works just fine. I’m using 8.x, and when I add 
“&debug=query” to the URL, I see:
{
"responseHeader": {
"status": 0, "QTime": 10, "params": {
"q": "eoe:\"*Rod's*\"", "debug": "query"
}
}, "response": {
"numFound": 1, "start": 0, "docs": [
{
"id": "1", "eoe": "Rod's", "_version_": 1659176849231577088
}
]
}, "debug": {
"rawquerystring": "eoe:\"*Rod's*\"", "querystring": "eoe:\"*Rod's*\"", "parsedquery": "SynonymQuery(Synonym(eoe:*rod's* 
eoe:rod))", "parsedquery_toString": "Synonym(eoe:*rod's* eoe:rod)", "QParser": "LuceneQParser"
}
}

What do you see?

Best,
Erick


On Feb 21, 2020, at 12:57 PM, Mike Phillips  
wrote:

Rod’s  finds fields Rod's and Rod’s that are now in the index as rod's

but *Rod’s* finds nothing because the index now only contains rod's





NLP with solrcloud

2020-02-21 Thread Sankar Panda
Can anyone help me with how OpenNLP pretrained models are configured with
SolrCloud?

Thanks
Sankar Panda


How to monitor the performance of the SolrCloud cluster in real time

2020-02-21 Thread Adonis Ling
Hi team,

Our team is using Solr as a complementary full-text search service for our
NoSQL database, and I'm building the monitoring system for Solr.

After I read the related section (Performance Statistics Reference) in
reference guide, I realized the requestTimes metrics are collected since
the Solr core was first created. Is it possible to monitor the requests
(count or latency) of a collection in real time?

I think Solr would need to reset the related metrics periodically for this.
Are there configuration options to do that?
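For what I have in mind, something like the following (host, core, and handler names are placeholders) would poll the requestTimes entry, whose decaying rate fields (1minRate, 5minRate) and percentile latencies give near-real-time numbers without resetting anything:

```shell
# Sketch: poll the Metrics API for one handler's requestTimes; the rate
# fields decay on their own, so no reset is needed. Names are placeholders.
SOLR_HOST="http://localhost:8983"
METRICS_URL="${SOLR_HOST}/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes"
echo "${METRICS_URL}"
```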

-- 
Adonis