Re: Limiting random results set with facets.

2020-05-12 Thread ART GALLERY

On Tue, May 12, 2020 at 9:38 PM David Lukowski  wrote:
>
> Thanks Srijan, 2 queries is exactly the route I started going down today.
>
> Query 1:
> http://mysolr-node:8080/solr/M2_content/select
> ?q=({!terms f='permissionFilterId'}10,49 AND docBody:(lucky))
> &start=0
> &rows=100
> &fq=channelId:(2 1 3 78 34 35 7 72)
> &fq=date:([* TO 2020-05-12T03:59:59.999Z])
> &hl=false
> &fl=id
> &wt=json
> &sort=random_123456 desc
>
>
> Query 2:
> http://mysolr-node:8080/solr/M2_content/select
> ?q=id:(12345 2345 3456 4567...)
> &start=0
> &rows=30
> &facet=true
> &facet.field=channelId
> &f.channelId.facet.limit=10
> &f.channelId.facet.mincount=1
> &hl=false
> &fl=id, text, users
> &wt=json
> &sort=date desc
>
> Working well so far, but still not ideal.
>
> Thanks for the assist,
>
> David
>
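A side note on Query 2: if the id list returned by Query 1 grows large, spelling the ids out in q can run into the maxBooleanClauses limit. A minimal alternative sketch, passing the same ids (the placeholder values above) through the terms query parser instead:

  ?q={!terms f=id}12345,2345,3456,4567
  &start=0
  &rows=30
  &facet=true
  &facet.field=channelId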
> On Tue, May 12, 2020 at 7:31 PM Srijan  wrote:
>
> > I see what you mean now. You could use two queries - the first would return 100
> > randomly sorted docs (no faceting), and the second, with an fq that includes the
> > ids of the returned 100 docs, would do the faceting.
> >
> > On Tue, May 12, 2020 at 1:29 PM David Lukowski 
> > wrote:
> >
> > > Thanks for the offer of help. This doesn't really seem like what I'm
> > > looking for, though I could be misunderstanding. I'll try to state it
> > > more clearly and include the query.
> > >
> > >
> > > -- This will give me back all the documents that have "lucky" in them in
> > > RANDOM sorted order.
> > >
> > > http://mysolr-node:8080/solr/M2_content/select
> > > ?q=({!terms f='permissionFilterId'}10,49 AND docBody:(lucky))
> > > &start=0
> > > &rows=0
> > > &fq=channelId:(2 1 3 78 34 35 7 72)
> > > &fq=date:([* TO 2020-05-12T03:59:59.999Z])
> > > &facet=true
> > > &facet.field=channelId
> > > &f.channelId.facet.limit=10
> > > &f.channelId.facet.mincount=1
> > > &hl=false
> > > &fl=id
> > > &wt=json
> > > &sort=random_123456 desc
> > >
> > >   The issue is that I only want 100 random results.  Sure, I could limit
> > > the results returned to the first 100 by specifying &rows=100, but the
> > > facets would reflect the totals for the whole query, not for the rows returned.
> > >
> > > RESULTS I HAVE:
> > > "response":{"numFound":377895,"start":0,"docs":[]
> > >   },
> > >   "facet_counts":{
> > > "facet_queries":{},
> > > "facet_fields":{
> > >   "documentType":[
> > > "78",374015,
> > > "3",3021,
> > > "2",736,
> > > "1",41,
> > > "34",41,
> > > "35",32,
> > > "72",8,
> > > "7",1]},
> > >
> > >
> > > RESULTS I WANT:
> > > "response":{"numFound":100,"start":0,"docs":[]
> > >   },
> > >   "facet_counts":{
> > > "facet_queries":{},
> > > "facet_fields":{
> > >   "documentType":[
> > > "78",68,
> > > "3",22,
> > > "2",10]},
> > >
> > > How would I formulate the above query to give me a specific number of
> > > random results with the correct facet counts?
> > >
> > > Thanks for looking,
> > > David
> > >
> > > On Mon, May 11, 2020 at 2:09 PM Srijan  wrote:
> > >
> > > > If you can tag your filter query, you can exclude it when faceting.
> > Your
> > > > results will honor the filter query and you will get the N results
> > back,
> > > > and since faceting will exclude the filter, it will still give you
> > facet
> > > > count for the base query.
> > > >
> > > >
> > > >
> > >
> > https://lucene.apache.org/solr/guide/8_5/faceting.html#tagging-and-excluding-filters
> > > >
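For reference, the tagging-and-excluding pattern suggested here looks roughly like the following (field and tag names reuse the queries earlier in the thread): the filter is tagged, and the facet declaration excludes it, so the facet counts are computed as if that one filter were not applied.

  &fq={!tag=chan}channelId:(2 1 3 78 34 35 7 72)
  &facet=true
  &facet.field={!ex=chan}channelId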
> > > >
> > > > On Mon, May 11, 2020 at 3:36 PM David Lukowski <
> > david.lukow...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > I'm looking for a way if possible to run a query with random results,
> > > > where
> > > > > I limit the number of results I want back, yet still have the facets
> > > > > accurately reflect the results I'm searching.
> > > > >
> > > > > When I run a search I use a filter query to randomize the results
> > based
> > > > on
> > > > > a modulo of a random seed. This returns a results set with the
> > > associated
> > > > > facets for each documentType.
> > > > >
> > > > > "response":{"numFound":377895,"start":0,"docs":[]
> > > > >   },
> > > > >   "facet_counts":{
> > > > > "facet_queries":{},
> > > > > "facet_fields":{
> > > > >   "documentType":[
> > > > > "78",374015,
> > > > > "3",3021,
> > > > > "2",736,
> > > > > "1",41,
> > > > > "34",41,
> > > > > "35",32,
> > > > > "72",8,
> > > > > "7",1]},
> > > > >
> > > > > How do I limit the number of results returned to N and have the
> > facets
> > > > > accurately reflect the number of messages?  I cannot simply say
> > rows=N
> > > > > because the facets will always reflect the total numFound and not the
> > > > > limited results set I'm looking for.
> > > > >
> > > >
> > >
> >


Re: Solr currency function and asymmetric rates

2020-05-12 Thread ART GALLERY

On Tue, May 12, 2020 at 5:39 PM Murray Johnston
 wrote:
>
> I have a question / potential bug.  The currency function created in 
> https://issues.apache.org/jira/browse/SOLR-4138 first converts the field to 
> the default currency before then converting to the currency requested as part 
> of the function.  When dealing with asymmetric rates, that leads to incorrect 
> conversions.  Is this intended?  If not, is it required that 
> CurrencyFieldType.getValueSource convert to default currency?
>
>
> Thanks,
>
>
> -Murray
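A made-up illustration of the concern, with hypothetical rates: say the default currency is USD, and the rate file declares EUR->GBP = 0.85 directly, but EUR->USD = 1.10 and USD->GBP = 0.80. Converting a 100 EUR field value straight to GBP gives 85 GBP, while routing through the default currency gives 100 EUR -> 110 USD -> 88 GBP. Because the rates are asymmetric, the result depends on the intermediate hop rather than only on the two currencies involved.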


Re: Solr 8.5.1 query timeAllowed exceeded throws exception

2020-05-12 Thread ART GALLERY

On Tue, May 12, 2020 at 5:37 PM Phill Campbell
 wrote:
>
> Upon examining the Solr source code it appears that it was unable to even 
> make a connection in the time allowed.
> While the error message was a bit confusing, I do understand what it means.
>
>
> > On May 12, 2020, at 2:08 PM, Phill Campbell  
> > wrote:
> >
> >
> >
> > org.apache.solr.client.solrj.SolrServerException: Time allowed to handle 
> > this request exceeded:…
> >   at 
> > org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:345)
> >   at 
> > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1143)
> >   at 
> > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:906)
> >   at 
> > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:838)
> >   at 
> > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> >   at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1035)
> > ...
> >   at javax.swing.SwingWorker$1.call(SwingWorker.java:295)
> >   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java)
> >   at javax.swing.SwingWorker.run(SwingWorker.java:334)
> >   at 
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >   at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >   at java.lang.Thread.run(Thread.java:748)
> > Caused by: 
> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
> > from server at http://10.156.112.50:10001/solr/BTS: 
> > java.lang.NullPointerException
> >
> >   at 
> > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:665)
> >   at 
> > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265)
> >   at 
> > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
> >   at 
> > org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:368)
> >   at 
> > org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:296)
> >
> >
> > The timeAllowed is set to 8 seconds. I am using a StopWatch to verify that 
> > the round trip was greater than 8 seconds.
> >
> > Documentation states:
> >
> > timeAllowed Parameter
> > This parameter specifies the amount of time, in milliseconds, allowed for a 
> > search to complete. If this time expires before the search is complete, any 
> > partial results will be returned, but values such as numFound, facet 
> > counts, and result stats may not be accurate for the entire result set. In 
> > case of expiration, if omitHeader isn’t set to true the response header 
> > contains a special flag called partialResults.
> >
> > I do not believe I should be getting an exception.
> >
> > I am load testing so I am intentionally putting pressure on the system.
> >
> > Is this the correct behavior to throw an exception?
> >
> > Regards.
>
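For reference, when timeAllowed expires mid-search and the partial-results path is taken (rather than the request failing outright, as diagnosed above), the response header looks roughly like this (QTime value illustrative):

  "responseHeader": {
    "status": 0,
    "QTime": 8003,
    "partialResults": true
  }

The exception in the stack trace is raised on the client side in LBSolrClient, which is consistent with the follow-up above: the request never got far enough for the server to return partial results.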


Re: How to add MoreLikeThis MLT handler in Solr Cloud

2020-05-12 Thread ART GALLERY

On Tue, May 12, 2020 at 12:59 PM Vignan Malyala  wrote:
>
> Does anyone know how to add an MLT handler in SolrCloud?
>
> On Tue, May 12, 2020 at 2:21 PM Vignan Malyala  wrote:
>
> > How do we add an MLT handler in SolrCloud?
> >
> > There is very limited documentation on this. Using the search component with
> > mlt=true doesn't include all configurations like boosting and MLT filters.
> > Also, the results with filters don't seem to work.
> > Adding an MLT handler seems better, but how do we add one in SolrCloud?
> > In standalone Solr it's easy to add an MLT handler, which we did, but what
> > about SolrCloud?
> >
> > Thanks in advance!
> > Regards,
> > Sai Vignan M
> >
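One way to add a handler without hand-editing solrconfig.xml is the Config API, which stores an overlay in the collection's configset. A minimal sketch; the handler name, fields, and defaults below are illustrative, not taken from the thread:

  curl -X POST -H 'Content-type:application/json' \
    http://localhost:8983/solr/<collection>/config -d '{
      "add-requesthandler": {
        "name": "/mlt",
        "class": "solr.MoreLikeThisHandler",
        "defaults": { "mlt.fl": "title,body", "mlt.mintf": 1, "mlt.mindf": 1 }
      }
    }'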


Re: Integrate highlighting data within main search results

2020-05-12 Thread ART GALLERY

On Tue, May 12, 2020 at 11:35 PM Kamal Kishore Aggarwal
 wrote:
>
> Any update on this, guys?
>
> On Wed, May 6, 2020 at 3:39 PM Kamal Kishore Aggarwal 
> wrote:
>
> > Hi,
> >
> > I am using the highlighting feature in Solr 8.3 with the default method. With
> > the current behaviour, main search results and highlighted results are shown in
> > different blocks. Is there a way we can implement highlighting within the
> > main search results, without having to return an extra block for highlighting?
> >
> > I believe it is due to performance factors (like the default limit values for
> > hl.maxAnalyzedChars, hl.snippets, hl.fragsize) that highlighting is returned
> > as a separate component. But if someone has written a custom component to
> > integrate both, please share the steps. Also, please share how it performs.
> >
> > Regards
> >
> > Kamal Kishore
> >


Re:

2020-05-12 Thread ART GALLERY

On Tue, May 12, 2020 at 9:16 AM Nikolai Efseaff  wrote:
>
>
>
>
> Any tax advice in this e-mail should be considered in the context of the tax 
> services we are providing to you. Preliminary tax advice should not be relied 
> upon and may be insufficient for penalty protection.
> 
> The information contained in this message may be privileged and confidential 
> and protected from disclosure. If the reader of this message is not the 
> intended recipient, or an employee or agent responsible for delivering this 
> message to the intended recipient, you are hereby notified that any 
> dissemination, distribution or copying of this communication is strictly 
> prohibited. If you have received this communication in error, please notify 
> us immediately by replying to the message and deleting it from your computer.
>
> Notice required by law: This e-mail may constitute an advertisement or 
> solicitation under U.S. law, if its primary purpose is to advertise or 
> promote a commercial product or service. You may choose not to receive 
> advertising and promotional messages from Ernst & Young LLP (except for EY 
> Client Portal and the ey.com website, which track e-mail preferences through 
> a separate process) at this e-mail address by forwarding this message to 
> no-more-m...@ey.com. If you do so, the sender of this message will be 
> notified promptly. Our principal postal address is 5 Times Square, New York, 
> NY 10036. Thank you. Ernst & Young LLP


Re: velocity reponse writer javascript execution problem

2020-05-12 Thread ART GALLERY

On Tue, May 12, 2020 at 7:32 AM Serkan KAZANCI  wrote:
>
> Hi,
>
> This is my first mail to the group. Nice to be here.
>
> Four years ago I set up a Solr search interface using Velocity response
> writer templates (Solr version: 5.3.1).
>
> I want to re-do the interface with the new Solr version (8.5.1). After some
> tests, I have realized that the Velocity response writer templates do not run
> JavaScript code. Even the auto-complete feature in Solr's techproducts demo
> is not working, which also uses Velocity response writer templates and
> relies on JavaScript for that function.
>
> Is it due to the security vulnerability I heard about a couple of years ago?
> Is there a workaround so that I can use Velocity templates that execute
> JavaScript? Or is it only me having this problem?
>
> Thanks for the replies in advance.
>
> Serkan


Re: 8.5.1 LogReplayer extremely slow

2020-05-12 Thread ART GALLERY

On Tue, May 12, 2020 at 6:23 AM Markus Jelsma
 wrote:
>
> I found the bastard: it was a freaky document that screwed Solr over.
> Indexing kept failing, passing documents between replicas timed out,
> documents got reindexed, and so the document (and others) ended up in the
> transaction log (many times) and were eligible for reindexing. Reindexing and
> replaying of the transaction log both fail on that specific document.
> Recovery was also not possible due to timeouts.
>
> Although the original document [1] is a mess, Solr should have no
> difficulties ingesting it [2]. Any ideas what is going on? Should I open a
> ticket, and if so, about what exactly? For the record, this field is
> PreAnalyzed.
>
> Many thanks,
> Markus
>
> [1] https://pastebin.com/1NqBdYCM
> [2] https://www.openindex.io/export/do_not_index.xml
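For readers unfamiliar with it: a PreAnalyzed field carries its token stream inside the document itself, so the server is expected to run no analysis at all, which is why regex time during replay is surprising. A minimal sketch of the JSON form such a field value takes (content hypothetical):

  {"v":"1","str":"hello world",
   "tokens":[{"t":"hello","s":0,"e":5,"i":1},
             {"t":"world","s":6,"e":11,"i":1}]}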
>
> -Original message-
> > From:Markus Jelsma 
> > Sent: Monday 11th May 2020 18:43
> > To: solr-user 
> > Subject: 8.5.1 LogReplayer extremely slow
> >
> > Hello,
> >
> > Our main Solr text search collection broke down last night (search was 
> > still working fine), every indexing action timed out with the Solr master 
> > spending most of its time in Java regex. One shard has only one replica 
> > left for queries and it stays like that. I have copied both shard's leader 
> > to local to see what is going on.
> >
> > One shard is fine but the other has a replica with has about 600MB of data 
> > to replay and it is extremely slow. Using the VisualVM sampler i find that 
> > the replayer is also spending almost all time in dealing with Java regex 
> > (stack trace below). Is this to be expected? And what is it actually doing? 
> > Where do the TokenFilters come from?
> >
> > I had a old but clean collection on the same cluster and started indexing 
> > to it to see what is going on but it too timed out due to Java regex. This 
> > is weird, because locally i have no problem indexing a million records in a 
> > 8.5.1 collection, and the broken down cluster has been running fine for 
> > over a month.
> >
> > A note, this index uses PreAnalyzedField, so i would expect no analysis or 
> > whatsoever, certainly no regex.
> >
> > Thanks,
> > Markus
> >
> > "replayUpdatesExecutor-3-thread-1-processing-n:127.0.1.1:8983_solr 
> > x:sitesearch_shard2_replica_t2 c:sitesearch s:shard2 r:core_node4" #222 
> > prio=5 os_prio=0 cpu=239207,44ms elapsed=239,50s tid=0x7ffde0057000 
> > nid=0x24f5 runnable  [0x7ffeedd0f000]
> >java.lang.Thread.State: RUNNABLE
> > at 
> > java.util.regex.Pattern$GroupTail.match(java.base@11.0.7/Pattern.java:4863)
> > at 
> > java.util.regex.Pattern$CharPropertyGreedy.match(java.base@11.0.7/Pattern.java:4306)
> > at 
> > java.util.regex.Pattern$GroupHead.match(java.base@11.0.7/Pattern.java:4804)
> > at 
> > java.util.regex.Pattern$CharPropertyGreedy.match(java.base@11.0.7/Pattern.java:4306)
> > at 
> > java.util.regex.Pattern$Start.match(java.base@11.0.7/Pattern.java:3619)
> > at 
> > java.util.regex.Matcher.search(java.base@11.0.7/Matcher.java:1729)
> > at java.util.regex.Matcher.find(java.base@11.0.7/Matcher.java:746)
> > at 
> > org.apache.lucene.analysis.pattern.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:71)
> > at 
> > org.apache.lucene.analysis.miscellaneous.TrimFilter.incrementToken(TrimFilter.java:42)
> > at 
> > org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:49)
> > at 
> > org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:812)
> > at 
> > org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:442)
> > at 
> > org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:406)
> > at 
> > org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
> > at 
> > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495)
> > at 
> > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
> > at 
> > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
> > at 
> > org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:979)
> > at 
> > org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:345)
> > at 
> > org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:292)
> > at 
> > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:239)
> > at 
> > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:76)
> > at 
> > org.apache.solr.update.

Re: using aliases in topic stream

2020-05-13 Thread ART GALLERY

On Wed, May 13, 2020 at 11:32 AM Nightingale, Jonathan A (US)
 wrote:
>
> Hi Everyone,
>
> I'm trying to run this stream and I get the following error
>
> topic(topics,collection1, 
> q="classes:GXP/INDEX",fl="uuid",id="feed-8",initialCheckpoint=0,checkpointEvery=-1)
>
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Slices not found for collection1",
> "EOF": true,
> "RESPONSE_TIME": 6
>   }
> ]
>   }
> }
>
> "collection1" is an alias. I can search using the alias perfectly fine. In 
> fact the search stream operation works fine with the alias. It's just this 
> topic one I've seen so far. Does anyone know why this is?
>
> Thanks!
> Jonathan Nightingale
>
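A possible workaround while this is unanswered, assuming the alias points at a single collection: resolve the alias yourself and pass topic() the concrete collection name. The mapping can be looked up with

  http://localhost:8983/solr/admin/collections?action=LISTALIASES

and the underlying collection substituted for "collection1" in the expression.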


Re:

2020-05-13 Thread ART GALLERY
looks like you like having your rights taken away!!!

On Wed, May 13, 2020 at 1:52 AM Bernd Fehling
 wrote:
>
> Dear list and mailer admins,
>
> it looks like the mailer of this list needs some care.
> Can someone please put this "ART GALLERY" on a blacklist?
>
> Thank you,
> Bernd
>
>
> On 13.05.20 at 08:47, ART GALLERY wrote:
> > check out the videos on this website TROO.TUBE don't be such a
> > sheep/zombie/loser/NPC. Much love!
> > https://troo.tube/videos/watch/aaa64864-52ee-4201-922f-41300032f219
> >
> > On Tue, May 12, 2020 at 9:16 AM Nikolai Efseaff  wrote:


Re: unique key accross collections within datacenter

2020-05-13 Thread ART GALLERY

On Wed, May 13, 2020 at 7:24 AM Bernd Fehling
 wrote:
>
> Thanks Eric for your answer.
>
> I was overthinking it and seeing problems which are not there.
>
> I have your second scenario. The first huge collection still remains
> and will grow further, while the second will start with the same schema but
> content from a new source. Sure, I could also load the content
> from the new source into the first huge collection, but I want to
> keep source, loading, and maintenance handling separated.
> Maybe I will also start the new collection on a new instance.
> Regards
> Bernd
>
> On 13.05.20 at 13:40, Erick Erickson wrote:
> > So a doc in your new collection is expected to supersede a doc
> > with the same ID in the old one, right?
> >
> > What I’d do is delete the IDs from my old collection as they were added to
> > the new one, there’s not much use in keeping both if you always want
> > the new one.
> >
> > Let’s assume you do this, the next issue is making sure all of your docs in
> > the new collection are deleted from the old one, and your process will
> > inevitably have a hiccough or two. You could periodically use streaming to
> > produce a list of IDs common to both collections, and have a cleanup
> > process you occasionally ran to make up for any glitches in the normal
> > delete-from-the-old-collection process, see:
> > https://lucene.apache.org/solr/guide/6_6/stream-decorators.html#stream-decorators
> >
> > If that’s not the case, then having the same id in the different collections
> > doesn’t matter. Solr doesn’t use the ID for combining results, just routing 
> > and
> > then updating.
> >
> > This is illustrated by the fact that, through user error, you can even get 
> > the same
> > document repeated in a result set if it gets indexed to two different 
> > shards.
> >
> > And if neither of those is on target, what do you think might go wrong
> > about “handling” unique IDs across two collections?
> >
> > Best,
> > Erick
> >
> >> On May 13, 2020, at 4:26 AM, Bernd Fehling 
> >>  wrote:
> >>
> >> Dear list,
> >>
> >> in my SolrCloud 6.6 I have a huge collection and now I will get
> >> much more data from a different source to be indexed.
> >> So I'm thinking about a new collection and combine both, the existing
> >> one and the new one with an alias.
> >>
> >> But how to handle the unique key across collections within a datacenter?
> >> Is it at all possible?
> >>
> >> I don't see any problems with add, update and delete of documents because
> >> these operations are not using the alias.
> >>
> >> But searching across collections with an alias and then fetching documents
> >> by id from the result may lead to results where the id is in both
> >> collections?
> >>
> >> I have no idea, but there are SolrClouds with a lot of collections out
> >> there.
> >> How do they handle uniqueness across collections within a datacenter?
> >>
> >> Regards
> >> Bernd
> >


Re: multiple sort terms in json facets sorting

2020-05-13 Thread ART GALLERY

On Wed, May 13, 2020 at 7:56 AM Saurabh Sharma
 wrote:
>
> Hi All,
>
> I am trying to use two sorting criteria with JSON facets, but found that
> only one of them is working and the other one is not being honoured.
>
> With sort:{ sortingScore:desc, x:desc }, only sortingScore is being used
> by Solr and parameter x is ignored. Is there any way I can use
> multiple sort parameters?
>
>
> json.facet={
>   p:{
>     type:terms, field:P_G,
>     sort:{ sortingScore:desc, x:desc },
>     limit:10, offset:10,
>     facet:{
>       sortingScore:"avg(product(C,M,M))",
>       x:"max(N)",
>       stats:{
>         type:terms, limit:100, field:CONF,
>         facet:{ minPrice:"min(N)", minArea:"min(S)" }
>       }
>     }
>   }
> }
>
> Thanks
> Saurabh Sharma


Re: Slow Query in Solr 8.3.0

2020-05-13 Thread ART GALLERY

On Wed, May 13, 2020 at 10:30 AM Houston Putman  wrote:
>
> Hey Vishal,
>
> That's quite a large query. But I think the problem might be completely
> unrelated. Are any of the return fields multi-valued? There was a major bug
> (SOLR-14013) in
> returning multi-valued fields that caused trivial queries to take around 30
> seconds or more. You should be able to fix this by upgrading to 8.4, which
> has the fix included, if you are in fact using multivalued fields.
>
> - Houston Putman
>
> On Wed, May 13, 2020 at 7:02 AM vishal patel 
> wrote:
>
> > I am upgrading Solr 6.1.0 to Solr 8.3.0.
> >
> > I have created 2 shards and one form collection in Solr 8.3.0. My schema
> > file of form collection is same as Solr 6.1.0. Also Solr config file is
> > same.
> >
> > I am executing below URL
> > http://193.268.300.145:8983/solr/forms/select?q=(+(doctype:Apps AND
> > ((allowed_roles:(2229130)) AND ((is_draft:true AND ((distribution_list:24
> > OR draft_form_all_org_allowed_roles:(2229130)) OR
> > (draft_form_own_org_allowed_roles:(2229130) AND
> > msg_distribution_org_list:13))) OR (is_draft:false AND is_public:true AND
> > (is_controller_based:false OR msg_type_id:(1 3))) OR ((allowed_users:24) OR
> > (is_draft:false AND (is_public:false OR is_controller_based:true) AND
> > ((distribution_list:24 OR private_form_all_org_allowed_roles:(2229130)) OR
> > (private_form_own_org_allowed_roles:(2229130) AND
> > msg_distribution_org_list:13)) AND appType:2 AND
> > is_formtype_active:true -status_id:(23) AND (is_draft:false OR
> > msg_type_id:1) AND instance_group_id:(2289710) AND project_id:(2079453) AND
> > locationId:(9696 9694))) AND +msg_id:(10519539^3835 10519540^3834
> > 10523575^3833 10523576^3832 10523578^3831 10525740^3830 10527812^3829
> > 10528779^3828 10528780^3827 10530141^3826 10530142^3825 10530143^3824
> > 10530147^3823 10525725^3822 10525716^3821 10526659^3820 10526661^3819
> > 10529460^3818 10529461^3817 10530338^3816 10531331^3815 10521069^3814
> > 10514233^3813 10514235^3812 10514236^3811 10514818^3810 10518287^3809
> > 10518289^3808 10518292^3807 10518291^3806 10514823^3805 3117146^3804
> > 3120673^3803 10116612^3802 10117480^3801 10117641^3800 10117810^3799
> > 10119703^3798 10128983^3797 10229892^3796 10232225^3795 10233021^3794
> > 10237712^3793 10237744^3792 10239494^3791 10239499^3790 10239500^3789
> > 10243233^3788 10243234^3787 10305946^3786 10305977^3785 10305982^3784
> > 10306994^3783 10306997^3782 10306999^3781 10308101^3780 10308772^3779
> > 10308804^3778 10309685^3777 10309820^3776 10309821^3775 10310633^3774
> > 10310634^3773 10311207^3772 10311210^3771 10352946^3770 10352947^3769
> > 10353164^3768 10353171^3767 10353176^3766 10353956^3765 10354791^3764
> > 10354792^3763 10354794^3762 10354798^3761 10355333^3760 10355353^3759
> > 10355406^3758 10355995^3757 10356008^3756 10358933^3755 10358935^3754
> > 10359420^3753 10359426^3752 10421223^3751 10421224^3750 10421934^3749
> > 10422864^3748 10422865^3747 10426444^3746 10426446^3745 10428470^3744
> > 10430357^3743 10430366^3742 10431990^3741 10490422^3740 10490430^3739
> > 10490742^3738 10490745^3737 10491552^3736 10492344^3735 10492964^3734
> > 10493965^3733 10494657^3732 10494660^3731 3121708^3730 3122606^3729
> > 3124424^3728 3125051^3727 3125782^3726 3125793^3725 3127499^3724
> > 3127600^3723 3127615^3722 3129535^3721 3131364^3720 3131377^3719
> > 3132062^3718 3133668^3717 3134414^3716 10131445^3715 10133209^3714
> > 10135640^3713 10136424^3712 10137129^3711 10137168^3710 10244270^3709
> > 10244324^3708 10244326^3707 10248136^3706 10248137^3705 10248138^3704
> > 10258595^3703 10259267^3702 10259966^3701 10259967^3700 10260700^3699
> > 10260701^3698 10262790^3697 10264386^3696 10264536^3695 10264961^3694
> > 10265098^3693 10265099^3692 10311754^3691 10312638^3690 10312639^3689
> > 10312640^3688 10313909^3687 10313910^3686 10314024^3685 10314659^3684
> > 10314691^3683 10314696^3682 10315395^3681 10315426^3680 10359451^3679
> > 10359835^3678 10361077^3677 10361085^3676 10361277^3675 10361289^3674
> > 10361824^3673 10362431^3672 10362434^3671 10363618^3670 10365316^3669
> > 10365322^3668 10365327^3667 10433969^3666 10435946^3665 10435963^3664
> > 10437695^3663 10437697^3662 10437698^3661 10437703^3660 10438761^3659
> > 10438763^3658 10439721^3657 10439728^3656 10496118^3655 10496281^3654
> > 10496289^3653 10496294^3652 10496296^3651 10496570^3650 10496582^3649
> > 10496626^3648 10497518^3647 10497522^3646 10497530^3645 10498717^3644
> > 10498722^3643 10499254^3642 10499256^3641 10500374^3640 10500382^3639
> > 10507062^3638 10507061^3637 3134424^3636 3135192^3635 3135284^3634
> > 3135293^3633 3139529^3632 3140767^3631 3141525^3630 3141681^3629
> > 3141690^3628 3142537^3627 3143664^3626 3144581^3625 3145417^3624
> > 3145862^3

Re: Creating 100000 dynamic fields in solr

2020-05-13 Thread ART GALLERY

On Tue, May 12, 2020 at 3:35 AM Jan Høydahl  wrote:
>
> Note that my example is simplified. Both the parent and child docs need to
> have globally unique ‘id’ fields, and any field name used in both parent and
> child needs to have the same fieldType in the schema.
> There were some plans to automatically generate IDs for child documents if
> they do not exist, but I think that is not yet done. Perhaps you can add the
> UUID processor for this purpose?
>
>   <processor class="solr.UUIDUpdateProcessorFactory">
>     <str name="fieldName">id</str>
>   </processor>
> Jan
>
> > 12. mai 2020 kl. 07:03 skrev Vignan Malyala :
> >
> > Thanks Jan! This helps a lot!
> >
> > Sai Vignan Malyala
> >
> > On Mon, May 11, 2020 at 5:07 PM Jan Høydahl  wrote:
> >
> >> Sounds like you are looking for parent/child docs here, see
> >> https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html
> >>
> >> {
> >>"type": "user",
> >>"name": "user1",
> >>"products": [
> >>{ "id": "prod_A", "cost": 50},
> >>{ "id": "prod_B", "cost": 200},
> >>{ "id": "prod_D", "cost": 25}
> >>]
> >> }
> >>
> >> This will index 4 documents - one user document and three product-cost
> >> child documents.
> >>
> >> You can then search the child docs and return matching parents with e.g.
> >> q=*:*&fq={!parent which="type:user"}((id:prod_A AND cost:[50 TO 100]) OR
> >> (id:prod_D AND cost:[0 TO 40]))&fl=[child]
> >>
> >> Hope this helps.
> >>
> >> Jan
> >>
> >>> 11. mai 2020 kl. 11:35 skrev Vignan Malyala :
> >>>
> > > I have around 1M products used by my clients.
> > > Clients need to filter these 1M products by their costs.
> >>>
> >>> Just like:
> >>> User1 has 5 products (A,B,C,D,E)
> >>> User2 has 3 products (D,E,F)
> >>> User3 has 10 products (A,B,C,H,I,J,K,L,M,N,O)
> >>>
> >>> ...every customer has different sets.
> >>>
> >>> Now they want to search users by filter of product costs:
> >>> Product_A_cost :  50 TO 100
> >>> Product_D_cost :  0 TO 40
> >>>
> >>> it should return all the users who use products in this filter range.
> >>>
> > > As I have 1M products, do I need to create dynamic fields for all users
> > > with field names like Product_A_cost, Product_B_cost, etc. to make them
> > > searchable? If so, then I have to create 1M dynamic fields.
> > > Or is there any other way?
> >>>
> >>> Hope I'm clear here!
> >>>
> >>>
> >>> On Mon, May 11, 2020 at 1:47 PM Jan Høydahl 
> >> wrote:
> >>>
>  Sounds like an anti pattern. Can you explain what search problem you are
>  trying to solve with this many unique fields?
> 
>  Jan Høydahl
> 
> > 11. mai 2020 kl. 07:51 skrev Vignan Malyala :
> >
> > Hi
> > Is it a good idea to create 100000 dynamic fields of type point in Solr?
> > I actually have that many fields to search on, which come up based on
> > users.
> >
> > Thanks in advance!
> > And I'm using Solr Cloud in real-time.
> >
> > Regards,
> > Sai Vignan M
> 
> >>
> >>
>


Re: Unbalanced shard requests

2020-05-13 Thread ART GALLERY

On Mon, May 11, 2020 at 6:50 PM Wei  wrote:
>
> Thanks Michael!  Yes in each shard I have 10 Tlog replicas,  no other type
> of replicas, and each Tlog replica is an individual solr instance on its
> own physical machine.  In the jira you mentioned 'when "last place matches"
> == "first place matches" – e.g. when shards.preference specified matches
> *all* available replicas'.   My setting is
> shards.preference=replica.location:local,replica.type:TLOG,
> I also tried just shards.preference=replica.location:local and it still has
> the issue. Can you explain a bit more?
>
> On Mon, May 11, 2020 at 12:26 PM Michael Gibney 
> wrote:
>
> > FYI: https://issues.apache.org/jira/browse/SOLR-14471
> > Wei, assuming you have only TLOG replicas, your "last place" matches
> > (to which the random fallback ordering would not be applied -- see
> > above issue) would be the same as the "first place" matches selected
> > for executing distributed requests.
> >
> >
> > On Mon, May 11, 2020 at 1:49 PM Michael Gibney
> >  wrote:
> > >
> > > Wei, probably no need to answer my earlier questions; I think I see
> > > the problem here, and believe it is indeed a bug, introduced in 8.3.
> > > Will file an issue and submit a patch shortly.
> > > Michael
> > >
> > > On Mon, May 11, 2020 at 12:49 PM Michael Gibney
> > >  wrote:
> > > >
> > > > Hi Wei,
> > > >
> > > > In considering this problem, I'm stumbling a bit on terminology
> > > > (particularly, where you mention "nodes", I think you're referring to
> > > > "replicas"?). Could you confirm that you have 10 TLOG replicas per
> > > > shard, for each of 6 shards? How many *nodes* (i.e., running solr
> > > > server instances) do you have, and what is the replica placement like
> > > > across those nodes? What, if any, non-TLOG replicas do you have per
> > > > shard (not that it's necessarily relevant, but just to get a complete
> > > > picture of the situation)?
> > > >
> > > > If you're able without too much trouble, can you determine what the
> > > > behavior is like on Solr 8.3? (there were different changes introduced
> > > > to potentially relevant code in 8.3 and 8.4, and knowing whether the
> > > > behavior you're observing manifests on 8.3 would help narrow down
> > > > where to look for an explanation).
> > > >
> > > > Michael
> > > >
> > > > On Fri, May 8, 2020 at 7:34 PM Wei  wrote:
> > > > >
> > > > > Update:  after I remove the shards.preference parameter from
> > > > > solrconfig.xml,  issue is gone and internal shard requests are now
> > > > > balanced. The same parameter works fine with solr 7.6.  Still not
> > sure of
> > > > > the root cause, but I observed a strange coincidence: the nodes that
> > are
> > > > > most frequently picked for shard requests are the first node in each
> > shard
> > > > > returned from the CLUSTERSTATUS api.  Seems something wrong with
> > shuffling
> > > > > equally compared nodes when shards.preference is set.  Will report
> > back if
> > > > > I find more.
> > > > >
> > > > > On Mon, Apr 27, 2020 at 5:59 PM Wei  wrote:
> > > > >
> > > > > > Hi Eric,
> > > > > >
> > > > > > I am measuring the number of shard requests, and it's for query
> > only, no
> > > > > > indexing requests.  I have an external load balancer and see each
> > node
> > > > > > received about the equal number of external queries. However for
> > the
> > > > > > internal shard queries,  the distribution is uneven:6 nodes
> > (one in
> > > > > > each shard,  some of them are leaders and some are non-leaders )
> > gets about
> > > > > > 80% of the shard requests, the other 54 nodes gets about 20% of
> > the shard
> > > > > > requests.   I checked a few other parameters set:
> > > > > >
> > > > > > -Dsolr.disable.shardsWhitelist=true
> > > > > > shards.preference=replica.location:local,replica.type:TLOG
> > > > > >
> > > > > > Nothing seems to cause the strange behavior.  Any suggestions how
> > to
> > > > > > debug this?
> > > > > >
> > > > > > -Wei
> > > > > >
> > > > > >
> > > > > > On Mon, Apr 27, 2020 at 5:42 PM Erick Erickson <
> > erickerick...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Wei:
> > > > > >>
> > > > > >> How are you measuring utilization here? The number of incoming
> > requests
> > > > > >> or CPU?
> > > > > >>
> > > > > >> The leader for each shard are certainly handling all of the
> > indexing
> > > > > >> requests since they’re TLOG replicas, so that’s one thing that
> > > > > >> might be skewing your measurements.
> > > > > >>
> > > > > >> Best,
> > > > > >> Erick
> > > > > >>
> > > > > >> > On Apr 27, 2020, at 7:13 PM, Wei  wrote:
> > > > > >> >
> > > > > >> > Hi everyone,
> > > > > >> >
> > > > > >> > I have a strange issue after upgrade from 7.6.0 to 8.4.1. My
> > cloud has 6
> > > > > >> > shards with 10 TLOG replicas each shard.  After upgrade I
> > noticed t

Re: solr.WordDelimiterFilterFactory

2020-05-13 Thread ART GALLERY

On Sat, May 9, 2020 at 4:29 PM Steven White  wrote:
>
> Never mind, I found the answer.   WordDelimiterFilterFactory is
> deprecated and is replaced by WordDelimiterGraphFilterFactory.
>
> Steve
>
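For anyone making the swap: the graph filter takes the same arguments as the old one, for example

  <filter class="solr.WordDelimiterGraphFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" splitOnCaseChange="1"/>

with the caveat that on index-time analyzer chains it is typically followed by solr.FlattenGraphFilterFactory, since the index cannot store a token graph.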
> On Sat, May 9, 2020 at 5:22 PM Steven White  wrote:
>
> > Hi everyone,
> >
> > Why I cannot find the filter solr.WordDelimiterFilterFactory at
> > https://lucene.apache.org/solr/guide/8_5/index.html but it is at
> > https://cwiki.apache.org/confluence/display/SOLR/AnalyzersTokenizersTokenFilters
> > ?
> >
> > Thanks
> >
> > Steve
> >


Re: Solr 8.1.5 Postlogs - Basic Authentication Error

2020-05-13 Thread ART GALLERY

On Mon, May 11, 2020 at 4:03 PM Waheed, Imran
 wrote:
>
> Is there a way to use bin/postlogs with basic authentication on? I am
> getting an error if I do not give a username/password.
>
> bin/postlogs http://localhost:8983/solr/logs server/logs/
>
> Exception in thread "main" 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
> from server at http://localhost:8983/solr/logs: Expected mime type 
> application/octet-stream but got text/html. 
>
> Error 401 require authentication
> HTTP ERROR 401 require authentication
> URI: /solr/logs/update
> STATUS: 401
> MESSAGE: require authentication
> SERVLET: default
>
>
> I get a different error if I try
> bin/postlogs -u user:@password http://localhost:8983/solr/logs server/logs/
>
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.solr.util.SolrLogPostTool.gatherFiles(SolrLogPostTool.java:127)
> at 
> org.apache.solr.util.SolrLogPostTool.main(SolrLogPostTool.java:65)
>
> thank you,
> Imran
>
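A sketch that may help with the first error, assuming bin/postlogs honors the same authentication environment variables as the other bin scripts (unverified for this particular tool):

  export SOLR_AUTH_TYPE=basic
  export SOLR_AUTHENTICATION_OPTS="-Dbasicauth=user:pass"
  bin/postlogs http://localhost:8983/solr/logs server/logs/

The second failure looks like a separate problem: the NullPointerException in gatherFiles suggests the -u argument was consumed as if it were the base URL, i.e. -u is probably not a supported flag for this tool.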


Re: Add replica fails with following exception

2020-05-13 Thread ART GALLERY

On Fri, May 8, 2020 at 7:37 AM Vishal Vaibhav  wrote:
>
> Do addreplicas work only when leader is elected ?
>
> On Fri, 8 May 2020 at 5:43 PM, Erick Erickson 
> wrote:
>
> > My guess is that "$solrHost” is being resolved differently when
> > executed from the shell .vs. your script.
> >
> > Best,
> > Erick
> >
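Two quick checks along the lines Erick suggests, sketched for a bash script (host and values illustrative): print the variable right before the call, and quote the URL so the shell does not treat '&' as a background operator.

  echo "solrHost resolved to: ${solrHost}"
  curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=rules&shard=shard1&node=${solrHost}:8983_solr"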
> > > On May 8, 2020, at 4:41 AM, Vishal Vaibhav  wrote:
> > >
> > > I have a script that creates a collection whenever the VM is up and then
> > > adds the replicas. However, ADDREPLICA fails every time with the following
> > > exception, yet when I manually hit the following curl it works
> > > fine. The same line is in the script.
> > > "
> > >
> > http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=rules&shard=shard1&node=
> > > $solrHost:8983_solr"
> > >
> > >
> > > 2020-05-08 08:20:55.621 ERROR (qtp2048537720-17) [c:rules   ]
> > > o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException:
> > ADDREPLICA
> > > failed to create replica
> > > at
> > >
> > org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:63)
> > > at
> > >
> > org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:281)
> > > at
> > >
> > org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:253)
> > > at
> > >
> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
> > > at
> > org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:839)
> > > at
> > >
> > org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:805)
> > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:558)
> > > at
> > >
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> > > at
> > >
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
> > > at
> > >
> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> > > at
> > >
> > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> > > at
> > >
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> > > at
> > >
> > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> > > at
> > >
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> > > at
> > >
> > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> > > at
> > >
> > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> > > at
> > >
> > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> > > at
> > >
> > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
> > > at
> > >
> > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> > > at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> > > at
> > >
> > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
> > > at
> > >
> > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> > > at
> > >
> > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
> > > at
> > >
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> > > at
> > >
> > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> > > at
> > >
> > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
> > > at
> > >
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> > > at
> > >
> > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> > > at
> > >
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> > > at org.eclipse.jetty.server.Server.handle(Server.java:505)
> > > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
> > > at
> > >
> > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
> > > at
> > > org.eclipse.jetty.io
> > .AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
> > > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> > > at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> > > at
> > >
> > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
> > > at
> > >
> > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
> > > at
> > >
> > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> > > at
> > >
> > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
> > > at
> > >
> > org.eclipse

Re: Synonym Graph Filter - how to preserve original

2020-05-13 Thread ART GALLERY

On Fri, May 8, 2020 at 12:20 PM Erick Erickson  wrote:
>
> Depends on how you define the synonym file.
>
> See: https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html
>
> In particular:
>
> There are two ways to specify synonym mappings:
>
> • A comma-separated list of words. If the token matches any of the 
> words, then all the words in the list are substituted, which will include the 
> original token.
>
> • Two comma-separated lists of words with the symbol "⇒" between 
> them. If the token matches any word on the left, then the list on the right 
> is substituted. The original token will not be included unless it is also in 
> the list on the right.
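A small synonyms.txt sketch of the two forms described above (terms invented for illustration):

  # comma-separated list: the original token is kept
  television, tv, telly

  # explicit mapping: the original survives only if repeated on the right
  jeans => jeans, denim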
>
> > On May 8, 2020, at 1:10 PM, Jae Joo  wrote:
> >
> > putting original term in the synonym list works.
> >
> >
> > On Fri, May 8, 2020 at 1:05 PM atin janki  wrote:
> >
> >> Hi Jae,
> >>
> >> Do try to explain your problem with an example. Also share how you are
> >> writing the synonyms file.
> >> Best Regards,
> >> Atin Janki
> >>
> >>
> >> On Fri, May 8, 2020 at 6:14 PM Jae Joo  wrote:
> >>
> > >>> In 8.3 there should be a way to preserve the original terms, but I could
> > >>> not find it.
> >>>
> >>> Does anyone know?
> >>>
> >>> Thanks,
> >>>
> >>> Jae
> >>>
> >>
>


Re: Solr 8.2.0 Collection API Backup Failing with Azure fileshare

2020-05-13 Thread ART GALLERY

On Thu, May 7, 2020 at 4:27 PM tbarkley29  wrote:
>
> Hello,
>
> We are running Solr 8.2.0 in cloud mode with a 3 node cluster in Azure
> kubernetes. We have an Azure fileshare mounted in order to perform backups
> using the collection API e.g., /mnt/azure
>
> The collection I'm using to test has 1 shard and 3 replicas with about 17GB
> worth of data per replica. While testing backups using the following, it
> seems to be working for a bit (there is a snapshot directory for the shard
> created in the fileshare) but then it soon fails (note: before testing each
> time, I've ensured the directory TestLargeCollection no longer exists):
>
> e.g.,
> /solr/admin/collections?action=BACKUP&name=TestLargeCollection&location=/mnt/azure&collection=TestLargeCollection&async=1000
>
> /solr/admin/collections?action=REQUESTSTATUS&requestid=1000&wt=xml
>
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">14</int>
>   </lst>
>   <str name="Operation backup caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> The backup directory already exists: file:///mnt/azure/TestLargeCollection/</str>
>   <lst name="exception">
>     <str name="msg">The backup directory already exists: file:///mnt/azure/TestLargeCollection/</str>
>     <int name="rspCode">400</int>
>   </lst>
>   <lst name="status">
>     <str name="state">failed</str>
>     <str name="msg">found [1000] in failed tasks</str>
>   </lst>
> </response>
>
> The directory contents looks like this after it fails (notice the missing
> backup.properties file);
>
> solr@solr-0:/mnt/azure/TestLargeCollection$ ls
>snapshot.shard1  zk_backup
>
> FWIW If I do the same with a smaller collection (only about 1GB) it seems to
> work fine;
>
> solr@solr-0:/mnt/azure/TestSmallCollection$ ls
> backup.properties  snapshot.shard1  zk_backup
>
> Any information would be greatly appreciated.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: facets & docValues

2020-05-13 Thread ART GALLERY

On Thu, May 7, 2020 at 8:49 PM Joel Bernstein  wrote:
>
> You can be pretty sure that adding static warming queries will improve your
> performance following softcommits. But, opening new searchers every 2
> seconds may be too fast to allow for warming so you may need to adjust. As
> a general rule you cannot open searchers faster than you can warm them.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
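A minimal solrconfig.xml sketch of the static warming being described; the facet field name is illustrative:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <str name="facet.field">documentType</str>
      </lst>
    </arr>
  </listener>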
> On Tue, May 5, 2020 at 5:54 PM Revas  wrote:
>
> > Hi joel, No, we have not, we have softCommit requirement of 2 secs.
> >
> > On Tue, May 5, 2020 at 3:31 PM Joel Bernstein  wrote:
> >
> > > Have you configured static warming queries for the facets? This will warm
> > > the cache structures for the facet fields. You just want to make sure you
> > > commits are spaced far enough apart that the warming completes before a
> > new
> > > searcher starts warming.
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Mon, May 4, 2020 at 10:27 AM Revas  wrote:
> > >
> > > > Hi Erick, Thanks for the explanation and advise. With facet queries,
> > does
> > > > doc Values help at all ?
> > > >
> > > > 1) indexed=true, docValues=true => all facets
> > > >
> > > > 2)
> > > >
> > > >    - indexed=true, docValues=true => only for subfacets
> > > >    - indexed=true, docValues=false => facet query
> > > >    - docValues=true, indexed=false => term facets
> > > >
> > > >
> > > >
> > > > In case of 1 above, => Indexing slowed considerably. over all facet
> > > > performance improved many fold
> > > > In case of  2=>  over all performance showed only slight
> > > > improvement
> > > >
> > > > Does that mean turning on docValues even for facet query helps improve
> > > the
> > > > performance,  fetching from docValues for facet query is faster than
> > > > fetching from stored fields ?
> > > >
> > > > Thanks
> > > >
> > > >
> > > > On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson <
> > erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > > > DocValues should help when faceting over fields, i.e.
> > facet.field=blah.
> > > > >
> > > > > I would expect docValues to help with sub facets and, but don’t know
> > > > > the code well enough to say definitely one way or the other.
> > > > >
> > > > > The empirical approach would be to set “uninvertible=false” (Solr 7.6) and
> > > > > turn docValues off. What that means is that if any operation tries to
> > > > > uninvert
> > > > > the index on the Java heap, you’ll get an exception like:
> > > > > "can not sort on a field w/o docValues unless it is indexed=true
> > > > > uninvertible=true and the type supports Uninversion:”
> > > > >
> > > > > See SOLR-12962
> > > > >
> > > > > Speed is only one issue. The entire point of docValues is to not
> > > > “uninvert”
> > > > > the field on the heap. This used to lead to very significant memory
> > > > > pressure. So when turning docValues off, you run the risk of
> > > > > reverting back to the old behavior and having unexpected memory
> > > > > consumption, not to mention slowdowns when the uninversion
> > > > > takes place.
> > > > >
> > > > > Also, unless your documents are very large, this is a tiny corpus. It
> > > can
> > > > > be
> > > > > quite hard to get realistic numbers, the signal gets lost in the
> > noise.
> > > > >
> > > > > You should only shard when your individual query times exceed your
> > > > > requirement. Say you have a 95%tile requirement of 1 second response
> > > > time.
> > > > >
> > > > > Let’s further say that you can meet that requirement with 50
> > > > > queries/second,
> > > > > but when you get to 75 queries/second your response time exceeds your
> > > > > requirements. Do NOT shard at this point. Add another replica
> > instead.
> > > > > Sharding adds inevitable overhead and should only be considered when
> > > > > you can’t get adequate response time even under fairly light query
> > > loads
> > > > > as a general rule.
> > > > >
> > > > > Best,
> > > > > Erick
> > > > >
> > > > > > On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> > > > > >
> > > > > > Hi Erick, You are correct, we have only about 1.8M documents so far
> > > and
> > > > > > turning on the indexing on the facet fields helped improve the
> > > timings
> > > > of
> > > > > > the facet query a lot which has (sub facets and facet queries). So
> > > does
> > > > > > docValues help at all for sub facets and facet query, our tests
> > > > > > revealed further query time improvement when we turned off the
> > > > docValues.
> > > > > > is that the right approach?
> > > > > >
> > > > > > Currently we have only 1 shard and  we are thinking of scaling by
> > > > > > increasing the number of shards when we see a deterioration on
> > query
> > > > > time.
> > > > > > Any suggestions?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > >

Re: Can perform partial searches without fieldname in solr

2020-05-13 Thread ART GALLERY

On Thu, May 7, 2020 at 3:36 PM erars+jonathan.cook
 wrote:
>
> Hello,
>
> I am trying to perform a partial search on a field in Solr, e.g. my_id:
> ABC_00123
>
> I would like to search for 123 and see this item. I cannot get it to
> work without using the my_id field in the query.
>
> In my schema.xml I have put:
>
> <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="50" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> Then (I'm not sure this is necessary):
>
> <field name="_text_ngrm_" type="text_ngram" indexed="true" stored="false"/>
> I also have:
>
> 
> Finally:
>
> 
> For the query this works: my_id: 223
>
> But 223 on its own does not. I have the feeling it has to do with this
> copyField definition.
>
> The only way I could get it to work was to change:
>
> <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
>   <lst name="defaults">
>     <str name="df">_text_ngrm_</str>
>   </lst>
> </initParams>
> But this breaks all my other default searches. Is there not some way to
> add like:
>
> <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
>   <lst name="defaults">
>     <str name="df">_text_</str>
>     <str name="df">_text_ngrm_</str>
>   </lst>
> </initParams>
>
>
> I understand the lucene query parser (which is the default option) can
> only search one default field, controlled by the df parameter. But the
> dismax and edismax query parsers can search multiple fields.
>
> But how can this be configured, and is it likely to change the behaviour
> of everything else? I understand I could configure one of my search
> handlers to use edismax and then tell it to search any combination of
> fields. But how could I do this?
>
> Thanks for any help


Re: Minimum Match Query

2020-05-13 Thread ART GALLERY

On Thu, May 7, 2020 at 2:11 PM Russell Bahr  wrote:
>
> Thank you Emir, we will give this a try.
>
> Russ
>
>
> On Thu, May 7, 2020 at 12:55 AM Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
> > Hi Russel,
> > You are right about mm - it is about min term matches. Frequencies are
> > usually used to determine score. But you can also filter on number of
> > matches using function queries:
> > fq={!frange l=3}sum(termfreq(field,'barker'),termfreq(field,'jones'),termfreq(field,'baker'))
> >
> > It is not perfect and you will need to handle phrases at index time to be
> > able to match phrases. Or you can combine it with some other query to
> > filter out unwanted results and use this approach to make sure frequencies
> > match.
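> >
> > For example, to keep only documents that mention "barker" at least 3
> > times (the field name is a placeholder):
> >
> > fq={!frange l=3}termfreq(myfield,'barker')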
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> > > On 7 May 2020, at 03:12, Russell Bahr  wrote:
> > >
> > > Hi Atita,
> > > We actually looked into that, and it does not appear to match based on a
> > > single phrase; rather, it says that it must match a certain percentage of
> > > the listed phrases. What we need is something that would match based on a
> > > single phrase appearing a minimum number of times, i.e. "Barker" with
> > > minimum number of matches = 3, where "Barker" shows up in a document 3 or
> > > more times.
> > >
> > > Am I missing something there, or am I reading this wrong?
> > > The mm (Minimum Should Match) Parameter When processing queries,
> > > Lucene/Solr recognizes three types of clauses: mandatory, prohibited, and
> > > "optional" (also known as "should" clauses). By default, all words or
> > > phrases specified in the q parameter are treated as "optional" clauses
> > > unless they are preceded by a "+" or a "-". When dealing with these
> > > "optional" clauses, the mm parameter makes it possible to say that a
> > > certain minimum number of those clauses must match. The DisMax query
> > parser
> > > offers great flexibility in how the minimum number can be specified.
> > >
> > > We did try doing a query, and the results that came back reflected
> > > only the minimum number of phrases matching, as opposed to a phrase
> > > being mentioned a minimum number of times.
> > >
> > > For example, if I query for “Google” with mm=100 it doesn’t find
> > > articles with 100 mentions of Google. The parameter applies to
> > > multi-phrase queries. Example against our servers:
> > >
> > > query = "Barker" OR "Jones" OR "Baker" mm=1 103,896 results
> > > query = "Barker" OR "Jones" OR "Baker" mm=2 1,200 results
> > > query = "Barker" OR "Jones" OR "Baker" mm=3 16 results
> > >
> > > Please let me know.
> > > Thank you,
> > > Russ
> > >
> > >
> > >
> > > On Wed, May 6, 2020 at 10:13 AM Atita Arora 
> > wrote:
> > >
> > >> Hi,
> > >>
> > >> Did you happen to look into :
> > >>
> > >>
> > >>
> > https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter
> > >>
> > >> I believe 6.5.1 has it too.
> > >>
> > >> I hope it should help.
> > >>
> > >>
> > >> On Wed, May 6, 2020 at 6:46 PM Russell Bahr  wrote:
> > >>
> > >>> Hi SOLR team,
> > >>> I have been asked if there is a way to return results only if those
> > >>> results match a minimum number of times present in the query.
> > >>> ( queries looking for a minimum amount of mentions for a particular
> > >>> term/phrase. Ie must be mentioned 'x' amount of times to return
> > results).
> > >>> Is this something that is possible using SOLR 6.5.1?  Is this something
> > >>> that would require a newer version of SOLR?
> > >>> Any help on this would be appreciated.
> > >>> Thank you,
> > >>> Russ
> > >>>
> > >>
> >
> >


Re: Dead solr nodes in k8

2020-05-13 Thread ART GALLERY

On Thu, May 7, 2020 at 11:58 AM Vishal Vaibhav  wrote:
>
> Thanks for the suggestion. I have set the Solr host to the pod DNS name.
> The Solr operator is great but needs a lot of in-depth k8s expertise. I
> used the Helm chart provided by Lucidworks. A couple of quick queries
> about what I am observing:
>
> Whenever my pod comes up, I create a collection,
> and then I use ADDREPLICA.
>
> Collection creation takes the node set as empty, max shards per node as 1.
>
> What I have observed is that the ADDREPLICA command fails most of the time
> with a 500 error.
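>
> The calls I make look roughly like this (collection and node names are
> placeholders):
>
> /admin/collections?action=CREATE&name=mycoll&numShards=1&maxShardsPerNode=1&createNodeSet=EMPTY
> /admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=solr-0.solr-headless:8983_solr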
>
> On Mon, 4 May 2020 at 2:44 AM, Jan Høydahl  wrote:
>
> > Solr is built to assume nodes/pods with stable host names and indices.
> > Please consider deploying solr with
> > https://github.com/bloomberg/solr-operator which takes care of this for
> > you.
> >
> > Jan Høydahl
> >
> > > 3. mai 2020 kl. 19:25 skrev Vishal Vaibhav :
> > >
> > > 
> > >
> > > Hi, I could finally deploy Solr Cloud 8 using a ZK ensemble on Kubernetes.
> > > However, I am seeing something strange.
> > >
> > > First, since k8s keeps rescheduling the pods, the node list gets
> > > filled up like this even when the nodes are dead. Also, I am using the
> > > Solr exporter, so it keeps trying to hit the dead replicas. Can I get
> > > rid of them somehow?
> > > 
> >


Re: Combined virtual conference announced with content on Solr, search & relevance

2020-05-13 Thread ART GALLERY

On Thu, May 7, 2020 at 10:32 AM Charlie Hull  wrote:
>
> The teams behind Berlin Buzzwords, Haystack (the search relevance
> conference), and MICES (the ecommerce search event) are happy to
> announce a week of virtual talks, panel discussions, workshops and
> training sessions covering themes of search, scale, store!
>
> To be held between *7th-12th June 2020*, this collaboration will bring
> together the best of the planned sessions from three annual conferences
> postponed or cancelled due to COVID-19 and make them available across
> the world. We aim to support our three communities and to bring them
> together to share knowledge, expertise and experiences. Read more here.
>
> Tickets are on sale now at https://berlinbuzzwords.de/tickets - see you
> there (virtually) we hope.
>
> Cheers
>
> Charlie
>
> --
>
> Charlie Hull
> OpenSource Connections, previously Flax
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.o19s.com
>


Re: Dynamic schema failure for child docs not using "_childDocuments_" key

2020-05-13 Thread ART GALLERY

On Tue, May 5, 2020 at 8:32 PM mmb1234  wrote:
>
> I am running into an exception where creating child docs fails unless the
> field already exists in the schema (stacktrace at the bottom of this
> post). My Solr is v8.5.1 running in standard/non-cloud mode.
>
> $> curl -X POST -H 'Content-Type: application/json'
> 'http://localhost:8983/solr/mycore/update' --data-binary '[{
>   "id": "3dae27db6ee43e878b9d0e8e",
>   "phone": "+1 (123) 456-7890",
>   "myChildDocuments": [{
> "id": "3baf27db6ee43387849d0e8e",
>  "enabled": false
>}]
> }]'
>
> {
>   "responseHeader":{
> "status":400,
> "QTime":285},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"ERROR: [doc=3baf27db6ee43387849d0e8e] unknown field 'enabled'",
> "code":400}}
>
>
> However, using the "_childDocuments_" key, it succeeds and the child doc
> fields get created in the managed-schema:
>
> $> curl -X POST -H 'Content-Type: application/json'
> 'http://localhost:8983/solr/mycore/update' --data-binary '[{
>   "id": "6dae27db6ee43e878b9d0e8e",
>   "phone": "+1 (123) 456-7890",
>   "_childDocuments_": [{
> "id": "6baf27db6ee43387849d0e8e",
>  "enabled": false
>}]
> }]'
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":285}}
>
>
> == stacktrace ==
> 2020-05-06 01:01:26.762 ERROR (qtp1569435561-19) [   x:standalone]
> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR:
> [doc=3baf27db6ee43387849d0e8e] unknown field 'enabled'
> at
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:226)
> at
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:100)
> at
> org.apache.solr.update.AddUpdateCommand.lambda$null$0(AddUpdateCommand.java:224)
> at
> java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
> at
> java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1631)
> at
> java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
> at
> java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
> at
> java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
> at
> java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
> at 
> java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
> at
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:282)
> at
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
> at
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1284)
> at
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1277)
> at
> org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:975)
> at
> org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:345)
> at
> org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:292)
> at
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:239)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:76)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> at
> org.apache.solr.update.processor.NestedUpdateProcessorFactory$NestedUpdateProcessor.processAdd(NestedUpdateProcessorFactory.java:79)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:259)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:489)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339)
> at 
> org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225)
>  

Re: Data Import Handler - Concurrent Entity Importing

2020-05-13 Thread ART GALLERY

On Tue, May 5, 2020 at 1:58 PM Mikhail Khludnev  wrote:
>
> Hello, James.
>
> DataImportHandler has a lock preventing concurrent execution. If you need
> to run several imports in parallel on the same core, you need to duplicate
> the "/dataimport" handler definition in solrconfig.xml. That way you can
> run them in parallel. Regarding the schema question, I prefer the latter,
> but mileage may vary.
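>
> A sketch of the duplicated handlers in solrconfig.xml (the config file
> names are just examples):
>
> <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults"><str name="config">dih-entities-a.xml</str></lst>
> </requestHandler>
> <requestHandler name="/dataimport2" class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults"><str name="config">dih-entities-b.xml</str></lst>
> </requestHandler>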
>
> --
> Mikhail.
>
> On Tue, May 5, 2020 at 6:39 PM James Greene 
> wrote:
>
> > Hello, I'm new to the group here so please excuse me if I do not have the
> > etiquette down yet.
> >
> > Is it possible to have multiple entities (customer configurable, up to 40
> > atm) in a DIH configuration imported at once? Right now I have multiple
> > root entities in my configuration, but they get indexed sequentially,
> > which means the entities that are last are always delayed in hitting the
> > index.
> >
> > I'm trying to migrate an existing setup (Solr 6.6) that utilizes a
> > different collection for each "entity type" into a single collection (Solr
> > 8.4), to get around some of the hurdles faced when searches require
> > multiple block joins, which currently does not work going cross-core.
> >
> > I'm also wondering if it is better to fully qualify a field name or use two
> > different fields for performing the "same" search, i.e.:
> >
> >
> > {
> > type_A_status; Active
> > type_A_value: Test
> > }
> > vs
> > {
> > type: A
> > status: Active
> > value: Test
> > }
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev


Re: gzip compression solr 8.4.1

2020-05-13 Thread ART GALLERY

On Tue, May 5, 2020 at 3:33 AM Johannes Siegert wrote:
>
> Hi,
>
> We did further tests to see where exactly the problem is. These are our
> outcomes:
>
> The content length is calculated correctly; a quick test with curl
> showed this.
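>
> The test was along these lines (the URL is just an example):
>
> curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' \
>   'http://localhost:8983/solr/mycore/select?q=*:*'
>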
> The problem is that the stream with the gzip data is not fully consumed
> and afterwards not closed.
>
> Using the debugger with a breakpoint at
> org/apache/solr/common/util/Utils.java:575 shows that it won't enter the
> function readFully(entity.getContent()), most likely due to how the gzip
> stream content is wrapped and extracted beforehand.
>
> On line org/apache/solr/common/util/Utils.java:582 the
> consumeQuietly(entity) should close the stream but does not because of a
> silent exception.
>
> This seems to be the same as it is described in
> https://issues.apache.org/jira/browse/SOLR-14457
>
> We saw that the problem happened also with correct GZIP responses from
> Jetty, not only with non-GZIP ones as described in the Jira issue.
>
> Best,
>
> Johannes
>
> Am Do., 23. Apr. 2020 um 09:55 Uhr schrieb Johannes Siegert <
> johannes.sieg...@offerista.com>:
>
> > Hi,
> >
> > we want to use gzip-compression between our application and the solr
> > server.
> >
> > We use a standalone solr server version 8.4.1 and the prepackaged jetty as
> > application server.
> >
> > We have enabled the jetty gzip module by adding these two files:
> >
> > {path_to_solr}/server/modules/gzip.mod (see below the question)
> > {path_to_solr}/server/etc/jetty-gzip.xml (see below the question)
> >
> > Within the application we use a HttpSolrServer that is configured with
> > allowCompression=true.
> >
> > After we had released our application we saw the number of connections
> > in TCP state CLOSE_WAIT rise until the application was no longer able
> > to open new connections.
> >
> >
> > After a long debugging session we think the problem is that the header
> > "Content-Length" returned by Jetty is sometimes wrong when
> > gzip-compression is enabled.
> >
> > The solrj client uses a ContentLengthInputStream that uses the header
> > "Content-Length" to detect if all data was received. But the InputStream
> > cannot be fully consumed because the value of the header "Content-Length"
> > is higher than the actual content length.
> >
> > Usually the method PoolingHttpClientConnectionManager.releaseConnection
> > is called after the InputStream has been fully consumed. This frees the
> > connection to be reused or closed by the application.
> >
> > Due to the incorrect header "Content-Length" the
> > PoolingHttpClientConnectionManager.releaseConnection method is never
> > called and the connection stays active. After the connection timeout of
> > Jetty is reached, it closes the connection from the server side and the
> > TCP state switches into CLOSE_WAIT. The client never closes the
> > connection, and so the number of connections in use rises.
> >
> >
> > Currently we try to configure the jetty gzip module to return no
> > "Content-Length" if gzip-compression was used. We hope that in this case
> > another InputStream implementation is used that uses the NULL-terminator to
> > see when the InputStream was fully consumed.
> >
> > Do you have any experiences with this problem or any suggestions for us?
> >
> > Thanks,
> >
> > Johannes
> >
> >
> > gzip.mod
> >
> > -
> >
> > DO NOT EDIT - See:
> > https://www.eclipse.org/jetty/documentation/current/startup-modules.html
> >
> > [description]
> > Enable GzipHandler for dynamic gzip compression
> > for the entire server.
> >
> > [tags]
> > handler
> >
> > [depend]
> > server
> >
> > [xml]
> > etc/jetty-gzip.xml
> >
> > [ini-template]
> > ## Minimum content length after which gzip is enabled
> > jetty.gzip.minGzipSize=2048
> >
> > ## Check whether a file with *.gz extension exists
> > jetty.gzip.checkGzExists=false
> >
> > ## Gzip compression level (-1 for default)
> > jetty.gzip.compressionLevel=-1
> >
> > ## User agents for which gzip is disabled
> > jetty.gzip.excludedUserAgent=.*MSIE.6\.0.*
> >
> > -
> >
> > jetty-gzip.xml
> >
> > -
> >
> > <?xml version="1.0"?>
> > <!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN"
> > "http://www.eclipse.org/jetty/configure_9_3.dtd">
> >
> > <Configure id="Server" class="org.eclipse.jetty.server.Server">
> >   <Call name="insertHandler">
> >     <Arg>
> >       <New id="GzipHandler"
> >            class="org.eclipse.jetty.server.handler.gzip.GzipHandler">
> >         <Set name="minGzipSize">
> >           <Property name="jetty.gzip.minGzipSize"
> >                     deprecated="gzip.minGzipSize" default="2048" />
> >         </Set>
> >         <Set name="checkGzExists">
> >           <Property name="jetty.gzip.checkGzExists"
> >                     deprecated="gzip.checkGzExists" default="false" />
> >         </Set>
> >         <Set name="compressionLevel">
> >           <Property name="jetty.gzip.compressionLevel"
> >                     deprecated="gzip.compressionLevel" default="-1" />
> >         </Set>
> >       </New>
> >     </Arg>
> >   </Call>
> > </Configure>

Re: Solrcloud Garbage Collection Suspension linked across nodes?

2020-05-13 Thread ART GALLERY

On Mon, May 4, 2020 at 5:43 PM Webster Homer wrote:
>
> My company has several Solrcloud environments. In our most active cloud we 
> are seeing outages that are related to GC pauses. We have about 10 
> collections of which 4 get a lot of traffic. The solrcloud consists of 4 
> nodes with 6 processors and 11Gb heap size (25Gb physical memory).
>
> I notice that the 4 nodes seem to do their garbage collection at almost the 
> same time. That seems strange to me. I would expect them to be more staggered.
>
> This morning we had a GC pause that caused problems. During that time our
> application service was reporting "No live SolrServers available to handle
> this request".
>
> Between 3:55 and 3:56 AM all 4 nodes were having some amount of garbage
> collection pauses; for 2 of the nodes it was minor, for one it was 50%. For 3
> nodes it lasted until 3:57. However, the node with the worst impact didn't
> recover until 4 AM.
>
> How is it that all 4 nodes were in lockstep doing GC? If they are all doing
> GC at the same time, it defeats the purpose of having redundant cloud servers.
> Just this weekend we switched from CMS to G1GC.
>
> At this point in time we also saw that traffic to Solr was not well
> distributed. The application calls Solr using CloudSolrClient, which I thought
> did its own load balancing. We saw 10X more traffic going to one Solr node
> than all the others, then we saw it start hitting another node. All Solr
> queries come from our application.
>
> During this period of time I saw only 1 error message in the solr log:
> ERROR (zkConnectionManagerCallback-8-thread-1) [   ] o.a.s.c.ZkController 
> There was a problem finding the leader in 
> zk:org.apache.solr.common.SolrException: Could not get leader props
>
> We are currently using Solr 7.7.2
> GC Tuning
> GC_TUNE="-XX:NewRatio=3 \
> -XX:SurvivorRatio=4 \
> -XX:TargetSurvivorRatio=90 \
> -XX:MaxTenuringThreshold=8 \
> -XX:+UseG1GC \
> -XX:MaxGCPauseMillis=250 \
> -XX:+ParallelRefProcEnabled"


Re: Indexing Korean

2020-05-13 Thread ART GALLERY

On Mon, May 4, 2020 at 8:33 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:
>
> Oh wow, I had no idea this existed. Thank you so much!
>
> Best,
> Audrey
>
> On 5/1/20, 12:58 PM, "Markus Jelsma"  wrote:
>
> Hello,
>
> Although it is not mentioned in Solr's language analysis page in the 
> manual, Lucene has had support for Korean for quite a while now.
>
>
> https://lucene.apache.org/core/8_5_0/analyzers-nori/index.html
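>
> A minimal field type using those analyzers might look like this (an
> untested sketch; the analyzers-nori jar must be on the classpath):
>
> <fieldType name="text_ko" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.KoreanTokenizerFactory" decompoundMode="discard"/>
>     <filter class="solr.KoreanPartOfSpeechStopFilterFactory"/>
>     <filter class="solr.KoreanReadingFormFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>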
>
> Regards,
> Markus
>
>
>
> -Original message-
> > From:Audrey Lorberfeld - audrey.lorberf...@ibm.com 
> 
> > Sent: Friday 1st May 2020 17:34
> > To: solr-user@lucene.apache.org
> > Subject: Indexing Korean
> >
> >  Hi All,
> >
> > My team would like to index Korean, but it looks like Solr OOTB does 
> not have explicit support for Korean. If any of you have schema pipelines you 
> could share for your Korean documents, I would love to see them! I'm assuming 
> I would just use some combination of the OOTB CJK factories
> >
> > Best,
> > Audrey
> >
> >
>


Re: Solr 8.5.1 Using Port 10001 doesn't work in Dashboard

2020-05-13 Thread ART GALLERY

On Mon, May 4, 2020 at 10:18 AM Phill Campbell wrote:
>
> I installed Postman and verified that the response from Solr is correct.
> I cleared cached images and files for Chrome and the problem is solved.
>
> > On May 1, 2020, at 3:42 PM, Sylvain James  wrote:
> >
> > Hi Phil,
> >
> > I encountered something similar recently, and after switching to Firefox,
> > all URLs were fine.
> > Maybe an encoding side effect.
> > It seems to me that a new Solr UI is in development. Maybe this issue will
> > be fixed for the release of this UI.
> >
> > Sylvain
> >
> >
> > On Fri, 1 May 2020 at 22:52, Phill Campbell wrote:
> >
> >> The browser is Chrome. I forgot to state that before.
> >> That got me thinking, so I ran it from Firefox.
> >> Everything seems to be fine there!
> >>
> >> Interesting. Since this is my development environment I do not run any
> >> plugins on any of my browsers.
> >>
> >>> On May 1, 2020, at 2:41 PM, Phill Campbell 
> >> wrote:
> >>>
> >>> Today I installed Solr 8.5.1 to replace an 8.2.0 installation.
> >>> It is a clean install, not a migration, there was no data that I needed
> >> to keep.
> >>>
> >>> I run Solr (Solr Cloud Mode) on ports starting with 10001. I have been
> >> doing this since Solr 5x releases.
> >>>
> >>> In my experiment I have 1 shard with replication factor of 2.
> >>>
> >>> http://10.xxx.xxx.xxx:10001/solr/#/
> >>>
> >>> http://10.xxx.xxx.xxx:10002/solr/#/
> >>>
> >>> If I go to the “10001” instance the URL changes and is messed up, and no
> >>> matter which link in the dashboard I click it shows the same information.
> >>> So, Solr is running and the dashboard comes up.
> >>>
> >>> The URL changes and looks like this:
> >>>
> >>> http://10.xxx.xxx.xxx:10001/solr/#!/#%2F
> >>>
> >>> However, on port 10002 it stays like this and shows the proper UI in the
> >>> dashboard:
> >>>
> >>> http://10.xxx.xxx.xxx:10002/solr/#/
> >>>
> >>> To make sure something wasn’t interfering with port 10001 I re-installed
> >> my previous Solr installation and it works fine.
> >>>
> >>> What is this “#!” (Hash bang) stuff in the URL?
> >>> How can I run on port 10001?
> >>>
> >>> Probably something obvious, but I just can’t see it.
> >>>
> >>> For every link from the dashboard:
> >>> :10001/solr/#!/#%2F~logging
> >>> :10001/solr/#!/#%2F~cloud
> >>> :10001/solr/#!/#%2F~collections
> >>> :10001/solr/#!/#%2F~java-properties
> >>> :10001/solr/#!/#%2F~threads
> >>> :10001/solr/#!/#%2F~cluster-suggestions
> >>>
> >>>
> >>>
> >>> From “10002” I see everything fine.
> >>> :10002/solr/#/~cloud
> >>>
> >>> Shows the following:
> >>>
> >>> Host
> >>> 10.xxx.xxx.xxx
> >>> Linux 3.10.0-1127.el7.x86_64, 2cpu
> >>> Uptime: unknown
> >>> Memory: 14.8Gb
> >>> File descriptors: 180/100
> >>> Disk: 49.1Gb used: 5%
> >>> Load: 0
> >>>
> >>> Node
> >>> 10001_solr
> >>> Uptime: 2h 10m
> >>> Java 1.8.0_222
> >>> Solr 8.5.1
> >>> ---
> >>> 10002_solr
> >>> Uptime: 2h 9m
> >>> Java 1.8.0_222
> >>> Solr 8.5.1
> >>>
> >>>
> >>> If I switch my starting port from 10001 to 10002 both instances work.
> >> (10002, and 10003)
> >>> If I switch my starting port from 10001 to 10101 both instances work.
> >> (10101, and 10102)
> >>>
> >>> Any help is appreciated.
>