Error when adding user for Solr Basic Authentication
Hi,

When I try to add a user for Solr Basic Authentication using the following curl command:

curl --user user:password http://localhost:8983/solr/admin/authentication -H 'Content-type:application/json' -d '{"set-user": {"tom":"TomIsCool", "harry":"HarrysSecret"}}'

I get the following error:

{
  "responseHeader":{
    "status":400,
    "QTime":0},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"No contentStream",
    "code":400}}
curl: (3) [globbing] unmatched brace in column 1
curl: (3) [globbing] unmatched close brace/bracket in column 13

What does this error mean and how should we resolve it? I'm using SolrCloud on Solr 6.4.2.

Regards,
Edwin
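A note on the likely cause: the "[globbing] unmatched brace" lines come from curl itself, not Solr. On a Windows command prompt, single quotes are not quoting characters, so the JSON body gets split into fragments that curl treats as extra URL arguments (hence the globbing errors), and no body reaches Solr (hence "No contentStream"). A sketch of the same command quoted for cmd.exe, with the inner double quotes escaped:

curl --user user:password "http://localhost:8983/solr/admin/authentication" -H "Content-type:application/json" -d "{\"set-user\": {\"tom\":\"TomIsCool\", \"harry\":\"HarrysSecret\"}}"

Alternatively, the JSON can be put in a file and sent with -d @users.json, which sidesteps the shell quoting entirely.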
streaming expressions parallel merge
Hi,

With Solr streaming expressions, is there a way to parallel merge a number of Solr streams? Or a way to apply the parallel function to something like this?

merge(
  search(collection1, ...),
  search(collection2, ...),
  ...
  on="id asc")

Cheers,
Damien.
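For illustration, a hedged sketch of wrapping such a merge in parallel(), assuming each inner search adds partitionKeys so the streams can be partitioned across workers (untested; the exact parameters are in the streaming expressions documentation):

parallel(workerCollection,
         merge(
           search(collection1, q="*:*", fl="id", sort="id asc", partitionKeys="id"),
           search(collection2, q="*:*", fl="id", sort="id asc", partitionKeys="id"),
           on="id asc"),
         workers="4", sort="id asc")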
The resolution of the pf parameter with edismax and dismax is not consistent
Hi:

I have a question about edismax and dismax. I'm using Solr 6.3.0. Both queries below are otherwise identical, but the results differ.

There is a document with: pj_title:word1 word2 word3

1. edismax: q=word1 word2&qf=pj_title&pf=pj_title&defType=edismax
2. dismax: q=word1 word2&qf=pj_title&pf=pj_title&defType=dismax

The second case matches the document completely; the first does not. Why is that? Maybe I have not explained this well, but I tried my best.

Thank you.
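One way to see what each parser actually builds (an illustrative request, assuming the pj_title field above) is to add debugQuery=true and compare the parsedquery entries for the two defType values, since dismax and edismax construct the pf phrase clause differently:

http://localhost:8983/solr/<collection>/select?q=word1%20word2&qf=pj_title&pf=pj_title&defType=edismax&debugQuery=true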
Re: Filtering results by minimum relevancy score
Hi Koji,

Strictly talking about TF-IDF (and BM25, which is an evolution of that approach), I would say it is a weighting function/numerical statistic that can be used for ranking functions. It is based on probabilistic concepts (such as IDF), but it is not a probabilistic function [1]. Indeed, a BM25 score for a term is not assured to fall between 0 and 1.

[1] http://math.stackexchange.com/questions/610165/prove-that-the-bm25-scoring-function-is-probabilistic

---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
Re: What does the replication factor parameter in collections api do?
Ok. Thank you for your quick reply. Though I still feel a little uneasy. Why is it possible then to alter replicationFactor via MODIFYCOLLECTION in the collections API? What would be the use case for this parameter at all then?

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, April 12, 2017 19:36
To: solr-user
Subject: Re: What does the replication factor parameter in collections api do?

Really <3>. replicationFactor is used to set up your collection initially; you have to be able to change your topology afterwards, so it's ignored thereafter. Once your replica is added, it's automatically made use of by the collection.

On Wed, Apr 12, 2017 at 9:30 AM, Johannes Knaus wrote:
> Hi,
>
> I am still quite new to Solr. I have the following setup:
> A SolrCloud setup with
> 38 nodes,
> maxShardsPerNode=2,
> implicit routing with routing field,
> and replication factor=2.
>
> Now, I want to add replica. This works fine by first increasing the
> maxShardsPerNode to a higher number and then add replicas.
> So far, so good. I can confirm changes of the maxShardsPerNode parameter and
> added replicas in the Admin UI.
> However, the Solr Admin UI still is showing me a replication factor of 2.
> I am a little confused about what the replicationfactor parameter actually
> does in my case:
>
> 1) What does that mean? Does Solr make use of all replicas I have or only of
> two?
> 2) Do I need to increase the replication factor value as well to really have
> more replicas available and usable? If this is true, do I need to
> restart/reload the collection newly upload configs to Zookeeper or anything
> alike?
> 3) Or is replicationfactor just a parameter that is needed for the first
> start of SolrCloud and can be ignored afterwards?
>
> Thank you very much for your help,
> All the best,
> Johannes
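For reference, replicas are added after creation through an explicit collections API call rather than by raising replicationFactor. An illustrative request (collection and shard names are placeholders; the node parameter is optional, and Solr picks a node itself if it is omitted):

http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=192.168.1.5:8983_solr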
RE: maxDoc ten times greater than numDoc
I have forced a merge yesterday and went back to one segment. One indexer program reindexes (most or all) every 20 minutes orso. There is nothing custom at that particular point. There is no autoCommit, the indexer program is responsible for a hard commit, it is the single source of reindexed data. After one cycle we had two segments, 50 % deleted, as expected. This was stable for many hours and many cycles. For some reason, i now have 2/3 deletes and three segments, now this situation is stable. So the merges do happen, but sometimes they don't. When they don't, the size increases (now three segments, 55 MB). But it appears that number of segments never decreases, and that is what bothers me. I was about to set segmentsPerTier to two but then i realized i can also delete everything prior to indexing as opposed to deleting only items older than the set i am already about to reindex. This strategy works fine with other reindexing programs, they don't suffer this problem. So, it is not solved, but not a problem anymore. Thanks all anyway :) Markus -Original message- > From:Erick Erickson > Sent: Wednesday 12th April 2017 17:51 > To: solr-user > Subject: Re: maxDoc ten times greater than numDoc > > Yes, this is very strange. My bet: you have something > custom, a setting, indexing code, whatever that > is getting in the way. > > Second possibility (really stretching here): your > merge settings are set to 10 segments having to exist > before merging and somehow not all the docs in the > segments are replaced. So until you get to the 10th > re-index (and assuming a single segment is > produced per re-index) the older segments aren't > merged. If that were the case I'd expect to see the > number of deleted docs drop back periodically > then build up again. A real shot in the dark. One way > to test this would be to specify "segmentsPerTier" of, say, > 2 rather than the default 10, see: > https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig > If this were the case I'd expect with a setting of 2 that > your index might have 50% deleted docs, that would at > least tell us whether we're on the right track. > > Take a look at your index on disk. If you're seeing gaps > in the numbering, you are getting merging, it may be > that they're not happening very often. > > And I take it you have no custom code here and you are > doing commits? (hard commits are all that matters > for merging, it doesn't matter whether openSearcher > is set to true or false). > > I just tried the "techproducts" example as follows: > 1> indexed all the sample files with the bin/solr -e techproducts example > 2> started re-indexing the sample docs one at a time with post.jar > > It took a while, but eventually the original segments got merged away so > I doubt it's any weirdness with a small index. > > Speaking of small index, why are you sharding with only > 8K docs? Sharding will probably slow things down for such > a small index. This isn't germane to your question though. > > Best, > Erick > > > On Wed, Apr 12, 2017 at 5:56 AM, Shawn Heisey wrote: > > On 4/12/2017 5:11 AM, Markus Jelsma wrote: > >> One of our 2 shard collections is rather small and gets all its entries > >> reindexed every 20 minutes orso. Now i just noticed maxDoc is ten times > >> greater than numDoc, the merger is never scheduled but settings are > >> default. We just overwrite the existing entries, all of them. 
> >> > >> Here are the stats: > >> > >> Last Modified:12 minutes ago > >> Num Docs: 8336 > >> Max Doc:82362 > >> Heap Memory Usage: -1 > >> Deleted Docs: 74026 > >> Version: 3125 > >> Segment Count: 10 > > > > This discrepancy would typically mean that when you reindex, you're > > indexing MOST of the documents, but not ALL of them, so at least one > > document is still not deleted in each older segment. When segments have > > all their documents deleted, they are automatically removed by Lucene, > > but if there's even one document NOT deleted, the segment will remain > > until it is merged. > > > > There's no information here about how large this core is, but unless the > > documents are REALLY enormous, I'm betting that an optimize would happen > > quickly. With a document count this low and an indexing pattern that > > results in such a large maxdoc, this might be a good time to go against > > general advice and perform an optimize at least once a day. > > > > An alternate idea that would not require optimizes: If the intent is to > > completely rebuild the index, you might want to consider issuing a > > "delete all docs by query" before beginning the indexing process. This > > would ensure that none of the previous documents remain. As long as you > > don't do a commit that opens a new searcher before the indexing is > > complete, clients won't ever know that everything was deleted. > > > >> This is the config: > >> > >> 6.5.0 > >> ${solr.data.dir:} > >>>
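An illustrative form of that delete-everything-first strategy (core name is a placeholder): issue the delete without a commit, reindex, and only commit at the end, so searchers never see an empty index:

curl 'http://localhost:8983/solr/mycore/update' -H 'Content-type:application/json' -d '{"delete": {"query": "*:*"}}'
(reindex all documents here)
curl 'http://localhost:8983/solr/mycore/update?commit=true'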
SOLR - 6.4.0 SolrCore Initialization Failures
Hi,

I have recently moved my cores from Solr 5.1.0 to 6.4.0. I am using a Windows environment. I have large data in the cores: 6 cores in total, with 142 GB of data. All cores migrated perfectly, but one is giving this error:

SolrCore Initialization Failures - core_name: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: JVM Error creating core [core_name]: Java heap space

I checked SOLR_HEAP="8g" in solr.in.sh. Why do I have this problem in only one core?

Thanks.

Regards,
Uchit Patel
Re: maxDoc ten times greater than numDoc
Maybe not every entry got deleted and it was holding up the segment. E.g. a child or parent record abandoned. If, for example, the parent record has a date field and the child does not, then deleting with a date-based query may trigger this. I think there was a bug about abandoned child or something. This is pure speculation of course. Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced On 13 April 2017 at 12:54, Markus Jelsma wrote: > I have forced a merge yesterday and went back to one segment. > > One indexer program reindexes (most or all) every 20 minutes orso. There is > nothing custom at that particular point. There is no autoCommit, the indexer > program is responsible for a hard commit, it is the single source of > reindexed data. > > After one cycle we had two segments, 50 % deleted, as expected. This was > stable for many hours and many cycles. For some reason, i now have 2/3 > deletes and three segments, now this situation is stable. So the merges do > happen, but sometimes they don't. When they don't, the size increases (now > three segments, 55 MB). But it appears that number of segments never > decreases, and that is what bothers me. > > I was about to set segmentsPerTier to two but then i realized i can also > delete everything prior to indexing as opposed to deleting only items older > than the set i am already about to reindex. This strategy works fine with > other reindexing programs, they don't suffer this problem. > > So, it is not solved, but not a problem anymore. Thanks all anyway :) > Markus > > -Original message- >> From:Erick Erickson >> Sent: Wednesday 12th April 2017 17:51 >> To: solr-user >> Subject: Re: maxDoc ten times greater than numDoc >> >> Yes, this is very strange. My bet: you have something >> custom, a setting, indexing code, whatever that >> is getting in the way. >> >> Second possibility (really stretching here): your >> merge settings are set to 10 segments having to exist >> before merging and somehow not all the docs in the >> segments are replaced. So until you get to the 10th >> re-index (and assuming a single segment is >> produced per re-index) the older segments aren't >> merged. If that were the case I'd expect to see the >> number of deleted docs drop back periodically >> then build up again. A real shot in the dark. One way >> to test this would be to specify "segmentsPerTier" of, say, >> 2 rather than the default 10, see: >> https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig >> If this were the case I'd expect with a setting of 2 that >> your index might have 50% deleted docs, that would at >> least tell us whether we're on the right track. >> >> Take a look at your index on disk. If you're seeing gaps >> in the numbering, you are getting merging, it may be >> that they're not happening very often. >> >> And I take it you have no custom code here and you are >> doing commits? (hard commits are all that matters >> for merging, it doesn't matter whether openSearcher >> is set to true or false). >> >> I just tried the "techproducts" example as follows: >> 1> indexed all the sample files with the bin/solr -e techproducts example >> 2> started re-indexing the sample docs one at a time with post.jar >> >> It took a while, but eventually the original segments got merged away so >> I doubt it's any weirdness with a small index. >> >> Speaking of small index, why are you sharding with only >> 8K docs? Sharding will probably slow things down for such >> a small index. 
This isn't germane to your question though. >> >> Best, >> Erick >> >> >> On Wed, Apr 12, 2017 at 5:56 AM, Shawn Heisey wrote: >> > On 4/12/2017 5:11 AM, Markus Jelsma wrote: >> >> One of our 2 shard collections is rather small and gets all its entries >> >> reindexed every 20 minutes orso. Now i just noticed maxDoc is ten times >> >> greater than numDoc, the merger is never scheduled but settings are >> >> default. We just overwrite the existing entries, all of them. >> >> >> >> Here are the stats: >> >> >> >> Last Modified:12 minutes ago >> >> Num Docs: 8336 >> >> Max Doc:82362 >> >> Heap Memory Usage: -1 >> >> Deleted Docs: 74026 >> >> Version: 3125 >> >> Segment Count: 10 >> > >> > This discrepancy would typically mean that when you reindex, you're >> > indexing MOST of the documents, but not ALL of them, so at least one >> > document is still not deleted in each older segment. When segments have >> > all their documents deleted, they are automatically removed by Lucene, >> > but if there's even one document NOT deleted, the segment will remain >> > until it is merged. >> > >> > There's no information here about how large this core is, but unless the >> > documents are REALLY enormous, I'm betting that an optimize would happen >> > quickly. With a document count this low and an indexing pattern that >> > results in such a large maxdoc, this might be a good time to go
Re: Using BasicAuth with SolrJ Code
That looks good. can you share the security.json (commenting out anything that's sensitive of course) On Wed, Apr 12, 2017 at 5:10 PM, Zheng Lin Edwin Yeo wrote: > This is what I get when I run the code. > > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://localhost:8983/solr/testing: Expected mime type > application/octet-stream but got text/html. > > > Error 401 require authentication > > HTTP ERROR 401 > Problem accessing /solr/testing/update. Reason: > require authentication > > > > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:578) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268) > at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149) > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106) > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71) > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:85) > at testing.indexing(testing.java:2939) > at testing.main(testing.java:329) > Exception in thread "main" > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://localhost:8983/solr/testing: Expected mime type > application/octet-stream but got text/html. > > > Error 401 require authentication > > HTTP ERROR 401 > Problem accessing /solr/testing/update. Reason: > require authentication > > > > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:578) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268) > at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149) > at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:484) > at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:463) > at testing.indexing(testing.java:3063) > at testing.main(testing.java:329) > > Regards, > Edwin > > > On 12 April 2017 at 14:28, Noble Paul wrote: > >> can u paste the stacktrace here >> >> On Tue, Apr 11, 2017 at 1:19 PM, Zheng Lin Edwin Yeo >> wrote: >> > I found from StackOverflow that we should declare it this way: >> > http://stackoverflow.com/questions/43335419/using- >> basicauth-with-solrj-code >> > >> > >> > SolrRequest req = new QueryRequest(new SolrQuery("*:*"));//create a new >> > request object >> > req.setBasicAuthCredentials(userName, password); >> > solrClient.request(req); >> > >> > Is that correct? >> > >> > For this, the NullPointerException is not coming out, but the SolrJ is >> > still not able to get authenticated. I'm still getting Error Code 401 >> even >> > after putting in this code. >> > >> > Any advice on which part of the SolrJ code should we place this code in? >> > >> > Regards, >> > Edwin >> > >> > >> > On 10 April 2017 at 23:50, Zheng Lin Edwin Yeo >> wrote: >> > >> >> Hi, >> >> >> >> I have just set up the Basic Authentication Plugin in Solr 6.4.2 on >> >> SolrCloud, and I am trying to modify my SolrJ code so that the code can >> go >> >> through the authentication and do the indexing. >> >> >> >> I tried using the following code from the Solr Documentation >> >> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+ >> >> Plugin. 
>> >> >> >> SolrRequest req ;//create a new request object >> >> req.setBasicAuthCredentials(userName, password); >> >> solrClient.request(req); >> >> >> >> However, the code complains that the req is not initialized. >> >> >> >> If I initialized it, it will be initialize as null. >> >> >> >> SolrRequest req = null;//create a new request object >> >> req.setBasicAuthCredentials(userName, password); >> >> solrClient.request(req); >> >> >> >> This will caused a null pointer exception. >> >> Exception in thread "main" java.lang.NullPointerException >> >> >> >> How should we go about putting these codes, so that the error can be >> >> prevented? >> >> >> >> Regards, >> >> Edwin >> >> >> >> >> >> >> >> -- >> - >> Noble Paul >> -- - Noble Paul
Re: What does the replication factor parameter in collections api do?
On 4/13/2017 3:22 AM, Johannes Knaus wrote: > Ok. Thank you for your quick reply. Though I still feel a little > uneasy. Why is it possible then to alter replicationFactor via > MODIFYCOLLECTION in the collections API? What would be the use case > for this parameter at all then? If you use a very specific storage method for your indexes -- HDFS -- then replicationFactor has meaning beyond initial collection creation, in conjunction with the "autoAddReplicas" feature. https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS#RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud If you are NOT utilizing the very specific HDFS storage engine, then everything you were told applies. With standard storage mechanisms, replicationFactor has zero meaning after initial collection creation, and changing the value will have no effect. Thanks, Shawn
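An illustrative creation call where replicationFactor keeps its meaning on HDFS (names and counts are placeholders):

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&autoAddReplicas=true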
Re: SOLR - 6.4.0 SolrCore Initialization Failures
On 4/13/2017 4:37 AM, Uchit Patel wrote: > I have recently moved my cores from SOLR 5.1.0 to 6.4.0. I am using windows > environment. I have large data in cores. I have total 6 cores with total data > 142 GB. All cores are migrated perfectly but one is giving error: > > SolrCore Initialization Failures > >- core_name: > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > JVM Error creating core [core_name]: Java heap space > > I checked SOLR_HEAP="8g" in solr.in.sh > Why I have problem in only one core ? Heap space problems affect the entire process, and the reason for needing more heap may not be apparent from logs. An OutOfMemoryError may be thrown from *ANY* part of the program when you run out of heap, even pieces of the program that aren't the actual problem. https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap Thanks, Shawn
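One more thing worth double-checking here, since the poster is on Windows (an educated guess, not confirmed in the thread): solr.in.sh is only read by the Unix start script; bin\solr.cmd reads solr.in.cmd instead, so a heap setting placed in solr.in.sh would never take effect. The equivalent lines in solr.in.cmd would look like:

set SOLR_HEAP=8g

or

set SOLR_JAVA_MEM=-Xms8g -Xmx8g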
Re: Autosuggestion
Thanks, that's very helpful! The third link especially is quite helpful. Is there any recommendation regarding using FST-based vs AnalyzingInfix suggesters? Thanks On Wed, Apr 12, 2017 at 6:23 PM, Andrea Gazzarini wrote: > Hi, > I think you got an old post. I would have a look at the built-in feature, > first. These posts can help you to get a quick overview: > > https://cwiki.apache.org/confluence/display/solr/Suggester > http://alexbenedetti.blogspot.it/2015/07/solr-you-complete-me.html > https://lucidworks.com/2015/03/04/solr-suggester/ > > HTH, > Andrea > > > On 12/04/17 14:43, OTH wrote: > >> Hello, >> >> Is there any recommended way to achieve auto-suggestion in textboxes using >> Solr? >> >> I'm new to Solr, but right now I have achieved this functionality by using >> an example I found online, doing this: >> >> I added a copy field, which is of the following type: >> >>> positionIncrementGap="100"> >> >>> maxGramSize="10"/> >> >> >> >>> maxGramSize="10"/> >> >> >> >> >> In the search box, after each character is typed, the above field is >> queried, and the results are shown in a drop-down list. >> >> However, this is performing quite slow. I'm not sure if that has to do >> with the front-end code, or because I'm not using the recommended approach >> in terms of how I'm using Solr. Is there any other recommended way to use >> Solr to achieve this functionality? >> >> Thanks >> >> >
Re: Using BasicAuth with SolrJ Code
The security.json which I'm using is the default one that is available from the Solr Documentation https://cwiki.apache.org/confluence/display/ solr/Basic+Authentication+Plugin. { "authentication":{ "blockUnknown": true, "class":"solr.BasicAuthPlugin", "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="} }, "authorization":{ "class":"solr.RuleBasedAuthorizationPlugin", "user-role":{"solr":"admin"}, "permissions":[{"name":"security-edit", "role":"admin"}] }} Regards, Edwin On 13 April 2017 at 19:53, Noble Paul wrote: > That looks good. can you share the security.json (commenting out > anything that's sensitive of course) > > On Wed, Apr 12, 2017 at 5:10 PM, Zheng Lin Edwin Yeo > wrote: > > This is what I get when I run the code. > > > > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: > Error > > from server at http://localhost:8983/solr/testing: Expected mime type > > application/octet-stream but got text/html. > > > > > > Error 401 require authentication > > > > HTTP ERROR 401 > > Problem accessing /solr/testing/update. Reason: > > require authentication > > > > > > > > at > > org.apache.solr.client.solrj.impl.HttpSolrClient. > executeMethod(HttpSolrClient.java:578) > > at > > org.apache.solr.client.solrj.impl.HttpSolrClient.request( > HttpSolrClient.java:279) > > at > > org.apache.solr.client.solrj.impl.HttpSolrClient.request( > HttpSolrClient.java:268) > > at org.apache.solr.client.solrj.SolrRequest.process( > SolrRequest.java:149) > > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106) > > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71) > > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:85) > > at testing.indexing(testing.java:2939) > > at testing.main(testing.java:329) > > Exception in thread "main" > > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: > Error > > from server at http://localhost:8983/solr/testing: Expected mime type > > application/octet-stream but got text/html. > > > > > > Error 401 require authentication > > > > HTTP ERROR 401 > > Problem accessing /solr/testing/update. Reason: > > require authentication > > > > > > > > at > > org.apache.solr.client.solrj.impl.HttpSolrClient. > executeMethod(HttpSolrClient.java:578) > > at > > org.apache.solr.client.solrj.impl.HttpSolrClient.request( > HttpSolrClient.java:279) > > at > > org.apache.solr.client.solrj.impl.HttpSolrClient.request( > HttpSolrClient.java:268) > > at org.apache.solr.client.solrj.SolrRequest.process( > SolrRequest.java:149) > > at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:484) > > at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:463) > > at testing.indexing(testing.java:3063) > > at testing.main(testing.java:329) > > > > Regards, > > Edwin > > > > > > On 12 April 2017 at 14:28, Noble Paul wrote: > > > >> can u paste the stacktrace here > >> > >> On Tue, Apr 11, 2017 at 1:19 PM, Zheng Lin Edwin Yeo > >> wrote: > >> > I found from StackOverflow that we should declare it this way: > >> > http://stackoverflow.com/questions/43335419/using- > >> basicauth-with-solrj-code > >> > > >> > > >> > SolrRequest req = new QueryRequest(new SolrQuery("*:*"));//create a > new > >> > request object > >> > req.setBasicAuthCredentials(userName, password); > >> > solrClient.request(req); > >> > > >> > Is that correct? 
> >> > > >> > For this, the NullPointerException is not coming out, but the SolrJ is > >> > still not able to get authenticated. I'm still getting Error Code 401 > >> even > >> > after putting in this code. > >> > > >> > Any advice on which part of the SolrJ code should we place this code > in? > >> > > >> > Regards, > >> > Edwin > >> > > >> > > >> > On 10 April 2017 at 23:50, Zheng Lin Edwin Yeo > >> wrote: > >> > > >> >> Hi, > >> >> > >> >> I have just set up the Basic Authentication Plugin in Solr 6.4.2 on > >> >> SolrCloud, and I am trying to modify my SolrJ code so that the code > can > >> go > >> >> through the authentication and do the indexing. > >> >> > >> >> I tried using the following code from the Solr Documentation > >> >> https://cwiki.apache.org/confluence/display/solr/Basic+ > Authentication+ > >> >> Plugin. > >> >> > >> >> SolrRequest req ;//create a new request object > >> >> req.setBasicAuthCredentials(userName, password); > >> >> solrClient.request(req); > >> >> > >> >> However, the code complains that the req is not initialized. > >> >> > >> >> If I initialized it, it will be initialize as null. > >> >> > >> >> SolrRequest req = null;//create a new request object > >> >> req.setBasicAuthCredentials(userName, password); > >> >> solrClient.request(req); > >> >> > >> >> This will caused a null pointer exception. > >> >> Exception in thread "main" java.lang.NullPointerException > >> >> > >> >> How should we go about putting these codes, so that the error can
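For the indexing path specifically, a rough SolrJ sketch (collection name, credentials, and fields are placeholders): the convenience methods solrClient.add(...) and solrClient.commit() build their own requests internally and never carry the credentials; using explicit UpdateRequest objects avoids that:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

CloudSolrClient solrClient = new CloudSolrClient.Builder()
    .withZkHost("localhost:9983").build();

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");

// Send the add with credentials attached to the request itself
UpdateRequest update = new UpdateRequest();
update.setBasicAuthCredentials("solr", "SolrRocks");
update.add(doc);
update.process(solrClient, "testing");

// The commit also needs its own authenticated request
UpdateRequest commit = new UpdateRequest();
commit.setBasicAuthCredentials("solr", "SolrRocks");
commit.setAction(UpdateRequest.ACTION.COMMIT, true, true);
commit.process(solrClient, "testing");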
Re: keyword-in-context for PDF document
Apologies, I meant "keyword-in-context".
keyword-in-content for PDF document
If I search for the word "growth" in a PDF, I want to output all the sentences with the word "growth" in them. How can that be done?
Re: Autosuggestion
bq: FST-based vs AnalyzingInfix They are two totally different things. FST-based suggesters are very fast and compact. But they only match from the beginning of the input. AnalyzingInfix creates a "sidecar" index that's searched like a normal index and the _field_ is returned. Thus analyzinginfix can suggest "my dog has fleas" when entering "fleas", but the FST-based suggesters cannot. Best, Erick On Thu, Apr 13, 2017 at 6:24 AM, OTH wrote: > Thanks, that's very helpful! > The third link especially is quite helpful. > Is there any recommendation regarding using FST-based vs AnalyzingInfix > suggesters? > Thanks > > On Wed, Apr 12, 2017 at 6:23 PM, Andrea Gazzarini wrote: > >> Hi, >> I think you got an old post. I would have a look at the built-in feature, >> first. These posts can help you to get a quick overview: >> >> https://cwiki.apache.org/confluence/display/solr/Suggester >> http://alexbenedetti.blogspot.it/2015/07/solr-you-complete-me.html >> https://lucidworks.com/2015/03/04/solr-suggester/ >> >> HTH, >> Andrea >> >> >> On 12/04/17 14:43, OTH wrote: >> >>> Hello, >>> >>> Is there any recommended way to achieve auto-suggestion in textboxes using >>> Solr? >>> >>> I'm new to Solr, but right now I have achieved this functionality by using >>> an example I found online, doing this: >>> >>> I added a copy field, which is of the following type: >>> >>>>> positionIncrementGap="100"> >>> >>>>> maxGramSize="10"/> >>> >>> >>> >>>>> maxGramSize="10"/> >>> >>> >>> >>> >>> In the search box, after each character is typed, the above field is >>> queried, and the results are shown in a drop-down list. >>> >>> However, this is performing quite slow. I'm not sure if that has to do >>> with the front-end code, or because I'm not using the recommended approach >>> in terms of how I'm using Solr. Is there any other recommended way to use >>> Solr to achieve this functionality? >>> >>> Thanks >>> >>> >>
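For comparison, a sketch of an AnalyzingInfix suggester in solrconfig.xml (field and analyzer names are placeholders; swapping lookupImpl to FuzzyLookupFactory gives an FST-based suggester instead):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>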
Re: What does the replication factor parameter in collections api do?
bq: Why is it possible then to alter replicationFactor via MODIFYCOLLECTION in the collections API Because MODIFYCOLLECTION just changes properties in the collection definition generically and replicationFactor just happens to be one. IOW there's no overarching reason. It would be extra work to dis-allow that one case and possibly introduce errors without changing any functionality so nobody was willing to put in the effort. Best, Erick On Thu, Apr 13, 2017 at 5:48 AM, Shawn Heisey wrote: > On 4/13/2017 3:22 AM, Johannes Knaus wrote: >> Ok. Thank you for your quick reply. Though I still feel a little >> uneasy. Why is it possible then to alter replicationFactor via >> MODIFYCOLLECTION in the collections API? What would be the use case >> for this parameter at all then? > > If you use a very specific storage method for your indexes -- HDFS -- > then replicationFactor has meaning beyond initial collection creation, > in conjunction with the "autoAddReplicas" feature. > > https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS#RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud > > If you are NOT utilizing the very specific HDFS storage engine, then > everything you were told applies. With standard storage mechanisms, > replicationFactor has zero meaning after initial collection creation, > and changing the value will have no effect. > > Thanks, > Shawn >
Re: maxDoc ten times greater than numDoc
If you want to be brave Through a clever bit of reflection, the parameters that TieredMergePolicy uses to decide what segments to reclaim are settable in solrconfig.xml (undocumented, so use at your own risk). You could try bumping reclaimDeletesWeight in your TieredMergePolicy configuration if you wanted to experiment. There's no good reason not to set your segments per tier, it won't hurt. But as you say you have a solution so this is just for curiosity's sake. Best, Erick On Thu, Apr 13, 2017 at 4:42 AM, Alexandre Rafalovitch wrote: > Maybe not every entry got deleted and it was holding up the segment. > E.g. a child or parent record abandoned. If, for example, the parent > record has a date field and the child does not, then deleting with a > date-based query may trigger this. I think there was a bug about > abandoned child or something. > > This is pure speculation of course. > > Regards, >Alex. > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 13 April 2017 at 12:54, Markus Jelsma wrote: >> I have forced a merge yesterday and went back to one segment. >> >> One indexer program reindexes (most or all) every 20 minutes orso. There is >> nothing custom at that particular point. There is no autoCommit, the indexer >> program is responsible for a hard commit, it is the single source of >> reindexed data. >> >> After one cycle we had two segments, 50 % deleted, as expected. This was >> stable for many hours and many cycles. For some reason, i now have 2/3 >> deletes and three segments, now this situation is stable. So the merges do >> happen, but sometimes they don't. When they don't, the size increases (now >> three segments, 55 MB). But it appears that number of segments never >> decreases, and that is what bothers me. >> >> I was about to set segmentsPerTier to two but then i realized i can also >> delete everything prior to indexing as opposed to deleting only items older >> than the set i am already about to reindex. This strategy works fine with >> other reindexing programs, they don't suffer this problem. >> >> So, it is not solved, but not a problem anymore. Thanks all anyway :) >> Markus >> >> -Original message- >>> From:Erick Erickson >>> Sent: Wednesday 12th April 2017 17:51 >>> To: solr-user >>> Subject: Re: maxDoc ten times greater than numDoc >>> >>> Yes, this is very strange. My bet: you have something >>> custom, a setting, indexing code, whatever that >>> is getting in the way. >>> >>> Second possibility (really stretching here): your >>> merge settings are set to 10 segments having to exist >>> before merging and somehow not all the docs in the >>> segments are replaced. So until you get to the 10th >>> re-index (and assuming a single segment is >>> produced per re-index) the older segments aren't >>> merged. If that were the case I'd expect to see the >>> number of deleted docs drop back periodically >>> then build up again. A real shot in the dark. One way >>> to test this would be to specify "segmentsPerTier" of, say, >>> 2 rather than the default 10, see: >>> https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig >>> If this were the case I'd expect with a setting of 2 that >>> your index might have 50% deleted docs, that would at >>> least tell us whether we're on the right track. >>> >>> Take a look at your index on disk. If you're seeing gaps >>> in the numbering, you are getting merging, it may be >>> that they're not happening very often. 
>>> >>> And I take it you have no custom code here and you are >>> doing commits? (hard commits are all that matters >>> for merging, it doesn't matter whether openSearcher >>> is set to true or false). >>> >>> I just tried the "techproducts" example as follows: >>> 1> indexed all the sample files with the bin/solr -e techproducts example >>> 2> started re-indexing the sample docs one at a time with post.jar >>> >>> It took a while, but eventually the original segments got merged away so >>> I doubt it's any weirdness with a small index. >>> >>> Speaking of small index, why are you sharding with only >>> 8K docs? Sharding will probably slow things down for such >>> a small index. This isn't germane to your question though. >>> >>> Best, >>> Erick >>> >>> >>> On Wed, Apr 12, 2017 at 5:56 AM, Shawn Heisey wrote: >>> > On 4/12/2017 5:11 AM, Markus Jelsma wrote: >>> >> One of our 2 shard collections is rather small and gets all its entries >>> >> reindexed every 20 minutes orso. Now i just noticed maxDoc is ten times >>> >> greater than numDoc, the merger is never scheduled but settings are >>> >> default. We just overwrite the existing entries, all of them. >>> >> >>> >> Here are the stats: >>> >> >>> >> Last Modified:12 minutes ago >>> >> Num Docs: 8336 >>> >> Max Doc:82362 >>> >> Heap Memory Usage: -1 >>> >> Deleted Docs: 74026 >>> >> Version: 3125 >>> >> Segment Count: 10 >>> > >>> > This discrepancy would typical
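For reference, a sketch of where those knobs sit in solrconfig.xml (values are illustrative; reclaimDeletesWeight is the undocumented, use-at-your-own-risk one mentioned above):

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="segmentsPerTier">2</int>
    <double name="reclaimDeletesWeight">3.0</double>
  </mergePolicyFactory>
</indexConfig>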
Re: Grouped Result sort issue
I had the chance to do some investigation on the code side, and I basically confirm what Erick hypothesized and what Diego Ceccarelli mentioned in this other thread [1].

Grouping happens with a two-phase collector strategy:
1) The first phase retrieves and sorts the groups.
2) The second phase retrieves the top documents per group and sorts them.

The phases are independent, so the documents you retrieve in phase 2 don't affect the order of the groups (that was established in phase 1). Specifically for phase 1, we keep for each group the most representative value(s). If the sort is by score asc, for each group the min score is stored; if the sort is by score desc, for each group the max score is stored. Then, when ordering the groups, we just add them to a TreeSet using the field comparator on those values.

To conclude: when retrieving one doc per group as a flat list, this behaviour may sound counter-intuitive. I guess you should use field collapsing [2], and you should see behavior consistent with what you expect.

Cheers

[1] http://lucene.472066.n3.nabble.com/Question-about-grouping-in-distribute-mode-td4327679.html
[2] https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results

---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
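An illustrative request using the collapsing query parser instead of grouping (field name is a placeholder): the result is a single flat list of group heads, so the global sort applies to them directly:

q=foo&fq={!collapse field=group_field}&sort=score desc&expand=true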
Re: keyword-in-content for PDF document
With great difficulty. PDF does not usually preserve the text flow, it uses instead absolute positioning for text fragments. Extraction will try to approximate the right thing, but it is an approximation. And if you have two columns, it is harder again. Some documents may have accessibility layer, which would help. I'd start from using Tika (or extract handler with extractOnly=true) on the documents you have and seeing what comes out. See https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika Then you have to figure out whether you are searching just a word or across the sentence boundaries. You could probably (somehow) split on sentence boundary if you want to store each sentence as a value in a multivalued field. Or you could try using highlighter to return only the sentence. Of course, defining the sentence boundary is a lot trickier than it seems at first.. (eg. "He works for B.B.C.") Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced On 13 April 2017 at 15:54, ankur wrote: > If i am search for word "growth" in a PDF, i want to output all the sentences > with the word "growth" in it. > > How can that be done? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/keyword-in-content-for-PDF-document-tp4329754.html > Sent from the Solr - User mailing list archive at Nabble.com.
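An illustrative extract-only call to inspect what Tika recovers from a given PDF before worrying about sentences (collection and path are placeholders):

curl 'http://localhost:8983/solr/mycollection/update/extract?extractOnly=true&wt=json' -F 'myfile=@/path/to/file.pdf'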
Re: Filtering results by minimum relevancy score
BM25 came out of work on probabilistic engines, but using BM25 in Solr doesn’t automatically make it probabilistic. I read a paper once that showed the two models are not that different, maybe by Karen Sparck-Jones. Still, even with a probabilistic model, relevance cutoffs don’t work. It is still too easy for a good match to have a low score. We’re back to increasing the good hits vs reducing the bad hits. You really only achieve one of those two. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 12, 2017, at 7:41 PM, Koji Sekiguchi > wrote: > > Hi Walter, > > May I ask a tangential question? I'm curious the following line you wrote: > > > Solr is a vector-space engine. Some early engines (Verity VDK) were > > probabilistic engines. Those do give an absolute estimate of the relevance > > of each hit. Unfortunately, the relevance of results is just not as good as > > vector-space engines. So, probabilistic engines are mostly dead. > > Can you elaborate this? > > I thought Okapi BM25, which is the default Similarity on Solr, is based on > the probabilistic > model. Did you mean that Lucene/Solr is still based on vector space model but > they built > BM25Similarity on top of it and therefore, BM25Similarity is not pure > probabilistic scoring > system or Okapi BM25 is not originally probabilistic? > > As for me, I prefer the idea of vector space than probabilistic for the > information retrieval, > and I stick with ClassicSimilarity for my projects. > > Thanks, > > Koji > > > On 2017/04/13 4:08, Walter Underwood wrote: >> Fine. It can’t be done. If it was easy, Solr/Lucene would already have the >> feature, right? >> Solr is a vector-space engine. Some early engines (Verity VDK) were >> probabilistic engines. Those do give an absolute estimate of the relevance >> of each hit. Unfortunately, the relevance of results is just not as good as >> vector-space engines. So, probabilistic engines are mostly dead. >> But, “you don’t want to do it” is very good advice. Instead of trying to >> reduce bad hits, work on increasing good hits. It is really hard, sometimes >> not possible, to optimize both. Increasing the good hits makes your >> customers happy. Reducing the bad hits makes your UX team happy. >> Here is a process. Start collecting the clicks on the search results page >> (SRP) with each query. Look at queries that have below average clickthrough. >> See if those can be combined into categories, then address each category. >> Some categories that I have used: >> * One word or two? “babysitter”, “baby-sitter”, and “baby sitter” are all >> valid. Use synonyms or shingles (and maybe the word delimiter filter) to >> match these. >> * Misspellings. These should be about 10% of queries. Use fuzzy matching. I >> recommend the patch in SOLR-629. >> * Alternate vocabulary. You sell a “laptop”, but people call it a >> “notebook”. People search for “kids movies”, but your movie genre is >> “Children and Family”. Use synonyms. >> * Missing content. People can’t find anything about beach parking because >> there isn’t a page about that. Instead, there are scraps of info about beach >> parking in multiple other pages. Fix the content. >> wunder >> Walter Underwood >> wun...@wunderwood.org >> http://observer.wunderwood.org/ (my blog) >>> On Apr 12, 2017, at 11:44 AM, David Kramer wrote: >>> >>> The idea is to not return poorly matching results, not to limit the number >>> of results returned. 
One query may have hundreds of excellent matches and >>> another query may have 7. So cutting off by the number of results is >>> trivial but not useful. >>> >>> Again, we are not doing this for performance reasons. We’re doing this >>> because we don’t want to show products that are not very relevant to the >>> search terms specified by the user for UX reasons. >>> >>> I had hoped that the responses would have been more focused on “it’ can’t >>> be done” or “here’s how to do it” than “you don’t want to do it”. I’m >>> still left not knowing if it’s even possible. The one concrete answer of >>> using frange doesn’t help as referencing score in either the q or the fq >>> produces an “undefined field” error. >>> >>> Thanks. >>> >>> On 4/11/17, 8:59 AM, "Dorian Hoxha" wrote: >>> >>>Can't the filter be used in cases when you're paginating in >>>sharded-scenario ? >>>So if you do limit=10, offset=10, each shard will return 20 docs ? >>>While if you do limit=10, _score<=last_page.min_score, then each shard >>> will >>>return 10 docs ? (they will still score all docs, but merging will be >>>faster) >>> >>>Makes sense ? >>> >>>On Tue, Apr 11, 2017 at 12:49 PM, alessandro.benedetti >>> >>> wrote: >>> Can i ask what is the final requirement here ? What are you trying to do ? - just display less results ? you can easily do at search client time, cutting after a certain amount >>>
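On the frange error mentioned above: score is indeed not addressable as a field, but the usual workaround is to wrap the main query in the query() function so its score becomes a function value. A hedged sketch:

q=foo bar&fq={!frange l=0.5}query($q)

The caveats in this thread still apply: raw scores are not normalized, so a fixed threshold like 0.5 will behave differently from one query to the next.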
Re: keyword-in-content for PDF document
Thanks Alex. Yes, I am using Tika, so to some extent it preserves the text flow.

There is something interesting in your reply: "Or you could try using highlighter to return only the sentence."

I didn't understand that bit. How do we use the highlighter to return the sentence?

To be clear, I want to return all sentences where the word "growth" appears.
RE: keyword-in-content for PDF document
If you don't care about sentence boundaries, but just want a window around target terms and you want concordance functionality (sort before, after, etc), you might check out LUCENE-5317, which is available as a standalone jar on my github site [1] and is available through maven central. Using a highlighter, too, will get you close. See a crummy image of LUCENE-5317 [2] or the full presentation [3] [1] https://github.com/tballison/lucene-addons/tree/6.5-0.1 [2] https://twitter.com/_tallison/status/852492398793981952 [3] https://github.com/tballison/share/blob/master/slides/TextProcessingAndAdvancedSearch_tallison_MITRE_201510_final_abbrev.pdf slide 23ff. -Original Message- From: ankur [mailto:ankur.sancheti.netw...@gmail.com] Sent: Thursday, April 13, 2017 12:08 PM To: solr-user@lucene.apache.org Subject: Re: keyword-in-content for PDF document Thanks Alex. Yes, I am using TIKA. So, to some extent it preserves the text flow. There is something interesting in your reply, "Or you could try using highlighter to return only the sentence. ". I didnt understand that bit. How do we use Highlighter to return the sentence? To make sure, I want to return all sentences where the word "Growth" appears. -- View this message in context: http://lucene.472066.n3.nabble.com/keyword-in-context-for-PDF-document-tp4329754p4329794.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: keyword-in-content for PDF document
The boundary scanner supports sentence as per: https://cwiki.apache.org/confluence/display/solr/Highlighting So, the word in context should - if I remember correctly - give you the sentence that word is in even if the field has longer text. Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced On 13 April 2017 at 19:07, ankur wrote: > Thanks Alex. Yes, I am using TIKA. So, to some extent it preserves the text > flow. > > There is something interesting in your reply, "Or you could try using > highlighter to return only > the sentence. ". > > I didnt understand that bit. How do we use Highlighter to return the > sentence? > > To make sure, I want to return all sentences where the word "Growth" > appears. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/keyword-in-context-for-PDF-document-tp4329754p4329794.html > Sent from the Solr - User mailing list archive at Nabble.com.
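An illustrative parameter set for that (Solr 6.x syntax; field name is a placeholder): the sentence boundary scanner is used by the FastVector highlighter, which needs termVectors, termPositions, and termOffsets enabled on the field:

hl=true&hl.fl=content&hl.q=growth&hl.useFastVectorHighlighter=true&hl.boundaryScanner=breakIterator&hl.bs.type=SENTENCE&hl.snippets=100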
keywords not found - google like feature
Hello All,

When we search Google, it sometimes returns results with a mention of keywords that were not found (shown as strike-through).

Does Solr provide such a feature?

Thanks,
Nilesh Kamani
Re: Long GC pauses while reading Solr docs using Cursor approach
Hi Shawn, Thanks for the insights into the memory requirements. Looks like cursor approach is going to require a lot of memory for millions of documents. If I run a query that returns only 500K documents still keeping 100K docs per page, I don't see long GC pauses. So it is not really the number of rows per page but the overall number of docs. May be I can reduce the document cache and the field cache. What do you think? Erick, I was using the streaming approach to get back results from Solr but I was running into some run time exceptions. That bug has been fixed in solr 6.0. But because of some reasons, I won't be able to move to Java 8 and hence I will have to stick to solr 5.5.0. That is the reason I had to switch to the cursor approach. Thanks! On Wed, Apr 12, 2017 at 8:37 PM, Erick Erickson wrote: > You're missing the point of my comment. Since they already are > docValues, you can use the /export functionality to get the results > back as a _stream_ and avoid all of the overhead of the aggregator > node doing a merge sort and all of that. > > You'll have to do this from SolrJ, but see CloudSolrStream. You can > see examples of its usage in StreamingTest.java. > > this should > 1> complete much, much faster. The design goal is 400K rows/second but YMMV > 2> use vastly less memory on your Solr instances. > 3> only require _one_ query > > Best, > Erick > > On Wed, Apr 12, 2017 at 7:36 PM, Shawn Heisey wrote: > > On 4/12/2017 5:19 PM, Chetas Joshi wrote: > >> I am getting back 100K results per page. > >> The fields have docValues enabled and I am getting sorted results based > on "id" and 2 more fields (String: 32 Bytes and Long: 8 Bytes). > >> > >> I have a solr Cloud of 80 nodes. There will be one shard that will get > top 100K docs from each shard and apply merge sort. So, the max memory > usage of any shard could be 40 bytes * 100K * 80 = 320 MB. Why would heap > memory usage shoot up from 8 GB to 17 GB? > > > > From what I understand, Java overhead for a String object is 56 bytes > > above the actual byte size of the string itself. And each character in > > the string will be two bytes -- Java uses UTF-16 for character > > representation internally. If I'm right about these numbers, it means > > that each of those id values will take 120 bytes -- and that doesn't > > include the size the actual response (xml, json, etc). > > > > I don't know what the overhead for a long is, but you can be sure that > > it's going to take more than eight bytes total memory usage for each one. > > > > Then there is overhead for all the Lucene memory structures required to > > execute the query and gather results, plus Solr memory structures to > > keep track of everything. I have absolutely no idea how much memory > > Lucene and Solr use to accomplish a query, but it's not going to be > > small when you have 200 million documents per shard. > > > > Speaking of Solr memory requirements, under normal query circumstances > > the aggregating node is going to receive at least 100K results from > > *every* shard in the collection, which it will condense down to the > > final result with 100K entries. The behavior during a cursor-based > > request may be more memory-efficient than what I have described, but I > > am unsure whether that is the case. > > > > If the cursor behavior is not more efficient, then each entry in those > > results will contain the uniqueKey value and the score. That's going to > > be many megabytes for every shard. 
If there are 80 shards, it would > > probably be over a gigabyte for one request. > > > > Thanks, > > Shawn > > >
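A rough SolrJ sketch of the /export streaming pattern described above (zkHost, collection, and fields are placeholders; constructor signatures vary somewhat between 5.x and 6.x):

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
import org.apache.solr.common.params.ModifiableSolrParams;

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "*:*");
params.set("fl", "id,f1,f2");
params.set("sort", "id asc");
params.set("qt", "/export");   // stream from the export handler

CloudSolrStream stream = new CloudSolrStream("localhost:9983", "collection1", params);
try {
  stream.open();
  Tuple tuple = stream.read();
  while (!tuple.EOF) {         // the EOF tuple marks the end of the stream
    String id = tuple.getString("id");
    // ... process the tuple ...
    tuple = stream.read();
  }
} finally {
  stream.close();
}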
Need help with auto-suggester
Hello, I've followed the steps here to set up auto-suggest: https://lucidworks.com/2015/03/04/solr-suggester/ So basically I configured the auto-suggester in solrconfig.xml, where I told it which field in my index needs to be used for auto-suggestion. The problem is: When the user searches in the text box in the front end, if they are searching for cities, I also need the countries to appear in the drop-down list which the user sees. The field which is being searched is only 'city' here. However, I need to retrieve the corresponding value in the 'country' field as well. How could I do this using the suggester? Thanks
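One hedged way to carry the country along, assuming a DocumentDictionaryFactory-based suggester (field names are placeholders): the dictionary supports a payloadField, and the payload is returned with each suggestion, so the client can show the city plus its country:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">citySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">city</str>
    <str name="payloadField">country</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>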
Re: Autosuggestion
Hello So, from what I've picked up so far: FST only matches from the beginning of the input, but can handle spelling errors and do stemming. AnalyzingInfix can't handle spelling errors or stemming but can match from the middle of the string. (Is there anyway to achieve both of the functionalities above, if need be?) Performance-wise, FST's are faster and more compact? Thanks On Thu, Apr 13, 2017 at 7:57 PM, Erick Erickson wrote: > bq: FST-based vs AnalyzingInfix > > They are two totally different things. FST-based suggesters are very > fast and compact. But they only match from the beginning of the input. > > AnalyzingInfix creates a "sidecar" index that's searched like a normal > index and the _field_ is returned. Thus analyzinginfix can suggest > "my dog has fleas" when entering "fleas", but the FST-based suggesters > cannot. > > Best, > Erick > > On Thu, Apr 13, 2017 at 6:24 AM, OTH wrote: > > Thanks, that's very helpful! > > The third link especially is quite helpful. > > Is there any recommendation regarding using FST-based vs AnalyzingInfix > > suggesters? > > Thanks > > > > On Wed, Apr 12, 2017 at 6:23 PM, Andrea Gazzarini > wrote: > > > >> Hi, > >> I think you got an old post. I would have a look at the built-in > feature, > >> first. These posts can help you to get a quick overview: > >> > >> https://cwiki.apache.org/confluence/display/solr/Suggester > >> http://alexbenedetti.blogspot.it/2015/07/solr-you-complete-me.html > >> https://lucidworks.com/2015/03/04/solr-suggester/ > >> > >> HTH, > >> Andrea > >> > >> > >> On 12/04/17 14:43, OTH wrote: > >> > >>> Hello, > >>> > >>> Is there any recommended way to achieve auto-suggestion in textboxes > using > >>> Solr? > >>> > >>> I'm new to Solr, but right now I have achieved this functionality by > using > >>> an example I found online, doing this: > >>> > >>> I added a copy field, which is of the following type: > >>> > >>> >>> positionIncrementGap="100"> > >>> > >>> >>> maxGramSize="10"/> > >>> > >>> > >>> > >>> minGramSize="2" > >>> maxGramSize="10"/> > >>> > >>> > >>> > >>> > >>> In the search box, after each character is typed, the above field is > >>> queried, and the results are shown in a drop-down list. > >>> > >>> However, this is performing quite slow. I'm not sure if that has to do > >>> with the front-end code, or because I'm not using the recommended > approach > >>> in terms of how I'm using Solr. Is there any other recommended way to > use > >>> Solr to achieve this functionality? > >>> > >>> Thanks > >>> > >>> > >> >
Re: keywords not found - google like feature
Something like this. Does Solr have such a feature?

[image: Inline image 1]

On Thu, Apr 13, 2017 at 1:49 PM, Nilesh Kamani wrote:
> Hello All,
>
> When we search google, sometimes google returns results with mention of
> keywords not found (mentioned as strike-through)
>
> Does Solr provide such feature ?
>
>
> Thanks,
> Nilesh Kamani
Re: keywords not found - google like feature
Pasted images are generally stripped out, you'll have to provide an external link. On Thu, Apr 13, 2017 at 12:04 PM, Nilesh Kamani wrote: > Something like this. Does SOLR have such feature ? > > [image: Inline image 1] > > On Thu, Apr 13, 2017 at 1:49 PM, Nilesh Kamani > wrote: > >> Hello All, >> >> When we search google, sometimes google returns results with mention of >> keywords not found (mentioned as strike-through) >> >> Does Solr provide such feature ? >> >> >> Thanks, >> Nilesh Kamani >> > >
Re: keywords not found - google like feature
Are you asking about a visual representation or an actual feature? Because if all your keywords/clauses are optional (the default SHOULD), then Solr automatically tries to match the maximum number of them, and then fewer and fewer. So if not all words match, it will return results that match a smaller number of words.

The words not matched are effectively your strike-through negative space. You can probably recover that from the debug info, though it will not be pretty and will perhaps be a bit slower.

The real issue here is ranking. Does Google do something special with ranking when they do strike-through? Do they do some grouping and ranking within groups, not just a global one?

The biggest question is - of course - what is your business objective, as opposed to a look-alike objective. Explaining your needs through similarity with another product's secret implementation is a long way to get there: too much precision is lost in each explanation round.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 13 April 2017 at 20:49, Nilesh Kamani wrote:
> Hello All,
>
> When we search google, sometimes google returns results with mention of
> keywords not found (mentioned as strike-through)
>
> Does Solr provide such feature ?
>
>
> Thanks,
> Nilesh Kamani
Re: keywords not found - google like feature
Here is the example:
https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=solr+spring+trump

You will see this under the search results: Missing: trump

I am not asking for a visual representation of such a feature. Is there any way Solr can return such info in the response? My client has a specific requirement: when he searches, he wants to know which keywords were not found in the results.

On Thu, Apr 13, 2017 at 3:34 PM, Alexandre Rafalovitch wrote:
> Are you asking visual representation or an actual feature. Because if
> all your keywords/clauses are optional (default SHOULD) then Solr
> automatically tries to match maximum number of them and then less and
> less. So, if all words do not match, it will return results that match
> less number of words.
>
> And words not-matched is effectively your strike-through negative
> space. You can probably recover that from debug info, though it will
> be not pretty and perhaps a bit slower.
>
> The real issue here is ranking. Does Google do something special with
> ranking when they do strike through. Do they do some grouping and
> ranking within groups, not just a global one?
>
> The biggest question is - of course - what is your business - as
> opposed to look-alike - objective. Because explaining your needs
> through a similarity with other product's secret implementation is a
> long way to get there. Too much precision loss in each explanation
> round.
>
> Regards,
>    Alex.
>
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 13 April 2017 at 20:49, Nilesh Kamani wrote:
> > Hello All,
> >
> > When we search google, sometimes google returns results with mention of
> > keywords not found (mentioned as strike-through)
> >
> > Does Solr provide such feature ?
> >
> >
> > Thanks,
> > Nilesh Kamani
RE: keywords not found - google like feature
Hi - There is no such feature out-of-the-box in Solr. But you probably could modify a highlighter implementation to return this information; the highlighter is the component that comes closest to that feature. Regards, Markus -Original message- > From:Nilesh Kamani > Sent: Thursday 13th April 2017 21:52 > To: solr-user@lucene.apache.org > Subject: Re: keywords not found - google like feature > > Here is the example. > https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=solr+spring+trump > > You will see this under search results. Missing: trump > > I am not asking for visual representation of such feature. > Is there anyway solr is returning such info in response ? > My client has this specific requirements that when he searches he wants to > know what keywords were not found in results. > > > > > On Thu, Apr 13, 2017 at 3:34 PM, Alexandre Rafalovitch > wrote: > > > Are you asking visual representation or an actual feature. Because if > > all your keywords/clauses are optional (default SHOULD) then Solr > > automatically tries to match maximum number of them and then less and > > less. So, if all words do not match, it will return results that match > > less number of words. > > > > And words not-matched is effectively your strike-through negative > > space. You can probably recover that from debug info, though it will > > be not pretty and perhaps a bit slower. > > > > The real issue here is ranking. Does Google do something special with > > ranking when they do strike through. Do they do some grouping and > > ranking within groups, not just a global one? > > > > The biggest question is - of course - what is your business - as > > opposed to look-alike - objective. Because explaining your needs > > through a similarity with other product's secret implementation is a > > long way to get there. Too much precision loss in each explanation > > round. > > > > Regards, > >Alex. > > > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > > > > On 13 April 2017 at 20:49, Nilesh Kamani wrote: > > > Hello All, > > > > > > When we search google, sometimes google returns results with mention of > > > keywords not found (mentioned as strike-through) > > > > > > Does Solr provide such feature ? > > > > > > > > > Thanks, > > > Nilesh Kamani > > >
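[Editorial note: a rough way to approximate this with the stock highlighter before writing any custom code; a sketch, assuming a collection named collection1 and a searched field named text, with URL encoding elided for readability:

  curl "http://localhost:8983/solr/collection1/select?q=text:(solr spring trump)&hl=true&hl.fl=text&wt=json"

Any query term that never shows up wrapped in <em> tags in a document's highlighting section was not matched in that document, which is the same signal a modified highlighter implementation would surface directly.]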
RE: maxDoc ten times greater than numDoc
Thanks, but I am not going to be brave this time :) I have tried reclaimDeletesWeight on an ordinary index some time ago and it was very aggressive with slightly higher values than the default. I think setting this weight in this situation would be analogous to a forceMerge every time, which makes sense. Thanks, Markus -Original message- > From:Erick Erickson > Sent: Thursday 13th April 2017 17:07 > To: solr-user > Subject: Re: maxDoc ten times greater than numDoc > > If you want to be brave > > Through a clever bit of reflection, the parameters that > TieredMergePolicy uses to decide what segments to reclaim are settable > in solrconfig.xml (undocumented, so use at your own risk). You could > try bumping > > reclaimDeletesWeight > > in your TieredMergePolicy configuration if you wanted to experiment. > > There's no good reason not to set your segments per tier, it won't hurt. > > But as you say you have a solution so this is just for curiosity's sake. > > Best, > Erick > > On Thu, Apr 13, 2017 at 4:42 AM, Alexandre Rafalovitch > wrote: > > Maybe not every entry got deleted and it was holding up the segment. > > E.g. a child or parent record abandoned. If, for example, the parent > > record has a date field and the child does not, then deleting with a > > date-based query may trigger this. I think there was a bug about > > abandoned child or something. > > > > This is pure speculation of course. > > > > Regards, > >Alex. > > > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > > > > On 13 April 2017 at 12:54, Markus Jelsma wrote: > >> I have forced a merge yesterday and went back to one segment. > >> > >> One indexer program reindexes (most or all) every 20 minutes orso. There > >> is nothing custom at that particular point. There is no autoCommit, the > >> indexer program is responsible for a hard commit, it is the single source > >> of reindexed data. > >> > >> After one cycle we had two segments, 50 % deleted, as expected. This was > >> stable for many hours and many cycles. For some reason, i now have 2/3 > >> deletes and three segments, now this situation is stable. So the merges do > >> happen, but sometimes they don't. When they don't, the size increases (now > >> three segments, 55 MB). But it appears that number of segments never > >> decreases, and that is what bothers me. > >> > >> I was about to set segmentsPerTier to two but then i realized i can also > >> delete everything prior to indexing as opposed to deleting only items > >> older than the set i am already about to reindex. This strategy works fine > >> with other reindexing programs, they don't suffer this problem. > >> > >> So, it is not solved, but not a problem anymore. Thanks all anyway :) > >> Markus > >> > >> -Original message- > >>> From:Erick Erickson > >>> Sent: Wednesday 12th April 2017 17:51 > >>> To: solr-user > >>> Subject: Re: maxDoc ten times greater than numDoc > >>> > >>> Yes, this is very strange. My bet: you have something > >>> custom, a setting, indexing code, whatever that > >>> is getting in the way. > >>> > >>> Second possibility (really stretching here): your > >>> merge settings are set to 10 segments having to exist > >>> before merging and somehow not all the docs in the > >>> segments are replaced. So until you get to the 10th > >>> re-index (and assuming a single segment is > >>> produced per re-index) the older segments aren't > >>> merged. If that were the case I'd expect to see the > >>> number of deleted docs drop back periodically > >>> then build up again. A real shot in the dark. One way > >>> to test this would be to specify "segmentsPerTier" of, say, > >>> 2 rather than the default 10, see: > >>> https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig > >>> If this were the case I'd expect with a setting of 2 that > >>> your index might have 50% deleted docs, that would at > >>> least tell us whether we're on the right track. > >>> > >>> Take a look at your index on disk. If you're seeing gaps > >>> in the numbering, you are getting merging, it may be > >>> that they're not happening very often. > >>> > >>> And I take it you have no custom code here and you are > >>> doing commits? (hard commits are all that matters > >>> for merging, it doesn't matter whether openSearcher > >>> is set to true or false). > >>> > >>> I just tried the "techproducts" example as follows: > >>> 1> indexed all the sample files with the bin/solr -e techproducts example > >>> 2> started re-indexing the sample docs one at a time with post.jar > >>> > >>> It took a while, but eventually the original segments got merged away so > >>> I doubt it's any weirdness with a small index. > >>> > >>> Speaking of small index, why are you sharding with only > >>> 8K docs? Sharding will probably slow things down for such > >>> a small index. This isn't germane to your question though. > >>> > >>> Best, > >>> Erick > >>> > >>> > >>>
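[Editorial note: for anyone who does want to experiment, a minimal sketch of the two settings discussed above, assuming the Solr 6.x <mergePolicyFactory> syntax in solrconfig.xml; the values are illustrative, and reclaimDeletesWeight is undocumented, so use at your own risk:

  <indexConfig>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <!-- merge sooner: allow 2 segments per tier instead of the default 10 -->
      <int name="segmentsPerTier">2</int>
      <!-- weight segments with many deleted docs more heavily when picking
           merges; the Lucene default is 2.0, higher is more aggressive -->
      <double name="reclaimDeletesWeight">3.0</double>
    </mergePolicyFactory>
  </indexConfig>]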
Re: keywords not found - google like feature
Another ugly solution would be to use the debugQuery=true option, then analyze the results in the explain output; if the word isn't in the explain, then you strike it out. On Thu, Apr 13, 2017 at 4:01 PM, Markus Jelsma wrote: > Hi - There is no such feature out-of-the-box in Solr. But you probably > could modify a highlighter implementation to return this information, the > highlighter is the component that comes closest to that feature. > > Regards, > Markus > > > > -Original message- > > From:Nilesh Kamani > > Sent: Thursday 13th April 2017 21:52 > > To: solr-user@lucene.apache.org > > Subject: Re: keywords not found - google like feature > > > > Here is the example. > > https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&; > espv=2&ie=UTF-8#safe=off&q=solr+spring+trump > > > > You will see this under search results. Missing: trump > > > > I am not asking for visual representation of such feature. > > Is there anyway solr is returning such info in response ? > > My client has this specific requirements that when he searches he wants > to > > know what keywords were not found in results. > > > > > > > > > > On Thu, Apr 13, 2017 at 3:34 PM, Alexandre Rafalovitch < > arafa...@gmail.com> > > wrote: > > > > > Are you asking visual representation or an actual feature. Because if > > > all your keywords/clauses are optional (default SHOULD) then Solr > > > automatically tries to match maximum number of them and then less and > > > less. So, if all words do not match, it will return results that match > > > less number of words. > > > > > > And words not-matched is effectively your strike-through negative > > > space. You can probably recover that from debug info, though it will > > > be not pretty and perhaps a bit slower. > > > > > > The real issue here is ranking. Does Google do something special with > > > ranking when they do strike through. Do they do some grouping and > > > ranking within groups, not just a global one? > > > > > > The biggest question is - of course - what is your business - as > > > opposed to look-alike - objective. Because explaining your needs > > > through a similarity with other product's secret implementation is a > > > long way to get there. Too much precision loss in each explanation > > > round. > > > > > > Regards, > > >Alex. > > > > > > http://www.solr-start.com/ - Resources for Solr users, new and > experienced > > > > > > > > > On 13 April 2017 at 20:49, Nilesh Kamani > wrote: > > > > Hello All, > > > > > > > > When we search google, sometimes google returns results with mention > of > > > > keywords not found (mentioned as strike-through) > > > > > > > > Does Solr provide such feature ? > > > > > > > > > > > > Thanks, > > > > Nilesh Kamani > > > > > >
Re: keywords not found - google like feature
Regardless of the business case (which would be good to know) you might want to try something along the lines of http://stackoverflow.com/questions/25038080/how-can-i-tell-solr-to-return-the-hit-search-terms-per-document - basically generate pseudo-fields using the exists() function query which will return a boolean if the term is in a specific field. I've used this for simple cases where it worked well, though I wouldn't like to speculate on how well this scales if you have an edismax query where you might need to generate multiple term/field combinations. HTH -Simon On Thu, Apr 13, 2017 at 3:34 PM, Alexandre Rafalovitch wrote: > Are you asking visual representation or an actual feature. Because if > all your keywords/clauses are optional (default SHOULD) then Solr > automatically tries to match maximum number of them and then less and > less. So, if all words do not match, it will return results that match > less number of words. > > And words not-matched is effectively your strike-through negative > space. You can probably recover that from debug info, though it will > be not pretty and perhaps a bit slower. > > The real issue here is ranking. Does Google do something special with > ranking when they do strike through. Do they do some grouping and > ranking within groups, not just a global one? > > The biggest question is - of course - what is your business - as > opposed to look-alike - objective. Because explaining your needs > through a similarity with other product's secret implementation is a > long way to get there. Too much precision loss in each explanation > round. > > Regards, >Alex. > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 13 April 2017 at 20:49, Nilesh Kamani wrote: > > Hello All, > > > > When we search google, sometimes google returns results with mention of > > keywords not found (mentioned as strike-through) > > > > Does Solr provide such feature ? > > > > > > Thanks, > > Nilesh Kamani >
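[Editorial note: a minimal sketch of that approach, assuming a collection named collection1 and a single searched field named text; the pseudo-field names are illustrative, the fl value is one parameter wrapped here for readability, and URL encoding is elided:

  curl "http://localhost:8983/solr/collection1/select?q=text:(solr spring trump)
    &fl=id,score,
        found_solr:exists(query({!v='text:solr'})),
        found_spring:exists(query({!v='text:spring'})),
        found_trump:exists(query({!v='text:trump'}))"

Each pseudo-field comes back as true or false per document, so a front end can strike through any term whose flag is false for a given result.]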
Re: keywords not found - google like feature
Thanks for your input guys. I will look into it. On Thu, Apr 13, 2017 at 4:07 PM, simon wrote: > Regardless of the business case (which would be good to know) you might > want to try something along the lines of > http://stackoverflow.com/questions/25038080/how-can-i- > tell-solr-to-return-the-hit-search-terms-per-document > - basically generate pseudo-fields using the exists() function query which > will return a boolean if the term is in a specific field. > I've used this for simple cases where it worked well, though I wouldn't > like to speculate on how well this scales if you have an edismax query > where you might need to generate multiple term/field combinations. > > HTH > > -Simon > > On Thu, Apr 13, 2017 at 3:34 PM, Alexandre Rafalovitch > > wrote: > > > Are you asking visual representation or an actual feature. Because if > > all your keywords/clauses are optional (default SHOULD) then Solr > > automatically tries to match maximum number of them and then less and > > less. So, if all words do not match, it will return results that match > > less number of words. > > > > And words not-matched is effectively your strike-through negative > > space. You can probably recover that from debug info, though it will > > be not pretty and perhaps a bit slower. > > > > The real issue here is ranking. Does Google do something special with > > ranking when they do strike through. Do they do some grouping and > > ranking within groups, not just a global one? > > > > The biggest question is - of course - what is your business - as > > opposed to look-alike - objective. Because explaining your needs > > through a similarity with other product's secret implementation is a > > long way to get there. Too much precision loss in each explanation > > round. > > > > Regards, > >Alex. > > > > http://www.solr-start.com/ - Resources for Solr users, new and > experienced > > > > > > On 13 April 2017 at 20:49, Nilesh Kamani > wrote: > > > Hello All, > > > > > > When we search google, sometimes google returns results with mention of > > > keywords not found (mentioned as strike-through) > > > > > > Does Solr provide such feature ? > > > > > > > > > Thanks, > > > Nilesh Kamani > > >
RE: keywords not found - google like feature
Hi - That is not going to be that easy out-of-the-box. In regular setups the output you find in debugging mode contains stemmed versions of the original input text. At best you can use KeepWordFilterFactory to get unstemmed terms, but those tokens would, in usual cases, also have passed through filters such as LowerCase, AsciiFolding or some language-specific normalizer, causing them not to match most original input tokens. Regards, Markus -Original message- > From:David Hastings > Sent: Thursday 13th April 2017 22:05 > To: solr-user@lucene.apache.org > Subject: Re: keywords not found - google like feature > > Another ugly solution would be to use the debugQuery=true option, then > analyze the reults in explain, if the word isnt in the explain, then you > strike it out. > > On Thu, Apr 13, 2017 at 4:01 PM, Markus Jelsma > wrote: > > > Hi - There is no such feature out-of-the-box in Solr. But you probably > > could modify a highlighter implementation to return this information, the > > highlighter is the component that comes closest to that feature. > > > > Regards, > > Markus > > > > > > > > -Original message- > > > From:Nilesh Kamani > > > Sent: Thursday 13th April 2017 21:52 > > > To: solr-user@lucene.apache.org > > > Subject: Re: keywords not found - google like feature > > > > > > Here is the example. > > > https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&; > > espv=2&ie=UTF-8#safe=off&q=solr+spring+trump > > > > > > You will see this under search results. Missing: trump > > > > > > I am not asking for visual representation of such feature. > > > Is there anyway solr is returning such info in response ? > > > My client has this specific requirements that when he searches he wants > > to > > > know what keywords were not found in results. > > > > > > > > > > > > > > > On Thu, Apr 13, 2017 at 3:34 PM, Alexandre Rafalovitch < > > arafa...@gmail.com> > > > wrote: > > > > > > > Are you asking visual representation or an actual feature. Because if > > > > all your keywords/clauses are optional (default SHOULD) then Solr > > > > automatically tries to match maximum number of them and then less and > > > > less. So, if all words do not match, it will return results that match > > > > less number of words. > > > > > > > > And words not-matched is effectively your strike-through negative > > > > space. You can probably recover that from debug info, though it will > > > > be not pretty and perhaps a bit slower. > > > > > > > > The real issue here is ranking. Does Google do something special with > > > > ranking when they do strike through. Do they do some grouping and > > > > ranking within groups, not just a global one? > > > > > > > > The biggest question is - of course - what is your business - as > > > > opposed to look-alike - objective. Because explaining your needs > > > > through a similarity with other product's secret implementation is a > > > > long way to get there. Too much precision loss in each explanation > > > > round. > > > > > > > > Regards, > > > >Alex. > > > > > > > > http://www.solr-start.com/ - Resources for Solr users, new and > > experienced > > > > > > > > > > > > On 13 April 2017 at 20:49, Nilesh Kamani > > wrote: > > > > > Hello All, > > > > > > > > > > When we search google, sometimes google returns results with mention > > of > > > > > keywords not found (mentioned as strike-through) > > > > > > > > > > Does Solr provide such feature ? > > > > > > > > > > > > > > > Thanks, > > > > > Nilesh Kamani > > > > > > > > > >
Re: keywords not found - google like feature
bq: he searches he wants to know what keywords were not found in results. We need to distinguish between words not found in the returned documents and words not found at all. The solutions above tell you about the documents returned. If the keyword was found in a document not returned (say the 11th doc when you have rows set to 10) you'd have no way to know that the keyword was actually in _some_ document, just not one of the top N returned. So if your question is really "I want to know what terms were not found in any document", they won't help. Another rather ugly solution would be to facet on the keywords. You'd add some facet clauses like: facet.query=keywordfield:keyword1& facet.query=keywordfield:keyword2& facet.query=keywordfield:keyword3& facet.query=keywordfield:keyword4 The counts in those returned facets would represent the total number of documents having that keyword, regardless of whether they were in the top N returned. For a bazillion docs this is probably unworkable, I admit. Do _not_ facet on the keyword field as in &facet.field=keyword unless you are certain it has a pretty low cardinality, as in maybe 100 or so. Beyond that, test. Faceting on a field with a million unique values corpus-wide is just asking for trouble. Best, Erick On Thu, Apr 13, 2017 at 1:12 PM, Markus Jelsma wrote: > Hi - That is not going to be that easy out-of-the-box. In regular setups the > output you find in debugging mode contains stemmed versions of the original > input text. > > At best you use KeepWordsFilterFactory to get unstemmed terms, but those > tokens would, in usual cases, also have passed through filters such as > LowerCase, AsciiFolding or some language specific normalizer. Causing them > not to match most original input tokens. > > Regards, > Markus > > > > -Original message- >> From:David Hastings >> Sent: Thursday 13th April 2017 22:05 >> To: solr-user@lucene.apache.org >> Subject: Re: keywords not found - google like feature >> >> Another ugly solution would be to use the debugQuery=true option, then >> analyze the reults in explain, if the word isnt in the explain, then you >> strike it out. >> >> On Thu, Apr 13, 2017 at 4:01 PM, Markus Jelsma >> wrote: >> >> > Hi - There is no such feature out-of-the-box in Solr. But you probably >> > could modify a highlighter implementation to return this information, the >> > highlighter is the component that comes closest to that feature. >> > >> > Regards, >> > Markus >> > >> > >> > >> > -Original message- >> > > From:Nilesh Kamani >> > > Sent: Thursday 13th April 2017 21:52 >> > > To: solr-user@lucene.apache.org >> > > Subject: Re: keywords not found - google like feature >> > > >> > > Here is the example. >> > > https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&; >> > espv=2&ie=UTF-8#safe=off&q=solr+spring+trump >> > > >> > > You will see this under search results. Missing: trump >> > > >> > > I am not asking for visual representation of such feature. >> > > Is there anyway solr is returning such info in response ? >> > > My client has this specific requirements that when he searches he wants >> > to >> > > know what keywords were not found in results. >> > > >> > > >> > > >> > > >> > > On Thu, Apr 13, 2017 at 3:34 PM, Alexandre Rafalovitch < >> > arafa...@gmail.com> >> > > wrote: >> > > >> > > > Are you asking visual representation or an actual feature. Because if >> > > > all your keywords/clauses are optional (default SHOULD) then Solr >> > > > automatically tries to match maximum number of them and then less and >> > > > less. So, if all words do not match, it will return results that match >> > > > less number of words. >> > > > >> > > > And words not-matched is effectively your strike-through negative >> > > > space. You can probably recover that from debug info, though it will >> > > > be not pretty and perhaps a bit slower. >> > > > >> > > > The real issue here is ranking. Does Google do something special with >> > > > ranking when they do strike through. Do they do some grouping and >> > > > ranking within groups, not just a global one? >> > > > >> > > > The biggest question is - of course - what is your business - as >> > > > opposed to look-alike - objective. Because explaining your needs >> > > > through a similarity with other product's secret implementation is a >> > > > long way to get there. Too much precision loss in each explanation >> > > > round. >> > > > >> > > > Regards, >> > > >Alex. >> > > > >> > > > http://www.solr-start.com/ - Resources for Solr users, new and >> > experienced >> > > > >> > > > >> > > > On 13 April 2017 at 20:49, Nilesh Kamani >> > wrote: >> > > > > Hello All, >> > > > > >> > > > > When we search google, sometimes google returns results with mention >> > of >> > > > > keywords not found (mentioned as strike-through) >> > > > > >> > > > > Does Solr provide such feature ? >> > > > > >> > > > > >> > > > > Thanks, >> > > > > Nilesh Kamani >> > > > >> > > >> > >>
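[Editorial note: a concrete sketch of that request, reusing the hypothetical keywordfield and keyword names from above, with URL encoding elided for readability:

  curl "http://localhost:8983/solr/collection1/select?q=keywordfield:(keyword1 keyword2 keyword3)
    &rows=10&facet=true
    &facet.query=keywordfield:keyword1
    &facet.query=keywordfield:keyword2
    &facet.query=keywordfield:keyword3"

In the facet_queries section of the response, a count of 0 for keywordfield:keyword2 means no document matching the query contained that keyword at all, not merely none of the 10 rows returned.]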
Re: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Why are you adding these update processors (esp. the AddSchemaFieldsUpdateProcessor) after DistributedUpdateProcessor? Try adding them before DUP; that way it has a better chance of working. On Wed, Apr 12, 2017 at 3:44 PM, Pratik Thaker < pratik.tha...@smartstreamrdu.com> wrote: > Hi All, > > I am facing this issue since very long, can you please provide your > suggestion on it ? > > Regards, > Pratik Thaker > > -Original Message- > From: Pratik Thaker [mailto:pratik.tha...@smartstreamrdu.com] > Sent: 09 February 2017 21:24 > To: 'solr-user@lucene.apache.org' > Subject: RE: DistributedUpdateProcessorFactory was explicitly disabled > from this updateRequestProcessorChain > > Hi Friends, > > Can you please try to give me some details about below issue ? > > Regards, > Pratik Thaker > > From: Pratik Thaker > Sent: 07 February 2017 17:12 > To: 'solr-user@lucene.apache.org' > Subject: DistributedUpdateProcessorFactory was explicitly disabled from > this updateRequestProcessorChain > > Hi All, > > I am using SOLR Cloud 6.0 > > I am receiving below exception very frequently in solr logs, > > o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: > RunUpdateProcessor has received an AddUpdateCommand containing a document > that appears to still contain Atomic document update operations, most > likely because DistributedUpdateProcessorFactory was explicitly disabled > from this updateRequestProcessorChain > at org.apache.solr.update.processor.RunUpdateProcessor.processAdd( > RunUpdateProcessorFactory.java:63) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessor > Factory$AddSchemaFieldsUpdateProcessor.processAdd( > AddSchemaFieldsUpdateProcessorFactory.java:335) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldNameMutatingUpdateProcess > orFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.DistributedUpdateProcessor. > doLocalAdd(DistributedUpdateProcessor.java:936) > at org.apache.solr.update.processor.DistributedUpdateProcessor. > versionAdd(DistributedUpdateProcessor.java:1091) > at org.apache.solr.update.processor.DistributedUpdateProcessor. > processAdd(DistributedUpdateProcessor.java:714) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.AbstractDefaultValueUpdateProc > essorFactory$DefaultValueUpdateProcessor.processAdd( > AbstractDefaultValueUpdateProcessorFactory.java:93) > at org.apache.solr.handler.loader.JavabinLoader$1.update( > JavabinLoader.java:97) > > Can you please help me with the root cause ? Below is the snapshot of > solrconfig, > > > > > > > > > > [^\w-\.] > _ > > > > > > > -MM-dd'T'HH:mm:ss.SSSZ > -MM-dd'T'HH:mm:ss,SSSZ > -MM-dd'T'HH:mm:ss.SSS > -MM-dd'T'HH:mm:ss,SSS > -MM-dd'T'HH:mm:ssZ > -MM-dd'T'HH:mm:ss > -MM-dd'T'HH:mmZ > -MM-dd'T'HH:mm > -MM-dd HH:mm:ss.SSSZ > -MM-dd HH:mm:ss,SSSZ > -MM-dd HH:mm:ss.SSS > -MM-dd HH:mm:ss,SSS > -MM-dd HH:mm:ssZ > -MM-dd HH:mm:ss > -MM-dd HH:mmZ > -MM-dd HH:mm > -MM-dd > > > > strings >
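[Editorial note: the solrconfig snapshot quoted above lost its XML tags in the mail archive. For reference, a sketch of the ordering suggested in the reply, with the field-mutating processors ahead of the distributed processor; the processor list is abridged and the class names are taken from Solr's stock schemaless configuration:

  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
    <!-- field-mutating processors run first, before distribution -->
    <processor class="solr.ParseDateFieldUpdateProcessorFactory"/>
    <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
      <str name="defaultFieldType">strings</str>
    </processor>
    <!-- the distributed and run processors close the chain, so atomic
         update operations are resolved before RunUpdateProcessor runs -->
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>]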
Re: keywords not found - google like feature
After reading everyone's posts, my thought is that sometimes things are better achieved with smoke and mirrors. I achieved something similar by measuring my scores when there were no keyword hits. I wrote a simple jQuery script to do a CSS strike-through on the returned message if the score was poor, plus I returned zero results. I run different CSS for different messages all the time. Kind of working from the vantage that if your score is crap, so are the results. Generally I can get my searches down to ['response']['numFound']=0 ~ I animate the message sometimes. On 13 April 2017 at 13:49, Nilesh Kamani wrote: > Hello All, > > When we search google, sometimes google returns results with mention of > keywords not found (mentioned as strike-through) > > Does Solr provide such feature ? > > > Thanks, > Nilesh Kamani >
Re: Enable Gzip compression Solr 6.0
Hi Mahmoud Beware of using a proxy. Your web application will get attacked, and you should only forward the parameters that are needed for your app features. But you thought of that already. Cheers -- Rick On April 12, 2017 11:39:57 PM EDT, Mahmoud Almokadem wrote: >Thanks Rick, > >I already running Solr on my infrastructure and behind a web >application. > >The web application is working as a proxy before Solr, so I think I can >compress the content on Solr end. But I have made it on the proxy now. > >Thanks again, >Mahmoud > > >> On Apr 12, 2017, at 4:31 PM, Rick Leir wrote: >> >> Hi Mahmoud >> I assume you are running Solr 'behind' a web application, so Solr is >not directly on the net. >> >> The gzip compression is an Apache thing, and relates to your web >application. >> >> Connections to Solr are within your infrastructure, so you might not >want to gzip them. But maybe your setup is different? >> >> Older versions of Solr used Tomcat which supported gzip. Newer >versions use Zookeeper and Jetty and you prolly will find a way. >> Cheers -- Rick >> >>> On April 12, 2017 8:48:45 AM EDT, Mahmoud Almokadem > wrote: >>> Hello, >>> >>> How can I enable Gzip compression for Solr 6.0 to save bandwidth >>> between >>> the server and clients? >>> >>> Thanks, >>> Mahmoud >> >> -- >> Sorry for being brief. Alternate email is rickleir at yahoo dot com -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
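[Editorial note: if the proxy happens to be nginx, a minimal sketch of compressing Solr responses at the proxy tier; the location path and upstream address are illustrative:

  # compress JSON/XML responses on their way out of the proxy
  gzip            on;
  gzip_types      application/json application/xml text/plain;
  gzip_min_length 1024;   # skip tiny responses not worth compressing

  location /solr/ {
      proxy_pass http://localhost:8983/solr/;
  }]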
Solr 4.10 and Distributed pivot faceting in Non-Solr cloud mode
Hi, Would like to know if Solr 4.10 supports distributed pivot faceting in non-SolrCloud mode. According to the JIRA below, it looks like it was fixed in 4.10, but we use Solr in non-cloud mode. https://issues.apache.org/jira/browse/SOLR-2894 Thank you, Raji
Re: Error when adding user for Solr Basic Authentication
I found from Stack Overflow that we have to escape the double quotes inside the JSON and wrap the whole string in double quotes instead of single quotes. http://stackoverflow.com/questions/43387719/error-when-adding-user-for-solr-basic-authentication/43387895#43387895 curl --user user:password http://localhost:8983/solr/admin/authentication -H "Content-type:application/json" -d "{ \"set-user\": {\"tom\" : \"TomIsCool\" , \"harry\":\"HarrysSecret\"}}" Regards, Edwin On 13 April 2017 at 15:03, Zheng Lin Edwin Yeo wrote: > Hi, > > When I try to add the user for the Solr Basic Authentication using the > following method in curl > > curl --user user:password http://localhost:8983/solr/admin/authentication > -H 'Content-type:application/json' -d '{ > "set-user": {"tom" : "TomIsCool" , >"harry":"HarrysSecret"}}' > > I get the following error: > > { > "responseHeader":{ > "status":400, > "QTime":0}, > "error":{ > "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","org.apache.solr.common.SolrException"], > "msg":"No contentStream", > "code":400}} > curl: (3) [globbing] unmatched brace in column 1 > 枩]?V7`-{炘9叡 t肤 ,? E'qyT咐黣]儎;衷 鈛^W褹?curl: (3) [globbing] unmatched cl > ose brace/bracket in column 13 > > > What does this error means and how should we resolve it? > I'm using SolrCloud on Solr 6.4.2. > > > Regards, > Edwin > >
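[Editorial note: putting the payload in a file sidesteps the shell-quoting problem entirely; a sketch, assuming the JSON is saved as setuser.json:

  curl --user user:password http://localhost:8983/solr/admin/authentication -H "Content-type:application/json" --data-binary @setuser.json

With --data-binary @file, curl reads the request body from the file, so no quotes need escaping on the command line.]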
Re: Need help with auto-suggester
You can create a copy field, copy into it from all the fields you want the suggestions drawn from, and then use that field with the suggester. On Thu 13 Apr, 2017, 23:21 OTH, wrote: > Hello, > > I've followed the steps here to set up auto-suggest: > https://lucidworks.com/2015/03/04/solr-suggester/ > > So basically I configured the auto-suggester in solrconfig.xml, where I > told it which field in my index needs to be used for auto-suggestion. > > The problem is: > When the user searches in the text box in the front end, if they are > searching for cities, I also need the countries to appear in the drop-down > list which the user sees. > The field which is being searched is only 'city' here. However, I need to > retrieve the corresponding value in the 'country' field as well. > > How could I do this using the suggester? > > Thanks > -- Regards, Binoy Dalal
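[Editorial note: a minimal sketch of that setup, assuming fields named city and country and the AnalyzingInfixLookupFactory suggester; the field and suggester names are illustrative. In the schema:

  <field name="suggest_text" type="text_general" indexed="true" stored="true" multiValued="true"/>
  <copyField source="city" dest="suggest_text"/>
  <copyField source="country" dest="suggest_text"/>

And in solrconfig.xml:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">placeSuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">suggest_text</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>

Note that suggest_text must be stored for DocumentDictionaryFactory to read it. If the country needs to come back alongside each city suggestion rather than as a separate suggestion, the dictionary's payloadField option is worth a look.]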