Re: I want "john smi" to find "john smith" in my custom "fullname_s" field
Nick, "string" is a primitive data-type and the entire value of a field is indexed as single token. The regex matching happens against the tokens for text fields and against the full content for string fields. So once a piece of text is tokenized, there is no way to perform a regex query across word boundaries. fullname_s:john smi* is working for me. { "responseHeader":{ "zkConnected":true, "status":0, "QTime":16, "params":{ "q":"fullname_s:john smi*", "indent":"on", "wt":"json"}}, "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[ { "id":"1", "fullname_s":"john smith", "_version_":1569446064473243648}] }} I am on Solr 6.5.0. What version you are on? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Tue, Jun 6, 2017 at 1:30 PM, Nick Way wrote: > Hi - I have a Solr collection with a custom field "fullname_s" (a string). > > I want "john smi" to find "john smith" (I lower-cased the names upon > indexing them) > > I have tried > > fullname_s:"john smi*" > fullname_s:john smi* > fullname_s:"john smi?" > fullname_s:john smi? > > > but nothing gives the expected result - am I missing something? I spent > hours on this one point yesterday so if anyone can please point me in the > right direction I'd be really grateful. > > I'm using Solr with Adobe Coldfusion by the way but I think the principles > are the same. > > Thank you! > > Nick >
Re: I want "john smi" to find "john smith" in my custom "fullname_s" field
Erik, Thank you for correcting. Things I miss out on daily bases: _text_ :) Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Tue, Jun 6, 2017 at 5:12 PM, Nick Way wrote: > Fantastic thank you so much; I now have 'fullname_s:#string. > spacesescaped#* > or email_s:#string.spacesescaped#*' which is working like a dream - thank > you so much - really appreciate your help. > > Thank you also Amrit. > > Nick > > On 6 June 2017 at 10:40, Erik Hatcher wrote: > > > Nick - try escaping the space, so that your query is q=fullname_s:john\ > > smi* > > > > However, whitespace and escaping is problematic. There is a handy prefix > > query parser, so this would work on a string field with spaces: > > > > q={!prefix f=fullname_s}john smi > > > > note no trailing asterisk on that one. Even better, IMO, is to separate > > the query string from the query parser: > > > > q={!prefix f=fullname_s v=$qq}&qq=john smi > > > > Erik > > > > > > > > Amrit - the issue with your example below is that q=fullname_s:john smi* > > parses “john” against fullname_s and “smi” as a prefix query against the > > default field, not likely fullname_s. Check your parsed query to see > > exactly how it parsed.It works for you because… magic! (copyField * > > => _text_) > > > > > > > > > > > On Jun 6, 2017, at 5:14 AM, Amrit Sarkar > wrote: > > > > > > Nick, > > > > > > "string" is a primitive data-type and the entire value of a field is > > > indexed as single token. The regex matching happens against the tokens > > for > > > text fields and against the full content for string fields. So once a > > piece > > > of text is tokenized, there is no way to perform a regex query across > > word > > > boundaries. > > > > > > fullname_s:john smi* is working for me. > > > > > > { > > > "responseHeader":{ > > >"zkConnected":true, > > >"status":0, > > >"QTime":16, > > >"params":{ > > > "q":"fullname_s:john smi*", > > > "indent":"on", > > > "wt":"json"}}, > > > "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[ > > > { > > >"id":"1", > > >"fullname_s":"john smith", > > >"_version_":1569446064473243648}] > > > }} > > > > > > I am on Solr 6.5.0. What version you are on? > > > > > > > > > Amrit Sarkar > > > Search Engineer > > > Lucidworks, Inc. > > > 415-589-9269 > > > www.lucidworks.com > > > Twitter http://twitter.com/lucidworks > > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > > > > > On Tue, Jun 6, 2017 at 1:30 PM, Nick Way > > > > wrote: > > > > > >> Hi - I have a Solr collection with a custom field "fullname_s" (a > > string). > > >> > > >> I want "john smi" to find "john smith" (I lower-cased the names upon > > >> indexing them) > > >> > > >> I have tried > > >> > > >> fullname_s:"john smi*" > > >> fullname_s:john smi* > > >> fullname_s:"john smi?" > > >> fullname_s:john smi? > > >> > > >> > > >> but nothing gives the expected result - am I missing something? I > spent > > >> hours on this one point yesterday so if anyone can please point me in > > the > > >> right direction I'd be really grateful. > > >> > > >> I'm using Solr with Adobe Coldfusion by the way but I think the > > principles > > >> are the same. > > >> > > >> Thank you! > > >> > > >> Nick > > >> > > > > >
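[Editorial note] For anyone landing on this thread later, here is a minimal SolrJ sketch of the prefix-query-parser approach Erik describes above. The zkHost and collection name are placeholders (the single-string CloudSolrClient constructor is the same style used elsewhere in these threads), so treat it as an illustration rather than a drop-in snippet.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PrefixQueryExample {
  public static void main(String[] args) throws Exception {
    // placeholder zkHost and collection name
    CloudSolrClient client = new CloudSolrClient("localhost:9983");
    client.setDefaultCollection("mycollection");

    // {!prefix} does a prefix match on the whole string field value:
    // no trailing '*' and no need to escape the space in "john smi"
    SolrQuery query = new SolrQuery();
    query.setQuery("{!prefix f=fullname_s v=$qq}");
    query.set("qq", "john smi");

    QueryResponse response = client.query(query);
    System.out.println("numFound: " + response.getResults().getNumFound());
    client.close();
  }
}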
Re: com.ibm.icu dependency errors when building solr source code
Running "ant eclipse" or "ant test" in verbose mode will provide you the exact lib in ivy2 cache which is corrupt. Delete that particular lib and run "ant" again. Also don't try to get out / exit "ant" commands via Ctrl+C or Ctrl+V while it is downloading the libraries to ivy2 folder.
Re: async backup
Damien, then I poll with REQUESTSTATUS REQUESTSTATUS is an API which provided you the status of the any API (including other heavy duty apis like SPLITSHARD or CREATECOLLECTION) associated with async_id at that current timestamp / moment. Does that give you "state"="completed"? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Tue, Jun 27, 2017 at 5:25 AM, Damien Kamerman wrote: > A regular backup creates the files in this order: > drwxr-xr-x 2 root root 63 Jun 27 09:46 snapshot.shard7 > drwxr-xr-x 2 root root 159 Jun 27 09:46 snapshot.shard8 > drwxr-xr-x 2 root root 135 Jun 27 09:46 snapshot.shard1 > drwxr-xr-x 2 root root 178 Jun 27 09:46 snapshot.shard3 > drwxr-xr-x 2 root root 210 Jun 27 09:46 snapshot.shard11 > drwxr-xr-x 2 root root 218 Jun 27 09:46 snapshot.shard9 > drwxr-xr-x 2 root root 180 Jun 27 09:46 snapshot.shard2 > drwxr-xr-x 2 root root 164 Jun 27 09:47 snapshot.shard5 > drwxr-xr-x 2 root root 252 Jun 27 09:47 snapshot.shard6 > drwxr-xr-x 2 root root 103 Jun 27 09:47 snapshot.shard12 > drwxr-xr-x 2 root root 135 Jun 27 09:47 snapshot.shard4 > drwxr-xr-x 2 root root 119 Jun 27 09:47 snapshot.shard10 > drwxr-xr-x 3 root root 4 Jun 27 09:47 zk_backup > -rw-r--r-- 1 root root 185 Jun 27 09:47 backup.properties > > While an async backup creates files in this order: > drwxr-xr-x 2 root root 15 Jun 27 09:49 snapshot.shard3 > drwxr-xr-x 2 root root 15 Jun 27 09:49 snapshot.shard9 > drwxr-xr-x 2 root root 62 Jun 27 09:49 snapshot.shard6 > drwxr-xr-x 2 root root 37 Jun 27 09:49 snapshot.shard2 > drwxr-xr-x 2 root root 67 Jun 27 09:49 snapshot.shard7 > drwxr-xr-x 2 root root 75 Jun 27 09:49 snapshot.shard5 > drwxr-xr-x 2 root root 70 Jun 27 09:49 snapshot.shard8 > drwxr-xr-x 2 root root 15 Jun 27 09:49 snapshot.shard4 > drwxr-xr-x 2 root root 15 Jun 27 09:50 snapshot.shard11 > drwxr-xr-x 2 root root 127 Jun 27 09:50 snapshot.shard1 > drwxr-xr-x 2 root root 116 Jun 27 09:50 snapshot.shard12 > drwxr-xr-x 3 root root 4 Jun 27 09:50 zk_backup > -rw-r--r-- 1 root root 185 Jun 27 09:50 backup.properties > drwxr-xr-x 2 root root 25 Jun 27 09:51 snapshot.shard10 > > > shard10 is much larger than the other shards. > > From the logs: > INFO - 2017-06-27 09:50:33.832; [ ] org.apache.solr.cloud.BackupCmd; > Completed backing up ZK data for backupName=collection1 > INFO - 2017-06-27 09:50:33.800; [ ] > org.apache.solr.handler.admin.CoreAdminOperation; Checking request status > for : backup1103459705035055 > INFO - 2017-06-27 09:50:33.800; [ ] > org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null > path=/admin/cores > params={qt=/admin/cores&requestid=backup1103459705035055&action= > REQUESTSTATUS&wt=javabin&version=2} > status=0 QTime=0 > INFO - 2017-06-27 09:51:33.405; [ ] org.apache.solr.handler. > SnapShooter; > Done creating backup snapshot: shard10 at file:///online/backup/ > collection1 > > Has anyone seen this bug, or knows a workaround? > > > On 27 June 2017 at 09:47, Damien Kamerman wrote: > > > Yes, the async command returns, and then I poll with REQUESTSTATUS. > > > > On 27 June 2017 at 01:24, Varun Thacker wrote: > > > >> Hi Damien, > >> > >> A backup command with async is supposed to return early. It is start the > >> backup process and return. > >> > >> Are you using the REQUESTSTATUS ( > >> http://lucene.apache.org/solr/guide/6_6/collections-api.html > >> #collections-api > >> ) API to validate if the backup is complete? 
> >> > >> On Sun, Jun 25, 2017 at 10:28 PM, Damien Kamerman > >> wrote: > >> > >> > I've noticed an issue with the Solr 6.5.1 Collections API BACKUP async > >> > command returning early. The state is finished well before one shard > is > >> > finished. > >> > > >> > The collection I'm backing up has 12 shards across 6 nodes and I > suspect > >> > the issue is that it is not waiting for all backups on the node to > >> finish. > >> > > >> > Alternatively, I if I change the request to not be async it works OK > but > >> > sometimes I get the exception "backup the collection time out:180s". > >> > > >> > Has anyone seen this, or knows a workaround? > >> > > >> > Cheers, > >> > Damien. > >> > > >> > > > > >
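[Editorial note] As a concrete reference, the status polling discussed above is just a Collections API call against the async request id. A hedged sketch, reusing the request id that shows up in Damien's logs (backup1103459705035055) and a placeholder host/port:

curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=backup1103459705035055&wt=json"

Per the earlier message in this thread, a finished backup should report "state":"completed" in the status section of the response.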
Re: dynamic datasource password in db_data_config file
Javed, Can you let us know if you are running in standalone or cloud mode? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Mon, Jul 17, 2017 at 11:54 AM, javeed wrote: > HI Team, > Can you please update on this issue. > > Thank you > > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/dynamic-datasource-password-in-db-data-config- > file-tp4345804p4346288.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: TransactionLog doesn't know how to serialize class java.util.UUID; try implementing ObjectResolver?
I looked into the code in TransactionLog.java (branch_5_5):

JavaBinCodec.ObjectResolver resolver = new JavaBinCodec.ObjectResolver() {
  @Override
  public Object resolve(Object o, JavaBinCodec codec) throws IOException {
    if (o instanceof BytesRef) {
      BytesRef br = (BytesRef) o;
      codec.writeByteArray(br.bytes, br.offset, br.length);
      return null;
    }
    // Fallback: we have no idea how to serialize this. Be noisy to prevent insidious bugs
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
        "TransactionLog doesn't know how to serialize " + o.getClass() + "; try implementing ObjectResolver?");
  }
};

UUID does implement Serializable:

public final class UUID implements java.io.Serializable, Comparable<UUID>

but the resolver above only knows how to handle BytesRef, so any other object type (such as a UUID) falls through to the exception you are seeing.

Can you share the payload you are trying to update with?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Jul 17, 2017 at 7:03 PM, deviantcode wrote:
> Hi Mahmoud, did you ever get to the bottom of this? I'm having the same issue
> on solr 5.5.2
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/TransactionLog-doesn-t-know-how-to-serialize-class-java-util-UUID-try-implementing-ObjectResolver-tp4332277p4346335.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Subfaceting
Poornima, Regarding 3; You can do something like: CloudSolrClient client = new CloudSolrClient("localhost:9983"); SolrParams params = new ModifiableSolrParams().add("q","*:*") .add("json.facet","{.}"); QueryResponse response = client.query(params); Setting key and value via SolrParams is available. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Mon, Jul 17, 2017 at 8:48 PM, Ponnuswamy, Poornima (GE Healthcare) < poornima.ponnusw...@ge.com> wrote: > Hello, > > We have Solr version 6.4.2 and we have been using Solr Subfaceting – > Terms Facet as per the document https://cwiki.apache.org/ > confluence/display/solr/Faceted+Search in our project. > > In our project which is going to go in production soon, we use it for > getting the facet/subfacet counts, sort etc. We make a direct rest call to > solr and the counts matches perfectly. I have few questions and > clarification on this approach and appreciate your response on this. > > > > 1. In confluence - https://cwiki.apache.org/confluence/display/solr/ > Faceted+Search it page says its experimental and may change > significantly. Is it safe for us to use the Terms faceting or will it > change in future releases?. When will this be official?. > 2. As Term faceting has few advantages over Pivot facet as per > http://yonik.com/solr-subfacets/ we went on with it. Is it safe to use it > or do we use Pivot faceting instead? > 3. Currently we make a rest call to Solr API to get results. Now we are > planning to move to Solr Cloud and use Solrj library to integrate with > Solr. I don’t see any support for Terms faceting (json.facet) in Solrj > library. Am I overlooking it or will it be supported in future releases? > > Appreciate your response. > > Thanks, > Poornima > >
Re: Solr Subfaceting
Poornima, 1. In confluence - https://cwiki.apache.org/confluence/display/solr/ Faceted+Search it page says its experimental and may change significantly. Is it safe for us to use the Terms faceting or will it change in future releases?. When will this be official?. A lot of people / engineers are using json faceting in their production today itself. By "experimental and may change significantly" simple means the end points of request and response may change in in future releases, hence the back-compat will suffer. If you are upgrading to future released solr version, you have to make sure the client code you have wrote at your end (via SolrJ) is upto date with that solr version (you upgrade to). 2. As Term faceting has few advantages over Pivot facet as per http://yonik.com/solr-subfacets/ we went on with it. Is it safe to use it or do we use Pivot faceting instead? In my opinion, you should use the better feature. Though you may hit some limitations of json faceting and their respective would be jiras opened too. Rest Mr. Seeley would be the the best person the 2nd. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Mon, Jul 17, 2017 at 10:43 PM, Ponnuswamy, Poornima (GE Healthcare) < poornima.ponnusw...@ge.com> wrote: > Thanks for your response. I have tried with SolrParams and it works for me. > > Any feedback on question 1 & 2. > > Thanks, > Poornima > > On 7/17/17, 12:38 PM, "Amrit Sarkar" wrote: > > Poornima, > > Regarding 3; > You can do something like: > > CloudSolrClient client = new CloudSolrClient("localhost:9983"); > > SolrParams params = new ModifiableSolrParams().add("q","*:*") > .add("json.facet","{.}"); > > QueryResponse response = client.query(params); > > Setting key and value via SolrParams is available. > > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com/lucidworks > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > On Mon, Jul 17, 2017 at 8:48 PM, Ponnuswamy, Poornima (GE Healthcare) < > poornima.ponnusw...@ge.com> wrote: > > > Hello, > > > > We have Solr version 6.4.2 and we have been using Solr Subfaceting – > > Terms Facet as per the document https://cwiki.apache.org/ > > confluence/display/solr/Faceted+Search in our project. > > > > In our project which is going to go in production soon, we use it for > > getting the facet/subfacet counts, sort etc. We make a direct rest > call to > > solr and the counts matches perfectly. I have few questions and > > clarification on this approach and appreciate your response on this. > > > > > > > > 1. In confluence - https://cwiki.apache.org/ > confluence/display/solr/ > > Faceted+Search it page says its experimental and may change > > significantly. Is it safe for us to use the Terms faceting or will it > > change in future releases?. When will this be official?. > > 2. As Term faceting has few advantages over Pivot facet as per > > http://yonik.com/solr-subfacets/ we went on with it. Is it safe to > use it > > or do we use Pivot faceting instead? > > 3. Currently we make a rest call to Solr API to get results. Now > we are > > planning to move to Solr Cloud and use Solrj library to integrate > with > > Solr. I don’t see any support for Terms faceting (json.facet) in > Solrj > > library. Am I overlooking it or will it be supported in future > releases? > > > > Appreciate your response. > > > > Thanks, > > Poornima > > > > > > >
Re: Parent child documents partial update
Sujay,

Not really. Parent-child documents are stored contiguously in a single block. Read more about the parent-child relationship at:
https://medium.com/@sarkaramrit2/multiple-documents-with-same-doc-id-in-index-in-solr-cloud-32c072db2164

When we perform a partial / atomic update, say {"id":"X", "fieldA":{"set":"Z"}}, that particular doc with id X will be fetched (all the "stored" fields), the update will be applied and the doc re-indexed; this all happens internally in *DistributedUpdateProcessor*. So there is no way it will fetch the child documents along with it.

I am not sure whether this can be done with the current code or whether it will be fixed / improved in the future.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Jul 17, 2017 at 12:44 PM, Sujay Bawaskar wrote:
> Hi,
>
> Need a help to understand solr parent child document partial update
> behaviour. Can we perform partial update on parent document without losing
> its chiild documents? My observation is that parent child relationship
> between documents get lost in case partial update is performed on parent.
> Any work around or solution to this issue?
>
> --
> Thanks,
> Sujay P Bawaskar
> M:+91-77091 53669
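[Editorial note] Until that changes, the practical workaround implied above is to re-send the whole block (parent plus all children) whenever the parent changes. A rough SolrJ sketch under that assumption; the zkHost, collection and field names are made up for illustration:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ReindexBlockExample {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("localhost:9983"); // placeholder zkHost
    client.setDefaultCollection("mycollection");                    // placeholder collection

    // Rebuild the parent with its updated field value...
    SolrInputDocument parent = new SolrInputDocument();
    parent.addField("id", "X");
    parent.addField("fieldA", "Z");

    // ...and attach every child again so the block stays intact.
    SolrInputDocument child = new SolrInputDocument();
    child.addField("id", "X-child-1");
    child.addField("childField", "value");
    parent.addChildDocument(child);

    client.add(parent);  // the whole block is indexed together
    client.commit();
    client.close();
  }
}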
Re: Help with updateHandler commit stats
Antonio,

I think the name is fairly self-explanatory. From the official documentation:

autocommits - Total number of auto-commits executed.

So yes, it is the total number of auto-commits executed in the core's lifetime. Look into https://cwiki.apache.org/confluence/display/solr/Performance+Statistics+Reference for more details.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 7, 2017 at 4:15 PM, Antonio De Miguel wrote:
> Hi,
>
> I'm taking a look to UpdateHandler stats... and i see when autosoftcommit
> occurs (every 10 secs) both metrics, "commits" and "soft autocommits"
> increments by one. ¿is this normal?
>
> My config is:
>
> autoCommit: 180 secs
> autoSoftCommit: 10 secs
>
> Thanks!
Re: CloudSolrClient preferred over LBHttpSolrClient
S G,

Not sure about the documentation, but:

The CloudSolrClient uses a connection to ZooKeeper to extract cluster information, like who is the leader for a shard in a Solr collection. To create a CloudSolrClient all you specify is the ZooKeeper ensemble and which collection you want to work with. Behind the scenes SolrJ will load balance and send the request to the right "shard" in the cluster. The CloudSolrClient is better if you have a cluster of multiple Solr nodes across multiple machines.

In LBHttpSolrClient, by contrast, load balancing is done using a simple round-robin over the list of servers.

Hope this helps.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Jul 17, 2017 at 11:38 PM, S G wrote:
> Hi,
>
> Does anyone know if CloudSolrClient is preferred over LBHttpSolrClient ?
> If yes, why so and has there been any good performance benefits documented
> anywhere?
>
> Thanks
> SG
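[Editorial note] A minimal SolrJ sketch of the CloudSolrClient setup described above, assuming a placeholder ZooKeeper ensemble and collection name (the single-string constructor matches what is used elsewhere in these threads):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudClientExample {
  public static void main(String[] args) throws Exception {
    // Point the client at ZooKeeper; it discovers live nodes and shard leaders itself.
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181"); // placeholder ensemble
    client.setDefaultCollection("mycollection");                                // placeholder collection

    // Requests are routed to the right shard/replica behind the scenes.
    QueryResponse response = client.query(new SolrQuery("*:*"));
    System.out.println("numFound: " + response.getResults().getNumFound());

    client.close();
  }
}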
Re: Parent child documents partial update
Sujay, Lucene index is in flat-object document style, so I really not think nested documents at index / storage will ever be supported unless someone change the very intricacy of the index. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Tue, Jul 18, 2017 at 8:11 AM, Sujay Bawaskar wrote: > Thanks Amrit. So storage mechanism of parent child documents is limiting > the capability of partial update. It would be great to have flawless parent > child index support in solr. > > On 17-Jul-2017 11:14 PM, "Amrit Sarkar" wrote: > > > Sujay, > > > > Not really. Parent-child documents are stored in a single block > > contiguously. Read more about parent-child relationship at: > > https://medium.com/@sarkaramrit2/multiple-documents-with-same-doc-id-in- > > index-in-solr-cloud-32c072db2164 > > > > While we perform partial / atomic update, say {"id":"X", > > "fieldA":{"set":"Z"}, that particular doc with X will be fetched (all the > > "stored" fields), update will be performed and indexed, all happens in > > *DistributedUpdateProcessor* internally. So there is no way it will fetch > > the child documents along with it. > > > > I am not sure whether this can be done with current code or it will be > > fixed / improved in the future. > > > > Amrit Sarkar > > Search Engineer > > Lucidworks, Inc. > > 415-589-9269 > > www.lucidworks.com > > Twitter http://twitter.com/lucidworks > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > > > On Mon, Jul 17, 2017 at 12:44 PM, Sujay Bawaskar < > sujaybawas...@gmail.com> > > wrote: > > > > > Hi, > > > > > > Need a help to understand solr parent child document partial update > > > behaviour. Can we perform partial update on parent document without > > losing > > > its chiild documents? My observation is that parent child relationship > > > between documents get lost in case partial update is performed on > parent. > > > Any work around or solution to this issue? > > > > > > -- > > > Thanks, > > > Sujay P Bawaskar > > > M:+91-77091 53669 > > > > > >
Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS
By saying:

> I am just adding multiValued=false in the managed-schema file.

do you mean you are modifying the "conf" on the local filesystem, or going into the core's conf directory and changing it there? If you are on SolrCloud, you need to make the same change on ZooKeeper.
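[Editorial note] For the SolrCloud case, a hedged sketch of pushing the edited config set back up to ZooKeeper and reloading the collection; the config name, local path, collection name and ZooKeeper address are placeholders:

# upload the locally edited config set to ZooKeeper
bin/solr zk upconfig -z localhost:2181 -n myconfig -d /path/to/local/conf

# reload the collection so the schema change takes effect
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"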
Re: CDCR - how to deal with the transaction log files
Patrick, Yes! You created default UpdateLog which got written to a disk and then you changed it to CdcrUpdateLog in configs. I find no reason it would create a proper COLLECTIONCHECKPOINT on target tlog. One thing you can try before creating / starting from scratch is restarting source cluster nodes, the leaders of shard will try to create the same COLLECTIONCHECKPOINT, which may or may not be successful. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Jul 21, 2017 at 11:09 AM, Patrick Hoeffel < patrick.hoef...@polarisalpha.com> wrote: > I'm working on my first setup of CDCR, and I'm seeing the same "The log > reader for target collection {collection name} is not initialised" as you > saw. > > It looks like you're creating collections on a regular basis, but for me, > I create it one time and never again. I've been creating the collection > first from defaults and then applying the CDCR-aware solrconfig changes > afterward. It sounds like maybe I need to create the configset in ZK first, > then create the collections, first on the Target and then on the Source, > and I should be good? > > Thanks, > > Patrick Hoeffel > Senior Software Engineer > (Direct) 719-452-7371 > (Mobile) 719-210-3706 > patrick.hoef...@polarisalpha.com > PolarisAlpha.com > > > -Original Message- > From: jmyatt [mailto:jmy...@wayfair.com] > Sent: Wednesday, July 12, 2017 4:49 PM > To: solr-user@lucene.apache.org > Subject: Re: CDCR - how to deal with the transaction log files > > glad to hear you found your solution! I have been combing over this post > and others on this discussion board many times and have tried so many > tweaks to configuration, order of steps, etc, all with absolutely no > success in getting the Source cluster tlogs to delete. So incredibly > frustrating. If anyone has other pearls of wisdom I'd love some advice. > Quick hits on what I've tried: > > - solrconfig exactly like Sean's (target and source respectively) expect > no autoSoftCommit > - I am also calling cdcr?action=DISABLEBUFFER (on source as well as on > target) explicitly before starting since the config setting of > defaultState=disabled doesn't seem to work > - when I create the collection on source first, I get the warning "The log > reader for target collection {collection name} is not initialised". When I > reverse the order (create the collection on target first), no such warning > - tlogs replicate as expected, hard commits on both target and source > cause tlogs to rollover, etc - all of that works as expected > - action=QUEUES on source reflects the queueSize accurately. Also > *always* shows updateLogSynchronizer state as "stopped" > - action=LASTPROCESSEDVERSION on both source and target always seems > correct (I don't see the -1 that Sean mentioned). > - I'm creating new collections every time and running full data imports > that take 5-10 minutes. Again, all data replication, log rollover, and > autocommit activity seems to work as expected, and logs on target are > deleted. It's just those pesky source tlogs I can't get to delete. > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/CDCR-how-to-deal-with-the-transaction-log- > files-tp4345062p4345715.html > Sent from the Solr - User mailing list archive at Nabble.com. >
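[Editorial note] For anyone setting this up fresh: the key point above is that both source and target configs need the CDCR-aware update log in place before the collections are created. A sketch of the relevant solrconfig.xml piece, based on the CDCR documentation (the dir value is the usual default and just illustrative):

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- autoCommit / autoSoftCommit as usual -->
</updateHandler>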
Re: atomic updates in conjunction with optimistic concurrency
Hendrik,

Can you share the error snippet so that we can refer to the code where exactly that is happening?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 9:50 PM, Hendrik Haddorp wrote:
> Hi,
>
> when I try to use an atomic update in conjunction with optimistic
> concurrency Solr sometimes complains that the version I passed in does not
> match. The version in my request however match to what is stored and what
> the exception states as the actual version does not exist in the collection
> at all. Strangely this does only happen sometimes but once it happens for a
> collection it seems to stay like that. Any idea why that might happen?
>
> I'm using Solr 6.3 in Cloud mode with SolrJ.
>
> regards,
> Hendrik
Re: atomic updates in conjunction with optimistic concurrency
Hendrik, Ran a little test on 6.3, with infinite atomic updates with optimistic concurrency, cannot *reproduce*: List docs = new ArrayList<>(); > SolrInputDocument document = new SolrInputDocument(); > document.addField("id", String.valueOf(1)); > document.addField("external_version_field_s", System.currentTimeMillis()); // > normal update > docs.add(document); > UpdateRequest updateRequest = new UpdateRequest(); > updateRequest.add(docs); > client.request(updateRequest, collection); > updateRequest = new UpdateRequest(); > updateRequest.commit(client, collection); > > while (true) { > QueryResponse response = client.query(new ModifiableSolrParams().add("q", > "id:1")); > System.out.println(response.getResults().get(0).get("_version_")); > docs = new ArrayList<>(); > document = new SolrInputDocument(); > document.addField("id", String.valueOf(1)); > Map map = new HashMap<>(); > map.put("set", createSentance(1)); // atomic map value > document.addField("external_version_field_s", map); > document.addField("_version_", > response.getResults().get(0).get("_version_")); > docs.add(document); > updateRequest = new UpdateRequest(); > updateRequest.add(docs); > client.request(updateRequest, collection); > updateRequest = new UpdateRequest(); > updateRequest.commit(client, collection); > } > > Maybe you can let us know more details how the update been made? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Jul 21, 2017 at 10:36 PM, Hendrik Haddorp wrote: > Hi, > > I can't find anything about this in the Solr logs. On the caller side I > have this: > Error from server at http://x_shard1_replica2: version conflict for > x expected=1573538179623944192 actual=1573546159565176832 > org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error > from server at http://x_shard1_replica2: version conflict for x > expected=1573538179623944192 actual=1573546159565176832 > at > org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:765) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > at > org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1173) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWit > hRetryOnStaleState(CloudSolrClient.java:1062) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > at > org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1004) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > ... 
> Caused by: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: > Error from server at http://x_shard1_replica2: version conflict for > x expected=1573538179623944192 actual=1573546159565176832 > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > at > org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435) > ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - > shalin - 2016-11-02 19:52:43] > at > org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(L
Re: Sum of double fields in JSON Facet
Zheng,

You may want to check https://issues.apache.org/jira/browse/SOLR-7452. I don't know whether the two are directly related, but I am sure I have seen complaints and enquiries about imprecise statistics with JSON Facets.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Jul 25, 2017 at 6:27 PM, Zheng Lin Edwin Yeo wrote:
> This is the way which I put my JSON facet.
>
> totalAmount:"sum(sum(amount1_d,amount2_d))"
>
> amount1_d: 69446961.2
> amount2_d: 0
>
> Result I get: 69446959.27
>
> Regards,
> Edwin
>
> On 25 July 2017 at 20:44, Zheng Lin Edwin Yeo wrote:
> > Hi,
> >
> > I'm trying to do a sum of two double fields in JSON Facet. One of the
> > field has a value of 69446961.2, while the other is 0. However, when I get
> > the result, I'm getting a value of 69446959.27. This is 1.93 lesser than
> > the original value.
> >
> > What could be the reason?
> >
> > I'm using Solr 6.5.1.
> >
> > Regards,
> > Edwin
Re: SOLR Metric Reporting to graphite
Hi, I didn't had a chance to go through the steps you are doing, but I followed the one written by Varun Thacker via influxdb: https://github.com/vthacker/solr-metrics-influxdb, and it works fine. Maybe it can be of some help. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Sun, Aug 6, 2017 at 9:47 PM, abhi Abhishek wrote: > Hi All, > I am trying to setup the graphite reporter for SOLR 6.5.0. i've started > a sample docker instance for graphite with statd ( > https://github.com/hopsoft/docker-graphite-statsd). > > also i've added the graphite metrics reporter in the SOLR.xml config of the > collection. however post doing this i dont see any data getting posted to > the graphite ( > https://cwiki.apache.org/confluence/display/solr/Metrics+Reporting). > added XML Config to solr.xml > >class="org.apache.solr.metrics.reporters.SolrGraphiteReporter"> > localhost > 2003 > 1 > > > Graphite Mapped Ports > HostContainerService > 80 80 nginx <https://www.nginx.com/resources/admin-guide/> > 2003 2003 carbon receiver - plaintext > <http://graphite.readthedocs.io/en/latest/feeding-carbon. > html#the-plaintext-protocol> > 2004 2004 carbon receiver - pickle > <http://graphite.readthedocs.io/en/latest/feeding-carbon. > html#the-pickle-protocol> > 2023 2023 carbon aggregator - plaintext > <http://graphite.readthedocs.io/en/latest/carbon-daemons. > html#carbon-aggregator-py> > 2024 2024 carbon aggregator - pickle > <http://graphite.readthedocs.io/en/latest/carbon-daemons. > html#carbon-aggregator-py> > 8125 8125 statsd <https://github.com/etsy/statsd/blob/master/docs/ > server.md> > 8126 8126 statsd admin > <https://github.com/etsy/statsd/blob/v0.7.2/docs/admin_interface.md> > <https://github.com/hopsoft/docker-graphite-statsd#mounted-volumes> > please advice if i am doing something wrong here. > > Thanks, > Abhishek >
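[Editorial note] For comparison, the documented solr.xml form of the graphite reporter looks roughly like the sketch below. This is a reconstruction based on the Metrics Reporting page (the host/port/period values are the ones from the original message, and the element lives under <solr> in solr.xml), so double-check it against your Solr version:

<metrics>
  <reporter name="graphite" class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
    <str name="host">localhost</str>
    <int name="port">2003</int>
    <int name="period">1</int>
  </reporter>
</metrics>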
Re: Highlighting Performance improvement suggestions required - Solr 6.5.1
Pardon I didn't go through details in configs and I guess you have already went through the recent talks on highlighters, still sharing if not: https://www.slideshare.net/lucidworks/solr-highlighting-at-full-speed-presented-by-timothy-rodriguez-bloomberg-david-smiley-d-w-smiley-llc https://www.youtube.com/watch?v=tv5qKDKW8kk Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Wed, Aug 9, 2017 at 7:45 PM, sasarun wrote: > Hi All, > > I found quite a few discussions on the highlighting performance issue. > Though I tried to implement most of them, performance improvement was > negative. > Currently index count is really low with about 922 records . But the field > on which highlighting is done is quite large data. Querying of data with > highlighting is taking lots of time with 85-90% time taken on highlighting. > Configuration of my set schema.xml is as below > > fieldType name="text_general" class="solr.TextField" > positionIncrementGap="100"> > > > > words="stopwords.txt" /> > > > > > > > words="stopwords.txt" /> > ignoreCase="true" expand="true"/> > > > > stored="true" > termVectors="true" termPositions="true" termOffsets="true" > storeOffsetsWithPositions="true"/> > stored="true"/> > > > Query used in solr is > > hl=true&hl.fl=customContent&hl.fragsize=500&hl.simple.pre= > &hl.simple.post=&hl.snippets=1&hl.method=unified& > hl.bs.type=SENTENCE&hl.fragListBuilder=simple&hl. > maxAnalyzedChars=214748364&facet=true&facet.mincount=1& > facet.limit=-1&facet.s > ort=count&debug=timing&facet.field=contentSpecific > > Also note that We had tried fastvectorhighlighter too but the result was > not > positive. Once when we tried to hl.offsetSource="term_vectors" with unified > result came up in half a second but it didnt had any highlight snippets. > > One of the debug returned by solr is shared below for reference > > time=8833.0,prepare={time=0.0,query={time=0.0},facet={time= > 0.0},facet_module={time=0.0},mlt={time=0.0},hig > hlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={ > time=0.0},debug={time=0.0}},process={time=8826.0,query={ > time=867.0},facet={time=2.0},facet_module={time=0.0},mlt={ > time=0.0},highlight={time=7953.0},stats={time=0.0},expand={time=0.0},ter > ms={time=0.0},debug={time=0.0}},loadFieldValues={time=28.0}} > > Any suggestions to improve the performance would be of great help > > Thanks, > Arun > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/Highlighting-Performance-improvement- > suggestions-required-Solr-6-5-1-tp4349767.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: CDCR does not work
Pretty much what Webster and Erick mentioned, else please try the pdf I attached. I followed the official documentation doing that. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Thu, Sep 28, 2017 at 8:56 PM, Erick Erickson wrote: > If Webster's idea doesn't solve it, the next thing to check is your > tlogs on the source cluster. If you have a successful connection to > the target and it's operative, the tlogs should be regularly pruned. > If not, they'll collect updates forever. > > Also, your Solr logs should show messages as CDCR does its work, to > you see any evidence that it's > 1> running > 2> sending docs? > > Also, your problem description doesn't provide any information other > than "it doesn't work", which makes it very hard to offer anything > except generalities, you might review: > > https://wiki.apache.org/solr/UsingMailingLists > > Best, > Erick > > > On Thu, Sep 28, 2017 at 7:47 AM, Webster Homer > wrote: > > Check that you have autoCommit enabled in the target schema. > > > > Try sending a commit to the target collection. If you don't have > autoCommit > > enabled then the data could be replicating but not committed so not > > searchable > > > > On Thu, Sep 28, 2017 at 1:57 AM, Jiani Yang wrote: > > > >> Hi, > >> > >> Recently I am trying to use CDCR to do the replication of my solr > cluster. > >> I have done exactly as what the tutorial says, the tutorial link is > shown > >> below: > >> https://lucene.apache.org/solr/guide/6_6/cross-data- > >> center-replication-cdcr.html > >> > >> But I cannot see any change on target data center even every status > looks > >> fine. I have been stuck in this situation for a week and could not find > a > >> way to resolve it, could you please help me? > >> > >> Please reply me ASAP! Thank you! > >> > >> Best, > >> Jiani > >> > > > > -- > > > > > > This message and any attachment are confidential and may be privileged or > > otherwise protected from disclosure. If you are not the intended > recipient, > > you must not copy this message or attachment or disclose the contents to > > any other person. If you have received this transmission in error, please > > notify the sender immediately and delete the message and any attachment > > from your system. Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do not accept liability for any omissions or errors in this > > message which may arise as a result of E-Mail-transmission or for damages > > resulting from any unauthorized changes of the content of this message > and > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do not guarantee that this message is free of viruses and > does > > not accept liability for any damages caused by any virus transmitted > > therewith. > > > > Click http://www.emdgroup.com/disclaimer to access the German, French, > > Spanish and Portuguese versions of this disclaimer. >
Re: Very high number of deleted docs
Hi Markus,

Emir already mentioned tuning *reclaimDeletesWeight*, which raises the merge priority of segments containing deleted docs. Beyond that, optimising the index from time to time, preferably scheduled weekly / fortnightly at a low-traffic period, will keep you from ever ending up in such an odd position of 80% deleted docs in the total index.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> Hi Markus,
> You can set reclaimDeletesWeight in merge settings to some higher value
> than default (I think it is 2) to favor segments with deleted docs when
> merging.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 4 Oct 2017, at 13:31, Markus Jelsma wrote:
> >
> > Hello,
> >
> > Using a 6.6.0, i just spotted one of our collections having a core of
> > which over 80 % of the total number of documents were deleted documents.
> >
> > It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>
> > configured with no non-default settings.
> >
> > Is this supposed to happen? How can i prevent these kind of numbers?
> >
> > Thanks,
> > Markus
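[Editorial note] To make Emir's suggestion concrete, a hedged solrconfig.xml sketch of raising reclaimDeletesWeight on the tiered merge policy; the value 4.0 is just an illustrative choice above the default of ~2:

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- higher weight => segments with many deleted docs get merged (and purged) sooner -->
    <double name="reclaimDeletesWeight">4.0</double>
  </mergePolicyFactory>
</indexConfig>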
Re: Getting user-level KeeperException
Gunalan,

ZooKeeper throws a KeeperException at /overseer for most kinds of Solr issues, indexing problems in particular. Match the timestamp of the ZooKeeper error against the Solr log; the actual problem most probably lies there.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Oct 12, 2017 at 7:52 AM, Gunalan V wrote:
> Hello,
>
> Could someone please let me know what this user-level keeper exception in
> zookeeper mean? and How to fix the same.
>
> Thanks,
> GVK
Re: Solr related questions
Hi,

> 1.) I created a core and tried to simplify the managed-schema file. But if
> I remove all "unecessary" fields/fieldtypes, I get errors like: field
> "_version_" is missing, type "boolean" is missing and so on. Why do I have
> to define this types/fields? Which fields/fieldtypes are required?

Solr expects the primitive field types and a few reserved fields to be present in the schema (though a better explanation in the docs would be nice). "_version_" and a unique id field are mandatory for each document, as "_version_" holds the current version of the document and is used for syncing across nodes and for atomic updates of documents.

> 2.) Can I modify the managed-schema remotly/by program e.g. with a post
> request or only by editing the managed-schema file directly?

Sure, the Schema API has been available for a while:
https://lucene.apache.org/solr/guide/6_6/schema-api.html

> 3.) When I have a service(solrnet client) that pushes a file from a
> fileserver to solr, will it cause two times traffic? (from the fileserver
> to my service and from the service to solr?) Is there a chance to index the
> file direct? (I need to add additional attributes to the index document)

Two times traffic where? Solr will receive the docs once, so we are good on that part. Please use SolrJ to index documents if possible, as it is the most up-to-date client; if you are on SolrCloud, use CloudSolrClient. Regarding indexing files directly, you can use the DIH (DataImportHandler), depending on the file format (csv, xml, json), but mind that it is single threaded.

Hope this clarifies some of it.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 3:10 PM, startrekfan wrote:
> Hello,
>
> I have some Solr related questions:
>
> 1.) I created a core and tried to simplify the managed-schema file. But if
> I remove all "unecessary" fields/fieldtypes, I get errors like: field
> "_version_" is missing, type "boolean" is missing and so on. Why do I have
> to define this types/fields? Which fields/fieldtypes are required?
>
> 2.) Can I modify the managed-schema remotly/by program e.g. with a post
> request or only by editing the managed-schema file directly?
>
> 3.) When I have a service(solrnet client) that pushes a file from a
> fileserver to solr, will it cause two times traffic? (from the fileserver
> to my service and from the service to solr?) Is there a chance to index the
> file direct? (I need to add additional attributes to the index document)
>
> Thank you
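[Editorial note] As a quick illustration of point 2, a hedged Schema API call that adds a field remotely; the core name and field definition here are made-up examples:

curl -X POST -H 'Content-type:application/json' \
  --data-binary '{"add-field":{"name":"mytitle","type":"text_general","stored":true}}' \
  "http://localhost:8983/solr/mycore/schema"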
Re: solr 7.0.1: exception running post to crawl simple website
Kevin, You are getting NPE at: String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL // related code String rawContentType = conn.getContentType(); public String getContentType() { return getHeaderField("content-type"); } HttpURLConnection conn = (HttpURLConnection) u.openConnection(); Can you check at your webpage level headers are properly set and it has key "content-type". Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Wed, Oct 11, 2017 at 9:08 PM, Kevin Layer wrote: > I want to use solr to index a markdown website. The files > are in native markdown, but they are served in HTML (by markserv). > > Here's what I did: > > docker run --name solr -d -p 8983:8983 -t solr > docker exec -it --user=solr solr bin/solr create_core -c handbook > > Then, to crawl the site: > > quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook > http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes md > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar > -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web > org.apache.solr.util.SimplePostTool http://quadra.franz.com:9091/index.md > SimplePostTool version 5.0.0 > Posting web pages to Solr url http://localhost:8983/solr/ > handbook/update/extract > Entering auto mode. Indexing pages with content-types corresponding to > file endings md > SimplePostTool: WARNING: Never crawl an external web site faster than > every 10 seconds, your IP will probably be blocked > Entering recursive mode, depth=10, delay=0s > Entering crawl at level 0 (1 links total, 1 new) > Exception in thread "main" java.lang.NullPointerException > at org.apache.solr.util.SimplePostTool$PageFetcher. > readPageFromUrl(SimplePostTool.java:1138) > at org.apache.solr.util.SimplePostTool.webCrawl( > SimplePostTool.java:603) > at org.apache.solr.util.SimplePostTool.postWebPages( > SimplePostTool.java:563) > at org.apache.solr.util.SimplePostTool.doWebMode( > SimplePostTool.java:365) > at org.apache.solr.util.SimplePostTool.execute( > SimplePostTool.java:187) > at org.apache.solr.util.SimplePostTool.main( > SimplePostTool.java:172) > quadra[git:master]$ > > > Any ideas on what I did wrong? > > Thanks. > > Kevin >
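[Editorial note] If it helps to verify that, here is a small standalone sketch (not Solr code, just the same getContentType() call SimplePostTool makes) that prints what, if anything, the server returns as the bare content type for a given URL; the URL is the one from the thread:

import java.net.HttpURLConnection;
import java.net.URL;

public class ContentTypeCheck {
  // Returns the bare MIME type of a URL, or null if the server sends no Content-Type header.
  static String fetchBareContentType(URL u) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) u.openConnection();
    String rawContentType = conn.getContentType();
    if (rawContentType == null) {
      // This is the case that makes SimplePostTool throw the NPE: no Content-Type header at all.
      System.err.println("WARNING: no Content-Type header for " + u);
      return null;
    }
    return rawContentType.split(";")[0]; // e.g. "text/html" from "text/html;charset=utf-8"
  }

  public static void main(String[] args) throws Exception {
    System.out.println(fetchBareContentType(new URL("http://quadra.franz.com:9091/index.md")));
  }
}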
Re: solr 7.0.1: exception running post to crawl simple website
Strange, Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's Content-Type. Let's see what it says now. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer wrote: > OK, so I hacked markserv to add Content-Type text/html, but now I get > > SimplePostTool: WARNING: Skipping URL with unsupported type text/html > > What is it expecting? > > $ docker exec -it --user=solr solr bin/post -c handbook > http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar > -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web > org.apache.solr.util.SimplePostTool http://quadra:9091/index.md > SimplePostTool version 5.0.0 > Posting web pages to Solr url http://localhost:8983/solr/ > handbook/update/extract > Entering auto mode. Indexing pages with content-types corresponding to > file endings md > SimplePostTool: WARNING: Never crawl an external web site faster than > every 10 seconds, your IP will probably be blocked > Entering recursive mode, depth=10, delay=0s > Entering crawl at level 0 (1 links total, 1 new) > SimplePostTool: WARNING: Skipping URL with unsupported type text/html > SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a > HTTP result status of 415 > 0 web pages indexed. > COMMITting Solr index changes to http://localhost:8983/solr/ > handbook/update/extract... > Time spent: 0:00:03.882 > $ > > Thanks. > > Kevin >
Re: solr 7.0.1: exception running post to crawl simple website
Reference to the code: . String rawContentType = conn.getContentType(); String type = rawContentType.split(";")[0]; if(typeSupported(type) || "*".equals(fileTypes)) { String encoding = conn.getContentEncoding(); . protected boolean typeSupported(String type) { for(String key : mimeMap.keySet()) { if(mimeMap.get(key).equals(type)) { if(fileTypes.contains(key)) return true; } } return false; } . It has another check for fileTypes, I can see the page ending with .md (which you are indexing) and not .html. Let's hope now this is not the issue. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Oct 13, 2017 at 7:04 PM, Amrit Sarkar wrote: > Kevin, > > Just put "html" too and give it a shot. These are the types it is > expecting: > > mimeMap = new HashMap<>(); > mimeMap.put("xml", "application/xml"); > mimeMap.put("csv", "text/csv"); > mimeMap.put("json", "application/json"); > mimeMap.put("jsonl", "application/json"); > mimeMap.put("pdf", "application/pdf"); > mimeMap.put("rtf", "text/rtf"); > mimeMap.put("html", "text/html"); > mimeMap.put("htm", "text/html"); > mimeMap.put("doc", "application/msword"); > mimeMap.put("docx", > "application/vnd.openxmlformats-officedocument.wordprocessingml.document"); > mimeMap.put("ppt", "application/vnd.ms-powerpoint"); > mimeMap.put("pptx", > "application/vnd.openxmlformats-officedocument.presentationml.presentation"); > mimeMap.put("xls", "application/vnd.ms-excel"); > mimeMap.put("xlsx", > "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); > mimeMap.put("odt", "application/vnd.oasis.opendocument.text"); > mimeMap.put("ott", "application/vnd.oasis.opendocument.text"); > mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation"); > mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation"); > mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet"); > mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet"); > mimeMap.put("txt", "text/plain"); > mimeMap.put("log", "text/plain"); > > The keys are the types supported. > > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com/lucidworks > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar > wrote: > >> Ah! >> >> Only supported type is: text/html; encoding=utf-8 >> >> I am not confident of this either :) but this should work. >> >> See the code-snippet below: >> >> .. >> >> if(res.httpStatus == 200) { >> // Raw content type of form "text/html; encoding=utf-8" >> String rawContentType = conn.getContentType(); >> String type = rawContentType.split(";")[0]; >> if(typeSupported(type) || "*".equals(fileTypes)) { >> String encoding = conn.getContentEncoding(); >> >> >> >> >> Amrit Sarkar >> Search Engineer >> Lucidworks, Inc. >> 415-589-9269 >> www.lucidworks.com >> Twitter http://twitter.com/lucidworks >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 >> >> On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer wrote: >> >>> Amrit Sarkar wrote: >>> >>> >> Strange, >>> >> >>> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org >>> page's >>> >> Content-Type. Let's see what it says now. >>> >>> Same thing. 
Verified Content-Type: >>> >>> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& >>> grep Content-Type >>> Content-Type: text/html;charset=utf-8 >>> quadra[git:master]$ ] >>> >>> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c >>> handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes >>> md >>> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar >>> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddat
Re: solr 7.0.1: exception running post to crawl simple website
Kevin, Just put "html" too and give it a shot. These are the types it is expecting: mimeMap = new HashMap<>(); mimeMap.put("xml", "application/xml"); mimeMap.put("csv", "text/csv"); mimeMap.put("json", "application/json"); mimeMap.put("jsonl", "application/json"); mimeMap.put("pdf", "application/pdf"); mimeMap.put("rtf", "text/rtf"); mimeMap.put("html", "text/html"); mimeMap.put("htm", "text/html"); mimeMap.put("doc", "application/msword"); mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"); mimeMap.put("ppt", "application/vnd.ms-powerpoint"); mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation"); mimeMap.put("xls", "application/vnd.ms-excel"); mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); mimeMap.put("odt", "application/vnd.oasis.opendocument.text"); mimeMap.put("ott", "application/vnd.oasis.opendocument.text"); mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation"); mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation"); mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet"); mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet"); mimeMap.put("txt", "text/plain"); mimeMap.put("log", "text/plain"); The keys are the types supported. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar wrote: > Ah! > > Only supported type is: text/html; encoding=utf-8 > > I am not confident of this either :) but this should work. > > See the code-snippet below: > > .. > > if(res.httpStatus == 200) { > // Raw content type of form "text/html; encoding=utf-8" > String rawContentType = conn.getContentType(); > String type = rawContentType.split(";")[0]; > if(typeSupported(type) || "*".equals(fileTypes)) { > String encoding = conn.getContentEncoding(); > > > > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com/lucidworks > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer wrote: > >> Amrit Sarkar wrote: >> >> >> Strange, >> >> >> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's >> >> Content-Type. Let's see what it says now. >> >> Same thing. Verified Content-Type: >> >> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& >> grep Content-Type >> Content-Type: text/html;charset=utf-8 >> quadra[git:master]$ ] >> >> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook >> http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md >> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar >> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web >> org.apache.solr.util.SimplePostTool http://quadra:9091/index.md >> SimplePostTool version 5.0.0 >> Posting web pages to Solr url http://localhost:8983/solr/han >> dbook/update/extract >> Entering auto mode. 
Indexing pages with content-types corresponding to >> file endings md >> SimplePostTool: WARNING: Never crawl an external web site faster than >> every 10 seconds, your IP will probably be blocked >> Entering recursive mode, depth=10, delay=0s >> Entering crawl at level 0 (1 links total, 1 new) >> SimplePostTool: WARNING: Skipping URL with unsupported type text/html >> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a >> HTTP result status of 415 >> 0 web pages indexed. >> COMMITting Solr index changes to http://localhost:8983/solr/han >> dbook/update/extract... >> Time spent: 0:00:00.531 >> quadra[git:master]$ >> >> Kevin >> >> >> >> >> Amrit Sarkar >> >> Search Engineer >> >> Lucidworks, Inc. >> >> 415-589-9269 >> >> www.lucidworks.com >> >> Twitter http://twitter.com/lucidworks >> >> LinkedIn: https://www.linkedin.com/in/sa
Re: solr 7.0.1: exception running post to crawl simple website
Ah! Only supported type is: text/html; encoding=utf-8 I am not confident of this either :) but this should work. See the code-snippet below: .. if(res.httpStatus == 200) { // Raw content type of form "text/html; encoding=utf-8" String rawContentType = conn.getContentType(); String type = rawContentType.split(";")[0]; if(typeSupported(type) || "*".equals(fileTypes)) { String encoding = conn.getContentEncoding(); Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer wrote: > Amrit Sarkar wrote: > > >> Strange, > >> > >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's > >> Content-Type. Let's see what it says now. > > Same thing. Verified Content-Type: > > quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& > grep Content-Type > Content-Type: text/html;charset=utf-8 > quadra[git:master]$ ] > > quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook > http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar > -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web > org.apache.solr.util.SimplePostTool http://quadra:9091/index.md > SimplePostTool version 5.0.0 > Posting web pages to Solr url http://localhost:8983/solr/ > handbook/update/extract > Entering auto mode. Indexing pages with content-types corresponding to > file endings md > SimplePostTool: WARNING: Never crawl an external web site faster than > every 10 seconds, your IP will probably be blocked > Entering recursive mode, depth=10, delay=0s > Entering crawl at level 0 (1 links total, 1 new) > SimplePostTool: WARNING: Skipping URL with unsupported type text/html > SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a > HTTP result status of 415 > 0 web pages indexed. > COMMITting Solr index changes to http://localhost:8983/solr/ > handbook/update/extract... > Time spent: 0:00:00.531 > quadra[git:master]$ > > Kevin > > >> > >> Amrit Sarkar > >> Search Engineer > >> Lucidworks, Inc. > >> 415-589-9269 > >> www.lucidworks.com > >> Twitter http://twitter.com/lucidworks > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > >> > >> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer wrote: > >> > >> > OK, so I hacked markserv to add Content-Type text/html, but now I get > >> > > >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html > >> > > >> > What is it expecting? > >> > > >> > $ docker exec -it --user=solr solr bin/post -c handbook > >> > http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md > >> > /docker-java-home/jre/bin/java -classpath > /opt/solr/dist/solr-core-7.0.1.jar > >> > -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook > -Ddata=web > >> > org.apache.solr.util.SimplePostTool http://quadra:9091/index.md > >> > SimplePostTool version 5.0.0 > >> > Posting web pages to Solr url http://localhost:8983/solr/ > >> > handbook/update/extract > >> > Entering auto mode. 
Indexing pages with content-types corresponding to > >> > file endings md > >> > SimplePostTool: WARNING: Never crawl an external web site faster than > >> > every 10 seconds, your IP will probably be blocked > >> > Entering recursive mode, depth=10, delay=0s > >> > Entering crawl at level 0 (1 links total, 1 new) > >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html > >> > SimplePostTool: WARNING: The URL http://quadra:9091/index.md > returned a > >> > HTTP result status of 415 > >> > 0 web pages indexed. > >> > COMMITting Solr index changes to http://localhost:8983/solr/ > >> > handbook/update/extract... > >> > Time spent: 0:00:03.882 > >> > $ > >> > > >> > Thanks. > >> > > >> > Kevin > >> > >
Re: solr 7.0.1: exception running post to crawl simple website
Hi Kevin, Can you post the solr log in the mail thread. I don't think it handled the .md by itself by first glance at code. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer wrote: > Amrit Sarkar wrote: > > >> Kevin, > >> > >> Just put "html" too and give it a shot. These are the types it is > expecting: > > Same thing. > > >> > >> mimeMap = new HashMap<>(); > >> mimeMap.put("xml", "application/xml"); > >> mimeMap.put("csv", "text/csv"); > >> mimeMap.put("json", "application/json"); > >> mimeMap.put("jsonl", "application/json"); > >> mimeMap.put("pdf", "application/pdf"); > >> mimeMap.put("rtf", "text/rtf"); > >> mimeMap.put("html", "text/html"); > >> mimeMap.put("htm", "text/html"); > >> mimeMap.put("doc", "application/msword"); > >> mimeMap.put("docx", > >> "application/vnd.openxmlformats-officedocument. > wordprocessingml.document"); > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint"); > >> mimeMap.put("pptx", > >> "application/vnd.openxmlformats-officedocument. > presentationml.presentation"); > >> mimeMap.put("xls", "application/vnd.ms-excel"); > >> mimeMap.put("xlsx", > >> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text"); > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text"); > >> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation"); > >> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation"); > >> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet"); > >> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet"); > >> mimeMap.put("txt", "text/plain"); > >> mimeMap.put("log", "text/plain"); > >> > >> The keys are the types supported. > >> > >> > >> Amrit Sarkar > >> Search Engineer > >> Lucidworks, Inc. > >> 415-589-9269 > >> www.lucidworks.com > >> Twitter http://twitter.com/lucidworks > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > >> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar > >> wrote: > >> > >> > Ah! > >> > > >> > Only supported type is: text/html; encoding=utf-8 > >> > > >> > I am not confident of this either :) but this should work. > >> > > >> > See the code-snippet below: > >> > > >> > .. > >> > > >> > if(res.httpStatus == 200) { > >> > // Raw content type of form "text/html; encoding=utf-8" > >> > String rawContentType = conn.getContentType(); > >> > String type = rawContentType.split(";")[0]; > >> > if(typeSupported(type) || "*".equals(fileTypes)) { > >> > String encoding = conn.getContentEncoding(); > >> > > >> > > >> > > >> > > >> > Amrit Sarkar > >> > Search Engineer > >> > Lucidworks, Inc. > >> > 415-589-9269 > >> > www.lucidworks.com > >> > Twitter http://twitter.com/lucidworks > >> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > >> > > >> > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer wrote: > >> > > >> >> Amrit Sarkar wrote: > >> >> > >> >> >> Strange, > >> >> >> > >> >> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org > page's > >> >> >> Content-Type. Let's see what it says now. > >> >> > >> >> Same thing. Verified Content-Type: > >> >> > >> >> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md > |& > >> >> grep Content-Type > >> >> Content-Type: text/html;charset=utf-8 > >
Re: solr 7.0.1: exception running post to crawl simple website
ah oh, dockers. They are placed under [solr-home]/server/log/solr/log in the machine. I haven't played much with docker, any way you can get that file from that location. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer wrote: > Amrit Sarkar wrote: > > >> Hi Kevin, > >> > >> Can you post the solr log in the mail thread. I don't think it handled > the > >> .md by itself by first glance at code. > > How do I extract the log you want? > > > >> > >> Amrit Sarkar > >> Search Engineer > >> Lucidworks, Inc. > >> 415-589-9269 > >> www.lucidworks.com > >> Twitter http://twitter.com/lucidworks > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > >> > >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer wrote: > >> > >> > Amrit Sarkar wrote: > >> > > >> > >> Kevin, > >> > >> > >> > >> Just put "html" too and give it a shot. These are the types it is > >> > expecting: > >> > > >> > Same thing. > >> > > >> > >> > >> > >> mimeMap = new HashMap<>(); > >> > >> mimeMap.put("xml", "application/xml"); > >> > >> mimeMap.put("csv", "text/csv"); > >> > >> mimeMap.put("json", "application/json"); > >> > >> mimeMap.put("jsonl", "application/json"); > >> > >> mimeMap.put("pdf", "application/pdf"); > >> > >> mimeMap.put("rtf", "text/rtf"); > >> > >> mimeMap.put("html", "text/html"); > >> > >> mimeMap.put("htm", "text/html"); > >> > >> mimeMap.put("doc", "application/msword"); > >> > >> mimeMap.put("docx", > >> > >> "application/vnd.openxmlformats-officedocument. > >> > wordprocessingml.document"); > >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint"); > >> > >> mimeMap.put("pptx", > >> > >> "application/vnd.openxmlformats-officedocument. > >> > presentationml.presentation"); > >> > >> mimeMap.put("xls", "application/vnd.ms-excel"); > >> > >> mimeMap.put("xlsx", > >> > >> "application/vnd.openxmlformats-officedocument. > spreadsheetml.sheet"); > >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text"); > >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text"); > >> > >> mimeMap.put("odp", "application/vnd.oasis. > opendocument.presentation"); > >> > >> mimeMap.put("otp", "application/vnd.oasis. > opendocument.presentation"); > >> > >> mimeMap.put("ods", "application/vnd.oasis. > opendocument.spreadsheet"); > >> > >> mimeMap.put("ots", "application/vnd.oasis. > opendocument.spreadsheet"); > >> > >> mimeMap.put("txt", "text/plain"); > >> > >> mimeMap.put("log", "text/plain"); > >> > >> > >> > >> The keys are the types supported. > >> > >> > >> > >> > >> > >> Amrit Sarkar > >> > >> Search Engineer > >> > >> Lucidworks, Inc. > >> > >> 415-589-9269 > >> > >> www.lucidworks.com > >> > >> Twitter http://twitter.com/lucidworks > >> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > >> > >> > >> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar < > sarkaramr...@gmail.com> > >> > >> wrote: > >> > >> > >> > >> > Ah! > >> > >> > > >> > >> > Only supported type is: text/html; encoding=utf-8 > >> > >> > > >> > >> > I am not confident of this either :) but this should work. > >> > >> > > >> > >> > See the code-snippet below: &
Re: solr 7.0.1: exception running post to crawl simple website
pardon: [solr-home]/server/log/solr.log Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Oct 13, 2017 at 8:10 PM, Amrit Sarkar wrote: > ah oh, dockers. They are placed under [solr-home]/server/log/solr/log in > the machine. I haven't played much with docker, any way you can get that > file from that location. > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com/lucidworks > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer wrote: > >> Amrit Sarkar wrote: >> >> >> Hi Kevin, >> >> >> >> Can you post the solr log in the mail thread. I don't think it handled >> the >> >> .md by itself by first glance at code. >> >> How do I extract the log you want? >> >> >> >> >> >> Amrit Sarkar >> >> Search Engineer >> >> Lucidworks, Inc. >> >> 415-589-9269 >> >> www.lucidworks.com >> >> Twitter http://twitter.com/lucidworks >> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 >> >> >> >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer wrote: >> >> >> >> > Amrit Sarkar wrote: >> >> > >> >> > >> Kevin, >> >> > >> >> >> > >> Just put "html" too and give it a shot. These are the types it is >> >> > expecting: >> >> > >> >> > Same thing. >> >> > >> >> > >> >> >> > >> mimeMap = new HashMap<>(); >> >> > >> mimeMap.put("xml", "application/xml"); >> >> > >> mimeMap.put("csv", "text/csv"); >> >> > >> mimeMap.put("json", "application/json"); >> >> > >> mimeMap.put("jsonl", "application/json"); >> >> > >> mimeMap.put("pdf", "application/pdf"); >> >> > >> mimeMap.put("rtf", "text/rtf"); >> >> > >> mimeMap.put("html", "text/html"); >> >> > >> mimeMap.put("htm", "text/html"); >> >> > >> mimeMap.put("doc", "application/msword"); >> >> > >> mimeMap.put("docx", >> >> > >> "application/vnd.openxmlformats-officedocument. >> >> > wordprocessingml.document"); >> >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint"); >> >> > >> mimeMap.put("pptx", >> >> > >> "application/vnd.openxmlformats-officedocument. >> >> > presentationml.presentation"); >> >> > >> mimeMap.put("xls", "application/vnd.ms-excel"); >> >> > >> mimeMap.put("xlsx", >> >> > >> "application/vnd.openxmlformats-officedocument.spreadsheetml >> .sheet"); >> >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text"); >> >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text"); >> >> > >> mimeMap.put("odp", "application/vnd.oasis.opendoc >> ument.presentation"); >> >> > >> mimeMap.put("otp", "application/vnd.oasis.opendoc >> ument.presentation"); >> >> > >> mimeMap.put("ods", "application/vnd.oasis.opendoc >> ument.spreadsheet"); >> >> > >> mimeMap.put("ots", "application/vnd.oasis.opendoc >> ument.spreadsheet"); >> >> > >> mimeMap.put("txt", "text/plain"); >> >> > >> mimeMap.put("log", "text/plain"); >> >> > >> >> >> > >> The keys are the types supported. >> >> > >> >> >> > >> >> >> > >> Amrit Sarkar >> >> > >> Search Engineer >> >> > >> Lucidworks, Inc. >> >> > >> 415-589-9269 >> >> > >> www.lucidworks.com >> >> > >> Twitter http://twitter.com/lucidworks >> >> > >> LinkedIn: https://www.link
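For anyone else digging the log out of the official Solr Docker image, two hedged options (container name taken from the commands in this thread, path from the listing Kevin posts later):

docker exec -it solr cat /opt/solr/server/logs/solr.log
docker cp solr:/opt/solr/server/logs/solr.log ./solr.log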
Re: solr 7.0.1: exception running post to crawl simple website
Kevin, I am not able to replicate the issue on my system, which is bit annoying for me. Try this out for last time: docker exec -it --user=solr solr bin/post -c handbook http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html and have Content-Type: "html" and "text/html", try with both. If you get past this hurdle this hurdle, let me know. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Oct 13, 2017 at 8:22 PM, Kevin Layer wrote: > Amrit Sarkar wrote: > > >> ah oh, dockers. They are placed under [solr-home]/server/log/solr/log > in > >> the machine. I haven't played much with docker, any way you can get that > >> file from that location. > > I see these files: > > /opt/solr/server/logs/archived > /opt/solr/server/logs/solr_gc.log.0.current > /opt/solr/server/logs/solr.log > /opt/solr/server/solr/handbook/data/tlog > > The 3rd one has very little info. Attached: > > > 2017-10-11 15:28:09.564 INFO (main) [ ] o.e.j.s.Server > jetty-9.3.14.v20161028 > 2017-10-11 15:28:10.668 INFO (main) [ ] o.a.s.s.SolrDispatchFilter > ___ _ Welcome to Apache Solr™ version 7.0.1 > 2017-10-11 15:28:10.669 INFO (main) [ ] o.a.s.s.SolrDispatchFilter / > __| ___| |_ _ Starting in standalone mode on port 8983 > 2017-10-11 15:28:10.670 INFO (main) [ ] o.a.s.s.SolrDispatchFilter \__ > \/ _ \ | '_| Install dir: /opt/solr, Default config dir: > /opt/solr/server/solr/configsets/_default/conf > 2017-10-11 15:28:10.707 INFO (main) [ ] o.a.s.s.SolrDispatchFilter > |___/\___/_|_|Start time: 2017-10-11T15:28:10.674Z > 2017-10-11 15:28:10.747 INFO (main) [ ] o.a.s.c.SolrResourceLoader > Using system property solr.solr.home: /opt/solr/server/solr > 2017-10-11 15:28:10.763 INFO (main) [ ] o.a.s.c.SolrXmlConfig Loading > container configuration from /opt/solr/server/solr/solr.xml > 2017-10-11 15:28:11.062 INFO (main) [ ] o.a.s.c.SolrResourceLoader > [null] Added 0 libs to classloader, from paths: [] > 2017-10-11 15:28:12.514 INFO (main) [ ] o.a.s.c.CorePropertiesLocator > Found 0 core definitions underneath /opt/solr/server/solr > 2017-10-11 15:28:12.635 INFO (main) [ ] o.e.j.s.Server Started @4304ms > 2017-10-11 15:29:00.971 INFO (qtp1911006827-13) [ ] > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system > params={wt=json} status=0 QTime=108 > 2017-10-11 15:29:01.080 INFO (qtp1911006827-18) [ ] > o.a.s.c.TransientSolrCoreCacheDefault > Allocating transient cache for 2147483647 transient cores > 2017-10-11 15:29:01.083 INFO (qtp1911006827-18) [ ] > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores > params={core=handbook&action=STATUS&wt=json} status=0 QTime=5 > 2017-10-11 15:29:01.194 INFO (qtp1911006827-19) [ ] > o.a.s.h.a.CoreAdminOperation core create command > name=handbook&action=CREATE&instanceDir=handbook&wt=json > 2017-10-11 15:29:01.342 INFO (qtp1911006827-19) [ x:handbook] > o.a.s.c.SolrResourceLoader [handbook] Added 51 libs to classloader, from > paths: [/opt/solr/contrib/clustering/lib, /opt/solr/contrib/extraction/lib, > /opt/solr/contrib/langid/lib, /opt/solr/contrib/velocity/lib, > /opt/solr/dist] > 2017-10-11 15:29:01.504 INFO (qtp1911006827-19) [ x:handbook] > o.a.s.c.SolrConfig Using Lucene MatchVersion: 7.0.1 > 2017-10-11 15:29:01.969 INFO (qtp1911006827-19) [ x:handbook] > o.a.s.s.IndexSchema [handbook] Schema name=default-config > 2017-10-11 15:29:03.678 INFO (qtp1911006827-19) [ x:handbook] > o.a.s.s.IndexSchema Loaded schema 
default-config/1.6 with uniqueid field id > 2017-10-11 15:29:03.806 INFO (qtp1911006827-19) [ x:handbook] > o.a.s.c.CoreContainer Creating SolrCore 'handbook' using configuration from > instancedir /opt/solr/server/solr/handbook, trusted=true > 2017-10-11 15:29:03.853 INFO (qtp1911006827-19) [ x:handbook] > o.a.s.c.SolrCore solr.RecoveryStrategy.Builder > 2017-10-11 15:29:03.866 INFO (qtp1911006827-19) [ x:handbook] > o.a.s.c.SolrCore [[handbook] ] Opening new SolrCore at > [/opt/solr/server/solr/handbook], dataDir=[/opt/solr/server/ > solr/handbook/data/] > 2017-10-11 15:29:04.180 INFO (qtp1911006827-19) [ x:handbook] > o.a.s.r.XSLTResponseWriter xsltCacheLifetimeSeconds=5 > 2017-10-11 15:29:05.100 INFO (qtp1911006827-19) [ x:handbook] > o.a.s.u.UpdateHandler Using UpdateLog implementation: > org.apache.solr.update.UpdateLog > 2017-10-11 15:29:05.101 INFO (qtp1911006827-19) [ x:handbook] > o.a.s.u.UpdateLog Initializing UpdateLog: dataDir= defaultSyncLevel=FLUSH > numRecordsToKeep=100 maxNumLogsToKeep=10 numVersionBucket
Re: solr 7.0.1: exception running post to crawl simple website
Kevin, fileType => md is not recognizable format in SimplePostTool, anyway, moving on. The above is SAXParse, runtime exception. Nothing can be done at Solr end except curating your own data. Some helpful links: https://stackoverflow.com/questions/2599919/java-parsing-xml-document-gives-content-not-allowed-in-prolog-error https://stackoverflow.com/questions/3030903/content-is-not-allowed-in-prolog-when-parsing-perfectly-valid-xml-on-gae Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Oct 13, 2017 at 8:48 PM, Kevin Layer wrote: > Amrit Sarkar wrote: > > >> Kevin, > >> > >> I am not able to replicate the issue on my system, which is bit annoying > >> for me. Try this out for last time: > >> > >> docker exec -it --user=solr solr bin/post -c handbook > >> http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 > -filetypes html > >> > >> and have Content-Type: "html" and "text/html", try with both. > > With text/html I get and your command I get > > quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook > http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes > html > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar > -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=html -Dc=handbook > -Ddata=web org.apache.solr.util.SimplePostTool > http://quadra.franz.com:9091/index.md > SimplePostTool version 5.0.0 > Posting web pages to Solr url http://localhost:8983/solr/ > handbook/update/extract > Entering auto mode. Indexing pages with content-types corresponding to > file endings html > SimplePostTool: WARNING: Never crawl an external web site faster than > every 10 seconds, your IP will probably be blocked > Entering recursive mode, depth=10, delay=0s > Entering crawl at level 0 (1 links total, 1 new) > POSTed web resource http://quadra.franz.com:9091/index.md (depth: 0) > [Fatal Error] :1:1: Content is not allowed in prolog. > Exception in thread "main" java.lang.RuntimeException: > org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is > not allowed in prolog. > at org.apache.solr.util.SimplePostTool$PageFetcher. > getLinksFromWebPage(SimplePostTool.java:1252) > at org.apache.solr.util.SimplePostTool.webCrawl( > SimplePostTool.java:616) > at org.apache.solr.util.SimplePostTool.postWebPages( > SimplePostTool.java:563) > at org.apache.solr.util.SimplePostTool.doWebMode( > SimplePostTool.java:365) > at org.apache.solr.util.SimplePostTool.execute( > SimplePostTool.java:187) > at org.apache.solr.util.SimplePostTool.main( > SimplePostTool.java:172) > Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; > Content is not allowed in prolog. > at com.sun.org.apache.xerces.internal.parsers.DOMParser. > parse(DOMParser.java:257) > at com.sun.org.apache.xerces.internal.jaxp. > DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339) > at javax.xml.parsers.DocumentBuilder.parse( > DocumentBuilder.java:121) > at org.apache.solr.util.SimplePostTool.makeDom( > SimplePostTool.java:1061) > at org.apache.solr.util.SimplePostTool$PageFetcher. > getLinksFromWebPage(SimplePostTool.java:1232) > ... 5 more > > > When I use "-filetype md" back to the regular output that doesn't scan > anything. > > > >> > >> If you get past this hurdle this hurdle, let me know. > >> > >> Amrit Sarkar > >> Search Engineer > >> Lucidworks, Inc. 
> >> 415-589-9269 > >> www.lucidworks.com > >> Twitter http://twitter.com/lucidworks > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > >> > >> On Fri, Oct 13, 2017 at 8:22 PM, Kevin Layer wrote: > >> > >> > Amrit Sarkar wrote: > >> > > >> > >> ah oh, dockers. They are placed under [solr-home]/server/log/solr/ > log > >> > in > >> > >> the machine. I haven't played much with docker, any way you can > get that > >> > >> file from that location. > >> > > >> > I see these files: > >> > > >> > /opt/solr/server/logs/archived > >> > /opt/solr/server/logs/solr_gc.log.0.current > >> > /opt/solr/server/logs/solr.log > >> > /opt/solr/server/solr/handbook/data/tlog > >> > > >> > The 3rd one has very little info. Attached: > >> > > >> > >
Re: HOW DO I UNSUBSCRIBE FROM GROUP?
Hi, If you wish the emails to "stop", kindly "UNSUBSCRIBE" by following the instructions on the http://lucene.apache.org/solr/community.html. Hope this helps. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Mon, Oct 16, 2017 at 9:56 AM, wrote: > > Hi, > > Just wondering how do I 'unsubscribe' from the emails I'm receiving from > the > group? > > I'm getting way more emails than I need right now and would like them to > 'stop'... But there is NO UNSUBSCRIBE link in any of the emails. > > Thanks, > Rita > > -Original Message- > From: Reth RM [mailto:reth.ik...@gmail.com] > Sent: Sunday, October 15, 2017 10:57 PM > To: solr-user@lucene.apache.org > Subject: Efficient query to obtain DF > > Dear Solr-User Group, > >Can you please suggest efficient query for retrieving term to document > frequency(df) of that term at shard index level? > > I know we can get term to df mapping by applying termVectors component > <https://lucene.apache.org/solr/guide/6_6/the-term- > vector-component.html#The > TermVectorComponent-RequestParameters>, > however, results returned by this component are each doc to term and its > df. I was looking for straight forward flat list of terms-df mapping, > similar to how terms component returns term-tf (term frequency) map list. > > Thank you. > >
Re: Howto verify that update is "in-place"
Hi James, As for each update you are doing via atomic operation contains the "id" / "uniqueKey". Comparing the "_version_" field value for one of them would be fine for a batch. Rest, Emir has list them out. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Tue, Oct 17, 2017 at 2:47 PM, Emir Arnautović < emir.arnauto...@sematext.com> wrote: > Hi James, > I did not try, but checking max and num doc might give you info if update > was in-place or atomic - atomic is reindexing of existing doc so the old > doc will be deleted. In-place update should just update doc values of > existing doc so number of deleted docs should not change. > > HTH, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 17 Oct 2017, at 09:57, James wrote: > > > > I am using Solr 6.6 and carefully read the documentation about atomic and > > in-place updates. I am pretty sure that everything is set up as it > should. > > > > > > > > But how can I make certain that a simple update command actually > performs an > > in-place update without internally re-indexing all other fields? > > > > > > > > I am issuing this command to my server: > > > > (I am using implicit document routing, so I need the "Shard" parameter.) > > > > > > > > { > > > > "ID":1133, > > > > "Property_2":{"set":124}, > > > > "Shard":"FirstShard" > > > > } > > > > > > > > > > > > The log outputs: > > > > > > > > 2017-10-17 07:39:18.701 INFO (qtp1937348256-643) [c:MyCollection > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] > > o.a.s.u.p.LogUpdateProcessorFactory [MyCollection_FirstShard_replica1] > > webapp=/solr path=/update > > params={commitWithin=1000&boost=1.0&overwrite=true&wt= > json&_=1508221142230}{ > > add=[1133 (1581489542869811200)]} 0 1 > > > > 2017-10-17 07:39:19.703 INFO (commitScheduler-283-thread-1) > [c:MyCollection > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] > > o.a.s.u.DirectUpdateHandler2 start > > commit{,optimize=false,openSearcher=false,waitSearcher=true, > expungeDeletes=f > > alse,softCommit=true,prepareCommit=false} > > > > 2017-10-17 07:39:19.703 INFO (commitScheduler-283-thread-1) > [c:MyCollection > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] > > o.a.s.s.SolrIndexSearcher Opening > > [Searcher@32d539b4[MyCollection_FirstShard_replica1] main] > > > > 2017-10-17 07:39:19.703 INFO (commitScheduler-283-thread-1) > [c:MyCollection > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] > > o.a.s.u.DirectUpdateHandler2 end_commit_flush > > > > 2017-10-17 07:39:19.703 INFO > > (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr > > x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection > > r:core_node27) [c:MyCollection s:FirstShard r:core_node27 > > x:MyCollection_FirstShard_replica1] o.a.s.c.QuerySenderListener > > QuerySenderListener sending requests to > > Searcher@32d539b4[MyCollection_FirstShard_replica1] > > main{ExitableDirectoryReader(UninvertingDirectoryReader( > Uninverting(_i(6.6.0 > > ):C5011/1) Uninverting(_j(6.6.0):C478) Uninverting(_k(6.6.0):C345) > > Uninverting(_l(6.6.0):C4182) Uninverting(_m(6.6.0):C317) > > Uninverting(_n(6.6.0):C399) Uninverting(_q(6.6.0):C1)))} > > > > 2017-10-17 07:39:19.703 INFO > > (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr > > 
x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection > > r:core_node27) [c:MyCollection s:FirstShard r:core_node27 > > x:MyCollection_FirstShard_replica1] o.a.s.c.QuerySenderListener > > QuerySenderListener done. > > > > 2017-10-17 07:39:19.703 INFO > > (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr > > x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection > > r:core_node27) [c:MyCollection s:FirstShard r:core_node27 > > x:MyCollection_FirstShard_replica1] o.a.s.c.SolrCore > > [MyCollection_FirstShard_replica1] Registered new searcher > > Searcher@32d539b4[MyCollection_FirstShard_replica1] > > main{ExitableDirectoryReader(UninvertingDirectoryReader( > Uninverting(_i(6.6.0 > > ):C5011/1) Uninverting(_j(6.6.0):C478) Uninverting(_k(6.6.0):C345) > > Uninverting(_l(6.6.0):C4182) Uninverting(_m(6.6.0):C317) > > Uninverting(_n(6.6.0):C399) Uninverting(_q(6.6.0):C1)))} > > > > > > > > If I issue another, non-in-place update to another field which is not a > > DocValue, the log output is very similar. Can I increase verbosity? Will > it > > tell me more about the type of update then? > > > > > > > > Thank you! > > > > James > > > > > > > > > > > > > > > >
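A hedged way to run the comparison described above, reusing the collection and document id from this thread (MyCollection, ID 1133) and the per-core Luke handler: record the index stats and the document's _version_ before the partial update, then again after the update and a commit. An atomic update rewrites the whole document, so deletedDocs typically grows; an in-place docValues update should leave numDocs/maxDoc/deletedDocs unchanged (the _version_ changes in both cases).

curl "http://localhost:8983/solr/MyCollection/admin/luke?numTerms=0&wt=json"
curl "http://localhost:8983/solr/MyCollection/select?q=ID:1133&fl=ID,_version_&wt=json"
# send the {"ID":1133,"Property_2":{"set":124},...} update, commit, then repeat both requests and compare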
Re: Using pint field as uniqueKey
By looking into the code, if (uniqueKeyField.getType().isPointField()) { String msg = UNIQUE_KEY + " field ("+uniqueKeyFieldName+ ") can not be configured to use a Points based FieldType: " + uniqueKeyField.getType().getTypeName(); log.error(msg); throw new SolrException(ErrorCode.SERVER_ERROR, msg); } I am not sure of the exact reason; someone else can weigh in here, but PointFields are not allowed to be unique keys, probably because of how they are structured and stored on disk. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Tue, Oct 17, 2017 at 1:49 PM, Michael Kondratiev < kondratiev.mich...@gmail.com> wrote: > I'm trying to set up uniqueKey ( what is integer) like that: > > > required="true" multiValued="false"/> > id > > > But when I upload configuration into solr i see following error: > > > > uniqueKey field (id) can not be configured to use a Points based > FieldType: pint > > If i set type=“string” everything seems to be ok.
Re: Howto verify that update is "in-place"
James, @Amrit: Are you saying that the _version_ field should not change when > performing an atomic update operation? It should change. a new version will be allotted to the document. I am not that sure about in-place updates, probably a test run will verify that. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Tue, Oct 17, 2017 at 4:06 PM, James wrote: > Hi Emir and Amrit, thanks for your reponses! > > @Emir: Nice idea but after changing any document in any way and after > committing the changes, all Doc counter (Num, Max, Deleted) are still the > same, only thing that changes is the Version (increases by steps of 2) . > > @Amrit: Are you saying that the _version_ field should not change when > performing an atomic update operation? > > Thanks > James > > > -----Ursprüngliche Nachricht- > Von: Amrit Sarkar [mailto:sarkaramr...@gmail.com] > Gesendet: Dienstag, 17. Oktober 2017 11:35 > An: solr-user@lucene.apache.org > Betreff: Re: Howto verify that update is "in-place" > > Hi James, > > As for each update you are doing via atomic operation contains the "id" / > "uniqueKey". Comparing the "_version_" field value for one of them would be > fine for a batch. Rest, Emir has list them out. > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com/lucidworks > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > On Tue, Oct 17, 2017 at 2:47 PM, Emir Arnautović < > emir.arnauto...@sematext.com> wrote: > > > Hi James, > > I did not try, but checking max and num doc might give you info if > > update was in-place or atomic - atomic is reindexing of existing doc > > so the old doc will be deleted. In-place update should just update doc > > values of existing doc so number of deleted docs should not change. > > > > HTH, > > Emir > > -- > > Monitoring - Log Management - Alerting - Anomaly Detection Solr & > > Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > > > > > On 17 Oct 2017, at 09:57, James wrote: > > > > > > I am using Solr 6.6 and carefully read the documentation about > > > atomic and in-place updates. I am pretty sure that everything is set > > > up as it > > should. > > > > > > > > > > > > But how can I make certain that a simple update command actually > > performs an > > > in-place update without internally re-indexing all other fields? > > > > > > > > > > > > I am issuing this command to my server: > > > > > > (I am using implicit document routing, so I need the "Shard" > > > parameter.) 
> > > > > > > > > > > > { > > > > > > "ID":1133, > > > > > > "Property_2":{"set":124}, > > > > > > "Shard":"FirstShard" > > > > > > } > > > > > > > > > > > > > > > > > > The log outputs: > > > > > > > > > > > > 2017-10-17 07:39:18.701 INFO (qtp1937348256-643) [c:MyCollection > > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] > > > o.a.s.u.p.LogUpdateProcessorFactory > > > [MyCollection_FirstShard_replica1] > > > webapp=/solr path=/update > > > params={commitWithin=1000&boost=1.0&overwrite=true&wt= > > json&_=1508221142230}{ > > > add=[1133 (1581489542869811200)]} 0 1 > > > > > > 2017-10-17 07:39:19.703 INFO (commitScheduler-283-thread-1) > > [c:MyCollection > > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] > > > o.a.s.u.DirectUpdateHandler2 start > > > commit{,optimize=false,openSearcher=false,waitSearcher=true, > > expungeDeletes=f > > > alse,softCommit=true,prepareCommit=false} > > > > > > 2017-10-17 07:39:19.703 INFO (commitScheduler-283-thread-1) > > [c:MyCollection > > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] > > > o.a.s.s.SolrIndexSearcher Opening > > > [Searcher@32d539b4[MyCollection_FirstShard_replica1] main] > > > > > > 2017-10-17 07:39:19.703 INFO (commitScheduler-283-thread-1) > > [c:MyCollection > > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] > > > o.a.s.u.DirectUpdateHandler2 end_commit_flush > &g
Re: solr 7.0: What causes the segment to flush
> > In 7.0, i am finding that the file is written to disk very early on > and it is being updated every second or so. Had something changed in 7.0 > which is causing it? I tried something similar with solr 6.5 and i was > able to get almost a GB size files on disk. Interesting observation, Nawab, with ramBufferSizeMB=20G, you are getting 20GB segments on 6.5 or less? a GB? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Tue, Oct 17, 2017 at 12:48 PM, Nawab Zada Asad Iqbal wrote: > Hi, > > I have tuned (or tried to tune) my settings to only flush the segment > when it has reached its maximum size. At the moment,I am using my > application with only a couple of threads (i have limited to one thread for > analyzing this scenario) and my ramBufferSizeMB=2 (i.e. ~20GB). With > this, I assumed that my file sizes on the disk will be at in the order of > GB; and no segments will be flushed until the segment's in memory size is > 2GB. In 7.0, i am finding that the file is written to disk very early on > and it is being updated every second or so. Had something changed in 7.0 > which is causing it? I tried something similar with solr 6.5 and i was > able to get almost a GB size files on disk. > > How can I control it to not write to disk until the segment has reached its > maximum permitted size (1945 MB?) ? My write traffic is 'new only' (i.e., > it doesn't delete any document) , however I also found following infostream > logs, which incorrectly say 'delete=true': > > Oct 16, 2017 10:18:29 PM INFO (qtp761960786-887) [ x:filesearch] > o.a.s.c.S.Request [filesearch] webapp=/solr path=/update > params={commit=false} status=0 QTime=21 > Oct 16, 2017 10:18:29 PM INFO (qtp761960786-889) [ x:filesearch] > o.a.s.u.LoggingInfoStream [DW][qtp761960786-889]: anyChanges? > numDocsInRam=4434 deletes=true hasTickets:false pendingChangesInFullFlush: > false > Oct 16, 2017 10:18:29 PM INFO (qtp761960786-889) [ x:filesearch] > o.a.s.u.LoggingInfoStream [IW][qtp761960786-889]: nrtIsCurrent: infoVersion > matches: false; DW changes: true; BD changes: false > Oct 16, 2017 10:18:29 PM INFO (qtp761960786-889) [ x:filesearch] > o.a.s.c.S.Request [filesearch] webapp=/solr path=/admin/luke > params={show=index&numTerms=0&wt=json} status=0 QTime=0 > > > > Thanks > Nawab >
Re: Using pint field as uniqueKey
https://issues.apache.org/jira/browse/SOLR-10829: IndexSchema should enforce that uniqueKey field must not be points based The description tells the real reason. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Tue, Oct 17, 2017 at 5:42 PM, alessandro.benedetti wrote: > In addition to what Amrit correctly stated, if you need to search on your > id, > especially range queries, I recommend to use a copy field and leave the id > field, almost as default. > > Cheers > > > > - > --- > Alessandro Benedetti > Search Consultant, R&D Software Engineer, Director > Sease Ltd. - www.sease.io > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
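For completeness, a hedged sketch of the workaround Alessandro describes (field and type names here are examples, not from the original schema): keep the uniqueKey as a string and copy it into a points-based field for range queries and sorting.

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="id_numeric" type="pint" indexed="true" stored="false" docValues="true"/>
<copyField source="id" dest="id_numeric"/>
<uniqueKey>id</uniqueKey>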
Re: Merging is not taking place with tiered merge policy
Chandru, I didn't try the above config, but why have you defined both "mergePolicy" and "mergePolicyFactory", and passed different values for the same parameters? > 10 > 1 > > > 10 > 10 > > Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Mon, Oct 23, 2017 at 11:00 AM, Chandru Shanmugasundaram < chandru.shanmugasunda...@exterro.com> wrote: > The following is my solrconfig.xml > > > 1000 > 1 > 15 > false > 1024 > > 10 > 1 > > > 10 > 10 > > hdfs > > 1 > 0 > > > > Please let me know if should I tweak something above > > > -- > Thanks, > Chandru.S >
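For reference, a hedged sketch of a single, non-conflicting definition on Solr 6.x, keeping only mergePolicyFactory (the values are examples, not tuning advice):

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>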
Solr requires hl.fl and df to be the same field for correct highlighting.
Solr version: 6.5.x Why do we need to pass the same field to hl.fl and df for correct highlighting? Suppose I am highlighting on field fieldA, which has a stemming filter in its analysis chain. Sample doc: {"id":"1", "fieldA":"Vacation"} If I then send a highlighting request: > "params":{ > "q":"Vacation", > "hl":"on", > "indent":"on", > "hl.fl":"fieldA", > "wt":"json"} highlighting doesn't work: "Vacation" is parsed against _text_ (text_general), so the query term stays "Vacation", while in fieldA's index it is stored as "vacat". I debugged through the code; at HighlightComponent::169 highlightQuery = rb.getQparser().getHighlightQuery(); the highlight query that is passed down is the analysed form of the parsed query, in this case _text_:Vacation. Fast-forwarding to WeightedSpanTermExtractor::extractWeightedTerms::366: for (final Term queryTerm : nonWeightedTerms) { > if (fieldNameComparator(queryTerm.field())) { > WeightedSpanTerm weightedSpanTerm = new WeightedSpanTerm(boost, > queryTerm.text()); > terms.put(queryTerm.text(), weightedSpanTerm); > } > } The extracted term is "Vacation". Jumping to the core highlighting code, Highlighter::getBestTextFragments::213: TokenGroup tokenGroup=new TokenGroup(tokenStream); each tokenStream has the analysed tokens ("vacat"), which obviously do not match the extracted term. Why do the df/qf values matter for what we pass in "hl.fl"? Shouldn't the query being highlighted be analysed by the field passed in "hl.fl"? But then multiple fields can be passed in "hl.fl". Just wondering how this is supposed to be done. Any explanation would be appreciated. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2
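A hedged illustration of the workaround this implies: make the parsed query target the field being highlighted, either by qualifying the term or by pointing df at that field, so the extracted terms go through fieldA's analysis and line up with the indexed tokens.

q=fieldA:Vacation&hl=on&hl.fl=fieldA&wt=json
q=Vacation&df=fieldA&hl=on&hl.fl=fieldA&wt=json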
Re: Streaming Expression - cartesianProduct
Following Pratik's spot-on comment, and not really related to your question: when using "parallel" streaming, the "partitionKeys" parameter also needs to be set to the "over" field. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Thu, Nov 2, 2017 at 2:38 AM, Pratik Patel wrote: > Roll up needs documents to be sorted by the "over" field. > Check this for more details > http://lucene.472066.n3.nabble.com/Streaming-Expressions-rollup-function- > returning-results-with-duplicate-tuples-td4342398.html > > On Wed, Nov 1, 2017 at 3:41 PM, Kojo wrote: > > > Wrap cartesianProduct function with fetch function works as expected. > > > > But rollup function over cartesianProduct doesn´t aggregate on a returned > > field of the cartesianProduct. > > > > > > The field "id_researcher" bellow is a Multivalued field: > > > > > > > > This one works: > > > > > > fetch(reasercher, > > > > cartesianProduct( > > having( > > cartesianProduct( > > search(schoolarship,zkHost="localhost:9983",qt="/export", > > q="*:*", > > fl="process, area, id_reasercher",sort="process asc"), > > area > > ), > > eq(area, val(Anything))), > > id_reasercher), > > fl="name, django_id", > > on="id_reasercher=django_id" > > ) > > > > > > This one doesn´t works: > > > > rollup( > > > > cartesianProduct( > > having( > > cartesianProduct( > > search(schoolarship,zkHost="localhost:9983",qt="/export", > > q="*:*", > > fl="process, area, id_researcher, status",sort="process asc"), > > area > > ), > > eq(area, val(Anything))), > > id_researcher), > > over=id_researcher,count(*) > > ) > > > > If I aggregate over a non MultiValued field, it works. > > > > > > Is that correct, rollup doesn´t work on a cartesianProduct? > > >
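Not from the thread itself, but a generic hedged sketch of that point, with example collection and field names and a single-valued field: the partitionKeys, the export sort, and the rollup's over field all reference the same field, so each worker receives whole groups.

parallel(collection1,
  rollup(
    search(collection1, zkHost="localhost:9983", qt="/export",
           q="*:*", fl="a_s,a_i", sort="a_s asc", partitionKeys="a_s"),
    over="a_s", count(*)),
  workers="2", zkHost="localhost:9983", sort="a_s asc")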
Re: SolrClould 6.6 stability challenges
Pretty much what Emir has stated. I want to know, when you saw; all of this runs perfectly ok when indexing isn't happening. as soon as > we start "nrt" indexing one of the follower nodes goes down within 10 to 20 > minutes. When you say "NRT" indexing, what is the commit strategy in indexing. With auto-commit so highly set, are you committing after batch, if yes, what's the number. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Sat, Nov 4, 2017 at 2:47 PM, Emir Arnautović < emir.arnauto...@sematext.com> wrote: > Hi Rick, > Do you see any errors in logs? Do you have any monitoring tool? Maybe you > can check heap and GC metrics around time when incident happened. It is not > large heap but some major GC could cause pause large enough to trigger some > snowball and end up with node in recovery state. > What is indexing rate you observe? Why do you have max warming searchers 5 > (did you mean this with autowarmingsearchers?) when you commit every 5 min? > Why did you increase it - you seen errors with default 2? Maybe you commit > every bulk? > Do you see similar behaviour when you just do indexing without queries? > > Thanks, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 4 Nov 2017, at 05:15, Rick Dig wrote: > > > > hello all, > > we are trying to run solrcloud 6.6 in a production setting. > > here's our config and issue > > 1) 3 nodes, 1 shard, replication factor 3 > > 2) all nodes are 16GB RAM, 4 core > > 3) Our production load is about 2000 requests per minute > > 4) index is fairly small, index size is around 400 MB with 300k documents > > 5) autocommit is currently set to 5 minutes (even though ideally we would > > like a smaller interval). > > 6) the jvm runs with 8 gb Xms and Xmx with CMS gc. > > 7) all of this runs perfectly ok when indexing isn't happening. as soon > as > > we start "nrt" indexing one of the follower nodes goes down within 10 to > 20 > > minutes. from this point on the nodes never recover unless we stop > > indexing. the master usually is the last one to fall. > > 8) there are maybe 5 to 7 processes indexing at the same time with > document > > batch sizes of 500. > > 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5, > > 10) no cpu and / or oom issues that we can see. > > 11) cpu load does go fairly high 15 to 20 at times. > > any help or pointers appreciated > > > > thanks > > rick > >
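For context on the commit question, a hedged sketch of a solrconfig.xml pattern often used for heavy NRT indexing, where the indexing client sends no explicit commits and visibility is left to the soft commit (intervals below are examples, not a recommendation for this particular cluster):

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>30000</maxTime>
</autoSoftCommit>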
Re: Incorrect ngroup count
Zheng, Usually, the number of records returned is more than what is shown in the > ngroup. For example, I may get a ngroup of 22, but there are 25 records > being returned. Do the 25 records being returned have duplicates? Grouping is subject to co-location of documents with the same group value in the same shard. Can you share the architecture of the setup? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Nov 7, 2017 at 8:36 AM, Zheng Lin Edwin Yeo wrote: > Hi, > > I'm using Solr 6.5.1, and I'm facing the issue of incorrect ngroup count > after I have group it by signature field. > > Usually, the number of records returned is more than what is shown in the > ngroup. For example, I may get a ngroup of 22, but there are 25 records > being returned. > > Below is the part of solrconfig.xml that does the grouping. > > "solr.processor.SignatureUpdateProcessorFactory"> name="enabled">true > signature "overwriteDupes">false content "signatureClass">solr.processor.Lookup3Signature < processor class="solr.DistributedUpdateProcessorFactory" /> class ="solr.LogUpdateProcessorFactory" /> "solr.RunUpdateProcessorFactory" /> > > > This is where I set the grouping to true in the requestHandler > > true signature name="group.main">true < str name="group.cache.percent">100 > > What could be the issue that causes this? > > Regards, > Edwin >
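As a general, hedged illustration of the co-location point (not a diagnosis of this particular setup): on a multi-shard collection, group counts are only exact when every document sharing a group value lands on the same shard, which with the default compositeId router is usually arranged by prefixing the routing value to the id, e.g.

{"id":"groupvalue!doc25", "signature":"groupvalue", ...}

With a signature computed at index time by the update processor, that prefix would have to be derived on the client side, so it may not be practical here.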
Re: Long blocking during indexing + deleteByQuery
Maybe not a relevant fact on this, but: "addAndDelete" is triggered by "*Reordering of DBQs'; *that means there are non-executed DBQs present in the updateLog and an add operation is also received. Solr makes sure DBQs are executed first and than add operation is executed. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Nov 7, 2017 at 9:19 PM, Erick Erickson wrote: > Well, consider what happens here. > > Solr gets a DBQ that includes document 132 and 10,000,000 other docs > Solr gets an add for document 132 > > The DBQ takes time to execute. If it was processing the requests in > parallel would 132 be in the index after the delete was over? It would > depend on when the DBQ found the doc relative to the add. > With this sequence one would expect 132 to be in the index at the end. > > And it's worse when it comes to distributed indexes. If the updates > were sent out in parallel you could end up in situations where one > replica contained 132 and another didn't depending on the vagaries of > thread execution. > > Now I didn't write the DBQ code, but that's what I think is happening. > > Best, > Erick > > On Tue, Nov 7, 2017 at 7:40 AM, Chris Troullis > wrote: > > As an update, I have confirmed that it doesn't seem to have anything to > do > > with child documents, or standard deletes, just deleteByQuery. If I do a > > deleteByQuery on any collection while also adding/updating in separate > > threads I am experiencing this blocking behavior on the non-leader > replica. > > > > Has anyone else experienced this/have any thoughts on what to try? > > > > On Sun, Nov 5, 2017 at 2:20 PM, Chris Troullis > wrote: > > > >> Hi, > >> > >> I am experiencing an issue where threads are blocking for an extremely > >> long time when I am indexing while deleteByQuery is also running. > >> > >> Setup info: > >> -Solr Cloud 6.6.0 > >> -Simple 2 Node, 1 Shard, 2 replica setup > >> -~12 million docs in the collection in question > >> -Nodes have 64 GB RAM, 8 CPUs, spinning disks > >> -Soft commit interval 10 seconds, Hard commit (open searcher false) 60 > >> seconds > >> -Default merge policy settings (Which I think is 10/10). > >> > >> We have a query heavy index heavyish use case. Indexing is constantly > >> running throughout the day and can be bursty. The indexing process > handles > >> both updates and deletes, can spin up to 15 simultaneous threads, and > sends > >> to solr in batches of 3000 (seems to be the optimal number per trial and > >> error). > >> > >> I can build the entire collection from scratch using this method in < 40 > >> mins and indexing is in general super fast (averages about 3 seconds to > >> send a batch of 3000 docs to solr). The issue I am seeing is when some > >> threads are adding/updating documents while other threads are issuing > >> deletes (using deleteByQuery), solr seems to get into a state of extreme > >> blocking on the replica, which results in some threads taking 30+ > minutes > >> just to send 1 batch of 3000 docs. This collection does use child > documents > >> (hence the delete by query _root_), not sure if that makes a > difference, I > >> am trying to duplicate on a non-child doc collection. CPU/IO wait seems > >> minimal on both nodes, so not sure what is causing the blocking. 
> >> > >> Here is part of the stack trace on one of the blocked threads on the > >> replica: > >> > >> qtp592179046-576 (576) > >> java.lang.Object@608fe9b5 > >> org.apache.solr.update.DirectUpdateHandler2.addAndDelete( > >> DirectUpdateHandler2.java:354) > >> org.apache.solr.update.DirectUpdateHandler2.addDoc0( > >> DirectUpdateHandler2.java:237) > >> org.apache.solr.update.DirectUpdateHandler2.addDoc( > >> DirectUpdateHandler2.java:194) > >> org.apache.solr.update.processor.RunUpdateProcessor.processAdd( > >> RunUpdateProcessorFactory.java:67) > >> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd( > >> UpdateRequestProcessor.java:55) > >> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd( > >> DistributedUpdateProcessor.java:979) > >> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd( > >> DistributedUpdateProcessor
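Not the resolution of this thread, just a hedged SolrJ sketch of a common way to sidestep the addAndDelete path: resolve the delete query to ids first and use deleteById instead of deleteByQuery (collection name, query, and batch size are examples; very large result sets would need cursorMark paging).

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocument;

// Sketch: collect the matching ids, then delete by id rather than by query.
public class DeleteByIdSketch {
    static void deleteMatching(SolrClient client, String collection, String query) throws Exception {
        SolrQuery q = new SolrQuery(query);
        q.setFields("id");
        q.setRows(1000); // example batch size
        List<String> ids = new ArrayList<>();
        for (SolrDocument d : client.query(collection, q).getResults()) {
            ids.add((String) d.getFieldValue("id"));
        }
        if (!ids.isEmpty()) {
            client.deleteById(collection, ids);
        }
    }
}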
Re: Streaming Expression usage
Kojo, I am not sure what you mean by making two requests to get documents. A "search" streaming expression can be passed an "fq" parameter to filter the results, and a rollup on top of that will fetch the desired results. This may not be mentioned in the official docs. Sample streaming expression: expr=rollup( search(collection1, zkHost="localhost:9983", qt="/export", q="*:*", fq="a_s:filter_a", fl="id,a_s,a_i,a_f", sort="a_f asc"), over=a_f) Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Wed, Nov 8, 2017 at 7:41 AM, Kojo wrote: > Hi, > I am working on PoC of a front-end web to provide an interface to the end > user search and filter data on Solr indexes. > > I am trying Streaming Expression for about a week and I am fairly keen > about using it to search and filter indexes on Solr side. But I am not sure > whether this is the right approach or not. > > A simple question to illustrate my doubts: If use the search and some > Streaming Expressions more to get and filter the indexes to get documents, > and I want to rollup the result, will I have to make two requests? Is this > a good use for Streaming Expressions? >
Re: How to routing document for send to particular shard range
Ketan, Since you know the intended indexing architecture, isn't it better to use the "implicit" router and write the routing logic on your own end? If the document belongs to "Org1", send the document with the extra param "_route_=shard1", and likewise for the others. Snippet from the official doc: https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html#ShardsandIndexingDatainSolrCloud-DocumentRouting : If you created the collection and defined the "implicit" router at the time > of creation, you can additionally define a router.field parameter to use a > field from each document to identify a shard where the document belongs. If > the field specified is missing in the document, however, the document will > be rejected. You could also use the _route_ parameter to name a specific > shard. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Wed, Nov 8, 2017 at 11:15 AM, Ketan Thanki wrote: > Hi, > > I have requirement now quite different as I need to set routing key hash > for document which confirm it to send to particular shard as its range. > > I have solrcloud configuration with 4 shard & 4 replica with below shard > range. > shard1: 8000-bfff > shard2: c000- > shard3: 0-3fff > shard4: 4000-7fff > > e.g: below show the project works in organization which is my routing key. > Org1= works for project1,project2 > Org2=works for project3 > Org3=works for project4 > Org4=project5 > > So as mentions above I want to index org1 to shard1,org2 to shard2,org3 to > shard3,org4 to shard4 meanwhile send it to particular shard. > How could I manage compositeId routing to do this. > > Regards, > Ketan. > Please cast a vote for Asite in the 2017 Construction Computing Awards: > Click here to Vote<http://caddealer.com/concompawards/index.php?page= cca2017vote> > > [CC Award Winners!] > >
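A hedged sketch of that approach (collection, field, and shard names are examples): create the collection with the implicit router, then address the target shard per request with the _route_ parameter (or per document via router.field).

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=orgs&router.name=implicit&shards=shard1,shard2,shard3,shard4&replicationFactor=1"
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/orgs/update?_route_=shard1' --data-binary '[{"id":"org1-doc1","org_s":"Org1"}]'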
Re: Atomic Updates with SolrJ
Hi Martin, I tested the same SolrJ application code on my system and it worked just fine on Solr 6.6.x. My Solr client is "CloudSolrClient", which I don't think makes any difference. Can you show the response and the field declarations if you are still facing the issue? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Thu, Nov 9, 2017 at 1:55 PM, Martin Keller < martin.kel...@unitedplanet.com> wrote: > Hello, > > I’m trying to Update a field in a document via SolrJ. Unfortunately, while > the field itself is updated correctly, values of some other fields are > removed. > The code looks like this: > > SolrInputDocument updateDoc = new SolrInputDocument(); > > updateDoc.addField("id", "1234"); > > Map updateValue = new HashMap<>(); > updateValue.put("set", 1); > updateDoc.addField("fieldToUpdate", updateValue); > > final UpdateRequest request; > > request = new UpdateRequest(); > request.add(updateDoc); > > request.process(solrClient, "myCollection"); > solrClient.commit(); > > > If I send a similar request with curl, e.g. > > curl -X POST -H 'Content-Type: application/json' ' http://localhost:8983/solr/myCollection/update' --data-binary > '[{"id":"1234", "fieldToUpdate":{"set":"1"}}]' > > it works as expected. > I’m using Solr 6.0.1, but the problem also occurs in 6.6.0. > > Any ideas? > > Thanks > Martin >
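One general Solr rule worth checking for the symptom Martin describes, offered as a hedged pointer rather than a diagnosis: atomic updates rebuild the document from its stored/docValues fields, so every field that has to survive the update needs stored="true" (or suitable docValues), and copyField destinations should not be stored. A schema sketch with example field names:

<field name="fieldToUpdate" type="pint" indexed="true" stored="true"/>
<field name="some_other_field" type="string" indexed="true" stored="true"/>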
Re: solr cloud updatehandler stats mismatch
Wei, Does the collection the requests are going to have multiple shards and replicas? Please keep in mind that an update request is received by a node, redirected to the particular shard the doc belongs to, and then distributed to the replicas of the collection. On each replica (each core) the update request is replayed. That can be a probable reason for the mismatch between the MBeans stats and manual counting in the logs, since not everything gets logged; that would need to be checked. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Thu, Nov 9, 2017 at 4:34 PM, Furkan KAMACI wrote: > Hi Wei, > > Do you compare it with files which are under /var/solr/logs by default? > > Kind Regards, > Furkan KAMACI > > On Sun, Nov 5, 2017 at 6:59 PM, Wei wrote: > > > Hi, > > > > I use the following api to track the number of update requests: > > > > /solr/collection1/admin/mbeans?cat=UPDATE&stats=true&wt=json > > > > > > Result: > > > > > >- class: "org.apache.solr.handler.UpdateRequestHandler", > >- version: "6.4.2.1", > >- description: "Add documents using XML (with XSLT), CSV, JSON, or > >javabin", > >- src: null, > >- stats: > >{ > > - handlerStart: 1509824945436, > > - requests: 106062, > > - ... > > > > > > I am quite confused that the number of requests reported above is quite > > different from the count from solr access logs. A few times the handler > > stats is much higher: handler reports ~100k requests but in the access > log > > there are only 5k update requests. What could be the possible cause? > > > > Thanks, > > Wei > > >
Re: Make search on the particular field to be case sensitive
Behavior of the field values is defined by the fieldType's analyzer declaration. If you look at the managed-schema you will find fieldType declarations like:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

In your case the fieldType is "string". You need to write an analyzer chain for that fieldType and NOT include:

<filter class="solr.LowerCaseFilterFactory"/>

LowerCaseFilterFactory is what lowercases the tokens, both while indexing and at query time.

Something like this will work for you:

<fieldType name="string_cs" class="solr.StrField" docValues="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

I listed "KeywordTokenizerFactory" considering this is a string, not text.

More details on: https://lucene.apache.org/solr/guide/6_6/analyzers.html

Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2

On Thu, Nov 9, 2017 at 4:41 PM, Karan Saini wrote: > Hi guys, > > Solr version :: 6.6.1 > > I have around 10 fields in my core. I want to make the search on this > specific field to be case sensitive. Please advise, how to introduce case > sensitivity at the field level. What changes do i need to make for this > field ? > > Thanks, > Karan >
Re: Make search on the particular field to be case sensitive
Ah ok. I didn't test and laid it over. Thank you Erick for correcting me out. On 9 Nov 2017 9:06 p.m., "Erick Erickson" wrote: > This won't quite work. "string" types are totally un-analyzed you > cannot add filters to a solr.StrField, you must use solr.TextField > rather than solr.StrField. > > > docValues="true"/> > > > > > > > > start over and re-index from scratch in a new collection of course. > > You also need to make sure you really want to search on the whole > field. The KeywordTokenizerFactory doesn't split the incoming test up > _at all_. So if the input is > "my dog has fleas" you can't search for just "dog" unless you use the > extremely inefficient *dog* form. If you want to search for words, use > an tokenizer that breaks up the input, WhitespaceTokenizer for > instance. > > Best, > Erick > > On Thu, Nov 9, 2017 at 3:24 AM, Amrit Sarkar > wrote: > > Behavior of the field values is defined by fieldType analyzer > declaration. > > > > If you look at the managed-schema; > > > > You will find fieldType declarations like: > > > > positionIncrementGap="100"> > >> > >> >> ignoreCase="true"/> > >> class="solr.EnglishPossessiveFilterFactory"/> >> "solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> >> class="solr.PorterStemFilterFactory"/> type="query"> > >> >> "solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" > synonyms= > >> "synonyms.txt"/> >> "lang/stopwords_en.txt" ignoreCase="true"/> >> "solr.LowerCaseFilterFactory"/> >> "solr.EnglishPossessiveFilterFactory"/> >> "solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> >> class="solr.PorterStemFilterFactory"/> > > > > > > In you case fieldType is "string". *You need to write analyzer chain for > > the same fieldType and don't include:* > > > > > > LowerCaseFilterFactory is responsible lowercase the token coming in query > > and while indexing. > > > > Something like this will work for you: > > > > > docValues="true"/> > > > > fieldType> > > > > I listed "KeywordTokenizerFactory" considering this is string, not text. > > > > More details on: https://lucene.apache.org/solr/guide/6_6/analyzers.html > > > > Amrit Sarkar > > Search Engineer > > Lucidworks, Inc. > > 415-589-9269 > > www.lucidworks.com > > Twitter http://twitter.com/lucidworks > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > Medium: https://medium.com/@sarkaramrit2 > > > > On Thu, Nov 9, 2017 at 4:41 PM, Karan Saini > wrote: > > > >> Hi guys, > >> > >> Solr version :: 6.6.1 > >> > >> ** > >> > >> I have around 10 fields in my core. I want to make the search on this > >> specific field to be case sensitive. Please advise, how to introduce > case > >> sensitivity at the field level. What changes do i need to make for this > >> field ? > >> > >> Thanks, > >> Karan > >> >
Re: How to routing document for send to particular shard range
Ketan, here I have also created new field 'core' which value is any shard where I > need to send documents and on retrieval use '_route_' parameter with > mentioning the particular shard. But issue facing still my > clusterstate.json showing the "router":{"name":"compositeId"} is it means > my settings not impacted? or its default. Only answering this query, as Erick has already covered the rest in the comment above. You need to RECREATE the collection, passing "router.field" in the "create collection" API parameters, because "router.field" is a collection-specific property maintained in ZooKeeper (state.json / clusterstate.json); a sample CREATE call is sketched at the end of this mail. https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-create I highly recommend not altering core.properties manually when dealing with SolrCloud; instead rely on the SolrCloud APIs to make the necessary changes. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Fri, Nov 10, 2017 at 5:23 PM, Ketan Thanki wrote: > Hi Erik, > > My requirement to index the documents of particular organization to > specific shard. Also I have made changes in core.properties as menions > below. > > Model Collection: > name=model > shard=shard1 > collection=model > router.name=implicit > router.field=core > shards=shard1,shard2 > > Workset Collection: > name=workset > shard=shard1 > collection=workset > router.name=implicit > router.field=core > shards=shard1,shard2 > > here I have also created new field 'core' which value is any shard where I > need to send documents and on retrieval use '_route_' parameter with > mentioning the particular shard. But issue facing still my > clusterstate.json showing the "router":{"name":"compositeId"} is it means > my settings not impacted? or its default. > > Please do needful. > > Regards, > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Friday, November 10, 2017 12:06 PM > To: solr-user > Subject: Re: How to routing document for send to particular shard range > > You cannot just make configuration changes, whether you use implicit or > compositeId is defined when you _create_ the collection and cannot be > changed later. > > You need to create a new collection and specify router.name=implicit when > you create it. Then you can route documents as you desire. > > I would caution against this though. If you use implicit routing _you_ > have to insure balancing. For instance, you could have 10,000,000 documents > for "Org1" and 15 for "Org2", resulting in hugely unbalanced shards. > > Implicit routing is particularly useful for time-series indexing, where > you, say, index a day's worth of documents to each shard. It may be > appropriate in your case, but so far you haven't told us _why_ you think > routing docs to particular shards is desirable. > > Best, > Erick > > On Thu, Nov 9, 2017 at 10:27 PM, Ketan Thanki wrote: > > Thanks Amrit, > > > > For suggesting me the approach. > > > > I have got some understanding regarding to it and i need to implement > implicit routing for specific shard based. I have try by make changes on > core.properties. but it can't work So can you please let me for the > configuration changes needed. Is it need to create extra field for document > to rout?
> > > > I have below configuration Collection created manually: > > 1: Workset with 4 shard and 4 replica > > 2: Model with 4 shard and 4 replica > > > > > > For e.g Core.properties for 1 shard : > > Workset Colection: > > name=workset > > shard=shard1 > > collection=workset > > > > Model Collection: > > name=model > > shard=shard1 > > collection=model > > > > > > So can u please let me the changes needed in configuration for the > implicit routing. > > > > Please do needful. > > > > Regards, > > > > > > -Original Message- > > From: Amrit Sarkar [mailto:sarkaramr...@gmail.com] > > Sent: Wednesday, November 08, 2017 12:36 PM > > To: solr-user@lucene.apache.org > > Subject: Re: How to routing document for send to particular shard > > range > > > > Ketan, > > > > If you know defined indexing architecture; isn't it better to use > "implicit" router by writing logic on your own end. > > > > If the document is of "Org1", send the do
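For reference, a collection routed on a field is created roughly like this (names are placeholders; note the parameter names are router.name and router.field):

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=model&router.name=implicit&shards=shard1,shard2&replicationFactor=2&router.field=core'

With the implicit router, the value of the "core" field (or of the _route_ request parameter) must be the name of the shard the document should land on.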
Re: Nested facet complete wrong counts
Kenny, This is a known behavior in multi-sharded collection where the field values belonging to same facet doesn't reside in same shard. Yonik Seeley has improved the Json Facet feature by introducing "overrequest" and "refine" parameters. Kindly checkout Jira: https://issues.apache.org/jira/browse/SOLR-7452 https://issues.apache.org/jira/browse/SOLR-9432 Relevant blog: https://medium.com/@abb67cbb46b/1acfa77cd90c On 10 Nov 2017 10:02 p.m., "kenny" wrote: > Hi all, > > We are doing some tests in solr 6.6 with json facet api and we get > completely wrong counts for some combination of facets > > Setting: We have a set of fields for 376k documents in our query (total > 120M documents). We work with 2 shards. When doing first a faceting over > the first facet and keeping these numbers, we subsequently do a nested > faceting over both facets. > > Then we add the numbers of sub-facet and expect to get the (approximately) > the same numbers back. Sometimes we get rounding errors of about 1% > difference. But on other occasions it seems to way off > > for example > > Gender (3 values) Country (211 values) > 16226 - 18424 = -2198 (-13.5461604832%) > 282854 - 464387 = -181533 (-64.1790464338%) > 40489 - 47902 = -7413 (-18.3086764306%) > 36672 - 49749 = -13077 (-35.6593586387%) > > Gender (3 values) Status (17 Values) > 16226 - 16273 = -47 (-0.289658572661%) > 282854 - 435974 = -153120 (-54.1339348215%) > 40489 - 49925 = -9436 (-23.305095211%) > 36672 - 54019 = -17347 (-47.3031195462%) > > ... > > These are the typical requests we submit. So note that we have refine and > an overrequest, but we in the case of Gender vs Request we should query all > the buckets anyway. > > {"wt":"json","rows":0,"json.facet":"{\"Status_sfhll\":\"hll( > Status_sf)\",\"Status_sf\":{\"type\":\"terms\",\"field\":\"S > tatus_sf\",\"missing\":true,\"refine\":true,\"overrequest\": > 50,\"limit\":50,\"offset\":0}}","q":"*:*","fq":["type:\"something\""]} > > {"wt":"json","rows":0,"json.facet":"{\"Gender_sf\":{\"type\" > :\"terms\",\"field\":\"Gender_sf\",\"missing\":true,\"refine > \":true,\"overrequest\":10,\"limit\":10,\"offset\":0,\" > facet\":{\"Status_sf\":{\"type\":\"terms\",\"field\":\"Statu > s_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\ > "limit\":50,\"offset\":0}}},\"Gender_sfhll\":\"hll(Gender_ > sf)\"}","q":"*:*","fq":["type:\"something\""]} > > Is this a known bug? Would switching to old facet api resolve this? Are > there other parameters we miss? > > > Thanks > > > kenny > > >
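Written out more readably, the nested facet request you posted is essentially the following; refine:true is the part that asks the coordinator to make a second pass to the shards for buckets they did not return initially, so every shard contributes to the final counts:

json.facet = {
  "Gender_sf": {
    "type": "terms",
    "field": "Gender_sf",
    "missing": true,
    "refine": true,
    "overrequest": 10,
    "limit": 10,
    "facet": {
      "Status_sf": {
        "type": "terms",
        "field": "Status_sf",
        "missing": true,
        "refine": true,
        "overrequest": 50,
        "limit": 50
      }
    }
  }
}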
Re: How to routing document for send to particular shard range
Surely someone else can chim in; but when you say: "so regarding to it we need to index the particular > client data into particular shard so if its manageable than we will > improve the performance as we need" You can / should create different collections for different client data, so that you can for surely improve performance as per need. There are multiple configurations which drives indexing and querying capabilities and incorporating everything in single collection will hinder that flexibility. Also if you need to add new client in future, you don't need to think about sharding again, add new collection and tweak its configuration as per need. Still if you need to use compositeKey to acheive your use-case, I am not sure how to do that honestly. Since shards are predefined when collection will be created. You cannot add more shards and such. You can only split a shard, which will divide the index and hence the hash range. I will strongly recommend you to reconsider your SolrCloud design technique for your use-case. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Mon, Nov 13, 2017 at 7:31 PM, Ketan Thanki wrote: > > Thanks Amrit, > > My requirement to achieve best performance while using document routing > facility in solr so regarding to it we need to index the particular client > data into particular shard so if its manageable than we will improve the > performance as we need. > > Please do needful. > > > Regards, > > > -Original Message- > From: Amrit Sarkar [mailto:sarkaramr...@gmail.com] > Sent: Friday, November 10, 2017 5:34 PM > To: solr-user@lucene.apache.org > Subject: Re: How to routing document for send to particular shard range > > Ketan, > > here I have also created new field 'core' which value is any shard where I > > need to send documents and on retrieval use '_route_' parameter with > > mentioning the particular shard. But issue facing still my > > clusterstate.json showing the "router":{"name":"compositeId"} is it > > means my settings not impacted? or its default. > > > Only answering this query, as Erick has already mentioned in the above > comment. You need to RECREATE the collection passinfg the "route.field" in > the "create collection" api parameters as "route.field" is > collection-specific property maintained at zookeeper (state.json / > clusterstate.json). > > https://lucene.apache.org/solr/guide/6_6/collections- > api.html#CollectionsAPI-create > > I highly recommend not to alter core.properties manually when dealing with > SolrCloud and instead relying on SolrCloud APIs to make necessary change. > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com/lucidworks > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > Medium: https://medium.com/@sarkaramrit2 > > On Fri, Nov 10, 2017 at 5:23 PM, Ketan Thanki wrote: > > > Hi Erik, > > > > My requirement to index the documents of particular organization to > > specific shard. Also I have made changes in core.properties as menions > > below. 
> > > > Model Collection: > > name=model > > shard=shard1 > > collection=model > > router.name=implicit > > router.field=core > > shards=shard1,shard2 > > > > Workset Collection: > > name=workset > > shard=shard1 > > collection=workset > > router.name=implicit > > router.field=core > > shards=shard1,shard2 > > > > here I have also created new field 'core' which value is any shard > > where I need to send documents and on retrieval use '_route_' > > parameter with mentioning the particular shard. But issue facing still > > my clusterstate.json showing the "router":{"name":"compositeId"} is it > > means my settings not impacted? or its default. > > > > Please do needful. > > > > Regards, > > > > -Original Message- > > From: Erick Erickson [mailto:erickerick...@gmail.com] > > Sent: Friday, November 10, 2017 12:06 PM > > To: solr-user > > Subject: Re: How to routing document for send to particular shard > > range > > > > You cannot just make configuration changes, whether you use implicit > > or compositeId is defined when you _create_ the collection and cannot > > be changed later. > > > > You need to create a new collection and specify router.name=implicit > > wh
Re: SOLR not deleting records
A little more information would be beneficial: are COLO1 and COLO2 separate clusters/collections? If yes, do both have the same configuration, and are you positive the IDs you are deleting are actually present in the index on both sides? (A quick way to check is sketched at the end of this mail.) Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Nov 14, 2017 at 12:41 PM, vbindal wrote: > We have to SOLR colos. > > We issues a command to delete: IDS DELETED: 1000236662963, > 1000224906023, 1000240171970, 1000241597424, 1000241604072, > 1000241604073, 1000240171754, 1000241604056, 1000241604062, > 1000237569503] > > COLO1 deleted everything but COLO2 skipped some of the records. For ex: > 1000224906023 was not deleted. This happens consistently. > > We are running them in Hard-commit, Soft Commit is off. > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
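For example, something like this confirms whether a given ID is present on each colo and then removes it with a hard commit (host and collection names are placeholders):

# does the document exist on this colo?
curl 'http://colo2-host:8983/solr/mycollection/select?q=id:1000224906023&wt=json'

# delete it by id and hard-commit
curl -X POST -H 'Content-Type: application/json' \
  'http://colo2-host:8983/solr/mycollection/update?commit=true' \
  --data-binary '{"delete":{"id":"1000224906023"}}'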
Re: Index time boosting
Hi Venkat, FYI: index-time boosting has been deprecated in recent versions of Solr/Lucene: https://issues.apache.org/jira/browse/LUCENE-6819. I am not sure which version you are on, but it is best to go through the comments on that JIRA before relying on it. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Nov 14, 2017 at 5:27 PM, Venkateswarlu Bommineni wrote: > Hello Guys, > > I would like to understand how index time boosting works in Solr. and how > it is relates to ommitNorms property in schema.xml. > > and i am trying to understand how it works internally , if you have any > documentation please provide. > > Thanks, > Venkat. >
Re: Leading wildcard searches very slow
Sundeep, You probably want to look at ReversedWildcardFilterFactory: http://lucene.apache.org/solr/6_6_1/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html. It indexes a reversed form of each token, so leading-wildcard queries can be rewritten into fast prefix queries instead of scanning the whole term dictionary (a sample field type is sketched at the end of this mail). Thanks Amrit Sarkar On 18 Nov 2017 6:06 a.m., "Sundeep T" wrote: > Hi, > > We have several indexed string fields which is not tokenized and does not > have docValues enabled. > > When we do leading wildcard searches on these fields they are running very > slow. We were thinking that since this field is indexed, such queries > should be running pretty quickly. We are using Solr 6.6.1. Anyone has ideas > on why these queries are running slow and if there are any ways to speed > them up? > > Thanks > Sundeep >
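A rough sketch of such a field type, assuming the values should stay untokenized as they are today (names and limits here are just examples):

<fieldType name="string_rev" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

The query parser notices the filter in the index analyzer and automatically reverses leading-wildcard terms; the trade-off is a bigger index because tokens are indexed twice.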
Re: Issue with CDCR bootstrapping in Solr 7.1
Hi Tom, I see what you are saying and I too think this is a bug, but I will confirm once on the code. Bootstrapping should happen on all the nodes of the target. Meanwhile can you index more than 100 documents in the source and do the exact same experiment again. Followers will not copy the entire index of Leader unless the difference in versions in docs are more than "numRecordsToKeep", which is default 100, unless you have modified in solrconfig.xml. Looking forward to your analysis. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Thu, Nov 30, 2017 at 9:03 PM, Tom Peters wrote: > I'm running into an issue with the initial CDCR bootstrapping of an > existing index. In short, after turning on CDCR only the leader replica in > the target data center will have the documents replicated and it will not > exist in any of the follower replicas in the target data center. All > subsequent incremental updates made to the source datacenter will appear in > all replicas in the target data center. > > A little more details: > > I have two clusters setup, a source cluster and a target cluster. Each > cluster has only one shard and three replicas. I used the configuration > detailed in the Source and Target sections of the reference guide as-is > with the exception of updating the zkHost (https://lucene.apache.org/ > solr/guide/7_1/cross-data-center-replication-cdcr.html# > cdcr-configuration-2). > > The source data center has the following nodes: > solr01-a, solr01-b, and solr01-c > > The target data center has the following nodes: > solr02-a, solr02-b, and solr02-c > > Here are the steps that I've done: > > 1. Create collection in source and target data centers > > 2. Add a number of documents to the source data center > > 3. Verify: > > $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done > solr01-a: 81 > solr01-b: 81 > solr01-c: 81 > solr02-a: 0 > solr02-b: 0 > solr02-c: 0 > > 4. Start CDCR: > > $ curl 'solr01-a:8080/solr/mycollection/cdcr?action=START' > > 5. See if target data center has received the initial index > > $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done > solr01-a: 81 > solr01-b: 81 > solr01-c: 81 > solr02-a: 0 > solr02-b: 0 > solr02-c: 81 > > note: only -c has received the index > > 6. Add another document to the source cluster > > 7. See how many documents are in each node: > > $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done > solr01-a: 82 > solr01-b: 82 > solr01-c: 82 > solr02-a: 1 > solr02-b: 1 > solr02-c: 82 > > > As you can see, the initial index only made it to one of the replicas in > the target data center, but subsequent incremental updates have appeared > everywhere I would expect. Any help would be greatly appreciated, thanks. > > > > This message and any attachment may contain information that is > confidential and/or proprietary. Any use, disclosure, copying, storing, or > distribution of this e-mail or any attached file by anyone other than the > intended recipient is strictly prohibited. If you have received this > message in error, please notify the sender by reply email and delete the > message and any attachments. Thank you. >
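For reference, numRecordsToKeep lives on the update log in solrconfig.xml; raising it looks something like this (the value is only an example):

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <int name="numRecordsToKeep">10000</int>
  </updateLog>
</updateHandler>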
Re: Issue with CDCR bootstrapping in Solr 7.1
Tom, This is very useful: > I found a way to get the follower replicas to receive the documents from > the leader in the target data center, I have to restart the solr instance > running on that server. Not sure if this information helps at all. You have to issue hardcommit on target after the bootstrapping is done. Reloading makes the core opening a new searcher. While explicit commit is issued at target leader after the BS is done, follower are left unattended though the docs are copied over. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Thu, Nov 30, 2017 at 10:06 PM, Tom Peters wrote: > Hi Amrit, > > Starting with more documents doesn't appear to have made a difference. > This time I tried with >1000 docs. Here are the steps I took: > > 1. Deleted the collection on both the source and target DCs. > > 2. Recreated the collections. > > 3. Indexed >1000 documents on source data center, hard commmit > > $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done > solr01-a: 1368 > solr01-b: 1368 > solr01-c: 1368 > solr02-a: 0 > solr02-b: 0 > solr02-c: 0 > > 4. Enabled CDCR and checked docs > > $ curl 'solr01-a:8080/solr/synacor/cdcr?action=START' > > $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done > solr01-a: 1368 > solr01-b: 1368 > solr01-c: 1368 > solr02-a: 0 > solr02-b: 0 > solr02-c: 1368 > > Some additional notes: > > * I do not have numRecordsToKeep defined in my solrconfig.xml, so I assume > it will use the default of 100 > > * I found a way to get the follower replicas to receive the documents from > the leader in the target data center, I have to restart the solr instance > running on that server. Not sure if this information helps at all. > > > On Nov 30, 2017, at 11:22 AM, Amrit Sarkar > wrote: > > > > Hi Tom, > > > > I see what you are saying and I too think this is a bug, but I will > confirm > > once on the code. Bootstrapping should happen on all the nodes of the > > target. > > > > Meanwhile can you index more than 100 documents in the source and do the > > exact same experiment again. Followers will not copy the entire index of > > Leader unless the difference in versions in docs are more than > > "numRecordsToKeep", which is default 100, unless you have modified in > > solrconfig.xml. > > > > Looking forward to your analysis. > > > > Amrit Sarkar > > Search Engineer > > Lucidworks, Inc. > > 415-589-9269 > > www.lucidworks.com > > Twitter http://twitter.com/lucidworks > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > Medium: https://medium.com/@sarkaramrit2 > > > > On Thu, Nov 30, 2017 at 9:03 PM, Tom Peters wrote: > > > >> I'm running into an issue with the initial CDCR bootstrapping of an > >> existing index. In short, after turning on CDCR only the leader replica > in > >> the target data center will have the documents replicated and it will > not > >> exist in any of the follower replicas in the target data center. All > >> subsequent incremental updates made to the source datacenter will > appear in > >> all replicas in the target data center. > >> > >> A little more details: > >> > >> I have two clusters setup, a source cluster and a target cluster. Each > >> cluster has only one shard and three replicas. 
I used the configuration > >> detailed in the Source and Target sections of the reference guide as-is > >> with the exception of updating the zkHost (https://lucene.apache.org/ > >> solr/guide/7_1/cross-data-center-replication-cdcr.html# > >> cdcr-configuration-2). > >> > >> The source data center has the following nodes: > >>solr01-a, solr01-b, and solr01-c > >> > >> The target data center has the following nodes: > >>solr02-a, solr02-b, and solr02-c > >> > >> Here are the steps that I've done: > >> > >> 1. Create collection in source and target data centers > >> > >> 2. Add a number of documents to the source data center > >> > >> 3. Verify: > >> > >>$ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: ";
Re: Issue with CDCR bootstrapping in Solr 7.1
Tom, (and take care not to restart the leader node otherwise it will replicate > from one of the replicas which is missing the index). How is this possible? Ok I will look more into it. Appreciate if someone else also chimes in if they have similar issue. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Fri, Dec 1, 2017 at 4:49 AM, Tom Peters wrote: > Hi Amrit, I tried issuing hard commits to the various nodes in the target > cluster and it does not appear to cause the follower replicas to receive > the initial index. The only way I can get the replicas to see the original > index is by restarting those nodes (and take care not to restart the leader > node otherwise it will replicate from one of the replicas which is missing > the index). > > > > On Nov 30, 2017, at 12:16 PM, Amrit Sarkar > wrote: > > > > Tom, > > > > This is very useful: > > > >> I found a way to get the follower replicas to receive the documents from > >> the leader in the target data center, I have to restart the solr > instance > >> running on that server. Not sure if this information helps at all. > > > > > > You have to issue hardcommit on target after the bootstrapping is done. > > Reloading makes the core opening a new searcher. While explicit commit is > > issued at target leader after the BS is done, follower are left > unattended > > though the docs are copied over. > > > > Amrit Sarkar > > Search Engineer > > Lucidworks, Inc. > > 415-589-9269 > > www.lucidworks.com > > Twitter http://twitter.com/lucidworks > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > Medium: https://medium.com/@sarkaramrit2 > > > > On Thu, Nov 30, 2017 at 10:06 PM, Tom Peters > wrote: > > > >> Hi Amrit, > >> > >> Starting with more documents doesn't appear to have made a difference. > >> This time I tried with >1000 docs. Here are the steps I took: > >> > >> 1. Deleted the collection on both the source and target DCs. > >> > >> 2. Recreated the collections. > >> > >> 3. Indexed >1000 documents on source data center, hard commmit > >> > >> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > >> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; > done > >> solr01-a: 1368 > >> solr01-b: 1368 > >> solr01-c: 1368 > >> solr02-a: 0 > >> solr02-b: 0 > >> solr02-c: 0 > >> > >> 4. Enabled CDCR and checked docs > >> > >> $ curl 'solr01-a:8080/solr/synacor/cdcr?action=START' > >> > >> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > >> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; > done > >> solr01-a: 1368 > >> solr01-b: 1368 > >> solr01-c: 1368 > >> solr02-a: 0 > >> solr02-b: 0 > >> solr02-c: 1368 > >> > >> Some additional notes: > >> > >> * I do not have numRecordsToKeep defined in my solrconfig.xml, so I > assume > >> it will use the default of 100 > >> > >> * I found a way to get the follower replicas to receive the documents > from > >> the leader in the target data center, I have to restart the solr > instance > >> running on that server. Not sure if this information helps at all. > >> > >>> On Nov 30, 2017, at 11:22 AM, Amrit Sarkar > >> wrote: > >>> > >>> Hi Tom, > >>> > >>> I see what you are saying and I too think this is a bug, but I will > >> confirm > >>> once on the code. Bootstrapping should happen on all the nodes of the > >>> target. 
> >>> > >>> Meanwhile can you index more than 100 documents in the source and do > the > >>> exact same experiment again. Followers will not copy the entire index > of > >>> Leader unless the difference in versions in docs are more than > >>> "numRecordsToKeep", which is default 100, unless you have modified in > >>> solrconfig.xml. > >>> > >>> Looking forward to your analysis. > >>> > >>> Amrit Sarkar > >>> Search Engineer > >>> Lucidworks, Inc. > >>> 415-589-9269 > >>> www.lucidworks.com > >>> Twitt
Re: Issue with CDCR bootstrapping in Solr 7.1
Tom, Thank you for trying out bunch of things with CDCR setup. I am successfully able to replicate the exact issue on my setup, this is a problem. I have opened a JIRA for the same: https://issues.apache.org/jira/browse/SOLR-11724. Feel free to add any relevant details as you like. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Dec 5, 2017 at 2:23 AM, Tom Peters wrote: > Not sure how it's possible. But I also tried using the _default config and > just adding in the source and target configuration to make sure I didn't > have something wonky in my custom solrconfig that was causing this issue. I > can confirm that until I restart the follower nodes, they will not receive > the initial index. > > > On Dec 1, 2017, at 12:52 AM, Amrit Sarkar > wrote: > > > > Tom, > > > > (and take care not to restart the leader node otherwise it will replicate > >> from one of the replicas which is missing the index). > > > > How is this possible? Ok I will look more into it. Appreciate if someone > > else also chimes in if they have similar issue. > > > > Amrit Sarkar > > Search Engineer > > Lucidworks, Inc. > > 415-589-9269 > > www.lucidworks.com > > Twitter http://twitter.com/lucidworks > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > Medium: https://medium.com/@sarkaramrit2 > > > > On Fri, Dec 1, 2017 at 4:49 AM, Tom Peters wrote: > > > >> Hi Amrit, I tried issuing hard commits to the various nodes in the > target > >> cluster and it does not appear to cause the follower replicas to receive > >> the initial index. The only way I can get the replicas to see the > original > >> index is by restarting those nodes (and take care not to restart the > leader > >> node otherwise it will replicate from one of the replicas which is > missing > >> the index). > >> > >> > >>> On Nov 30, 2017, at 12:16 PM, Amrit Sarkar > >> wrote: > >>> > >>> Tom, > >>> > >>> This is very useful: > >>> > >>>> I found a way to get the follower replicas to receive the documents > from > >>>> the leader in the target data center, I have to restart the solr > >> instance > >>>> running on that server. Not sure if this information helps at all. > >>> > >>> > >>> You have to issue hardcommit on target after the bootstrapping is done. > >>> Reloading makes the core opening a new searcher. While explicit commit > is > >>> issued at target leader after the BS is done, follower are left > >> unattended > >>> though the docs are copied over. > >>> > >>> Amrit Sarkar > >>> Search Engineer > >>> Lucidworks, Inc. > >>> 415-589-9269 > >>> www.lucidworks.com > >>> Twitter http://twitter.com/lucidworks > >>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > >>> Medium: https://medium.com/@sarkaramrit2 > >>> > >>> On Thu, Nov 30, 2017 at 10:06 PM, Tom Peters > >> wrote: > >>> > >>>> Hi Amrit, > >>>> > >>>> Starting with more documents doesn't appear to have made a difference. > >>>> This time I tried with >1000 docs. Here are the steps I took: > >>>> > >>>> 1. Deleted the collection on both the source and target DCs. > >>>> > >>>> 2. Recreated the collections. > >>>> > >>>> 3. 
Indexed >1000 documents on source data center, hard commmit > >>>> > >>>> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > >>>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; > >> done > >>>> solr01-a: 1368 > >>>> solr01-b: 1368 > >>>> solr01-c: 1368 > >>>> solr02-a: 0 > >>>> solr02-b: 0 > >>>> solr02-c: 0 > >>>> > >>>> 4. Enabled CDCR and checked docs > >>>> > >>>> $ curl 'solr01-a:8080/solr/synacor/cdcr?action=START' > >>>> > >>>> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > >>>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; > >> done > >>&g
Identify Reference Leak in Custom Code related to Solr
Hi, We incorporated https://github.com/sematext/solr-researcher into our project, and it is responsible for a memory/reference leak that is causing multiple SolrIndexSearcher objects to pile up in the heap dump.

37 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x5e0020830", occupy 744,482,384 (48.16%) bytes.

Biggest instances:

- org.apache.solr.search.SolrIndexSearcher @ 0x5fcac64c0 - 108,168,104 (7.00%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x616b414b0 - 54,982,536 (3.56%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x60aaa5820 - 35,614,544 (2.30%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x5ed303418 - 26,742,472 (1.73%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x6c04d8948 - 26,413,728 (1.71%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x66d2f1ca8 - 26,230,600 (1.70%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x624904550 - 25,800,200 (1.67%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x6baa4c5f8 - 25,094,760 (1.62%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x676fefdd0 - 24,720,312 (1.60%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x6634d7a08 - 24,315,864 (1.57%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x652a82880 - 24,186,328 (1.56%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x6ad3ef080 - 24,078,800 (1.56%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x64bf747b0 - 24,073,736 (1.56%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x6a752cce0 - 23,937,584 (1.55%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x698fba4f8 - 23,339,000 (1.51%) bytes.
- org.apache.solr.search.SolrIndexSearcher @ 0x6a12724c0 - 23,066,512 (1.49%) bytes.

We would really appreciate it if someone can help us pin-point the reference leak (since it is an independent third-party plugin). It is taking almost 80% of the total heap memory allocated (16 GB).

Looking forward to positive responses.

Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2
Re: Identify Reference Leak in Custom Code related to Solr
Emir, Solr version: 6.6, SolrCloud We followed the instructions on README.md on the github project. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Mon, Dec 18, 2017 at 5:13 PM, Emir Arnautović < emir.arnauto...@sematext.com> wrote: > Hi Amrit, > I’ll check with my colleague that worked on this. In the meantime, can you > provide more info about setup: Solr version, M-S or cloud and steps that we > can do to reproduce it. > > Thanks, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 18 Dec 2017, at 12:10, Amrit Sarkar wrote: > > > > Hi, > > > > We incorporated *https://github.com/sematext/solr-researcher > > <https://github.com/sematext/solr-researcher>* into our project and it > is > > responsible for memory leak / reference leak which is causing multiple > > *SolrIndexSearcher > > *objects in the heap dump. > > > > 37 instances of *"org.apache.solr.search.SolrIndexSearcher"*, loaded > > by *"org.eclipse.jetty.webapp.WebAppClassLoader > > @ 0x5e0020830"*occupy *744,482,384 (48.16%)* bytes. > > > > Biggest instances: > > > > - org.apache.solr.search.SolrIndexSearcher @ 0x5fcac64c0 - 108,168,104 > > (7.00%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x616b414b0 - 54,982,536 > > (3.56%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x60aaa5820 - 35,614,544 > > (2.30%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x5ed303418 - 26,742,472 > > (1.73%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x6c04d8948 - 26,413,728 > > (1.71%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x66d2f1ca8 - 26,230,600 > > (1.70%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x624904550 - 25,800,200 > > (1.67%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x6baa4c5f8 - 25,094,760 > > (1.62%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x676fefdd0 - 24,720,312 > > (1.60%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x6634d7a08 - 24,315,864 > > (1.57%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x652a82880 - 24,186,328 > > (1.56%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x6ad3ef080 - 24,078,800 > > (1.56%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x64bf747b0 - 24,073,736 > > (1.56%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x6a752cce0 - 23,937,584 > > (1.55%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x698fba4f8 - 23,339,000 > > (1.51%) bytes. > > - org.apache.solr.search.SolrIndexSearcher @ 0x6a12724c0 - 23,066,512 > > (1.49%) bytes. > > > > > > We would really appreciate if some can help us on how to pin-point: > > > > 1. *Reference leak* (since it is an independent third-party plugin). > > > > This is taking almost 80% of the total heap memory allocated (16GB). > > Looking forward to positive responses. > > > > Amrit Sarkar > > Search Engineer > > Lucidworks, Inc. > > 415-589-9269 > > www.lucidworks.com > > Twitter http://twitter.com/lucidworks > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > Medium: https://medium.com/@sarkaramrit2 > >
Regarding embedded ZK with Solr
I would like to understand how the embedded ZK works with Solr. If Xg memory is allocated to the Solr installation and we spin up the SolrCloud with embedded ZK; what part/percentage of the X is allocated to the ZK or is it shared? If that is known, how can I change the memory settings for the embedded ZK? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Re: Step By Step guide to create Solr Cloud in Solr 6.x
Following up Erick's response, This particular article will help setting up Setting up Solr Cloud 6.3.0 with Zookeeper 3.4.6 <https://medium.com/@sarkaramrit2/setting-up-solr-cloud-6-3-0-with-zookeeper-3-4-6-867b96ec4272> Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Re: distribution of leader and replica in SolrCloud
Bernd, When you create a collection via the Collections API, the internal logic tries its best to distribute the replicas equally across the nodes, but sometimes that doesn't happen. The nice thing about SolrCloud is that you can manipulate the cluster layout on the fly using the Collections API: you can delete a replica of one particular shard and add a replica (on a specific machine/node) to any shard at any time, depending on your design.

For the above, you can simply:

call the DELETEREPLICA api on shard1 ---> server2:7574 (or the other one)
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DELETEREPLICA:DeleteaReplica

boss -- shard1
   |        |-- server2:8983 (leader)
   |
   --- shard2 - server1:8983
   |        |-- server5:7575 (leader)
   |
   --- shard3 - server3:8983 (leader)
   |        |-- server4:8983
   |
   --- shard4 - server1:7574 (leader)
   |        |-- server4:7574
   |
   --- shard5 - server3:7574 (leader)
            |-- server5:8983

call the ADDREPLICA api on shard1 ---> server1:8983
https://cwiki.apache.org/confluence/display/solr/Collections+API

boss -- shard1 - server1:8983
   |        |-- server2:8983 (leader)
   |
   --- shard2 - server1:8983
   |        |-- server5:7575 (leader)
   |
   --- shard3 - server3:8983 (leader)
   |        |-- server4:8983
   |
   --- shard4 - server1:7574 (leader)
   |        |-- server4:7574
   |
   --- shard5 - server3:7574 (leader)
            |-- server5:8983

Hope this helps.

Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, May 8, 2017 at 5:08 PM, Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote:
> My assumption was that the strength of SolrCloud is the distribution
> of leader and replica within the Cloud and make the Cloud somewhat
> failsafe.
> But after setting up SolrCloud with a collection I have both, leader and
> replica, on the same shard. And this should be failsafe?
>
> o.a.s.h.a.CollectionsHandler Invoked Collection Action :create with params
> replicationFactor=2&routerName=compositeId&collection.configName=boss&
> maxShardsPerNode=1&name=boss&router.name=compositeId&action=
> CREATE&numShards=5
>
> boss -- shard1 - server2:7574
>    |        |-- server2:8983 (leader)
>    |
>    --- shard2 - server1:8983
>    |        |-- server5:7575 (leader)
>    |
>    --- shard3 - server3:8983 (leader)
>    |        |-- server4:8983
>    |
>    --- shard4 - server1:7574 (leader)
>    |        |-- server4:7574
>    |
>    --- shard5 - server3:7574 (leader)
>             |-- server5:8983
>
> From my point of view, if server2 is going to crash then shard1 will
> disappear and 1/5th of the index is missing.
>
> What is your opinion?
>
> Regards
> Bernd
>
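For reference, the two Collections API calls look roughly like this (the core_node and node names are placeholders; CLUSTERSTATUS will show you the real ones):

curl 'http://server1:8983/solr/admin/collections?action=DELETEREPLICA&collection=boss&shard=shard1&replica=core_node1'

curl 'http://server1:8983/solr/admin/collections?action=ADDREPLICA&collection=boss&shard=shard1&node=server1:8983_solr'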
Re: SPLITSHARD Working
Vrinda, The expected behavior if parent shard 'shardA' resides on node'1', node'2' ... node'n' and do a SPLITSHARD on it. the child shards, shardA_0 and shardA_1 will reside on node'1', node'2' ... node'n'. shardA --- node'1' (leader) & node'2' (replica) after splitshard; shardA --- node'1' (leader) & node'2' (replica) (INACTIVE) shardA_0 -- node'1' & node'2' (ACTIVE) shardA_1 -- node'1' & node'2' (ACTIVE) Any one of them can be a leader and replica for the children nodes. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Mon, May 8, 2017 at 4:32 PM, vrindavda wrote: > Thanks I go it. > > But I see that distribution of shards and replicas is not equal. > > For Example in my case : > I had shard 1 and shard2 on Node 1 and their replica_1 and replica_2 on > Node 2. > I did SHARDSPLIT on shard1 to get shard1_0 and shard1_1 such that > and shard1_0_replica0 are created on Node 1 and shard1_0_replica1, > shard1_1_replica1 and shard1_1_replica0 on Node 2. > > Is this expected behavior ? > > Thank you, > Vrinda Davda > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/SPLITSHARD-Working-tp4333876p4333922.html > Sent from the Solr - User mailing list archive at Nabble.com. >
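For example, splitting shardA of a collection is a single Collections API call (names are placeholders; async is optional but recommended for big shards):

curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shardA&async=split-shardA-1'

Once it finishes, shardA stays where it was but is marked INACTIVE, and shardA_0 / shardA_1 serve the two halves of its hash range on the same nodes.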
Re: Could not initialize class JdbcSynonymFilterFactory
Just gathering more information on this Solr-JDBC; Is it a open source plugin provided on https://github.com/shopping24/ and not part of actual project *lucene-solr* project? https://github.com/shopping24/solr-jdbc-synonyms Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Tue, May 9, 2017 at 4:30 PM, sajjad karimi wrote: > http://stackoverflow.com/questions/43857712/could-not-initialize-class- > jdbcsynonymfilterfactory > : > > > I'm new to solr, I want to add a field type with JdbcSynonymFilter and > JdbcStopFilter to solr schema. I added my data source same as instruction > in this link: [Loading stopwords from Postgresql to Solr6][1] > > then i configured managed-schema with code below: > > > > pattern="[\s]+" > /> > class="com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory" >sql="SELECT concat(term, '=>', use) as line FROM thesaurus;" >dataSource="jdbc/dsTest" ignoreCase="false" expand="true" /> > class="com.s24.search.solr.analysis.jdbc.JdbcStopFilterFactory" > sql="SELECT stopword FROM stopwords" > dataSource="jdbc/dsTest"/> > > > > I added solr-jdbc to dist folder, postgressql driver, beanutils and > dbutils to contrib/jdbc/lib folder. Then, I included libs in solrconfig.xml > of data_driven_schema_configs: > > regex=".*\.jar" /> >regex="solr-jdbc-\d.*\.jar" /> > > I encountered the following error when I was trying to start SolrCloud. > > > "Could not initialize class > com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory, > trace=java.lang.NoClassDefFoundError: > Could not initialize class > com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory" > > > [1]: > http://stackoverflow.com/questions/43724758/loading- > stopwords-from-postgresql-to-solr6?noredirect=1#comment74559858_43724758 >
Re: Number of requests spike up, when i do the delta Import.
I am facing kinda similar issue lately where full-import is taking seconds while delta-import is taking hours. Can you share some more metrics/numbers related to full-import and delta-import requested, rows fetched and time? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Wed, May 31, 2017 at 2:51 PM, vrindavda wrote: > Hello, > Number of requests spike up, whenever I do the delta import in Solr. > Please help me understand this. > > > <http://lucene.472066.n3.nabble.com/file/n4338162/solr.jpg> > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/Number-of-requests-spike-up-when-i-do-the-delta- > Import-tp4338162.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Solr Document Routing
Sathyam, It seems your interpretation is wrong as CloudSolrClient calculates (hashes the document id and determine the range it belongs to) which shard the document incoming belongs to. As you have 10 shards, the document will belong to one of them, that is what being calculated and eventually pushed to the leader of that shard. The confluence link provides the insights in much detail: https://lucidworks.com/2013/06/13/solr-cloud-document-routing/ Another useful link: https://lucidworks.com/2013/06/13/solr-cloud-document-routing/ Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Thu, Jun 1, 2017 at 11:52 AM, Sathyam wrote: > HI, > > I am indexing documents to a 10 shard collection (testcollection, having no > replicas) in solr6 cluster using CloudSolrClient. I saw that there is a lot > of peer to peer document distribution going on when I looked at the solr > logs. > > An example log statement is as follows: > 2017-06-01 06:07:28.378 INFO (qtp1358444045-3673692) [c:testcollection > s:shard8 r:core_node7 x:testcollection_shard8_replica1] > o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1] > webapp=/solr path=/update params={update.distrib=TOLEADER&distrib.from= > http://10.199.42.29:8983/solr/testcollection_shard7_ > replica1/&wt=javabin&version=2}{add=[BQECDwZGTCEBHZZBBiIP > (1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904), > BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk > (1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25 > > When I went through the code of CloudSolrClient on grepcode I saw that the > client itself finds out which server it needs to hit by using the message > id hash and getting the shard range information from state.json. > Then it is quite confusing to me why there is a distribution of data > between peers as there is no replication and each shard is a leader. > > I would like to know why this is happening and how to avoid it or if the > above log statement means something else and I am misinterpreting > something. > > -- > Sathyam Doraswamy >
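As a rough sketch of that client-side routing (zkHost and collection name are placeholders):

CloudSolrClient client = new CloudSolrClient.Builder()
    .withZkHost("zk1:2181,zk2:2181,zk3:2181")
    .build();
client.setDefaultCollection("testcollection");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "BQECDwZGTCEBHZZBBiIP");
client.add(doc);   // the id is hashed and the doc goes straight to the leader of the owning shard
client.commit();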
Re: Solr Document Routing
Sorry, The confluence link: https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Thu, Jun 1, 2017 at 2:11 PM, Amrit Sarkar wrote: > Sathyam, > > It seems your interpretation is wrong as CloudSolrClient calculates > (hashes the document id and determine the range it belongs to) which shard > the document incoming belongs to. As you have 10 shards, the document will > belong to one of them, that is what being calculated and eventually pushed > to the leader of that shard. > > The confluence link provides the insights in much detail: > https://lucidworks.com/2013/06/13/solr-cloud-document-routing/ > Another useful link: https://lucidworks.com/2013/06/13/solr-cloud- > document-routing/ > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com/lucidworks > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > On Thu, Jun 1, 2017 at 11:52 AM, Sathyam > wrote: > >> HI, >> >> I am indexing documents to a 10 shard collection (testcollection, having >> no >> replicas) in solr6 cluster using CloudSolrClient. I saw that there is a >> lot >> of peer to peer document distribution going on when I looked at the solr >> logs. >> >> An example log statement is as follows: >> 2017-06-01 06:07:28.378 INFO (qtp1358444045-3673692) [c:testcollection >> s:shard8 r:core_node7 x:testcollection_shard8_replica1] >> o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1] >> webapp=/solr path=/update params={update.distrib=TOLEADER&distrib.from= >> http://10.199.42.29:8983/solr/testcollection_shard7_replica1 >> /&wt=javabin&version=2}{add=[BQECDwZGTCEBHZZBBiIP >> (1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904), >> BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk >> (1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25 >> >> When I went through the code of CloudSolrClient on grepcode I saw that the >> client itself finds out which server it needs to hit by using the message >> id hash and getting the shard range information from state.json. >> Then it is quite confusing to me why there is a distribution of data >> between peers as there is no replication and each shard is a leader. >> >> I would like to know why this is happening and how to avoid it or if the >> above log statement means something else and I am misinterpreting >> something. >> >> -- >> Sathyam Doraswamy >> > >
Re: Number of requests spike up, when i do the delta Import.
Erick, Thanks for the pointer. Going a bit astray from what Vrinda is looking for (sorry about that): what if there are no sub-entities and no deltaImportQuery is passed either? I looked into the code and it appears DIH builds the delta import query itself (SqlEntityProcessor.getDeltaImportQuery(..), around line 126). Ideally, then, a full-import and a delta-import should take a similar amount of time to build the docs (fetch the next row). I may very well be going entirely wrong here. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Thu, Jun 1, 2017 at 1:50 PM, vrindavda wrote: > Thanks Erick, > > But how do I solve this? I tried creating Stored proc instead of plain > query, but no change in performance. > > For delta import it in processing more documents than the total documents. > In this case delta import is not helping at all, I cannot switch to full > import each time. This was working fine with less data. > > Thank you, > Vrinda Davda > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/Number-of-requests-spike-up-when-i-do-the-delta- > Import-tp4338162p4338444.html > Sent from the Solr - User mailing list archive at Nabble.com. >
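For context, a typical DIH entity with all three queries spelled out looks something like this (table and column names are only placeholders):

<entity name="item" pk="ID"
        query="SELECT * FROM ITEM"
        deltaQuery="SELECT ID FROM ITEM WHERE LAST_MODIFIED > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM ITEM WHERE ID = '${dataimporter.delta.ID}'"/>

When deltaImportQuery is omitted, SqlEntityProcessor derives one from the main query, which is the code path mentioned above.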
Re: SolrCloud CDCR issue
Hi, Yeah if you look above I have stated the same jira. I see your question on 3DCs with Active-Active scenario, will respond there. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Mon, Aug 13, 2018 at 9:43 PM cdatta wrote: > And I was thinking about this one: > https://issues.apache.org/jira/browse/SOLR-11959. > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Re: SolrCloud CDCR with 3+ DCs
To the concerned, This is certainly unfortunate if 3-way Active CDCR is not happening successfully. At the time of writing the feature I was able to perform N-way Active CDCR approach. How are the logs looking, are the documents are not getting forwarded in sync? Can you attach the source solr cluster server logs? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Fri, Aug 17, 2018 at 11:49 PM cdatta wrote: > Any pointer would be much appreciated.. > > Thanks.. > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Solr CDCR replication not working
Basic Authentication in clusters is not supported as of today in CDCR. On Fri, 7 Sep 2018, 4:53 pm Mrityunjaya Pathak, wrote: > I have setup two solr cloud instances in two different Datacenters Target > solr cloud machine is copy of source machine with basicAuth enabled on > them. I am unable to see any replication on target. > > Solr Version :6.6.3 > > I have done config changes as suggested on > https://lucene.apache.org/solr/guide/6_6/cross-data-center-replication-cdcr.html > > Source Config Changes > > > > ... > > > serverIP:2181,serverIP:2182,serverIP:2183 > sitecore_master_index > sitecore_master_index > > > > 8 > 1000 > 128 > > > > 1000 > > > > > > ${solr.ulog.dir:} >name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536} > > > > > ${solr.autoCommit.maxTime:15000} > false > > > > ${solr.autoSoftCommit.maxTime:-1} > > > > ... > > > Target Config Changes > > > > ... > > > disabled > > > > > > > > > cdcr-proc-chain > > > > > > ${solr.ulog.dir:} >name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536} > > > > ${solr.autoCommit.maxTime:15000} > false > > > > ${solr.autoSoftCommit.maxTime:-1} > > > > > ... > > > Below are logs from Source target. > > ERROR (zkCallback-4-thread-2-processing-n:sourceIP:8983_solr) [ ] > o.a.s.c.s.i.CloudSolrClient Request to collection collection1 failed due to > (510) org.apache.solr.common.SolrException: Could not find a healthy node > to handle the request., retry? 5 > 2018-09-07 10:36:14.295 WARN > (zkCallback-4-thread-2-processing-n:sourceIP:8983_solr) [ ] > o.a.s.h.CdcrReplicatorManager Unable to instantiate the log reader for > target collection collection1 > org.apache.solr.common.SolrException: Could not find a healthy node to > handle the request. > at > org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1377) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1134) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1073) > at > org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) > at > org.apache.solr.handler.CdcrReplicatorManager.getCheckpoint(CdcrReplicatorManager.java:196) > at > org.apache.solr.handler.CdcrReplicatorManager.initLogReaders(CdcrReplicatorManager.java:159) > at > org.apache.solr.handler.CdcrReplicatorManager.stateUpdate(CdcrReplicatorManager.java:134) > at > org.apache.solr.handler.CdcrStateManager.callback(CdcrStateManager.java:36) > at > org.apache.solr.handler.CdcrLeaderStateManager.setAmILeader(CdcrLeaderStateManager.java:108) > at > org.apache.solr.handler.CdcrLeaderStateManager.checkIfIAmLeader(CdcrLeaderStateManager.java:95) > at > org.apache.solr.handler.CdcrLeaderStateManager.access$400(CdcrLeaderStateManager.java:40) > at > org.apache.solr.handler.CdcrLeaderStateManager$LeaderStateWatcher.process(CdcrLeaderStateManager.java:150) > at > 
org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:269) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-09-07 10:36:14.310 INFO > (coreLoadExecutor-8-thread-3-processing-n:sourceIP:8983_solr) [ ] > o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.6.3 > 2018-09-07 10:36:14.315 INFO > (zkCallback-4-thread-1-processing-n:sourceIP:8983_solr) [ ] > o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent > state:SyncConnected type:NodeDataChanged > path:/collections/collection1/state.json] for collection [sitecore] has > occurred - updating... (live nodes size: [1]) > 2018-09-07 10:36:14.343 WARN > (cdcr-replicator-211-thread-
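Since the XML tags did not survive in the quoted configs above, here is a rough sketch of what the source-side CDCR handler in solrconfig.xml normally looks like; the zkHost, collection names and replicator values are taken from the surviving text of the snippet, so treat this as an illustration rather than the poster's actual file:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <!-- ZooKeeper ensemble of the target cluster -->
    <str name="zkHost">serverIP:2181,serverIP:2182,serverIP:2183</str>
    <str name="source">sitecore_master_index</str>
    <str name="target">sitecore_master_index</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
</requestHandler>

This does not change the answer above: with basic authentication enabled, the source-side replicator in 6.6.x has no way to pass credentials to the target, which is consistent with the errors in the log above.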
Re: SolrCloud CDCR with 3+ DCs
Yeah, I am not sure how the authentication band-aid mentioned in the Stack Overflow link will work. It is about time we included basic authentication support in CDCR. On Thu, 6 Sep 2018, 8:41 pm cdatta, wrote: > Hi Amrit, Thanks for your response. > > We wiped out our complete installation and started a fresh one. Now the > multi-direction replication is working but we are seeing errors related to > the authentication sporadically. > > Thanks & Regards, > Chandi Datta > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Non-Solr-related | Reporting abuse | Harshit Arora
Community members help each other out when people behave with decency. This man definitely doesn't know how to. [image: Screen Shot 2018-09-28 at 1.07.11 AM.png] I want to make sure he gets recognized IF he ever reaches out to the mailing list: https://lnkd.in/fWkfDCv Malaviya National Institute of Technology Jaipur, India Apologies in advance, and kindly ignore if this doesn't concern you. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2
Re: partial update in solr
Hi Zahra, To answer your question on seeing "No such processor atomic" with AtomicUpdateProcessorFactory: the feature was introduced in Solr 6.6.1 and 7.0 and is available in later versions. I tried the below on v7.4 and it works fine, without adding any component to solrconfig.xml: > http://localhost:8983/solr/collection1/update/json/docs?processor=atomic&atomic.my_newfield=add&atomic.subject=set&atomic.count_i=inc&commit=true > --data-binary {"id": 1,"title": "titleA"} > The Javadocs <https://lucene.apache.org/solr/7_5_0//solr-core/org/apache/solr/update/processor/AtomicUpdateProcessorFactory.html> are broken and I am working on fixing them. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Mon, Oct 29, 2018 at 7:26 PM Alexandre Rafalovitch wrote: > I am not sure. I haven't tried this particular path. Your original > question was without using SolrJ. Maybe others have. > > However, I am also not sure how much sense this makes. This Atomic > processor is to make it easier to do the merge when you cannot modify > the source documents. But if you are already doing it from SolrJ, you > could do an update just as easily as trying the atomic approach. > > Regards, >Alex. > On Mon, 29 Oct 2018 at 09:40, Zahra Aminolroaya > wrote: > > > > Thanks Alex. I want to have a query for atomic update with solrj like > below: > > > > > http://localhost:8983/solr/test4/update?preprocessor=atomic&atomic.text2=set&atomic.text=set&atomic.text3=set&commit=true&stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E11%3C/field%3E%3Cfield%20name=%22text3%22%20update=%22set%22%3Ehi%3C/field%3E%3C/doc%3E%3C/add%3E > > > > > > First, in solrj, I used "setfield" instead of "addfield" like > > doc.setField("text3", "hi"); > > > > > > Then, I added ModifiableSolrParams : > > > > > > ModifiableSolrParams add = new ModifiableSolrParams() > > .add("processor", "atomic") > > .add("atomic.text", "set") > > .add("atomic.text2", "set") > > .add("atomic.text3", "set") > > .add(UpdateParams.COMMIT, "true") > > .add("commit","true"); > > > > And then I updated my document: > > > > req.setParams(add); > > req.setAction( UpdateRequest.ACTION.COMMIT,false,false ); > > req.add(docs); > > UpdateResponse rsp = req.process( server ); > > > > > > > > However, I get "No such processor atomic" > > > > > > As you see I set commit to true. What the problem is? > > > > > > > > > > -- > > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
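For reference, a runnable form of the command above (assuming a collection named collection1; the atomic.* parameters tell the processor which atomic operation to apply to each incoming field):

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/collection1/update/json/docs?processor=atomic&atomic.my_newfield=add&atomic.subject=set&atomic.count_i=inc&commit=true' \
  --data-binary '{"id": 1, "title": "titleA"}'

If the Solr version is older than 6.6.1 the processor simply does not exist, which would produce exactly the "No such processor atomic" error.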
Re: Negative CDCR Queue Size?
Hi Webster, The queue size "*-1*" suggests the target is not initialized, and you should see a "WARN" in the logs suggesting something bad happened at the respective target. I am also posting the source code for reference. Any chance you can look for WARN entries in the logs, or check at the respective source and target that CDCR is configured and was running OK, without any manual intervention? Also, you mentioned there are a number of intermittent issues with CDCR; I see you have reported a few Jiras. I would be grateful if you could report the rest. Code: > for (CdcrReplicatorState state : replicatorManager.getReplicatorStates()) { > NamedList queueStats = new NamedList(); > CdcrUpdateLog.CdcrLogReader logReader = state.getLogReader(); > if (logReader == null) { > String collectionName = > req.getCore().getCoreDescriptor().getCloudDescriptor().getCollectionName(); > String shard = > req.getCore().getCoreDescriptor().getCloudDescriptor().getShardId(); > log.warn("The log reader for target collection {} is not initialised @ > {}:{}", > state.getTargetCollection(), collectionName, shard); > queueStats.add(CdcrParams.QUEUE_SIZE, -1l); > } else { > queueStats.add(CdcrParams.QUEUE_SIZE, > logReader.getNumberOfRemainingRecords()); > } > queueStats.add(CdcrParams.LAST_TIMESTAMP, > state.getTimestampOfLastProcessedOperation()); > if (hosts.get(state.getZkHost()) == null) { > hosts.add(state.getZkHost(), new NamedList()); > } > ((NamedList) hosts.get(state.getZkHost())).add(state.getTargetCollection(), > queueStats); > } > rsp.add(CdcrParams.QUEUES, hosts); > > Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Wed, Nov 7, 2018 at 12:47 AM Webster Homer < webster.ho...@milliporesigma.com> wrote: > I'm sorry I should have included that. We are running Solr 7.2. We use > CDCR for almost all of our collections. We have experienced several > intermittent problems with CDCR, this one seems to be new, at least I > hadn't seen it before > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Tuesday, November 06, 2018 12:36 PM > To: solr-user > Subject: Re: Negative CDCR Queue Size? > > What version of Solr? CDCR has changed quite a bit in the 7x code line so > it's important to know the version. > > On Tue, Nov 6, 2018 at 10:32 AM Webster Homer < > webster.ho...@milliporesigma.com> wrote: > > > > Several times I have noticed that the CDCR action=QUEUES will return a > negative queueSize. When this happens we seem to be missing data in the > target collection. How can this happen? What does a negative Queue size > mean? The timestamp is an empty string. > > > > We have two targets for a source. One looks like this, with a negative > > queue size > > queues": > > ["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-eco > > m-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize > > ",-1,"lastTimestamp",""]], > > > > The other is healthy > > "ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom > > -mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize" > > ,246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]] > > > > We are not seeing CDCR errors. > > > > What could cause this behavior? >
Re: Bidirectional CDCR not working
Hi Arnold, You need "cdcr-processor-chain" definitions in solrconfig.xml on both clusters' collections. Both clusters need to act as source and target. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley wrote: > Hi, > > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But > after setting up bidirectional cdcr configuration, I am not able to index a > document. > > Following is the error that I am getting: > > Async exception during distributed update: Error from server at > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request > request: > http://host1 > > :8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from= > http://host2:8983/solr/techproducts_shard1_replica_n1&wt=javabin&version=2 > Remote error message: unknown UpdateRequestProcessorChain: > cdcr-processor-chain > > Do you know why I might be getting this error? >
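In case it helps, a minimal sketch of the "cdcr-processor-chain" definition referred to above; this is the standard chain from the CDCR documentation, and it has to be present in the solrconfig.xml of the collection on both clusters, with the /update handler pointing at it:

<updateRequestProcessorChain name="cdcr-processor-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-processor-chain</str>
  </lst>
</requestHandler>

The "unknown UpdateRequestProcessorChain: cdcr-processor-chain" error above is what a node reports when it receives an update referencing the chain but its own solrconfig.xml does not define it.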
Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels
Vincenzo, Reading the source code (SchemaField.java):

/**
 * Sanity checks that the properties of this field type are plausible
 * for a field that may be used to get a FieldCacheSource, throwing
 * an appropriate exception (including the field name) if it is not.
 * FieldType subclasses can choose to call this method in their
 * getValueSource implementation
 * @see FieldType#getValueSource
 */
public void checkFieldCacheSource() throws SolrException {
  if ( multiValued() ) {
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "can not use FieldCache on multivalued field: " + getName());
  }
  if (! hasDocValues() ) {
    if ( ! ( indexed() && null != this.type.getUninversionType(this) ) ) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
          "can not use FieldCache on a field w/o docValues unless it is indexed and supports Uninversion: " + getName());
    }
  }
}

It seems the FieldCache is not allowed to un-invert values for multi-valued fields. I suspect the reason is that multiple values would eat up more memory? Not sure, someone else can weigh in. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Mon, Feb 26, 2018 at 7:37 PM, Vincenzo D'Amore wrote: > Hi, > > while trying to run a group query on a multivalue field I received this > error: > > can not use FieldCache on multivalued field: > > > > > > true > 400 > 4 > > > > org.apache.solr.common.SolrException > org.apache.solr.common. > SolrException > > can not use FieldCache on multivalued field: > categoryLevels > 400 > > > > I don't understand why this is happening. > > Do you know any way to work around this problem? > > Thanks in advance, > Vincenzo > > -- > Vincenzo D'Amore >
Re: Solr CDCR doesn't work if the authentication is enabled
Nice. Can you please post the details on the JIRA too if possible: https://issues.apache.org/jira/browse/SOLR-11959 and we can probably put up a small patch adding this bit of information to the official documentation. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Mon, Mar 5, 2018 at 8:11 PM, dimaf wrote: > To resolve the issue, I added names of Source node to /live_nodes of > Target. > https://stackoverflow.com/questions/48790621/solr-cdcr-doesnt-work-if-the- > authentication-is-enabled > <https://stackoverflow.com/questions/48790621/solr-cdcr- > doesnt-work-if-the-authentication-is-enabled> > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Re: CDCR Invalid Number on deletes
Hey Chris, I figured a separate issue while working on CDCR which may relate to your problem. Please see jira: *SOLR-12063* <https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063>. This is a bug got introduced when we supported the bidirectional approach where an extra flag in tlog entry for cdcr is added. This part of the code is messing up: *UpdateLog.java.RecentUpdates::update()::* switch (oper) { case UpdateLog.ADD: case UpdateLog.UPDATE_INPLACE: case UpdateLog.DELETE: case UpdateLog.DELETE_BY_QUERY: Update update = new Update(); update.log = oldLog; update.pointer = reader.position(); update.version = version; if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) { update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSION_IDX); } updatesForLog.add(update); updates.put(version, update); if (oper == UpdateLog.DELETE_BY_QUERY) { deleteByQueryList.add(update); } else if (oper == UpdateLog.DELETE) { deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()-1))); } break; case UpdateLog.COMMIT: break; default: throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unknown Operation! " + oper); } deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()-1))); is expecting the last entry to be the payload, but everywhere in the project, *pos:[2] *is the index for the payload, while the last entry in source code is *boolean* in / after Solr 7.2, denoting update is cdcr forwarded or typical. UpdateLog.java.RecentUpdates is used to in cdcr sync, checkpoint operations and hence it is a legit bug, slipped the tests I wrote. The immediate fix patch is uploaded and I am awaiting feedback on that. Meanwhile if it is possible for you to apply the patch, build the jar and try it out, please do and let us know. For, *SOLR-9394* <https://issues.apache.org/jira/browse/SOLR-9394>, if you can comment on the JIRA and post the sample docs, solr logs, relevant information, I can give it a thorough look. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis wrote: > Hi all, > > We recently upgraded to Solr 7.2.0 as we saw that there were some CDCR bug > fixes and features added that would finally let us be able to make use of > it (bi-directional syncing was the big one). The first time we tried to > implement we ran into all kinds of errors, but this time we were able to > get it mostly working. > > The issue we seem to be having now is that any time a document is deleted > via deleteById from a collection on the primary node, we are flooded with > "Invalid Number" errors followed by a random sequence of characters when > CDCR tries to sync the update to the backup site. This happens on all of > our collections where our id fields are defined as longs (some of them the > ids are compound keys and are strings). > > Here's a sample exception: > > org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error > from server at http://ip/solr/collection_shard1_replica_n1: Invalid > Number: ] > -s > at > org.apache.solr.client.solrj.impl.CloudSolrClient. > directUpdate(CloudSolrClient.java:549) > at > org.apache.solr.client.solrj.impl.CloudSolrClient. > sendRequest(CloudSolrClient.java:1012) > at > org.apache.solr.client.solrj.impl.CloudSolrClient. 
> requestWithRetryOnStaleState(CloudSolrClient.java:883) > at > org.apache.solr.client.solrj.impl.CloudSolrClient. > requestWithRetryOnStaleState(CloudSolrClient.java:945) > at > org.apache.solr.client.solrj.impl.CloudSolrClient. > requestWithRetryOnStaleState(CloudSolrClient.java:945) > at > org.apache.solr.client.solrj.impl.CloudSolrClient. > requestWithRetryOnStaleState(CloudSolrClient.java:945) > at > org.apache.solr.client.solrj.impl.CloudSolrClient. > requestWithRetryOnStaleState(CloudSolrClient.java:945) > at > org.apache.solr.client.solrj.impl.CloudSolrClient. > requestWithRetryOnStaleState(CloudSolrClient.java:945) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.request( > CloudSolrClient.java:816) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) > at > org.apache.solr.handler.CdcrReplicator.sendRequest( > CdcrReplicator.java:140) > at > org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:104) > at > org.apache.sol
Re: Solr 7.2.0 CDCR Issue with TLOG collections
Webster, I updated the JIRA: *SOLR-12057 <https://issues.apache.org/jira/browse/SOLR-12057>, **CdcrUpdateProcessor* has a hack, it enable *PEER_SYNC* to bypass the leader logic in *DistributedUpdateProcessor.versionAdd,* which eventually ends up in segments not getting created. I wrote a very dirty patch which fixes the problem with basic tests to prove it works. I will try to polish and finish this as soon as possible. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Mar 6, 2018 at 10:07 PM, Webster Homer wrote: > seems that this is a bug in Solr > https://issues.apache.org/jira/browse/SOLR-12057 > > Hopefully it can be addressed soon! > > On Mon, Mar 5, 2018 at 4:14 PM, Webster Homer > wrote: > > > I noticed that the cdcr action=queues returns different results for the > > target clouds. One target says that the updateLogSynchronizer is > > stopped the other says started. Why? What does that mean. We don't > > explicitly set that anywhere > > > > > > {"responseHeader": {"status": 0,"QTime": 0},"queues": [],"tlogTotalSize": > > 0,"tlogTotalCount": 0,"updateLogSynchronizer": "stopped"} > > > > and the other > > > > {"responseHeader": {"status": 0,"QTime": 0},"queues": [],"tlogTotalSize": > > 22254206389,"tlogTotalCount": 2,"updateLogSynchronizer": "started"} > > > > The source is as follows: > > { > > "responseHeader": { > > "status": 0, > > "QTime": 5 > > }, > > "queues": [ > > "xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03. > > sial.com:2181/solr", > > [ > > "b2b-catalog-material-180124T", > > [ > > "queueSize", > > 0, > > "lastTimestamp", > > "2018-02-28T18:34:39.704Z" > > ] > > ], > > "yyy-mzk01.sial.com:2181,yyy-mzk02.sial.com:2181,yyy-mzk03. > > sial.com:2181/solr", > > [ > > "b2b-catalog-material-180124T", > > [ > > "queueSize", > > 0, > > "lastTimestamp", > > "2018-02-28T18:34:39.704Z" > > ] > > ] > > ], > > "tlogTotalSize": 1970848, > > "tlogTotalCount": 1, > > "updateLogSynchronizer": "stopped" > > } > > > > > > On Fri, Mar 2, 2018 at 5:05 PM, Webster Homer > > wrote: > > > >> It looks like the data is getting to the target servers. I see tlog > files > >> with the right timestamps. Looking at the timestamps on the documents in > >> the collection none of the data appears to have been loaded. > >> In the solr.log I see lots of /cdcr messages > action=LASTPROCESSEDVERSION, > >> action=COLLECTIONCHECKPOINT, and action=SHARDCHECKPOINT > >> > >> no errors > >> > >> autoCommit is set to 6 I tried sending a commit explicitly no > >> difference. cdcr is uploading data, but no new data appears in the > >> collection. > >> > >> On Fri, Mar 2, 2018 at 1:39 PM, Webster Homer > >> wrote: > >> > >>> We have been having strange behavior with CDCR on Solr 7.2.0. > >>> > >>> We have a number of replicas which have identical schemas. We found > that > >>> TLOG replicas give much more consistent search results. > >>> > >>> We created a collection using TLOG replicas in our QA clouds. > >>> We have a locally hosted solrcloud with 2 nodes, all our collections > >>> have 2 shards. We use CDCR to replicate the collections from this > >>> environment to 2 data centers hosted in Google cloud. This seems to > work > >>> fairly well for our collections with NRT replicas. However the new TLOG > >>> collection has problems. > >>> > >>> The google cloud solrclusters have 4 nodes each (3 separate > Zookeepers). 
> >>> 2 shards per collection with 2 replicas per shard. > >>> > >>> We never see data show up in the cloud collections, but we do see tlog > >>> files show up on the cloud servers. I can see that all of the servers > have > >>> cdcr started, buffers are disabled. > >>> The cdcr source configuration is: > >>> > >>> "requestHandler":{"
Re: CDCR Invalid Number on deletes
Hi Chris, Sorry I was off work for few days and didn't follow the conversation. The link is directing me to https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063. I think we have fixed the issue stated by you in the jira, though the symptoms were different than yours. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Wed, Mar 21, 2018 at 1:17 AM, Chris Troullis wrote: > Nevermind I found itthe link you posted links me to SOLR-12036 instead > of SOLR-12063 for some reason. > > On Tue, Mar 20, 2018 at 1:51 PM, Chris Troullis > wrote: > > > Hey Amrit, > > > > Did you happen to see my last reply? Is SOLR-12036 the correct JIRA? > > > > Thanks, > > > > Chris > > > > On Wed, Mar 7, 2018 at 1:52 PM, Chris Troullis > > wrote: > > > >> Hey Amrit, thanks for the reply! > >> > >> I checked out SOLR-12036, but it doesn't look like it has to do with > >> CDCR, and the patch that is attached doesn't look CDCR related. Are you > >> sure that's the correct JIRA number? > >> > >> Thanks, > >> > >> Chris > >> > >> On Wed, Mar 7, 2018 at 11:21 AM, Amrit Sarkar > >> wrote: > >> > >>> Hey Chris, > >>> > >>> I figured a separate issue while working on CDCR which may relate to > your > >>> problem. Please see jira: *SOLR-12063* > >>> <https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063>. This > >>> is a > >>> bug got introduced when we supported the bidirectional approach where > an > >>> extra flag in tlog entry for cdcr is added. > >>> > >>> This part of the code is messing up: > >>> *UpdateLog.java.RecentUpdates::update()::* > >>> > >>> switch (oper) { > >>> case UpdateLog.ADD: > >>> case UpdateLog.UPDATE_INPLACE: > >>> case UpdateLog.DELETE: > >>> case UpdateLog.DELETE_BY_QUERY: > >>> Update update = new Update(); > >>> update.log = oldLog; > >>> update.pointer = reader.position(); > >>> update.version = version; > >>> > >>> if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) { > >>> update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSI > >>> ON_IDX); > >>> } > >>> updatesForLog.add(update); > >>> updates.put(version, update); > >>> > >>> if (oper == UpdateLog.DELETE_BY_QUERY) { > >>> deleteByQueryList.add(update); > >>> } else if (oper == UpdateLog.DELETE) { > >>> deleteList.add(new DeleteUpdate(version, > >>> (byte[])entry.get(entry.size()-1))); > >>> } > >>> > >>> break; > >>> > >>> case UpdateLog.COMMIT: > >>> break; > >>> default: > >>> throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, > >>> "Unknown Operation! " + oper); > >>> } > >>> > >>> deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size() > >>> -1))); > >>> > >>> is expecting the last entry to be the payload, but everywhere in the > >>> project, *pos:[2] *is the index for the payload, while the last entry > in > >>> source code is *boolean* in / after Solr 7.2, denoting update is cdcr > >>> forwarded or typical. UpdateLog.java.RecentUpdates is used to in cdcr > >>> sync, > >>> checkpoint operations and hence it is a legit bug, slipped the tests I > >>> wrote. > >>> > >>> The immediate fix patch is uploaded and I am awaiting feedback on that. > >>> Meanwhile if it is possible for you to apply the patch, build the jar > and > >>> try it out, please do and let us know. 
> >>> > >>> For, *SOLR-9394* <https://issues.apache.org/jira/browse/SOLR-9394>, if > >>> you > >>> can comment on the JIRA and post the sample docs, solr logs, relevant > >>> information, I can give it a thorough look. > >>> > >>> Amrit Sarkar > >>> Search Engineer > >>> Lucidworks, Inc. > >>> 415-589-9269 > >>> www.lucidworks.com > >>> Twitter http://twitter.com/lucidworks > >&g
Re: CDCR performance issues
Hey Tom, I'm also having issue with replicas in the target data center. It will go > from recovering to down. And when one of my replicas go to down in the > target data center, CDCR will no longer send updates from the source to > the target. Were you able to figure out the issue? As long as the leader of each shard in each collection is up and serving, CDCR shouldn't stop. Sometimes we have to reindex a large chunk of our index (1M+ documents). > What's the best way to handle this if the normal CDCR process won't be > able to keep up? Manually trigger a bootstrap again? Or is there something > else we can do? > That's one of the limitations of CDCR; it cannot handle bulk indexing well. The preferable way to do it is:
* stop cdcr
* bulk index
* issue a manual BOOTSTRAP (it is independent of stop and start cdcr)
* start cdcr
(see the sketch of the corresponding API calls at the end of this message)
1. Is it accurate that updates are not actually batched in transit from the > source to the target and instead each document is posted separately? The batchSize and schedule settings regulate how many docs are sent across to the target. This has more details: https://lucene.apache.org/solr/guide/7_2/cdcr-config.html#the-replicator-element Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters wrote: > I'm also having issue with replicas in the target data center. It will go > from recovering to down. And when one of my replicas go to down in the > target data center, CDCR will no longer send updates from the source to the > target. > > > On Mar 12, 2018, at 9:24 AM, Tom Peters wrote: > > > > Anyone have any thoughts on the questions I raised? > > > > I have another question related to CDCR: > > Sometimes we have to reindex a large chunk of our index (1M+ documents). > What's the best way to handle this if the normal CDCR process won't be able > to keep up? Manually trigger a bootstrap again? Or is there something else > we can do? > > > > Thanks. > > > > > > > >> On Mar 9, 2018, at 3:59 PM, Tom Peters wrote: > >> > >> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the > requests to the target data center are not batched in any way. Each update > comes in as an independent update. Some follow-up questions: > >> > >> 1. Is it accurate that updates are not actually batched in transit from > the source to the target and instead each document is posted separately? > >> > >> 2. Are they done synchronously? I assume yes (since you wouldn't want > operations applied out of order) > >> > >> 3. If they are done synchronously, and are not batched in any way, does > that mean that the best performance I can expect would be roughly how long > it takes to round-trip a single document? ie. If my average ping is 25ms, > then I can expect a peak performance of roughly 40 ops/s. > >> > >> Thanks > >> > >> > >> > >>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] < > daniel.da...@nih.gov> wrote: > >>> > >>> These are general guidelines, I've done loads of networking, but may > be less familiar with SolrCloud and CDCR architecture. However, I know > it's all TCP sockets, so general guidelines do apply. > >>> > >>> Check the round-trip time between the data centers using ping or TCP > ping. Throughput tests may be high, but if Solr has to wait for a > response to a request before sending the next action, then just like any > network protocol that does that, it will get slow.
> >>> > >>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also > check whether some proxy/load balancer between data centers is causing it > to be a single connection per operation. That will *kill* performance. > Some proxies default to HTTP/1.0 (open, send request, server send > response, close), and that will hurt. > >>> > >>> Why you should listen to me even without SolrCloud knowledge - > checkout paper "Latency performance of SOAP Implementations". Same > distribution of skills - I knew TCP well, but Apache Axis 1.1 not so well. > I still improved response time of Apache Axis 1.1 by 250ms per call with > 1-line of code. > >>> > >>> -Original Message- > >>> From: Tom Peters [mailto:tpet...@synacor.com] > >>> Sent: Wednesday, March 7, 2018 6:19 PM > >>> To: solr-user@lucene.apache.org > >>> Subject: CDCR performance issues > &
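Referring to the stop / bulk index / bootstrap / start sequence above, a minimal sketch of the corresponding API calls. STOP, START and QUEUES are documented CDCR API actions; the BOOTSTRAP action is the internal call the source normally issues against the target leader, so treat its exact form (and the masterUrl parameter) as an assumption that may differ between versions:

SOURCE="http://source-host:8983/solr/mycollection"
TARGET="http://target-host:8983/solr/mycollection"

# 1. Stop forwarding on the source before the bulk load
curl "$SOURCE/cdcr?action=STOP"

# 2. Run the bulk indexing job against the source cluster

# 3. Trigger a bootstrap so the target copies the whole index from the source
#    (internal action; parameter name is an assumption, and masterUrl should
#    point at the source leader core)
curl "$TARGET/cdcr?action=BOOTSTRAP&masterUrl=$SOURCE"

# 4. Re-enable forwarding of new updates
curl "$SOURCE/cdcr?action=START"

# 5. Verify nothing keeps piling up
curl "$SOURCE/cdcr?action=QUEUES"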
Re: CDCR performance issues
Susheel, That is the correct behavior, "commit" operation is not propagated to target and the documents will be visible in the target as per commit strategy devised there. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Fri, Mar 23, 2018 at 6:02 PM, Susheel Kumar wrote: > Just a simple check, if you go to source solr and index single document > from Documents tab, then keep querying target solr for the same document. > How long does it take the document to appear in target data center. In our > case, I can see document show up in target within 30 sec which is our soft > commit time. > > Thanks, > Susheel > > On Fri, Mar 23, 2018 at 8:16 AM, Amrit Sarkar > wrote: > > > Hey Tom, > > > > I'm also having issue with replicas in the target data center. It will go > > > from recovering to down. And when one of my replicas go to down in the > > > target data center, CDCR will no longer send updates from the source to > > > the target. > > > > > > Are you able to figure out the issue? As long as the leaders of each > shard > > in each collection is up and serving, CDCR shouldn't stop. > > > > Sometimes we have to reindex a large chunk of our index (1M+ documents). > > > What's the best way to handle this if the normal CDCR process won't be > > > able to keep up? Manually trigger a bootstrap again? Or is there > > something > > > else we can do? > > > > > > > That's one of the limitations of CDCR, it cannot handle bulk indexing, > > preferable way to do is > > * stop cdcr > > * bulk index > > * issue manual BOOTSTRAP (it is independent of stop and start cdcr) > > * start cdcr > > > > 1. Is it accurate that updates are not actually batched in transit from > the > > > source to the target and instead each document is posted separately? > > > > > > The batchsize and schedule regulate how many docs are sent across target. > > This has more details: > > https://lucene.apache.org/solr/guide/7_2/cdcr-config. > > html#the-replicator-element > > > > > > > > > > Amrit Sarkar > > Search Engineer > > Lucidworks, Inc. > > 415-589-9269 > > www.lucidworks.com > > Twitter http://twitter.com/lucidworks > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > Medium: https://medium.com/@sarkaramrit2 > > > > On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters > wrote: > > > > > I'm also having issue with replicas in the target data center. It will > go > > > from recovering to down. And when one of my replicas go to down in the > > > target data center, CDCR will no longer send updates from the source to > > the > > > target. > > > > > > > On Mar 12, 2018, at 9:24 AM, Tom Peters wrote: > > > > > > > > Anyone have any thoughts on the questions I raised? > > > > > > > > I have another question related to CDCR: > > > > Sometimes we have to reindex a large chunk of our index (1M+ > > documents). > > > What's the best way to handle this if the normal CDCR process won't be > > able > > > to keep up? Manually trigger a bootstrap again? Or is there something > > else > > > we can do? > > > > > > > > Thanks. > > > > > > > > > > > > > > > >> On Mar 9, 2018, at 3:59 PM, Tom Peters wrote: > > > >> > > > >> Thanks. This was helpful. I did some tcpdumps and I'm noticing that > > the > > > requests to the target data center are not batched in any way. Each > > update > > > comes in as an independent update. Some follow-up questions: > > > >> > > > >> 1. 
Is it accurate that updates are not actually batched in transit > > from > > > the source to the target and instead each document is posted > separately? > > > >> > > > >> 2. Are they done synchronously? I assume yes (since you wouldn't > want > > > operations applied out of order) > > > >> > > > >> 3. If they are done synchronously, and are not batched in any way, > > does > > > that mean that the best performance I can expect would be roughly how > > long > > > it takes to round-trip a single document? ie. If my average ping is > 25ms, > > > then I can expect a
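To make the point about the target's commit strategy concrete: visibility on the target is governed entirely by the target's own autoCommit/autoSoftCommit settings in solrconfig.xml, for example a 30-second soft commit like the window Susheel describes (the exact value is whatever the target is actually configured with):

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
</autoSoftCommit>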
Re: solrcloud Auto-commit doesn't seem reliable
Elaino, When you say commits not working, the solr logs not printing "commit" messages? or documents are not appearing when we search. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Thu, Mar 22, 2018 at 4:05 AM, Elaine Cario wrote: > I'm just catching up on reading solr emails, so forgive me for being late > to this dance > > I've just gone through a project to enable CDCR on our Solr, and I also > experienced a small period of time where the commits on the source server > just seemed to stop. This was during a period of intense experimentation > where I was mucking around with configurations, turning CDCR on/off, etc. > At some point the commits stopped occurring, and it drove me nuts for a > couple of days - tried everything - restarting Solr, reloading, turned > buffering on, turned buffering off, etc. I finally threw up my hands and > rebooted the server out of desperation (it was a physical Linux box). > Commits worked fine after that. I don't know what caused the commits to > stop, and why re-booting (and not just restarting Solr) caused them to work > fine. > > Wondering if you ever found a solution to your situation? > > > > On Fri, Feb 16, 2018 at 2:44 PM, Webster Homer > wrote: > > > I meant to get back to this sooner. > > > > When I say I issued a commit I do issue it as > collection/update?commit=true > > > > The soft commit interval is set to 3000, but I don't have a problem with > > soft commits ( I think). I was responding > > > > I am concerned that some hard commits don't seem to happen, but I think > > many commits do occur. I'd like suggestions on how to diagnose this, and > > perhaps an idea of where to look. Typically I believe that issues like > this > > are from our configuration. > > > > Our indexing job is pretty simple, we send blocks of JSON to > > /update/json. We have either re-index the whole collection, > or > > just apply updates. Typically we reindex the data once a week and delete > > any records that are older than the last full index. This does lead to a > > fair number of deleted records in the index especially if commits fail. > > Most of our collections are not large between 2 and 3 million records. > > > > The collections are hosted in google cloud > > > > On Mon, Feb 12, 2018 at 5:00 PM, Erick Erickson > > > wrote: > > > > > bq: But if 3 seconds is aggressive what would be a good value for soft > > > commit? > > > > > > The usual answer is "as long as you can stand". All top-level caches > are > > > invalidated, autowarming is done etc. on each soft commit. That can be > a > > > lot of > > > work and if your users are comfortable with docs not showing up for, > > > say, 10 minutes > > > then use 10 minutes. As always "it depends" here, the point is not to > > > do unnecessary > > > work if possible. > > > > > > bq: If a commit doesn't happen how would there ever be an index merge > > > that would remove the deleted documents. > > > > > > Right, it wouldn't. It's a little more subtle than that though. > > > Segments on various > > > replicas will contain different docs, thus the term/doc statistics can > be > > > a bit > > > different between multiple replicas. None of the stats will change > > > until the commit > > > though. You might try turning no distributed doc/term stats though. > > > > > > Your comments about PULL or TLOG replicas are well taken. 
However, even > > > those > > > won't be absolutely in sync since they'll replicate from the master at > > > slightly > > > different times and _could_ get slightly different segments _if_ > > > there's indexing > > > going on. But let's say you stop indexing. After the next poll > > > interval all the replicas > > > will have identical characteristics and will score the docs the same. > > > > > > I don't have any signifiant wisdom to offer here, except this is really > > the > > > first time I've heard of this behavior. About all I can imagine is > > > that _somehow_ > > > the soft commit interval is -1. When you say you "issue a commit" I'm > > > assuming > > > it's via collection/update?commit=true or some such which issues a >
Re: Does CDCR Bootstrap sync leaves replica's out of sync
Hi Susheel, Pretty sure you are talking about this: https://issues.apache.org/jira/browse/SOLR-11724 Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Mon, Apr 16, 2018 at 11:35 PM, Susheel Kumar wrote: > Does anybody know about known issue where CDCR bootstrap sync leaves the > replica's on target cluster non touched/out of sync. > > After I stopped and restart CDCR, it builds my target leaders index but > replica's on target cluster still showing old index / not modified. > > > Thnx >
Re: Weird transaction log behavior with CDCR
Chris, After disabling the buffer on source, kindly shut down all the nodes of the source cluster first and then start them again. The tlogs will be removed accordingly. BTW CDCR doesn't abide by 100 numRecordsToKeep or 10 numTlogs. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Apr 17, 2018 at 8:58 PM, Susheel Kumar wrote: > DISABLEBUFFER on source cluster would solve this problem. > > On Tue, Apr 17, 2018 at 9:29 AM, Chris Troullis > wrote: > > > Hi, > > > > We are attempting to use CDCR with solr 7.2.1 and are experiencing odd > > behavior with transaction logs. My understanding is that by default, solr > > will keep a maximum of 10 tlog files or 100 records in the tlogs. I > assume > > that with CDCR, the records will not be removed from the tlogs until it > has > > been confirmed that they have been replicated to the other cluster. > > However, even when replication has finished and the CDCR queue sizes are > 0, > > we are still seeing large numbers (50+) and large sizes (over a GB) of > > tlogs sitting on the nodes. > > > > We are hard committing once per minute. > > > > Doing a lot of reading on the mailing list, I see that a lot of people > were > > pointing to buffering being enabled as the cause for some of these > > transaction log issues. However, we have disabled buffering on both the > > source and target clusters, and are still seeing the issues. > > > > Also, while some of our indexes replicate very rapidly (millions of > > documents in minutes), other smaller indexes are crawling. If we restart > > CDCR on the nodes then it finishes almost instantly. > > > > Any thoughts on these behaviors? > > > > Thanks, > > > > Chris > > >
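For completeness, the disable-buffer-and-restart sequence described above looks roughly like this (host and collection name are placeholders; DISABLEBUFFER and STATUS are documented CDCR API actions):

# Disable the CDCR update-log buffer on the source collection
curl "http://source-host:8983/solr/mycollection/cdcr?action=DISABLEBUFFER"

# Confirm the buffer now reports "disabled" before restarting
curl "http://source-host:8983/solr/mycollection/cdcr?action=STATUS"

# Then restart every node of the source cluster; the stale tlogs
# should start purging after the restart, as described above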
Re: CdcrReplicator Forwarder not working on some shards
Susheel, At the time of core reload, the logs must be complaining or at least pointing in some direction. Each shard leader is responsible for spawning a threadpool for the cdcr replicator to get the data over. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Apr 17, 2018 at 9:04 PM, Susheel Kumar wrote: > Hi, > > Has anyone gone thru this issue where few shard leaders are forwarding > updates to their counterpart leaders in target cluster while some of the > shards leaders are not forwarding the updates. > > on Solr 6.6, 4 of the shards logs I see below entries and their > counterpart in target are getting updated but for other 4 shards I don't > below entries and neither being replicated to target. > > Any suggestion on how / what can be done to start cdcr-replicator threads > on other shards? > > 2018-04-17 15:26:38.394 INFO > (cdcr-replicator-24-thread-6-processing-n:dc2prsrcvap0049. > whc.dc02.us.adp:8080_solr) > [ ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target COLL > 2018-04-17 15:26:39.394 INFO > (cdcr-replicator-24-thread-7-processing-n:dc2prsrcvap0049. > whc.dc02.us.adp:8080_solr) > [ ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target COLL > > Thanks > Susheel >
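One way to narrow it down is to hit the CDCR endpoint of a non-forwarding shard leader's core directly and compare it with a healthy shard (host and core names below are placeholders; OPS and ERRORS are documented CDCR monitoring actions):

# Replication rate as seen by this shard leader; healthy shards show non-zero ops
curl "http://node-host:8080/solr/COLL_shard1_replica1/cdcr?action=OPS"

# Consecutive errors recorded by the replicator threads on this leader
curl "http://node-host:8080/solr/COLL_shard1_replica1/cdcr?action=ERRORS"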
Re: Weird transaction log behavior with CDCR
Chris, Try to index few dummy documents and analyse if the tlogs are getting cleared or not. Ideally on the restart, it clears everything and keeps max 2 tlog per data folder. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Apr 17, 2018 at 11:52 PM, Chris Troullis wrote: > Hi Amrit, thanks for the reply. > > I shut down all of the nodes on the source cluster after the buffer was > disabled, and there was no change to the tlogs. > > On Tue, Apr 17, 2018 at 12:20 PM, Amrit Sarkar > wrote: > > > Chris, > > > > After disabling the buffer on source, kind shut down all the nodes of > > source cluster first and then start them again. The tlogs will be removed > > accordingly. BTW CDCR doesn't abide by 100 numRecordsToKeep or 10 > numTlogs. > > > > Amrit Sarkar > > Search Engineer > > Lucidworks, Inc. > > 415-589-9269 > > www.lucidworks.com > > Twitter http://twitter.com/lucidworks > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > Medium: https://medium.com/@sarkaramrit2 > > > > On Tue, Apr 17, 2018 at 8:58 PM, Susheel Kumar > > wrote: > > > > > DISABLEBUFFER on source cluster would solve this problem. > > > > > > On Tue, Apr 17, 2018 at 9:29 AM, Chris Troullis > > > wrote: > > > > > > > Hi, > > > > > > > > We are attempting to use CDCR with solr 7.2.1 and are experiencing > odd > > > > behavior with transaction logs. My understanding is that by default, > > solr > > > > will keep a maximum of 10 tlog files or 100 records in the tlogs. I > > > assume > > > > that with CDCR, the records will not be removed from the tlogs until > it > > > has > > > > been confirmed that they have been replicated to the other cluster. > > > > However, even when replication has finished and the CDCR queue sizes > > are > > > 0, > > > > we are still seeing large numbers (50+) and large sizes (over a GB) > of > > > > tlogs sitting on the nodes. > > > > > > > > We are hard committing once per minute. > > > > > > > > Doing a lot of reading on the mailing list, I see that a lot of > people > > > were > > > > pointing to buffering being enabled as the cause for some of these > > > > transaction log issues. However, we have disabled buffering on both > the > > > > source and target clusters, and are still seeing the issues. > > > > > > > > Also, while some of our indexes replicate very rapidly (millions of > > > > documents in minutes), other smaller indexes are crawling. If we > > restart > > > > CDCR on the nodes then it finishes almost instantly. > > > > > > > > Any thoughts on these behaviors? > > > > > > > > Thanks, > > > > > > > > Chris > > > > > > > > > >
Re: CDCR broken for Mixed Replica Collections
Webster, I have patch uploaded to both Cdcr supporting Tlog: https://issues.apache.org/jira/browse/SOLR-12057 and core not getting failed while initializing for Pull type replicas: https://issues.apache.org/jira/browse/SOLR-12071 and awaiting feedback from open source community. The solution for pull type replicas can be designed better, apart from that, if this is urgent need for you, please apply the patches for your packages and probably give a shot. I will added extensive tests for both the use-cases. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Thu, Apr 26, 2018 at 2:46 AM, Erick Erickson wrote: > CDCR won't really ever make sense for PULL replicas since the PULL > replicas have no tlog and don't do any indexing and can't ever become > a leader seamlessly. > > As for plans to address TLOG replicas, patches are welcome if you have > a need. That's really how open source works, people add functionality > as they have use-cases they need to support and contribute them back. > So far this isn't a high-demand topic. > > Best, > Erick > > On Wed, Apr 25, 2018 at 8:03 AM, Webster Homer > wrote: > > I was looking at SOLR-12057 > > > > According to the comment on the ticket, CDCR can not work when a > collection > > has PULL Replicas. That seems like a MAJOR limitation to CDCR and PULL > > Replicas. Is this likely to be addressed in the future? > > CDCR currently is broken for TLOG replicas too. > > > > https://issues.apache.org/jira/browse/SOLR-12057? > focusedCommentId=16391558&page=com.atlassian.jira. > plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16391558 > > > > Thanks > > > > -- > > > > > > This message and any attachment are confidential and may be > > privileged or > > otherwise protected from disclosure. If you are not the intended > > recipient, > > you must not copy this message or attachment or disclose the > > contents to > > any other person. If you have received this transmission in error, > > please > > notify the sender immediately and delete the message and any attachment > > > > from your system. Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do > > not accept liability for any omissions or errors in this > > message which may > > arise as a result of E-Mail-transmission or for damages > > resulting from any > > unauthorized changes of the content of this message and > > any attachment thereto. > > Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do not guarantee > > that this message is free of viruses and does > > not accept liability for any > > damages caused by any virus transmitted > > therewith. > > > > > > > > Click http://www.emdgroup.com/disclaimer > > <http://www.emdgroup.com/disclaimer> to access the > > German, French, Spanish > > and Portuguese versions of this disclaimer. >
Re: CDCR broken for Mixed Replica Collections
Pardon, * I have added extensive tests for both the use-cases. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Thu, Apr 26, 2018 at 3:50 AM, Amrit Sarkar wrote: > Webster, > > I have patch uploaded to both Cdcr supporting Tlog: https://issues.apache. > org/jira/browse/SOLR-12057 and core not getting failed while initializing > for Pull type replicas: https://issues.apache.org/jira/browse/SOLR-12071 > and awaiting feedback from open source community. The solution for pull > type replicas can be designed better, apart from that, if this is urgent > need for you, please apply the patches for your packages and probably give > a shot. I will added extensive tests for both the use-cases. > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com/lucidworks > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > Medium: https://medium.com/@sarkaramrit2 > > On Thu, Apr 26, 2018 at 2:46 AM, Erick Erickson > wrote: > >> CDCR won't really ever make sense for PULL replicas since the PULL >> replicas have no tlog and don't do any indexing and can't ever become >> a leader seamlessly. >> >> As for plans to address TLOG replicas, patches are welcome if you have >> a need. That's really how open source works, people add functionality >> as they have use-cases they need to support and contribute them back. >> So far this isn't a high-demand topic. >> >> Best, >> Erick >> >> On Wed, Apr 25, 2018 at 8:03 AM, Webster Homer >> wrote: >> > I was looking at SOLR-12057 >> > >> > According to the comment on the ticket, CDCR can not work when a >> collection >> > has PULL Replicas. That seems like a MAJOR limitation to CDCR and PULL >> > Replicas. Is this likely to be addressed in the future? >> > CDCR currently is broken for TLOG replicas too. >> > >> > https://issues.apache.org/jira/browse/SOLR-12057?focusedComm >> entId=16391558&page=com.atlassian.jira.plugin.system. >> issuetabpanels%3Acomment-tabpanel#comment-16391558 >> > >> > Thanks >> > >> > -- >> > >> > >> > This message and any attachment are confidential and may be >> > privileged or >> > otherwise protected from disclosure. If you are not the intended >> > recipient, >> > you must not copy this message or attachment or disclose the >> > contents to >> > any other person. If you have received this transmission in error, >> > please >> > notify the sender immediately and delete the message and any attachment >> > >> > from your system. Merck KGaA, Darmstadt, Germany and any of its >> > subsidiaries do >> > not accept liability for any omissions or errors in this >> > message which may >> > arise as a result of E-Mail-transmission or for damages >> > resulting from any >> > unauthorized changes of the content of this message and >> > any attachment thereto. >> > Merck KGaA, Darmstadt, Germany and any of its >> > subsidiaries do not guarantee >> > that this message is free of viruses and does >> > not accept liability for any >> > damages caused by any virus transmitted >> > therewith. >> > >> > >> > >> > Click http://www.emdgroup.com/disclaimer >> > <http://www.emdgroup.com/disclaimer> to access the >> > German, French, Spanish >> > and Portuguese versions of this disclaimer. >> > >
Re: CDCR traffic
Hi Rajeswari, No, it is not. The source forwards the update to the target in the classic manner. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Fri, Jun 22, 2018 at 11:38 PM, Natarajan, Rajeswari < rajeswari.natara...@sap.com> wrote: > Hi, > > Would like to know , if the CDCR traffic is encrypted. > > Thanks > Ra >
Re: tlogs not deleting
Brian, If you are still facing the issue after disabling buffer, kindly shut down all the nodes at source and then start them again, stale tlogs will start purging themselves. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Wed, Jun 20, 2018 at 8:15 PM, Susheel Kumar wrote: > Not in my knowledge. Please double check or wait for some time but after > DISABLEBUFFER on source, your logs should start rolling and its the exact > same issue I have faced with 6.6 which you resolve by DISABLEBUFFER. > > On Tue, Jun 19, 2018 at 1:39 PM, Brian Yee wrote: > > > Does anyone have any additional possible causes for this issue? I checked > > the buffer status using "/cdcr?action=STATUS" and it says buffer disabled > > at both target and source. > > > > -Original Message- > > From: Erick Erickson [mailto:erickerick...@gmail.com] > > Sent: Tuesday, June 19, 2018 11:55 AM > > To: solr-user > > Subject: Re: tlogs not deleting > > > > bq. Do you recommend disabling the buffer on the source SolrCloud as > well? > > > > Disable them all on both source and target IMO. > > > > On Tue, Jun 19, 2018 at 8:50 AM, Brian Yee wrote: > > > Thank you Erick. I am running Solr 6.6. From the documentation: > > > "Replicas do not need to buffer updates, and it is recommended to > > disable buffer on the target SolrCloud." > > > > > > Do you recommend disabling the buffer on the source SolrCloud as well? > > It looks like I already have the buffer disabled at target locations but > > not the source location. Would it even make sense at the source location? > > > > > > This is what I have at the target locations: > > > > > > > > > 100 > > > > > > > > > disabled > > > > > > > > > > > > > > > -Original Message- > > > From: Erick Erickson [mailto:erickerick...@gmail.com] > > > Sent: Tuesday, June 19, 2018 11:00 AM > > > To: solr-user > > > Subject: Re: tlogs not deleting > > > > > > Take a look at the CDCR section of your reference guide, be sure you > get > > the version which you can download from here: > > > https://archive.apache.org/dist/lucene/solr/ref-guide/ > > > > > > There's the CDCR API call you can use for in-flight disabling, and > > depending on the version of Solr you can set it in solrconfig. > > > > > > Basically, buffering was there in the original CDCR to allow a larger > > maintenance window, you could enable buffering and all updates were saved > > until you disabled it, during which period you could do whatever you > needed > > with your target cluster and not lose any updates. > > > > > > Later versions can do the full sync of the index and buffering is being > > removed. > > > > > > Best, > > > Erick > > > > > > On Tue, Jun 19, 2018 at 7:31 AM, Brian Yee wrote: > > >> Thanks for the suggestion. Can you please elaborate a little bit about > > what DISABLEBUFFER does? The documentation is not very detailed. Is this > > something that needs to be done manually whenever this problem happens or > > is it something that we can do to fix it so it won't happen again? > > >> > > >> -Original Message- > > >> From: Susheel Kumar [mailto:susheel2...@gmail.com] > > >> Sent: Monday, June 18, 2018 9:12 PM > > >> To: solr-user@lucene.apache.org > > >> Subject: Re: tlogs not deleting > > >> > > >> You may have to DISABLEBUFFER in source to get rid of tlogs. 
> > >> > > >> On Mon, Jun 18, 2018 at 6:13 PM, Brian Yee wrote: > > >> > > >>> So I've read a bunch of stuff on hard/soft commits and tlogs. As I > > >>> understand, after a hard commit, solr is supposed to delete old > > >>> tlogs depending on the numRecordsToKeep and maxNumLogsToKeep values > > >>> in the autocommit settings in solrconfig.xml. I am occasionally > > >>> seeing solr fail to do this and the tlogs just build up over time > > >>> and eventually we run out of disk space on the VM and this causes > > problems for us. > > >>> This does not happen all the time, only sometimes. I currently have > > >>> a tlog directory that has 123G worth of tlogs. The last hard commit > > >>> on this node was 10 minutes ago but these tlogs date back to 3 days > > ago. > > >>> > > >>> We have sometimes found that restarting solr on the node will get it > > >>> to clean up the old tlogs, but we really want to find the root cause > > >>> and fix it if possible so we don't keep getting disk space alerts > > >>> and have to adhoc restart nodes. Has anyone seen an issue like this > > before? > > >>> > > >>> My update handler settings look like this: > > >>> > > >>> > > >>> > > >>> > > >>> ${solr.ulog.dir:} > > >>> ${solr.ulog.numVersionBuckets: > > >>> 65536} > > >>> > > >>> > > >>> 60 > > >>> 25 > > >>> false > > >>> > > >>> > > >>> 12 > > >>> > > >>> > > >>> > > >>> 100 > > >>> > > >>> > > >>> > > >>> > > >
Re: CDCR Custom Document Routing
Jay, Can you share a sample delete command you are firing at the source, so we can understand the issue with CDCR? On Tue, 3 Jul 2018, 4:22 am Jay Potharaju, wrote: > Hi > The current cdcr setup does not work if my collection uses implicit > routing. > In my testing i found that adding documents works without any problems. It > doesn't seem to work correctly when deleting documents. > Is there an alternative to cdcr that would work in cross data center > scenario. > > Setup: > 8 shards : 2 on each node > Solr:6.6.4 > > Thanks > Jay Potharaju >
Re: CDCR traffic
Hi, In the case of CDCR, assuming both the source and target clusters are SSL > enabled, can we say that the source clusters’ shard leaders act as clients > to the target cluster and hence the data is encrypted while its transmitted > between the clusters? Yes, that is correct. SSL and Kerberized cluster will have the payload/updates encrypted. Thank you for pointing it out. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Mon, Jul 9, 2018 at 3:50 PM, Greenhorn Techie wrote: > Amrit, > > Further to the below conversation: > > As I understand, Solr supports SSL encryption between nodes within a Solr > cluster and as well communications to and from clients. In the case of > CDCR, assuming both the source and target clusters are SSL enabled, can we > say that the source clusters’ shard leaders act as clients to the target > cluster and hence the data is encrypted while its transmitted between the > clusters? > > Thanks > > > On 25 June 2018 at 15:56:07, Amrit Sarkar (sarkaramr...@gmail.com) wrote: > > Hi Rajeswari, > > No it is not. Source forwards the update to the Target in classic manner. > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com/lucidworks > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > Medium: https://medium.com/@sarkaramrit2 > > On Fri, Jun 22, 2018 at 11:38 PM, Natarajan, Rajeswari < > rajeswari.natara...@sap.com> wrote: > > > Hi, > > > > Would like to know , if the CDCR traffic is encrypted. > > > > Thanks > > Ra > > > >
Anthill Inside and The Fifth Elephant Bengaluru India 2018 Edition
Anthill Inside and The Fifth Elephant -- HasGeek's marquee annual conferences -- bring together business decision makers, data engineers, architects, data scientists and product managers to explore the nuances of managing and leveraging data. What's more, Solr community members can get a 10% discount on conference tickets via these links:

Anthill Inside: https://anthillinside.in/2018/?code=SG65IC
The Fifth Elephant: https://fifthelephant.in/2018/?code=SG65IC

Both conferences are produced by the community, for the community. They cover the theoretical and practical applications of machine learning, deep learning and artificial intelligence, as well as data collection and the other implementation steps involved in building these systems.

Anthill Inside bridges the gap between research and industry, with speakers from both worlds in equal representation. Engage in nuanced, open discussions on topics ranging from privacy and ethics in AI to breaking down real-world systems into hubs and spokes. Hear about organizational questions such as what machine learning can and cannot do for your organization, as well as deeper technical issues such as how to build classification systems in the absence of large datasets. Anthill Inside: 25 July. Registration link with 10% discount: https://anthillinside.in/2018/?code=SG65IC

The Fifth Elephant 2018 focuses on applying techniques and data to build product features. Topics include:
1. Designing systems for data (hint: it's not only about the algorithms): a. How poor design can lower data quality, which in turn compromises the entire project: a case study on AADHAAR. b. How any data, even something as seemingly harmless as electoral data, can be weaponized against its users (voters, in this case).
2. Privacy issues with data: a. The right to be forgotten: problems for data systems. b. The right to privacy vs the right to information: the way forward.
3. Handling very-large-scale data systems.
4. Data visualization at scale, for example at Uber for self-driving cars.

Along with the talks at the venue, there are open discussions on privacy and open data, and workshops on Amazon SageMaker (26 July) and recommendations using TensorFlow (27 July). The Fifth Elephant: 26 and 27 July. Registration link with 10% discount: https://fifthelephant.in/2018/?code=SG65IC

For more details about any of these, write to i...@hasgeek.com or call 7676332020.

Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2
Re: SolrCloud CDCR issue
To the concerned, WARN : [c:collection_name s:shard2 r:core_node11 > x:collection_name_shard2_replica_n8] > org.apache.solr.handler.CdcrRequestHandler; The log reader for target > collection collection_name is not initialised @ collection_name:shard2 > This means the source cluster was started first and then the target. You need to shut down all the nodes at both the source and the target. Bring the target nodes up, all of them, before starting the source ones; the log readers will then be initialized properly. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Fri, Aug 3, 2018 at 11:33 PM cdatta wrote: > Any pointers? > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
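A rough outline of that restart order using the stock bin/solr scripts, with placeholder ZooKeeper addresses (not taken from the thread):

    # on every node in the TARGET data center first
    bin/solr stop -all
    bin/solr start -cloud -z target-zk1:2181,target-zk2:2181,target-zk3:2181

    # only once all target nodes are up, on every node in the SOURCE data center
    bin/solr stop -all
    bin/solr start -cloud -z source-zk1:2181,source-zk2:2181,source-zk3:2181

Starting the target side first gives the CDCR log readers something to register against, which is what the warning above reports as missing.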