RE: Time-out errors while indexing (Solr 7.7.1)
Hi Erick, Toke,

Can you please look at the details shared in my earlier email and respond with your suggestions/feedback?

Thanks & Regards,
Vinodh

From: Kommu, Vinodh K.
Sent: Monday, July 6, 2020 4:58 PM
To: solr-user@lucene.apache.org
Subject: RE: Time-out errors while indexing (Solr 7.7.1)

Thanks Erick & Toke for your response on this. Just wanted to correct a few things here about the number of docs:

Total number of documents in the entire cluster (all collections) = 6393876826 (6.3B)
Total number of documents in the 2 bigger collections (3749389864 & 1780147848) = 5529537712 (5.5B)
Total number of documents in the remaining collections = 864339114 (864M)

So all collections together do not hold 13B docs. As the numbers above show, the biggest collection in the cluster holds close to 3.7B docs and the second biggest holds up to 1.7B docs, whereas the remaining 20 collections hold only 864M docs, which brings the total in the cluster to 6.3B docs.

On the hardware side, the cluster sits on 6 Solr VMs. Each VM has 170G total memory (with 2 Solr instances running per VM) and 16 vCPUs, and each Solr JVM runs with a 31G heap. The remaining memory is left for the OS disk cache and other OS operations. vm.swappiness on each VM is set to 0, so swap will never be used.

Each collection is created using the rule-based replica placement API with 6 shards and a replication factor of 3.

One other observation concerns core placement. As mentioned above, we create collections using rule-based replica placement, i.e. a rule ensuring that no two replicas of the same shard sit on the same VM, using the following command:

curl -s -k user:password "https://localhost:22010/solr/admin/collections?action=CREATE&name=$SOLR_COLLECTION&numShards=${SHARDS_NO?}&replicationFactor=${REPLICATION_FACTOR?}&maxShardsPerNode=${MAX_SHARDS_PER_NODE?}&collection.configName=$SOLR_COLLECTION&rule=shard:*,replica:<2,host:*"

Variable values in the above command:
SOLR_COLLECTION = collection name
SHARDS_NO = 6
REPLICATION_FACTOR = 3
MAX_SHARDS_PER_NODE = computed from the number of Solr VMs, nodes per VM and total number of replicas, i.e. total number of replicas / number of VMs. In this cluster that is 18/6 = 3 max shards per machine.

Ideally this should create 3 cores per VM for each collection, but as the snippet below shows, each VM ended up with 2, 3 or 4 cores per collection. VM2 and VM6 apparently have more cores than the other VMs, so I presume this could be one reason they see more IO operations than the remaining 4 VMs. That said, I believe Solr also considers other factors, such as free disk on each VM, when placing replicas for a new collection, correct? If so, is this replica placement across the VMs fine? If not, what is needed to correct it? Can an additional core of ~210G create noticeably more disk IO? If yes, would moving that additional core to a VM with fewer cores make any difference (i.e. ensuring each VM has at most 3 shards)?

We have also been noticing a significant surge in IO operations at the storage level. I am wondering whether an IOPS limit on the storage could make Solr starve for IO, or whether it is the other way around, i.e. Solr issuing more read/write operations and pushing the storage IOPS to its upper limit?
VM1:
176G node1/solr/Collection2_shard5_replica_n30
176G node2/solr/Collection2_shard2_replica_n24
176G node2/solr/Collection2_shard3_replica_n2
177G node1/solr/Collection2_shard6_replica_n10
208G node1/solr/Collection1_shard5_replica_n18
208G node2/solr/Collection1_shard2_replica_n1
1.1T total

VM2:
176G node2/solr/Collection2_shard4_replica_n16
176G node2/solr/Collection2_shard6_replica_n34
177G node1/solr/Collection2_shard5_replica_n6
207G node2/solr/Collection1_shard6_replica_n10
208G node1/solr/Collection1_shard1_replica_n32
208G node2/solr/Collection1_shard5_replica_n30
210G node1/solr/Collection1_shard3_replica_n14
1.4T total

VM3:
175G node2/solr/Collection2_shard2_replica_n12
177G node1/solr/Collection2_shard1_replica_n20
208G node1/solr/Collection1_shard1_replica_n8
208G node2/solr/Collection1_shard2_replica_n12
209G node1/solr/Collection1_shard4_replica_n28
976G total

VM4:
176G node1/solr/Collection2_shard4_replica_n28
177G node1/solr/Collection2_shard1_replica_n8
207G node2/solr/Collection1_shard6_replica_n22
208G node1/solr/Collection1_shard5_replica_n6
210G node1/solr/Collection1_shard3_replica_n26
975G total

VM5:
176G node2/solr/Collection2_shard3_replica_n14
177G node1/solr/Collection2_shard5_replica_n18
177G node2/solr/Collection2_shard1_replica_n32
208G node1/solr/Collection1_shard2_replica_n24
210G node1/solr/Collection1_shard
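Regarding moving the extra core to a lighter VM: on Solr 7.x the Collections API provides a MOVEREPLICA command for this. A rough sketch only, reusing the port and credential placeholders from the CREATE call above; the collection, replica and target node names below are hypothetical and would have to be taken from your own CLUSTERSTATUS output:

# inspect current placement to find the replica name (core_nodeNN) and the target node
curl -s -k -u user:password "https://localhost:22010/solr/admin/collections?action=CLUSTERSTATUS&collection=Collection1"

# move one replica from an overloaded VM to a VM with fewer cores (names are placeholders)
curl -s -k -u user:password "https://localhost:22010/solr/admin/collections?action=MOVEREPLICA&collection=Collection1&replica=core_node14&targetNode=vm4-host:22010_solr"

The move first copies the replica to the target node and then deletes the source copy, so expect some extra disk and network IO while it runs.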
Re: Replica goes into recovery mode in Solr 6.1.0
Is anyone looking at my issue? Please guide me.

Regards,
Vishal Patel

From: vishal patel
Sent: Monday, July 6, 2020 7:11 PM
To: solr-user@lucene.apache.org
Subject: Replica goes into recovery mode in Solr 6.1.0

I am using Solr version 6.1.0, Java 8 and G1GC in production. We have 2 shards and each shard has 1 replica. We have 3 collections. We do not use any caches; they are disabled in solrconfig.xml. Search and update requests come in frequently on our live platform.

*Our commit configuration in solrconfig is below:
60 2 false ${solr.autoSoftCommit.maxTime:-1}

*We use Near Real Time searching, so we set the following in solr.in.cmd:
set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100

*Our collection details are below:

Collection    Shard1 (docs / GB)    Shard1 Replica (docs / GB)    Shard2 (docs / GB)    Shard2 Replica (docs / GB)
collection1   26913364 / 201        26913379 / 202                26913380 / 198        26913379 / 198
collection2   13934360 / 310        13934367 / 310                13934368 / 219        13934367 / 219
collection3   351539689 / 73.5      351540040 / 73.5              351540136 / 75.2      351539722 / 75.2

*My server configurations are below:

                                           Server1              Server2
CPU                                        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 MHz, 10 Core(s), 20 Logical Processor(s) (same on both servers)
HardDisk (GB)                              3845 (3.84 TB)       3485 (3.48 TB)
Total memory (GB)                          320                  320
Shard1 allocated memory (GB)               55                   -
Shard2 Replica allocated memory (GB)       55                   -
Shard2 allocated memory (GB)               -                    55
Shard1 Replica allocated memory (GB)       -                    55
Other applications allocated memory (GB)   60                   22
Number of other applications               11                   7

Sometimes one of the replicas goes into recovery mode. Why does a replica go into recovery? Due to heavy search, heavy update/insert, or long GC pause times? If it is one of these, what should we change in the configuration? Should we add shards to address the recovery issue?

Regards,
Vishal Patel
LTR feature computation caching
Hi, I am adding a few features to my LTR model that reuse the same value across features. For example, I have features that compare different similarities of each document with the input text: "token1 token2 token3 token4"

My features are:
- No of common terms
- No of common terms / Term count in document
- Term count in document - No of common terms
- 4 - No of common terms
- Boolean feature: Is no of common terms == 3

As you can see, "No of common terms" is recomputed for each feature. The feature cache caches values per feature and isn't helpful here. Is there any way to compute "No of common terms" only once per document and share it across all features for that document?
Max number of documents in update request
Hi, Could someone help me with the best way to go about determining the maximum number of docs I can send in a single update call to Solr in a master / slave architecture. Thanks!
Re: Max number of documents in update request
As many as you can send before blowing up. Really, the question is not answerable. 1K docs? 1G docs? 1 field or 500? And I don’t think it’s a good use of time to pursue much. See: https://lucidworks.com/post/really-batch-updates-solr-2/ If you’re looking at trying to maximize throughput, adding client threads that send Solr documents is a better approach. All that said, I usually just pick 1,000 and don’t worry about it. Best, Erick > On Jul 7, 2020, at 8:59 AM, Sidharth Negi wrote: > > Hi, > > Could someone help me with the best way to go about determining the maximum > number of docs I can send in a single update call to Solr in a master / > slave architecture. > > Thanks!
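For what it's worth, a batch is just one /update request carrying many documents. A minimal sketch with made-up field names, assuming a collection called "mycollection" and the default JSON update handler; the batch size (here 2 docs) would be whatever you settle on, e.g. 1,000:

# send a batch of documents in a single request; run several such clients in parallel for throughput
curl -s -X POST 'http://localhost:8983/solr/mycollection/update?commit=false' \
  -H 'Content-Type: application/json' \
  --data-binary '[
    {"id": "doc-1", "title_s": "first document"},
    {"id": "doc-2", "title_s": "second document"}
  ]'

Leave commits to the server-side autoCommit/autoSoftCommit settings rather than committing per batch.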
Re: Null pointer exception in QueryComponent.MergeDds method
8.3.1 the field "id" is for nested document. On Mon, Jul 6, 2020 at 4:17 PM Mikhail Khludnev wrote: > Hi, > What's the version? What's uniqueKey? is it stored? what's fl param? > > On Mon, Jul 6, 2020 at 5:12 PM Jae Joo wrote: > > > I am seeing the nullPointerException in the list below and I am > > looking for how to fix the exception. > > > > Thanks, > > > > > > NamedList sortFieldValues = > > (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values")); > > if (sortFieldValues.size()==0 && // we bypass merging this response > > only if it's partial itself > > thisResponseIsPartial) { // but not the previous > one!! > > continue; //fsv timeout yields empty sort_vlaues > > } > > > > > > > > 2020-07-06 12:45:47.001 ERROR (qtp745962066-636182) [c:]] > > o.a.s.h.RequestHandlerBase java.lang.NullPointerException > > at > > > > > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:914) > > at > > > > > org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613) > > at > > > > > org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592) > > at > > > > > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431) > > at > > > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198) > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576) > > at > > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799) > > at > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578) > > at > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419) > > at > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351) > > at > > > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) > > at > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > > at > > > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) > > at > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > at > > > > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > > at > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) > > at > > > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711) > > at > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > > > > > -- > Sincerely yours > Mikhail Khludnev >
Re: Max number of documents in update request
Agreed, I do something between 20 and 1000. If the master node is not handling any search traffic, use twice as many client threads as there are CPUs in the node. That should get you close to 100% CPU utilization. One thread will be waiting while a batch is being processed and another thread will be sending the next batch so there is no pause in processing. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 7, 2020, at 6:12 AM, Erick Erickson wrote: > > As many as you can send before blowing up. > > Really, the question is not answerable. 1K docs? 1G docs? 1 field or 500? > > And I don’t think it’s a good use of time to pursue much. See: > > https://lucidworks.com/post/really-batch-updates-solr-2/ > > If you’re looking at trying to maximize throughput, adding > client threads that send Solr documents is a better approach. > > All that said, I usually just pick 1,000 and don’t worry about it. > > Best, > Erick > >> On Jul 7, 2020, at 8:59 AM, Sidharth Negi wrote: >> >> Hi, >> >> Could someone help me with the best way to go about determining the maximum >> number of docs I can send in a single update call to Solr in a master / >> slave architecture. >> >> Thanks! >
Re: Replica goes into recovery mode in Solr 6.1.0
This isn’t a support list, so nobody looks at issues. We do try to help. It looks like you have 1 TB of index on a system with 320 GB of RAM. I don’t know what "Shard1 Allocated memory” is, but maybe half of that RAM is used by JVMs or some other process, I guess. Are you running multiple huge JVMs? The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time. "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms). That is probably causing your outages. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 7, 2020, at 5:18 AM, vishal patel > wrote: > > Any one is looking my issue? Please guide me. > > Regards, > Vishal Patel > > > > From: vishal patel > Sent: Monday, July 6, 2020 7:11 PM > To: solr-user@lucene.apache.org > Subject: Replica goes into recovery mode in Solr 6.1.0 > > I am using Solr version 6.1.0, Java 8 version and G1GC on production. We have > 2 shards and each shard has 1 replica. We have 3 collection. > We do not use any cache and also disable in Solr config.xml. Search and > Update requests are coming frequently in our live platform. > > *Our commit configuration in solr.config are below > > 60 > 2 > false > > > ${solr.autoSoftCommit.maxTime:-1} > > > *We used Near Real Time Searching So we did below configuration in solr.in.cmd > set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100 > > *Our collections details are below: > > Collection Shard1 Shard1 Replica Shard2 Shard2 Replica > Number of Documents Size(GB)Number of Documents Size(GB) > Number of Documents Size(GB)Number of Documents Size(GB) > collection1 26913364201 26913379202 26913380 > 198 26913379198 > collection2 13934360310 13934367310 13934368 > 219 13934367219 > collection3 351539689 73.5351540040 73.5351540136 > 75.2351539722 75.2 > > *My server configurations are below: > >Server1 Server2 > CPU Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 > Logical Processor(s)Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 > Mhz, 10 Core(s), 20 Logical Processor(s) > HardDisk(GB)3845 ( 3.84 TB) 3485 GB (3.48 TB) > Total memory(GB)320 320 > Shard1 Allocated memory(GB) 55 > Shard2 Replica Allocated memory(GB) 55 > Shard2 Allocated memory(GB) 55 > Shard1 Replica Allocated memory(GB) 55 > Other Applications Allocated Memory(GB) 60 22 > Other Number Of Applications11 7 > > > Sometimes, any one replica goes into recovery mode. Why replica goes into > recovery? Due to heavy search OR heavy update/insert OR long GC pause time? > If any one of them then what should we do in configuration? > Should we increase the shard for recovery issue? > > Regards, > Vishal Patel >
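A side note on the 100 ms soft commit: the interval can be raised either through the -Dsolr.autoSoftCommit.maxTime property already in use, or at runtime through the Config API, which exposes updateHandler.autoSoftCommit.maxTime as an editable property. A sketch only, assuming collection1 and a 10-second interval; pick whatever your NRT requirement can actually tolerate:

curl -s -X POST 'http://localhost:8983/solr/collection1/config' \
  -H 'Content-Type: application/json' \
  -d '{"set-property": {"updateHandler.autoSoftCommit.maxTime": 10000}}'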
Solr Query
Hi,

I have a URL that I want to break down and run in the admin console, but I am not sure what ++ and - represent in the query.

select?q=(StartPublish%3a%5b*+TO+-12-31T23%3a59%3a59.999Z%5d++-Content%3a(Birthdays%5c%2fAnniversaries))++-FriendlyUrl%3a(*%2farchive%2f*))++((Title_NGram%3a(swetha))%5e500+OR+(MetaTitle_NGram%3a(swetha))%5e400+OR+(MetaKeywords_NGram%3a(swetha))%5e300+OR+(MetaDescription_NGram%3a(swetha))%5e200+OR+(Content_NGram%3a(swetha))%5e1))++(ACL%3a((Everyone)+OR+(MIDCO410%5c%5cMidco%5c-AllEmployees)+OR+(MIDCO410%5c%5cMidco%5c-DotNetDevelopers)+OR+(MIDCO410%5c%5cMidco%5c-WebAdmins)+OR+(MIDCO410%5c%5cMidco%5c-Source%5c-Admin)&start=0&rows=1&wt=xml&version=2.2

Thank You,
Swetha.
Re: replica deleted but directory remains
Hi Erick,

I also have an issue where collections or replicas are deleted but the data remains in the directory. They no longer show in the admin UI, but the data is still in the folder and the disk space is not reclaimed. I am not seeing any specific error message; could you please advise on other possible reasons and how to fix this?

Regards,
Chien
Re: Null pointer exception in QueryComponent.MergeDds method
Still not clear regarding fl param. Does request enabled timeAllowed param? Anyway debugQuery true should give a clue why "sort_values" are absent in shard response, note they should be supplied at QueryComponent.doFieldSortValues(ResponseBuilder, SolrIndexSearcher). On Tue, Jul 7, 2020 at 4:19 PM Jae Joo wrote: > 8.3.1 > > required="true" multiValued="false" docValues="true"/> > required="true" multiValued="false"/> > > the field "id" is for nested document. > > > > > On Mon, Jul 6, 2020 at 4:17 PM Mikhail Khludnev wrote: > > > Hi, > > What's the version? What's uniqueKey? is it stored? what's fl param? > > > > On Mon, Jul 6, 2020 at 5:12 PM Jae Joo wrote: > > > > > I am seeing the nullPointerException in the list below and I am > > > looking for how to fix the exception. > > > > > > Thanks, > > > > > > > > > NamedList sortFieldValues = > > > (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values")); > > > if (sortFieldValues.size()==0 && // we bypass merging this response > > > only if it's partial itself > > > thisResponseIsPartial) { // but not the previous > > one!! > > > continue; //fsv timeout yields empty sort_vlaues > > > } > > > > > > > > > > > > 2020-07-06 12:45:47.001 ERROR (qtp745962066-636182) [c:]] > > > o.a.s.h.RequestHandlerBase java.lang.NullPointerException > > > at > > > > > > > > > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:914) > > > at > > > > > > > > > org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613) > > > at > > > > > > > > > org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592) > > > at > > > > > > > > > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431) > > > at > > > > > > > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198) > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576) > > > at > > > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799) > > > at > > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578) > > > at > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419) > > > at > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351) > > > at > > > > > > > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) > > > at > > > > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > > > at > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) > > > at > > > > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > > at > > > > > > > > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > > > at > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) > > > at > > > > > > > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711) > > > at > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > > > > > > > > > -- > > Sincerely yours > > Mikhail Khludnev > > > -- Sincerely yours Mikhail Khludnev
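A sketch of the kind of request that would show this, with a hypothetical collection name and sort field; the idea is to re-run the failing distributed query with debugQuery and shards.info so the per-shard sort_values (or their absence under timeAllowed) become visible:

curl -s 'http://localhost:8983/solr/mycollection/select?q=*:*&sort=id+asc&fl=id&timeAllowed=2000&debugQuery=true&shards.info=true&wt=json'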
Re: Solr Query
Hi Swetha,

The given URL is URL-encoded, so you should decode it before analyzing it. The plus character encodes whitespace in a URL, and the minus sign marks a negative (excluding) clause in a Solr query.

Kind Regards,
Furkan KAMACI

On Tue, Jul 7, 2020 at 9:16 PM swetha vemula wrote:
> Hi,
>
> I have a URL that I want to break down and run in the admin console, but I am
> not sure what ++ and - represent in the query.
>
> select?q=(StartPublish%3a%5b*+TO+-12-31T23%3a59%3a59.999Z%5d++-Content%3a(Birthdays%5c%2fAnniversaries))++-FriendlyUrl%3a(*%2farchive%2f*))++((Title_NGram%3a(swetha))%5e500+OR+(MetaTitle_NGram%3a(swetha))%5e400+OR+(MetaKeywords_NGram%3a(swetha))%5e300+OR+(MetaDescription_NGram%3a(swetha))%5e200+OR+(Content_NGram%3a(swetha))%5e1))++(ACL%3a((Everyone)+OR+(MIDCO410%5c%5cMidco%5c-AllEmployees)+OR+(MIDCO410%5c%5cMidco%5c-DotNetDevelopers)+OR+(MIDCO410%5c%5cMidco%5c-WebAdmins)+OR+(MIDCO410%5c%5cMidco%5c-Source%5c-Admin)&start=0&rows=1&wt=xml&version=2.2
>
> Thank You,
> Swetha.
>
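A quick way to decode such a URL before pasting it into the admin console (just a sketch using Python's standard library; any URL decoder will do, and the '...' stands for the rest of the encoded string):

echo '(StartPublish%3a%5b*+TO+-12-31T23%3a59%3a59.999Z%5d++-Content%3a(Birthdays%5c%2fAnniversaries))...' \
  | python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote_plus(sys.stdin.read()))'

After decoding, %3a becomes ':', %5b and %5d become '[' and ']', %5c is the '\' escape, %2f is '/', %5e is the '^' boost, and each '+' is a space.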
Re: Null pointer exception in QueryComponent.MergeDds method
Yes, we have timeAllowed=2 sec. On Tue, Jul 7, 2020 at 2:20 PM Mikhail Khludnev wrote: > Still not clear regarding fl param. Does request enabled timeAllowed param? > Anyway debugQuery true should give a clue why "sort_values" are absent in > shard response, note they should be supplied at > QueryComponent.doFieldSortValues(ResponseBuilder, SolrIndexSearcher). > > On Tue, Jul 7, 2020 at 4:19 PM Jae Joo wrote: > > > 8.3.1 > > > > > required="true" multiValued="false" docValues="true"/> > > > required="true" multiValued="false"/> > > > > the field "id" is for nested document. > > > > > > > > > > On Mon, Jul 6, 2020 at 4:17 PM Mikhail Khludnev wrote: > > > > > Hi, > > > What's the version? What's uniqueKey? is it stored? what's fl param? > > > > > > On Mon, Jul 6, 2020 at 5:12 PM Jae Joo wrote: > > > > > > > I am seeing the nullPointerException in the list below and I am > > > > looking for how to fix the exception. > > > > > > > > Thanks, > > > > > > > > > > > > NamedList sortFieldValues = > > > > (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values")); > > > > if (sortFieldValues.size()==0 && // we bypass merging this response > > > > only if it's partial itself > > > > thisResponseIsPartial) { // but not the previous > > > one!! > > > > continue; //fsv timeout yields empty sort_vlaues > > > > } > > > > > > > > > > > > > > > > 2020-07-06 12:45:47.001 ERROR (qtp745962066-636182) [c:]] > > > > o.a.s.h.RequestHandlerBase java.lang.NullPointerException > > > > at > > > > > > > > > > > > > > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:914) > > > > at > > > > > > > > > > > > > > org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613) > > > > at > > > > > > > > > > > > > > org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592) > > > > at > > > > > > > > > > > > > > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431) > > > > at > > > > > > > > > > > > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198) > > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576) > > > > at > > > > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799) > > > > at > > > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578) > > > > at > > > > > > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419) > > > > at > > > > > > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) > > > > at > > > > > > > > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) > > > > at > > > > > > > > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > > > > > 
> > > > > > > > -- > > > Sincerely yours > > > Mikhail Khludnev > > > > > > > > -- > Sincerely yours > Mikhail Khludnev >
Re: Max number of documents in update request
Thanks. This was useful, really appreciate it! :) On Tue, Jul 7, 2020, 8:07 PM Walter Underwood wrote: > Agreed, I do something between 20 and 1000. If the master node is not > handling any search traffic, use twice as many client threads as there are > CPUs in the node. That should get you close to 100% CPU utilization. > One thread will be waiting while a batch is being processed and another > thread will be sending the next batch so there is no pause in processing. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > On Jul 7, 2020, at 6:12 AM, Erick Erickson > wrote: > > > > As many as you can send before blowing up. > > > > Really, the question is not answerable. 1K docs? 1G docs? 1 field or 500? > > > > And I don’t think it’s a good use of time to pursue much. See: > > > > https://lucidworks.com/post/really-batch-updates-solr-2/ > > > > If you’re looking at trying to maximize throughput, adding > > client threads that send Solr documents is a better approach. > > > > All that said, I usually just pick 1,000 and don’t worry about it. > > > > Best, > > Erick > > > >> On Jul 7, 2020, at 8:59 AM, Sidharth Negi > wrote: > >> > >> Hi, > >> > >> Could someone help me with the best way to go about determining the > maximum > >> number of docs I can send in a single update call to Solr in a master / > >> slave architecture. > >> > >> Thanks! > > > >
Suggester.count parameter for different dictionaries
Hello,

we’re using different dictionaries with the suggester component for autocomplete, with a setup similar to the following in the request handler defaults: suggest = true, suggest.count = 10, suggest.dictionary = titles, with the suggester search component enabled.

Is there a way to specify different count options for different dictionaries? For example, I’d like to get suggestions for all authors (say up to 1000) but only 10 for titles and just one for abstracts. The reason for the 1000 authors is to present the number to the user, saying ‘your search matches xxx authors, click here to show all’, while at the same time showing the 10 most relevant titles and just one abstract.

Thanks!
—
Ing. Andrea Vettori
Responsabile Sistemi Informativi
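As far as I know, suggest.count applies to the whole request rather than to an individual dictionary, so one workaround is to issue one suggest request per dictionary, each with its own count. A sketch with made-up core and dictionary names matching the description above:

curl -s 'http://localhost:8983/solr/mycore/suggest?suggest=true&suggest.dictionary=authors&suggest.count=1000&suggest.q=jo'
curl -s 'http://localhost:8983/solr/mycore/suggest?suggest=true&suggest.dictionary=titles&suggest.count=10&suggest.q=jo'
curl -s 'http://localhost:8983/solr/mycore/suggest?suggest=true&suggest.dictionary=abstracts&suggest.count=1&suggest.q=jo'

Another option along the same lines is to define a separate request handler per dictionary, each with its own suggest.count default.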
solr query to return matched text to regex with default schema
Hi,

I want to search Solr for server names in a set of Microsoft Word documents, PDFs, and image files like jpg/gif. Server names are given by the regular expressions (regex):

INFP[a-zA-Z0-9]{3,9}
TRKP[a-zA-Z0-9]{3,9}
PLCP[a-zA-Z0-9]{3,9}
SQRP[a-zA-Z0-9]{3,9}

Problem
===
I want to get the text in the documents matching the regex, e.g. INFPWSV01, PLCPLDB01.

I've indexed the files using Solr/Tika/Tesseract with the default schema. I've used the highlight search tool with hl ticked and hl.usePhraseHighlighter ticked. Solr only returns the metadata (presumably), like the filename of the file containing the pattern(s).

Questions
=
1. Would I have to modify the managed schema?
2. If so, would I have to store the file content in the schema?
3. If so, is this the way to do it:

a. solrconfig.xml <- inside my "core"
true ignored_ _text_ ...

b. Remove the ignored_ line as I want the metadata

c. Change the managed schema so that _text_ is stored ("true"):

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "replace-field":{
    "name":"_text_",
    "type":"text_general",
    "multiValued":true,
    "indexed":true,
    "stored":true }
}' http://localhost:8983/api/cores/gettingstarted/schema
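For what it's worth, highlighting can only return matched text from a stored field, so _text_ (or a copy of the content) does need stored="true" and the documents re-indexed. Once that is done, the standard query parser accepts regular expressions between forward slashes. A rough sketch against the default _text_ field of the "gettingstarted" core used in the curl above; note the regex is matched against individual indexed terms, so a server name must survive tokenization as a single token, and the default text_general analysis lowercases terms, hence the lowercase pattern:

curl -sG 'http://localhost:8983/solr/gettingstarted/select' \
  --data-urlencode 'q=_text_:/infp[a-z0-9]{3,9}/' \
  --data-urlencode 'hl=true' \
  --data-urlencode 'hl.fl=_text_' \
  --data-urlencode 'rows=10'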
Re: Replica goes into recovery mode in Solr 6.1.0
Thanks for your reply.

One server has 320GB RAM in total. It runs 2 Solr nodes: one hosts shard1 and the other hosts the shard2 replica. Each Solr node has 55GB of memory allocated. shard1 has 585GB of data and the shard2 replica has 492GB, which means almost 1TB of data on this server. The server also runs other applications, which have 60GB of memory allocated, so about 150GB of memory is left. Properly formatted details: https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view

Are you running multiple huge JVMs?
>> Not huge, but 60GB of memory is allocated to our 11 applications. 150GB of memory is still free.

The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time.
>> Is there a chance of a replica going into recovery mode when IO reads/writes increase or get blocked?

"-Dsolr.autoSoftCommit.maxTime=100" is way too short (100 ms).
>> Our requirement is NRT, so we keep the time short.

Regards,
Vishal Patel

From: Walter Underwood
Sent: Tuesday, July 7, 2020 8:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Replica goes into recovery mode in Solr 6.1.0

This isn’t a support list, so nobody looks at issues. We do try to help.

It looks like you have 1 TB of index on a system with 320 GB of RAM. I don’t know what "Shard1 Allocated memory” is, but maybe half of that RAM is used by JVMs or some other process, I guess. Are you running multiple huge JVMs?

The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time.

"-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms). That is probably causing your outages.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jul 7, 2020, at 5:18 AM, vishal patel > wrote:
>
> Any one is looking my issue? Please guide me.
>
> Regards,
> Vishal Patel
>
>
>
> From: vishal patel
> Sent: Monday, July 6, 2020 7:11 PM
> To: solr-user@lucene.apache.org
> Subject: Replica goes into recovery mode in Solr 6.1.0
>
> I am using Solr version 6.1.0, Java 8 version and G1GC on production. We have
> 2 shards and each shard has 1 replica. We have 3 collection.
> We do not use any cache and also disable in Solr config.xml. Search and
> Update requests are coming frequently in our live platform.
> > *Our commit configuration in solr.config are below > > 60 > 2 > false > > > ${solr.autoSoftCommit.maxTime:-1} > > > *We used Near Real Time Searching So we did below configuration in solr.in.cmd > set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100 > > *Our collections details are below: > > Collection Shard1 Shard1 Replica Shard2 Shard2 Replica > Number of Documents Size(GB)Number of Documents Size(GB) > Number of Documents Size(GB)Number of Documents Size(GB) > collection1 26913364201 26913379202 26913380 > 198 26913379198 > collection2 13934360310 13934367310 13934368 > 219 13934367219 > collection3 351539689 73.5351540040 73.5351540136 > 75.2351539722 75.2 > > *My server configurations are below: > >Server1 Server2 > CPU Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 > Logical Processor(s)Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 > Mhz, 10 Core(s), 20 Logical Processor(s) > HardDisk(GB)3845 ( 3.84 TB) 3485 GB (3.48 TB) > Total memory(GB)320 320 > Shard1 Allocated memory(GB) 55 > Shard2 Replica Allocated memory(GB) 55 > Shard2 Allocated memory(GB) 55 > Shard1 Replica Allocated memory(GB) 55 > Other Applications Allocated Memory(GB) 60 22 > Other Number Of Applications11 7 > > > Sometimes, any one replica goes into recovery mode. Why replica goes into > recovery? Due to heavy search OR heavy update/insert OR long GC pause time? > If any one of them then what should we do in configuration? > Should we increase the shard for recovery issue? > > Regards, > Vishal Patel >
Solr multi word search across multiple fields with mm
Hi,

We observed that multi-word queries spanning multiple fields with mm create a problem. Any help would be appreciated.

Current problem: a search on words spanning different fields with minimum match (mm) and sow=false generates a field-centric query with per-field mm, rather than a term-centric query with mm applied across fields, when a field undergoes different query-time analysis (like multi-word synonyms, stop words, etc.).

Below are sample term-centric and field-centric queries:

*term centric query with the query string "amul cheese slice" (none of the terms has synonyms):*

"parsedquery_toString": "+description:amul)^6.0 | description_l2:amul | (description_l1:amul)^4.0 | (brand_name_h:amul)^8.0 | (manual_tags:amul)^3.0) ((description:cheese)^6.0 | description_l2:cheese | (description_l1:cheese)^4.0 | (brand_name_h:cheese)^8.0 | (manual_tags:cheese)^3.0) ((description:slice)^6.0 | description_l2:slice | (description_l1:slice)^4.0 | (brand_name_h:slice)^8.0 | (manual_tags:slice)^3.0))~2)",

*field centric query with the query string "amul cheese cake" (cake has a synonym of plum cake):*

"parsedquery_toString": "+(((description:amul description:cheese description:cake)~2)^6.0 | ((description_l2:amul description_l2:cheese (description_l2:cupcak description_l2:pastri (+description_l2:plum +description_l2:cake) description_l2:cake))~2) | ((description_l1:amul description_l1:cheese description_l1:cake)~2)^4.0 | ((brand_name_h:amul brand_name_h:cheese brand_name_h:cake)~2)^8.0 | ((manual_tags:amul manual_tags:cheese manual_tags:cake)~2)^3.0)",

Referring to multiple blogs, we tried the following:
1. autoGeneratePhraseQueries
2. per-field mm:
q=({!edismax qf=brand_name description v=$qx mm=2}^10 OR {!edismax qf=description_l1 manual_tags_l1 v=$qx mm=2} OR {!edismax qf=description_l2 v=$qx mm=2} )&qx=amul cheese cake

But we observed that these are still converted to field-centric queries with per-field mm, resulting in no match when the words span multiple fields.