Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)
Hi,

In addition to what others wrote already, there are a couple of things that might trigger a sudden memory allocation surge that you can't really account for:

1. Deep paging, especially in a sharded index. Don't allow it and you'll be much happier.
2. Faceting without docValues, especially in a large index.

These would be my top two things to check before anything else. I've gone from a 48 GB heap, with GC having massive trouble keeping up, down to an 8 GB heap with no trouble at all, just by getting rid of deep paging and using docValues on all faceted fields.

--Ere

yasoobhaider wrote on 3.10.2018 at 17.01:

Hi

I'm working with a Solr cluster with master-slave architecture.

Master and slave config:
ram: 120GB
cores: 16

At any point there are between 10-20 slaves in the cluster, each serving ~2k requests per minute. Each slave houses two collections of approx 10G (~2.5mil docs) and 2G (10mil docs) when optimized. I am working with Solr 6.2.1.

Solr configuration:

-XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:+ParallelRefProcEnabled -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:ConcGCThreads=4 -XX:MaxTenuringThreshold=8 -XX:ParallelGCThreads=4 -XX:PretenureSizeThreshold=64m -XX:SurvivorRatio=15 -XX:TargetSurvivorRatio=90 -Xmn10G -Xms80G -Xmx80G

Some of these settings were reached by trial and error over time, including the huge heap size.

This cluster usually runs without any error. In the usual scenario, old gen GC is triggered, per the configuration, at 50% old gen occupancy, and the collector clears out the memory over the next minute or so. This happens every 10-15 minutes.

However, I have noticed that sometimes the GC pattern of the slaves completely changes and old gen GC is not able to clear the memory. After observing the GC logs closely across multiple old gen collections, I noticed that old gen GC is triggered at 50% occupancy, but if there is a GC Allocation Failure before the collection completes (after CMS Initial Remark but before CMS reset), the collection is not able to clear much memory, and as soon as it completes, another old gen GC is triggered. In the worst case, this cycle of old gen GC triggering and GC allocation failures keeps repeating, the old gen memory keeps increasing, and it ends in a single-threaded STW GC, which is not able to do much, and I have to restart the Solr server.

The last time, this happened after the following sequence of events:

1. We optimized the bigger collection, bringing it to its optimized size of ~10G.
2. For an unrelated reason, we had stopped indexing to the master. We usually index at a low-ish throughput of ~1mil docs/day. This is relevant because when we are indexing, the size of the collection increases, and this affects the heap used by the collection.
3. The slaves started behaving erratically, with old gen collection not being able to free up the required memory and finally getting stuck in a STW GC.

As unlikely as this sounds, this is the only thing that changed on the cluster. There was no change in query throughput or type of queries. I restarted the slaves multiple times but the GC behaved in the same way for over three days.

Then, when we fixed the indexing and made it live again, the slaves resumed their original GC pattern and have been running without any issues for over 24 hours now.

I would really be grateful for any advice on the following:

1. What could be the reason behind CMS not being able to free up the memory? What are some experiments I can run to solve this problem?
2. Can stopping/starting indexing be a reason for such drastic changes to the GC pattern?
3. I have read in multiple places on this mailing list that the heap size should be much lower (2x-3x the size of the collection), but the last time I tried that, CMS was not able to run smoothly and a STW GC occurred which was only solved by a restart. My reasoning is that the type of queries and the throughput are also factors in deciding the heap size, so it may be that our queries are creating too many objects. Is my reasoning correct, or should I try a lower heap size (if it helps achieve a stable GC pattern)?
(4. Silly question, but what is the right way to ask a question on this mailing list: via mail or via the nabble website? I sent this question earlier as a mail, but it was not showing up on the nabble website, so I am posting it from the website now.)

- -

Logs which show this:

Desired survivor size 568413384 bytes, new threshold 2 (max 8) - age 1: 437184344 bytes, 4371843
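For readers hitting the same pattern, a minimal sketch of what Ere's second suggestion looks like in a schema — the field name here is only an example, not taken from the original post, and docValues only apply to documents indexed after the change, so existing fields need a reindex:

<field name="category" type="string" indexed="true" stored="true" docValues="true"/>

For the deep-paging point, the usual replacement is cursorMark: send cursorMark=* together with a sort that ends on the uniqueKey field, then pass the returned nextCursorMark on each following request instead of increasing start.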
Re: SPLITSHARD throwing OutOfMemory Error
Hi Atita, What is the amount of memory that you have in your system? And what is your index size? Regards, Edwin On Tue, 25 Sep 2018 at 22:39, Atita Arora wrote: > Hi, > > I am working on a test setup with Solr 6.1.0 cloud with 1 collection > sharded across 2 shards with no replication. When triggered a SPLITSHARD > command it throws "java.lang.OutOfMemoryError: Java heap space" everytime. > I tried this with multiple heap settings of 8, 12 & 20G but every time it > does create 2 sub-shards but then fails eventually. > I know the issue => https://jira.apache.org/jira/browse/SOLR-5214 has been > resolved but the trace looked very similar to this one. > Also just to ensure that I do not run into exceptions due to merge as > reported in this ticket, I also tried running optimize before proceeding > with splitting the shard. > I issued the following commands : > > 1. > > http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD > > This threw java.lang.OutOfMemoryError: Java heap space > > 2. > > http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD&async=1000 > > Then I ran with async=1000 and checked the status. Every time It's creating > the sub shards, but not splitting the index. > > Is there something that I am not doing correctly? > > Please guide. > > Thanks, > Atita >
Re: SPLITSHARD throwing OutOfMemory Error
Hi Edwin,

Thanks for following up on this. Here are the configs:

Memory - 30G (20G to Solr)
Disk - 1TB
Index - ~500G

I think the reason this could be happening is that during a shard split, the unsplit index and the split index both persist on the instance, and that may be causing this. I actually tried SPLITSHARD on another instance with an index size of 64G and it went through without any issues.

I would appreciate it if you have additional information to enlighten me on this issue.

Thanks again.

Regards,
Atita

On Thu, Oct 4, 2018 at 9:47 AM Zheng Lin Edwin Yeo wrote: > Hi Atita, > > What is the amount of memory that you have in your system? > And what is your index size? > > Regards, > Edwin > > On Tue, 25 Sep 2018 at 22:39, Atita Arora wrote: > > > Hi, > > > > I am working on a test setup with Solr 6.1.0 cloud with 1 collection > > sharded across 2 shards with no replication. When triggered a SPLITSHARD > > command it throws "java.lang.OutOfMemoryError: Java heap space" > everytime. > > I tried this with multiple heap settings of 8, 12 & 20G but every time it > > does create 2 sub-shards but then fails eventually. > > I know the issue => https://jira.apache.org/jira/browse/SOLR-5214 has > been > > resolved but the trace looked very similar to this one. > > Also just to ensure that I do not run into exceptions due to merge as > > reported in this ticket, I also tried running optimize before proceeding > > with splitting the shard. > > I issued the following commands : > > > > 1. > > > > > http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD > > > > This threw java.lang.OutOfMemoryError: Java heap space > > > > 2. > > > > > http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD&async=1000 > > > > Then I ran with async=1000 and checked the status. Every time It's > creating > > the sub shards, but not splitting the index. > > > > Is there something that I am not doing correctly? > > > > Please guide. > > > > Thanks, > > Atita > > >
Re: SPLITSHARD throwing OutOfMemory Error
I know it’s not much help if you’re stuck with Solr 6.1 … but Solr 7.5 comes with an alternative strategy for SPLITSHARD that doesn’t consume as much memory and nearly doesn’t consume additional disk space on the leader. This strategy can be turned on by “splitMethod=link” parameter. > On 4 Oct 2018, at 10:23, Atita Arora wrote: > > Hi Edwin, > > Thanks for following up on this. > > So here are the configs : > > Memory - 30G - 20 G to Solr > Disk - 1TB > Index = ~ 500G > > and I think that it possibly is due to the reason why this could be > happening is that during split shard, the unsplit index + split index > persists on the instance and may be causing this. > I actually tried splitshard on another instance with index size 64G and it > went through without any issues. > > I would appreciate if you have additional information to enlighten me on > this issue. > > Thanks again. > > Regards, > > Atita > > On Thu, Oct 4, 2018 at 9:47 AM Zheng Lin Edwin Yeo > wrote: > >> Hi Atita, >> >> What is the amount of memory that you have in your system? >> And what is your index size? >> >> Regards, >> Edwin >> >> On Tue, 25 Sep 2018 at 22:39, Atita Arora wrote: >> >>> Hi, >>> >>> I am working on a test setup with Solr 6.1.0 cloud with 1 collection >>> sharded across 2 shards with no replication. When triggered a SPLITSHARD >>> command it throws "java.lang.OutOfMemoryError: Java heap space" >> everytime. >>> I tried this with multiple heap settings of 8, 12 & 20G but every time it >>> does create 2 sub-shards but then fails eventually. >>> I know the issue => https://jira.apache.org/jira/browse/SOLR-5214 has >> been >>> resolved but the trace looked very similar to this one. >>> Also just to ensure that I do not run into exceptions due to merge as >>> reported in this ticket, I also tried running optimize before proceeding >>> with splitting the shard. >>> I issued the following commands : >>> >>> 1. >>> >>> >> http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD >>> >>> This threw java.lang.OutOfMemoryError: Java heap space >>> >>> 2. >>> >>> >> http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD&async=1000 >>> >>> Then I ran with async=1000 and checked the status. Every time It's >> creating >>> the sub shards, but not splitting the index. >>> >>> Is there something that I am not doing correctly? >>> >>> Please guide. >>> >>> Thanks, >>> Atita >>> >> — Andrzej Białecki
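For reference, on 7.5 the call would look roughly like the earlier examples with one extra parameter (the collection and shard names below are just the placeholders already used in this thread):

http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD&splitMethod=link&async=1000

The link method hard-links the existing segment files into the sub-shard indexes instead of rewriting them, which is why it needs almost no additional disk space on the leader.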
Re: SPLITSHARD throwing OutOfMemory Error
Hi Andrzej,

We have been weighing a lot of other reasons to upgrade our Solr for a very long time, like better authentication handling, backups using CDCR, and the new replication mode, and this has probably just given us another reason to upgrade. Thank you so much for the suggestion; I think it's good to know that something like this exists. We'll find out more about this.

Great day ahead!

Regards,
Atita

On Thu, Oct 4, 2018 at 11:28 AM Andrzej Białecki wrote: > I know it’s not much help if you’re stuck with Solr 6.1 … but Solr 7.5 > comes with an alternative strategy for SPLITSHARD that doesn’t consume as > much memory and nearly doesn’t consume additional disk space on the leader. > This strategy can be turned on by “splitMethod=link” parameter. > > > On 4 Oct 2018, at 10:23, Atita Arora wrote: > > > > Hi Edwin, > > > > Thanks for following up on this. > > > > So here are the configs : > > > > Memory - 30G - 20 G to Solr > > Disk - 1TB > > Index = ~ 500G > > > > and I think that it possibly is due to the reason why this could be > > happening is that during split shard, the unsplit index + split index > > persists on the instance and may be causing this. > > I actually tried splitshard on another instance with index size 64G and > it > > went through without any issues. > > > > I would appreciate if you have additional information to enlighten me on > > this issue. > > > > Thanks again. > > > > Regards, > > > > Atita > > > > On Thu, Oct 4, 2018 at 9:47 AM Zheng Lin Edwin Yeo > > > > wrote: > > > >> Hi Atita, > >> > >> What is the amount of memory that you have in your system? > >> And what is your index size? > >> > >> Regards, > >> Edwin > >> > >> On Tue, 25 Sep 2018 at 22:39, Atita Arora wrote: > >> > >>> Hi, > >>> > >>> I am working on a test setup with Solr 6.1.0 cloud with 1 collection > >>> sharded across 2 shards with no replication. When triggered a > SPLITSHARD > >>> command it throws "java.lang.OutOfMemoryError: Java heap space" > >> everytime. > >>> I tried this with multiple heap settings of 8, 12 & 20G but every time > it > >>> does create 2 sub-shards but then fails eventually. > >>> I know the issue => https://jira.apache.org/jira/browse/SOLR-5214 has > >> been > >>> resolved but the trace looked very similar to this one. > >>> Also just to ensure that I do not run into exceptions due to merge as > >>> reported in this ticket, I also tried running optimize before > proceeding > >>> with splitting the shard. > >>> I issued the following commands : > >>> > >>> 1. > >>> > >>> > >> > http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD > >>> > >>> This threw java.lang.OutOfMemoryError: Java heap space > >>> > >>> 2. > >>> > >>> > >> > http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD&async=1000 > >>> > >>> Then I ran with async=1000 and checked the status. Every time It's > >> creating > >>> the sub shards, but not splitting the index. > >>> > >>> Is there something that I am not doing correctly? > >>> > >>> Please guide. > >>> > >>> Thanks, > >>> Atita > >>> > >> > > — > > Andrzej Białecki > >
Update Request Processors are Not Chained
I've defined my update processors as:

content en,tr language_code other true true true signature false content 3 org.apache.solr.update.processor.TextProfileSignature 200

My /update/extract request handler is as follows:

true true ignored_ content ignored_ ignored_ dedupe langid ignore-commit-from-client

The dedupe chain works and the signature field is populated, but the langid processor is not triggered with this combination. When I change their places:

true true ignored_ content ignored_ ignored_ langid dedupe ignore-commit-from-client

langid works but dedupe is not activated (the signature field disappears).

I use Solr 6.3. How can I solve this problem?

Kind Regards,
Furkan KAMACI
Re: Update Request Processors are Not Chained
I found the problem :) The problem is that the processors are not combined into one chain; only a single update.chain is applied, so the processors have to be defined together in one chain. On Thu, Oct 4, 2018 at 3:57 PM Furkan KAMACI wrote: > I've defined my update processors as: > > > class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"> > > content > en,tr > language_code > other > true > true > > > > > > > > > true > signature > false > content > 3 > name="signatureClass">org.apache.solr.update.processor.TextProfileSignature > > > > > > default="true"> > > 200 > > > > > > > My /update/extract request handler is as follows: > > startup="lazy" > class="solr.extraction.ExtractingRequestHandler" > > > true > true > ignored_ > content > ignored_ > ignored_ > > > dedupe > langid > ignore-commit-from-client > > > > dedupe chain works nd signature field is populated but langid processor is > not triggered at this combination. When I change their places: > > startup="lazy" > class="solr.extraction.ExtractingRequestHandler" > > > true > true > ignored_ > content > ignored_ > ignored_ > > > langid > dedupe > ignore-commit-from-client > > > > langid works but dedup is not activated (signature field is disappears). > > I use Solr 6.3. How can I solve this problem? > > Kind Regards, > Furkan KAMACI >
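For anyone finding this thread later, a rough sketch of what a single combined chain can look like in solrconfig.xml — the chain name and the parameters shown are only illustrative, not copied from the original configuration:

<updateRequestProcessorChain name="langid-dedupe">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">content</str>
    <str name="langid.langField">language_code</str>
  </processor>
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The /update/extract handler then references only this one chain via update.chain; listing update.chain several times in the handler defaults does not stack the chains, only one of them gets applied.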
Filtering group query results
Hi,

We have a requirement where we need to perform a group query in Solr where results are grouped by user-name (which is a field in our indexes). We then need to filter the results based on the numFound response parameter present under each group. In essence, we want to return results only where numFound=1.

Looking into the documentation, I couldn't figure out any mechanism to achieve this, so I am wondering whether this requirement can be achieved with the existing building blocks of the Solr query mechanism.

Thanks
Re: solr and diversification
The use case is on ranking news, Joel. And yes, I have the feeling that it might improve relevance and in 2011/2012 there was a lot of work on this in academia.. Thanks Tim, I'll check out MMR. From: solr-user@lucene.apache.org At: 09/28/18 20:24:44To: solr-user@lucene.apache.org Subject: Re: solr and diversification Interesting, I had not heard of MMR. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Sep 28, 2018 at 10:43 AM Tim Allison wrote: > If you haven’t already, might want to check out maximal marginal > relevance...original paper: Carbonell and Goldstein. > > On Thu, Sep 27, 2018 at 7:29 PM Joel Bernstein wrote: > > > Yeah, I think your plan sounds fine. > > > > Do you have a specific use case for diversity of results. I've been > > wondering if diversity of results would provide better perceived > relevance. > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > > > On Thu, Sep 27, 2018 at 1:39 PM Diego Ceccarelli (BLOOMBERG/ LONDON) < > > dceccarel...@bloomberg.net> wrote: > > > > > Yeah, I think Kmeans might be a way to implement the "top 3 stories > that > > > are more distant", but you can also have a more naïve (and faster) > > strategy > > > like > > > - sending a threshold > > > - scan the documents according to the relevance score > > > - select the top documents that have diversity > threshold. > > > > > > I would allow to define the strategy and select it from the request. > > > > > > From: solr-user@lucene.apache.org At: 09/27/18 18:25:43To: Diego > > > Ceccarelli (BLOOMBERG/ LONDON ) , solr-user@lucene.apache.org > > > Subject: Re: solr and diversification > > > > > > I've thought about this problem a little bit. What I was considering > was > > > using Kmeans clustering to cluster the top 50 docs, then pulling the > top > > > scoring doc form each cluster as the top documents. This should be fast > > and > > > effective at getting diversity. > > > > > > > > > Joel Bernstein > > > http://joelsolr.blogspot.com/ > > > > > > > > > On Thu, Sep 27, 2018 at 1:20 PM Diego Ceccarelli (BLOOMBERG/ LONDON) < > > > dceccarel...@bloomberg.net> wrote: > > > > > > > Hi, > > > > > > > > I'm considering to write a component for diversifying the results. I > > know > > > > that diversification can be achieved by using grouping but I'm > thinking > > > > about something different and query biased. > > > > The idea is to have something that gets applied after the normal > > > retrieval > > > > and selects the top k documents more diverse based on some distance > > > metric: > > > > > > > > Example: > > > > imagine that you are asking for 10 rows, and you set diversify.rows=3 > > > > diversity.metric=tfidf diversify.field=body > > > > > > > > Solr might retrieve the the top 10 rows as usual, extract tfidf > vectors > > > > for the bodies and select the top 3 stories that are more distant > > > according > > > > to the cosine similarity. > > > > This would be different from grouping because documents will be > > > > 'collapsed' or not based on the subset of documents retrieved for the > > > > query. > > > > Do you think it would make sense to have it as a component? any > > feedback > > > > / idea? > > > > > > > > > > > > > > > > > > > > > > > >
Re: Modify the log directory for dih
On 10/4/2018 12:30 AM, lala wrote:
> Hi, I am using: Solr: 7.4, OS: windows7
> I start solr using a service on startup.

In that case, I really have no idea where anything is on your system. There is no service installation from the Solr project for Windows -- either you obtained that from somewhere else, or it's something written in-house. Either way, you would need to talk to whoever created that service installation for help locating files on your setup.

In general, you need to find the log4j2.xml file that is controlling your logging configuration and modify it. It contains a sample of how to log something to a separate file -- the slow query log. That example redirects a specific logger name (which is similar to a fully qualified class name and in most cases *is* the class name) to a different logfile.

Version 7.4 has a bug when running on Windows that causes a lot of problems specific to logging. https://issues.apache.org/jira/browse/SOLR-12538 That problem has been fixed in the 7.5 release. You can also fix it by editing the solr.cmd script manually.

> Additional info: I am developing a web application that uses solr as
> search engine, I use DIH to index folders in solr using the
> FileListEntityProcessor. What I need is logging each index operation in
> a file that I can reach & read to be able to detect failed index files
> in the folder.

The FileListEntityProcessor class has absolutely no logging in it. If you require that immediately, you would need to add logging commands to the source code and recompile Solr yourself to produce a package with your change.

With an enhancement issue in Jira, we can review what logging is suitable for the class, and probably make it work like SQLEntityProcessor in that regard. If that's done the way I think it should be, then you could add config in log4j2.xml to enable DEBUG level logging for that class specifically and write its logs to a separate logfile.

Thanks,
Shawn
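To make the last paragraph concrete, here is a rough sketch of the kind of log4j2.xml additions being described — the file name, pattern and level are only examples, and as noted above the logger will stay silent until FileListEntityProcessor actually gains log statements.

Inside the <Appenders> section:

<RollingFile name="DihLogFile" fileName="${sys:solr.log.dir}/dih.log" filePattern="${sys:solr.log.dir}/dih.log.%i">
  <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{core}] %c{1.} %m%n"/>
  <Policies>
    <SizeBasedTriggeringPolicy size="32 MB"/>
  </Policies>
  <DefaultRolloverStrategy max="10"/>
</RollingFile>

Inside the <Loggers> section:

<Logger name="org.apache.solr.handler.dataimport.FileListEntityProcessor" level="debug" additivity="false">
  <AppenderRef ref="DihLogFile"/>
</Logger>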
Re: Filtering group query results
On 10/4/2018 7:10 AM, Greenhorn Techie wrote:
> We have a requirement where we need to perform a group query in Solr
> where results are grouped by user-name (which is a field in our indexes).
> We then need to filter the results based on numFound response parameter
> present under each group. In essence, we want to return results only
> where numFound=1.

I don't think this is possible in Solr. I'm reasonably sure that the document count isn't calculated until after all the querying and filtering is done.

It would be easy enough to do on the client side -- just skip over any group where the number of results is not what you're looking for.

I've got no idea how difficult it would be to write this kind of capability into the server side. Off hand I would guess that it's probably not super difficult for someone who already knows that part of the code. I don't know that code, so I'd be spending a lot of time learning it before I could make a change.

Thanks,
Shawn
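To illustrate the client-side approach, a grouped request along these lines (the collection name and query are only placeholders):

http://localhost:8983/solr/mycollection/select?q=*:*&group=true&group.field=user-name

returns one entry per group under grouped.user-name.groups, each with its own doclist carrying a numFound value; the client then keeps only the groups whose doclist.numFound equals 1 and ignores the rest.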
Boolean clauses in ComplexPhraseQuery
Hi All, Does Solr support boolean clauses inside ComplexPhraseQuery? For example: {!complexphrase inOrder=true} NOT (field: “value is this” OR field: “value is that”) Thanks, Chuming
Re: checksum failed (hardware problem?)
To be more concrete: is the definitive test of whether or not a core's index is corrupt to copy it onto a new set of hardware and attempt to write to it? If this is a definitive test, we can run the experiment and update the report so you have a sense of how often this happens.

Since this is a SolrCloud node, which is already removed but whose data dir was preserved, I believe I can just copy the data directory to a fresh machine and start a regular non-cloud Solr node hosting this core. Can you please confirm that this will be a definitive test, or whether there is some aspect needed to make it definitive?

Thanks!

On Wed, Oct 3, 2018 at 2:10 AM Stephen Bianamara wrote: > Hello All -- > > As it would happen, we've seen this error on version 6.6.2 very recently. > This is also on an AWS instance, like Simon's report. The drive doesn't > show any sign of being unhealthy, either from cursory investigation. FWIW, > this occurred during a collection backup. > > Erick, is there some diagnostic data we can find to help pin this down? > > Thanks! > Stephen > > On Sun, Sep 30, 2018 at 12:48 PM Susheel Kumar > wrote: > >> Thank you, Simon. Which basically points that something related to env and >> was causing the checksum failures than any lucene/solr issue. >> >> Eric - I did check with hardware folks and they are aware of some VMware >> issue where the VM hosted in HCI environment is coming into some halt >> state >> for minute or so and may be loosing connections to disk/network. So that >> probably may be the reason of index corruption though they have not been >> able to find anything specific from logs during the time Solr run into >> issue >> >> Also I had again issue where Solr is loosing the connection with zookeeper >> (Client session timed out, have not heard from server in 8367ms for >> sessionid 0x0) Does that points to similar hardware issue, Any >> suggestions?
>> >> Thanks, >> Susheel >> >> 2018-09-29 17:30:44.070 INFO >> (searcherExecutor-7-thread-1-processing-n:server54:8080_solr >> x:COLL_shard4_replica2 s:shard4 c:COLL r:core_node8) [c:COLL s:shard4 >> r:core_node8 x:COLL_shard4_replica2] o.a.s.c.SolrCore >> [COLL_shard4_replica2] Registered new searcher >> Searcher@7a4465b1[COLL_shard4_replica2] >> >> main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_7x3f(6.6.2):C826923/317917:delGen=2523) >> Uninverting(_83pb(6.6.2):C805451/172968:delGen=2957) >> Uninverting(_3ywj(6.6.2):C727978/334529:delGen=2962) >> Uninverting(_7vsw(6.6.2):C872110/385178:delGen=2020) >> Uninverting(_8n89(6.6.2):C741293/109260:delGen=3863) >> Uninverting(_7zkq(6.6.2):C720666/101205:delGen=3151) >> Uninverting(_825d(6.6.2):C707731/112410:delGen=3168) >> Uninverting(_dgwu(6.6.2):C760421/295964:delGen=4624) >> Uninverting(_gs5x(6.6.2):C540942/138952:delGen=1623) >> Uninverting(_gu6a(6.6.2):c75213/35640:delGen=1110) >> Uninverting(_h33i(6.6.2):c131276/40356:delGen=706) >> Uninverting(_h5tc(6.6.2):c44320/11080:delGen=380) >> Uninverting(_h9d9(6.6.2):c35088/3188:delGen=104) >> Uninverting(_h80h(6.6.2):c11927/3412:delGen=153) >> Uninverting(_h7ll(6.6.2):c11284/1368:delGen=205) >> Uninverting(_h8bs(6.6.2):c11518/2103:delGen=149) >> Uninverting(_h9r3(6.6.2):c16439/1018:delGen=52) >> Uninverting(_h9z1(6.6.2):c9428/823:delGen=27) >> Uninverting(_h9v2(6.6.2):c933/33:delGen=12) >> Uninverting(_ha1c(6.6.2):c1056/1:delGen=1) >> Uninverting(_ha6i(6.6.2):c1883/124:delGen=8) >> Uninverting(_ha3x(6.6.2):c807/14:delGen=3) >> Uninverting(_ha47(6.6.2):c1229/133:delGen=6) >> Uninverting(_hapk(6.6.2):c523) Uninverting(_haoq(6.6.2):c279) >> Uninverting(_hamr(6.6.2):c311) Uninverting(_hap0(6.6.2):c338) >> Uninverting(_hapu(6.6.2):c275) Uninverting(_hapv(6.6.2):C4/2:delGen=1) >> Uninverting(_hapw(6.6.2):C5/2:delGen=1) >> Uninverting(_hapx(6.6.2):C2/1:delGen=1) >> Uninverting(_hapy(6.6.2):C2/1:delGen=1) >> Uninverting(_hapz(6.6.2):C3/1:delGen=1) >> Uninverting(_haq0(6.6.2):C6/3:delGen=1) >> Uninverting(_haq1(6.6.2):C1)))} >> 2018-09-29 17:30:52.390 WARN >> >> (zkCallback-5-thread-91-processing-n:server54:8080_solr-SendThread(server117:2182)) >> [ ] o.a.z.ClientCnxn Client session timed out, have not heard from >> server in 8367ms for sessionid 0x0 >> 2018-09-29 17:31:01.302 WARN >> >> (zkCallback-5-thread-91-processing-n:server54:8080_solr-SendThread(server120:2182)) >> [ ] o.a.z.ClientCnxn Client session timed out, have not heard from >> server in 8812ms for sessionid 0x0 >> 2018-09-29 17:31:14.049 INFO >> (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [ >> ] o.a.s.c.c.ConnectionManager Connection with ZooKeeper >> reestablished. >> 2018-09-29 17:31:14.049 INFO >> (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [ >> ] o.a.s.c.ZkController ZooKeeper session re-connected ... refreshing >> core states after session expiration. >> 2018-09-29 17:31:14.051 INFO >> (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [ >> ] o.a.s.c.c.ZkStateReader Updated live no
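For context on the ZooKeeper warnings quoted above: the session timeout that the "Client session timed out" messages are measured against is controlled by zkClientTimeout, which in a stock solr.xml looks like this (the 30000 ms value is just the shipped default, not a recommendation):

<solrcloud>
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
</solrcloud>

Raising it can ride out short stalls like the VM halts described above, but it does not remove the underlying pauses.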
Connecting Solr to Nutch
Hello out there! I'm trying to create a small search engine and have installed Nutch 1.15 and Solr 7.5.0. The issue now is connecting the two, primarily because the files required to create the Nutch core in Solr don't exist, i.e. basicconfig. How do I go about connecting the two so I can begin crawling websites for the engine? Please help 😊 💗💗, Timeka Cobb
Re: SPLITSHARD throwing OutOfMemory Error
Hi Atita,

It would be good to consider upgrading to make use of the better features, like better memory consumption and better authentication.

On a side note, it is also good to upgrade now to Solr 7, as Solr indexes can only be upgraded from the previous major release version (Solr 6) to the current major release version (Solr 7). Since you are using Solr 6.1, when Solr 8 comes around it will not be possible to upgrade directly, and the index will have to be upgraded to Solr 7 first before upgrading to Solr 8.
http://lucene.apache.org/solr/guide/7_5/indexupgrader-tool.html

Regards,
Edwin

On Thu, 4 Oct 2018 at 17:41, Atita Arora wrote: > Hi Andrzej, > > We're rather weighing on a lot of other stuff to upgrade our Solr for a > very long time like better authentication handling, backups using CDCR, new > Replication mode and this probably has just given us another reason to > upgrade. > Thank you so much for the suggestion, I think its good to know about > something like this exists. We'll find out more about this. > > Great day ahead! > > Regards, > Atita > > > > On Thu, Oct 4, 2018 at 11:28 AM Andrzej Białecki wrote: > > > I know it’s not much help if you’re stuck with Solr 6.1 … but Solr 7.5 > > comes with an alternative strategy for SPLITSHARD that doesn’t consume as > > much memory and nearly doesn’t consume additional disk space on the > leader. > > This strategy can be turned on by “splitMethod=link” parameter. > > > > > On 4 Oct 2018, at 10:23, Atita Arora wrote: > > > > > > Hi Edwin, > > > > > > Thanks for following up on this. > > > > > > So here are the configs : > > > > > > Memory - 30G - 20 G to Solr > > > Disk - 1TB > > > Index = ~ 500G > > > > > > and I think that it possibly is due to the reason why this could be > > > happening is that during split shard, the unsplit index + split index > > > persists on the instance and may be causing this. > > > I actually tried splitshard on another instance with index size 64G and > > it > > > went through without any issues. > > > > > > I would appreciate if you have additional information to enlighten me > on > > > this issue. > > > > > > Thanks again. > > > > > > Regards, > > > > > > Atita > > > > > > On Thu, Oct 4, 2018 at 9:47 AM Zheng Lin Edwin Yeo < edwinye...@gmail.com > > > > > > wrote: > > > > > >> Hi Atita, > > >> > > >> What is the amount of memory that you have in your system? > > >> And what is your index size? > > >> > > >> Regards, > > >> Edwin > > >> > > >> On Tue, 25 Sep 2018 at 22:39, Atita Arora > wrote: > > >> > > >>> Hi, > > >>> > > >>> I am working on a test setup with Solr 6.1.0 cloud with 1 collection > > >>> sharded across 2 shards with no replication. When triggered a > > SPLITSHARD > > >>> command it throws "java.lang.OutOfMemoryError: Java heap space" > > >> everytime. > > >>> I tried this with multiple heap settings of 8, 12 & 20G but every > time > > it > > >>> does create 2 sub-shards but then fails eventually. > > >>> I know the issue => https://jira.apache.org/jira/browse/SOLR-5214 > has > > >> been > > >>> resolved but the trace looked very similar to this one. > > >>> Also just to ensure that I do not run into exceptions due to merge as > > >>> reported in this ticket, I also tried running optimize before > > proceeding > > >>> with splitting the shard. > > >>> I issued the following commands : > > >>> > > >>> 1.
> > >>> > > >>> > > >> > > > http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD > > >>> > > >>> This threw java.lang.OutOfMemoryError: Java heap space > > >>> > > >>> 2. > > >>> > > >>> > > >> > > > http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD&async=1000 > > >>> > > >>> Then I ran with async=1000 and checked the status. Every time It's > > >> creating > > >>> the sub shards, but not splitting the index. > > >>> > > >>> Is there something that I am not doing correctly? > > >>> > > >>> Please guide. > > >>> > > >>> Thanks, > > >>> Atita > > >>> > > >> > > > > — > > > > Andrzej Białecki > > > > >
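For reference, the IndexUpgrader tool behind that link is invoked roughly like this (jar versions and the index path are placeholders; see the ref guide page above for the exact form for your release):

java -cp lucene-core-7.5.0.jar:lucene-backward-codecs-7.5.0.jar org.apache.lucene.index.IndexUpgrader -delete-prior-commits /path/to/index

It rewrites older segments into the current Lucene format, which is why each major version can only read indexes written by the previous one and an upgrade from 6.x has to pass through 7.x before Solr 8 can read the index.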