filterCache ramBytesUsed monitoring statistics go negative
Hi:

The filterCache ramBytesUsed monitoring statistic goes negative. Is there a special meaning, or is there a problem with how the statistic is computed?

Also, when the list is presented, could it be sorted by key? Solr 7 is like this, and it is easy to view. For example:

CACHE.searcher.filterCache.hits: 63265
CACHE.searcher.filterCache.cumulative_evictions: 1981
CACHE.searcher.filterCache.size: 6765
CACHE.searcher.filterCache.maxRamMB: 10240
CACHE.searcher.filterCache.hitratio: 0.8329712577846243
CACHE.searcher.filterCache.warmupTime: 49227
CACHE.searcher.filterCache.evictions: 1981
CACHE.searcher.filterCache.cumulative_hitratio: 0.737519464195261
CACHE.searcher.filterCache.lookups: 75951
CACHE.searcher.filterCache.cumulative_hits: 78624
CACHE.searcher.filterCache.cumulative_inserts: 15927
CACHE.searcher.filterCache.ramBytesUsed: -1418740612
CACHE.searcher.filterCache.inserts: 10510
CACHE.searcher.filterCache.cumulative_lookups: 106606
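[For context: statistics like the list above come from Solr's Metrics API. A minimal SolrJ sketch of pulling just the filterCache entries; the host URL and prefix value are illustrative, not taken from the report.]

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.client.solrj.response.SimpleSolrResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class FilterCacheMetrics {
  public static void main(String[] args) throws Exception {
    // Node URL is a placeholder for wherever your Solr instance runs.
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("prefix", "CACHE.searcher.filterCache"); // only the filterCache keys
      GenericSolrRequest req =
          new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/metrics", params);
      SimpleSolrResponse rsp = req.process(client);
      System.out.println(rsp.getResponse()); // includes ramBytesUsed
    }
  }
}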
RE: SOLR uses too much CPU and GC is also weird on Windows server
Thanks to all for helping to think about it, but we eventually found out that the code was deleting/adding single records. After it was batched up, everything went back to normal. The funny thing is that 6.0.0 handled these requests somehow, but the newer version did not.
Anyway, we will observe this and try to improve our code as well.

Best regards,
Jaan

-----Original Message-----
From: Erick Erickson
Sent: 28 October 2020 17:18
To: solr-user@lucene.apache.org
Subject: Re: SOLR uses too much CPU and GC is also weird on Windows server

DocValues=true is usually only used for "primitive" types: string, numerics, booleans and the like, specifically _not_ text-based.

I say "usually" because there's a special "SortableTextField" where it does make some sense to have a text-based field with docValues, but that's intended for relatively short fields, for example when you want to sort on a title field. Probably not something you're working with.

There's not much we can say from this distance, I'm afraid. I think I'd focus on the memory requirements; maybe take a heap dump and see what's using memory.

Did you restart Solr _after_ turning off indexing? I ask because that would help determine which side the problem is on, indexing or querying. It does sound like querying, though.

As for docValues in general, if you want to be really brave, you can set uninvertible=false for all your fields where docValues=false. When you facet on such a field, you won't get anything back. If you sort on such a field, you'll get an error message back. That should test whether somehow not having docValues is the root of your problem. Do this on a test system of course ;) I think this is a low-probability issue, but it's a mystery anyway so...

Updating shouldn't be that much of a problem either, and if you still see high CPU with indexing turned off, that eliminates indexing as a candidate.

Is there any chance you changed your schema at all and didn't delete your entire index and add all your documents back? There are a lot of ways things can go wrong if that's the case. You had to reindex from scratch when you went to 8x from 6x; I'm wondering if during that process the schema changed without starting over. I'm grasping at straws here…

I'd also seriously consider going to 8.6.3. We only make point releases when there's something serious. Looking through lucene/CHANGES.txt, there is one memory leak fix in 8.6.2. I'd expect a gradual buildup of heap if that were what you're seeing, but you never know.

As for having docValues=false, that would cut down on the size of the index on disk and speed up indexing some, but in terms of memory usage or CPU usage when querying, unless the docValues structures are _needed_, they're never read into OS RAM by MMapDirectory. The question really is whether you ever, intentionally or not, do "something" that would be more efficient with docValues. That's where setting uninvertible=false whenever you set docValues=false makes sense; things will show up if your assumption that you don't need docValues is false.

Best,
Erick

> On Oct 28, 2020, at 9:29 AM, Jaan Arjasepp wrote:
>
> Hi all,
>
> It's me again. Anyway, I did a little research and we tried different things,
> and well, there are some questions I want to ask and some things that I found.
>
> After monitoring my system with VisualVM, I found that GC is jumping from
> 0.5GB to 2.5GB, and it has 4GB of memory for now, so it should not be an
> issue anymore, or should it? I will observe it a bit, as it might still rise
> a bit, I guess.
>
> The next thing we found, or are thinking about, is that writing to disk might
> be an issue. We turned off the indexing and some other stuff, but I would say
> it still did not save much.
> I also went through all the schema fields; there are not that many really.
> They all have docValues=true. Also, I must say they are all automatically
> generated, so no manual work there except one field, but that one also has
> docValues=true.
> Just curious: if the field is not a string/text, can it be docValues=false,
> or is it still better to have true? As for uninversion, we are not using
> facets much, nor other specific things in queries, just simple queries.
>
> Though I must say we are updating documents quite a bunch, but the CPU usage
> being so high, I am not sure about that. The older version did not seem to
> use the CPU so much...
>
> I am a bit running out of ideas and hoping that this will continue to work,
> but I don't like the CPU usage even overnight, when nobody uses it. We will
> try to figure out the issue here, and I hope I can ask more questions when in
> doubt or out of ideas. Also, I must admit Solr is really new for me
> personally.
>
> Jaan
>
> -----Original Message-----
> From: Walter Underwood
> Sent: 27 October 2020 18:44
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR uses too much CPU and GC is also weird on Windows server
>
> That first graph shows a JVM that do
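[For readers hitting the same symptom, a minimal SolrJ sketch of the fix described in this thread: batching documents instead of sending one add request per record. The collection URL, batch size, and field names are hypothetical placeholders, not taken from Jaan's setup.]

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchedIndexer {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      List<SolrInputDocument> batch = new ArrayList<>();
      for (int i = 0; i < 10_000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        doc.addField("name_s", "record " + i);
        batch.add(doc);
        if (batch.size() == 500) { // send 500 docs per request, not one request per doc
          client.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        client.add(batch); // flush the remainder
      }
      client.commit(); // one commit at the end, not one per record
    }
  }
}

[Deletes can be batched the same way with client.deleteById(List<String>), and relying on autoCommit in solrconfig.xml can replace the explicit commit entirely.]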
Re: SOLR uses too much CPU and GC is also weird on Windows server
What this sounds like is that somehow you were committing after every update in 8x but not in your 6x code. How that would have changed is anybody's guess ;). It's vaguely possible that your client is committing and you had IgnoreCommitOptimizeUpdateProcessorFactory defined in your update chain in 6x but not in 8x. The other thing would be if your commit interval was much shorter in 8x than in 6x, or if your autowarm parameters were significantly different.

That said, this is still a mystery, but glad you found an answer. Thanks for getting back to us on this; it is useful information to have.

Best,
Erick

> On Nov 2, 2020, at 7:50 AM, Jaan Arjasepp wrote:
>
> Thanks to all for helping to think about it, but we eventually found out that
> the code was deleting/adding single records. After it was batched up,
> everything went back to normal. The funny thing is that 6.0.0 handled these
> requests somehow, but the newer version did not.
> Anyway, we will observe this and try to improve our code as well.
>
> Best regards,
> Jaan
>
> -----Original Message-----
> From: Erick Erickson
> Sent: 28 October 2020 17:18
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR uses too much CPU and GC is also weird on Windows server
>
> DocValues=true is usually only used for "primitive" types: string, numerics,
> booleans and the like, specifically _not_ text-based.
>
> I say "usually" because there's a special "SortableTextField" where it does
> make some sense to have a text-based field with docValues, but that's
> intended for relatively short fields, for example when you want to sort on a
> title field. Probably not something you're working with.
>
> There's not much we can say from this distance, I'm afraid. I think I'd focus
> on the memory requirements; maybe take a heap dump and see what's using
> memory.
>
> Did you restart Solr _after_ turning off indexing? I ask because that would
> help determine which side the problem is on, indexing or querying. It does
> sound like querying, though.
>
> As for docValues in general, if you want to be really brave, you can set
> uninvertible=false for all your fields where docValues=false. When you facet
> on such a field, you won't get anything back. If you sort on such a field,
> you'll get an error message back. That should test whether somehow not having
> docValues is the root of your problem. Do this on a test system of course ;)
> I think this is a low-probability issue, but it's a mystery anyway so...
>
> Updating shouldn't be that much of a problem either, and if you still see
> high CPU with indexing turned off, that eliminates indexing as a candidate.
>
> Is there any chance you changed your schema at all and didn't delete your
> entire index and add all your documents back? There are a lot of ways things
> can go wrong if that's the case. You had to reindex from scratch when you
> went to 8x from 6x; I'm wondering if during that process the schema changed
> without starting over. I'm grasping at straws here…
>
> I'd also seriously consider going to 8.6.3. We only make point releases when
> there's something serious. Looking through lucene/CHANGES.txt, there is one
> memory leak fix in 8.6.2. I'd expect a gradual buildup of heap if that were
> what you're seeing, but you never know.
>
> As for having docValues=false, that would cut down on the size of the index
> on disk and speed up indexing some, but in terms of memory usage or CPU usage
> when querying, unless the docValues structures are _needed_, they're never
> read into OS RAM by MMapDirectory. The question really is whether you ever,
> intentionally or not, do "something" that would be more efficient with
> docValues. That's where setting uninvertible=false whenever you set
> docValues=false makes sense; things will show up if your assumption that you
> don't need docValues is false.
>
> Best,
> Erick
>
>
>> On Oct 28, 2020, at 9:29 AM, Jaan Arjasepp wrote:
>>
>> Hi all,
>>
>> It's me again. Anyway, I did a little research and we tried different things,
>> and well, there are some questions I want to ask and some things that I found.
>>
>> After monitoring my system with VisualVM, I found that GC is jumping from
>> 0.5GB to 2.5GB, and it has 4GB of memory for now, so it should not be an
>> issue anymore, or should it? I will observe it a bit, as it might still rise
>> a bit, I guess.
>>
>> The next thing we found, or are thinking about, is that writing to disk might
>> be an issue. We turned off the indexing and some other stuff, but I would say
>> it still did not save much.
>> I also went through all the schema fields; there are not that many really.
>> They all have docValues=true. Also, I must say they are all automatically
>> generated, so no manual work there except one field, but that one also has
>> docValues=true.
>> Just curious: if the field is not a string/text, can it be docValues=false,
>> or is it still better to have true? As for uninversion, we are not using
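[As a companion to Erick's point about committing after every update, a hedged SolrJ sketch of letting the server fold commits together via commitWithin instead of issuing an explicit commit per request. The URL and collection name are placeholders; whether this matches the original poster's client code is an assumption.]

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");

      UpdateRequest req = new UpdateRequest();
      req.add(doc);
      // Ask Solr to make this update visible within 10 seconds,
      // instead of calling client.commit() after every request.
      req.setCommitWithin(10_000);
      req.process(client); // no explicit commit here
    }
  }
}

[The same effect can come from autoCommit/autoSoftCommit settings in solrconfig.xml; either way, the point is that the client stops forcing a commit per update.]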
[Free Online Meetups] London Information Retrieval Meetup
Hi all,

The London Information Retrieval Meetup has moved online: https://www.meetup.com/London-Information-Retrieval-Meetup-Group

It is a free evening meetup aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. It is technology agnostic, but you'll find many talks on Apache Solr and related technologies.

Tomorrow (03.11 at 6:10 pm UK time) we will host the sixth London Information Retrieval meetup (fully remote). We will have two talks:

*Talk 1* "Feature Extraction for Large-Scale Text Collections" from Luke Gallagher, PhD candidate, RMIT University

*Talk 2* "A Learning to Rank Project on a Daily Song Ranking Problem" from Ilaria Petreti (IR/ML Engineer, Sease) and Anna Ruggero (R&D Software Engineer, Sease)

If you fancy some Search Stories, feel free to register here: https://www.meetup.com/London-Information-Retrieval-Meetup-Group/events/273905485/

Cheers, have a nice evening!
--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io
Re: Java Streaming API - nested Hashjoins with zk and accesstoken
Hi All,

Any advice on this?

Thanks,
sam

On Sun, Nov 1, 2020 at 11:05 PM Anamika Solr wrote:
> Hi All,
>
> I need to combine 3 different documents using hashJoin. I am using the below
> query (the searches are placeholder queries):
>
> hashJoin(
>   hashJoin(
>     search(collectionName, q="*:*", fl="id", qt="/export", sort="id desc"),
>     hashed=select(search(collectionName, q="*:*", fl="id", qt="/export", sort="id asc")),
>     on="id"),
>   hashed=select(search(collectionName, q="*:*", fl="id", qt="/export", sort="id asc")),
>   on="id")
>
> This works with a simple TupleStream in Java. But I also need to pass an auth
> token on ZooKeeper, so I have to use the code below:
>
> ZkClientClusterStateProvider zkCluster = new ZkClientClusterStateProvider(zkHosts, null);
> SolrZkClient zkServer = zkCluster.getZkStateReader().getZkClient();
> StreamFactory streamFactory = new StreamFactory()
>     .withCollectionZkHost("collectionName", zkServer.getZkServerAddress())
>     .withFunctionName("search", CloudSolrStream.class)
>     .withFunctionName("hashJoin", HashJoinStream.class)
>     .withFunctionName("select", SelectStream.class);
>
> try (HashJoinStream hashJoinStream = (HashJoinStream) streamFactory.constructStream(expr)) {
>   // ...
> }
>
> The issue is that one hashJoin with a nested select and search works fine with
> this API, but the nested double hashJoin never completes. I can see that the
> expression is parsed correctly, but it waits indefinitely for the thread to
> complete.
>
> Any help is appreciated.
>
> Thanks,
> Anamika
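[Not an answer to the hang itself, but a minimal sketch of how a stream built with StreamFactory is usually given a StreamContext and drained; `streamFactory` and `expr` are the objects from the snippet above, and the idea that a missing SolrClientCache could matter here is an assumption, not a diagnosis.]

import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.HashJoinStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;

// ... streamFactory and expr as defined in the message above
StreamContext context = new StreamContext();
SolrClientCache clientCache = new SolrClientCache();
context.setSolrClientCache(clientCache); // streams share Solr clients through this cache

try (HashJoinStream hashJoinStream = (HashJoinStream) streamFactory.constructStream(expr)) {
  hashJoinStream.setStreamContext(context); // must be set before open()
  hashJoinStream.open();
  Tuple tuple = hashJoinStream.read();
  while (!tuple.EOF) { // drain until the EOF tuple arrives
    System.out.println(tuple.getString("id"));
    tuple = hashJoinStream.read();
  }
} finally {
  clientCache.close();
}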
Re: filterCache ramBytesUsed monitoring statistics go negative
On 11/2/2020 4:27 AM, Dawn wrote:
> filterCache ramBytesUsed monitoring statistics go negative. Is there a
> special meaning, or is there a statistical problem?
>
> When the list is presented, can it be sorted by key? Solr 7 is like this,
> easy to view.

When problems like this surface, it's usually because the code uses an "int" variable somewhere instead of a "long". All numeric variables in Java are signed, and an "int" can only go up to a little over 2 billion before the numbers start going negative.

The master code branch looks like it's fine. What is the exact version of Solr you're using? With that information, I can check the relevant code. Maybe simply upgrading to a much newer version would take care of this for you.

Thanks,
Shawn
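[Shawn's int-versus-long explanation is easy to reproduce in isolation. A self-contained sketch; the values are illustrative, not Solr's actual accounting.]

public class IntOverflowDemo {
  public static void main(String[] args) {
    int intBytes = 0;
    long longBytes = 0L;
    // Add 100 MB two hundred times: ~20 GB total, far past
    // Integer.MAX_VALUE (about 2.1 billion).
    for (int i = 0; i < 200; i++) {
      intBytes += 100_000_000;
      longBytes += 100_000_000L;
    }
    System.out.println("int accumulator:  " + intBytes);  // negative: wrapped around
    System.out.println("long accumulator: " + longBytes); // 20000000000
  }
}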
Understanding intermittent Solr replicas going to GONE state
Solr version: 8.2; ZooKeeper: 3.4

I am progressively adding collections, with 3 replicas each, and all of a sudden the load averages on the Solr nodes spiked and memory usage on the Java process went to 65%. Along with that, some replicas went into the "GONE" state (as shown in the Solr Cloud UI), and the issue persisted until I restarted the Solr service.

I need some guidance on where to start in finding the root cause for this little outage.

Data points:

- At the time we saw this outage, there were 3 instances of the copy tool running, which pulls data from the old Solr (5) and indexes it into the new Solr (8.2). We stopped these as soon as we saw the outage, since we were not sure whether they were creating the issue.
- We have around 12 Solr nodes that are mapped to different collections; each node has 8 CPU cores and 64GB RAM (40GB allocated to the JVM heap).
- Based on the alerts, we observed that the load averages on a few Solr nodes were very high, like 36.5, 26.7, 20 (not really sure if this is what is concerning here, as I used to see much lower numbers like 3s and 4s on the averages, but now this went to double digits).

We also observed the errors below during that time in the Solr logs:

o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down
org.eclipse.jetty.io.EofException: Closed
        at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:620)
        at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:55)
        at org.apache.solr.response.QueryResponseWriterUtil$1.write(QueryResponseWriterUtil.java:54)
        at java.io.OutputStream.write(OutputStream.java:116)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
        at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
        at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
        at org.apache.solr.util.FastWriter.flush(FastWriter.java:140)
        at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:154)
        at org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:93)
        at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:73)
        at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
        at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)

--
Thanks & Regards,
Yaswanth Kumar Konathala.
yaswanth...@gmail.com
SolrIndexSearcher RankQuery Score calculation
Hi:

SolrIndexSearcher.getDocListNC and getDocListAndSetNC contain this code snippet:

if (cmd.getSort() != null
    && query instanceof RankQuery == false
    && (cmd.getFlags() & GET_SCORES) != 0) {
  TopFieldCollector.populateScores(topDocs.scoreDocs, this, query);
}

When the query includes a filter query, `QueryUtils.combineQueryAndFilter` builds a new BooleanQuery and assigns it to the local query variable, so `query instanceof RankQuery` is false. This causes the score to be lost in the RankQuery phase.

Could this check be changed to test whether the original query is a RankQuery, i.e. `cmd.getQuery() instanceof RankQuery`?

Versions affected: 8.6.*, 9.*
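[To make the suggestion concrete, this is the change the report proposes; it is the reporter's suggestion, not a committed patch.]

// Proposed guard: check the original query from the QueryCommand,
// since the local `query` has been wrapped in a BooleanQuery by
// QueryUtils.combineQueryAndFilter and can no longer be a RankQuery.
if (cmd.getSort() != null
    && cmd.getQuery() instanceof RankQuery == false
    && (cmd.getFlags() & GET_SCORES) != 0) {
  TopFieldCollector.populateScores(topDocs.scoreDocs, this, query);
}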
Search issue in Solr for a few words
Hi Sir/Madam,

I am facing an issue with a few keyword searches (like "gazing" and "one") in Solr. Can you please help me understand why these words are not listed in the Solr results? Indexing is done properly.

--
Thanks and Regards,
Veeresh Sasalawad