JMX warning in Solr
I've got the following repeating exception in my log:

    javax.management.InstanceNotFoundException: solr/collection:type=searcher,id=org.apache.solr.search.SolrIndexSearcher
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
        at org.jboss.as.jmx.PluggableMBeanServerImpl$TcclMBeanServer.unregisterMBean(PluggableMBeanServerImpl.java:584)
        at org.jboss.as.jmx.PluggableMBeanServerImpl.unregisterMBean(PluggableMBeanServerImpl.java:331)
        at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:138)
        at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
        at org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:297)

I know what it means, but I have no idea how to fix it or what its root cause is.

My environment: JBoss 6.3.x + Solr 4.x. I run two Solr instances in one servlet container. Could that be the root cause? Has there been a similar issue before?
Re: serious data loss bug in correlation with "too much data after closed"
By now I'm pretty much sure that this is either a bug in Solr or in HttpClient. I again reproduced the problem:

1. During massive indexing we see some WARNINGs from HttpParser: "badMessage: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp". Checking the httpcore code, it seems this happens when the connection closes abruptly.
2. Only at some instances of this warning do we see a related NoHttpResponseException from the Solr node.
3. After indexing we perform a full data validation and see that around 200 docs for which our client got an HTTP 200 status are not present in Solr.
4. Checking when these docs were sent to Solr, we get to the same time that we had the log messages from 1 and 2 (the HttpChannelOverHttp warning and the NoHttpResponseException).
5. These 200 docs are divided into around 8 bulks that were sent at various times, and all had these warn/error messages around them.

Would be glad to have some input from the community on this.

Thanks.
Re: docValues
I have tested with docValues and without docValues on the test indexes with a JSON nested faceting query.

I have noticed a performance boost with docValues. The response time both with cached items and without cached items is good.

I have noticed that the response time on the cached items of the index without docValues is not always constant (28 ms, 78 ms, 94 ms), whereas with docValues it is always constant (always < 20 ms).

Decided to go with docValues.

> On 08-Aug-2015, at 10:44 pm, Erick Erickson wrote:
>
> Have you seen: https://cwiki.apache.org/confluence/display/solr/DocValues?
>
> What kind of speedup? How often are you committing? Is there a speed
> difference after a while or on the first few queries?
>
> Details matter a lot for questions like this.
>
> Best,
> Erick
>
>> On Sat, Aug 8, 2015 at 6:22 PM, Nagasharath wrote:
>> Good
>>
>> Sent from my iPhone
>>
>>> On 08-Aug-2015, at 8:12 pm, Aman Tandon wrote:
>>>
>>> Hi,
>>>
>>>> I am seeing a significant difference in the query time after using docValues
>>>
>>> what kind of difference, is it good or bad?
>>>
>>> With Regards
>>> Aman Tandon
>>>
>>> On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath wrote:
>>>> I am seeing a significant difference in the query time after using docValues.
>>>>
>>>> I am curious to know what's happening with 'docValues' included in the schema
>>>>
>>>>> On 07-Aug-2015, at 4:31 pm, Shawn Heisey wrote:
>>>>>
>>>>>> On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
>>>>>> JVM-Memory has gone up from 3% to 17.1%
>>>>>
>>>>> In my experience, a healthy Java application (after the heap size has
>>>>> stabilized) will have a heap utilization graph where the low points are
>>>>> between 50 and 75 percent. If the low points in heap utilization are
>>>>> consistently below 25 percent, you would be better off reducing the heap
>>>>> size and allowing the OS to use that memory instead.
>>>>>
>>>>> If you want to track heap utilization, JVM-Memory in the Solr dashboard
>>>>> is a very poor tool. Use tools like visualvm or jconsole.
>>>>>
>>>>> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>>>>>
>>>>> I need to add what I said about very low heap utilization to that wiki page.
>>>>>
>>>>> Thanks,
>>>>> Shawn
Re: docValues
Interesting... what type of field was this? (string or numeric? single or multi-valued?)

Without docValues, the first request would be slow (due to building the in-memory field cache entry), but after that it should be fast.

-Yonik

On Sun, Aug 9, 2015 at 11:31 AM, Nagasharath wrote:
> I have tested with docValues and without docValues on the test indexes
> with a JSON nested faceting query.
>
> Have noticed a performance boost with docValues. The response time both
> with cached items and without cached items is good.
>
> I have noticed that the response time on the cached items of the index
> without docValues is not always constant (28 ms, 78 ms, 94 ms), whereas
> with docValues it is always constant (always < 20 ms).
>
> Decided to go with docValues.
Re: docValues
JSON nested faceting on string, string, and double fields; the facet function 'sum' is applied on the double field.

Without docValues, responses for the same query:
1) first response, without cache: 765 ms
2) second response, with cache: 28 ms
3) third response, with cache: 78 ms
4) fourth response, with cache: 94 ms

With docValues, responses for the same query:
1) first response, without cache: 78 ms
2) with cache: always less than 20 ms

Version 5.2.1

> On 09-Aug-2015, at 10:39 am, Yonik Seeley wrote:
>
> Interesting... what type of field was this? (string or numeric? single
> or multi-valued?)
>
> Without docValues, the first request would be slow (due to building
> the in-memory field cache entry), but after that it should be fast.
>
> -Yonik
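[Editor's note: for reference, a nested JSON facet of the shape described above (terms facets on two string fields with a sum() over a double field) can be sent from SolrJ roughly as sketched below. The collection URL and the field names cat_s, subcat_s, and price_d are made-up placeholders, not the poster's actual schema.]

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class JsonFacetExample {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0); // facets only, no documents
            // Nested terms facets on two string fields, with sum() over a double field.
            q.set("json.facet",
                  "{ byCat: { terms: { field: cat_s, facet: {"
                + "    bySub: { terms: { field: subcat_s, facet: {"
                + "      total: \"sum(price_d)\" } } } } } } }");
            QueryResponse rsp = client.query(q);
            System.out.println(rsp.getResponse().get("facets")); // nested buckets with their sums
            client.close();
        }
    }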
Certification
Is there an industry-standard certification for Solr?
Re: Concurrent Indexing and Searching in Solr.
On 8/7/2015 1:15 PM, Nitin Solanki wrote:
> I wrote a python script for indexing and using
> urllib and urllib2 for indexing data via http..

There are a number of Solr python clients. Using a client makes your
code much easier to write and understand.

https://wiki.apache.org/solr/SolPython

I have no experience with any of these clients, but I can say that the
one encountered most often when Python developers come into the #solr
IRC channel is pysolr. Our wiki page says the last update for pysolr
happened in December of 2013, but I can see that the last version on
their web page is dated 2015-05-26.

Making 100 concurrent indexing requests at the same time as 100
concurrent queries will overwhelm *any* single Solr server. In a
previous message you said that you have 4 CPU cores. The load you're
trying to put on Solr will require at *LEAST* 200 threads. It may be
more than that. Any single system is going to have trouble with that.
A system with 4 cores will be *very* overloaded.

Thanks,
Shawn
Re: SolrJ update
Hi Andrea,

Thanks for the explanation on that. I ended up doing that: generating and setting a UUID in the Java code.

On 08/08/2015 03:19 AM, Andrea Gazzarini wrote:

Hi Henrique,

I don't believe there's an easy way to do that. As you noticed, the SolrInputDocument is not an I/O param; that is, it is not sent back once data has been indexed. This is good, because here you're sending just one document, but imagine what could happen if you did a bulk load... the response would be very, very huge!

Although I could imagine some workarounds (with a custom UpdateRequestProcessor and a custom ResponseWriter), the point is that (see above) I believe it would end in a bad design:

- if you send one document at a time, this is *often* considered a bad practice;
- if you send a lot of data, the corresponding response would be huge: it would contain a lot of newly created identifiers, and BTW how would you match them with your input documents? Sequentially? That way you won't be able to use any *async* client.

Personally, if that is ok for your context, I'd completely avoid the problem by moving the logic to the client side. I mean, create a UUID field in SolrJ and add that ID to the outgoing document.

Best,
Andrea

2015-08-06 21:39 GMT+02:00 Henrique O. Santos:

Hello all,

I am using SolrJ to do an index update on one of my collections. This collection has a uniqueKey field, declared in the schema as:

    <uniqueKey>id</uniqueKey>

This field is configured to be auto-generated in solrconfig.xml. In my Java code, I just add the name field to my document and then proceed with the add:

    doc.addField("name", this.name);
    solrClient.add(doc);
    solrClient.commit();

Everything works, the document gets indexed. What I really need is to know right away in the code the id that was generated for that single document. I have tried looking into the UpdateResponse but no luck. Is there any easy way to do that?

Thank you in advance,
Henrique.
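[Editor's note: a minimal sketch of the client-side approach Henrique describes, i.e. generating the id in Java before indexing; the collection URL and the "name" value are placeholders.]

    import java.util.UUID;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ClientSideIdExample {
        public static void main(String[] args) throws Exception {
            HttpSolrClient solrClient = new HttpSolrClient("http://localhost:8983/solr/mycollection");
            SolrInputDocument doc = new SolrInputDocument();
            String id = UUID.randomUUID().toString(); // generated here, so it is known up front
            doc.addField("id", id);
            doc.addField("name", "some name");
            solrClient.add(doc);
            solrClient.commit();
            System.out.println("Indexed document with id " + id); // no round-trip needed
            solrClient.close();
        }
    }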
SolrNet and deep pagination
Hi there,

Has anyone worked with deep pagination using SolrNet? The SolrNet version that I am using is v0.4.0.2002. I followed this article: https://github.com/mausch/SolrNet/blob/master/Documentation/CursorMark.md. However, this version of SolrNet.dll does not expose a StartOrCursor property in the QueryOptions class. Does anyone have insight into this? Feel free to let me know if there is a later version that we should be using.

Additionally, does anyone know how one would go about using this to paginate, say, 10 records per page, starting on page 2? That is, I would like to fetch 10 records from page 2 of the entire recordset.

Regards,
Adrian
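[Editor's note: SolrNet's StartOrCursor wraps Solr's cursorMark feature. For reference, here is a minimal sketch of the underlying request flow using SolrJ; the collection URL and the uniqueKey field name "id" are assumptions.]

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorPagingExample {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(10);
            q.addSort(SolrQuery.SortClause.asc("id")); // cursor paging requires a sort ending on the uniqueKey
            String cursor = CursorMarkParams.CURSOR_MARK_START; // "*"
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = client.query(q);
                // process rsp.getResults() here ...
                String next = rsp.getNextCursorMark();
                if (cursor.equals(next)) break; // cursor did not advance: no more results
                cursor = next;
            }
            client.close();
        }
    }

Note that a cursor only walks forward from the start; it cannot jump straight to page 2. For landing directly on page 2 at 10 rows per page, plain start=10&rows=10 paging is sufficient; cursors pay off when paging deeply into a large result set.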
plagiarism Checker with solr
Dear All,

Can anyone let us know how to implement a plagiarism checker with Solr? Specifically, how to index content with shingles, and what to send in queries.

Roshan

--
Siddhast Ip innovation (P) ltd
907 chandra vihar colony
Jhansi-284002
M: +919871549769
M: +917376314900
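[Editor's note: one common building block, offered as a sketch rather than a complete recipe: analyze text into word shingles at index time, so overlapping word n-grams become terms, then send the shingled form of a suspect passage as a query; documents sharing many shingles score highly. A possible schema.xml field type follows; the type name and shingle sizes are illustrative.]

    <fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- emit overlapping 2- to 5-word shingles as terms -->
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"
                outputUnigrams="false"/>
      </analyzer>
    </fieldType>

At query time, running the suspect text through the same analyzer produces the shingle terms to search for. Solr's MoreLikeThis component is another option worth evaluating for this use case.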
Re: Concurrent Indexing and Searching in Solr.
Hi,

I used Solr version 5.2.1. It is fast, I think. But again, I am stuck on concurrent searching and threading. I changed *2* to *100* and applied simultaneous searching using 100 workers. It works fast but not up to the mark: search times range from 0.5 to 1.5 seconds. But if I run only a single worker, the search time is 0.03 seconds, which is very fast, but that is not achievable with 100 simultaneous workers.

As Shawn said, "Making 100 concurrent indexing requests at the same time as 100 concurrent queries will overwhelm *any* single Solr server". I got your point. But MongoDB can handle concurrent searching and indexing faster. Then why not Solr? Sorry for this..

On Mon, Aug 10, 2015 at 2:39 AM Shawn Heisey wrote:
> On 8/7/2015 1:15 PM, Nitin Solanki wrote:
>> I wrote a python script for indexing and using
>> urllib and urllib2 for indexing data via http..
>
> There are a number of Solr python clients. Using a client makes your
> code much easier to write and understand.
>
> https://wiki.apache.org/solr/SolPython
>
> I have no experience with any of these clients, but I can say that the
> one encountered most often when Python developers come into the #solr
> IRC channel is pysolr. Our wiki page says the last update for pysolr
> happened in December of 2013, but I can see that the last version on
> their web page is dated 2015-05-26.
>
> Making 100 concurrent indexing requests at the same time as 100
> concurrent queries will overwhelm *any* single Solr server. In a
> previous message you said that you have 4 CPU cores. The load you're
> trying to put on Solr will require at *LEAST* 200 threads. It may be
> more than that. Any single system is going to have trouble with that.
> A system with 4 cores will be *very* overloaded.
>
> Thanks,
> Shawn
Is there a way to see if a JOIN retrieved any results from the secondary index?
Hello everyone,

we have two cores in our Solr index (Solr 5.1). The primary index contains metadata, the secondary contains full texts. We use JOINs to query the primary index and include results from the secondary.

Now we are trying to find a way to see in the results whether a result document has hits in the secondary fulltext index (because then we need to do some follow-up queries to retrieve snippets). Is this possible?

Thanks
Andreas
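[Editor's note: for context, a cross-core join of the kind described above is typically written with Solr's join query parser; the core and field names below are placeholders, not the poster's actual setup.]

    q={!join fromIndex=fulltexts from=doc_id to=id}text:searchterm

The join result itself does not flag which documents matched via the fulltext core. One possible workaround, offered only as a suggestion: issue the join clause as a separate query with fl restricted to the id field, and intersect the returned ids with the main result page to mark the documents that need snippet follow-up queries.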