JMX warning in Solr

2015-08-09 Thread drsocmo
I've got the following exception repeating in my log:

javax.management.InstanceNotFoundException:
solr/collection:type=searcher,id=org.apache.solr.search.SolrIndexSearcher
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
at org.jboss.as.jmx.PluggableMBeanServerImpl$TcclMBeanServer.unregisterMBean(PluggableMBeanServerImpl.java:584)
at org.jboss.as.jmx.PluggableMBeanServerImpl.unregisterMBean(PluggableMBeanServerImpl.java:331)
at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:138)
at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
at org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:297)


I know what this warning means, but I have no idea how to fix it or what the
root cause is...

My env: JBoss 6.3.x + Solr 4.x

I run two Solr instances in one servlet container. Could that be the root cause?

Has a similar issue been reported before?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/JMX-warning-in-Solr-tp4221975.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: serious data loss bug in correlation with "too much data after closed"

2015-08-09 Thread adfel70
By now I'm pretty sure that this is a bug either in Solr or in HttpClient.

I reproduced the problem again:
1. During massive indexing we see some warnings from HttpParser:
"badMessage: java.lang.IllegalStateException: too much data after closed for
HttpChannelOverHttp"

Checking the httpcore code, it seems this happens when the connection
closes abruptly.

2. Only for some occurrences of this warning do we also see a related
NoHttpResponseException from the Solr node.

3. After indexing we perform a full data validation and find that around 200
docs for which our client received an HTTP 200 status are not present in Solr.

4. Checking when these docs were sent to Solr, we arrive at the same times as
the log messages from 1 and 2 (the HttpChannelOverHttp warning and the
NoHttpResponseException).

5. These 200 docs are divided into around 8 bulks that were sent at various
times, and all of them had these warn/error messages around them.


Would be glad to have some input from the community on this.

Thanks.
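The data-validation step in point 3 can be sketched with only the standard library. Everything here is illustrative, not from this thread: the collection URL, the uniqueKey field name (`id`), and the use of the `{!terms}` query parser (available in Solr 4.10+) are assumptions.

```python
# Sketch of the post-indexing validation in step 3: given the IDs our
# client received HTTP 200 for, ask Solr which of them actually exist.
# The collection URL and the id field name are assumptions for illustration.
import json
import urllib.parse

SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

def build_check_url(ids):
    """Build a /select URL that matches any of the given ids (terms query)."""
    params = {
        "q": "{!terms f=id}" + ",".join(ids),
        "fl": "id",
        "rows": str(len(ids)),
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)

def missing_ids(sent_ids, solr_response_body):
    """Compare the ids we sent against the ids Solr returned."""
    docs = json.loads(solr_response_body)["response"]["docs"]
    found = {d["id"] for d in docs}
    return [i for i in sent_ids if i not in found]

# Example with a fake response in which doc "b" was lost:
fake = json.dumps({"response": {"docs": [{"id": "a"}, {"id": "c"}]}})
print(missing_ids(["a", "b", "c"], fake))  # ['b']
```

With a real server you would fetch `build_check_url(...)` with urllib and feed the body to `missing_ids()`; the ids reported missing are the ones to correlate with the warning timestamps.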





Re: docValues

2015-08-09 Thread Nagasharath
I have tested with docValues and without docValues on the test indexes with a
JSON nested faceting query.

I noticed a performance boost with docValues. The response time both with
cached items and without cached items is good.

I have noticed that the response time on cached items of the index without
docValues is not always constant (28 ms, 78 ms, 94 ms), whereas with docValues
it is always constant (always <20 ms).

Decided to go with docValues.
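For reference, docValues are switched on per field in schema.xml. A hypothetical fragment (the field names are illustrative only, and changing the attribute requires reindexing):

```xml
<!-- Hypothetical schema.xml fragment: field names are illustrative only. -->
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
<field name="amount" type="tdouble" indexed="true" stored="true" docValues="true"/>
```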

> On 08-Aug-2015, at 10:44 pm, Erick Erickson  wrote:
> 
> Have you seen: https://cwiki.apache.org/confluence/display/solr/DocValues?
> 
> What kind of speedup? How often are you committing? Is there a speed
> difference after a while or on the first few queries?
> 
> Details matter a lot for questions like this.
> 
> Best,
> Erick
> 
>> On Sat, Aug 8, 2015 at 6:22 PM, Nagasharath  wrote:
>> Good
>> 
>> Sent from my iPhone
>> 
>>> On 08-Aug-2015, at 8:12 pm, Aman Tandon  wrote:
>>> 
>>> Hi,
>>> 
>>>> I am seeing a significant difference in the query time after using docValue
>>> 
>>> what kind of difference, is it good or bad?
>>> 
>>> With Regards
>>> Aman Tandon
>>> 
>>> On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath  wrote:
>>> 
>>>> I am seeing a significant difference in the query time after using
>>>> docValue.
>>>> 
>>>> I am curious to know what's happening with 'docValue' included in the
>>>> schema
>>>> 
>>>>> On 07-Aug-2015, at 4:31 pm, Shawn Heisey  wrote:
>>>>> 
>>>>> On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
>>>>>> JVM-Memory has gone up from 3% to 17.1%
>>>>> 
>>>>> In my experience, a healthy Java application (after the heap size has
>>>>> stabilized) will have a heap utilization graph where the low points are
>>>>> between 50 and 75 percent.  If the low points in heap utilization are
>>>>> consistently below 25 percent, you would be better off reducing the heap
>>>>> size and allowing the OS to use that memory instead.
>>>>> 
>>>>> If you want to track heap utilization, JVM-Memory in the Solr dashboard
>>>>> is a very poor tool.  Use tools like visualvm or jconsole.
>>>>> 
>>>>> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>>>>> 
>>>>> I need to add what I said about very low heap utilization to that wiki
>>>>> page.
>>>>> 
>>>>> Thanks,
>>>>> Shawn


Re: docValues

2015-08-09 Thread Yonik Seeley
Interesting... what type of field was this? (string or numeric? single
or multi-valued?)

Without docValues, the first request would be slow (due to building
the in-memory field cache entry), but after that it should be fast.

-Yonik




Re: docValues

2015-08-09 Thread Nagasharath
JSON nested faceting on string, string, and double fields; the facet function
'sum' is applied on the double field.

Without docValues, responses for the same query:
1) first response, without cache: 765 ms
2) second response, with cache: 28 ms
3) third response, with cache: 78 ms
4) fourth response, with cache: 94 ms

With docValues, responses for the same query:
1) first response, without cache: 78 ms
2) with cache, always less than 20 ms

Version 5.2.1

> On 09-Aug-2015, at 10:39 am, Yonik Seeley  wrote:
> 
> Interesting... what type of field was this? (string or numeric? single
> or multi-valued?)
> 
> Without docValues, the first request would be slow (due to building
> the in-memory field cache entry), but after that it should be fast.
> 
> -Yonik


Certification

2015-08-09 Thread Nagasharath
Is there an industry-standard certification for Solr?


Re: Concurrent Indexing and Searching in Solr.

2015-08-09 Thread Shawn Heisey
On 8/7/2015 1:15 PM, Nitin Solanki wrote:
> I wrote a python script for indexing and using
> urllib and urllib2 for indexing data via http..

There are a number of Solr python clients.  Using a client makes your
code much easier to write and understand.

https://wiki.apache.org/solr/SolPython

I have no experience with any of these clients, but I can say that the
one encountered most often when Python developers come into the #solr
IRC channel is pysolr.  Our wiki page says the last update for pysolr
happened in December of 2013, but I can see that the last version on
their web page is dated 2015-05-26.

Making 100 concurrent indexing requests at the same time as 100
concurrent queries will overwhelm *any* single Solr server.  In a
previous message you said that you have 4 CPU cores.  The load you're
trying to put on Solr will require at *LEAST* 200 threads.  It may be
more than that.  Any single system is going to have trouble with that. 
A system with 4 cores will be *very* overloaded.

Thanks,
Shawn
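Since the original script already uses urllib, here is a rough sketch of batching documents into fewer, larger update requests, which keeps the number of concurrent connections well below the 100 used in the test. The URL and the document shape are assumptions for illustration; the requests are only built here, not sent.

```python
# A minimal sketch of batched indexing over HTTP with only the standard
# library. Sending documents in batches (one request per few hundred docs)
# needs far fewer simultaneous connections than one request per document.
import json
import urllib.request

UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=false"

def batches(docs, size):
    """Split a list of documents into fixed-size batches."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def build_request(batch):
    """Build the JSON update request for one batch (not sent here)."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        UPDATE_URL, data=body,
        headers={"Content-Type": "application/json"})

docs = [{"id": str(i), "name": "doc %d" % i} for i in range(5)]
reqs = [build_request(b) for b in batches(docs, 2)]
print(len(reqs))  # 3
```

With a real server you would pass each request to urllib.request.urlopen() and issue a single commit at the end, rather than committing per document.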



Re: SolrJ update

2015-08-09 Thread Henrique O. Santos

Hi Andrea,

Thanks for the explanation. I ended up doing just that: generating and
setting a UUID in the Java code.

On 08/08/2015 03:19 AM, Andrea Gazzarini wrote:

Hi Henrique,
I don't believe there's an easy way to do that.

As you noticed, the SolrInputDocument is not an in/out param; it is not sent
back once the data has been indexed. That is good: here you're sending just
one document, but imagine what would happen if you did a bulk load, the
response would be huge!

Although I could imagine some workarounds (a custom UpdateRequestProcessor
plus a custom ResponseWriter), the point is that (see above) I believe it
would end in a bad design:

- sending one document at a time is *often* considered bad practice;
- if you send a lot of data, the corresponding response would be huge and
would contain a lot of newly created identifiers. And how would you match
them with your input documents? Sequentially? That way you wouldn't be able
to use any *async* client.

Personally, if it is ok for your context, I'd avoid the problem entirely by
moving the logic to the client side. That is, generate a UUID in your SolrJ
code and add that ID to the outgoing document.

Best,
Andrea



2015-08-06 21:39 GMT+02:00 Henrique O. Santos :


Hello all,

I am using SolrJ to do an index update on one of my collections. This
collection has a uniqueKey id field:

   <uniqueKey>id</uniqueKey>

This field is configured to be auto-generated in solrconfig.xml like this:

   <updateRequestProcessorChain name="uuid">
     <processor class="solr.UUIDUpdateProcessorFactory">
       <str name="fieldName">id</str>
     </processor>
     <processor class="solr.RunUpdateProcessorFactory"/>
   </updateRequestProcessorChain>

On my Java code, I just add the name field to my document and then proceed
with the add:

doc.addField("name", this.name);
solrClient.add(doc);
solrClient.commit();

Everything works, the document gets indexed. What I really need is to know
right away in the code the id that was generated for that single document.
I have tried looking into the UpdateReponse but no luck.

Is there any easy way to do that?

Thank you in advance,
Henrique.





SolrNet and deep pagination

2015-08-09 Thread Adrian Liew
Hi there,

Has anyone worked with deep pagination using SolrNet? The SolrNet version I
am using is v0.4.0.2002. I followed this article,
https://github.com/mausch/SolrNet/blob/master/Documentation/CursorMark.md ,
but this version of SolrNet.dll does not expose a StartOrCursor property in
the QueryOptions class.

Does anyone have insight into this? Feel free to let me know if there is a
later version we should be using.

Additionally, does anyone know how to paginate, say, 10 records per page
starting at page 2? That is, I would like to fetch 10 records from page 2 of
the entire recordset.

Regards,
Adrian
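Independently of the SolrNet binding, cursorMark paging at the HTTP level is a simple loop: start with cursorMark=*, sort on the uniqueKey, and repeat until the returned nextCursorMark equals the mark you sent. A sketch with a simulated fetch function (no real server is contacted; the field names are assumptions):

```python
# Sketch of cursorMark paging against Solr's HTTP API (the mechanism a
# StartOrCursor-style option wraps). fetch() is a stand-in for the actual
# HTTP call; the sort must include the uniqueKey field.
def cursor_pages(fetch, rows=10):
    """Yield successive pages until the cursor stops advancing."""
    cursor = "*"
    while True:
        resp = fetch({"q": "*:*", "rows": rows,
                      "sort": "id asc", "cursorMark": cursor})
        docs = resp["response"]["docs"]
        if docs:
            yield docs
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # same mark back means we are done
            return
        cursor = next_cursor

# Simulated fetch over 25 fake docs, 10 per page:
DOCS = [{"id": i} for i in range(25)]
def fake_fetch(params):
    start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    page = DOCS[start:start + params["rows"]]
    nxt = str(start + len(page)) if page else params["cursorMark"]
    return {"response": {"docs": page}, "nextCursorMark": nxt}

pages = list(cursor_pages(fake_fetch))
print([len(p) for p in pages])  # [10, 10, 5]
```

Note that cursorMark is for walking a result set sequentially (deep paging and exports); it cannot jump straight to an arbitrary page. For "10 records per page, page 2", plain start=10&rows=10 in the query options is sufficient and simpler.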


plagiarism Checker with solr

2015-08-09 Thread Roshan Agarwal
Dear All,

Can anyone let us know how to implement a plagiarism checker with Solr?
How should we index content with shingles, and what should we send in queries?

Roshan

-- 

Siddhast Ip innovation (P) ltd
907 chandra vihar colony
Jhansi-284002
M:+919871549769
M:+917376314900
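One common approach to the question above is word shingles: index overlapping word n-grams (Solr's ShingleFilterFactory in the field's analysis chain) and build the query from shingles of the suspect text, so copied passages produce many matching shingles even when a few surrounding words differ. A rough sketch of what shingling produces, outside Solr:

```python
# Word-level shingles (n-grams), the kind of tokens Solr's
# ShingleFilterFactory emits; overlapping shingles catch copied passages
# even when a few surrounding words were changed.
def shingles(text, size=3):
    """Return all word n-grams of the given size, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + size])
            for i in range(len(words) - size + 1)]

def overlap(a, b, size=3):
    """Fraction of a's shingles that also occur in b (crude similarity)."""
    sa, sb = set(shingles(a, size)), set(shingles(b, size))
    return len(sa & sb) / len(sa) if sa else 0.0

print(shingles("the quick brown fox", 3))
# ['the quick brown', 'quick brown fox']
```

In Solr itself the shingling would be done by the analysis chain on both the index and query sides; this snippet only illustrates the idea and a crude overlap score.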


Re: Concurrent Indexing and Searching in Solr.

2015-08-09 Thread Nitin Solanki
Hi,
     I used Solr 5.2.1. It is fast, I think. But again, I am stuck on
concurrent searching and threading. I changed *2* to *100* and applied
simultaneous searching using 100 workers. It works fast, but not up to the
mark.

It improves search time from 1.5 seconds to 0.5 seconds. But if I run only a
single worker then search time is 0.03 seconds, which is very fast, but that
is not possible with 100 workers running simultaneously.

As Shawn said, "Making 100 concurrent indexing requests at the same time as
100 concurrent queries will overwhelm *any* single Solr server". I got your
point.

But MongoDB can handle concurrent searching and indexing faster. Then why
not Solr? Sorry for this..





Is there a way to see if a JOIN retrieved any results from the secondary index?

2015-08-09 Thread Andreas Kahl
Hello everyone,
 
we have two cores in our Solr index (Solr 5.1). The primary index contains
metadata, the secondary fulltexts. We use JOINs to query the primary index and
include results from the secondary.
Now we are trying to find a way to see in the results whether a result document
has hits in the secondary fulltext index (because then we need to do some
follow-up queries to retrieve snippets).
Is this possible?
 
Thanks
Andreas
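As far as I know, the join results do not carry a per-document flag saying whether the join matched. One workaround, sketched below with made-up core and field names, is to run the user query twice, once plain and once restricted by the cross-core {!join}, and diff the ids on the client:

```python
# Sketch: run the metadata query twice, plain and join-restricted, and
# diff ids client-side. The core name ("fulltext") and the field names
# (meta_id, id, text) are made up for illustration.
def plain_params(user_q):
    return {"q": user_q, "fl": "id"}

def join_params(user_q, text_q):
    # Only primary docs whose fulltext docs match text_q survive this fq.
    fq = "{!join from=meta_id to=id fromIndex=fulltext}text:" + text_q
    return {"q": user_q, "fl": "id", "fq": fq}

def flag_fulltext_hits(all_ids, joined_ids):
    """Mark each result id with whether the join-restricted query kept it."""
    joined = set(joined_ids)
    return [(i, i in joined) for i in all_ids]

print(flag_fulltext_hits(["m1", "m2", "m3"], ["m2"]))
# [('m1', False), ('m2', True), ('m3', False)]
```

The flagged ids then tell you which documents need the follow-up snippet queries against the fulltext core.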