Re: Solr Profiling

2011-11-01 Thread Andre Parodi

I guess it could be many things.

Typically an easy one to spot is insufficient heap (i.e. your 16GB is not 
enough), so the JVM is constantly running full GCs, not freeing up any 
memory, and burning a lot of CPU. That would make Solr slow, and it would 
also appear to "hang" during the potentially long GC pauses.


add: -XX:+PrintGCDetails -verbose:gc -Xloggc:/var/log/solr-verbose-gc.log

I configure this on all my Java apps just in case. You will easily spot 
any GC problems by looking at the verbose log file.
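To give an idea of what to look for, here is a rough sketch that pulls the Full GC pauses out of HotSpot-style log lines. The sample entries and the regex are only illustrative; the actual format varies with JVM version and flags, so adapt it to what your log really contains:

```python
import re

# Illustrative HotSpot-style entries; real formats vary by JVM version/flags.
sample_log = [
    "12.345: [GC [PSYoungGen: 524288K->8192K(611840K)] "
    "1048576K->532480K(2009088K), 0.0412300 secs]",
    "98.765: [Full GC [PSYoungGen: 8192K->0K(611840K)] "
    "[ParOldGen: 1398144K->1390000K(1397248K)] "
    "1900544K->1390000K(2009088K), 8.1234567 secs]",
]

def full_gc_pauses(lines):
    """Return (timestamp_secs, pause_secs) for each Full GC entry."""
    pauses = []
    for line in lines:
        if "Full GC" in line:
            ts = float(line.split(":", 1)[0])
            pause = float(re.search(r"([\d.]+) secs", line).group(1))
            pauses.append((ts, pause))
    return pauses

# Frequent entries with multi-second pauses suggest the heap is too small.
print(full_gc_pauses(sample_log))  # [(98.765, 8.1234567)]
```

If the Full GC timestamps line up with your "hangs", and the heap occupancy barely drops across each one, you have your culprit.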


If you are filling the heap and need to find out what is using up all the 
space, then you can take a heap dump with jmap 
-dump:format=b,file=heap.bin <pid>. I usually use Eclipse Memory 
Analyzer (MAT) to inspect the dump. I have found the Lucene field cache to 
be a big memory hog.


good luck
andre


On 10/28/2011 02:09 PM, Rohit wrote:

Hi,



My Solr becomes very slow or hangs up at times. We have done almost
everything possible, like:

- giving 16GB of memory to the JVM
- sharding

But these help only for a while. I want to profile the server and see what's
going wrong. How can I profile Solr remotely?



Regards,

Rohit






Re: atypical MLT use-case

2009-12-09 Thread Andre Parodi

The Solr 1.4 book says you can do this.

usages of mlt:
"As a request handler with an external input document: What if you want 
similarity results based on something that isn't in the index? A final 
option that Solr supports is returning MLT results based on text data 
sent to the MLT handler (through HTTP POST). For example, if you were to 
send a text file to the handler, then Solr's MLT handler would return 
the documents in the index that are most similar to it. This is atypical 
but an interesting option nonetheless."
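Not having tried it, a sketch of what this might look like. The handler name, params and content type here are assumptions based on the standard MoreLikeThisHandler, so check them against your solrconfig.xml:

```
<!-- solrconfig.xml: register the MLT request handler -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler"/>

POST /solr/mlt?mlt.fl=body&mlt.mintf=1&mlt.mindf=1
Content-Type: text/plain

(raw text of the external document goes here)
```

mlt.fl names the indexed field(s) whose term statistics drive the similarity; the posted body is analyzed against them and the most similar indexed documents come back in the response.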


Not sure about the details of how, though, as I haven't used MLT myself.


On 09/12/09 17:27, Mike Anderson wrote:

This is somewhat of an odd use-case for MLT. Basically I'm using it for
near-duplicate detection (I'm not using the built-in dup detection for a
variety of reasons). While this might sound like an okay idea, the problem
lies in the order in which things happen. Ideally, duplicate detection would
prevent me from adding a document to my index which is already there (or at
least partially there). However, more-like-this only works on documents
which are *already* in the index. Ideally what I would be able to do is:
post an XML document to Solr, and receive an MLT response (the same kind of
MLT response I would receive had the document been in Solr already, and
queried with id=#{id}&mlt=true).

Is anybody aware of how I could achieve this functionality leveraging
existing handlers? If not I will bump over to solr-dev and see if this is a
tractable problem.

Thanks in advance,
Mike





Re: atypical MLT use-case

2009-12-10 Thread Andre Parodi

solr 1.4 enterprise search server.

it's on the left column of the solr homepage.

http://www.packtpub.com/solr-1-4-enterprise-search-server?utm_source=http://lucene.apache.org/solr/&utm_medium=spons&utm_content=pod&utm_campaign=mdb_000275

On 09/12/09 19:14, Mike Anderson wrote:

wow! exactly what i'm looking for. What solr1.4 book is this?

thanks so much. If anybody knows the details of how to use this I'd love to 
hear your tips, experiences, or comments.

-mike






Re: [1.3] help with update timeout issue?

2010-01-15 Thread Andre Parodi

Add these to your JAVA_OPTS when you start your JVM:
-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails 
-Xloggc:/your/path/verbose-gc.log


Tail the verbose GC log to see if the timing of your pauses corresponds 
with a full GC.


On 15/01/10 03:59, Jerome L Quinn wrote:

Is this related to GC?
   


Re: filter query granularity

2010-01-21 Thread Andre Parodi
According to the wiki you can avoid having three distinct filter queries
cached by using multiple fq parameters:

"Given the following three filtering scenarios of (a) x:bla, (b) y:blub,
and (c) x:bla AND y:blub, will I end up with two or three distinct
filters? In other words, may filters be composites or are they
decomposed as far as their number (relevant for filterCache/@size) is
concerned? In this example, (a), (b) and (c) are three distinct filters.
If, however, (c) was specified using two distinct fq parameters x:bla
and y:blub I'd end up with only two distinct filters for (a), (b) and (c)."
http://wiki.apache.org/solr/FilterQueryGuidance
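Note that the wiki example is a conjunction: separate fq parameters are intersected, so splitting only helps when the combined filter is an AND of the parts (it does not apply to the bla:A OR bla:B case, which has to stay a single fq). For the wiki's example the requests would look roughly like this (paths and q values are just placeholders):

```
# cached as two filters, x:bla and y:blub:
/select?q=foo&fq=x:bla
/select?q=foo&fq=y:blub

# reuses both cached filters instead of caching a third "x:bla AND y:blub":
/select?q=foo&fq=x:bla&fq=y:blub
```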

On 21/01/10 07:57, Wangsheng Mei wrote:
> Thanks for your explanation, it makes a lot sense to me.
>
> 2010/1/21 Lance Norskog 
>
>   
>> The docset for "fq=bla:A OR bla:B" has no relation to the other two.
>> Different 'fq' filters are made and cached separately. The first time
>> you search with a filter query, Solr does that query and saves the
>> list of documents matching the search.
>>
>> 2010/1/20 Wangsheng Mei :
>> 
>>> The following 3 search scenarios:
>>>
>>>  bla:A
>>>  bla:B
>>>  bla:A OR bla:B
>>>
>>> are quite common, so I use 3 filter queries:
>>> fq=bla:A
>>> fq=bla:B
>>> fq=bla:A OR bla:B
>>>
>>> My question is,
>>> since the last fq doc set could be built from the first two fq doc
>>> sets, will solr still cache the last fq doc set or just build it at
>>> runtime from the previous two doc sets?
>>> That is what I mean by filter query granularity. Is my understanding right?
>>>
>>> --
>>> 梅旺生
>>>
>>>   
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>> 
>
>
>   


To store or not to store serialized objects in solr

2010-01-26 Thread Andre Parodi

Hi,

We currently store all of our data in an SQL database and use Solr 
for indexing. We get a list of ids from Solr and retrieve the data from 
the DB.


We are considering storing all the data in Solr, to simplify 
administration and remove the need to keep the two in sync. The options 
are:


1. storing the data in individual fields in Solr (indexed=true, stored=true)
2. storing the data in serialized form in a binary field in Solr 
(using Google protocol buffers or similar) and keeping the rest of the 
Solr fields indexed=true, stored=*false*
3. keeping things as they are: data stored in the DB, and all Solr fields 
indexed=true, stored=false
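For option 2, the schema side might look something like the sketch below. Whether a binary field type is available depends on your Solr version (older versions may need the bytes base64-encoded into a plain string field), so treat the type name as an assumption:

```
<!-- schema.xml sketch: searchable fields stay indexed but unstored -->
<field name="title" type="text"   indexed="true"  stored="false"/>
<field name="body"  type="text"   indexed="true"  stored="false"/>
<!-- one stored, unindexed field carrying the serialized record -->
<field name="blob"  type="binary" indexed="false" stored="true"/>
```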


Can anyone offer advice on the performance of the different 
approaches? Are there any obvious pitfalls to options 1 and 2 that I need 
to be mindful of?


I am thinking option 2 would be the fastest, as it would read the 
data in one contiguous block. I will be doing some performance tests to 
verify this soon.


FYI, we are looking at 5-10M records; a serialized object is 500 to 1000 
bytes, and we index approx 20 fields.


Thanks for any advice.
andre


resetting stats

2010-03-23 Thread Andre Parodi

Hi,

Is there a way to reset the stats counters? For example, in the query 
handler, avgTimePerRequest is not much use after a while, as it is an 
average since the server started.


When feeding the data into a monitoring system like Nagios, it would be 
useful to be able to sample the data and reset it at the same time, so 
that each sample would represent the averages and counts for the 
last sampling period.
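Lacking a reset, one workaround is to keep the previous cumulative sample on the monitoring side and compute per-interval deltas. A minimal sketch; the counter shapes are assumptions (a request count plus a cumulative total time, which is how avgTimePerRequest-style stats are usually derived), so adapt them to whatever your Solr actually exposes:

```python
def interval_avg(prev, cur):
    """Per-interval average time per request from two cumulative samples.

    prev, cur: (requests, total_time_ms) cumulative counters read from the
    handler's stats at two successive polls.
    """
    dreq = cur[0] - prev[0]
    if dreq == 0:
        return 0.0  # no traffic during the interval
    return (cur[1] - prev[1]) / dreq

# e.g. 50 new requests taking 3000 ms in total since the last poll
print(interval_avg((100, 5000.0), (150, 8000.0)))  # 60.0
```

The monitoring check then reports the per-period average instead of the since-startup one, without Solr needing to reset anything.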


Thanks
Andre