Re: Solr Profiling
I guess it could be many things. Typically an easy one to spot is insufficient heap (i.e. your 16GB): the JVM ends up constantly full-GC'ing without freeing any memory while using lots of CPU. That would make Solr slow, and would explain the "hangs" too, since full GC pauses can be long.

Add:

-XX:+PrintGCDetails -verbose:gc -Xloggc:/var/log/solr-verbose-gc.log

I configure this on all my Java apps just in case. You will easily spot any GC problems by looking at the verbose log file.

If you are filling the heap and need to find out what is using up all the space, you can take a heap dump with jmap -dump:format=b,file=heap.bin. I usually use Eclipse Memory Analyzer (MAT) to inspect the heap. I have found the Lucene field cache to be a big memory hog.

good luck
andre

On 10/28/2011 02:09 PM, Rohit wrote:
> Hi,
>
> My Solr becomes very slow or hangs up at times. We have done almost
> everything possible, like:
>
> . Giving 16GB memory to the JVM
> . Sharding
>
> But these help only for a while. I want to profile the server and see
> what's going wrong. How can I profile Solr remotely?
>
> Regards,
> Rohit
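For reference, a sketch of how those pieces fit together on the command line. The heap size, start.jar launch command, and PID are examples, not Rohit's actual setup:

```shell
# Start the Solr JVM with verbose GC logging enabled (adapt to your own
# launch command; the flags are the ones suggested above):
java -Xmx16g -verbose:gc -XX:+PrintGCDetails \
     -Xloggc:/var/log/solr-verbose-gc.log -jar start.jar

# If the heap keeps filling, dump it for offline inspection in Eclipse MAT:
jmap -dump:format=b,file=heap.bin <solr-pid>
```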
Re: atypical MLT use-case
the solr 1.4 book says you can do this. usages of mlt:

"As a request handler with an external input document: What if you want similarity results based on something that isn't in the index? A final option that Solr supports is returning MLT results based on text data sent to the MLT handler (through HTTP POST). For example, if you were to send a text file to the handler, then Solr's MLT handler would return the documents in the index that are most similar to it. This is atypical but an interesting option nonetheless."

Not sure about the details of how, though, as I haven't used MLT myself.

On 09/12/09 17:27, Mike Anderson wrote:
> This is somewhat of an odd use-case for MLT. Basically I'm using it for
> near-duplicate detection (I'm not using the built-in dup detection for a
> variety of reasons). While this might sound like an okay idea, the
> problem lies in the order in which things happen. Ideally, duplicate
> detection would prevent me from adding a document to my index which is
> already there (or at least partially there). However, More Like This
> only works on documents which are *already* in the index.
>
> Ideally what I would be able to do is: post an XML document to Solr, and
> receive an MLT response (the same kind of MLT response I would receive
> had the document been in Solr already and queried with id=#{id}&mlt=true).
> Is anybody aware of how I could achieve this functionality leveraging
> existing handlers? If not I will bump over to solr-dev and see if this
> is a tractable problem.
>
> Thanks in advance,
> Mike
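A guess at what that POST might look like with curl, assuming a MoreLikeThisHandler has been registered at /mlt in solrconfig.xml. The handler path, field name, and file name are all assumptions, since I haven't tried this myself:

```shell
# Hypothetical handler registration in solrconfig.xml:
#   <requestHandler name="/mlt" class="solr.MoreLikeThisHandler"/>
# POST the external document as a content stream; Solr should return the
# indexed documents most similar to it:
curl 'http://localhost:8983/solr/mlt?mlt.fl=text&mlt.interestingTerms=details' \
     -H 'Content-Type: text/plain' \
     --data-binary @candidate-document.txt
```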
Re: atypical MLT use-case
solr 1.4 enterprise search server. it's on the left column of the solr homepage.

http://www.packtpub.com/solr-1-4-enterprise-search-server?utm_source=http://lucene.apache.org/solr/&utm_medium=spons&utm_content=pod&utm_campaign=mdb_000275

On 09/12/09 19:14, Mike Anderson wrote:
> wow! exactly what I'm looking for. What Solr 1.4 book is this? Thanks so
> much. If anybody knows the details of how to use this I'd love to hear
> your tips, experiences, or comments.
>
> -mike
>
> On Dec 9, 2009, at 12:55 PM, Andre Parodi wrote:
>> the solr 1.4 book says you can do this. usages of mlt:
>>
>> "As a request handler with an external input document: What if you want
>> similarity results based on something that isn't in the index? A final
>> option that Solr supports is returning MLT results based on text data
>> sent to the MLT handler (through HTTP POST). For example, if you were
>> to send a text file to the handler, then Solr's MLT handler would
>> return the documents in the index that are most similar to it. This is
>> atypical but an interesting option nonetheless."
>>
>> Not sure about the details of how, though, as I haven't used MLT myself.
>>
>> On 09/12/09 17:27, Mike Anderson wrote:
>>> This is somewhat of an odd use-case for MLT. Basically I'm using it
>>> for near-duplicate detection (I'm not using the built-in dup detection
>>> for a variety of reasons). While this might sound like an okay idea,
>>> the problem lies in the order in which things happen. Ideally,
>>> duplicate detection would prevent me from adding a document to my
>>> index which is already there (or at least partially there). However,
>>> More Like This only works on documents which are *already* in the
>>> index.
>>>
>>> Ideally what I would be able to do is: post an XML document to Solr,
>>> and receive an MLT response (the same kind of MLT response I would
>>> receive had the document been in Solr already and queried with
>>> id=#{id}&mlt=true). Is anybody aware of how I could achieve this
>>> functionality leveraging existing handlers? If not I will bump over to
>>> solr-dev and see if this is a tractable problem.
>>>
>>> Thanks in advance,
>>> Mike
Re: [1.3] help with update timeout issue?
add these to your JAVA_OPTS when you start your JVM:

-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/your/path/verbose-gc.log

Tail the verbose GC log to see if the timing of your pauses corresponds with a full GC.

On 15/01/10 03:59, Jerome L Quinn wrote:
> Is this related to GC?
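To make the correlation easy to spot while the problem is happening, you can watch the log live. The exact log line format varies by JVM version, but full collections are tagged "Full GC" in HotSpot's output:

```shell
# Follow the GC log and surface only the stop-the-world full collections;
# compare their timestamps against when your updates time out:
tail -f /your/path/verbose-gc.log | grep --line-buffered "Full GC"
```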
Re: filter query granularity
according to the wiki you can avoid having a third filter cached by using multiple fq parameters:

"Given the following three filtering scenarios of (a) x:bla, (b) y:blub, and (c) x:bla AND y:blub, will I end up with two or three distinct filters? In other words, may filters be composites or are they decomposed as far as their number (relevant for filterCache/@size) is concerned? In this example, (a), (b) and (c) are three distinct filters. If, however, (c) was specified using two distinct fq parameters x:bla and y:blub I'd end up with only two distinct filters for (a), (b) and (c)."

http://wiki.apache.org/solr/FilterQueryGuidance

On 21/01/10 07:57, Wangsheng Mei wrote:
> Thanks for your explanation, it makes a lot of sense to me.
>
> 2010/1/21 Lance Norskog
>
>> The docset for "fq=bla:A OR bla:B" has no relation to the other two.
>> Different 'fq' filters are made and cached separately. The first time
>> you search with a filter query, Solr does that query and saves the
>> list of documents matching the search.
>>
>> 2010/1/20 Wangsheng Mei:
>>
>>> The following 3 search scenarios:
>>>
>>> bla:A
>>> bla:B
>>> bla:A OR bla:B
>>>
>>> are quite common, so I use 3 filter queries:
>>>
>>> fq=bla:A
>>> fq=bla:B
>>> fq=bla:A OR bla:B
>>>
>>> My question is: since the last fq document set could be built from the
>>> first two fq doc sets, will Solr still cache the last fq doc set, or
>>> will it just build it at runtime from the previous two doc sets?
>>> What I'm asking about is filter query granularity. Is my understanding
>>> right?
>>>
>>> --
>>> 梅旺生
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
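Concretely, the wiki's point looks like this on the request line (host and field names are just the example's). Note it only helps for AND combinations; an fq of bla:A OR bla:B is still a single composite filter and gets its own cache entry:

```shell
# One composite filter: a third, distinct entry in the filterCache
curl 'http://localhost:8983/solr/select?q=*:*&fq=x:bla+AND+y:blub'

# Two separate fq parameters: intersected at query time, reusing the two
# filters already cached for fq=x:bla and fq=y:blub individually
curl 'http://localhost:8983/solr/select?q=*:*&fq=x:bla&fq=y:blub'
```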
To store or not to store serialized objects in solr
Hi,

We currently store all of our data in an SQL database and use Solr for indexing. We get a list of ids from Solr and retrieve the data from the db. We are considering storing all the data in Solr to simplify administration and remove any synchronisation, and are weighing the following options:

1. storing the data in individual fields in solr (indexed=true, stored=true)
2. storing the data in serialized form in a binary field in solr (using Google protocol buffers or similar) and keeping the rest of the solr fields as indexed=true, stored=*false*
3. keeping it as is: data stored in the db, and solr fields kept as indexed=true, stored=false

Can anyone provide some advice on the performance of the different approaches? Are there any obvious pitfalls to options 1 and 2 that I need to be mindful of? I am thinking option 2 would be the fastest, as it would read the data in one contiguous block. I will be doing some performance tests to verify this soon.

FYI we are looking at 5-10M records; a serialised object is 500 to 1000 bytes, and we index approx 20 fields.

Thanks for any advice.
andre
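To make option 2 concrete, a rough sketch of what it might look like on the wire, assuming Solr's binary field type (which, as I understand it, takes base64-encoded content in XML updates). The field names, schema line, and id are illustrative only:

```shell
# Hypothetical schema.xml entry for the payload field:
#   <field name="payload" type="binary" indexed="false" stored="true"/>
# Serialize the record (e.g. with protocol buffers), base64-encode it
# without line wrapping, and send it alongside the indexed-only fields:
PAYLOAD=$(base64 -w0 record.bin)
curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
     --data-binary "<add><doc>
       <field name='id'>42</field>
       <field name='payload'>$PAYLOAD</field>
     </doc></add>"
```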
resetting stats
Hi,

Is there a way to reset the stats counters? For example, in the query handler, avgTimePerRequest is not much use after a while, as it is an average over the whole time since the server started.

When putting the data into a monitoring system like Nagios, it would be useful to sample the data and reset it at the same time. The averages and counters in each sample would then represent only the last sampling period.

Thanks,
Andre
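One workaround, if no reset exists and the counters are cumulative only: sample them periodically and compute per-interval deltas on the monitoring side. A sketch of the idea; the stats URL, stat name, and scraping pattern are assumptions from memory and will likely need adjusting for your handler:

```shell
# Fetch the cumulative request count and diff it against the previous sample
now=$(curl -s 'http://localhost:8983/solr/admin/stats.jsp' \
      | grep -o '<stat name="requests">[0-9]*' | head -1 | grep -o '[0-9]*$')
prev=$(cat /tmp/solr_requests_prev 2>/dev/null || echo 0)
echo "requests in last interval: $((now - prev))"
echo "$now" > /tmp/solr_requests_prev
```

The same delta trick works for totalTime, which together with the request delta gives a true per-interval average time per request.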