Re: How to improve Solr search performance
2008/4/9, Chris Hostetter <[EMAIL PROTECTED]>:
> : most of time seems to be used for the writer getting and writing the docs
> : can those docs be prefetched?
>
> as mentioned, the documentCache can help you out in the common case, but
> 1-4 seconds for just the XML writing seems pretty high ...

It really works, thanks.

> 1) how are you timing this (ie: what exactly are you measuring)

The QTime is the time used by Solr to find the docs. I measured the time from when the dispatch filter received the request to when the response writer wrote the response. It is much larger than QTime.

> 2) how many stored fields do each of your documents have? (not how many are
> in your schema.xml, how many do each of your docs really have in them)

7-9 fields; only one of the fields is text, the rest of them are short strings or ints.

> ... having *lots* of stored fields can slow down retrieval of the Document
> (and Document retrieval is delayed until response writing) so if you have
> thousands that might account for it. If your use case is to only ever
> return the "ID" field, then not storing anything else will help keep your
> total index size smaller and should speed up the response writing.
>
> -Hoss
Can I find which field matched?
Hi,

If I throw a query at the Solr index, is there a mechanism where I can find out which fields matched the query (and the score of each match)?

Example: for fields A, B and C, if query q has term1 term2 term3, and field A matches term1 term2 while field C matches term3, can I get component scores for the whole match, as MatchA, MatchB (0.0 perhaps), MatchC?

I will be using these scores from a custom plugin. What classes do I need to use for such scores?

-umar
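In stock Solr/Lucene the usual route for this is score explanations: Solr's debugQuery=on output, or Lucene's IndexSearcher.explain(query, docId), whose nested Explanation tree can be walked to pull out per-field contributions. As a toy illustration of "component scores per field" (not Lucene's actual scoring formula, and the field names are just the ones from the mail):

```java
import java.util.*;

// Toy illustration of splitting a match into per-field component scores:
// each field gets credit for the fraction of query terms it contains.
// Real per-field scores would come from walking Lucene's Explanation tree.
public class PerFieldMatch {
    // Returns a map of field name -> fraction of query terms found in that field.
    static Map<String, Double> componentScores(Map<String, String> doc,
                                               List<String> queryTerms) {
        Map<String, Double> scores = new LinkedHashMap<String, Double>();
        for (Map.Entry<String, String> field : doc.entrySet()) {
            Set<String> tokens = new HashSet<String>(
                Arrays.asList(field.getValue().toLowerCase().split("\\s+")));
            int hits = 0;
            for (String t : queryTerms) {
                if (tokens.contains(t.toLowerCase())) hits++;
            }
            scores.put(field.getKey(), (double) hits / queryTerms.size());
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("A", "term1 term2");
        doc.put("B", "something else");
        doc.put("C", "term3");
        System.out.println(componentScores(doc,
            Arrays.asList("term1", "term2", "term3")));
    }
}
```

A custom plugin could do the real version of this by calling explain() per result and grouping the leaf Explanation nodes by field name.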
Re: Snipets Solr/nutch
Thank you for your response. I have another problem with snippets. Here it is: I transform the HTML code into text, then I index all of the generated text into one field called myText. Many pages share a common header with common information (example: a web site about President Bush), and the word "bush" appears in this header. If I highlight the field myText while searching for the word "bush", I will always get the same highlighted sentence containing "bush" (the sentence from the common header), because I have set fragsize to 150 and Solr returns, from the whole text, the first occurrence of the word ("bush") highlighted. How can I deal with that?

I was told that NutchWAX handles this problem. Is that true? If so, how can I integrate the Nutch classes into Solr? Thank you in advance.

-- View this message in context: http://www.nabble.com/Snipets-Solr-nutch-tp16537216p16585594.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Return the result only field A or field B is non-zero?
This would be trivial if you also stored boolean fields for aiszero and biszero. That would also be fast, I expect.

wunder

On 4/8/08 11:53 PM, "Vinci" <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I want to limit the search result by 2 numerical fields, A and B, where Solr
> returns a result only if the value in field A or B is non-zero. Is that possible,
> or do I need to change the document and schema? Or do I need to change the
> schema as well as the query?
>
> Thank you,
> Vinci
Re: Distributed Search
On Wed, Apr 9, 2008 at 2:00 AM, oleg_gnatovskiy <[EMAIL PROTECTED]> wrote:
> We are using the Chain Collapse patch as well. Will that not work over a
> distributed index?

Since there is no explicit distributed support for it, it would only collapse per-shard.

-Yonik
Re: How to improve Solr search performance
: 1) how are you timing this (ie: what exactly are you measuring)
: And I got the time from when the dispatch filter received the request to
: when the response writer wrote the response.
: It is much larger than QTime.

can you be more specific about what you mean when you say "And I got the time from dispatchfilter..." What *exactly* are you looking at? (ie: is this a time you are seeing in a log file? if so, which log file? ... or is this timing code you added to the dispatch filter yourself?)

I ask because it's possible there is network IO overhead included in communicating with your client (I would be surprised if it was significant if you are only returning a single field for the first 50 results, but I know nothing about your network setup, or what your client code looks like -- so anything is possible).

: 7-9 fields
: only one of the fields is text, rest of them are short string or int ...

How big is the text field? Are you talking about a few hundred chars or several KB of text per doc?

Is enableLazyFieldLoading set to true in your solrconfig.xml? (I forgot we had that option when I sent my last email.) As long as you are using the "fl" param with just your uniqueKey field, document retrieval should be "fast enough" and fairly consistent.

-Hoss
Re: Nightly build compile error?
: Hello everyone. I downloaded the latest nightly build from
: http://people.apache.org/builds/lucene/solr/nightly/. When I tried to
: compile it, I got the following errors:
:
: [javac] Compiling 189 source files to /home/csweb/apache-solr-nightly/build/core
: [javac] /home/csweb/apache-solr-nightly/src/java/org/apache/solr/handler/admin/MultiCoreHandler.java:93: cannot find symbol
: [javac] symbol : variable CREATE

I'm not sure how you managed to get that far ... because of some refactoring that was done a little while back, the nightly builds don't currently include all of the source; see SOLR-510.

The nightly builds do, however, already contain all the pre-built jars (and war) that you need to run Solr ... if you want to compile from source, I would just check out from subversion.

-Hoss
Re: Nightly build compile error?
hossman wrote:
> I'm not sure how you managed to get that far ... because of some
> refactoring that was done a little while back, the nightly builds don't
> currently include all of the source; see SOLR-510.
>
> The nightly builds do, however, already contain all the pre-built jars (and
> war) that you need to run Solr ... if you want to compile from source, I
> would just check out from subversion.
>
> -Hoss

Yup, that works.
Re: Distributed Search
Do you have any suggestions as to how we would be able to implement chain collapse over the entire distributed index? Our collection is 27 GB, 15 million documents. Do you think there is a way to optimize Solr performance enough to not have to segment such a large collection?

Yonik Seeley wrote:
> Since there is no explicit distributed support for it, it would only
> collapse per-shard.
>
> -Yonik
Re: Distributed Search
On Wed, Apr 9, 2008 at 1:57 PM, oleg_gnatovskiy <[EMAIL PROTECTED]> wrote:
> Do you have any suggestions as to how we would be able to implement chain
> collapse over the entire distributed index? Our collection is 27 GB, 15
> million documents. Do you think there is a way to optimize Solr performance
> enough to not have to segment such a large collection?

What is the current performance bottleneck that is causing you to have to segment in the first place? 15M docs is often doable on a single box I think, but it depends heavily on what the queries are, what faceting is done, etc.

-Yonik
Re: Distributed Search
Yonik Seeley wrote:
> What is the current performance bottleneck that is causing you to have
> to segment in the first place?
> 15M docs is often doable on a single box I think, but it depends
> heavily on what the queries are, what faceting is done, etc.
>
> -Yonik

Well, we are running some really heavy faceting, and searching up to 15 fields at a time for each query. The bottleneck was that a single query either took 15 minutes or died with a heap space error...
Re: Payloads in Solr
I started this thread back in November. Recall that I'm indexing XML and storing the xpath as a payload in each token. I am not encoding or mapping the xpath, but storing the text directly as String.getBytes(). We're not using this to query in any way, just to add context to our search results. Presently, I'm ready to bounce around some more ideas about encoding xpaths, or strings in general.

Back in the day Grant said:
> From what I understand from Michael Busch, you can store the path at
> each token, but this doesn't seem efficient to me. I would think you
> may want to come up with some more efficient encoding. I am cc'ing
> Michael on this thread to see if he is able to add any light to the
> subject (he may not be able to b/c of employer reasons). If he
> can't, then we can brainstorm a bit more on how to do it most
> efficiently.

The word "encoding" in Grant's response brings to mind Huffman coding (http://en.wikipedia.org/wiki/Huffman_coding). This would not solve the query-on-payload problem that Yonik pointed out, because the encoding would be document-centric, but it could reduce the total number of bytes that I need to store. Any ideas?

Tricia
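Before reaching for Huffman coding, a plain dictionary encoding may already capture most of the savings: a document typically reuses a small set of distinct xpaths, so each one can be stored once in a per-document table while the token payload carries only a small integer id. A minimal sketch of that idea (class and method names are hypothetical, not a Lucene API):

```java
import java.util.*;

// Sketch of a per-document xpath dictionary: instead of putting the full
// xpath string (String.getBytes()) in every token's payload, store a
// 2-byte id that indexes into a table of distinct xpaths kept alongside
// the document (e.g. in a stored field).
public class XPathDictionary {
    private final Map<String, Integer> ids = new LinkedHashMap<String, Integer>();
    private final List<String> paths = new ArrayList<String>();

    // Payload bytes for a token at this xpath: a big-endian 2-byte id.
    public byte[] payloadFor(String xpath) {
        Integer id = ids.get(xpath);
        if (id == null) {
            id = Integer.valueOf(paths.size());
            ids.put(xpath, id);
            paths.add(xpath);
        }
        return new byte[] { (byte) (id.intValue() >> 8), (byte) (id.intValue() & 0xFF) };
    }

    // Decode a payload back into the original xpath string.
    public String decode(byte[] payload) {
        int id = ((payload[0] & 0xFF) << 8) | (payload[1] & 0xFF);
        return paths.get(id);
    }

    public static void main(String[] args) {
        XPathDictionary d = new XPathDictionary();
        byte[] p = d.payloadFor("/book/chapter/title");
        System.out.println(d.decode(p));  // /book/chapter/title
    }
}
```

Two bytes per token instead of the full UTF-8 xpath; Huffman coding could then be applied to the dictionary entries themselves if they are still too large. Like the Huffman idea, this stays document-centric, so it does not help with querying on payloads.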
Re: indexing slow, IO-bound?
On Mon, 7 Apr 2008 16:37:48 -0400 "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Mon, Apr 7, 2008 at 4:30 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:
> > 'top', 'vmstat' tell exactly what's going on in terms of io and cpu on
> > unix. Perhaps someone has gotten these to work under windows with cygwin.
>
> The windows task manager is a pretty good replacement for top... do
> "select columns" and you can get all sorts of stuff like number of
> threads, file handles, page faults, etc. You can also simply see if
> things are CPU bound or not (sort by the CPU column, or go to the
> "Performance" tab).

I suggest you use the Performance Monitor tool - in server versions of Win32 it should be under Administrative Tools. You can also generate logs for later review (otherwise it only shows you the last x minutes of activity). You can mix and match different performance providers... not sure if Java itself provides counters - you *may* be able to trace CPU / memory by application once the app is running, but I doubt you can do that for IO. If only you had dtrace in Windows ;)

B
_
{Beto|Norberto|Numard} Meijome

"Web2.0 is what you were doing while the rest of us were building businesses." The Reverend

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Return the result only field A or field B is non-zero?
Hi,

Thank you, Underwood. That still doesn't get me to a solution... would I then do the boolean operation on every query (query AND (isAZero OR isBZero)) once I have the boolean fields? Also, adding the booleans requires a large update to the document structure, which may not be preferred... can Solr generate this field for me?

Thank you,
Vinci

Walter Underwood wrote:
> This would be trivial if you also stored boolean fields for
> aiszero and biszero. That would also be fast, I expect.
>
> wunder
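As far as I know, Solr of this vintage will not derive such a field at index time, so the flag has to be computed on the client before the document is posted. A minimal sketch of the idea, with hypothetical field names (A, B, and the `_nonzero` suffix are illustrative):

```java
import java.util.*;

// Sketch: before posting a document to Solr, derive boolean flags from
// the numeric fields so queries can filter on them cheaply, e.g.
//   q=foo AND (A_nonzero:true OR B_nonzero:true)
public class DerivedFlags {
    // Copies the document and adds "<field>_nonzero" booleans for each
    // listed numeric field (false when the field is missing or zero).
    static Map<String, Object> withNonZeroFlags(Map<String, Object> doc,
                                                String... numericFields) {
        Map<String, Object> out = new LinkedHashMap<String, Object>(doc);
        for (String f : numericFields) {
            Number v = (Number) doc.get(f);
            out.put(f + "_nonzero", Boolean.valueOf(v != null && v.doubleValue() != 0.0));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<String, Object>();
        doc.put("id", "1");
        doc.put("A", Integer.valueOf(0));
        doc.put("B", Integer.valueOf(42));
        System.out.println(withNonZeroFlags(doc, "A", "B"));
    }
}
```

With the flags indexed, the per-query boolean clause stays cheap because it is a simple term match rather than a numeric range check on every candidate document.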
Re: How to improve Solr search performance
> can you be more specific about what you mean when you say "And I got the
> time from dispatchfilter..." What *exactly* are you looking at? (ie: is
> this a time you are seeing in a log file? if so, which log file? ... or is
> this timing code you added to the dispatch filter yourself?)

The timing code was added by myself; I was just testing Solr performance. I found that the average request time is much longer than QTime (T1), so I added some code timing the whole SolrDispatchFilter.doFilter() method (T2). The time (T2 - T1) is used by the responseWriter writing the documents. Digging into it, most of the time is spent here (loop body reconstructed; the original message was truncated):

public void writeDocs(boolean includeScore, Set<String> fields) throws IOException {
    SolrIndexSearcher searcher = request.getSearcher();
    DocIterator iterator = ids.iterator();
    int sz = ids.size();
    includeScore = includeScore && ids.hasScores();
    for (int i = 0; i < sz; i++) {
        int id = iterator.nextDoc();
        // stored-field retrieval happens here, doc by doc
        Document doc = searcher.doc(id, fields);
        writeDoc(null, doc, fields, (includeScore ? iterator.score() : 0.0f), includeScore);
    }
}

> How big is the text field? Are you talking about a few hundred chars or
> several KB of text per doc?

Several KB.

> Is enableLazyFieldLoading set to true in your solrconfig.xml?

It is set to true.
Human Powered Search Module
Hello everybody,

I am a newbie to Lucene, and I am from India, currently working on a search module for our classified website, clickindia.com. I have implemented the basic functionality of Solr/Lucene and am pretty happy with the results.

Search in India has its own share of nuances. 'Maruti' is spelt 'Maruthi' in most of South India. People often spell 'Naukri' as 'Naukari'; a loan request would simply be phrased in the query as 'need money'. These and many more such intricacies are typical of Indian users and require a special kind of module. Is there any ready-made solution for this? Can I get access to word lists like the ones mentioned above, as used in India, so that I could implement it?

regards,
Sushan Rungta
Mob: +91-9312098968
www.clickindia.com
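One off-the-shelf route in Solr for the spelling-variant part is a synonyms.txt consumed by SynonymFilterFactory in the field's analyzer chain, mapping each regional variant to a canonical form. The entries below are only the examples from the mail; a real list would be much larger. A toy sketch of the normalization idea:

```java
import java.util.*;

// Toy sketch: map regional spelling variants to a canonical form before
// querying. In Solr proper this job belongs to a synonyms.txt applied by
// SynonymFilterFactory at analysis time, so index and query agree.
public class VariantNormalizer {
    private static final Map<String, String> CANONICAL = new HashMap<String, String>();
    static {
        // Example variants from the mail; a production list would be larger.
        CANONICAL.put("maruthi", "maruti");
        CANONICAL.put("naukari", "naukri");
    }

    // Lowercases, then replaces each known variant token with its canonical form.
    static String normalize(String query) {
        StringBuilder sb = new StringBuilder();
        for (String token : query.toLowerCase().split("\\s+")) {
            String mapped = CANONICAL.get(token);
            if (sb.length() > 0) sb.append(' ');
            sb.append(mapped != null ? mapped : token);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Maruthi car Naukari"));  // maruti car naukri
    }
}
```

Phrase-level intents like "need money" -> loans are a different problem (query rewriting rather than token synonyms) and would need either multi-token synonym entries or a rewrite step in front of the query parser.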