Re: listing/enumerating field information
interesting! Code-searching for relevant lucene classes led me to try adding to my solrconfig.xml

This allowed me to try this request...

http://localhost:8983/solr/select?rows=0&qt=test&q=fields

which I think gets me (2) below.

--tracey

Tracey Jaquith wrote:
> The Internet Archive is getting close to going live with Solr.
> I have two remaining classes of problems.
>
> 1) across the entire index, enumerate all the unique values for a given field.
>
> 2) we use unrestricted dynamicField additions from documents.
> (that is, our users are free to add any named field they like to their
> document's data (which is metadata for their item)).
> we want to list all the unique field names in the index.
>
> Eg: ... audio ... movies prelinger
>
> 1) would yield a list of audio and movies if the field passed in was mediatype
> 2) would yield a list of mediatype and collection
>
> From our prior implementation of a java + lucene search engine, we already
> ran into queries that our SE could not handle. So we nightly build a cache
> structure to handle those other queries. We *could* solve 1) and 2) in this
> nightly cache, but ideally we'd like to use Solr if possible.
>
> thanks!
> --tracey

--
--Tracey Jaquith - http://www.archive.org/~tracey --
Re: Performance tuning
On Thu, 2007-01-11 at 14:57 +, Stephanie Belton wrote:
> Hello,
>
> Solr is now up and running on our production environment and working great.
> However it is taking up a lot of extra CPU and memory (CPU usage has doubled
> and memory is swapping). Is there any documentation on performance tuning?
> There seems to be a lot of useful info in the server output but I don’t
> understand it.
>
> E.g.
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=537,evictions=0,size=337,cumulative_lookups=4723,cumulative_hits=3708,cumulative_hitratio=0.78,cumulative_inserts=4647,cumulative_evictions=72}
>
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=256,evictions=0,size=256,cumulative_lookups=3779,cumulative_hits=552,cumulative_hitratio=0.14,cumulative_inserts=3632,cumulative_evictions=0}
>
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=66005,cumulative_hits=2460,cumulative_hitratio=0.03,cumulative_inserts=63545,cumulative_evictions=4195}
>
> etc. what should I be watching out for?

Hi Stephanie,

did you see http://wiki.apache.org/solr/SolrPerformanceFactors?

Further, you may consider balancing the load via
http://wiki.apache.org/solr/CollectionDistribution

HTH

salu2

> Thanks
>
> Stephanie
Re: Performance tuning
On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:
> Solr is now up and running on our production environment and working great.
> However it is taking up a lot of extra CPU and memory (CPU usage has doubled
> and memory is swapping). Is there any documentation on performance tuning?
> There seems to be a lot of useful info in the server output but I don't
> understand it.

Swapping if it's constant isn't good... How much memory does this box have, and what is the heap size of the JVM? Are there other things running on this box?

Solr does warming of caches by default to make complex queries that hit a new snapshot of the index fast. This takes up CPU in bursts, but is normally nothing to worry about unless you have other apps running on the same box that need CPU. Because of this warming, CPU usage of a Solr collection isn't directly related to query traffic at all times.

-Yonik
How can I update a specific field of an existing document?
Hello everybody,

I want to update a specific field in a document, but I can't find how to do it in the Solr documentation. Is that possible? I need to index only one field of a document. Do I have to re-index the whole document for this?

The problem is that I have to transform a bizdata object into XML file content in Java, so I would have to build the whole document XML step by step, field by field, retrieving all the bizdata from the database to pass it to Solr.

Thanks in advance.

--
Iris Soto
Re: How can I update a specific field of an existing document?
On Thu, 2007-01-11 at 10:19 -0600, Iris Soto wrote:
> Hello everybody,
> I want to update a specific field in a document, but i don't find how do it
> in the documentation of Solr.
> Is that possible?, I need to index only a field for a document, Do i have
> to index all the document for this?
> The problem is that i have to transform a bizdata object to a file
> content xml in java, i should to build all the document xml step by
> step, field by field, retrieving all the bizdata of database to be
> passed to Solr.
>

On Thu, 2007-01-11 at 06:43 -0500, Erik Hatcher wrote:
> In Lucene to update a document the operation is really a delete
> followed by an add. You will need to add the complete document as
> there is no such "update only a field" semantics in Lucene.

This is from a thread in the dev list.

So no, it is not possible to just update one field.

HTH

salu2

> Thanks in advance.
>
Re: listing/enumerating field information
On 1/11/07, Tracey Jaquith <[EMAIL PROTECTED]> wrote:
> The Internet Archive is getting close to going live with Solr.
> I have two remaining classes of problems.
>
> 1) across the entire index, enumerate all the unique values for a given field.
>
> 2) we use unrestricted dynamicField additions from documents.
> (that is our users are free to add any named field they like to their
> document's data (which is metadata for their item)).
> we want to list all the unique field names in the index.

Reasonable requests, they both seem like they would be useful additions to Solr.

I've considered doing (1) in the past, adding the doc frequency of each term.

Relying on the schema for (2) is slightly ambiguous. Do you want
a) all the fields defined by the schema, or
b) all the fields actually in the index (which may exclude some fields in the schema if not used, but also include any dynamic fields in use).

For 2.b, we could use IndexReader.getFieldNames()

-Yonik
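
To make the two requests concrete, here is a minimal sketch in plain Python over in-memory dictionaries. It is not Solr or Lucene code: the function names `unique_values` and `field_names_in_use` are hypothetical, and the sample documents only mirror the mediatype/collection example from the original question. It shows (1) the unique values of one field together with a document frequency per value, and (2.b) the field names actually present in the documents.

```python
from collections import Counter


def unique_values(docs, field):
    """Return {value: document frequency} for one field across all docs."""
    counts = Counter()
    for doc in docs:
        if field in doc:
            counts[doc[field]] += 1
    return dict(counts)


def field_names_in_use(docs):
    """Return the sorted field names that occur in at least one doc."""
    return sorted({name for doc in docs for name in doc})


# Hypothetical sample data modelled on the example in the question.
sample_docs = [
    {"mediatype": "audio"},
    {"mediatype": "movies", "collection": "prelinger"},
]

print(unique_values(sample_docs, "mediatype"))
print(field_names_in_use(sample_docs))
```

A real implementation would walk the index's term dictionary instead of scanning documents, which is why the thread goes on to discuss limits and incremental output.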
Re: Does Solr support integration with the Compass framework?
doesn't compass use multiple indexes? have a read of the "direct lucene" box on
http://www.opensymphony.com/compass/versions/1.1M3/html/introduction.html#i-use-lucene

would that prevent the two being used together?

i'd be interested in getting the two working together as well, it'd be great to have the compass api to create the indexes and use solr to expose them over http.

Yonik Seeley wrote:
> One could do a very loose coupling by just pointing Solr at the index
> created by Compass, and send a commit command to solr whenever you want
> a new view of the index.
>
> -Yonik
>
> On 1/10/07, Jochen Franke <[EMAIL PROTECTED]> wrote:
> > Currently I'm investigating different Lucene based search technologies.
> > For the indexing of our object model my favorite is Compass because of
> > the Object/Search Engine Mapping capabilities. At the same time Solr
> > offers several nice features like faceted search and caching.
> >
> > Has anybody integrated or tried to integrate Solr with Compass already
> > and can share experiences.
> >
> > Thanks,
> > Jochen
Re: listing/enumerating field information
: Code-searching for relevant lucene classes led me to try adding
:
: to my solrconfig.xml

holy cow, i forgot that thing even existed! ... as you can see by skimming the code it's a hodge podge of misc crap that was used early on as a simple way to test that things were working.

Writing a more generic "Stats" request handler that does what you're describing certainly seems like a good idea. Attempting to enumerate all of the values for a field could be dangerous, but an API where the client specifies a starting term and a number of terms and we use the TermEnum.seek() would be fairly straightforward.

-Hoss
Re: listing/enumerating field information
On 1/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> Writing a more generic "Stats" request handler that does what you're
> describing certainly seems like a good idea.

Hmmm, I hadn't thought of it as a separate handler, but as long as these types of requests aren't related to a base query, and not needed along with every query, I guess that could make sense.

> Attempting to enumerate
> all of the values for a field could be dangerous

We do it for faceting :-) But we don't drag it all into memory at once...

> but an API where the client
> specifies a starting term and a number of terms and we use the
> TermEnum.seek() would be fairly straightforward.

Adding a start and end (like a range query) is a great idea!

Additionally, I think adding support to incrementally write all the terms to the response might be important... loading them all into memory doesn't seem like a great idea.

Perhaps adding Iterator or Iterable to the list of supported types in TextWriter would be a nice general way to go.

-Yonik
RE: Performance tuning
This is the output of the free command:

[EMAIL PROTECTED] root2]# free -m
             total       used       free     shared    buffers     cached
Mem:          2007       1888        119          0         86        814
-/+ buffers/cache:        986       1020
Swap:         1992        207       1784

We normally have no swapping at all on this server and since last night (when Solr was deployed on the site) it's been going up.

Here is an extract of the top command output sorted by memory usage, does each of the processes really take up 566M??? CPU usage is low because we are outside of peak time but during the day it's at 40% when it used to be just 20%:

 20:14:16  up 45 days, 21:47,  1 user,  load average: 1.06, 1.14, 1.11
167 processes: 166 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    8.8%    0.0%    0.3%   0.1%     0.2%    6.9%   83.2%
           cpu00    7.9%    0.0%    0.3%   0.7%     0.9%    6.9%   82.8%
           cpu01    8.5%    0.0%    0.3%   0.0%     0.0%    6.9%   84.0%
           cpu02    9.9%    0.0%    0.1%   0.0%     0.0%    6.9%   82.8%
           cpu03    9.0%    0.0%    0.6%   0.0%     0.2%    7.0%   83.2%
Mem:  2055300k av, 1914588k used,  140712k free,       0k shrd,   89032k buff
      1326540k actv,  301236k in_d,   30788k in_c
Swap: 2040244k av,  212948k used, 1827296k free                  843380k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
12201 root      15   0  566M 561M 13276 S     0.0 27.9   0:02   0 java
12203 root      15   0  566M 561M 13276 S     0.0 27.9   4:48   2 java
12204 root      16   0  566M 561M 13276 S     0.0 27.9   4:45   1 java
12205 root      15   0  566M 561M 13276 S     0.0 27.9   4:45   0 java
12206 root      15   0  566M 561M 13276 S     0.0 27.9   4:46   2 java
12207 root      15   0  566M 561M 13276 S     0.0 27.9   8:35   2 java
12208 root      16   0  566M 561M 13276 S     0.0 27.9  15:53   1 java
12209 root      16   0  566M 561M 13276 S     0.0 27.9  27:30   1 java
12210 root      21   0  566M 561M 13276 S     0.0 27.9   0:00   1 java
12211 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   0 java
12212 root      15   0  566M 561M 13276 S     0.0 27.9   0:17   1 java
12213 root      15   0  566M 561M 13276 S     0.0 27.9   0:15   2 java
12214 root      21   0  566M 561M 13276 S     0.0 27.9   0:00   3 java
12215 root      15   0  566M 561M 13276 S     0.0 27.9   0:33   2 java
12217 root      21   0  566M 561M 13276 S     0.0 27.9   0:00   3 java
12218 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   2 java
12219 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   1 java
12220 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   2 java
12221 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   0 java
    1 root      25   0  566M 561M 13276 S     0.0 27.9 297:21   2 java
12223 root      15   0  566M 561M 13276 S     0.0 27.9   0:13   3 java
12224 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   0 java
12225 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   3 java
12226 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   2 java
12227 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   1 java
12228 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   0 java
12229 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   1 java
12230 root      15   0  566M 561M 13276 S     0.0 27.9   0:00   1 java
Etc...

On the server we also have a website running using mod_perl, it's been running for 1 year and up until now the CPU usage was peaking at 20% and memory around 28% no swapping.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: 11 January 2007 15:12
To: solr-user@lucene.apache.org
Subject: Re: Performance tuning

On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:
> Solr is now up and running on our production environment and working great.
> However it is taking up a lot of extra CPU and memory (CPU usage has doubled
> and memory is swapping). Is there any documentation on performance tuning?
> There seems to be a lot of useful info in the server output but I don't
> understand it.

Swapping if it's constant isn't good... How much memory does this box have, and what is the heap size of the JVM? Are there other things running on this box?

Solr does warming of caches by default to make complex queries that hit a new snapshot of the index fast. This takes up CPU in bursts, but is normally nothing to worry about unless you have other apps running on the same box that need CPU. Because of this warming, CPU usage of a Solr collection isn't directly related to query traffic at all times.

-Yonik
Re: listing/enumerating field information
: > Attempting to enumerate
: > all of the values for a field could be dangerous
:
: We do it for faceting :-) But we don't drag it all into memory at once...

i meant trying to return them all to the user at one time ... even if we decreased the server side memory usage risk by supporting Iterators in the OutputWriters, we could still wind up slamming the client with a large reply (theoretically: an infinite list)

basically i'm just arguing that we design the API to have a built in "limit" concept, and default it to something manageable (the same way we do for term based facet counts)

: Adding a start and end (like a range query) is a great idea!

oh yeah ... i hadn't considered an "end" ... just a limit, but it would be trivial to support both.

: Perhaps adding Iterator or Iterable to the list of supported types in
: TextWriter would be a nice general way to go.

yeah ... Iterable would probably make more sense since it's the more generic API and would allow people to pass truly "lazy" objects to the SolrQueryResponse (where the iterator() method does the initialization work)

...that seems like a separate (but related) issue to having an easy way to access Term/Field stats.

-Hoss
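
The API shape discussed above (a starting term, an optional end, a built in limit, and lazy iteration) can be sketched as follows. This is only an illustration in Python over a sorted in-memory list, not the Lucene TermEnum API; the names `iter_terms` and `list_terms`, the exclusive `end` bound, and the default limit of 10 are all assumptions made for the example.

```python
from bisect import bisect_left
from itertools import islice


def iter_terms(sorted_terms, start="", end=None):
    """Lazily yield terms >= start and, if end is given, < end."""
    i = bisect_left(sorted_terms, start)  # seek to the first term >= start
    while i < len(sorted_terms):
        term = sorted_terms[i]
        if end is not None and term >= end:
            return
        yield term
        i += 1


def list_terms(sorted_terms, start="", end=None, limit=10):
    """Return at most `limit` terms so a reply can never grow unbounded."""
    return list(islice(iter_terms(sorted_terms, start, end), limit))


# Hypothetical term dictionary for one field, already in sorted order.
terms = ["audio", "etree", "movies", "software", "texts"]
print(list_terms(terms, start="b", limit=2))
print(list_terms(terms, start="m", end="t"))
```

Because `iter_terms` is a generator, nothing is materialized until the caller consumes it, which is the "lazy Iterable" idea; `list_terms` is where the limit is enforced. A client can page through all terms by passing the last term it received as the next `start`.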
RE: Performance tuning
Thanks for sending this link, I seem to have missed that on the wiki!

-Original Message-
From: Thorsten Scherler [mailto:[EMAIL PROTECTED]
Sent: 11 January 2007 15:06
To: solr-user@lucene.apache.org
Subject: Re: Performance tuning

On Thu, 2007-01-11 at 14:57 +, Stephanie Belton wrote:
> Hello,
>
> Solr is now up and running on our production environment and working great.
> However it is taking up a lot of extra CPU and memory (CPU usage has doubled
> and memory is swapping). Is there any documentation on performance tuning?
> There seems to be a lot of useful info in the server output but I don’t
> understand it.
>
> E.g.
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=537,evictions=0,size=337,cumulative_lookups=4723,cumulative_hits=3708,cumulative_hitratio=0.78,cumulative_inserts=4647,cumulative_evictions=72}
>
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=256,evictions=0,size=256,cumulative_lookups=3779,cumulative_hits=552,cumulative_hitratio=0.14,cumulative_inserts=3632,cumulative_evictions=0}
>
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=66005,cumulative_hits=2460,cumulative_hitratio=0.03,cumulative_inserts=63545,cumulative_evictions=4195}
>
> etc. what should I be watching out for?

Hi Stephanie,

did you see http://wiki.apache.org/solr/SolrPerformanceFactors?

Further you may consider to balance the load via
http://wiki.apache.org/solr/CollectionDistribution

HTH

salu2

> Thanks
>
> Stephanie
Re: Performance tuning
On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:
> This is the output of the free command:
>
> [EMAIL PROTECTED] root2]# free -m
>              total       used       free     shared    buffers     cached
> Mem:          2007       1888        119          0         86        814
> -/+ buffers/cache:        986       1020
> Swap:         1992        207       1784
>
> We normally have no swapping at all on this server and since last night
> (when Solr was deployed on the site) it's been going up.

That may be fine... swap in use != swapping. The OS may be swapping out some processes that haven't been used in a long time to free up more memory for disk cache (notice 814M cached). This is a good thing.

> Here is an extract of the top command output sorted by memory usage, does
> each of the processes really take up 566M???

No, older versions of linux show each thread as a separate process.

> CPU usage is low because we are
> outside of peak time but during the day it's at 40% when it used to be just
> 20%:

Full-text search is CPU intensive. An average peak of 40% seems acceptable. If the load gets too high, you can scale out by adding multiple servers behind a load balancer.

-Yonik
RE: Performance tuning
Thanks for that. I am sorry this isn't really Solr-related but how can I monitor the swapping if I can't rely on the output of the free command?

Do you think I could still achieve any significant improvements by going through the performance tuning advice on the wiki?

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: 11 January 2007 20:32
To: solr-user@lucene.apache.org
Subject: Re: Performance tuning

On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:
> This is the output of the free command:
>
> [EMAIL PROTECTED] root2]# free -m
>              total       used       free     shared    buffers     cached
> Mem:          2007       1888        119          0         86        814
> -/+ buffers/cache:        986       1020
> Swap:         1992        207       1784
>
> We normally have no swapping at all on this server and since last night
> (when Solr was deployed on the site) it's been going up.

That may be fine... swap in use != swapping. The OS may be swapping out some processes that haven't been used in a long time to free up more memory for disk cache (notice 814M cached). This is a good thing.

> Here is an extract of the top command output sorted by memory usage, does
> each of the processes really take up 566M???

No, older versions of linux show each thread as a separate process.

> CPU usage is low because we are
> outside of peak time but during the day it's at 40% when it used to be just
> 20%:

Full-text search is CPU intensive. An average peak of 40% seems acceptable. If the load gets too high, you can scale out by adding multiple servers behind a load balancer.

-Yonik
WordDelimiterFilter usage
I'm trying to determine how to index/query for a certain use case, and the WordDelimiterFilterFactory appears to be what I need to use. Here's the scenario:

- Text field being indexed
- Field exists as a full name
- Data might be "cold play"
- This should match against searches for "cold play" and "coldplay" (just "cold" and just "play" are OK as well)

I'm not able to match "cold play" against searches for "coldplay" at present. I'm certain this is a common scenario and I'm missing something obvious. Any suggestions of how/where to look/fix this issue?

thanks,
j
Re: WordDelimiterFilter usage
WordDelimiterFilter won't really help you in this situation ... but it would help if you find a lot of users are searching for ColdPlay or cold-play.

if you have a finite list of popular terms like this that you need to deal with, the SynonymFilter can help you out.

: Date: Thu, 11 Jan 2007 13:30:39 -0800
: From: Jeff Rodenburg <[EMAIL PROTECTED]>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: WordDelimiterFilter usage
:
: I'm trying to determine how to index/query for a certain use case, and the
: WordDelimiterFilterFactory appears to be what I need to use. Here's the
: scenario:
:
: - Text field being indexed
: - Field exists as a full name
: - Data might be "cold play"
: - This should match against searches for "cold play" and "coldplay" (just
: "cold" and just "play" are OK as well)
:
: I'm not able to match "cold play" against searches for "coldplay" at
: present. I'm certain this is a common scenario and I'm missing something
: obvious. Any suggestions of how/where to look/fix this issue?
:
: thanks,
: j
:

-Hoss
Re: Performance tuning
On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:
> Thanks for that. I am sorry this isn't really Solr-related but how can I
> monitor the swapping if I can't rely on the output of the free command?
>
> Do you think I could still achieve any significant improvements by going
> through the performance tuning advice on the wiki?

Unfortunately, I think that's pretty old stuff.

People are normally concerned with:
- the number of requests per second they can handle with their server
- the average latency of requests (or median, 99 percentile, etc)

A goal of reducing CPU usage w/o looking at the other factors is unusual, but if your query rate is very low, or your cache hit rate is low, you could reduce or eliminate caching or autowarming.

-Yonik
RE: Performance tuning
The reason I am keeping a close eye on resource usage is that our traffic is increasing by around 20% every month (currently over 400,000 page impressions/day although not all of them are search queries!) and I want to make sure we tackle any performance issues before it gets too late. I would rather keep load balancing as a last resort due to cost implications.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: 11 January 2007 22:02
To: solr-user@lucene.apache.org
Subject: Re: Performance tuning

On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:
> Thanks for that. I am sorry this isn't really Solr-related but how can I
> monitor the swapping if I can't rely on the output of the free command?
>
> Do you think I could still achieve any significant improvements by going
> through the performance tuning advice on the wiki?

Unfortunately, I think that's pretty old stuff.

People are normally concerned with:
- the number of requests per second they can handle with their server
- the average latency of requests (or median, 99 percentile, etc)

A goal of reducing CPU usage w/o looking at the other factors is unusual, but if your query rate is very low, or your cache hit rate is low, you could reduce or eliminate caching or autowarming.

-Yonik
Re: Performance tuning
On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:
> The reason I am keeping a close eye on resource usage is that our traffic is
> increasing by around 20% every month (currently over 400,000 page
> impressions/day although not all of them are search queries!) and I want to
> make sure we tackle any performance issues before it gets too late. I would
> rather keep load balancing as a last resort due to cost implications.

Going slightly OT, but if this is business critical, load-balancing also provides high availability, which can pay for itself in the event that a server crashes.

-Yonik
Re: Performance tuning
On 1/11/07 2:33 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:
>> The reason I am keeping a close eye on resource usage is that our traffic is
>> increasing by around 20% every month (currently over 400,000 page
>> impressions/day although not all of them are search queries!) and I want to
>> make sure we tackle any performance issues before it gets too late. I would
>> rather keep load balancing as a last resort due to cost implications.
>
> Going slightly OT, but if this is business critical, load-balancing
> also provides high availability, which can pay for itself in the event
> that a server crashes.

Right. For us, load balancing is not a last resort but a fact of life. The smallest number of parallel servers is three, so that we have two running when one is down for scheduled maintenance or software update.

Back to performance, check your cache hit ratios in the admin UI, then adjust the cache sizes. When your caches are the right size, Solr will be mostly CPU-bound, but quite fast. If Solr is not CPU-bound under a maximum load (in testing) it means it is using the disk too much.

wunder
--
Walter Underwood
Search Guru, Netflix
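
For anyone who wants to check hit ratios from the logged statistics rather than the admin UI, here is a small sketch in Python. It only assumes the `name{key=value,...}` layout of the stats lines quoted earlier in this thread; the helper names `parse_cache_stats` and `cumulative_hit_ratio` are made up for the example and are not part of Solr.

```python
import re


def parse_cache_stats(line):
    """Parse 'name{key=value,...}' into (name, {key: float})."""
    match = re.fullmatch(r"\s*(\w+)\{(.*)\}\s*", line)
    if match is None:
        raise ValueError(f"unrecognized stats line: {line!r}")
    name, body = match.groups()
    stats = {}
    for pair in body.split(","):
        key, value = pair.split("=", 1)
        stats[key.strip()] = float(value)
    return name, stats


def cumulative_hit_ratio(stats):
    """Cumulative hits divided by cumulative lookups (0.0 if no lookups)."""
    lookups = stats["cumulative_lookups"]
    return stats["cumulative_hits"] / lookups if lookups else 0.0


# Stats lines copied from the first message in this thread.
lines = [
    "filterCache{lookups=0,hits=0,hitratio=0.00,inserts=537,evictions=0,size=337,cumulative_lookups=4723,cumulative_hits=3708,cumulative_hitratio=0.78,cumulative_inserts=4647,cumulative_evictions=72}",
    "documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=66005,cumulative_hits=2460,cumulative_hitratio=0.03,cumulative_inserts=63545,cumulative_evictions=4195}",
]
for line in lines:
    name, stats = parse_cache_stats(line)
    print(name, cumulative_hit_ratio(stats), stats["cumulative_evictions"])
```

A low cumulative ratio combined with a non-zero eviction count, as in the documentCache line, is one possible sign that a cache is too small for the workload; that reading is a rule of thumb, not something the thread states.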
Re: How can I update a specific field of an existing document?
On Thu, 2007-01-11 at 17:48 +0100, Thorsten Scherler wrote:
> On Thu, 2007-01-11 at 10:19 -0600, Iris Soto wrote:
> > Hello everybody,
> > I want update a specific field in a document, but i don't find how do it
> > in the documentation of Solr.
> > Is that posible?, I need to index only a field for a document, Do i have
> > to index all the document for this?

No, just the one document. Let's say you have a CMS and you edit one document. You will need to re-index only this document, by using the Solr add statement for the whole document (not one field only).

> > The problem is that i have to transform a bizdata object to a file
> > content xml in java, i should to build all the document xml step by
> > step, field by field, retrieving all the bizdata of database to be
> > passed to Solr.

See above: only for the document where the fields are changed. I wrote a small cocoon based plugin in forrest doing the cms related example. It adds a document related solr gui for a cms like system. Maybe that gives you some ideas for your own app.

> >
>
> On Thu, 2007-01-11 at 06:43 -0500, Erik Hatcher wrote:
> > In Lucene to update a document the operation is really a delete
> > followed by an add. You will need to add the complete document as
> > there is no such "update only a field" semantics in Lucene.
>
> This is from a thread in the dev list.

could not access the archive the first time:
http://www.nabble.com/forum/ViewPost.jtp?post=8275908&framed=y

HTH

salu2

>
> So no it is not possible to just update one field.
>
> HTH
>
> salu2
>
> > Thanks in advance.
> >

--
thorsten

"Together we stand, divided we fall!"
Hey you (Pink Floyd)
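
As a rough illustration of "re-add the whole document", the sketch below builds a complete Solr XML add message from a dictionary that holds every field of one document, including the changed one. It is written in Python with the standard library only; the helper name `build_add_xml` and the field names are hypothetical, and actually posting the message to the update handler (followed by a commit) is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message holding every field of doc."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list or tuple becomes one <field> element per value.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document: all fields are sent again, not just the changed title.
current = {"id": "doc1", "title": "Old title", "tag": ["a", "b"]}
updated = dict(current, title="New title")
print(build_add_xml(updated))
```

The point of the design is that the caller always starts from the full current state of the document (for example, loaded from the database) and overrides the changed field, since the index replaces the old document rather than patching it.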
Re: Performance tuning
On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:
> Thanks for that. I am sorry this isn't really Solr-related but how can I
> monitor the swapping if I can't rely on the output of the free command?

$ vmstat -S M 3
procs ---memory-- ---swap-- -io --system-- cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
1 2 0 2236 34763003723 77 155 1 0 98 1
0 1 0 2235 34763007113 551 2607 16 4 71 9
1 0 0 2235 347630072 892 742 2194 13 3 67 17

The si/so columns display the real-time swap in/out rates. vmstat is also rather useful for its other columns.

-Mike
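
If you want to watch the si/so columns from a script, something along these lines could work. It is a sketch in Python that parses text already captured from vmstat; the function names `swap_activity` and `is_swapping` are invented for the example, and the sample output below is hypothetical, not the output posted above. Note that the first sample row vmstat prints reports averages since boot, so later rows are the ones that reflect current activity.

```python
def swap_activity(vmstat_output):
    """Return one (si, so) pair per sample row of captured vmstat output."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    # The column header is the row that names both swap columns.
    header_idx = next(i for i, cols in enumerate(rows) if "si" in cols and "so" in cols)
    header = rows[header_idx]
    si, so = header.index("si"), header.index("so")
    samples = []
    for cols in rows[header_idx + 1:]:
        if len(cols) == len(header):  # skip blank or repeated banner lines
            samples.append((int(cols[si]), int(cols[so])))
    return samples


def is_swapping(samples, threshold=0):
    """True if any sample shows swap-in or swap-out above the threshold."""
    return any(si > threshold or so > threshold for si, so in samples)


# Hypothetical captured output, for illustration only.
SAMPLE = """
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0    207    119     86    814    0    0    37    23   77  155  1  0 98  1
 0  1    207    118     86    814    4   12    71    13  551 2607 16  4 71  9
"""
print(swap_activity(SAMPLE), is_swapping(swap_activity(SAMPLE)))
```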
Re: How can I update a specific field of an existing document?
Thorsten Scherler wrote:
> On Thu, 2007-01-11 at 17:48 +0100, Thorsten Scherler wrote:
> > On Thu, 2007-01-11 at 10:19 -0600, Iris Soto wrote:
> > > Hello everybody,
> > > I want update a specific field in a document, but i don't find how do it
> > > in the documentation of Solr.
> > > Is that posible?, I need to index only a field for a document, Do i have
> > > to index all the document for this?
>
> No, just the one document. Let's say you have a CMS and you edit one
> document. You will need to re-index this document only by using the
> add solr statement for the whole document (not one field only).
>
> > > The problem is that i have to transform a bizdata object to a file
> > > content xml in java, i should to build all the document xml step by
> > > step, field by field, retrieving all the bizdata of database to be
> > > passed to Solr.
>
> see above only for the document where the field are changed. I wrote a
> small cocoon based plugin in forrest doing the cms related example. It
> adds an document related solr gui for a cms like system. Maybe that
> gives you some ideas for your own app.
>
> > On Thu, 2007-01-11 at 06:43 -0500, Erik Hatcher wrote:
> > > In Lucene to update a document the operation is really a delete
> > > followed by an add. You will need to add the complete document as
> > > there is no such "update only a field" semantics in Lucene.
> >
> > This is from a thread in the dev list.
>
> could not access the archive the first time:
> http://www.nabble.com/forum/ViewPost.jtp?post=8275908&framed=y
>
> HTH
>
> salu2
>
> > So no it is not possible to just update one field.
> >
> > HTH
> >
> > salu2
> >
> > > Thanks in advance.

I'm retrieving the whole document to be passed to Solr and indexed. Thank you for clarifying my doubt.

Regards!

--
Iris Soto
Re: WordDelimiterFilter usage
Thanks Hoss - it is a finite list, but in the tens of thousands. I'm going the easy route -- adding another field that indexes the terms with no included whitespace. This is used in an ajax-style lookup, so it works for this scenario. Not something I'd normally do in a typical index, for sure.

thanks,
jeff

On 1/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> WordDelimiterFilter won't really help you in this situation ... but it
> would help if you find a lot of users are searching for ColdPlay or
> cold-play.
>
> if you have a finite list of popular terms like this that you need to
> deal with, the SynonymFilter can help you out.
>
> : Date: Thu, 11 Jan 2007 13:30:39 -0800
> : From: Jeff Rodenburg <[EMAIL PROTECTED]>
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: WordDelimiterFilter usage
> :
> : I'm trying to determine how to index/query for a certain use case, and the
> : WordDelimiterFilterFactory appears to be what I need to use. Here's the
> : scenario:
> :
> : - Text field being indexed
> : - Field exists as a full name
> : - Data might be "cold play"
> : - This should match against searches for "cold play" and "coldplay" (just
> : "cold" and just "play" are OK as well)
> :
> : I'm not able to match "cold play" against searches for "coldplay" at
> : present. I'm certain this is a common scenario and I'm missing something
> : obvious. Any suggestions of how/where to look/fix this issue?
> :
> : thanks,
> : j
> :
>
> -Hoss
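
The "extra field with no whitespace" approach can be pictured with a toy in-memory index. This is a sketch in Python, not a Solr schema or analyzer: the functions `squash`, `build_index`, and `lookup` are hypothetical names, and a real setup would do the equivalent with a second field and its own analysis chain.

```python
def squash(text):
    """Lowercase and remove all whitespace, e.g. 'Cold Play' -> 'coldplay'."""
    return "".join(text.lower().split())


def build_index(names):
    """Index each name under its squashed form and under each single word."""
    index = {}
    for name in names:
        for key in {squash(name), *name.lower().split()}:
            index.setdefault(key, set()).add(name)
    return index


def lookup(index, query):
    """Match the squashed query plus each individual query word."""
    hits = set(index.get(squash(query), set()))
    for word in query.lower().split():
        hits |= index.get(word, set())
    return sorted(hits)


# Hypothetical data for the scenario described in the question.
index = build_index(["cold play", "radiohead"])
for query in ("coldplay", "cold play", "cold", "play"):
    print(query, lookup(index, query))
```

The squashed key handles "coldplay" and the per-word keys handle "cold" and "play", which mirrors the requirements listed in the original question.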