Re: listing/enumerating field information

2007-01-11 Thread Tracey Jaquith




interesting!  

Code-searching for relevant lucene classes led me to try adding
   
to my solrconfig.xml

This allowed me to try this request...
   http://localhost:8983/solr/select?rows=0&qt=test&q=fields
which I think gets me (2) below.

--tracey


Tracey Jaquith wrote:

  
  
  
The Internet Archive is getting close to going live with Solr.
I have two remaining classes of problems.
  
1) Across the entire index, enumerate all the unique values for a given
field.
2) We use unrestricted dynamicField additions from documents (that is,
our users are free to add any named field they like to their document's
data, which is metadata for their item). We want to list all the
unique field names in the index.
  
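E.g.:

  <doc>
    ...
    <field name="mediatype">audio</field>
  </doc>

  <doc>
    ...
    <field name="mediatype">movies</field>
    <field name="collection">prelinger</field>
  </doc>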
  
1) would yield audio and movies if the field passed in were
mediatype.
2) would yield mediatype and collection.
  
  
From our prior implementation of a Java + Lucene search engine, we
already ran into queries that our SE could not handle, so we build a
cache structure nightly to handle those other queries.  We *could*
solve 1) and 2) in this nightly cache, but ideally we'd like to use
Solr if possible.
  
thanks!
--tracey
  
  
  -- 
    
  --Tracey Jaquith -
  http://www.archive.org/~tracey
--
  







Re: Performance tuning

2007-01-11 Thread Thorsten Scherler
On Thu, 2007-01-11 at 14:57 +, Stephanie Belton wrote:
> Hello,
> 
>  
> 
> Solr is now up and running on our production environment and working great. 
> However it is taking up a lot of extra CPU and memory (CPU usage has doubled 
> and memory is swapping). Is there any documentation on performance tuning? 
> There seems to be a lot of useful info in the server output but I don’t 
> understand it.
> 
>  
> 
> E.g.
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=537,evictions=0,size=337,cumulative_lookups=4723,cumulative_hits=3708,cumulative_hitratio=0.78,cumulative_inserts=4647,cumulative_evictions=72}
> 
> 
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=256,evictions=0,size=256,cumulative_lookups=3779,cumulative_hits=552,cumulative_hitratio=0.14,cumulative_inserts=3632,cumulative_evictions=0}
> 
> 
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=66005,cumulative_hits=2460,cumulative_hitratio=0.03,cumulative_inserts=63545,cumulative_evictions=4195}
> 
>  
> 
> etc. what should I be watching out for?
> 

Hi Stephanie,

did you see http://wiki.apache.org/solr/SolrPerformanceFactors?

You may also want to consider balancing the load via
http://wiki.apache.org/solr/CollectionDistribution

HTH

salu2

>  
> 
> Thanks
> 
> Stephanie
> 



Re: Performance tuning

2007-01-11 Thread Yonik Seeley

On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:


Solr is now up and running on our production environment and working great. 
However it is taking up a lot of extra CPU and memory (CPU usage has doubled 
and memory is swapping). Is there any documentation on performance tuning? 
There seems to be a lot of useful info in the server output but I don't 
understand it.


Swapping if it's constant isn't good...  How much memory does this box
have, and what is the heap size of the JVM?  Are there other things
running on this box?

Solr does warming of caches by default to make complex queries that
hit a new snapshot of the index fast.  This takes up CPU in bursts,
but is normally nothing to worry about unless you have other apps
running on the same box that need CPU.  Because of this warming, CPU
usage of a Solr collection isn't directly related to query traffic at
all times.
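To make the warming idea concrete, here is a minimal Java sketch of an LRU cache with an autowarm step: when a new index snapshot is opened, the most recently used keys of the old cache are recomputed against it. It is only an illustration loosely modelled on the `autowarmCount` setting in solrconfig.xml, not Solr's actual cache code; `WarmableLruCache`, `warmFrom` and the `regenerate` function are hypothetical names, and `regenerate` stands in for re-running a query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Simplified LRU cache with an autowarm step (illustration only, not Solr's LRUCache).
final class WarmableLruCache<K, V> {
    private final LinkedHashMap<K, V> map;

    WarmableLruCache(int maxSize) {
        // accessOrder = true: iteration runs from least- to most-recently used.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // evict the LRU entry once over capacity
            }
        };
    }

    V get(K key) { return map.get(key); }

    void put(K key, V value) { map.put(key, value); }

    int size() { return map.size(); }

    // Fill this (new) cache from the old one: take the 'autowarmCount' most
    // recently used keys and recompute their values against the new snapshot.
    void warmFrom(WarmableLruCache<K, V> old, int autowarmCount, Function<K, V> regenerate) {
        List<K> keys = new ArrayList<>(old.map.keySet()); // LRU ... MRU order
        int from = Math.max(0, keys.size() - autowarmCount);
        for (K key : keys.subList(from, keys.size())) {
            put(key, regenerate.apply(key)); // the CPU cost of warming is spent here
        }
    }
}
```

The regeneration loop is where the CPU burst after a new snapshot comes from; a smaller autowarm count makes warming cheaper at the price of colder caches right after a commit.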


-Yonik


How can I update a specific field of an existing document?

2007-01-11 Thread Iris Soto

Hello everybody,
I want to update a specific field in a document, but I can't find how to do it
in the Solr documentation.
Is that possible? I need to index only one field of a document; do I have
to index the whole document for this?
The problem is that I have to transform a bizdata object into XML content
in Java, so I would have to build the whole document XML step by
step, field by field, retrieving all the bizdata from the database to be
passed to Solr.


Thanks in advance.

--
Iris Soto 



Re: How can I update a specific field of an existing document?

2007-01-11 Thread Thorsten Scherler
On Thu, 2007-01-11 at 10:19 -0600, Iris Soto wrote:
> Hello everybody,
> I want update a specific field in a document, but i don't find how do it 
> in the documentation of Solr.
> Is that posible?, I need to index only a field for a document, Do i have 
> to index all the document for this?
> The problem is that i have to transform a bizdata object to a file 
> content xml in java,  i should to build all the document xml step by 
> step, field by field, retrieving all the bizdata of database to be 
> passed to Solr.
> 

On Thu, 2007-01-11 at 06:43 -0500, Erik Hatcher wrote:
> In Lucene to update a document the operation is really a delete  
> followed by an add.  You will need to add the complete document as  
> there is no such "update only a field" semantics in Lucene. 

This is from a thread on the dev list.

So no, it is not possible to update just one field.

HTH

salu2

> Thanks in advance.
> 



Re: listing/enumerating field information

2007-01-11 Thread Yonik Seeley

On 1/11/07, Tracey Jaquith <[EMAIL PROTECTED]> wrote:

 The Internet Archive is getting close to going live with Solr.
 I have two remaining classes of problems.

 1) Across the entire index, enumerate all the unique values for a given field.
 2) We use unrestricted dynamicField additions from documents (that is, our
users are free to add any named field they like to their document's data, which
is metadata for their item).  We want to list all the unique field names in
the index.


Reasonable requests; both seem like they would be useful additions to Solr.
I've considered doing (1) in the past, along with the doc frequency of each term.
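Roughly what (1) plus doc frequencies would return can be shown with a small in-memory Java sketch. It assumes a hypothetical model where each document is a map from field name to values; a real implementation would walk Lucene's term dictionary rather than scan documents, and `FieldValueCounts` is not a Solr or Lucene class.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

final class FieldValueCounts {
    // Each document is a hypothetical map of field name -> values.
    // Returns every distinct value of 'field', in sorted (term) order,
    // with the number of documents containing it (its doc frequency).
    static SortedMap<String, Integer> docFreqs(List<Map<String, List<String>>> docs, String field) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<String>> doc : docs) {
            // A value repeated within one multi-valued field still counts once per document.
            Set<String> distinct = new HashSet<>(doc.getOrDefault(field, Collections.emptyList()));
            for (String value : distinct) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

With Tracey's two example documents, `docFreqs(docs, "mediatype")` gives `{audio=1, movies=1}`.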

Relying on the schema for (2) is slightly ambiguous.
Do you want a) all the fields defined by the schema, or b) all the
fields actually in the index (which may exclude some fields in the
schema if not used, but also include any dynamic fields in use).

For 2.b, we could use IndexReader.getFieldNames()
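The difference between 2.a and 2.b can be sketched in Java with the same hypothetical in-memory document model: 2.b is the union of the field names that actually occur, while 2.a would list schema entries, where a dynamic field such as `*_s` is a pattern rather than a concrete name. `FieldNames` and its methods are illustrative only and are not Lucene's `IndexReader.getFieldNames()`.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class FieldNames {
    // 2.b: names of fields that actually occur in at least one document,
    // including concrete dynamic-field names such as "color_s".
    static SortedSet<String> fieldsInIndex(List<Map<String, List<String>>> docs) {
        SortedSet<String> names = new TreeSet<>();
        for (Map<String, List<String>> doc : docs) {
            names.addAll(doc.keySet());
        }
        return names;
    }

    // 2.a would start from schema entries instead; a dynamic field there is a
    // pattern with one leading or trailing '*', which this checks a name against.
    static boolean matchesDynamic(String name, String pattern) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return name.equals(pattern);
    }
}
```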

-Yonik


Re: Does Solr support integration with the Compass framework?

2007-01-11 Thread Graham O'Regan

Doesn't Compass use multiple indexes?

Have a read of the "direct lucene" box on

http://www.opensymphony.com/compass/versions/1.1M3/html/introduction.html#i-use-lucene

Would that prevent the two being used together? I'd be interested in
getting the two working together as well; it'd be great to use the
Compass API to create the indexes and Solr to expose them over HTTP.


Yonik Seeley wrote:

One could do a very loose coupling by just pointing Solr at the index
created by Compass, and send a commit command to solr whenever you
want a new view of the index.
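A minimal Java sketch of sending that commit over HTTP, assuming Solr's XML update handler is at `http://localhost:8983/solr/update` (host and port as in the request earlier in this thread; adjust for your install). `SolrCommit` is a hypothetical helper that uses only `java.net.HttpURLConnection`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class SolrCommit {
    // Configure (but do not open) a POST request for an XML update URL.
    static HttpURLConnection prepare(String updateUrl) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Send "<commit/>" and return the HTTP status code.
    static int send(String updateUrl) {
        HttpURLConnection conn = prepare(updateUrl);
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write("<commit/>".getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            conn.disconnect();
        }
    }
}
```

Following Yonik's suggestion, calling `SolrCommit.send(...)` after Compass has finished writing should give Solr a new view of the index; `prepare` only sets up the request and does not touch the network.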

-Yonik

On 1/10/07, Jochen Franke <[EMAIL PROTECTED]> wrote:

Currently I'm investigating different Lucene based
search technologies.

For the indexing of our object model my favorite
is Compass because of the Object/Search Engine Mapping
capabilities.
At the same time Solr offers several nice features
like faceted search and caching.

Has anybody integrated or tried to integrate
Solr with Compass already and can share experiences?

Thanks,
   Jochen




Re: listing/enumerating field information

2007-01-11 Thread Chris Hostetter

: Code-searching for relevant lucene classes led me to try adding
:
: to my solrconfig.xml

Holy cow, I forgot that thing even existed! ... as you can see by
skimming the code, it's a hodgepodge of misc crap that was used early on as
a simple way to test that things were working.

Writing a more generic "Stats" request handler that does what you're
describing certainly seems like a good idea.  Attempting to enumerate
all of the values for a field could be dangerous, but an API where the
client specifies a starting term and a number of terms and we use
TermEnum.seek() would be fairly straightforward.
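The seek-and-limit API described here can be sketched with a sorted set standing in for the term dictionary: start at the first term at or after `start`, then take at most `limit` terms. `TermPager` is a hypothetical name; with Lucene the loop would be driven by a `TermEnum` rather than `NavigableSet.tailSet`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class TermPager {
    // Return at most 'limit' terms in sorted order, beginning with the first
    // term >= start (the analogue of seeking a TermEnum, then calling next()).
    static List<String> page(NavigableSet<String> terms, String start, int limit) {
        List<String> out = new ArrayList<>();
        for (String term : terms.tailSet(start, true)) {
            if (out.size() >= limit) {
                break;
            }
            out.add(term);
        }
        return out;
    }
}
```

Because `start` is inclusive here, a client paging through all values would pass the last term it received as the next `start` and skip that first result.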



-Hoss



Re: listing/enumerating field information

2007-01-11 Thread Yonik Seeley

On 1/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

Writing a more generic "Stats" request handler that does what you're
describing certainly seems like a good idea.


Hmmm, I hadn't thought of it as a separate handler, but as long as
these types of requests aren't related to a base query and aren't needed
along with every query, I guess that could make sense.


 Attempting to enumerate
all of the values for a field could be dangerous


We do it for faceting :-)  But we don't drag it all into memory at once...


but an API where the
client specifies a starting term and a number of terms and we use
TermEnum.seek() would be fairly straightforward.


Adding a start and end (like a range query) is a great idea!
Additionally, I think adding support to incrementally write all the
terms to the response might be important... loading them all into
memory doesn't seem like a great idea.

Perhaps adding Iterator or Iterable to the list of supported types in
TextWriter would be a nice general way to go.
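Both ideas (a start/end range and incremental output) can be sketched in Java, again with a `NavigableSet` standing in for the terms of one field: `range` returns a view of the terms in `[start, end)` without copying, and `writeAll` writes each term as the iterator produces it instead of collecting the whole list first. `TermRangeWriter` is hypothetical and is not Solr's `TextWriter`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.NavigableSet;

final class TermRangeWriter {
    // Terms in [start, end): a live view of the set, not a copy.
    static Iterable<String> range(NavigableSet<String> terms, String start, String end) {
        return terms.subSet(start, true, end, false);
    }

    // Stream each term to the writer as soon as it is produced; returns the count.
    static int writeAll(Iterable<String> terms, Writer out) {
        int n = 0;
        try {
            for (String term : terms) {
                out.write(term);
                out.write('\n');
                n++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }
}
```

`subSet` throws `IllegalArgumentException` if `start` sorts after `end`, so a real handler would want to validate those parameters first.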

-Yonik


RE: Performance tuning

2007-01-11 Thread Stephanie Belton
This is the output of the free command:

[EMAIL PROTECTED] root2]# free -m
             total       used       free     shared    buffers     cached
Mem:          2007       1888        119          0         86        814
-/+ buffers/cache:        986       1020
Swap:         1992        207       1784

We normally have no swapping at all on this server and since last night
(when Solr was deployed on the site) it's been going up.

Here is an extract of the top command output sorted by memory usage. Does
each of the processes really take up 566M??? CPU usage is low because we are
outside of peak time, but during the day it's at 40% when it used to be just
20%:

20:14:16  up 45 days, 21:47,  1 user,  load average: 1.06, 1.14, 1.11
167 processes: 166 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    8.8%    0.0%    0.3%   0.1%     0.2%    6.9%   83.2%
           cpu00    7.9%    0.0%    0.3%   0.7%     0.9%    6.9%   82.8%
           cpu01    8.5%    0.0%    0.3%   0.0%     0.0%    6.9%   84.0%
           cpu02    9.9%    0.0%    0.1%   0.0%     0.0%    6.9%   82.8%
           cpu03    9.0%    0.0%    0.6%   0.0%     0.2%    7.0%   83.2%
Mem:  2055300k av, 1914588k used,  140712k free,       0k shrd,   89032k buff
                   1326540k actv,  301236k in_d,   30788k in_c
Swap: 2040244k av,  212948k used, 1827296k free                  843380k cached

  PID USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
12201 root  15   0  566M 561M 13276 S 0.0 27.9   0:02   0 java
12203 root  15   0  566M 561M 13276 S 0.0 27.9   4:48   2 java
12204 root  16   0  566M 561M 13276 S 0.0 27.9   4:45   1 java
12205 root  15   0  566M 561M 13276 S 0.0 27.9   4:45   0 java
12206 root  15   0  566M 561M 13276 S 0.0 27.9   4:46   2 java
12207 root  15   0  566M 561M 13276 S 0.0 27.9   8:35   2 java
12208 root  16   0  566M 561M 13276 S 0.0 27.9  15:53   1 java
12209 root  16   0  566M 561M 13276 S 0.0 27.9  27:30   1 java
12210 root  21   0  566M 561M 13276 S 0.0 27.9   0:00   1 java
12211 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   0 java
12212 root  15   0  566M 561M 13276 S 0.0 27.9   0:17   1 java
12213 root  15   0  566M 561M 13276 S 0.0 27.9   0:15   2 java
12214 root  21   0  566M 561M 13276 S 0.0 27.9   0:00   3 java
12215 root  15   0  566M 561M 13276 S 0.0 27.9   0:33   2 java
12217 root  21   0  566M 561M 13276 S 0.0 27.9   0:00   3 java
12218 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   2 java
12219 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   1 java
12220 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   2 java
12221 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   0 java
1 root  25   0  566M 561M 13276 S 0.0 27.9 297:21   2 java
12223 root  15   0  566M 561M 13276 S 0.0 27.9   0:13   3 java
12224 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   0 java
12225 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   3 java
12226 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   2 java
12227 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   1 java
12228 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   0 java
12229 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   1 java
12230 root  15   0  566M 561M 13276 S 0.0 27.9   0:00   1 java
Etc...

On the server we also have a website running under mod_perl. It's been
running for 1 year, and up until now CPU usage was peaking at 20% and
memory at around 28%, with no swapping.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: 11 January 2007 15:12
To: solr-user@lucene.apache.org
Subject: Re: Performance tuning





Re: listing/enumerating field information

2007-01-11 Thread Chris Hostetter

: >  Attempting to enumerate
: > all of the values for a field could be dangerous
:
: We do it for faceting :-)  But we don't drag it all into memory at once...

I meant trying to return them all to the user at one time ... even if we
decreased the server-side memory usage risk by supporting Iterators in the
OutputWriters, we could still wind up slamming the client with a large
reply (theoretically: an infinite list).

Basically I'm just arguing that we design the API to have a built-in "limit"
concept, and default it to something manageable (the same way we do for
term-based facet counts).

: Adding a start and end (like a range query) is a great idea!

Oh yeah ... I hadn't considered an "end" ... just a limit, but it would be
trivial to support both.

: Perhaps adding Iterator or Iterable to the list of supported types in
: TextWriter would be a nice general way to go.

Yeah ... Iterable would probably make more sense since it's the more
generic API and would allow people to pass truly "lazy" objects to the
SolrQueryResponse (where the iterator() method does the initialization
work).
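A small Java sketch of such a lazy object, with hypothetical names: the constructor only stores a way to open the data, and the work happens each time `iterator()` is called, which in the scenario above would be when a response writer starts iterating. The `opener` supplier stands in for something like opening a term enumeration.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// An Iterable that defers all work to iterator(): cheap to construct and
// hand around, and only does the (hypothetically expensive) setup when iterated.
final class LazyTerms implements Iterable<String> {
    private final Supplier<Iterator<String>> opener;
    private int initCount = 0;

    LazyTerms(Supplier<Iterator<String>> opener) {
        this.opener = opener; // no work done here
    }

    @Override
    public Iterator<String> iterator() {
        initCount++; // count how often initialization actually ran
        return opener.get();
    }

    int initCount() {
        return initCount;
    }
}
```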

...that seems like a separate (but related) issue to having an easy way to
access Term/Field stats.


-Hoss



RE: Performance tuning

2007-01-11 Thread Stephanie Belton
Thanks for sending this link, I seem to have missed that on the wiki!

-Original Message-
From: Thorsten Scherler [mailto:[EMAIL PROTECTED] 
Sent: 11 January 2007 15:06
To: solr-user@lucene.apache.org
Subject: Re: Performance tuning






Re: Performance tuning

2007-01-11 Thread Yonik Seeley

On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:

This is the output of the free command:

[EMAIL PROTECTED] root2]# free -m
             total       used       free     shared    buffers     cached
Mem:          2007       1888        119          0         86        814
-/+ buffers/cache:        986       1020
Swap:         1992        207       1784

We normally have no swapping at all on this server and since last night
(when Solr was deployed on the site) it's been going up.


That may be fine... swap in use != swapping.
The OS may be swapping out some processes that haven't been used in a
long time to free up more memory for disk cache (notice 814M cached).
This is a good thing.


Here is an extract of the top command output sorted by memory usage. Does
each of the processes really take up 566M???


No, older versions of Linux show each thread as a separate process.
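One way to confirm that those "processes" are threads of a single JVM is to ask the JVM itself. This Java sketch uses the standard `java.lang.management.ThreadMXBean`; the link to the top listing (note the identical SIZE, RSS and SHARE on every `java` line) comes from Yonik's explanation and is not something the code verifies. `JvmThreads` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class JvmThreads {
    // Live Java threads (daemon and non-daemon) in this JVM; they all share
    // one heap, which is why each showed the same memory figures in top.
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    // Highest live-thread count since the JVM started.
    static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }
}
```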

CPU usage is low because we are
outside of peak time, but during the day it's at 40% when it used to be just
20%:


Full-text search is CPU intensive.  An average peak of 40% seems
acceptable.  If the load gets too high, you can scale out by adding
multiple servers behind a load balancer.

-Yonik



RE: Performance tuning

2007-01-11 Thread Stephanie Belton
Thanks for that. I am sorry this isn't really Solr-related but how can I
monitor the swapping if I can't rely on the output of the free command?

Do you think I could still achieve any significant improvements by going
through the performance tuning advice on the wiki? 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: 11 January 2007 20:32
To: solr-user@lucene.apache.org
Subject: Re: Performance tuning


WordDelimiterFilter usage

2007-01-11 Thread Jeff Rodenburg

I'm trying to determine how to index/query for a certain use case, and the
WordDelimiterFilterFactory appears to be what I need to use.  Here's the
scenario:

- Text field being indexed
- Field exists as a full name
- Data might be "cold play"
- This should match against searches for "cold play" and "coldplay" (just
"cold" and just "play" are OK as well)

I'm not able to match "cold play" against searches for "coldplay" at
present.  I'm certain this is a common scenario and I'm missing something
obvious.  Any suggestions of how/where to look/fix this issue?

thanks,
j


Re: WordDelimiterFilter usage

2007-01-11 Thread Chris Hostetter

WordDelimiterFilter won't really help you in this situation ... but it
would help if you find a lot of users are searching for ColdPlay or
cold-play.
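To illustrate why, here is a much-simplified Java splitter that breaks tokens on non-alphanumeric characters and on lower-to-upper case changes, two of the things WordDelimiterFilter does. It is a hypothetical sketch, not the filter itself: "ColdPlay" and "cold-play" both split into two parts, but "coldplay" contains no delimiter, so nothing can split it.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyWordSplitter {
    // Split on non-alphanumeric characters and on lower->upper case changes.
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char prev = 0;
        for (char c : token.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                flush(cur, parts); // delimiter such as '-' ends the current part
            } else {
                if (cur.length() > 0 && Character.isLowerCase(prev) && Character.isUpperCase(c)) {
                    flush(cur, parts); // case change such as "dP" starts a new part
                }
                cur.append(c);
            }
            prev = c;
        }
        flush(cur, parts);
        return parts;
    }

    private static void flush(StringBuilder cur, List<String> parts) {
        if (cur.length() > 0) {
            parts.add(cur.toString());
            cur.setLength(0);
        }
    }
}
```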

If you have a finite list of popular terms like this that you need to deal
with, the SynonymFilter can help you out.
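A minimal Java sketch of the synonym idea, assuming a hypothetical rule table equivalent to a "coldplay => cold play" line in a synonyms file: each query token that has a rule is replaced by its mapped tokens, so a search for "coldplay" becomes cold + play and can match the indexed "cold play". `ToySynonyms` is illustrative only; the real SynonymFilter is configured in the schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ToySynonyms {
    // Replace each token that has a rule by the rule's tokens; keep others as-is.
    static List<String> expand(List<String> tokens, Map<String, List<String>> rules) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.addAll(rules.getOrDefault(t.toLowerCase(Locale.ROOT), List.of(t)));
        }
        return out;
    }
}
```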


: Date: Thu, 11 Jan 2007 13:30:39 -0800
: From: Jeff Rodenburg <[EMAIL PROTECTED]>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: WordDelimiterFilter usage



-Hoss



Re: Performance tuning

2007-01-11 Thread Yonik Seeley

On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:

Thanks for that. I am sorry this isn't really Solr-related but how can I
monitor the swapping if I can't rely on the output of the free command?

Do you think I could still achieve any significant improvements by going
through the performance tuning advice on the wiki?


Unfortunately, I think that's pretty old stuff.

People are normally concerned with:
 - the number of requests per second they can handle with their server
 - the average latency of requests (or median, 99th percentile, etc.)

A goal of reducing CPU usage w/o looking at the other factors is
unusual, but if your query rate is very low, or your cache hit rate is
low, you could reduce or eliminate caching or autowarming.
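For the latency point, a Java sketch of the nearest-rank percentile next to the mean and a simple throughput figure shows why the median or 99th percentile is often more telling than the average: one slow request moves the mean a lot. `LatencyStats` and its method names are hypothetical.

```java
import java.util.Arrays;

final class LatencyStats {
    // Nearest-rank percentile: p = 50 gives the median, p = 99 the 99th percentile.
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    static double mean(long[] latenciesMs) {
        return Arrays.stream(latenciesMs).average().orElse(Double.NaN);
    }

    // Requests per second over a measurement window.
    static double throughput(int requests, double elapsedSeconds) {
        return requests / elapsedSeconds;
    }
}
```

For samples of 10, 20, ..., 90 and 1000 ms, the median is 50 ms and the 99th percentile is 1000 ms, while the mean is 145 ms.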

-Yonik


RE: Performance tuning

2007-01-11 Thread Stephanie Belton
The reason I am keeping a close eye on resource usage is that our traffic is
increasing by around 20% every month (currently over 400,000 page
impressions/day although not all of them are search queries!) and I want to
make sure we tackle any performance issues before it gets too late. I would
rather keep load balancing as a last resort due to cost implications.
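To put the growth figure in perspective, here is a short Java sketch of compound monthly growth. The inputs are the numbers quoted in the message; `TrafficProjection` and its methods are hypothetical helpers.

```java
final class TrafficProjection {
    // Daily volume after 'months' months of compound growth at 'rate' (0.20 = 20%).
    static double projected(double current, double rate, int months) {
        return current * Math.pow(1.0 + rate, months);
    }

    // Months until volume doubles at a constant monthly growth rate.
    static double doublingMonths(double rate) {
        return Math.log(2.0) / Math.log(1.0 + rate);
    }
}
```

At 20% a month, traffic doubles roughly every 3.8 months and is about 8.9 times higher after a year (1.2^12 ≈ 8.9, i.e. roughly 3.6 million page impressions/day), assuming the rate holds; not all of those impressions are search queries.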
 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: 11 January 2007 22:02
To: solr-user@lucene.apache.org
Subject: Re: Performance tuning





Re: Performance tuning

2007-01-11 Thread Yonik Seeley

On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:

The reason I am keeping a close eye on resource usage is that our traffic is
increasing by around 20% every month (currently over 400,000 page
impressions/day although not all of them are search queries!) and I want to
make sure we tackle any performance issues before it gets too late. I would
rather keep load balancing as a last resort due to cost implications.


Going slightly OT, but if this is business critical, load-balancing
also provides high availability, which can pay for itself in the event
that a server crashes.

-Yonik


Re: Performance tuning

2007-01-11 Thread Walter Underwood
On 1/11/07 2:33 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:

> On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:
>> The reason I am keeping a close eye on resource usage is that our traffic is
>> increasing by around 20% every month (currently over 400,000 page
>> impressions/day although not all of them are search queries!) and I want to
>> make sure we tackle any performance issues before it gets too late. I would
>> rather keep load balancing as a last resort due to cost implications.
> 
> Going slightly OT, but if this is business critical, load-balancing
> also provides high availability, which can pay for itself in the event
> that a server crashes.

Right. For us, load balancing is not a last resort but a fact of life.
The smallest number of parallel servers is three, so that we have
two running when one is down for scheduled maintenance or software
update.
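A minimal Java sketch of the idea behind that setup: rotate requests across servers and skip any that are marked down, so losing one of three still leaves two serving. `RoundRobin` and the server names are hypothetical, and the class is not thread-safe; in practice a hardware or software load balancer does this job.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Round-robin selection that skips servers marked down (not thread-safe).
final class RoundRobin {
    private final List<String> servers;
    private final Set<String> down = new HashSet<>();
    private int next = 0;

    RoundRobin(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    void markDown(String server) { down.add(server); }

    void markUp(String server) { down.remove(server); }

    // Return the next healthy server, trying each one at most once per call.
    String pick() {
        for (int i = 0; i < servers.size(); i++) {
            String candidate = servers.get(next);
            next = (next + 1) % servers.size();
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no servers available");
    }
}
```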

Back to performance, check your cache hit ratios in the admin UI, then
adjust the cache sizes. When your caches are the right size, Solr will
be mostly CPU-bound, but quite fast. If Solr is not CPU-bound under a
maximum load (in testing) it means it is using the disk too much.
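Checking hit ratios can also be scripted from cache statistics lines like the ones in Stephanie's first message. This Java sketch parses one such line and computes the cumulative hit ratio as `hits * 100 / lookups` in integer hundredths; truncating this way reproduces the 0.78, 0.14 and 0.03 figures in those lines, but the parsing code and the `CacheStats` name are assumptions, not a Solr API.

```java
import java.util.HashMap;
import java.util.Map;

final class CacheStats {
    // Parse a line such as "filterCache{lookups=0,hits=0,...,cumulative_hits=3708}"
    // into a map of statistic name -> value (kept as strings).
    static Map<String, String> parse(String line) {
        int open = line.indexOf('{');
        int close = line.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("not a cache stats line: " + line);
        }
        Map<String, String> stats = new HashMap<>();
        for (String pair : line.substring(open + 1, close).split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                stats.put(kv[0].trim(), kv[1].trim());
            }
        }
        return stats;
    }

    // Cumulative hit ratio, truncated to hundredths (integer hits * 100 / lookups).
    static double cumulativeHitRatio(Map<String, String> stats) {
        long lookups = Long.parseLong(stats.get("cumulative_lookups"));
        long hits = Long.parseLong(stats.get("cumulative_hits"));
        return lookups == 0 ? 0.0 : (hits * 100 / lookups) / 100.0;
    }
}
```

For the documentCache line, 2460 hits out of 66005 lookups gives 0.03, alongside 4195 cumulative evictions, which suggests (though does not prove) that this cache is small relative to the set of documents being requested.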

wunder
-- 
Walter Underwood
Search Guru, Netflix




Re: How can I update a specific field of an existing document?

2007-01-11 Thread Thorsten Scherler
On Thu, 2007-01-11 at 17:48 +0100, Thorsten Scherler wrote:
> On Thu, 2007-01-11 at 10:19 -0600, Iris Soto wrote:
> > Hello everybody,
> > I want update a specific field in a document, but i don't find how do it 
> > in the documentation of Solr.
> > Is that posible?, I need to index only a field for a document, Do i have 
> > to index all the document for this?

No, just the one document. Let's say you have a CMS and you edit one
document. You will need to re-index only this document, using the
Solr add command for the whole document (not for one field only).

> > The problem is that I have to transform a bizdata object into XML file 
> > content in Java; I would have to build the whole document XML step by 
> > step, field by field, retrieving all the bizdata from the database to be 
> > passed to Solr.

See above: only for the documents whose fields have changed. I wrote a
small Cocoon-based plugin in Forrest that implements the CMS example.

It adds a document-related Solr GUI for a CMS-like system. Maybe that
gives you some ideas for your own app.
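An update is a delete followed by an add, so the whole record has to be serialized again. The sketch below shows one way to turn a record loaded from the database into a Solr XML `<add><doc>` message. The field names and values are hypothetical, `id` is assumed to be the schema's uniqueKey (so the new add replaces the old document), and only single-valued fields are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: serialize a whole record as a Solr XML <add> message.
 * Every field is re-sent, not only the one that changed.
 * Field names used in main() are hypothetical.
 */
public class SolrAddXmlSketch {

    /** Minimal XML escaping for element text and attribute values. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds <add><doc><field name="...">value</field>...</doc></add>. */
    static String toAddXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical record: only "status" changed, but all fields are sent.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "biz-42");
        doc.put("title", "Q4 report & summary");
        doc.put("status", "published");
        System.out.println(toAddXml(doc));
    }
}
```

The resulting message would then be POSTed to the Solr update handler and followed by a `<commit/>` before the change becomes visible to searches.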


> > 
> 
> On Thu, 2007-01-11 at 06:43 -0500, Erik Hatcher wrote:
> > In Lucene, updating a document is really a delete  
> > followed by an add.  You will need to add the complete document, as  
> > there are no "update only a field" semantics in Lucene. 
> 
> This is from a thread in the dev list.

I could not access the archive the first time; here is the link:
http://www.nabble.com/forum/ViewPost.jtp?post=8275908&framed=y

HTH

salu2

> 
> So no, it is not possible to update just one field.
> 
> HTH
> 
> salu2
> 
> > Thanks in advance.
> > 
> 
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)




Re: Performance tuning

2007-01-11 Thread Mike Klaas

On 1/11/07, Stephanie Belton <[EMAIL PROTECTED]> wrote:

Thanks for that. I am sorry this isn't really Solr-related, but how can I
monitor the swapping if I can't rely on the output of the free command?


$ vmstat -S M 3
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
1  2  0   2236 34763003723   77   155  1  0 98  1
0  1  0   2235 34763007113  551  2607 16  4 71  9
1  0  0   2235 347630072   892  742  2194 13  3 67 17

The si/so columns display the real-time swap in/out rates.  vmstat's
other columns are rather useful too.
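To watch for swapping continuously instead of by eye, a small Java sketch can read `vmstat` output and flag intervals with non-zero si/so. It assumes the standard procps column order shown in the header above (si and so are the 7th and 8th columns). The first data line vmstat prints is an average since boot, so a stricter monitor might skip it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Sketch: flag vmstat intervals with swap activity.
 * Assumes the procps layout
 *   r b swpd free buff cache si so bi bo in cs us sy id wa
 */
public class SwapWatch {

    static final int SI = 6, SO = 7; // zero-based column indices (assumed layout)

    /** Returns {si, so} for a data line, or null for header/short lines. */
    static long[] swapRates(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length <= SO || !cols[0].matches("\\d+")) {
            return null; // "procs ..." or "r b ..." header, or malformed
        }
        return new long[] { Long.parseLong(cols[SI]), Long.parseLong(cols[SO]) };
    }

    public static void main(String[] args) throws IOException {
        // Same command as above: sizes in MB, one report every 3 seconds.
        Process p = new ProcessBuilder("vmstat", "-S", "M", "3").start();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                long[] rates = swapRates(line);
                if (rates != null && (rates[0] > 0 || rates[1] > 0)) {
                    System.out.println("swapping: si=" + rates[0] + " so=" + rates[1]);
                }
            }
        }
    }
}
```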

-Mike


Re: How can I update a specific field of an existing document?

2007-01-11 Thread Iris Soto

Thorsten Scherler wrote:


I'm now retrieving the whole document to be passed to Solr and indexed.
Thank you for clarifying my doubt.

Regards!

--
Iris Soto 



Re: WordDelimiterFilter usage

2007-01-11 Thread Jeff Rodenburg

Thanks Hoss, it is a finite list, but in the tens of thousands.  I'm going
the easy route: adding another field that indexes the terms with no
whitespace.  This is used in an ajax-style lookup, so it works for
this scenario.  Not something I'd normally do in a typical index, for sure.

thanks,
jeff
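Jeff's extra field could be filled on the client side. The sketch below assumes that approach: a lowercased, whitespace-stripped copy of the name goes into a hypothetical `name_nospace` field, and lookups are normalized the same way, so "cold play", "Cold Play" and "coldplay" collapse to one key. Prefix matching stands in for the ajax-style lookup.

```java
import java.util.Locale;

/**
 * Sketch of the "extra field without whitespace" approach.
 * Index value and user input go through the same normalization.
 * The field name is HYPOTHETICAL.
 */
public class NoSpaceFieldSketch {

    static final String FIELD = "name_nospace"; // hypothetical field name

    /** Lowercase and remove all whitespace. */
    static String squash(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    /** True if what the user typed so far is a prefix of the normalized name. */
    static boolean prefixMatches(String indexedValue, String typed) {
        return squash(indexedValue).startsWith(squash(typed));
    }

    public static void main(String[] args) {
        String name = "cold play";
        System.out.println(FIELD + " = " + squash(name));
        for (String typed : new String[] { "coldplay", "Cold Play", "coldp", "colt" }) {
            System.out.println(typed + " -> " + prefixMatches(name, typed));
        }
    }
}
```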


On 1/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:



WordDelimiterFilter won't really help you in this situation ... but it
would help if you find that a lot of users are searching for ColdPlay or
cold-play.

If you have a finite list of popular terms like this that you need to deal
with, the SynonymFilter can help you out.
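Hoss's distinction can be illustrated with a deliberately simplified imitation of word-delimiter splitting. This is not Solr's WordDelimiterFilter, whose behavior is governed by options such as `catenateWords`. The sketch splits one token on non-alphanumerics and lower-to-upper case changes, lowercases the parts, and adds their catenation. "ColdPlay" and "cold-play" both yield "cold" and "play", which match a document indexed as "cold play". Plain "coldplay" has no boundary to split on, so it stays one token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Simplified illustration of word-delimiter splitting
 * (NOT Solr's implementation): split on non-alphanumerics and
 * lower->upper case changes, lowercase, and add the catenation.
 */
public class WordDelimiterSketch {

    static List<String> parts(String token) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char prev = 0;
        for (char c : token.toCharArray()) {
            boolean boundary = !Character.isLetterOrDigit(c)
                    || (Character.isUpperCase(c) && Character.isLowerCase(prev));
            if (boundary && cur.length() > 0) {
                out.add(cur.toString().toLowerCase(Locale.ROOT));
                cur.setLength(0);
            }
            if (Character.isLetterOrDigit(c)) {
                cur.append(c);
            }
            prev = c;
        }
        if (cur.length() > 0) {
            out.add(cur.toString().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Sub-words plus, when there is more than one, their catenation. */
    static List<String> tokens(String token) {
        List<String> p = parts(token);
        List<String> out = new ArrayList<>(p);
        if (p.size() > 1) {
            out.add(String.join("", p));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String q : new String[] { "ColdPlay", "cold-play", "coldplay" }) {
            System.out.println(q + " -> " + tokens(q));
        }
    }
}
```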


: Date: Thu, 11 Jan 2007 13:30:39 -0800
: From: Jeff Rodenburg <[EMAIL PROTECTED]>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: WordDelimiterFilter usage
:
: I'm trying to determine how to index/query for a certain use case, and the
: WordDelimiterFilterFactory appears to be what I need to use.  Here's the
: scenario:
:
: - Text field being indexed
: - Field exists as a full name
: - Data might be "cold play"
: - This should match against searches for "cold play" and "coldplay" (just
:   "cold" and just "play" are OK as well)
:
: I'm not able to match "cold play" against searches for "coldplay" at
: present.  I'm certain this is a common scenario and I'm missing something
: obvious.  Any suggestions of how/where to look/fix this issue?
:
: thanks,
: j
:



-Hoss