Solr replication

2007-10-01 Thread ycrux
Hi !

I'm really new to Solr !

Could anybody please explain to me, with a short example, how I can set up
simple Solr replication across 3 machines (a master node and 2 slaves)?

This is my conf:

* master (linux 2.6.20) :
- Hostname "solr.master" with IP "192.168.1.1"
* 2 slaves (linux 2.6.20) :
- Hostname "solr.slave1" with IP "192.168.1.2"
- Hostname "solr.slave2" with IP "192.168.1.3"

N.B.: sorry if the question was already asked before, but I couldn't find 
anything better than the "CollectionDistribution" page on the wiki.

Regards
Y.



I18N with SOLR

2007-10-01 Thread Dilip.TS
Hello,

  Is there anyone who has worked on internationalization with SOLR?
  Apart from using a dynamicField name="*_eng" (say, for English), are there
any other configurations to be made?
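As a hedged sketch of what such a configuration often looks like (the type name, analyzer chain, and attributes below are assumptions, not from the original post), a language-specific dynamicField in schema.xml is usually paired with a per-language analyzer:

```xml
<!-- An English-specific field type: standard tokenization,
     lowercasing, and English stemming. -->
<fieldType name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Route all *_eng fields through the English analyzer. -->
<dynamicField name="*_eng" type="text_en" indexed="true" stored="true"/>
```

A parallel fieldType and dynamicField pair would then be added per language.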

Regards
 Dilip





SOLR server

2007-10-01 Thread Mark Jarecki

Hi there

I am setting up a dedicated SOLR server on Debian Etch, and I was
wondering whether the server should be configured as 32-bit or 64-bit.
What issues are there either way?


Cheers

Mark


Re: Solr replication

2007-10-01 Thread climbingrose
1)On solr.master:
+Edit scripts.conf:
solr_hostname=localhost
solr_port=8983
rsyncd_port=18983
+Enable and start rsync:
rsyncd-enable; rsyncd-start
+Run snapshooter:
snapshooter
After running this, you should be able to see a new folder named snapshot.*
in the data/index folder.
You can configure solrconfig.xml to trigger snapshooter after a commit or
optimise.
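Such a trigger is a hedged sketch along the lines of the event listener shipped in the example solrconfig.xml (the dir value is an assumption; adjust it to where the distribution scripts live in your install):

```xml
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>
```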

2) On slave:
+Edit scripts.conf:
solr_hostname=solr.master
solr_port=8986
rsyncd_port=18986
data_dir=
webapp_name=solr
master_host=localhost
master_data_dir=$MASTER_SOLR_HOME/data/
master_status_dir=$MASTER_SOLR_HOME/logs/clients/
+Run snappuller:
snappuller -P 18983
+Run snapinstaller:
snapinstaller

You should set up a crontab entry to run snappuller and snapinstaller periodically.
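A sketch of such a crontab entry on each slave, assuming the scripts live in /opt/solr/bin and a five-minute pull interval (both the path and the interval are assumptions):

```
*/5 * * * * /opt/solr/bin/snappuller -P 18983 && /opt/solr/bin/snapinstaller
```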





-- 
Regards,

Cuong Hoang


Re: Re: Solr replication

2007-10-01 Thread ycrux
Works like a charm. Thanks very much.

cheers
Y.




Searching combined English-Japanese index

2007-10-01 Thread Maximilian Hütter
Hi,

I know there has been quite some discussion about Multilanguage
searching already, but I am not quite sure this applies to my case.

I have an index with fields which contain Japanese and English at the
same time. Is this possible? Tokenizing is not the big problem here; the
StandardTokenizerFactory is good enough, judging by the Solr admin
Field Analysis page.

My problem is that searches for Japanese text don't give any results. I
get results for the English parts, but not for the Japanese.

Using Limo I can see that it is correctly indexed as UTF-8. But using
the Solr Admin Query, I don't get any results. As I understood it, Solr
should just match the characters and return something.

When I search using an English term, I get results, but the Japanese is
not encoded correctly in the response (although it is UTF-8 encoded).

I am using Solr 1.2.

Any ideas, what I might be doing wrong?

Best regards,

Max

-- 
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel:  (+49) 0711 - 45 10 17 578
Fax:  (+49) 0711 - 45 10 17 573
e-mail :  [EMAIL PROTECTED]
Sitz   :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich


Re: Re: Re: Solr replication

2007-10-01 Thread ycrux
One more question about replication. 

Now that the replication is working, how can I see the changes on the slave
nodes?

The statistics page:

http://solr.slave1:8983/solr/admin/stats.jsp

doesn't reflect the correct number of indexed documents and still shows
numDocs=0.

Is there any command to tell Solr (on a slave node) to sync itself with
disk?

cheers
Y.




Re: Searching combined English-Japanese index

2007-10-01 Thread Yonik Seeley
On 10/1/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
> When I search using an English term, I get results but the Japanese is
> not encoded correctly in the response. (although it is UTF-8 encoded)

One quick thing to try is the python writer (wt=python) to see the
actual unicode values of what you are getting back (since the python
writer automatically escapes non-ascii).  That can help rule out
incorrect charset handling by clients.

-Yonik


correlation between score and term frequency

2007-10-01 Thread kubias
Hi!

I have a question about the correlation between the score value and the
term frequency. Let's assume that we have one index about one set of
documents. In addition to that, let's assume that there is only one term
in a query.

If we now search for the term "car" and get a certain score value X, and
we then search for the term "football" and also get score value X, can we
be sure that both values X mean the same thing?

Could you explain, what correlation between the score value and the term
frequency exists in my scenario?

Thanks for your help!

Best regards,
alex




Re: correlation between score and term frequency

2007-10-01 Thread Grant Ingersoll
Not sure I follow: you get back the same score for two different
queries, and you wonder why?


The best way to see how a score is calculated is to use the explain  
(debug) functionality in Solr.
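For instance, appending the debug parameter to a query returns an explanation of each hit's score alongside the results (the hostname, port, and query term here are assumptions):

```
http://localhost:8983/solr/select?q=car&debugQuery=on&fl=*,score
```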


-Grant


--
Grant Ingersoll
http://lucene.grantingersoll.com





Re: Re: Re: Solr replication

2007-10-01 Thread climbingrose
Running sh bin/commit should trigger a refresh. However, this command is
executed as part of snapinstaller, so you shouldn't have to run it manually.



-- 
Regards,

Cuong Hoang


Re: Re: Re: Re: Solr replication

2007-10-01 Thread ycrux
Perfect. Thanks to all of you.

cheers
Y.




Re: Searching combined English-Japanese index

2007-10-01 Thread Maximilian Hütter
Thanks for the tip; it turns out that the unicode values are wrong... I
mean the browser correctly displays what is sent, but I don't know how
Solr gets these values.

For example python output is:

'key':'honshu_server_ovo:application_List VPO NT Templates_integrated',
 'backend':'honshu_server',
 'service':'ovoconfig',
 'objectclass':'ovo:application',
 'objecttype':'integrated',
 'name':'List VPO NT Templates',
 'label':u'VPO
\u00e3\u0083\u0086\u00e3\u0083\u00b3\u00e3\u0083\u0097\u00e3\u0083\u00ac\u00e3\u0083\u00bc\u00e3\u0083\u0088',
 'path':'',
 'context':'',
 'revision':'',
 'description':'',
 'ovo:application_name':'List VPO NT Templates'},

But in Limo the doc looks like this:

key                   honshu_server_ovo:application_List VPO NT Templates_integrated
backend               honshu_server
service               ovoconfig
objectclass           ovo:application
objecttype            integrated
name                  List VPO NT Templates
label                 VPO テンプレート
path
context
revision
description
ovo:application_name  List VPO NT Templates

I hope you can view the Japanese katakana in the label field.

But somehow this is changed to completely different unicode characters
in the search result.
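The escaped values in the python output above are, in fact, the classic signature of double-encoded UTF-8: each katakana character's UTF-8 bytes appear to have been re-interpreted as Latin-1 somewhere on the way in. A minimal Python sketch (the garbled string is copied from the output above) shows the round trip:

```python
# The label as it came back from Solr: one Latin-1 character per UTF-8 byte.
garbled = ("\u00e3\u0083\u0086\u00e3\u0083\u00b3\u00e3\u0083\u0097"
           "\u00e3\u0083\u00ac\u00e3\u0083\u00bc\u00e3\u0083\u0088")

# Re-encode as Latin-1 to recover the raw bytes, then decode them as UTF-8.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered)  # テンプレート
```

If that recovery works on your data, the bytes were mangled on the way into the index rather than on the way out.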

Max

-- 
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel:  (+49) 0711 - 45 10 17 578
Fax:  (+49) 0711 - 45 10 17 573
e-mail :  [EMAIL PROTECTED]
Sitz   :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich


Major CPU performance problems under heavy user load with solr 1.2

2007-10-01 Thread Robert Purdy

Hi there, I am having some major CPU performance problems under heavy user
load with Solr 1.2. I currently have approximately 4 million documents in
the index, and I am doing some pretty heavy faceting on multi-valued fields.
I know that facets are expensive on multi-valued fields, but the CPU seems
to max out (400%) under apache bench with just 5 identical concurrent
requests. I have the potential for many more concurrent requests than that,
given the large number of users that hit our site per day, and I am
wondering if there are any workarounds. Currently I am running the
out-of-the-box Solr solution (the example Jetty application with my own
schema.xml and solrconfig.xml) on a dual Intel Core Duo 64-bit box with
8 GB of RAM allocated to the start.jar process, dedicated to Solr, with no
slaves.

I have set up some aggressive caching in solrconfig.xml for the
filterCache (class="solr.LRUCache" size="300" initialSize="200") and
have set HashDocSet to 1 to help with faceting, but I am still getting
some pretty poor performance. I have also tried autowarming the facets by
performing a query that hits all my multi-valued facets, with no facet
limits, across all the documents in the index. This does seem to reduce my
query times by a lot, because the filterCache grows to about 2.1 million
lookups; that query finishes in about 70 secs. However, I have noticed an
issue with this: each time I do an optimize or a commit after prewarming
the facets, the cache gets cleared (according to the stats on the admin
page), but the RSize for the process does not shrink, and the queries get
slow again. So I prewarm the facets again, and the memory usage keeps
growing as if the cache is not being recycled. As a result, the prewarm
query gets slower and slower each time this happens (after about 5 rounds
of prewarm-then-commit, the query takes about 30 mins... ugh) and I almost
run out of memory.

Any thoughts on how to help improve this and fix the memory issue?
-- 
View this message in context: 
http://www.nabble.com/Major-CPU-performance-problems-under-heavy-user-load-with-solr-1.2-tf4549093.html#a12981540
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Schema version question

2007-10-01 Thread Robert Purdy

Thanks Yonik. I have not seen any issues with doing that, besides some
unrelated performance issues I just posted about in another thread.

Robert.




Robert Purdy wrote:
> 
> I was wondering if anyone could help me. I just completed a full index of
> my data (about 4 million documents) and noticed that, when I was first
> setting up the schema, I set the version number to "1.2", thinking that Solr
> 1.2 uses schema version 1.2... ugh. So I am wondering if I can just set
> the schema version to 1.1 without having to rebuild the full index? I ask
> because I am hoping that, given an invalid schema version number, version
> 1.0 is not used by default (which would make all my fields multivalued).
> Any help would be greatly appreciated. Thanks in advance.
> 
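For reference, the schema version is declared on the root element of schema.xml; with version 1.1 and later, fields default to single-valued unless multiValued="true" is set explicitly. A sketch (the schema name is an assumption):

```xml
<schema name="example" version="1.1">
  <!-- With version="1.1", fields are single-valued unless
       multiValued="true" is set on the field definition. -->
</schema>
```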

-- 
View this message in context: 
http://www.nabble.com/Schema-version-question-tf4536802.html#a12981543
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr/home

2007-10-01 Thread yo_keller

I suppose you've solved this problem already. I just ran into it, and solving
it took the following steps:
- putting a proper solr.xml file, much like the one you have, in the
directory \Tomcat 5.5\conf\Catalina\localhost, containing only the context
fragment

- modifying solrconfig.xml (this was another necessary step), changing the
default
${solr.data.dir:./solr/data}
to point to your actual solr-home, e.g.:
${solr.data.dir:/usr/local/projects/my_app/current/solr-home/data}
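A hedged sketch of such a solr.xml context fragment, along the lines of the SolrTomcat wiki page (the paths are assumptions):

```xml
<Context docBase="/path/to/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/path/to/solr-home" override="true"/>
</Context>
```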

To clarify my configuration: I work with Tomcat 5.5.20 under Windows-XP. My
current dataDir is actually:
${solr.data.dir:K:/solr/cur_solr/solr/data}

Maybe this could help! This information should be added to the SolrTomcat
wiki page (http://wiki.apache.org/solr/SolrTomcat) - it would have saved me hours.

yo




Matt Mitchell-2 wrote:
> 
> Here you go:
> 
>  crossContext="true" >
> 
> 
> 
> This is the same file I'm putting into the Tomcat manager "XML  
> Configuration file URL" form input.
> 
> Matt
> 
> On Sep 6, 2007, at 3:25 PM, Tom Hill wrote:
> 
>> It works for me. (fragments with solr 1.2 on tomcat 5.5.20)
>>
>> Could you post your fragment file?
>>
>> Tom
>>
>>
>> On 9/6/07, Matt Mitchell <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> I recently upgraded to Solr 1.2. I've set it up through Tomcat using
>>> context fragment files. I deploy using the tomcat web manager. In the
>>> context fragment I set the environment variable solr/home. This use
>>> to work as expected. The solr/home value pointed to the directory
>>> where "data", "conf" etc. live. Now, this value doesn't get used and
>>> instead, tomcat creates a new directory called "solr" and "solr/data"
>>> in the same directory where the context fragment file is located.
>>> It's not really a problem in this particular instance. I like the
>>> idea of it defaulting to "solr" in the same location as the context
>>> fragment file, but as long as I can depend on it always working like
>>> that. It is a little puzzling as to why the value in my environment
>>> setting doesn't work though?
>>>
>>> Has anyone else experienced this behavior?
>>>
>>> Matt
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/solr-home-tf4394152.html#a12982101
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Major CPU performance problems under heavy user load with solr 1.2

2007-10-01 Thread Yonik Seeley
On 10/1/07, Robert Purdy <[EMAIL PROTECTED]> wrote:
> Hi there, I am having some major CPU performance problems with heavy user
> load with solr 1.2. I currently have approximately 4 million documents in
> the index and I am doing some pretty heavy faceting on multi-valued columns.
> I know that doing facets are expensive on multi-valued columns but the CPU
> seems to max out (400%) with apache bench with just 5 identical concurrent
> requests

One can always max out CPU (unless one is IO bound) with concurrent
requests greater than the number of CPUs on the system.  This isn't a
problem by itself and would exist even if Solr were an order of
magnitude slower or faster.

You should be looking at things like the peak throughput (queries per sec)
you need to support and the latency of the requests (look at the 90th
percentile, or whatever).


> and I have the potential for a lot more concurrent requests then
> that with my large number of users that hit our site per day and I am
> wondering if there are any workarounds. Currently I am running the out of
> the box solr solution (Example jetty application with my own schema.xml and
> solrconfig.xml) on a dual Intel Duo core 64 bit box with 8 gigs of ram
> allocated to the start.jar process dedicated to solr with no slaves.
>
> I have set up some aggressive caching in the solrconfig.xml for the
> filtercache (class="solr.LRUCache"size="300" initialSize="200") and
> have the HashDocSet set to 1 to help with faceting, but still I am
> getting some pretty poor performance. I have also tried autowarming the
> facets by performing a query that hits all my multivalued facets with no
> facet limits across all the documents in the index. This does seem to reduce
> my query times by a lot because the filtercache grows to about 2.1 million
> lookups and finishes the query in about 70 secs.

OK, that's long.  So focus on the latency of a single request instead
of jumping straight to load testing.

2.1 million is a lot - what's the field with the largest number of
unique values that you are faceting on?

> However I have noticed an
> issue with this because each time I do an optimize or a commit after
> prewarming the facets the cache gets cleared, according to the stats on the
> admin page, but the RSize does not shink for the process, and the queries
> get slow again, so I prewarm the facets again and the memory usage keeps
> growing like the cache is not being recycled

The old searcher and cache won't be discarded until all requests using
it have completed.

> and as a results the prewarm
> query starts to get slower and slower as each time this occurs (after about
> 5 times of prewarms and then commit the query takes about 30 mins... ugh)
> and almost run out of memory.
>
> Any thoughts on how to help improve this and fix the memory issue?

You could try the minDf param to reduce the number of facets stored in
the cache and reduce memory consumption.
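As a hedged sketch, minDf here refers to the facet.enum.cache.minDf request parameter: terms whose document frequency falls below the threshold are counted without being stored in the filter cache (the field name and threshold below are assumptions):

```
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.enum.cache.minDf=25
```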

-Yonik


Re: Searching combined English-Japanese index

2007-10-01 Thread Yonik Seeley

OK, so they never got into the index correctly.
The most likely explanation is that the charset wasn't set correctly
when the update message was sent to Solr.
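One way to rule that out is to resend the update with an explicit charset declaration on the request; a sketch (the URL and file name are assumptions):

```
curl 'http://localhost:8983/solr/update' \
  -H 'Content-Type: text/xml; charset=UTF-8' \
  --data-binary @docs.xml
```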

-Yonik


Re: correlation between score and term frequency

2007-10-01 Thread Joseph Doehr

Hi Alex,

do you mean you would like to know whether both results have the same
relevance across the whole indexed content, and whether the two results
are directly comparable?


[EMAIL PROTECTED] schrieb:
> I have a question about the correlation between the score value and the
> term frequency. Let's assume that we have one index about one set of
> documents. In addition to that, let's assume that there is only one term
> in a query.
> 
> If we now search for the term "car" and get a certain score value X, and
> if we then search for the term "football" and get the same score value X.
> Is it now sure that both values X are the same?
> 
> Could you explain, what correlation between the score value and the term
> frequency exists in my scenario?



Re: Letter-number transitions - can this be turned off

2007-10-01 Thread Mike Klaas

On 30-Sep-07, at 12:47 PM, F Knudson wrote:



Is there a flag to disable the letter-number transition in the
solr.WordDelimiterFilterFactory? We are indexing category codes and
thesaurus codes, for which this letter-number transition makes no sense.
It is bloating the index (which is already large).


Have you considered using a different analyzer?

If you want to continue using WDF, you could make a quick change
around line 320:


if (splitOnCaseChange == 0 &&
    (lastType & ALPHA) != 0 && (type & ALPHA) != 0) {
  // ALPHA->ALPHA: always ignore if case isn't considered.
} else if ((lastType & UPPER) != 0 && (type & LOWER) != 0) {
  // UPPER->LOWER: Don't split
} else {
  ...

by adding a clause that catches ALPHA -> NUMERIC (and vice versa) and  
ignores it.


Another approach, which I am using locally, is to maintain the
transitions but force tokens to be a minimum size (so r2d2 doesn't
tokenize to four tokens, but arrrdeee does).


There is a patch here: http://issues.apache.org/jira/browse/SOLR-293

If you vote for it, I promise to get it in for 1.3 

-Mike


Re: correlation between score and term frequency

2007-10-01 Thread Mike Klaas


If the field has norms, there is a correlation, but the tf is
unrecoverable from the score because of field-length normalization.
Query normalization also makes it difficult to compare scores from
query to query.


see http://lucene.apache.org/java/docs/scoring.html to start out, in  
particular the link to the Similarity class javadocs.
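Sketched from the Lucene scoring documentation of that era (the DefaultSimilarity formula), the score of document d for query q is roughly:

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t,d)\cdot \mathrm{idf}(t)^{2}\cdot
  \mathrm{boost}(t)\cdot \mathrm{norm}(t,d)
```

Because norm(t,d) folds in the field length and queryNorm(q) varies from query to query, a given tf cannot be read back out of the score, and two equal scores for different terms need not imply equal term frequencies.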


-Mike


RE: Searching combined English-Japanese index

2007-10-01 Thread Lance Norskog
Some servlet containers don't do UTF-8 out of the box. There is information
about this on the wiki. 




Questions about unit test assistant TestHarness

2007-10-01 Thread Lance Norskog
Hi-
 
Is anybody using the unit test assistant class TestHarness in Solr 1.2?  I'm
trying to use it in Eclipse and found a few problems with classloading.
These might be a quirk of using it with Eclipse. I also found a bug in the
commit() function where '(Object)' should be '(Object[])'.
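The cast matters because of how Java resolves varargs. A minimal sketch of the difference (this is not TestHarness itself; the class and method names are assumptions for illustration):

```java
public class VarargsDemo {
    // Mirrors a varargs signature like the ones TestHarness calls into.
    static int argCount(Object... args) {
        return args.length;
    }

    public static void main(String[] argv) {
        String[] words = {"a", "b"};

        // (Object) wraps the whole array as a single varargs element.
        System.out.println(argCount((Object) words));   // prints 1

        // (Object[]) passes the array itself as the varargs array.
        System.out.println(argCount((Object[]) words)); // prints 2
    }
}
```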
 
Are all of these problems fixed in the Solr 1.3 trunk? Should I just grab
whatever's there and use them with 1.2?
 
Thanks,
 
Lance Norskog


Re: Questions about unit test assistant TestHarness

2007-10-01 Thread Ryan McKinley

What error are you getting exactly?

Do you only get the error running from eclipse, or do you also get it 
running from ant?


The TestHarness class is used in almost all the tests, so yes, it is 
used with Solr 1.2.


ryan








Re: correlation between score and term frequency

2007-10-01 Thread Alexander Kubias
Yes, that was the meaning of my question! Can you answer it?
