Solr replication
Hi! I'm really new to Solr. Could anybody please explain to me, with a short example, how I can set up a simple Solr replication with 3 machines (a master node and 2 slaves)?

This is my configuration:

* master (Linux 2.6.20):
  - hostname "solr.master" with IP "192.168.1.1"
* 2 slaves (Linux 2.6.20):
  - hostname "solr.slave1" with IP "192.168.1.2"
  - hostname "solr.slave2" with IP "192.168.1.3"

N.B.: sorry if the question was already asked before, but I couldn't find anything better than "CollectionDistribution" on the wiki.

Regards
Y.
I18N with SOLR
Hello,

Is there anyone who has worked on internationalization with Solr? Apart from using a dynamicField (name="*_eng", say, for English), are there any other configurations to be made?

Regards
Dilip
SOLR server
Hi there,

I am setting up a dedicated Solr server on Debian Etch and was wondering whether the server should be configured as 32-bit or 64-bit. What issues are there either way?

Cheers
Mark
Re: Solr replication
1) On solr.master:

+ Edit scripts.conf:

    solr_hostname=localhost
    solr_port=8983
    rsyncd_port=18983

+ Enable and start rsync:

    rsyncd-enable; rsyncd-start

+ Run snapshooter:

    snapshooter

After running this, you should be able to see a new snapshot.* folder in the data directory. You can configure solrconfig.xml to trigger snapshooter after a commit or optimise.

2) On each slave:

+ Edit scripts.conf:

    solr_hostname=solr.master
    solr_port=8986
    rsyncd_port=18986
    data_dir=
    webapp_name=solr
    master_host=localhost
    master_data_dir=$MASTER_SOLR_HOME/data/
    master_status_dir=$MASTER_SOLR_HOME/logs/clients/

+ Run snappuller:

    snappuller -P 18983

+ Run snapinstaller:

    snapinstaller

You should set up a crontab to run snappuller and snapinstaller periodically (a sketch follows below).

--
Regards,
Cuong Hoang
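As a concrete form of that last step, a crontab entry along these lines on each slave would do the periodic pull-and-install; the schedule and script paths are assumptions, not from the original post:

    # hypothetical slave crontab: every 10 minutes, pull the latest snapshot
    # from the master and install it only if the pull succeeded; adjust the
    # path to wherever the distribution scripts live on the slave
    */10 * * * * /opt/solr/bin/snappuller -P 18983 && /opt/solr/bin/snapinstaller

Chaining with && means a failed rsync never installs a half-copied snapshot.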
Re: Re: Solr replication
Works like a charm. Thanks very much.

cheers
Y.
Searching combined English-Japanese index
Hi,

I know there has been quite some discussion about multilanguage searching already, but I am not sure it applies to my case. I have an index with fields that contain Japanese and English at the same time. Is this possible?

Tokenizing is not the big problem here; the StandardTokenizerFactory is good enough, judging by the Solr admin field analysis. My problem is that searches for Japanese text don't give any results. I get results for the English parts, but not for the Japanese. Using Limo I can see that it is correctly indexed as UTF-8, but using the Solr admin query I don't get any results. As I understood it, Solr should just match the characters and return something. When I search using an English term, I get results, but the Japanese is not encoded correctly in the response (although it is UTF-8 encoded).

I am using Solr 1.2. Any ideas what I might be doing wrong?

Best regards,
Max

--
Maximilian Hütter, blue elephant systems GmbH
Re: Re: Re: Solr replication
One more question about replication. Now that replication is working, how can I see the changes on the slave nodes?

The statistics page

    http://solr.slave1:8983/solr/admin/stats.jsp

doesn't reflect the correct number of indexed documents and still shows numDocs=0. Is there a command to tell Solr (on a slave node) to sync itself with the disk?

cheers
Y.
Re: Searching combined English-Japanese index
On 10/1/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
> When I search using an English term, I get results but the Japanese is
> not encoded correctly in the response. (although it is UTF-8 encoded)

One quick thing to try is the python writer (wt=python) to see the actual unicode values of what you are getting back (since the python writer automatically escapes non-ascii). That can help rule out incorrect charset handling by clients.

-Yonik
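A concrete form of Yonik's suggestion, against a hypothetical host and port; the query string below is the UTF-8 percent-encoding of テスト:

    # return results via the python response writer so non-ascii comes back
    # as explicit \uXXXX escapes, independent of any browser charset handling
    # (host, port, and query term are examples, not from this thread)
    curl 'http://localhost:8983/solr/select?q=%E3%83%86%E3%82%B9%E3%83%88&wt=python&indent=on'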
correlation between score and term frequency
Hi!

I have a question about the correlation between the score value and the term frequency. Let's assume that we have one index over one set of documents, and that there is only one term in the query.

If we now search for the term "car" and get a certain score value X, and we then search for the term "football" and also get the score value X, can we conclude that the two scores mean the same thing, i.e. are they directly comparable?

Could you explain what correlation between the score value and the term frequency exists in my scenario? Thanks for your help!

Best regards,
alex
Re: correlation between score and term frequency
Not sure I follow: you get back the same score for two different queries and you wonder why? The best way to see how a score is calculated is to use the explain (debug) functionality in Solr (a sample request is sketched below).

-Grant

--
Grant Ingersoll
http://lucene.grantingersoll.com
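A sketch of the explain functionality Grant mentions, against a hypothetical local instance; with debugQuery=on the response includes a score breakdown for every hit:

    # ask Solr to return, per matching document, how its score was computed
    # (host, port, and query term are examples)
    curl 'http://localhost:8983/solr/select?q=car&fl=*,score&debugQuery=on'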
Re: Re: Re: Solr replication
sh bin/commit (in the Solr distribution's bin directory) should trigger a refresh. However, this command is executed as part of snapinstaller, so you shouldn't have to run it manually. (An equivalent manual request is sketched below.)

--
Regards,
Cuong Hoang
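For reference, the commit script essentially posts a commit message to the slave's update handler, so the same refresh can be triggered by hand; the hostname and port below come from this thread, the rest is the stock update syntax:

    # open a new searcher on the slave so the freshly installed index is visible
    curl 'http://solr.slave1:8983/solr/update' \
         -H 'Content-Type: text/xml; charset=UTF-8' \
         --data-binary '<commit/>'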
Re: Re: Re: Re: Solr replication
Perfect. Thanks for everything, guys.

cheers
Y.
Re: Searching combined English-Japanese index
Yonik Seeley wrote:
> One quick thing to try is the python writer (wt=python) to see the
> actual unicode values of what you are getting back (since the python
> writer automatically escapes non-ascii). That can help rule out
> incorrect charset handling by clients.

Thanks for the tip; it turns out that the unicode values are wrong. I mean the browser correctly displays what is sent, but I don't know how Solr gets these values. For example, the python output is:

    'key':'honshu_server_ovo:application_List VPO NT Templates_integrated',
    'backend':'honshu_server',
    'service':'ovoconfig',
    'objectclass':'ovo:application',
    'objecttype':'integrated',
    'name':'List VPO NT Templates',
    'label':u'VPO \u00e3\u0083\u0086\u00e3\u0083\u00b3\u00e3\u0083\u0097\u00e3\u0083\u00ac\u00e3\u0083\u00bc\u00e3\u0083\u0088',
    'path':'',
    'context':'',
    'revision':'',
    'description':'',
    'ovo:application_name':'List VPO NT Templates'},

But in Limo the doc looks like this:

    key                   honshu_server_ovo:application_List VPO NT Templates_integrated
    backend               honshu_server
    service               ovoconfig
    objectclass           ovo:application
    objecttype            integrated
    name                  List VPO NT Templates
    label                 VPO テンプレート
    path
    context
    revision
    description
    ovo:application_name  List VPO NT Templates

I hope you can view the Japanese katakana in the label field. Somehow it is changed to completely different unicode characters in the search result.

Max
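Those escapes are consistent with UTF-8 bytes having been decoded as Latin-1 on the way into the index: テ (U+30C6) is the byte sequence e3 83 86 in UTF-8, and read as Latin-1 those three bytes become exactly the code points U+00E3 U+0083 U+0086 that open the label string in the python output above. A quick check, assuming a UTF-8 terminal:

    # print the raw UTF-8 bytes of the katakana テ
    printf 'テ' | hexdump -C    # -> e3 83 86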
Major CPU performance problems under heavy user load with solr 1.2
Hi there,

I am having some major CPU performance problems under heavy user load with Solr 1.2. I currently have approximately 4 million documents in the index, and I am doing some pretty heavy faceting on multi-valued fields. I know that faceting on multi-valued fields is expensive, but the CPU seems to max out (400%) under apache bench with just 5 identical concurrent requests, and I have the potential for many more concurrent requests than that given the large number of users that hit our site per day, so I am wondering if there are any workarounds.

Currently I am running the out-of-the-box Solr setup (the example Jetty application with my own schema.xml and solrconfig.xml) on a dual Intel dual-core 64-bit box, with 8 GB of RAM allocated to the start.jar process, dedicated to Solr, with no slaves.

I have set up some aggressive caching in solrconfig.xml for the filterCache (class="solr.LRUCache" size="300" initialSize="200") and have set HashDocSet to 1 to help with faceting, but I am still getting pretty poor performance. I have also tried autowarming the facets by performing a query that hits all my multi-valued facets, with no facet limits, across all the documents in the index. This does seem to reduce my query times a lot, because the filterCache grows to about 2.1 million lookups and the query finishes in about 70 seconds.

However, I have noticed an issue: each time I do an optimize or a commit after prewarming the facets, the cache gets cleared (according to the stats on the admin page), but the RSize of the process does not shrink, and queries get slow again. So I prewarm the facets again, and the memory usage keeps growing as if the cache is not being recycled. As a result, the prewarm query gets slower and slower each time this happens (after about 5 rounds of prewarm-then-commit, the query takes about 30 minutes... ugh) and I almost run out of memory.

Any thoughts on how to improve this and fix the memory issue?
Re: Schema version question
Thanks Yonik. I have not seen any issues with doing that, besides some unrelated performance issues I just posted in another thread.

Robert.

Robert Purdy wrote:
> I was wondering if anyone could help me. I just completed a full index of
> my data (about 4 million documents) and noticed that when I was first
> setting up the schema I set the version number to "1.2", thinking that Solr
> 1.2 uses schema version 1.2... ugh... so I am wondering if I can just set
> the schema version to 1.1 without having to rebuild the full index? I ask
> because I am hoping that, given an invalid schema version number, version
> 1.0 is not used by default and all my fields are now multivalued. Any help
> would be greatly appreciated. Thanks in advance
Re: solr/home
I suppose you've solved this problem already; I just ran into it, and solving it took the following steps:

- putting a proper solr.xml file, much like the one you have, in the directory \Tomcat 5.5\conf\Catalina\localhost, containing only the context-fragment XML [the fragment was stripped by the mail archive; a reconstructed example follows after this message]
- modifying solrconfig.xml (and this was another necessary step), changing the default

      ${solr.data.dir:./solr/data}

  to point to your actual solr home, e.g.:

      ${solr.data.dir:/usr/local/projects/my_app/current/solr-home/data}

To clarify my configuration: I work with Tomcat 5.5.20 under Windows XP. My current dataDir is actually:

      ${solr.data.dir:K:/solr/cur_solr/solr/data}

Maybe this could help! This information should be added to the SolrTomcat page (http://wiki.apache.org/solr/SolrTomcat) - it would have saved me hours.

yo

Matt Mitchell-2 wrote:
> Here you go:
>
> [context fragment XML stripped by the mail archive; only the attribute
> crossContext="true" survives]
>
> This is the same file I'm putting into the Tomcat manager "XML
> Configuration file URL" form input.
>
> Matt
>
> On Sep 6, 2007, at 3:25 PM, Tom Hill wrote:
>> It works for me. (fragments with solr 1.2 on tomcat 5.5.20)
>>
>> Could you post your fragment file?
>>
>> Tom
>>
>> On 9/6/07, Matt Mitchell <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> I recently upgraded to Solr 1.2. I've set it up through Tomcat using
>>> context fragment files. I deploy using the tomcat web manager. In the
>>> context fragment I set the environment variable solr/home. This used
>>> to work as expected: the solr/home value pointed to the directory
>>> where "data", "conf" etc. live. Now, this value doesn't get used and
>>> instead tomcat creates a new directory called "solr" and "solr/data"
>>> in the same directory where the context fragment file is located.
>>> It's not really a problem in this particular instance. I like the
>>> idea of it defaulting to "solr" in the same location as the context
>>> fragment file, as long as I can depend on it always working like
>>> that. It is a little puzzling why the value in my environment
>>> setting doesn't work, though.
>>>
>>> Has anyone else experienced this behavior?
>>>
>>> Matt
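Since the context-fragment XML above was eaten by the mail archive, here is a reconstruction of the usual shape of such a file, following the SolrTomcat wiki pattern; every path in it is an example, not the poster's actual configuration:

    # write the context fragment that Tomcat picks up for the /solr webapp
    cat > /opt/tomcat5.5/conf/Catalina/localhost/solr.xml <<'EOF'
    <Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String"
                   value="/opt/solr/home" override="true"/>
    </Context>
    EOF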
Re: Major CPU performance problems under heavy user load with solr 1.2
On 10/1/07, Robert Purdy <[EMAIL PROTECTED]> wrote:
> Hi there, I am having some major CPU performance problems with heavy user
> load with solr 1.2. [...] the CPU seems to max out (400%) with apache
> bench with just 5 identical concurrent requests

One can always max out the CPU (unless one is IO-bound) with more concurrent requests than the number of CPUs on the system. This isn't a problem by itself and would exist even if Solr were an order of magnitude slower or faster. You should be looking at things like the peak throughput (queries per second) you need to support and the latency of the requests (look at the 90th percentile, or whatever).

> [...] This does seem to reduce my query times by a lot because the
> filtercache grows to about 2.1 million lookups and finishes the query
> in about 70 secs.

OK, that's long. So focus on the latency of a single request instead of jumping straight to load testing. 2.1 million is a lot: what's the field with the largest number of unique values that you are faceting on?

> However I have noticed an issue with this because each time I do an
> optimize or a commit after prewarming the facets the cache gets cleared,
> according to the stats on the admin page, but the RSize does not shrink
> for the process, and the queries get slow again, so I prewarm the facets
> again and the memory usage keeps growing like the cache is not being
> recycled

The old searcher and cache won't be discarded until all requests using them have completed.

> and as a result the prewarm query starts to get slower and slower [...]
> and I almost run out of memory.
>
> Any thoughts on how to help improve this and fix the memory issue?

You could try the minDf param to reduce the number of facets stored in the cache and reduce memory consumption.

-Yonik
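A sketch of that last suggestion; the parameter name (facet.enum.cache.minDf), field name, and threshold are illustrative and assume a build recent enough to support it:

    # only use the filterCache for facet terms matching at least 25 documents,
    # trading a little CPU per request for a much smaller cache
    # (host, port, field, and threshold are examples)
    curl 'http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.enum.cache.minDf=25'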
Re: Searching combined English-Japanese index
On 10/1/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
> Thanks for the tip, it turns out that the unicode values are wrong... I
> mean the browser correctly displays what is sent. But I don't know how
> solr gets these values.

OK, so they never got into the index correctly. The most likely explanation is that the charset wasn't set correctly when the update message was sent to Solr.

-Yonik
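A minimal sketch of an update request with the charset made explicit; the endpoint and document are placeholders, not from this thread:

    # declare the body's charset so the container doesn't fall back to its
    # platform default when parsing the update message
    curl 'http://localhost:8983/solr/update' \
         -H 'Content-Type: text/xml; charset=UTF-8' \
         --data-binary '<add><doc><field name="id">1</field><field name="label">VPO テンプレート</field></doc></add>'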
Re: correlation between score and term frequency
Hi Alex,

do you mean you would like to know whether both results have the same relevance across the whole indexed content, and whether the two scores are directly comparable?

[EMAIL PROTECTED] wrote:
> I have a question about the correlation between the score value and the
> term frequency. Let's assume that we have one index over one set of
> documents, and that there is only one term in the query.
>
> If we now search for the term "car" and get a certain score value X, and
> we then search for the term "football" and also get the score value X,
> can we conclude that the two scores mean the same thing, i.e. are they
> directly comparable?
>
> Could you explain what correlation between the score value and the term
> frequency exists in my scenario?
Re: Letter-number transitions - can this be turned off
On 30-Sep-07, at 12:47 PM, F Knudson wrote:
> Is there a flag to disable the letter-number transition in the
> solr.WordDelimiterFilterFactory? We are indexing category codes and
> thesaurus codes for which this letter-number transition makes no sense.
> It is bloating the index (which is already large).

Have you considered using a different analyzer? If you want to continue using WDF, you could make a quick change around line 320:

    if (splitOnCaseChange == 0 &&
        (lastType & ALPHA) != 0 && (type & ALPHA) != 0) {
      // ALPHA->ALPHA: always ignore if case isn't considered.
    } else if ((lastType & UPPER) != 0 && (type & LOWER) != 0) {
      // UPPER->LOWER: Don't split
    } else {
      ...

by adding a clause that catches ALPHA -> NUMERIC (and vice versa) and ignores it.

Another approach, which I am using locally, is to keep the transitions but force tokens to be a minimum size (so r2d2 doesn't tokenize to four tokens but arrrdeee does). There is a patch here:

http://issues.apache.org/jira/browse/SOLR-293

If you vote for it, I promise to get it in for 1.3.

-Mike
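A sketch of the extra clause Mike describes, spliced into the if/else chain above; ALPHA comes from the snippet, but DIGIT is a guessed name for the numeric type flag, and none of this has been checked against the actual 1.2 source:

    // hypothetical: catch ALPHA->NUMERIC and NUMERIC->ALPHA transitions
    // and treat them like ALPHA->ALPHA, i.e. don't split on them
    } else if (((lastType & ALPHA) != 0 && (type & DIGIT) != 0)
            || ((lastType & DIGIT) != 0 && (type & ALPHA) != 0)) {
      // letter<->number transition: ignore
    }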
Re: correlation between score and term frequency
On 1-Oct-07, at 7:06 AM, [EMAIL PROTECTED] wrote:
> If we now search for the term "car" and get a certain score value X, and
> we then search for the term "football" and also get the score value X,
> can we conclude that the two scores mean the same thing, i.e. are they
> directly comparable?
>
> Could you explain what correlation between the score value and the term
> frequency exists in my scenario?

If the field has norms, there is a correlation, but the tf is unrecoverable from the score because of field-length normalization. Query normalization also makes it difficult to compare scores from query to query.

See http://lucene.apache.org/java/docs/scoring.html to start out, in particular the link to the Similarity class javadocs.

-Mike
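For reference, the practical scoring function from those Similarity javadocs is roughly

    score(q,d) = coord(q,d) \cdot queryNorm(q) \cdot
                 \sum_{t \in q} tf(t,d) \cdot idf(t)^2 \cdot boost(t) \cdot norm(t,d)

where norm(t,d) folds the field-length normalization into an index-time value, which is why the raw tf cannot be recovered from the final score alone.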
RE: Searching combined English-Japanese index
Some servlet containers don't do UTF-8 out of the box. There is information about this on the wiki.

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Monday, October 01, 2007 9:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching combined English-Japanese index

> OK, so they never got into the index correctly. The most likely
> explanation is that the charset wasn't set correctly when the update
> message was sent to Solr.
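To make the query side concrete as well: for Tomcat, the usual fix is the URIEncoding attribute on the HTTP connector in server.xml, and clients should send query terms percent-encoded as UTF-8. The port and query below are examples (the encoded term is テスト):

    # server.xml (Tomcat): <Connector port="8080" URIEncoding="UTF-8" ... />
    # then query with UTF-8 percent-encoding:
    curl 'http://localhost:8080/solr/select?q=%E3%83%86%E3%82%B9%E3%83%88'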
Questions about unit test assistant TestHarness
Hi -

Is anybody using the unit test assistant class TestHarness in Solr 1.2? I'm trying to use it in Eclipse and found a few problems with classloading; these might be a quirk of using it with Eclipse. I also found a bug in the commit() function where '(Object)' should be '(Object[])'.

Are all of these problems fixed in the Solr 1.3 trunk? Should I just grab whatever's there and use it with 1.2?

Thanks,
Lance Norskog
Re: Questions about unit test assistant TestHarness
What error are you getting exactly? Do you only get the error running from Eclipse, or do you also get it running from ant? The TestHarness class is used in almost all the tests, so 'yes', it is used with Solr 1.2.

ryan
RE: correlation between score and term frequency
Yes, that was the meaning of my question! Can you answer it?

-----Original Message-----
From: Joseph Doehr [mailto:[EMAIL PROTECTED]]
Sent: Monday, 1 October 2007 20:00
To: solr-user@lucene.apache.org
Subject: Re: correlation between score and term frequency

> Hi Alex,
>
> do you mean you would like to know whether both results have the same
> relevance across the whole indexed content, and whether the two scores
> are directly comparable?