system architecture question when using solr/lucene
We are currently looking at large numbers of queries/sec and would like to optimize that as much as possible. The special need is that we would like to show specific results based on a specific field, the territory field: depending on where in the world you're coming from, we'd like to show you specific results. The index is very large (currently 2 million rows) and could grow even larger (2-3 times) in the future.

How do we accomplish this, given that we have some domain knowledge (the territory) to use to our advantage? Is there a way we can hint to Solr/Lucene to use this information to provide better results? We could use filters on territory, or we could use different indexes for different territories (individually or in combination). Are there any other ways to do this? How do we figure out the best approach in this situation?

--
View this message in context: http://www.nabble.com/system-architecture-question-when-using-solr-lucene-tf3759225.html#a10625155
Sent from the Solr - User mailing list archive at Nabble.com.
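As an illustration of the filter option mentioned above: the territory restriction can be moved out of the main query string and into Solr's `fq` parameter, so the territory document set is cached independently of the user's search terms. The sketch below (hypothetical helper, assumed host/core and field name `territory`) just builds such a request URL.

```python
from urllib.parse import urlencode

# Hypothetical helper: keep the user's search terms in q and push the
# territory restriction into fq, so Solr can cache the territory DocSet
# in its filterCache separately from the query-result cache.
# Host, core path, and the "territory" field name are assumptions.
def territory_query(user_query, territory, host="http://localhost:8983/solr"):
    params = {
        "q": user_query,                  # the user's actual search terms
        "fq": f"territory:{territory}",   # cached separately by Solr
        "wt": "json",
    }
    return f"{host}/select?{urlencode(params)}"

print(territory_query("music player", "US"))
```

Because every request from a given region repeats the same `fq`, the cached filter is reused across otherwise distinct queries, which is the main advantage over repeating `AND territory:US` inside `q`.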
Re: Question about delete
I believe that in Lucene, at least, deleting documents only marks them for deletion; the actual delete happens only after the IndexReader is closed. Not sure about Solr.

Ajanta.

James liu wrote:
> but index file size not changed and maxDoc not changed.

2007/5/10, Nick Jenkin <[EMAIL PROTECTED]>:
> Hi James,
> As I understand it, numDocs is the number of documents currently in your
> index, and maxDoc is the most documents you have ever had in your index.
> By the looks of it you currently have no documents in your index, so your
> delete query must have deleted everything. That would be why you are
> getting no results.
> -Nick
>
> On 5/10/07, James liu <[EMAIL PROTECTED]> wrote:
> > i use command like this:
> >
> > curl http://localhost:8983/solr/update --data-binary 'name:DDR'
> > curl http://localhost:8983/solr/update --data-binary ''
> >
> > and i get:
> >
> > numDocs : 0
> > maxDoc : 1218819
> >
> > When I search for something that existed before the delete, I find
> > nothing. But the index file size has not changed, and maxDoc has not
> > changed. Why does this happen?
> >
> > --
> > regards
> > jl
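The numbers James sees (numDocs 0, maxDoc unchanged, index size unchanged) are consistent with the mark-for-deletion behavior described above. A toy model, not Lucene itself, of how those two counters can diverge until segments are merged/optimized:

```python
# Toy model (NOT Lucene) of why numDocs can drop to 0 while maxDoc and the
# on-disk index size stay the same: a delete only sets a tombstone bit,
# and the space is reclaimed later, when segments are merged/optimized.
class ToyIndex:
    def __init__(self):
        self.docs = []          # every doc slot ever allocated
        self.deleted = set()    # tombstones

    def add(self, doc):
        self.docs.append(doc)

    @property
    def max_doc(self):          # highest doc slot ever used
        return len(self.docs)

    @property
    def num_docs(self):         # live (non-deleted) documents
        return len(self.docs) - len(self.deleted)

    def delete_by_query(self, predicate):
        for i, doc in enumerate(self.docs):
            if i not in self.deleted and predicate(doc):
                self.deleted.add(i)      # mark only; nothing is removed yet

    def optimize(self):
        # merging finally drops tombstoned docs and shrinks the index
        self.docs = [d for i, d in enumerate(self.docs)
                     if i not in self.deleted]
        self.deleted.clear()

idx = ToyIndex()
for _ in range(5):
    idx.add({"name": "DDR"})
idx.delete_by_query(lambda d: d["name"] == "DDR")
print(idx.num_docs, idx.max_doc)   # 0 5 -- like numDocs=0, maxDoc unchanged
idx.optimize()
print(idx.num_docs, idx.max_doc)   # 0 0 -- space reclaimed after merging
```

Searches stop matching immediately after the delete (tombstoned docs are filtered out), which matches what James observed: no results, yet no change in maxDoc or file size.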
Re: system architecture question when using solr/lucene
Thanks to both of you for your responses, Otis and Chris. We did manage to run some benchmarks, but we think there are some surprising results here: it seems that caching is not affecting performance that much. Is that because of the small index size? Do these numbers seem OK, or is there room for improvement in any way you can think of?

Regards, Ajanta.

Results from development servers <https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Resultsfromdevelopmentservers>

Solr HTTP Interface Configurations <https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Configurations>

* Index size is approx 500M (a little more)
* Tomcat 6.0
* Solr (nightly build dated 2007-04-19)
* Nginx v0.5.20 is used as the load balancer (very lightweight in size, functionality, and CPU consumption), with round-robin distribution of requests.
* Grinder v3.0-beta33 was used for testing. It allows one to write custom scripts (in Jython) and has a nice GUI for presenting results.
* Server config: Intel® Xeon™ 3040 1.87GHz, 1066MHz bus, 4GB RAM (system boot usage 300MB), 8GB swap
* The query list was custom-built from the web, with some queries having AND/OR between terms. The territory field was always US.
Benchmarks <https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Benchmarks>

Threads          Servers  Total/Unique queries  Caching          Performance (queries/sec)
25               2        2500/1950             D*               500
25               2        2500/2500             D                142
40               2        4000/4000             D                100
40               2        4000/3000             D                166
40               3        4000/4000             D                133
40 (backtoback)  3        4000/4000             D                333
40               3        4000/3300             D                142
10               3        2000/2000             D                434
40               3        4000/4000             Q.Caching: 1024  158
40 (backtoback)  3        4000/4000             Q.Caching: 1024  384

Without US territory <https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#WithoutUSterritory>

Threads  Servers  Total/Unique queries  Caching  Performance (queries/sec)
40       3        4000/4000             D        142
40       2        4000/4000             D        100

Moving territory:US from query to Filters <https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Movingterritory:USfromquerytoFilters>

Threads  Servers  Total/Unique queries  Caching            Performance (queries/sec)
40       3        4000/4000             F.Caching: 16384   133
40       3        4000/3400             F.Caching: 16384   147

* D means caching was disabled
* *backtoback* means the same run was repeated immediately
* CPU usage while the server was processing queries was ~40-50%
* Tomcat shows 3% memory usage

Otis Gospodnetic wrote:
> Hi Ajanta,
>
> I think you answered your own question: either use filters or partition the
> index. The advantage of partitioning is that you can update the indices
> separately without affecting filters, cache, searcher, etc. for the other
> indices (i.e. no need to warm up with data from the other indices). If you
> are indeed working with high QPS, partitioning also lets you scale indices
> separately (are all territories the same size document-wise? do they all
> get the same QPS?). The disadvantage is that you can't easily run queries
> that don't depend on a territory.
>
> Otis
Lucene Consulting -- http://lucene-consulting.com/

----- Original Message -----
From: Ajanta <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, May 15, 2007 11:35:13 AM
Subject: system architecture question when using solr/lucene
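One sanity check on the benchmark tables above: the gap between total and unique queries bounds how much a query-result cache could possibly help, since only repeated queries can hit it. A few rows recomputed (figures copied from the tables; the formula is just repeats / total):

```python
# Rough upper bound on query-result-cache benefit for a few runs above:
# only repeated queries (total - unique) can be served from the cache.
runs = [
    # (threads, total, unique, measured q/s)
    (25, 2500, 1950, 500),
    (25, 2500, 2500, 142),
    (40, 4000, 3000, 166),
    (40, 4000, 4000, 100),
]
for threads, total, unique, qps in runs:
    repeat_frac = (total - unique) / total
    print(f"{threads} threads: {repeat_frac:.0%} repeated queries "
          f"-> measured {qps} q/s")
```

With only 0-25% repeats in most runs, a small query-result cache has limited headroom, which may partly explain why caching appears to matter so little here; the OS page cache over a ~500M index likely dominates.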
Re: To make sure XML is UTF-8
Hi,

Not sure if you've found a solution for your problem yet, but I dealt with a similar issue, described below, and hopefully it will help you too. Of course, this assumes that your original data is in UTF-8 format. The default charset encoding for MySQL is Latin1, our display format was UTF-8, and that was the problem. These are the steps I performed to get the search data into UTF-8 format.

Changed my.cnf as follows (though you can avoid this by executing the equivalent commands on every new connection if you don't want the whole db in UTF-8 format):

    [mysqld]
    # setting default charset to utf-8
    collation_server=utf8_unicode_ci
    character_set_server=utf8
    default-character-set=utf8

    [client]
    default-character-set=utf8

After changing, I restarted mysqld, re-created the db, re-inserted all the data using my data-insert code (a Java program), and re-created the Solr index. The key is to change the settings for both the [mysqld] and [client] sections in my.cnf: the mysqld setting makes sure that MySQL doesn't convert the data to Latin1 while storing it, and the client setting ensures that the data is not converted while being accessed, going into or coming out of the server.

Ajanta.

Tiong Jeffrey wrote:
> Ya you are right! After I changed it to UTF-8 the error is still there... I
> looked at the log, and this is what appears:
>
> 127.0.0.1 - - [10/06/2007:03:52:06 +] "POST /solr/update HTTP/1.1" 500 4022
>
> I tried to search but couldn't understand what this error is. Does anybody
> have any idea? Thanks!!!
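The failure mode Ajanta describes can be reproduced outside MySQL entirely. The sketch below (independent of any database) shows what happens when UTF-8 bytes are written or read back as Latin-1: each multi-byte character turns into two garbage characters, and the data survives only if both sides of the connection agree on the charset.

```python
# Demonstration of the utf-8 / latin-1 mismatch described above.
original = "r\u00e9sum\u00e9"             # "rĆ©sumĆ©"

# utf-8 bytes misinterpreted as latin-1 (e.g. server stores utf8,
# client section still assumes latin1): classic mojibake.
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)                           # 'rĆĀ©sumĆĀ©'

# the damage is reversible only if no further conversion happened
restored = mojibake.encode("latin-1").decode("utf-8")
print(restored == original)               # True
```

This is why fixing only one of the [mysqld] / [client] sections is not enough: a single mismatched hop anywhere between storage and display re-introduces the corruption.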
On 6/10/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: way during indexing is - "FATAL: Connection error (is Solr running at
: http://localhost/solr/update ?): java.io.IOException: Server returned
: HTTP Response code: 500 for URL: http://local/solr/update"
: 4. Although the error code doesn't say it is an XML UTF-8 encoding error,
: I did a bit of research and looked at the XML file that I have; it doesn't
: conform to UTF-8 encoding.

I *strongly* encourage you to look at the body of the response and/or the error log of your servlet container and find out *exactly* what the cause of the error is ... you could spend a lot of time working on this and discover it's not your real problem.

-Hoss
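Hoss's advice in code form: when the server answers 500, the response body usually carries the real cause, and with `urllib` that body is readable from the raised `HTTPError` object itself. The local test server below is a stand-in that merely simulates a Solr returning 500 with details; the error text it serves is an invented example, not output from a real Solr.

```python
import threading
import urllib.request
from urllib.error import HTTPError
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakeSolr(BaseHTTPRequestHandler):
    """Simulated server: always answers 500 with a detail message."""
    def do_POST(self):
        body = (b"org.xml.sax.SAXParseException: "
                b"Invalid byte 2 of 2-byte UTF-8 sequence")
        self.send_response(500)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):   # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), FakeSolr)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/solr/update"

try:
    urllib.request.urlopen(url, data=b"<add>...</add>")
except HTTPError as e:
    status = e.code
    detail = e.read().decode("utf-8")   # the part worth reading
    print(status, detail)

server.shutdown()
```

The same idea applies on the command line: fetching the response body of the failing POST (rather than just noting the status code) turns "500" into an actionable stack trace.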