- Is the Solr PHP in the wiki working out of the box for anyone? Could you show your php.ini? Did you performance-tune your PHP?
2006/9/10, Brian Lucas <[EMAIL PROTECTED]>:
Hi Michael,

I apologize for the lack of testing on the SolPHP. I had to strip it down significantly to turn it into a general class that would be usable, and the version up there has not been extensively tested yet (I'm almost ready to get back to that and revise it); plus, much of my coding is done in Rails at the moment.

However... if you have a new version, could you send it my way or just upload it to the wiki? I'd like to take a look at the changes and either put your revised version up there or integrate both versions into a cleaner revision of the one already there.

With respect to batch queries, it's already designed to do that (that's why you see "array($array)" in the example: it accepts an array of updates), but I'd definitely like to see how you revised it.

Thanks,
Brian

-----Original Message-----
From: Michael Imbeault [mailto:[EMAIL PROTECTED]
Sent: Saturday, September 09, 2006 12:30 PM
To: solr-user@lucene.apache.org
Subject: Got it working! And some questions

First of all, in reference to http://www.mail-archive.com/solr-user@lucene.apache.org/msg00808.html , I got it working! The problem(s) came from solPHP; to be honest, the implementation in the wiki isn't really working, at least not for me. I had to modify it significantly in multiple places to get it working (Tomcat 5.5, WAMP and Windows XP).

The main problem was that addIndex was sending one document at a time to Solr; this caused problems after a few thousand docs because I was running out of resources. I modified solr_update.php to handle batch updates, and I'm now sending batches of 1000 docs at a time. Great indexing speed.

I also had a slight problem with the curl function in solr_update.php: the custom HTTP header wasn't recognized. I now use

curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);

which is much simpler, and now everything works!
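A minimal sketch of the batching and curl change described above, assuming Solr's XML update endpoint at /solr/update; the function names and the localhost URL are illustrative, not part of the wiki's solPHP class:

```php
<?php
// Build one <add> body for a whole batch of documents instead of
// posting one document per request.
function solr_build_add_xml(array $docs) {
    $xml = '<add>';
    foreach ($docs as $doc) {
        $xml .= '<doc>';
        foreach ($doc as $field => $value) {
            // Escape field values so the posted body stays well-formed XML.
            $xml .= '<field name="' . htmlspecialchars($field, ENT_QUOTES)
                  . '">' . htmlspecialchars($value, ENT_QUOTES) . '</field>';
        }
        $xml .= '</doc>';
    }
    return $xml . '</add>';
}

// Plain POST body via CURLOPT_POSTFIELDS, as in the fix above,
// with the content type set through CURLOPT_HTTPHEADER.
function solr_post($url, $post_string) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml; charset=utf-8'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

// Usage sketch: send batches of 1000 docs at a time.
// foreach (array_chunk($all_docs, 1000) as $batch) {
//     solr_post('http://localhost:8080/solr/update', solr_build_add_xml($batch));
// }
```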
So far I've indexed 15,000,000 documents (my whole collection, basically) and the performance I'm getting is INCREDIBLE: sub-100ms query times without warmup and with no optimization at all on a 7 GB index (and with the cache, it gets stupid fast)! Seriously, Solr amazes me every time I use it.

I increased the HashDocSet maxSize to 75000 and will continue to tune this value; it helped a great deal. I will try the disMax handler soon too; right now the standard one is great. And I will index with a better stopword file; the default one could really use improvements.

Some questions (couldn't find the answers in the docs):

- Is the Solr PHP in the wiki working out of the box for anyone? If not, we could fix the wiki...
- What is the loadFactor variable of HashDocSet? Should I tune it too?
- What are the units of the size value of the caches? Megabytes, number of queries, kilobytes? It isn't described anywhere.
- Is there any way to programmatically change the OR/AND preference of the query parser? I set it to AND by default for user queries, but I'd like to set it to OR for some server-side queries I have to do (find related articles, ordered by score).
- What's the difference between the two commit types, blocking and non-blocking? I tried both and didn't see any difference at all.
- Every time I issue an <optimize> command, I get the following in my Catalina logs; should I do anything about it?

9-Sep-2006 2:24:40 PM org.apache.solr.core.SolrException log
SEVERE: Exception during commit/optimize: java.io.EOFException: no more data available - expected end tag </optimize> to close start tag <optimize> from line 1, parser stopped on START_TAG seen <optimize>... @1:10

- Are there any benefits to setting the allowed memory for Tomcat higher? Right now I'm allocating 384 MB.

Can't wait to try the new faceted queries... seriously, Solr is really, really awesome so far. Thanks for all your work, and sorry for all the questions!

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
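On the <optimize> exception above: the stack trace says the parser saw a start tag <optimize> and then ran out of data, i.e. the posted body was not well-formed XML. A hedged guess is that the client sent an unterminated "<optimize>" rather than the self-closing form. The sketch below only illustrates the well-formedness difference locally; Solr does the real parsing server-side:

```php
<?php
// Illustration only: show that "<optimize>" alone is not well-formed
// XML, while the self-closing "<optimize/>" is (same for "<commit/>").
libxml_use_internal_errors(true); // suppress libxml parse warnings

$bad  = '<optimize>';   // unterminated start tag: parser hits EOF
$good = '<optimize/>';  // well-formed, parses cleanly

var_dump(simplexml_load_string($bad)  !== false); // bool(false)
var_dump(simplexml_load_string($good) !== false); // bool(true)
```

So when posting update commands, sending the body '<optimize/>' (or '<commit/>') through the same curl POST used for adds should avoid the EOFException.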
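On the OR/AND question: Solr's schema.xml supports an index-wide default operator via the solrQueryParser element; whether the September 2006 nightly already includes it is an assumption worth checking, and note this sets a global default rather than a per-query switch:

```xml
<!-- schema.xml: index-wide default operator for the standard query
     parser. Assumption: may not exist in the nightly discussed here. -->
<solrQueryParser defaultOperator="AND"/>
```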