Hi Michael,

I apologize for the lack of testing on SolPHP.  I had to strip it down
significantly to turn it into a general class that would be broadly usable,
and the version up there hasn't been extensively tested yet (I'm almost
ready to get back to it and revise it); on top of that, most of my coding
is in Rails at the moment.  However...

If you have a new version, could you send it my way or just upload it to
the wiki?  I'd like to take a look at the changes and either put your
revised version up there or integrate both into a cleaner revision of
what's already posted.

With respect to batch queries, the class is already designed to handle
them; that's why you see "array($array)" in the example, since addIndex
accepts an array of updates.  But I'd definitely like to see how you
revised it.

Thanks,
Brian


-----Original Message-----
From: Michael Imbeault [mailto:[EMAIL PROTECTED] 
Sent: Saturday, September 09, 2006 12:30 PM
To: solr-user@lucene.apache.org
Subject: Got it working! And some questions

First of all, in reference to 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg00808.html , 
I got it working! The problems were coming from solPHP; the 
implementation on the wiki isn't really working as-is, to be honest, at 
least not for me. I had to modify it significantly in multiple places to 
get it running. My setup: Tomcat 5.5, WAMP, and Windows XP.

The main problem was that addIndex was sending one doc at a time to Solr, 
which caused problems after a few thousand docs because I was running out 
of resources. I modified solr_update.php to handle batch queries, and I'm 
now sending batches of 1000 docs at a time. Great indexing speed.
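The batching itself is nothing fancy; the idea is just to wrap many <doc>
elements in a single <add> and POST them in one request (a simplified
sketch, not my exact code; solr_post is the curl helper shown below):

    // Simplified sketch of the batching added to solr_update.php.
    // Build one <add> containing every doc in the batch.
    function solr_batch_add($docs) {
        $xml = '<add>';
        foreach ($docs as $doc) {
            $xml .= '<doc>';
            foreach ($doc as $field => $value) {
                $xml .= '<field name="' . htmlspecialchars($field) . '">'
                      . htmlspecialchars($value) . '</field>';
            }
            $xml .= '</doc>';
        }
        return $xml . '</add>';
    }

    // Send 1000 docs per request instead of one at a time.
    foreach (array_chunk($all_docs, 1000) as $batch) {
        solr_post(solr_batch_add($batch));
    }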

I also had a slight problem with the curl function in solr_update.php: the 
custom HTTP header wasn't recognized. I now use curl_setopt($ch, 
CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string); 
instead. Much simpler, and now everything works!
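For anyone hitting the same thing, the whole call looks roughly like this
(a sketch, not my exact code; the URL and the solr_post name are
placeholders, and the Content-Type line is optional but matches what
Solr's update handler expects):

    // Sketch: POST an update body (<add>, <commit>, <optimize>) to Solr.
    // URL/port are examples; adjust for your install.
    function solr_post($post_string) {
        $ch = curl_init('http://localhost:8080/solr/update');
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
        // Set Content-Type via CURLOPT_HTTPHEADER instead of a
        // hand-built header string (the part that broke for me).
        curl_setopt($ch, CURLOPT_HTTPHEADER,
                    array('Content-Type: text/xml; charset=utf-8'));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $response = curl_exec($ch);
        curl_close($ch);
        return $response;
    }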

So far I've indexed 15,000,000 documents (my whole collection, 
basically), and the performance I'm getting is INCREDIBLE: sub-100ms 
query times without warmup and no optimization at all on a 7 GB index, 
and with the cache it gets stupidly fast! Seriously, Solr amazes me every 
time I use it. I increased the HashDocSet maxSize to 75000 and will keep 
tuning this value; it helped a great deal. I'll try the DisMax handler 
soon too; right now the standard one is great. And I'll re-index with a 
better stopword file; the default one could really use improvement.
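(For reference, that's this line in solrconfig.xml; 75000 is just the
value that has worked best for me so far, and I haven't touched
loadFactor, which is what my question below is about:)

    <!-- solrconfig.xml: raised maxSize from the default -->
    <HashDocSet maxSize="75000" loadFactor="0.75"/>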

Some questions (couldn't find the answer in the docs):

- Is the Solr PHP code on the wiki working out of the box for anyone? If 
not, we could update the wiki...

- What is the loadFactor variable of HashDocSet? Should I optimize it too?

- What are the units for the cache size values? Megabytes, number of 
queries, kilobytes? It isn't described anywhere.

- Is there any way to programmatically change the OR/AND default of the 
query parser? I set it to AND by default for user queries, but I'd like 
to use OR for some server-side queries I have to run (finding related 
articles, ordered by score).
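(For context, the AND default I mention is the one set in schema.xml:

    <!-- schema.xml: default operator for the standard query parser -->
    <solrQueryParser defaultOperator="AND"/>

What I'm after is a way to override it per query.)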

- What's the difference between the two commit types, blocking and 
non-blocking? I tried both and didn't see any difference at all.

- Every time I issue an <optimize> command, I get the following in my 
Catalina logs. Should I do anything about it?

 9-Sep-2006 2:24:40 PM org.apache.solr.core.SolrException log
SEVERE: Exception during commit/optimize:java.io.EOFException: no more 
data available - expected end tag </optimize> to close start tag 
<optimize> from line 1, parser stopped on START_TAG seen <optimize>... @1:10

- Are there any benefits to giving Tomcat more memory? Right now I'm 
allocating 384 MB.

Can't wait to try the new Faceted Queries... seriously, Solr is really, 
really awesome so far. Thanks for all your work, and sorry for all the 
questions!

-- 
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
