Doc add limit, im experiencing it too

2006-09-05 Thread Michael Imbeault
I'm using a PHP interface and curl to post my XML, one document at a time, and commit every 100 documents. Indexing 3 docs, it hangs at maybe 5000. Has anyone got an idea on this one? It would be helpful. I may try to switch to Jetty tomorrow if nothing works :( -- Michael Imbeault CHUL

Doc add limit problem, old issue

2006-09-05 Thread Michael Imbeault
I'm using a PHP interface and curl to post my XML, one document at a time, and commit every 100 documents. Indexing 3 docs, it hangs at maybe 5000. Has anyone got an idea on this one? It would be helpful. I may try to switch to Jetty tomorrow if nothing works :( -- Michael Imbeault CHUL

Got it working! And some questions

2006-09-09 Thread Michael Imbeault
try the new Faceted Queries... Seriously, Solr is really, really awesome so far. Thanks for all your work, and sorry for all the questions! -- Michael Imbeault CHUL Research Center (CHUQ) 2705 boul. Laurier Ste-Foy, QC, Canada, G1V 4G2 Tel: (418) 654-2705, Fax: (418) 654-2212

Re: Doc add limit problem, old issue

2006-09-09 Thread Michael Imbeault
Fixed my problem: the implementation of solPHP was faulty. It was sending one doc at a time (one curl call per doc), and the system quickly ran out of resources. I modified it to send documents in batches (1000 at a time), and everything is #1! Michael Imbeault wrote: Old issue (see http://www.mail
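A minimal sketch of the batching fix, written in Python rather than PHP: it groups documents into batches of 1000 and builds one Solr XML `<add>` message per batch, so each batch needs a single HTTP POST instead of one curl call per document. The function names and the `id`/`title` fields are hypothetical, and actually posting each message to the update handler (followed by a `<commit/>`) is left out.

```python
from xml.sax.saxutils import escape, quoteattr


def add_message(docs):
    """Build one Solr XML <add> message containing every doc in `docs`."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # escape() handles &, < and >; quoteattr() quotes the field name.
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def batches(docs, size=1000):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


# Hypothetical data: 2500 docs become 3 requests instead of 2500.
docs = [{"id": i, "title": f"Paper {i}"} for i in range(2500)]
messages = [add_message(batch) for batch in batches(docs)]
print(len(messages))  # 3
```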

Re: Got it working! And some questions

2006-09-10 Thread Michael Imbeault
" instead of just "" : - Any benefits of setting the allowed memory for Tomcat higher? Right : now im allocating 384 megs. the more memory you've got, the more cachng you can support .. but if your index changes so frequently compared to the rate of *unique* queries you get

Re: Got it working! And some questions

2006-09-11 Thread Michael Imbeault
Hello Erik, Thanks for adding that feature! "do" is fine with me, if "op" is already used (not sure about this one). Erik Hatcher wrote: On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote: I'm still a little disappointed that I can't change the OR/AND

MoreLikeThis class in Lucene within Solr?

2006-09-11 Thread Michael Imbeault
Right now I'm determining similar docs by just querying for the whole body with OR between words, and it's not very efficient performance-wise. I've never coded in Java, so I really don't know where I should start... Thanks, -- Michael Imbeault

Re: MoreLikeThis class in Lucene within Solr?

2006-09-12 Thread Michael Imbeault
Well, I kinda expected it. 1000+ word queries on a 15-million-doc collection, you don't expect miracles). At first glance I think it searches for the most 'relevant' words, am I right? What kind of performance are you getting with it? Thanks a lot, Michael Imbeault
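The guess above matches the general idea as I understand it: instead of OR-ing every word of the body, pick a few terms that are frequent in the source document but rare in the collection, and query on those. The sketch below scores terms with a simplified tf × log(N / df); it illustrates that assumption and is not Lucene's actual MoreLikeThis scoring. The document-frequency numbers are made up.

```python
import math
from collections import Counter


def interesting_terms(doc_tokens, doc_freq, num_docs, max_terms=25):
    """Rank a document's terms by tf * log(N / df) and keep the top few."""
    tf = Counter(doc_tokens)
    scores = {}
    for term, freq in tf.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term unknown to the index; no idf available
        scores[term] = freq * math.log(num_docs / df)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:max_terms]


# Made-up statistics for a 1000-document collection.
doc_freq = {"the": 1000, "hiv": 12, "protease": 3, "inhibitor": 20}
tokens = "the hiv protease the inhibitor the hiv".split()
terms = interesting_terms(tokens, doc_freq, num_docs=1000, max_terms=3)
query = " OR ".join(terms)
print(query)  # hiv OR protease OR inhibitor
```

A term that appears in every document ("the") gets an idf of zero, so it never crowds out the rarer, more discriminating words.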

Re: MoreLikeThis class in Lucene within Solr?

2006-09-13 Thread Michael Imbeault
Thanks for the answer, and try to enjoy your vacation / travel! Can't wait to be able to interface with MoreLikeThis within Solr! Michael Imbeault Erik Hatcher wrote: O

Facet performance with heterogeneous 'facets'?

2006-09-18 Thread Michael Imbeault
sort... which shouldn't yield good performance no matter what, sadly. Is there any other way I could achieve what I'm trying to do? I just want a list of the most frequent (top 5) authors present in the results of a query. Thanks, -- Michael Imbeault

Re: Facet performance with heterogeneous 'facets'?

2006-09-18 Thread Michael Imbeault
the whole index no matter what the result set is when faceting on a string field. Am I doing something wrong? Michael Imbeault wrote: Been playing

Re: Facet performance with heterogeneous 'facets'?

2006-09-18 Thread Michael Imbeault
Yonik Seeley wrote: I noticed this too, and have been thinking about ways to fix it. The root of the problem is that Lucene, like all full-text search engines, uses inverted indices. It's fast and easy to get all documents for a particular term, but getting all terms for a document is
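To make the inverted-index point concrete, here is a toy model of two ways to count facet values for a result set. `facet_by_terms` walks the inverted index (term → documents) and intersects each term's document set with the results, so its cost grows with the number of distinct authors in the whole index, however small the result set is. `facet_by_docs` walks only the matching documents, but it needs the document → terms direction, which an inverted index does not provide directly. Both are plain-Python illustrations with hypothetical data, not Solr's implementation.

```python
from collections import Counter


def facet_by_terms(postings, result_docs, limit=5):
    """postings: term -> set of doc ids. Visits every term in the field."""
    counts = {}
    for term, docs in postings.items():
        n = len(docs & result_docs)
        if n:
            counts[term] = n
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


def facet_by_docs(doc_values, result_docs, limit=5):
    """doc_values: doc id -> list of terms. Visits only the matching docs."""
    counts = Counter()
    for doc in result_docs:
        counts.update(doc_values.get(doc, []))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


postings = {"Smith J": {1, 2, 3}, "Doe A": {2, 4}, "Lee K": {5}}
# The same data "uninverted" into the document -> terms direction.
doc_values = {}
for term, docs in postings.items():
    for doc in docs:
        doc_values.setdefault(doc, []).append(term)

results = {1, 2, 4}
print(facet_by_terms(postings, results))   # [('Doe A', 2), ('Smith J', 2)]
print(facet_by_docs(doc_values, results))  # [('Doe A', 2), ('Smith J', 2)]
```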

Re: Facet performance with heterogeneous 'facets'?

2006-09-18 Thread Michael Imbeault
n kb, and someone on the list told me it was the number of items, but I don't quite get it. Better documentation on that would be welcome :) Also, are there any plans to add an option to skip the facet search if the result set is too big? To avoid 40-second queries if the docset is too

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Michael Imbeault
Thanks for all the great answers. Quick question: did you say you are faceting on the first name field separately from the last name field? ... why? You misunderstood. I'm faceting on the first author and the last author of the list. Life science papers have author lists, and the first one is u

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Michael Imbeault
list (been reading it off and on), but I'm afraid I couldn't code my way out of a paper bag in Java. I'll contribute to the Solr wiki (the SolrPHP part in particular) as soon as I can. That's the least I can do! Btw, any plans for a facet cache? Michael Imbeault

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Michael Imbeault
="hiv red blood"&start=0&rows=20&fl=article_title+authors+journal_iso+pubdate+pmid+score&qt=standard&facet=true&facet.field=first_author&facet.limit=5&facet.missing=false&facet.zeros=false I'll do more testing on the weekend, Michael Imbeault CHUL
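The request above, rebuilt with Python's `urllib.parse.urlencode` so spaces and quotes are escaped consistently. The host, the `/solr/select` path, and the `q` parameter name are assumptions, because the start of the URL is cut off in the archive; the remaining parameters are copied from the message.

```python
from urllib.parse import urlencode

# Parameters from the request above; a list keeps them in order.
params = [
    ("q", '"hiv red blood"'),  # parameter name assumed; value as in the message
    ("start", 0),
    ("rows", 20),
    ("fl", "article_title authors journal_iso pubdate pmid score"),
    ("qt", "standard"),
    ("facet", "true"),
    ("facet.field", "first_author"),
    ("facet.limit", 5),
    ("facet.missing", "false"),
    ("facet.zeros", "false"),
]
query_string = urlencode(params)  # spaces become '+', double quotes become %22
url = "http://localhost:8983/solr/select?" + query_string  # hypothetical host/path
print(url)
```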

Re: Facet performance with heterogeneous 'facets'?

2006-09-22 Thread Michael Imbeault
heap space. I'm sure this problem will go away on a server with more than the 500 megs I can currently allocate to Tomcat. Yonik Seeley wrote: On 9/22/06,

Spellchecker in Solr?

2006-10-30 Thread Michael Imbeault
a nice idea. Sadly, I'm no Java developer, so I fear I won't be the one coding that :( Thanks, -- Michael Imbeault

Re: Spellchecker in Solr?

2006-10-30 Thread Michael Imbeault
it just at the 'I might do this in the future' stage? Thanks, -- Michael Imbeault Kevin Lewandowski wrote: I have not done one but have been planning to do it bas

Re: Spellchecker in Solr?

2006-10-31 Thread Michael Imbeault
it's a lot trickier. For my needs, just a spelling suggester would be perfect. Would it require Java programming, or could I get away with the current Solr (adding n-gram fields and querying on them)? Thanks, Michael Imbeault
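The n-gram idea asked about here can be illustrated in a few lines: break words into character trigrams, then rank dictionary words by how many trigrams they share with the misspelled input. In Solr the trigrams would sit in an indexed field and the overlap would come from an OR query over them. This in-memory version, the `^`/`$` word-boundary markers, and the Jaccard score are assumptions chosen for illustration.

```python
def ngrams(word, n=3):
    """Character n-grams, with ^ and $ marking the word boundaries."""
    padded = f"^{word.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def suggest(term, dictionary, n=3, top=3):
    """Rank dictionary words by Jaccard overlap of their n-gram sets."""
    grams = ngrams(term, n)
    scored = []
    for candidate in dictionary:
        cand = ngrams(candidate, n)
        union = grams | cand
        if not union:
            continue  # both words too short to produce any n-grams
        score = len(grams & cand) / len(union)
        if score > 0:
            scored.append((score, candidate))
    scored.sort(key=lambda sc: (-sc[0], sc[1]))
    return [word for _, word in scored[:top]]


print(suggest("hepatits", ["heparin", "hepatitis", "hepcidin"]))
# ['hepatitis', 'heparin', 'hepcidin']
```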

Sentence level searching

2006-11-12 Thread Michael Imbeault
uld I do this within Solr? Are there any plans to implement such functionality as standard? Thanks for the help, -- Michael Imbeault

Index & search questions; special cases

2006-11-12 Thread Michael Imbeault
letters / numbers. If a user searches for HIV 1 hepatitis, I'd rewrite it as ("HIV 1" AND hepatitis) OR ("1 hepatitis" AND hiv). Is this a sensible solution? Thanks, -- Michael Imbeault
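A minimal sketch of the proposed rewrite, assuming the rule is: whenever a query token is purely numeric, pair it with each neighbouring token as a phrase and AND in the remaining tokens. The function name and the digits-only rule are assumptions, since the excerpt is cut off before the exact condition on letters and numbers. Tokens keep the case they were typed in; a lowercasing analyzer on the field would make that irrelevant.

```python
def rewrite(query):
    """Turn 'HIV 1 hepatitis' into phrase-pair alternatives around numbers."""
    tokens = query.split()
    clauses = []
    for i, token in enumerate(tokens):
        if not token.isdigit():
            continue
        for j in (i - 1, i + 1):  # left and right neighbour
            if 0 <= j < len(tokens):
                a, b = min(i, j), max(i, j)
                phrase = f'"{tokens[a]} {tokens[b]}"'
                rest = [t for k, t in enumerate(tokens) if k not in (a, b)]
                clauses.append("(" + " AND ".join([phrase] + rest) + ")")
    # Queries without a numeric token are passed through unchanged.
    return " OR ".join(clauses) if clauses else query


print(rewrite("HIV 1 hepatitis"))
# ("HIV 1" AND hepatitis) OR ("1 hepatitis" AND HIV)
```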

Re: Sentence level searching

2006-11-12 Thread Michael Imbeault
Hello everyone, Solr puts a configurable gap between values of the same field, so you could index every sentence as a separate value of a multi-valued field. Thanks for the answer, Yonik; I forgot about multivalued fields! I'm not exactly sure how to add multiple values to a single field (asid
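Two parts of this suggestion, sketched in Python. In Solr's XML update format, a multi-valued field is filled by repeating the `<field>` element with the same name; `sentence_doc` builds such a document with one value per sentence. `token_positions` and `near` model the gap: tokens in different values end up far apart, so a proximity match with a small slop cannot cross a sentence boundary. The field name `sentence`, the naive sentence splitter, and the gap of 100 are assumptions (the real gap is set per field type in the schema via `positionIncrementGap`), and the position arithmetic only approximates Lucene's.

```python
import re
from xml.sax.saxutils import escape


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_doc(doc_id, text):
    """One <doc> with the multi-valued 'sentence' field repeated per sentence."""
    fields = [f'<field name="id">{escape(str(doc_id))}</field>']
    fields += [f'<field name="sentence">{escape(s)}</field>' for s in split_sentences(text)]
    return "<doc>" + "".join(fields) + "</doc>"


def token_positions(text, gap=100):
    """Map each token to its positions, adding `gap` between sentence values."""
    positions, pos = {}, -1
    for idx, sentence in enumerate(split_sentences(text)):
        if idx > 0:
            pos += gap  # jump between values of the multi-valued field
        for token in re.findall(r"\w+", sentence.lower()):
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def near(positions, a, b, slop):
    """True if tokens a and b occur within `slop` positions of each other."""
    return any(abs(pa - pb) <= slop
               for pa in positions.get(a, []) for pb in positions.get(b, []))


text = "HIV infects T cells. Hepatitis damages the liver."
print(sentence_doc(1, text))
pos = token_positions(text)
print(near(pos, "hiv", "cells", 5))        # True: same sentence
print(near(pos, "cells", "hepatitis", 5))  # False: separated by the gap
```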

Re: Index & search questions; special cases

2006-11-12 Thread Michael Imbeault
Chris Hostetter wrote: A couple of things make your question really hard to answer ... first off, you can specify different analyser chains for index time and query time -- when dealing with the WordDelim filter (or the synonym filter) this is frequently necessary -- so the answers to your questi

Re: Sentence level searching

2006-11-12 Thread Michael Imbeault
So basically it's just as I thought it was, thanks for the help :) I had checked the wiki before asking, but it lacks details, is often vague, or presupposes that you know some specific terms without explaining them. It's all clear now, thanks to you ;) Michael Imbeault

Re: Index & search questions; special cases

2006-11-13 Thread Michael Imbeault
" AND hepatitis) OR ("1 hepatitis" AND hiv). Is it a sensible solution? Any chance at all this kind of filter gets implemented in Solr? If not, pointers on how to do it myself would be appreciated; I can't say I have a clue right now (I've never done Java, and the only Lucene programming I did was via a PHP bridge). Thanks for the help, Michael Imbeault

Re: Index & search questions; special cases

2006-11-18 Thread Michael Imbeault
enateNumbers="0" catenateAll="0"/> words="stopwords-complete.txt" ignoreCase="true"/> ignoreCase="true"/> And it works perfectly. If Solr is interested in the filter, just tell me (and how I should go about contributing it). Michael Imbeault

Re: Solr and Oracle

2006-11-23 Thread Michael Imbeault
I index documents I have in a MySQL database via XML. You can build your XML documents on the fly with the data from your database and index them, no problem at all. Michael Imbeault
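A small sketch of building the XML on the fly from database rows. It uses Python's built-in `sqlite3` with an in-memory table as a stand-in for MySQL; any DB-API cursor exposes the same `description` attribute for column names. The table, columns, and function name are hypothetical, and NULL columns are simply skipped.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def rows_to_add_xml(cursor):
    """Turn every row of an executed query into a <doc> inside one <add>."""
    names = [col[0] for col in cursor.description]  # column names become field names
    parts = ["<add>"]
    for row in cursor:
        parts.append("<doc>")
        for name, value in zip(names, row):
            if value is None:
                continue  # leave out NULL columns
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, title TEXT)")
conn.executemany("INSERT INTO articles VALUES (?, ?)",
                 [(1, "HIV & hepatitis"), (2, "Red blood cells")])
xml = rows_to_add_xml(conn.execute("SELECT id, title FROM articles ORDER BY id"))
print(xml)
```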

Re: Spellchecker in Solr

2006-12-07 Thread Michael Imbeault
you can do this, share it with the community; to me it's the last 'must have' feature that would make Solr perfect out of the box (it's still awesome without this, mind you!). I think the option you describe is the easiest / best one to implement. Michael Imbeault

Re: Better highlighting fragmenter

2007-01-03 Thread Michael Imbeault
I for one would be interested in such a fragmenter, as the default one is lacking and doesn't produce acceptable results for most applications. Michael Mike Klaas wrote: I've written an unpolished custom fragmenter for highlighting which is more expensive than the BasicFragmenter that ships wit
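The message does not show how Mike's fragmenter works, so the following is only a generic illustration of one alternative to fixed-size fragments: group whole sentences into fragments of roughly a target length, so snippets never start or end mid-sentence. The function name, the naive sentence splitter, and the target size are assumptions.

```python
import re


def sentence_fragments(text, target=100):
    """Group whole sentences into fragments of at most ~`target` characters.

    A single sentence longer than `target` becomes its own fragment.
    """
    fragments, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > target:
            fragments.append(current)  # adding this sentence would overshoot
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        fragments.append(current)
    return fragments


print(sentence_fragments("A b c. D e f. G h i.", target=13))
# ['A b c. D e f.', 'G h i.']
```

The trade-off is that fragment lengths vary more than with a fixed character count, which is usually acceptable for readable snippets.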