I'm using a PHP
interface and curl to post my XML, one document at a time, and commit
every 100 documents. Indexing 3 docs, it hangs at maybe 5000. Anyone
got an idea on this one? It would be helpful. I may try to switch to
jetty tomorrow if nothing works :(
--
Michael Imbeault
CHUL
try the new Faceted Queries... seriously, Solr is really,
really awesome so far. Thanks for all your work, and sorry for all
the questions!
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Fixed my problem: the implementation of SolrPHP was faulty. It was
sending one doc at a time (one curl call per doc) and the system quickly ran
out of resources. I have now modified it to send in batches (1000 at a time)
and everything is #1!
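The batching fix described above can be sketched roughly as follows. This is a minimal Python sketch, not the actual SolrPHP change: the helper names `build_add_xml` and `batches` are hypothetical, and the only Solr-specific part is the `<add><doc><field name="...">` update message format.

```python
from itertools import islice
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(docs):
    """Render a list of {field: value} dicts as a single Solr <add> message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr() adds the surrounding quotes; escape() handles &, <, >.
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def batches(docs, size=1000):
    """Yield successive lists of at most `size` documents (hypothetical helper)."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Each string returned by `build_add_xml` would then be sent in one HTTP POST to the update handler, so 1000 documents cost one request instead of 1000.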
Michael Imbeault wrote:
Old issue (see
http://www.mail
"
instead of just ""
: - Any benefits of setting the allowed memory for Tomcat higher? Right
: now I'm allocating 384 megs.
the more memory you've got, the more caching you can support .. but if
your index changes so frequently compared to the rate of *unique*
queries you get
Hello Erik,
Thanks for adding that feature! "do" is fine with me, if "op" is already
used (not sure about this one).
Erik Hatcher wrote:
On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
I'm still a little disappointed that I can't change the OR/AND
Right now I'm determining similar docs by just querying for the whole
body with OR between words, and it's not very efficient
performance-wise. I have never coded in Java, so I really don't know where I should start...
Thanks,
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul.
ll, I kinda expected it.
1000+ word queries on a 15 million doc collection, you don't expect
miracles). At first glance I think it searches for the most 'relevant'
words, am I right? What kind of performance are you getting with it?
Thanks a lot,
Michael Imbeault
CHUL R
Thanks for the answer; and try to enjoy your vacation / travel! Can't
wait to be able to interface with MoreLikeThis within Solr!
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Erik Hatcher wrote:
O
sort... which shouldn't yield
good performance no matter what, sadly.
Is there any other way I could achieve what I'm trying to do? Just a
list of the most frequent (top 5) authors present in the results of a query.
Thanks,
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. L
the whole index no matter what's the result set when doing
facets on a string field. I must be doing something wrong?
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Michael Imbeault wrote:
Been playing
Yonik Seeley wrote:
I noticed this too, and have been thinking about ways to fix it.
The root of the problem is that Lucene, like all full-text search
engines, uses inverted indices. It's fast and easy to get all
documents for a particular term, but getting all terms for a document
documents is
n kb, and
someone on the list told me it was number of items, but I don't quite
get it. Better documentation on that would be welcomed :)
Also, are there any plans to add an option not to run a facet search if
the result set is too big? To avoid 40-second queries if the docset is
too
Thanks for all the great answers.
Quick Question: did you say you are faceting on the first name field
separately from the last name field? ... why?
You misunderstood. I'm doing faceting on the first author and the last author
of the list. Life science papers have author lists, and the first one is
u
list (been reading it off and on), but I'm
afraid I couldn't code my way out of a paper bag in Java. I'll contribute to
the Solr wiki (the SolrPHP part in particular) as soon as I can. That's
the least I can do!
Btw, any plans for a facets cache?
Michael Imbeault
CHUL Research Center (CHU
="hiv red
blood"&start=0&rows=20&fl=article_title+authors+journal_iso+pubdate+pmid+score&qt=standard&facet=true&facet.field=first_author&facet.limit=5&facet.missing=false&facet.zeros=false
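For reference, a request with the parameters shown above could be assembled as in the sketch below. This is only an illustration: the function name `facet_query_url` and the base URL in the usage note are hypothetical (the host and path are cut off in the message), and it assumes the truncated leading parameter is `q`.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_field, facet_limit=5, rows=20):
    """Build a select URL using the facet parameters quoted in the message."""
    params = [
        ("q", query),  # assumption: the truncated first parameter is q
        ("start", 0),
        ("rows", rows),
        ("fl", "article_title authors journal_iso pubdate pmid score"),
        ("qt", "standard"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.limit", facet_limit),
        ("facet.missing", "false"),
        ("facet.zeros", "false"),
    ]
    # urlencode() encodes spaces as '+' and quotes as %22.
    return base_url + "?" + urlencode(params)
```

Calling `facet_query_url("http://localhost:8080/solr/select", '"hiv red blood"', "first_author")` would ask for the top 5 values of `first_author` among the matching documents.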
I'll do more testing on the weekend,
Michael Imbeault
CHUL
heap space. I'm sure this problem will
go away on a server with more than the current 500 megs I can allocate
to Tomcat.
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Yonik Seeley wrote:
On 9/22/06,
a nice idea. Sadly, I'm no Java developer, so I fear I won't be the one
coding that :(
Thanks,
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
it just at the 'i
might do this in the future' stage?
Thanks,
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Kevin Lewandowski wrote:
I have not done one but have been planning to do it bas
it's a lot trickier. For my
needs, just a spelling suggester would be perfect. Would it require Java
programming, or could I get away with it with the current Solr (adding
n-gram fields and querying on them)?
Thanks,
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, C
uld I do this within Solr?
Are there any plans to implement such functionality as standard?
Thanks for the help,
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
letters / numbers. If a user
searches for HIV 1 hepatitis, I'd rewrite it as ("HIV 1" AND hepatitis) OR
("1 hepatitis" AND hiv). Is it a sensible solution?
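The rewrite described above could be done on the client side before the query reaches Solr. The following is a minimal sketch under stated assumptions: the function names are hypothetical, a "single letter / number" is taken to mean any one-character alphanumeric token, and each such token is paired with its left and right neighbour in turn.

```python
def _clause(phrase, rest):
    """Join a quoted phrase and the remaining terms with AND, in parentheses."""
    return "(" + " AND ".join([phrase] + rest) + ")"


def rewrite_query(text):
    """Rewrite a query so one-character tokens only match as part of a phrase."""
    tokens = text.split()
    alternatives = []
    for i, tok in enumerate(tokens):
        if len(tok) != 1 or not tok.isalnum():
            continue
        if i > 0:
            # Phrase with the previous word; AND the remaining terms.
            rest = tokens[:i - 1] + tokens[i + 1:]
            alternatives.append(_clause(f'"{tokens[i - 1]} {tok}"', rest))
        if i < len(tokens) - 1:
            # Phrase with the next word; AND the remaining terms.
            rest = tokens[:i] + tokens[i + 2:]
            alternatives.append(_clause(f'"{tok} {tokens[i + 1]}"', rest))
    if not alternatives:
        return " AND ".join(tokens)
    return " OR ".join(alternatives)
```

The sketch does not escape query syntax characters, and a query with several one-character tokens still leaves some of them as bare terms inside a clause, so it is a starting point only.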
Thanks,
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Hello everyone,
Solr puts a configurable gap between values of the same field, so you
could index every sentence as a separate value of a multi-valued
field.
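Adding several values to one field amounts to repeating the `<field>` element with the same name inside a `<doc>`. A minimal sketch of the sentence-per-value idea follows; the function name, the field name `sentence`, and the naive regex sentence splitter are all assumptions made for illustration, and the field must be declared multi-valued in the schema (the gap itself is configured there, via `positionIncrementGap` on the field type).

```python
import re
from xml.sax.saxutils import escape


def sentences_to_doc_xml(doc_id, text, field="sentence"):
    """Render one <doc> with a separate <field> element for each sentence."""
    # Naive split after ., ! or ? followed by whitespace; real text needs more care.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    fields = [f'<field name="id">{escape(str(doc_id))}</field>']
    fields += [f'<field name="{field}">{escape(s)}</field>' for s in sentences]
    return "<doc>" + "".join(fields) + "</doc>"
```

With a large enough gap, a phrase or sloppy-phrase query should not match across two sentences, because their tokens are placed far apart in position space.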
Thanks for the answer, Yonik; I forgot about multi-valued fields! I'm not
exactly sure of how to add multiple values to a single field (asid
Chris Hostetter wrote:
A couple of things make your question really hard to answer ... first off,
you can specify different analyzer chains for index time and query time --
when dealing with the WordDelim filter (or the synonym filter) this is
frequently necessary -- so the answers to your questi
So basically it's just as I thought it was, thanks for the help :) I had
checked the wiki before asking, but it lacks details and is often vague,
or presupposes that you have knowledge about some specific terms without
explaining them. It's all clear now, thanks to you ;)
Michael Imbeault
CHUL
" AND
hepatitis) OR ("1 hepatitis" AND hiv). Is it a sensible solution?
Any chance at all this kind of filter gets implemented in Solr? If
not, indications on how to do it myself would be appreciated - I can't
say I have a clue right now (I have never done Java; the only Lucene programming
I did was via a PHP bridge).
Thanks for the help,
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
enateNumbers="0"
catenateAll="0"/>
words="stopwords-complete.txt" ignoreCase="true"/>
ignoreCase="true"/>
And it works perfectly.
If Solr is interested in the filter, just tell me (and how I should go
about contributing it).
Michael Imbeault
C
I index documents I have in a MySQL database via XML. You can build your
xml documents on the fly with the data from your database and index
that, no problem at all.
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654
you can do this, share with the community; to me it's the last 'must
have' feature that would make Solr perfect out of the box (it's still
awesome without this, mind you!).
I think the option you describe is the easiest / best one to implement.
Michael Imbeault
CHUL Research Cen
I for one would be interested in such a fragmenter, as the default one
is lacking and doesn't produce acceptable results for most applications.
Michael
Mike Klaas wrote:
I've written an unpolished custom fragmenter for highlighting which is
more expensive than the BasicFragmenter that ships wit