On Jan 29, 2007, at 1:08 AM, zha jimmy wrote:
Hi all,
I am trying to configure Solr to support Chinese tokenization.
I saw the tips in schema.xml:
Then I modified schema.xml
positionIncrementGap="100">
class="org.apache.lucene.analysis.cjk.CJKTokenizer "/>
Hi all. I'm new to Solr and I regret I didn't use this tool before.
I'd like to know whether Solr can index Word, Excel and PDF files, or whether I must
create an XML representation of those files matching my schema.
Cheers.
--
Leandro Rodrigo Saad Cruz
software developer - certified scrum master
:: scrum.com.
On 1/29/07, Leandro Saad <[EMAIL PROTECTED]> wrote:
...I'd like to know if solr can index Word, Excel and PDF files or I must
create a xml representation of those files matching my schema?...
Currently you must create the XML yourself outside of Solr.
This might change, see https://issues.apac
: >I realized that solr do not have the CJK package ,but how can I
: > add it
: > in?
:
: You need to add the analyzers JAR from Lucene's contrib area to your
: Solr application, under WEB-INF/lib. You can get that JAR from the
: latest Lucene release distribution.
it's actually easier than
I have a mirror of the entire dmoz content in a solr index. International
characters seem to be loaded and returned in queries just fine but queries
that _contain_ international character queries return no results for known
matching patterns.
Is there a filter class I need to be using for internat
: > We override defaultOperator of "OR" to "AND".
:
: We really ought to make AND the default anyway.
No, no, no, no, No..
there is no prefix operator for "OR" so if the default is "AND" there is
no way at request time to indicate that some clauses should be optional
without rev
congrats on the successful roll-out Tracey,
: We don't use DisMax and (as of now) do not use faceting.
: And finally, the hardest part to convert to Solr.
: I had to write a PHP front-end custom converter to take our query strings,
: parse the clauses and lucene syntax into pieces, and
Hi:
I just want to know whether this is the norm or a problem with my configuration. I created a simple
file with 10,000 records and 4 fields per record: id, title, desc, link.
First I used Solrb (the Ruby gem library) to perform the inserts according to the
instructions, and it took me about an hour and still counti
Hi :
If you haven't done so, I think you need to enable UTF-8 support in your
Tomcat/Jetty etc. for queries from web browsers. Have a look at:
http://wiki.apache.org/tomcat/Tomcat/UTF-8
Regards
Scott Leonard <[EMAIL PROTECTED]> wrote: I have a mirror of the entire dmoz
content in a solr index.
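To illustrate why the container's URL decoding matters for such queries, here is a minimal Ruby sketch. It assumes nothing about Solr itself: it only shows how the same character is percent-encoded under UTF-8 and under ISO-8859-1, and what happens when UTF-8 bytes are decoded as ISO-8859-1. The helper name percent_encode is made up for the example. In Tomcat, the setting the linked page describes is the URIEncoding attribute on the Connector; other containers such as Resin have their own equivalent, which is not shown here.

```ruby
# Hypothetical helper: percent-encode every byte of `text` in the given encoding.
def percent_encode(text, encoding)
  text.encode(encoding).bytes.map { |byte| format("%%%02X", byte) }.join
end

e_acute = "\u00E9" # the character "é"

# A browser sending the query as UTF-8 transmits two bytes for this character.
as_utf8 = percent_encode(e_acute, "UTF-8")

# A container decoding as ISO-8859-1 expects a single byte instead.
as_latin1 = percent_encode(e_acute, "ISO-8859-1")

# Decoding the UTF-8 bytes as ISO-8859-1 yields two unrelated characters,
# so the term no longer matches what was indexed.
misread = e_acute.b.force_encoding("ISO-8859-1").encode("UTF-8")

puts as_utf8
puts as_latin1
puts misread
```

If the indexed data is correct but the query term arrives misread like this, the search returns nothing even though the documents display fine.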
On 1/29/07, Antonio Eggberg <[EMAIL PROTECTED]> wrote:
Is it good practice to do a commit after every insert? Is this what is
taking the time? Are there any general rules of thumb?
Definitely don't do a commit after every insert. Do a single one at the end.
-Yonik
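A minimal Ruby sketch of that advice, assuming the documents are sent as Solr XML update messages: every record becomes an add message, and exactly one commit message is sent after the last of them. The helper names (xml_escape, add_message, update_messages) and the sample records are made up for illustration, and the step of POSTing each message to the Solr update URL is left out.

```ruby
# Hypothetical helper: escape the characters that are special in XML.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Build one <add> message for a single record (a Hash of field name => value).
def add_message(record)
  fields = record.map do |name, value|
    %(<field name="#{xml_escape(name)}">#{xml_escape(value)}</field>)
  end
  "<add><doc>#{fields.join}</doc></add>"
end

# Return the messages in the order they would be posted:
# all the adds first, then a single commit at the very end.
def update_messages(records)
  records.map { |record| add_message(record) } << "<commit/>"
end

records = (1..3).map { |i| { "id" => i, "title" => "Title #{i} & more" } }
messages = update_messages(records)
puts messages
```

Committing is what makes the added documents visible to searches, so doing it once at the end avoids repeating that work for every document.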
Hi,
I have a question about the syntax for doing an OR filter in my URL. How
do I specify
where ((fq=colA[10 TO 20]) AND (fq=state:USA OR fq=country:USA) ? Basically,
I am
doing a search for a keyword across certain fields and I want to filter the
result set.
The user can input city/state/count
On 1/29/07, escher2k <[EMAIL PROTECTED]> wrote:
I have a question about the syntax for doing an OR filter in my URL. How
do I specify
where ((fq=colA[10 TO 20]) AND (fq=state:USA OR fq=country:USA) ? Basically,
I am
doing a search for a keyword across certain fields and I want to filter the
res
On 1/29/07, escher2k <[EMAIL PROTECTED]> wrote:
Hi,
I have a question about the syntax for doing an OR filter in my URL. How
do I specify
where ((fq=colA[10 TO 20]) AND (fq=state:USA OR fq=country:USA) ? Basically,
I am
doing a search for a keyword across certain fields and I want to filter th
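For reference, one way to express the filter from the question, sketched in Ruby: each fq parameter is a separate filter and the filters are intersected, while the OR goes inside a single fq value. The field names come from the question; the host, port, path and keyword are assumptions made up for the example, and the helper name filter_query_string is hypothetical.

```ruby
require "uri"

# Hypothetical helper: build the query string for a keyword search plus
# any number of filter queries. Every filter becomes its own fq parameter.
def filter_query_string(keyword, filters)
  params = [["q", keyword]] + filters.map { |filter| ["fq", filter] }
  URI.encode_www_form(params)
end

filters = [
  "colA:[10 TO 20]",          # range filter
  "state:USA OR country:USA", # the OR lives inside one fq value
]

query_string = filter_query_string("keyword", filters)

# Assumed local Solr URL, for illustration only.
puts "http://localhost:8983/solr/select?#{query_string}"
```

Keeping the two filters in separate fq parameters also lets Solr cache each one independently.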
I'm actually using Resin here.
On 1/29/07 3:33 PM, "Antonio Eggberg" <[EMAIL PROTECTED]> wrote:
> Hi :
>
> If you haven't done so.. I think you need to enable UTF-8 support in your
> tomcat/jetty etc.. for quries from web browsers.. have a look
>
> http://wiki.apache.org/tomcat/Tomcat/UTF-8
>
SOLR-121 just got applied to the Solrb library, which allows
Solr::Connection#add to accept arrays of documents:
connection.add([doc1, doc2, doc3])
Which means you can do something like this:
connection.add(records.map { |r| make_solr_doc(r) })
Posting more than a single document in a reques
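For a large record set, the array form of add can be combined with each_slice so that each request carries a bounded number of documents. The sketch below is an assumption-laden illustration, not solrb's documented usage: FakeConnection is a stand-in that only records calls so the code runs without a Solr server, make_solr_doc is a hypothetical mapping, and the batch size is arbitrary.

```ruby
# Stand-in for Solr::Connection so the sketch runs without a server.
# It only remembers the arrays it was asked to add.
class FakeConnection
  attr_reader :requests

  def initialize
    @requests = []
  end

  def add(docs)
    @requests << docs
  end
end

# Hypothetical mapping from an application record to the fields to index.
def make_solr_doc(record)
  { id: record[:id], title: record[:title] }
end

# Send the records in batches, one add call per batch.
def add_in_batches(connection, records, batch_size)
  records.each_slice(batch_size) do |batch|
    connection.add(batch.map { |record| make_solr_doc(record) })
  end
end

records = (1..10).map { |i| { id: i, title: "Record #{i}", unused: true } }
connection = FakeConnection.new
add_in_batches(connection, records, 4)

# Number of documents carried by each request.
puts connection.requests.map(&:length).inspect
```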
Thanks, Coda and Yonik, for the prompt answers.
I will give SOLR-121 a try. Cool.
Cheers
Coda Hale <[EMAIL PROTECTED]> wrote: SOLR-121 just got applied to the Solrb
library, which allows
Solr::Connection#add to accept arrays of documents:
connection.add([doc1, doc2, doc3])
Which means you
: program. I can't use java -jar start.jar because it spawns a new
: process, I need to find the actual java code to set it up. I've tried
: setting up the Jetty Server() and doing the addWebApplication() thing
: but while Jetty starts, it does not seem to find all the support
: files for Solr.
t
Hi:
After doing quite a bit of searching, what I understand is that the remedy for
my word count problem is in docTermFreq and TermEnum ... as Chris Hostetter
points out clearly for statistical purposes in the post below. (Please note I am
not so familiar with Java.)
http://www.mail-archive.c
hoss++
On Jan 29, 2007, at 3:43 PM, Chris Hostetter wrote:
: >I realized that solr do not have the CJK package ,but how can I
: > add it
: > in?
:
: You need to add the analyzers JAR from Lucene's contrib area to your
: Solr application, under WEB-INF/lib. You can get that JAR from the
:
It is OK now.
--
regards
jl
On Jan 29, 2007, at 6:15 PM, Chris Hostetter wrote:
: > We override defaultOperator of "OR" to "AND".
:
: We really ought to make AND the default anyway.
No, no, no, no, No..
:)
Your argument is a good one, and I buy it. However, I've never had a
case where a user typing
Your argument is a good one, and I buy it. However, I've never had a
case where a user typing "multiple words" where the expectation was
for OR, it is always AND.
But there are many cases where the expectation is to get the best
results possible. With AND you get zero results even when the
: > case where a user typing "multiple words" where the expectation was
: > for OR, it is always AND.
if the input you are passing in comes straight from a user -- and that
user doesn't understand the Lucene query syntax -- I'd argue
that StandardRequestHandler is the wrong choice, and you should
: I have a mirror of the entire dmoz content in a solr index. International
: characters seem to be loaded and returned in queries just fine but queries
: that _contain_ international character queries return no results for known
: matching patterns.
:
: Is there a filter class I need to be using
On 1/29/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
there is no prefix operator for "OR" so if the default is "AND" there is
no way at request time to indicate that some clauses should be optional
without reverting to the ugly and misleading binary operator syntax ...
Perhaps that's somethi
I'm a little backlogged on mail or I would have replied to your word
count email earlier...
one thing to keep in mind is that the index doesn't deal in "words", it
deals in "terms" -- the difference being that a term has a "field" and a "token"
-- what was discussed in the mail archives leading up to th
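As a rough illustration of that distinction, the Ruby sketch below counts occurrences per (field, token) pair instead of per word. It is a toy model only: the sample documents are made up, and the whitespace-and-lowercase tokenization is a stand-in for whatever analyzer the schema actually applies, not Lucene's own data structures.

```ruby
# Made-up sample documents; field names are illustrative.
docs = [
  { "title" => "Solr search", "desc" => "An open source search server" },
  { "title" => "Search tips", "desc" => "Tips for Solr users" },
]

# Count occurrences keyed by [field, token], so the same word in two
# different fields is counted as two different terms.
term_counts = Hash.new(0)
docs.each do |doc|
  doc.each do |field, text|
    text.downcase.split.each do |token|
      term_counts[[field, token]] += 1
    end
  end
end

term_counts.sort.each do |(field, token), count|
  puts "#{field}:#{token} => #{count}"
end
```

Here "search" appears twice in the title field and once in the desc field, and those are separate terms with separate counts.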
On Jan 29, 2007, at 10:46 PM, Ryan McKinley wrote:
Your argument is a good one, and I buy it. However, I've never had a
case where a user typing "multiple words" where the expectation was
for OR, it is always AND.
But there are many cases where the expectation is to get the best
results
On Jan 29, 2007, at 11:01 PM, Chris Hostetter wrote:
if there are cases where DisMax isn't the right choice for raw user
input
... i'm not aware of them, but i'd love to hear about them :)
Ok, ok, ok... I'm a self-admitted dismax avoider thus far. I'll
remedy that by building in dismax ca
On Jan 29, 2007, at 8:49 PM, Antonio Eggberg wrote:
After doing quite a bit of searching what I understand is that the
medicine to my problem of word count is in docTermFreq and
TermEnum ... as Chris Hostetter points out clearly for statistical
purpose in the post below. (Please note I am n
On Jan 29, 2007, at 7:26 PM, Yonik Seeley wrote:
On 1/29/07, escher2k <[EMAIL PROTECTED]> wrote:
I have a question about the syntax for doing an OR filter in my
URL. How
do I specify
where ((fq=colA[10 TO 20]) AND (fq=state:USA OR fq=country:USA) ?
Basically,
I am
doing a search for a k
Wow, I'm in awe of the uptake of solrb already! Answers now being
provided before I even get a chance to chime in. And we haven't even
published a gem yet (though I did get it building successfully on a
nightly build server, and will get the gems published sometime soon).
I've indexed 50k
On Jan 29, 2007, at 7:08 PM, Yonik Seeley wrote:
On 1/29/07, Antonio Eggberg <[EMAIL PROTECTED]> wrote:
Is it good practice to do a commit after every insert? Is this
what is taking the time? Are there any general rules of thumb?
Definitely don't do a commit after every insert. Do a single one