On Mon, 24 Mar 2008 22:58:18 -0700 (PDT)
Vinci <[EMAIL PROTECTED]> wrote:
> *Hadoop focuses more on the distributed crawler, as far as I know...
Hadoop is distributed processing based on the MapReduce algorithm/approach.
Nutch is a Lucene-related project that uses Hadoop for the crawler and
ind
Thanks Yonik,
I will give it a play when I get some time and write back.
Tim
On Tue, Mar 25, 2008 at 1:21 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Mon, Mar 24, 2008 at 5:30 PM, tim robertson
> <[EMAIL PROTECTED]> wrote:
> > Is there any documentation on whether indexes can be partition
Hi all,
I started indexing with Jetty and have come up with some questions...
1. If I use the example start.jar, what should my directory layout be?
Which folders are essential?
solr_jar
|_start.jar
|_solrhome
|_etc
|_lib
|_logs
And where is the main Solr library located? Outside of the
Hi Koji,
It needs a bit of polishing first, but we'll provide a patch if you're
interested. I'll keep you informed as soon as it is available.
Nicolas
-Original Message-
From: Koji Sekiguchi [mailto:[EMAIL PROTECTED]
Sent: Friday, March 21, 2008 16:50
To: solr-user@lucene.apache.org
O
Hi All,
We use highlighting and snippets for our searches. Besides those two, I would
like to have a list of the terms that Lucene used for the highlighting, so that
from a query like "Tim OR Antwerpen AND Ekeren" I can pull out the following terms:
Antwerpen, Ekeren if let's say these are the only terms that
Have you got a link to the new project? Many thanks.
David
Leonardo Santagada wrote:
On 24/03/2008, at 15:34, Yonik Seeley wrote:
On Mon, Mar 24, 2008 at 12:27 PM, Ed Summers <[EMAIL PROTECTED]> wrote:
On Mon, Mar 24, 2008 at 12:13 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
AFAIK, no one h
Hi all,
I have checked the wiki and have some questions in mind about SolrJ...
1. If I want to run SolrJ as an independent server, do I need to write my own
client program?
2. Can I run SolrJ anywhere, like the example Jetty server?
*p.s. We should give a better name to the default example jetty
On 3/21/08, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> : I am investigating to implement an aggregate average function for a document
> : and require help for the same.
>
>
> First off: please don't repost the same email with a different subject
> (on either solr list) just because you do
Vinci wrote:
Hi all,
I have checked the wiki and have some questions in mind about SolrJ...
1. If I want to run SolrJ as an independent server, do I need to write my own
client program?
solrj is the client -- it connects to a server. You should not need to
write your own client.
2. Can I r
Vinci wrote:
Hi all,
I am new to Solr and have just got Solr (3-8-nightly) running on my machine.
I want the system to be more portable, so I want to use the Jetty-based Solr in
the example... Before I try to index documents, I would like to ask some
questions:
1. Do I need to pay special attention when I
Hi,
The wiki for Solr talks about the schema.xml, and it seems that
changes to this file require a restart of Solr before they take effect.
In the wiki it says:
How can I rebuild my index from scratch if I change my schema?
The most efficient/complete way is to...
1. Stop your
The way we plan to use Solr
together with a Content Management System is that the authors/editors
can create new article/document types when needed, without any need to
restart anything.
Perhaps consider using dynamic fields if you need new fields:
http://wiki.apache.org/solr/SchemaXml#head
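A minimal sketch of how a CMS could target dynamic fields without touching the schema, assuming schema.xml declares suffix patterns such as `*_s`, `*_i`, and `*_t` (Solr's example schema.xml ships patterns like these). The function name, the type-to-suffix mapping, and the sample article are hypothetical and only illustrate the naming idea.

```python
# Hypothetical mapping from Python value types to dynamic-field suffixes.
# Assumes schema.xml declares dynamicField patterns "*_s" (string),
# "*_i" (integer) and "*_t" (analyzed text).
SUFFIX_BY_TYPE = {str: "_s", int: "_i"}


def to_dynamic_fields(cms_doc, text_fields=()):
    """Rename CMS fields so they match dynamic-field patterns in the schema."""
    solr_doc = {}
    for name, value in cms_doc.items():
        if name == "id":
            # The unique key is assumed to be a regular (non-dynamic) field.
            solr_doc[name] = value
        elif name in text_fields:
            # Fields that should be tokenized go to the text pattern.
            solr_doc[name + "_t"] = value
        else:
            solr_doc[name + SUFFIX_BY_TYPE[type(value)]] = value
    return solr_doc


article = {"id": "42", "author": "Tim", "pages": 12, "body": "Some text"}
fields = to_dynamic_fields(article, text_fields=("body",))
print(fields)
```

With this convention, a new article type only introduces new field names that already match an existing pattern, so no schema change or restart is needed for them.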
Quoting Ryan McKinley <[EMAIL PROTECTED]>:
The way we plan to use Solr together with a Content Management
System is that the authors/editors can create new article/document
types when needed, without any need to restart anything.
Do you really need to change the schema? Your CMS will t
Music is another domain where this is a real problem. E.g., "The The",
"The Who", not to mention the song and album names.
-Sean
Walter Underwood wrote:
We do a similar thing with a no stopword, no stemming field.
There are a surprising number of movie titles that are entirely
stopwords. "Be
[EMAIL PROTECTED] wrote:
Quoting Ryan McKinley <[EMAIL PROTECTED]>:
The way we plan to use Solr together with a Content Management
System is that the authors/editors can create new article/document
types when needed, without any need to restart anything.
Do you really need to change the
Hi,
Thanks for your reply.
A question about applying XSLT: if I use Saxon, where should saxon.jar be located
when I am using the example Jetty server? In lib/ inside example/, or outside
example/?
Thank you,
Vinci
ryantxu wrote:
>
> Vinci wrote:
>> Hi all,
>>
>> I am new to Solr and just make the Solr (
Hello all,
We are having some problems using Solr synonyms. If I define a synonym, for
example:
refrigerador,geladeira
And if I search for "refrigerador", I'll get all results for
"refrigerador", for "geladeira", and all results for the inflected forms
of what I've typed (refrigerador, refriger
: It worked, but the problem is that I fail to get a decent ratio between my
: "other_queries" and "timebias". I would like to keep timebias at ~15% max
: (for totally fresh docs), kind of dropping to nothing for ~one-week-old docs.
: Adding to BooleanQuery sums the subquery scores, so I guess there's
Hi all,
After some testing, I got it to work :)
Reduced schema.xml: http://kwon37xi.springnote.com/pages/335478
Basically you only need to change schema.xml; the class is in the 1.3
nightly build.
CHANGE: change the tokenizer element defined in every analyzer element,
especially and for fieldtyp
I am planning to index 275+ different sites with Solr, each of which
might have anywhere up to 200 000 documents. When performing searches,
I need to be able to search against any combination of sites.
Does anybody have suggestions on what the best practice for a scenario
like that would be, consideri
Hi,
I think Solr allows you to do asymmetric query processing and indexing. (*Not
all of the preprocessing can be asymmetric: stemming and lowercasing must be
symmetric.) To make the query work, you at least need to index the stop words,
and then the query side should not do the stop word removal
Hi all,
I want Solr to index my HTML document collection. After reading a number of
tutorials and searching Google, I have some questions...
1. Can I index HTML documents directly?
2. What should I change in the default schema.xml to index HTML documents?
3. Can fields be defined by a combination o
Hi, I'm wondering if there's a way to change a single field of a document
without re-indexing every field. I'd like to do something like this:
1val1
Then later:
1val2
After the second statement, the document is overwritten, so the value of
field1 is lost. Is there a way I can do something lik
Hi Vinci,
Maybe this answers most of your questions: Solr can't digest HTML - you have to
do HTML parsing outside of Solr, and feed it a document with specific fields
that match the schema.
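A minimal sketch of that approach, using only the Python standard library: the HTML is parsed outside of Solr and turned into an XML `<add>` update message. The field names `id`, `title`, and `text` are assumptions and would have to match fields defined in your schema.xml; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TitleAndText(HTMLParser):
    """Collects the <title> content and all other text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        data = data.strip()
        if data:
            (self.title if self.in_title else self.text).append(data)


def html_to_solr_add(doc_id, html):
    """Build an <add><doc>...</doc></add> message from one HTML page."""
    parser = TitleAndText()
    parser.feed(html)
    parser.close()
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # Field names are assumptions; they must exist in schema.xml.
    for name, value in (("id", doc_id),
                        ("title", " ".join(parser.title)),
                        ("text", " ".join(parser.text))):
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


page = "<html><head><title>Hello</title></head><body><p>Solr and Lucene</p></body></html>"
xml_message = html_to_solr_add("page-1", page)
print(xml_message)
```

The resulting message would then be posted to Solr's update handler. This sketch does not skip `<script>` or `<style>` content, which a real extractor would probably want to do.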
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Vinc
Hi Galen,
See issue SOLR-139 in JIRA (this is from memory). Doable, but not in Solr
nightlies yet, I believe (also from memory), and requires all your fields to be
stored.
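Until something like that is available, a client-side workaround can be sketched as follows: fetch the stored fields of the document, change the one field, and re-add the whole document. This is not the SOLR-139 implementation, only an illustration of why every field would need to be stored. The field names (`id`, `field1`, `field2`) and the helper name are hypothetical, and fetching the stored document from Solr is left out.

```python
import xml.etree.ElementTree as ET


def rebuild_add_message(stored_doc, changes):
    """Merge changed fields into the stored fields and build a full <add> message.

    stored_doc: dict of all stored field values, as returned by a query
                (only possible if every field is stored).
    changes:    dict of the fields to set or overwrite.
    """
    merged = dict(stored_doc)
    merged.update(changes)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in merged.items():
        ET.SubElement(doc, "field", name=name).text = str(value)
    return merged, ET.tostring(add, encoding="unicode")


# The document as it was first indexed (hypothetical field names).
stored = {"id": "1", "field1": "val1"}
# Later: set one more field without losing field1.
merged, message = rebuild_add_message(stored, {"field2": "val2"})
print(message)
```

Any field that is indexed but not stored would be missing from `stored_doc` and would therefore be lost when the document is re-added.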
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Galen Pahlke <[EMAIL
Sounds like SOLR-303 is a must for you. Have you looked at Nutch?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Dietrich <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 25, 2008 4:15:23 PM
Subject: How to index multip
I'm in a rush, so here is just a pointer: Function Queries are your friend.
They'll let you use field values to calculate your own custom scores based
on your own custom rules/functions.
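As a rough illustration, here is how a request with a boost function could be assembled. It assumes a dismax-style request handler that accepts a `bf` (boost function) parameter and a date field named `created`; both the handler name and the field name are assumptions about your setup. `recip(x,m,a,b)` computes `a / (m*x + b)`, and `rord` is the reverse ordinal of the field value, so newer documents get a value closer to 1.

```python
from urllib.parse import urlencode


def recip(x, m, a, b):
    """Plain-Python version of the recip function: a / (m*x + b)."""
    return a / (m * x + b)


params = {
    "q": "solr",
    "qt": "dismax",  # assumed handler name
    # Hypothetical field "created": boost decays as documents get older.
    "bf": "recip(rord(created),1,1000,1000)",
}
query_string = urlencode(params)
print(query_string)

# What the function yields for the newest document (rord = 1)
# and for the 1000th newest (rord = 1000):
print(recip(1, 1, 1000, 1000), recip(1000, 1, 1000, 1000))
```

The constants control how quickly the boost falls off; they are taken as-is here and would need tuning against real data.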
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Ami
Vinci - I believe the NGram token filter can be used as a CJKTokenizer
replacement, and there is a Factory for that in Solr, too.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Vinci <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Mo
I'm using the standard Solr query language and the normal highlighting
parameters documented at
http://wiki.apache.org/solr/HighlightingParameters. Snippet generation
and highlighting is working pretty well, but my testers have
discovered something they find borderline unacceptable. If they search
On Tue, Mar 25, 2008 at 6:12 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Sounds like SOLR-303 is a must for you.
Why? I see the benefits of using a distributed architecture in
general, but why do you recommend it specifically for this scenario?
> Have you looked at Nutch?
I don't want to (or
On Mar 25, 2008, at 6:31 PM, Chris Harris wrote:
working pretty well, but my testers have
discovered something they find borderline unacceptable. If they search
for
"stock market"
(with quotes), then Solr correctly returns only documents where
"stock" and "market" appear as adjacent words.
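For reference, a request of the kind described above could be built like this. The parameters `hl`, `hl.fl`, and `hl.snippets` are the ones documented on the HighlightingParameters wiki page; the field name `text` and the snippet count are assumptions.

```python
from urllib.parse import urlencode

params = [
    ("q", '"stock market"'),  # phrase query, quotes included
    ("hl", "true"),           # turn highlighting on
    ("hl.fl", "text"),        # hypothetical field to highlight
    ("hl.snippets", "2"),     # assumed number of snippets per field
]
query_string = urlencode(params)
print(query_string)
```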
I want to know if we can use index replication when we have segmented indexes
over multiple solr instances?
--
View this message in context:
http://www.nabble.com/Master-Slave-Replication-tp16293553p16293553.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi Otis,
Thank you for your comment.
Basically, CJKTokenizer is not the same as the NGramTokenizer: CJKTokenizer
only applies bigrams to CJK characters, not to English words.
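A small sketch of the difference, written in plain Python rather than with the Lucene classes: runs of CJK characters are split into overlapping bigrams, while Latin words are kept whole. The character ranges are simplified and the function names are hypothetical; this only mimics the behaviour described above, it is not the CJKTokenizer implementation.

```python
import re

# Simplified ranges: Hiragana/Katakana, CJK Unified Ideographs, Hangul syllables.
CJK = r"\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af"
SEGMENT = re.compile(rf"[{CJK}]+|[A-Za-z0-9]+")
CJK_RUN = re.compile(rf"[{CJK}]+")


def cjk_style_tokens(text):
    """Bigrams for CJK runs, whole tokens for Latin words and numbers."""
    tokens = []
    for run in SEGMENT.findall(text):
        if CJK_RUN.fullmatch(run):
            if len(run) == 1:
                tokens.append(run)
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run)
    return tokens


def char_ngrams(text, n=2):
    """Plain character n-grams, applied to everything alike."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(cjk_style_tokens("Solr 搜索引擎"))
print(char_ngrams("Solr"))
```

An n-gram approach applied to every token would also cut "Solr" into "So", "ol", "lr", which is the behaviour that differs for English words.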
Vinci
Otis Gospodnetic wrote:
>
> Vinci - I believe the NGram token filter can be used as a CJKTokenizer
> replacem
Hi Otis,
Thank you for your reply.
Actually the parsing is already done; I just use the HTML tag names as field
names. Is that OK for Solr?
By the way, can attributes on the fields be meaningful to Solr?
Vinci
Otis Gospodnetic wrote:
>
> Hi Vinci,
>
> Maybe this answers most of your questions: Solr can't d
Dietrich,
I pointed to SOLR-303 because 275 * 200,000 looks like too big a number
for a single machine to handle.
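Spelled out as arithmetic, with the numbers from the original question (275 sites, up to 200,000 documents each). The shard count is a made-up value, used only to show how an even split would look.

```python
sites = 275
max_docs_per_site = 200_000

# Upper bound on the total number of documents in one index.
upper_bound = sites * max_docs_per_site
print(upper_bound)

# Hypothetical: spread the documents evenly over 4 index shards.
shards = 4
docs_per_shard = -(-upper_bound // shards)  # ceiling division
print(docs_per_shard)
```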
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Dietrich <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday,
On 24-Mar-08, at 3:32 AM, Leonardo Santagada wrote:
On 24/03/2008, at 04:39, Christian Vogler wrote:
On Monday 24 March 2008 01:01:59 Leonardo Santagada wrote:
I have done some modifications on the solr python client[1], and
though we kept the same license and my work could be put back in
so