Re: What are the limits? Billions of records anyone?

2008-03-25 Thread Norberto Meijome
On Mon, 24 Mar 2008 22:58:18 -0700 (PDT) Vinci <[EMAIL PROTECTED]> wrote: > *Hadoop is more focused on the distributed crawler, as far as I know... Hadoop is distributed processing based on the MapReduce algorithm/approach. Nutch is a Lucene-related project that uses Hadoop for the crawler and ind

Re: What are the limits? Billions of records anyone?

2008-03-25 Thread tim robertson
Thanks Yonik, I will give it a play when I get some time and write back. Tim On Tue, Mar 25, 2008 at 1:21 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Mon, Mar 24, 2008 at 5:30 PM, tim robertson > <[EMAIL PROTECTED]> wrote: > > Is there any documentation on whether indexes can be partition

Document Path issue and change the layout in the example

2008-03-25 Thread Vinci
Hi all, I started indexing with Jetty and have some questions... 1. If I use the example start.jar, what should my directory layout be? Which folders are essential? solr_jar |_start.jar |_solrhome |_etc |_lib |_logs And where is the main Solr library located? outside of the

RE: Slow Highlighting -> CopyField maxSize property

2008-03-25 Thread nicolas . dessaigne
Hi Koji, It needs a bit of polishing first, but we'll provide a patch if you're interested. I'll keep you informed as soon as it is available. Nicolas -Original Message- From: Koji Sekiguchi [mailto:[EMAIL PROTECTED] Sent: Friday, 21 March 2008 16:50 To: solr-user@lucene.apache.org O

Highlight - get terms used by lucene

2008-03-25 Thread Tim Mahy
Hi All, we use highlighting and snippets for our searches. Besides those two, I would want to have a list of terms that lucene used for the highlighting, so that I can pull out of a "Tim OR Antwerpen AND Ekeren" the following terms : Antwerpen, Ekeren if let's say these are the only terms that

Re: Plans for a new Solr Python library

2008-03-25 Thread David Pratt
Have you got a link to the new project? Many thanks. David Leonardo Santagada wrote: On 24/03/2008, at 15:34, Yonik Seeley wrote: On Mon, Mar 24, 2008 at 12:27 PM, Ed Summers <[EMAIL PROTECTED]> wrote: On Mon, Mar 24, 2008 at 12:13 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: AFAIK, no one h

SolrJ Questions

2008-03-25 Thread Vinci
Hi all, I have checked the wiki and have some questions about SolrJ... 1. If I want to run SolrJ as an independent server, do I need to write my own client program? 2. Can I run SolrJ like the example Jetty server anywhere? *p.s. We should give a better name to the default example jetty

Re: solr.search.function

2008-03-25 Thread Umar Shah
On 3/21/08, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > > : I am investigating to implement an aggregate average function for a > document > > : and require help for the same. > > > First off: please don't repost the same email with a different subject > (on either solr list) just because you do

Re: SolrJ Questions

2008-03-25 Thread Ryan McKinley
Vinci wrote: Hi all, I have checked the wiki and have some questions about SolrJ... 1. If I want to run SolrJ as an independent server, do I need to write my own client program? SolrJ is the client -- it connects to a server. You should not need to write your own client. 2. Can I r

Re: Beginner questions: Jetty and solr with utf-8 + cached page + dedup

2008-03-25 Thread Ryan McKinley
Vinci wrote: Hi all, I am new to Solr and have just got Solr (3-8-nightly) running on my machine. I want the system to be more portable, so I want to use the Jetty-based Solr in the example... Before I try to index documents, I would like to ask some questions: 1. Do I need to pay special attention when I

Update schema.xml without restarting Solr?

2008-03-25 Thread solr
Hi, The wiki for Solr talks about the schema.xml, and it seems that changes to this file require a restart of Solr before they take effect. In the wiki it says: How can I rebuild my index from scratch if I change my schema? The most efficient/complete way is to... 1. Stop your

Re: Update schema.xml without restarting Solr?

2008-03-25 Thread Ryan McKinley
The way we plan to use Solr together with a Content Management System is that the authors/editors can create new article/document types when needed, without any need to restart anything. Perhaps consider using dynamic fields if you need new fields: http://wiki.apache.org/solr/SchemaXml#head
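To illustrate the dynamic-field suggestion above: a dynamic field is declared with a wildcard name pattern, so a document field whose name matches the pattern is accepted without editing schema.xml. The sketch below only imitates that name matching in Python. The patterns `*_s`, `*_i` and `*_t`, the type labels, and the field names are assumptions chosen for illustration; this is not Solr's own matching code.

```python
from fnmatch import fnmatchcase

# Hypothetical dynamic-field patterns and type labels (assumed for illustration).
DYNAMIC_PATTERNS = {"*_s": "string", "*_i": "integer", "*_t": "text"}


def resolve_field_type(field_name, explicit_fields=None):
    """Return the type a field name would map to, or None if nothing matches.

    Explicitly declared fields win; otherwise the first matching
    wildcard pattern decides.
    """
    explicit_fields = explicit_fields or {}
    if field_name in explicit_fields:
        return explicit_fields[field_name]
    for pattern, field_type in DYNAMIC_PATTERNS.items():
        if fnmatchcase(field_name, pattern):
            return field_type
    return None
```

Under these assumptions, a CMS could name the fields of a new article type with a suffix such as `_s` or `_t` and index them without a schema change.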

Re: Update schema.xml without restarting Solr?

2008-03-25 Thread solr
Quoting Ryan McKinley <[EMAIL PROTECTED]>: The way we plan to use Solr together with a Content Management System is that the authors/editors can create new article/document types when needed, without any need to restart anything. Do you really need to change the schema? Your CMS will t

Re: stopwords and phrase queries

2008-03-25 Thread Sean Timm
Music is another domain where this is a real problem. E.g., "The The", "The Who", not to mention the song and album names. -Sean Walter Underwood wrote: We do a similar thing with a no stopword, no stemming field. There are a surprising number of movie titles that are entirely stopwords. "Be
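The problem described above can be shown with a toy analyzer: once stop words are removed, a name made up entirely of stop words leaves nothing to search for. This is a minimal sketch, assuming a small hand-picked stop list and whitespace tokenization; it is not the stop list or analysis chain of any real Solr configuration.

```python
# Small illustrative stop list (an assumption, not any analyzer's real list).
STOP_WORDS = {"a", "an", "and", "be", "not", "of", "or", "the", "to"}


def analyze(text, remove_stop_words=True):
    """Lowercase, split on whitespace, and optionally drop stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens
```

With this stop list, `analyze("The The")` returns an empty list, while `analyze("The The", remove_stop_words=False)` keeps both tokens, which is the motivation for a separate field without stop word removal.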

Re: Update schema.xml without restarting Solr?

2008-03-25 Thread Ryan McKinley
[EMAIL PROTECTED] wrote: Quoting Ryan McKinley <[EMAIL PROTECTED]>: The way we plan to use Solr together with a Content Management System is that the authors/editors can create new article/document types when needed, without any need to restart anything. Do you really need to change the

Re: Beginner questions: Jetty and solr with utf-8 + cached page + dedup

2008-03-25 Thread Vinci
Hi, Thank you for your reply. A question about applying XSLT: if I use Saxon, where should saxon.jar be located when I am using the example Jetty server? In lib/ inside example/, or outside example/? Thank you, Vinci ryantxu wrote: > > Vinci wrote: >> Hi all, >> >> I am new to Solr and just make the Solr (

synonyms

2008-03-25 Thread Lucas F. A. Teixeira
Hello all, We are having some problems using Solr synonyms. If I define a synonym, for example: refrigerador,geladeira and I search for "refrigerador", I get all results for "refrigerador", for "geladeira", and all results for the inflected forms of what I've typed (refrigerador, refriger

Re: FunctionQuery in a custom request handler

2008-03-25 Thread Chris Hostetter
: It worked, but the problem is that I fail to get a decent ratio between my : "other_queries" and "timebias". I would like to keep timebias at ~15% max : (for totally fresh docs), kind of dropping to nothing at ~one week olds. : Adding to BooleanQuery sums the subquery scores, so I guess there's

Re: CJKTokenizer in Solr 1.3? [Solution, wiki updater wanted]

2008-03-25 Thread Vinci
Hi all, After some testing, I got it to work :) Reduced schema.xml: http://kwon37xi.springnote.com/pages/335478 Basically you only need to apply the change to schema.xml; the class is in the 1.3 nightly build. CHANGE: change the tokenizer element defined in all analyzer element, especially and for fieldtyp

How to index multiple sites with option of combining results in search

2008-03-25 Thread Dietrich
I am planning to index 275+ different sites with Solr, each of which might have anywhere up to 200 000 documents. When performing searches, I need to be able to search against any combination of sites. Does anybody have suggestions what the best practice for a scenario like that would be, consideri

Re: stopwords and phrase queries

2008-03-25 Thread Vinci
Hi, I think Solr allows you to do asymmetric query processing and indexing. (*Not all preprocessing can be asymmetric - stemming and lowercasing must be symmetric.) To make the query work, you at least need to index the stop words, and then the query should not do the stop word removal

Fields, Facets and Indexing html document

2008-03-25 Thread Vinci
Hi all, I want Solr to index my HTML document collection. After reading a number of tutorials and searching Google, I have some questions... 1. Can I index HTML documents directly? 2. What should I change in the default schema.xml for indexing HTML documents? 3. Can fields be defined by a combination o

Update a field without reindexing the entire document?

2008-03-25 Thread Galen Pahlke
Hi, I'm wondering if there's a way to change a single field of a document without re-indexing every field. I'd like to do something like this: 1val1 Then later: 1val2 After the second statement, the document is overwritten, so the value of field1 is lost. Is there a way I can do something lik

Re: Fields, Facets and Indexing html document

2008-03-25 Thread Otis Gospodnetic
Hi Vinci, Maybe this answers most of your questions: Solr can't digest HTML - you have to do HTML parsing outside of Solr, and feed it a document with specific fields that match the schema. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Vinc

Re: Update a field without reindexing the entire document?

2008-03-25 Thread Otis Gospodnetic
Hi Galen, See SOLR-139 (this is from memory) issue in JIRA. Doable, but not in Solr nightlies yet, I believe (also from memory), and requires all your fields to be stored. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Galen Pahlke <[EMAIL
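The requirement that all fields be stored makes sense if you picture the update as a read-modify-write: the old values are read back, one field is changed, and the whole document is sent again. The sketch below shows that idea on the client side only; it is not the SOLR-139 patch. It builds a standard `<add><doc><field name="...">` update message from a plain dictionary. The field name `field2`, the values, and the helper name `build_add_message` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_add_message(stored_fields, changes):
    """Merge changes into previously stored values and return a full <add> message.

    stored_fields: field values read back from the index (assumed all stored).
    changes: the fields to add or overwrite.
    """
    merged = dict(stored_fields)
    merged.update(changes)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in merged.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")
```

For example, `build_add_message({"id": "1", "field1": "val1"}, {"field2": "val2"})` yields a message that still carries `field1`, so re-adding the document does not lose it. Any field that was indexed but not stored cannot be read back this way, which is why the approach depends on stored fields.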

Re: How to index multiple sites with option of combining results in search

2008-03-25 Thread Otis Gospodnetic
Sounds like SOLR-303 is a must for you. Have you looked at Nutch? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Dietrich <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, March 25, 2008 4:15:23 PM Subject: How to index multip

Re: Query Time Boosting

2008-03-25 Thread Otis Gospodnetic
I'm in a rush, so here is just a pointer: Function Queries are your friend. They'll let you use field values to calculate your own custom scores based on your own custom rules/functions. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Ami

Re: CJKTokenizer in Solr 1.3?

2008-03-25 Thread Otis Gospodnetic
Vinci - I believe the NGram token filter can be used as a CJKTokenizer replacement, and there is a Factory for that in Solr, too. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Vinci <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Mo

Highlighting Quoted Phrases

2008-03-25 Thread Chris Harris
I'm using the standard Solr query language and the normal highlighting parameters documented at http://wiki.apache.org/solr/HighlightingParameters. Snippet generation and highlighting is working pretty well, but my testers have discovered something they find borderline unacceptable. If they search

Re: How to index multiple sites with option of combining results in search

2008-03-25 Thread Dietrich
On Tue, Mar 25, 2008 at 6:12 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Sounds like SOLR-303 is a must for you. Why? I see the benefits of using a distributed architecture in general, but why do you recommend it specifically for this scenario? > Have you looked at Nutch? I don't want to (or

Re: Highlighting Quoted Phrases

2008-03-25 Thread Brian Whitman
On Mar 25, 2008, at 6:31 PM, Chris Harris wrote: working pretty well, but my testers have discovered something they find borderline unacceptable. If they search for "stock market" (with quotes), then Solr correctly returns only documents where "stock" and "market" appear as adjacent words.

Master Slave Replication

2008-03-25 Thread swarag
I want to know if we can use index replication when we have segmented indexes over multiple solr instances?

Re: CJKTokenizer in Solr 1.3?

2008-03-25 Thread Vinci
Hi Otis, Thank you for your comment. Basically CJKTokenizer is not the same as the NGramTokenizer - CJKTokenizer only applies bigrams to CJK characters, not to English words. Vinci Otis Gospodnetic wrote: > > Vinci - I believe the NGram token filter can be used as a CJKTokenizer > replacem
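The difference described above can be sketched in a few lines: bigrams are produced only for runs of CJK characters, while Latin words stay whole, whereas a plain character n-gram splits everything. This is a rough illustration and not Lucene's CJKTokenizer; the `is_cjk` range check covers only CJK Unified Ideographs, Hiragana and Katakana, and all function names are hypothetical.

```python
from itertools import groupby


def is_cjk(ch):
    """Rough test: CJK Unified Ideographs, Hiragana and Katakana only."""
    return "\u4e00" <= ch <= "\u9fff" or "\u3040" <= ch <= "\u30ff"


def char_kind(ch):
    """Classify a character as 'cjk', 'word' (other letters/digits) or 'other'."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "other"


def cjk_bigram_tokenize(text):
    """Overlapping bigrams for CJK runs; whole lowercased tokens for other words."""
    tokens = []
    for kind, group in groupby(text, key=char_kind):
        run = "".join(group)
        if kind == "word":
            tokens.append(run.lower())
        elif kind == "cjk":
            if len(run) == 1:
                tokens.append(run)
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def char_bigrams(text):
    """Plain character bigrams, applied to everything (for contrast)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]
```

Under these assumptions, `cjk_bigram_tokenize("Solr 搜索引擎")` keeps `solr` as one token and splits the CJK run into overlapping pairs, while `char_bigrams("solr")` breaks the English word into `so`, `ol`, `lr`.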

Re: Fields, Facets and Indexing html document

2008-03-25 Thread Vinci
Hi Otis, Thank you for your reply. Actually the parsing is done, I just use the html tag as field name - is that ok for Solr? By the way, can the attribute in fields be meaningful to Solr? Vinci Otis Gospodnetic wrote: > > Hi Vinci, > > Maybe this answers most of your questions: Solr can't d

Re: How to index multiple sites with option of combining results in search

2008-03-25 Thread Otis Gospodnetic
Dietrich, I pointed to SOLR-303 because 275 * 200,000 looks like too big a number for a single machine to handle. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Dietrich <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday,
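For reference, the multiplication above works out as follows. This is only the upper bound implied by the figures in the thread (275 sites, up to 200,000 documents each); the actual total depends on how many documents each site really has.

```python
def upper_bound_documents(sites, max_docs_per_site):
    """Worst-case document count if every site reaches its maximum."""
    return sites * max_docs_per_site


total = upper_bound_documents(275, 200_000)
print(f"{total:,} documents at most")
```

That is 55 million documents in the worst case.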

Re: Plans for a new Solr Python library

2008-03-25 Thread Mike Klaas
On 24-Mar-08, at 3:32 AM, Leonardo Santagada wrote: On 24/03/2008, at 04:39, Christian Vogler wrote: On Monday 24 March 2008 01:01:59 Leonardo Santagada wrote: I have done some modifications on the solr python client[1], and though we kept the same license and my work could be put back in so