Re: minimum occurances of term in document

2007-08-30 Thread Jed Reynolds
Mike Klaas wrote: On 30-Aug-07, at 4:01 PM, Chris Hostetter wrote: You could accomplish the goal without any coding by using phrase queries: "calico calico calico"~1 will match only documents that have at least three occurrences of calico. If this is performant enough, you are done. O

Re: multiple solr home directories

2007-08-30 Thread Chris Hostetter
> Just to make sure: you mean we can create a directory containing the shared jars, and each solr home/lib will symlink to the jar files in that directory, right? Correct. -Hoss
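
A minimal sketch of the layout confirmed above: one directory holds the shared jars, and each solr home's lib directory gets a symlink per jar. The function name and the paths passed to it are hypothetical, chosen only for illustration; nothing here is part of Solr itself.

```python
import os


def link_shared_jars(shared_dir, solr_homes):
    """Symlink every .jar in shared_dir into <solr_home>/lib for each home.

    Returns the list of symlink paths that were created. Existing entries
    in a lib directory are left untouched.
    """
    created = []
    for home in solr_homes:
        lib_dir = os.path.join(home, "lib")
        os.makedirs(lib_dir, exist_ok=True)
        for name in sorted(os.listdir(shared_dir)):
            if not name.endswith(".jar"):
                continue  # only jars are shared
            link = os.path.join(lib_dir, name)
            if not os.path.lexists(link):
                os.symlink(os.path.join(shared_dir, name), link)
                created.append(link)
    return created
```

Each solr home keeps its own lib directory, so a home can still carry a private jar next to the shared links.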

Re: SOLR developer

2007-08-30 Thread Tim Archambault
Thanks. I didn't mean to send that to the list-serv :} On 8/31/07, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote: > > On 8/31/07, Tim Archambault <[EMAIL PROTECTED]> wrote: > > ...I'm thinking of sending a similar > > list-serv item out, but I noticed this is a solr-user list, not > necessarily >

Re: SOLR developer

2007-08-30 Thread Bertrand Delacretaz
On 8/31/07, Tim Archambault <[EMAIL PROTECTED]> wrote: > ...I'm thinking of sending a similar > list-serv item out, but I noticed this is a solr-user list, not necessarily > a developers list so I thought I'd ask Note that there's also [EMAIL PROTECTED] for such purposes, see http://www.apach

Re: SOLR developer

2007-08-30 Thread Tim Archambault
Mark, Did you get any responses to your inquiry? I'm thinking of sending a similar list-serv item out, but I noticed this is a solr-user list, not necessarily a developers list so I thought I'd ask. I'm looking for someone to integrate with Drupal. Tim Archambault Online Manager Bangordailynews.

Re: performance questions

2007-08-30 Thread Walter Underwood
Sorry dude, I'm pining for Python and coding in Java. --wunder On 8/30/07 6:57 PM, "Erik Hatcher" <[EMAIL PROTECTED]> wrote: > > On Aug 30, 2007, at 6:31 PM, Mike Klaas wrote: >> Another reason why people use stored procs is to prevent multiple >> round-trips in a multi-stage query operation. T

Re: performance questions

2007-08-30 Thread Erik Hatcher
On Aug 30, 2007, at 6:31 PM, Mike Klaas wrote: Another reason why people use stored procs is to prevent multiple round-trips in a multi-stage query operation. This is exactly what complex RequestHandlers do (and the equivalent to a custom stored proc would be writing your own handler). A

Re: Multiple indexes

2007-08-30 Thread James liu
OK, I see. Thank you, Mike. 2007/8/31, Mike Klaas <[EMAIL PROTECTED]>: > > > On 29-Aug-07, at 10:21 PM, James liu wrote: > > > Does it affect with doc size? > > > > for example 2 billion docs, 10k doc2 billion docs, but doc size > > is 10m. > > There might be other places that have 2G limit (see

Re: multiple solr home directories

2007-08-30 Thread Yu-Hui Jin
Thanks, Hoss, >> you still use a separate lib directory for each solr home and symlink each jar ... Just to make sure. you mean we can create a directory containing the shared jars, and each solr home/lib will symlink to the jar files in that directory. Right? Thanks, -Hui On 8/30/07, Chri

Re: minimum occurances of term in document

2007-08-30 Thread Mike Klaas
On 30-Aug-07, at 4:01 PM, Chris Hostetter wrote: You could accomplish the goal without any coding by using phrase queries: "calico calico calico"~1 will match only documents that have at least three occurrences of calico. If this is performant enough, you are done. Otherwise, you'll

Re: minimum occurances of term in document

2007-08-30 Thread Chris Hostetter
You could accomplish the goal without any coding by using phrase queries: "calico calico calico"~1 will match only documents that have at least three occurrences of calico. If this is performant enough, you are done. Otherwise, you'll have to do some custom coding. I'll be searching art

Re: minimum occurances of term in document

2007-08-30 Thread Jed Reynolds
Mike Klaas wrote: On 30-Aug-07, at 1:22 PM, Jed Reynolds wrote: Jed Reynolds wrote: Apologies if this is in the Lucene FAQ, but I was looking thru the Lucene syntax and I just didn't see it. Is there a way to search for documents that have a certain number of occurrences of a term in the

Re: minimum occurances of term in document

2007-08-30 Thread Mike Klaas
On 30-Aug-07, at 3:30 PM, Chris Hostetter wrote: One way would be to create your own Query subclass (similar to TermQuery) that returned a score of zero for docs below a certain tf threshold. This is a minor clarification: a score of zero is still a match ... the key to writing custom que

Re: performance questions

2007-08-30 Thread Mike Klaas
On 30-Aug-07, at 3:18 PM, Chris Hostetter wrote: 2. Someone asked me if SOLR utilizes anything like a "stored procedure" to make queries faster. Does SOLR support anything such as this? It's kind of an apples vs orange-juice comparison, but typically when people talk about DB stored pr

Re: minimum occurances of term in document

2007-08-30 Thread Chris Hostetter
One way would be to create your own Query subclass (similar to TermQuery) that returned a score of zero for docs below a certain tf threshold. This is a minor clarification: a score of zero is still a match ... the key to writing custom queries is to "skip" past a document that doesn't meet the
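
The distinction drawn above can be sketched in plain Python (not Lucene's API): a document given a score of zero still appears in the results, while a document that is skipped never does. The `docs` mapping, the whitespace tokenization, and both function names are assumptions made for illustration, and `min_tf` is assumed to be at least 1.

```python
from collections import Counter


def term_frequency(text, term):
    """Count case-insensitive occurrences of term in whitespace-split text."""
    return Counter(text.lower().split())[term.lower()]


def score_zero_below_threshold(docs, term, min_tf):
    """Every doc containing the term matches; low-tf docs merely score 0."""
    hits = {}
    for doc_id, text in docs.items():
        tf = term_frequency(text, term)
        if tf > 0:
            hits[doc_id] = tf if tf >= min_tf else 0
    return hits


def skip_below_threshold(docs, term, min_tf):
    """Docs below the tf threshold are skipped and never become matches."""
    hits = {}
    for doc_id, text in docs.items():
        tf = term_frequency(text, term)
        if tf >= min_tf:
            hits[doc_id] = tf
    return hits


docs = {
    "a": "calico cat calico dog calico",
    "b": "one calico cat",
    "c": "no cats here",
}
```

With `min_tf=3`, the first function still returns document `b` (with score 0), while the second returns only document `a`.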

Re: performance questions

2007-08-30 Thread Chris Hostetter
2. Someone asked me if SOLR utilizes anything like a "stored procedure" to make queries faster. Does SOLR support anything such as this? It's kind of an apples vs orange-juice comparison, but typically when people talk about DB stored procedures being faster than raw SQL they are referring to

Re: How to use special characters during Solr Search?

2007-08-30 Thread Chris Hostetter
Scenario 1: I want to give "priya" with double quotes. The result should be only priya, which is 1. My programming model is normally written as (searchtext*), which searches all records. So I gave ("priya"*), but Solr was unable to parse it. So I gave ("priya"\*), which gives the result only

Re: multiple solr home directories

2007-08-30 Thread Chris Hostetter
* can we set up multiple Solr home directories within the same Solr instance? (I want to use the same Tomcat Solr instance to support indexing and searching over multiple independent indexes.) Yes. Using JNDI you can configure multiple instances of Solr, each with a separate solr home. The t

Re: XML output for Analysis admin functionality

2007-08-30 Thread Mike Klaas
On 28-Aug-07, at 3:04 AM, Stephanie Belton wrote: Hi, I need to programmatically put search terms through the query analyser and retrieve the result. I thought the easiest way to do this would be to call the existing /solr/admin/analysis.jsp, but it would be so much nicer if there was

Re: sort problem

2007-08-30 Thread Mike Klaas
On 28-Aug-07, at 6:19 AM, michael ravits wrote: hello solrs, i have an index with 30M records, weighing ~50GB. latest trunk version. heap size 1024mb. queries work fine until I specify a field to sort results by. even if the result set consists of only 2 documents, the CPU jumps high and

Re: Embedded Solr w/ multiple indexes

2007-08-30 Thread Mike Klaas
On 30-Aug-07, at 9:43 AM, Panbodee Mekpaiboon wrote: It seems like Solr uses only one index (which will be created under tag), but I need to create more than one index and it would be nice to be able to specify the location of each index etc. Is there any way to manage solr index on the fly (f

Re: Index corruption checker?

2007-08-30 Thread Mike Klaas
On 30-Aug-07, at 12:09 PM, Lance Norskog wrote: Is there an app that walks a Lucene index and checks for corruption? How would we know if our index had become corrupted? Try asking on [EMAIL PROTECTED] -Mike

Re: Facet for multiple values field

2007-08-30 Thread Mike Klaas
On 30-Aug-07, at 1:41 PM, Giri wrote: Tom, Thank you very much for the help. If I have multiple values, I add them as separate occurrences of the field I am faceting on. Does this mean that, for a single record, I can add multiple values for a field? For example, for the field "sensor" I can se

Re: minimum occurances of term in document

2007-08-30 Thread Mike Klaas
On 30-Aug-07, at 1:22 PM, Jed Reynolds wrote: Jed Reynolds wrote: Apologies if this is in the Lucene FAQ, but I was looking thru the Lucene syntax and I just didn't see it. Is there a way to search for documents that have a certain number of occurrences of a term in the document? Like, I

Re: Facet for multiple values field

2007-08-30 Thread Giri
Tom, Thank you very much for the help. >>If I have multiple values, I add them as separate occurrences of the field I am faceting on. Does this mean that, for a single record, I can add multiple values for a field? For example, for the field "sensor" I can send multiple values? Let me try this and get

Re: minimum occurances of term in document

2007-08-30 Thread Jed Reynolds
Jed Reynolds wrote: Apologies if this is in the Lucene FAQ, but I was looking thru the Lucene syntax and I just didn't see it. Is there a way to search for documents that have a certain number of occurrences of a term in the document? Like, I want to find all documents that have the term Ca

Re: Facet for multiple values field

2007-08-30 Thread Tom Hill
Hi - I wouldn't facet on a "text" field, I tend to use "string" for the reasons you describe. e.g. Use or in your example If I have multiple values, I add them as separate occurrences of the field I am faceting on. If you still need them all in one field for other reasons, use copyField
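
A rough sketch of the idea above, in plain Python rather than Solr: split the combined keyword string into separate values, keep each value as a whole untokenized string, and count documents per value. The record layout and function names are hypothetical, and this is not how Solr computes facets internally.

```python
from collections import Counter


def split_keywords(combined):
    """Split a comma-separated keyword string into separate field values."""
    return [value.strip() for value in combined.split(",") if value.strip()]


def facet_counts(records, field):
    """Count how many records carry each distinct value of a multi-valued field."""
    counts = Counter()
    for record in records:
        # set() so a value repeated within one record is counted once
        counts.update(set(record.get(field, [])))
    return counts


records = [
    {"keywords": split_keywords(
        "relative humidity, air temperature, atmospheric moisture")},
    {"keywords": split_keywords("air temperature")},
]
counts = facet_counts(records, "keywords")
```

Because each value stays whole, "air temperature" is counted as one facet value instead of being broken into "air" and "temperature", which is what a tokenized "text" field would produce.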

Index corruption checker?

2007-08-30 Thread Lance Norskog
Is there an app that walks a Lucene index and checks for corruption? How would we know if our index had become corrupted? Thanks, Lance

Re: Multiple indexes

2007-08-30 Thread Mike Klaas
On 30-Aug-07, at 10:57 AM, Nathaniel E. Powell wrote: Is there functionality for partitioning Solr indexes onto multiple machines? For this to work, I suppose that Solr would have to combine the results from the various machines. I think Nutch does this with the distributed searcher functi

RE: Multiple indexes

2007-08-30 Thread Nathaniel E. Powell
Is there functionality for partitioning Solr indexes onto multiple machines? For this to work, I suppose that Solr would have to combine the results from the various machines. I think Nutch does this with the distributed searcher functionality. -Nathan -Original Message- From: Mike Kla

Re: Multiple indexes

2007-08-30 Thread Mike Klaas
On 29-Aug-07, at 10:21 PM, James liu wrote: Does it affect with doc size? for example 2 billion docs, 10k doc2 billion docs, but doc size is 10m. There might be other places that have 2G limit (see lucene index format docs), but many things are vints and can grow larger. Of course

Re: performance questions

2007-08-30 Thread Mike Klaas
On 30-Aug-07, at 9:51 AM, Andrew Nagy wrote: Here are a few SOLR performance questions: 1. I have noticed with 500,000+ records that my facets run quite fast regarding my dataset when there is a large number of matches, but on a small result set (say 10 - 50) the facet queries become ver

minimum occurances of term in document

2007-08-30 Thread Jed Reynolds
Apologies if this is in the Lucene FAQ, but I was looking thru the Lucene syntax and I just didn't see it. Is there a way to search for documents that have a certain number of occurrences of a term in the document? Like, I want to find all documents that have the term Calico mentioned three

performance questions

2007-08-30 Thread Andrew Nagy
Here are a few SOLR performance questions: 1. I have noticed with 500,000+ records that my facets run quite fast regarding my dataset when there is a large number of matches, but on a small result set (say 10 - 50) the facet queries become very slow. Any suggestions as to how to improve this?

Embedded Solr w/ multiple indexes

2007-08-30 Thread Panbodee Mekpaiboon
It seems like Solr uses only one index (which will be created under tag), but I need to create more than one index and it would be nice to be able to specify the location of each index etc. Is there any way to manage Solr indexes on the fly (for the Embedded version of Solr)?

Facet for multiple values field

2007-08-30 Thread Giri
Hi, I am trying to get the facet values from a field that contains multiple words, for example: I have a field "keywords" and values for this: Keywords= relative humidity, air temperature, atmospheric moisture Please note: I am combining multiple keywords in to one single field, with comma

Re: How to use solrj ?

2007-08-30 Thread Will Johnson
Take a look at the unit tests for examples of how to use the API. Also, the client is a client API, not a client for running queries etc. http://svn.apache.org/viewvc/lucene/solr/trunk/client/java/solrj/test/org/apache/solr/client/solrj/ - will On Aug 30, 2007, at 5:56 AM, Thierry Collogne

Re: How to use solrj ?

2007-08-30 Thread Thierry Collogne
I don't think the client can be run directly. We have developed a small application that uses the client as an interface to solr. On 30/08/2007, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote: > > Can anyone tell me how to use the Java client ? > I downloaded the complete source from SVN solr trunk a