Re: Memory use with sorting problem

2007-11-22 Thread Chris Laux
Thanks for your reply. I made some memory saving changes, as per your advice, but the problem remains. > Set the max warming searchers to 1 to ensure that you never have more > than one warming at the same time. Done. > How many documents are in your index? Currently about 8 million. > If you

Re: Performance problems for OR-queries

2007-11-22 Thread Jörg Kiegeland
1. Does Solr support this kind of index access with better performance? Is there anything special to define in schema.xml? No... Solr uses Lucene at its core, and all matching documents for a query are scored. So it is not possible to have "Google"-like performance with Solr, i.e.

Re: Document update based on ID

2007-11-22 Thread Jörg Kiegeland
Yes, SOLR-139 will eventually do what you need. The most recent patch should not be *too* hard to get running (it may not apply cleanly, though). The patch as is needs to be reworked before it will go into trunk. I hope this will happen in the next month or so. As for production? It depend

Re: Strange behavior MoreLikeThis Feature

2007-11-22 Thread Ryan McKinley
Now when I run the following query: http://localhost:8080/solr/mlt?q=id:neardup06&mlt.fl=features&mlt.mindf=1&mlt.mintf=1&mlt.displayTerms=details&wt=json&indent=on try adding: &debugQuery=on to your query string and you can see why each document matches... My guess is that "features" uses
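A minimal sketch of the suggestion above, assuming you want to add the parameter programmatically rather than by hand. The helper name `with_debug_query` is hypothetical; the URL is the one quoted in the message, and only the Python standard library is used.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug_query(url: str) -> str:
    """Return `url` with debugQuery=on appended to its query string.

    Any existing debugQuery parameter is dropped first, so calling the
    helper twice yields the same URL.
    """
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debugQuery"
    ]
    params.append(("debugQuery", "on"))
    # urlencode percent-encodes reserved characters such as ':' in the q value.
    return urlunsplit(parts._replace(query=urlencode(params)))


mlt_url = (
    "http://localhost:8080/solr/mlt?q=id:neardup06&mlt.fl=features"
    "&mlt.mindf=1&mlt.mintf=1&mlt.displayTerms=details&wt=json&indent=on"
)
debug_url = with_debug_query(mlt_url)
```

Requesting `debug_url` instead of `mlt_url` should return the same results plus the debug section Ryan refers to.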

Re: Document update based on ID

2007-11-22 Thread Ryan McKinley
Jörg Kiegeland wrote: Yes, SOLR-139 will eventually do what you need. The most recent patch should not be *too* hard to get running (it may not apply cleanly, though). The patch as is needs to be reworked before it will go into trunk. I hope this will happen in the next month or so. As for

Grouping multiValued fields

2007-11-22 Thread Mark Baird
Let's say I have a class Item that has a collection of Sell objects. Sell objects have two properties sellingTime (Date) and salesPerson (String). So in my Solr schema I have something like the following fields defined: An add might look like the following: 1 2007-11-23T23:01:00Z

Heritrix and Solr

2007-11-22 Thread George Everitt
I'm looking for a web crawler to use with Solr. The objective is to crawl about a dozen public web sites regarding a specific topic. After a lot of googling, I came across Heritrix, which seems to be the most robust, well-supported open source crawler out there. Heritrix has an integratio

Re: Heritrix and Solr

2007-11-22 Thread A. Banji Oyebisi
I am interested in this too. Any ideas? A. Banji Oyebisi Choicegen, LLC. Email: [EMAIL PROTECTED] Web URL: http://www.choicegen.com Choicegen... Helping you make better choices! Notice: This email message, together with any attachments, may contain information of Choicegen, LLC.,

Re: Heritrix and Solr

2007-11-22 Thread Cool Coder
I have a similar requirement: I need to move to a good crawler. Currently I am using a custom crawler, I mean my own crawler, to crawl some public domains, and I use Lucene to index all downloaded pages. After doing lots of research I came across JSpider with Lucene. Also I was looki

Re: Performance problems for OR-queries

2007-11-22 Thread Mike Klaas
On 22-Nov-07, at 6:02 AM, Jörg Kiegeland wrote: 1. Does Solr support this kind of index access with better performance? Is there anything special to define in schema.xml? No... Solr uses Lucene at its core, and all matching documents for a query are scored. So it is not possible to hav

Re: Heritrix and Solr

2007-11-22 Thread Norberto Meijome
On Thu, 22 Nov 2007 10:41:41 -0500 George Everitt <[EMAIL PROTECTED]> wrote: > After a lot of googling, I came across Heritrix, which seems to be the > most robust well supported open source crawler out there. Heritrix > has an integration with Nutch (NutchWax), but not with Solr. I'm >

Re: Any tips for indexing large amounts of data?

2007-11-22 Thread Otis Gospodnetic
Brendan - yes, 64-bit Linux this is, and the JVM got 5.5 GB heap, though it could have worked with less. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Brendan Grainger <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday, Novem

Re: Heritrix and Solr

2007-11-22 Thread Norberto Meijome
On Thu, 22 Nov 2007 19:10:46 -0800 (PST) Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > The answer to that question, Norberto, would depend on versions. Otis, would that relate to what underlying version of Lucene is being used in either Solr & Nutch? _ {Beto|Norberto|Nu

C++ type of analysis issues

2007-11-22 Thread Yu-Hui Jin
Hi there, I haven't found any existing filter/tokenizer that can deal with "C++"-type search keywords. I'm using WordDelimiterFilter, which removes the "++". One way I am thinking of right now is to use a synonym filter before the WordDelimiterFilter to replace "c++" (after lowercasing it) with s
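The idea in this message can be sketched outside Solr as follows. This is only an illustration in plain Python, not Solr's analysis chain: `strip_delimiters` is a crude stand-in for what WordDelimiterFilter does to punctuation, and the replacement token `cplusplus` is a hypothetical choice, since the message is cut off before naming one. Whitespace tokenization is also an assumption.

```python
import re

# Hypothetical mapping; the original message does not say which token it uses.
PROTECTED_TERMS = {"c++": "cplusplus"}


def protect_terms(text: str) -> list[str]:
    """Lowercase, split on whitespace, and swap protected terms for safe tokens."""
    return [PROTECTED_TERMS.get(token, token) for token in text.lower().split()]


def strip_delimiters(tokens: list[str]) -> list[str]:
    """Crude stand-in for a delimiter filter: split on non-alphanumerics, drop empties."""
    result = []
    for token in tokens:
        result.extend(part for part in re.split(r"[^0-9a-z]+", token) if part)
    return result


def analyze(text: str) -> list[str]:
    """Run the synonym-style replacement before the delimiter stripping."""
    return strip_delimiters(protect_terms(text))


without_protection = strip_delimiters("learning c++ fast".split())
with_protection = analyze("Learning C++ fast")
```

Here `without_protection` loses the "++" and keeps only `c`, while `with_protection` keeps a distinct `cplusplus` token. The same replacement would have to be applied at both index and query time for searches to match.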

Re: Document update based on ID

2007-11-22 Thread Walter Underwood
This can be useful, but it is limited. At Infoseek, we used this for demoting porn and spam in the index in 1996, but replaced it with more precise approaches. wunder On 11/22/07 6:49 AM, "Ryan McKinley" <[EMAIL PROTECTED]> wrote: > Jörg Kiegeland wrote: >> >>> Yes, SOLR-139 will eventually do

Re: Heritrix and Solr

2007-11-22 Thread Otis Gospodnetic
The answer to that question, Norberto, would depend on versions. George: why not just use straight Nutch and forget about Heritrix? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Norberto Meijome <[EMAIL PROTECTED]> To: solr-user@lucene.apache

Re: Heritrix and Solr

2007-11-22 Thread Otis Gospodnetic
si si (no need to use reply-all, I'm on solr-user). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Norberto Meijome <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Cc: [EMAIL PROTECTED] Sent: Thursday, November 22, 2007 11:53:31 PM Subject:

Re: Memory use with sorting problem

2007-11-22 Thread Otis Gospodnetic
Chris, I checked Luke handler for you on a sample index. Indeed, it does provide the number of terms and a bunch of other nice information, for example: 19295605 20437118 49209736 <--- here 1195333103547 false true true Otis -- Sematext -- http://sematext.com/ -- Lucene - Sol

Re: Heritrix and Solr

2007-11-22 Thread Otis Gospodnetic
Hi George, Thank you for your kind words about Lucene in Action. :) I wouldn't compare Solr and Nutch, they are really made for different things. I was suggesting Nutch instead of Heritrix, not instead of Solr. The Solr+Nutch patch is in JIRA and there is a fresh patch in there, still warm,

Re: Heritrix and Solr

2007-11-22 Thread George Everitt
Otis: There are many reasons I prefer Solr to Nutch: 1. I actually tried to do some of the crawling with Nutch, but found the crawling options less flexible than I would have liked. 2. I prefer the Solr approach in general. I have a long background in Verity and Autonomy search, and Solr is

Re: Memory use with sorting problem

2007-11-22 Thread Otis Gospodnetic
I'd have to check, but Luke handler might spit that out. If not, Lucene's TermEnum & co. are your friends. :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Chris Laux <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Thursday, Novemb

Re: Strange behavior MoreLikeThis Feature

2007-11-22 Thread Rishabh Joshi
Thanks Ryan. I now know the reason why. Before I explain the reason, let me correct the mistake I made in my earlier mail. I was not using the first document mentioned in the XML. Instead it was this one: IW-02 iPod & iPod Mini USB 2.0 Cable Belkin electronics connector car power adap