Thanks for your reply. I made some memory-saving changes, as per your
advice, but the problem remains.
> Set the max warming searchers to 1 to ensure that you never have more
> than one warming at the same time.
Done.
> How many documents are in your index?
Currently about 8 million.
> If you
1. Does Solr support this kind of index access with better performance?
Is there anything special to define in schema.xml?
No... Solr uses Lucene at its core, and all matching documents for a
query are scored.
So it is not possible to have a "google" like performance with Solr,
i.e.
Yes, SOLR-139 will eventually do what you need.
The most recent patch should not be *too* hard to get running (it may
not apply cleanly though). The patch as is needs to be reworked before
it will go into trunk. I hope this will happen in the next month or so.
As for production? It depend
Now when I run the following query:
http://localhost:8080/solr/mlt?q=id:neardup06&mlt.fl=features&mlt.mindf=1&mlt.mintf=1&mlt.displayTerms=details&wt=json&indent=on
try adding:
&debugQuery=on
to your query string and you can see why each document matches...
My guess is that "features" uses
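A minimal sketch of the suggestion above, assuming the MoreLikeThis URL quoted in this message: it appends debugQuery=on to an existing Solr query string. The helper name with_debug_query is hypothetical and only illustrates the URL handling; it does not contact Solr.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug_query(url):
    """Return url with debugQuery=on set exactly once in its query string."""
    parts = urlsplit(url)
    # Drop any existing debugQuery parameter so the result is idempotent.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debugQuery"
    ]
    params.append(("debugQuery", "on"))
    # safe=":" keeps field:value queries such as id:neardup06 readable.
    return urlunsplit(parts._replace(query=urlencode(params, safe=":")))


# The MoreLikeThis request quoted in the message above.
MLT_URL = (
    "http://localhost:8080/solr/mlt?q=id:neardup06&mlt.fl=features"
    "&mlt.mindf=1&mlt.mintf=1&mlt.displayTerms=details&wt=json&indent=on"
)

debug_url = with_debug_query(MLT_URL)
print(debug_url)
```

Requesting the resulting URL should add a debug section to the response that explains why each document matched.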
Jörg Kiegeland wrote:
Yes, SOLR-139 will eventually do what you need.
The most recent patch should not be *too* hard to get running (it may
not apply cleanly though). The patch as is needs to be reworked before
it will go into trunk. I hope this will happen in the next month or so.
As for
Let's say I have a class Item that has a collection of Sell objects.
Sell objects have two properties sellingTime (Date) and salesPerson
(String).
So in my Solr schema I have something like the following fields defined:
An add might look like the following:
1
2007-11-23T23:01:00Z
I'm looking for a web crawler to use with Solr. The objective is to
crawl about a dozen public web sites regarding a specific topic.
After a lot of googling, I came across Heritrix, which seems to be the
most robust well supported open source crawler out there. Heritrix
has an integratio
I am interested in this too. Any ideas?
A. Banji Oyebisi
Choicegen, LLC.
Email: [EMAIL PROTECTED]
Web URL: http://www.choicegen.com
Choicegen... Helping you make better choices!
Notice: This email message, together with any attachments, may contain information of Choicegen, LLC.,
I have a similar requirement: I need to move to a good crawler.
Currently I am using a custom crawler of my own to crawl some public
domains, and it uses Lucene to index all downloaded pages. After doing a lot
of research I came across JSpider with Lucene.
Also I was looki
On 22-Nov-07, at 6:02 AM, Jörg Kiegeland wrote:
1. Does Solr support this kind of index access with better
performance?
Is there anything special to define in schema.xml?
No... Solr uses Lucene at its core, and all matching documents for a
query are scored.
So it is not possible to hav
On Thu, 22 Nov 2007 10:41:41 -0500
George Everitt <[EMAIL PROTECTED]> wrote:
> After a lot of googling, I came across Heritrix, which seems to be the
> most robust well supported open source crawler out there. Heritrix
> has an integration with Nutch (NutchWax), but not with Solr. I'm
>
Brendan - yes, 64-bit Linux this is, and the JVM got 5.5 GB heap, though it
could have worked with less.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Brendan Grainger <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, Novem
On Thu, 22 Nov 2007 19:10:46 -0800 (PST)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> The answer to that question, Norberto, would depend on versions.
Otis, would that relate to which underlying version of Lucene is being used in
either Solr or Nutch?
_
{Beto|Norberto|Nu
Hi, there,
I haven't found any existing filter/tokenizer that can deal with "C++"
type of search keywords. I'm using WordDelimiterFilter which removes
the "++".
One way I am thinking of right now is to use a synonym filter before the
WordDelimiterFilter to replace "c++" (after lowercasing it) with s
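The idea can be sketched in plain Python, outside Solr: lowercase the text, swap "c++" for a placeholder token, and only then strip delimiters. This is an illustration under stated assumptions, not Solr's analysis chain. The replacement token "cplusplus" and the function names are hypothetical, and split_on_delimiters is a crude stand-in for a delimiter-stripping filter rather than the real WordDelimiterFilter.

```python
import re

# Hypothetical synonym map; the replacement token is illustrative only.
PROTECTED = {"c++": "cplusplus"}


def protect_terms(text):
    """Lowercase text and replace protected terms with delimiter-free tokens."""
    lowered = text.lower()
    for term, token in PROTECTED.items():
        # Match the term only when it is not glued to other word characters.
        pattern = r"(?<![a-z0-9])" + re.escape(term) + r"(?![a-z0-9+])"
        lowered = re.sub(pattern, token, lowered)
    return lowered


def split_on_delimiters(text):
    """Crude stand-in for a delimiter-stripping filter: keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze(text):
    """Apply the synonym step first, then the delimiter-stripping step."""
    return split_on_delimiters(protect_terms(text))


print(split_on_delimiters("C++ and Java"))  # the "++" is lost
print(analyze("C++ and Java"))              # the placeholder survives
```

The same mapping would have to be applied at both index time and query time, otherwise a query for "c++" would not find the placeholder token.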
This can be useful, but it is limited. At Infoseek, we used this
for demoting porn and spam in the index in 1996, but replaced it
with more precise approaches.
wunder
On 11/22/07 6:49 AM, "Ryan McKinley" <[EMAIL PROTECTED]> wrote:
> Jörg Kiegeland wrote:
>>
>>> Yes, SOLR-139 will eventually do
The answer to that question, Norberto, would depend on versions.
George: why not just use straight Nutch and forget about Heritrix?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Norberto Meijome <[EMAIL PROTECTED]>
To: solr-user@lucene.apache
Yes, yes (no need to use reply-all, I'm on solr-user).
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Norberto Meijome <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Sent: Thursday, November 22, 2007 11:53:31 PM
Subject:
Chris, I checked the Luke handler for you on a sample index. Indeed, it does
provide the number of terms and a bunch of other nice information, for example:
19295605
20437118
49209736 <--- here
1195333103547
false
true
true
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Sol
Hi George,
Thank you for your kind words about Lucene in Action. :)
I wouldn't compare Solr and Nutch, they are really made for different things.
I was suggesting Nutch instead of Heritrix, not instead of Solr. The
Solr+Nutch patch is in JIRA and there is a fresh patch in there, still warm,
Otis:
There are many reasons I prefer Solr to Nutch:
1. I actually tried to do some of the crawling with Nutch, but found
the crawling options less flexible than I would have liked.
2. I prefer the Solr approach in general. I have a long background in
Verity and Autonomy search, and Solr is
I'd have to check, but the Luke handler might spit that out. If not, Lucene's
TermEnum & co. are your friends. :)
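A rough sketch of asking the Luke handler for index-level counts. Several things here are assumptions rather than facts from this thread: that the handler is mapped at /admin/luke, that it accepts numTerms and wt=json, and that the response carries an "index" object with numDocs, maxDoc and numTerms. The sample payload is made up for illustration and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def luke_url(base_url, num_terms=0):
    """Build a Luke handler URL (assumed to be mapped at /admin/luke)."""
    query = urlencode({"numTerms": num_terms, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/luke?{query}"


def index_stats(luke_json):
    """Pull the index-level counts out of a Luke JSON response body."""
    index = json.loads(luke_json).get("index", {})
    return {key: index.get(key) for key in ("numDocs", "maxDoc", "numTerms")}


# Made-up sample payload; real responses contain many more entries.
SAMPLE = '{"index": {"numDocs": 3, "maxDoc": 4, "numTerms": 17}}'

print(luke_url("http://localhost:8080/solr"))
print(index_stats(SAMPLE))
```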
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Chris Laux <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, Novemb
Thanks Ryan. I now know the reason why.
Before I explain the reason, let me correct the mistake I made in my earlier
mail. I was not using the first document mentioned in the XML. Instead it
was this one:
IW-02
iPod & iPod Mini USB 2.0 Cable
Belkin
electronics
connector
car power adap
22 matches