Hi Ken,
It's true that uncommon words most likely don't show up in the
signature. However, my point was that if two documents have 99% of
their tokens in common and differ in one token whose frequency exceeds
the quantisation threshold, the two resulting hashes are completely
different. If you
want true near d
Chris Hostetter wrote:
: After following Otis' and Thorsten's advice, I still get:
:
: HTTP ERROR: 500 No Java compiler available
Just so i'm clear, you:
1) downloaded solr, tried out the tutorial, and had the
url http://localhost:8983/solr/admin/ work when you ran:
> cd $DIR_CONTAINING_SOLR/example
After following Otis' and Thorsten's advice, I still get:
HTTP ERROR: 500 No Java compiler available
when running http://localhost:8280/solr/admin out of the Debian solr-jetty
package.
I have *both* the Sun 5 and 6 JDK and JRE installed, and both have javac:
/usr/lib/jvm/java-1.5.0-sun/bin/javac
/u
On Nov 21, 2007 3:09 PM, Jörg Kiegeland <[EMAIL PROTECTED]> wrote:
> I have N keywords and execute a query of the form
>
> keyword1 OR keyword2 OR .. OR keywordN
[...]
> This seems to take linear time to the size of all possible matched
> documents.
Yes.
> 1. Does Solr support this kind of index
I have N keywords and execute a query of the form
keyword1 OR keyword2 OR .. OR keywordN
The search result would be very large (some million), so I defined a
result limit of 100.
However, Solr now seems to calculate, for every possible result document,
the number of matched keywords and to order
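The linear cost described here is expected for a disjunction: every document that matches any of the N keywords has to be scored before the best 100 are known, and only the retained hits are bounded. Lucene collects top hits in a fixed-size priority queue; the Java sketch below models that idea in simplified form. The class and method names are hypothetical (this is not Lucene's API), and the "number of matched keywords" stands in for a real relevance score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Simplified model of top-k collection (hypothetical, not Lucene's real classes). */
class TopKCollector {
    /** A (docId, score) pair. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int k;
    // Min-heap on score: the root is the weakest of the current top k.
    private final PriorityQueue<Hit> heap;

    TopKCollector(int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        this.k = k;
        this.heap = new PriorityQueue<>(k, (a, b) -> Float.compare(a.score, b.score));
    }

    /** Called once per matching document, so total work grows with the number of matches. */
    void collect(int doc, float score) {
        if (heap.size() < k) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                    // evict the current weakest hit
            heap.add(new Hit(doc, score));
        }
    }

    /** Returns the retained hits, best first. */
    List<Hit> topHits() {
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical data: how many of the N keywords each document matches.
        int[] matchedKeywords = {1, 4, 2, 5, 3, 1};
        TopKCollector collector = new TopKCollector(3);
        for (int doc = 0; doc < matchedKeywords.length; doc++) {
            collector.collect(doc, matchedKeywords[doc]);
        }
        for (Hit h : collector.topHits()) {
            System.out.println("doc " + h.doc + " score " + h.score);
        }
    }
}
```

The heap keeps memory at O(k), but `collect` still runs once per matching document, which is why a very broad OR query stays slow even with a result limit of 100.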
On 20-Nov-07, at 8:51 PM, Tracy Flynn wrote:
I'm trying to find the right place to start in this community.
I recently posted a question in the thread on SOLR-236. In that
posting I mentioned that I was hoping to persuade my management to
move from a FAST installation to a SOLR-based one.
On 21-Nov-07, at 12:29 AM, climbingrose wrote:
The problem with this approach is that an MD5 hash is very sensitive: a
one-letter difference will generate a completely different hash. You
probably have to roll your own near-duplicate detection algorithm.
My advice is to have a look at existing literature on
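To make the sensitivity point concrete, the Java sketch below hashes two sentences that differ by one letter, then compares the same pair with word-shingle Jaccard similarity. Shingling plus Jaccard is one well-known near-duplicate measure, chosen here only for illustration; it is not necessarily what the literature referred to above recommends, and the shingle width (2) and sample sentences are arbitrary assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class NearDupDemo {
    /** Hex-encoded MD5 digest of the UTF-8 bytes of the text. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(text.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Set of overlapping runs of w consecutive lowercased words ("shingles"). */
    static Set<String> shingles(String text, int w) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Set<String> out = new HashSet<>();
        for (int i = 0; i + w <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + w)));
        }
        return out;
    }

    /** Jaccard similarity: size of intersection / size of union; 1.0 if both sets are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        String a = "the quick brown fox jumps over the lazy dog near the river bank";
        String b = "the quick brown fox jumps over the lazy cog near the river bank";
        // One changed letter yields an unrelated MD5 digest...
        System.out.println(md5Hex(a));
        System.out.println(md5Hex(b));
        // ...while most shingles are shared, so the similarity stays high.
        System.out.println(jaccard(shingles(a, 2), shingles(b, 2)));
    }
}
```

For these two sentences, 10 of the 14 distinct bigrams are shared, so two documents can be flagged as near-duplicates with a threshold on the similarity instead of an exact hash match.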
Actually, when I look at the error message, this has nothing to do with
memory.
The error message:
java.lang.OutOfMemoryError: unable to create new native thread
means that the OS cannot create any new native threads for this JVM. So the
limit you are running into is not the JVM memory.
I guess you
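One way to check this diagnosis is to watch how many threads the JVM is actually running. The Java sketch below uses the standard `java.lang.management` API to print live, daemon and peak thread counts next to the maximum heap. The assumption behind it is HotSpot's usual model, where each platform Java thread is backed by an OS thread; on Linux, for example, the per-user limit shown by `ulimit -u` also counts threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class ThreadReport {
    /** Number of live threads (daemon and non-daemon) in this JVM right now. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Highest live-thread count since the JVM started or the peak was last reset. */
    static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("live threads:   " + liveThreads());
        System.out.println("daemon threads: " + mx.getDaemonThreadCount());
        System.out.println("peak threads:   " + peakThreads());
        // The heap limit (-Xmx) is a separate budget from the OS limit on threads.
        System.out.println("max heap bytes: " + Runtime.getRuntime().maxMemory());
    }
}
```

If the peak count keeps climbing under load while the heap is far from full, that points at thread creation rather than heap size.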
Hi Otis,
Thanks for this. Are you using a flavor of Linux, and is it 64-bit? How
much heap are you giving your JVM?
Thanks again
Brendan
On Nov 21, 2007, at 2:03 AM, Otis Gospodnetic wrote:
Mike is right about the occasional slow-down, which appears as a
pause and is due to large Lucene ind
On Nov 21, 2007 11:06 AM, Chris Laux <[EMAIL PROTECTED]> wrote:
> Now when I reduce the size of caches (to a fraction of the default
> settings) and number of warming Searchers (to 2),
Set the max warming searchers to 1 to ensure that you never have more
than one warming at the same time.
> memo
Hi all,
I've been struggling with this problem for over a month now, and
although memory issues have been discussed often, I don't seem to be
able to find a fitting solution.
The index is only 1.5 GB, but memory use quickly fills the maximum
heap of 1 GB on a 2 GB machine. This then works
Evgeniy Strokin wrote:
Hello,
I have a document indexed with Solr. Originally it had only a few fields. I want
to add some more fields to the index later, based on the ID, but I don't want to
submit the original fields again. I use Solr 1.2, and I think there is no such
functionality yet. But I saw a feature here
https://iss
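Without such a feature, the usual workaround in Solr 1.2 is to resend the whole document: adding a document with an existing uniqueKey replaces the old one, so the client merges the original field values with the new ones and posts a complete `<add>` message. The Java sketch below builds that message; it assumes the client still has the original values (for example from stored fields), that all fields are single-valued, and the field names `id`, `title` and `category` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FullDocUpdate {
    /** Escapes the five XML special characters. */
    static String xmlEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Merges new fields over the original ones (new values win) into one Solr XML add message. */
    static String buildAdd(Map<String, String> original, Map<String, String> extra) {
        Map<String, String> merged = new LinkedHashMap<>(original);
        merged.putAll(extra);
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : merged.entrySet()) {
            sb.append("<field name=\"").append(xmlEscape(e.getKey())).append("\">")
              .append(xmlEscape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> original = new LinkedHashMap<>();
        original.put("id", "42");          // hypothetical uniqueKey field
        original.put("title", "Old title");
        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("category", "books");    // the field added later
        System.out.println(buildAdd(original, extra));
    }
}
```

The resulting message still has to be posted to the update handler and followed by a `<commit/>` before the change is visible to searchers.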
The duplicate detection mechanism in Nutch is quite primitive. I
think it uses an MD5 signature generated from the content of a field.
The generation algorithm is described here:
http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/crawl/TextProfileSignature.html.
The problem with this
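The quantisation effect discussed in this thread can be reproduced with a simplified Java re-implementation of the steps described in the TextProfileSignature Javadoc linked above: tokenize into lowercase letter/digit runs, drop short tokens, quantise frequencies relative to the most frequent token, drop tokens below the quantum, sort, and MD5 the resulting profile. This is a sketch, not Nutch's code; the constants are assumed to match the documented defaults, and the tie-break on token text is an addition made here to keep the ordering deterministic.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TextProfileSketch {
    // Assumed defaults, modeled on the Javadoc description (configurable in Nutch).
    static final int MIN_TOKEN_LEN = 2;
    static final float QUANT_RATE = 0.01f;

    /** Quantised profile: "token freq" lines, most frequent first, ties broken by token text. */
    static String profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder cur = new StringBuilder();
        // Tokens are runs of letters/digits, lowercased; the trailing space flushes the last token.
        for (char c : (text + " ").toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                if (cur.length() >= MIN_TOKEN_LEN) {
                    counts.merge(cur.toString(), 1, Integer::sum);
                }
                cur.setLength(0);
            }
        }
        int maxFreq = 0;
        for (int n : counts.values()) {
            maxFreq = Math.max(maxFreq, n);
        }
        // Quantum relative to the most frequent token, floored at 2 (or 1 if every token occurs once).
        int quant = Math.round(maxFreq * QUANT_RATE);
        if (quant < 2) {
            quant = maxFreq > 1 ? 2 : 1;
        }
        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int q = (e.getValue() / quant) * quant; // round down to a multiple of the quantum
            if (q >= quant) {                       // tokens rarer than the quantum are dropped
                kept.add(new AbstractMap.SimpleEntry<>(e.getKey(), q));
            }
        }
        kept.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : kept) {
            if (out.length() > 0) out.append('\n');
            out.append(e.getKey()).append(' ').append(e.getValue());
        }
        return out.toString();
    }

    /** Hex-encoded MD5 of the profile. */
    static String signature(String text) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(profile(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = "apple apple apple banana banana cherry";
        String b = "apple apple apple banana banana durian"; // rare token changed
        String c = "apple apple apple mango mango cherry";   // token at the quantum changed
        System.out.println(profile(a));
        System.out.println(signature(a).equals(signature(b)));
        System.out.println(signature(a).equals(signature(c)));
    }
}
```

With these inputs the quantum is 2, so swapping the once-only token `cherry` for `durian` leaves the profile and signature unchanged, while replacing `banana` (frequency 2) changes the signature completely, which is the behaviour described at the top of the thread.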
Thanks a lot for your responses! They were all very helpful!
On Nov 20, 2007, at 5:52 PM, Norberto Meijome wrote:
On Tue, 20 Nov 2007 16:26:27 -0600
Alexander Wallace <[EMAIL PROTECTED]> wrote:
Interesting, this ALL MASTERS mode... I guess you don't do any
replication then...
correct
In t
HI Otis,
Thanks for the reply. I am using a pretty "vanilla" approach right
now, and it's taking about 30 hours to build an index of about 5.5 GB.
Can you please tell me some of the changes you made to optimize
the indexing process?
Thanks
Brendan
On Nov 21, 2007, at 2:27 AM, Otis Gos
On Tue, 2007-11-20 at 22:50 -0800, Otis Gospodnetic wrote:
> Phillip,
>
> I won't go into details, but I'll point out that the Java compiler is called
> javac and if memory serves me well, it is defined in one of Jetty's XML
> config files in its etc/ dir. The java compiler is used to compile J
Thanks for the info Cuong!
Regards,
Rishabh
On Nov 21, 2007 1:59 PM, climbingrose <[EMAIL PROTECTED]> wrote:
> The duplicate detection mechanism in Nutch is quite primitive. I
> think it uses an MD5 signature generated from the content of a field.
> The generation algorithm is described here:
>
Make sure you have the JDK installed, not just the JRE. Also try setting
JAVA_HOME to the JDK directory.
apt-get install sun-java5-jdk
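A quick way to see which Java installation a JVM actually runs from, and whether it is a JDK, is the small Java program below. It prints `java.home` and the `JAVA_HOME` environment variable and asks `javax.tools.ToolProvider` for the system compiler, which returns null on a plain JRE. Two caveats: `javax.tools` only exists from Java 6 on (so this cannot test the Sun 5 JDK), and the servlet container's JSP engine may locate its compiler by a different mechanism, so a positive result here does not guarantee Jetty will find javac.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class CompilerCheck {
    /** True if this JVM exposes the javax.tools system compiler (a JDK), false on a bare JRE. */
    static boolean hasSystemCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    public static void main(String[] args) {
        // Where this JVM was started from, which may differ from JAVA_HOME.
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println("JAVA_HOME = " + System.getenv("JAVA_HOME"));
        System.out.println("system compiler available: " + hasSystemCompiler());
    }
}
```

Running it with the same `java` binary that starts Jetty shows whether that binary belongs to a JDK or only to a JRE.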
On Nov 21, 2007 5:50 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Phillip,
>
> I won't go into details, but I'll point out that the Java compiler is called
> javac and if mem