Hi again,
In the meantime I discovered jmap (I'm not a Java programmer) and found that
all the memory was being used up by String and char[] objects.
The Lucene docs have the following to say on sorting memory use:
> For String fields, the cache is larger: in addition to the above array
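(For reference, the class histogram that surfaces these String and char[]
counts can be obtained from a running JVM with the stock JDK tool:

    jmap -histo <pid>

where <pid> is the servlet container's process id; the output lists instance
counts and shallow bytes per class, sorted by size.)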
Have you set up your Analyzers, etc. so they correspond to the exact
ones that you were using in Lucene? Under the Solr Admin you can try
the analysis tool to see how your index and queries are treated. What
happens if you do a *:* query from the Admin query screen?
If your index is reason
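A quick way to run that sanity check outside the Admin screen (assuming the
default example port and webapp path) is to hit the select handler directly:

    http://localhost:8983/solr/select?q=*:*&rows=10

If numFound is 0 even for *:*, the problem is the index Solr is pointed at
rather than any analyzer mismatch.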
Is there any specific reason why the CJK analyzers in Solr were chosen to be
n-gram based rather than morphological, given that morphological analysis
(which is apparently what Google uses) is considered more effective than the
n-gram approach?
Regards,
Eswar
On Nov 27, 2007 7:57 AM, Eswar K <[EMAIL PROTECTED]> wrote:
Eswar,
What type of morphological analysis do you suspect (or know) that
Google does on East Asian text? I don't think you can treat the three
languages in the same way here. Japanese has multi-morphemic words,
but Chinese doesn't really.
jds
On Nov 27, 2007 11:54 AM, Eswar K <[EMAIL PROTECTED]> wrote:
Hi folks,
working on a closed-source project for an IP-concerned company is not
always fun ... we combined Solr with JAMon
(http://jamonapi.sourceforge.net/) to keep an eye on the query times, and
this might be of general interest:
+) JAMon comes with a ready-to-use ServletFilter
+) we extended
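For anyone curious what the JAMon side looks like in code, its core API is
just a start/stop pair around the work being timed; a minimal sketch (the
monitor label is a hypothetical name, not our production setup):

    import com.jamonapi.Monitor;
    import com.jamonapi.MonitorFactory;

    // Time one query round-trip; JAMon aggregates hits, avg/min/max, etc.
    Monitor mon = MonitorFactory.start("solr.select");
    try {
        // ... execute the Solr request here ...
    } finally {
        mon.stop();  // accumulated stats can then be rendered by JAMon's reports
    }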
I'd be interested in seeing more logging in the admin section! I saw
that there is QPS in 1.3, which is great, but it'd be wonderful to see
more.
--Matthew Runo
On Nov 27, 2007, at 9:18 AM, Siegfried Goeschl wrote:
Hi folks,
working on a closed-source project for an IP-concerned company i
On 27-Nov-07, at 8:54 AM, Eswar K wrote:
Is there any specific reason why the CJK analyzers in Solr were chosen to be
n-gram based rather than morphological, given that morphological analysis
(which is apparently what Google uses) is considered more effective than the
n-gram approach?
The CJK analyzers
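To make concrete what the n-gram approach produces, here is a small sketch
using the contrib CJKAnalyzer with the Lucene 2.x-era TokenStream API (the
input string is just an illustrative placeholder):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.cjk.CJKAnalyzer;

    public class CJKDemo {
        public static void main(String[] args) throws Exception {
            // CJKAnalyzer emits overlapping bigrams for runs of CJK
            // characters: four input characters yield three 2-char tokens.
            TokenStream ts = new CJKAnalyzer()
                    .tokenStream("text", new StringReader("中华人民"));
            Token tok;
            while ((tok = ts.next()) != null) {
                System.out.println(tok.termText());
            }
        }
    }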
Is it possible to deploy solr.war once to Tomcat (which is on top of an
Apache HTTP Server in my configuration) which then can manage two Solr
indexes?
I have to make two different Solr indexes (both with different schema.xml
files) accessible over the web. If the above architecture is not possible
Dictionaries are surprisingly expensive to build and maintain and
bi-gram is surprisingly effective for Chinese. See this paper:
http://citeseer.ist.psu.edu/kwok97comparing.html
I expect that n-gram indexing would be less effective for Japanese
because it is an inflected language. Korean is ev
Have you looked at this page on the wiki:
http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac
That should get you started.
-Chris
Jörg Kiegeland wrote:
> Is it possible to deploy solr.war once to Tomcat (which is on top of an
> Apache HTTP Server in my configuration
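The short version of that wiki recipe: deploy solr.war once, then point two
Tomcat context fragments at it, each with its own JNDI solr/home (paths and
context names below are hypothetical):

    <!-- conf/Catalina/localhost/solr1.xml -->
    <Context docBase="/opt/solr.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String"
                   value="/opt/solr/index1" override="true"/>
    </Context>

    <!-- conf/Catalina/localhost/solr2.xml -->
    <Context docBase="/opt/solr.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String"
                   value="/opt/solr/index2" override="true"/>
    </Context>

Each context then picks up the schema.xml and solrconfig.xml under its own
solr/home, so the two indexes can differ freely.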
WordNet itself is English-only. There are various ontology projects for
it.
http://www.globalwordnet.org/ is a separate world language database
project. I found it at the bottom of the WordNet wikipedia page. Thanks
for starting me on the search!
Lance
-----Original Message-----
From: Eswar K [
Hi,
What is the best way to implement a related search like CNET with SOLR ?
Ex.: Searching for "tv" the related searches are: lcd tv, lcd, hdtv,
vizio, plasma tv, panasonic, gps, plasma
Thanks,
William.
Take a look at this thread
http://www.gossamer-threads.com/lists/lucene/java-user/54996
There was a need to get all related topics for any selected topic. I used the
Lucene sandbox WordNet project to get all synonyms of the user-selected
topics. I am not sure whether the WordNet project w
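For reference, the index built by the sandbox WordNet tool (Syns2Index) is an
ordinary Lucene index with a "word" field and one or more "syn" fields per
document, so lookups are plain term queries. A minimal sketch against the
Lucene 2.x API (the index path is a placeholder, and the field names are my
recollection of the sandbox code, so treat them as an assumption):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class SynLookupDemo {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/path/to/wordnet-index");
            Hits hits = searcher.search(new TermQuery(new Term("word", "tv")));
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                String[] syns = doc.getValues("syn");
                if (syns == null) continue;
                for (String syn : syns) {
                    System.out.println(syn);  // candidate related terms
                }
            }
            searcher.close();
        }
    }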
I couldn't tell if this was asked before, but I want to perform a Nutch crawl
without any Solr plugin, which will simply write to some index directory, and
then ideally use Solr for searching. I am assuming this is possible?
--
Berlin Brown
[berlin dot brown at gmail dot com]
htt
On Nov 27, 2007, at 6:08 PM, bbrown wrote:
I couldn't tell if this was asked before, but I want to perform a
Nutch crawl without any Solr plugin, which will simply write to some
index directory, and then ideally use Solr for searching. I am
assuming this is possible?
Using WordNet may require some type of disambiguation approach;
otherwise you can end up w/ a lot of "synonyms". I also would look
into how much coverage there is for non-English languages.
If you have the resources, you may be better off developing/finding
your own synonym/concept
On Tue, 27 Nov 2007 18:18:16 +0100
Siegfried Goeschl <[EMAIL PROTECTED]> wrote:
> Hi folks,
>
> working on a closed-source project for an IP-concerned company is not
> always fun ... we combined Solr with JAMon
> (http://jamonapi.sourceforge.net/) to keep an eye on the query times, and
> this m
On Tue, 27 Nov 2007 18:12:13 -0500
Brian Whitman <[EMAIL PROTECTED]> wrote:
>
> On Nov 27, 2007, at 6:08 PM, bbrown wrote:
>
> > I couldn't tell if this was asked before, but I want to perform a
> > Nutch crawl without any Solr plugin, which will simply write to some
> > index directory.
I only glanced at Sami's post recently, and what I think I saw there is
something different. In other words, what Sami described is not a Solr
instance pointing to a Nutch-built Lucene index, but rather an app that reads
the appropriate Nutch/Hadoop files with fetched content and posts the read
content to Solr.
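The posting half of such an app is straightforward with the solrj client; a
minimal sketch (the URL and field names are placeholders, and the Nutch
segment-reading side is elided):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class PostToSolr {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");          // unique key per schema.xml
            doc.addField("text", "fetched body"); // content read from Nutch
            server.add(doc);
            server.commit(); // make the new document searchable
        }
    }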
Eswar - I'm interested in the answer to John's question, too! :)
As for why n-grams - probably because they are free and simple, while
dictionary-based stuff would likely not be free (are there free dictionaries
for C or J or K??), and a morphological analyzer would be a bit more work.
That sa
On Nov 28, 2007, at 1:24 AM, Otis Gospodnetic wrote:
I only glanced at Sami's post recently and what I think I saw there
is something different. In other words, what Sami described is not
a Solr instance pointing to a Nutch-built Lucene index, but rather
an app that reads the appropriate
For what it's worth I worked on indexing and searching a *massive* pile of
data, a good portion of which was in CJ and some K. The n-gram approach was
used for all 3 languages, and the quality of search results, including
highlighting, was evaluated and okayed by native speakers of these languages.
James - can you elaborate on why you think the n-gram approach is not good for
Chinese?
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: James liu <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 26, 2007 8:51:2
Eswar,
I wouldn't worry about the performance of those CJK analyzers too much - they
are fairly trivial. The StandardAnalyzer is slower, for example. I recently
indexed circa 20MM large docs on an 8-core, 8 GB RAM box in 10 hours - 550
docs/second. No CJK, just English.
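(Sanity check on that rate: 20,000,000 docs / (10 h * 3,600 s/h) =
20,000,000 / 36,000 ≈ 555 docs/second, consistent with the 550 quoted above.)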
Otis
--
Sematext -- htt
John,
There were two parts to my question,
1) n-gram vs morphological analyzer - This was based on what I read in a few
places that rate morphological analysis higher than n-grams. An example
being (
http://www.basistech.com/knowledge-center/products/N-Gram-vs-morphological-analysis.pdf).
My inte
Otis,
Thanks for the information, we will check this out.
Regards,
Eswar
On Nov 28, 2007 12:20 PM, Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:
> Eswar,
>
> I wouldn't worry about the performance of those CJK analyzers too much -
> they are fairly trivial. The StandardAnalyzer is slower, for ex
Eswar - I can answer the Google question. Actually, you are pointing to it in
1) :)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Eswar K <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, November 28, 2007 2:21:40 AM
Subje
Not sure how up to date this is: http://www.basistech.com/customers/
I've only used their C++ products, which generally worked well for
web search with a few exceptions. According to
http://www.basistech.com/knowledge-center/chinese/chinese-language-analysis.pdf,
they provide Java APIs as