Re: Memory use with sorting problem

2007-11-27 Thread Chris Laux
Hi again, in the meantime I discovered the use of jmap (I'm not a Java programmer) and found that all the memory was being used up by String and char[] objects. The Lucene docs have the following to say on sorting memory use: > For String fields, the cache is larger: in addition to the above arr

Re: Inconsistent results in Solr Search with Lucene Index

2007-11-27 Thread Grant Ingersoll
Have you set up your Analyzers, etc. so they correspond to the exact ones that you were using in Lucene? Under the Solr Admin you can try the analysis tool to see how your index and queries are treated. What happens if you do a *:* query from the Admin query screen? If your index is reason

Re: CJK Analyzers for Solr

2007-11-27 Thread Eswar K
Is there any specific reason why the CJK analyzers in Solr were chosen to be n-gram based instead of being a morphological analyzer, which is kind of what is implemented in Google, as it is considered to be more effective than the n-gram ones? Regards, Eswar On Nov 27, 2007 7:57 AM, Eswar K <[EMAIL PROTE

Re: CJK Analyzers for Solr

2007-11-27 Thread John Stewart
Eswar, What type of morphological analysis do you suspect (or know) that Google does on East Asian text? I don't think you can treat the three languages in the same way here. Japanese has multi-morphemic words, but Chinese doesn't really. jds On Nov 27, 2007 11:54 AM, Eswar K <[EMAIL PROTECTED

Combining SOLR and JAMon to monitor query execution times from a browser

2007-11-27 Thread Siegfried Goeschl
Hi folks, working on a closed source project for an IP concerned company is not always fun ... we combined SOLR with JAMon (http://jamonapi.sourceforge.net/) to keep an eye on the query times and this might be of general interest +) JAMon comes with a ready-to-use ServletFilter +) we extende

Re: Combining SOLR and JAMon to monitor query execution times from a browser

2007-11-27 Thread Matthew Runo
I'd be interested in seeing more logging in the admin section! I saw that there is QPS in 1.3, which is great, but it'd be wonderful to see more. --Matthew Runo On Nov 27, 2007, at 9:18 AM, Siegfried Goeschl wrote: Hi folks, working on a closed source project for an IP concerned company i

Re: CJK Analyzers for Solr

2007-11-27 Thread Mike Klaas
On 27-Nov-07, at 8:54 AM, Eswar K wrote: Is there any specific reason why the CJK analyzers in Solr were chosen to be n-gram based instead of being a morphological analyzer, which is kind of what is implemented in Google, as it is considered to be more effective than the n-gram ones? The CJK analy

two solr instances?

2007-11-27 Thread Jörg Kiegeland
Is it possible to deploy solr.war once to Tomcat (which is on top of an Apache HTTP Server in my configuration) which then can manage two Solr indexes? I have to make two different Solr indexes (both have different schema.xml files) accessible over the web. If the above architecture is not po

Re: CJK Analyzers for Solr

2007-11-27 Thread Walter Underwood
Dictionaries are surprisingly expensive to build and maintain and bi-gram is surprisingly effective for Chinese. See this paper: http://citeseer.ist.psu.edu/kwok97comparing.html I expect that n-gram indexing would be less effective for Japanese because it is an inflected language. Korean is ev
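
For readers unfamiliar with the bi-gram approach mentioned above, here is a minimal sketch of overlapping character bigrams over a run of CJK text. It is only an illustration of the idea, not the code of the Solr/Lucene CJK analyzers; the function name `cjk_bigrams` is made up for this example, and it assumes the input is already a single run of CJK characters.

```python
def cjk_bigrams(text):
    """Split a run of CJK characters into overlapping character bigrams.

    Hypothetical helper for illustration only; it is not the Lucene
    tokenizer and does no script detection or normalization.
    """
    # Drop whitespace so bigrams are not formed across spaces.
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return []
    if len(chars) == 1:
        # A lone character is kept as a single-character token.
        return [chars[0]]
    # Each token is one character plus its right-hand neighbour.
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


if __name__ == "__main__":
    print(cjk_bigrams("中文分词"))
```

No dictionary is needed: every adjacent pair of characters becomes an index term, so a two-character query word matches wherever that pair occurs in the text.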

Re: two solr instances?

2007-11-27 Thread Chris Laux
Have you looked at this page on the wiki: http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac That should get you started. -Chris Jörg Kiegeland wrote: > Is it possible to deploy solr.war once to Tomcat (which is on top of an > Apache HTTP Server in my configura

RE: LSA Implementation

2007-11-27 Thread Norskog, Lance
WordNet itself is English-only. There are various ontology projects for it. http://www.globalwordnet.org/ is a separate world language database project. I found it at the bottom of the WordNet Wikipedia page. Thanks for starting me on the search! Lance -----Original Message----- From: Eswar K [

Related Search

2007-11-27 Thread William Silva
Hi, What is the best way to implement a related search like CNET with SOLR ? Ex.: Searching for "tv" the related searches are: lcd tv, lcd, hdtv, vizio, plasma tv, panasonic, gps, plasma Thanks, William.

Re: Related Search

2007-11-27 Thread Cool Coder
Take a look at this thread: http://www.gossamer-threads.com/lists/lucene/java-user/54996 There was a need to get all related topics for any selected topic. I used the Lucene sandbox WordNet project to get all synonyms of user-selected topics. I am not sure whether wordnet project w

Solr and nutch, for reading a nutch index

2007-11-27 Thread bbrown
I couldn't tell if this was asked before. But I want to perform a nutch crawl without any solr plugin which will simply write to some index directory. And then ideally I would like to use solr for searching? I am assuming this is possible? -- Berlin Brown [berlin dot brown at gmail dot com] htt

Re: Solr and nutch, for reading a nutch index

2007-11-27 Thread Brian Whitman
On Nov 27, 2007, at 6:08 PM, bbrown wrote: I couldn't tell if this was asked before. But I want to perform a nutch crawl without any solr plugin which will simply write to some index directory. And then ideally I would like to use solr for searching? I am assuming this is possible?

Re: LSA Implementation

2007-11-27 Thread Grant Ingersoll
Using Wordnet may require having some type of disambiguation approach, otherwise you can end up w/ a lot of "synonyms". I also would look into how much coverage there is for non-English languages. If you have the resources, you may be better off developing/finding your own synonym/concept

Re: Combining SOLR and JAMon to monitor query execution times from a browser

2007-11-27 Thread Norberto Meijome
On Tue, 27 Nov 2007 18:18:16 +0100 Siegfried Goeschl <[EMAIL PROTECTED]> wrote: > Hi folks, > > working on a closed source project for an IP concerned company is not > always fun ... we combined SOLR with JAMon > (http://jamonapi.sourceforge.net/) to keep an eye on the query times and > this m

Re: Solr and nutch, for reading a nutch index

2007-11-27 Thread Norberto Meijome
On Tue, 27 Nov 2007 18:12:13 -0500 Brian Whitman <[EMAIL PROTECTED]> wrote: > > On Nov 27, 2007, at 6:08 PM, bbrown wrote: > > > I couldn't tell if this was asked before. But I want to perform a > > nutch crawl > > without any solr plugin which will simply write to some index > > directory.

Re: Solr and nutch, for reading a nutch index

2007-11-27 Thread Otis Gospodnetic
I only glanced at Sami's post recently and what I think I saw there is something different. In other words, what Sami described is not a Solr instance pointing to a Nutch-built Lucene index, but rather an app that reads the appropriate Nutch/Hadoop files with fetched content and posts the read

Re: CJK Analyzers for Solr

2007-11-27 Thread Otis Gospodnetic
Eswar - I'm interested in the answer to John's question, too! :) As for why n-grams - probably because they are free and simple, while dictionary-based stuff would likely not be free (are there free dictionaries for C or J or K??), and a morphological analyzer would be a bit more work. That sa

Re: Solr and nutch, for reading a nutch index

2007-11-27 Thread Brian Whitman
On Nov 28, 2007, at 1:24 AM, Otis Gospodnetic wrote: I only glanced at Sami's post recently and what I think I saw there is something different. In other words, what Sami described is not a Solr instance pointing to a Nutch-built Lucene index, but rather an app that reads the appropriate

Re: CJK Analyzers for Solr

2007-11-27 Thread Otis Gospodnetic
For what it's worth I worked on indexing and searching a *massive* pile of data, a good portion of which was in CJ and some K. The n-gram approach was used for all 3 languages and the quality of search results, including highlighting was evaluated and okay-ed by native speakers of these languag

Re: CJK Analyzers for Solr

2007-11-27 Thread Otis Gospodnetic
James - can you elaborate on why you think the n-gram approach is not good for Chinese? Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch ----- Original Message ---- From: James liu <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Monday, November 26, 2007 8:51:2

Re: CJK Analyzers for Solr

2007-11-27 Thread Otis Gospodnetic
Eswar, I wouldn't worry about the performance of those CJK analyzers too much - they are fairly trivial. The StandardAnalyzer is slower, for example. I recently indexed circa 20 million large docs on an 8-core, 8 GB RAM box in 10 hours - 550 docs/second. No CJK, just English. Otis -- Sematext -- htt
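
As a quick sanity check on the throughput figure quoted above, the arithmetic can be sketched as follows. The helper name `docs_per_second` is made up for this example, and it assumes the "about 20 million docs in 10 hours" numbers from the post are taken at face value.

```python
def docs_per_second(total_docs, hours):
    """Average indexing rate in documents per second."""
    return total_docs / (hours * 3600)


if __name__ == "__main__":
    # Figures from the post: about 20 million docs in 10 hours.
    print(round(docs_per_second(20_000_000, 10)))
```

This comes out slightly above the roughly 550 docs/second mentioned in the post, which is consistent given that the document count is itself approximate.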

Re: CJK Analyzers for Solr

2007-11-27 Thread Eswar K
John, There were two parts to my question, 1) n-gram vs morphological analyzer - This was based on what I read in a few places which rate morphological analysis higher than n-gram. An example being (http://www.basistech.com/knowledge-center/products/N-Gram-vs-morphological-analysis.pdf). My inte

Re: CJK Analyzers for Solr

2007-11-27 Thread Eswar K
Otis, Thanks for the information, we will check this out. Regards, Eswar On Nov 28, 2007 12:20 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Eswar, > > I wouldn't worry about the performance of those CJK analyzers too much - > they are fairly trivial. The StandardAnalyzer is slower, for ex

Re: CJK Analyzers for Solr

2007-11-27 Thread Otis Gospodnetic
Eswar - I can answer the Google question. Actually, you are pointing to it in 1) :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch ----- Original Message ---- From: Eswar K <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday, November 28, 2007 2:21:40 AM Subje

Re: CJK Analyzers for Solr

2007-11-27 Thread Luke Lu
Not sure how up to date this is: http://www.basistech.com/customers/ I've only used their C++ products, which generally worked well for web search with a few exceptions. According to http://www.basistech.com/knowledge-center/chinese/chinese-language-analysis.pdf , they provide Java APIs as