about analyzer and index

2006-08-27 Thread James liu

Lucene has ChineseAnalyzer and CJKAnalyzer, so I can search Chinese
keywords with them.

Does Solr have them? If not, how can I add them?


If I use PHP+MySQL to build data.xml, do I then run post.sh data.xml? Is
that the only way to index?


I remember that I must use the same analyzer to index and search when I
use Lucene 2.0.

What is Solr's analyzer, and how can I plug in a user-defined one (if it
doesn't support Chinese)?


Re: about analyzer and index

2006-08-27 Thread Erik Hatcher


On Aug 27, 2006, at 3:27 AM, James liu wrote:

> Lucene has ChineseAnalyzer and CJKAnalyzer, so I can search Chinese
> keywords with them.
>
> Does Solr have them? If not, how can I add them?


Those analyzers are not part of the core Solr distribution, but you  
can add them easily by getting the JAR file from Lucene (it'll be  
called lucene-analyzers-.jar) in the Lucene binary  
downloads.  You'll then need to adjust your schema.xml to point at  
the analyzer you wish to use, something like this:



  <analyzer class="org.apache.lucene.analysis.snowball.SnowballAnalyzer"/>
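Since the point of swapping analyzers here is Chinese search, it may help to see how the two Lucene analyzers from the question split text differently. The following is a toy Python emulation, not Lucene code: ChineseAnalyzer emits one token per Chinese character, while CJKAnalyzer emits overlapping two-character bigrams. The helper names (`unigram_tokens`, `bigram_tokens`, `matches`) are hypothetical, and the sketch assumes only the basic CJK Unified Ideographs block counts as Chinese text. It also illustrates why the same analyzer has to be used at index time and at query time.

```python
import re

# Simplification (assumption): only U+4E00..U+9FFF counts as CJK text here;
# the real analyzers also handle Latin letters, digits, punctuation, etc.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")

def unigram_tokens(text):
    """Toy stand-in for ChineseAnalyzer: one token per CJK character."""
    return [ch for run in CJK_RUN.findall(text) for ch in run]

def bigram_tokens(text):
    """Toy stand-in for CJKAnalyzer: overlapping bigrams within each CJK run.
    A run of a single character is kept as a one-character token."""
    tokens = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens

def matches(doc, query, index_analyzer, query_analyzer):
    """True if every query term is among the document's indexed terms."""
    index_terms = set(index_analyzer(doc))
    query_terms = query_analyzer(query)
    return bool(query_terms) and all(t in index_terms for t in query_terms)

doc = "中文搜索引擎"
print(unigram_tokens(doc))  # ['中', '文', '搜', '索', '引', '擎']
print(bigram_tokens(doc))   # ['中文', '文搜', '搜索', '索引', '引擎']
print(matches(doc, "搜索", bigram_tokens, bigram_tokens))   # True
print(matches(doc, "搜索", unigram_tokens, bigram_tokens))  # False: analyzers differ
```

The last line fails to match only because the document was indexed with one tokenization and queried with another, which is the mismatch the "same analyzer for index and search" rule guards against.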



> If I use PHP+MySQL to build data.xml, do I then run post.sh data.xml?
> Is that the only way to index?


No, not at all.  Solr works off XML over HTTP, which is trivial to do
from PHP and other environments.  Check out the wiki here:
wiki.apache.org/solr/SolPHP
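To make "XML over HTTP" concrete, here is a minimal Python sketch of what a PHP+MySQL script would do instead of writing data.xml and calling post.sh: turn each database row into a `<doc>` inside an `<add>` message and POST it to Solr's update handler. The update URL `http://localhost:8983/solr/update` is an assumption based on the example setup, and the function names and row columns are hypothetical. Added documents only become searchable after a `<commit/>` is sent.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

def build_add_xml(rows):
    """Build a Solr <add> message: one <doc> per row, one <field> per column."""
    parts = ["<add>"]
    for row in rows:
        parts.append("<doc>")
        for name, value in row.items():
            # quoteattr adds the surrounding quotes; escape handles &, <, >
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

def post_xml(xml_text, url="http://localhost:8983/solr/update"):
    """POST an XML message to the update handler (URL is an assumption)."""
    req = urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Rows as they might come back from a MySQL query (hypothetical columns).
rows = [{"id": 1, "title": "Solr & PHP"}, {"id": 2, "title": "中文 analyzer"}]
print(build_add_xml(rows))

# Against a running Solr instance (not executed here):
# post_xml(build_add_xml(rows))
# post_xml("<commit/>")
```

Building the message in memory and posting it directly avoids the intermediate file entirely; batching many rows into one `<add>` keeps the number of HTTP requests down.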


Erik





Re: Possible bug in copyField

2006-08-27 Thread jason rutherglen
By looking at what is stored.  Has this worked for others?

----- Original Message -----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org; jason rutherglen <[EMAIL PROTECTED]>
Sent: Friday, August 25, 2006 6:35:43 PM
Subject: Re: Possible bug in copyField

On 8/25/06, jason rutherglen <[EMAIL PROTECTED]> wrote:
> When doing a copyField into a text field that is supposed to be stemmed I'm 
> not seeing the stemming occur.

How did you determine that stemming didn't occur?

-Yonik






Re: Possible bug in copyField

2006-08-27 Thread Chris Hostetter

: By looking at what is stored.  Has this worked for others?

The "stored" value of a field is always the pre-analyzed text. That's why
the stored values in your "text" fields still have uppercase characters
and stop words.

What matters is whether the "indexed" terms of your "text_stem" field
are really stemmed.
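The stored-versus-indexed distinction can be sketched with a toy model in Python. This is not Solr code: the analysis chain (lowercase, drop stop words, stem) is a hypothetical stand-in, and `toy_stem` is a crude suffix stripper, not the Porter/Snowball stemmer. It assumes, as in Solr, that copyField copies the raw source value and the destination field's analyzer runs on that copy.

```python
STOP_WORDS = {"the", "a", "and", "of"}

def toy_stem(word):
    """Hypothetical crude suffix stripper, NOT the Porter/Snowball stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text):
    """Toy analysis chain: lowercase, drop stop words, stem."""
    words = text.lower().split()
    return [toy_stem(w) for w in words if w not in STOP_WORDS]

def index_field(value, stored=True):
    """A field keeps its raw value (if stored) and, separately, its analyzed terms."""
    return {"stored": value if stored else None, "indexed": analyze(value)}

source = "The Runners and the Jumping Dogs"
# copyField hands the raw source value to the destination field,
# whose analyzer then produces the indexed terms.
text_stem = index_field(source)
print(text_stem["stored"])   # The Runners and the Jumping Dogs
print(text_stem["indexed"])  # ['runner', 'jump', 'dog']

# A query on the stemmed field is analyzed the same way before matching.
query_terms = analyze("dogs")
print(all(t in text_stem["indexed"] for t in query_terms))  # True
```

Looking only at the stored value would suggest nothing was stemmed, even though the indexed terms are stemmed and the query matches.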

I certainly haven't noticed this problem. Using the fields/types you
mentioned before, do you have an example of a doc you've indexed and
expected to get back from a stemmed query, but that wasn't actually returned?
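One way to produce such an example is to index a doc containing one form of a word (say "runs") and then query the stemmed field for a different inflection (say "running"); with an English stemmer both should reduce to the same term. The small Python helper below only builds that query URL. The base URL `http://localhost:8983/solr/select` and the `fl=id` choice are assumptions for a default example setup, and the function name is hypothetical.

```python
from urllib.parse import urlencode

def stem_check_url(term, field="text_stem",
                   base="http://localhost:8983/solr/select"):
    """Build a URL that searches a single field for a single term.
    The base URL is an assumption; adjust it to your Solr install."""
    params = {"q": f"{field}:{term}", "fl": "id"}
    return f"{base}?{urlencode(params)}"

print(stem_check_url("running"))
# http://localhost:8983/solr/select?q=text_stem%3Arunning&fl=id
```

If the doc comes back for the stemmed field but not for the unstemmed `text` field, stemming is working; if it comes back for neither, that is the concrete case worth reporting.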




-Hoss