Indexing Japanese & English

Paul Clegg Thu, 07 Feb 2008 10:36:04 -0800

I hate asking stupid questions immediately after joining a mailing list, but
I'm in a bit of a pinch here.


 

I'm using Solr/Tomcat for a Ruby on Rails project (acts_as_solr) and I've
had a lot of success getting it working -- for English.  The problem I'm
running into is that our primary customers are actually Japanese.

 

I've searched around and found the thread back in June about
using Lucene's CJKAnalyzer and CJKTokenizer, but apparently I need to write
my own factory or something.  It looks like it's only three lines of Java
code, and I can cut & paste with the best of them.

 

Here's the problem:  I know zip, zilch, zero about Java.  I just hate the
language with an absolute passion.  The reason I went with Solr (besides the
fact it's pretty much the only real game going) is that I could avoid the
Java parts by directly dealing with its XML, JSON and Ruby interfaces.

 

So I'm wondering if there are any "Adding CJKTokenizer to Solr for Dummies"
guides out there someone can point me to, to tell me, pretty much
step-by-step, what I need to do to get this configured.  I saw something
about unpacking the solr.war and repacking it, but, since I know dinkus
about Java, that really didn't mean a whole lot to me, even though I'm
guessing it's probably a grand total of four commands at the unix prompt.
:)

 

.Paul

 

Paul Clegg, Principal Software Engineer
My Digital Life, Inc. (www.mydl.com)
NetService Ventures Group (www.nsv.com)
2108 Sand Hill Road, Menlo Park, CA 94025
Email:  [EMAIL PROTECTED]
Cell: 650-619-1220
