Hi Sergey,

I've opened an issue to add a maxTokenLength param to the 
StandardTokenizerFactory configuration:

        https://issues.apache.org/jira/browse/SOLR-2188

I'll work on it this weekend.

Are you using Solr 1.4.1?  I ask because of your mention of Lucene 2.9.3.  I'm 
not sure there will ever be a Solr 1.4.2 release.  I plan on targeting Solr 3.1 
and 4.0 for the SOLR-2188 fix.
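Once that lands, I'd expect the schema configuration to look something like the sketch below (the maxTokenLength attribute is what SOLR-2188 proposes; the exact name and syntax could still change before release, and the field type name here is just an example):

```xml
<!-- Hypothetical usage of the proposed SOLR-2188 parameter -->
<fieldType name="text_long" class="solr.TextField">
  <analyzer>
    <!-- maxTokenLength would override Lucene's default of 255 -->
    <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="1000000"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```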

I'm not sure why you didn't get the results you wanted with your Lucene hack - 
is it possible you have other Lucene jars in your Solr classpath?

Steve

> -----Original Message-----
> From: Sergey Bartunov [mailto:sbos....@gmail.com]
> Sent: Friday, October 22, 2010 12:08 PM
> To: solr-user@lucene.apache.org
> Subject: How to index long words with StandardTokenizerFactory?
> 
> I'm trying to force Solr to index words whose length is more than 255
> characters (this constant is DEFAULT_MAX_TOKEN_LENGTH in Lucene's
> StandardAnalyzer.java) using StandardTokenizerFactory as a 'filter' tag
> in the schema configuration XML. Specifying the maxTokenLength attribute
> doesn't work.
> 
> I tried a dirty hack: I downloaded the lucene-core-2.9.3 source,
> changed DEFAULT_MAX_TOKEN_LENGTH to 1000000, built it into a jar,
> and replaced the original lucene-core jar in Solr's /lib. But it
> seems to have had no effect.
