Here are all the files: http://rghost.net/3016862
1) StandardAnalyzer.java, StandardTokenizer.java - patched files from lucene-2.9.3
2) I patch these files and build Lucene by typing "ant"
3) I replace lucene-core-2.9.3.jar in solr/lib/ with the lucene-core-2.9.3-dev.jar I just compiled
4) then I run "ant compile" and "ant dist" in the solr folder
5) after that I rebuild solr/example/webapps/solr.war with my new Solr and lucene-core jars
6) I put my schema.xml in solr/example/solr/conf/
7) then I run "java -jar start.jar" in solr/example
8) I index big_post.xml
9) I try to find the document with "curl http://localhost:8983/solr/select?q=body:big*" (big_post.xml contains a long word bigaaaaa...aaaa)
10) Solr returns nothing

On 23 October 2010 02:43, Steven A Rowe <sar...@syr.edu> wrote:
> Hi Sergey,
>
> What does your ~34kb field value look like? Does StandardTokenizer think
> it's just one token?
>
> What doesn't work? What happens?
>
> Steve
>
>> -----Original Message-----
>> From: Sergey Bartunov [mailto:sbos....@gmail.com]
>> Sent: Friday, October 22, 2010 3:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to index long words with StandardTokenizerFactory?
>>
>> I'm using Solr 1.4.1. I've now succeeded in replacing the lucene-core
>> jar, but maxTokenLength seems to be applied in a very strange way. For
>> me it's currently set to 1024*1024, but I couldn't index a field of
>> just ~34kb. I understand that it's a little weird to index such big
>> data, but I just want to know why it doesn't work.
>>
>> On 22 October 2010 20:36, Steven A Rowe <sar...@syr.edu> wrote:
>> > Hi Sergey,
>> >
>> > I've opened an issue to add a maxTokenLength param to the
>> > StandardTokenizerFactory configuration:
>> >
>> > https://issues.apache.org/jira/browse/SOLR-2188
>> >
>> > I'll work on it this weekend.
>> >
>> > Are you using Solr 1.4.1? I ask because of your mention of Lucene
>> > 2.9.3. I'm not sure there will ever be a Solr 1.4.2 release. I plan
>> > on targeting Solr 3.1 and 4.0 for the SOLR-2188 fix.
>> >
>> > I'm not sure why you didn't get the results you wanted with your
>> > Lucene hack - is it possible you have other Lucene jars in your Solr
>> > classpath?
>> >
>> > Steve
>> >
>> >> -----Original Message-----
>> >> From: Sergey Bartunov [mailto:sbos....@gmail.com]
>> >> Sent: Friday, October 22, 2010 12:08 PM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: How to index long words with StandardTokenizerFactory?
>> >>
>> >> I'm trying to force Solr to index words longer than 255 characters
>> >> (this constant is DEFAULT_MAX_TOKEN_LENGTH in Lucene's
>> >> StandardAnalyzer.java) using StandardTokenizerFactory as the
>> >> tokenizer in the schema configuration XML. Specifying the
>> >> maxTokenLength attribute doesn't work.
>> >>
>> >> I tried a dirty hack: I downloaded the lucene-core-2.9.3 source,
>> >> changed DEFAULT_MAX_TOKEN_LENGTH to 1000000, built it into a jar,
>> >> and replaced the original lucene-core jar in solr/lib. But it seems
>> >> to have had no effect.
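For anyone following along: once SOLR-2188 lands, the intent is to expose maxTokenLength directly in schema.xml, so no Lucene patching should be necessary. A hypothetical fieldType entry might look like this (the maxTokenLength attribute is the proposed SOLR-2188 addition and does not exist in Solr 1.4.1; the fieldType name is made up):

```xml
<fieldType name="text_long_tokens" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- maxTokenLength is the parameter proposed in SOLR-2188; the
         default of 255 comes from StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH -->
    <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="1000000"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that, if I recall correctly, the 2.9.x StandardTokenizer silently discards tokens longer than maxTokenLength rather than truncating them, which would explain why "q=body:big*" matches nothing when the long word exceeds the limit.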