Yes, I did. It didn't help.

On 23 October 2010 17:45, Ahmet Arslan <iori...@yahoo.com> wrote:
> Did you delete the folder Jetty_0_0_0_0_8983_solr.war_** under 
> apache-solr-1.4.1\example\work?
>
> --- On Sat, 10/23/10, Sergey Bartunov <sbos....@gmail.com> wrote:
>
>> From: Sergey Bartunov <sbos....@gmail.com>
>> Subject: Re: How to index long words with StandardTokenizerFactory?
>> To: solr-user@lucene.apache.org
>> Date: Saturday, October 23, 2010, 3:56 PM
>> Here are all the files: http://rghost.net/3016862
>>
>> 1) StandardAnalyzer.java, StandardTokenizer.java - the patched files
>> from lucene-2.9.3
>> 2) I patch these files and build Lucene by running "ant"
>> 3) I replace lucene-core-2.9.3.jar in solr/lib/ with the
>> lucene-core-2.9.3-dev.jar that I'd just compiled
>> 4) then I run "ant compile" and "ant dist" in the solr folder
>> 5) after that I rebuild solr/example/webapps/solr.war with my new
>> solr and lucene-core jars
>> 6) I put my schema.xml in solr/example/solr/conf/
>> 7) then I run "java -jar start.jar" in solr/example
>> 8) I index big_post.xml
>> 9) I try to find the document with curl
>> "http://localhost:8983/solr/select?q=body:big*"
>> (big_post.xml contains a long word bigaaaaa...aaaa)
>> 10) Solr returns nothing
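>>
>> For reference, here are steps 2-8 as shell commands - just a rough
>> sketch, assuming the lucene-2.9.3 and apache-solr-1.4.1 source trees
>> sit side by side and that post.jar is used for indexing; your paths
>> may differ:
>>
>>   cd lucene-2.9.3 && ant          # build the patched Lucene
>>   cp build/lucene-core-2.9.3-dev.jar ../apache-solr-1.4.1/lib/
>>   cd ../apache-solr-1.4.1
>>   ant compile dist                # rebuild Solr and solr.war
>>   cp /path/to/schema.xml example/solr/conf/
>>   cd example && java -jar start.jar
>>   # in another shell: index the document, then query
>>   java -jar exampledocs/post.jar big_post.xml
>>   curl "http://localhost:8983/solr/select?q=body:big*"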
>>
>> On 23 October 2010 02:43, Steven A Rowe <sar...@syr.edu> wrote:
>> > Hi Sergey,
>> >
>> > What does your ~34kb field value look like?  Does StandardTokenizer
>> > think it's just one token?
>> >
>> > What doesn't work?  What happens?
>> >
>> > Steve
>> >
>> >> -----Original Message-----
>> >> From: Sergey Bartunov [mailto:sbos....@gmail.com]
>> >> Sent: Friday, October 22, 2010 3:18 PM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: How to index long words with StandardTokenizerFactory?
>> >>
>> >> I'm using Solr 1.4.1. I've now succeeded in replacing the
>> >> lucene-core jar, but the max token length seems to be used in a
>> >> very strange way. Currently it's set to 1024*1024 for me, but I
>> >> still couldn't index a field of just ~34kb. I understand that it's
>> >> a little weird to index such big data, but I want to know why it
>> >> doesn't work.
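>> >>
>> >> By the way, in plain Lucene 2.9 the cap can be raised per analyzer
>> >> instance without patching the constant - a minimal sketch (this is
>> >> not what Solr's factory does in 1.4.1):
>> >>
>> >>   import org.apache.lucene.analysis.standard.StandardAnalyzer;
>> >>   import org.apache.lucene.util.Version;
>> >>
>> >>   StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
>> >>   // StandardTokenizer silently drops tokens longer than this cap
>> >>   analyzer.setMaxTokenLength(1024 * 1024);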
>> >>
>> >> On 22 October 2010 20:36, Steven A Rowe <sar...@syr.edu> wrote:
>> >> > Hi Sergey,
>> >> >
>> >> > I've opened an issue to add a maxTokenLength param to the
>> >> > StandardTokenizerFactory configuration:
>> >> >
>> >> >        https://issues.apache.org/jira/browse/SOLR-2188
>> >> >
>> >> > I'll work on it this weekend.
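>> >> >
>> >> > The rough idea - just a sketch of the plan, not a committed
>> >> > patch - is for the factory to read the attribute from its init
>> >> > args and pass it through to the tokenizer it creates:
>> >> >
>> >> >   import java.io.Reader;
>> >> >   import java.util.Map;
>> >> >   import org.apache.lucene.analysis.standard.StandardAnalyzer;
>> >> >   import org.apache.lucene.analysis.standard.StandardTokenizer;
>> >> >   import org.apache.solr.analysis.BaseTokenizerFactory;
>> >> >
>> >> >   public class StandardTokenizerFactory extends BaseTokenizerFactory {
>> >> >     private int maxTokenLength;
>> >> >
>> >> >     public void init(Map<String,String> args) {
>> >> >       super.init(args);
>> >> >       // fall back to Lucene's default (255) when unset
>> >> >       String len = args.get("maxTokenLength");
>> >> >       maxTokenLength = (len == null)
>> >> >           ? StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH
>> >> >           : Integer.parseInt(len);
>> >> >     }
>> >> >
>> >> >     public StandardTokenizer create(Reader input) {
>> >> >       StandardTokenizer tokenizer = new StandardTokenizer(input);
>> >> >       tokenizer.setMaxTokenLength(maxTokenLength);
>> >> >       return tokenizer;
>> >> >     }
>> >> >   }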
>> >> >
>> >> > Are you using Solr 1.4.1?  I ask because of your mention of Lucene
>> >> > 2.9.3.  I'm not sure there will ever be a Solr 1.4.2 release.  I
>> >> > plan on targeting Solr 3.1 and 4.0 for the SOLR-2188 fix.
>> >> >
>> >> > I'm not sure why you didn't get the results you wanted with your
>> >> > Lucene hack - is it possible you have other Lucene jars in your
>> >> > Solr classpath?
>> >> >
>> >> > Steve
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Sergey Bartunov [mailto:sbos....@gmail.com]
>> >> >> Sent: Friday, October 22, 2010 12:08 PM
>> >> >> To: solr-user@lucene.apache.org
>> >> >> Subject: How to index long words with StandardTokenizerFactory?
>> >> >>
>> >> >> I'm trying to force Solr to index words whose length is more
>> >> >> than 255 characters (this constant is DEFAULT_MAX_TOKEN_LENGTH
>> >> >> in Lucene's StandardAnalyzer.java) using
>> >> >> StandardTokenizerFactory as the 'tokenizer' tag in the schema
>> >> >> configuration XML. Specifying the maxTokenLength attribute
>> >> >> doesn't work.
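>> >> >>
>> >> >> For concreteness, this is the kind of fieldType I'm using (the
>> >> >> type name is just an example):
>> >> >>
>> >> >>   <fieldType name="text_long" class="solr.TextField">
>> >> >>     <analyzer>
>> >> >>       <tokenizer class="solr.StandardTokenizerFactory"
>> >> >>                  maxTokenLength="1000000"/>
>> >> >>     </analyzer>
>> >> >>   </fieldType>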
>> >> >>
>> >> >> I tried a dirty hack: I downloaded the lucene-core-2.9.3 source,
>> >> >> changed DEFAULT_MAX_TOKEN_LENGTH to 1000000, built it into a
>> >> >> jar, and replaced the original lucene-core jar in solr/lib. But
>> >> >> it seems to have had no effect.
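>> >> >>
>> >> >> The change itself was essentially this one-liner in
>> >> >> StandardAnalyzer.java:
>> >> >>
>> >> >>   // was: public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
>> >> >>   public static final int DEFAULT_MAX_TOKEN_LENGTH = 1000000;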