The returned value is the stored value, i.e. the original source value.
Token filtering affects only the indexed terms.

If you want to change the source value itself, you could use an update
processor, such as the truncate processor, which shortens long source values:

http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/TruncateFieldUpdateProcessorFactory.html
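
As a rough sketch of what that could look like in solrconfig.xml: the field
name portal_package and the 300-character limit are taken from the example
quoted below, while the chain name "truncate-portal-package" is made up for
illustration. Note that this truncates the value rather than dropping it, so
treat it as a starting point and not a drop-in fix.

```xml
<!-- Hypothetical chain name; pick whatever fits your setup. -->
<updateRequestProcessorChain name="truncate-portal-package">
  <!-- Truncates the source value of portal_package to at most 300
       characters before the document is stored and indexed. -->
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="fieldName">portal_package</str>
    <int name="maxLength">300</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last so the update is executed. -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain only applies when it is selected, for example by passing
update.chain=truncate-portal-package on the update request or by marking the
chain default="true".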


-- Jack Krupansky

On Fri, May 15, 2015 at 11:38 AM, Charles Sanders <csand...@redhat.com>
wrote:

> Yes, that is what I am seeing. Looking in the code myself, I see no reason
> for this behavior. That is why I assumed I was doing something very wrong.
>
> Below I have included an example. I set the max length to 300. I insert a
> record with a single token of 500 characters. I expect the token to be
> removed and not included in the index. When I query using the large token,
> the record is returned. I can see the same result using the analysis page
> in the solr console.
>
> Here is a test example:
>
> <field name="portal_package" type="text_std" indexed="true" stored="true"
> multiValued="true"/>
>
> <fieldType name="text_std" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.LengthFilterFactory" min="1" max="300" />
> </analyzer>
> </fieldType>
>
>
> A test record:
>
> {
> "documentKind": "test",
> "uri": "test300",
> "id": "test300",
> "portal_package":
> "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
> }
>
>
> Query result:
>
> {
> "responseHeader": {
> "status": 0,
> "QTime": 55,
> "params": {
> "indent": "true",
> "q":
> "portal_package:12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890",
> "_": "1431704135745",
> "wt": "json"
> }
> },
> "response": {
> "numFound": 1,
> "start": 0,
> "docs": [
> {
> "documentKind": "test",
> "uri": "test300",
> "id": "test300",
> "portal_package": [
>
> "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
> ],
> "_version_": 1501249997589446700,
> "timestamp": "2015-05-15T15:26:05.205Z",
> "language": "en"
> }
> ]
> }
> }
>
>
>
>
>
> ----- Original Message -----
>
> From: "Shawn Heisey" <apa...@elyograg.org>
> To: solr-user@lucene.apache.org
> Sent: Friday, May 15, 2015 11:13:14 AM
> Subject: Re: Problem with solr.LengthFilterFactory
>
> On 5/15/2015 8:49 AM, Charles Sanders wrote:
> > I'm seeing a problem with the LengthFilter. It appears to work fine
> until I increase the max value above 254. At that point it stops removing
> the very large token from the stream. As a result I get the error:
> > java.lang.IllegalArgumentException: Document contains at least one
> immense term...... UTF8 encoding is longer than the max length 32766
> >
> > I'm certain I'm doing this wrong. Can someone please show me the light.
> :)
> >
> > <fieldType name="text_std" class="solr.TextField"
> positionIncrementGap="100">
> > <analyzer type="index">
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > <filter class="solr.LengthFilterFactory" min="1" max="254" />
> > </analyzer>
> > </fieldType>
>
> So with max="254", you don't get the error? Looking at the code for
> LengthFilter, I can't see any way for it to behave differently with a
> max of 254 vs. a max of 255 or higher. All of the interfaces and
> classes involved use "int" for length, which means it should work
> perfectly with numbers above 254.
>
> Thanks,
> Shawn
>
>
>
