Re: ShingleFilterFactory not indexing the whole doc, where is the limit ?

Pierre JdlF Tue, 31 Jan 2012 09:48:55 -0800

Works now! Thanks a lot.
... I guess until a document with more than 2,147,483,647 tokens shows up.
Happy night,
+ Pierre


On Tue, Jan 31, 2012 at 5:23 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> I'm trying to index word-ngrams using the solr.ShingleFilterFactory,
>> (storing their positions + offset)
>> ...
>>     <fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
>>       <analyzer type="index">
>>         <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>         <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>         <filter class="solr.LowerCaseFilterFactory" />
>>         <filter class="solr.ShingleFilterFactory" minShingleSize="3"
>>                 maxShingleSize="5" outputUnigrams="false" tokenSeparator="_"/>
>>       </analyzer>
>> ...
>> <field name="textengram" type="edge_ngram" indexed="true"
>>        stored="true" multiValued="false" termVectors="true"
>>        termPositions="true" termOffsets="true"/>
>> ...
>> I'm testing it with a (big?) HTML document [1,300,000 chars] with lots of tags.
>> Looking at the index (using the Schema Browser web interface), I can see
>> that some n-grams were indexed (8939), but it appears that they were found
>> only in the beginning of the document (the first 1/8 of the document).
>
> It could be the maxFieldLength setting in solrconfig.xml, which caps the
> number of tokens indexed per field (10000 by default). Set it to:
> <maxFieldLength>2147483647</maxFieldLength>
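
For illustration, here is a rough Python model of why a token cap shows up as "only the beginning of the document got indexed". It is a sketch under assumptions, not Solr's code: the helper names `shingles` and `index_with_cap` are made up, HTML stripping is ignored, and the cap is modelled as a plain count of emitted shingle tokens using the 10000 default.

```python
def shingles(words, min_size=3, max_size=5, sep="_"):
    """Yield (start, shingle) pairs in position order, smallest size first.

    Mimics minShingleSize=3, maxShingleSize=5, outputUnigrams=false and
    tokenSeparator="_" from the quoted config (simplified model).
    """
    for start in range(len(words)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                yield start, sep.join(words[start:start + size])


def index_with_cap(text, max_tokens=10000):
    """Return (kept shingles, fraction of words reached before the cap).

    max_tokens stands in for maxFieldLength; the exact cut-off behaviour
    inside Lucene is an assumption here.
    """
    words = text.lower().split()  # whitespace tokenizer + lowercase filter
    kept = []
    last_start = -1
    for start, gram in shingles(words):
        if len(kept) >= max_tokens:
            break  # everything after this point is silently dropped
        kept.append(gram)
        last_start = start
    covered = (last_start + 1) / len(words) if words else 0.0
    return kept, covered


if __name__ == "__main__":
    # Hypothetical 30000-word document: w0 w1 w2 ...
    doc = " ".join(f"w{i}" for i in range(30000))
    kept, covered = index_with_cap(doc)
    print(len(kept), f"{covered:.0%}")
```

Because each word position emits up to three shingles here, the cap is reached after only a fraction of the words, which matches the symptom of n-grams appearing only for the start of the document.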
