Works now! Thanks a lot... I guess that holds until a document has more than 2.147.483.647 chars.

Good night,
Pierre

On Tue, Jan 31, 2012 at 5:23 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> I'm trying to index word-ngrams using the solr.ShingleFilterFactory,
>> (storing their positions + offset)
>> ...
>> <fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
>>   <analyzer type="index">
>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>     <filter class="solr.LowerCaseFilterFactory" />
>>     <filter class="solr.ShingleFilterFactory" minShingleSize="3"
>>             maxShingleSize="5" outputUnigrams="false" tokenSeparator="_"/>
>>   </analyzer>
>> ...
>> <field name="textengram" type="edge_ngram" indexed="true"
>>        stored="true" multiValued="false" termVectors="true"
>>        termPositions="true" termOffsets="true"/>
>> ...
>> i'm testing it with a (big?) html document, [1.300.000 chars], with lots of tags
>> Looking at the index (using Schema browser web interface), i can see
>> some ngrams were indexed (8939)
>> but it appears that they were found only in the beginning of the
>> document (first 1/8 of the document)
>
> It could be the maxFieldLength setting in solrconfig.xml. Set it to
> <maxFieldLength>2147483647</maxFieldLength>
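
For anyone finding this thread later, here is a rough sketch of where that setting usually sits. It assumes the Solr 1.4 / 3.x solrconfig.xml layout, where maxFieldLength appears under both indexDefaults and mainIndex; the thread does not show the actual file, so the surrounding elements are an assumption. Note that the limit counts tokens per field, not characters, and the stock example config ships with 10000, which would explain why only the start of a large document got indexed.

```xml
<!-- solrconfig.xml fragment (assumed Solr 1.4 / 3.x layout, not taken from the thread) -->
<config>
  <indexDefaults>
    <!-- Maximum number of tokens indexed per field; tokens beyond this are dropped -->
    <maxFieldLength>2147483647</maxFieldLength>
  </indexDefaults>
  <mainIndex>
    <!-- If the setting is also present here, keep it in sync with indexDefaults -->
    <maxFieldLength>2147483647</maxFieldLength>
  </mainIndex>
</config>
```

Documents indexed before the change were already truncated, so they have to be re-indexed after the core is reloaded or Solr is restarted.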