Attachments and images are often eaten by the mail server, so your image is not visible, at least to me. Can you describe what you're seeing, or post the image somewhere and provide a link?
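If describing the image is awkward, one possible way to share the indexed terms as plain text is to ask Solr for them directly. The sketch below only builds a request URL for the TermsComponent. It assumes a `/terms` request handler is configured in solrconfig.xml (as it is in the stock example config); the host, the core name `collection1`, and the field name `content` are placeholders, since the original message does not give them.

```python
from urllib.parse import urlencode


def build_terms_url(base_url, core, field, limit=20):
    """Build a TermsComponent URL that lists indexed terms for one field.

    base_url, core and field are hypothetical placeholders; substitute the
    values from your own installation.
    """
    params = {
        "terms": "true",       # enable the TermsComponent
        "terms.fl": field,     # field whose indexed terms should be listed
        "terms.limit": limit,  # maximum number of terms to return
        "wt": "json",          # response format
    }
    return f"{base_url.rstrip('/')}/{core}/terms?{urlencode(params)}"


if __name__ == "__main__":
    # Paste the printed URL into a browser or curl, then copy the
    # response text into your reply instead of attaching an image.
    print(build_terms_url("http://localhost:8983/solr", "collection1", "content"))
```

The response is ordinary text, so it survives the mail server where an attachment would not.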
Best,
Erick

On Wed, Oct 23, 2013 at 11:07 AM, Raheel Hasan <raheelhasan....@gmail.com> wrote:

> Hi,
>
> I have an issue here while indexing large html. Here is the configuration
> for that:
>
> 1) Data is imported via URLDataSource / PlainTextEntityProcessor (DIH)
>
> 2) Schema has this for the field:
> type="text_en_splitting" indexed="true" stored="false" required="false"
>
> 3) text_en_splitting has the following work done for indexing:
> HTMLStripCharFilterFactory
> WhitespaceTokenizerFactory (create tokens)
> StopFilterFactory
> WordDelimiterFilterFactory
> ICUFoldingFilterFactory
> PorterStemFilterFactory
> RemoveDuplicatesTokenFilterFactory
> LengthFilterFactory
>
> However, the indexed data is like this (as in the attached image):
> [image: Inline image 1]
>
>
> so what are these numbers?
> If I put small html, it works fine, but as the size of html file
> increases, this is what happens..
>
> --
> Regards,
> Raheel Hasan
>