Attachments and images are often eaten by the mail server; your image is
not visible, at least to me. Can you describe what you're seeing? Or post
the image somewhere and provide a link?

Best,
Erick


On Wed, Oct 23, 2013 at 11:07 AM, Raheel Hasan <raheelhasan....@gmail.com> wrote:

> Hi,
>
> I have an issue while indexing large HTML files. Here is the configuration
> for that:
>
> 1) Data is imported via URLDataSource / PlainTextEntityProcessor (DIH)
>
> 2) Schema has this for the field:
> type="text_en_splitting" indexed="true" stored="false" required="false"
>
> 3) text_en_splitting applies the following chain at index time:
> HTMLStripCharFilterFactory
> WhitespaceTokenizerFactory (create tokens)
> StopFilterFactory
> WordDelimiterFilterFactory
> ICUFoldingFilterFactory
> PorterStemFilterFactory
> RemoveDuplicatesTokenFilterFactory
> LengthFilterFactory
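>
> For illustration, a chain like the one listed in 3) is typically declared
> in schema.xml roughly as sketched below. The attribute values (stop word
> file name, word delimiter flags, length limits, positionIncrementGap) are
> assumed placeholders, not the actual settings in use.
>
> ```xml
> <!-- Sketch only: attribute values and file names are assumptions -->
> <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <!-- Strips HTML markup before tokenization -->
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <!-- Splits on whitespace to create tokens -->
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <!-- min and max are required; the values here are hypothetical -->
>     <filter class="solr.LengthFilterFactory" min="2" max="100"/>
>   </analyzer>
> </fieldType>
> ```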
>
> However, the indexed data is like this (as in the attached image):
> [image: Inline image 1]
>
>
> So what are these numbers?
> If I index a small HTML file, it works fine, but as the size of the HTML
> file increases, this is what happens.
>
> --
> Regards,
> Raheel Hasan
>
