On 10/24/2013 2:11 AM, Raheel Hasan wrote:
> ok. see this:
> http://s23.postimg.org/yck2s5k1n/html_indexing.png
A recap. You said your index analysis chain is this:
HTMLStripCharFilterFactory
WhitespaceTokenizerFactory (create tokens)
StopFilterFactory
WordDelimiterFilterFactory
ICUFoldingFilterFactory
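For reference, that chain would look roughly like this as a schema.xml field type. This is only a sketch of the index-time analyzer. The order of the components comes from the recap above, but every attribute (the stopword file name, the WordDelimiterFilterFactory flags, positionIncrementGap) is an assumption for illustration, not the actual configuration from this thread, and any query-time analyzer is left out.

```xml
<!-- Sketch only: component order follows the recap; all attributes are assumed. -->
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Char filters run on the raw text before tokenization: strip HTML markup. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Create tokens by splitting on whitespace. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Remove stopwords; the file name is an assumption. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Split tokens on case changes and punctuation; the flags are assumptions. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            splitOnCaseChange="1"/>
    <!-- Unicode folding (case, accents, and so on). -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the char filter is applied before the tokenizer, the tokenizer and every filter after it should only ever see text with the markup already removed. As far as I know, ICUFoldingFilterFactory also needs the analysis-extras contrib jars on the classpath.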
On Wed, Oct 23, 2013 at 10:45 PM, Erick Erickson wrote:
Attachments and images are often eaten by the mail server; your image is
not visible, at least to me. Can you describe what you're seeing? Or post
the image somewhere and provide a link?
Best,
Erick
On Wed, Oct 23, 2013 at 11:07 AM, Raheel Hasan wrote:
Hi,
I have an issue here while indexing large HTML. Here is the configuration
for that:
1) Data is imported via URLDataSource / PlainTextEntityProcessor (DIH)
2) Schema has this for the field:
type="text_en_splitting" indexed="true" stored="false" required="false"
3) text_en_splitting has the fo