Re: Issue with large html indexing

2013-10-24 Thread Shawn Heisey
On 10/24/2013 2:11 AM, Raheel Hasan wrote:
> ok. see this:
> http://s23.postimg.org/yck2s5k1n/html_indexing.png

A recap. You said your index analysis chain is this:

HTMLStripCharFilterFactory
WhitespaceTokenizerFactory (create tokens)
StopFilterFactory
WordDelimiterFilterFactory
ICUFoldingFilter
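
For readers following along, the chain recapped above might look roughly like this as a schema.xml field type. This is only a sketch: the name text_en_splitting is taken from the original post, the class names are the stock Solr factories corresponding to the components listed, and the stopwords file name is an assumption because the excerpt shows no attributes. The excerpt is cut off after ICUFoldingFilter, so anything that followed in the real chain is not represented here.

```xml
<!-- Sketch only: attributes are assumptions, not the poster's actual settings -->
<fieldType name="text_en_splitting" class="solr.TextField">
  <analyzer type="index">
    <!-- char filters run first, removing HTML markup before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- splits the remaining text on whitespace to create tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- "stopwords.txt" is an assumed file name -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <!-- ICU folding ships in the analysis-extras contrib -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

The order matters: the char filter sees the raw HTML, the tokenizer sees the stripped text, and each token filter sees the output of the one before it.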

Re: Issue with large html indexing

2013-10-24 Thread Raheel Hasan
ok. see this: http://s23.postimg.org/yck2s5k1n/html_indexing.png

On Wed, Oct 23, 2013 at 10:45 PM, Erick Erickson wrote:
> Attachments and images are often eaten by the mail server, your image is
> not visible at least to me. Can you describe what you're seeing? Or post
> the image somewhere an

Re: Issue with large html indexing

2013-10-23 Thread Erick Erickson
Attachments and images are often eaten by the mail server, your image is not visible at least to me. Can you describe what you're seeing? Or post the image somewhere and provide a link?

Best,
Erick

On Wed, Oct 23, 2013 at 11:07 AM, Raheel Hasan wrote:
> Hi,
>
> I have an issue here while indexi

Issue with large html indexing

2013-10-23 Thread Raheel Hasan
Hi,

I have an issue here while indexing large html. Here is the configuration for that:

1) Data is imported via URLDataSource / PlainTextEntityProcessor (DIH)
2) Schema has this for the field: type="text_en_splitting" indexed="true" stored="false" required="false"
3) text_en_splitting has the fo
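
As an illustration of points 1 and 2, a setup of this kind could be sketched as below. The entity name, the URL, and the field name "content" are hypothetical placeholders, since the post does not give them. The only fixed detail is that PlainTextEntityProcessor exposes everything it reads in a single column called plainText, which is then mapped to a schema field.

```xml
<!-- data-config.xml (DIH): sketch with placeholder names and URL -->
<dataConfig>
  <dataSource name="web" type="URLDataSource" encoding="UTF-8"/>
  <document>
    <!-- the whole response body arrives as one value in the "plainText" column -->
    <entity name="page"
            processor="PlainTextEntityProcessor"
            dataSource="web"
            url="http://example.com/large-page.html">
      <field column="plainText" name="content"/>
    </entity>
  </document>
</dataConfig>

<!-- schema.xml: field attributes as quoted in the post; the field name is assumed -->
<field name="content" type="text_en_splitting"
       indexed="true" stored="false" required="false"/>
```

With this arrangement the HTML reaches the field as raw markup, so stripping the tags is left entirely to the field type's analysis chain.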