dih HTMLStripTransformer

2013-09-24 Thread Andreas Owen
Why does stripHTML="false" have no effect in DIH? The HTML is stripped in both text and text_nohtml when I display the index with select?q=*. I'm trying to get one field without HTML and one with it, so that I can also index the links on the page. data-config.xml

Re: DIH: HTMLStripTransformer in sub-entities?

2013-07-06 Thread Andy Pickler
That's exactly what turned out to be the problem. We thought we had already tried that permutation but apparently hadn't. I know it's obvious in retrospect. Thanks for the suggestion.

Thanks,
Andy Pickler

On Wed, Jul 3, 2013 at 2:38 PM, Alexandre Rafalovitch wrote:
> On Tue, Jul 2, 2013 at 10

Re: DIH: HTMLStripTransformer in sub-entities?

2013-07-03 Thread Alexandre Rafalovitch
On Tue, Jul 2, 2013 at 10:59 AM, Andy Pickler wrote:
> SELECT
> br.other_content AS replyContent
> FROM block_reply
> ">
> *THIS DOESN'T WORK!*

Shouldn't it be column="replyContent", since you are renaming it in the SELECT?

Regards,
Alex.
Personal website: http://www.ou
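To illustrate Alex's point, here is a minimal Python sketch that loosely models a row produced by an aliased SELECT: the row is keyed by the alias (replyContent), so a field setting that names the original column (other_content) never matches and nothing gets stripped. The `strip_html_fields` and `crude_strip` functions and the dict-based row are hypothetical stand-ins for illustration only; they are not DIH's actual implementation.

```python
import re


def crude_strip(text):
    # Hypothetical stand-in for HTML stripping: drop anything that looks like a tag.
    return re.sub(r"<[^>]*>", "", text)


def strip_html_fields(row, field_configs):
    """Strip HTML from each configured column that is actually present in the row.

    field_configs maps a column name to a stripHTML-style flag, loosely mimicking
    <field column="..." stripHTML="true"/> entries (an assumption for illustration).
    """
    out = dict(row)  # leave the input row untouched
    for column, strip in field_configs.items():
        if strip and isinstance(out.get(column), str):
            out[column] = crude_strip(out[column])
    return out


# "SELECT br.other_content AS replyContent ..." yields rows keyed by the alias.
row = {"replyContent": "<p>Hello <b>world</b></p>"}

# Naming the original column: no key matches, so the HTML survives.
print(strip_html_fields(row, {"other_content": True}))
# Naming the alias: the value is stripped.
print(strip_html_fields(row, {"replyContent": True}))
```

The takeaway is simply that, after a rename in the SELECT, any per-field setting has to use the name the row actually carries.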

Re: DIH: HTMLStripTransformer in sub-entities?

2013-07-02 Thread Gora Mohanty
On 2 July 2013 20:55, Andy Pickler wrote:
> Thanks for the quick reply. Unfortunately, I don't believe my company would want me sharing our exact production schema in a public forum, although I realize it makes it harder to diagnose the problem. The sub-entity is a multi-valued field that

Re: DIH: HTMLStripTransformer in sub-entities?

2013-07-02 Thread Andy Pickler
Thanks for the quick reply. Unfortunately, I don't believe my company would want me sharing our exact production schema in a public forum, although I realize it makes it harder to diagnose the problem. The sub-entity is a multi-valued field that indeed does have a relationship to the outer entity

Re: DIH: HTMLStripTransformer in sub-entities?

2013-07-02 Thread Gora Mohanty
On 2 July 2013 20:29, Andy Pickler wrote:
> Solr 4.1.0
>
> We've been using the DIH to pull data in from a MySQL database for quite some time now. We're now wanting to strip all the HTML content out of many fields using the HTMLStripTransformer (http://wiki.apache.org/solr/DataImportHandle

DIH: HTMLStripTransformer in sub-entities?

2013-07-02 Thread Andy Pickler
Solr 4.1.0

We've been using the DIH to pull data in from a MySQL database for quite some time now. We're now wanting to strip all the HTML content out of many fields using the HTMLStripTransformer (http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer). Unfortunately, while it seem

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Steve Rowe
> View this message in context:
> http://lucene.472066.n3.nabble.com/HTML-entities-being-missed-by-DIH-HTMLStripTransformer-tp4053582p4053609.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Hi Steve, Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice did the trick. I am using Solr 4.1. Thank you very much! - ashok

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Steve Rowe
Hi Ashok, HTMLStripTransformer uses HTMLStripCharFilter under the hood, and HTMLStripCharFilter converts all HTML entities to their corresponding characters. What version of Solr are you using? My guess is that it only appears that nothing is happening, since when they are presented in a brow
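Steve's explanation, together with Ashok's confirmation that running the HTMLStripTransformer twice fixed the problem, can be illustrated with a rough Python approximation. It assumes the database holds entity-escaped markup such as `&lt;b&gt;This is in Bold&lt;/b&gt;`. The hypothetical `strip_pass` function, built from a regex tag remover plus `html.unescape`, stands in for HTMLStripCharFilter, which is far more thorough. One pass turns the entities into literal tags (which a browser then renders as markup, so it looks as if nothing happened); a second pass removes those tags.

```python
import html
import re

TAG = re.compile(r"<[^>]*>")


def strip_pass(text):
    """Rough approximation of a single HTML-stripping pass (not Solr's code).

    Tags present in the input are removed first, then entities are decoded.
    Decoded characters are not re-scanned, so '&lt;b&gt;' becomes a literal '<b>'.
    """
    return html.unescape(TAG.sub("", text))


raw = "&lt;b&gt;This is in Bold&lt;/b&gt;"
once = strip_pass(raw)    # entities decoded into literal tags
twice = strip_pass(once)  # those literal tags are now stripped
print(once)
print(twice)
```

The trade-off of a second pass: any decoded `<...>` text is treated as markup, so data that legitimately contains something like `a <b> c` as plain text would lose the bracketed part.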

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Alexandre Rafalovitch
> sometimes with HTML entities and at other times with html tags. I have no control over the process that populates the database tables with info.

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Well, the database field has text, sometimes with HTML entities and at other times with html tags. I have no control over the process that populates the database tables with info.

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Gora Mohanty
On 4 April 2013 00:30, Ashok wrote:
[...]
> Two questions.
>
> (1) Is this the expected behavior of DIH HTMLStripTransformer?

Yes, I believe so.

> (2) If yes, is there another transformer that I can employ first to turn these HTML entities into their usual symbols that can

HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
e in: <li>Item One</> or <b>This is in Bold</b> NOTHING HAPPENS.

Two questions.
(1) Is this the expected behavior of DIH HTMLStripTransformer?
(2) If yes, is there another transformer that I can employ first to turn these HTML entities into their usual symbols t