On 2 July 2013 20:29, Andy Pickler <andy.pick...@gmail.com> wrote:
> Solr 4.1.0
>
> We've been using the DIH to pull data in from a MySQL database for quite
> some time now. We're now wanting to strip all the HTML content out of many
> fields using the HTMLStripTransformer (
> http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer).
> Unfortunately, while it seems to be working fine for "top-level" entities,
> we can't seem to get it to work for sub-entities:
>
> (not exact schema, reduced for example purposes)

Please do not do that. This DIH configuration file does not make sense
(please see comments below), and we are left guessing in the dark. If the
file is too large, you can share it on something like pastebin.com

> <entity name="blocks" dataSource="database"
> transformer="HTMLStripTransformer" query="
> SELECT
> id as blockId,
> name as blockTitle,
> content as content
> FROM engagement_block
> ">
> <field column="content" stripHTML="true" /> *THIS WORKS!*
> <entity name="blockReplies" dataSource="database"
> transformer="HTMLStripTransformer" query="
> SELECT
> br.other_content AS replyContent
> FROM block_reply
> ">
> <field column="other_content" stripHTML="true" /> *THIS DOESN'T WORK!*
[...]

(a) You SELECT replyContent, but the column attribute in the field is
    named "other_content". Nothing should be getting indexed into the
    field.
(b) Why are your entities nested if the inner entity has no relationship
    to the outer one?

Regards,
Gora
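
For illustration only, here is a rough sketch of how points (a) and (b) might be addressed together. It is not the poster's actual configuration. It assumes that block_reply has a foreign-key column pointing back at engagement_block (called block_id here, a hypothetical name that does not appear in the original post), that the table alias br is declared in the FROM clause, and that the Solr schema defines fields named content and replyContent.

```xml
<!-- Sketch only. block_id is a hypothetical foreign-key column, not taken from the original post. -->
<entity name="blocks" dataSource="database"
        transformer="HTMLStripTransformer"
        query="SELECT id AS blockId, name AS blockTitle, content AS content
               FROM engagement_block">
  <!-- column matches the alias returned by the outer query -->
  <field column="content" stripHTML="true" />

  <entity name="blockReplies" dataSource="database"
          transformer="HTMLStripTransformer"
          query="SELECT br.other_content AS replyContent
                 FROM block_reply br
                 WHERE br.block_id = '${blocks.blockId}'">
    <!-- (a) column now names the alias the query returns (replyContent),
         not the underlying table column (other_content) -->
    <field column="replyContent" stripHTML="true" />
  </entity>
</entity>
```

The WHERE clause addresses (b) by tying each reply row to its parent block through the DIH variable ${blocks.blockId}, which resolves to the blockId value of the current row of the outer entity. Whether this matches the real tables depends on the actual schema, which was not posted.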