> - I have zero control over what is stored in the database
> - using the Solr XML update protocol i could probably transform the
>   data before sending it
> - ... but I'd much rather continue using DataImportHandler to access
>   the database

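If you are already using DIH, the HTMLStripTransformer can do what you want. It runs inside DIH at import time, so the database contents stay untouched:
http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer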
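
As a rough sketch of what that could look like in data-config.xml: the transformer is named on the entity, and each column that should be cleaned gets stripHTML="true". The JDBC driver, URL, credentials, table and column names below are all made up for illustration, so substitute your own.

```xml
<dataConfig>
  <!-- Hypothetical JDBC settings; replace driver, url, user and password with your own -->
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/mydb"
              user="solr"
              password="secret"/>
  <document>
    <!-- "article", "articles", "id" and "body" are assumed names, not from the original post -->
    <entity name="article"
            transformer="HTMLStripTransformer"
            query="SELECT id, body FROM articles">
      <field column="id" name="id"/>
      <!-- stripHTML="true" asks the transformer to remove markup from this column -->
      <field column="body" name="body" stripHTML="true"/>
    </entity>
  </document>
</dataConfig>
```

Columns without stripHTML="true" are passed through as they come from the query, so you can limit the stripping to the fields that actually contain markup.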