Hi g,

have a look at the PlainTextEntityProcessor:
http://wiki.apache.org/solr/DataImportHandler#PlainTextEntityProcessor

You will have to call the URL twice that way, but I don't think you can
get the complete document (the root element with all its structure) via
XPath, so the XPathEntityProcessor cannot help you.

If calling the URL twice slows your indexer down unacceptably, you can
always subclass XPathEntityProcessor (knowing Java is helpful,
though...). There surely is a way to make it return what you need. Or
maybe write an entity processor that caches the content and uses the
XPath EP and the PlainText EP to accomplish what you need (I am not sure
whether the API allows for that).

Cheers,
Chantal

On Thu, 2011-07-28 at 05:53 +0200, solruser@9913 wrote:
> I am trying to use DIH to import an XML based file with multiple XML records
> in it. Each record corresponds to one document in Lucene. I am using the
> DIH FileListEntityProcessor (to get file list) followed by the
> XPathEntityProcessor to create the entities.
>
> It works perfectly and I am able to map XML elements to fields ..... however
> I also need to store the entire XML record as separate 'full text' field.
> Is there any way the XPathEntityProcessor provides a variable like 'rawLine'
> or 'plainText' that I can map to a field.
>
> I tried to use the Plain Text processor after this - but that does not
> recognize the XML boundaries and just gives the whole XML file.
>
>
> <entity name="x" rootEntity="true" dataSource="logfilereader"
> processor="XPathEntityProcessor"
> url="${logfile.fileAbsolutePath}" stream="false"
> forEach="/xml/myrecord"
> transformer="...." " >
> <field column="mycol1"
> xpath="/xml/myrecord/@something"
> />
>
> and so on ...
> This works perfectly. However I also need something like ...
>
> <field column="fullxmlrecord" name="plainText" />
>
> Any help is much appreciated.
> I am a newbie and may be missing something
> obvious here
>
> -g
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Store-complete-XML-record-DIH-XPathEntityProcessor-tp3205524p3205524.html
> Sent from the Solr - User mailing list archive at Nabble.com.
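
P.S. In case it helps with the subclassing idea: below is a rough sketch of the core step such a custom processor would need, namely turning each node matched by the forEach expression back into an XML string. It uses only the JDK's XML APIs and does not touch the DIH API at all; the class name RecordXml and the method extractRecords are made up for illustration, and how to wire this into an entity processor or transformer is left open. Note that it loads the whole file into memory, and the serialized record may differ from the raw text in the file (whitespace, quoting).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Solr or DIH.
public class RecordXml {

    /** Returns the serialized XML of every node matching forEachXPath. */
    public static List<String> extractRecords(String xml, String forEachXPath) {
        try {
            // Parse the whole input into a DOM tree (no streaming).
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Select the record nodes, e.g. "/xml/myrecord".
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(forEachXPath, doc, XPathConstants.NODESET);

            // Identity transform: serializes a DOM node back to text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> records = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
                records.add(out.toString());
            }
            return records;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract records", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xml><myrecord something=\"a\"><v>1</v></myrecord>"
                   + "<myrecord something=\"b\"><v>2</v></myrecord></xml>";
        for (String record : extractRecords(xml, "/xml/myrecord")) {
            System.out.println(record);
        }
    }
}
```

Each returned string could then be put into a column such as fullxmlrecord next to the fields extracted via xpath. Whether that is best done in a subclass of XPathEntityProcessor or in a custom transformer is something I have not tried.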