Hi Erick,

Can you please give me a little more information about the SolrJ program and how to use it to construct a Solr document?

Thanks and Regards,
Swapna.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, December 21, 2011 2:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Mapping and Capture in ExtractingRequestHandler

When you start getting into complex HTML extraction, you're probably better
off using a SolrJ program with a forgiving HTML parser, extracting the
relevant bits yourself, and constructing a SolrDocument.

FWIW,
Erick

On Tue, Dec 20, 2011 at 12:54 AM, Swapna Vuppala <swapna.vupp...@arup.com> wrote:
> Hi,
>
> I understand that we can specify parameters for ExtractingRequestHandler in
> solrconfig.xml to capture HTML tags of a particular type and map them to
> the desired Solr fields, something like the below.
>
> <str name="capture">div</str>
> <str name="fmap.div">mysolrfield</str>
>
> The above setting will capture the content of "div" tags and copy it to the
> Solr field "mysolrfield".
>
> What I am interested in is capturing div tags with a particular class name
> into a Solr field. When extracting content from Outlook messages, I would
> like the content within <div class="message-body"> to go into one Solr
> field and the content within <div class="attachment-entry"> to go into
> another Solr field.
>
> Can someone please let me know how to achieve this?
>
> Thanks and Regards,
> Swapna.
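
A rough sketch of the extraction step suggested above, for illustration only. It pulls the text of div elements by class name and collects it per target field. To stay free of extra dependencies it uses the HTML parser that ships with the JDK (javax.swing.text.html) as the forgiving parser; a third-party parser such as jsoup or TagSoup would be the more usual choice. The class name DivFieldExtractor and the field names message_body and attachment_entry are made up for this example and are not from the thread. The SolrJ hand-off is only described in comments, since the client class depends on the SolrJ version in use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: maps <div class="..."> content to field names.
public class DivFieldExtractor {

    // Marker for "this div is not inside any div we care about".
    private static final String NO_FIELD = "";

    /**
     * Returns field name -> text for every div whose class attribute is a
     * key of classToField. Divs without a mapped class inherit the field of
     * the enclosing div. Only an exact, single class value is matched.
     */
    public static Map<String, String> extract(String html, Map<String, String> classToField) {
        final Map<String, StringBuilder> collected = new LinkedHashMap<>();
        final Deque<String> openDivs = new ArrayDeque<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.DIV) return;
                Object cssClass = attrs.getAttribute(HTML.Attribute.CLASS);
                String field = cssClass == null ? null : classToField.get(cssClass.toString());
                if (field == null) {
                    field = openDivs.isEmpty() ? NO_FIELD : openDivs.peek();
                }
                openDivs.push(field);
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.DIV && !openDivs.isEmpty()) openDivs.pop();
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (openDivs.isEmpty() || openDivs.peek().equals(NO_FIELD)) return;
                StringBuilder sb = collected.computeIfAbsent(openDivs.peek(), k -> new StringBuilder());
                if (sb.length() > 0) sb.append(' ');
                sb.append(new String(data).trim());
            }
        };

        try {
            // 'true' tells the parser to ignore charset declarations in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        Map<String, String> fields = new LinkedHashMap<>();
        collected.forEach((name, text) -> fields.put(name, text.toString()));
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> classToField = new LinkedHashMap<>();
        classToField.put("message-body", "message_body");         // hypothetical field name
        classToField.put("attachment-entry", "attachment_entry"); // hypothetical field name

        String html = "<div class=\"message-body\">Hello Erick</div>"
                    + "<div class=\"attachment-entry\">report.pdf</div>";

        Map<String, String> fields = extract(html, classToField);
        fields.forEach((name, value) -> System.out.println(name + " = " + value));

        // With SolrJ on the classpath, each entry would become
        //   doc.addField(name, value)
        // on a SolrInputDocument, which is then sent with the client's
        // add(doc) followed by commit(). The client class differs between
        // SolrJ releases, so it is left out of this sketch.
    }
}
```

Note that the class used for indexing in SolrJ is SolrInputDocument; SolrDocument is what comes back from queries. The field names used must exist in schema.xml (or match a dynamic field) for the add to succeed.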