Googling 'solrj examples' points right to a great example here: http://wiki.apache.org/solr/Solrj
Best
Erick

On Wed, Dec 21, 2011 at 12:04 AM, Swapna Vuppala <swapna.vupp...@arup.com> wrote:
> Hi Erick,
>
> Can you please give me a little more information about the SolrJ program and
> how to use it to construct a Solr document?
>
> Thanks and Regards,
> Swapna.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, December 21, 2011 2:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Mapping and Capture in ExtractingRequestHandler
>
> When you start getting into complex HTML extraction, you're probably
> better off using a SolrJ program with a forgiving HTML parser,
> extracting the relevant bits yourself, and constructing a
> SolrDocument.
>
> FWIW,
> Erick
>
> On Tue, Dec 20, 2011 at 12:54 AM, Swapna Vuppala
> <swapna.vupp...@arup.com> wrote:
>> Hi,
>>
>> I understand that we can specify parameters for ExtractingRequestHandler in
>> solrconfig.xml to capture HTML tags of a particular type and map them to
>> the desired Solr fields, as in the example below.
>>
>> <str name="capture">div</str>
>> <str name="fmap.div">mysolrfield</str>
>>
>> The above setting captures the content of "div" tags and copies it to the
>> Solr field "mysolrfield".
>>
>> What I am interested in is capturing div tags with a particular class name
>> into a Solr field. When extracting content from Outlook messages, I would
>> like the content within <div class="message-body"> to go into one Solr
>> field and the content within <div class="attachment-entry"> to go into
>> another Solr field.
>>
>> Can someone please let me know how to achieve this?
>>
>> Thanks and Regards,
>> Swapna.
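
For the "extract the relevant bits yourself" step suggested in the thread, here is a rough sketch of pulling the text out of <div class="message-body"> and <div class="attachment-entry"> and grouping it by target field. It is not SolrJ: it uses Python's standard html.parser as a stand-in for a forgiving HTML parser, and it stops short of sending anything to Solr. The field names message_body and attachment_entry, the CLASS_TO_FIELD mapping, and the names DivClassExtractor and extract_fields are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical mapping from div class names to Solr field names.
CLASS_TO_FIELD = {
    "message-body": "message_body",
    "attachment-entry": "attachment_entry",
}


class DivClassExtractor(HTMLParser):
    """Collects the text inside <div> elements whose class is in the mapping."""

    def __init__(self, class_to_field):
        super().__init__(convert_charrefs=True)
        self.class_to_field = class_to_field
        self.fields = {}      # field name -> list of captured text values
        self._field = None    # field currently being captured, if any
        self._depth = 0       # div nesting depth inside the captured div
        self._chunks = []     # text pieces seen so far in the captured div

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            # Already capturing: only track nested divs so we know where
            # the captured div really ends.
            if tag == "div":
                self._depth += 1
            return
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.class_to_field:
                self._field = self.class_to_field[name]
                self._depth = 1
                self._chunks = []
                break

    def handle_endtag(self, tag):
        if self._field is None or tag != "div":
            return
        self._depth -= 1
        if self._depth == 0:
            # Join pieces with a space and collapse runs of whitespace.
            # (This would split a word that has markup in the middle of it.)
            text = " ".join(" ".join(self._chunks).split())
            self.fields.setdefault(self._field, []).append(text)
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._chunks.append(data)


def extract_fields(html, class_to_field=CLASS_TO_FIELD):
    """Return {field name: [text, ...]} for the mapped div classes."""
    parser = DivClassExtractor(class_to_field)
    parser.feed(html)
    parser.close()
    return parser.fields


sample = """
<html><body>
<div class="header">From: someone</div>
<div class="message-body"><p>Hello <b>world</b></p><div>nested</div></div>
<div class="attachment-entry">report.pdf</div>
<div class="attachment-entry">notes.txt</div>
</body></html>
"""

doc = extract_fields(sample)
print(doc)
```

Each key in the resulting dictionary would become one field of the document sent to Solr; in a SolrJ program the equivalent step would be adding each value to a SolrInputDocument with addField before handing it to the client. A div that matches a mapped class while another one is already being captured is treated as part of the outer div's text, which may or may not be what you want for real Outlook output.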