Googling 'solrj examples' points right to a great example here:

http://wiki.apache.org/solr/Solrj
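
For the extraction step itself, here is a rough sketch of the idea (not taken
from the wiki page): pull the text out of divs with a given class and collect
it under field names. To keep it dependency-free it uses a small regex-based
scanner as a stand-in for a real forgiving HTML parser, and a plain Map as a
stand-in for the Solr document. The class name DivClassExtractor and the field
names body_txt and attachment_txt are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivClassExtractor {

    // Matches any opening or closing div tag; group(1) is "/" for a closing tag.
    private static final Pattern DIV_TAG =
        Pattern.compile("<(/?)div\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the text inside the first div whose class attribute is exactly
     * cssClass, or null if there is no such div. Nested divs are included.
     */
    static String textOfDiv(String html, String cssClass) {
        Pattern open = Pattern.compile(
            "<div\\b[^>]*\\bclass\\s*=\\s*[\"']" + Pattern.quote(cssClass) + "[\"'][^>]*>",
            Pattern.CASE_INSENSITIVE);
        Matcher start = open.matcher(html);
        if (!start.find()) {
            return null;
        }
        int depth = 1;
        int pos = start.end();
        int end = html.length(); // forgiving: an unclosed div runs to end of input
        Matcher tags = DIV_TAG.matcher(html);
        while (tags.find(pos)) {
            depth += tags.group(1).isEmpty() ? 1 : -1;
            pos = tags.end();
            if (depth == 0) {
                end = tags.start();
                break;
            }
        }
        String inner = html.substring(start.end(), end);
        // Drop remaining markup and collapse whitespace.
        return inner.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Maps each CSS class to a (hypothetical) Solr field name and extracts its text. */
    static Map<String, String> toFields(String html, Map<String, String> classToField) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : classToField.entrySet()) {
            String text = textOfDiv(html, e.getKey());
            if (text != null) {
                // In a SolrJ program this is where you would call
                // SolrInputDocument.addField(fieldName, text) instead.
                fields.put(e.getValue(), text);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
            + "<div class=\"message-body\">Hello <b>Swapna</b><div>nested</div></div>"
            + "<div class=\"attachment-entry\">report.pdf</div>"
            + "</body></html>";
        Map<String, String> classToField = new LinkedHashMap<>();
        classToField.put("message-body", "body_txt");
        classToField.put("attachment-entry", "attachment_txt");
        System.out.println(toFields(html, classToField));
    }
}
```

For real-world HTML you would want an actual forgiving parser rather than
regexes, and then send the resulting document to Solr through SolrJ as the
wiki page shows.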

Best
Erick

On Wed, Dec 21, 2011 at 12:04 AM, Swapna Vuppala
<swapna.vupp...@arup.com> wrote:
> Hi Erick,
>
> Can you please give me a little more information about the SolrJ program and
> how to use it to construct a Solr document?
>
> Thanks and Regards,
> Swapna.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, December 21, 2011 2:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Mapping and Capture in ExtractingRequestHandler
>
> When you start getting into complex HTML extraction, you're probably
> better off using a SolrJ program with a forgiving HTML parser
> and extracting the relevant bits yourself and constructing a
> SolrDocument.
>
> FWIW,
> Erick
>
> On Tue, Dec 20, 2011 at 12:54 AM, Swapna Vuppala
> <swapna.vupp...@arup.com> wrote:
>> Hi,
>>
>> I understand that we can specify parameters for ExtractingRequestHandler in
>> solrconfig.xml to capture HTML tags of a particular type and map them to
>> desired Solr fields, something like the following.
>>
>> <str name="capture">div</str>
>> <str name="fmap.div">mysolrfield</str>
>>
>> The above setting captures the content of "div" tags and copies it to the
>> Solr field "mysolrfield".
>>
>> What I am interested in is capturing div tags with a particular class name
>> into a Solr field. When extracting content from Outlook messages, I would
>> like the content within <div class="message-body"> to go into one Solr
>> field and the content within <div class="attachment-entry"> to go into
>> another Solr field.
>>
>> Can someone please let me know how to achieve this?
>>
>> Thanks and Regards,
>> Swapna.
