Sharing some of our setup for exporting from a DB to Solr. Note: many of the
statements below might not work as-is, since I have clipped parts out.

$SOLR_HOME/conf/data-config.xml
<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource" />
  <document>
    <entity name="jc" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor"
            fileName="^.*\.xml$" recursive="false"
            baseDir="$dumpdir">
      <entity name="x"
              pk="uid"
              processor="XPathEntityProcessor"
              url="${jc.fileAbsolutePath}"
              forEach="/entries/entry"
              transformer="DateFormatTransformer"
              stream="true"
              dataSource="myfilereader">
        <field column="uid" xpath="/entries/entry/uid" />
        <!-- sorry, hiding the rest -->
      </entity>
    </entity>
  </document>
</dataConfig>

# Then add this to solrconfig.xml
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
</requestHandler>

Restart Solr.

Issue a mysql dump:
mysql --xml -uXXX -pXXX -hXXX -DXXX \
  -e "select MD5(link) as uid, DATE_FORMAT(publishedDate, '%Y-%m-%dT%H:%i:%sZ') as publishedDate from X" \
  > "$dumpdir/dump.xml"

# Warning: note the clean=true parameter, which will wipe your index...
GET "http://$server:$port/$path/dataimport?command=full-import&clean=true&optimize=true"
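
The full-import runs in the background, so the GET above returns right away. A
minimal sketch for checking on it afterwards, assuming curl is installed.
SOLR_SERVER, SOLR_PORT and SOLR_PATH are hypothetical stand-ins for the
$server, $port and $path placeholders above, and their default values are only
guesses:

```shell
#!/bin/sh
# Assumed connection details (not from the original post); adjust to your setup.
SOLR_SERVER="${SOLR_SERVER:-localhost}"
SOLR_PORT="${SOLR_PORT:-8983}"
SOLR_PATH="${SOLR_PATH:-solr}"

# Build a DataImportHandler URL for the given command (e.g. status, abort).
dih_url() {
  printf 'http://%s:%s/%s/dataimport?command=%s\n' \
    "$SOLR_SERVER" "$SOLR_PORT" "$SOLR_PATH" "$1"
}

# Fetch the handler's status response; needs curl and a running Solr.
dih_status() {
  curl -s "$(dih_url status)"
}
```

Calling dih_status after kicking off the import should show whether the
handler is still busy or has gone back to idle.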

Hope this helps out some.

Cheers

//Marcus


On Sun, Jul 5, 2009 at 7:28 PM, Francis Yakin <fya...@liquid.com> wrote:

>  Norberto,
>
> Yes, DIH is one of the options we are thinking of using, but it requires
> 1.3.0 and above, and currently we are running Solr 1.2.0.
>
> I am thinking of using a CSV file (convert the XML to CSV format on the
> database machine), then transporting that CSV file to the Solr box.
> In Solr we run the update to convert the CSV file to a Lucene index.
>
> We are also considering the approach you suggested; note my questions below:
>
> >
> >why not generate your SQL output directly into your oracle server as a
> >file,
>   Question: What type of file is this (XML or CSV)?
>
> >upload the file to your SOLR server? Then the data file is local to your
> >SOLR server , you will bypass any WAN and firewall you may be having. (or
> >some variation of it, sql -> SOLR server as file, etc..)
>
> How do we upload the file? Do we need to convert the data file to a Lucene
> index first? And is there documentation on how to do this?
>
> >Any speed issues that are rooted in the fact that you are posting via
> >HTTP (vs embedded solr or DIH) aren't going to go away. But it's the
> >simpler approach without changing too much of your current setup.
>
>
> -----Original Message-----
> From: Norberto Meijome [mailto:numard...@gmail.com]
> Sent: Sunday, July 05, 2009 3:57 AM
> To: Francis Yakin
> Cc: solr-user@lucene.apache.org
> Subject: Re: Is there any other way to load the index beside using "http"
> connection?
>
> On Thu, 2 Jul 2009 11:02:28 -0700
> Francis Yakin <fya...@liquid.com> wrote:
>
> > Norberto, Thanks for your input.
> >
> > What do you mean with "Have you tried connecting to  SOLR over HTTP from
> > localhost, therefore avoiding any firewall issues and network latency ? it
> > should work a LOT faster than from a remote site." ?
> >
> >
> > Here is how our servers are laid out:
> >
> > 1) Database ( Oracle ) is running on a separate machine
> > 2) Solr master is running on a separate machine by itself
> > 3) 6 solr slaves ( these 6 pull the index from the master using rsync)
> >
> > We have a SQL (Oracle) script to post the data/index from the Oracle
> > Database machine to the Solr master over HTTP. One of our Oracle database
> > administrators wrote that script.
>
> You said in your other email you are having issues with slow transfers
> between 1) and 2). Your subject relates to the data transfer between 1) and
> 2); 2) and 3) is irrelevant to this part.
>
> My question (what you quoted above) relates to the point you made about it
> being slow ( WHY is it slow?), and issues with opening so many connections
> through firewall. so, I'll rephrase my question (see below...)
>
> [....]
> >
> > We cannot use localhost since Solr is not running on the Oracle machine.
>
> why not generate your SQL output directly into your oracle server as a file,
> upload the file to your SOLR server? Then the data file is local to your
> SOLR server , you will bypass any WAN and firewall you may be having. (or
> some variation of it, sql -> SOLR server as file, etc..)
>
> Any speed issues that are rooted in the fact that you are posting via
> HTTP (vs embedded solr or DIH) aren't going to go away. But it's the simpler
> approach without changing too much of your current setup.
>
>
> > Another alternative that we are thinking of is to transform the XML into
> > CSV and import/export it.
> >
> > How about LUSQL, which some have mentioned? Is it a free (open source)
> > application? Do you have any experience with it?
>
> Not i, sorry.
>
> Have you looked into DIH? It's designed for this kind of work.
>
> B
> _________________________
> {Beto|Norberto|Numard} Meijome
>
> "Great spirits have often encountered violent opposition from mediocre
> minds."
>  Albert Einstein
>
> I speak for myself, not my employer. Contents may be hot. Slippery when
> wet.
> Reading disclaimers makes you go blind. Writing them is worse. You have
> been
> Warned.
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
