Sharing some of our exports from DB to Solr. Note: many of the statements below might not work as-is, since parts of them were clipped out when copy-pasting.
$SOLR_HOME/conf/data-config.xml

<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource" />
  <document>
    <entity name="jc" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor"
            fileName="^.*\.xml$" recursive="false" baseDir="$dumpdir">
      <entity name="x" pk="uid"
              processor="XPathEntityProcessor"
              url="${jc.fileAbsolutePath}"
              forEach="/entries/entry"
              transformer="DateFormatTransformer"
              stream="true"
              dataSource="myfilereader">
        <field column="uid" xpath="/entries/entry/uid" />
        <!-- sorry, hiding the rest -->
      </entity>
    </entity>
  </document>
</dataConfig>

# Then add this to solrconfig.xml
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

# Restart Solr.

# Issue a MySQL dump:
mysql --xml -uXXX -pXXX -hXXX -DXXX -e "select MD5(link) as uid, DATE_FORMAT(publishedDate, \"%Y-%m-%dT%H:%i:%sZ\") as publishedDate from X" > $dumpdir/dump.xml

# Warning: note the clean=true parameter, which will wipe your index...
GET "http://$server:$port/$path/dataimport?command=full-import&clean=true&optimize=true"

Hope this helps out some.

Cheers
//Marcus

On Sun, Jul 5, 2009 at 7:28 PM, Francis Yakin <fya...@liquid.com> wrote:
> Norberto,
>
> Yes, DIH is one of the options we are considering, but it requires 1.3.0 and
> above, and we are currently running Solr 1.2.0.
>
> I am thinking of using a CSV file (convert the XML to CSV format on the Database
> machine), then transporting that CSV file to the Solr box.
> In Solr we run the update to convert the CSV file to a Lucene index.
>
> We are also considering the one you suggested; note my question below:
>
> >why not generate your SQL output directly on your Oracle server as a file,
> question: What type of file is this (XML or CSV)?
>
> >upload the file to your SOLR server? Then the data file is local to your SOLR
> >server, you will bypass any WAN and firewall you may be having.
> >(or some variation of it, sql -> SOLR server as file, etc..)
>
> How do we upload the file? Do we need to convert the data file to a Lucene
> index first?
> And is there documentation on how we do this?
>
> >Any speed issues that are rooted in the fact that you are posting via
> >HTTP (vs embedded solr or DIH) aren't going to go away. But it's the simpler
> >approach without changing too much of your current setup.
>
>
> -----Original Message-----
> From: Norberto Meijome [mailto:numard...@gmail.com]
> Sent: Sunday, July 05, 2009 3:57 AM
> To: Francis Yakin
> Cc: solr-user@lucene.apache.org
> Subject: Re: Is there any other way to load the index beside using "http" connection?
>
> On Thu, 2 Jul 2009 11:02:28 -0700
> Francis Yakin <fya...@liquid.com> wrote:
>
> > Norberto, Thanks for your input.
> >
> > What do you mean by "Have you tried connecting to SOLR over HTTP from
> > localhost, therefore avoiding any firewall issues and network latency? it
> > should work a LOT faster than from a remote site."?
> >
> >
> > Here is how our servers are laid out:
> >
> > 1) Database (Oracle) is running on a separate machine
> > 2) Solr master is running on a separate machine by itself
> > 3) 6 Solr slaves (these 6 pull the index from the master using rsync)
> >
> > We have a SQL (Oracle) script to post the data/index from the Oracle Database
> > machine to the Solr master over HTTP. We wrote that script (someone on our
> > Oracle DBA team wrote it).
>
> You said in your other email you are having issues with slow transfers between
> 1) and 2). Your subject relates to the data transfer between 1) and 2); 2) and
> 3) are irrelevant to this part.
>
> My question (what you quoted above) relates to the point you made about it
> being slow (WHY is it slow?), and issues with opening so many connections
> through the firewall. So, I'll rephrase my question (see below...)
>
> [....]
> >
> > We cannot use localhost since Solr is not running on the Oracle machine.
>
> why not generate your SQL output directly on your Oracle server as a file,
> upload the file to your SOLR server? Then the data file is local to your SOLR
> server, you will bypass any WAN and firewall you may be having. (or some
> variation of it, sql -> SOLR server as file, etc..)
>
> Any speed issues that are rooted in the fact that you are posting via
> HTTP (vs embedded solr or DIH) aren't going to go away. But it's the simpler
> approach without changing too much of your current setup.
>
> > Another alternative that we have thought of is to transform the XML into CSV
> > and import/export it.
> >
> > How about LUSQL, which someone mentioned? Is this app free (open source)?
> > Do you have any experience with it?
>
> Not I, sorry.
>
> Have you looked into DIH? It's designed for this kind of work.
>
> B
> _________________________
> {Beto|Norberto|Numard} Meijome
>
> "Great spirits have often encountered violent opposition from mediocre minds."
> Albert Einstein
>
> I speak for myself, not my employer. Contents may be hot. Slippery when wet.
> Reading disclaimers makes you go blind. Writing them is worse. You have been
> Warned.

--
Marcus Herou
CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
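A note on the dump step in the DIH setup above: as far as we know, raw `mysql --xml` output is laid out as `<resultset><row><field name="uid">...</field></row></resultset>`, not the `/entries/entry/uid` shape that the `forEach` and `xpath` attributes in `data-config.xml` expect. So either the dump is reshaped before DIH reads it, or the XPaths are pointed at the raw layout. Below is a minimal sketch of the reshaping route, assuming the dump fits in memory, is UTF-8, and uses column aliases that are valid XML element names (as `uid` and `publishedDate` are). The function names `mysql_xml_to_entries` and `reshape_file` are hypothetical, not part of Solr or MySQL.

```python
import xml.etree.ElementTree as ET


def mysql_xml_to_entries(mysql_xml: str) -> str:
    """Reshape `mysql --xml` output into <entries><entry>...</entry></entries>."""
    # mysql --xml emits <resultset><row><field name="col">value</field></row>...
    resultset = ET.fromstring(mysql_xml)
    entries = ET.Element("entries")
    for row in resultset.iter("row"):
        entry = ET.SubElement(entries, "entry")
        for field in row.findall("field"):
            # The column alias becomes the element name, so aliases must be
            # valid XML names (hence "as uid" rather than a bare MD5(link)).
            child = ET.SubElement(entry, field.get("name"))
            # NULL columns arrive as <field name="..." xsi:nil="true" /> with
            # no text; they become empty elements here.
            child.text = field.text
    return ET.tostring(entries, encoding="unicode")


def reshape_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: read a raw dump, write the reshaped file for DIH."""
    # Assumes the mysql client wrote UTF-8; adjust if your client charset differs.
    with open(src_path, encoding="utf-8") as fh:
        reshaped = mysql_xml_to_entries(fh.read())
    with open(dst_path, "w", encoding="utf-8") as fh:
        fh.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fh.write(reshaped)
```

For example, dump to a scratch file outside `$dumpdir`, then call `reshape_file("raw.xml", "<dumpdir>/dump.xml")` so that `FileListEntityProcessor` only ever sees the reshaped file. For very large dumps, `ET.iterparse` would avoid holding the whole document in memory. The other route, leaving the dump untouched and using `forEach="/resultset/row"` with `xpath="/resultset/row/field[@name='uid']"`, should also work with DIH's XPath subset, which accepts attribute predicates of that form, but we have not tested it here.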
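On Francis's CSV question (Solr 1.2, no DIH): as far as we know, the CSV file does not need to be turned into a Lucene index first. Solr's CSV update handler parses the file and builds the index itself, and a commit then makes the documents visible. Below is a minimal sketch, assuming the handler is registered at `/update/csv` as in the stock example `solrconfig.xml`, that the CSV has a header row naming the Solr fields, and that it runs on the Solr master so the HTTP hop stays on localhost rather than crossing the WAN/firewall. The base URL and function names are hypothetical.

```python
import urllib.request


def csv_update_request(solr_base: str, csv_bytes: bytes) -> urllib.request.Request:
    """POST raw CSV to the CSV update handler (assumed to live at /update/csv)."""
    # By default the first CSV line is read as the field names, so the
    # export should emit a header row such as "uid,publishedDate".
    return urllib.request.Request(
        solr_base.rstrip("/") + "/update/csv",
        data=csv_bytes,
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def commit_request(solr_base: str) -> urllib.request.Request:
    """POST an XML <commit/> to /update so the new documents become searchable."""
    return urllib.request.Request(
        solr_base.rstrip("/") + "/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def load_csv(solr_base: str, csv_path: str) -> None:
    """Send the CSV file, then commit."""
    with open(csv_path, "rb") as fh:
        payload = fh.read()
    for req in (csv_update_request(solr_base, payload), commit_request(solr_base)):
        # urlopen raises HTTPError on a non-2xx status, so reaching the
        # read() means Solr accepted the request.
        with urllib.request.urlopen(req) as resp:
            resp.read()
```

Usage, after copying the export onto the master: `load_csv("http://localhost:8983/solr", "/path/to/export.csv")`. The slaves would then pick up the new index through the existing rsync setup.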