Hi
Yes, the XML is inside the DB in a CLOB. I would love to use XPath inside SqlEntityProcessor, as it will save me tons of trouble with file dumping (given that I am not able to post it). This is how I set up my DIH for the DB import.
<dataConfig>
  <dataSource type="JdbcDataSource" name="data-source-1"
              driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@XXXXX"
              user="abc" password="***" batchSize="100"/>
  <document>
    <!-- ClobTransformer is a custom clob transformer I saw, not the one
         from 1.4. The query is horrible; I need to work on making it
         better. -->
    <entity dataSource="data-source-1"
            name="item"
            processor="SqlEntityProcessor"
            pk="ID"
            stream="false"
            rootEntity="false"
            transformer="ClobTransformer"
            query="select xml_col from xml_table where xml_col IS NOT NULL">
      <!-- dataSource here is my problem: if I don't give a name it
           complains; if I put in null, the code seems to fail with a
           null pointer. -->
      <entity dataSource="null"
              name="record"
              processor="XPathEntityProcessor"
              stream="false"
              url="${item.xml_col}"
              forEach="/record">
        <field column="ID" xpath="/record/coreinfo/@a"/>
        <field column="type" xpath="/record/coreinfo/@b"/>
        <field column="streetname" xpath="/record/address/@c"/>
        <!-- ... and so on -->
      </entity>
    </entity>
  </document>
</dataConfig>
The problem is that it always fails with the error below. I can see that the earlier SQL entity extraction and clob transformation are working, as the values show in the debug JSP (verbose mode with dataimport.jsp). However, no records are extracted for the inner entity. When I check the catalina.out file, it shows me the following error for the entity name="record" (the XPath entity shown above):

java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)

I don't have the whole stack trace right now. If you need it, I would be happy to recreate and post it.
Regards,
Guna
On Jan 21, 2009, at 8:22 PM, Noble Paul നോബിള് नोब्ळ् wrote:
On Thu, Jan 22, 2009 at 7:02 AM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:
Thanks. Yes, the source of data is a DB. However, the XML is also posted on updates via a publish framework, so I can just plug in an adapter here to listen for changes and post to SOLR. I was trying to use the XPathEntityProcessor inside the SqlEntityProcessor and this did not work (using 1.3; I did see support in 1.4). That is not a show stopper for me; I can just post them via the framework and use files for the first-time load.
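For reference, a minimal SolrJ sketch of what such an adapter's post step could look like, assuming the Solr 1.3 SolrJ client. The URL and field values are placeholders, and the XPath-to-field mapping from data-config.xml would have to be reproduced in the adapter code, since posting directly bypasses the DIH:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RecordPostAdapter {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at the master Solr instance.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Field names mirror the data-config.xml mappings; the values here
        // are hypothetical and would come from the published XML.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("ID", "12345");
        doc.addField("type", "someType");
        doc.addField("streetname", "Some Street");

        server.add(doc);
        server.commit();
    }
}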
XPathEntityProcessor works inside SqlEntityProcessor only if a DB field contains XML. However, you can have a separate entity (at the root) to read from the DB for deltas. Anyway, if your current solution works, stick to it.
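For completeness, a minimal sketch of the 1.4 wiring referred to above: a FieldReaderDataSource lets the inner XPathEntityProcessor read the XML straight out of the CLOB column via dataField, instead of dataSource="null" and url. The entity and column names follow the config earlier in the thread; the rest is an assumption, not a tested configuration.

<dataConfig>
  <dataSource type="JdbcDataSource" name="data-source-1"
              driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@XXXXX"
              user="abc" password="***" batchSize="100"/>
  <!-- Feeds the value of a parent-entity field to the child entity. -->
  <dataSource type="FieldReaderDataSource" name="field-reader"/>
  <document>
    <entity name="item" dataSource="data-source-1"
            processor="SqlEntityProcessor" transformer="ClobTransformer"
            rootEntity="false"
            query="select xml_col from xml_table where xml_col IS NOT NULL">
      <!-- dataField points at the parent entity's column; no url needed. -->
      <entity name="record" dataSource="field-reader"
              processor="XPathEntityProcessor"
              dataField="item.xml_col" forEach="/record">
        <field column="ID" xpath="/record/coreinfo/@a"/>
        <!-- ...remaining xpath mappings as before... -->
      </entity>
    </entity>
  </document>
</dataConfig>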
I have seen a couple of answers on backup for crash scenarios; I just wanted to confirm: if I replace the index with the backed-up files, can I simply start up Solr again and reindex the documents changed since the last backup? Am I right? Will the slaves also automatically adjust to this?
Yes, you can replace an archived index and Solr should work just fine, but the docs added since the last snapshot was taken will be missing (of course :) ).
Thanks,
Guna
On Jan 20, 2009, at 9:37 PM, Noble Paul നോബിള് नोब्ळ् wrote:
On Wed, Jan 21, 2009 at 5:15 AM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:
Hi All,
We are considering SOLR for a large database of XMLs. I have some newbie questions; if there is a place I can go read about them, do let me know and I will go read up :)
1. Currently we are able to pull the XMLs from a file system using FileDataSource. The DIH is convenient since I can map my XML fields using the XPathEntityProcessor. This works for an initial load. However, after the initial load, we would like to 'post' changed XMLs to SOLR whenever the XML is updated in a separate system. I know we can post XMLs with 'add'; however, I was not sure how to do this and still maintain the DIH mapping I use in data-config.xml. I don't want to save the file to disk and then call the DIH; I would prefer to post it directly. Do I need to use solrj for this?
What is the source of your new data? Is it a DB?
2. If my Solr schema.xml changes, do I HAVE to reindex all the old documents? Suppose in the future we have newer XML documents that contain a new additional XML field. The old documents that are already indexed don't have this field and (so) I don't need to search on them with this field. However, the new ones need to be searchable on this new field. Can I just add this new field to the SOLR schema, restart the servers, and just post the new documents, or do I need to reindex everything?
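As far as I know, a new optional field can generally be added without reindexing: documents indexed before the field existed simply will not have it and are unaffected, while new documents become searchable on it. A minimal sketch of such a schema.xml addition (the field name and type here are assumptions):

<!-- New optional field; older documents simply lack it. -->
<field name="newfield" type="string" indexed="true" stored="true" required="false"/>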
3. Can I back up the index directory, so that in case of a disk crash I can restore this directory and bring Solr up? I realize that any documents indexed after this backup would be lost; I can however keep track of these outside and simply re-index documents 'newer' than that backup date. This question is really important to me in the context of using a Master server with a replicated index. I would like to run this backup for the 'Master'.
The snapshot script can be used to take backups on commit.
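A sketch of how snapshooter is typically wired to run on every commit, mirroring the postCommit listener in the Solr 1.3 example solrconfig.xml (the dir value is an assumption about your install layout):

<!-- In solrconfig.xml: run the snapshot script after each commit. -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>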
4. In general, what happens when the Solr application is bounced? Is the index affected (is anything maintained in memory)?
Regards
Guna
--
--Noble Paul