You are out of luck if you are not using a recent version of DIH.

The sub entity will work only if you use the FieldReaderDataSource.
Then you do not need a ClobTransformer either.

The trunk version of DIH can be used with the Solr 1.3 release.
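For example, the sub entity over a CLOB column might be set up like this with FieldReaderDataSource (a sketch against the trunk/1.4 DIH docs; the table, column, and XPath names are taken from your config below and may need adjusting):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" name="data-source-1"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@XXXXX" user="abc" password="***"/>
  <!-- FieldReaderDataSource streams a column of the parent row as a
       Reader, so no ClobTransformer is needed -->
  <dataSource type="FieldReaderDataSource" name="f1"/>
  <document>
    <entity name="item" dataSource="data-source-1"
            processor="SqlEntityProcessor" rootEntity="false"
            query="select xml_col from xml_table where xml_col IS NOT NULL">
      <!-- dataField names the parent entity's column, instead of url -->
      <entity name="record" dataSource="f1"
              processor="XPathEntityProcessor"
              dataField="item.xml_col" forEach="/record">
        <field column="ID" xpath="/record/coreinfo/@a"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```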

On Thu, Jan 22, 2009 at 12:59 PM, Gunaranjan Chandraraju
<chandrar...@apple.com> wrote:
> Hi
>
> Yes, the XML is inside the DB in a CLOB.  I would love to use XPath inside
> the SqlEntityProcessor as it will save me tons of trouble with file dumping
> (given that I am not able to post it directly).  This is how I set up my
> DIH for the DB import.
>
> <dataConfig>
> <dataSource type="JdbcDataSource" name="data-source-1"
> driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@XXXXX"
> user="abc" password="***" batchSize="100"/>
>   <document>
>     <entity dataSource="data-source-1"
>                 name ="item" processor="SqlEntityProcessor"
>             pk="ID"
>             stream="false"
>             rootEntity="false"
>             transformer="ClobTransformer"
>             query="select xml_col from xml_table where xml_col IS NOT NULL">
>             <!-- ClobTransformer is a custom one I saw, not the one from
>                  1.4.  The query is horrible; I need to work on making it
>                  better. -->
>
>        <entity
>           name="record"
>           dataSource="null"
>           processor="XPathEntityProcessor"
>           stream="false"
>           url="${item.xml_col}"
>           forEach="/record">
>           <!-- dataSource is my problem: if I don't give a name here it
>                complains, and if I put in null the code seems to fail with
>                a null pointer. -->
>
>              <field column="ID" xpath="/record/coreinfo/@a" />
>              <field column="type" xpath="/record/coreinfo/@b" />
>              <field column="streetname" xpath="/record/address/@c" />
>
>      .. and so on
>        </entity>
>
>
>     </entity>
>   </document>
> </dataConfig>
>
>
> The problem is that it always fails with this error.  I can see that the
> earlier SQL entity extraction and clob transformation are working, as the
> values show up in the debug JSP (verbose mode with dataimport.jsp).
> However, no records are extracted for the sub-entity.  When I check the
> catalina.out file, it shows me the following error for the entity
> name="record" (the XPath entity above).
>
> java.lang.NullPointerException at
> org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85).
>
> I don't have the whole stack trace right now.  If you need it I would be
> happy to recreate and post it.
>
> Regards,
> Guna
>
> On Jan 21, 2009, at 8:22 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>> On Thu, Jan 22, 2009 at 7:02 AM, Gunaranjan Chandraraju
>> <chandrar...@apple.com> wrote:
>>>
>>> Thanks
>>>
>>> Yes, the source of data is a DB.  However, the XML is also posted on
>>> updates via a publish framework, so I can just plug in an adapter here
>>> to listen for changes and post to Solr.  I was trying to use the
>>> XPathEntityProcessor inside the SqlEntityProcessor and it did not work
>>> (using 1.3 - I did see support in 1.4).  That is not a show stopper for
>>> me; I can just post them via the framework and use files for the
>>> first-time load.
>>
>> The XPathEntityProcessor works inside the SqlEntityProcessor only if a db
>> field contains XML.
>>
>> However, you can have a separate entity (at the root) to read from the db
>> for deltas.
>> Anyway, if your current solution works, stick to it.
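>> A root-level delta entity might be sketched like this (the
>> deltaQuery/deltaImportQuery attributes are from the DIH wiki; the
>> last_modified column is an assumption about your table):

```xml
<entity name="item" pk="ID" dataSource="data-source-1"
        processor="SqlEntityProcessor"
        query="select ID, xml_col from xml_table"
        deltaQuery="select ID from xml_table
                    where last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="select ID, xml_col from xml_table
                          where ID = '${dataimporter.delta.ID}'">
  <field column="ID" name="ID"/>
</entity>
```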
>>>
>>> I have seen a couple of answers on backups for crash scenarios.  Just
>>> wanted to confirm: if I replace the index with the backed-up files, can
>>> I simply start Solr again and reindex the documents changed since the
>>> last backup?  Am I right?  The slaves will also automatically adjust to
>>> this.
>>
>> Yes, you can replace an archived index and Solr should work just fine,
>> but the docs added since the last snapshot was taken will be missing
>> (of course :) )
>>>
>>> THanks
>>> Guna
>>>
>>>
>>> On Jan 20, 2009, at 9:37 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>>> On Wed, Jan 21, 2009 at 5:15 AM, Gunaranjan Chandraraju
>>>> <chandrar...@apple.com> wrote:
>>>>>
>>>>> Hi All
>>>>> We are considering Solr for a large database of XMLs.  I have some
>>>>> newbie questions - if there is a place I can go read about them, do
>>>>> let me know and I will go read up :)
>>>>>
>>>>> 1. Currently we are able to pull the XMLs from a file system using
>>>>> FileDataSource.  The DIH is convenient since I can map my XML fields
>>>>> using the XPathEntityProcessor.  This works for an initial load.
>>>>> However, after the initial load, we would like to 'post' changed XMLs
>>>>> to Solr whenever the XML is updated in a separate system.  I know we
>>>>> can post XMLs with 'add'; however, I was not sure how to do this and
>>>>> still maintain the DIH mapping I use in data-config.xml.  I don't want
>>>>> to save the file to disk and then call the DIH - I would prefer to
>>>>> post it directly.  Do I need to use SolrJ for this?
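>>>>> (For reference, a direct 'add' post to the /update handler uses
>>>>> Solr's update XML format - note it bypasses data-config.xml, so the
>>>>> fields must already match the schema; the field names below are just
>>>>> placeholders:)

```xml
<add>
  <doc>
    <field name="ID">123</field>
    <field name="streetname">Main St</field>
  </doc>
</add>
```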
>>>>
>>>> What is the source of your new data? is it a DB?
>>>>
>>>>>
>>>>> 2. If my Solr schema.xml changes, do I HAVE to reindex all the old
>>>>> documents?  Suppose in the future we have newer XML documents that
>>>>> contain an additional XML field.  The old documents that are already
>>>>> indexed don't have this field and (so) I don't need to search on them
>>>>> with this field.  However, the new ones need to be searchable on this
>>>>> new field.  Can I just add this new field to the Solr schema, restart
>>>>> the servers, and post only the new documents, or do I need to reindex
>>>>> everything?
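>>>>> (Adding an optional field to schema.xml is backward compatible -
>>>>> already-indexed documents simply won't have it; the name and type
>>>>> below are placeholders:)

```xml
<field name="newfield" type="string" indexed="true" stored="true"
       required="false"/>
```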
>>>>>
>>>>> 3. Can I back up the index directory, so that in case of a disk crash
>>>>> I can restore it and bring Solr back up?  I realize that any documents
>>>>> indexed after this backup would be lost - I can, however, keep track
>>>>> of these outside and simply re-index documents newer than the backup
>>>>> date.  This question is really important to me in the context of using
>>>>> a master server with a replicated index.  I would like to run this
>>>>> backup for the 'master'.
>>>>
>>>> The snapshot script can be used to take backups on commit.
>>>>>
>>>>> 4. In general, what happens when the Solr application is bounced?  Is
>>>>> the index affected (is anything maintained in memory)?
>>>>>
>>>>> Regards
>>>>> Guna
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> --Noble Paul
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>
>



-- 
--Noble Paul
