It is planned to be out in another month or so. But one can never be sure.
On Fri, Jan 23, 2009 at 3:57 AM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:
> Thanks
>
> A last question - do you have any approximate date for the release of 1.4?
> If it's going to be soon enough (within a month or so) then I can plan
> our development around it.
>
> Thanks
> Guna
>
> On Jan 22, 2009, at 11:04 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>
>> You are out of luck if you are not using a recent version of DIH.
>>
>> The sub-entity will work only if you use the FieldReaderDataSource.
>> Then you do not need a ClobTransformer either.
>>
>> The trunk version of DIH can be used with the Solr 1.3 release.
>>
>> On Thu, Jan 22, 2009 at 12:59 PM, Gunaranjan Chandraraju
>> <chandrar...@apple.com> wrote:
>>>
>>> Hi
>>>
>>> Yes, the XML is inside the DB in a CLOB. I would love to use XPath inside
>>> SqlEntityProcessor as it will save me tons of trouble over file-dumping
>>> (given that I am not able to post it). This is how I set up my DIH for DB
>>> import:
>>>
>>> <dataConfig>
>>>   <dataSource type="JdbcDataSource" name="data-source-1"
>>>               driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@XXXXX"
>>>               user="abc" password="***" batchSize="100"/>
>>>   <document>
>>>     <entity dataSource="data-source-1"
>>>             name="item" processor="SqlEntityProcessor"
>>>             pk="ID"
>>>             stream="false"
>>>             rootEntity="false"
>>>             transformer="ClobTransformer" <!-- custom clob transformer I saw, not the one from 1.4 -->
>>>             query="select xml_col from xml_table where xml_col IS NOT NULL">
>>>             <!-- horrible query, I need to work on making it better -->
>>>
>>>       <entity
>>>           dataSource="null" <!-- this is my problem - if I don't give a name here it complains; if I put in null then the code seems to fail with a null pointer -->
>>>           name="record"
>>>           processor="XPathEntityProcessor"
>>>           stream="false"
>>>           url="${item.xml_col}"
>>>           forEach="/record">
>>>
>>>         <field column="ID" xpath="/record/coreinfo/@a" />
>>>         <field column="type" xpath="/record/coreinfo/@b" />
>>>         <field column="streetname" xpath="/record/address/@c" />
>>>         .. and so on
>>>       </entity>
>>>     </entity>
>>>   </document>
>>> </dataConfig>
>>>
>>> The problem is that it always fails with the error below. I can see that
>>> the earlier SQL entity extraction and CLOB transformation are working, as
>>> the values show in the debug jsp (verbose mode with dataimport.jsp).
>>> However, no records are extracted for the entity. When I check the
>>> catalina.out file, it shows me the following error for the entity
>>> name="record" (the XPath entity above):
>>>
>>> java.lang.NullPointerException at
>>> org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
>>>
>>> I don't have the whole stack trace right now. If you need it I would be
>>> happy to recreate and post it.
>>>
>>> Regards,
>>> Guna
>>>
>>> On Jan 21, 2009, at 8:22 PM, Noble Paul നോബിള് नोब्ळ् wrote:
>>>
>>>> On Thu, Jan 22, 2009 at 7:02 AM, Gunaranjan Chandraraju
>>>> <chandrar...@apple.com> wrote:
>>>>>
>>>>> Thanks
>>>>>
>>>>> Yes, the source of data is a DB. However, the XML is also posted on
>>>>> updates via a publish framework, so I can just plug in an adapter here
>>>>> to listen for changes and post to SOLR. I was trying to use the
>>>>> XPathEntityProcessor inside the SqlEntityProcessor and this did not work
>>>>> (using 1.3 - I did see support in 1.4). That is not a show-stopper for
>>>>> me; I can just post them via the framework and use files for the
>>>>> first-time load.
>>>>
>>>> XPathEntityProcessor works inside SqlEntityProcessor only if a db
>>>> field contains xml.
>>>>
>>>> However, you can have a separate entity (at the root) to read from the
>>>> db for deltas.
>>>> Anyway, if your current solution works, stick to it.
>>>>>
>>>>> I have seen a couple of answers on backups for crash scenarios. Just
>>>>> wanted to confirm - if I replace the index with the backed-up files,
>>>>> then I can simply start up solr again and reindex the documents changed
>>>>> since the last backup? Am I right? The slaves will also automatically
>>>>> adjust to this.
>>>>
>>>> Yes, you can replace an archived index and Solr should work just fine,
>>>> but the docs added since the last snapshot was taken will be missing
>>>> (of course :) )
>>>>>
>>>>> Thanks
>>>>> Guna
>>>>>
>>>>>
>>>>> On Jan 20, 2009, at 9:37 PM, Noble Paul നോബിള് नोब्ळ् wrote:
>>>>>
>>>>>> On Wed, Jan 21, 2009 at 5:15 AM, Gunaranjan Chandraraju
>>>>>> <chandrar...@apple.com> wrote:
>>>>>>>
>>>>>>> Hi All
>>>>>>> We are considering SOLR for a large database of XMLs. I have some
>>>>>>> newbie questions - if there is a place I can go read about them, do
>>>>>>> let me know and I will go read up :)
>>>>>>>
>>>>>>> 1. Currently we are able to pull the XMLs from a file system using
>>>>>>> FileDataSource. The DIH is convenient since I can map my XML fields
>>>>>>> using the XPathEntityProcessor. This works for an initial load.
>>>>>>> However, after the initial load, we would like to 'post' changed XMLs
>>>>>>> to SOLR whenever the XML is updated in a separate system. I know we
>>>>>>> can post XMLs with 'add'; however, I was not sure how to do this and
>>>>>>> keep the DIH mapping I use in data-config.xml. I don't want to save
>>>>>>> the file to disk and then call the DIH - I would prefer to post it
>>>>>>> directly. Do I need to use solrj for this?
>>>>>>
>>>>>> What is the source of your new data? Is it a DB?
>>>>>>
>>>>>>>
>>>>>>> 2. If my solr schema.xml changes, do I HAVE to reindex all the old
>>>>>>> documents? Suppose in the future we have newer XML documents that
>>>>>>> contain a new additional xml field. The old documents that are already
>>>>>>> indexed don't have this field and (so) I don't need to search on them
>>>>>>> with this field. However, the new ones need to be searchable on this
>>>>>>> new field. Can I just add this new field to the SOLR schema, restart
>>>>>>> the servers, and post only the new documents, or do I need to reindex
>>>>>>> everything?
>>>>>>>
>>>>>>> 3. Can I back up the index directory, so that in case of a disk crash
>>>>>>> I can restore this directory and bring solr up? I realize that any
>>>>>>> documents indexed after this backup would be lost - I can, however,
>>>>>>> keep track of these outside and simply re-index documents 'newer' than
>>>>>>> that backup date. This question is really important to me in the
>>>>>>> context of using a Master server with a replicated index. I would like
>>>>>>> to run this backup for the 'Master'.
>>>>>>
>>>>>> The snapshot script can be used to take backups on commit.
>>>>>>>
>>>>>>> 4. In general, what happens when the solr application is bounced? Is
>>>>>>> the index affected (anything maintained in memory)?
>>>>>>>
>>>>>>> Regards
>>>>>>> Guna
>>>>>>
>>>>>> --
>>>>>> --Noble Paul
>>>>
>>>> --
>>>> --Noble Paul
>>
>> --
>> --Noble Paul

--
--Noble Paul
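
For reference, a minimal sketch of the FieldReaderDataSource arrangement Noble describes (trunk/1.4 DIH), reusing the table, column, and XPath names from Guna's config above. This is an untested illustration, not a verified configuration: the nested XPathEntityProcessor reads the CLOB column of the parent row via dataField, so no ClobTransformer and no url="${item.xml_col}" is needed.

<dataConfig>
  <dataSource name="data-source-1" type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@XXXXX"
              user="abc" password="***" batchSize="100"/>
  <!-- reads XML text out of a column of the parent entity's row -->
  <dataSource name="field-reader" type="FieldReaderDataSource"/>
  <document>
    <entity name="item" dataSource="data-source-1" processor="SqlEntityProcessor"
            pk="ID" rootEntity="false"
            query="select xml_col from xml_table where xml_col IS NOT NULL">
      <!-- dataField points at the parent entity's column; no ClobTransformer -->
      <entity name="record" dataSource="field-reader" processor="XPathEntityProcessor"
              dataField="item.xml_col" forEach="/record">
        <field column="ID" xpath="/record/coreinfo/@a"/>
        <field column="type" xpath="/record/coreinfo/@b"/>
        <field column="streetname" xpath="/record/address/@c"/>
      </entity>
    </entity>
  </document>
</dataConfig>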
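On question 3, the snapshot script Noble mentions is typically wired up as a postCommit event listener in solrconfig.xml (the stock 1.3 solrconfig.xml ships a commented-out version of this). The paths below are illustrative and depend on where snapshooter lives in your install:

<listener event="postCommit" class="solr.RunExecutableListener">
  <!-- path to the snapshooter script shipped with Solr; adjust for your layout -->
  <str name="exe">solr/bin/snapshooter</str>
  <!-- working directory for the script -->
  <str name="dir">.</str>
  <!-- block the commit until the snapshot finishes -->
  <bool name="wait">true</bool>
</listener>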
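On question 2, adding an optional field to schema.xml would look roughly like the line below; the field name here is made up for illustration. Documents indexed before the field was added simply have no value for it, which matches the case described where old documents need not be searchable on the new field.

<!-- hypothetical new field; existing documents just lack a value for it -->
<field name="newfield" type="string" indexed="true" stored="true" required="false"/>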