Re: updating existing data in index vs inserting new data in index

2011-08-13 Thread Alexandre Sompheng
Hi Mark,

I guess adding "commit=true" to the "delta-import" request is the solution for
the JIRA issue I just submitted, SOLR-2711.
Can you explain where you configured commit=true?

thanks,
Alex

On Thu, Jul 7, 2011 at 6:44 PM, Mark juszczec wrote:

> First thanks for all the help.
>
> I think the problem was a combination of not having a unique key defined
> AND
> not including the commit=true parameter in the delta update.
>
> Once I did those things, the delta import left me with a single (updated)
> copy of the record including the changes in the source database.
>
> Do I have write access to the Wiki so I can explicitly state commit=true
> NEEDS to be specified?
>
> Mark
>
>
> On Thu, Jul 7, 2011 at 12:39 PM, Erick Erickson wrote:
>
> > I'd restart Solr after changing the schema.xml. The delta import does NOT
> > require restart or anything else like that.
> >
> > The fact that two records are displayed is not what I'd expect. But Solr
> > absolutely handles the replace via <uniqueKey>. So I suspect that you're
> > not actually doing what you expect. A little-known aid for debugging DIH
> > is solr/admin/dataimport.jsp, which might give you some joy.
> >
> > But, to summarize: this should work fine for DIH as far as Solr is
> > concerned, assuming that <uniqueKey> is properly defined. In your query
> > above that returns two documents, can you paste the entire response with
> > &fl=* attached?
> > I'm guessing that the data in your index isn't what you're expecting...
> >
> > Also, you might want to get a copy of Luke and examine your index; there's
> > a wealth of information.
> >
> >
> > Best
> > Erick
> >
> >
> > On Thu, Jul 7, 2011 at 11:12 AM, Mark juszczec wrote:
> > > Erick
> > >
> > > I used to, but now I find I must have commented it out in a fit of rage ;-)
> > >
> > > This could be the whole problem.
> > >
> > > I have verified via the admin schema browser that the field is ORDER_ID
> > > and will double check that I refer to it in upper case in the appropriate
> > > places in the Solr config schema.
> > >
> > > Curiously, the admin schema browser display for ORDER_ID says
> > > "hasDeletions: false", which seems the opposite of what I want.  I want
> > > to be able to delete duplicates.  Or am I interpreting this field wrong?
> > >
> > > In order to check for duplicates, I am using the admin browser to enter
> > > the following in the Make A Query box:
> > >
> > > TABLE_ID:1 AND ORDER_ID:674659
> > >
> > > When I click search and view the results, 2 records are displayed.  One
> > > has the original values, one has the changed values.  I haven't examined
> > > the xml (via view source) too closely and the next time I run I will look
> > > for something indicating one of the records is inactive.
> > >
> > > When you say "change your schema" do you mean via a delta import or by
> > > modifying the config files or both?  FWIW, I am deleting the index on
> > > the file system, doing a full import, modifying the data in the database
> > > and then doing a delta import.
> > >
> > > I am not restarting Solr at all in this process.
> > >
> > > I understand Solr does not perform key management.  You described
> > > exactly what I meant.  Sorry for any confusion.
> > >
> > > Mark
> > >
> > > On Thu, Jul 7, 2011 at 10:52 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > >> Let me re-state a few things to see if I've got it right:
> > >>
> > >> > your schema.xml file has an entry like
> > >>   <uniqueKey>order_id</uniqueKey>, right?
> > >>
> > >> > given this definition, any document added with an order_id that
> > >>   already exists in the Solr index will be replaced, i.e. you should
> > >>   have one and only one document with a given order_id.
> > >>
> > >> > case matters. Check via the admin page ("schema browser") to see if
> > >>   you have two fields, order_id and ORDER_ID.
> > >>
> > >> > How are you checking that your docs are duplicates? If you do a
> > >>   search on order_id, you should get back one and only one document
> > >>   (assuming the definition above). A document that's deleted will just
> > >>   be marked as deleted; the data won't be purged from the index. It
> > >>   won't show in search results, but it will show if you use lower-level
> > >>   ways to access the data.
> > >>
> > >> > Whenever you change your schema, it's best to clean the index,
> > >>   restart the server and re-index from scratch. Solr won't
> > >>   retroactively remove duplicate entries.
> > >>
> > >> > On the admin/stats page you should see maxDocs and numDocs. The
> > >>   difference between these should be the number of deleted documents.
> > >>
> > >> > Solr doesn't "manage" unique keys. All that happens is Solr will
> > >>   replace any pre-existing documents where *you've* defined the
> > >>   <uniqueKey> when a new doc is added...
> > >>
> > >> Hope this helps
> > >> Erick
> > >>
> > >> On Thu, 
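
The replace-on-add behaviour Erick describes in this thread can be illustrated with a toy model. This is only a sketch, not how Solr or Lucene is implemented: the class ToyIndex, its methods, and the STATE field are hypothetical names made up for the illustration. Only ORDER_ID, TABLE_ID, and the value 674659 come from the thread.

```python
class ToyIndex:
    """Toy model of uniqueKey replacement (hypothetical, not Solr's code)."""

    def __init__(self, unique_key=None):
        self.unique_key = unique_key  # None models a schema with no uniqueKey
        self.entries = []             # every doc ever written, with a deleted flag

    def add(self, doc):
        # With a unique key, any live doc carrying the same key value is
        # marked as deleted (not purged) before the new doc is appended.
        if self.unique_key is not None:
            for entry in self.entries:
                same_key = entry["doc"][self.unique_key] == doc[self.unique_key]
                if same_key and not entry["deleted"]:
                    entry["deleted"] = True
        self.entries.append({"doc": doc, "deleted": False})

    @property
    def max_docs(self):
        # Counts deleted docs too, like maxDocs on the stats page.
        return len(self.entries)

    @property
    def num_docs(self):
        # Counts only live docs, like numDocs on the stats page.
        return sum(1 for entry in self.entries if not entry["deleted"])

    def search(self, **criteria):
        # Deleted docs never show up in search results.
        return [
            entry["doc"]
            for entry in self.entries
            if not entry["deleted"]
            and all(entry["doc"].get(k) == v for k, v in criteria.items())
        ]


original = {"TABLE_ID": 1, "ORDER_ID": 674659, "STATE": "original"}
changed = {"TABLE_ID": 1, "ORDER_ID": 674659, "STATE": "changed"}

with_key = ToyIndex(unique_key="ORDER_ID")
with_key.add(original)
with_key.add(changed)

without_key = ToyIndex()
without_key.add(original)
without_key.add(changed)

print(len(with_key.search(ORDER_ID=674659)))     # one live document
print(with_key.max_docs - with_key.num_docs)     # one deleted document
print(len(without_key.search(ORDER_ID=674659)))  # both copies are returned
```

In this model, the index without a unique key returns both the original and the changed record, which matches the symptom Mark reported before he restored the unique key definition.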

Re: updating existing data in index vs inserting new data in index

2011-08-13 Thread Alexandre Sompheng
Actually, I requested .../dataimport?command=delta-import&commit=true
and DIH in delta-import mode still does not commit. Any guess as to why?


INFO: Starting Delta Import
Aug 14, 2011 1:42:02 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/apache-solr-3.3.0 path=/dataimport params={commit=true&command=delta-import} status=0 QTime=0
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: event
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity event with URL: jdbc:mysql://85.168.123.207:3306/AGENDA
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 865
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: event rows obtained : 3
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: event rows obtained : 0
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: event
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
Aug 14, 2011 1:42:03 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:1.282
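
For reference, the request logged above (params={commit=true&command=delta-import}) can be built as in the following minimal sketch. The helper name dih_url and the base URL are assumptions made for illustration; the real host, port, and webapp path depend on the deployment. Only the /dataimport path and the two parameters come from the thread.

```python
from urllib.parse import urlencode


def dih_url(base_url, command, **params):
    """Build a DataImportHandler request URL (hypothetical helper)."""
    query = {"command": command}
    query.update(params)  # extra request parameters such as commit=true
    return base_url + "?" + urlencode(query)


# The base URL is a placeholder, not taken from the thread.
url = dih_url("http://localhost:8983/solr/dataimport", "delta-import", commit="true")
print(url)
```

Fetching the resulting URL, from a browser or any HTTP client, is what triggers the import. The sketch only builds the string and makes no network call.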



Re: get update record from database using DIH

2011-08-18 Thread Alexandre Sompheng
Hi guys, I tried the delta import and got logs saying that it found delta
data to update. But it seems that the index is not updated. Any guess
why this happens? Did I miss something? I'm on Solr 3.3 with no
patch.

Thanks

On 18 Aug 2011, at 18:10, Dali wrote:

Hello

Take a look at the delta import example:
http://wiki.apache.org/solr/DataImportHandler

Regards



Re: get update record from database using DIH

2011-08-20 Thread Alexandre Sompheng
Actually, I requested .../dataimport?command=delta-import&commit=true
and DIH in delta-import mode does not commit; you can see the log below. My
index is nearly empty, maybe 10 data rows max... It's just the beginning.


INFO: Starting Delta Import
Aug 14, 2011 1:42:02 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/apache-solr-3.3.0 path=/dataimport params={commit=true&command=delta-import} status=0 QTime=0
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: event
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity event with URL: jdbc:mysql://85.168.123.207:3306/AGENDA
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 865
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: event rows obtained : 3
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: event rows obtained : 0
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: event
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
Aug 14, 2011 1:42:03 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:1.282
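
To see at a glance how many rows the delta queries found, the "rows obtained" lines of a log like the one above can be pulled out with a small script. This is a rough sketch: the function name delta_counts is made up, and the pattern is written only against the three "Completed ..." lines quoted in this message, so other log formats may need a different expression.

```python
import re

# Matches lines such as:
#   INFO: Completed ModifiedRowKey for Entity: event rows obtained : 3
PATTERN = re.compile(r"Completed (\w+) for Entity: (\w+) rows obtained : (\d+)")


def delta_counts(log_text):
    """Return {(entity, query_kind): row_count} for every matching log line."""
    counts = {}
    for match in PATTERN.finditer(log_text):
        kind, entity, rows = match.groups()
        counts[(entity, kind)] = int(rows)
    return counts


# Lines copied from the log in this message.
LOG = """INFO: Completed ModifiedRowKey for Entity: event rows obtained : 3
INFO: Completed DeletedRowKey for Entity: event rows obtained : 0
INFO: Completed parentDeltaQuery for Entity: event
"""

counts = delta_counts(LOG)
print(counts)
```

For this log the script reports 3 modified rows and 0 deleted rows for the entity "event", so the delta query itself is finding changes.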


On 19 Aug 2011, at 10:39, Gora Mohanty wrote:

On Fri, Aug 19, 2011 at 5:32 AM, Alexandre Sompheng wrote:

Hi guys, I tried the delta import and got logs saying that it found delta
data to update. But it seems that the index is not updated. Any guess
why this happens? Did I miss something? I'm on Solr 3.3 with no
patch.

[...]

Please show us the following:
* The exact URL you loaded for delta-import
* The Solr response which shows the delta documents that it found,
  and the status of the delta-import.
If your index is large, and if you are running an optimise after the
delta-import (the default is to optimise), it can take some time.
Check the status: It will say "busy" if the optimise is still running.

Regards,
Gora
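
Gora's suggestion to check the status can be sketched as follows, assuming the handler is asked with command=status and answers in XML with a top-level element <str name="status">. The sample response below is a simplified, hypothetical shape written for this illustration, not output captured from a real server, and the function names are made up.

```python
import xml.etree.ElementTree as ET


def dih_status(response_xml):
    """Return the text of the top-level <str name="status"> element, or None."""
    root = ET.fromstring(response_xml)
    for node in root.findall("str"):  # direct children of <response> only
        if node.get("name") == "status":
            return node.text
    return None


def is_busy(response_xml):
    """True while the import (or the optimise that follows it) is running."""
    return dih_status(response_xml) == "busy"


# Simplified, assumed response shape; a real response carries more elements.
SAMPLE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <str name="status">busy</str>
</response>"""

print(dih_status(SAMPLE))
print(is_busy(SAMPLE))
```

A caller would fetch .../dataimport?command=status at intervals and stop once is_busy returns False. The fetching is left out here so that the sketch runs without a server.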