Re: updating existing data in index vs inserting new data in index
Hi Mark,

I guess the "commit=true" when doing a "delta-import" is the solution for the
JIRA issue I just submitted, SOLR-2711.
Can you explain to me where you configured commit=true?

Thanks,
Alex

On Thu, Jul 7, 2011 at 6:44 PM, Mark juszczec wrote:
> First, thanks for all the help.
>
> I think the problem was a combination of not having a unique key defined
> AND not including the commit=true parameter in the delta update.
>
> Once I did those things, the delta import left me with a single (updated)
> copy of the record including the changes in the source database.
>
> Do I have write access to the Wiki so I can explicitly state commit=true
> NEEDS to be specified?
>
> Mark
>
> On Thu, Jul 7, 2011 at 12:39 PM, Erick Erickson wrote:
> > I'd restart Solr after changing the schema.xml. The delta import does NOT
> > require restart or anything else like that.
> >
> > The fact that two records are displayed is not what I'd expect. But Solr
> > absolutely handles the replace via <uniqueKey>. So I suspect that you're
> > not actually doing what you expect. A little-known aid for debugging DIH
> > is solr/admin/dataimport.jsp, that might give you some joy.
> >
> > But, to summarize. This should work fine for DIH as far as Solr is
> > concerned, assuming that <uniqueKey> is properly defined. In your query
> > above that returns two documents, can you paste the entire response with
> > &fl=* attached?
> > I'm guessing that the data in your index isn't what you're expecting...
> >
> > Also, you might want to get a copy of Luke and examine your index, there's
> > a wealth of information
> >
> > Best
> > Erick
> >
> > On Thu, Jul 7, 2011 at 11:12 AM, Mark juszczec wrote:
> > > Erick
> > >
> > > I used to, but now I find I must have commented it out in a fit of
> > > rage ;-)
> > >
> > > This could be the whole problem.
> > >
> > > I have verified via admin schema browser that the field is ORDER_ID and
> > > will double check I refer to it in upper case in the appropriate places
> > > in the Solr config scheme.
> > >
> > > Curiously, the admin schema browser display for ORDER_ID says
> > > "hasDeletions: false" - which seems the opposite of what I want. I want
> > > to be able to delete duplicates. Or am I interpreting this field wrong?
> > >
> > > In order to check for duplicates, I am going to use the admin browser
> > > to enter the following in the Make A Query box:
> > >
> > > TABLE_ID:1 AND ORDER_ID:674659
> > >
> > > When I click search and view the results, 2 records are displayed. One
> > > has the original values, one has the changed values. I haven't examined
> > > the xml (via view source) too closely and the next time I run I will
> > > look for something indicating one of the records is inactive.
> > >
> > > When you say "change your schema" do you mean via a delta import or by
> > > modifying the config files or both? FWIW, I am deleting the index on
> > > the file system, doing a full import, modifying the data in the
> > > database and then doing a delta import.
> > >
> > > I am not restarting Solr at all in this process.
> > >
> > > I understand Solr does not perform key management. You described
> > > exactly what I meant. Sorry for any confusion.
> > >
> > > Mark
> > >
> > > On Thu, Jul 7, 2011 at 10:52 AM, Erick Erickson
> > > <erickerick...@gmail.com> wrote:
> > >> Let me re-state a few things to see if I've got it right:
> > >>
> > >> > your schema.xml file has an entry like
> > >> <uniqueKey>order_id</uniqueKey>, right?
> > >>
> > >> > given this definition, any document added with an order_id that
> > >> already exists in the Solr index will be replaced. i.e. you should
> > >> have one and only one document with a given order_id.
> > >>
> > >> > case matters. Check via the admin page ("schema browser") to see if
> > >> you have two fields, order_id and ORDER_ID.
> > >>
> > >> > How are you checking that your docs are duplicates? If you do a
> > >> search on order_id, you should get back one and only one document
> > >> (assuming the definition above). A document that's deleted will just
> > >> be marked as deleted, the data won't be purged from the index. It
> > >> won't show in search results, but it will show if you use lower-level
> > >> ways to access the data.
> > >>
> > >> > Whenever you change your schema, it's best to clean the index,
> > >> restart the server and re-index from scratch. Solr won't retroactively
> > >> remove duplicate entries.
> > >>
> > >> > On the admin/stats page you should see maxDocs and numDocs. The
> > >> difference between these should be the number of deleted documents.
> > >>
> > >> > Solr doesn't "manage" unique keys. All that happens is Solr will
> > >> replace any pre-existing documents where *you've* defined the
> > >> <uniqueKey> when a new doc is added...
> > >>
> > >> Hope this helps
> > >> Erick
> > >>
> > >> On Thu,
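
The replace-by-key behaviour Erick describes above can be pictured with a small toy model. This is only a sketch for intuition, not Solr's implementation: the ToyIndex class, its method names, and the STATUS field are all made up for illustration. It shows why a missing (commented-out) unique key leaves two copies of the record, and why maxDocs minus numDocs counts replaced documents.

```python
class ToyIndex:
    """Toy model of replace-by-unique-key; NOT Solr's real implementation."""

    def __init__(self, unique_key=None):
        # unique_key=None models a schema whose <uniqueKey> is commented out.
        self.unique_key = unique_key
        self.slots = []  # [doc, deleted] pairs, in insertion order

    def add(self, doc):
        if self.unique_key is not None:
            key = doc[self.unique_key]
            for slot in self.slots:
                if not slot[1] and slot[0][self.unique_key] == key:
                    slot[1] = True  # mark the old copy deleted; do not purge it
        self.slots.append([doc, False])

    def search(self, **criteria):
        # Deleted documents never show up in search results.
        return [doc for doc, deleted in self.slots
                if not deleted
                and all(doc.get(f) == v for f, v in criteria.items())]

    @property
    def max_docs(self):
        return len(self.slots)  # includes documents marked as deleted

    @property
    def num_docs(self):
        return sum(1 for _, deleted in self.slots if not deleted)


def import_twice(unique_key):
    """Index a record, then index a changed copy with the same ORDER_ID."""
    index = ToyIndex(unique_key)
    index.add({"TABLE_ID": 1, "ORDER_ID": 674659, "STATUS": "original"})
    index.add({"TABLE_ID": 1, "ORDER_ID": 674659, "STATUS": "changed"})
    return index
```

With import_twice("ORDER_ID") a search on TABLE_ID and ORDER_ID returns one document and max_docs - num_docs is 1; with import_twice(None) the same search returns both copies, which matches the symptom Mark reported.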
Re: updating existing data in index vs inserting new data in index
Actually I requested .../dataimport?command=delta-import&commit=true
and DIH in delta-import mode does not commit. Do you have any guess?

INFO: Starting Delta Import
Aug 14, 2011 1:42:02 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/apache-solr-3.3.0 path=/dataimport params={commit=true&command=delta-import} status=0 QTime=0
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: event
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity event with URL: jdbc:mysql://85.168.123.207:3306/AGENDA
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 865
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: event rows obtained : 3
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: event rows obtained : 0
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: event
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
Aug 14, 2011 1:42:03 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:1.282

On Sun, Aug 14, 2011 at 1:39 AM, Alexandre Sompheng wrote:
> Hi Mark,
>
> I guess the "commit=true" when doing a "delta-import" is the solution for
> the JIRA issue I just submitted, SOLR-2711.
> Can you explain to me where you configured commit=true?
>
> Thanks,
> Alex
>
> [...]
Re: get update record from database using DIH
Hi guys,

I tried the delta import, and I got logs saying that it found delta data to
update. But it seems that the index is not updated. Any guess why this
happens? Did I miss something? I'm on Solr 3.3 with no patch.

Thanks

On 18 Aug 2011, at 18:10, Dali wrote:
> Hello
>
> Take a look at the delta import example:
> http://wiki.apache.org/solr/DataImportHandler
>
> Regards
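
One way to check what the logs actually say is to pull the counts out of them. The sketch below is a rough illustration only: the function names are made up, the sample is an excerpt of the delta-import log posted elsewhere in this thread, and reading an empty "{}" on the LogUpdateProcessor line as "no add or commit was recorded" is an assumption about that log format, not something confirmed here.

```python
import re

# "Completed <Kind> for Entity: <entity> rows obtained : <n>"
ROWS_RE = re.compile(r"Completed (\w+) for Entity: (\w+) rows obtained : (\d+)")
# The LogUpdateProcessor line; the braces list the update operations it saw.
UPDATE_RE = re.compile(r"LogUpdateProcessor finish\s+INFO: \{(.*?)\} \d+ \d+")

SAMPLE_LOG = """\
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: event rows obtained : 3
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: event rows obtained : 0
Aug 14, 2011 1:42:03 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
"""


def summarize_dih_log(log_text):
    """Return ({(kind, entity): row_count}, update_ops or None)."""
    rows = {(kind, entity): int(count)
            for kind, entity, count in ROWS_RE.findall(log_text)}
    match = UPDATE_RE.search(log_text)
    update_ops = match.group(1) if match else None
    return rows, update_ops


def delta_found_but_nothing_written(log_text):
    """True if delta rows were collected but the update line lists no operations.

    Assumption: an empty "{}" means no add/commit was logged.
    """
    rows, update_ops = summarize_dih_log(log_text)
    changed = sum(count for (kind, _), count in rows.items()
                  if kind in ("ModifiedRowKey", "DeletedRowKey"))
    return changed > 0 and update_ops == ""
```

On the excerpt above, summarize_dih_log reports 3 modified rows for entity "event" and an empty operation list, which is the "found delta data but index not updated" situation described in this message.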
Re: get update record from database using DIH
Actually I requested .../dataimport?command=delta-import&commit=true
and DIH in delta-import mode does not commit; you can see the log below. My
index is nearly empty, maybe 10 data rows max... It's just the beginning.

INFO: Starting Delta Import
Aug 14, 2011 1:42:02 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/apache-solr-3.3.0 path=/dataimport params={commit=true&command=delta-import} status=0 QTime=0
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: event
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity event with URL: jdbc:mysql://85.168.123.207:3306/AGENDA
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 865
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: event rows obtained : 3
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: event rows obtained : 0
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: event
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
Aug 14, 2011 1:42:03 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:1.282

On 19 Aug 2011, at 10:39, Gora Mohanty wrote:
> On Fri, Aug 19, 2011 at 5:32 AM, Alexandre Sompheng wrote:
> > Hi guys, I tried the delta import, and I got logs saying that it found
> > delta data to update. But it seems that the index is not updated. Any
> > guess why this happens? Did I miss something? I'm on Solr 3.3 with no
> > patch.
> [...]
>
> Please show us the following:
> * The exact URL you loaded for delta-import
> * The Solr response which shows the delta documents that it found,
>   and the status of the delta-import.
>
> If your index is large, and if you are running an optimise after the
> delta-import (the default is to optimise), it can take some time.
> Check the status: It will say "busy" if the optimise is still running.
>
> Regards,
> Gora
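
For reference, the two requests discussed above (the delta-import with commit=true, and the status check) can be written down as URLs. This is a minimal sketch under stated assumptions: the base URL is hypothetical (only the /apache-solr-3.3.0 webapp name and the /dataimport path come from the log), the helper names are made up, and is_busy assumes the status response was requested as JSON and parsed into a dict with a top-level "status" field.

```python
from urllib.parse import urlencode

# Hypothetical host and port; the webapp and handler path are taken from the log.
BASE_URL = "http://localhost:8080/apache-solr-3.3.0"


def dih_url(base_url, command, **params):
    """Build a DataImportHandler request URL for the given command."""
    query = {"command": command}
    query.update(params)
    return f"{base_url.rstrip('/')}/dataimport?{urlencode(query)}"


def is_busy(status_payload):
    """True while an import (or a trailing optimise) is still running.

    Assumes status_payload is the parsed JSON of a command=status response.
    """
    return status_payload.get("status") == "busy"


delta_url = dih_url(BASE_URL, "delta-import", commit="true")
status_url = dih_url(BASE_URL, "status", wt="json")
```

Posting the exact delta_url that was loaded, plus the response fetched from status_url once is_busy reports False, would cover both items Gora asks for.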