Good. We need use cases like these and contributions from users. This is a win-win: you will not have to manage the code yourself once it is checked in, and as we get more eyes on the DIH code it will also improve.
Thanks a lot,
Noble

On Wed, Dec 3, 2008 at 1:49 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>
> That's what I am trying to do. Thanks for the advice. Once I have it done I
> will raise the issue and upload the patch.
>
> Noble Paul നോബിള് नोब्ळ् wrote:
>>
>> OK. I guess I see it. I am thinking of exposing the writes to the
>> properties file via an API.
>>
>> say Context#persist(key,value);
>>
>> This can write the data to the dataimport.properties.
>>
>> You must be able to retrieve that value by ${dataimport.persist.<key>}
>>
>> or through an API, Context.getPersistValue(key)
>>
>> You can raise an issue and give a patch and we can get it committed
>>
>> I guess this is what you wish to achieve
>>
>> --Noble
>>
>> On Wed, Dec 3, 2008 at 3:28 AM, Marc Sturlese <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Do you mean the file used by dataimporthandler called
>>> dataimport.properties?
>>> If you mean this one, it's written at the end of the indexing process. The
>>> written date will be used in the next indexation by delta-query to identify
>>> the new or modified rows from the database.
>>>
>>> What I am trying to do is, instead of saving a timestamp, save the last
>>> indexed id. Doing that, in the next execution I will start indexing from the
>>> last doc that was indexed in the previous indexation. But I am still a bit
>>> confused about how to do that...
>>>
>>> Noble Paul നോബിള് नोब्ळ् wrote:
>>>>
>>>> delta-import file?
>>>>
>>>> On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog <[EMAIL PROTECTED]>
>>>> wrote:
>>>>> Does the DIH delta feature rewrite the delta-import file for each set of
>>>>> rows? If it does not, that sounds like a bug/enhancement.
>>>>> Lance
>>>>>
>>>>> -----Original Message-----
>>>>> From: Noble Paul നോബിള് नोब्ळ् [mailto:[EMAIL PROTECTED]]
>>>>> Sent: Tuesday, December 02, 2008 8:51 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: DataImportHandler: Deleteing from index and db;
>>>>> lastIndexed id feature
>>>>>
>>>>> You can write the details to a file using a Transformer itself.
>>>>>
>>>>> It is wise to stick to the public API as far as possible. We will
>>>>> maintain back compat and your code will be usable w/ newer versions.
>>>>>
>>>>> On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>>
>>>>>> Thanks, I really appreciate your help.
>>>>>>
>>>>>> I didn't explain myself so well in here:
>>>>>>
>>>>>>> 2.-This is probably my most difficult goal.
>>>>>>> Deltaimport reads a timestamp from the dataimport.properties and
>>>>>>> modifies/adds all documents from db which were inserted after that date.
>>>>>>> What I want is to be able to save in the field the id of the last
>>>>>>> indexed doc. So the next time I execute the indexer, make it start
>>>>>>> indexing from that last indexed id doc.
>>>>>> You can use a Transformer to write something to the DB.
>>>>>> Context#getDataSource(String) for each row
>>>>>>
>>>>>> When I said:
>>>>>>
>>>>>>> be able to save in the field the id of the last indexed doc
>>>>>> I made a mistake, I meant:
>>>>>>
>>>>>> be able to save in the file (dataimport.properties) the id of the last
>>>>>> indexed doc.
>>>>>> The point would be to do my own deltaquery indexing from the last doc
>>>>>> indexed id instead of the timestamp.
>>>>>> So I think this would not work in that case (it's my mistake because
>>>>>> of the bad explanation):
>>>>>>
>>>>>>>You can use a Transformer to write something to the DB.
>>>>>>>Context#getDataSource(String) for each row
>>>>>>
>>>>>> It is because I was saying:
>>>>>>> I think I should begin modifying the SolrWriter.java and
>>>>>>> DocBuilder.java.
>>>>>>> Creating functions like getStartTime, persistStartTime... for ID
>>>>>>> control
>>>>>>
>>>>>> Am I going in the right direction?
>>>>>> Sorry for my English and thanks in advance
>>>>>>
>>>>>> Noble Paul നോബിള് नोब्ळ् wrote:
>>>>>>>
>>>>>>> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese
>>>>>>> <[EMAIL PROTECTED]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hey there,
>>>>>>>>
>>>>>>>> I have my dataimporthandler almost completely configured. I am
>>>>>>>> missing three goals. I don't think I can reach them just via xml
>>>>>>>> conf or transformer and SqlEntityProcessor plugin. But I need to be
>>>>>>>> sure of that.
>>>>>>>> If there's no other way I will hack some solr source classes, and would
>>>>>>>> like to know the best way to do that. Once I have it solved, I can
>>>>>>>> upload or post the source in the forum in case someone thinks it can
>>>>>>>> be helpful.
>>>>>>>>
>>>>>>>> 1.- Every time I execute dataimporthandler (to index data from a
>>>>>>>> db), at the start time or end time I need to delete some expired
>>>>>>>> documents. I have to delete them from the database and from the
>>>>>>>> index. I know which documents must be deleted because of a field in
>>>>>>>> the db that says it. Would not like to delete first all from DB or
>>>>>>>> first all from index but one from index and one from doc every time.
>>>>>>>
>>>>>>> You can override the init() and destroy() of the SqlEntityProcessor and
>>>>>>> use it as the processor for the root entity. At this point you can
>>>>>>> run the necessary db queries and solr delete queries. Look at
>>>>>>> Context#getSolrCore() and Context#getDataSource(String)
>>>>>>>
>>>>>>>> The "delete mark" is set as an update in the db row so I think I
>>>>>>>> could use deltaImport. Don't know if deletedPkQuery is the way to do
>>>>>>>> that. I cannot find much information about how to make it work.
>>>>>>>> As deltaQuery modifies docs (delete old and insert new) I suppose there
>>>>>>>> must be an easy way to do this just doing the delete and not the new
>>>>>>>> insert.
>>>>>>> deletedPkQuery does everything first. It runs the query and uses that
>>>>>>> to identify the deleted rows.
>>>>>>>>
>>>>>>>> 2.-This is probably my most difficult goal.
>>>>>>>> Deltaimport reads a timestamp from the dataimport.properties and
>>>>>>>> modifies/adds all documents from db which were inserted after that date.
>>>>>>>> What I want is to be able to save in the field the id of the last
>>>>>>>> indexed doc. So the next time I execute the indexer, make it start
>>>>>>>> indexing from that last indexed id doc.
>>>>>>> You can use a Transformer to write something to the DB.
>>>>>>> Context#getDataSource(String) for each row
>>>>>>>
>>>>>>>> The point of doing this is that if I do a full import from a db with
>>>>>>>> lots of rows the app could encounter a problem in the middle of the
>>>>>>>> execution and abort the process. As deltaQuery works, I would have to
>>>>>>>> restart the execution from the beginning. Having this new
>>>>>>>> functionality I could optimize the index and start from the last
>>>>>>>> indexed doc.
>>>>>>>> I think I should begin modifying the SolrWriter.java and
>>>>>>>> DocBuilder.java.
>>>>>>>> Creating functions like getStartTime, persistStartTime... for ID
>>>>>>>> control
>>>>>>>>
>>>>>>>> 3.-I commented before about this last point. I want to give boost to
>>>>>>>> doc fields at indexing time.
>>>>>>>>>>Adding fieldboost is a planned item.
>>>>>>>>
>>>>>>>>>>It must work as follows.
>>>>>>>>>>Add a special value $fieldBoost.<fieldname> to the row map
>>>>>>>>
>>>>>>>>>>And DocBuilder should respect that. You can raise a bug and we can
>>>>>>>>>>commit it soon.
>>>>>>>> How can I raise a bug?
>>>>>>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>>>>>>>>
>>>>>>>> Thanks in advance
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20788755.html
>>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --Noble Paul
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20790542.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>
>>>>>
>>>>> --
>>>>> --Noble Paul
>>>>>
>>>>
>>>> --
>>>> --Noble Paul
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20801932.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>> --
>> --Noble Paul
>>
>
> --
> View this message in context:
> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20808620.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
--Noble Paul
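
For readers following the thread: Context#persist(key,value) and Context.getPersistValue(key) were only a proposal at this point, so the sketch below does not call any DIH API. It is a minimal standalone Java illustration, assuming a plain properties file, of saving the last indexed id and building the next delta query from it. The class PersistedProps, the key last_indexed_id, and the table and column names item and id are all hypothetical placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper, NOT part of Solr: keeps key/value pairs in a
// properties file, similar in spirit to dataimport.properties.
class PersistedProps {
    private final Path file;

    PersistedProps(Path file) {
        this.file = file;
    }

    // Load the current contents; a missing file is treated as empty.
    private Properties load() throws IOException {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        return props;
    }

    // Stand-in for the proposed Context#persist(key, value).
    void persist(String key, String value) throws IOException {
        Properties props = load();
        props.setProperty(key, value);
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "written by the PersistedProps sketch");
        }
    }

    // Stand-in for the proposed Context.getPersistValue(key).
    String getPersistValue(String key, String defaultValue) throws IOException {
        return load().getProperty(key, defaultValue);
    }

    // Build a query that resumes after the last indexed id.
    // Parsing the id as a long rejects anything that is not a number.
    static String deltaQuery(String lastId) {
        return "SELECT id FROM item WHERE id > " + Long.parseLong(lastId) + " ORDER BY id";
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("dataimport", ".properties");
        PersistedProps store = new PersistedProps(file);

        // First run: nothing saved yet, so start from id 0.
        System.out.println(deltaQuery(store.getPersistValue("last_indexed_id", "0")));

        // After indexing a batch, record the highest id that was indexed.
        store.persist("last_indexed_id", "1500");

        // Next run resumes after that id.
        System.out.println(deltaQuery(store.getPersistValue("last_indexed_id", "0")));

        Files.delete(file);
    }
}
```

In this sketch the file is rewritten on every persist call. Calling it after each batch, rather than once at the end of the import, is what would let an aborted run resume from the last saved id, which is the concern Lance raised about rewriting the file for each set of rows.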