On Sun, Aug 7, 2016 at 1:19 AM, Erik Hetzner <[email protected]> wrote:

> Hi Martin,
>
> On Sat, 06 Aug 2016 21:16:40 -0700,
> Martin Blais <[email protected]> wrote:
> >
> > Storing a checksum for the imported row suffers from a problem: if the
> > user does not immediately copy the result of our conversion into the
> > ledger, the row will not be imported again, so it could get lost.
> >
> > Beancount cross-checks extracted transactions against the contents of its
> > destination ledger, but because the user often massages the transactions,
> > it has to use heuristics to perform an approximate match and determine
> > which transactions have already been seen. The heuristic I have in place
> > doesn't work too well at the moment (though, to be honest, it could easily
> > be improved).
> >
> > A better idea would be to store a unique tag computed from the checksum of
> > the input row and to cross-check the imported transactions against that
> > special tag. That uses both your insight about validating the input
> > instead of the resulting transaction, and the ledger instead of a
> > temporary cache. It's the best of both worlds.
>
> Thanks for the comment. I’m not sure I see the distinction you are making here.
> What I do (and I admit I only thought it through for a few minutes, as I don’t
> actually use Mint but just wanted a simple CSV format for examples) is:
>
> 1. Take the input key-value pairs for the row, e.g. Date=2016/01/10
> 2. Sort them by key
> 3. Generate a string from the key-value pairs and calculate its MD5 checksum
> 4. Check the checksum against a metadata value in the ledger:
>    a. If the row has already been imported, do nothing
>    b. If the row is new (no match), import it.
>
> Here is an example of a generated ledger transaction:
>
>     2016/08/02 Amazon
>         ; csvid: mint.a7c028a73d76956453dab634e8e5bdc1
>         1234  $29.99
>         Expenses:Shopping  -$29.99
>
> As you can see, the csvid metadata field is what we query against using
> ledger to see if the transaction is already present.
>
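For readers who want to try the scheme Erik describes, here is a minimal Python sketch of steps 1 to 4. It is not Erik's actual importer: the function names (row_csvid, existing_csvids, new_rows), the "key=value" serialization joined by newlines, and the hard-coded "mint" prefix are all assumptions. Also, instead of querying through ledger as Erik does, it simply scans the journal text for "; csvid:" comment lines with a regular expression.

```python
import csv
import hashlib
import io
import re

# Hypothetical source prefix, mirroring "mint." in the example transaction.
SOURCE_PREFIX = "mint"

# Matches a metadata comment line such as "    ; csvid: mint.a7c0...".
CSVID_RE = re.compile(r"^\s*;\s*csvid:\s*(\S+)\s*$", re.MULTILINE)


def row_csvid(row: dict[str, str], prefix: str = SOURCE_PREFIX) -> str:
    """Steps 1-3: sort the row's key/value pairs and hash them with MD5."""
    # Assumed serialization: one "key=value" per line, keys in sorted order,
    # so the checksum does not depend on the CSV column order.
    canonical = "\n".join(f"{key}={row[key]}" for key in sorted(row))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}.{digest}"


def existing_csvids(journal_text: str) -> set[str]:
    """Collect every csvid value already recorded in the ledger journal."""
    return set(CSVID_RE.findall(journal_text))


def new_rows(csv_text: str, journal_text: str) -> list[tuple[str, dict[str, str]]]:
    """Step 4: return (csvid, row) pairs for rows not yet in the journal."""
    seen = existing_csvids(journal_text)
    fresh = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        csvid = row_csvid(row)
        if csvid in seen:
            continue  # 4a: already imported, do nothing
        fresh.append((csvid, row))  # 4b: new row, import it
    return fresh
```

Because the checksum covers only the row contents, two genuinely identical rows (say, two identical purchases on the same day) hash to the same csvid, so a real importer would need some way to tell them apart.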
This is exactly what I meant (bar the use of metadata instead of a tag... same thing). I like how you do this; I had thought you saved the row checksums somewhere else.

Also note that you could put the metadata only on the last imported transaction, search for the latest matching transaction in date order, and ignore everything that comes before it.

> hope that if others need to import a particular type of CSV file,
> e.g. from their bank, they can contribute that back to the project.
>

I tried this for a while (LedgerHub). It's a nice idea, but IMO people don't care to contribute, so I killed the project. There are just too many banks out there, and too diverse a population of users, for reuse to be effective; the output format also varies between users and would require heavy customization.

--
You received this message because you are subscribed to the Google Groups "Ledger" group.
