Hi Martin,

On Sat, 06 Aug 2016 21:16:40 -0700,
Martin Blais <[email protected]> wrote:
>
>
> Storing a checksum for the imported row suffers from the problem that if
> the user does not immediately copy the result of our conversion, it will
> not be imported further, it could get lost.
>
> Beancount cross-checks extracted transactions against the contents of its
> destination ledger, but because the user often massages the transactions it
> has to use heuristics in order to perform an approximate match to determine
> which transactions have already been seen. The heuristic I have in place
> doesn't work too well at the moment (but it could be improved easily to be
> honest).
>
> A better idea would be to store a unique tag computed from the checksum of
> the input row and to cross-check the imported transactions against that
> special tag. That uses both your insight around validating the input
> instead of the resulting transaction, and uses the ledger instead of a
> temporary cache. It's the best of both worlds.

Thanks for the comment. I'm not sure I follow the distinction you are making
here. What I do (and I admit I only thought it through for a few minutes, since
I don't actually use Mint but just wanted a simple CSV format for examples) is:

1. Take the input key-value pairs for the row, e.g. Date=2016/01/10
2. Sort by key
3. Generate a string from the key-value pairs and calculate the MD5 checksum
4. Check the checksum against a metadata value in the ledger:
   a. If the row has already been imported, do nothing
   b. If the row is new (no match), import it.
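Steps 1–3 above can be sketched in a few lines of Python. The exact
serialization format (`key=value` pairs joined with commas) is my assumption;
any canonical, order-independent string over the row would do:

```python
import hashlib

def row_checksum(row):
    """Checksum a CSV row given as a dict of key-value pairs,
    e.g. {"Date": "2016/01/10", ...}. Keys are sorted first so the
    result does not depend on column order."""
    canonical = ",".join(f"{k}={v}" for k, v in sorted(row.items()))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

row = {"Date": "2016/08/02", "Description": "Amazon", "Amount": "29.99"}
print("mint." + row_checksum(row))  # the "mint." prefix namespaces the source
```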

Here is an example of a generated ledger transaction:

2016/08/02 Amazon
    ; csvid: mint.a7c028a73d76956453dab634e8e5bdc1
    1234                                      $29.99
    Expenses:Shopping                        -$29.99

As you can see, the csvid metadata field is what we query against using ledger
to see whether the transaction is already present.
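A minimal sketch of that dedup check (step 4). Here I search the ledger file
textually for the metadata line rather than going through the ledger CLI; the
checksum is unique per imported row, so a simple substring match suffices:

```python
def already_imported(ledger_lines, checksum):
    """Return True if any line in the ledger carries a csvid metadata
    entry for this checksum, e.g. "; csvid: mint.a7c028...".
    ledger_lines is an iterable of lines (a file object works)."""
    needle = f"csvid: mint.{checksum}"
    return any(needle in line for line in ledger_lines)

# Typical use:
#   with open("main.ledger") as f:
#       if not already_imported(f, row_checksum(row)):
#           emit_transaction(row)   # hypothetical conversion step
```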

> Similarly, Beancount has a powerful but admittedly immature CSV importer
> growing:
> https://bitbucket.org/blais/beancount/src/9f3377eb58fe9ec8cfea8d9e3d56f2446d05592f/src/python/beancount/ingest/importers/csv.py
>
> I've switched to using this and CSV file formats whenever I have them
> available - banking, credit cards, 401k.
>
> I'd like to make a routine to try to auto-detect the columns eventually, at
> the moment, they must be configured when creating the importer
> configuration.

Thanks for the pointer - it does look a lot more flexible than my
implementation.

I decided it was simpler, for my needs, to require a new class for each type of
CSV file; it was too much trouble to try to make it fully configurable. The core
code handles reading the CSV file, deduplicating, and all of that, while each
CSV class simply implements a `convert(row)` method that returns a `Transaction`
data structure. I hope that if others need to import a particular type of CSV
file, e.g. from their bank, they can contribute that importer back to the
project.
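As a sketch of that interface (the `Transaction` fields and the class name here
are hypothetical; the real data structure lives in the project):

```python
from typing import NamedTuple

class Transaction(NamedTuple):
    # Hypothetical fields for illustration only.
    date: str
    payee: str
    account: str
    amount: str

class MintImporter:
    """One class per CSV flavor. The core code owns file reading and
    deduplication; an importer only maps one row to a Transaction."""

    def convert(self, row):
        return Transaction(
            date=row["Date"],
            payee=row["Description"],
            account="Expenses:Shopping",  # real code would categorize
            amount=row["Amount"],
        )

txn = MintImporter().convert(
    {"Date": "2016/08/02", "Description": "Amazon", "Amount": "29.99"})
```

Adding support for a new bank then means writing one small class, with no
changes to the core reader or dedup logic.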

best, Erik
--
Sent from my free software system <http://fsf.org/>.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Ledger" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
