Re: ledger-autosync: CSV support

Martin Blais Sat, 06 Aug 2016 21:17:14 -0700

On Wed, Aug 3, 2016 at 1:44 AM, Erik Hetzner <[email protected]> wrote:

> Hi all,
>
> I have added some basic CSV support to ledger-autosync. It has not been
> released on pypi, but is available currently via source on gitlab and github.
>
> The feature works much like the existing OFX file support in
> ledger-autosync. ledger-autosync is invoked with a command line argument of
> a file path, and ledger-autosync prints a series of ledger transactions to
> stdout.
>
> The advantage ledger-autosync has over other CSV to ledger converters is
> the deduplication features. For CSV files which include a unique ID per row,
> the transactions will be deduplicated similarly to the existing feature for
> OFX files. For CSV formats that do not include a unique ID per row, an MD5
> sum will be generated from the row content which will be used to deduplicate.
>


Storing a checksum for each imported row has a drawback: if the user does
not immediately copy the converted output into the ledger, the row is
treated as already imported on later runs, and the transaction could get
lost.

Beancount cross-checks extracted transactions against the contents of its
destination ledger, but because the user often massages the transactions it
has to use heuristics in order to perform an approximate match to determine
which transactions have already been seen. The heuristic I have in place
doesn't work too well at the moment (but it could be improved easily to be
honest).

A better idea would be to store a unique tag computed from the checksum of
the input row and to cross-check the imported transactions against that
special tag. That uses both your insight around validating the input
instead of the resulting transaction, and uses the ledger instead of a
temporary cache. It's the best of both worlds.
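
To make that concrete, here is a rough sketch in Python of what I have in
mind. It is not code from ledger-autosync or Beancount: the "csvid" metadata
key, the function names, and the "; csvid: <hash>" comment layout are all
made up for illustration. Each input row is hashed, and a row is skipped
only when its hash already appears as a tag in the destination ledger.

```python
import csv
import hashlib
import io
import re

# Hypothetical metadata key; an assumption for this sketch, not an existing
# convention of ledger-autosync or Beancount.
TAG_KEY = "csvid"


def row_checksum(row):
    """Return an MD5 hex digest computed from the fields of one input row."""
    joined = "\x1f".join(field.strip() for field in row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def seen_checksums(ledger_text):
    """Collect every checksum already recorded as a tag in the ledger text."""
    pattern = re.compile(r";\s*%s:\s*([0-9a-f]{32})" % re.escape(TAG_KEY))
    return set(pattern.findall(ledger_text))


def new_rows(csv_text, ledger_text):
    """Yield (checksum, row) for rows whose checksum is not in the ledger.

    Identical rows within the same file get the same checksum and are all
    yielded; telling them apart is left out of this sketch.
    """
    seen = seen_checksums(ledger_text)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        digest = row_checksum(row)
        if digest not in seen:
            yield digest, row
```

Because the only state is the tag stored in the ledger itself, a row whose
converted transaction was never pasted in simply shows up again on the next
run, and the user can edit the rest of the transaction freely as long as the
tag line is kept.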





>
> For the moment, I have only added support for Paypal, Amazon and Mint CSV
> formats, as this is all I had available. However, it should be easy enough
> for developers to add support for new CSV formats. Examples of converters
> can be found here:
>
>   https://gitlab.com/egh/ledger-autosync/blob/master/ledgerautosync/converter.py#L316
>
> I hope that this proves helpful! Feedback welcome. The source is available
> at
> gitlab and github:
>
>   https://{gitlab,github}.com/egh/ledger-autosync
>
> I have also recently been working on the 401k and investment features of
> ledger-autosync, which are now pretty robust, if largely undocumented.
> Anyone who is not currently tracking their 401k via ledger is encouraged to
> give it a try!
>

Similarly, Beancount has a powerful but admittedly immature CSV importer
in development:
https://bitbucket.org/blais/beancount/src/9f3377eb58fe9ec8cfea8d9e3d56f2446d05592f/src/python/beancount/ingest/importers/csv.py

I've switched to using this and CSV file formats whenever I have them
available - banking, credit cards, 401k.

I'd like to eventually write a routine that tries to auto-detect the
columns; at the moment, they must be specified when creating the importer
configuration.
