Bug#900277: Enable csv (like json1 is already)

Trent W. Buck Tue, 29 May 2018 21:21:21 -0700

László Böszörményi (GCS) wrote:
> On Mon, May 28, 2018 at 1:33 PM Trent W. Buck <trentb...@gmail.com> wrote:
>> Per /usr/share/doc/sqlite3-doc/csv.html & https://sqlite.org/csv.html
>>
>>   sqlite> CREATE VIRTUAL TABLE temp.t1 USING csv(filename='thefile.csv');
>>   Error: no such module: csv
>>
>> ⋮
>>
>> If it's easy to do, can you please build the CSV module
>> (and any other similar first-party extensions)?
>
>    Again, don't need to load any module. What's wrong with the following?
>
> $ cat test.csv
> a,b
> 1,2
> 3,4
> 5,6
> 7,8
>
> $ sqlite3
> SQLite version 3.23.1
> sqlite> .mode csv
> sqlite> .import test.csv test_table
> sqlite> .schema test_table
> CREATE TABLE test_table(
>     "a" TEXT,
>     "b" TEXT
> );
> sqlite> select * from test_table;
> 1,2
> 3,4
> 5,6
> 7,8


That reads the test.csv into a native sqlite table.
The csv module leaves it as an external table.

Reading into a native table (as you suggest) is definitely safer, and probably 
queries are much faster, so
I agree it's USUALLY the right answer.

My actual use case is to migrate a mess of bash and CSV to python and sqlite, 
but
I want to do it slowly, which means keeping the old CSV files working.
The workload is read-mostly and a small percentage of appends, so
using CSV as an external table looks/looked like a good bet.

A quick proof-of-concept test using the .import method you suggested
shows about 1m to round-trip the database from CSV to sqlite3 (45s)
and back to CSV (22s).  That's the main thing I'm hoping to avoid.



Anyway, I did say "if it's easy to do", and it looks like the upstream
configure.ac does *not* make this easy.

For some reason upstream provides --enable options for only SOME
modules, e.g. json1 and rtree, but not interesting-looking things like

  csv         # read-only  CSV virtual tables
  zip         # read-write ZIP virtual tables (each row is one file in the .zip)
  icu         # Unicode-aware LIKE, lower(), upper(), &c
  lsm1        # key/value storage backend (a la BDB)
  eval        # SQL function eval()
  percentile  # SQL aggregate function for 
https://en.wikipedia.org/wiki/Percentile
  regexp      # SQL function regexp(re, str), for e.g. SELECT WHERE
  scrub       # Like VACUUM but stop-and-copy(?); optionally zeroes unused 
blocks, like ext4's zerofree.
  series      # SQL function equivalent of GNU coreutils's seq(1)
  shathree    # SQL function for SHA-3 Keccak
  spellfix    # http://www.sqlite.org/spellfix1.html
  totype      # Coercion a la postgres ::INT and ::FLOAT
  unionvtab   # virtual table merging e.g. students.db and teachers.db tables
  vfsstat     # monitoring for questions like "why is my database so slow?"

For someone just using regular Debian /usr/bin/sqlite3 (or libsqlite3 via 
python3),
it's not at all obvious how to actually get any of these features.



Following http://sqlite.org/loadext.html, this doesn't error, but does seem to 
hang:

    (BUILDROOT:BUSTER)root@zygon:/tmp/sqlite3# gcc -g -fPIC -shared 
ext/misc/csv.c -o ext/misc/csv.so
    (BUILDROOT:BUSTER)root@zygon:/tmp/sqlite3# seq 100 >test.csv
    (BUILDROOT:BUSTER)root@zygon:/tmp/sqlite3# sqlite3
    SQLite version 3.23.1 2018-04-10 17:39:29
    Enter ".help" for usage hints.
    Connected to a transient in-memory database.
    Use ".open FILENAME" to reopen on a persistent database.
    sqlite> CREATE VIRTUAL TABLE temp.t1 USING csv(filename='test.csv');
    Error: no such module: csv
    sqlite> .load ext/misc/csv.so
    sqlite> CREATE VIRTUAL TABLE temp.t1 USING csv(filename='test.csv');
    sqlite> .schema
    CREATE VIRTUAL TABLE temp.t1 USING csv(filename='test.csv')
    /* temp.t1(c0) */;
    sqlite> select count(*) from temp.t1;
    [hangs]

Probably because RFC-compliant CSV files need CRLF not LF, so…

    (BUILDROOT:BUSTER)root@zygon:/tmp/sqlite3# unix2dos test.csv
    unix2dos: converting file test.csv to DOS format...

    [same behaviour from sqlite3]

The sha1 module *DOES* work when tested in this way:

    (BUILDROOT:BUSTER)root@zygon:/tmp/sqlite3# gcc -g -fPIC -shared 
ext/misc/sha1.c -o ext/misc/sha1.so
    (BUILDROOT:BUSTER)root@zygon:/tmp/sqlite3# sqlite3 <<< $'.load 
ext/misc/sha1.so\n select sha1("hello world");'
    2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
    (BUILDROOT:BUSTER)root@zygon:/tmp/sqlite3# printf 'hello world' | sha1sum
    2aae6c35c94fcfb415dbe95f408b9ce91ee846ed  -

Of course, I don't want to have to carry a csv.so around with me — I
want Debian to provide it in a standard place, so that when I change
architectures, or there's a security update to the sqlite3 package, I
get the newer csv.so automatically.

Oh, also, that way I get debhelper's standard hardening flags ☺

Bug#900277: Enable csv (like json1 is already)

Reply via email to