Peter van Dijk wrote:
> I can reproduce this ‘hang’ (actually it’s a very busy read() loop). A
> simpler ‘echo 1,2,3 > test2.csv’ does work, so this is just exposing a bug
> in the CSV implementation.
You're right --- it looks like ext/misc/csv.c is broken for one-column CSV
tables.  I can reproduce the problem with a minimal test.csv of "1\n", and I
can prevent it with a minimal test.csv of "1,\n".  I guess that's a separate
upstream bug! :-)

If I do "select * from t1" instead of "select count(*) from t1", I can see it
printing blank lines forever, which makes the problem much more obvious.

It handles a 256MB-ish 3-column CSV quite well, though; it can do simple
queries in 4s to 7s.

$ seq 33333333 | paste -sd ,,\\n >test.csv

$ command time sqlite3 <<< $'.load ./csv.so\nCREATE VIRTUAL TABLE temp.t1 USING csv(filename="test.csv"); SELECT count(*) FROM t1;'
11111111
3.59user 0.12system 0:03.71elapsed 99%CPU (0avgtext+0avgdata 3484maxresident)k
0inputs+0outputs (0major+272minor)pagefaults 0swaps

$ command time sqlite3 <<< $'.load ./csv.so\nCREATE VIRTUAL TABLE temp.t1 USING csv(filename="test.csv", schema="CREATE TABLE xs(x INT, y INT, z INT)"); SELECT count(*) FROM t1;'
11111111
3.77user 0.08system 0:03.85elapsed 99%CPU (0avgtext+0avgdata 3432maxresident)k
0inputs+0outputs (0major+271minor)pagefaults 0swaps

$ command time sqlite3 <<< $'.load ./csv.so\nCREATE VIRTUAL TABLE temp.t1 USING csv(filename="test.csv"); SELECT count(*) FROM t1 WHERE c1 = 1;'
4.82user 0.08system 0:04.90elapsed 99%CPU (0avgtext+0avgdata 3428maxresident)k
0inputs+0outputs (0major+272minor)pagefaults 0swaps

$ command time sqlite3 <<< $'.load ./csv.so\nCREATE VIRTUAL TABLE temp.t1 USING csv(filename="test.csv"); SELECT count(*) FROM t1 WHERE c1 BETWEEN 111111 AND 111113;'
73
6.47user 0.12system 0:06.59elapsed 100%CPU (0avgtext+0avgdata 3396maxresident)k
0inputs+0outputs (0major+273minor)pagefaults 0swaps

By comparison, the same test CSV takes 16s to import, 8s to export, and 0s to
1s to query (with further optimization available).

$ command time sqlite3 test.db <<< $'CREATE TABLE xs(x INT, y INT, z INT); \n.mode csv \n.import test.csv xs'
15.92user 0.20system 0:16.13elapsed 99%CPU (0avgtext+0avgdata 5296maxresident)k
0inputs+0outputs (0major+686minor)pagefaults 0swaps

$ command time sqlite3 test.db -csv 'SELECT * FROM xs;' >test2.csv
7.98user 0.27system 0:08.25elapsed 99%CPU (0avgtext+0avgdata 5028maxresident)k
0inputs+0outputs (0major+674minor)pagefaults 0swaps

$ cmp test.csv test2.csv

$ ls -hl test.csv test.db
-rw-r--r-- 1 root root 276M May 31 00:55 test.csv
-rw-r--r-- 1 root root 236M May 31 01:02 test.db

$ command time sqlite3 test.db <<< $'SELECT count(*) FROM xs;'
11111111
0.08user 0.09system 0:00.18elapsed 100%CPU (0avgtext+0avgdata 5040maxresident)k
0inputs+0outputs (0major+677minor)pagefaults 0swaps

$ command time sqlite3 test.db <<< $'SELECT count(*) FROM xs WHERE x = 1;'
1
0.63user 0.06system 0:00.70elapsed 100%CPU (0avgtext+0avgdata 5056maxresident)k
0inputs+0outputs (0major+676minor)pagefaults 0swaps

$ command time sqlite3 test.db <<< $'SELECT count(*) FROM xs WHERE x BETWEEN 111111 AND 111113;'
1
0.70user 0.07system 0:00.78elapsed 99%CPU (0avgtext+0avgdata 5108maxresident)k
0inputs+0outputs (0major+677minor)pagefaults 0swaps

So if your table's canonical version is stuck in CSV format (for legacy
reasons), importing it pays off iff you're going to be running many queries
before the CSV changes and you have to re-import it.
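To spell out the "further optimization available" above (a sketch I haven't
timed; the index name xs_x_idx is my own), once the data is imported into
test.db an ordinary index turns the BETWEEN query into an index range scan
instead of a full table scan:

$ sqlite3 test.db <<< 'CREATE INDEX xs_x_idx ON xs(x);'
$ command time sqlite3 test.db <<< 'SELECT count(*) FROM xs WHERE x BETWEEN 111111 AND 111113;'

Nothing like that is available for the csv virtual table, which as far as I
can tell always has to scan the whole file.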
PS: without slashes, ".load csv" searches the dynamic linker's library path
rather than PWD:

$ strace -e trace=file sqlite3 <<< '.load csv' |& cut -d\" -f2 | grep csv.so
/lib/x86_64-linux-gnu/csv.so
/usr/lib/x86_64-linux-gnu/csv.so
/lib/csv.so
/usr/lib/csv.so

$ strace -e trace=file sqlite3 <<< '.load test/csv' |& cut -d\" -f2 | grep csv.so
test/csv.so

So extensions like these can be installed as, say,
/lib/x86_64-linux-gnu/libsqlite3-ext-misc-csv.so, but not as
/lib/x86_64-linux-gnu/libsqlite3/ext/misc/csv.so.  That sqliteodbc package I
mentioned uses /lib/x86_64-linux-gnu/libsqlite3_mod_csvtable-0.9995.so.

László, based on the sqliteodbc version numbers, I think "it hasn't even
gotten a version 1.0 yet" is a deliberate versioning scheme by the author,
not a reflection on its code quality: http://www.ch-werner.de/sqliteodbc/
The latest release is from Feb 2018 and speaks of SQLite 3.22.0; the previous
release (the one in Debian) is from Dec 2016 and speaks of SQLite 3.15.2.

The sqliteodbc csvtable module has no documentation, but seems to work
roughly the same way as ext/misc/csv.c, except that all the arguments are
positional instead of key=value.

$ seq 10000000 | paste -sd ,\\n >test.csv

$ sqlite3 <<< $'.load ./csv\nCREATE VIRTUAL TABLE t USING csv(filename="test.csv", schema="CREATE TABLE t(x INT, y INT)");SELECT sum(x) FROM t WHERE x BETWEEN 10 AND 100;'
2475

$ sqlite3 <<< $'.load libsqlite3_mod_csvtable\nCREATE VIRTUAL TABLE t USING csvtable("test.csv");\n.header on\nSELECT * FROM t LIMIT 3;'
column_1
1.2
3.4
5.6

It seems to treat the comma as a decimal point (as in LC_NUMERIC=de_DE,
where decimal_point=",") rather than as a field separator --- whoops!
Adding more columns fixes THAT, at least.

$ seq 10000000 | paste -sd ,,,\\n >test.csv

$ sqlite3 <<< $'.load libsqlite3_mod_csvtable\nCREATE VIRTUAL TABLE t USING csvtable("test.csv");\n.header on\nSELECT * FROM t LIMIT 3;'
column_1|column_2|column_3|column_4
1|2|3|4
5|6|7|8
9|10|11|12
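PPS: to be explicit about the .load point above, a path (relative or
absolute) sidesteps the library-path search entirely, so both of these also
work (the absolute path is the one the Debian sqliteodbc package installs;
not re-tested here):

$ sqlite3 <<< '.load ./csv'
$ sqlite3 <<< '.load /lib/x86_64-linux-gnu/libsqlite3_mod_csvtable-0.9995.so'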