On 2020-08-19 07:57, bill d wrote:
> They are text. Specifically, 1 column contains names of moth
> species known from my state. The second column has names of moth
> species known from an adjoining state. Most will be present in both
> states, which doesn't interest me. Some will be listed in one state
> or the other, but not both. This is the data i am trying to
> isolate. While i can do this manually, we are talking some 1,900
> and 1,700 names. Further, this will surely come up again in future
> so I'd like a quicker solution. Many thanks for any suggestions!
I usually use command-line tools on *nix for this. If you have one
file of moth-names in each state:
$ cat local.txt
george
jane
elroy
judy
$ cat adjacent.txt
david
judy
jane
michael
To find the ones where you only have one instance, you might do
$ sort local.txt adjacent.txt | uniq -c | awk '$1 == 1{print $2}'
david
elroy
george
michael
This counts the unique instances of each moth name and then only
prints where the count is 1 ("$1 == 1") meaning the moth-name only
appeared once between the two files.
This falls over if either file can have the same moth more than once.
For that, I'd use a bit of pure awk (which should be available on any
Unix-like, whether Linux, a BSD, or MacOS X)
$ awk '{++a[$1];if (NR == FNR)++loc[$1]; else ++adj[$1]} END {for (n in a) if
(!(n in loc && n in adj)) print n}' local.txt adjacent.txt
That puts all the names in the "a" array, the local ones in the "loc"
array, and the adjacent-state's in the "adj" array, then after
loading them all, goes through all of them to ask which isn't in
both. If you want to run it regularly, you can put it in a script:
$ cat > disjoint.awk
#!/usr/bin/awk -f
{
++a[$1]
if (NR == FNR)
++loc[$1]
else
++adj[$1]
}
END {
for (n in a)
if (!(n in loc && n in adj))
print n
}
then make it executible:
$ chmod +x disjoint.awk
and run it as needed:
$ ./disjoint.awk local.txt adjacent.txt
You can tweak that last condition for whatever stats you want
to run, so if you just want local moths that aren't in the adjacent
state
if (n in loc && ! n in adj)
or just adjacent moths that aren't local
if (n in adj && ! n in loc)
The nice thing about this is that if you keep your historical data in
different files, you don't have to recreate the logic:
$ ./disjoint.awk local2019.txt adjacent2019.txt
$ ./disjoint.awk local2020.txt adjacent2020.txt
Sorry this is a Gnumeric list and I'm providing command-line-utility
solutions, but for doing repeatable tasks with varying input, I find
it easier to script the solution that I can reuse, rather than hoping
I got my expressions right in a spreadsheet. :-)
-tim
_______________________________________________
gnumeric-list mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/gnumeric-list