Control: tags -1 + patch pending

On Wed, 2 Apr 2025 10:05:52 +0300 Andrius Merkys <mer...@debian.org> wrote:
I finally managed to isolate the difference in cdhit output which causes segfaults in provean. It seems that cdhit >= 4.8.1-4 replaced full FASTA headers in its output with partial IDs:

diff -r /home/andrius/provean/good/cdhit.cluster /home/andrius/provean/bad/cdhit.cluster
1c1
< >gi|119610548|gb|EAW90142.1| tumor protein p53 (Li-Fraumeni syndrome), isoform CRA_c
---
> >EAW90142.1 tumor protein p53 (Li-Fraumeni syndrome), isoform CRA_c [Homo sapiens]

I need to look deeper into whether cdhit can be persuaded to use the old output format. If not, provean will have to be adjusted to the change.

I was wrong: it is blastdbcmd that has changed its default output format, so that it no longer reproduces the full input FASTA header. I have successfully patched the code to set the requested output format explicitly.
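
For illustration only, here is a rough Python sketch of what "setting the output format explicitly" means at the command-line level. It is not the actual patch (provean is not written in Python). The helper name build_blastdbcmd_args, the file names, and in particular the format string "%a %t" are assumptions made for this example; the format string used in the real patch is not given in this message.

```python
import shlex


def build_blastdbcmd_args(db, entry_batch, out_path, outfmt):
    """Return an argv list for blastdbcmd with -outfmt always spelled out.

    Hypothetical helper: passing -outfmt explicitly means the caller does
    not depend on whatever default format the installed BLAST+ uses.
    """
    return [
        "blastdbcmd",
        "-db", db,                    # BLAST database name
        "-entry_batch", entry_batch,  # file with one sequence ID per line
        "-out", out_path,             # where the retrieved records go
        "-outfmt", outfmt,            # explicit output format string
    ]


# Placeholder values; "%a %t" (accession, title) is only an example format.
args = build_blastdbcmd_args("nr", "ids.txt", "hits.fasta", "%a %t")
print(shlex.join(args))
```

The point of the sketch is only that the format is passed on every call instead of being left to the tool's default.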

It would be nice to add an autopkgtest to prevent regressions, but the input database is ~12GB (and it seems that only one from [1] works).
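
A check that does not need the full database could at least detect which header style a FASTA record uses. The following is a minimal sketch under that assumption, using the two headers from the diff above as sample data. The function names are made up for this example and it is not part of provean or of any existing test suite.

```python
# Sample headers taken from the diff quoted above.
OLD = ">gi|119610548|gb|EAW90142.1| tumor protein p53 (Li-Fraumeni syndrome), isoform CRA_c"
NEW = ">EAW90142.1 tumor protein p53 (Li-Fraumeni syndrome), isoform CRA_c [Homo sapiens]"


def header_ids(header):
    """Return the '|'-separated fields of the first token of a FASTA header."""
    first_token = header.lstrip(">").split(None, 1)[0]
    return [field for field in first_token.split("|") if field]


def is_legacy_header(header):
    """True for the old 'gi|<number>|<db>|<accession>|' style."""
    fields = header_ids(header)
    return len(fields) >= 4 and fields[0] == "gi"


def accession(header):
    """Extract the accession from either header style."""
    fields = header_ids(header)
    return fields[3] if is_legacy_header(header) else fields[0]


for h in (OLD, NEW):
    print(is_legacy_header(h), accession(h))
```

Both sample headers yield the same accession, so a test along these lines would flag a change in header style without touching the 12GB input.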

Andrius

[1] ftp://ftp.jcvi.org/data/provean/nr_Aug_2011/
