Hi everyone,

  I've just released chemfp 5.1, my package for working with cheminformatics 
fingerprints.

If you are on a Linux-based OS you can install it with:

  python -m pip install chemfp -i https://chemfp.com/packages/

(Some license restrictions apply. See https://chemfp.com/license/ or contact me 
for a no-cost academic license key.)

To understand what's in the release I need to explain what "superimposed"
means. The short version is, if you are interested in count fingerprints and
the count Tanimoto, but only have tools which work with binary fingerprints and
the binary Tanimoto, then you can use the "superimposed" method to convert the
count fingerprints to binary, and approximate the count Tanimoto with the
binary Tanimoto.
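
For readers who have not worked with both scores, here is a minimal Python
sketch of the two definitions. It is only an illustration, not chemfp's or
RDKit's code. It assumes a binary fingerprint is a set of on-bit indices, a
count fingerprint is a dict of {feature index: count}, and that two empty
fingerprints score 0.0 (a convention picked for this sketch).

```python
def binary_tanimoto(a: set[int], b: set[int]) -> float:
    """Binary Tanimoto: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


def count_tanimoto(a: dict[int, int], b: dict[int, int]) -> float:
    """Count Tanimoto: sum of per-feature minima over sum of maxima."""
    keys = a.keys() | b.keys()
    numerator = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    denominator = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    if denominator == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return numerator / denominator


# Toy count fingerprints (made-up values, for illustration only).
fp_a = {1: 3, 5: 1, 9: 2}
fp_b = {1: 1, 5: 1, 7: 4}

print(count_tanimoto(fp_a, fp_b))             # uses the counts
print(binary_tanimoto(set(fp_a), set(fp_b)))  # uses only presence/absence
```

The two scores differ for the same pair because the binary form throws away
how often each feature occurs.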

The longer version is, RDKit generates both binary fingerprints (a sequence of 
0s and 1s) and count fingerprints (a sequence of non-negative counts). In many 
cases, like Morgan fingerprints, the count fingerprint is turned into a binary 
fingerprint by taking the index of each non-zero count, modulo the binary 
fingerprint length, to set the corresponding bit to on.
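
The folding step described above can be sketched in a few lines of Python.
This is an illustration of the idea only, not RDKit's implementation. The
function name and the dict-based count fingerprint are assumptions made for
the example.

```python
def fold_to_binary(counts: dict[int, int], num_bits: int = 2048) -> set[int]:
    """Return the on-bit indices for a folded binary fingerprint.

    Every feature with a non-zero count sets the bit at
    (feature index modulo num_bits). The count itself is discarded.
    """
    return {index % num_bits for index, count in counts.items() if count > 0}


# Features 5 and 2053 collide on bit 5 when folded to 2048 bits.
on_bits = fold_to_binary({5: 2, 2053: 1, 300: 4}, num_bits=2048)
print(sorted(on_bits))
```

Note that both the counts and the distinction between colliding features are
lost, which is why the folded binary Tanimoto can drift from the count
Tanimoto.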

In some cases, like the atom pair fingerprints, RDKit's default is to use 
"count simulation". This uses a different method to convert the count 
fingerprint into a binary fingerprint, and one which is better at retaining the 
original count Tanimoto.
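
To give a feel for how a count simulation can keep some count information,
here is a simplified threshold-style sketch. It reflects my understanding of
the general idea behind RDKit's count simulation, where each feature gets
several bits and a bit is set when the count reaches a bound. The bounds
(1, 2, 4, 8), the bit layout, and the function name are assumptions for this
example, and this is not the "superimposed" method.

```python
def simulate_counts(
    counts: dict[int, int],
    num_bits: int = 2048,
    bounds: tuple[int, ...] = (1, 2, 4, 8),
) -> set[int]:
    """Encode counts as bits using one bit per (feature slot, bound) pair."""
    # Each feature slot uses len(bounds) adjacent bits, so fewer slots fit.
    num_slots = num_bits // len(bounds)
    on_bits = set()
    for index, count in counts.items():
        base = (index % num_slots) * len(bounds)
        for offset, bound in enumerate(bounds):
            if count >= bound:
                on_bits.add(base + offset)
    return on_bits


# A count of 3 sets two bits (>=1 and >=2); a count of 1 sets only one.
print(sorted(simulate_counts({1: 3, 5: 1, 9: 2}, num_bits=64)))
```

Two molecules which share a feature but with different counts now share only
some of that feature's bits, so the binary Tanimoto is pulled toward the
count Tanimoto.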

Chemfp 5.0 added initial support for count fingerprints, and proposed a new
"superimposed" method for count simulation. Last fall and early this year I
validated the method, and showed that the superimposed coding from count
fingerprints to binary fingerprints resulted in higher quality near-neighbor
searches. See https://chemrxiv.org/doi/full/10.26434/chemrxiv-2026-j3hbj/v2 for
full details.

This means you can get most of the advantages of count fingerprints while using 
existing binary fingerprint tools.

In last fall's 5.0 release I added a way to generate the RDKit Morgan, path, 
atom pairs, and torsion count fingerprints, along with a tool to convert them 
to superimposed binary fingerprints.

With 5.1 you can generate the superimposed binary fingerprints directly, such
as by using "--countSimulation superimposed" with the command-line tool
rdkit2fps.

I also added three new fingerprint types:

  - the EState atom counts from Hall and Kier, based on RDKit's EState
SMARTS definitions
     - available in both count and superimposed binary forms

  - the RDKit implementation of the Gobbi and Poppinger 2D pharmacophore count 
fingerprints

  - LINGO nmer generation from SMILES
     - available in both count and superimposed binary forms
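
For anyone unfamiliar with LINGOs, the basic idea is to count the overlapping
n-character substrings of a SMILES string. Below is a minimal Python sketch of
that idea. It is not chemfp's implementation, and chemfp's normalization rules
may differ. The function name, the default n=4, and the simplification of
replacing every digit with "0" (to hide ring-closure numbering) are
assumptions for this example.

```python
import re
from collections import Counter


def lingo_counts(smiles: str, n: int = 4) -> Counter:
    """Count the overlapping n-character substrings (n-mers) of a SMILES."""
    # Simplification: map every digit to "0" so ring-closure numbering does
    # not matter. This also alters digits inside brackets (isotopes, charges).
    normalized = re.sub(r"\d", "0", smiles)
    return Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )


# Phenol: the n-mer "cccc" occurs twice, all others once.
for nmer, count in sorted(lingo_counts("c1ccccc1O").items()):
    print(nmer, count)
```

The resulting Counter is a count fingerprint keyed by substring, so it can be
compared with a count Tanimoto or converted to a binary form.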

I've also made the count fingerprint API available for public use, with a 
warning that it's still a bit experimental and subject to change.

For more information about this release see:

  https://chemfp.com/docs/whats_new_in_51.html

Best regards,

                                Andrew
                                [email protected]





_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
