Package: wnpp
Severity: wishlist
Owner: Mo Zhou <lu...@debian.org>

* Package name    : blingfire
  Version         : git-HEAD
  Upstream Author : Microsoft
* URL             : https://github.com/Microsoft/BlingFire
* License         : MIT
  Programming Lang: C++, Python, Perl, Batch, etc.
  Description     : lightning fast Finite State machine and REgular expression manipulation library

Blingfire provides more than a fast natural language tokenizer. According to
the benchmark data, its tokenization appears to be much faster than that of
spaCy or NLTK. Unlike NLTK or spaCy, Blingfire also seems to work without
downloading any data blobs.

This tool might be useful to Enrico [1] as well, and would possibly make him
happy [2].

I'll first give it a try and push it to DUPR, then decide after code
inspection whether it should really enter the archive.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925294
[2] If we don't think too much about the upstream name.