ITP: cld2 -- Compact Language Detector 2

2015-02-10 Thread Gianfranco Costamagna
Package: wnpp
Severity: wishlist
Owner: Gianfranco Costamagna 

* Package name: cld2
  Version : 0.0.0~svn193
  Upstream Author : Dick Sites dsi...@google.com
* URL : https://code.google.com/p/cld2/
* License : Apache-2.0
  Programming Lang: C++
  Description : Compact Language Detector 2

CLD2 probabilistically detects over 80 languages in Unicode UTF-8 text, either
plain text or HTML/XML. Legacy encodings must be converted to valid UTF-8 by
the caller. For mixed-language input, CLD2 returns the top three languages
found and their approximate percentages of the total text bytes (e.g. 80%
English and 20% French out of 1000 bytes of text means about 800 bytes of
English and 200 bytes of French). Optionally, it also returns a vector of text
spans with the language of each identified. This may be useful for applying
different spelling-correction dictionaries or different machine translation
requests to each span. The design target is web pages of at least 200
characters (about two sentences); CLD2 is not designed to do well on very
short text, lists of proper names, part numbers, etc.

CLD2 is a Naïve Bayesian classifier, using one of three different token
algorithms. For Unicode scripts such as Greek and Thai that map one-to-one to
detected languages, the script defines the result. For the 80,000+ character
Han script and its CJK combination with Hiragana, Katakana, and Hangul
scripts, single letters (unigrams) are scored. For all other scripts,
sequences of four letters (quadgrams) are scored.

Scoring is done exclusively on lowercased Unicode letters and marks, after
expanding HTML entities &xyz; and after deleting digits, punctuation, and
<tags>. Quadgram word beginnings and endings (indicated here by underscore)
are explicitly used, so the word _look_ scores differently from the
word-beginning _look or the mid-word look. Quadgram single-letter "words" are
completely ignored. For each letter sequence, the scoring uses the 3-6 most
likely languages and their quantized log probabilities. The training corpus
is manually constructed from chosen web pages for each language, then
augmented by careful automated scraping of over 100M additional web pages.
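
The boundary-marked quadgrams described above can be illustrated with a
minimal Python sketch. This is not CLD2's actual code: the function name
quadgrams, the regular expression, and the handling of words shorter than
four letters are assumptions made for illustration, and combining marks are
dropped here for simplicity.

```python
import re


def quadgrams(text):
    """Return four-letter sequences with "_" marking word starts and ends."""
    # Lowercase, then keep only runs of letters; digits, punctuation and
    # underscores act as word separators (assumed simplification).
    words = re.findall(r"[^\W\d_]+", text.lower())
    grams = []
    for word in words:
        if len(word) == 1:
            continue  # single-letter "words" are ignored
        if len(word) <= 4:
            # Assumption: a short word yields one gram with both boundaries.
            grams.append("_" + word + "_")
            continue
        last = len(word) - 4
        for i in range(last + 1):
            gram = word[i:i + 4]
            if i == 0:
                gram = "_" + gram      # window starts at the word beginning
            if i == last:
                gram = gram + "_"      # window ends at the word ending
            grams.append(gram)
    return grams


print(quadgrams("look"))
print(quadgrams("Looking"))
```

With this sketch the whole word "look" yields the single gram _look_, while
"looking" yields _look as its first gram, so the two cases stay distinct as
the description requires.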

Several embellishments improve the basic algorithm: additional scoring of some
sequences of two CJK letters or eight other letters; scoring some words and
word pairs that are distinctive within sets of statistically-close languages
such as {Malay, Indonesian} or {Spanish, Portuguese, Galician}; removing
repetitive sequences/words that would otherwise skew the scoring, such as
“jpg” in “foo.jpg bar.jpg baz.jpg”; removing web-specific words that convey
almost no language information such as page, link, click, td, tr, copyright,
wikipedia, http.

Several hints can be supplied. Because these can be inaccurate on web pages,
they are just hints -- they add a bias but do not force a specific language to
be the detection result. The hints include expected language, original
document encoding, document URL top-level domain name, and embedded
<…lang=xx …> language tags.

The table-driven extraction of letter sequences and table-driven scoring is
highly optimized for both space and speed, running about 10x faster than other
detectors and covering over 70 languages in 1.8MB of x86 code and tables. The
main quadgram lookup table consists of 256K four-byte entries, covering about
50 languages. Detection over the average web page of 30KB (half
tags/digits/punctuation, half letters) takes roughly 1 msec on a current x86
processor.
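
The size and speed figures above can be checked with a little arithmetic. The
sketch below assumes that "K" and "KB" mean multiples of 1024; the function
names are hypothetical and exist only for this illustration.

```python
KIB = 1024


def quadgram_table_bytes(entries=256 * KIB, bytes_per_entry=4):
    """Size of a table with the stated number of four-byte entries."""
    return entries * bytes_per_entry


def bytes_per_second(page_bytes=30 * KIB, pages_per_second=1000):
    """Throughput implied by one 30KB page per millisecond."""
    return page_bytes * pages_per_second


print(quadgram_table_bytes())  # 1048576 bytes, i.e. 1 MiB
print(bytes_per_second())      # 30720000 bytes per second
```

Under these assumptions the main quadgram table accounts for 1 MiB of the
quoted 1.8MB, and the quoted timing corresponds to roughly 30 MB of page data
per second.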

CLD2 is an update of the prior CLD, adding more languages, updating to Unicode
6.2 characters, improving scoring, and adding the optional output vector of
labelled language spans.

These 83 languages are detected: Afrikaans Albanian Arabic Armenian
Azerbaijani Basque Belarusian Bengali Bihari Bulgarian Catalan Cebuano
Cherokee Croatian Czech Chinese Chinese_T Danish Dhivehi Dutch English
Estonian Finnish French Galician Ganda Georgian German Greek Gujarati
Haitian_Creole Hebrew Hindi Hmong Hungarian Icelandic Indonesian Inuktitut
Irish Italian Javanese Japanese Kannada Khmer Kinyarwanda Korean Laothian
Latvian Limbu Lithuanian Macedonian Malay Malayalam Maltese Marathi Nepali
Norwegian Oriya Persian Polish Portuguese Punjabi Romanian Russian
Scots_Gaelic Serbian Sinhalese Slovak Slovenian Spanish Swahili Swedish Syriac
Tagalog Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese Welsh Yiddish.


Useful for the upcoming poedit 1.8 release.


--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
https://lists.debian.org/791546596.2444306.1423560498635.javamail.ya...@mail.yahoo.com



Re: Bug#777220: ITP: you-get -- downloader for youtube and number of sites

2015-02-10 Thread Andreas Tille
Hi,

On Fri, Feb 06, 2015 at 04:42:15PM +0100, Jonas Smedegaard wrote:
> 
> I locate these (grouped by core engine, not all CLI-based):
> 
> ytdl, mps-youtube (via python-pafy)
> quvi, nomnom, cclive, mplayer2 and more (via libquvi*)
> groovebasin (via node-ytdl-core)
> youtube-dl, mpv (via youtube-dl)
> smtube
> tribler
> gpodder
> slimrat-nox, slimrat
> fatrat
> get-flash-videos

I'm amazed how many tools we have to download videos.  I'd really love
a wrapper, say

try-hard-to-download-video

which tries each of these in turn until one succeeds.  It would be
mind-bogglingly boring to remember all of them and try them manually,
and I wonder whether there are three further users (except Jonas) who
know them all (and perhaps these are not even all)?
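
Such a wrapper could be sketched roughly as follows. This is only an
illustration: the DOWNLOADERS list is a hypothetical selection, and it
assumes each tool accepts the video URL as its only argument and exits with
status 0 on success, which would need checking per tool.

```python
import subprocess

# Hypothetical candidate commands; the URL is appended as the last argument.
DOWNLOADERS = [
    ["youtube-dl"],
    ["you-get"],
    ["cclive"],
]


def try_hard_to_download(url, downloaders=DOWNLOADERS, run=subprocess.call):
    """Try each downloader in turn; return the command that succeeded."""
    for command in downloaders:
        try:
            status = run(command + [url])
        except OSError:
            continue  # tool not installed or not executable; try the next
        if status == 0:
            return command
    return None  # every candidate failed
```

The run parameter exists only so that the loop can be exercised without
starting real downloads.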

Kind regards

  Andreas.

-- 
http://fam-tille.de


-- 
Archive: https://lists.debian.org/20150210132710.gn29...@an3as.eu



Bug#777600: ITP: astlib -- General python tools for astronomy

2015-02-10 Thread Ole Streicher
Package: wnpp
Severity: wishlist
Owner: Ole Streicher 
X-Debbugs-Cc: debian-as...@lists.debian.org,debian-devel@lists.debian.org,gijsmolen...@gmail.com

* Package name: astlib
  Version : 0.8.0
  Upstream Author : Matt Hilton
* URL : http://astlib.sourceforge.net/
* License : GPL-2+
  Programming Lang: Python
  Description : General python tools for astronomy
 astLib is a set of Python modules that provides some tools for research
 astronomers. It can be used for astronomical plots, some statistics, common
 calculations, coordinate conversions, and manipulating FITS images with World
 Coordinate System (WCS) information through PyWCSTools - a simple wrapping of
 WCSTools by Jessica Mink. PyWCSTools is distributed (and developed) as part
 of astLib.

The package will be maintained under the umbrella of debian-astro by Gijs
Molenaar and me. A git repository is set up at

http://anonscm.debian.org/git/debian-astro/packages/astlib.git

Best

Ole


-- 
Archive: https://lists.debian.org/54da114f.4060...@debian.org



Bug#777599: ITP: stiff -- convert scientific FITS images to the more popular TIFF format

2015-02-10 Thread Ole Streicher
Package: wnpp
Severity: wishlist
Owner: Ole Streicher 
X-Debbugs-Cc: debian-as...@lists.debian.org,debian-devel@lists.debian.org,gijsmolen...@gmail.com

* Package name: stiff
  Version : 2.4.0
  Upstream Author : Emmanuel Bertin
* URL : http://www.astromatic.net/software/stiff
* License : GPL-3
  Programming Lang: C
  Description : convert scientific FITS images to the more popular TIFF format
 STIFF is a program that converts scientific FITS images to the more popular
 TIFF format for illustration purposes.

The package will be maintained under the umbrella of debian-astro by Gijs
Molenaar and me. A git repository is set up at

http://anonscm.debian.org/git/debian-astro/packages/stiff.git

Best

Ole


-- 
Archive: https://lists.debian.org/54da1136.6030...@debian.org



Re: Bug#777220: ITP: you-get -- downloader for youtube and number of sites

2015-02-10 Thread Jonas Smedegaard
Quoting Andreas Tille (2015-02-10 14:27:10)
> and I wonder whether there are three further users (except Jonas) who 
> know them all (and perhaps these are not even all)?

I didn't know, just skimmed package descriptions of topmost hits of a 
few searches with axi-cache (in package apt-xapian-index).

So please redirect credit to Enrico and other debtags contributors! :-)


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private




Bug#777609: ITP: nproc -- process pool implementation for OCaml

2015-02-10 Thread Stéphane Glondu
Package: wnpp
Severity: wishlist
Owner: "Stéphane Glondu" 

* Package name: nproc
  Version : 0.5.1
  Upstream Author : MyLife
* URL : https://github.com/MyLifeLabs/nproc
* License : BSD-3-clause
  Programming Lang: OCaml
  Description : process pool implementation for OCaml

 Nproc is a process pool implementation for OCaml. A process pool is a
 fixed set of processes that perform arbitrary computations for a
 master process, in parallel and without blocking the master. Master
 and workers communicate by message-passing. Nproc relies on fork,
 pipes, Marshal and Lwt.
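
 The process pool idea can be illustrated with a minimal sketch. It is
 written in Python rather than OCaml and is not Nproc's API: run_pool and
 the helper names are hypothetical, pickle stands in for Marshal, and the
 master here waits for each batch of results instead of staying unblocked
 as Nproc does with Lwt. It requires a POSIX system with fork.

```python
import os
import pickle
import struct


def _write_all(fd, data):
    while data:
        data = data[os.write(fd, data):]


def _read_exact(fd, size):
    chunks = []
    while size:
        chunk = os.read(fd, size)
        if not chunk:
            raise EOFError("pipe closed")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def send(fd, obj):
    """Write one length-prefixed pickled message to a pipe."""
    payload = pickle.dumps(obj)
    _write_all(fd, struct.pack("!I", len(payload)) + payload)


def recv(fd):
    """Read one length-prefixed pickled message from a pipe."""
    (size,) = struct.unpack("!I", _read_exact(fd, 4))
    return pickle.loads(_read_exact(fd, size))


def _worker_loop(func, task_fd, result_fd):
    while True:
        kind, payload = recv(task_fd)
        if kind == "stop":
            return
        send(result_fd, func(payload))


def run_pool(func, tasks, n_workers=2):
    """Apply func to every task using a fixed set of forked workers."""
    workers = []
    for _ in range(n_workers):
        task_r, task_w = os.pipe()
        result_r, result_w = os.pipe()
        pid = os.fork()
        if pid == 0:  # worker process
            os.close(task_w)
            os.close(result_r)
            try:
                _worker_loop(func, task_r, result_w)
            finally:
                os._exit(0)
        os.close(task_r)
        os.close(result_w)
        workers.append((pid, task_w, result_r))

    tasks = list(tasks)
    results = []
    # Hand out one task per worker, then collect that batch in order.
    for start in range(0, len(tasks), n_workers):
        batch = tasks[start:start + n_workers]
        for task, (_, task_w, _) in zip(batch, workers):
            send(task_w, ("task", task))
        for _, (_, _, result_r) in zip(batch, workers):
            results.append(recv(result_r))

    for pid, task_w, result_r in workers:
        send(task_w, ("stop", None))
        os.close(task_w)
        os.close(result_r)
        os.waitpid(pid, 0)
    return results
```

 For example, run_pool(lambda x: x * x, [1, 2, 3], n_workers=2) computes the
 squares in two worker processes and returns them in task order.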

This package will be maintained in the Debian OCaml Team.


-- 
Archive: 
https://lists.debian.org/20150210165540.28893.83195.report...@wencory.loria.fr



Re: Bug#777609: ITP: nproc -- process pool implementation for OCaml

2015-02-10 Thread Evgeni Golov
Hi,

On 02/10/2015 05:55 PM, Stéphane Glondu wrote:
> * Package name: nproc

not sure this is relevant, but there is /usr/bin/nproc in pkg:coreutils.
This might be confusing for users.

Regards
Evgeni


-- 
Archive: https://lists.debian.org/54da4b0f.40...@debian.org



Bug#777617: RFH: phppgadmin

2015-02-10 Thread Christoph Berg
Package: wnpp
Severity: normal

The phppgadmin package is in need of someone maintaining it who's
actually using it as well.

Outstanding issues are the apache 2.4 transition (my objections
against the original patch are mostly irrelevant now), and most
probably some general QA on the package. It should be moved to team
maintenance by the PostgreSQL group as well.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




The future of MariaDB

2015-02-10 Thread David McMackins
In the course of developing a library which heavily relies on
libmysqlclient, I've noticed several issues using MariaDB on Debian
lately. I'm worried about its future.

The latest version of libmariadb in Debian no longer works as a drop-in
replacement for MySQL. The library's name and include path have changed
from mysql to mariadb. While I don't have a problem with someone trying
to use their own name, it means that build scripts relying on
mysql_config and code looking for mysql/mysql.h will break with the new
version. Because of this, I'm considering dropping support in my
software for MariaDB, since they have moved away from their original
purpose.
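
One way a build script might cope with either name is sketched below. This
is a hedged illustration, not a recommendation from this thread: it assumes
that a mariadb_config helper is installed alongside the MariaDB client
library and that both helpers accept --cflags and --libs; the function names
are hypothetical.

```python
import shutil
import subprocess

# Assumed helper names, tried in order of preference.
CANDIDATES = ("mysql_config", "mariadb_config")


def find_client_config(which=shutil.which):
    """Return the path of the first config helper found, or None."""
    for name in CANDIDATES:
        path = which(name)
        if path:
            return path
    return None


def client_build_flags(config_path, run=subprocess.check_output):
    """Ask the helper for compiler and linker flags as lists of words."""
    cflags = run([config_path, "--cflags"]).decode().split()
    libs = run([config_path, "--libs"]).decode().split()
    return cflags, libs
```

A build script could call find_client_config() first and fail with a clear
message when it returns None.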

Can I depend on the future of MySQL in Debian, or will it be phased out
in the foreseeable future?

Happy Hacking,

David E. McMackins II
Associate, Free Software Foundation (#12889)

www.mcmackins.org www.delwink.com
www.gnu.org www.fsf.org


-- 
Archive: https://lists.debian.org/54da87ba.5010...@mcmackins.org



Bug#777643: general: possibly, some keyboard layouts should use U+22C5 DOT OPERATOR instead of U+00B7 MIDDLE DOT

2015-02-10 Thread Christoph Anton Mitterer
Package: general
Severity: normal


Hey.

Sorry for reporting against general, but actually I'm not quite
sure which package(s) is/are canonically responsible for the
keyboard mappings in all different places (console, X, wayland)
these days.

Some keyboard layouts (at least the German one) give the
· (U+00B7 MIDDLE DOT) on some combination (here it is AltGr+;).
Close to it (with respect to the location of the combination on
the keyboard) are the characters × (U+00D7 MULTIPLICATION SIGN)
and ÷ (U+00F7 DIVISION SIGN), again on the German keyboard layout.

So I’d conclude that · (U+00B7 MIDDLE DOT) is intended to be used
as a multiplication sign here; in most non-Anglophone countries
a dot, rather than the cross (i.e. × (U+00D7 MULTIPLICATION SIGN)),
is used as the multiplication sign.

However · (U+00B7 MIDDLE DOT) is not intended to be that
multiplication dot operator, even Unicode itself states:
“for multiplication U+22C5 DOT OPERATOR is preferred”.


So I’d guess that in all such keyboards where · (U+00B7 MIDDLE DOT)
is mapped and intended as multiplication sign, it should be
replaced with ⋅ (U+22C5 DOT OPERATOR).
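
The difference between the characters discussed above can be inspected with
Python's unicodedata module. This is a small sketch; the describe helper is
hypothetical, and the names and categories reported depend on the Unicode
database shipped with the interpreter.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, general category) for a character."""
    return ("U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.category(ch))


for ch in ("\u00b7", "\u22c5", "\u00d7", "\u00f7"):
    print(*describe(ch))
```

MIDDLE DOT is classified as punctuation (category Po), whereas DOT OPERATOR,
MULTIPLICATION SIGN and DIVISION SIGN are math symbols (category Sm).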

Cheers,
Chris.


-- System Information:
Debian Release: 8.0
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_DE.utf8, LC_CTYPE=en_DE.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)


-- 
Archive: 
https://lists.debian.org/20150211011930.1612.70696.report...@heisenberg.scientia.net



Re: The future of MariaDB

2015-02-10 Thread Clint Byrum
Excerpts from David McMackins's message of 2015-02-10 14:35:38 -0800:
> In the course of developing a library which heavily relies on
> libmysqlclient, I've noticed several issues using MariaDB on Debian
> lately. I'm worried about its future.
> 
> The latest version of libmariadb in Debian no longer works as a drop-in
> replacement for MySQL. The library's name and include path have changed
> from mysql to mariadb. While I don't have a problem with someone trying
> to use their own name, it means that build scripts relying on
> mysql_config and code looking for mysql/mysql.h will break with the new
> version. Because of this, I'm considering dropping support in my
> software for MariaDB, since they have moved away from their original
> purpose.
> 
> Can I depend on the future of MySQL in Debian, or will it be phased out
> in the foreseeable future?
> 

Hi David. First and foremost, as much as the MariaDB team has talked
about remaining a drop-in replacement, both the server and the client
library introduce incompatible features that mean that replacement is a
one-way street. For the server, they have engines and on-disk formats that
differ from MySQL. For their forked libmysqlclient, they add symbols which
don't exist in libmysqlclient, thus a program linked against MariaDB's
libmysqlclient may not function with the original libmysqlclient. For
that reason, we forced it to be renamed to libmariadbclient (upstream
has declined to acknowledge this poisoning of the namespace).

It's best to just treat them as two forks, with forked communities. That
said, there is an umbrella team, pkg-mysql-maint, that works together
to make sure neither one steps on the others' toes. The team also helps
with Percona XtraDB Cluster server which includes Galera support.

As for their futures in Debian, there is hope for having both. Oracle
has been helpful in assisting Debian and Ubuntu developers in maintaining
MySQL packaging in Debian and Ubuntu. Meanwhile Otto Kekäläinen has
done a fabulous job at maintaining MariaDB. Percona employees have done
their part as well in making sure their tools are included in Debian.

So, my recommendation for your issue is to just build-depend on
libmysqlclient-dev. It's not going anywhere as long as Oracle keeps
showing up to make sure it works. And you'll get binaries that work fine
against mariadb-server or mysql-server or percona-xtradb-cluster-server.




Bug#777643: marked as done (general: possibly, some keyboard layouts should use U+22C5 DOT OPERATOR instead of U+00B7 MIDDLE DOT)

2015-02-10 Thread Debian Bug Tracking System
Your message dated Wed, 11 Feb 2015 05:08:41 +
with message-id <1423631321.2349.178.ca...@decadent.org.uk>
and subject line Re: Bug#777643: general: possibly, some keyboard layouts 
should use U+22C5 DOT OPERATOR instead of U+00B7 MIDDLE DOT
has caused the Debian Bug report #777643,
regarding general: possibly, some keyboard layouts should use U+22C5 DOT 
OPERATOR instead of U+00B7 MIDDLE DOT
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
777643: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777643
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
This is speculation, not a proper bug report.

Ben.

-- 
Ben Hutchings
When in doubt, use brute force. - Ken Thompson


--- End Message ---


Bug#777643: general: possibly, some keyboard layouts should use U+22C5 DOT OPERATOR instead of U+00B7 MIDDLE DOT

2015-02-10 Thread Christoph Anton Mitterer
reopen 777643
stop

On Wed, 2015-02-11 at 05:08 +, Ben Hutchings wrote: 
> This is speculation, not a proper bug report.
And is there any reason to name it "speculation" apart from that being
just your personal opinion without any further arguments for it?

It seems to be quite logical that actually the dot multiplication sign
is meant, it's on the same key as the cross multiplication sign, and in
the group of arithmetic operators.


Your "argument" that it would be "speculation" is more or less the same
as saying that the group of key mappings ¹²³, which are all next to each
other, is like this just by accident and not meant to be.


For the above reasons, and since closing a likely valid bug without any
further discussion is heavily impolite, reopening.

Chris.


-- 
Archive: https://lists.debian.org/1423632431.4751.38.ca...@gmail.com



Processed: Re: Bug#777643: general: possibly, some keyboard layouts should use U+22C5 DOT OPERATOR instead of U+00B7 MIDDLE DOT

2015-02-10 Thread Debian Bug Tracking System
Processing commands for cont...@bugs.debian.org:

> reopen 777643
Bug #777643 {Done: Ben Hutchings } [general] general: 
possibly, some keyboard layouts should use U+22C5 DOT OPERATOR instead of 
U+00B7 MIDDLE DOT
Bug reopened
Ignoring request to alter fixed versions of bug #777643 to the same values 
previously set
> stop
Stopping processing here.

Please contact me if you need assistance.
-- 
777643: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777643
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems


--
Archive: 
https://lists.debian.org/handler.s.c.142363244223374.transcr...@bugs.debian.org



Re: Bug#777643: general: possibly, some keyboard layouts should use U+22C5 DOT OPERATOR instead of U+00B7 MIDDLE DOT

2015-02-10 Thread Russ Allbery
Christoph Anton Mitterer writes:

> It seems to be quite logical that actually the dot multiplication sign
> is meant, it's on the same key as the cross multiplication sign, and in
> the group of arithmetic operators.

Whether that was intended or not, that's not what people actually did when
they made those keyboard layouts.  They did not put the dot multiplication
sign on that key; they put the middle dot symbol on that key.

Those keyboard layouts now exist, and changing something like that is
almost never worth the trouble, for exactly the same reason why we're not
going to remap the default keyboard layout to be Dvorak even if it's
better than QWERTY.

If you want a different keyboard layout, you pretty much need to make a
different keyboard layout, and then convince people to use that instead of
the existing one.  Changing an existing one, even if you think it made a
logical error, seems like a really bad idea.

That's even apart from the fact that diverging from upstream in an area
like this seems like an absolutely awful idea.

I don't think this is an actionable bug report for Debian.  It's an
interesting bit of speculation, and it's arguably a consistency flaw, but
it's not something that makes sense for us to do anything about.

(Based on past history, I suspect the reply to this will be 200 lines
about why I'm wrong, so I'll mention in advance that I'm highly unlikely
to say anything further on this bug report and I'm happy for others to
have the last word.)

-- 
Russ Allbery (r...@debian.org)   


-- 
Archive: https://lists.debian.org/87lhk5ueu9@hope.eyrie.org



Bug#777655: ITP: wagon-maven-plugin -- Maven plugin to transfer resources using Maven Wagon

2015-02-10 Thread Tim Potter
Package: wnpp
Severity: wishlist
Owner: Tim Potter 

* Package name: wagon-maven-plugin
  Version : 1.0-beta-3
  Upstream Author : Dan T. Tran, James W. Dumay, Sherali Karimov
* URL : http://mojo.codehaus.org/wagon-maven-plugin/
* License : Apache-2.0
  Programming Lang: Java
  Description : Maven plugin to transfer resources using Maven Wagon

The Wagon Plugin can be used to transfer resources between repositories
using Maven Wagon.

The Wagon Plugin has goals to upload or download files from remote locations,
list the contents of repositories and execute remote commands using SSH.


-- 
Archive: https://lists.debian.org/20150205013926.8.83209.reportbug@02ed91797728