Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Mo Zhou
Hi Andy,

On 2019-05-23 17:52, Andy Simpkins wrote:
> Sam.
> Whilst I agree that "assets" in some packages may not have sources
> with them and the application may still be in main if it pulls in
> those assets from contrib or non-free.
> I am trying to suggest the same thing here. If the data set is unknown
> this is the *same* as a dependency on a random binary blob (music /
> fonts / game levels / textures etc) and we wouldn't put that in main.

The "ToxicCandy Model" is used to cover a special case. Neither a
"ToxicCandy" nor a "Non-free" model can enter our main section, as
stated by DL-Policy #1 from the beginning.

> It is my belief that we consider training data sets as 'source' in
> much the same way

We can indeed interpret training data as a sort of "source". But
sometimes we have trouble even with free "source". The Wikipedia dump
is a frequently used free corpus in the computational linguistics
field. Do we really want to upload the Wikipedia dump to the
archive when some Free Model to be packaged is trained from it?

The Wikipedia dump is so large that it challenges our .deb format
(see recent threads).
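
A rough sketch of one commonly cited size constraint, for illustration
only: a .deb is an ar archive, and the common ar header records each
member's size in a 10-character decimal field, so a single member
(e.g. the data tarball) tops out just under 10 GB. The dump size used
below is a made-up placeholder, not a measurement.

```python
# Sketch: the common ar header stores a member's size as 10 ASCII
# decimal digits, which bounds the size of any single member.
AR_SIZE_FIELD_DIGITS = 10
MAX_AR_MEMBER_BYTES = 10 ** AR_SIZE_FIELD_DIGITS - 1


def fits_in_ar_member(size_bytes: int) -> bool:
    """Return True if a payload of this size can be recorded in one ar member."""
    return 0 <= size_bytes <= MAX_AR_MEMBER_BYTES


# Hypothetical compressed dump size (assumption, for illustration only).
hypothetical_dump_bytes = 16 * 1024 ** 3  # 16 GiB

print(MAX_AR_MEMBER_BYTES)
print(fits_in_ar_member(hypothetical_dump_bytes))
```

Splitting the data across several binary packages would sidestep the
per-member limit, at the cost of extra packaging work.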

See (Difficulties -- Dataset Size):
https://salsa.debian.org/lumin/deeplearning-policy#difficulties-questions-not-easy-to-answer



Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Mo Zhou
On 2019-05-23 17:58, Sam Hartman wrote:
> So for deep learning models we would require that they be retrainable
> and typically require that we have retrained them.

Two difficulties make the above point hard to achieve:
https://salsa.debian.org/lumin/deeplearning-policy#difficulties-questions-not-easy-to-answer



Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Paul Wise
On Fri, May 24, 2019 at 1:58 AM Sam Hartman wrote:

> So for deep learning models we would require that they be retrainable
> and typically require that we have retrained them.

I don't think it is currently feasible for Debian to retrain the
models. I don't think we have any buildds with GPUs yet. I don't know
about the driver situation but for example I doubt any deep learning
folks using the nvidia hardware mentioned in deeplearning-policy are
using the libre nouveau drivers. The driver situation for TPUs might
be better though? Either way I think a cross-community effort for
retraining and reproducibility of models would be better than Debian
having to do any retraining.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Mo Zhou
On 2019-05-24 15:59, Paul Wise wrote:
> On Fri, May 24, 2019 at 1:58 AM Sam Hartman wrote:
> 
>> So for deep learning models we would require that they be retrainable
>> and typically require that we have retrained them.
> 
> I don't think it is currently feasible for Debian to retrain the
> models.

Infeasible, for sure.

> I don't think we have any buildds with GPUs yet.

The non-free nvidia driver is inevitable.
AMD GPUs and OpenCL are not sane choices.

> I don't know
> about the driver situation but for example I doubt any deep learning
> folks using the nvidia hardware mentioned in deeplearning-policy are
> using the libre nouveau drivers.

Don't doubt. Nouveau can never support CUDA well.
Unless someday nvidia rethinks everything.

Some good Xeon CPUs can train models as well,
and a well-optimized linear algebra library
helps a lot (e.g. MKL, OpenBLAS). But generally
CPU training takes at least 10x longer to
finish (except for some toy networks).

> The driver situation for TPUs might
> be better though?

I don't know any software details about TPUs.

> Either way I think a cross-community effort for
> retraining and reproducibility of models would be better than Debian
> having to do any retraining.

Sounds like a good way to go. But not today.
Let's do lazy execution at this point, and
see how this subject evolves and how other
FOSS communities think.



Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Adam Borowski
On Thu, May 23, 2019 at 11:37:41PM -0700, Mo Zhou wrote:
> - The datasets used for training a "ToxicCandy" may be
>   private/non-free and not everybody can access them. (This case is more
>   likely a result of problematic upstream licensing, but it sometimes
> happens).
> 
>   One got a free model from internet. That little candy tastes sweet.
>   One wanted to make this candy at home with the provided recipe, but
>   surprisingly found out that non-free ingredients are inevitable.
> -- ToxicCandy

I'm not so sure this model would be unacceptable.  It's no different than
a game's image being a photo of a tree in your garden -- not reproducible by
anyone but you (or someone you invite).  Or a word-frequency list produced
by analyzing the results of a Google search.

At some point, the work becomes an entity on its own rather than the result
of processing some dataset.

A more ridiculous argument: the input is a project requirement sheet, the
neural network being four pieces of wetware, working for 3 months.  Do you
insist on _this_ being reproducible, or would you accept the product as free
software?  Sufficiently advanced artificial intelligence might not be that
different.


喵!
-- 
⢀⣴⠾⠻⢶⣦⠀ Latin:   meow 4 characters, 4 columns,  4 bytes
⣾⠁⢠⠒⠀⣿⡁ Greek:   μεου 4 characters, 4 columns,  8 bytes
⢿⡄⠘⠷⠚⠋  Runes:   ᛗᛖᛟᚹ 4 characters, 4 columns, 12 bytes
⠈⠳⣄ Chinese: 喵   1 character,  2 columns,  3 bytes <-- best!



Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Paul Wise
On Fri, 2019-05-24 at 03:14 -0700, Mo Zhou wrote:

> The non-free nvidia driver is inevitable.
> AMD GPUs and OpenCL are not sane choices.

So no model which cannot be CPU-trained is suitable for Debian main.

> Don't doubt. Nouveau can never support CUDA well.

There is coriander but nouveau doesn't support OpenCL 1.2 yet.

https://github.com/hughperkins/coriander

> Some good Xeon CPUs can train models as well,
> and a well-optimized linear algebra library
> helps a lot (e.g. MKL, OpenBLAS). But generally
> CPU training takes at least 10x longer to
> finish (except for some toy networks).

So only toy networks can enter Debian main?

> Sounds like a good way to go. But not today.
> Let's do lazy execution at this point, and
> see how this subject evolves and how other
> FOSS communities think.

Agreed, that sounds reasonable, similar to how repro builds went.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise





RE: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread PICCA Frederic-Emmanuel
What about IBM POWER9 with pocl?

It seems that this is better than the latest NVIDIA GPU.

Cheers


Re: NMUs: Do we want to Require or Recommend DH

2019-05-24 Thread Lucas Nussbaum
Hi,

On 14/05/19 at 14:30 -0400, Sam Hartman wrote:
> I think there's a fairly clear consensus emerging that it's worth having
> things to check when making a build system conversion.  Looking at
> debdiff, diffoscope and reproducibility of the build all appear to be
> important things to consider in such a case.
> 
> So, I think there is an emerging consensus against the idea of people
> NMUing a package simply to convert it to dh.
> 
> First, I'd like to explicitly call for any last comments from people who would
> like to see us permit NMUs simply to move packages toward dh.  Are there
> any cases in which such an NMU should be permitted?

Our NMU policy (Sec 5.11.1 of developers-reference[1]) tries hard to
give some standards of when and how it's acceptable to do an NMU. It is
complex, but in the end, I think that it boils down to:
  NMUs are always permitted, but discouraged in some (many?) cases, and
  extensive use of the DELAYED queue is recommended.

It also explicitly discourages NMUs for packaging style changes:
> Fixing cosmetic issues or changing the packaging style (e.g. switching
> from cdbs to dh) in NMUs is discouraged.

Do you want to change this and explicitly forbid NMUs for converting to
dh? I think that the current policy is quite balanced (but I'm biased
since I contributed to its adoption a long time ago :) ). I also think
that we should trust the judgement of DDs, and that completely
forbidding some changes via NMUs would be a regression compared to the
current policy.

- Lucas

[1] https://www.debian.org/doc/manuals/developers-reference/ch05.en.html#nmu



Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Sam Hartman
> "Paul" == Paul Wise writes:

Paul> On Fri, May 24, 2019 at 1:58 AM Sam Hartman wrote:
>> So for deep learning models we would require that they be
>> retrainable and typically require that we have retrained them.

Paul> I don't think it is currently feasible for Debian to retrain
Paul> the models. I don't think we have any buildds with GPUs yet. I
Paul> don't know about the driver situation but for example I doubt
Paul> any deep learning folks using the nvidia hardware mentioned in
Paul> deeplearning-policy are using the libre nouveau drivers. The
Paul> driver situation for TPUs might be better though? Either way I
Paul> think a cross-community effort for retraining and
Paul> reproducibility of models would be better than Debian having
Paul> to do any retraining.

I wonder whether we'd accept a developer's assertion that some large pdf
in a source package could be rebuilt without actually rebuilding it on
every upload.
I think we probably would.

I think something similar might be acceptable here.



Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Holger Levsen
On Fri, May 24, 2019 at 10:43:34AM -0400, Sam Hartman wrote:
> I wonder whether we'd accept a developer's assertion that some large pdf
> in a source package could be rebuilt without actually rebuilding it on
> every upload.
> I think we probably would.

I don't think so. Actually, AFAIK, we don't accept this, and we treat
such bugs as serious (though those bugs are quite probably tagged
buster-ignore right now).


-- 
tschau,
Holger

---
   holger@(debian|reproducible-builds|layer-acht).org
   PGP fingerprint: B8BF 5413 7B09 D35C F026 FE9D 091A B856 069A AA1C




Re: NMUs: Do we want to Require or Recommend DH

2019-05-24 Thread Sean Whitton
Hello,

On Fri 24 May 2019 at 04:01PM +02, Lucas Nussbaum wrote:

> Hi,
>
> On 14/05/19 at 14:30 -0400, Sam Hartman wrote:
>> I think there's a fairly clear consensus emerging that it's worth having
>> things to check when making a build system conversion.  Looking at
>> debdiff, diffoscope and reproducibility of the build all appear to be
>> important things to consider in such a case.
>>
>> So, I think there is an emerging consensus against the idea of people
>> NMUing a package simply to convert it to dh.
>>
>> First, I'd like to explicitly call for any last comments from people who would
>> like to see us permit NMUs simply to move packages toward dh.  Are there
>> any cases in which such an NMU should be permitted?
>
> Our NMU policy (Sec 5.11.1 of developers-reference[1]) tries hard to

Note that nothing in dev-ref is binding on developers, so I think it's a
bit misleading to use the term 'policy'.  All of dev-ref is guidelines.

Otherwise, I think your summary of what dev-ref says about NMUs is
correct.

-- 
Sean Whitton




Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Paul Wise
On Fri, 2019-05-24 at 10:43 -0400, Sam Hartman wrote:

> I wonder whether we'd accept a developer's assertion that some large pdf
> in a source package could be rebuilt without actually rebuilding it on
> every upload.

As I understand it, ftp-master policy is that things in main be
buildable from source using only tools in main, not that everything in
main is actually built from source at `debian/rules build` time.

There are plenty of things in the archive that we do not build from
source on the buildds, firmware-linux-free for example.

Obviously the best way to prove things are buildable from source is to
actually build from source and do it as often as possible.

Personally I'd like:

 * A standard build profile used when building everything from source.
 * A way to tell debian/rules to build everything from source.
 * A build toolchain option to make use of these.
 * A requirement that things not built from source come in a separate
   component tarball of the source package, using the multi-tarball
   feature of the v3 Debian source package format.
 * More upstream separation of build products from source.
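
A minimal sketch of how a build helper might honour such a profile,
assuming only that active build profiles are exposed as a
space-separated list in the DEB_BUILD_PROFILES environment variable.
The profile name pkg.example.fromsource and both function names are
hypothetical; no such standard profile exists today.

```python
import os

# Hypothetical profile name (assumption); not an existing standard profile.
FROM_SOURCE_PROFILE = "pkg.example.fromsource"


def active_build_profiles(environ=None):
    """Return the set of build profiles listed in DEB_BUILD_PROFILES."""
    env = os.environ if environ is None else environ
    return set(env.get("DEB_BUILD_PROFILES", "").split())


def should_rebuild_everything(environ=None):
    """True when the hypothetical from-source profile is active."""
    return FROM_SOURCE_PROFILE in active_build_profiles(environ)


if __name__ == "__main__":
    # A build script could branch on this to regenerate prebuilt files.
    print(should_rebuild_everything())
```

A package's build could call something like this to decide whether to
regenerate files that are normally shipped prebuilt.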

> I think we probably would.

Personally I do not think it would be acceptable to not build large
PDFs from source. I doubt the PDF build process could be problematic
enough that we couldn't do it on current buildds.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise





Bug#929504: ITP: odysseus-web-browser -- A web browser focusing on decentralized discovery

2019-05-24 Thread Adrian Lyall Cochrane
Package: wnpp
Severity: wishlist
Owner: Adrian Lyall Cochrane 

* Package name: odysseus-web-browser
  Version : 1.5.17
  Upstream Author : Adrian Cochrane 
* URL : http://odysseus.adrian.geek.nz/
* License : GPL
  Programming Lang: Vala
  Description : A web browser focusing on decentralized discovery

A simple and performant yet powerful window onto the open decentralized web

Odysseus is a convenient and privacy-respecting web browser that increasingly,
gently, and unobtrusively guides you wherever you want to go online.

Through this well thought out simplicity Odysseus lets you focus on the
webpages that matter to you.

High-level features:

* Tabbed web browsing
* Find-in-page
* Downloads
* Opens non-webpage links in 3rd party apps, or suggests ones to install
* DuckDuckGo integration
* Browser history
* Topsites with initial hand-curated recommendations

These features are completed to a high degree of polish.

---

I created my own web browser in order to experiment with relying on
decentralized protocols like webfeeds, SKOS, OpenSearch, and links for
webpage discovery rather than on centralized websites. I have heard significant
interest from people online in having it packaged for Debian-based systems.

As such I'm keen to manage a Debian package for my work.