It looks like it's not easy to find contemporary texts released under a
permissive license. The site liberliber.it has many free contemporary books,
but they are all distributed under CC-BY-NC-SA, and the NC (Non Commercial)
clause is problematic here.
To play it safe, we had better use books from Project Gutenberg; I would go
for novels, because they are likely to contain both dialogue and narrative
text. Here's a list:
http://www.gutenberg.org/wiki/IT_Romanzi_%28Biblioteca%29

They are all rather old, but we can at least pick something from the
beginning of the last century.

A few options:

- Il perduto amore, 1921
  http://www.gutenberg.org/ebooks/41281

- I sogni dell'anarchico, 1922
  http://www.gutenberg.org/cache/epub/25175/pg25175.txt

- I divoratori, 1922
  http://www.gutenberg.org/ebooks/34983.txt.utf-8


Other suitable books, from other categories:

- Fuochi di bivacco, 1913
  http://www.gutenberg.org/files/49223/49223-0.txt
  (I like this because it's mostly in the present tense)

- La favorita del Mahdi, 1911
  http://www.gutenberg.org/cache/epub/25180/pg25180.txt
  (by Emilio Salgari!)


Any opinions on which one we should pick?

Whatever the choice, I think I'll run the chosen book through sed and
apply s/egli/lui/, s/ella/lei/, s/de'/dei/, and similar substitutions.
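
One caveat: a bare s/egli/lui/ would also rewrite the "egli" inside "negli"
or "degli", and s/ella/lei/ would hit "della", "nella" and "bella", so the
substitutions have to match whole words only. Below is a minimal sketch of
that idea, written in Python rather than sed purely for illustration. The
three pairs are the ones listed above; the names REPLACEMENTS and modernize
are hypothetical, and the capitalization handling is my own assumption.

```python
import re

# Replacement table: the three pairs come from the message above;
# any further entries would have to be added by hand.
REPLACEMENTS = {
    "egli": "lui",
    "ella": "lei",
    "de'": "dei",
}

# Match an entry only when it is not glued to other word characters,
# so "negli", "degli", "della" and "bella" are left alone.
# (\b is avoided because "de'" ends with a non-word character.)
_ALTERNATIVES = "|".join(
    re.escape(word) for word in sorted(REPLACEMENTS, key=len, reverse=True)
)
PATTERN = re.compile(r"(?<!\w)(?:" + _ALTERNATIVES + r")(?!\w)", re.IGNORECASE)


def modernize(text):
    """Replace archaic forms with modern ones, keeping an initial capital."""
    def swap(match):
        found = match.group(0)
        new = REPLACEMENTS[found.lower()]
        return new.capitalize() if found[0].isupper() else new

    return PATTERN.sub(swap, text)


if __name__ == "__main__":
    print(modernize("Egli disse che ella era negli occhi de' suoi figli."))
```

The same effect should be achievable in sed with explicit word-boundary
handling; the point is only that the patterns must not match inside longer
words.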

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to ubuntu-keyboard in Ubuntu.
https://bugs.launchpad.net/bugs/1591149

Title:
  Find more modern text for Italian word prediction

Status in ubuntu-keyboard package in Ubuntu:
  New

Bug description:
  The text used for word prediction in Italian [1] is IMHO not very
  suitable for the goal of predicting words typed into computers
  nowadays (especially on phones), for a few reasons:

  - it's very old -- from 1868; particles like "cotesto", "pel", "pei"
    are not used anymore
  - it's mostly written in the "passato remoto" past tense, which is not
    that common in modern speech
  - it talks about history, with ample use of long and rarely used words
  - dialogues are entirely missing

  I think it should be replaced with a modern text that includes
  dialogues. Does this have to be a single book, or can we assemble a
  few different texts together (I'm thinking mostly about adding some
  pieces from newspapers, blogs and short novels)?

  [1] http://bazaar.launchpad.net/~phablet-team/ubuntu-keyboard/trunk/view/head:/plugins/it/src/la_francia_dal_primo_impero.txt
