Hi,

We are using Cassandra at Orange to manage a big sparse matrix on a cluster of 
servers.

On this database we want to run a sparse matrix factorization algorithm.



We need to parallelize this matrix factorization algorithm, for instance by 
computing the factorization model row by row.

So we want to distribute the computation of the rows across the servers.

A natural way to do this would be to run the algorithm on each server, using 
the rows that are stored locally on that server.

As the factorization model is also distributed, there is no need to merge the 
results (no need for a kind of "reduce phase").

So there is no need for Hadoop.

Cassandra and the distributed algorithm running on each server should be 
sufficient.
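
To illustrate why no "reduce phase" is needed, here is a minimal sketch in 
Python of the per-server work. It is not our actual algorithm: the plain 
gradient descent update, the function names and the assumption that the 
column factors are readable from each server are all placeholders chosen for 
the example. The point is only that each local row is fitted independently and 
its result is stored per row.

```python
from typing import Dict, List

Row = Dict[int, float]  # column index -> nonzero value of one sparse row


def update_row_factor(row: Row, col_factors: Dict[int, List[float]],
                      rank: int, lr: float = 0.1, reg: float = 0.01,
                      steps: int = 200) -> List[float]:
    """Fit the latent vector of ONE row, with the column factors held fixed.

    Plain gradient descent on the regularized squared error; a stand-in
    for whatever row-wise update the real algorithm uses.
    """
    f = [0.1] * rank
    for _ in range(steps):
        grad = [reg * x for x in f]
        for j, value in row.items():
            q = col_factors[j]
            err = sum(a * b for a, b in zip(f, q)) - value
            for k in range(rank):
                grad[k] += err * q[k]
        f = [x - lr * g for x, g in zip(f, grad)]
    return f


def process_local_rows(local_rows: Dict[str, Row],
                       col_factors: Dict[int, List[float]],
                       rank: int) -> Dict[str, List[float]]:
    """Runs on one server and touches only the rows stored on that server.

    The output is keyed by row, so it can be written back next to the row
    itself and nothing has to be merged across servers.
    """
    return {key: update_row_factor(row, col_factors, rank)
            for key, row in local_rows.items()}
```

Since the result for a row depends only on that row and on the column 
factors, each server can run process_local_rows on its own data without 
coordinating with the others.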



The problem is that accessing local data is currently not easy with the 
Cassandra API:

- There is a token() function that allows iterating over local rows.

- But this token() function works well only with the one-token-per-server 
partition scheme of Cassandra; with the 256-virtual-token partition scheme, it 
becomes very difficult to access local rows efficiently.

- Unfortunately, it seems that the one-token-per-server partition scheme is not 
recommended and may become deprecated, as the virtual-token scheme is more 
efficient for cluster management.
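
To make the difficulty concrete, here is a small Python sketch that builds the 
CQL queries needed to scan the rows of one server through token(). The table 
name, the key name and the list of owned token ranges are hypothetical inputs 
(the ranges are assumed to have been obtained from the cluster metadata 
beforehand), and the sketch only builds query strings, it does not talk to a 
cluster.

```python
from typing import List, Tuple


def local_range_queries(table: str, key: str,
                        ranges: List[Tuple[int, int]]) -> List[str]:
    """Build one CQL query per owned token range.

    Each range is (start, end], i.e. start exclusive and end inclusive.
    """
    queries = []
    for start, end in ranges:
        base = f"SELECT * FROM {table} WHERE "
        if start < end:
            queries.append(
                f"{base}token({key}) > {start} AND token({key}) <= {end}")
        else:
            # The range wraps around the ring. CQL has no OR, so the scan
            # is split into two queries.
            queries.append(f"{base}token({key}) > {start}")
            queries.append(f"{base}token({key}) <= {end}")
    return queries
```

With one token per server, a node owns a single primary range and one or two 
such queries are enough. With 256 virtual tokens, the same node owns 256 
primary ranges, so at least 256 queries are needed, and the list has to be 
recomputed whenever the ring changes.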



We believe that easy access to local data could be a key feature for 
Cassandra, enabling implicit parallelization strategies for many classes of 
algorithms and classical processes.

To provide this key feature, all that is needed is an easy, transparent and 
sustainable function to access local data (local tables). This function would 
only have to remain compatible with future partition schemes.



Do you think this request could be a priority for Cassandra?

If so, when and how do you plan to provide this feature, so that we can adapt 
our developments?



Many thanks for considering my request,

Best Regards,



Frank Meyer.

Research Engineer

Orange Labs - Lannion


Frank Meyer
France Telecom OLPS/UCE/CRM-DA/PROF (LD128)
2 avenue Pierre Marzin 22307 Lannion Cedex
E-mail : franck.me...@orange.com<mailto:franck.me...@orange-ftgroup.com>
Telephone : +33 (0)2 96 05 28 89
http://www.francetelecom.com/rd


