Stream sstables hosted on a node from client using streaming protocol

2015-05-09 Thread Pierre Devops
Hi guys,

I don't know if it's possible, but I need to export a raw sstable from a
node in client mode via the streaming protocol (the opposite of a bulk
load). Here is what I want to do:


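    // Imports added for readability; the package of LongToken differs
    // between Cassandra versions, so it is left out here.
    import java.net.InetAddress;
    import java.util.Arrays;
    import org.apache.cassandra.config.Config;
    import org.apache.cassandra.dht.Range;
    import org.apache.cassandra.dht.Token;
    import org.apache.cassandra.streaming.StreamPlan;

    public class SSTableExport {   // class name is illustrative
        public static void main(String[] args) throws Exception {
            String targetedNode = args[0];
            String keyspace = args[1];
            String columnFamily = args[2];

            Config.setClientMode(true);
            StreamPlan plan = new StreamPlan("SST Import");
            plan.requestRanges(
                    InetAddress.getByName("127.0.0.1"),
                    InetAddress.getByName(targetedNode),
                    keyspace,
                    // Fetch everything this node handles for this CF
                    Arrays.asList(new Range<Token>(new LongToken(Long.MIN_VALUE),
                                                   new LongToken(Long.MAX_VALUE))),
                    columnFamily);
            plan.execute().get();
            // Expect to receive the sstable(s) somewhere...
        }
    }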
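The single range requested above spans Long.MIN_VALUE to Long.MAX_VALUE. Assuming Cassandra's usual Range convention (left-exclusive, right-inclusive, wrapping around the ring when left >= right), the following standalone sketch models that containment rule on plain long tokens. It is an illustration only: TokenRangeSketch and contains are hypothetical names, and the code does not use org.apache.cassandra.dht.Range.

```java
/**
 * Simplified model of a half-open token range (left, right] on a ring of
 * signed 64-bit tokens. Hypothetical sketch; not Cassandra's Range class.
 */
public final class TokenRangeSketch {

    /** True if token lies in (left, right], wrapping around the ring when left >= right. */
    public static boolean contains(long left, long right, long token) {
        if (left >= right) {
            // Wrapping range; left == right covers the whole ring.
            return token > left || token <= right;
        }
        return token > left && token <= right;
    }

    public static void main(String[] args) {
        long min = Long.MIN_VALUE;
        long max = Long.MAX_VALUE;
        System.out.println(contains(min, max, 0L));       // true
        System.out.println(contains(min, max, max));      // true: right end is inclusive
        System.out.println(contains(min, max, min));      // false: left end is exclusive
        System.out.println(contains(100L, -100L, 500L));  // true: wraps past the end of the ring
    }
}
```

Under this model, (Long.MIN_VALUE, Long.MAX_VALUE] covers every token except Long.MIN_VALUE itself.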



However, now I'm stuck: I don't know how to handle the receiving side in
client mode. Any help is appreciated.


Re: Stream sstables hosted on a node from client using streaming protocol

2015-05-09 Thread Pierre Devops
Thanks Yuki, copying SSTableLoader was the first thing I tried, but without
success.

I checked BulkLoadConnectionFactory
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java)
and I don't see what it provides over DefaultConnectionFactory that would
help me in this case.

Even without a custom connection factory, it already manages to connect to
the node and send a streaming request (I can see it in the Cassandra logs):

    INFO  21:16:25 [Stream #a630d860-f690-11e4-a2d0-adca0d5ee899 ID#0] Creating new streaming plan for SST Import
    INFO  21:16:25 [Stream #a630d860-f690-11e4-a2d0-adca0d5ee899, ID#0] Received streaming plan for SST Import
    INFO  21:16:25 [Stream #a630d860-f690-11e4-a2d0-adca0d5ee899, ID#0] Received streaming plan for SST Import
    INFO  21:16:25 [Stream #a630d860-f690-11e4-a2d0-adca0d5ee899 ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 2 files(4083518 bytes)
    INFO  21:16:25 [Stream #a630d860-f690-11e4-a2d0-adca0d5ee899] Session with /127.0.0.1 is complete
    WARN  21:16:25 [Stream #a630d860-f690-11e4-a2d0-adca0d5ee899] Stream failed
    ERROR 21:16:25 [Stream #a630d860-f690-11e4-a2d0-adca0d5ee899] Streaming error occurred



So it looks like my client receives two messages in its ConnectionHandler
loop
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/streaming/ConnectionHandler.java#L251).
The first one is a PREPARE_MESSAGE type with a StreamSummary indicating the
correct number of files.

The second message, however, fails to deserialize. I debugged and dumped
what was coming from the socket, and it was the sstable data, but I don't
know why deserialization of the message type fails.
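One way a type-prefixed stream can produce a symptom like this: if the handler for a file message does not consume the whole file body, the reader's next "message type" byte comes from inside the sstable data. The sketch below illustrates that failure mode with a made-up framing (a 1-byte type; for FILE, an 8-byte length followed by raw bytes). FramingSketch, the PREPARE/FILE/COMPLETE constants and the layout are assumptions for illustration, not Cassandra's actual streaming wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Toy framing (NOT Cassandra's): 1-byte type, then a type-specific body. */
public final class FramingSketch {
    static final byte PREPARE = 1;
    static final byte FILE = 2;
    static final byte COMPLETE = 3;

    /** Builds a stream of three messages: PREPARE, FILE(payload), COMPLETE. */
    public static byte[] buildStream(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(PREPARE);
        out.writeByte(FILE);
        out.writeLong(payload.length); // FILE body: length, then raw bytes
        out.write(payload);
        out.writeByte(COMPLETE);
        out.flush();
        return bytes.toByteArray();
    }

    /**
     * Reads the first two messages and returns the type byte of the third.
     * If consumePayload is false, the FILE handler reads only the length,
     * so the next "type" byte is really the first byte of the file data.
     */
    public static byte thirdMessageType(byte[] stream, boolean consumePayload) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        in.readByte();                 // PREPARE (no body in this toy format)
        in.readByte();                 // FILE
        long length = in.readLong();
        if (consumePayload) {
            in.readFully(new byte[(int) length]); // a real handler would write these to disk
        }
        return in.readByte();
    }

    public static void main(String[] args) throws IOException {
        byte[] stream = buildStream(new byte[] {42, 7, 9});
        System.out.println(thirdMessageType(stream, true));  // 3 (COMPLETE)
        System.out.println(thirdMessageType(stream, false)); // 42: payload misread as a type
    }
}
```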


Re: Stream sstables hosted on a node from client using streaming protocol

2015-05-10 Thread Pierre Devops
OK, so I know a little more now: it's not doable in client mode at the
moment, because it relies too much on server-side components.

It needs to initialize a ColumnFamilyStore and use an instance of it
afterwards, which would require too much server-side configuration and
initialization.

Secondly, the way it streams is inefficient: it deserializes the streamed
sstable to rebuild a new sstable in SSTableWriter.appendFromStream (needed
to rebuild the index and other components), while I just need to copy the
-Data- file to disk.
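Copying only the -Data- file amounts to writing a known number of raw bytes straight to disk. A minimal sketch, assuming the payload is exactly the Data component's bytes and that its length is known up front (for example from a file header); real streamed payloads may be compressed or limited to the sections covering the requested ranges, so this is not a drop-in replacement. RawDataCopy and copyExactly are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical helper: copy raw bytes to a file without deserializing rows. */
public final class RawDataCopy {

    /** Copies exactly `size` bytes from a blocking channel into `target`. */
    public static long copyExactly(ReadableByteChannel in, long size, Path target) throws IOException {
        try (FileChannel out = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long written = 0;
            while (written < size) {
                // transferFrom may move fewer bytes than requested, so loop.
                long n = out.transferFrom(in, written, size - written);
                if (n <= 0) {
                    // For a blocking source, 0 bytes here means end of stream.
                    throw new EOFException("stream ended after " + written + " of " + size + " bytes");
                }
                written += n;
            }
            return written;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "pretend these are -Data.db bytes".getBytes(StandardCharsets.UTF_8);
        Path tmp = Files.createTempFile("sketch", "-Data.db");
        long n = copyExactly(Channels.newChannel(new ByteArrayInputStream(data)), data.length, tmp);
        System.out.println(n == data.length && Arrays.equals(data, Files.readAllBytes(tmp))); // true
    }
}
```

One practical caveat: if the message header was read through a buffered stream wrapped around the same socket, bytes already sitting in that buffer would be skipped when reading from the raw channel, so header and body should be read through the same source.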

So I think I'm going to provide my own IncomingFileMessage with its own
deserializer.





Re: [discuss] Modernization of Cassandra build system

2015-04-02 Thread Pierre Devops
Hi all,

Not a Cassandra contributor here, but I'm working on the Cassandra sources
too.

This single big Cassandra source root caused me trouble too. First, it is
not easy to import into an IDE: try importing the Cassandra sources into
NetBeans, it's a headache.

It would be great if we had smaller modules/projects, each with its own
POM. It would be much easier to work on a small part of the project, and as
a consequence, I'm sure you would get more external contributions.

I know Cassandra devs are used to the ant build model, but this reminds me
of a thread I opened asking for updated and more complete documentation of
the sstable structures. The answer I got was that it was not needed to
understand how to use Cassandra, and that the only way to learn about it
is to read the code. Because the people working on Cassandra already know
how the sstable structures work, up-to-date documentation is not
considered necessary.
So it will take me a very long time to read and understand all the
serialization code in Cassandra to learn the sstable structure before I can
work on the code. Up-to-date documentation about internals would have given
me the knowledge I need to contribute much more quickly.

Here we have the same problem: we have a complex, non-modular build system,
and core Cassandra devs are used to it, so there's no perceived need to make
it more flexible, even if that could facilitate external contributions.



2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith <belliottsm...@datastax.com>:

> I think the problem is everyone currently contributing is comfortable with
> ant, and as much as it is imperfect, it isn't clear maven is going to be
> better. Having the requisite maven functionality linked under the hood
> doesn't seem particularly preferable to the inverse. The status quo has the
> bonus of zero upheaval for the project and its contributors, though, so it
> would have to be a very clear win to justify the change in my opinion.
>
>
> On Tue, Mar 31, 2015 at 10:24 PM, Łukasz Dywicki wrote:
>
> > Hey Tyler,
> > Thank you very much for coming back. I had already lost faith that I would
> > get a reply. :-) I am fine with code relocations. Moving constants into one
> > place where they cause no circular dependencies is cool; I'm all for doing
> > such a thing.
> >
> > Currently Cassandra uses ant for some maven functionality (such as
> > deploying pom.xml into repositories with dependency information), and it
> > also uses maven-style artifact repositories. This can easily be flipped:
> > maven can call ant tasks for the parts that cannot be handled by existing
> > maven plugins. Here is the simplest example:
> > http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin - you can see an
> > ant task definition embedded in a maven pom.xml.
> >
> > Most things can be done at this moment via maven plugins:
> > apache-rat-plugin:
> > http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11
> > maven-thrift-plugin:
> > http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
> > antlr4-maven-plugin:
> > http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5 or
> > antlr3-maven-plugin:
> > http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2
> > maven-gpg-plugin:
> > http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
> > maven-cobertura-plugin: http://mojo.codehaus.org/cobertura-maven-plugin/
> > (but these days jacoco with java agent instrumentation performs better)
> > .. and so on
> >
> > I have already evaluated the impact, and it is big. Code has to be
> > separated into different source roots. It's not easy even to keep the
> > current artifact structure (cassandra-all, cassandra-thrift and clientutil)
> > because of cyclic dependencies. What I can do is prepare these src roots,
> > with the dependencies declared for them, and push that to my cassandra fork
> > so you will be able to verify it and continue with relocations if you like
> > the new build. Creating new modules (source roots) with maven is simple, so
> > you could possibly extract more than these 3 predefined artifacts/package
> > roots.
> > Just let me know if you are interested.
> >
> > Kind regards,
> > Lukasz
> >
> >
> > > Message written by Tyler Hobbs on 31 Mar 2015, at 21:57:
> > >
> > > Hi Łukasz,
> > >
> > > I'm not very familiar with the build system, but I'll try to respond.
> > >
> > > The