bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-02 Thread Andrew Prendergast
*DESCRIPTION*

When creating a table in all ANSI-SQL compliant RDBMS' the VARCHAR datatype
takes a numeric parameter, however this parameter is generating errors in
CQL3.

*STEPS TO REPRODUCE*

CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR(256)); // emits Bad
Request: line 1:54 mismatched input '(' expecting ')'

CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR); // this works

*SUGGESTED RESOLUTION*

The current fail-fast approach does not create the column so that the user
is 100% clear that the length parameter means nothing to NOSQL.

I would like to propose that the column length be allowed in the grammar
(but ignored by cassandra), allowing better ANSI-SQL client compatibility.


Is CQL3 too strict with bound variables?

2013-03-02 Thread Andrew Prendergast
Just posting this as more of a talking point around comparing the behavior
of classical RDMBS to Cassandra. This bug is fixable over in the JDBC
driver, but look at isolation #2 & exception B:

http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/issues/detail?id=72

It's coming from the server. Should Cassandra be more tolerant of feeding
numbers into VARCHAR columns and vice-versa or is that the job of the
client driver?

ap


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-02 Thread Andrew Prendergast
Hi Ed,

totally get the reasoning behind NOSQL and that it'll never be 100%
ANSI-SQL.. this is more about making the core functionality you do have
compatible with existing clients, otherwise why bother implementing a JDBC
+ SQL interface?

Cassandra is more than capable of accommodating say a VARCHAR(256) or a
VARCHAR(MAX) column, and thus shouldn't throw an exception.

The Ansi spec says:

  ::=
CHARACTER []
  | CHAR []
  | CHARACTER VARYING   
  | CHAR VARYING   
  | VARCHAR   

...


4) If  is omitted, then a  of 1 is implicit.


Right now, any existing JDBC/SQL client that tries to create a table with
string column > 1 character long will not be compatible with Cassandra.

For the sake of compatibility, I propose that the essential stuff (CREATE,
SELECT, INSERT, UPDATE, DELETE) should "accomodate" the most basic of
ANSI-SQL grammar and that the JDBC driver should behave similarly to
existing JDBC drivers.

I'm picking through where that isn't the case over the next few days. With
a few tiny adjustments we can make CQL3 very compatible with an existing
world of rich SQL-based client apps while still remaining faithful to the
NOSQL abstraction.

ap



On Sun, Mar 3, 2013 at 5:50 AM, Edward Capriolo wrote:

> If the syntax effectively does nothing I do not see the point of adding it.
> CQL is never going to be 100% compatible ANSI-SQL dialect.
>
> On Sat, Mar 2, 2013 at 12:19 PM, Michael Kjellman
> wrote:
>
> > Might want to create a Jira ticket at issues.apache.org instead of
> > submitting the bug report thru email.
> >
> > On Mar 2, 2013, at 3:11 AM, "Andrew Prendergast" <
> a...@andrewprendergast.com>
> > wrote:
> >
> > > *DESCRIPTION*
> > >
> > > When creating a table in all ANSI-SQL compliant RDBMS' the VARCHAR
> > datatype
> > > takes a numeric parameter, however this parameter is generating errors
> in
> > > CQL3.
> > >
> > > *STEPS TO REPRODUCE*
> > >
> > > CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR(256)); // emits
> > Bad
> > > Request: line 1:54 mismatched input '(' expecting ')'
> > >
> > > CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR); // this works
> > >
> > > *SUGGESTED RESOLUTION*
> > >
> > > The current fail-fast approach does not create the column so that the
> > user
> > > is 100% clear that the length parameter means nothing to NOSQL.
> > >
> > > I would like to propose that the column length be allowed in the
> grammar
> > > (but ignored by cassandra), allowing better ANSI-SQL client
> > compatibility.
> >
> > Copy, by Barracuda, helps you store, protect, and share all your amazing
> >
> > things. Start today: www.copy.com.
> >
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-04 Thread Andrew Prendergast
Hi Sylvain,

I disagree. Actually, CQL3 is really close to being a faithful subset of
ANSI-SQL, which is rather exciting.

This is just one of a few small adjustments that can be made to the grammar
to make everyone's life easier while still maintaining the spirit of NOSQL.

The semantics of how the grammar is implemented behind the scenes is
unimportant, what matters more is that an an ANSI-like interface should
have ANSI-like behavior where possible, if only for the greater good of
humanity!

Implementing a subset of ANSI is a good thing, changing ANSI-SQL not so
much.

On this specific issue, Cassandra's behavior subsumes what would be
expected of say VARCHAR(256) or VARCHAR(MAX) so throwing an exception to
client apps when they ask for either of these is incorrect.

There are a handful of minor disparities whereby the ANSI spec has been
changed (vs. not implemented) - I'm making a list of the former and I'll
put it into JIRA under one task.

ap




On Mon, Mar 4, 2013 at 7:08 PM, Sylvain Lebresne wrote:

> > When creating a table in all ANSI-SQL compliant RDBMS' the VARCHAR
> datatype
> > takes a numeric parameter, however this parameter is generating errors in
> > CQL3.
> >
>
> CQL3 is *not* ANSI-SQL compliant and will never be. It's not even SQL.
>
>
>
> > I would like to propose that the column length be allowed in the grammar
> > (but ignored by cassandra), allowing better ANSI-SQL client
> compatibility.
> >
>
> I agree with Ed and disagree with your proposal. Allowing a length
> constraint
> but not respecting it would only help people to shoot themselves in the
> foot, as
> they would be led to believe Cassandra does validate the length where it
> doesn't.
>
> If would be acceptable to add the length only if we do respect it. Which is
> something we could do, though adding it "just" to make it close to ANSI-SQL
> honestly doesn't interest me much (adding it because it's useful would be a
> better argument however) because again, CQL is not SQL. In particular, you
> don't model application with CQL the same way you do with SQL, and so using
> a SQL application in CQL without modification is not a good idea. Besides,
> I doubt any non-trivial SQL application could be used as is in CQL, due to
> the
> lack of join for instance.
>
> --
> Sylvain
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Andrew Prendergast
Sylvain,

All I'm trying to do is make Cassandra work with existing ETL, ORM &
desktop tools a bit better.

I think that is a worthy cause.

ap



On Tue, Mar 5, 2013 at 8:20 PM, Sylvain Lebresne wrote:

> > This is just one of a few small adjustments that can be made to the
> grammar
> > to make everyone's life easier while still maintaining the spirit of
> NOSQL.
>
> To be clear, I am *not* necessarily against making CQL3 closer to the
> ANSI-SQL
> as a convenience. But only if that doesn't compromise the language
> "integrity"
> and is justified. Adding a syntax with a well known semantic but without
> implementing said semantic fails that. Adding varchar size limits *with*
> its
> semantic would be acceptable (which is not saying that I personally care
> for
> it).
>
> And just so there is not misunderstanding, let's be clear that CQL3 will
> *never* be a proper subset of ANSI-SQL. Typically, CQL treats INSERT and
> UPDATE
> the same way, which breaks ANSI-SQL (though CQL never pretended being
> ANSI-SQL
> in the first place, so it doesn't break anything really). And that is not
> going
> to change, there is deep technical reason for that.
>
> > The semantics of how the grammar is implemented behind the scenes is
> > unimportant, what matters more is that an an ANSI-like interface should
> > have ANSI-like behavior where possible,
>
> So you are saying the most important part of the ANSI-SQL specification is
> the
> syntax grammar? I'll have to disagree.
>
> > Implementing a subset of ANSI is a good thing, changing ANSI-SQL not so
> > much.
>
> I have to say that I fail to see how not supporting joins (which we're not
> going to support any time soon, if ever, unless maybe you are suggesting
> supporting the join syntax but with a random semantic?) fails into "CQL3 is
> a
> subset of ANSI-SQL" but not supporting the size limit syntax of varchar
> fails
> into "changing ANSI-SQL".
>
> In fact I would say that not supporting the syntax in the first place is
> making
> it a subset, while supporting the syntax without the correct semantic (your
> suggestion) is breaking the ANSI spec (after all, the spec *does* specify
> that
> "the length in bits of the bit string is fixed and is the value of
> "
> (SQL 1992, section 6.1 )).
>
> --
> Sylvain
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Andrew Prendergast
Hi Tristan,

I've spent the last couple weekends testing the CRUD DML stuff and its very
close to meeting that objective (although NULL handling needs some tuning).

The main hiccups are in the JDBC driver which I have been working through
with Rick - once he accepts my patches it'll be pretty solid in terms of
cross-platform compatibility.

On the DDL, I personally have a need for similar compatibility. One app I'm
working on  programmatically creates the schema for a rather big ETL
environment. It includes a very nice abstraction that creates databases and
tables to accommodate tuples as they pass through the pipeline and behaves
the same regardless of which DBMS is being used as the storage engine.

This is possible because it turns out there is a subset of DDL that is
common to all of the DBMS platforms and it would be very useful to see that
in Cassandra.

ap




On Tue, Mar 5, 2013 at 8:26 PM, Tristan Tarrant
wrote:

> On Tue, Mar 5, 2013 at 10:20 AM, Sylvain Lebresne  >wrote:
>
> > > This is just one of a few small adjustments that can be made to the
> > grammar
> > > to make everyone's life easier while still maintaining the spirit of
> > NOSQL.
> >
> > To be clear, I am *not* necessarily against making CQL3 closer to the
> > ANSI-SQL
> > as a convenience. But only if that doesn't compromise the language
> > "integrity"
> > and is justified. Adding a syntax with a well known semantic but without
> >
>
> To me database DDL (such as the CREATE statement we are talking about) is
> always going to be handled in a custom fashion by applications.
> While ANSI SQL compatibility for CRUD operations is a great objective, I
> don't think it really matters for DDL.
>
> Tristan
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Andrew Prendergast
*>
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less
*
Thanks for the link Ed, I'm aware of all that.

*> Does your the tool handle the fact that foreign keys do not work?
*
yes. It doesn't use foreign keys or any constraints, they slow things down.

*> how are your dealing with the fact that a "primary key" in cassandra is
nothing like a "primary key" in a RDBMS?
*
locality preserving sequences & natural keys. There are no range queries.

*> Generally under the impression that CRUD tools that auto-generate CQL
schema's can give someone the rope to hang themselves.
*
For those of us that know what we are doing and have had to put up with SQL
based ETL, refining CQL3 would be life changing and ease the transition.

ap




On Wed, Mar 6, 2013 at 8:08 AM, Edward Capriolo wrote:

>
> http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less
>
> Does your the tool handle the fact that foreign keys do not work? Or for
> that matter, how are your dealing with the fact that a "primary key" in
> cassandra is nothing like a "primary key" in a RDBMS?
>
> Generally under the impression that CRUD tools that auto-generate CQL
> schema's can give someone the rope to hang themselves.
>
> On Tue, Mar 5, 2013 at 3:46 PM, Andrew Prendergast <
> a...@andrewprendergast.com
> > wrote:
>
> > Hi Tristan,
> >
> > I've spent the last couple weekends testing the CRUD DML stuff and its
> very
> > close to meeting that objective (although NULL handling needs some
> tuning).
> >
> > The main hiccups are in the JDBC driver which I have been working through
> > with Rick - once he accepts my patches it'll be pretty solid in terms of
> > cross-platform compatibility.
> >
> > On the DDL, I personally have a need for similar compatibility. One app
> I'm
> > working on  programmatically creates the schema for a rather big ETL
> > environment. It includes a very nice abstraction that creates databases
> and
> > tables to accommodate tuples as they pass through the pipeline and
> behaves
> > the same regardless of which DBMS is being used as the storage engine.
> >
> > This is possible because it turns out there is a subset of DDL that is
> > common to all of the DBMS platforms and it would be very useful to see
> that
> > in Cassandra.
> >
> > ap
> >
> >
> >
> >
> > On Tue, Mar 5, 2013 at 8:26 PM, Tristan Tarrant
> > wrote:
> >
> > > On Tue, Mar 5, 2013 at 10:20 AM, Sylvain Lebresne <
> sylv...@datastax.com
> > > >wrote:
> > >
> > > > > This is just one of a few small adjustments that can be made to the
> > > > grammar
> > > > > to make everyone's life easier while still maintaining the spirit
> of
> > > > NOSQL.
> > > >
> > > > To be clear, I am *not* necessarily against making CQL3 closer to the
> > > > ANSI-SQL
> > > > as a convenience. But only if that doesn't compromise the language
> > > > "integrity"
> > > > and is justified. Adding a syntax with a well known semantic but
> > without
> > > >
> > >
> > > To me database DDL (such as the CREATE statement we are talking about)
> is
> > > always going to be handled in a custom fashion by applications.
> > > While ANSI SQL compatibility for CRUD operations is a great objective,
> I
> > > don't think it really matters for DDL.
> > >
> > > Tristan
> > >
> >
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Andrew Prendergast
Totally agree, simple is good.

I want to refine the interface - the storage engine is fabulous as-is.

ap




On Wed, Mar 6, 2013 at 8:29 AM, Edward Capriolo wrote:

> Not to say that you can not do it. Or that it is impossible to do
> correctly, but currently Cassandra does not allow it's validation to accept
> parameters per column. IE you can set a column to be varchar UTF8Type, or
> int int32Type but you CAN'T attach more properties to that type, such as
> the size of the text or the integer.
>
> I am very wary of Cassandra adding anymore schema. I signed up for a
> schema-LESS database. If schema can be added that is not conflicting with
> the original use cases so be it. However the latest round of "schema" has
> caused COMPACT TABLES and CQL tables to be very different and essentially
> not compatible with each other.
>
> With schema and cassandra less is more.
>
> On Tue, Mar 5, 2013 at 4:08 PM, Edward Capriolo  >wrote:
>
> >
> >
> http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less
> >
> > Does your the tool handle the fact that foreign keys do not work? Or for
> > that matter, how are your dealing with the fact that a "primary key" in
> > cassandra is nothing like a "primary key" in a RDBMS?
> >
> > Generally under the impression that CRUD tools that auto-generate CQL
> > schema's can give someone the rope to hang themselves.
> >
> > On Tue, Mar 5, 2013 at 3:46 PM, Andrew Prendergast <
> > a...@andrewprendergast.com> wrote:
> >
> >> Hi Tristan,
> >>
> >> I've spent the last couple weekends testing the CRUD DML stuff and its
> >> very
> >> close to meeting that objective (although NULL handling needs some
> >> tuning).
> >>
> >> The main hiccups are in the JDBC driver which I have been working
> through
> >> with Rick - once he accepts my patches it'll be pretty solid in terms of
> >> cross-platform compatibility.
> >>
> >> On the DDL, I personally have a need for similar compatibility. One app
> >> I'm
> >> working on  programmatically creates the schema for a rather big ETL
> >> environment. It includes a very nice abstraction that creates databases
> >> and
> >> tables to accommodate tuples as they pass through the pipeline and
> behaves
> >> the same regardless of which DBMS is being used as the storage engine.
> >>
> >> This is possible because it turns out there is a subset of DDL that is
> >> common to all of the DBMS platforms and it would be very useful to see
> >> that
> >> in Cassandra.
> >>
> >> ap
> >>
> >>
> >>
> >>
> >> On Tue, Mar 5, 2013 at 8:26 PM, Tristan Tarrant
> >> wrote:
> >>
> >> > On Tue, Mar 5, 2013 at 10:20 AM, Sylvain Lebresne <
> sylv...@datastax.com
> >> > >wrote:
> >> >
> >> > > > This is just one of a few small adjustments that can be made to
> the
> >> > > grammar
> >> > > > to make everyone's life easier while still maintaining the spirit
> of
> >> > > NOSQL.
> >> > >
> >> > > To be clear, I am *not* necessarily against making CQL3 closer to
> the
> >> > > ANSI-SQL
> >> > > as a convenience. But only if that doesn't compromise the language
> >> > > "integrity"
> >> > > and is justified. Adding a syntax with a well known semantic but
> >> without
> >> > >
> >> >
> >> > To me database DDL (such as the CREATE statement we are talking about)
> >> is
> >> > always going to be handled in a custom fashion by applications.
> >> > While ANSI SQL compatibility for CRUD operations is a great
> objective, I
> >> > don't think it really matters for DDL.
> >> >
> >> > Tristan
> >> >
> >>
> >
> >
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Andrew Prendergast
I think we agree then!

The fact that people misuse Cassandra is a whole separate issue and
shouldn't hold the rest of us back. Adding obfuscation to CQL3 isn't the
solution.

Give me a few more days to put a complete list together of what I think
could be refined in CQL3 then we can pick this up again next week.

I expect it will be a robust discussion :)

ap



On Wed, Mar 6, 2013 at 8:38 AM, Edward Capriolo wrote:

> yes. It doesn't use foreign keys or any constraints, they slow things down.
>
> Exactly what you do not want. Check the history of the "features" that do
> read before write. Counters, the old read before write secondary indexes,
> the new collection functions that impose read before write.
>
> Once people start using them they send an email to cassandra mailing list
> that goes like this:
> "
> Subject: Why is Cassandra so slow?
> Message: I am using secondary indexes and as I write data I seem my
> READ_STAGE is filling up. What is going on? I thought cassandra was faster
> then MySQL? Once my database gets bigger then X GB it slows to a crawl.
> Please help.
> "
> If we make tools that design anti-pattern schema's people will use them, no
> one wins.
>
>
> On Tue, Mar 5, 2013 at 4:30 PM, Andrew Prendergast <
> a...@andrewprendergast.com
> > wrote:
>
> > *>
> >
> >
> http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less
> > *
> > Thanks for the link Ed, I'm aware of all that.
> >
> > *> Does your the tool handle the fact that foreign keys do not work?
> > *
> > yes. It doesn't use foreign keys or any constraints, they slow things
> down.
> >
> > *> how are your dealing with the fact that a "primary key" in cassandra
> is
> > nothing like a "primary key" in a RDBMS?
> > *
> > locality preserving sequences & natural keys. There are no range queries.
> >
> > *> Generally under the impression that CRUD tools that auto-generate CQL
> > schema's can give someone the rope to hang themselves.
> > *
> > For those of us that know what we are doing and have had to put up with
> SQL
> > based ETL, refining CQL3 would be life changing and ease the transition.
> >
> > ap
> >
> >
> >
> >
> > On Wed, Mar 6, 2013 at 8:08 AM, Edward Capriolo  > >wrote:
> >
> > >
> > >
> >
> http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less
> > >
> > > Does your the tool handle the fact that foreign keys do not work? Or
> for
> > > that matter, how are your dealing with the fact that a "primary key" in
> > > cassandra is nothing like a "primary key" in a RDBMS?
> > >
> > > Generally under the impression that CRUD tools that auto-generate CQL
> > > schema's can give someone the rope to hang themselves.
> > >
> > > On Tue, Mar 5, 2013 at 3:46 PM, Andrew Prendergast <
> > > a...@andrewprendergast.com
> > > > wrote:
> > >
> > > > Hi Tristan,
> > > >
> > > > I've spent the last couple weekends testing the CRUD DML stuff and
> its
> > > very
> > > > close to meeting that objective (although NULL handling needs some
> > > tuning).
> > > >
> > > > The main hiccups are in the JDBC driver which I have been working
> > through
> > > > with Rick - once he accepts my patches it'll be pretty solid in terms
> > of
> > > > cross-platform compatibility.
> > > >
> > > > On the DDL, I personally have a need for similar compatibility. One
> app
> > > I'm
> > > > working on  programmatically creates the schema for a rather big ETL
> > > > environment. It includes a very nice abstraction that creates
> databases
> > > and
> > > > tables to accommodate tuples as they pass through the pipeline and
> > > behaves
> > > > the same regardless of which DBMS is being used as the storage
> engine.
> > > >
> > > > This is possible because it turns out there is a subset of DDL that
> is
> > > > common to all of the DBMS platforms and it would be very useful to
> see
> > > that
> > > > in Cassandra.
> > > >
> > > > ap
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Mar 5, 2013 at 8:26 PM, Tristan Tarrant
> > > > wrote:
> > > >
> > > > > On Tue, Mar