Re: [DISCUSS] Updating the C* website design

2020-08-21 Thread Mick Semb Wever
> As part of the work, I think all content files should be moved to
> cassandra/doc. This would give a clear separation of concerns:
>  - cassandra/doc contains the material (asciidoc) that is converted to the
> website content.
>  - cassandra-website hosts the live content and contains all the UI
> resources (html templates, css, js, images) that style the content.
>
> Document authors only need to touch one repository to make content edits.

How would this work when you have one version of cassandra-website and
multiple versions of the in-tree docs?

The in-tree docs (cassandra/doc/) are tied to each C* version: folks want to
look up the documentation specific to the version they are using, while the
cassandra-website docs cover everything not specific to a C* version.

Multiple versions of the in-tree docs are also hosted underneath the
cassandra-website docs (see the `asf-staging` and `asf-site` branches). Putting
this in the main repo would make clones bigger. There is also the issue of
Antora being under the MPL: we have to be strict about not distributing any
of its files in any of our releases.

Instead, I would have suggested moving as much of the non-version-specific
content as possible to cassandra-website…

regards,
Mick


Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-21 Thread Jason Rutherglen
> About space efficiency, one of the biggest drawbacks of SASI was the huge
> space required for the index structure when using CONTAINS logic, because of the
> decomposition of text columns into n-grams. Will SAI suffer from the same
> issue in future iterations?

SAI does not have specific n-gram support at the moment, though that may be
added with tokenizers. N-grams do indeed grow the index; that is a user
decision, trading more disk space for faster queries.

On Tue, Aug 18, 2020 at 6:05 AM DuyHai Doan wrote:
>
> Thank you Zhao Yang for starting this topic
>
> After reading the short design doc, I have a few questions
>
> 1) SASI was pretty inefficient at indexing wide partitions because the index
> structure only retains the partition token, not the clustering columns. As
> per the design doc, SAI has a row ID mapping to partition offset; can we hope
> that indexing wide partitions will be more efficient with SAI? One detail that
> worries me is that at the beginning of the design doc, it is said that the
> matching rows are post-filtered while scanning the partition. Can you
> confirm or refute that SAI is efficient with wide partitions and provides
> the partition offsets to the matching rows?
>
> 2) About space efficiency, one of the biggest drawbacks of SASI was the huge
> space required for the index structure when using CONTAINS logic, because of the
> decomposition of text columns into n-grams. Will SAI suffer from the same
> issue in future iterations? I'm anticipating a bit.
>
> 3) If I'm querying using SAI and providing the complete partition key, will it
> be more efficient than querying without the partition key? In other words, does
> SAI provide any optimisation when the partition key is specified?
>
> Regards
>
> Duy Hai DOAN
>
> On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever wrote:
>
> > >
> > > We are looking forward to the community's feedback and suggestions.
> > >
> >
> >
> > What comes immediately to mind is testing requirements. It has been
> > mentioned already that the project's testability and QA guidelines are
> > inadequate to successfully introduce new features and refactorings to the
> > codebase. During the 4.0 beta phase this was intended to be addressed, i.e.
> > defining more specific QA guidelines for 4.0-rc. This would be an important
> > step towards QA guidelines for all changes and CEPs post-4.0.
> >
> > Questions from me
> >  - How will this be tested, how will its QA status and lifecycle be
> > defined? (per above)
> >  - With existing C* code needing to be changed, what is the proposed plan
> > for making those changes while maintaining QA, e.g. are there separate QA
> > cycles planned for altering the SPI before adding a new SPI implementation?
> >  - Despite being out of scope, it would be nice to have some idea from the
> > CEP author of when users might still choose afresh 2i or SASI over SAI.
> >  - Who fills the roles involved? Who are the contributors in this DataStax
> > team? Who is the shepherd? Are there other stakeholders willing to be
> > involved?
> >  - Is there a preference to use gdoc instead of the project's wiki, and
> > why? (The CEP process suggests a wiki page, and feedback on why another
> > approach is considered better helps evolve the CEP process itself.)
> >
> > cheers,
> > Mick
> >




Re: [DISCUSS] Updating the C* website design

2020-08-21 Thread Lorina Poland
Thanks for the comments, Mick.

Yes, I think you are correct in what you say about versioned vs.
non-versioned docs for the website. It's such an obvious point that I can
only say I must have been half-asleep when Anthony and I discussed the
topic. (Maybe he was, too!)

Since Antora is one of the static site generators known for generating
content from multiple repository sources, I think I now see the way forward
more clearly, and will go back to work on it.

Lorina



Re: [DISCUSS] Updating the C* website design

2020-08-21 Thread Rahul Singh
Folks,

I applaud the choice of Antora for documentation, but I’m not sure it is the
best choice for generating an appealing site.

Antora’s self-professed strength is in technical documentation. Do we want to
stick to a “documentation” / utility look for the front-facing site or for a
blog?

https://gitlab.com/antora/antora/-/issues/444

I don’t want to rehash any conclusions on choosing Antora for docs or on
whether AsciiDoc is the right choice for writing documentation.

Could we consider using Gatsby or something similar for the front-facing
5-10 pages plus the blog? For example, SkyWalking uses VuePress.

We can use AsciiDoc as the common format while using Antora for the docs and
something else for the rest of the content
(https://www.gatsbyjs.com/plugins/gatsby-transformer-asciidoc/).

Something like Gatsby can use both Markdown and AsciiDoc, so we can migrate
from one to the other while still using the same tooling.

Just some thoughts; would love feedback!

rahul.xavier.si...@gmail.com

http://cassandra.link
The Apache Cassandra Knowledge Base.
On Jul 29, 2020, 1:28 PM -0400, M Brandon Williams , wrote:
>
> web


Re: [DISCUSS] Updating the C* website design

2020-08-21 Thread Rahul Singh
It seems even Antora uses another SSG, Middleman, for its “marketing”
home page.

https://gitlab.com/antora/antora.org

Whether the convenience of keeping both content and docs in one SSG for code
maintenance is compatible with the aesthetic / content / taxonomy strategy
needed for site visitors, we’ll find out soon enough.



