Cassandra site broken links

2021-11-06 Thread Michael Shuler
(Sending to dev@ which seems a better place to discuss; updated subject. 
Thanks OP!)


I ran a couple of link-checking tools on the site and there are lots more 
problems than the couple noted below. This seems like a good task for a 
non-dev to make a substantial project impact. Muffet [0] seemed the 
quickest way to get some decent output. I grabbed the v2.4.4 binary 
release [1], untarred it (tar xzvf ...), and ran:


$ ./muffet https://cassandra.apache.org/ \
 | tee -a cassandra.apache.org_muffet.log.txt

result (2950 lines):
https://12.am/tmp/cassandra.apache.org_muffet.log.txt

$ egrep '^\s4' cassandra.apache.org_muffet.log.txt \
 | wc -l
841
$ egrep '^\sid #' cassandra.apache.org_muffet.log.txt \
 | wc -l
1401
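
For anyone picking this up, here's a quick way to group the failures by 
type. This is a rough sketch that assumes muffet's indented "status/error 
+ URL" output lines, the same ones the egrep counts above match:

$ grep -E '^[[:space:]]' cassandra.apache.org_muffet.log.txt \
  | awk '{print $1}' | sort | uniq -c | sort -rn | head

That should print a count per error class (4xx status codes, "id" anchor 
failures, etc.), which helps prioritize the fixes.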

[0] https://github.com/raviqqe/muffet
[1] https://github.com/raviqqe/muffet/releases

Kind regards,
Michael

On 11/5/21 4:09 PM, Greg Stein wrote:

see below:

---------- Forwarded message ---------
From: Hubert Kulas
Date: Fri, Nov 5, 2021 at 1:29 PM
Subject: Not working links
To: webmas...@apache.org


Hi,

I am writing my thesis about big data and I was doing some research 
about real-world use cases of Cassandra. While doing that, I found that 
clicking "read more" under 'Coursera' leads to the DataStax website, 
where we are greeted with a "You do not have access to view this page" 
message. To reproduce it, just go to 
https://cassandra.apache.org/_/case-studies.html, find Coursera, and 
click "read more". Then, while trying to find a way to contact you about 
the problem, I encountered another problem on this part of the website: 
https://cassandra.apache.org/doc/3.11.5/contactus.html

Clicking the icon leads to https://cassandra.apache.org/feed.xml, which 
gives a 404 Not Found message.

[attached screenshot: 2021-11-05_19h26_44.png]

Best Regards,
Hubert Kulas


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Cassandra site broken links

2021-11-06 Thread Michael Shuler
FYI - I'm going to try to slow down the checks, since I just noticed a 
bunch of the 4xx errors are "HTTP 429 Too Many Requests"
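
One way to slow it down, assuming the --max-connections, 
--max-connections-per-host and --timeout options are available in the 
v2.4.4 binary (worth double-checking with ./muffet --help):

$ ./muffet --max-connections=16 --max-connections-per-host=2 \
   --timeout=30 https://cassandra.apache.org/ \
  | tee cassandra.apache.org_muffet.log.txt

Fewer parallel requests per host should keep external sites from replying 
with 429s and inflating the 4xx count.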


Kind regards,
Michael




Re: Cassandra site broken links

2021-11-06 Thread Michael Shuler

I overwrote the result link - much better, no more 429s.

https://12.am/tmp/cassandra.apache.org_muffet.log.txt

- lots of page anchor problems
- quite a few busted links
- quite a few hosts that are gone
- one link timeout
- (a few "error" reports are each 200s, just headers)

$ egrep '^\sid #' c*.log.txt |wc -l
1416
$ egrep '^\s4' c*.log.txt |wc -l
55
$ egrep '^\slookup' c*.log.txt |wc -l
20
$ egrep '^\stimeout' c*.log.txt |wc -l
1
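
To turn the log into an actionable list for whoever picks up the website 
task, something along these lines should pair each failure with the page 
it was found on (again assuming the flush-left page URL followed by 
indented failure lines):

$ awk '/^[^[:space:]]/ {page=$0} /^[[:space:]]/ {print page "\t" $0}' \
   c*.log.txt | sort | head

Grouping by page makes it easy to hand out chunks of anchor fixes.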

Warm regards,
Michael




Re: [DISCUSS] Releasable trunk and quality

2021-11-06 Thread Ekaterina Dimitrova
Thank you Josh.

“I think it would be helpful if we always ran the repeated test jobs at
CircleCI when we add a new test or modify an existing one. Running those
jobs, when applicable, could be a requirement before committing. This
wouldn't help us when the changes affect many different tests or we are not
able to identify the tests affected by our changes, but I think it could
have prevented many of the recently fixed flakies.”

I would also love to see verification by running new tests in a loop,
before adding them to the code, happen more often. A few of us mentioned it
in this discussion as a good method we already use successfully, so I just
wanted to raise it again so it doesn’t slip off the list. :-)
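
Something like this locally is what I mean, for example (the test class
and method names here are just placeholders):

$ for i in $(seq 1 100); do
    ant testsome -Dtest.name=org.apache.cassandra.cql3.SomeNewTest \
      -Dtest.methods=testNewBehaviour || { echo "failed on run $i"; break; }
  done

The repeated-run jobs at CircleCI do the same thing at scale, so a local
loop before pushing plus the CircleCI jobs before committing covers both.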

Happy weekend everyone!

Best regards,
Ekaterina


On Fri, 5 Nov 2021 at 11:30, Joshua McKenzie  wrote:

> To checkpoint this conversation and keep it going, the ideas I see
> in-thread (light editorializing by me):
> 1. Blocking PR merge on CI being green (viable for single branch commits,
> less so for multiple)
> 2. A change in our expected culture of "if you see something, fix
> something" when it comes to test failures on a branch (requires stable
> green test board to be viable)
> 3. Clearer merge criteria and potentially updates to circle config for
> committers in terms of "which test suites need to be run" (notably,
> including upgrade tests)
> 4. Integration of model and property based fuzz testing into the release
> qualification pipeline at least
> 5. Improvements in project dependency management, most notably in-jvm dtest
> API's, and the release process around that
>
> So a) Am I missing anything, and b) Am I getting anything wrong in the
> summary above?
>
> On Thu, Nov 4, 2021 at 9:01 AM Andrés de la Peña  wrote:
>
> > Hi all,
> >
> > > we already have a way to confirm flakiness on circle by running the test
> > > repeatedly N times. Like 100 or 500. That has proven to work very well
> > > so far, at least for me. #collaborating #justfyi
> >
> >
> > I think it would be helpful if we always ran the repeated test jobs at
> > CircleCI when we add a new test or modify an existing one. Running those
> > jobs, when applicable, could be a requirement before committing. This
> > wouldn't help us when the changes affect many different tests or we are not
> > able to identify the tests affected by our changes, but I think it could
> > have prevented many of the recently fixed flakies.
> >
> >
> > > On Thu, 4 Nov 2021 at 12:24, Joshua McKenzie  wrote:
> >
> > > >
> > > > we noticed CI going from a
> > > > steady 3-ish failures to many and it's getting fixed. So we're moving in
> > > > the right direction imo.
> > > >
> > > An observation about this: there's tooling and technology widely in use to
> > > help prevent ever getting into this state (to Benedict's point: blocking
> > > merge on CI failure, or nightly tests and reverting regression commits,
> > > etc). I think there's significant time and energy savings for us in using
> > > automation to be proactive about the quality of our test boards rather than
> > > reactive.
> > >
> > > I 100% agree that it's heartening to see that the quality of the codebase
> > > is improving as is the discipline / attentiveness of our collective
> > > culture. That said, I believe we still have a pretty fragile system when it
> > > comes to test failure accumulation.
> > >
> > > On Thu, Nov 4, 2021 at 2:46 AM Berenguer Blasi <berenguerbl...@gmail.com> wrote:
> > >
> > > > I agree with David. CI has been pretty reliable besides the random
> > > > jenkins going down or timeout. The same 3 or 4 tests were the only flaky
> > > > ones in jenkins and Circle was very green. I bisected a couple failures
> > > > to legit code errors, David is fixing some more, others have as well, etc
> > > >
> > > > It is good news imo as we're just getting to learn that our CI post 4.0
> > > > is reliable, and we need to start treating it as such and paying attention
> > > > to its reports. Not perfect, but reliable enough that it would have
> > > > prevented those bugs getting merged.
> > > >
> > > > In fact we're having this conversation bc we noticed CI going from a
> > > > steady 3-ish failures to many and it's getting fixed. So we're moving in
> > > > the right direction imo.
> > > >
> > > > On 3/11/21 19:25, David Capwell wrote:
> > > > >> It’s hard to gate commit on a clean CI run when there’s flaky tests
> > > > > I agree, this is also why so much effort was done in 4.0 release to
> > > > > remove as much as possible.  Just over 1 month ago we were not really
> > > > > having a flaky test issue (outside of the sporadic timeout issues; my
> > > > > circle ci runs were green constantly), and now the “flaky tests” I see are
> > > > > all actual bugs (been root causing 2 out of the 3 I reported) and some (not
> > > > > all) of the flakyness was triggered by recent changes in the past month.
> > > > >