meonkeys commented on code in PR #56:
URL: https://github.com/apache/fineract-site/pull/56#discussion_r2963142693


##########
scripts/run_whimsy_checks.rb:
##########
@@ -0,0 +1,73 @@
+#!/usr/bin/env ruby
+# Extracted from https://github.com/apache/whimsy/blob/master/tools/site-scan.rb
+# Only includes site parsing logic - no ASF/LDAP/committee dependencies
+
+require 'net/http'
+require 'nokogiri'
+require 'uri'
+require 'json'
+
+# Copied directly from site-scan.rb
+def squash(text)
+  text.scrub.gsub(/[[:space:]]+/, ' ').strip
+end
+
+# Copied directly from site-scan.rb
+def get_link_text(anode)
+  bits = []
+  anode.traverse do |node|
+    if node.name == 'text'
+      bits << node.text unless node.parent.name == 'span' and
+        node.parent.attribute('class')&.value&.end_with?('sr-only')
+    end
+  end
+  squash(bits.join(' '))
+end
+
+# Copied from sitestandards.rb COMMON_CHECKS patterns
+CHECKS = {
+  'foundation'  => { url: /apache\.org/,                              text: nil },
+  'license'     => { url: /^https?:\/\/.*apache\.org\/licenses\/?$/,  text: /^license$/i },
+  'thanks'      => { url: nil, text: /^(thanks|sponsors|thanks to our sponsors)$/i },
+  'security'    => { url: nil, text: /^security$/i },
+  'sponsorship' => { url: nil, text: /^(sponsorship|sponsor|donate)$/i },
+  'privacy'     => { url: nil, text: /^privacy$/i },
+  'events'      => { url: /apache\.org\/events\/current-event/, text: nil },
+}

Review Comment:
   Other ideas (take them or leave them, totally up to you):
   
   1. add a unit test. That'll help if we ever need to maintain `scripts/run_whimsy_checks.rb`
   2. instead of, or in addition to, `scripts/run_whimsy_checks.rb`, make a new workflow that runs `curl https://whimsy.apache.org/public/site-scan.json | jq .fineract.errors` once a day, logs any errors, and fails if there are any
   3. maybe pull more/all of `lib/whimsy/sitestandards.rb` into our repo? I'm skeptical that the abbreviated `CHECKS` in `scripts/run_whimsy_checks.rb` are working
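
   A minimal sketch of idea 1 using Minitest (a gem bundled with Ruby). It copies `squash` and `CHECKS` from the diff so it runs standalone; a real test would load them from `scripts/run_whimsy_checks.rb` instead, which assumes the script's scanning code is guarded so that loading it does not fetch the site. `link_passes?` is a hypothetical matcher (a link passes when every non-nil pattern matches) and may not be what the rest of the 73-line script does. Pinning down which link texts the abbreviated `CHECKS` accept would also help with idea 3.

   ```ruby
   require 'minitest/autorun'

   # Copied from scripts/run_whimsy_checks.rb so this sketch runs on its own.
   def squash(text)
     text.scrub.gsub(/[[:space:]]+/, ' ').strip
   end

   CHECKS = {
     'foundation'  => { url: /apache\.org/,                              text: nil },
     'license'     => { url: /^https?:\/\/.*apache\.org\/licenses\/?$/,  text: /^license$/i },
     'thanks'      => { url: nil, text: /^(thanks|sponsors|thanks to our sponsors)$/i },
     'security'    => { url: nil, text: /^security$/i },
     'sponsorship' => { url: nil, text: /^(sponsorship|sponsor|donate)$/i },
     'privacy'     => { url: nil, text: /^privacy$/i },
     'events'      => { url: /apache\.org\/events\/current-event/, text: nil },
   }

   # Hypothetical matcher: a link satisfies a check when every non-nil pattern
   # matches. The script's real matching logic may differ.
   def link_passes?(check_name, href, text)
     check = CHECKS.fetch(check_name)
     (check[:url].nil? || check[:url].match?(href)) &&
       (check[:text].nil? || check[:text].match?(squash(text)))
   end

   class RunWhimsyChecksTest < Minitest::Test
     def test_squash_collapses_unicode_whitespace
       # [[:space:]] also matches the non-breaking space U+00A0
       assert_equal 'Thanks to our sponsors', squash(" Thanks\n\tto  our\u00a0sponsors ")
     end

     def test_license_needs_both_url_and_text
       assert link_passes?('license', 'https://www.apache.org/licenses/', 'License')
       refute link_passes?('license', 'https://www.apache.org/licenses/', 'Apache License')
       refute link_passes?('license', 'https://example.com/licenses/', 'License')
     end

     def test_thanks_accepts_only_listed_phrasings
       href = 'https://www.apache.org/foundation/thanks.html'
       assert link_passes?('thanks', href, 'Thanks to our Sponsors')
       refute link_passes?('thanks', href, 'Thanks!')
     end

     def test_events_matches_on_url_only
       assert link_passes?('events', 'https://www.apache.org/events/current-event.html', 'anything')
     end
   end
   ```

   With these patterns, a link reading `Apache License` would not satisfy the `license` check, since `/^license$/i` only accepts the bare word.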
   
   Finally, I'm confused that <https://whimsy.apache.org/site/project/fineract> shows red boxes (failed checks) but `curl https://whimsy.apache.org/public/site-scan.json | jq .fineract.errors` returns `[]`. I asked for help with that in `#whimsy` on <https://the-asf.slack.com>. `license` and `thanks` in that JSON output are both `null`... are those what the red boxes at <https://whimsy.apache.org/site/project/fineract> represent?
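
   A rough sketch of idea 2 in Ruby, assuming each project's entry in `site-scan.json` is a JSON object with an `errors` array (as the `jq .fineract.errors` output suggests). It exits non-zero when `errors` is non-empty and, to help with the red-box question, also logs every key whose value is `null`. Whether `null` is what the red boxes mean is still unconfirmed. The script path `scripts/check_site_scan.rb` is hypothetical, and a scheduled workflow would still be needed to run it daily.

   ```ruby
   #!/usr/bin/env ruby
   # Hypothetical scripts/check_site_scan.rb: like
   #   curl https://whimsy.apache.org/public/site-scan.json | jq .fineract.errors
   # but exits with status 1 when the project has errors.
   require 'json'
   require 'net/http'
   require 'uri'

   SITE_SCAN_URL = URI('https://whimsy.apache.org/public/site-scan.json')

   # Download and parse site-scan.json; raises unless the response is 2xx.
   def fetch_scan(url = SITE_SCAN_URL)
     response = Net::HTTP.get_response(url)
     response.value # raises a Net::HTTP* exception for non-2xx responses
     JSON.parse(response.body)
   end

   # Keys whose value is JSON null in a project's entry (e.g. license, thanks).
   def null_checks(entry)
     entry.select { |_key, value| value.nil? }.keys.sort
   end

   # Log errors and null keys to stderr; return 0 only if there are no errors.
   def report(scan, project)
     entry = scan[project]
     if entry.nil?
       warn "#{project}: not found in site-scan.json"
       return 1
     end
     errors = Array(entry['errors'])
     errors.each { |error| warn "#{project} error: #{error}" }
     nulls = null_checks(entry)
     warn "#{project} null keys: #{nulls.join(', ')}" unless nulls.empty?
     errors.empty? ? 0 : 1
   end

   # Usage: ruby scripts/check_site_scan.rb fineract
   exit(report(fetch_scan, ARGV[0])) if __FILE__ == $PROGRAM_NAME && ARGV.size == 1
   ```

   With no argument the script only defines the helpers, so `report` and `null_checks` can be exercised against a canned hash without touching the network.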



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
