It’s broken again. This is the third breakage I have reported in the past couple of years.
Is there some sort of alert or CI test we could set up to catch or prevent this going forward?

> On Dec 21, 2025, at 1:35 PM, Gengliang Wang <[email protected]> wrote:
>
> Hi all,
>
> The crawler issue has been identified and fixed.
>
> The root cause was that the crawler fails when the latest result contains
> less than 90% of the previous result. Increasing the
> `maxLostRecordsPercentage` threshold resolves the issue.
>
> https://www.algolia.com/doc/tools/crawler/apis/configuration/safety-checks
>
> On Wed, Dec 17, 2025 at 10:03 PM Xiao Li <[email protected]> wrote:
>> Thanks for reporting it! Will take a look.
>>
>> On Fri, Dec 5, 2025 at 4:19 AM Nicholas Chammas <[email protected]> wrote:
>>> Bueller?
>>>
>>> Is anyone on this list able to fix the crawler?
>>>
>>>> On Dec 1, 2025, at 12:19 PM, Nicholas Chammas <[email protected]> wrote:
>>>>
>>>> Hello,
>>>>
>>>> This seems to be happening again.
>>>>
>>>> Perhaps we should add a new test (but where, I wonder?) to ensure that
>>>> Algolia search doesn’t break without us knowing.
>>>>
>>>> Nick
>>>>
>>>>> On Dec 11, 2023, at 5:02 AM, Gengliang Wang <[email protected]> wrote:
>>>>>
>>>>> Hi Nick,
>>>>>
>>>>> Thank you for reporting the issue with our web crawler.
>>>>>
>>>>> I've found that the issue was due to a change (specifically, pull request
>>>>> #40269 <https://github.com/apache/spark/pull/40269>) in the website's
>>>>> HTML structure, where the JavaScript selector ".container-wrapper" is now
>>>>> ".container". I've updated the crawler accordingly, and it's working
>>>>> properly now.
>>>>>
>>>>> Gengliang
>>>>>
>>>>> On Sun, Dec 10, 2023 at 8:15 AM Nicholas Chammas <[email protected]> wrote:
>>>>>> Pinging Gengliang and Xiao about this, per these docs
>>>>>> <https://github.com/apache/spark-website/blob/0ceaaaf528ec1d0201e1eab1288f37cce607268b/release-process.md#update-the-configuration-of-algolia-crawler>.
>>>>>>
>>>>>> It looks like to fix this problem you need access to the Algolia Crawler
>>>>>> Admin Console.
>>>>>>
>>>>>>> On Dec 5, 2023, at 11:28 AM, Nicholas Chammas <[email protected]> wrote:
>>>>>>>
>>>>>>> Should I report this instead on Jira? Apologies if the dev list is not
>>>>>>> the right place.
>>>>>>>
>>>>>>> Search on the website appears to be broken. For example, here is a
>>>>>>> search for “analyze”:
>>>>>>>
>>>>>>> <Image 12-5-23 at 11.26 AM.jpeg>
>>>>>>>
>>>>>>> And here is the same search using DDG
>>>>>>> <https://duckduckgo.com/?q=site:https://spark.apache.org/docs/latest/+analyze&t=osx&ia=web>.
>>>>>>>
>>>>>>> Nick
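
A rough sketch of the kind of scheduled check asked about at the top of this message, written in Python. It queries the Algolia search REST endpoint for a few known terms and reports any that return zero hits. `APP_ID`, `API_KEY`, `INDEX_NAME`, and the function names are hypothetical placeholders; the real values for the Spark docs index are not given in this thread, and where such a job would live is still an open question.

```python
import json
import urllib.request

# Hypothetical placeholders: the real application ID, search-only API key,
# and index name for the Spark docs are not stated anywhere in this thread.
APP_ID = "YOUR_APP_ID"
API_KEY = "YOUR_SEARCH_ONLY_API_KEY"
INDEX_NAME = "YOUR_INDEX_NAME"


def fetch_hit_count(query: str) -> int:
    """Query the Algolia search REST endpoint and return the reported hit count."""
    url = f"https://{APP_ID}-dsn.algolia.net/1/indexes/{INDEX_NAME}/query"
    request = urllib.request.Request(
        url,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "X-Algolia-Application-Id": APP_ID,
            "X-Algolia-API-Key": API_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return int(payload.get("nbHits", 0))


def failing_queries(hit_counts: dict[str, int], min_hits: int = 1) -> list[str]:
    """Return, in sorted order, the queries whose hit count is below min_hits."""
    return sorted(q for q, n in hit_counts.items() if n < min_hits)


def main(queries=("analyze",)) -> int:
    """Return 1 if any known-good query comes back empty, else 0."""
    counts = {q: fetch_hit_count(q) for q in queries}
    bad = failing_queries(counts)
    for q in bad:
        print(f"search for {q!r} returned no results")
    return 1 if bad else 0
```

A nightly CI job could call `main()` and exit with its return value, so that an empty result for a term like "analyze" fails the job and notifies someone instead of waiting for a user report.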
