Thanks for fixing this. I can confirm it’s working from my side.

Looks like we need some kind of alert on Algolia's crawl status 
<https://www.algolia.com/doc/tools/crawler/troubleshooting/crawl-status>. If 
there’s a way a non-committer can help with this, let me know.
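In the meantime, here is a minimal sketch of the kind of health check I have in mind: query the docs index for a term that should always match, and alert if it returns no hits. Everything here is illustrative, not the project's actual setup: the app ID, API key, and index name "spark-docs" are placeholders, and the endpoint is Algolia's standard search REST API.

```typescript
// Shape of the part of an Algolia search response we care about.
interface SearchResponse {
  nbHits: number;
}

// Pure check: a query that should always match (e.g. "dataframe")
// must return at least `minHits` hits for search to count as healthy.
function isSearchHealthy(resp: SearchResponse, minHits = 1): boolean {
  return resp.nbHits >= minHits;
}

// Illustrative usage: query the index and log an alert on failure.
// APP_ID / SEARCH_KEY / "spark-docs" are placeholders.
async function checkDocsSearch(): Promise<void> {
  const appId = "APP_ID";
  const apiKey = "SEARCH_KEY"; // search-only key, never an admin key
  const res = await fetch(
    `https://${appId}-dsn.algolia.net/1/indexes/spark-docs/query`,
    {
      method: "POST",
      headers: {
        "X-Algolia-Application-Id": appId,
        "X-Algolia-API-Key": apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ params: "query=dataframe" }),
    },
  );
  const data = (await res.json()) as SearchResponse;
  if (!isSearchHealthy(data)) {
    console.error("ALERT: docs search returned no hits for a known query");
  }
}
```

A scheduled CI job running this a few times a day would have caught each of the breakages in this thread within hours rather than weeks.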


> On Apr 3, 2026, at 1:39 AM, Gengliang Wang <[email protected]> wrote:
> 
> Hi Nicholas,
> 
> The crawler configuration was not updated after the Spark 4.1.1 release, as 
> documented in the release process 
> <https://spark.apache.org/release-process.html>. I've fixed it.
> 
> A unit test isn't really feasible here since the doc search is powered by 
> Algolia, but we could set up an Algolia monitoring alert to catch this 
> proactively. I'll look into it when I have the bandwidth.
> 
> Gengliang
> 
> On Wed, Apr 1, 2026 at 3:09 PM Nicholas Chammas <[email protected] 
> <mailto:[email protected]>> wrote:
>> It’s broken again. This is the third breakage I am reporting in the past 
>> couple of years.
>> 
>> Is there some sort of alert or CI test we could set up to catch or prevent 
>> this going forward?
>> 
>> 
>>> On Dec 21, 2025, at 1:35 PM, Gengliang Wang <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> Hi all,
>>> 
>>> 
>>> The crawler issue has been identified and fixed.
>>> 
>>> The root cause was that the crawler's safety check fails when the latest 
>>> crawl result contains less than 90% of the records from the previous 
>>> result. Increasing the `maxLostRecordsPercentage` threshold resolves the 
>>> issue.
>>> 
>>> https://www.algolia.com/doc/tools/crawler/apis/configuration/safety-checks
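For context, the safety check lives in the crawler configuration. A hypothetical excerpt is below; the field path follows Algolia's safety-checks documentation, but the value shown is illustrative, not the one actually deployed for the Spark docs crawler.

```typescript
// Hypothetical excerpt of an Algolia crawler configuration.
const crawlerConfig = {
  safetyChecks: {
    beforeIndexPublishing: {
      // Reject the new index if more than this percentage of records
      // would be lost relative to the previous crawl. With the default
      // of 10, a crawl yielding fewer than 90% of the previous records
      // fails; raising the value loosens the check.
      maxLostRecordsPercentage: 30, // illustrative value
    },
  },
};
```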
>>> 
>>> 
>>> On Wed, Dec 17, 2025 at 10:03 PM Xiao Li <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>> Thanks for reporting it! Will take a look
>>>> 
>>>> Nicholas Chammas <[email protected] 
>>>> <mailto:[email protected]>> 于2025年12月5日周五 04:19写道:
>>>>> Bueller?
>>>>> 
>>>>> Is anyone on this list able to fix the crawler?
>>>>> 
>>>>> 
>>>>>> On Dec 1, 2025, at 12:19 PM, Nicholas Chammas 
>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> This seems to be happening again.
>>>>>> 
>>>>>> Perhaps we should add a new test (but where, I wonder?) to ensure that 
>>>>>> Algolia search doesn’t break without us knowing.
>>>>>> 
>>>>>> Nick
>>>>>> 
>>>>>> 
>>>>>>> On Dec 11, 2023, at 5:02 AM, Gengliang Wang <[email protected] 
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>> 
>>>>>>> Hi Nick,
>>>>>>> 
>>>>>>> Thank you for reporting the issue with our web crawler.
>>>>>>> 
>>>>>>> I've found that the issue was due to a change (specifically, pull 
>>>>>>> request #40269 <https://github.com/apache/spark/pull/40269>) in the 
>>>>>>> website's HTML structure: the CSS selector the crawler uses changed 
>>>>>>> from ".container-wrapper" to ".container". I've updated the crawler 
>>>>>>> accordingly, and it's working properly now.
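For reference, a minimal sketch of the kind of extractor this fix touches. The index name and record shape are placeholders, not Spark's actual crawler config; Algolia's crawler passes a Cheerio-style `$` helper to `recordExtractor`, which is simulated here with `any`.

```typescript
// Hypothetical record extractor illustrating the selector fix.
const action = {
  indexName: "spark-docs", // placeholder
  recordExtractor: ({ url, $ }: { url: URL; $: any }) => {
    // The site's wrapper class changed from ".container-wrapper" to
    // ".container"; with the old selector, this matched nothing and
    // every page yielded zero records.
    const content = $(".container").text().trim();
    if (!content) return []; // no content extracted -> skip this page
    return [{ objectID: url.href, content }];
  },
};
```

This is also why the breakage was silent: an extractor that matches nothing is not an error, it just produces an empty index.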
>>>>>>> 
>>>>>>> Gengliang
>>>>>>> 
>>>>>>> On Sun, Dec 10, 2023 at 8:15 AM Nicholas Chammas 
>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>>>> Pinging Gengliang and Xiao about this, per these docs 
>>>>>>>> <https://github.com/apache/spark-website/blob/0ceaaaf528ec1d0201e1eab1288f37cce607268b/release-process.md#update-the-configuration-of-algolia-crawler>.
>>>>>>>> 
>>>>>>>> It looks like to fix this problem you need access to the Algolia 
>>>>>>>> Crawler Admin Console.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Dec 5, 2023, at 11:28 AM, Nicholas Chammas 
>>>>>>>>> <[email protected] <mailto:[email protected]>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Should I report this instead on Jira? Apologies if the dev list is 
>>>>>>>>> not the right place.
>>>>>>>>> 
>>>>>>>>> Search on the website appears to be broken. For example, here is a 
>>>>>>>>> search for “analyze”:
>>>>>>>>> 
>>>>>>>>> <Image 12-5-23 at 11.26 AM.jpeg>
>>>>>>>>> 
>>>>>>>>> And here is the same search using DDG 
>>>>>>>>> <https://duckduckgo.com/?q=site:https://spark.apache.org/docs/latest/+analyze&t=osx&ia=web>.
>>>>>>>>> 
>>>>>>>>> Nick
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>> 
