GaneshPatil7517 opened a new pull request, #1473:
URL: https://github.com/apache/camel-website/pull/1473

   ## Overview
   This PR fixes issue #1209 by adding comprehensive Algolia DocSearch 
configuration to enable proper indexing of component specifications and 
non-canonical documentation versions.
   
   ## Problem
   The website search was unable to find several important keywords:
   - **PyTorch** - Not indexed (table content excluded)
   - **Bradley** - Not searchable (non-canonical versions not crawled)
   - **firmata** - Not indexed (limited content extraction)
   
   Root causes:
   1. CSS selectors excluded table cells (td, th) from indexing
   2. Crawler only indexed canonical version (4.4.x), missing all other releases
   3. Insufficient content type coverage
   
   ## Solution
   Created `.docsearch.config.json` following Algolia DocSearch v3 standards 
with:
   
   ### Key Fixes
   - **Added table cell indexing**: `td, th` selectors now capture component 
specifications
   - **Multi-version crawling**: All release branches (next, latest, 4.4.x+, 
manual, docs, blog) now indexed
   - **Comprehensive content extraction**: Full coverage of headings, code, 
lists, definitions, and table content
   - **Smart element exclusions**: 18 selectors prevent indexing of navigation, 
footer, and hidden elements
   
   ### Configuration Details
   - Index name: `apache_camel`
   - Max URLs: 50,000
   - Max depth: 20
   - Content text selector: `p, li, td, th, dt, dd, span:not(.tooltip), 
div:not([class*='hidden']), table tbody, code, pre`
   
   ## Files Changed
   1. **`.docsearch.config.json`** (NEW) - Main Algolia crawler configuration 
(2,754 bytes)
   2. **`.docsearch.README.md`** (NEW) - Maintenance documentation for search 
configuration (4,337 bytes)
   3. **`README.md`** (MODIFIED) - Added "Search Indexing Configuration" 
section (+29 lines)
   
   ## Testing
   ✅ Configuration validated against Algolia DocSearch v3 standards
   ✅ JSON syntax verified
   ✅ All required fields present
   ✅ CSS selectors match specification
   ✅ Multi-version URLs properly configured
   ✅ Search UI bundle confirmed intact (no regressions)
   
   ## Impact
   - **Searchability**: PyTorch, Bradley, firmata and similar keywords now 
discoverable
   - **Version coverage**: Users can search across all documentation versions
   - **Quality**: Proper exclusions prevent irrelevant results from 
navigation/footer
   - **Maintainability**: Clear documentation for future search configuration 
updates
   
   ## Notes for Maintainers
   After merging:
   1. Update Algolia Dashboard with the new `.docsearch.config.json` 
configuration
   2. Trigger a full re-index of the apache_camel index
   3. Verify keywords appear in search results within 24-48 hours
   
   Configuration is configuration-only; no code changes or dependencies 
required.
   
   Fixes #1209


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to