[openstreetmap/openstreetmap-website] Add lang attribute to Nominatim results from CJK languages (PR #6079)

2025-06-01 Thread Nick Doiron via rails-dev
### Description

In Unicode, some CJK characters such as 化 have one codepoint but will appear 
differently in Simplified Chinese, Traditional Chinese, and Japanese. On the 
frontend, we can display names correctly using an HTML attribute such as 
`lang="zh-Hant"`. This issue is known as 
[Han unification](https://en.wikipedia.org/wiki/Han_unification) 
and it has appeared over the years [in many software 
projects](https://issues.chromium.org/issues/41315603).

This was addressed in iD https://github.com/openstreetmap/iD/pull/10716 and is 
a long-running discussion in openstreetmap-carto.

If we add `&addressdetails=1` to Nominatim queries, we can read the 
country_code and display the best label for mainland China, Hong Kong, Japan, 
or Taiwan.
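
As a rough illustration of the idea above, here is a minimal standalone Ruby sketch. It is not the code from this PR: the constant `CJK_LANG_BY_COUNTRY`, both helper methods, the specific tags in the mapping, and the sample response are assumptions made for the example. It only relies on Nominatim returning `address.country_code` when `addressdetails=1` is set.

```ruby
require "json"
require "uri"

# Assumed mapping from Nominatim's lowercase country_code to a language tag.
# The tags actually chosen in the PR may differ.
CJK_LANG_BY_COUNTRY = {
  "cn" => "zh-Hans",
  "hk" => "zh-HK",
  "tw" => "zh-Hant",
  "jp" => "ja"
}.freeze

# Build a search URL that asks Nominatim to include the address breakdown.
def nominatim_search_url(query)
  params = { q: query, format: "jsonv2", addressdetails: 1 }
  "https://nominatim.openstreetmap.org/search?#{URI.encode_www_form(params)}"
end

# Return a language tag for one parsed result, or nil when no mapping applies.
def lang_for_result(result)
  country_code = result.dig("address", "country_code")
  CJK_LANG_BY_COUNTRY[country_code&.downcase]
end

# Hypothetical, trimmed-down response body for illustration only.
sample = JSON.parse('[{"display_name": "彰化", "address": {"country_code": "tw"}}]')
puts nominatim_search_url("彰化")
puts lang_for_result(sample.first)
```

Returning `nil` for unmapped countries lets the caller omit the attribute entirely, so names elsewhere keep the browser's default rendering.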

### How has this been tested?

This can be tricky to test, as **many names do not change**, and the 
display_name will be in your browser's language if it's available.

- Search results will have a lang tag, such as `lang="zh-HK"` or 
`lang="ja"`, regardless of language of display_name
- In Taiwan, a search result for 彰化 should show a horizontal bar in 

- In mainland China, a search result for 玉门 expressway should return a split 
frame in the second character, not the 门 with a +
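
To make the expected markup in the checks above concrete, here is a small plain-Ruby sketch of wrapping a result name in an element that carries a `lang` attribute. The helper name `tagged_name` and the choice of a `<span>` are assumptions for illustration; the website's own helpers may build the markup differently.

```ruby
require "erb"

# Wrap a result name in a <span> with a lang attribute.
# Both the name and the tag are HTML-escaped; with no tag, only the name is returned.
def tagged_name(name, lang)
  escaped_name = ERB::Util.html_escape(name)
  return escaped_name if lang.nil?

  %(<span lang="#{ERB::Util.html_escape(lang)}">#{escaped_name}</span>)
end

puts tagged_name("彰化", "zh-Hant")
puts tagged_name("玉门", "zh-Hans")
puts tagged_name("Paris", nil)
```

With the attribute in place, the browser can pick region-appropriate glyphs for the same codepoints, which is what the visual checks in the list are looking for.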

### Notes

As an alternative to adding `&addressdetails=1` to queries, we could 
possibly parse display_name (varies with the browser language) or use geo 
bounding boxes?

This matching of languages is imperfect, but without a language tag we are 
always using your browser's default for any CJK character. It would be 
difficult to make exceptions (for example, Japanese restaurants in these 
countries) without a name regex, a language tag, or access to other tags.

This does not affect Chinese names in other countries.

I have heard that there are some variations for Cyrillic in 
[Bulgaria](https://en.wikipedia.org/wiki/Bulgarian_alphabet) and 
[Serbia](https://en.wikipedia.org/wiki/Serbian_Cyrillic_alphabet#Differences_from_other_Cyrillic_alphabets),
 particularly in italics? But I don't know how universal it is. [Additional 
info](https://commons.wikimedia.org/wiki/File:Special_Cyrillics_BGDPT.svg)

You can view, comment on, or merge this pull request online at:

  https://github.com/openstreetmap/openstreetmap-website/pull/6079

-- Commit Summary --

  * add lang attribute to results from CJK countries, plus Cyrillic
  * remove Bulgaria/Serbia for now
  * fix HK subregion

-- File Changes --

M app/controllers/concerns/nominatim_methods.rb (2)
M app/controllers/searches/nominatim_queries_controller.rb (7)
M app/helpers/geocoder_helper.rb (2)

-- Patch Links --

https://github.com/openstreetmap/openstreetmap-website/pull/6079.patch
https://github.com/openstreetmap/openstreetmap-website/pull/6079.diff

-- 
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/openstreetmap-website/pull/6079
You are receiving this because you are subscribed to this thread.

Message ID: 
___
rails-dev mailing list
rails-dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/rails-dev


Re: [openstreetmap/openstreetmap-website] Add lang attribute to Nominatim results from CJK languages (PR #6079)

2025-06-01 Thread Nick Doiron via rails-dev
@mapmeld pushed 2 commits.

b905080241450f94fb00a87e4479c22695f21ee3  nominatim stubs
84726db2e95ef0899021e3edbbb2dacc4a7ecb25  reverse geocoder

-- 
View it on GitHub:
https://github.com/openstreetmap/openstreetmap-website/pull/6079/files/72a1995031f0ff809d977fcc933ffe7893da5530..84726db2e95ef0899021e3edbbb2dacc4a7ecb25