### Description
In Unicode, some CJK characters such as 化 have a single codepoint but are
drawn differently in Simplified Chinese (化), Traditional Chinese (化), and Japanese (化). On the frontend, we can display names
correctly using an HTML attribute such as `lang="zh-Hant"`. This issue
is known as [Han unification](https://en.wikipedia.org/wiki/Han_unification)
and it has appeared over the years [in many software
projects](https://issues.chromium.org/issues/41315603).
This was addressed in iD (https://github.com/openstreetmap/iD/pull/10716) and is
a long-running discussion in openstreetmap-carto.
If we add `&addressdetails=1` to Nominatim queries, we can read the
`country_code` of each result and set a matching `lang` attribute for results
in mainland China, Hong Kong, Japan, or Taiwan.
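As a rough illustration of the idea (not the code in this PR), the lookup could be sketched as below. `CJK_LANG_BY_COUNTRY` and `result_lang` are hypothetical names, the specific language tags are assumptions, and so is the guess that Hong Kong may arrive under `country_code` "cn" with an ISO 3166-2 subdivision of "CN-HK".

```ruby
# Hypothetical mapping from a Nominatim country_code to a language tag.
# The tags chosen here are assumptions for illustration only.
CJK_LANG_BY_COUNTRY = {
  "cn" => "zh-Hans",
  "hk" => "zh-HK",
  "jp" => "ja",
  "tw" => "zh-Hant"
}.freeze

# Returns a language tag for a result's address hash, or nil when the
# country is not one we handle (the browser default then applies).
def result_lang(address)
  return nil unless address.is_a?(Hash)

  # Assumption: Hong Kong may be reported as country_code "cn" plus an
  # ISO 3166-2 subdivision entry with the value "CN-HK".
  subdivisions = address.select { |key, _| key.to_s.start_with?("ISO3166-2") }.values
  return "zh-HK" if subdivisions.include?("CN-HK")

  CJK_LANG_BY_COUNTRY[address["country_code"].to_s.downcase]
end
```

Returning `nil` for every other country keeps the current behaviour for results outside these regions.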
### How has this been tested?
This can be tricky to test, as **many names do not change**, and the
`display_name` will be in your browser's language if it's available.
- Search results will have a lang tag, such as `lang="zh-HK"` or
`lang="ja"`, regardless of the language of `display_name`.
- In Taiwan, a search result for 彰化 should show a horizontal bar in 化.
- In mainland China, a search result for 玉门 expressway should render the
second character, 门, with a split frame, not the 门 with a +.
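To make the first check concrete, here is a minimal sketch of how a result might be rendered with or without the attribute. `result_item_html` is a hypothetical helper written in plain Ruby, not the helper touched by this PR, and the `<li>` markup is an assumption.

```ruby
require "erb"

# Hypothetical renderer: wraps a result name in an <li>, adding a lang
# attribute only when a language tag was determined for the result.
def result_item_html(display_name, lang = nil)
  lang_attr = lang ? %( lang="#{ERB::Util.html_escape(lang)}") : ""
  "<li#{lang_attr}>#{ERB::Util.html_escape(display_name)}</li>"
end
```

With a tag present the browser can pick region-appropriate glyphs for that element; without one it falls back to its default for the page.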
### Notes
- As an alternative to adding `&addressdetails=1` to queries, we could
possibly parse `display_name` (which varies with the browser language) or use
geographic bounding boxes.
- This matching of languages is imperfect, but without a language tag we are
always using your browser's default for any CJK character. It would be
difficult to make exceptions (for example, Japanese restaurants in these
countries) without a name regex, a language tag, or access to other tags.
- This does not affect Chinese names in other countries.
- I have heard that there are some variations for Cyrillic in
[Bulgaria](https://en.wikipedia.org/wiki/Bulgarian_alphabet) and
[Serbia](https://en.wikipedia.org/wiki/Serbian_Cyrillic_alphabet#Differences_from_other_Cyrillic_alphabets),
particularly in italics, but I don't know how universal they are. [Additional
info](https://commons.wikimedia.org/wiki/File:Special_Cyrillics_BGDPT.svg)
You can view, comment on, or merge this pull request online at:
https://github.com/openstreetmap/openstreetmap-website/pull/6079
-- Commit Summary --
* add lang attribute to results from CJK countries, plus Cyrillic
* remove Bulgaria/Serbia for now
* fix HK subregion
-- File Changes --
M app/controllers/concerns/nominatim_methods.rb (2)
M app/controllers/searches/nominatim_queries_controller.rb (7)
M app/helpers/geocoder_helper.rb (2)
-- Patch Links --
https://github.com/openstreetmap/openstreetmap-website/pull/6079.patch
https://github.com/openstreetmap/openstreetmap-website/pull/6079.diff