Re: [OHM] OHM - is the data model broken?

Hannes Röst Thu, 16 Jul 2020 06:13:19 -0700

Dear Jeff
 
Great, I am happy to have started a discussion on this topic and there seems to 
be a need for formal guidelines or even technical change. I would be interested 
to get together and discuss this in a call. I agree that the data model is not 
"broken" but it would benefit from clarification or extension. At least from my 
experience, I was often confused since there was no "right way" to do things 
and multiple ways that each felt not quite correct.


Btw: I just found a case where geometry and meta-data clearly need to be 
separated and that is very hard (not impossible) to model with the current data 
model:
- https://openhistoricalmap.org/node/2088657540 is the statue of Edward Colston 
which was tossed in the harbor in Bristol on June 6 [1]
- https://openhistoricalmap.org/node/2088657539 was the new location of the 
statue for four days [2]
- Now the statue has been replaced (temporarily) with another one on the same 
geometry [3]
 
This is a case where we would need to re-use the same geometry (node 
2088657540) for the 1895-2020 statue as well as for the new (2020-07-15 ..) 
statue, since they are at the same place.

1. https://www.bbc.com/news/uk-52954305
2. https://www.bbc.com/news/uk-england-bristol-53004748
3. https://www.bbc.co.uk/news/uk-england-bristol-53414463

Sent: Tuesday, 14 July 2020 at 16:58
From: "Jeff Meyer" <[email protected]>
To: "Hannes Röst" <[email protected]>
Cc: "Open Historical Map" <[email protected]>, "Thomas Schwotzer" 
<[email protected]>
Subject: Re: [OHM] OHM - is the data model broken?

Hi Hannes (cc: Thomas Schwotzer, in case he'd like to add any observations & 
insight to this discussion!):
 
First, let me say that I don't believe the OHM data model is "broken" but that 
I do believe it may require quite a few workarounds and may not be up to the 
challenge of 100% of dataset needs. Second, Thomas has put his money where his 
mouth is by building a separate system - much respect to him for doing what 
he's done.
 
Third - many thanks to Hannes for taking the time to research and think through 
this topic. Hopefully, others will join in!
 
And, last major point - perhaps we should have a tech discussion / get together 
to go through these issues in real time? Maybe have some guest speakers / 
presentations? I'd be glad to help coordinate, especially if others present.
 
To the background & topics of this thread. : )
 
I think there was some resistance to adopting a new data model and associated 
stack when Thomas wrote to us years ago and we were reluctant to veer away from 
OSM too much, given our own traction issues and other dev needs. I also think 
we should be open to modification and extension where feasible and where needs 
dictate. 
 
That said, I think there's a *lot* of power in relations, which could solve 
some of the examples that Hannes has done a great job of outlining. I'm 
thinking of a combination of relations, external sources of stable identifiers 
that we point at (rather than worrying about inherent OSM instabilities), and 
possibly some concept of preceded_by=* and followed_by=* along with Hannes' 
transition=* idea. Namespace modifications still scare me a bit, for a variety 
of reasons, including tooling and embedding information in labels, etc.
 
Bottom line, this is a benevolent mapocracy, and it feels like there are a 
bunch of us noodling on the same pasta, so let's get together to talk about it. 
Who's in?
 
- Jeff
 
p.s. I highly recommend looking at https://www.whosonfirst.org (in particular 
https://www.whosonfirst.org/docs/contributing/) for 
some additional inspiration on this topic. The guys who put this together have 
pretty deep data structure and tagging backgrounds & are very OSM savvy, so 
there's a lot of relevant synthesis embedded in their implementation.
 
 
 
  

On Mon, Jul 13, 2020 at 10:20 AM Hannes Röst <[email protected]> wrote:

Dear all
 
I was reading this thread 
https://lists.openstreetmap.org/pipermail/historic/2019-February/001186.html
and the arguments made by Thomas, which make a lot of sense. First, I would 
like to thank Thomas for his paper and for putting thought into this. I hope he 
reads this and has some comments on my arguments (I am aware that other people 
have thought about these problems for much longer than I have, which is why I 
tried to go back and read the old mails on the list). I agree that his data 
model makes a lot of sense and is sometimes necessary to accurately describe 
historic and geographic objects.

I was reading his paper and looked at how OHDM models geographic data, using a 
separation between geometries and meta-data. He correctly identifies that it 
will not always be possible to have a 1:1 relation between the geometry 
(nodes/ways) and meta-data, and that they should therefore be separated. I also 
read 
https://wiki.openstreetmap.org/wiki/Open_Historical_Map/Tags
(and section #Representation_of_change_in_historical_road_networks), which 
contains some of the ideas already. I realized that many of the issues that I 
described in my previous email are due to this 1:1 assumption. For example, 
let's take the case of 
https://en.wikipedia.org/wiki/London_Bridge_(disambiguation)
which describes at least 5 different entities:
 
- a Roman bridge
- one or more medieval bridges
- old London bridge Q56739974 (1209-1831)
- new London bridge Q56739652 (in London 1831 to 1968, then dismantled and 
brought to the US to be rebuilt stone by stone as "London Bridge (Lake Havasu 
City)", see Q1868889)
- current London bridge Q130206 (1973-)
 
So the issue is that there is a "concept" of a London bridge, namely a crossing 
at this particular location; then there are specific instances of geometries of 
wood, stone or metal that form a physical bridge; and then there is the 
physical continuity of one of these specific bridges being built at one place 
and then being dismantled and rebuilt somewhere else using the same physical 
stones. It is clear that we cannot model this relation with tags alone, but it 
is my belief that if we work through this example then we may arrive at a 
pretty good model that can represent fairly complicated spatio-temporal 
relationships of physical objects and geometries. A second example is a 
building that over time had different functions and may have been expanded at 
some point in time, with further geometry added or removed due to construction 
/ demolition (for example a church or monastery).
 
Basically what we need is an n:m relationship between nodes and spatio-temporal 
concepts. A single building contains multiple nodes, but a single node may be 
part of multiple buildings over its lifetime, each with different attributes 
(tags) associated with it. On top of that, the data model by Thomas allows the 
n:m relation itself to have a start/end time, something that may not be very 
easy to do right now in OHM (see below). Currently it seems to me that there 
are two ways to approach this problem:
 
i) using date namespaces (see 
https://wiki.openstreetmap.org/wiki/Proposed_features/Date_namespace)
ii) using relations for true n:m mapping
 
For (ii), we can use relations, which are *already* available in OSM/OHM. 
Currently there are a limited number of relation types: 
https://wiki.openstreetmap.org/wiki/Types_of_relation
and I think we could expand the list for OHM and introduce new types in order 
to implement the n:m relationship between geometry and concepts.
 
For relations to work properly, I suggest that we create a new relation type 
for a spatio-temporal concept / continuity, "type=spatio_temporal", which would 
represent an entity that conceptually links the individual geometries on the 
ground over time and space.
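
To make the n:m idea concrete, here is a minimal sketch in Python of how such a 
relation could group several ways, and how one way can belong to more than one 
relation. The Member and Relation classes, the negative placeholder ids and the 
second relation ("Example crossing group") are assumptions made up for 
illustration; only the "type=spatio_temporal" tag comes from the proposal above.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    type: str        # "node", "way" or "relation"
    ref: int         # id of the referenced element
    role: str = ""


@dataclass
class Relation:
    id: int
    tags: dict
    members: list = field(default_factory=list)


# Placeholder ids: these are not real OHM objects.
london_bridge = Relation(
    id=-1,
    tags={"type": "spatio_temporal", "name": "London Bridge"},
    members=[Member("way", -101), Member("way", -102), Member("way", -103)],
)
# Hypothetical second concept that shares one geometry with the first.
crossing_group = Relation(
    id=-2,
    tags={"type": "spatio_temporal", "name": "Example crossing group"},
    members=[Member("way", -103), Member("way", -104)],
)


def relations_containing(relations, member_type, ref):
    """Return every relation that has the given element as a member."""
    return [
        r for r in relations
        if any(m.type == member_type and m.ref == ref for m in r.members)
    ]


hits = relations_containing([london_bridge, crossing_group], "way", -103)
print([r.tags["name"] for r in hits])
```

One relation lists many ways (n), and the lookup shows that a single way can be 
a member of several relations (m), without any geometry being duplicated.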
 
Let's look at some examples:
 
Example 1: A church gets converted into a night club
Example 2: A church gets expanded with an additional wing, later burns down
Example 3: A bridge gets moved to a new geographic location
Example 4: A bridge gets replaced by a newer bridge without re-using any 
existing building material
 
We can solve (1) using either (i) or (ii). For example, we could use 
"building:1700-1950=church" and "amenity:1950-=nightclub", and this is the most 
economical solution: all other tags may be shared and there is only a single 
way for the whole building, so it takes the least amount of storage and is very 
intuitive for editors. We can also use (ii) and create two relations, one for 
the nightclub and one for the church, allowing clean use of "start_date" and 
"end_date" in each relation to make the history explicit and conform to the 
tagging guidelines. This is also practical, since the nodes and the way would 
*not* be duplicated and would only be stored once in the database. The temporal 
history is clear, since it is explicit that the same building has stood there 
since 1700 and has been used for 2 purposes.
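
As a rough sketch of how a data consumer might read the date-namespaced tags 
from approach (i), the Python function below returns the tags that apply in a 
given year. It only understands the "key:START-END" and "key:START-" year forms 
used in the example above, not the full Date namespace proposal. The function 
name, the half-open interval (the end year is excluded) and the building name 
are assumptions for illustration.

```python
import re

# key, ":", four-digit start year, "-", optional four-digit end year
_DATED_KEY = re.compile(r"^(?P<key>.+):(?P<start>\d{4})-(?P<end>\d{4})?$")


def tags_in_year(tags, year):
    """Return the plain tags that are valid in `year`."""
    active = {}
    for key, value in tags.items():
        match = _DATED_KEY.match(key)
        if match is None:
            active[key] = value          # undated tags always apply
            continue
        start = int(match.group("start"))
        end = int(match.group("end")) if match.group("end") else None
        # Assumption: the range is half-open, so 1950 belongs to the nightclub.
        if start <= year and (end is None or year < end):
            active[match.group("key")] = value
    return active


building = {
    "name": "Example Hall",              # hypothetical shared tag
    "building:1700-1950": "church",
    "amenity:1950-": "nightclub",
}
print(tags_in_year(building, 1800))
print(tags_in_year(building, 1990))
```

The shared "name" tag is stored once and shows up for both years, which is the 
storage advantage described above.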
 
We can solve (2) using (i) by creating a way for the original church, tagging 
it appropriately (start_date/end_date), and then creating a new way for the 
extension that re-uses some of the old nodes and adds new nodes, again tagged 
appropriately (start_date/end_date). The temporal history should be mostly 
clear, since it is evident that some of the nodes are re-used and therefore 
part of the old building was used to create the new one. It is also economical, 
since nodes are re-used and not duplicated in the database. Alternatively we 
could use two relations to achieve the same goal as above, basically leading to 
2 ways for old/new and 2 relations for old/new. Both approaches lead to some 
duplication on the tag level (e.g. the name appears twice and is stored twice). 
But it does become a bit muddy here, since it's not the *same* way that is in 
either relation but two different ways with some shared nodes. So some 
information is lost with this approach (namely about the spatio-temporal 
continuity of an entity), and adding the two ways to a relation of 
"type=spatio_temporal" would make sense here, to be explicit instead of 
implicit and to avoid duplication (otherwise we have 2 ways with the same 
"name=" tag, leading to issues with search and updates). This would also make 
LOD easier, since it's more likely that external resources like Wikidata / 
Wikipedia would have information on the spatio-temporal concept and not on the 
geometry, which has changed over time. Otherwise, where would we add the 
wikidata tag: to way1, to way2, or to both?
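
A small Python sketch of example (2) under these ideas: two ways share most of 
their nodes, and the name lives once on a "type=spatio_temporal" relation 
instead of on both ways. All ids, dates and the name are hypothetical, and the 
dict layout is just a convenient stand-in for OSM elements.

```python
# Hypothetical node ids and dates, for illustration only.
old_church = {
    "id": -1,
    "nodes": [1, 2, 3, 4, 1],            # closed way: first node == last node
    "tags": {"building": "church", "start_date": "1700", "end_date": "1850"},
}
extended_church = {
    "id": -2,
    "nodes": [1, 2, 5, 6, 3, 4, 1],      # re-uses nodes 1-4, adds 5 and 6
    "tags": {"building": "church", "start_date": "1850"},
}
# The concept carries the shared tags; a wikidata tag would go here as well.
church_concept = {
    "id": -3,
    "tags": {"type": "spatio_temporal", "name": "Example Church"},
    "members": [("way", -1, ""), ("way", -2, "")],
}


def shared_nodes(way_a, way_b):
    """Node ids used by both ways, i.e. the geometry that was kept."""
    return sorted(set(way_a["nodes"]) & set(way_b["nodes"]))


print(shared_nodes(old_church, extended_church))
```

The shared nodes make the physical continuity implicit, while the relation 
makes it explicit and gives search a single place to find the name.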
 
We could solve (3) as well using a relation of "type=spatio_temporal" which 
carries the name of the object and has way1 (former location) and way2 (new 
location) as members, with the move being implicit because way1 has an end_date 
that is before the start_date of way2.
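
The implicit move can be recovered from the dates alone. The Python sketch 
below does this for the members of one relation; the function name and the 
member layout are assumptions, and the start year used for way2 is illustrative 
rather than taken from this thread.

```python
def implied_moves(members):
    """Return (from_ref, to_ref) pairs where one member ends before the next starts.

    `members` is a list of dicts with 'ref', 'start_date' and 'end_date'
    (year strings; 'end_date' is None for something that still exists).
    """
    ordered = sorted(members, key=lambda m: int(m["start_date"]))
    moves = []
    for prev, nxt in zip(ordered, ordered[1:]):
        end = prev["end_date"]
        if end is not None and int(end) < int(nxt["start_date"]):
            moves.append((prev["ref"], nxt["ref"]))
    return moves


bridge_members = [
    {"ref": "way2", "start_date": "1971", "end_date": None},    # new location
    {"ref": "way1", "start_date": "1831", "end_date": "1968"},  # former location
]
print(implied_moves(bridge_members))
```

Dates alone cannot say whether the same stones were re-used or a new structure 
simply followed an old one, so this only flags candidates.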
 
We can currently model (4) by just creating two bridges that are at the same 
location and have their own wikidata tags etc., but that loses some information, 
since nothing captures their relation to each other. Here too, a relation of 
"type=spatio_temporal" would help, and we can add both bridges to the relation. 
We can add as many bridges as we like (e.g. all 5 London bridges), and some of 
these bridges may actually use the exact same way (same nodes) if they were 
built at the same location, and some may not if they were built a few meters 
up/downstream. In this case tags that are building-specific would stay on each 
individual geometry, while some tags such as the "name=" tag would be in the 
relation, assuming all the bridges had the same name.
 
Second, we could also use relations to indicate how the transition happened 
between two geometries. E.g. if a building burns down and only part of it 
remains, we could use a relation "type=temporal_transition", 
"type=historical_event" or "type=event", which would have "event_type=fire" and 
"start_date=XX" and "end_date=XX" for the date of the *event*, allowing us to 
model multi-day events. Together with a tagging of way1 as "before" and way2 as 
"after", this could clearly indicate what happened to a particular building, 
and we could model events such as 
https://en.wikipedia.org/wiki/Great_Fire_of_London
. The interesting part here is that the relation *itself* could then link to 
the Wikidata event Q164679, while Wikidata could link to OHM for people to get 
an idea of the extent of the fire and the affected buildings (look at the 
current Wikidata page: it only has "coordinate location" pointing to London, so 
more information is clearly necessary). To model a physical move, we could use 
"type=event" and "event=physical_move", using the "role" field with way1 as 
"before" and way2 as "after" (similar to how we use inner/outer for 
multipolygons). For new buildings/replacements/additions we could use 
"event=construction" in the relation, indicate how long construction took etc., 
and link to the corresponding Wikidata article, such as Q811095 for 
"Construction of the World Trade Center" (the original) and Q5164470 for 
"Construction of One World Trade Center" (the rebuilding after 9/11). This 
would be a *very* rich way to describe events, but isn't that what we are 
aiming for?
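
For illustration, here is a short Python sketch that builds such an event 
relation in OSM-style XML with "before" and "after" roles. The helper name, the 
placeholder ids and the choice of the "event" key (the text above uses both 
"event_type" and "event") are assumptions; the fire dates are the commonly 
cited 2 to 6 September 1666.

```python
import xml.etree.ElementTree as ET


def event_relation(rel_id, event, start_date, end_date,
                   before_ref, after_ref, wikidata=None):
    """Build a <relation> element describing a transition between two ways."""
    rel = ET.Element("relation", id=str(rel_id))
    ET.SubElement(rel, "member", type="way", ref=str(before_ref), role="before")
    ET.SubElement(rel, "member", type="way", ref=str(after_ref), role="after")
    tags = {
        "type": "event",
        "event": event,
        "start_date": start_date,    # dates of the event, not of the buildings
        "end_date": end_date,
    }
    if wikidata is not None:
        tags["wikidata"] = wikidata
    for key, value in tags.items():
        ET.SubElement(rel, "tag", k=key, v=value)
    return rel


# Placeholder ids -1, -101 and -102 are not real OHM objects.
fire = event_relation(-1, "fire", "1666-09-02", "1666-09-06",
                      before_ref=-101, after_ref=-102, wikidata="Q164679")
print(ET.tostring(fire, encoding="unicode"))
```

Because the dates sit on the relation, a multi-day event is described once even 
when many buildings are affected by it.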
 
Now one issue remains: the current n:m relationship in OHM does not allow the 
*relationship* itself to have a start/end time as described in the paper by 
Thomas. However, I am not sure how big a problem that is. I have a really hard 
time coming up with an example where this would be necessary and the start/end 
time could not be stored in either the geometry or the relation (of course it's 
cleaner to store it in only one place, so it's nicer from a design point of 
view, but from a functional point of view I struggle to see a necessity). We 
could approximate this in three ways: (i) use parent/child relations where the 
child is the "intermediary" and only stores start/end times; (ii) modify the 
relation_members table (see 
https://wiki.openstreetmap.org/wiki/File:OSM_DB_Schema_2016-12-13.svg
) by adding 2 columns (start_date / end_date), which would break most editors 
and all OSM compatibility and seems like a bad idea; or (iii) hack the "role" 
column by adding date ranges. For example, a bridge that was moved from way1 to 
way2 could have a single relation called "London Bridge", where way1 in London 
is a member with the role field equal to "1831-1968" and way2, the bridge in 
the US, is a member with the role field equal to "1968-". I also think this is 
a bad idea, but it is possible :-) Given these options, if this case ever comes 
up then (i) is probably the easiest way to solve it.
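
A minimal Python sketch of option (i), where child relations act as dated 
intermediaries between a parent concept and its geometries. The ids, the dict 
layout and the half-open date interval are assumptions for illustration; the 
years are the ones from the role example above.

```python
# Placeholder ids; each child relation only stores the validity of the link.
relations = {
    -10: {"tags": {"type": "spatio_temporal", "name": "London Bridge"},
          "members": [("relation", -11, ""), ("relation", -12, "")]},
    -11: {"tags": {"start_date": "1831", "end_date": "1968"},
          "members": [("way", -101, "")]},       # way1, London
    -12: {"tags": {"start_date": "1968"},
          "members": [("way", -102, "")]},       # way2, US
}


def ways_at(relations, parent_id, year):
    """Return the way ids linked to the parent concept in the given year."""
    ways = []
    for member_type, ref, _role in relations[parent_id]["members"]:
        if member_type != "relation":
            continue
        child = relations[ref]
        start = int(child["tags"]["start_date"])
        end = child["tags"].get("end_date")
        # Assumption: half-open interval, the end year itself is excluded.
        if start <= year and (end is None or year < int(end)):
            ways.extend(r for t, r, _ in child["members"] if t == "way")
    return ways


print(ways_at(relations, -10, 1900))
print(ways_at(relations, -10, 2000))
```

This keeps the standard relation_members layout untouched, at the cost of one 
extra relation per dated membership.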
 
I hope I have outlined some ideas on how to solve the problems I have run into 
during my own mapping. I have tried to build on previous ideas voiced here on 
the list, such as what Thomas proposed. I have tried very hard to come up with 
proposals that would work without changing the data structure of OHM (except 
the one idea of changing the relation_members table), which I think will be 
crucial for OHM, since it would allow people to continue using all 
resources/editors that OSM produces and it would be backwards compatible. I 
think it is important not to deviate too much from the OSM code and database 
layout, if possible, for the project to succeed.
 
Let me know what you think
 
Hannes
 _______________________________________________
Historic mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/historic 
 --

Jeff Meyer
206-676-2347
osm: Open Historical Map (OHM) 
http://wiki.openstreetmap.org/wiki/Open_Historical_Map / my OSM user page 
http://www.openstreetmap.org/user/jeffmeyer

t: @OpenHistMap 
 
 
 
 

