Hello,

>> When trying to maintain a huge database (something akin to
>> a master game repository), I encounter an annoying problem.
>> Double-checking names easily leads to a nightmare: I have
>> to choose some name of a player without knowing exactly
>> whom to pick. Szabo, for instance, seems to be a popular
>> name for a chess player.
>
> It is. It is just a popular name anyway. VIAF lists 827
> persons of that name. And VIAF is only built for persons who
> published some book or the like...
I did not know about VIAF. Great stuff! What I mean is that there is no sense in spellchecking a big file if there are all these decisions to make by hand. If we had a tournament file with the names of the players, that kind of correction would really be a spellcheck, not a second-guessing game over too many names.

> What you suggest would be to build up a database of all
> tournaments and their participants.

Exactly, like Franz does for players' names. It should be consistent with his work, because the main idea would be to complement it.

> Eh? Why is a collection of games no real database? Do you
> mean in the IBM sense? (DB2 is a real database, Access is
> not.)
>
> BTW: I'd surely collect those metadata in a real database.

Yes, my point would be to let Scid handle this metadata for its maintenance spellchecks.

> Well, working within a central database would in fact be the
> only way to go. But you suggest building up an authority
> file (or registry, if you prefer that naming) for all chess
> games of the world. Even if you just start out with
> important tournaments of the past and work through them,
> this is a huge effort.

Exactly. In fact, once you have this tool at hand, working on a central database does not seem that crazy anymore...

>> But I don't see any solution to the lack of information,
>> except trying to build a tournament file that would be
>> orthogonal to a file of chess players' names.
>
> Actually, the real solution would be something like VIAF.

That would be the final step. For now, a text file cross-indexing tournaments and players would greatly simplify spellchecking.

> There you build up an authority file based upon the
> publications of a person. For your project, you'd build up
> the authority file containing tournaments and persons based
> upon the games of chess they played. It's very similar (just
> to avoid the word "identical") ;)
>
> Still, you miss a point here: you'll have to individualize
> the players and the games.
> You'd have to end up at an authority record for each
> tournament, connected to the authority records of each
> participating player, and all of that would be connected to
> the actual games being the backend of the whole thing.

I don't miss that point. It's just that I like bottom-up approaches. I liked correcting names by hand when I was young. But 5 million games? This is nonsense!

> It is a good idea, but it is a huge effort. Have a look at
> VIAF's record for Garry Kasparov, e.g.
>
> http://viaf.org/24621810
>
> You might want to unfold the MARC-21 section and check out
> the 400's and 700's, just to get an idea about the spellings
> of his name. And just to get a vague idea about the size of
> the project, you might call up http://viaf.org to see who is
> participating in this game. And that's only about the names
> part... (Plus we are using databases of books that we
> already have in huge catalogues, built by autopsy by trained
> personnel. Creating such a catalogue record for one book
> from scratch takes about 15 minutes on average.)

This is just too good to be true! The debate is over.

> I know that I'm right and I have quite some forces behind
> me. ;) LoC, BNB, DNB, BNF (just to name the really big ones)
> can't be wrong.

The debate about being right is not the important one: the important one is how to implement the idea in Scid's maintenance window... ;-)

> Still, I see the difficulty of getting it done in a
> spare-time project. I see a possibility to gradually work in
> linkages to VIAF.
>
> Franz: getting VIAF record IDs into the spelling file is the
> way to go to get linkage up into Wikipedia and whatever. The
> VIAF-ID is persistent, of course. (Try searching for
> 118721097 at de.wikipedia.org. :) At the moment they use
> PND-IDs, as the German Wikipedia mainly set this up with the
> DNB, but that's about to change and they'll go for VIAF-IDs
> (there's a 1:1 mapping PND <-> VIAF). Plus, if I got them
> right, this should start to spread out over the whole of
> Wikipedia.
> Wikipedia wants to reuse as much data from authority files
> as possible. Guess why. ;) The national libraries are
> currently investigating whether they could use data from
> Wikipedia to enrich their authority files. There was
> recently a talk about this issue at the Bibliothekartag in
> Erfurt.

Great stuff, once again! Keep the good ideas coming!

Bye,
B
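PS: To make the cross-index idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption, not Scid's actual spelling-file syntax or data: the entry layout, the variant spellings, and the tournament roster are made up for the example (the VIAF ID for Kasparov is the one from the record cited above; Szabo, Smyslov, and Bronstein did play Zurich 1953). The point is just that a spelling table can carry a VIAF ID next to each canonical name, and a per-tournament participant list can turn the second-guessing game into a real spellcheck via fuzzy matching.

```python
import difflib

# Hypothetical spelling entries: (canonical name, VIAF ID, known variants).
# The VIAF ID is from http://viaf.org/24621810; variants are illustrative.
SPELLINGS = [
    ("Kasparov, Garry", "24621810", ["Kasparov, G.", "Kasparow, Garri"]),
    ("Szabo, Laszlo", None, ["Szabo, L."]),
]

# Hypothetical cross-index: tournament -> participants (canonical names).
TOURNAMENTS = {
    "Zuerich 1953": ["Szabo, Laszlo", "Smyslov, Vassily", "Bronstein, David"],
}

def build_lookup(entries):
    """Map every spelling (canonical or variant) to (canonical, viaf_id)."""
    lookup = {}
    for canonical, viaf_id, variants in entries:
        lookup[canonical] = (canonical, viaf_id)
        for variant in variants:
            lookup[variant] = (canonical, viaf_id)
    return lookup

LOOKUP = build_lookup(SPELLINGS)

def correct_name(raw_name, event):
    """Resolve a name: exact spelling-file hit first, then a fuzzy match
    against the event's participant list; otherwise leave it unchanged."""
    if raw_name in LOOKUP:
        return LOOKUP[raw_name][0]
    candidates = TOURNAMENTS.get(event, [])
    close = difflib.get_close_matches(raw_name, candidates, n=1, cutoff=0.8)
    return close[0] if close else raw_name

# A known variant resolves through the spelling table; a typo resolves
# through the tournament roster; an unknown name is left alone.
print(correct_name("Kasparow, Garri", "Moscow 1985"))
print(correct_name("Szabo, Lazslo", "Zuerich 1953"))
```

The tournament roster is what removes the guesswork: with only a global name list, "Szabo, Lazslo" could be any of the many Szabos, but restricted to the Zurich 1953 participants there is exactly one plausible candidate.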
------------------------------------------------------------------------------
_______________________________________________
Scid-users mailing list
Scid-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scid-users