Hi!

On 02/25/11 20:09, f...@libero.it wrote:
> I had in mind something like this:

I'll leave the algorithmic stuff in the hands of those who know that area
best... I tend to agree with Gerd's comment that it is difficult to find a
perfect formula, so it might indeed be a good thing to make the weights
adjustable. (I'm not sure that this needs a GUI for the time being; I think
variables would be enough.) I think, however, that by playing around here one
could come up with a decent preset. Therefore, I like the approach and I
think it is indeed feasible.

To the comment that it adds a complex sorting criterion: this is right,
indeed. But I'd also say: who knows the formulae used by Google, and even if
we did, who cares? There is a famous quote by the old chancellor of Germany
that applies here and would translate as "only the result is important".
(Well, the original German formulation was slightly more colourful and
actually a bit ambiguous.) Anyway. To me it's similar to some complex
optimisation problem solved by a linear approximation formula. And we do not
really need the absolutely optimal solution here, as it "just" sorts the list
and you can still review the result. (I.e. we don't spit out only the single
best game and nothing more.) It's a bit like Google in this regard, and
sometimes you do have to go to page 25.

[...]

> For the 'interest' value I need some chess help.
>
> The actual formula is:
>
> interest = avg_elo + bonus_comment - penalty_old - penalty_earlydraw -
>            penalty_rapid - penalty_blind
>
> and
>
> bonus_comment = 400
> penalty_old = 20*each_year_old (max 400)
> penalty_earlydraw = -250 if < 30 moves; -100 if < 40 moves
> penalty_rapid = -250
> penalty_blind = 400

avg_elo, which we have now, is probably a good starting point. I wonder if
the difference in Elo should be taken into account. Given the definition of
the Elo system, it is even a measure of the probable outcome of a game.
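Just to make the idea concrete, here is a minimal sketch of the quoted
formula plus the standard Elo expected-score formula. All names are my own
illustration, not Scid code; I've read the quoted penalties as positive
magnitudes to be subtracted, since the signs in the quote look ambiguous:

```python
def expected_score(white_elo: int, black_elo: int) -> float:
    """Standard Elo expected score for White: E = 1 / (1 + 10^((Rb-Ra)/400))."""
    return 1.0 / (1.0 + 10.0 ** ((black_elo - white_elo) / 400.0))


def interest(white_elo: int, black_elo: int, year: int, current_year: int,
             num_moves: int, result: str, commented: bool,
             rapid: bool, blind: bool) -> float:
    """Illustrative transcription of the quoted 'interest' formula."""
    avg_elo = (white_elo + black_elo) / 2.0
    bonus_comment = 400 if commented else 0
    # 20 points per year of age, capped at 400 as quoted
    penalty_old = min(20 * (current_year - year), 400)
    # early-draw penalty, taken as a positive magnitude to subtract
    if result == "1/2-1/2" and num_moves < 30:
        penalty_earlydraw = 250
    elif result == "1/2-1/2" and num_moves < 40:
        penalty_earlydraw = 100
    else:
        penalty_earlydraw = 0
    penalty_rapid = 250 if rapid else 0
    penalty_blind = 400 if blind else 0
    return (avg_elo + bonus_comment - penalty_old - penalty_earlydraw
            - penalty_rapid - penalty_blind)
```

With this, a 300-point Elo gap gives the stronger side an expected score of
roughly 0.85, so an upset result really is a statistically rare (and hence
arguably interesting) event.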
So if a game, due to the Elo difference, should end 1-0 in 90% of the cases
but ends 0-1, it should probably be of more interest than a similar game that
indeed ended 1-0, no?

For "bonus_comment" I think it actually refers to several things. In Scid
speak we have "comments" referring to free-text commentary, "annotations"
referring to the presence of NAGs, and "variations" referring to their very
presence. Scid also knows their number for a given game. E.g. I have many
games that are "commented" because they contain exactly one comment, and if
you check that comment out, it should have been written to a Source header
line instead. :S So these are actually worthless comments for analysis, and a
reason why I feel that just sorting commented games to the top as most
relevant isn't the whole story.

So I had the idea that one should consider the number of
comments/variations/annotations present. I'm also not entirely convinced
that all three types of commentary are "worth the same". One could say that a
variation is worth 3x an annotation, which itself is worth 2x a text comment,
or something the like, just to put up some random numbers. I'm not really
sure that adding up the bare numbers of variations, comments and NAGs is a
perfect way to weight this, but usually I'd guess that a "heavily annotated"
game is more important than one that just contains a single "source comment"
and a lone ! on move 25. So this part is then probably something like

  bonus_comments * (A * number_variations + B * number_nags + C * number_comments)

probably starting out with A = B = C = 1 and then playing around.

Besides that, one should probably include "flags". Those can be added by a
user to mark a game for some characteristic. Think of "white opening" or
"black opening" or "pawn structure". As you'd usually flag only games that
are representative in some way, it might be a good idea to respect this
criterion. E.g.
when you analyse your opening, a game flagged "white opening" might be more
relevant, as it is probably a typical game for that setup. Or: I use a custom
flag "Lit" which points me to some printed material that is probably worth
checking out.

For the last part of your formula, I wonder: how do you know whether a game
was played in a rapid or blindfold tournament? (BTW: I think the sign of
either penalty is wrong, isn't it?) Would it then be sensible to give a bonus
for correspondence games? One could think of a "Mode" header field (which is
the way I use it, but I don't have such tags for Rapid or Blindfold). I fear,
however, that it would be a bit "expensive" to evaluate this on a large list.

Which brings me to a further question: the best games list is limited to a
certain number of games. (Gerd, this is very different from the games list.
For best games you usually don't have to sort 5*10^6 games but merely 500.)
How do you do the preselection?

> It is a huge improvement versus the old simple avg_elo, but i'm not
> really satisfied.

I do believe that, indeed.

cu
Alexander

_______________________________________________
Scid-users mailing list
Scid-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scid-users
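The commentary weighting and flag bonus discussed in the mail above could be
sketched roughly as follows. The weights A, B, C, the per-unit bonus, and
the flag values are placeholder numbers for experimentation, not anything
Scid actually uses:

```python
# Placeholder weights: start with A = B = C = 1 as suggested, then tune.
A, B, C = 1.0, 1.0, 1.0
BONUS_COMMENTS = 50  # hypothetical scale factor per commentary unit

# Hypothetical per-flag bonuses; unknown/custom flags simply count nothing.
FLAG_BONUS = {"white opening": 100, "black opening": 100, "pawn structure": 50}


def commentary_bonus(num_variations: int, num_nags: int,
                     num_comments: int) -> float:
    """Weighted sum over the three kinds of commentary Scid distinguishes."""
    return BONUS_COMMENTS * (A * num_variations + B * num_nags + C * num_comments)


def flag_bonus(flags: set) -> float:
    """Extra interest for user-set flags a game carries."""
    return sum(FLAG_BONUS.get(f, 0) for f in flags)
```

A game with two variations, three NAGs and one text comment would then score
six weighted units, while a game with only a lone "source comment" scores
one, which matches the intuition that heavily annotated games should rank
higher.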