Hi!

On 02/25/11 20:09, f...@libero.it wrote:
> I had in mind something like this:

I'll leave the algorithmic stuff in the hands of those who know that area
best... I tend to agree with Gerd's comment that it is difficult to find a
perfect formula, so it might indeed be a good thing to make the weights
adjustable. (I'm not sure that this needs a GUI for the time being; I think
variables would be enough.) I think, however, that by playing around here one
could come up with a decent preset. Therefore, I like the approach and I
think it is indeed feasible.

To the comment that it adds a complex sorting criterion: this is right,
indeed. But I'd also say: who knows the formulae used by Google, and even if
we did, who cares? There is a famous quote by the old chancellor of Germany
that applies here and would translate as "only the result is important".
(Well, the original German formulation was slightly more colourful and
actually a bit ambiguous.) Anyway. To me it's similar to some complex
optimisation problem solved by a linear approximation formula. And we do not
really need the absolutely optimal solution here, as it "just" sorts the list
and you can still review the result. (I.e. we don't spit out only the single
best game and nothing more.) It's a bit like Google in this regard, and
sometimes you do have to go to page 25.

[...]

> For the 'interest' value I need some chess help.
>
> The actual formula is:
>
> interest = avg_elo + bonus_comment - penalty_old - penalty_earlydraw -
>            penalty_rapid - penalty_blind
>
> and
>
> bonus_comment = 400
> penalty_old = 20*each_year_old (max 400)
> penalty_earlydraw = -250 if < 30 moves; -100 if < 40 moves
> penalty_rapid = -250
> penalty_blind = 400

avg_elo, which we have now, is probably a good starting point. I wonder if
the difference in Elo should be taken into account. Given the definition of
the Elo system, it is even a measure of the probable outcome of a game.
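Just to make the idea concrete, here is a minimal sketch of the quoted
formula plus the standard Elo expected-score formula. All names are my own
illustration, not Scid code; I've read the quoted penalties as positive
magnitudes to be subtracted, since the signs in the quote look ambiguous:

```python
def expected_score(white_elo: int, black_elo: int) -> float:
    """Standard Elo expected score for White: E = 1 / (1 + 10^((Rb-Ra)/400))."""
    return 1.0 / (1.0 + 10.0 ** ((black_elo - white_elo) / 400.0))


def interest(white_elo: int, black_elo: int, year: int, current_year: int,
             num_moves: int, result: str, commented: bool,
             rapid: bool, blind: bool) -> float:
    """Illustrative transcription of the quoted 'interest' formula."""
    avg_elo = (white_elo + black_elo) / 2.0
    bonus_comment = 400 if commented else 0
    # 20 points per year of age, capped at 400 as quoted
    penalty_old = min(20 * (current_year - year), 400)
    # early-draw penalty, taken as a positive magnitude to subtract
    if result == "1/2-1/2" and num_moves < 30:
        penalty_earlydraw = 250
    elif result == "1/2-1/2" and num_moves < 40:
        penalty_earlydraw = 100
    else:
        penalty_earlydraw = 0
    penalty_rapid = 250 if rapid else 0
    penalty_blind = 400 if blind else 0
    return (avg_elo + bonus_comment - penalty_old - penalty_earlydraw
            - penalty_rapid - penalty_blind)
```

With this, a 300-point Elo gap gives the stronger side an expected score of
roughly 0.85, so an upset result really is a statistically rare (and hence
arguably interesting) event.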
So if a game, due to the Elo difference, should end 1-0 in 90% of the cases
but ends 0-1, it should probably be of more interest than a similar game that
indeed ended 1-0, no?

For "bonus_comment" I think it actually refers to several things. In Scid
speak we have "comments" referring to free-text commentary, "annotations"
referring to the presence of NAGs, and "variations" referring to their very
presence. Scid also knows their number for a given game. E.g. I have many
games that are "commented" because they contain exactly one comment, and if
you check that comment out, it should have been written to a Source header
line instead. :S So these are actually worthless comments for analysis, and a
reason why I feel that just sorting commented games to the top as most
relevant isn't the whole story.

So I had the idea that one should consider the number of
comments/variations/annotations present. I'm also not entirely convinced
that all three types of commentary are "worth the same". One could say that a
variation is worth 3x an annotation, which itself is worth 2x a text comment,
or something the like, just to put up some random numbers. I'm not really
sure that adding up the bare numbers of variations, comments and NAGs is a
perfect way to weight this, but usually I'd guess that a "heavily annotated"
game is more important than one that just contains a single "source comment"
and a lone ! on move 25. So this part is then probably something like

  bonus_comments * (A * number_variations + B * number_nags + C * number_comments)

probably starting out with A = B = C = 1 and then playing around.

Besides that, one should probably include "flags". Those can be added by a
user to mark a game for some characteristic. Think of "white opening" or
"black opening" or "pawn structure". As you'd usually flag only games that
are representative in some way, it might be a good idea to respect this
criterion. E.g.
when you analyse your opening, a game flagged "white opening" might be more
relevant, as it is probably a typical game for that setup. Or: I use a custom
flag "Lit" which points me to some printed material that is probably worth
checking out.

For the last part of your formula, I wonder: how do you know whether a game
was played in a rapid or blindfold tournament? (BTW: I think the sign of
either penalty is wrong, isn't it?) Would it then be sensible to give a bonus
for correspondence games? One could think of a "Mode" header field (which is
the way I use it, but I don't have such tags for Rapid or Blindfold). I fear,
however, that it would be a bit "expensive" to evaluate this on a large list.

Which brings me to a further question: the best games list is limited to a
certain number of games. (Gerd, this is very different from the games list.
For best games you usually don't have to sort 5*10^6 games but merely 500.)
How do you do the preselection?

> It is a huge improvement versus the old simple avg_elo, but i'm not
> really satisfied.

I do believe that, indeed.

cu
Alexander

_______________________________________________
Scid-users mailing list
Scid-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scid-users
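The commentary weighting and flag bonus discussed in the mail above could be
sketched roughly as follows. The weights A, B, C, the per-unit bonus, and
the flag values are placeholder numbers for experimentation, not anything
Scid actually uses:

```python
# Placeholder weights: start with A = B = C = 1 as suggested, then tune.
A, B, C = 1.0, 1.0, 1.0
BONUS_COMMENTS = 50  # hypothetical scale factor per commentary unit

# Hypothetical per-flag bonuses; unknown/custom flags simply count nothing.
FLAG_BONUS = {"white opening": 100, "black opening": 100, "pawn structure": 50}


def commentary_bonus(num_variations: int, num_nags: int,
                     num_comments: int) -> float:
    """Weighted sum over the three kinds of commentary Scid distinguishes."""
    return BONUS_COMMENTS * (A * num_variations + B * num_nags + C * num_comments)


def flag_bonus(flags: set) -> float:
    """Extra interest for user-set flags a game carries."""
    return sum(FLAG_BONUS.get(f, 0) for f in flags)
```

A game with two variations, three NAGs and one text comment would then score
six weighted units, while a game with only a lone "source comment" scores
one, which matches the intuition that heavily annotated games should rank
higher.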