In article <[EMAIL PROTECTED]>,
  [EMAIL PROTECTED] (Donald Burrill) wrote:
> On Sat, 28 Oct 2000, Martin Boulger (impersonating haytham siala) wrote:
>
> > I have to mark scripts based on a marking scheme thus:
>
> How much of the ensuing paragraph comprises requirements externally
> imposed upon you and not subject to your control, what conditions are
> debatable, and which ones are your own choice?
>
> > 10 questions of equal weighting, grade each answer from A to F.
>
> Is this data as presented, or is part of your task the grading of each
> question?  If the latter, do you get to mark the questions _de_novo_, or
> do you receive a collection of numerical values to be classified into
> the categories A to F?  Do the categories A to F comprise 6 categories
> (as the sequel seems to imply), or are they modifiable by (e.g.) + and -
> (and ++ and -- ?) suffixes?
>
> Frankly, I always prefer to work with the original raw scores (from which
> I presume the initial letter-grades for the ten questions were derived,
> and in the process coarsened).  But my responses below are based on the
> assumption that the only information available to you is the letter
> grade for each question.
>
> > So each question, obviously, counts for 10%.
>
> In one sense, yes:  "equal weighting", in the usual sense, would imply
> this if you understand the process to entail a weighted sum of the 10
> question-grades.  It is, however, possible (if perhaps not usual) to
> devise grading schemata that give equal importance to each question but
> do not entail _adding_ the separate grades.
>
> > Given a random set of grades how do I work out a final grade?
>
> Interesting question!  Why, I wonder, would anyone be given a _random_
> set of grades?  Usually one would have all the papers for a particular
> course (or section of a course), or all the marks previously assigned to
> those enrolled, or some systematically chosen subset thereof, I should
> think.
>
> > I considered allocating for each question 10% for an "A", 8% for a
> > "B", through to 0% for an "F".  I can add up the grades for each
> > question to get a grade out of 100%.
>
> Yes, you can do this.  I would not use the symbol "%", though.  You can
> by this means produce a total score whose maximum is 100 points, but it
> is not at all clear that this score would properly represent a per
> centum of anything at all, let alone anything interesting.
>
> > However, only 70% or above = an "A", 60 - 69 a "B", 50 - 59 a "C",
> > 40 - 49 a "D", and 30 - 39 = an "E"; below this is an F.
>
> Why?  Is this scheme imposed on you, or is this just your arbitrary
> choice of cutting points for recoding the total score?
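
[Aside, not from the original thread: the proposed scheme is easy to try
out in a few lines of Python.  The point values and cut points below are
the ones stated above; the function names are mine, purely illustrative.]

```python
# Point values proposed above: A=10, B=8, ..., F=0 (an equal-interval
# assumption that is questioned later in this reply).
POINTS = {"A": 10, "B": 8, "C": 6, "D": 4, "E": 2, "F": 0}

def total_score(grades):
    """Sum the per-question points for a list of 10 letter grades."""
    return sum(POINTS[g] for g in grades)

def overall_grade(total):
    """Recode a 0-100 total using the stated cut points 70/60/50/40/30."""
    for cutoff, letter in [(70, "A"), (60, "B"), (50, "C"),
                           (40, "D"), (30, "E")]:
        if total >= cutoff:
            return letter
    return "F"

# The "dysfunction" noted later in this reply: ten straight B's sum
# to 80 points, which these cut points recode as an overall A.
print(overall_grade(total_score(["B"] * 10)))  # -> A
```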
>
> > So my idea doesn't work (and wouldn't be very accurate anyway).
>
> In what sense(s) does it not work?  (Apart from some obvious
> dysfunctions, like 10 B's on the original questions magically turning
> into an A for the overall grade, which some folks might not consider
> reasonable.)  As for accuracy, what kind of accuracy do you want to
> attain, and how do you wish to define "accuracy"?
>
> > Any clues to a statistical technique appropriate for this problem?
>
> Not without summat more in the way of detail, including what I suspect
> to be a whole bunch of hidden assumptions.  (What, for instance, do the
> original 10 questions represent?  They might be different domains on a
> single final examination;  or a chronological series of midterm tests in
> some course of study;  or some peculiar combination of these;  or
> something entirely else.  Depending, I'd have different proposals for
> how to handle the aggregation process.)
>
> You might begin by considering how you want the results to turn out.
> One way of operationalizing "equal weight" is to report each case's
> marks in lexicographical order, e.g.  AAAAB BBBCD  for any case that
> has four As, four Bs, one C and one D, ignoring which particular
> question earned each of these marks.  Now consider the table of possible
> marks (or, if you prefer, restrict your attention to the actual
> distribution of marks you have to deal with;  but better, I think, to
> consider the population first, then the sample).  I report some of them,
> in groups of five for readability:
>
> AAAAA AAAAA
> AAAAA AAAAB
> AAAAA AAAAC
> AAAAA AAAAD
> AAAAA AAAAE
> AAAAA AAAAF
> AAAAA AAABB
>  . . .
>
> EFFFF FFFFF
> FFFFF FFFFF
>
> For which of these patterns ought the output to be "A"?  ... "B"?  Etc.?
>
> This is perhaps easier to contemplate in the abstract if we consider the
> task for four marks to be aggregated (instead of ten).  The 126 possible
> patterns are these:
>
> AAAA  AACF    ABCE    ACDF    BBBB    BBFF    BDEE    CCEE    DDDE
> AAAB  AADD    ABCF    ACEE    BBBC    BCCC    BDEF    CCEF    DDDF
> AAAC  AADE    ABDD    ACEF    BBBD    BCCD    BDFF    CCFF    DDEE
> AAAD  AADF    ABDE    ACFF    BBBE    BCCE    BEEE    CDDD    DDEF
> AAAE  AAEE    ABDF    ADDD    BBBF    BCCF    BEEF    CDDE    DDFF
> AAAF  AAEF    ABEE    ADDE    BBCC    BCDD    BEFF    CDDF    DEEE
> AABB  AAFF    ABEF    ADDF    BBCD    BCDE    BFFF    CDEE    DEEF
> AABC  ABBB    ABFF    ADEE    BBCE    BCDF    CCCC    CDEF    DEFF
> AABD  ABBC    ACCC    ADEF    BBCF    BCEE    CCCD    CDFF    DFFF
> AABE  ABBD    ACCD    ADFF    BBDD    BCEF    CCCE    CEEE    EEEE
> AABF  ABBE    ACCE    AEEE    BBDE    BCFF    CCCF    CEEF    EEEF
> AACC  ABBF    ACCF    AEEF    BBDF    BDDD    CCDD    CEFF    EEFF
> AACD  ABCC    ACDD    AEFF    BBEE    BDDE    CCDE    CFFF    EFFF
> AACE  ABCD    ACDE    AFFF    BBEF    BDDF    CCDF    DDDD    FFFF
>
> Of these, there are some that would generally be agreed to represent "A"
> work overall (AAAA and AAAB surely, probably AAAC and AABB, maybe AAAD?)
> (and similarly for "B", "C", etc.);  and some that are less easy to
> categorize into a set of six ranks (e.g., AAAF in the above list).
> Once one has wrapped one's mind around the underlying task, it may be
> more straightforward (if no less tedious!) to devise an algorithm that
> approximates that categorization acceptably.  Simple arithmetic is
> unlikely to be subtle enough, though.
>       (And of course, even for simple arithmetic it is unclear whether
> the grading scheme should be treated as an equal-interval scale.  I
> would argue that it shouldn't.)
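
[Another aside, not from the thread: the pattern tables above can be
generated mechanically rather than by hand.  A short Python sketch --
the helper name `patterns` is mine -- using the standard library:]

```python
from itertools import combinations_with_replacement

def patterns(n, grades="ABCDEF"):
    """Every unordered pattern of n grades from A..F, in the
    lexicographic order used in the tables above."""
    return ["".join(c) for c in combinations_with_replacement(grades, n)]

four = patterns(4)
print(len(four))           # 126 patterns, matching the table
print(four[0], four[-1])   # AAAA FFFF

# For the full ten-question case the table grows to C(15, 10) entries:
print(len(patterns(10)))   # 3003
```

This also shows why enumerating the ten-mark population by hand is
impractical: 3003 patterns would each need a category.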
>
> I don't suppose this has made what you thought you were about any
> easier;  sorry about that!  On the other hand, I didn't read you as
> asking (at least, not explicitly) for "easier".
>                                               -- DFB.
>  ----------------------------------------------------------------------
>  Donald F. Burrill                                     [EMAIL PROTECTED]
>  348 Hyde Hall, Plymouth State College,                [EMAIL PROTECTED]
>  MSC #29, Plymouth, NH 03264                             (603) 535-2597
>  Department of Mathematics, Boston University          [EMAIL PROTECTED]
>  111 Cummington Street, room 261, Boston, MA 02215       (617) 353-5288
>  184 Nashua Road, Bedford, NH 03110                      (603) 471-7128
>
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>                   http://jse.stat.ncsu.edu/
> =================================================================
>




