How The
Optimal Classification Scaling (OC) Program Works In One Dimension:
The "Edith" Algorithm
7 May 2001
YYYYYYYYYYYYYYYYYNNNNNNNNNNNNNNNNNN * NNNYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY * YYYYYYYYNNNNNNNNNNNNNNNNNNNNNNNNNNN * Etc.
The asterisks indicate the "cutting point" for the roll call; that is, the midpoint of the Yea and Nay outcomes. A legislator located exactly on the midpoint would be indifferent between voting Yea or voting Nay.
Suppose roll call voting is in accord with this model. Then the scaling problem consists of taking a roll call matrix and "unscrambling" it – that is, finding a rank ordering of legislators and the correct "polarity" (Yea to the left of the cutpoint or Yea to the right of the cutpoint) for each roll call such that patterns like those above are produced for each roll call. This is what Edith is designed to do. If the data is perfect, then the solution is easy and the correct rank ordering is always found (see Proof that if Voting is Perfect in One Dimension, then the First Eigenvector Extracted from the Double-Centered Transformed Agreement Score Matrix has the Same Rank Ordering as the True Data).
However, when there is error in the data, things get a bit complicated. When there is error the aim is to find a rank ordering that maximizes the correct classification of the observed Yeas and Nays. This is easier said than done because if there are n legislators, then there are n!/2 possible rank orders to check to find the best one. For example, for 50 legislators this number is about 1.52 * 1064 – a formidable number even with modern supercomputers. Consequently, Edith embodies a "sensible" search procedure (what the Operations Researchers call a "Heuristic") to find a solution. Namely, a good starting rank order of the legislators is generated using the technique discussed for the perfect voting problem ( Proof that if Voting is Perfect in One Dimension...) and the corresponding cutting points are found. These cutting points are used to get a new rank ordering of the legislators, and so on. At each step the correct classification increases until a rank order is found that produces cutting points that in return reproduce the rank order.
Finding the optimal rank ordering requires three steps:
To see how this process works, let the current ordering of the legislators
from left to right be represented by coordinates from x1 to
xp such that:
-1 £ x1 £ x2 £ x3 £ ... £ xp £ +1.
(Note that "£ " allows for tied ranks – that is, identical voting patterns by two or more legislators.) Now, suppose on the jth roll call the legislators voted as indicated by the pattern of "y"s and "n"s shown above the dimension line in the Figure below. There are p+1 possible regions that the cutting point could be in:(-1, x1), (x1 , x2), ... , (xp , +1)
and for each region there are exactly 2 possible perfect voting patterns for an overall total of 2(p+1) possible perfect voting patterns. However, region (xp , +1) is redundant since it produces the same perfect patterns as the region (-1 , x1) so it may be discarded leaving 2p unique perfect voting patterns to consider.Actual Voting Pattern
Y Y Y Y Y Y . . . . Y Y * N Y * N Y * N N . . .N N
___________________________________________________________
-1.0 x1 x2 x3 x4 . . . . . . . 0.0 . . . . . . . xp-1 xp +1.0
Perfect Voting Patterns
(-1 , x1) produces nnnnnnnnn.....nn or yyyyyyyyy.....yy
(x1 , x2) produces ynnnnnnnn.....nn or nyyyyyyyy.....yy
(x2 , x3) produces yynnnnnnn.....nn or nnyyyyyyy.....yy
(x3 , x4) produces yyynnnnnn.....nn or nnnyyyyyy.....yy
(x4 , x5) produces yyyynnnnn.....nn or nnnnyyyyy.....yy
(x5 , x6) produces yyyyynnnn.....nn or nnnnnyyyy.....yy
etc.
(xp-1 , xp) produces yyyyyyyyy.....yn or nnnnnnnnn.....ny
(xp , +1) produces yyyyyyyyy.....yy or nnnnnnnnn.....nn
Since there are only 2p perfect patterns, it is a simple matter to compare each perfect pattern with the actual pattern of votes. This can be done very efficiently by first assuming that the cutting point is in the region (-1 , x1) and calculating the corresponding number of correct classifications. Next assume that the cutting point is in the region (x1 , x2). Only one calculation has to be made to get the correct classifications for this cutting point since the only change is that the cutting point has been moved from the left of x2 to the right of x2. If there is no missing data, either the correct classification increases by 1 or decreases by 1 when the cutting point is moved from the left of x2 to the right of x2. Similar reasoning holds for the remaining points. For each possible cutting point the correct classification corresponding to the two possible perfect patterns can be calculated. The estimated cutting point is set equal to the midpoint of the region for which correct classification is a maximum. This requires only 2p calculations to find the maximum classification cutting point. For the example shown above, placing the cutting point at the position of any of the three asterisks would produce only two classification errors for a correct classification of p-2.
Let the ordering of the q roll call midpoints estimated above be represented by
coordinates from z1 to zq such that:
-1 £ z1 £ z2 £ z3 £ ... £ zq £ +1.
(Note that "£ " allows for tied ranks – that is, identical cutting points for two or more roll calls – a common occurrence empirically.) In order to estimate new legislator coordinates, the "polarity" of each roll call must be taken into account in each legislator’s voting pattern. For each roll call, the cutting point represents the dividing point between the Yeas and the Nays. If a legislator is to the left of the cutting point and votes correctly on the roll call, then denote this with an "L". If a legislator is to the right of the cutting point and votes correctly on the roll call, denote this with a "C". If a legislator is incorrectly classified on the roll call, then denote this with a "C" if the legislator is to the left of the cutting point and an "L" if the legislator is to the right of the cutting point. Note that if the legislator votes perfectly, then vis a vis the cutting points the legislator’s voting pattern will look like:LLLLLLLLLCCCCCCCCCCCCCCCCCCCCCCCCC
or
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLCCCCCC
The Figure below shows that by taking the polarity of the roll calls into account the legislator problem is equivalent to the roll call cutting point problem. Hence, the algorithm described above can be used to find the optimal legislator location. There are only 2q possible perfect patterns and only 2q calculations are required to find the maximum classification location for each legislator.
Actual Voting Pattern
L L L L L L . . . . L L * C L * C L * C C . . .C C
___________________________________________________________
-1.0 z1 z2 z3 z4 . . . . . . . 0.0 . . . . . . . zq-1 zq +1.0
Perfect Voting Patterns
(-1 , z1) produces CCCCCCCCC.....CC or LLLLLLLLL.....LL
(z1 , z2) produces LCCCCCCCC.....CC or CLLLLLLLL.....LL
(z2 , z3) produces LLCCCCCCC.....CC or CCLLLLLLL.....LL
(z3 , z4) produces LLLCCCCCC.....CC or CCCLLLLLL.....LL
(z4 , z5) produces LLLLCCCCC.....CC or CCCCLLLLL.....LL
(z5 , z6) produces LLLLLCCCC.....CC or CCCCCLLLL.....LL
etc.
(zq-1 , zq) produces LLLLLLLLL.....LC or CCCCCCCCC.....CL
(zq , +1) produces LLLLLLLLL.....LL or CCCCCCCCC.....CC
This simple algorithm converges very rapidly to a solution in which the rank ordering of the legislators and the rank ordering of the roll call midpoints reproduce each other. This is a very strong form of conditional global maximum. Technically, if there are multiple sets of parameters (for example, as in this case, parameters corresponding to rows of a matrix and parameters corresponding to columns of a matrix), and every set of parameters is at a global maximum conditioned on the other sets being held fixed, and these sets reproduce each other, then they are at a conditional global maximum. Note that the overall global maximum, by definition, is a conditional global maximum.
Conditional Global maxima are quite rare because of the nature of the constraints. Indeed, in the metric similarities problem, conditional global minima are very rare and their number declines as the number of parameters increases (Poole, 1990).
A simple illustration of how Edith works can be done using playing cards. Below is the true configuration with 10 legislators represented by the Ace through the Ten of diamonds. The cutting points of nine roll calls are represented by the Ace through the Nine of Spades. Thus, the cutting point of the first roll call is between legislator one and two, and so on.