22 July 2001

HOW TO USE THE QUADRATIC-NORMAL (QN) PROGRAM

The file QN.ZIP includes the following files:

 1. READTHIS.TXT
 2. QUADMD00.EXE -- Quadratic-Normal program
 3. QUADMD00.FOR -- FORTRAN Source Code
 4. QUADSTRT.DAT -- Input file that controls the program
 5. SEN85KH.ORD  -- 85th Senate Roll Call Data
 6. QUAD0021.DAT -- Output file that reports overall statistics
 7. QUAD0023.DAT -- Output file that reports various diagnostics
 8. QUAD0028.DAT -- Output file that contains the starting values for the
                    normal vectors estimated from the Optimal Classification
                    (OC) Program (2 or more dimensions only)
 9. QUAD0038.DAT -- Output file that contains the starting values for the
                    legislators estimated from the OC Program (2 or more
                    dimensions only)
10. QUAD0048.DAT -- Output file that contains the estimated legislator
                    coordinates
11. QUAD0058.DAT -- Output file that contains the estimated normal vectors

Unzip the files into the same directory. The program reads QUADSTRT.DAT and SEN85KH.ORD and writes the files QUAD0021.DAT, QUAD0023.DAT, QUAD0028.DAT, QUAD0038.DAT, QUAD0048.DAT and QUAD0058.DAT to disk. The program expects the input files to be in the same directory as the executable.

*****************************************
QUADMD00.EXE
*****************************************

The Quadratic-Normal program is written in standard FORTRAN 77 and was compiled by the LAHEY FORTRAN 95 Compiler (release 5.60d). The compiler is current as of 2000 and is optimized for the Pentium III. LAHEY FORTRAN includes the object code for the IMSL library (the program includes a number of IMSL subroutines). The Linker (included with the LAHEY FORTRAN) is from Phar Lap. QUADMD00.EXE will run on any standard WINDOWS machine using a Pentium CPU.

*****************************************
QUADMD00.FOR
*****************************************

The Quadratic-Normal program is written in standard FORTRAN 77 and it uses several IMSL FORTRAN subroutines.
It should compile on any standard UNIX machine with no difficulty if the compiler can link in the IMSL libraries.

*****************************************
QUADSTRT.DAT
*****************************************

QUADSTRT.DAT controls the program (in the "old days" of card readers this file was known as a "control card file"). It tells the program the name of the roll call data file, how many roll calls there are, how many dimensions to estimate, and so on. Below is QUADSTRT.DAT set up to run the 85th Senate in two dimensions:

SEN85KH.ORD
QUADRATIC-NORMAL MULTIDIMENSIONAL UNFOLDING
    2  307   10   25    7
(3X,7A1,2X,11A1,2X,7A1,4X,3600I1)
(I5,1X,25A1,2I5,50F8.3)
(I5,1X,25A1,2I5,F8.3,F12.3,50F8.3)

The file has *six* lines all of which must be present:

A. Line 1: The name of the roll call voting file. This can be a path statement; e.g., F:\DTAORD\SEN85KH.ORD.

B. Line 2: A title -- it can be anything.

C. Line 3: This line consists of five 5-digit integers. ALWAYS BE CERTAIN THAT YOU RETAIN THIS FORMAT! Technically, the line above is:

0000200307000100002500007

   The first number (00002) is the number of dimensions (25 max);
   the second number (00307) is the number of roll calls;
   the third number (00010) is the number of iterations;
   the fourth number (00025) is the number of characters the program is to read off the header of the roll call file;
   the fifth number (00007) is the number of the legislator that should be on the negative (left) side of the first dimension. This is simply included for convenience. It allows the user to determine what is "left" or "right" on the first dimension.

   Line 3 is telling the program: estimate a two dimensional spatial map for the 85th Senate; the roll call file has 307 roll calls; iterate 10 times; pick up 25 characters from the header of each record from SEN85KH.ORD; and Senator number 7, Fulbright of Arkansas, should have a negative coordinate.

D. Line 4: Format statement for reading SEN85KH.ORD.
   This is a FORTRAN format statement that controls what the program reads from SEN85KH.ORD. SEN85KH.ORD looks like this:

 859990199 0USA     20000EISENHOWER 9991969991999916169......etc.
 85 876441 0ALABAMA 10001SPARKMAN   6161141661111161166......
 85 441841 0ALABAMA 10001HILL       6161161611111163166......
 85 365861 0ARIZONA 20001GOLDWATER  1111116611666617775......
 85 422761 0ARIZONA 10001HAYDEN     6161161661666161166......
 85 615142 0ARKANSA 10001MCCLELLAN  6161111116666161766......
 85 338842 0ARKANSA 10001FULBRIGHT  6161377774166161366......
 etc etc
 85  46468 0WYOMING 20001BARRETT    1161166611666617111......

    1. The first three digits -- 085 -- are the Congress number,
    2. the next five digits are the ID number for the member (for President Eisenhower this number is 99901),
    3. the next two digits are the ICPSR state code (99 is used only for the President),
    4. the next two digits are the Congressional District number (this is 00 for the Senate),
    5. the next seven letters are the state name,
    6. the next 4 digits are the political party code (0100 = Democrat; 0200 = Republican, for a complete list go to http://voteview.uh.edu/party3.htm),
    7. the next digit is the ICPSR election code (see any ICPSR codebook),
    8. the next digit is the occupancy code (see any ICPSR codebook),
    9. the next eleven letters are the member's name,
   10. The roll calls begin at column 37 (1=Yea, 2=Paired Yea, 3=Announced Yea, 4=Announced Nay, 5=Paired Nay, 6=Nay, 0=not in Congress, 7,8,9 are missing data).

   All roll calls for Congresses 1 - 101 were gathered by the ICPSR. Congresses 102-106 were created by Keith Poole and follow the ICPSR format.
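The record layout listed above can be sketched in Python for readers who want to inspect a roll call file outside the FORTRAN program. This is only an illustrative sketch: the field names, the `parse_record` and `vote_kind` helpers, and the hand-padded sample record are assumptions made for the example, not part of the QN distribution.

```python
# Field widths follow the record layout described above (columns 1-36);
# the field names themselves are hypothetical labels chosen for this sketch.
FIELDS = [
    ("congress", 3),
    ("icpsr_id", 5),
    ("state_code", 2),
    ("district", 2),
    ("state_name", 7),
    ("party", 4),
    ("election_code", 1),
    ("occupancy", 1),
    ("name", 11),
]

YEA_CODES = {1, 2, 3}   # Yea, Paired Yea, Announced Yea
NAY_CODES = {4, 5, 6}   # Announced Nay, Paired Nay, Nay


def parse_record(line):
    """Split one fixed-width record into header fields and a list of vote codes."""
    record = {}
    pos = 0
    for field, width in FIELDS:
        record[field] = line[pos:pos + width].strip()
        pos += width
    # Roll calls start at column 37, i.e. string index 36, one digit per vote.
    record["votes"] = [int(ch) for ch in line[pos:].strip()]
    return record


def vote_kind(code):
    """Collapse the ten vote codes into the four categories given in the text."""
    if code in YEA_CODES:
        return "yea"
    if code in NAY_CODES:
        return "nay"
    if code == 0:
        return "not in Congress"
    return "missing"


# A sample record padded by hand to the documented field widths.
sample = (" 85" + " 8764" + "41" + " 0" + "ALABAMA" + " 100" + "0" + "1"
          + "SPARKMAN".ljust(11) + "6161141661111161166")

rec = parse_record(sample)
print(rec["name"], rec["party"], [vote_kind(v) for v in rec["votes"][:4]])
```

Reading the header as separate named fields is a convenience of the sketch; the program itself reads the header as 25 single characters, as its format statement specifies.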
   The format statement instructs the program to skip the three digit Congress number ("3X"), read the id number and state number as letters ("7A1"), skip the Congressional District number ("2X"), read the seven letter State name and four digit party code as letters ("11A1"), skip the election and occupancy codes ("2X"), read seven letters of the name ("7A1"), skip the last four letters of the name ("4X"), and read a maximum of 3600 roll call votes as one digit integers ("3600I1"). Note that the number of "A1"s in the format statement adds to 25 (see line three): 7+11+7.

E. Line 5: Format statement for writing the starting values for the legislator coordinates to QUAD0038.DAT. This is a FORTRAN format statement that controls what the program writes to disk. The output looks like this:

1 9990199USA 200EISENHO 6 132 0.955 0.021 0.540 -0.531
2 876441ALABAMA 100SPARKMA 29 255 0.886 0.015 -0.437 0.105
3 441841ALABAMA 100HILL 31 253 0.877 0.008 -0.442 0.113
4 365861ARIZONA 200GOLDWAT 24 232 0.897 0.006 0.563 0.329
5 422761ARIZONA 100HAYDEN 21 247 0.915 0.004 -0.289 -0.076
etc etc
99 46468WYOMING 200BARRETT 49 252 0.806 0.007 0.426 -0.055

   The "I5" is a 5-digit integer that simply counts from 1 to the total number of scaled legislators (99 Senators for the 85th).

   The "25A1" is the 25 characters read from SEN85KH.ORD (5-digit Id number, 2-digit ICPSR state code, 7 letter state name, 4-digit party code, 7 letters of member name).

   The "2I5" writes two 5-digit numbers -- the first is the total classification errors, the second is the total number of votes that the member voted on.

   The "50F8.3" writes in order:

   1. The proportion correctly classified (e.g., for Eisenhower, (132-6)/132=0.955, or 95.5% correct classification)
   2. The maximum volume of the polytope that the legislator is within. Geometrically, the legislators are only defined up to a polytope.
      This number is the maximum distance from the legislator point to a boundary using 100 randomly chosen directions from the legislator's point within the polytope.
   3,4,5,etc. The coordinates for the legislator.

F. Line 6: Format statement for writing the estimated legislator coordinates to QUAD0048.DAT. This is a FORTRAN format statement that controls what the program writes to disk. The output looks like this:

1 9990199USA 200EISENHO 124 8 0.939 -20.287 0.858 0.490 -0.702 0.610 0.059 0.133
2 876441ALABAMA 100SPARKMA 208 47 0.816 -84.554 0.718 -0.473 0.141 0.926 0.042 0.030
3 441841ALABAMA 100HILL 205 48 0.810 -84.631 0.716 -0.473 0.143 0.943 0.043 0.030
4 365861ARIZONA 200GOLDWAT 190 42 0.819 -77.803 0.715 0.718 0.329 1.786 0.106 0.087
5 422761ARIZONA 100HAYDEN 221 26 0.895 -63.058 0.775 -0.218 0.007 0.595 0.034 0.029
etc etc
99 46468WYOMING 200BARRETT 190 62 0.754 -114.906 0.634 0.400 0.100 1.667 0.058 0.044

   The "I5" is a 5-digit integer that simply counts from 1 to the total number of scaled legislators (99 Senators for the 85th).

   The "25A1" is the 25 characters read from SEN85KH.ORD (5-digit Id number, 2-digit ICPSR state code, 7 letter state name, 4-digit party code, 7 letters of member name).

   The "2I5" writes two 5-digit numbers -- the first is the total correct classifications, the second is the total number of errors.

   The "F8.3,F12.3,50F8.3" writes in order:

   1. The proportion correctly classified (e.g., for Eisenhower, (124)/(124+8)=0.939, or 93.9% correct classification)
   2. The Log-Likelihood.
   3. Geometric Mean Probability = EXP(Log-L/total number of choices)
   4. First dimension coordinate
   5. Second dimension coordinate
   6. Legislator Sigma (defined up to a multiplicative constant for all choices)
   7. Standard Error First Dimension coordinate (conditional standard error from inverting the information matrix for the legislators with all other parameters taken as given).
   8.
      Standard Error Second Dimension coordinate

*****************************************
QUAD0021.DAT
*****************************************

QUAD0021.DAT contains the overall estimation results. The first seven lines consist of a time stamp and an echo of the QUADSTRT.DAT file:

22 JULY 2001 14.47.30.81.
SEN85KH.ORD
QUADRATIC-NORMAL MULTIDIMENSIONAL UNFOLDING
    2  307   10   25    7
(3X,7A1,2X,11A1,2X,7A1,4X,3600I1)
(I5,1X,25A1,2I5,50F8.3)
(I5,1X,25A1,2I5,F8.3,F12.3,50F8.3)

*************Two or More Dimensions*************

After a row of asterisks, the scaling results are shown. In the first phase the OC program is used to generate starting estimates for the roll call normal vectors and the legislator coordinates:

RC CLASSIFICATION ERROR 1 2 2757 23229 0.11869 0.88131
LEG CLASSIFICATION ERROR 1 2 2652 23229 0.11417 0.88583
RC CLASSIFICATION ERROR 2 2 2597 23229 0.11180 0.88820
LEG CLASSIFICATION ERROR 2 2 2564 23229 0.11038 0.88962
RC CLASSIFICATION ERROR 3 2 2549 23229 0.10973 0.89027
LEG CLASSIFICATION ERROR 3 2 2538 23229 0.10926 0.89074
RC CLASSIFICATION ERROR 4 2 2534 23229 0.10909 0.89091
LEG CLASSIFICATION ERROR 4 2 2529 23229 0.10887 0.89113
RC CLASSIFICATION ERROR 5 2 2525 23229 0.10870 0.89130
LEG CLASSIFICATION ERROR 5 2 2518 23229 0.10840 0.89160

Each iteration consists of a pass through the Cutting Plane Procedure ("RC") and the Legislator Procedure ("LEG"). The columns are:

   1. Iteration number;
   2. number of dimensions;
   3. number of classification errors;
   4. total number of choices;
   5. proportion error (e.g., for iteration 1, 2757/23229=0.11869);
   6.
      proportion correctly classified (e.g., for iteration 1, 1-0.11869=0.88131);

The program then estimates the QN model:

LOG-LIKELIHOOD 2D/SIGMA PHASE 23229 -7683.36670 0.71837
LOG-LIKELIHOOD SIGMAi PHASE 23229 -7261.86328 0.73153
LOG-LIKELIHOOD 2D/SIGMA PHASE 23229 -7202.21191 0.73341
LOG-LIKELIHOOD SIGMAi PHASE 23229 -7193.51367 0.73368
LOG-LIKELIHOOD & CLASS LEG PHASE 3051 23229 -7036.46094 0.73866
LOG-LIKELIHOOD & CLASS RC PHASE 3145 23229 -6826.54883 0.74537
LOG-L & CLASS N-VECTOR PHASE 3142 23229 -6673.54492 0.75029
LOG-LIKELIHOOD 2D/SIGMA PHASE 23229 -6618.81689 0.75206
LOG-LIKELIHOOD SIGMAi PHASE 23229 -6585.82959 0.75313
LOG-LIKELIHOOD 2D/SIGMA PHASE 23229 -6583.26318 0.75321
LOG-LIKELIHOOD SIGMAi PHASE 23229 -6582.96875 0.75322
LOG-LIKELIHOOD & CLASS LEG PHASE 3122 23229 -6560.29395 0.75396
LOG-LIKELIHOOD & CLASS RC PHASE 3124 23229 -6541.32227 0.75457
LOG-L & CLASS N-VECTOR PHASE 3108 23229 -6530.89160 0.75491
LOG-LIKELIHOOD 2D/SIGMA PHASE 23229 -6521.57666 0.75522
LOG-LIKELIHOOD SIGMAi PHASE 23229 -6514.29980 0.75545
LOG-LIKELIHOOD 2D/SIGMA PHASE 23229 -6513.64746 0.75547
LOG-LIKELIHOOD SIGMAi PHASE 23229 -6513.62207 0.75547
LOG-LIKELIHOOD & CLASS LEG PHASE 3112 23229 -6506.25586 0.75571
LOG-LIKELIHOOD & CLASS RC PHASE 3112 23229 -6500.45850 0.75590
LOG-L & CLASS N-VECTOR PHASE 3122 23229 -6497.93213 0.75599

Each iteration consists of two passes through:

   A. Estimate 2*Gamma's (the roll call directional distance/variance parameter);
   B. Estimate Sigma's (the legislator variances);

then a pass through:

   C. Estimate Legislator Coordinates;
   D. Estimate Roll Call Midpoints;
   E. Estimate Normal Vectors.

The columns are:

   1. number of classification errors (legislator, midpoints, normal vectors, only);
   2. total number of choices;
   3. Log-Likelihood;
   4.
      Geometric Mean Probability = EXP(Log-L/total number of choices)

Below the last iteration are the lines:

CLASSIFICATION CHECK AT CONVERGENCE
20107 3122 23229 7675 0.8656 0.5932
LEGS: Rs BTWN STARTS & ESTIMATES 1 99 0.9687 0.9717
LEGS: Rs BTWN STARTS & ESTIMATES 2 99 0.9698 0.9722

The line

20107 3122 23229 7675 0.8656 0.5932

shows the total correct classification, the number of classification errors, the total number of choices, the total number of choices on the minority side, the proportion correctly classified (20107/23229 = .8656), and the aggregate proportional reduction in error (APRE = (7675-3122)/7675 = .5932).

The lines

LEGS: Rs BTWN STARTS & ESTIMATES 1 99 0.9687 0.9717
LEGS: Rs BTWN STARTS & ESTIMATES 2 99 0.9698 0.9722

show the Spearman and Pearson Correlations for the first and second dimensions, respectively, between the starting estimates and the final estimates of the legislator coordinates. For each dimension the n is 99. The Spearman Correlations for the first and second dimensions are .9687 and .9698, respectively, and the corresponding Pearson Correlations are .9717 and .9722, respectively.

*************One Dimension*************

In one dimension, the output for the starting coordinates is slightly different:

1 ROLL CALLS 1 3886 23229 0.16729 0.83271 0.49368
2 LEGISLATORS 1 3702 23229 0.15937 0.84063 0.51765 0.97951
3 ROLL CALLS 1 3608 23229 0.15532 0.84468 0.52990
4 LEGISLATORS 1 3588 23229 0.15446 0.84554 0.53251 0.99599
5 ROLL CALLS 1 3562 23229 0.15334 0.84666 0.53590
6 LEGISLATORS 1 3546 23229 0.15265 0.84735 0.53798 0.99824
7 ROLL CALLS 1 3536 23229 0.15222 0.84778 0.53928
8 LEGISLATORS 1 3536 23229 0.15222 0.84778 0.53928 0.99666
9 ROLL CALLS 1 3528 23229 0.15188 0.84812 0.54033
10 LEGISLATORS 1 3528 23229 0.15188 0.84812 0.54033 0.99969

Ignoring the column "ROLL CALLS"/"LEGISLATORS", the columns are:

   1. Iteration number;
   2. number of dimensions;
   3. number of classification errors;
   4. total number of choices;
   5.
      proportion error (e.g., for iteration 1, 3886/23229=0.16729);
   6. proportion correctly classified (e.g., for iteration 1, 1-0.16729=0.83271);
   7. Aggregate Proportional Reduction in Error (APRE) is the total number of minority choices minus the number of classification errors divided by the total number of minority choices: APRE = (MIN - ERROR)/MIN. For the 85th Senate, there were 7675 choices in the minority so for iteration 1, (7675-3886)/7675=0.49368;
   8. For "LEGISLATORS", the Pearson Correlation between the current legislator coordinates and the legislator coordinates from the previous iteration.

The QN output for one dimension is the same as that for two or more dimensions.

*****************************************
QUAD0023.DAT
*****************************************

QUAD0023.DAT contains miscellaneous diagnostic output, most of which is not important for the average user of the Quadratic-Normal program. It is intended for debugging purposes should the program misbehave for some reason. Some sections of the file do contain important information for the user. In particular, near the top of the file the number of roll calls and legislators included in the scaling is reported:

ROLL-CALLS READ= 307 NUMBER REJECTED= 52 NUMBER ACCEPTED= 255 CUTOFF= 0.025
LEGISLATORS READ= 102 NUMBER REJECTED= 3 NUMBER ACCEPTED= 99 CUTOFF= 25

The cutoff for inclusion of a roll call is 2.5 percent or better in the minority. For example, in the Senate, this cutoff includes all votes 97-3 to 50-50. The cutoff for inclusion of a legislator is 25 votes.
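The roll call inclusion rule and the APRE formula given above can be checked with a few lines of Python. This is a minimal sketch, not code from the program: the function names are hypothetical, and how the program treats a margin that sits exactly on the 2.5 percent boundary, or how it counts paired and announced votes, is an assumption here.

```python
def is_scalable(yeas, nays, cutoff=0.025):
    """True if the minority side holds at least `cutoff` of the votes cast.

    Assumption: the boundary case (exactly 2.5 percent) counts as included.
    """
    total = yeas + nays
    if total == 0:
        return False
    return min(yeas, nays) / total >= cutoff


def apre(minority_choices, errors):
    """Aggregate Proportional Reduction in Error: (MIN - ERROR) / MIN."""
    return (minority_choices - errors) / minority_choices


# A 97-3 vote passes the cutoff; a 98-2 vote does not.
print(is_scalable(97, 3), is_scalable(98, 2))

# One-dimensional start, iteration 1, and the two-dimensional converged fit,
# using the counts reported in QUAD0021.DAT.
print(round(apre(7675, 3886), 5))
print(round(apre(7675, 3122), 4))
```

The two APRE values reproduce the 0.49368 and 0.5932 figures quoted from QUAD0021.DAT.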
Just below this information the distribution of the roll call margins is shown:

DISTRIBUTION OF SCALABLE ROLL CALLS
1 50 - 55 51 0.200
2 56 - 60 35 0.137
3 61 - 65 38 0.149
4 66 - 70 36 0.141
5 71 - 75 32 0.125
6 76 - 80 27 0.106
7 81 - 85 14 0.055
8 86 - 90 9 0.035
9 91 - 95 7 0.027
10 96 - 97.5 6 0.024

A bit further down the file the first ten eigenvalues of the double-centered transformed agreement score matrix are shown along with the eigenvalues of the Heckman-Snyder covariance matrix (see the Appendix to CONGRESS: A POLITICAL-ECONOMIC HISTORY OF ROLL CALL VOTING, pp. 243-244, for how the agreement score matrix is computed and transformed). The first column after the integer counter is the eigenvalue, the second is the percent of the variance "explained", and the third is the cumulative percent of the variance "explained". The last three columns are the same information for the Heckman-Snyder covariance matrix.

1 5.7680 35.9679 35.9679 6.8380 32.2437 32.2437
2 2.5681 16.0140 51.9819 3.2356 15.2571 47.5008
3 0.7310 4.5582 56.5402 1.0460 4.9322 52.4330
4 0.4821 3.0064 59.5466 0.7165 3.3786 55.8115
5 0.3498 2.1812 61.7278 0.6209 2.9280 58.7395
6 0.3288 2.0505 63.7782 0.5180 2.4427 61.1821
7 0.2635 1.6430 65.4213 0.4899 2.3098 63.4920
8 0.2281 1.4223 66.8435 0.4024 1.8976 65.3896
9 0.1844 1.1496 67.9932 0.3424 1.6146 67.0042
10 0.1775 1.1071 69.1003 0.3055 1.4407 68.4449

*****************************************
QUAD0028.DAT !!!!!TWO OR MORE DIMENSIONS ONLY!!!!!
*****************************************

**********ROLL CALL STARTING COORDINATES**********

In two dimensions the roll call output looks like this:

1 1 45 48 4 46 53 0.911 -0.092 -0.934 0.358
2 2 53 40 5 40 59 0.875 -0.192 0.346 0.938
3 3 34 54 7 61 38 0.794 0.145 0.580 -0.814
4 4 68 18 13 8 91 0.278 -0.592 -0.996 0.095
5 6 31 63 9 40 59 0.710 -0.082 0.353 -0.936
6 7 49 43 7 51 48 0.837 0.002 0.994 -0.112
7 8 28 64 16 17 82 0.429 -0.460 0.920 -0.391
8 9 30 60 18 24 75 0.400 -0.274 -0.018 -1.000
9 10 74 21 7 77 22 0.667 0.377 -0.666 0.746
10 11 27 62 14 66 33 0.481 0.246 -0.597 -0.802
etc etc
253 305 37 36 19 39 60 0.472 -0.214 0.994 0.111
254 306 35 38 15 52 47 0.571 0.001 -0.563 0.826
255 307 37 35 15 60 39 0.571 0.214 -0.994 -0.108

The columns in order are:

   1. a counter from 1 to the number of scaled roll calls;
   2. the actual roll call number in SEN85KH.ORD;
   3. the number of Yeas;
   4. the number of Nays;
   5. the number of classification errors;
   6. the number of legislators "below" the cutting plane -- this includes all legislators eligible to vote;
   7. the number of legislators "above" the cutting plane -- this includes all legislators eligible to vote -- "below"/"above" are simply opposite sides of the cutting plane;
   8. PRE = (Minority - Error)/Minority;
   9. Projected roll call midpoint -- where the cutting plane cuts through the normal vector;
   10,11,etc. The Normal Vector.

*****************************************
QUAD0038.DAT !!!!!TWO OR MORE DIMENSIONS ONLY!!!!!
*****************************************

**********LEGISLATOR OUTPUT**********

This file is shown above under the discussion of QUADSTRT.DAT.

*****************************************
QUAD0048.DAT
*****************************************

**********LEGISLATOR OUTPUT**********

This file is shown above under the discussion of QUADSTRT.DAT.
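The derived columns of QUAD0048.DAT (proportion correctly classified and Geometric Mean Probability) follow directly from the counts and the Log-Likelihood, so they can be cross-checked by hand. The Python sketch below does this for the first two rows of the sample output shown earlier; the function names are hypothetical, and it assumes that "total number of choices" for a legislator is the sum of correct classifications and errors.

```python
import math


def proportion_correct(correct, errors):
    """Share of a legislator's choices that are correctly classified."""
    return correct / (correct + errors)


def geometric_mean_probability(log_likelihood, n_choices):
    """GMP = EXP(Log-L / total number of choices)."""
    return math.exp(log_likelihood / n_choices)


# (label, correct, errors, log-likelihood) copied from the sample QUAD0048.DAT rows.
rows = [
    ("EISENHO", 124, 8, -20.287),
    ("SPARKMA", 208, 47, -84.554),
]

for label, correct, errors, loglik in rows:
    n = correct + errors
    print(label,
          round(proportion_correct(correct, errors), 3),
          round(geometric_mean_probability(loglik, n), 3))
```

The same two functions apply to the roll call rows of QUAD0058.DAT, which report the same statistics per roll call.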
*****************************************
QUAD0058.DAT
*****************************************

**********ROLL CALL OUTPUT**********

1 1 46 49 89 4 0.957 -9.062 0.907 -0.030 0.959 -0.284 6.720
2 2 55 40 84 9 0.903 -24.889 0.765 -0.130 0.435 0.900 5.400
3 3 35 55 82 6 0.932 -16.359 0.830 0.280 0.782 -0.624 4.680
4 4 70 18 71 15 0.826 -33.050 0.681 -0.620 -0.986 0.166 1.560
5 6 32 64 83 11 0.883 -21.351 0.797 -0.180 0.291 -0.957 4.960
6 7 50 44 85 7 0.924 -16.230 0.838 -0.040 -0.998 0.055 5.000
7 8 29 65 73 19 0.793 -34.238 0.689 0.380 -0.931 0.365 2.600
8 9 31 61 70 20 0.778 -43.808 0.615 -0.300 -0.107 -0.994 1.400
9 10 75 22 82 13 0.863 -22.948 0.785 0.510 -0.696 0.718 3.560
10 11 27 64 71 18 0.798 -31.219 0.704 0.380 -0.677 -0.736 3.040
etc etc
254 306 35 38 58 15 0.795 -40.721 0.572 -0.130 -0.803 0.596 1.000
255 307 37 35 56 16 0.778 -41.031 0.566 -0.030 1.000 0.023 1.120

The columns in order are:

   1. a counter from 1 to the number of scaled roll calls;
   2. the actual roll call number in SEN85KH.ORD;
   3. the number of Yeas;
   4. the number of Nays;
   5. the number correctly classified;
   6. the number of classification errors;
   7. the proportion correctly classified;
   8. Log-Likelihood;
   9. Geometric Mean Probability = EXP(Log-L/total number of choices);
   10. Projected roll call midpoint -- where the cutting plane cuts through the normal vector;
   11. Normal Vector first dimension coordinate;
   12. Normal Vector second dimension coordinate;
   13. 2*Gamma (this is defined up to a positive multiplicative constant common to all Gammas and Sigmas).

If you need help just send me e-mail at:

KPoole@uh.edu