Measurement Theory UGA, Homework Number 11

LEGACY CONTENT. If you are looking for Voteview.com, PLEASE CLICK HERE

This site is an archived version of Voteview.com archived from University of Georgia on May 23, 2017. This point-in-time capture includes all files publicly linked on Voteview.com at that time. We provide access to this content as a service to ensure that past users of Voteview.com have access to historical files. This content will remain online until at least January 1st, 2018. UCLA provides no warranty or guarantee of access to these files.

Homework 11, POLS 8505: MEASUREMENT THEORY
Due 13 April 2015

In this problem we are going to try out the metric unfolding version of SMACOF ("Scaling by Maximizing a Convex Function" or "Scaling by Majorizing a Complicated Function") which is discussed in Chapter 8 of Borg and Groenen. We are going to apply it to the 2008 NES Candidate Feeling Thermometers.

Download the R program:

nes2008_thermometers_smacof_2.r -- Program that runs Metric Multidimensional Unfolding in one and two dimensions in R using the smacofRect function.

nes2008_first_11.dta -- The 2008 National Election Study

# THIS IS THE ONE DIMENSION SECTION OF THE PROGRAM # ONE DIMENSIONAL RESULTS # zmetric1 <- result1$conf.col zmetric1 <- as.numeric(zmetric1) #dim(zmetric) <- c(nq,ndim) We do not need this because we have a one-dimensional vector xmetric1 <- result1$conf.row xmetric1 <- as.numeric(xmetric1) #dim(xmetric) <- c(np,ndim) sse11 <- 0.0 sse22 <- 0.0 for (i in 1:np) { for(j in 1:nq) { dist_i_j <- 0.0 # # Calculate distance between points # dist_i_j <- dist_i_j+ (xmetric1[i]-zmetric1[j])*(xmetric1[i]-zmetric1[j]) sse11 <- sse11 + ((TEIGHT[i,j]) - sqrt(dist_i_j))*((TEIGHT[i,j]) - sqrt(dist_i_j)) sse22 <- sse22 + ((TEIGHT[i,j]) - sqrt(dist_i_j))*((TEIGHT[i,j]) - sqrt(dist_i_j))*weightmat[i,j] } } obama.voters <- xmetric1[zz[,3]==1] zz[,3] has the who the respondent voted for mccain.voters <- xmetric1[zz[,3]==3] non.voters <- xmetric1[NOTVOTE==1 | NOTVOTE==2 | NOTVOTE==3] # Note the logic here obamaShare <- length(obama.voters)/(length(obama.voters)+length(mccain.voters)+length(non.voters)) mccainShare <- length(mccain.voters)/(length(obama.voters)+length(mccain.voters)+length(non.voters)) nonShare <- length(non.voters)/(length(obama.voters)+length(mccain.voters)+length(non.voters)) # gvdens <- density(obama.voters) Get a smoothed histogram of the voters gvdens$y <- gvdens$y*obamaShare Adjust the heigth of the density by the share of the vote # This makes the three densities add up to 1 bvdens <- density(mccain.voters) bvdens$y <- bvdens$y*mccainShare # nvdens <- density(non.voters) nvdens$y <- nvdens$y*nonShare # Simple trick to make the 3 densities fit nicely in the graph maxheight <- 1.1*(max(gvdens$y,bvdens$y,nvdens$y)) windows() plot(gvdens,xlab="",ylab="", main="", xlim=c(-1.5,1.5),ylim=c(0,maxheight),type="l",lwd=3,col="red",font=2) lines(bvdens,lwd=3,col="blue") lines(nvdens,lwd=3,col="black") # Main title mtext("2008 Thermometer Scaling\nObama and McCain Voters and Non-Voters",side=3,line=1.50,cex=1.2,font=2) # x-axis title mtext("Liberal - Conservative",side=1,line=2.75,cex=1.2) # y-axis title mtext("Density",side=2,line=2.5,cex=1.2) # This puts the proportions each candidate gets onto the graph text(-1.0,0.53,paste("Obama Voters ", 100.0*round(obamaShare, 3)),col="red",font=2) text(-1.0,0.5,paste("McCain Voters ", 100.0*round(mccainShare, 3)),col="blue",font=2) text(-1.0,0.47,paste("Non Voters ", 100.0*round(nonShare, 3)),col="black",font=2) #

Run nes2008_thermometers_smacof_2.r and turn in the two plots AFTER YOU HAVE NEATLY FORMATTED THEM!! In particular, be sure that Obama is on the left side of the plots and McCain is on the right side of the plots. Also, you should adjust the candidate names that are placed on top of the voter distributions so that they can be easily seen.

Report result$stress, sse1, sse2, result1$stress, sse11, sse22.

Report zmetric1. Relative to xmetric1, do the results make sense to you? Explain.

In this problem we are going continue trying out the metric unfolding version of SMACOF ("Scaling by Maximizing a Convex Function" or "Scaling by Majorizing a Complicated Function") which is discussed in Chapter 8 of Borg and Groenen. We are going to apply it to the 1968 NES Candidate Feeling Thermometers and compare it to MLSMU6.FOR -- a metric unfolding program I wrote in 1978.

Download the R program:

nes68_thermometers_smacof_3.r -- Program that runs Metric Multidimensional Unfolding in R using the smacofRect function and the MLSMU6.FOR

nes1968_first_11.dta

#  Double-Center Function for a rectangular squared distance matrix
doubleCenterRect2 <- function(T){
  p <- dim(T)[1]
  n <- dim(T)[2]
  -(T-matrix(apply(T,1,mean),nrow=p,ncol=n)-
     t(matrix(apply(T,2,mean),nrow=n,ncol=p))+ mean(T))/2
}
#
etc etc etc
TTSQDC <- doubleCenterRect2(TTSQ)  TTSQ is TT squared with the squared mean in the 
#                                    Missing entries
#  Perform Singular Value Decomposition
#
xsvd <- svd(TTSQDC)
#
#  The Two Lines Below Put the Singular Values in a
#    Diagonal Matrix -- The first one creates an 
#    identity matrix and the second command puts
#    the singular values on the diagonal
#
Lambda <- diag(nq)
diag(Lambda) <- xsvd$d
#
#  Compute U*LAMBDA*V' for check below
#
XREPRODUCE <- xsvd$u %*% Lambda %*% t(xsvd$v)
ssecheck <- sum(abs(TTSQDC-XREPRODUCE))     Check the Decomposition
#
# Starting Coordinates for MLSMU6 Algorithm
#
zz <- xsvd$v[,1:2]           Use the Left Singular Vectors as the Stimuli Coordinate Starts
#
xx <- rep(0,np*ndim)
dim(xx) <- c(np,ndim)        Use the Right Singular Vectors as the Stimuli Coordinate Starts
xx[,1] <- xsvd$u[,1]*sqrt(xsvd$d[1]) Multiplied by the Square Root of the Singular Values
xx[,2] <- xsvd$u[,2]*sqrt(xsvd$d[2])
###################################################
### chunk number 15: 
###################################################
##Here is suma over the NQ dimensions This Function calculates the Sum of Squared Error   
sumaj <- function(i){                   for the ith Row   
  jjj <- 1:nq
  j <- jjj[!is.na(T[i,])]    Here is the trick to have j only range over the non-missing values   
  s <- (xx[i,1]-zz[j,1])^2+(xx[i,2]-zz[j,2])^2  Note that "j" is a vector so this is a vector calculation   
  s=sqrt(s)
  sx=TEIGHT[i,j]
  sum((s-sx)^2)       Here is the Sum of Squared Error   
}
## now get SUMA over NP dimension
#xx <- xmetric   # Used as a check on the MLSMU6 Algorithm
#zz <- zmetric
sumvector <- sapply(1:np,sumaj)  
suma <- sum(sumvector)   Here is the Total Sum of Squared Error   
xxx <- xx
zzz <- zz
kp=0
ktp=0
sumalast <- 100000.00
SAVEsumalast <- sumalast
for(loop in 1:50){
  xxx[,1:2] <- 0
  zzz[,1:2] <- 0
  kp=kp+1
  ktp=ktp+1

  for(i in 1:np){      This For Loop is doing the sum of the line equations for the Stimuli   
    for(j in 1:nq){
      if(!is.na(T[i,j])){
        s=0
        for(k in 1:ndim)
          s <- s+(xx[i,k]-zz[j,k])^2
        xc <- ifelse(s==0, 1.0, TEIGHT[i,j]/sqrt(s))
        for(k in 1:ndim)
          zzz[j,k]=zzz[j,k]+(xx[i,k]-xc*(xx[i,k]-zz[j,k]))/xcol[j]
      }
    }
  }
  for(k in 1:ndim){    This For Loop is doing the sum of the line equations for the Respondents   
    for(i in 1:np){
      sw <- 0.0
      for(j in 1:nq){
        if(!is.na(T[i,j])){
          s=0
          for(kk in 1:ndim)
            s <- s+(xx[i,kk]-zzz[j,kk])^2
          xc <- ifelse(s==0, 1.0, TEIGHT[i,j]/sqrt(s))
          xxx[i,k]=xxx[i,k]+(zzz[j,k]-xc*(zzz[j,k]-xx[i,k]))
        }
      }
      xxx[i,k]=xxx[i,k]/xrow[i]
    }
  }
  xx <- xxx
  zz <- zzz
sumvector <- sapply(1:np,sumaj)
suma <- sum(sumvector)   Calculate Current SSE   
#
  print(c(loop,suma))
  done=((sumalast-suma)/suma)<0.0005  Compare the Current with the Previous SSE   
  sumalast=suma
  if(done) break   If Convergence Break out of the Loop   
}

Run nes68_thermometers_smacof_3.r and turn in the plot that it makes.
Report result$stress, sse1, sse2, and sumalast (this will appear on your screen).
Make two panel plots of the SMACOF configurations against the MLSMU6 configurations. Namely, rotate xx to best match xmetric and apply that same rotation matrix to zz. That is, one two-panel plot shows the respondents and one two panel plot shows the 12 Political figures. Report the rotation matrix, T, and the before and after r-squares by dimension.

In this problem we are going to compare probit and logit coefficients using the analysis in my book on pages 37-40 (also, work through Notes on the Geometry of Logit and Probit). Download the R program:

bush_v_gore_example_h106.r

probit

logit

glm

hdmg106_2009_fixed.dta

STATA

#
# bush_v_gore_example_h106.r -- GLM Examples
#
rm(list=ls(all=TRUE))
#
library(MASS)
library(foreign)
#
setwd("C:/uga_course_homework_11_2015/")
data <- read.dta("hdmg106_2009_fixed.dta")  Read in the Stata Data set
attach(data,warn.conflicts = FALSE)
#                                           Grab all the variables
T <- cbind(bush00,gore00,black00,south,hispanic00,income,owner00,dwnom1n,dwnom2n,ybush,ygore)
TT <- T
mode(TT) <- "double"                        Name the variables -- handy below
colnames(TT) <- c("Bushvote","Gorevote","blackpct","southdum","hispandum","income","ownerhome",
                  "dwnom1n","dwnom2n","ybush","ygore")
#
nrow <- length(TT[,1])
ncol <- length(TT[1,])
#
#
#                 STATA OUTPUT FOR REFERENCE
#  &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
#. summ
#
#    Variable |       Obs        Mean    Std. Dev.       Min        Max
#-------------+--------------------------------------------------------
#      bush00 |       432    46.81019    14.36873          6         80
#      gore00 |       432    49.71296      14.005         20         93
#     black00 |       432    12.68333    16.25699         .3         96
#       south |       432    .3101852    .4631056          0          1
#  hispanic00 |       432    12.31597    16.25281         .6         86
#-------------+--------------------------------------------------------
#      income |       432    30.75224    8.319547      15.06     57.219
#     owner00 |       432    65.86875    11.92404          8       84.3
#     dwnom1n |       432    .0586875    .4671698      -.815      1.269
#     dwnom2n |       432   -.0381065    .4706545     -1.123       1.27
#       ybush |       432    .4861111    .5003865          0          1  1 if Bush > 50% in a CD, 0 otherwise
#-------------+--------------------------------------------------------
#       ygore |       432       .4375    .4966535          0          1
#. probit ybush black00 south hispanic00 income owner00 dwnom1n dwnom2n
#
#Probit regression                                 Number of obs   =        432
#                                                  LR chi2(7)      =     338.29
#                                                  Prob > chi2     =     0.0000
#Log likelihood = -130.12789                       Pseudo R2       =     0.5652
#
#------------------------------------------------------------------------------
#       ybush |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
#-------------+----------------------------------------------------------------
#     black00 |  -.0285973   .0097913    -2.92   0.003    -.0477878   -.0094067
#       south |   .7695862   .2545783     3.02   0.003     .2706219    1.268551
#  hispanic00 |  -.0089458   .0069163    -1.29   0.196    -.0225015    .0046099
#      income |  -.0241489   .0126394    -1.91   0.056    -.0489217    .0006239
#     owner00 |   .0235461   .0143687     1.64   0.101    -.0046161    .0517083
#     dwnom1n |   2.761974   .2619813    10.54   0.000       2.2485    3.275448
#     dwnom2n |   1.136417   .2339829     4.86   0.000     .6778186    1.595015
#       _cons |  -.9697998   1.077738    -0.90   0.368    -3.082127    1.142528
#------------------------------------------------------------------------------
#                            How to run a Probit in R
probitbush <-glm(ybush ~ black00+south+hispanic00+income+owner00+dwnom1n+dwnom2n,family=binomial(link=probit))
sumprobitbush <- summary(probitbush)  Get the Summary
#
#  ---- Useful Commands To See What is in an Object
#
# > length(sumprobitbush)
# [1] 17
# > class(sumprobitbush)
# [1] "summary.glm"
# > names(sumprobitbush)     The coefficients are in sumprobitbush$coefficients[1:8]
#  [1] "call"           "terms"          "family"         "deviance"      
#  [5] "aic"            "contrasts"      "df.residual"    "null.deviance" 
#  [9] "df.null"        "iter"           "deviance.resid" "coefficients"  
# [13] "aliased"        "dispersion"     "df"             "cov.unscaled"  
# [17] "cov.scaled"    
#
#                            How to run a Logit in R 
logitbush <-glm(ybush ~ black00+south+hispanic00+income+owner00+dwnom1n+dwnom2n,family=binomial(link=logit))
sumlogitbush <- summary(logitbush)  Get the Summary

Run the program and report all the output NEATLY FORMATTED produced by sumprobitbush and sumlogitbush.
Calculate Cosineq (Cosine Theta if you use a handicapped browser) between the probit and logit coefficients (see page 40 of my book).
The probit and logit summaries report Null Deviance, Residual Deviance, and AIC. Null Deviance is the deviation of a model that contains only the intercept term, that is, a fixed probability for all observations. Residual Deviance is simply -2*Log-Likelihood. In this instance, N1=210, the number of "1"s, N0=222, the number of "0"s, K=8, the number of coefficients, and the formulas are:

Null Deviance = -2*N1*[log(N1/(N1+N2))] -2*N2*[log(N2/(N1+N2))]
Residual Deviance = -2*[LOG LIKELIHOOD]
Akaike Information Criterion (AIC) = -2*[LOG LIKELIHOOD] + 2*K

Calculate these three statistics for the probit output. Show all of your calculations.

In this problem we are going to compare ordered probit and ordered logit coefficients using the analysis in my book on pages 37-40 (also, work through Notes on the Geometry of Logit and Probit). Download the R program:

oprobit_ologit.r

ordered probit

ordered logit

polr

NES1968_FIRST.DTA

STATA

#
rm(list=ls(all=TRUE))
library(foreign)
library(MASS)
library(stats)
setwd("C:/uga_course_homework_11_2015")
#
#  READ STATA FILE HERE
data <- read.dta("nes1968_first.dta")  Read in the Stata Data set
attach(data,warn.conflicts = FALSE)
#                                      Grab only the variables we need
T <- cbind(VAR00120,VAR00156,VAR00261,VAR00263,VAR00264,VAR00478,VAR00479,VAR00480,VAR00533)
TT <- T
mode(TT) <- "double"                   We do not use these variable names, we just need to keep track
colnames(TT) <- c("partyid2","education5","income2","sex2","black2","wallace2","humphrey2","nixon2","age2")
#
# partyid -- 0,1,2,3,4,5,6             The Codes we need
# education -- 00 - 22=8th grade or less, 31-61=9th-12th grades, 71=some college, 81=BS/BA, 82-87, advanced
# income -- 10 - 35
# sex -- 1=male, 2=female
# black -- 1=white, 2=black
# wallace -- 0 - 97
# humphrey -- 0 - 97
# nixon -- 0 - 97
# age -- 20 - 93
#
partyidx <- VAR00120                   Recode the variables and insert NA's for missing data
partyid2 <- ifelse(partyidx > 6,NA,partyidx)
#
educationx <- VAR00156                 Note the Logic here
education1 <- ifelse(educationx < 23,1,educationx)
education2 <- ifelse(((23 < educationx) & (educationx < 62)),2,education1)
education3 <- ifelse(educationx==71,3,education2)
education4 <- ifelse(((72 < educationx) & (educationx <= 87)),4,education3)
education5 <- ifelse(educationx > 87,NA,education4) 
#
incomex <- VAR00261
income2 <- ifelse(incomex > 35,NA,incomex)
#
sex <- VAR00263 - 1       0=Male, 1=Female
#
racex <- VAR00264
race1 <- ifelse(racex==1,0,racex)  White = 0
race2 <- ifelse(racex==2,1,race1)  Black = 1
race3 <- ifelse(racex>2,NA,race2)
#
wallacex <- VAR00478      Feeling Thermometers, max=97
wallace1 <- ifelse(wallacex>97,NA,wallacex)
#
humphreyx <- VAR00479
humphrey1 <- ifelse(humphreyx>97,NA,humphreyx)
#
nixonx <- VAR00480
nixon1 <- ifelse(nixonx>97,NA,nixonx)
#
agex <- VAR00533          Maximum age in the 1968 data was 93
age1 <- ifelse(agex>93,NA,agex)
#
#  VARIABLES
# partyid2, education5, income2, sex, reace3, wallace1, humphrey1, nixon1, age1
#                        Stick Variables in a Matrix
XDATA <- cbind(partyid2,education5,income2,sex,race3,wallace1,humphrey1,nixon1,age1)
YDATA <- na.omit(XDATA)  List-Wise Deletion -- NEAT COMMAND
#
#  ORDERED PROBIT
#
scarecrow <- polr(as.ordered(YDATA[,1]) ~ YDATA[,2]+YDATA[,3]+YDATA[,4]+YDATA[,5]+YDATA[,6]+YDATA[,7]+YDATA[,8]+YDATA[,9],
                  method="probit")
#
#  ORDERED LOGIT
#
blackcat <- polr(as.ordered(YDATA[,1]) ~ YDATA[,2]+YDATA[,3]+YDATA[,4]+YDATA[,5]+YDATA[,6]+YDATA[,7]+YDATA[,8]+YDATA[,9],
                  method="logistic")

Run the program and report all the output NEATLY FORMATTED produced by summary(scarecrow) and summary(blackcat).
Calculate Cosineq (Cosine Theta if you use a handicapped browser) between the ordered probit and ordered logit coefficients (see page 40 of my book).
The ordered probit and ordered logit summaries report Residual Deviance, and AIC. Calculate these two statistics for both the ordered probit and ordered logit outputs. Show all of your calculations.

The aim of this problem is to familarize you with the R version of Optimal Classification (OC). Download the R program:
1. Run the program and turn in the two plots that the program generates.
2. Report the fits of the one and two dimensional models -- result1$fits and result$fits.
3. Report summary(result1) and summary(result) NEATLY FORMATTED.
4. On your hard drive you will have two files: Sen112_X.txt and Sen112_Z.txt. Plot the Senator Coordinates in Sen112_X.txt (this is the same graph that the program produces) only now put the normal vector for the second roll call in the plot as a black arrow (this is the second row of Sen112_Z.txt) and draw the cutting line as a black line that is perpendicular to the normal vector and passes through the normal vector at the midpoint coordinates (which lie on the normal vector).

Homework 11, POLS 8505: MEASUREMENT THEORYDue 13 April 2015

Homework 11, POLS 8505: MEASUREMENT THEORY
Due 13 April 2015