> #Statistical analysis of the interaction between letter position
> #and lexicality in their effects on probability of correct response.

This is the result of a "script" or "program" in the statistical language
Splus. Commands and comments are interspersed with results. Included is the
table of basic data (extracted from the data handout), and several other
tables that represent intermediate products in this analysis.

> #Goal: we observed that for the data averaged over the eight subjects,
> #the mean percent correct for Words was a relatively constant 9% (.09)
> #greater than for Nonwords. How seriously should this be taken?
> #I.e. how precisely do we know that the difference is constant?
> fullmat <- matrix(scan("position.data",multi.line=T),ncol=2,byrow=T)
> datamat <- fullmat[-c(1,10,19,28,37,46,55,64),]/1000
> dnames <- list( c("sub1","sub2","sub3","sub4","sub6","sub7","sub8","sub9"), c("NW","W"), c("pos1","pos2","pos3","pos4"), c("Pc","Qc"))
> data.ar1 <- array(as.vector(datamat),dim=c(8,2,4,2),dimnames=dnames)
> data.ar <- aperm(data.ar1,c(1,4,2,3))
> data.ar

, , NW, pos1
      Pc  Qc
sub1 0.6 0.8
sub2 0.8 0.9
sub3 0.8 0.8
sub4 0.8 0.8
sub6 0.7 1.0
sub7 0.6 0.8
sub8 0.8 0.9
sub9 0.6 0.9

, , W, pos1
      Pc  Qc
sub1 0.7 0.9
sub2 1.0 1.0
sub3 0.9 1.0
sub4 0.5 1.0
sub6 0.9 1.0
sub7 0.9 1.0
sub8 0.8 0.9
sub9 0.7 0.9

, , NW, pos2
      Pc  Qc
sub1 0.6 1.0
sub2 1.0 0.8
sub3 0.7 1.0
sub4 0.5 0.7
sub6 1.0 1.0
sub7 0.7 1.0
sub8 1.0 1.0
sub9 0.6 0.7

, , W, pos2
      Pc  Qc
sub1 0.8 0.9
sub2 1.0 0.9
sub3 0.8 0.8
sub4 0.9 1.0
sub6 1.0 1.0
sub7 0.9 0.9
sub8 1.0 1.0
sub9 0.8 1.0

, , NW, pos3
      Pc  Qc
sub1 0.8 0.7
sub2 0.7 1.0
sub3 0.8 0.5
sub4 0.7 0.9
sub6 0.9 0.9
sub7 0.9 1.0
sub8 0.8 0.8
sub9 0.8 1.0

, , W, pos3
      Pc  Qc
sub1 0.7 0.8
sub2 1.0 1.0
sub3 0.9 0.8
sub4 0.8 0.6
sub6 1.0 1.0
sub7 1.0 1.0
sub8 1.0 1.0
sub9 0.9 1.0

, , NW, pos4
      Pc  Qc
sub1 0.6 0.5
sub2 0.8 0.7
sub3 0.3 0.9
sub4 0.8 0.6
sub6 0.6 0.9
sub7 0.9 0.6
sub8 0.5 0.8
sub9 0.2 0.9

, , W, pos4
      Pc  Qc
sub1 0.6 0.7
sub2 0.8 0.9
sub3 0.8 0.7
sub4 0.5 0.8
sub6 0.6 0.8
sub7 0.7 1.0
sub8 0.8 0.8
sub9 0.7 1.0

> #8 rows for each half-data set, first 8 being NW, second being W.
> #four data sets like above, for critical positions 1, 2, 3, 4.
> #1st column is Pc (prob of correct response on match trial)
> #2nd column is Qc (prob of correct response on mismatch trial)
> #mean of these is mean prob correct.
> meanPC <- apply(data.ar,c(1,3,4),mean)
> meanPC

, , pos1
       NW    W
sub1 0.70 0.80
sub2 0.85 1.00
sub3 0.80 0.95
sub4 0.80 0.75
sub6 0.85 0.95
sub7 0.70 0.95
sub8 0.85 0.85
sub9 0.75 0.80

, , pos2
       NW    W
sub1 0.80 0.85
sub2 0.90 0.95
sub3 0.85 0.80
sub4 0.60 0.95
sub6 1.00 1.00
sub7 0.85 0.90
sub8 1.00 1.00
sub9 0.65 0.90

, , pos3
       NW    W
sub1 0.75 0.75
sub2 0.85 1.00
sub3 0.65 0.85
sub4 0.80 0.70
sub6 0.90 1.00
sub7 0.95 1.00
sub8 0.80 1.00
sub9 0.90 0.95

, , pos4
       NW    W
sub1 0.55 0.65
sub2 0.75 0.85
sub3 0.60 0.75
sub4 0.70 0.65
sub6 0.75 0.70
sub7 0.75 0.85
sub8 0.65 0.80
sub9 0.55 0.85

> #check means obtained previously
> groupmeans <- apply(meanPC,2:3,mean)
> round(groupmeans,2)
   pos1 pos2 pos3 pos4
NW 0.79 0.83 0.82 0.66
 W 0.88 0.92 0.91 0.76

> #Now, suppose the lexicality effect is constant across the four
> #positions (i.e., additive with position). Then consider a plot of the
> #size of the effect:
> #Pr{correct | Word} - Pr{correct | Nonword}
> #versus position. It should be a flat function. That is, a linear
> #function fitted through the four points should be flat, or, in other
> #words, the slope of the fitted line should be zero.
> #
> #(Of course, that isn't sufficient to argue constancy; the function
> #could be U-shaped, for example. So to convince ourselves of the
> #constancy we would also want to fit a U-shaped function and check that
> #its curvature - e.g., a fitted quadratic coefficient - is sufficiently
> #close to zero. The advantage of looking at a slope coefficient and a
> #quadratic coefficient is that they can be estimated so that the two
> #tests are statistically independent.)
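
The independence claim rests on the two predictors being orthogonal. The following is a minimal sketch in R (which accepts the same syntax as the Splus commands here), assuming the linear predictor is `1:4` and the U-shaped predictor is `c(1,-1,-1,1)/4`, the vector used for the quadratic fit in this analysis. The names `x.lin`, `x.quad` and the response `y` are illustrative only. Orthogonality makes the two estimates uncorrelated; full independence further assumes normal errors with equal variance at each position.

```r
# Predictors for the two trend tests (names are illustrative).
x.lin  <- 1:4                     # linear trend across positions 1..4
x.quad <- c(1, -1, -1, 1) / 4     # U-shaped (quadratic) trend

# Least squares with an intercept effectively centres each predictor.
lin.c  <- x.lin  - mean(x.lin)    # -1.5 -0.5 0.5 1.5
quad.c <- x.quad - mean(x.quad)   # unchanged: x.quad already sums to zero

# Orthogonality: the inner product of the centred predictors is zero.
inner <- sum(lin.c * quad.c)
print(inner)

# Consequence: adding a pure U-shaped component to a response does not
# change the fitted slope. The response y is hypothetical, not real data.
y            <- c(0.2, 0.1, 0.4, 0.3)
slope.before <- lsfit(x.lin, y)$coef[2]
slope.after  <- lsfit(x.lin, y + 0.5 * x.quad)$coef[2]
print(c(slope.before, slope.after))
```

Because the fitted slope ignores any U-shaped component (and vice versa), a test on one coefficient is not contaminated by the other.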
> #Here, we first look at the slope coefficient.
> nw.data <- meanPC[,1,]
> w.data <- meanPC[,2,]
> lex.effect <- w.data - nw.data
> nw.data
     pos1 pos2 pos3 pos4
sub1 0.70 0.80 0.75 0.55
sub2 0.85 0.90 0.85 0.75
sub3 0.80 0.85 0.65 0.60
sub4 0.80 0.60 0.80 0.70
sub6 0.85 1.00 0.90 0.75
sub7 0.70 0.85 0.95 0.75
sub8 0.85 1.00 0.80 0.65
sub9 0.75 0.65 0.90 0.55
> w.data
     pos1 pos2 pos3 pos4
sub1 0.80 0.85 0.75 0.65
sub2 1.00 0.95 1.00 0.85
sub3 0.95 0.80 0.85 0.75
sub4 0.75 0.95 0.70 0.65
sub6 0.95 1.00 1.00 0.70
sub7 0.95 0.90 1.00 0.85
sub8 0.85 1.00 1.00 0.80
sub9 0.80 0.90 0.95 0.85
> lex.effect
      pos1  pos2  pos3  pos4
sub1  0.10  0.05  0.00  0.10
sub2  0.15  0.05  0.15  0.10
sub3  0.15 -0.05  0.20  0.15
sub4 -0.05  0.35 -0.10 -0.05
sub6  0.10  0.00  0.10 -0.05
sub7  0.25  0.05  0.05  0.10
sub8  0.00  0.00  0.20  0.15
sub9  0.05  0.25  0.05  0.30

> #fit a linear function of position for each subject,
> #and determine its slope.
> lex.eff.slope <- (lsfit(1:4,t(lex.effect))$coef)[2,]
> mean(lex.eff.slope)
[1] 0.00125
> round(lex.eff.slope,3)
  sub1   sub2   sub3   sub4   sub6   sub7   sub8   sub9
-0.005 -0.005  0.025 -0.045 -0.035 -0.045  0.065  0.055
> round(sort(lex.eff.slope),3)
  sub7   sub4   sub6   sub1   sub2   sub3   sub9   sub8
-0.045 -0.045 -0.035 -0.005 -0.005  0.025  0.055  0.065
> se.m(lex.eff.slope)
[1] 0.01535
> ttest(lex.eff.slope)
---> t= 0.0815 df= 7 pval= 0.9374 mean= 0.00125 95%int= (-0.035,0.0375)

> #While the mean slope is remarkably close to zero, the range over
> #subjects is large. Suppose the true slope were the value at the
> #high end of the 95% confidence interval; call it h.end. Then the
> #lexicality effect, instead of being constant, would increase over a
> #range of (3 * h.end), or 3*.0375 = .11. If the mean effect is .09, the
> #change from position 1 to position 4 would then be greater than the
> #mean effect. In other words, the data are consistent with a huge
> #interaction between lexicality and position.
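
The helper functions `se.m` and `ttest` are not defined in this transcript. The sketch below, written in R, shows one plausible reading of them (standard error of the mean, and a one-sample t test of the mean against zero with a 95% interval), and then redoes the 3 * h.end arithmetic. The names `se.mean.sketch` and `ttest.sketch` are hypothetical stand-ins, not the original functions; the slopes are copied from the printed output.

```r
# Per-subject slopes as printed by round(lex.eff.slope,3).
lex.eff.slope <- c(sub1 = -0.005, sub2 = -0.005, sub3 = 0.025, sub4 = -0.045,
                   sub6 = -0.035, sub7 = -0.045, sub8 = 0.065, sub9 = 0.055)

# Hypothetical stand-in for se.m: standard error of the mean.
se.mean.sketch <- function(x) sd(x) / sqrt(length(x))

# Hypothetical stand-in for ttest: one-sample t test of mean = 0,
# with a two-sided confidence interval for the mean.
ttest.sketch <- function(x, level = 0.95) {
  m    <- mean(x)
  se   <- se.mean.sketch(x)
  df   <- length(x) - 1
  tval <- m / se
  half <- qt(1 - (1 - level) / 2, df) * se   # half-width of the interval
  list(t = tval, df = df, pval = 2 * pt(-abs(tval), df),
       mean = m, int = c(m - half, m + half))
}

res <- ttest.sketch(lex.eff.slope)
print(round(unlist(res), 4))

# Positions 1 and 4 are 3 units apart, so a slope of h.end changes the
# lexicality effect by 3 * h.end between the first and last position.
h.end         <- res$int[2]
change.1.to.4 <- 3 * h.end
mean.effect   <- 0.09                        # mean W - NW difference
print(change.1.to.4 > mean.effect)
```

Under this reading the sketch agrees with the printed standard error, t value and interval to the precision shown.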
> #
> #Suppose we weaken this test, by using the standard error of the mean,
> #which is .015. Note that this quantity is 0.023. If the slope were
> #this value instead of zero, the change from position 1 to position 4
> #would be 3*0.015, or .045, or 50% of the mean effect.

> #Now let's look at the quadratic coefficient. By regressing the four
> #values of the lexicality effect (for each subject) against the vector
> #(1/4, -1/4, -1/4, 1/4), we get an estimate that is independent
> #of the slope estimate.
> lex.eff.quad <- (lsfit(c(1,-1,-1,1)/4,t(lex.effect))$coef)[2,]
> mean(lex.eff.quad)
[1] 0.025
> round(lex.eff.quad,3)
 sub1  sub2  sub3  sub4  sub6  sub7  sub8  sub9
 0.15  0.05  0.15 -0.35 -0.05  0.25 -0.05  0.05
> round(sort(lex.eff.quad),3)
 sub4  sub6  sub8  sub2  sub9  sub3  sub1  sub7
-0.35 -0.05 -0.05  0.05  0.05  0.15  0.15  0.25
> se.m(lex.eff.quad)
[1] 0.06478
> ttest(lex.eff.quad)
---> t= 0.3859 df= 7 pval= 0.711 mean= 0.025 95%int= (-0.128,0.178)

> #Again, while the mean is quite close to zero, the range over
> #subjects is large, and, given this variability, the data are
> #compatible with a noticeably U-shaped lexicality effect.
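
To make the two regressions concrete, here is a minimal R sketch that recomputes both coefficients from the printed `lex.effect` table as simple weighted sums and compares them with `lsfit`. The weight vectors `slope.w` and `quad.w` are derived here for illustration (centred predictor divided by its sum of squares) and are not part of the original script. The last lines translate the upper end of the printed 95% interval for the quadratic coefficient into a difference between end and middle positions.

```r
# Lexicality effect (W - NW) per subject and position, copied from above.
lex.effect <- matrix(c( 0.10,  0.05,  0.00,  0.10,
                        0.15,  0.05,  0.15,  0.10,
                        0.15, -0.05,  0.20,  0.15,
                       -0.05,  0.35, -0.10, -0.05,
                        0.10,  0.00,  0.10, -0.05,
                        0.25,  0.05,  0.05,  0.10,
                        0.00,  0.00,  0.20,  0.15,
                        0.05,  0.25,  0.05,  0.30),
                     nrow = 8, byrow = TRUE,
                     dimnames = list(paste("sub", c(1:4, 6:9), sep = ""),
                                     paste("pos", 1:4, sep = "")))

# Least-squares coefficient = sum(w * y), w = centred x / sum(centred x^2).
slope.w <- c(-3, -1, 1, 3) / 10   # from x = 1:4
quad.w  <- c(1, -1, -1, 1)        # from x = c(1,-1,-1,1)/4

slope.direct <- as.vector(lex.effect %*% slope.w)
quad.direct  <- as.vector(lex.effect %*% quad.w)

# Same quantities via lsfit, as in the transcript.
slope.lsfit <- lsfit(1:4, t(lex.effect))$coef[2, ]
quad.lsfit  <- lsfit(c(1, -1, -1, 1) / 4, t(lex.effect))$coef[2, ]

print(round(rbind(slope.direct, quad.direct), 3))

# With predictor values +1/4 (ends) and -1/4 (middle), a coefficient q
# separates end and middle positions by q * 0.5.
upper.quad        <- 0.178                  # upper end of printed 95% interval
ends.minus.middle <- upper.quad * (1/4 - (-1/4))
print(ends.minus.middle)                    # 0.089
```

At the upper end of the interval, the effect at positions 1 and 4 would exceed that at positions 2 and 3 by about .089, which is roughly the size of the mean lexicality effect (.09).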