Monday, May 24, 2010

Assignment #6 - Spatial Autocorrelation

---------------
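library(maptools)     # readShapePoly(); also loads sp, which provides CRS()
library(spdep)        # knearneigh(), knn2nb(), dnearneigh(), nblag(), nb2mat(), moran.test()
library(RColorBrewer) # brewer.pal(), display.brewer.all()
library(classInt)     # classIntervals(), findColours()
orcounty <- readShapePoly("orcounty.shp",proj4string=CRS("+proj=longlat"))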
plot(orcounty)
#save image now: "plot.orcounty.pdf"



---------------
summary(orcounty)
coordinates(orcounty)
centers=coordinates(orcounty)

centers=data.frame(centers)
points(centers,col="blue",cex=1.2)
text(centers,labels=rownames(centers),cex=1.5)
orcounty.centers = coordinates(orcounty)
#save image now: "orcounty.labels.pdf"



---------------
k=1
knn1 = knearneigh(orcounty.centers,k,longlat=T)
orcounty.knn1=knn2nb(knn1)

plot(orcounty)
plot(orcounty.knn1, orcounty.centers, col="blue",add=T)

#save image now: "knn1.pdf"



---------------
plot(orcounty)
k=2

knn2 = knearneigh(orcounty.centers,k,longlat=T)

orcounty.knn2=knn2nb(knn2)
plot(orcounty.knn2, orcounty.centers, col="blue",add=T)

#save image now: "knn2.pdf"


---------------
plot(orcounty)
k=3
knn3 = knearneigh(orcounty.centers,k,longlat=T)

orcounty.knn3=knn2nb(knn3)

plot(orcounty.knn3, orcounty.centers, col="blue",add=T)

#save image now: "knn3.pdf"

---------------
plot(orcounty)
k=4

knn4 = knearneigh(orcounty.centers,k,longlat=T)

orcounty.knn4=knn2nb(knn4)

plot(orcounty.knn4, orcounty.centers, col="blue",add=T)

#save image now: "knn4.pdf"



---------------
plot(orcounty)
k=5
knn5 = knearneigh(orcounty.centers,k,longlat=T)
orcounty.knn5=knn2nb(knn5)

plot(orcounty.knn5, orcounty.centers, col="blue",add=T)

#save image now: "knn5.pdf"
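What knearneigh() does with longlat=T is, for each county center, find the k closest other centers by great-circle distance. The base-R sketch below illustrates the idea on four hypothetical points (not the Oregon data). haversine_km(), knn_sketch() and is_symmetric() are my own illustrative helpers, not spdep functions, and the spherical haversine formula may give slightly different distances than spdep's own code. The example also shows why k-nearest-neighbour links are often one-directional, which is why knn2nb() can return a non-symmetric neighbour list.

```r
# Illustrative helpers (not spdep functions).

# Great-circle distance in km on a sphere of radius r (haversine formula).
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# For each point, the indices of its k nearest other points.
knn_sketch <- function(xy, k) {
  n <- nrow(xy)
  lapply(seq_len(n), function(i) {
    d <- haversine_km(xy[i, 1], xy[i, 2], xy[, 1], xy[, 2])
    d[i] <- Inf                      # a point is not its own neighbour
    order(d)[seq_len(k)]
  })
}

# TRUE if every link i -> j is matched by a link j -> i.
is_symmetric <- function(nb) {
  all(unlist(lapply(seq_along(nb), function(i) {
    vapply(nb[[i]], function(j) i %in% nb[[j]], logical(1))
  })))
}

# Hypothetical longitude/latitude points: three close together, one far east.
pts <- cbind(lon = c(-123.0, -122.9, -122.7, -118.0),
             lat = c(44.0, 44.0, 44.0, 44.0))

nb1 <- knn_sketch(pts, k = 1)
print(nb1)               # point 4's nearest is 3, but point 3's nearest is 2
print(is_symmetric(nb1))
```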


---------------
d=100
orcounty.dist.100 = dnearneigh(orcounty.centers,0,d,longlat=T)

plot(orcounty)

plot(orcounty.dist.100, orcounty.centers,add=T,lwd=2,col="red")

#save image now: "d100.pdf"


---------------
d=200

orcounty.dist.200 = dnearneigh(orcounty.centers,0,d,longlat=T)
plot(orcounty)

plot(orcounty.dist.200, orcounty.centers,add=T,lwd=2,col="red")

#save image now: "d200.pdf"



---------------
d=150
orcounty.dist.150 = dnearneigh(orcounty.centers,0,d,longlat=T)
plot(orcounty)

plot(orcounty.dist.150, orcounty.centers,add=T,lwd=2,col="red")
#save image now: "d150.pdf"



---------------
d=15
orcounty.dist.15 = dnearneigh(orcounty.centers,0,d,longlat=T)

plot(orcounty)

plot(orcounty.dist.15, orcounty.centers,add=T,lwd=2,col="red")
#save image now: "d15.pdf"


---------------
d=1000
orcounty.dist.1000 = dnearneigh(orcounty.centers,0,d,longlat=T)
plot(orcounty)
plot(orcounty.dist.1000, orcounty.centers,add=T,lwd=2,col="red")
#save image now: "d1000.pdf"
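dnearneigh() works differently from knearneigh(): it links every pair of centers whose distance falls inside a band (here 0 to d km), so the number of neighbours per county varies. The sketch below uses a hand-written haversine helper and a hypothetical dist_band_sketch() function (both assumptions for illustration, not spdep's own code) on four hypothetical points, and counts neighbours as d grows. It mirrors the pattern in the d=15, d=100, d=200 and d=1000 maps: a small d leaves points with no neighbours at all, and a very large d connects everything.

```r
# Illustrative helpers (not spdep functions).

# Great-circle distance in km on a sphere of radius r (haversine formula).
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# For each point, indices of the other points at a distance between d1 and
# d2 km (inclusive), roughly in the spirit of dnearneigh(x, d1, d2, longlat=TRUE).
dist_band_sketch <- function(xy, d1, d2) {
  n <- nrow(xy)
  lapply(seq_len(n), function(i) {
    d <- haversine_km(xy[i, 1], xy[i, 2], xy[, 1], xy[, 2])
    which(d >= d1 & d <= d2 & seq_len(n) != i)
  })
}

# Hypothetical points 1 degree of longitude apart (about 80 km at 44 N),
# plus one point far to the east.
pts <- cbind(lon = c(-123, -122, -121, -117),
             lat = c(44, 44, 44, 44))

# Number of neighbours per point (rows) for each distance threshold (columns).
thresholds <- c(15, 100, 200, 1000)
band_counts <- sapply(thresholds, function(d) {
  lengths(dist_band_sketch(pts, 0, d))
})
colnames(band_counts) <- paste0("d", thresholds)
print(band_counts)
```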



---------------
orcounty.lags=nblag(orcounty.knn2,2)
plot(orcounty)
plot(orcounty.lags[[2]],orcounty.centers, add=T,lwd=3,col="green",lty=2)

#save image now: "orcounty.lag2.pdf"
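nblag(orcounty.knn2, 2) returns a list whose second element, orcounty.lags[[2]], holds the second-order neighbours: counties reachable in two steps along the neighbour links but not in one. A minimal base-R sketch of that idea using an adjacency matrix follows. nb_to_adj() and lag2_sketch() are hypothetical helper names, and the sketch follows links in their stated direction (i to j to k), which is an assumption about how asymmetric knn links should be treated.

```r
# Convert a neighbour list (list of integer index vectors) to a 0/1 matrix.
nb_to_adj <- function(nb) {
  n <- length(nb)
  a <- matrix(0L, n, n)
  for (i in seq_len(n)) a[i, nb[[i]]] <- 1L
  a
}

# Second-order neighbours: reachable in two steps, not directly linked,
# and not the region itself.
lag2_sketch <- function(nb) {
  a <- nb_to_adj(nb)
  two_step <- (a %*% a) > 0          # TRUE where some path i -> j -> k exists
  lag2 <- two_step & a == 0          # drop first-order neighbours
  diag(lag2) <- FALSE                # drop the region itself
  lapply(seq_len(nrow(lag2)), function(i) which(lag2[i, ]))
}

# Hypothetical chain of five regions 1-2-3-4-5 with symmetric links.
chain <- list(2L, c(1L, 3L), c(2L, 4L), c(3L, 5L), 4L)
print(lag2_sketch(chain))   # e.g. region 3's second-order neighbours are 1 and 5
```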


---------------
w.cols = 1:36
w.rows = 1:36

w.mat.knn = nb2mat(orcounty.knn1, zero.policy=TRUE)
w.mat.knn

image(w.cols,w.rows,w.mat.knn,col=brewer.pal(3,"BuPu"))

#save image now: "knn1.matrix.pdf"



---------------
w.mat.dist = nb2mat(orcounty.dist.100, zero.policy=TRUE)
image(w.cols,w.rows,w.mat.dist,col=brewer.pal(9,"PuRd"))
#save image now: "d100.matrix.pdf"
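nb2mat() uses style="W" by default, which row-standardises the weights: each county's row is divided by its number of neighbours, so every row with at least one neighbour sums to 1. That is why the knn1 matrix contains only 0s and 1s (every county has exactly one neighbour), while the d100 matrix shows a range of values. A base-R sketch of the transformation on a hypothetical four-region matrix; row_standardise() is an illustrative name, not a spdep function.

```r
# Row-standardise a binary neighbour matrix ("W" style): each non-zero row
# is divided by its row sum, so its weights add up to 1. Rows with no
# neighbours stay all zeros (the situation zero.policy=TRUE allows).
row_standardise <- function(a) {
  rs <- rowSums(a)
  w <- a
  nz <- rs > 0
  w[nz, ] <- a[nz, , drop = FALSE] / rs[nz]
  w
}

# Hypothetical binary matrix: region 1 has two neighbours, region 4 none.
a <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 0,
              1, 0, 0, 0,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)
w <- row_standardise(a)
print(w)
print(rowSums(w))
```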


---------------
breaks = round(quantile(orcounty$MEDIANRENT))
colors = c("red","orange","yellow","green")
plot(orcounty,col=colors[findInterval(orcounty$MEDIANRENT,breaks,all.inside=TRUE)])
#save image now: "orcounty.medianrent.pdf"


---------------
display.brewer.all()   # preview the available palettes
nclr = 4
plotclr = brewer.pal(nclr,"PuRd")
rent.class = classIntervals(orcounty$MEDIANRENT,nclr,style="quantile")
colcode = findColours(rent.class,plotclr)
plot(orcounty,col=colcode)
title(main="Median Rent in Oregon",sub="Quantiles")

#the legend positions I tried (71.5,35 and 321.2,242.2) fall outside this map,
#which is in longitude/latitude (Oregon spans roughly -124.6 to -116.5, 42 to 46.3);
#a position keyword such as "bottomleft" avoids guessing coordinates:
legend("bottomleft",legend=names(attr(colcode, "table")),fill=attr(colcode, "palette"), cex=0.75,bty="n")

#save image now: "orcounty.medianrent.PuRd"

---------------
moran.plot(orcounty$MEDIANRENT,nb2listw(orcounty.dist.200),labels=orcounty$NAME)
#save image now: "orcounty.moran.pdf"

---------------

moran.test(orcounty$MEDIANRENT,nb2listw(orcounty.dist.200, style="W"))
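moran.test() reports Moran's I, I = (n/S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z_i are deviations from the mean and S0 is the sum of all weights. With row-standardised weights (style="W") and no counties lacking neighbours, S0 = n, and I equals the slope of the line in the Moran scatterplot from moran.plot(), i.e. the regression of the spatial lag Wx on x. The sketch below computes both by hand on a hypothetical four-region chain; morans_i() is my own helper, not the spdep function, and the data are made up.

```r
# Moran's I by hand: I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2
morans_i <- function(x, w) {
  z <- x - mean(x)
  n <- length(x)
  s0 <- sum(w)
  (n / s0) * sum(z * (w %*% z)) / sum(z^2)
}

# Hypothetical chain of four regions 1-2-3-4, row-standardised by hand.
w <- matrix(c(0,   1,   0,   0,
              0.5, 0,   0.5, 0,
              0,   0.5, 0,   0.5,
              0,   0,   1,   0), nrow = 4, byrow = TRUE)
x <- c(1, 2, 3, 4)      # hypothetical values rising along the chain

i_hand <- morans_i(x, w)

# Moran scatterplot: spatial lag Wx against x; with row-standardised
# weights the fitted slope equals Moran's I.
lag_x <- as.vector(w %*% x)
slope <- unname(coef(lm(lag_x ~ x))[2])

print(c(moran_I = i_hand, scatterplot_slope = slope))
```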


Monday, May 17, 2010

Final Project Proposal

This project grows out of a number of data-related issues I’ve been thinking about for a few years now. I joined the Zomba project (as I informally refer to it) in 2007 as a data processor. The project is a series of research studies begun by Pauline Peters (Harvard Center for International Development) in 1986 in a rural area of the Zomba District in southern Malawi. I was hired to process the data from her most recent round of data collection, in 2006, and I continued the data collection in Malawi during the summer of 2008. My current priority is to make sure that our analysis of the data follows sound statistical practice. The analysis is meant to assess the wellbeing of households relative to one another. The problems I have spotted are listed below.

The Zomba project has consisted of ethnographic and survey-based data collection from approximately 230 households from 6 clusters of villages in the area. The general objective is to study the response of smallholder household food security to cash crop initiatives and HIV infection. Below is a description of which data have been collected over the years, followed by a list of the problems I’d like to address.

The Data (household level)

· income indicators: expenditures (monthly/annual totals, percentiles, and classified by category, e.g. food, labor, household supplies); occupations; household assets (by total value and percentiles); income from crop sales (maize and tobacco); size of landholdings;

· demographics: number of members; age of household head; dependency ratio; headship (female de jure, female de facto, joint, male, child); occupation; education level; relationship and extended family; HIV prevalence (self-reported); morbidity and mortality;

· lifestyle: daily activities; mobility;

· agriculture: crops grown; maize yield; tobacco yield; farming strategies (qualitative); fertilizer use; field size, location and use;

· [NEW] GPS coordinates for each household and some roads/paths;

· [Forthcoming] GPS polygons of farmers’ fields;

Problems/Questions

· The data are not normally distributed (they do not meet the distributional assumptions of parametric tests), but by and large we have been using parametric statistics. I need to find a more appropriate approach.

· The sample is biased and cannot be described as representing a general population. Dr. Peters’ original intent was to compare tobacco-growing households with non-growers, so she selected an equal number of tobacco growers and non-growers. Although she tried to make the sample representative on other household indicators, the high proportion of tobacco growers biased the sample toward higher incomes and larger landholdings. How does this bias affect the analysis? How can it be moderated for statistical analysis? (i.e., Am I simply stuck with including a footnote explaining the bias? I’m pretty sure I am…)

· These data have never been analyzed geographically. The study area is geographically limited (roughly 30 km across), so Dr. Peters, an anthropologist, believes there should be no geographical effect on the data (that is, no spatial autocorrelation). It was only in the 2008 round that I collected GPS coordinates for each household. I was unable to record coordinates for other locations, such as clinics, wells or agricultural depots. Are we in danger of ignoring geography?

· Some of the visualization methods Peters has used, both for analysis and for presentation, could be improved. I would also like to integrate spatial data into these visualizations.
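One way to check Dr. Peters’ assumption of no spatial autocorrelation, once the 2008 household GPS coordinates are linked to the survey variables, would be a permutation test of Moran's I: shuffle the values across household locations many times and see how often the shuffled I is at least as large as the observed one (spdep offers this as moran.mc()). The base-R sketch below is only an illustration under made-up assumptions: 20 hypothetical households 1 km apart on a line (projected km, not real GPS), a hypothetical indicator that trends along the line, and neighbours defined as households within 1.5 km. morans_i() and moran_perm_sketch() are my own helper names, not spdep functions.

```r
# Moran's I by hand: I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2
morans_i <- function(x, w) {
  z <- x - mean(x)
  n <- length(x)
  (n / sum(w)) * sum(z * (w %*% z)) / sum(z^2)
}

# Permutation test: shuffle the values over locations nsim times and use the
# usual (count + 1) / (nsim + 1) estimate of a one-sided p-value for
# positive autocorrelation.
moran_perm_sketch <- function(x, w, nsim = 999, seed = 1) {
  set.seed(seed)
  obs <- morans_i(x, w)
  sims <- replicate(nsim, morans_i(sample(x), w))
  list(statistic = obs,
       p.value = (sum(sims >= obs) + 1) / (nsim + 1))
}

# Hypothetical households 1 km apart along a 20 km line.
pos <- cbind(x = 1:20, y = 0)
d <- as.matrix(dist(pos))
a <- (d > 0 & d <= 1.5) * 1          # neighbours: households within 1.5 km
w <- a / rowSums(a)                  # row-standardised weights

# Hypothetical indicator that rises along the line (strongly autocorrelated).
value <- as.numeric(1:20)
res <- moran_perm_sketch(value, w)
print(res)
```

With real data, a large p-value would be consistent with Dr. Peters’ expectation, while a small one would suggest that geography does matter within the study area.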



References (in alphabetical order):

Hargreaves, J., Morison, L., Gear, J., Kim, J., Makhubele, M., Porter, J., et al. (2007). Assessing household wealth in health studies in developing countries: a comparison of participatory wealth ranking and survey techniques in rural South Africa. Emerging Themes in Epidemiology, 4(1):4.

(This article compares our current method of wealth assessment to one I’ve been thinking of trying out. It uses some statistical techniques that might come in handy.)

Jayne, T.S., Yamano, T., Weber, M.T., Tschirley, D., Benfica, R., Chapoto, A., & Zulu, B. (2003). Smallholder income and land distribution in Africa: implications for poverty reduction strategies. Food Policy, 28(3):253-275.

(This article uses spatial and statistical analysis to examine data similar to those collected for the Zomba project. It also comes from a journal I frequently read and cite, and would like to contribute to, so it makes a good example.)

Miller, D.C. (2002). Handbook of Research Design and Social Measurement, 6th ed. Thousand Oaks, CA: Sage Publications.

(General notes for research design.)

O’Sullivan, D., & Unwin, D.J. (2003). Geographic Information Analysis. Hoboken, NJ: John Wiley and Sons.

(First steps to integrating spatial data into our existing data.)

Peters, P.E., Walker, P.A., & Kambewa, D. (2008). The Effects of Increasing Rates of HIV/AIDS-related Illness and Deaths on Rural Families in the Zomba District, Malawi: a Longitudinal Study: RENEWAL Program.

(This is the most recent report produced by Peters and colleagues. I will use this to draw examples of our current use of statistical analysis and some of the problems I see.)

Serneels, S., & Lambin, E. (2001). Proximate Causes of Land-use Change in Narok District, Kenya: a Spatial Statistical Model. Agriculture, Ecosystems, and Environment, 85(1):65-81.

(Another new technique for future research? The second author is a prominent political ecologist whose work closely parallels my own.)

Smith, L. C., & Subandoro, A. (2008). Measuring food security using household expenditure surveys. Washington, D.C.: International Food Policy Research Institute.

(This is specific to one of the types of data we use. Note that expenditures are always cross-referenced with other indicators, as they are only a proxy.)

Tuesday, May 11, 2010

Fishnet Maps - Assignment #5


Look everybody! I managed to create something publishable for my blog!