Sunday, October 11, 2009

Making a Map with Epi Info

I wanted to create a map that would show changes in some kinds of data on a county-by-county basis for the state of Indiana. To create this map, I downloaded and installed the free Epi Info program from the CDC. To get the outline of Indiana and its counties, I downloaded the appropriate TIGER/Line shapefile from the Census Bureau. I had previously tried downloading one from another source, but it turned out not to be able to accommodate county-by-county data.

I had prepared some data in an Excel spreadsheet. I imported it into Epi Info (Analyze Data > Data > Read (Import)). I ran into some problems with this. One problem was that I had a hyphen in the filename, and Epi Info couldn't deal with that. I was also not quite sure what to do with this data, so I backed up and tried using some canned data from a standard source. I went to the Census Bureau's USA Counties Data Files webpage and downloaded the POP01.xls spreadsheet. I edited out the non-Indiana data and saved that spreadsheet. I imported it into Epi Info. I tried to export just one variable in Epi 2000 format and got an error message: "The name specified for the output table is reserved." A Google search turned up no references to that error. I thought that meant I was using the name of an already existing file, but when I tried a different (new) file name, I got the same thing. The file had indeed been created, but there was nothing in it.
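
Since the hyphen in the filename was enough to trip up the import, it may be worth cleaning up filenames before handing them to Epi Info. The following is only a minimal Python sketch of that idea. The function name safe_name and the example filename are made up for illustration, and the rule it applies (letters, digits and underscores only) is a cautious assumption, not a documented Epi Info requirement.

```python
import re
from pathlib import PurePath


def safe_name(filename: str) -> str:
    """Replace hyphens, spaces and other punctuation in the name with underscores.

    Hypothetical helper: the "letters, digits, underscore" rule is a
    conservative guess, not something taken from Epi Info documentation.
    """
    path = PurePath(filename)
    # Collapse every run of disallowed characters into a single underscore.
    stem = re.sub(r"[^A-Za-z0-9_]+", "_", path.stem).strip("_")
    # Fall back to a generic name if nothing usable is left.
    return (stem or "data") + path.suffix


print(safe_name("indiana-data 2009.xls"))
print(safe_name("POP01.xls"))
```

A name that is already plain, such as POP01.xls, passes through unchanged.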

A less restrictive search led to, among other things, an Epi Info Training Manual on the website of the Department of Food Science and Human Nutrition at Iowa State. (You can major in food? Why didn't I think of that?) Their manual said that having a space in your filename could cause problems, but that wasn't the problem for me. Finding that manual made me think of searching for Epi Info manuals specifically, and that led me to the CDC's manual for Epi Info for DOS. I couldn't figure out where that had downloaded itself to, but I finally found it (epi6man.exe) in My Documents\Downloads; but when I ran it from a DOS prompt, it gave me "Access Denied." Apparently it was created in 1992 (or maybe 1994) and, as such, was designed for old-style DOS systems, not for my CMD window in Windows XP. I found another, much shorter manual produced by the Great Lakes Epidemiology Center. But neither of these seemed to explain this problem.

What I finally figured out was this. I import the data from a spreadsheet or database file. I list the imported contents. So far, so good. Now I want to write the output in Epi 2000 format (which is just .mdb format, i.e., Access). Here's the trick. I didn't need to do this on the computer where I first used Epi Info, but I needed to do it on my home installation. To write the data successfully, *temporarily* choose the same format and filename as your input file, as if you were going to overwrite that file. This will make your tables show up in the Data Table drop-down box. Select the table you want to use. Don't click OK yet. Go back to the File Name box and change it to the new name that you want to create. If necessary, select the table name again. You may have to just type the file name and let it save itself wherever, in order to avoid the problem that comes from browsing to the desired location.

So that solved that problem.

Next, in Epi Info, I clicked on the Map button to open Epi Map. In the upper-left corner, I clicked on the button that looked like a stack of three sheets of paper. This opened Map Manager. I clicked Add Layer and navigated to my Indiana state and counties shapefile (above). This was where I had finally figured out that the other shapefile I had downloaded previously did not have a capacity for county data: its Add Data button was grayed out. This time around, that was not a problem. I clicked Add Data and navigated to the .mdb file I had just written from Analyze Data. After clicking a couple of buttons, it gave me a Select Relate Fields and Render Field dialog with three panes. In the left pane, it showed Shape Fields. In the center, it showed the Geographic Field from my .mdb file, which I had called the County field.

I figured I needed to put my County names in a form that the related Shape Field would recognize, so that the data for Adams County would actually appear in the Adams County part of the map. But what was the right form? I went back to the files that I had unzipped, when I had downloaded the shapefile. There weren't any Read-Me or .pdf files there. There was an .xml file that looked like it contained what I wanted, but I had to try several browsers before I found that Google Chrome would at least show its contents in a normally readable (although unformatted) form. It didn't have much info after all, but it did point me toward a page on TIGER (short for Topologically Integrated Geographic Encoding and Referencing system) products, which led to the shapefiles main page, which led to their technical documentation page, where I downloaded their full 185-page Technical Documentation manual, which was already nicely bookmarked (though for some reason it didn't open up that way), wherein I groped around for a while.

As I looked at the Select Relate Fields dialog, I saw two likely candidates: CNTYIDFP00 and COUNTYFP00. One of the unzipped files accompanying the shapefile was a .dbf file, so I tried viewing it in Access, but no go. I tried opening it in Epi Analyze as a dBASE IV file, but it said, "Filenames for this data format must be in the old 8.3 style." So I copied that .dbf file and called the copy TEMP.dbf, and tried opening that in Epi Analyze. I did a Statistics > List and there we were, and I saw there was a third field I had not counted on: the NAMELSAD00 field had complete county names, just like they were in my massaged spreadsheet: Adams County, etc. So, OK, bailing out of that, back in Epi Map I related their NAMELSAD00 field to my County field and selected, as the Render Field, the first of the several (actually, ten) years' worth of data that I wanted to map. Sadly, this gave me an OpenData dialog that said, "There were no matches found between the data table field and the shapefile field." Looking again at the structure of my data table in Access, I saw that the County field was Text type, field size 255. To get comparable information on the shapefile, I went back into that TEMP.dbf file in Epi Analyze and chose Variables > Display. This told me that NAMELSAD00 was a text field, but not its length. I tried a Write (Export) to a TEMP.mdb file in Epi 2000 (probably could have used Access 2000) format. Finally, I got the information I wanted: the NAMELSAD00 field was configured exactly the same as my County field. So what was I doing wrong?
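
When a relate step reports no matches, one way to see what is going on is to list the key values that appear on one side but not the other. Here is a rough Python sketch of that check. The function name unmatched_keys and the sample county names are invented for illustration (including the deliberate trailing space and the missing period), and they are not the actual values from my files.

```python
def unmatched_keys(shape_values, table_values):
    """Return (values only in the shapefile, values only in the data table)."""
    shape_set, table_set = set(shape_values), set(table_values)
    return sorted(shape_set - table_set), sorted(table_set - shape_set)


# Made-up sample values; real ones would come from the .dbf and the .mdb.
shape_names = ["Adams County", "Allen County ", "St. Joseph County"]
table_names = ["Adams County", "Allen County", "St Joseph County"]

only_shape, only_table = unmatched_keys(shape_names, table_names)

# repr() makes invisible differences, such as trailing spaces, visible.
for value in only_shape:
    print("shapefile only:", repr(value))
for value in only_table:
    print("data table only:", repr(value))
```

Printing with repr() is the design choice that matters here: two names that look identical on screen can still differ by a space or a period.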

A Google search for that error message turned up nothing. A search for "no matches" in the Technical Documentation .pdf produced - you guessed it - no matches. Once again, a less precise Google search led to some possibilities, including a Cardiff Council manual, presented in Scribd format, but I wasn't getting much mileage there either. A remark in the Technical Documentation said, "Federal Information Processing Series codes will continue to serve as the key matching and joining codes for Census Bureau products." So I thought maybe I should try linking on a numeric field instead of the County name field. In Access, I created another version of my data table, with additional fields for CNTYIDFP00 (which I made primary key) and COUNTYFP00. This dropped two records, due to county names that contained spaces or differed between the two, but I manually reinserted those and marched on. Back in Epi Map's Select Relate Fields dialog, I designated CNTYIDFP00 as my Relate field. That worked. I had myself a map. The last step was to tinker with Map Manager's Properties button, which took a lot of time but yielded great improvement in appearance.
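
For anyone who would rather build the numeric key in a script than by hand in Access, here is a minimal Python sketch. It assumes, as I understand the TIGER/Line documentation, that CNTYIDFP00 is the two-digit state FIPS code followed by the three-digit county FIPS code (18 is Indiana, so Adams County is 18001). The helper name county_id, the Y2000 column and its values are placeholders, not my real data.

```python
STATE_FIPS_INDIANA = "18"


def county_id(state_fips, county_fips) -> str:
    """Build a 5-character county ID: 2-digit state code + 3-digit county code."""
    return str(state_fips).strip().zfill(2) + str(county_fips).strip().zfill(3)


# Placeholder rows standing in for the data table (values are made up).
rows = [
    {"County": "Adams County", "COUNTYFP00": "001", "Y2000": 100},
    {"County": "Allen County", "COUNTYFP00": "003", "Y2000": 200},
]

# Add the join key to every row, then index the table by that key.
for row in rows:
    row["CNTYIDFP00"] = county_id(STATE_FIPS_INDIANA, row["COUNTYFP00"])
table_by_id = {row["CNTYIDFP00"]: row for row in rows}

# IDs as they would appear in the shapefile's attribute table.
shape_ids = ["18001", "18003"]
matched = [cid for cid in shape_ids if cid in table_by_id]
print(matched)
```

Keeping the codes as zero-padded text, not numbers, avoids losing the leading zeros in county codes like 001.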

I repeated the last few steps for each of the ten years in my study period. For each finished map, I took a screenshot (Print Screen button on keyboard) and pasted it into IrfanView, where I did some batch cropping so that the state map would remain in the same position on all ten screenshots. The shapefile I had used also had distorted the state, making it look shorter and thicker than it normally is on maps, so I used IrfanView to batch-adjust the dimensions of each screenshot. I then imported the images into Adobe Premiere Elements and added transitions and titles to show the year. I posted the result on YouTube and have also posted an explanation of what it's all about, here on my blog.
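
I adjusted the screenshot dimensions by eye, but the stretch can also be estimated. The Python sketch below rests on an assumption I have not verified for Epi Map: that the distortion comes from drawing unprojected longitude and latitude degrees at the same scale, in which case east-west distances are exaggerated by a factor of 1/cos(latitude). The function name corrected_size, the 800 x 600 screenshot size, and the 40-degree latitude (a rough mid-latitude for Indiana) are all illustrative.

```python
import math


def corrected_size(width_px: int, height_px: int, latitude_deg: float):
    """Shrink the width by cos(latitude), leaving the height alone.

    Assumes the image was drawn with one degree of longitude the same
    size as one degree of latitude (an unverified guess about Epi Map).
    """
    scale = math.cos(math.radians(latitude_deg))
    return round(width_px * scale), height_px


new_width, new_height = corrected_size(800, 600, 40.0)
print(new_width, new_height)
```

The resulting width and height could then be entered in a batch resize dialog with the "preserve aspect ratio" option turned off.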

3 comments:

raywood

For users of Epi Info, I forgot to mention the user forums at http://forums.myepi.info.

raywood

Users of Social Explorer who watch the accompanying video will quickly note that I was seeking a Social Explorer effect here, using data and adding some effects that are not presently available in Social Explorer.

To play with Social Explorer, see http://www.socialexplorer.com.

Lau

I find this information useful because I have the same problem: how to create an .mdb file in order to export my data from a spreadsheet to Epi Info.
Lau, PNG.