
Monday, March 22, 2010

Survey of Income and Program Participation (SIPP): An Introduction

The Survey of Income and Program Participation (SIPP) has its roots in the War on Poverty of the Johnson Administration in the mid-1960s.  Good data on welfare and income for poor people were hard to find.  To meet this need, the U.S. Department of Health, Education, and Welfare (now HHS) created the Income Survey Development Program (ISDP) in 1975.  According to Citro and Scholz (2009, pp. 18-19), the transition from the ISDP to the new SIPP survey proceeded in fits and starts, narrowly surviving cancellation until 1984, when the first SIPP panel, comprising about 21,000 households, finally began.

From 1985 through 1993, a new panel was introduced each February.  Within each panel, households were followed through multiple four-month waves (Citro & Scholz, p. 22):  the sample was divided into four rotation groups, and all of the households in a given group were interviewed once every four months, on a rotating basis.  After a major redesign effort beginning in 1993, panels were interviewed every four months over a four-year period, and one panel was completed before another was introduced.  Under the redesigned approach, panels were introduced in 1996, 2001, 2004, and 2008 (pp. 23-24, 29).

SIPP seems to have been plagued, throughout its existence, by irregular and inadequate political support and funding.  In 2006, the Bush Administration proposed to cut its budget from $44 million to $9.2 million, and to devote more than half of the latter figure to the creation of a new data collection program called the Dynamics of Economic Well-being System (DEWS) (p. 28).  DEWS would rely upon administrative sources of data, rather than a survey, to update and complement decennial census data.  Congress was persuaded that SIPP was important, however, and instead cut the survey's 2007 budget to $33 million (p. 29).  DEWS was downscaled, and now seems to survive as the "re-engineered SIPP."

The 2004 and 2008 panels were planned to follow households for as many as 12 waves, beginning with 51,400 households in 2004 and 45,000 in 2008 (pp. 25, 29).  Budgetary and resource restrictions forced a significant cutback in the numbers of households and waves in the later years of the 2004 panel.  The 2008 panel now seems to have sufficient funding to carry through to completion, with its data apparently becoming available in 2012.  The next panel is planned for 2013, structured as a three- or four-year series of annual interviews.

In their book, Citro and Scholz provide extensive information on strengths and weaknesses of SIPP, recommendations for improving data quality and using administrative records effectively, potential innovations in data collection, and other matters.  The general sense is that these authors consider the SIPP flawed but capable of being greatly improved, and in any event irreplaceable as a practical matter.

Sunday, January 3, 2010

American Community Survey (2003): Disabilities on the State Level

In a previous post, I examined some statistics provided by the American Community Survey (ACS) for 2003, regarding the prevalence of certain kinds of disabilities on the national level.  Now I turn to the interpretation of the ACS on the state level.

Generally, as a Census Bureau methodological report explains, state population estimates are obtained by adding up the estimated numbers of people in each county.  The calculation of county population begins with the decennial census.  At present, that is the census taken in 2000, but of course there will be a new one this year (2010).  The Bureau makes adjustments from the previous year based on a variety of county-level sources, including tax records, estimates of migration, and registered births and deaths.  Previous years' estimates are revised as new information sheds light on actual developments.

State totals are not added up to produce the national population, however.  Instead, national totals are developed independently, and are used as a check upon county-level calculations.  Roughly speaking, if the sum of all state totals (that is, the sum of all county totals) were to exceed the independently developed national population estimate by 3%, then the population estimates for all counties would be reduced by 3% so that the state totals would match the national estimate.
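To make that adjustment concrete, here is a minimal Python sketch of the proportional-scaling idea, using made-up numbers.  It illustrates the principle only; it is not the Census Bureau's actual procedure.

```python
def scale_to_control_total(county_estimates, national_total):
    """Scale every county estimate by a common factor so that the sum
    matches an independently derived national control total.  A simplified
    illustration of the idea described above, not the Bureau's procedure."""
    factor = national_total / sum(county_estimates.values())
    return {county: est * factor for county, est in county_estimates.items()}

# Made-up example: the county estimates sum to 103 (million), but the national
# control total is 100 (million), so every county is scaled down by the same
# factor (100/103, roughly a 3% reduction).
counties = {"County A": 40.0, "County B": 33.0, "County C": 30.0}
print(scale_to_control_total(counties, 100.0))
```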

Given this information, a lay reader might suggest that a parallel method of estimating the prevalence of disabilities within a county -- and thus within a state -- would supplement the ACS questionnaire by investigating actual records of disabilities, from such sources as hospital and school records, handicap-plate motor vehicle registrations, and Social Security records.  Just as researchers' investigations have uncovered shortcomings in decennial censuses, one might expect these sorts of disability-related record checks to exert a similar beneficial influence on ACS disability estimates.

According to Weathers (2005, p. 61), the Census Bureau does investigate ways in which its statistical processes may produce errors.  Errors may be random, in which case statistical adjustments can be made or estimates of error can be produced.  Errors may also be nonrandom; such nonrandom errors are also called systematic errors.  These occur, not because (for example) a predictable percentage of statistical typists will make a predictable percentage of typographical errors as they enter the data, but because some kind of atypical distortion occurs with respect to some particular kind of person or situation.  If, for example, followers of a certain religion were concentrated in a particular county, and if their religion taught that it was wrong to respond to surveys, then there could be a pronounced systematic, nonrandom undercounting of people in that county.  Along these lines, it would certainly seem that people who have physical disabilities that make every task a chore, or mental disabilities that discourage cooperation with governmental surveys they perceive as suspicious, could be systematically undercounted.

Weathers (2005, p. 61) indicates that the Census Bureau maintains information on identified systematic errors in the ACS at a page on its website.  A quick search of that webpage finds, at present, no entries pertaining to disabilities.  Weathers (pp. 68-70) does note, however, that a redesign of the ACS disability questions resulted in dramatic and potentially erroneous declines in reported disabilities in 2003.  There have been substantial changes in the ACS measurement of disabilities since then.  A subsequent post will discuss those changes.  First, however, the following paragraphs explore the 2003 ACS in light of Weathers's comments, many of which are still applicable and/or have not yet been revisited.

ACS 2003 state-by-state disability prevalence rates (Weathers, 2005, pp. 45-46) raised questions about consistency in data collection procedures.  In the Midwest, for example, most states were fairly similar to one another:  overall disability rates were between 11.2% and 13.3% in Ohio, Indiana, Iowa, Michigan, Wisconsin, Missouri, Kansas, and Nebraska.  Indeed, without Indiana and Ohio, the range among those states would have been in the narrow band of 11.2% to 12.4%.  Yet Illinois, in the middle of those states and right next to Indiana (13.3%), somehow produced a rate of 9.2%.  Certainly it is plausible that a city like Chicago would manage to accommodate its persons with disabilities better than a more rural state; but why the presence of a large city would affect disability prevalence itself is not intuitively obvious.  If population density itself were a positive factor, Rhode Island (12.0%) would not have had a rate considerably higher than those of its neighbors Connecticut (9.2%) and Massachusetts (9.7%).

Certainly the 2003 ACS state-level data are interesting.  Weathers (2005, pp. 23-24) compares states in terms of the levels of employment, poverty, and household income experienced by persons with disabilities.  His tables 7-9 (pp. 47-52) also provide relative comparisons of the experiences of people with and without disabilities in those several regards.  Thus, for example, whatever the rate of unemployment in West Virginia as a whole, the data reveal that people with disabilities in that state are employed at only about one-third (35.8%) the rate of its people without disabilities -- as compared to Wyoming, on the opposite extreme among the lower 49 states, where the rate is more like two-thirds (65.5%).  Among the midwestern states just listed (including Illinois), that relative rate ranges from 48.4% (Michigan and Ohio) to 55.8% (Nebraska).

According to Weathers (2005), the relative experience of poverty for people with disabilities, like their relative level of employment, varies considerably among states.  At the low end, people with disabilities in Utah are only 2.1 times as likely to fall below the poverty line as are people without disabilities.  At the high end, people with disabilities in Nebraska are 4.9 times as likely to do so.  Nebraska aside, the midwestern states listed above are within the range of 3.4 (Illinois) to 3.8 (Kansas).

As with some of the other values discussed here, relative household incomes contrast western states against southern states.  Utah leads with a value of 75.9% – that is, the median household income of a person with disabilities in Utah is 75.9% of the median household income of a person without disabilities – and Louisiana (49.6%) and Alabama (50.4%) are beaten at the bottom end only by Delaware (which, with the slightly lower value of 48.3%, is an outlier in regional terms by several of these measures).  The midwestern range is from 54.2% (Ohio) to 62.3% (Wisconsin).  Relatively large differences among these states’ neighbors (e.g., 61.9% in Indiana, 56.4% in Michigan) raise the question of how state-level policies impact these numbers.
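For readers who want to see how these relative measures are constructed, here is a small Python sketch using hypothetical numbers (the real figures are in Weathers's tables 7-9).  The function name and the example values are mine, chosen only to illustrate the arithmetic.

```python
def relative_rate(value_with_disability, value_without_disability):
    """Express a statistic for people with disabilities as a multiple (or
    fraction) of the same statistic for people without disabilities."""
    return value_with_disability / value_without_disability

# Made-up state: 30% employment, a 25% poverty rate, and $30,000 median
# household income among people with disabilities, versus 75%, 10%, and
# $50,000 among people without disabilities.
print(relative_rate(0.30, 0.75))      # 0.40 -> employed at 40% the rate
print(relative_rate(0.25, 0.10))      # 2.5  -> 2.5 times as likely to be poor
print(relative_rate(30000, 50000))    # 0.60 -> 60% of median household income
```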

Some caveats are in order.  First, as with most of the observations in this post, it remains to be seen how these data have changed since 2003.  Note, too, that the foregoing analyses provided by Weathers (2005, p. 19) are focused upon the working-age population, ages 25-61.  Also, the ACS presents values on employment and poverty that are in some regards markedly divergent from those reported by most other national studies cited in a previous post in this series (Weathers, p. 29).

Across all age groups, Weathers (2005, pp. 20-21) notes that national counts of disabilities may be influenced by race, culture, gender, and education.  Black people comprise 13.8% of people with disabilities, as compared to 11.7% of the population without disabilities.  Hispanic people comprise 14.0% of the population without disabilities, but only 9.6% of the reported population with disabilities.  Women constitute 52.8% of the population with disabilities (versus 51% without), and are especially highly represented in disabilities involving self-care (58.9%) and going outside the home (63.6%).  People with less than a high school education account for 11.6% of people without a disability, but for 25.0% of people with a disability.  Disability rates may be correspondingly affected in states that vary from the mean in any of these demographic respects.

The foregoing has provided an introduction to state-level measurement of disabilities through the 2003 ACS, largely as interpreted by Weathers (2005).  The next step is to examine how ACS federal- and state-level measurements and results changed in 2008.

Sunday, October 11, 2009

Making a Map with Epi Info

I wanted to create a map that would show changes in some kinds of data on a county-by-county basis for the state of Indiana. To create this map, I downloaded and installed the free Epi Info program from the CDC. To get the outline of Indiana and its counties, I downloaded the appropriate TIGER/Line shapefile from the Census Bureau. I had previously tried downloading one from another source, but it turned out not to be able to accommodate county-by-county data.

I had prepared some data in an Excel spreadsheet. I imported it into Epi Info (Analyze Data > Data > Read (Import)). I ran into some problems with this. One problem was that I had a hyphen in the filename, and Epi Info couldn't deal with that. I was also not quite sure what to do with this data, so I backed up and tried using some canned data from a standard source. I went to the Census Bureau's USA Counties Data Files webpage and downloaded the POP01.xls spreadsheet. I edited out the non-Indiana data and saved that spreadsheet. I imported it into Epi Info. I tried to export just one variable in Epi 2000 format and got an error message: "The name specified for the output table is reserved." A Google search turned up no references to that error. I thought that meant I was using the name of an already existing file, but when I tried a different (new) file name, I got the same thing. The file had indeed been created, but there was nothing in it.
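As an aside, the Indiana-only data prep can also be done outside of Excel and Epi Info. Here is a hedged Python/pandas sketch; the column name "STCOU" (the combined state-county FIPS code) is my assumption about the USA Counties spreadsheet layout, so check the actual file and adjust, and note that old .xls files may require the xlrd engine.

```python
import pandas as pd

# Read the USA Counties spreadsheet and keep only Indiana counties
# (state FIPS code 18).  "STCOU" is an assumed column name.
pop = pd.read_excel("POP01.xls")
indiana = pop[pop["STCOU"].astype(str).str.zfill(5).str.startswith("18")]

# Save under a name with no hyphens or spaces, since those caused trouble
# when importing into Epi Info; adjust the format to whatever your Epi Info
# version can read.
indiana.to_excel("POP01_Indiana.xlsx", index=False)
```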

A less restrictive search led to, among other things, an Epi Info Training Manual on the website of the Department of Food Science and Human Nutrition at Iowa State. (You can major in food? Why didn't I think of that?) Their manual said that having a space in your filename could cause problems, but that wasn't the problem for me. Finding that manual made me think of searching for Epi Info manuals specifically, and that led me to the CDC's manual for Epi Info for DOS. I couldn't figure out where that had downloaded itself to, but I finally found it (epi6man.exe) in My Documents\Downloads. When I ran it from a DOS prompt, though, it gave me "Access Denied." Apparently it was created in 1992 (or maybe 1994) and, as such, was designed for old-style DOS systems, not for my CMD window in Windows XP. I found another, much shorter manual produced by the Great Lakes Epidemiology Center. But neither of these seemed to explain this problem.

What I finally figured out was this. I import the data from a spreadsheet or database file. I list the imported contents. So far, so good. Now I want to write the output in Epi 2000 format (which is just .mdb format, i.e., Access). Here's the trick. I didn't need to do this on the computer where I first used Epi Info, but I needed to do it on my home installation. To write the data successfully, *temporarily* choose the same format and filename as your input file, as if you were going to overwrite that file. This will make your tables show up in the Data Table drop-down box. Select the table you want to use. Don't click OK yet. Go back to the File Name box and change it to the new name that you want to create. If necessary, select the table name again. You may have to just type the file name and let it save itself wherever, in order to avoid the problem that comes from browsing to the desired location.

So that solved that problem.

Next, in Epi Info, I clicked on the Map button to open Epi Map. In the upper-left corner, I clicked on the button that looked like a stack of three sheets of paper. This opened Map Manager. I clicked Add Layer and navigated to my Indiana state and counties shapefile (above). This was where I had finally figured out that the other shapefile I had downloaded previously did not have a capacity for county data: its Add Data button was grayed out. This time around, that was not a problem. I clicked Add Data and navigated to the .mdb file I had just written from Analyze Data. After clicking a couple of buttons, it gave me a Select Relate Fields and Render Field dialog with three panes. In the left pane, it showed Shape Fields. In the center, it showed the Geographic Field from my .mdb file, which I had called the County field.

I figured I needed to put my County names in a form that the related Shape Field would recognize, so that the data for Adams County would actually appear in the Adams County part of the map. But what was the right form? I went back to the files that I had unzipped when I downloaded the shapefile. There weren't any Read-Me or .pdf files there. There was an .xml file that looked like it contained what I wanted, but I had to try several browsers before I found that Google Chrome would at least show its contents in a normally readable (although unformatted) form. It didn't have much info after all, but it did point me toward a page on TIGER (short for Topologically Integrated Geographic Encoding and Referencing) products. That page led to the shapefiles main page, which led to their technical documentation page, where I downloaded the full 185-page Technical Documentation manual. The manual was already nicely bookmarked (though for some reason it didn't open up that way), and I groped around in it for a while.

As I looked at the Select Relate Fields dialog, I saw two likely candidates: CNTYIDFP00 and COUNTYFP00. One of the unzipped files accompanying the shapefile was a .dbf file, so I tried viewing it in Access, but no go. I tried opening it in Epi Analyze as a dBASE IV file, but it said, "Filenames for this data format must be in the old 8.3 style." So I copied that .dbf file and called the copy TEMP.dbf, and tried opening that in Epi Analyze. I did a Statistics > List and there we were, and I saw there was a third field I had not counted on: the NAMELSAD00 field had complete county names, just like they were in my massaged spreadsheet: Adams County, etc. So, OK, bailing out of that, back in Epi Map I related their NAMELSAD00 field to my County field and selected, as the Render Field, the first of the several (actually, ten) years' worth of data that I wanted to map. Sadly, this gave me an OpenData dialog that said, "There were no matches found between the data table field and the shapefile field." Looking again at the structure of my data table in Access, I saw that the County field was Text type, field size 255. To get comparable information on the shapefile, I went back into that TEMP.dbf file in Epi Analyze and chose Variables > Display. This told me that NAMELSAD00 was a text field, but not its length. I tried a Write (Export) to a TEMP.mdb file in Epi 2000 (probably could have used Access 2000) format. Finally, I got the information I wanted: the NAMELSAD00 field was configured exactly the same as my County field. So what was I doing wrong?
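A quicker way to see what fields the shapefile's .dbf actually contains (without the 8.3-filename workaround) would be to read it with geopandas. This is a hedged sketch, not what I did at the time; the filename is illustrative, and the field names shown are the ones discussed above for a 2000-vintage TIGER/Line county file.

```python
import geopandas as gpd

# Read the county shapefile (the .shp/.dbf/.shx trio) directly; the filename
# here is illustrative -- substitute the actual TIGER/Line file you downloaded.
counties = gpd.read_file("tl_2008_18_county00.shp")

# List the attribute fields; for the 2000-vintage county file these include
# CNTYIDFP00, COUNTYFP00, and NAMELSAD00 (full names like "Adams County").
print(counties.columns.tolist())
print(counties[["CNTYIDFP00", "COUNTYFP00", "NAMELSAD00"]].head())
```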

A Google search for that error message turned up nothing. A search for "no matches" in the Technical Documentation .pdf produced - you guessed it - no matches. Once again, a less precise Google search led to some possibilities, including a Cardiff Council manual, presented in Scribd format, but I wasn't getting much mileage there either. A remark in the Technical Documentation said, "Federal Information Processing Series codes will continue to serve as the key matching and joining codes for Census Bureau products." So I thought maybe I should try linking on a numeric field instead of the County name field. In Access, I created another version of my data table, with additional fields for CNTYIDFP00 (which I made primary key) and COUNTYFP00. This dropped two records, due to county names that contained spaces or differed between the two, but I manually reinserted those and marched on. Back in Epi Map's Select Relate Fields dialog, I designated CNTYIDFP00 as my Relate field. That worked. I had myself a map. The last step was to tinker with Map Manager's Properties button, which took a lot of time but yielded great improvement in appearance.
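For anyone repeating this with more current tools, the same numeric-key join can be checked in a few lines of pandas before (or instead of) wrestling with the Relate Fields dialog. This is a hedged sketch: the filenames and the data table's "County" column name are assumptions, and the shapefile field names are the TIGER/Line ones discussed above.

```python
import geopandas as gpd
import pandas as pd

counties = gpd.read_file("tl_2008_18_county00.shp")   # illustrative filename
data = pd.read_excel("indiana_income_poverty.xlsx")   # illustrative filename

def norm(name):
    """Normalize county names so that stray spaces don't block a match."""
    return " ".join(str(name).split()).lower()

# Build a name -> FIPS lookup from the shapefile, attach the five-digit FIPS
# code (CNTYIDFP00) to the data table, and join on that numeric key rather
# than on the county name itself.
lookup = {norm(n): fips for n, fips in
          zip(counties["NAMELSAD00"], counties["CNTYIDFP00"])}
data["CNTYIDFP00"] = data["County"].map(lambda n: lookup.get(norm(n)))

# Any counties that still failed to match need manual attention.
print(data.loc[data["CNTYIDFP00"].isna(), "County"].tolist())

merged = counties.merge(data, on="CNTYIDFP00", how="left")
```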

I repeated the last few steps for each of the ten years in my study period. For each finished map, I took a screenshot (Print Screen button on keyboard) and pasted it into IrfanView, where I did some batch cropping so that the state map would remain in the same position on all ten screenshots.  The shapefile I had used had also distorted the state, making it look shorter and thicker than it normally is on maps, so I used IrfanView to batch-adjust the dimensions of each screenshot.  I then imported the images into Adobe Premiere Elements and added transitions and titles to show the year.  I posted the result on YouTube and have also posted an explanation of what it's all about, here on my blog.

Video: Ten Years of Income and Poverty Fluctuations in Indiana

I used Epi Info to make a map of some statistical information from Indiana for 1998.  I made another map of the same data for 1999, and so on through 2007.  I treated the maps as still photos and made a video of them, which is now available on YouTube.  This post explains what the video shows.


The video shows a map of Indiana and its counties over a ten-year period, from 1998 through 2007 inclusive.  The counties are represented in various colors.  The colors show whether counties fared well or poorly during each of those years.  The best outcomes are in deep green.  Nearly neutral outcomes are on the boundary between light green and yellow.  The worst outcomes range from yellow through orange to red.

The calculations behind these maps begin with year-by-year data from the U.S. Census Bureau.  The specific data sources used are described in the other post.   These maps show the relationship between two streams of county-by-county data.  One is the per capita income, adjusted for inflation.  The other is the number of people in poverty. 

For both of these data streams, I calculated rates of change.  So, for example, the video begins with 1998.  Warren County, at the left edge of the state, appears in red.  Red indicates an extreme divergence between changes in per capita income and in poverty rates.  In most if not all cases, moreover, red indicates that the divergence is undesirable.

In the case of Warren County in 1998, the situation is as follows.  Per capita income dropped very slightly (i.e., by only 0.2%), from $23,577 in 1997 to $23,527 in 1998.  Unfortunately, the number of persons in poverty rose 13.5%, from 644 in 1997 to 731 in 1998.  So there was not a general recession or other drop in earnings shared equally by everyone.  Indeed, the stable per capita income raises the question of how many people actually experienced an increase in income.

Warren County stands out, in 1998, because the ratio of its change in poverty rate to its change in per capita income was greater than 60:1.  It stands out in red because that change signals bad news for poor people.  It would have stood out in green if the ratio had been 60:1 in poor people’s favor – if, that is, there had been a slight increase in income and a dramatic decrease in poverty.  In that case, it would seem that the county channeled much of its additional prosperity, that year, into an improvement in the conditions of the poor.

The color scheme used on these maps, then, ranges from red down through orange to yellow, as the bad news for poor people becomes progressively less bad, and from yellowish green up through deep green as the good news for poor people becomes progressively better.  The color gradations go in steps of twenty:  red is for a ratio worse than (negative) 60:1; a dark shade of orange accounts for ratios between (negative) 40:1 and 60:1; a lighter shade of orange represents ratios between (negative) 20:1 and 40:1; and so on down to zero and then up through the deepest green at (positive) 60:1.  There is nothing magical about those particular gradations.  They were chosen for simplicity.  The seas of yellow and green that appear in a few years depicted in the video suggest that closer gradations might have provided more information.
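For concreteness, here is a minimal Python sketch of the Warren County arithmetic and the twenty-step color scale just described.  The signed-ratio convention (negative means bad news for poor people) and the bin labels are my shorthand for the scheme, not the exact calculations behind the maps.

```python
def pct_change(old, new):
    return 100.0 * (new - old) / old

def signed_ratio(income_chg, poverty_chg):
    """Magnitude: |poverty change| / |income change|.  Sign: negative when
    the news is bad for poor people (poverty rising), positive when good."""
    magnitude = abs(poverty_chg) / abs(income_chg) if income_chg else float("inf")
    return -magnitude if poverty_chg > 0 else magnitude

def color_for(ratio):
    """Bins of twenty, from red (worse than -60:1) up to deepest green (+60:1)."""
    if ratio <= -60: return "red"
    if ratio <= -40: return "dark orange"
    if ratio <= -20: return "light orange"
    if ratio <= 0:   return "yellow"
    if ratio <= 20:  return "yellowish green"
    if ratio <= 40:  return "medium green"
    if ratio <= 60:  return "deep green"
    return "deepest green"

# Warren County, 1997 -> 1998, using the figures quoted in this post.
income_chg = pct_change(23577, 23527)   # about -0.2%
poverty_chg = pct_change(644, 731)      # about +13.5%
r = signed_ratio(income_chg, poverty_chg)
print(round(r, 1), color_for(r))        # roughly -63.7 and "red"
```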

One step I took that now appears to have been a mistake was to eliminate a half-dozen extreme values that I considered outliers.  Had I not done that, there would have been a handful of additional counties shown in the deepest reds and greens throughout this ten-year period.

This presentation does not purport to be definitive, or even scholarly.  Along the lines suggested in the refinements just mentioned, a high-quality product would call for manual analysis of a number of counties, like the analysis of Warren County provided above, so as to ensure that representative and appropriate colors were used for all counties.  The data and calculations used here have not been carefully proofread, although the spot checks I have done do seem to indicate accuracy in the basic calculations.

One technical refinement that will become more feasible in future years, as data become more readily available, could involve a finer-grained analysis by zip code and/or census tract.  Another refinement worth considering would be to overlay an indication of population centers.  Also, if the video were converted to, say, a PDF, it would be possible to create links or tooltips for each county, so that mousing over or clicking on a county would bring up or lead to the underlying data.

The video suggests some areas for further inquiry.  It appears, in my review thus far, that a number of the most extreme contrasts appear in counties in the regions of Chicago, Evansville, Indianapolis, and Louisville – and also around West Lafayette.  Also, it seems that some counties tend to experience the same trends:  they are the same colors as one or more of their neighbors in most if not all of the years depicted.  There also appear to be years of greater and lesser homogeneity among the counties – such as the contrast between 1998 and 1999.  Closer investigation of sharp divergences among neighboring counties (such as in 2004) could also lead to indicia of balkanization, where large employers or governmental policies yield marked departures from (and possibly distortions in) the general tendency in the state for the year.

As noted in the other post, there were some technical difficulties in the preparation of this video.  It was, nonetheless, an interesting project.  I hope the links provided in these posts, and the techniques used in the video, lead me and/or others to undertake further analyses of this kind.

Saturday, February 23, 2008

Recession: Expect More Flakiness

When everyone has money and is busy, they don't have time to sweat the small stuff. That cuts both ways. On one hand, they are more likely to overlook details that may matter to someone else. It's not important to them, therefore it just doesn't seem very important, period. On the other hand, if someone does catch them on something they overlooked, they are more inclined to just pay the money or do whatever seems necessary to take care of it in the easiest possible way.

It's different when people have less money and more time. They are more likely to notice the details that weren't handled quite right, because now those little amounts of money seem more important. They have the time to fool with the details, and the time to hassle those who aren't paying attention. In this sense, it can be more difficult to get away with small crimes and offenses in hard times.

A countervailing factor is that small offenses are likely to be more common in hard times. When everyone has money, it's pretty much assumed that everyone will pay their bills on time, that broken stuff will be fixed properly, and that generally things will work as they should. But when people don't have money, or are afraid of losing what they've got, they are likely to be more flaky. They will want to be adjusting or backing out of deals and looking for squirrely ways to save a buck. Poor countries are not known for their crisp, efficient handling of problems. As more people find it necessary or helpful to scrounge for the occasional extra little bit, it seems likely that corruption and complication will increasingly appear in situations where one would not previously have expected such behaviors. It is ironic, because this theory implies that the countries that most desperately need efficiency are least able to achieve it.

If this prediction of one aspect of future life in America should prove accurate, it will reflect an unfortunate and ironic fact. There was a long period of time, a half-century or more, in which the U.S. had a unique opportunity to shape the terms of trade around the world. There was sufficient power to make a tremendous impact upon the processing of routine transactions in developing nations -- transactions that sometimes meant everything to the powerless. Rather than stand for corporate power and the accumulation of wealth by a few, the international image of the U.S. could now be that of a power that believed in its touted principles -- of equality, for example, and of the rule of law over all citizens. The current business climate in places like China could have been influenced favorably. Now, instead, the U.S. economy is increasingly at risk of coming to resemble that of a developing nation. Having failed to make the world a better place in this regard, we may find ourselves forced to live in the world we have helped to create.