Promoting diversity in education and employment requires us to rethink testing and "meritocracy."
Susan Sturm and Lani Guinier
For more than two decades, affirmative action has been under sustained assault. In courts, legislatures, and the media, opponents have condemned it as an unprincipled program of racial and gender preferences that threaten fundamental American values of fairness, equality, and democratic opportunity. Such preferences, they say, are extraordinary departures from prevailing "meritocratic" modes of selection, which they present as both fair and functional: fair, because they treat all candidates as equals; functional, because they are well suited to picking the best candidates.
This challenge to affirmative action has met with a concerted response. Defenders argue that affirmative action is still needed to rectify continued exclusion and marginalization. And they marshal considerable evidence showing that conventional standards of selection exclude women and people of color, and that people who were excluded in the past do not yet operate on a level playing field. But this response has largely been reactive. Proponents typically treat affirmative action as a crucial but peripheral supplement to an essentially sound framework of selection for jobs and schools.
We think it is time to shift the terrain of debate. We need to situate the conversation about race, gender, and affirmative action in a wider account of democratic opportunity by refocusing attention from the contested periphery of the system of selection to its settled core. The present system measures merit through scores on paper-and-pencil tests. But this measure is fundamentally unfair. In the educational setting, it restricts opportunities for many poor and working-class Americans of all colors and genders who could otherwise obtain a better education. In the employment setting, it restricts access based on inadequate predictors of job performance. In short, it is neither fair nor functional in its distribution of opportunities for admission to higher education, entry-level hiring, and job promotion.
To be sure, the exclusion experienced by women and people of color is especially revealing of larger patterns. The race- and gender-based exclusions that are the target of current affirmative action policies remain the most visible examples of bias in ostensibly neutral selection processes. Objectionable in themselves, these exclusions also signal the inadequacy of traditional methods of selection for everyone, and the need to rethink how we allocate educational and employment opportunities. And that rethinking is crucial to our capacity to develop productive, fair, and efficient institutions that can meet the challenges of a rapidly changing and increasingly complex marketplace. By using the experience of those on the margin to rethink the whole, we may forge a new, progressive vision of cross-racial collaboration, functional diversity, and genuinely democratic opportunity.
Affirmative Action Narratives
Competing narratives drive the affirmative action debate. The stock story told by critics in the context of employment concerns the white civil servant–say a police officer or firefighter–John Doe. (Similar stories abound in the educational setting.) Doe scores several points higher on the civil service exam and interview rating process, but loses out to a woman or person of color who did not score as high on those selection criteria.
Doe and others in similar circumstances advance two basic claims: first, that they have more merit than beneficiaries of affirmative action; and second, that as a matter of fairness they are entitled to the position for which they applied. Consider these claims in turn.
The idea of merit can be interpreted in a variety of ways: for example, as a matter of desert (because they were next in line, based on established criteria of selection, they deserve the position), or as earned recognition (“when an individual has worked hard and succeeded, she deserves recognition, praise and/or reward”). But, most fundamentally, arguments about merit are functional: a person merits a job if he or she has, to an especially high degree, the qualities needed to perform well in that job. Many critics of affirmative action equate merit, functionally understood, with a numerical ranking on standard paper-and-pencil tests. Those with higher scores are presumed to be most qualified, and therefore most deserving.
Fairness, like merit, is a concept with varying definitions. The stock story defines fairness formally. Fairness, it assumes, requires treating everyone the same: allowing everyone to enter the competition for a position, and evaluating each person’s results the same way. If everyone takes the same test, and every applicant’s test is evaluated in the same manner, then the assessment is fair. So affirmative action is unfair because it takes race and gender into account, and thus evaluates some test results differently. A crucial premise of this fairness challenge to affirmative action is the assumption that tests afford equal opportunity to demonstrate individual merit, and therefore are not biased.
Underlying the standard claims about merit and fairness, then, is the idea that we have an objective yardstick for measuring qualification. Institutions are assumed to know what they are looking for (to continue the yardstick analogy, length), how to measure it (yards, meters), how to replicate the measurement process (using the ruler), and how to rank people accordingly (by height). Both critics and proponents of affirmative action typically assume that objective tests for particular attributes of merit–perhaps supplemented by subjective methods such as unstructured interviews and reference checks–can be justified as predictive of performance, and as the most efficient method of selection.
Merit, Fairness, and Testocracy
The basic premise of the stock narrative is that the selection criteria and processes used to rank applicants for jobs and admission to schools are fair and valid tests of merit. This premise is flawed. The conventional system of selection does not give everyone an equal opportunity to compete. Not everyone who could do the job, or could bring new insights about how to do the job even better, is given an opportunity to perform or succeed. The yardstick metaphor simply does not withstand scrutiny.
For present purposes, we accept the idea that capacity to perform–functional merit–is a legitimate consideration in distributing jobs and educational opportunities. But we dispute the notion that merit is identical to performance on standardized tests. Such tests do not fulfill their stated function. They do not reliably identify those applicants who will succeed in college or later in life, nor do they consistently predict those who are most likely to perform well in the jobs they will occupy. Particularly when used alone or to rank-order candidates, timed paper-and-pencil tests screen out applicants who could nevertheless do the job.
Those who use standardized tests need to be able to identify and measure successful performance in the job or at school. In both contexts, however, those who use tests lack meaningful measures of successful performance. In the employment area, many employers have not attempted to correlate test performance with worker productivity or pay. In the educational context, researchers have attempted to correlate standardized tests with first-year performance in college or post-graduate education. But this measure does not reflect successful overall academic achievement or performance in other areas valued by the educational institution.
Moreover, “successful performance” needs to be interpreted broadly. A study of three classes of Harvard alumni over three decades, for example, found a high correlation between “success”–defined by income, community involvement, and professional satisfaction–and two criteria that might not ordinarily be associated with Harvard freshmen: low SAT scores and a blue-collar background. When asked what predicts life success, college admissions officers at elite universities report that, above a minimum level of competence, “initiative” or “drive” are the best predictors.
By contrast, the conventional measures attempt to predict successful performance, narrowly defined, in the short run. They focus on immediate success in school and a short time frame between taking the test and demonstrating success. Those who excel based on those short-term measures, however, may not in fact excel over the long run in areas that are equally or more important. For example, a study of graduates of the University of Michigan Law School found a negative relationship between high LSAT scores and subsequent community leadership or community service.
Those with higher LSAT scores are less likely, as a general matter, to serve their community or do pro bono service as a lawyer. In addition, the study found that admission indexes–including the LSAT–fail to correlate with other accomplishments after law school, including income levels and career satisfaction.
Standardized tests may thus compromise an institution’s capacity to search for what it really values in selection. Privileging the aspects of performance measured by standardized tests may well screen out the contributions of people who would bring important and different skills to the workplace or educational institution. It may reward passive learning styles that mimic established strategies rather than creative, critical, or innovative thinking.
Finally, individuals often perform better in both workplace and school when challenged by competing perspectives or when given the opportunity to develop in conjunction with the different approaches or skills of others.
The problem of using standardized tests to predict performance is particularly acute in the context of employment. Standardized tests may reward qualities such as willingness to guess, conformity, and docility. If they do, then test performance may not relate significantly to the capacity to function well in jobs that require creativity, judgment, and leadership. In a service economy, creativity and interpersonal skills are important, though hard to measure. In the stock scenario of civil service exams for police and fire departments, traits such as honesty, perseverance, courage, and ability to manage anger are left out. In other words, people who rely heavily on numbers to make employment decisions may be looking in the wrong place. While John Doe scored higher on the civil service exam, he may not perform better as a police officer.
Scores on standardized tests are, then, inadequate measures of merit. But are the conventional methods of selecting candidates for high-stakes positions fair? The stock affirmative action narrative implicitly embraces the idea that fairness consists in sameness of treatment. But this conception of fairness assumes a level playing field–that if everyone plays by the same rules, the game does not favor or disadvantage anyone.
An alternative conception of fairness–we call it “fairness as equal access and opportunity”–rejects the automatic equation of sameness with fairness. It focuses on providing members of various races and genders with opportunities to demonstrate their capacities and recognizes that formal sameness can camouflage actual difference and apparently neutral screening devices can be exclusionary. The central idea is that the standards governing the process must not arbitrarily advantage members of one group over another. It is not “fair,” in this sense, to use entry-level credentials that appear to treat everyone the same, but in effect deny women and people of color a genuine opportunity to demonstrate their capacities.
On this conception, the “testocracy” fails to provide a fair playing field for candidates. Many standardized tests assume that there is a single way to complete a job, and assess applicants solely on the basis of this uniform style. In this way, the testing process arbitrarily excludes individuals who may perform equally effectively, but with different approaches.
For example, in many police departments, strength, military experience, and speed weigh heavily in the decision to hire police officers. These characteristics relate to a particular mode of policing focusing on “command presence” and control through authority and force.
If the job of policing is defined as subduing dangerous suspects, then it makes sense to favor the strongest, fastest, and most disciplined candidates. But not every situation calls for quick reaction time. Indeed, in some situations, responding quickly gets police officers and whole departments in trouble.
This speed-and-strength standard normalizes a particular type of officer: tough, brawny, and macho. But other modes of policing–dispute resolution, persuasion, counseling, and community involvement–are also critical, and sometimes superior, approaches to policing. One study of the Los Angeles Police Department, conducted in the wake of the Rodney King trials, recommended that the department increase the number of women on the police force as part of a strategy to reduce police brutality and improve community relations. The study found that women often display a more interactive and engaged approach to policing.
Similarly, an informal survey of police work in some New York City Housing Authority projects found that many women housing authority officers, because they could not rely on their brawn to intimidate potential offenders, developed a mentoring style with young adolescent males. The women, many of whom came from the community they were patrolling, increased public safety because they did not approach the young men in a confrontational way. Their authority was respected because they offered respect.
The retention and success of new entrants to institutions often depend on expanding measures of successful performance. But because conventional measures camouflage their bias, one-size-fits-all testocracies invite people to believe that they have earned their status because of a test score, and invite beneficiaries of affirmative action to believe exactly the opposite–that they did not earn their opportunity. By allowing partial and underinclusive selection standards to proceed without criticism, affirmative action perpetuates an asymmetrical approach to evaluation.
In addition to arbitrarily favoring certain standards of performance, conventional selection methods advantage candidates from higher socioeconomic backgrounds and disproportionately screen out women and people of color, as well as those in lower income brackets. When combined with other unstructured screening practices, such as personal connections and alumni preferences, standardized testing creates an arbitrary barrier for many otherwise-qualified candidates.
The evidence that the testocracy is skewed in favor of wealthy contestants is consistent and striking. Consider the linkage between test performance and parental income. Average family income rises with each 100-point increase in SAT scores, except for the highest SAT category, where the number of cases is small. Within each racial and ethnic group, SAT scores increase with income.
Reliance on high school rank alone excludes fewer people from lower socioeconomic backgrounds. When the SAT is used in conjunction with high school rank to select college applicants, the number of applicants admitted from lower-income families decreases. This is because the SAT is more strongly correlated with every measure of socioeconomic background than is high school rank.