Deep roots of admixture-related cognitive differences in the USA?

This study attempts to determine if the association between admixture and cognitive ability among African, European, and Amerindian descent groups in the USA holds across a large time period. First, we use the large and nationally representative Adolescent Brain Cognitive Development Study (ABCD) sample to examine the associations between cognitive ability, socially identified race, genetically predicted color, and genetic ancestry among Puerto Ricans and non-Hispanic Whites, Blacks, and American Indians in the 21st century. Second, we use the 1850 to 1930 US censuses to see if we can trace ancestry-associated cognitive differences back to the 19th and early 20th century by taking advantage of early census distinctions by blood and by using age-heaping based numeracy as a proxy for cognitive ability. In the ABCD sample, we find that European ancestry is positively associated with cognitive ability within race/ethnic groups (rs = .05 to .47; weighted-average r = .10). In the census data, among African Americans and American Indians but not among Puerto Ricans, we find that greater apparent European admixture is associated with higher numeracy and that this holds when we subset data by age, sex, and literacy status. The implications of these findings are discussed.

VI, which represents deeply pigmented dark brown to darkest brown skin (scores 35-36). To create a single measure of color, we calculated the weighted medium score of each type using the probability of each type, as detailed in Lasker et al. (2019). Based on the Fitzpatrick scores, we also computed three broad color categories (Type I-IV, "palest to moderate brown"; Type V, "dark brown"; Type VI, "deeply pigmented dark brown").

General cognitive ability
The dataset used in this study, known as ABCD, includes data from 11 cognitive tests, primarily obtained from the NIH Toolbox battery. These tests are Picture Vocabulary, Flanker, List Sorting, Card Sorting, Pattern Comparison, Picture Sequence Memory, Oral Reading Recognition, the Matrix test from the WISC-4, the Little Man Test, and Rey's Auditory Learning immediate and delayed recall tests. To ensure that age and sex differences did not affect the results, the test data were adjusted for these variables. We used the IRMI algorithm to impute missing data, as this approach has been validated and produces reproducible results. Before imputation, 10.3% of the cells were missing, and 48% of the cases had some missing data. We imputed data for subjects with no more than five missing data points. After imputation, 1.3% of the cells were missing, and 98.2% of the subjects had complete data. Subjects with remaining missing data were not included in the analyses.
We used exploratory factor analysis (EFA), via the psych package (Revelle & Revelle, 2015), to extract the first factor from the 11 neurocognitive tests administered at baseline. The resulting general factor accounted for 35% of the variance in test scores, slightly lower than the typically observed >40%. We attribute this to the inclusion of a larger number of working memory tests in our set. In contrast to previous studies that used multigroup confirmatory factor analysis (e.g., Fuerst, Hu, & Connor, 2021), we opted to focus on EFA because we did not want to commit to a specific model regarding the nature of cognitive differences between race/ethnic groups, such as the popularly known Spearman's Hypothesis. Our approach thus allowed for a more exploratory and flexible analysis, which is particularly relevant when investigating complex constructs such as cognitive ability.
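To make the "share of variance accounted for by the first factor" concrete, the following is a minimal sketch under stated assumptions. It extracts the first principal component of a correlation matrix by power iteration, which is a simpler stand-in for, and not the same as, the EFA run with the psych package. The function name and the 3 x 3 toy correlation matrix are hypothetical and are not taken from the ABCD data.

```python
import math

def first_component(corr, iters=500, tol=1e-12):
    """Return (loadings, share_of_variance) for the first principal
    component of a correlation matrix, found by power iteration.
    Assumes a positive-manifold matrix, so the uniform start vector
    is not orthogonal to the leading eigenvector."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    eig = 0.0
    for _ in range(iters):
        # Multiply the matrix by the current vector and renormalise.
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        # Rayleigh quotient: current estimate of the leading eigenvalue.
        new_eig = sum(
            w[i] * sum(corr[i][j] * w[j] for j in range(n)) for i in range(n)
        )
        converged = abs(new_eig - eig) < tol
        v, eig = w, new_eig
        if converged:
            break
    # Loadings are the eigenvector scaled by the root of the eigenvalue.
    loadings = [x * math.sqrt(eig) for x in v]
    # For a correlation matrix, total variance equals the number of tests.
    return loadings, eig / n

# Hypothetical toy example: three tests that all intercorrelate at .30.
toy_corr = [
    [1.0, 0.3, 0.3],
    [0.3, 1.0, 0.3],
    [0.3, 0.3, 1.0],
]
loadings, share = first_component(toy_corr)
print(round(share, 3))  # 0.533
```

With 11 tests instead of three, the same calculation divides the leading eigenvalue by 11, which is how a figure such as 35% would be read.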
However, to address a reviewer's concern we also include g scores saved the data collection site and same-family identifiers within the sample. This approach enables the possibility of correlations in the error term within data collection sites or families with multiple tested individuals. This model aligns with the one used in the ABCD Data Exploration and Analysis Portal (DEAP), as noted by Heeringa and Berglund (2021). Consequently, using this multilevel model facilitates replication. To execute the mixed-effects regression models, we utilized the lmer command from the lme4 package (Bates et al., 2009). For these analyses, Fitzpatrick color scores were standardized on the study sample of N = 8344 individuals.
One of the reviewers suggested that there may be serious collinearity, which could potentially bias our regression results, between our ancestry variables and color. However, in admixed populations, genetic crossover and segregation would theoretically attenuate the correlations between race-associated traits and global genetic ancestry, especially for relatively simple traits such as color. The extent of this attenuation is an empirical question. To address this concern, we have included the correlation matrices for ancestry components and color in the supplemental file. As can be seen, African ancestry is the non-European ancestry which has the highest correlation with color. For the White, Black, Indian, and Puerto Rican samples, the sample-weighted correlations are, respectively, rs = .16, .52, .56, and .46. These correlations are in line with previously reported results (Fuerst, Hu, & Connor, 2021; Lasker et al., 2019) and indicate that two members of the same race/ethnic group can have the same amount of African ancestry and yet differ substantially in skin color.
1850, 1860, 1870, 1880, 1900, 1910, 1920, and 1930.

Variables for age-heaping analyses
For the census analyses, we computed the variables listed below, which are detailed in Table 2.

Sex
Interviewers were asked to record the sex of the household inhabitants (Male =1; Female =1).

Age
Interviewers were notified about the tendency for individuals to age-heap and were instructed to ascertain exact ages if possible. Based on the age variable, we created an age 23-62 cohort and four ten-year subcohorts (23-32; 33-42; 43-52; 53-62). Results for the age 23-62 cohort are of primary interest, while those for the four ten-year subcohorts were computed to assess whether the primary results are due to age structure effects. This is possible since different age cohorts are known to have different age-heaping patterns and since admixture groups could differ in their age structure.
Qeios, CC-BY 4.0 · Article, June 14, 2023

Color or Race
Interviewers were asked to record "Color" (1850-1880) or "Color or Race". We focus on the White, Black, and American Indian groups. In the 1850-1880 and the 1910-1920 censuses, interviewers were also asked to carefully distinguish between Blacks who were "full-blooded negroes" and Mulattoes who were "Negroes having some proportion of white blood" (1920). Dummy variables for Black, Mulatto, White, and Indian race/color were created. Note that while some may find the term "Mulatto" offensive, we retain the term since it was the official designation used for the admixed group in the original datasets.

Blood Quantum
In 1900 and 1910, special Indian schedules were included in the census. Interviewers were asked to ascertain, through inquiry with older men of the tribe, if an individual was a full-blooded American Indian. If not, interviewers were instructed to record the fraction of White blood which the American Indian had. Following Thornton and Young-DeMarco (2021), we created four blood quantum categories for American Indians: Full-blooded Indians, greater than 0% White and less than 25%, greater than 25% White and less than 50%, and greater than 50% White. A small number of American Indians were recorded as having 100% White blood (despite being marked as belonging to the Indian, and not White, race). These individuals were included in the greater than 50% White category; their inclusion/exclusion did not have an interpretatively significant effect on the results.
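The category rules above can be sketched as a small function. This is an illustrative sketch only: the function names and labels are hypothetical, and the text does not say how values of exactly 25% or 50% White were assigned, so placing them in the higher category is an assumption made here.

```python
def blood_quantum_category(white_fraction):
    """Map a recorded fraction of White ancestry (0.0 to 1.0) to one of
    four categories. ASSUMPTION: values of exactly 0.25 and 0.50 go to
    the higher category; the source does not specify the boundaries."""
    if not 0.0 <= white_fraction <= 1.0:
        raise ValueError("white_fraction must lie between 0 and 1")
    if white_fraction == 0.0:
        return "Full-blooded"
    if white_fraction < 0.25:
        return "More than 0% and less than 25% White"
    if white_fraction < 0.50:
        return "25% to less than 50% White"
    # Includes the small number of individuals recorded as 100% White.
    return "50% White or more"

def is_mixed_blooded(white_fraction):
    """Mixed-blooded means any White ancestry greater than 0%."""
    return white_fraction > 0.0

print(blood_quantum_category(0.125))  # More than 0% and less than 25% White
```

Because fractions such as 1/4 and 1/2 are natural values for recorded blood quantum, the boundary assumption should be checked against the original coding before reuse.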

Full-blooded and Mixed-blooded Indian
Using the Blood Quantum data in the 1900 and 1910 censuses, we coded American Indians (excluding Whites living on reservations) as Full-blooded (meaning 0% White blood) and Mixed-blooded Indians (meaning greater than 0% White blood). In 1930, interviewers were asked to record if Indians were Full-blooded or Mixed-blooded. Some interviewers reported % of Indian blood. For 1930, we coded American Indians as Full-blooded if they were either reported as having 100% Indian blood or as being Full-blooded and as Mixed-blooded Indians if they were either reported as having less than 100% Indian blood or as being Mixed-blooded.

Slavery legal in 1861 and Slavery illegal in 1861
We coded the 50 US states by whether they corresponded to a slave state/territory or a slavery-free state/territory in 1861. We then created two dummy variables for residence, Slavery legal in 1861 and Slavery illegal in 1861.

USA-born
Interviewers were asked to record the state, territory, or nation of birth of the household members. We created a dummy-coded USA-born variable, coded "1" if the respondent was born in a contemporaneous US state and "0" otherwise.

Literacy
Interviewers assessed whether respondents were literate; how this was done was not reported. Respondents were coded as literate if they could both read and write. Literacy was used to control for familiarity with written material, which might include records about the participants' age.

Variables    Description       Code
Sex          respondent sex

3.2. 1850, 1860, 1870, 1880, 1910, 1920
We computed numeracy for USA-born Whites, free Mulattoes, and free Blacks for the 1850 to 1920 censuses. First, we analyzed data for individuals aged 23-62 using the 10% random sample, and then we analyzed data for individuals aged 33-42 using the 40% random sample. We only computed numeracy for the 23-62 and 33-42 age cohorts since the 33-42 age subsample was large, making it unnecessary to compute numeracy for all age groups. Estimates were decomposed by residence (Slavery legal in 1861 vs. Slavery illegal in 1861) and literacy.

1900 & 1910 Indian schedule samples and the 1930 5% Indian sample
Beginning in 1890, all American Indians, including those on reservations, were enumerated. However, the 1890 data were mostly lost due to a fire, so data on all American Indians are first available in 1900. In the 1900 and 1910 censuses, information on Indians on reservations and in the general population was added to an Indian Schedule (along with information on non-Indians living with Indian families on reservations). Those listed on the Indian Schedule were uniquely asked questions about tribal affiliation and blood quantum. For these analyses, we first computed numeracy for American Indians by census year, blood quantum, and literacy. For comparison, we also computed numeracy for Whites living on reservations with Indian families. Next, we divided the Indian samples by age cohort. Owing to small numbers for older age groups, we computed numeracy only by Full- or Mixed-blooded status when splitting the data by age cohorts.
In 1930, interviewers were asked to report if an American Indian was Full-blooded or Mixed-blooded. While some interviewers reported blood quantum, most simply categorized American Indians as either Full- or Mixed-blooded. As such, we did not compute numeracy by blood quantum for the 1930 census. Instead, we divided the Indian samples by age cohort and computed numeracy by Full- or Mixed-blooded status and by literacy.

1910 & 1920 12% Puerto Rican sample
The first USA-based census for Puerto Rico was conducted in 1910. We computed numeracy for USA-born Whites residing in Puerto Rico, and Puerto Rican-born individuals identified as White, Mulatto, or Black in the 1910 and 1920 censuses. The USA-born Whites would have been mostly of European ancestry in origin, while Puerto Rican-born individuals would have been of admixed African, European, and Amerindian ancestry.

Calculation of numeracy
We limited ourselves to individuals aged 23 to 62 since these are the most stable age groups for computing age-heaping using the Whipple Index (Szołtysek et al., 2018). Age-heaping was computed separately for males and females. We focus on the results for males because, during this time period, the head of the household was more often male and the census questions were directed to the household head. Results for females are provided in the supplemental file.
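The Whipple index, which is applied to test for age-heaping, is calculated as the number of persons who report ages ending in 5 or 0, divided by the total number of persons, and then multiplied by 5. By convention the result is expressed as a percentage, so that 100 indicates no heaping and 500 indicates that all reported ages end in 5 or 0. For the 23 to 62 age range, the formula is:
$$W = \frac{P_{25} + P_{30} + \dots + P_{60}}{\frac{1}{5}\sum_{x=23}^{62} P_x} \times 100$$
where P_x is the population of age x in completed years.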
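The Whipple index can be transformed into an index, called ABCC, which is an estimate of the proportion of the population that can accurately report ages, without rounding. The formula is:
$$ABCC = \left(1 - \frac{W - 100}{400}\right) \times 100 \quad \text{if } W \geq 100; \quad \text{otherwise } ABCC = 100$$
where W is the Whipple index, expressed on the scale where 100 indicates no age-heaping. The ABCC value represents the share of the population who know their correct age. The ABCC index can be transformed into a standard-deviation-unit metric using an inverse cumulative normal transformation, which Reardon and Ho (2015) denote as dtpac. The formula is:
$$d_{tpac} = \Phi^{-1}\left(\frac{ABCC_a}{100}\right) - \Phi^{-1}\left(\frac{ABCC_b}{100}\right)$$
where ABCC_a and ABCC_b are the ABCC values for populations a and b, respectively, and \Phi^{-1} is the inverse of the standard normal cumulative distribution function. On the assumption of normality and equal variances, dtpac is equivalent to Cohen's d (Reardon and Ho, 2015).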
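The three quantities described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code used in the study (which relied on R): the function names are hypothetical, the Whipple index is assumed to be on the conventional scale where 100 means no heaping, and ABCC is assumed to be a percentage.

```python
from statistics import NormalDist

def whipple(ages, lo=23, hi=62):
    """Unweighted Whipple index for ages in [lo, hi]; 100 = no heaping,
    500 = every reported age ends in 0 or 5."""
    in_range = [a for a in ages if lo <= a <= hi]
    if not in_range:
        raise ValueError("no ages in the requested range")
    heaped = sum(1 for a in in_range if a % 5 == 0)
    return 500.0 * heaped / len(in_range)

def abcc(w):
    """Percentage of people estimated to report their age accurately."""
    if w < 100:
        return 100.0
    return (1 - (w - 100) / 400) * 100

def d_tpac(abcc_a, abcc_b):
    """Difference in probit-transformed ABCC shares, in SD units.
    Undefined at the floor or ceiling: inv_cdf raises an error when a
    share is exactly 0 or 1 (that is, ABCC of 0 or 100)."""
    probit = NormalDist().inv_cdf
    return probit(abcc_a / 100) - probit(abcc_b / 100)

# One person at every age from 23 to 62: no heaping at all.
print(whipple(range(23, 63)))  # 100.0
print(abcc(300.0))             # 50.0
```

The error at ABCC values of exactly 100 is one practical reason ceiling effects matter for highly numerate groups: the standardized gap cannot be computed when one group shows no measurable heaping.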

Analyses
Sampling weights (variable PERWT) were applied as recommended by IPUMS because the person-level analysis is conducted on "flat" samples in which each observation, whether a household or individual, represents a fixed number of persons in the general US population. The analysis was performed in R, using the following packages: ipumsr, dplyr, simPop, psych. We used the whipple() function of the simPop package.
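As an illustration of how person weights enter the calculation, the sketch below computes a weighted Whipple index, with each person counted by his or her weight rather than as one observation. It is a hand-rolled approximation under stated assumptions and does not reproduce the internals of simPop's whipple(). The function name and the toy records are hypothetical, and the index is expressed on the conventional scale where 100 means no heaping.

```python
def weighted_whipple(records, lo=23, hi=62):
    """Weighted Whipple index from (age, weight) pairs.
    Each weight stands in for a person weight such as PERWT."""
    total = 0.0
    heaped = 0.0
    for age, weight in records:
        if lo <= age <= hi:
            total += weight
            if age % 5 == 0:  # reported age ends in 0 or 5
                heaped += weight
    if total == 0:
        raise ValueError("no weighted observations in the requested range")
    return 500.0 * heaped / total

# Hypothetical toy records: (age, person weight).
toy_records = [(25, 2.0), (26, 1.0), (27, 1.0), (30, 1.0)]
print(weighted_whipple(toy_records))  # 300.0
```

When every weight is identical, as in a flat sample, the weighted and unweighted indices coincide.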
While the hypothesis is that admixture will be related to cognitive ability both in the 19th/early 20th century and in the early 21st century, we do not attempt to compare magnitudes of effects across centuries because the two cognitive measures (age-heaping based numeracy and g, respectively) are psychometrically very different. As such, we focus on a qualitative evaluation.

Results
3.1. 21st-century results based on the ABCD sample
Puerto Ricans with lighter skin tones (Type I-IV) nonetheless score worse on cognitive tests than those with darker skin tones (Type V and VI).
Table 4 shows the weighted correlation matrices for each of the four race/ethnic groups. The magnitudes of the correlations depend on the variance in genetic ancestry proportions within groups. Since the variability of genetic ancestry is often low, the correlations are correspondingly often low. Moreover, since variance in ancestry differs substantially across groups (as seen in Table 3), the correlation coefficients are not directly comparable across groups.
indicates approximately normal distributions. Among Whites (Model 1), both African and Amerindian ancestry are predictors of lower g scores. Among Blacks (Model 2), African ancestry is associated with lower g scores. In this group, White SIRE, but not color, is also statistically significantly related to g. Among American Indians (Model 3), African ancestry is associated with lower g scores. Among Puerto Ricans (Model 4), both African and Amerindian ancestry are negatively associated with g scores, while the reverse holds for color. Across all groups, we see that African ancestry tends to be negatively related to g scores, whereas this is not the case with color when ancestry is also taken into account.
Table 9 shows the results for Mixed and Full-blooded American Indians by age group. As seen, there is substantial variability across ages. This could be due to the modest sample sizes in conjunction with ceiling effects for some of the American Indians identified as Mixed-blooded are more numerate than those identified as Full-blooded.

4. Discussion
cognitive ability in Puerto Rico and that differences are not being vertically transmitted on the island.
This latter alternative hypothesis seems to be less likely given the results for Mainland Puerto Ricans and since educational attainment has been found to positively correlate with European vs. African genetic ancestry in Puerto Rico (Kirkegaard et al., 2017); nonetheless, this possibility should be investigated in future studies.
Socially identified race/ethnic groups, whether based on appearance or parent/self-report, need not track genetic ancestry well. This is especially the case after many generations of admixture, as in the case of Puerto Ricans, because the correlations between genetic ancestry, self-identified race, and ancestry-associated phenotypes, such as color, can become attenuated after a number of generations of admixture. As a result, admixture regression can be used to statistically separate effects related to genetic ancestry from ones related to skin color and/or self-identified group, as is done in the present study and in one other recent study (Fuerst, Hu, & Connor, 2021).
Understanding the nature of self-reported race/ethnic-related disparities in cognitive ability, and how these differences are transmitted across generations, is necessary to reduce both the differences and their social impacts. Race/ethnicity is multifaceted and involves appearance, cultural background, self-identity, and geographic ancestry (Roth, 2016). In some cases, government-defined race/ethnic categories, in the USA, describe groups with similar cultural characteristics (e.g., Hispanic: "Spanish culture or origin, regardless of race"), or with similar genetics (e.g., Black: "origins in any of the black racial groups of Africa"), but in other cases there seems to be little genetic or cultural basis for the groupings (e.g., Asian: "A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent").
Therefore, evaluating the independent contribution of factors related to genetic ancestry, common culture, and other dimensions related to socially defined race/ethnicity and/or color can help in identifying the source of group differences (Fuerst, Hu, & Connor, 2021). This issue is also relevant to concerns about social inequality, as focusing exclusively on socially identified race/ethnicity ignores possible race-related inequalities within socially defined groups.
The most obvious explanation for a substantial association between genetic ancestry and cognitive ability within groups, especially when conspicuous phenotypes and possible discriminatory factors related to them are controlled for, is inherited disadvantage. This model, as with the similar racial-cognitive ability-socioeconomic (R~CA-S) hypothesis detailed by Fuerst & Kirkegaard (2016) and by Hu et al. (2019), does not specify a reason for the source population differences or a mechanism of inheritance (e.g., family environment or genes). For example, owing to trait-biased migration or to cultural norms related to exogamy, one source population could be a genetically selective sample. As a result of this selectivity, there could be phenotypic differences between source (sub)populations, and these would transmit across generations when within-group heritabilities were nontrivial. Generally, the reasons for the original differences and the mechanisms by which differences are transmitted are topics for future research.
As noted, the inherited disadvantage model does not specify mechanisms for vertical transmission; this could occur through cultural or genetic pathways. An alternative explanation for the association between ancestry and cognitive ability is phenotype-based discrimination, or so-called "colorism". Two designs have been proposed to disentangle intergenerational effects from discriminatory ones: sibling studies and admixture regression studies. Shibaev & Fuerst (2023) reviewed published sibling studies and report that while lighter or more European-looking full siblings tended to have slightly better academic-related outcomes than their darker siblings, the vast majority of the association between appearance and academic outcomes is due to family factors. The authors further ran an admixture-regression analysis and found that European appearance had no effect on cognitive ability independent of genetic ancestry, thus replicating previous results (e.g., Lasker et al., 2019). In the present analyses, genetically predicted darker skin color was associated with g, independent of ancestry, only among Puerto Ricans; moreover, this association was positive, not negative, and so inconsistent with the predictions of a colorism model. Overall, studies which attempt to disentangle intergenerational and discriminatory models have provided little support for the latter in contrast to the former.
Qeios ID: CCN648.7 · https://doi.org/10.32388/CCN648.7
Future studies on ethnic/racial cognitive differences need to consider genetic ancestry, since cognitive ability differences seem to be strongly related to genetic ancestry independent of socially defined race/ethnicity and color (Fuerst, Hu, & Connor, 2021; Kirkegaard et al., 2019; Lasker et al., 2019; Warne, 2020). To ameliorate ancestry-associated differences and their social consequences, it will be necessary to better understand the reason for the association between genetic ancestry and g. While general cognitive ability is important, societal factors such as the declining availability of public housing, which disproportionately affects minorities, can also account to some extent for the persistence of race and ethnic differences in economic outcomes (Goetz, 2011). That genetic ancestry largely statistically explains group differences in cognitive ability does not imply that it must also mostly explain differences in social outcomes, such as income and educational attainment. Whether this is the case could also be explored using the admixture regression design.