The concepts of band-width and fidelity are analogous to characteristics of a

Department of Psychiatry, Laboratory of Personality and Development, University of Rochester Medical Center

Benjamin P. Chapman, University of Rochester Medical Center, Department of Psychiatry, Laboratory of Personality and Development, 300 Crittenden Boulevard, Rochester, NY 14642, Ben_Chapman@URMC.Rochester.edu

The publisher's final edited version of this article is available free at J Pers Assess

Abstract

Many users of the NEO-Five Factor Inventory (NEO-FFI) are unaware that Saucier developed item cluster subcomponents for each broad domain of the instrument similar to the facets of the Revised NEO Personality Inventory. In this study, I examined the following: the replicability of the subcomponents in young adult university and middle-aged community samples; whether item keying accounted for additional covariance among items; subcomponent correlations with a measure of socially desirable responding; subcomponent reliabilities; and subcomponent discriminant validity with respect to age-relevant criterion items expected to reflect varying associations with broad and narrow traits. Confirmatory factor analyses revealed that all subcomponents were recoverable across samples and that the addition of method factors representing positive and negative item keying improved model fit. The subcomponents correlated no more with a measure of socially desirable responding than their parent domains and showed good average reliability. Correlations with criterion items suggested that subcomponents may prove useful in specifying which elements of NEO-FFI domains are more or less related to variables of interest. I discuss their use for enhancing the precision of findings obtained with NEO-FFI domain scores.

Most researchers have agreed that personality traits can be measured in hierarchical tiers that vary in generality. Traits at higher levels of multistratum taxonomies are merely composites of more fine-grained specific traits. For example, within the Five Factor Model of personality, the broad trait of Conscientiousness is composed of narrow attributes such as self-discipline, achievement orientation, and orderliness. The commonly used Revised NEO Personality Inventory (NEO-PI-R) permits users to capture information about such higher order broad traits as well as the lower order, narrow traits that compose them.

The 60-item NEO-Five Factor Inventory (NEO-FFI) is a brief version of the NEO-PI-R designed to provide speedy and convenient measurement of the Five Factor Model domains; however, it does so by relinquishing information about the narrow traits that compose each broad factor. To remedy this, Saucier derived item cluster subcomponents for the NEO-FFI that provide a more specific level of trait measurement. These subcomponents are somewhat similar to, although not as specific as, the facets of the full-length NEO-PI-R and are listed in Table 1. In their development, these subcomponents evidenced strong average internal consistency comparable to NEO-PI-R facets (e.g., an average Cronbach’s α of .70 in a development sample and .66 in a cross-validation sample compared to an average NEO-PI-R facet α of .70) and captured the majority (although not all) of the content of the NEO-PI-R facets. Table 1 also reveals that the Neuroticism domain may be decomposed into either two or three item-cluster subcomponents; the former scoring scheme was empirically derived (as were all other sets of subcomponents), whereas the latter was rationally derived by Saucier to provide separate measures of anxiety and depression.

TABLE 1

NEO–Five Factor Inventory Item Cluster Subcomponents

| Domain and Subcomponents | Items | Possible Description of High Scorers |
|---|---|---|
| Neuroticism (alternative 1) | | |
| Self Reproach | 6, 21, 26, 36, 41, 51, 56 | Feels inferior, worthless, helpless, reactive, tense, ashamed |
| Negative Affect | 1α, 11, 16, 31α, 46 | Worried, stressed, anxious, blue, depressed |
| Neuroticism (alternative 2) | | |
| Self Reproach | 6, 11, 26, 51 | Feels inferior, worthless, helpless, stressed |
| Anxiety | 1α, 21, 31α | Worried, fearful, tense, anxious |
| Depression | 16α, 41, 46α | Blue, discouraged, sad, depressed |
| Extraversion | | |
| Positive affect | 7, 12α, 37, 42α | Light-hearted, cheerful, optimistic |
| Sociability | 2, 17, 27α, 57α | Gregarious, enjoys others, prefers company |
| Activity | 22, 32, 47, 52 | Energetic, active, fast paced, action seeking |
| Openness | | |
| Aesthetic interests | 13, 23α, 43 | Artistic, poetic, aesthetically sensitive |
| Intellectual interests | 48α, 53, 58 | Abstract, philosophical, intellectual |
| Unconventionality | 3α, 8α, 18α, 38α | Nonconforming, free thinking, whimsical |
| Agreeableness | | |
| Nonantagonistic orientation | 9α, 14α, 19, 24α, 29α, 44α, 54α, 59α | Cooperative, trusting, amiable, conflict avoidant |
| Prosocial orientation | 4, 34, 39α, 49 | Actively courteous and considerate, well-liked |
| Conscientiousness | | |
| Orderliness | 5, 10, 15α, 30α, 55α | Methodical, neat, organized, efficient |
| Goal-striving | 25, 35, 60 | Goal-driven, hard working, motivated to excel |
| Dependability | 20, 40, 45α, 50 | Reliable, consistent, dependable |

Note. Saucier offered two alternative scoring schemes for the Neuroticism items: The first was empirically derived; the second was rationally derived to disentangle anxiety and depression. Possible descriptions of high scorers are based on item wording.

αReverse keyed item.

Although they provide a fruitful complement to the use of broad domain scores, these subcomponents have seen surprisingly little use despite a call for further investigation and refinement. Searches of PsycINFO and the Social Science Citation Index revealed only three studies that have used one or more of the subcomponents in analyses and only one additional study that was a direct effort to investigate their basic psychometric properties (i.e., means, standard deviations, test-retest reliability), in an Australian sample. One reason for this lack of use may be that the subcomponents are relatively unknown despite widespread use of the NEO-FFI. Another may be that in the absence of further development and cross-validation work, researchers are hesitant to utilize them. The goal of this study is therefore to provide a detailed investigation of the subcomponents’ replicability and psychometric characteristics in two independent samples.

An appreciation of the subcomponents’ importance may be gleaned from a brief review of the debate over the relative merits of scales capturing broad traits versus those capturing narrow traits. In hierarchical personality inventories, the former correspond to higher order, general factors, whereas the latter correspond to lower order, specific factors. Several authors have couched the debate in the context of the bandwidth-fidelity dilemma. The bandwidth of a scale refers to the breadth of its content or, in the case of a personality test, the scope of affective, behavioral, and cognitive tendencies it measures; the fidelity of the scale refers to its dependability or reliability.

Rather than concerning itself with the specific case of hierarchical personality inventories, the original formulation of the dilemma described a general relationship between bandwidth and fidelity in which increasing bandwidth tends to reduce fidelity. For instance, if one uses 20 items to measure five specific constructs (e.g., 4 items for each construct), the reliability (fidelity) of each of those 4-item scales will be less than if all 20 items were expended to measure a single construct of interest. Similarly, all other conditions being equal, the reliability of a broadband, factorially complex, 20-item composite should also be less than that of a narrowband, unifactorial, 20-item scale.
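
The arithmetic behind this example can be sketched with the standardized form of coefficient alpha (the Spearman-Brown relation). This is only an illustration: the function name and the mean inter-item correlations of .30 and .15 are hypothetical values chosen for the sketch, not figures from this study.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized alpha for n_items whose mean inter-item correlation is mean_r."""
    return n_items * mean_r / (1.0 + (n_items - 1) * mean_r)


# Hypothetical mean inter-item correlations (assumptions, not study data).
UNIFACTORIAL_R = 0.30  # items that all tap one construct
COMPLEX_R = 0.15       # items spread over several loosely related constructs

four_item_scale = standardized_alpha(4, UNIFACTORIAL_R)     # one of five short scales
twenty_item_scale = standardized_alpha(20, UNIFACTORIAL_R)  # all items on one construct
broadband_composite = standardized_alpha(20, COMPLEX_R)     # factorially complex set

print(f"4-item narrow scale:          {four_item_scale:.2f}")
print(f"20-item unifactorial scale:   {twenty_item_scale:.2f}")
print(f"20-item broadband composite:  {broadband_composite:.2f}")
```

Under these assumed values, the 4-item scale comes out near .63 and the 20-item unifactorial scale near .90, while the factorially complex 20-item composite falls between the two.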

A slightly different relationship between bandwidth and fidelity may obtain in the case of personality tests measuring higher and lower order factors corresponding to broad and narrow traits. Broad traits corresponding to higher order factors are measured by wideband composites; these composites are themselves composed of narrowband scales for specific, lower order traits that load on the higher one. In this case, the broadband composite actually has higher reliability, or fidelity, compared to narrowband scales for specific traits because (a) the broadband scale contains far more items than any of the narrower bandwidth scales and (b) all of the narrower bandwidth scales are highly intercorrelated. Thus, in hierarchical personality inventories, increasing bandwidth from a specific to a broad trait tends to increase fidelity as well, a situation slightly different from the inverse relationship between the two that was originally articulated.

Despite the reliability advantage of broadband composites, the narrow traits in hierarchical personality inventories are important for at least three reasons. First, the higher reliability of broad trait scales means only that a general, multidimensional construct has been measured with greater precision. What do the resulting scores mean? They summarize people’s standing on a multifaceted trait but may well obscure important individual differences across that trait’s specific facets. Consider Neuroticism, which, on the NEO-PI-R, is measured by six facets: depression, anxiety, angry hostility, self-consciousness, impulsiveness, and vulnerability. Different individuals may all obtain Neuroticism scores at the 50th percentile through markedly different patterns of facet scores (). One person may score high on angry hostility and impulsiveness and low on self-consciousness and depression. A second may score high on depression and self-consciousness but low on impulsiveness and angry hostility. A third may score in the 50th percentile on all facets of Neuroticism. Broadband Neuroticism scores mask such important phenotypic variation, and may lead to accordingly imprecise results and interpretations.

A second reason lower order, narrow traits are important was highlighted by Ashton. Ashton argued that the greater reliability of broadband trait scales does not always mean they have better predictive power. Although it is commonly held that reliability places an upper limit on validity, this is a theoretical notion, and high reliability may be outweighed by another important consideration, as Ashton noted:

Despite this increase in reliability, it does not follow that the broad composite scale must be a more effective predictor of a given criterion than are all of its constituent subscales. The question to be asked is whether or not the improved reliability derived by aggregating the subscales provides a gain in validity that outweighs the loss in validity due to the dilution of variance specific to certain subscales which relate to the criterion of interest. (p. 289)

In other words, imprecision in predictive analyses can arise because frequently only some but not all aspects of a broad trait are correlated with the criterion of interest. Figure 1 illustrates a set of personality items aggregated alternatively into a few broadband, multidimensional trait composites or several narrower bandwidth composites within each domain. In this diagram, only one specific dimension of the broad trait composite shares variance with the criterion measure. This is an oversimplified case because often other dimensions of the broadband composite will correlate to some degree with a criterion. However, in the simplified case of Figure 1, the broadband composite will correlate less with the criterion than the relevant specific composite because the broadband trait has a lower ratio of shared to unshared variance with the criterion. In other words, items or factors in the broadband composite that are unrelated to the outcome of interest (i.e., irrelevant information) will attenuate the correlation. Empirical results support this.

FIGURE 1

The same set of personality items may be aggregated into a smaller number of broad, general traits or a greater number of more specific, narrow traits. Often only portions of variance within broad traits (i.e., a specific narrow trait subcomponent) may be relevant to a criterion of interest.
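
The dilution effect in the simplified case of Figure 1 can be worked through numerically. The sketch below assumes standardized subscales that share one common intercorrelation and are summed with unit weights; the function name and all numeric values are hypothetical and serve only to illustrate the argument.

```python
import math


def composite_criterion_r(subscale_rs, subscale_intercorr):
    """Correlation between a unit-weighted sum of standardized subscales and a criterion.

    subscale_rs: correlation of each subscale with the criterion.
    subscale_intercorr: common correlation among the subscales
        (assumed equal for every pair to keep the sketch simple).
    """
    m = len(subscale_rs)
    # Variance of a sum of m unit-variance variables with equal intercorrelation.
    composite_variance = m + m * (m - 1) * subscale_intercorr
    return sum(subscale_rs) / math.sqrt(composite_variance)


# Hypothetical case: three subscales intercorrelated .50, only the first
# of which correlates (.40) with the criterion.
narrow_r = 0.40
broad_r = composite_criterion_r([narrow_r, 0.0, 0.0], 0.50)

print(f"relevant narrow subscale: r = {narrow_r:.2f}")
print(f"broadband composite:      r = {broad_r:.2f}")
```

With these assumed values the broadband composite correlates about .16 with the criterion, well below the .40 of the one relevant subscale, because the two irrelevant subscales add variance that is unrelated to the outcome.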

A third reason narrow traits remain important is that the exact substantive meaning of a correlation between a broad trait and a criterion may be difficult to discern because the broad trait involves several components. Interpretations of such results are perforce overly general because it is impossible to know which aspects of the broad trait are more or less related to the outcome of interest. The understanding of an initial broadband trait effect can be enhanced by examining which of its specific dimensions are more or less related to the outcome. For instance, one recent study supplemented its overall finding that Conscientiousness was inversely related to mortality with facet-level analyses indicating that it was primarily the self-discipline facet that confers longevity.

These arguments for the use of narrow trait composites suggest that NEO-FFI subcomponents represent a potentially important tool for users of the NEO-FFI. However, their replicability and psychometric characteristics in cross-validation samples remain largely unknown. With the exception of an examination of norms and test-retest reliability in an Australian sample, little work has built on Saucier’s initial efforts. In this study, I pursued four primary aims in this respect. First, I assessed the degree to which the item clusters could be recovered in two independent samples of varying age and demographics via multiple-group confirmatory factor analysis (CFA). Second, because Saucier also expressed concern about the imbalance between positively and negatively keyed items within item clusters, I assessed whether item keying explained additional covariance among items within each domain beyond that attributable to the subcomponents. This was supplemented by examining the subcomponents’ differential correlations with a measure of socially desirable responding. Third, because Saucier noted small reliability shrinkages in his cross-validation sample using coefficient alpha, reliability assessment constituted another aim of the study. Consistent with recent suggestions that alpha alone presents a limited picture of internal consistency, I used three different internal consistency estimates varying in their psychometric assumptions to describe the subcomponents: alpha (α); rho (ρ); and a dimension-free, lower bound reliability estimate, theta (θ). Finally, a fourth aim was to determine whether specific subcomponents either (a) correlated higher than broad domains with a small set of criterion items or (b) showed differential criterion correlations, thereby illuminating which elements of a broad domain drive criterion correlations.

I selected items to represent criteria for which narrow traits might vary in importance as well as criteria highly salient to each sample’s life stage. Items for Sample 1, young adult university students, included the extent to which, over the last semester, participants had socialized with the other sex, felt depressed, anxious, or contemplated suicide. I selected these items to reflect psychosocial conflict of intimacy versus isolation and well-known symptoms of college maladjustment (). I hypothesized the following subcomponent-criterion correlations to exceed those of other subcomponents within the domain and of the broad domain score: sociability and time spent socializing with the opposite sex, depression and amount of depressed mood reported over the semester, self-reproach and thoughts of suicide over the semester, and anxiety and the extent of anxious feelings reported over the prior semester.

Items for Sample 2, middle-age community adults, reflected satisfaction in domains salient to midlife transitions: life satisfaction, marital satisfaction, job satisfaction, and engagement with creative or artistic pursuits. I hypothesized the following subcomponent-criterion correlations to exceed those of the criterion with other subcomponents within the domain or the broad domain score: positive affect and life satisfaction, goal striving and job satisfaction, prosocial orientation and marital satisfaction, and aesthetic interests and artistic hobbies. Finally, the revised University of California, Los Angeles (UCLA) Loneliness Scale served as a criterion measure for both samples. Social relationships remain important throughout the life course but serve different ends at different points in life. Because younger adults tend to cultivate a broad array of relationships in the service of knowledge acquisition goals, I hypothesized that intellectual interests and sociability would be stronger (negative) predictors of loneliness than other subcomponents and domain scores in Sample 1. By midlife, Carstensen’s theory suggests that social relationships begin to function more as a means of emotion regulation; thus, in Sample 2, I hypothesized that depression and lower positive affect would predict loneliness better than other subcomponents or broad domains. I used both standard correlations and correlations corrected for attenuation due to measurement error (CAME) in these analyses to gauge the effect on criterion correlations of probable decreased subcomponent reliability due to shorter length.
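
The classical correction for attenuation divides an observed correlation by the square root of the product of the two measures' reliabilities. The following is a minimal sketch of that textbook formula, not necessarily the exact procedure used in these analyses; the function name, the default criterion reliability of 1.0, and the example values are assumptions made for illustration.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float = 1.0) -> float:
    """Classical disattenuated correlation: r_xy / sqrt(rel_x * rel_y).

    rel_y defaults to 1.0, i.e., the criterion is treated as error free
    (an assumption; the reliability of a single-item criterion is usually unknown).
    """
    for rel in (rel_x, rel_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values: a short subcomponent with reliability .60 that
# correlates .30 with a criterion.
observed = 0.30
corrected = correct_for_attenuation(observed, rel_x=0.60)
print(f"observed r = {observed:.2f}, corrected r = {corrected:.2f}")
```

Because shorter scales tend to have lower reliability, the same observed correlation is adjusted upward more for a brief subcomponent than for its longer parent domain, which is the comparison the corrected correlations are meant to inform.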

METHOD

Participants and Procedure

Sample 1 consisted of 308 young adults drawn from large introductory psychology classes (74% women; age M = 20, SD = 2.2; 66% White). Sample 2 consisted of 256 middle-age adults drawn from the community (62% women; age M = 49, SD = 5.4; 76% White). Approximately 70% of Sample 2 was recruited through contacts in undergraduate classes using a snowball technique in which students provided surveys to their parents, who then dispensed surveys to middle-age friends. The remainder of Sample 2 was recruited through churches and a hospital, again using a snowball technique. The middle-age sample contained proportionally fewer women and more Anglo-Americans and was also more educated than the college sample (15.23 years vs. 13.58 years). Participants confidentially completed a battery of measures for a large-scale project on personality, intelligence, emotional intelligence, and social adjustment in young adulthood and middle age. A total of 305 young adults and 209 midlife adults provided usable NEO-FFI data. I reverse scored all negatively keyed items prior to analyses. An earlier article on emotional intelligence reported means and standard deviations of NEO-FFI domains and the Revised UCLA Loneliness Scale for 291 of the young adult sample.

Other Measures

To assess social desirability, participants completed the 10-item version of the Marlowe-Crowne Social Desirability Scale (). Criterion variables in each sample consisted of the UCLA Loneliness Scale, Revised (), and single item measures from brief surveys. Three items asked middle-aged participants to rate their overall life satisfaction, marital satisfaction, and job satisfaction on a 5-point Likert scale ranging from 1 (not at all satisfied) to 5 (very satisfied). Another item asked them to report the degree to which they were active in artistic pursuits such as music, painting, or writing as either a producer or consumer on a scale ranging from 1 (no involvement at all) to 5 (frequent involvement). Four items were adapted for the young adults from the College Adjustment Rating Scales (CARS; , ) to assess the frequency with which young adults had engaged in certain behaviors or experienced certain problems. One behavioral item asked how often in the past semester participants had socialized with the other sex; and three mood items asked how frequently over the past semester they had felt anxious, felt depressed, or contemplated suicide. Response format ranged from 1 (not at all) to 5 (quite a bit). The use of single-item criterion measures is common in survey research methodology and is consistent with prior bandwidth-fidelity research (cf. ; , ).

Overview of Analyses

I included only participants with complete data in the analyses, which proceeded in several steps. CFAs modeled the item clusters as oblique factors within each NEO-FFI domain separately; I fixed factor variances to unity for scaling purposes and freely estimated all item loadings, residuals, and factor covariances. I estimated models from polychoric correlation matrices, which approximate the continuous distributions underlying response scales involving ordinal categories and have been suggested for use with the ordinal response scales of personality instruments (Panter, Swygert, Dahlstrom, & Tanaka, 1997). I utilized robust maximum likelihood (ML) estimation to produce a chi-square fit statistic scaled to correct for departures from multivariate normality (the Satorra-Bentler chi-square).

CFA models proceeded in the following order. First, I tested models including only latent trait factors for each domain separately in each sample. Then I tested a second set of models in which each item loaded on a corresponding trait factor and on a secondary method factor representing common variance among either the positively or negatively keyed items of each domain. Figure 2 depicts the parameterization of this model for the Conscientiousness item clusters. I evaluated the extent to which these models fit the data better than models without method factors with the Akaike information criterion (AIC), which can be used to compare nonnested factor models (lower values indicate better fit). Finally, in a third series of nested models, I assessed the subcomponents’ measurement invariance across age groups. These models incrementally constrained trait factor loadings, method factor loadings, residual variances, and factor covariances to ascertain whether items functioned differently across age groups. I gauged fit degradations by use of the Satorra-Bentler chi-square difference test only in cases in which baseline models had yielded nonsignificant chi-squares. In all other cases, I made nested comparisons on the basis of the comparative fit index (CFI) difference test (e.g., CFI changes greater than .01).
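
The two comparison rules described above can be sketched in a few lines. The AIC values reported in Table 2 appear to equal the chi-square minus twice the degrees of freedom (e.g., 98.74 - 2 × 53 = -7.26); the sketch assumes that convention. The function names are hypothetical, and the CFI values in the example are illustrative rather than taken from the nested model tests.

```python
def model_aic(chi_square: float, df: int) -> float:
    """Model AIC under the assumed convention AIC = chi-square - 2 * df."""
    return chi_square - 2 * df


def cfi_signals_worse_fit(cfi_baseline: float, cfi_constrained: float,
                          threshold: float = 0.01) -> bool:
    """True when the CFI drops by more than the threshold after adding constraints."""
    return (cfi_baseline - cfi_constrained) > threshold


# Neuroticism (alternative 1), young adults, chi-square and df from Table 2.
trait_only = model_aic(98.74, 53)
trait_plus_method = model_aic(54.44, 41)
print(f"trait only:     AIC = {trait_only:.2f}")
print(f"trait + method: AIC = {trait_plus_method:.2f}")
print("method factors improve fit:", trait_plus_method < trait_only)

# Illustrative CFI values for a nested invariance comparison (hypothetical).
print("constraints degrade fit:", cfi_signals_worse_fit(0.98, 0.96))
```

A lower AIC for the trait-plus-method model favors retaining the method factors, and a CFI drop larger than .01 would argue against imposing the added equality constraints.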

FIGURE 2

In this trait-method model, variance in the 12 items of the NEO-FFI Conscientiousness domain is partitioned into three latent trait factors, two method factors corresponding to the direction of item keying, and residual or item-specific variance.

It has been recommended that the root mean square error of approximation (RMSEA), an absolute fit index, be given the heaviest weight in assessing models based on personality data because it is less affected by the violations of multivariate normality typically seen in such data. Similarly, extensive simulation studies intended to reflect conditions typical of personality data (i.e., simple structure violations, non-normality, smaller sample sizes) also suggested the RMSEA as a trustworthy index of model fit. I supplemented the RMSEA with the CFI, a commonly used and well-supported incremental fit index. RMSEA values below .05 indicate good fit, .05 to .08 acceptable fit, and .10 or above poor fit, whereas CFI values greater than .90 are generally considered indicative of adequate fit.
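
As a rough illustration of how these cutoffs apply, the sketch below computes the RMSEA with a commonly used single-group formula and labels the result. The formula, the function names, and the label for values between .08 and .10 (a range the cutoffs above leave unclassified) are assumptions; the sample size of 305 is the number of young adults with usable NEO-FFI data and may not be the exact N for a given model.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Single-group RMSEA: sqrt(max(chi-square - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def describe_rmsea(value: float) -> str:
    """Label an RMSEA value using the cutoffs stated in the text."""
    if value < 0.05:
        return "good"
    if value <= 0.08:
        return "acceptable"
    if value >= 0.10:
        return "poor"
    return "marginal (not classified by the stated cutoffs)"


# Neuroticism (alternative 1), young adults, trait factors only (Table 2).
value = rmsea(98.74, 53, 305)
print(f"RMSEA = {value:.3f} ({describe_rmsea(value)})")
```

With these inputs the formula gives a value that rounds to .053, in line with the corresponding Table 2 entry, although multiple-group models generally require a different computation.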

The reliability analyses also proceeded in multiple steps. Because Cronbach’s α underestimates reliability in composites that are not essentially tau equivalent (i.e., in which items differ in amount of true score variance, regardless of intercept), I first tested each subcomponent for essential tau equivalence. These tests consisted of specifying a simple one-factor model in which item loadings were free to vary and then a similar model in which I constrained all item loadings to equality. Significant decreases in fit of the more restrictive model, which I identified in the same manner as previously, distinguished scales that did not meet the assumption of essential tau equivalence from those that did. Next, I compared three reliability coefficients for each subcomponent in each sample: Cronbach’s α; reliability coefficient ρ, which is derived from a unidimensional latent-variable model but does not assume essential tau equivalence among items; and a dimension-free, lower bound estimate of reliability, θ, which does not even assume that a set of items is congeneric (i.e., multiple factors may characterize the item set).
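
The relation between α and ρ can be illustrated with a one-factor model for standardized items. In this sketch, both coefficients are computed from a set of standardized factor loadings; the function names and the loading values are hypothetical, and θ is omitted because it requires a more involved estimation procedure.

```python
def rho_from_loadings(loadings):
    """Composite reliability: (sum of loadings)^2 / total composite variance."""
    true_variance = sum(loadings) ** 2
    error_variance = sum(1.0 - l ** 2 for l in loadings)  # standardized items
    return true_variance / (true_variance + error_variance)


def alpha_from_loadings(loadings):
    """Cronbach's alpha implied by a one-factor model with standardized items."""
    k = len(loadings)
    # Model-implied correlation between items i and j is loading_i * loading_j.
    sum_offdiag = sum(loadings[i] * loadings[j]
                      for i in range(k) for j in range(k) if i != j)
    total_variance = k + sum_offdiag
    return (k / (k - 1)) * (sum_offdiag / total_variance)


equal = [0.6, 0.6, 0.6]    # tau-equivalent case (hypothetical loadings)
unequal = [0.8, 0.6, 0.4]  # congeneric but not tau equivalent (hypothetical)

for label, lam in (("equal loadings", equal), ("unequal loadings", unequal)):
    print(f"{label}: alpha = {alpha_from_loadings(lam):.3f}, "
          f"rho = {rho_from_loadings(lam):.3f}")
```

When the loadings are equal the two coefficients coincide, whereas unequal loadings leave α below ρ, which is why the tau-equivalence tests matter for interpreting α.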

Following reliability analyses, I calculated means, standard deviations, and subcomponent intercorrelations for each sample because adult and college norms for the NEO-FFI domains are different (; see also ). Finally, I tested criterion correlations of NEO-FFI broad domain scores in each sample. To reduce Type I error rate, I only considered correlations significant at .01. For each significant domain correlation, I then examined the pattern of subcomponent correlations. I then corrected all correlations for measurement error. I conducted analyses in SPSS 10.0 and EQS 6.1.

RESULTS

Subcomponent Replication

Table 2 depicts the fit indexes for factor models of each NEO-FFI domain including the final multigroup model in which I constrained all possible item loadings, residuals, and factor covariances to equality across groups. In general, the most restrictive models specifying only latent traits evidenced adequate fit, according to the RMSEA (e.g., .048 for the Conscientiousness subcomponents to .086 for the Extraversion subcomponents, both in the middle-aged sample). However, these values suggested that additional interitem covariance remained unexplained by models including only latent trait factors. When I added positive and negative item-keying method factors, RMSEAs decreased considerably. Substantial decreases in the AIC values also indicate that models including method factors provided a better fit. In general, the maximally constrained multigroup models with method factors provided the most general representation of the subcomponents as evidenced by their largely metric invariance, good absolute fit, and low AICs relative to less constrained multigroup models.

TABLE 2

Fit Indexes for Confirmatory Factor Models of NEO–Five Factor Inventory (NEO–FFI) Subcomponents by Domain

| NEO-FFI Domain/Factor Model | SB χ2 | df | p | CFI | RMSEA | AIC |
|---|---|---|---|---|---|---|
| Neuroticism (scoring alternative 1): Self-reproach, negative affect | | | | | | |
| Young adults, trait factors only | 98.74 | 53 | <.001 | .94 | .053 | −7.26 |
| Middle-age adults, trait factors only | 95.18 | 53 | <.001 | .95 | .062 | −10.82 |
| Young adults, trait + method factors | 54.44 | 41 | .077 | .98 | .038 | −27.56 |
| Middle-age adults, trait + method factors | 55.81 | 41 | .060 | .98 | .042 | −26.19 |
| Multiple group, trait + method factors | 173.07 | 113 | <.001 | .97 | .032 | −52.93 |
| Neuroticism (scoring alternative 2): Self-reproach, anxiety, depression | | | | | | |
| Young adults, trait factors only | 67.15 | 32 | <.001 | .92 | .060 | 3.15 |
| Middle-age adults, trait factors only | 76.29 | 32 | <.001 | .91 | .082 | 12.29 |
| Young adults, trait + method factors | 30.96 | 22 | .097 | .98 | .037 | −13.04 |
| Middle-age adults, trait + method factors | 26.05 | 22 | .250 | .99 | .030 | −17.95 |
| Multiple group, trait + method factors | 100.70 | 73 | .018 | .97 | .027 | −45.30 |
| Extraversion: Sociability, positive affect, activity | | | | | | |
| Young adults, trait factors only | 111.19 | 51 | <.001 | .93 | .063 | 9.19 |
| Middle-age adults, trait factors only | 128.61 | 51 | <.001 | .89 | .086 | 26.61 |
| Young adults, trait + method factors | 62.78 | 39 | <.009 | .97 | .045 | −15.22 |
| Middle-age adults, trait + method factors | 76.32 | 39 | <.001 | .95 | .068 | −1.68 |
| Multiple group, trait + method factors | 122.98 | 74 | <.001 | .93 | .036 | −25.02 |
| Openness: Aesthetic interests, intellectual interests, unconventionality | | | | | | |
| Young adults, trait factors only | 70.98 | 32 | <.001 | .88 | .064 | 6.98 |
| Middle-age adults, trait factors only | 57.05 | 32 | .004 | .91 | .062 | −6.95 |
| Young adults, trait + method factors | 48.21 | 22 | <.001 | .92 | .063 | 4.21 |
| Middle-age adults, trait + method factors | 21.33 | 22 | .500 | 1.00 | .000 | −22.67 |
| Multiple group, trait + method factors | 122.98 | 74 | <.001 | .925 | .036 | −25.02 |
| Agreeableness: Nonantagonistic orientation, prosocial orientation | | | | | | |
| Young adults, trait factors only | 108.78 | 53 | <.001 | .89 | .059 | 2.78 |
| Middle-age adults, trait factors only | 114.47 | 53 | <.001 | .87 | .075 | 8.47 |
| Young adults, trait + method factors | 60.08 | 41 | .027 | .96 | .040 | −21.92 |
| Middle-age adults, trait + method factors | 68.00 | 41 | .005 | .94 | .056 | −14.00 |
| Multiple group, trait + method factors | 172.01 | 108 | <.001 | .94 | .034 | −43.99 |
| Conscientiousness: Orderliness, goal striving, dependability | | | | | | |
| Young adults, trait factors only | 140.67 | 51 | <.000 | .85 | .077 | 38.67 |
| Middle-age adults, trait factors only | 74.68 | 51 | .017 | .96 | .048 | −27.32 |
| Young adults, trait + method factors | 63.98 | 39 | .007 | .96 | .032 | −30.76 |
| Middle-age adults, trait + method factors | 47.24 | 39 | .171 | .99 | .047 | −14.02 |
| Multiple group, trait + method factors | 175.32 | 109 | <.001 | .95 | .035 | −42.68 |

Note. SB χ2 = Satorra–Bentler scale-corrected chi-square; CFI = comparative fit index; RMSEA = root mean square error of approximation; AIC = Akaike information criterion. Multiple-group models represent the final factor model after nested model tests gradually constraining the most parameters possible to equality across groups without deterioration of model fit.

Of the 60 NEO-FFI items, 53 loaded primarily on their trait factor, whereas 7 loaded higher on their respective method factor. The series of nested multiple-group models revealed that 45 of the 60 items were invariant across samples with respect to trait and method factor loadings and residual variances when using the three-subcomponent scheme for Neuroticism, and 43 of 60 were invariant using the two-subcomponent Neuroticism scheme. Of these Neuroticism items, Item 46 loaded slightly (.46 vs. .42) and Item 41 moderately (.71 vs. .52) higher on depression in the young adults for the tripartite solution. In the bipartite alternative, Item 51 loaded marginally (.64 vs. .63) and Item 36 slightly (.64 vs. .58) higher on negative affect in the young adults, whereas Item 21 loaded slightly (.57 vs. .56) and Item 56 moderately (.74 vs. .95) higher on negative affect in the middle-aged adults. Other loading discrepancies across samples were within .20 for all items except Items 34 and 54, which had loading differences across samples of .31 and .35, respectively. In both groups, loadings for the unconventionality subcomponent were quite low (i.e., all below .40), which suggested that it was not a well-defined factor despite the overall model fit for the Openness subcomponents.

Correlations among the trait factors tended to be above .60, with the Neuroticism subcomponents in the .90s. This was expected given that different elements of a multifaceted construct must by definition be strongly correlated and that these correlations increase, sometimes substantially, when measurement error is removed in latent variable models. When treated as unweighted linear composites of observed item scores rather than latent variables, subcomponent intercorrelations were lower (see the following and Table 4).

TABLE 4

Intercorrelations Between Subcomponents in Each Sample

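Column numbers correspond to the numbered subcomponents in the rows.

| Subcomponents | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Self-reproach 1 (N) | – | .62 | | | | −.36 | −.35 | −.24 | −.05 | −.15 | −.01 | −.25 | −.22 | −.35 | −.26 | −.28 |
| 2. Negative affect (N) | .59 | – | | | | −.31 | −.24 | −.23 | −.02 | −.09 | .05 | −.16 | −.11 | −.17 | −.15 | −.15 |
| 3. Self-reproach 2 (N) | | | – | .58 | .53 | −.32 | −.28 | −.21 | .02 | −.13 | −.01 | −.17 | −.14 | −.30 | −.22 | −.21 |
| 4. Anxiety (N) | | | .57 | – | .46 | −.27 | −.25 | −.07 | −.10 | −.09 | −.06 | −.30 | −.16 | −.16 | −.05 | −.13 |
| 5. Depression (N) | | | .59 | .51 | – | −.37 | −.32 | −.36 | −.05 | −.12 | .06 | −.14 | −.21 | −.30 | −.30 | −.26 |
| 6. Positive affect (E) | −.48 | −.45 | −.43 | −.39 | −.48 | – | .57 | .55 | .07 | .09 | .02 | .26 | .56 | .21 | .33 | .34 |
| 7. Sociability (E) | −.35 | −.22 | −.33 | −.23 | −.27 | .50 | – | .53 | .04 | −.03 | −.07 | .11 | .30 | .21 | .29 | .30 |
| 8. Activity (E) | −.31 | −.26 | −.33 | −.17 | −.31 | .43 | .50 | – | .08 | .18 | −.10 | −.10 | .30 | .24 | .42 | .35 |
| 9. Aesthetic interests (O) | −.05 | −.01 | −.10 | −.01 | −.02 | .11 | .15 | .16 | – | .47 | .22 | .09 | .16 | .03 | .05 | .07 |
| 10. Intellectual interests (O) | −.26 | −.11 | −.30 | −.20 | −.08 | .05 | .08 | .27 | .46 | – | .21 | .02 | .19 | .03 | .21 | .14 |
| 11. Unconventionality (O) | .03 | .08 | .03 | .02 | .09 | −.18 | −.16 | −.14 | .24 | .23 | – | −.05 | .21 | −.13 | −.13 | .03 |
| 12. Nonantagonistic orientation (A) | −.38 | −.23 | −.29 | −.34 | −.28 | .32 | .11 | −.01 | .20 | .20 | −.10 | – | .43 | .13 | .07 | .12 |
| 13. Prosocial orientation (A) | −.31 | −.17 | −.26 | −.21 | −.25 | .49 | .28 | .23 | .13 | .13 | −.17 | .50 | – | .12 | .34 | .39 |
| 14. Orderliness (C) | −.38 | −.15 | −.35 | −.18 | −.29 | .29 | .32 | .20 | −.01 | .03 | −.17 | .29 | .37 | – | .48 | .37 |
| 15. Goal striving (C) | −.40 | −.24 | −.37 | −.18 | −.37 | .38 | .44 | .41 | .06 | .06 | −.22 | .29 | .49 | .51 | – | .53 |
| 16. Dependability (C) | −.37 | −.08 | −.38 | −.13 | −.20 | .32 | .25 | .19 | .12 | .11 | −.15 | .41 | .55 | .59 | .54 | – |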


Note. Midlife adults (N = 204) below diagonal, young adults (N = 291) above diagonal. The two- and three-subscale scoring alternatives for the Neuroticism domain are mutually exclusive, and therefore correlations between them are not reported. Correlations between subscales within each domain are underlined. N = Neuroticism; E = Extraversion; O = Openness; A = Agreeableness; C = Conscientiousness.

Bivariate correlations between domain or subcomponent scores and the 10-item Marlowe-Crowne are presented in Table 3, which also includes reliability estimates (discussed below). Significant negative correlations with social desirability were observed in the young adults (Ns = 296–302, ps < .005) for negative affect and self-reproach in the two-subcomponent Neuroticism model and for anxiety and self-reproach in the three-subcomponent Neuroticism model. Within Agreeableness, the nonantagonistic orientation subcomponent correlated positively with social desirability. Among the middle-aged adults (Ns = 204–207, ps < .005), all subcomponents from either Neuroticism scheme correlated negatively with social desirability, whereas all subcomponents of Agreeableness and Conscientiousness correlated positively with social desirability. Table 3 shows that the parent domains of these subcomponents also correlated with social desirability in the same direction and to a comparable degree.

TABLE 3

Internal Consistency Reliability Estimates of Subcomponents in Each Sample

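| NEO–FFI Domain/Subcomponent | Middle-Age Adults: Essentially τ Equivalent? | Middle-Age Adults: α | Middle-Age Adults: ρ | Middle-Age Adults: θ | Middle-Age Adults: Social Desirability r | Young Adults: Essentially τ Equivalent? | Young Adults: α | Young Adults: ρ | Young Adults: θ | Young Adults: Social Desirability r |
|---|---|---|---|---|---|---|---|---|---|---|
| Neuroticism (alternative 1) | — | .86 | .87 | .92 | −.27* | — | .84 | .84 | .89 | −.25* |
| Negative affect | No | .74 | .74 | .81 | −.18* | Yes | .66 | .66 | .73 | −.17* |
| Self-reproach | No | .87 | .88 | .90 | −.29* | No | .84 | .85 | .90 | −.25* |
| Neuroticism (alternative 2) | — | — | — | — | — | — | — | — | — | — |
| Anxiety | Yes | .62 | .62 | .62 | −.29* | Yes | .59 | .59 | .59 | −.21* |
| Depression | Yes | .66 | .69 | .69 | −.20* | Yes | .56 | .61 | .61 | −.10 |
| Self-reproach | No | .72 | .73 | .75 | −.23* | No | .70 | .71 | .71 | −.26* |
| Extraversion | — | .85 | .85 | .93 | .10 | — | .85 | .85 | .91 | −.03 |
| Positive affect | No | .79 | .80 | .84 | .19 | No | .71 | .73 | .77 | .01 |
| Sociability | No | .67 | .68 | .71 | .05 | Yes | .63 | .64 | .68 | .02 |
| Activity | Yes | .72 | .72 | .78 | −.10 | No | .74 | .74 | .77 | −.09 |
| Openness | — | .72 | .73 | .85 | .05 | — | .83 | .83 | .91 | .05 |
| Aesthetic interests | Yes | .81 | .81 | .81 | .12 | Yes | .67 | .69 | .69 | .05 |
| Intellectual interests | Yes | .52 | .53 | .53 | .07 | Yes | .64 | .65 | .65 | .09 |
| Unconventionality | No | .27 | .47 | .48 | −.13 | No | .01 | .34 | .24 | −.05 |
| Agreeableness | — | .83 | .83 | .91 | .28* | — | .77 | .77 | .87 | .18* |
| Nonantagonistic orientation | No | .76 | .77 | .86 | .31* | No | .67 | .67 | .75 | .26* |
| Prosocial orientation | No | .73 | .74 | .79 | .29* | Yes | .78 | .78 | .78 | .03 |
| Conscientiousness | — | .88 | .88 | .93 | .22* | — | .79 | .79 | .89 | .02 |
| Orderliness | No | .74 | .75 | .80 | .17* | No | .55 | .57 | .66 | .07 |
| Goal striving | No | .78 | .80 | .80 | .21* | No | .72 | .75 | .75 | .01 |
| Dependability | Yes | .75 | .75 | .78 | .20* | No | .61 | .63 | .72 | −.03 |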


Note. NEO–FFI = NEO–Five Factor Inventory; α = alpha; ρ = congeneric reliability estimate rho; θ = lower bound, dimension free, reliability estimate theta; Social desirability r = correlations with 10-item Marlowe–Crowne Social Desirability Scale. The most appropriate internal consistency estimate is underlined, depending on whether the subscale meets the assumption of essential tau-equivalence. Because domains were considered multidimensional rather than congeneric, their essential tau-equivalence was not tested and θ may be considered an appropriate reliability estimate.

*p < .01.

Reliability Analyses

Table 3 also presents the results of tests for essential tau equivalence along with α, ρ, and θ estimates of subcomponent internal consistency in each sample. For essentially tau-equivalent subclusters, α was nearly equal to ρ, as expected. However, for subcomponents in which the degree of true score variance differed across items, ρ was slightly higher than α (i.e., by .01–.02, rounding at the second decimal). These results are consistent with the demonstration that α may underestimate reliability when items contain differing amounts of true score variance. θ provided far higher estimates of subscale reliability for some subcomponents. This coefficient makes no assumptions about the dimensionality of a composite and always equals or exceeds alpha.
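The relation between α and ρ described above can be illustrated with a minimal sketch. It assumes standardized items that follow a single-factor (congeneric) model with uncorrelated errors, and the loadings are hypothetical values chosen for illustration, not estimates from this study. Under those assumptions, α computed from the model-implied covariance matrix equals ρ when loadings are equal and falls below ρ when they differ.

```python
from itertools import combinations


def alpha_from_loadings(loadings):
    """Population Cronbach's alpha for standardized items under a one-factor model."""
    k = len(loadings)
    # Model-implied covariance between two items is the product of their loadings.
    off_diagonal = 2 * sum(a * b for a, b in combinations(loadings, 2))
    total_variance = k + off_diagonal  # each standardized item has variance 1
    return (k / (k - 1)) * (1 - k / total_variance)


def rho_from_loadings(loadings):
    """Congeneric reliability: true score variance over total composite variance."""
    true_variance = sum(loadings) ** 2
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return true_variance / (true_variance + error_variance)


# Hypothetical loadings (assumptions for illustration only).
equal_loadings = [0.6, 0.6, 0.6]      # items share the same true score variance
unequal_loadings = [0.9, 0.6, 0.3]    # congeneric but not tau equivalent

for label, loadings in (("equal", equal_loadings), ("unequal", unequal_loadings)):
    alpha = alpha_from_loadings(loadings)
    rho = rho_from_loadings(loadings)
    print(f"{label}: alpha = {alpha:.3f}, rho = {rho:.3f}")
```

With the equal loadings the two coefficients coincide; with the unequal loadings α is roughly .05 lower than ρ, a larger gap than the .01–.02 differences observed in Table 3 because the hypothetical loadings differ sharply.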

In general, the appropriate congeneric reliability estimates, bolded in Table 3 (i.e., either α or ρ, depending on tau equivalence), ranged from fair to good and were slightly higher in the middle-aged sample. An obvious exception was the unconventionality subcomponent. Average reliability across the subcomponents in the middle-aged sample was .72 (.73 without unconventionality), slightly higher than that reported by Saucier (.70 for his midlife sample) and the average NEO-PI-R facet reliability (.70), both of which are based strictly on α. For the young adults, the average subcomponent reliability was .66 (.68 without unconventionality), equal to what Saucier obtained in his college sample.

Finally, all three reliability estimates are provided in Table 3 for the domains themselves. Because the domains are multidimensional rather than congeneric by definition, I did not test essential tau equivalence. Whereas α and ρ were intended for use with congeneric item sets, θ, which entails no such assumption, may be regarded as a more appropriate reliability estimate for the domains. However, domain αs affirmed the well-known psychometric fact that α may be high in multidimensional scales and thus should not be regarded as a measure of “homogeneity.”

Subcomponent Intercorrelations, Means, and Standard Deviations

Table 4 presents the subscale intercorrelations, with the middle-aged sample below the diagonal, the young adults above, and correlations among subcomponents within each domain bolded in corresponding off-diagonal blocks. The pattern for both samples suggested moderate to strong correlations among subcomponents within a domain and much smaller correlations between item clusters from different domains. This is precisely the pattern expected among the lower order dimensions of a set of personality domains. Of interest, decomposing the Neuroticism domain into three rather than two item clusters resulted in slightly more independent subcomponents, particularly in the young adults. Means and standard deviations of the item clusters are presented in Table 5, which also shows a measure of effect size (Cohen’s d) for the cross-sectional differences between these samples.

TABLE 5

Comparison of Domain and Subcomponents Means Between Midlife and Young Adults

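| Domain and Subcomponent | Middle-Aged Adults: M | Middle-Aged Adults: SD | Young Adults: M | Young Adults: SD | t | d |
|---|---|---|---|---|---|---|
| Domains | | | | | | |
| Neuroticism | 20.00 | 8.41 | 24.14 | 8.09 | −5.60**** | −.50 |
| Extraversion | 27.35 | 7.16 | 27.74 | 7.37 | −.59 | −.05 |
| Openness | 24.75 | 5.97 | 27.78 | 5.64 | −5.81**** | −.53 |
| Agreeableness | 31.52 | 5.94 | 29.01 | 5.65 | 4.80**** | .44 |
| Conscientiousness | 33.71 | 6.43 | 28.29 | 6.22 | 9.46**** | .86 |
| Subcomponent | | | | | | |
| Negative affect | 10.07 | 3.86 | 11.61 | 3.29 | −4.85**** | −.44 |
| Self-reproach | 9.93 | 5.53 | 12.53 | 5.43 | −5.29**** | −.48 |
| Anxiety | 5.72 | 2.49 | 6.38 | 2.30 | −3.09** | −.28 |
| Depression | 5.09 | 2.50 | 6.09 | 5.68 | −2.39** | −.22 |
| Self-reproach (Alternate) | 5.73 | 3.09 | 7.38 | 3.26 | −5.75**** | −.52 |
| Positive affect | 10.34 | 9.48 | 10.49 | 2.94 | −0.26 | −.02 |
| Sociability | 7.80 | 2.87 | 8.06 | 2.78 | −1.03 | −.09 |
| Activity | 9.20 | 2.91 | 9.14 | 3.11 | 0.22 | .02 |
| Aesthetic interests | 6.40 | 2.91 | 7.02 | 2.60 | −2.52*** | −.23 |
| Intellectual interests | 5.99 | 2.03 | 6.31 | 2.15 | −1.70* | −.15 |
| Unconventionality | 7.80 | 2.26 | 9.94 | 2.11 | −10.93**** | −.99 |
| Nonantagonistic orientation | 19.30 | 4.73 | 17.69 | 4.46 | 3.91**** | .35 |
| Prosocial orientation | 12.57 | 4.91 | 11.80 | 2.73 | 2.27** | .20 |
| Orderliness | 13.85 | 3.40 | 11.14 | 3.11 | 9.28**** | .84 |
| Goal striving | 7.76 | 1.85 | 6.76 | 2.23 | 5.34**** | .48 |
| Dependability | 12.08 | 2.39 | 10.37 | 2.50 | 7.72**** | .70 |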


Note. d = Cohen’s d measure of effect size for difference between means.

*p < .10.

**p < .05.

***p < .01.

****p < .001.
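As a rough check on how the effect sizes in Table 5 relate to the reported means and standard deviations, the sketch below computes Cohen's d with a pooled standard deviation. The pooled-variance formula is an assumption about how d was obtained, and the sample sizes (204 midlife, 291 young adults) are borrowed from the Table 4 note rather than stated for Table 5, so the result is illustrative rather than a reproduction of the original analysis.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / sqrt(pooled_variance)


# Assumed sample sizes, taken from the Table 4 note (not given for Table 5).
N_MIDLIFE, N_YOUNG = 204, 291

# Means and standard deviations as reported in Table 5 (midlife first, then young adults).
d_neuroticism = cohens_d(20.00, 8.41, N_MIDLIFE, 24.14, 8.09, N_YOUNG)
d_conscientiousness = cohens_d(33.71, 6.43, N_MIDLIFE, 28.29, 6.22, N_YOUNG)

print(f"Neuroticism d = {d_neuroticism:.2f}")
print(f"Conscientiousness d = {d_conscientiousness:.2f}")
```

Under these assumptions the two values round to the tabled −.50 and .86. Negative values indicate higher scores in the young adults.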

Criterion Correlations

Domains correlated with criterion items are presented in Table 6 along with corresponding subcomponent criterion correlations. Correlations corrected for attenuation due to measurement error (using the appropriate reliability estimate for each subcomponent from Table 3) are bolded. In some cases, a set of subcomponents tended to be equally associated with a criterion, with their parent domain correlating slightly higher (e.g., Extraversion vs. its subcomponents and job satisfaction in the middle-age sample). In other instances, it was evident that broad domain correlations were driven primarily by one or two subcomponents (e.g., prosocial orientation and loneliness in the young adults). Some specific hypotheses were supported in the uncorrected correlations; and when measurement error was corrected, many were supported.

TABLE 6

Domain and Subcomponents Criterion Correlations

Criterion columns, Mid Life Adults: Life Satisfaction, Job Satisfaction, Marital Satisfaction, Artistic Hobbies, Loneliness
Criterion columns, Young Adults: Opposite Sex Socializing, Feeling Anxious, Feeling Depressed, Contemplating Suicide, Loneliness

NEO–FFI Domain and Subcomponent
Neuroticism: −.39 −.41; −.24 −.25; −.23 −.24; .47 .51; .50 .52; .60 .63; .38 .40; .49 .49
  Anxiety: −.27 −.34; −.15 −.19; −.14 −.18; .32 .43; .41 .53; .45 .59; .32 .42; .34 .46
  Depression: −.37 −.46; −.24 −.30; −.24 −.30; .40 .52; .36 .48; .48 .64; .29 .39; .44 .62
  Self-reproach: −.34 −.40; −.23 −.27; −.18 −.21; .46 .56; .44 .52; .52 .62; .33 .40; .43 .53
Extraversion: .27 .29; .28 .30; −.56 −.64; .28 .30; −.31 −.34; −.31 −.34; −.57 −.65
  Positive affect: .31 .35; .21 .23; −.54 −.63; .25 .29; −.29 −.34; −.31 −.34; −.57 −.57
  Sociability: .19 .23; .25 .30; −.49 −.62; .29 .37; −.24 −.30; −.28 −.35; −.48 −.63
  Activity: .15 .17; .21 .24; −.31 −.38; .17 .20; −.24 −.28; −.20 −.24; −.40 −.49
Openness: .44 .49
  Aesthetic interests: .48 .53
  Intellectual interests: .24 .33
  Unconventionality: .16 .23
Agreeableness: .27 .29; .20 .21; −.44 −.49; −.42 −.47
  Nonantagonistic orientation: .28 .32; .21 .24; −.36 −.44; −.29 −.35
  Prosocial orientation: .17 .20; .09 .11; −.43 −.53; −.45 −.51
Conscientiousness: −.43 −.49; .19 .21; −.28 −.31; −.36 −.41
  Orderliness: −.33 −.40; .14 .16; −.19 −.25; −.28 −.39
  Goal striving: −.33 −.39; .15 .17; −.26 −.30; −.27 −.33
  Dependability: −.38 −.46; .18 .21; −.22 −.28; −.33 −.44


Note. Domain criterion correlations significant at p < .01, followed by subcomponent criterion correlations. Underlined numbers are correlations corrected for attenuation due to measurement error.
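The correction for attenuation applied to the criterion correlations can be sketched as follows. This is a minimal illustration that assumes only predictor unreliability is corrected, with the single-item criteria treated as error free; that assumption is mine and is not stated in the text. The inputs are an observed midlife Neuroticism correlation of −.39 from Table 6 with the domain's θ of .92 from Table 3, and an observed midlife anxiety correlation of −.27 with its α of .62.

```python
from math import sqrt


def correct_for_attenuation(observed_r, predictor_reliability, criterion_reliability=1.0):
    """Disattenuate a correlation for unreliability in one or both variables.

    The default criterion reliability of 1.0 is an assumption made here
    because the criterion measures are mostly single items.
    """
    return observed_r / sqrt(predictor_reliability * criterion_reliability)


# Observed correlations from Table 6, reliabilities from Table 3 (midlife sample).
neuroticism_corrected = correct_for_attenuation(-0.39, 0.92)  # domain, theta
anxiety_corrected = correct_for_attenuation(-0.27, 0.62)      # tau equivalent, alpha

print(f"Neuroticism: {neuroticism_corrected:.2f}")
print(f"Anxiety: {anxiety_corrected:.2f}")
```

Because the divisor shrinks as reliability falls, the same observed correlation is adjusted upward more for a short, less reliable subcomponent than for its parent domain.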

DISCUSSION

The goals of this study were to determine whether NEO-FFI item cluster subcomponents were replicable in independent samples, to assess whether item keying accounted for additional item covariance within domains, to gauge the extent to which subcomponents were susceptible to socially desirable responding, to identify the most appropriate reliability estimate for each subcomponent, and to provide a preliminary comparison of subcomponent versus domain correlations with a small set of criterion items. Results permit several conclusions about the NEO-FFI subcomponents.

First, item clusters appear replicable in independent samples. Evaluation of trait-only CFA models for each domain on the basis of the RMSEA suggested adequate fit in both the young and middle-aged adults. These trait-only models were fairly strict tests of cross-validation in that they posited no sources of covariation among items other than factors corresponding to narrow personality traits within each domain. Relative improvements in fit were noted with the addition of method factors reflecting item keying. Although additional factors usually improve model fit, many items loaded saliently on these method factors, including a few that loaded higher on their corresponding method factor than on their trait factor. The finding that positive and negative item keying appear to contribute variance to items within each NEO-FFI domain is consistent with the characterization of personality data as capturing a complex set of processes that are not easily summarized by simple models. When individuals respond to questions about their typical behavior, attitudes, and feelings, they may be influenced by whether the item is worded in a way reflecting favorable or unfavorable characteristics.
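For readers unfamiliar with the RMSEA, the sketch below shows its usual point estimate computed from a model chi-square, its degrees of freedom, and the sample size. The chi-square and degrees of freedom are hypothetical values chosen for illustration, and the formula is the standard normal-theory version; estimates based on a scaled (e.g., Satorra-Bentler) chi-square, and software that divides by N rather than N − 1, will differ somewhat.

```python
from math import sqrt


def rmsea(chi_square, degrees_of_freedom, sample_size):
    """Point estimate of the root mean square error of approximation."""
    # Misfit beyond what is expected by chance, floored at zero.
    excess_misfit = max(chi_square - degrees_of_freedom, 0.0)
    return sqrt(excess_misfit / (degrees_of_freedom * (sample_size - 1)))


# Hypothetical fit statistics (not taken from this study).
example_rmsea = rmsea(chi_square=85.0, degrees_of_freedom=51, sample_size=204)
print(f"RMSEA = {example_rmsea:.3f}")
```

A chi-square at or below its degrees of freedom yields an RMSEA of zero, which is why the index is read as misfit per degree of freedom rather than as a significance test.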

The subcomponents were also structurally invariant across age groups, consistent with evidence that the basic configuration of higher and lower order factors characterizes individuals of different ages. Moreover, about three fourths of the items showed metric invariance with respect to trait and method factor loadings as well as equal residual variance across age groups. The remaining items evidenced some differences in one or more of these parameters. Exceptions included Item 34, which showed a loading difference of .31 in favor of the young adults. This item asks whether the respondent is well liked by the majority of people known to him or her and appeared to describe prosocial orientation much better in young adults. Item 54, which loaded .35 higher in middle-aged adults, asks whether respondents are forthright in expressing their antipathy toward individuals they dislike. Scores on this item appeared to characterize nonantagonistic orientation much better in middle-aged than in young adults.

Finally, certain Neuroticism items may show variation according to whether the two- or three-subcomponent scheme is used. Under the tripartite arrangement, Item 41, which deals with feeling discouraged and giving up when hardship is encountered, characterizes trait depression better in young than in midlife adults. When Neuroticism is decomposed into only two subcomponents, Item 56, which asks about shame, characterizes self-reproach better in middle-aged respondents. On the whole, however, these analyses revealed not only that the basic factor configuration of lower tier NEO-FFI traits was recoverable in somewhat different samples but also that the vast majority of items functioned quite similarly across age groups.

Social desirability correlations were small but significant and were moderated by age group. In particular, middle-aged individuals may produce scores on all Neuroticism, Agreeableness, and Conscientiousness subcomponents that are more influenced by social desirability. By contrast, among those domains’ subcomponents, young adults may respond more honestly to questions tapping orderliness, goal striving, dependability, prosocial orientation, and depression. However, social desirability has itself been considered a personality trait reflecting positive characteristics or a sense of well-being rather than a dissembling response style per se. Some work has also indicated that scores on the short Marlowe-Crowne are correlated with age on the order of .30, at least in women. In this context, these correlations may also mean that whatever positive trait might be tapped by the Marlowe-Crowne is associated with all elements of Neuroticism, Agreeableness, and Conscientiousness in middle-aged adults but with only a limited set of these domains’ subcomponents in young adults. As one reviewer pointed out, whether or not social desirability is viewed as a substantively meaningful trait in and of itself, these results bear no necessary connection to the NEO-FFI validity question “Have you responded accurately and honestly?” It would not be socially desirable to answer negatively, and individuals high on trait well-being probably have little to conceal. Naturally, dissent on this validity item should still constitute sufficient grounds for jettisoning a protocol, but NEO-FFI validity scales similar to those that have been developed for the NEO-PI-R might still be considered in future research.

With respect to reliability, fewer than half the subcomponents in each sample met the assumption of essential tau equivalence required for Cronbach’s α to provide a strictly accurate estimate of reliability. Violations of essential tau equivalence are fairly common in practice. ρ, an internal consistency estimate unaffected by violation of this assumption, yielded slightly higher reliability estimates. Although the small magnitude of the difference was consistent with prior work on these coefficients, these analyses do suggest that α may slightly underestimate reliability for some subcomponents.

With shorter composites that violate essential tau equivalence, “rule-of-thumb” guidelines about acceptable values of α that pervade the applied psychological literature may be misleading. Several authors have emphasized the well-known fact that α is strongly affected by the number of items in a scale and have cautioned against blind use of α to evaluate an instrument’s reliability without consideration of the factors that affect it. For instance, even though α is intended for unidimensional scales, these results clearly show that α was high for the multifactorial NEO-FFI domains. It is well known in the psychometric literature that α is not a measure of dimensionality or homogeneity, and for this reason, alternative estimates provide a better estimate of true reliability when scales are not congeneric or essentially tau equivalent.

Despite these caveats, α may still often be a reasonable lower bound estimate of true score variance within a scale; and regardless of the internal consistency estimate used, these analyses generally revealed that the reduced length of the subcomponents brought expected reductions in reliability. Thus, moving to narrow bandwidth subcomponents on the NEO-FFI reduces fidelity somewhat. From the original bandwidth-fidelity perspective, however, if one could use the same number of items to measure a specific trait and a broad, multidimensional trait, the narrower bandwidth measure would show higher fidelity. Because the NEO-FFI subcomponents are located beneath domains in a factorial hierarchy, they cannot be measured with the same number of items as the broad domains, and, all other things being equal, reductions in reliability are inevitable. This should not deter researchers interested in using the subcomponents, given the availability of methods for the control of measurement error. In addition to correcting single and multiple correlations for attenuation due to measurement error, narrow traits can be modeled as latent variables in structural equation analyses. The use of factor scores from exploratory common factor analysis (not principal component analysis) rather than simple linear combinations of observed item scores is another alternative for reducing measurement error, as are regression models that account for errors in predictors.
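The expected loss of reliability from shortening a scale can be approximated with the Spearman-Brown formula, sketched below. The formula assumes parallel items, an assumption the multidimensional NEO-FFI domains do not strictly meet, so the output is a rough expectation and not a prediction for any particular subcomponent. The inputs (a 12-item composite with reliability .86 reduced to 4 items) are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability after changing test length by length_factor.

    length_factor is new length divided by old length (e.g., 4 / 12).
    Assumes the items are parallel, which is an idealization.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Illustrative inputs: a 12-item composite with reliability .86 cut to 4 items.
projected = spearman_brown(reliability=0.86, length_factor=4 / 12)
print(f"Projected 4-item reliability = {projected:.2f}")
```

Under these assumptions the projected value is about .67, which is in the general range of the subcomponent reliabilities reported in Table 3.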

Another interesting pattern in the internal consistency analyses was that the subcomponents were more reliable in the middle-aged than in the young adult sample, with the exceptions of intellectual interests, prosocial orientation, and activity. This is not surprising because the middle-aged sample was demographically quite similar to that in which Saucier derived the subcomponents. On the other hand, the college sample differed not only in age and gender proportion from Saucier’s middle-aged community sample but also in age from his college sample, the members of which were on average 32 years old. Nonetheless, the average internal consistency reliabilities observed in both this middle-aged sample (.72–.73, with and without unconventionality) and young sample (.66–.68) were nearly identical to the (α-based) .70 and .66 reported by Saucier. As Saucier pointed out, these numbers compare favorably with the average (α-based) reliability of .70 for the NEO-PI-R facets and, in this context, provide further evidence of the subcomponents’ stability.

The unconventionality item cluster was an obvious exception to this pattern. However, it also evidenced the lowest reliability and greatest cross-validation α shrinkage in Saucier’s work. From a conceptual point of view, it is easy to understand why such a subcomponent was essentially unrecoverable in these samples. The item content tapped a dimension of personality defined by the rejection of intellectual and cultural norms; high scorers would be free thinking and unshackled by social and attitudinal conventions. By definition, however, the psychometric consistency of a trait in the population depends on shared convention in the interpretation of items. If a sample contains many unconventional participants who diverge in thought and attitude, their responses to items may be highly idiosyncratic. Rather than showing stable covariation, items measuring such a trait might be essentially uncorrelated as a result of diverging response tendencies. The effect on a four-item composite would be to substantially diminish or even eliminate the stable common variance on which internal consistency estimates are based. One would also expect diminished internal consistency of the broadband scale within which such items are embedded. Not surprisingly, the Openness domain has routinely shown the lowest internal consistency. This raises an interesting paradox about the measurement of unconventionality itself as a lower order trait: Respondents need to be at least somewhat conventional in their interpretations of items measuring unconventionality, and the items need to elicit a standard pattern of covariation across samples.

Finally, the criterion correlations suggested that NEO-FFI subcomponents hold promise for delineating which elements of broad domains are more and less related to criteria. Hypothesized relations obtained, in general. Exceptions involved goal striving, which did not predict job satisfaction in the middle-aged adults, and self-reproach, which did not predict suicidal ideation in the young adults. In the corrected correlations, sociability was the second most predictive subcomponent of loneliness in young adults, but intellectual interests was unrelated to loneliness. Sociability was more predictive than positive affect of loneliness among the middle-aged adults, and in both samples, broad Extraversion correlated more highly with loneliness than any subcomponent. Another interesting result I did not hypothesize was that in the young adults, it was the prosocial component of Agreeableness rather than nonantagonistic orientation that seemed to best predict loneliness. Prosocial orientation reflects proactive courtesy and consideration, which are likely to garner one friends; on the other hand, nonantagonistic orientation merely reflects a general absence of belligerence, Machiavellianism, and mistrust. Simply lacking these qualities may not be sufficient to protect against loneliness.

In other cases, criterion correlations were not markedly different across subcomponents. These may be instances in which all subcomponents were equally relevant to a given criterion, as when the criterion itself is multidimensional. The shared variance among “sibling” subcomponents captured by broadband composite scores appeared to outweigh the specific variance of each subcomponent in these cases, which resulted in higher criterion correlations for the domains than for their subcomponents. This is consistent with the argument favoring broadband composites on the basis of their higher reliability, which is in part due to variance shared across constituent (highly correlated) domains. However, when reliability differences were equalized by correcting for measurement error, the pattern of differential correlations among the subcomponents and domains was magnified in some cases. For instance, aesthetic interests predicted the amount of weekly time middle-aged people devoted to artistic hobbies slightly better than broad Openness. Sociability predicted the extent to which college students socialized with the other sex better than broad Extraversion. The nature and magnitude of these differences depend, of course, on the population and criterion of interest. Nevertheless, these analyses suggest the NEO-FFI subcomponents may provide a way to examine the specific elements of monolithic personality factors most important for outcomes, in the same way facets may be used to supplement broad domain results with the full-length NEO-PI-R.

These results must be qualified by five limitations. First, the generalizability of the subcomponents to other populations requires further investigation. Second, in this study, I relied on self-report methodology. These analyses suggest that item keying, social desirability, or both may have affected subcomponent scores, albeit minimally. Future work might investigate the agreement between self and informant ratings on the traits represented by these subcomponents, as has been done with NEO-PI-R facets. Third, Saucier’s development sample (N = 732) was considerably larger than this middle-aged sample, and potential users looking for subcomponent norms may wish to use his means and standard deviations, although this study provides additional data on means and standard deviations in a fairly large traditional college sample. Fourth, the use of a small number of mostly single-item criterion measures was intended as only a supplementary investigation of predictive contrasts between the domains and subcomponents. The correction for attenuation due to measurement error may tend to favor shorter scales, other things being equal, and I did not subject differences between subcomponent and domain criterion correlations to significance testing. The use of the subcomponents in routine research will probably yield more robust tests of their predictive specificity and discriminant validity. Fifth, although strong correlations between narrow traits within the same domain are virtually axiomatic, modeling them as latent variables may inflate these correlations even more by removing measurement error. Thus, researchers implementing the subcomponents may wish to guard against collinearity problems by including subcomponents within the same domain one at a time, as has recently been done, or perhaps in some cases by using a single linear combination of two subcomponents (if such an increase in bandwidth is deemed conceptually tolerable and interpretable given the research context).

In the final analysis, these findings suggest that, with the exception of unconventionality, the narrow trait subcomponents embedded in the NEO-FFI were stable across independent samples, were comparable in reliability to NEO-PI-R facets, and offer the potential to measure personality at higher specificity than is possible with only broadband domain scores. Robust item cluster subcomponents mean that greater speed in five-factor personality assessment no longer need come at the expense of measurement specificity.

Acknowledgments

Preparation of this work was supported by National Institute of Mental Health Public Health Service Grant T32MH073452 to Jeffrey Lyness, University of Rochester Medical Center, Department of Psychiatry. I thank Lewis Goldberg and an anonymous reviewer for helpful comments on an earlier draft of the article. Additional thanks to Paul R. Duberstein who provided valuable input during the writing process, to Bert Hayslip Jr. who supervised the research on which the article is based, and to Gerard Saucier who offered impressions on the subcomponents that helped orient the article early on.

Footnotes

1I would like to thank an anonymous reviewer for noting this important distinction.

2It has also been suggested that genetic bases underlie phenotypic variation at the level of specific, lower order traits as well as higher order trait factors.

3Although more stringent guidelines for evaluating model fit have been proposed (i.e., CFI > .95, RMSEA < .05), recent simulation results suggest that these cut points for model rejection may be less reliable than previously thought.

4Note that in baseline models of clusters of three items, the unconstrained model itself is just identified and always fits the data perfectly. In such cases, the constraints imposed by essential tau-equivalence overidentify the model and permit it to be tested. In these cases, the essential tau-equivalence of the item cluster was evaluated merely by the significance of the Satorra-Bentler chi square for this testable model. However, this approach may be less optimal than comparing the fit of an essentially tau-equivalent model nested within one in which loadings are permitted to vary.

5For instance, multistratum trait taxonomies have been characterized as statistical variance-covariance hierarchies in which covariance among lower order dimensions constitutes the variance of higher order factors.

6Note that even though internal consistency estimates may be high in multidimensional scales, the interpretation of true score variance is less clear because of factorial complexity.