The concepts of band-width and fidelity are analogous to characteristics of a

Department of Psychiatry, Laboratory of Personality and Development, University of Rochester Medical Center

Benjamin P. Chapman, University of Rochester Medical Center, Department of Psychiatry, Laboratory of Personality and Development, 300 Crittenden Boulevard, Rochester, NY 14642, Ben_Chapman@URMC.Rochester.edu

The publisher's final edited version of this article is available free at J Pers Assess

Abstract

Many users of the NEO-Five Factor Inventory [NEO-FFI; ] are unaware that Saucier developed item cluster subcomponents for each broad domain of the instrument similar to the facets of the Revised NEO Personality Inventory []. In this study, I examined the following: the replicability of the subcomponents in young adult university and middle-aged community samples; whether item keying accounted for additional covariance among items; subcomponent correlations with a measure of socially desirable responding; subcomponent reliabilities; and subcomponent discriminant validity with respect to age-relevant criterion items expected to reflect varying associations with broad and narrow traits. Confirmatory factor analyses revealed that all subcomponents were recoverable across samples and that the addition of method factors representing positive and negative item keying improved model fit. The subcomponents correlated no more with a measure of socially desirable responding than their parent domains and showed good average reliability. Correlations with criterion items suggested that subcomponents may prove useful in specifying which elements of NEO-FFI domains are more or less related to variables of interest. I discuss their use for enhancing the precision of findings obtained with NEO-FFI domain scores.

Most researchers have agreed that personality traits can be measured in hierarchical tiers that vary in generality [; ; ; ]. Traits at higher levels of multistratum taxonomies are merely composites of more fine-grained specific traits. For example, within the Five Factor Model of personality [; ; ; ], the broad trait of Conscientiousness is composed of narrow attributes such as self-discipline, achievement orientation, and orderliness. The commonly used Revised NEO Personality Inventory [NEO-PI-R; ] permits users to capture information about such higher order broad traits as well as the lower order, narrow traits that compose them [cf. ].

The 60-item NEO-Five Factor Inventory [NEO-FFI; ] is a brief version of the NEO-PI-R designed to provide speedy and convenient measurement of the Five Factor Model domains; however, it does so by relinquishing information about the narrow traits that compose each broad factor. To remedy this, Saucier derived item cluster subcomponents for the NEO-FFI that provide a more specific level of trait measurement. These subcomponents, listed in Table 1, are somewhat similar to, although not as specific as, the facets of the full-length NEO-PI-R. In their development, these subcomponents evidenced strong average internal consistency comparable to NEO-PI-R facets [e.g., an average Cronbach’s α of .70 in a development sample and .66 in a cross-validation sample compared to average NEO-PI-R facet α of .70] and captured the majority [although not all] of the content of the NEO-PI-R facets [cf. ]. Table 1 also reveals that the Neuroticism domain may be decomposed into either two or three item-cluster subcomponents; the former scoring scheme was empirically derived [as were all other sets of subcomponents], whereas the latter was rationally derived by Saucier to provide separate measures of anxiety and depression.

TABLE 1

NEO–Five Factor Inventory Item Cluster Subcomponents

| Domain and Subcomponents | Items | Possible Description of High Scorers |
| --- | --- | --- |
| Neuroticism [alternative 1] | | |
| Self Reproach | 6, 21, 26, 36, 41, 51, 56 | Feels inferior, worthless, helpless, reactive, tense, ashamed |
| Negative Affect | 1α, 11, 16, 31α, 46 | Worried, stressed, anxious, blue, depressed |
| Neuroticism [alternative 2] | | |
| Self Reproach | 6, 11, 26, 51 | Feels inferior, worthless, helpless, stressed |
| Anxiety | 1α, 21, 31α | Worried, fearful, tense, anxious |
| Depression | 16α, 41, 46α | Blue, discouraged, sad, depressed |
| Extraversion | | |
| Positive affect | 7, 12α, 37, 42α | Light-hearted, cheerful, optimistic |
| Sociability | 2, 17, 27α, 57α | Gregarious, enjoys others, prefers company |
| Activity | 22, 32, 47, 52 | Energetic, active, fast paced, action seeking |
| Openness | | |
| Aesthetic interests | 13, 23α, 43 | Artistic, poetic, aesthetically sensitive |
| Intellectual interests | 48α, 53, 58 | Abstract, philosophical, intellectual |
| Unconventionality | 3α, 8α, 18α, 38α | Nonconforming, free thinking, whimsical |
| Agreeableness | | |
| Nonantagonistic orientation | 9α, 14α, 19, 24α, 29α, 44α, 54α, 59α | Cooperative, trusting, amiable, conflict avoidant |
| Prosocial orientation | 4, 34, 39α, 49 | Actively courteous and considerate, well-liked |
| Conscientiousness | | |
| Orderliness | 5, 10, 15α, 30α, 55α | Methodical, neat, organized, efficient |
| Goal-striving | 25, 35, 60 | Goal-driven, hard working, motivated to excel |
| Dependability | 20, 40, 45α, 50 | Reliable, consistent, dependable |

Note. Saucier offered two alternative scoring schemes for the Neuroticism items: The first was empirically derived; the second was rationally derived to disentangle anxiety and depression. Possible descriptions of high scorers are based on item wording.

α Reverse keyed item.

Although they provide a fruitful complement to the use of broad domain scores, these subcomponents have seen surprisingly little use despite a call for further investigation and refinement. Searches of PsycINFO and the Social Science Citation Index revealed only three studies that have used one or more of the subcomponents in analyses [; ; ], and only one additional study [] was a direct effort to investigate their basic psychometric properties [i.e., means, standard deviations, test-retest reliability], in an Australian sample. One reason for this lack of use may be that the subcomponents are relatively unknown despite widespread use of the NEO-FFI. Another may be that in the absence of further development and cross-validation work, researchers are hesitant to utilize them. The goal of this study is therefore to provide a detailed investigation of the subcomponents’ replicability and psychometric characteristics in two independent samples.

An appreciation of the subcomponents’ importance may be gleaned from a brief review of the debate over the relative merits of scales capturing broad traits versus those capturing narrow traits [; ; ; ; , ; ; ]. In hierarchical personality inventories, the former correspond to higher order, general factors, whereas the latter correspond to lower order, specific factors. and others [e.g., ] couched the debate in the context of the bandwidth-fidelity dilemma originally laid out by ; see also ]. The bandwidth of a scale refers to the breadth of its content or, in the case of a personality test, the scope of affective, behavioral, and cognitive tendencies it measures; the fidelity of the scale refers to its dependability or reliability [].

Rather than concerning themselves with the specific case of hierarchical personality inventories, discussed a general relationship between bandwidth and fidelity in which increasing bandwidth tends to reduce fidelity. For instance, if one uses 20 items to measure five specific constructs [e.g., 4 items for each construct], the reliability [fidelity] of each of those 4-item scales will be lower than if all 20 items were devoted to measuring a single construct of interest. Similarly, all other conditions being equal, the reliability of the broadband, factorially complex, 20-item composite should also be lower than that of a narrowband, unifactorial, 20-item scale.
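The arithmetic behind this example can be sketched with the standardized (Spearman-Brown) reliability formula, which expresses the reliability of a composite as a function of its length and mean inter-item correlation. The inter-item correlations below (.30 within a construct, .10 between constructs) are hypothetical values chosen only for illustration; they are not estimates from this study.

```python
def standardized_reliability(mean_r: float, n_items: int) -> float:
    """Standardized alpha (Spearman-Brown form) for n_items items
    whose average inter-item correlation is mean_r."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)

# Hypothetical inter-item correlations (assumptions, not study data).
R_WITHIN = 0.30   # items measuring the same narrow construct
R_BETWEEN = 0.10  # items measuring different narrow constructs

# One narrowband 4-item scale.
rel_4 = standardized_reliability(R_WITHIN, 4)

# A narrowband, unifactorial 20-item scale.
rel_20_narrow = standardized_reliability(R_WITHIN, 20)

# A broadband 20-item composite built from five 4-item clusters:
# 30 within-cluster item pairs and 160 between-cluster pairs (190 total).
within_pairs = 5 * (4 * 3 // 2)
total_pairs = 20 * 19 // 2
mean_r_broad = (within_pairs * R_WITHIN
                + (total_pairs - within_pairs) * R_BETWEEN) / total_pairs
rel_20_broad = standardized_reliability(mean_r_broad, 20)

print(f"4-item narrow scale:       {rel_4:.2f}")
print(f"20-item broad composite:   {rel_20_broad:.2f}")
print(f"20-item narrow scale:      {rel_20_narrow:.2f}")
```

Under these assumed values the ordering matches the verbal argument: the 4-item scale is least reliable, the factorially complex 20-item composite is intermediate, and the unifactorial 20-item scale is most reliable.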

A slightly different relationship between bandwidth and fidelity may obtain in the case of personality tests measuring higher and lower order factors corresponding to broad and narrow traits []. Broad traits corresponding to higher order factors are measured by wideband composites; these composites are themselves composed of narrowband scales for specific, lower order traits that load on the higher one. In this case, the broadband composite actually has higher reliability, or fidelity, compared to narrow-band scales for specific traits because [a] the broadband scale contains far more items than any of the narrower bandwidth scales and [b] all of the narrower bandwidth scales are highly intercorrelated. Thus, in hierarchical personality inventories, increasing bandwidth from a specific to a broad trait tends to increase fidelity as well, a situation slightly different than the inverse relationship between the two originally articulated by .

Despite the reliability advantage of broadband composites, the narrow traits in hierarchical personality inventories are important for at least three reasons. First, the higher reliability of broad trait scales means only that a general, multidimensional construct has been measured with greater precision. What do the resulting scores mean? They summarize people’s standing on a multifaceted trait but may well obscure important individual differences across that trait’s specific facets. Consider Neuroticism, which, on the NEO-PI-R, is measured by six facets: depression, anxiety, angry hostility, self-consciousness, impulsiveness, and vulnerability. Different individuals may all obtain Neuroticism scores at the 50th percentile through markedly different patterns of facet scores []. One person may score high on angry hostility and impulsiveness and low on self-consciousness and depression. A second may score high on depression and self-consciousness but low on impulsiveness and angry hostility. A third may score in the 50th percentile on all facets of Neuroticism. Broadband Neuroticism scores mask such important phenotypic variation, and may lead to accordingly imprecise results and interpretations.

A second reason lower order, narrow traits are important was highlighted by Ashton, who argued that the greater reliability of broadband trait scales does not always mean they have better predictive power. Although it is commonly held that reliability places an upper limit on validity [], this is a theoretical notion, and high reliability may be outweighed by another important consideration, as Ashton noted:

Despite this increase in reliability, it does not follow that the broad composite scale must be a more effective predictor of a given criterion than are all of its constituent subscales. The question to be asked is whether or not the improved reliability derived by aggregating the subscales provides a gain in validity that outweighs the loss in validity due to the dilution of variance specific to certain subscales which relate to the criterion of interest. [p. 289]

In other words, imprecision in predictive analyses can arise because frequently only some but not all aspects of a broad trait are correlated with the criterion of interest. Figure 1 illustrates a set of personality items aggregated alternatively into a few broadband, multidimensional trait composites or several narrower bandwidth composites within each domain. In this diagram, only one specific dimension of the broad trait composite shares variance with the criterion measure. This is an oversimplified case because often other dimensions of the broadband composite will correlate to some degree with a criterion. However, in the simplified case of Figure 1, the broadband composite will correlate less with the criterion than the relevant specific composite because the broadband trait has a lower ratio of shared to unshared variance with the criterion. In other words, items or factors in the broadband composite that are unrelated to the outcome of interest [i.e., irrelevant information] will attenuate the correlation. Empirical results support this [; ; , ; ; cf. also ].
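The dilution described here follows from the standard formula for the correlation between a unit-weighted sum and an outside variable. The sketch below assumes three standardized subscales with equal intercorrelations, only one of which correlates with the criterion; all numeric values are hypothetical and chosen for illustration.

```python
import math

def composite_criterion_r(subscale_rs, r_between):
    """Correlation between a criterion and the unit-weighted sum of k
    standardized subscales.

    subscale_rs: each subscale's correlation with the criterion.
    r_between:   common correlation among the subscales (assumed equal).
    """
    k = len(subscale_rs)
    # Variance of a sum of k standardized variables with equal intercorrelation.
    composite_var = k + k * (k - 1) * r_between
    return sum(subscale_rs) / math.sqrt(composite_var)

# Hypothetical values: only the first subscale relates to the criterion.
subscale_rs = [0.30, 0.00, 0.00]
r_between = 0.40

r_narrow = max(subscale_rs)
r_broad = composite_criterion_r(subscale_rs, r_between)

print(f"relevant narrow subscale: r = {r_narrow:.2f}")
print(f"broadband composite:      r = {r_broad:.2f}")
```

In this simplified case the broadband composite correlates about .13 with the criterion, compared with .30 for the relevant subscale, because the two irrelevant subscales add variance to the composite without adding covariance with the criterion.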

FIGURE 1

The same set of personality items may be aggregated into a smaller number of broad, general traits or a greater number of more specific, narrow traits. Often only portions of variance within broad traits [i.e., a specific narrow trait subcomponent] may be relevant to a criterion of interest.

A third reason narrow traits remain important is that the exact substantive meaning of a correlation between a broad trait and a criterion may be difficult to discern because the broad trait involves several components. Interpretations of such results are perforce overly general because it is impossible to know which aspects of the broad trait are more or less related to the outcome of interest. The understanding of an initial broadband trait effect can be enhanced by examining which of its specific dimensions are more or less related to the outcome. For instance, recently supplemented their overall finding that Conscientiousness was inversely related to mortality with facet-level analyses that indicated that it was primarily the self-discipline facet that confers longevity.

These arguments for the use of narrow trait composites suggest that NEO-FFI subcomponents represent a potentially important tool for users of the NEO-FFI. However, their replicability and psychometric characteristics in cross-validation samples remain largely unknown. With the exception of examination of norms and test-retest reliability in an Australian sample, little work has built on initial efforts. In this study, I pursued four primary aims in this respect. First, I assessed the degree to which the item clusters could be recovered in two independent samples of varying age and demographics via multiple-group confirmatory factor analysis [CFA]. Saucier also expressed concern about the imbalance between positively and negatively keyed items within item clusters. Therefore, my second objective was to assess whether item keying explained additional covariance among items within each domain beyond that attributable to the subcomponents. This was supplemented by examining the subcomponents’ differential correlations with a measure of socially desirable responding. Third, Saucier noted small reliability shrinkages in his cross-validation sample using coefficient alpha. Reliability assessment therefore constituted another aim of the study. Consistent with recent suggestions that alpha alone presents a limited picture of internal consistency [], I used three different internal consistency estimates varying in their psychometric assumptions to describe the subcomponents: alpha [α]; rho [ρ]; and , ] dimension free, lower bound, reliability estimate, theta [θ]. Finally, a fourth aim was to determine whether specific subcomponents either [a] correlated higher than broad domains with a small set of criterion items or [b] showed differential criterion correlations, thereby illuminating which elements of a broad domain drive criterion correlations.

I selected items to represent criteria for which narrow traits might vary in importance as well as criteria highly salient to each sample’s life stage. Items for Sample 1, young adult university students, included the extent to which, over the last semester, participants had socialized with the other sex, felt depressed, anxious, or contemplated suicide. I selected these items to reflect psychosocial conflict of intimacy versus isolation and well-known symptoms of college maladjustment []. I hypothesized the following subcomponent-criterion correlations to exceed those of other subcomponents within the domain and of the broad domain score: sociability and time spent socializing with the opposite sex, depression and amount of depressed mood reported over the semester, self-reproach and thoughts of suicide over the semester, and anxiety and the extent of anxious feelings reported over the prior semester.

Items for Sample 2, middle-age community adults, reflected satisfaction in domains salient to midlife from the perspective of ; see also ] midlife transitions: life satisfaction, marital satisfaction, job satisfaction, and engagement with creative or artistic pursuits. I hypothesized the following subcomponent-criterion correlations to exceed those of the criterion with other subcomponents within the domain or the broad domain score: positive affect and life satisfaction, goal striving and job satisfaction, prosocial orientation and marital satisfaction, and aesthetic interests and artistic hobbies. Finally, the revised University of California, Los Angeles [UCLA] Loneliness Scale [] served as a criterion measure for both samples. Social relationships remain important throughout the life course but serve different ends at different points in life. Because younger adults tend to cultivate a broad array of relationships in the service of knowledge acquisition goals [], I hypothesized that intellectual interests and sociability would be stronger [negative] predictors of loneliness than other subcomponents and domain scores in Sample 1. By midlife, Carstensen’s theory suggests that social relationships begin to function more as a means of emotion regulation; and thus, in Sample 2, depression and lower positive affect were hypothesized to predict loneliness better than other subcomponents or broad domains. I used both standard correlations and correlations corrected for attenuation due to measurement error [CAME; cf. ] in these analyses to gauge the effect on criterion correlations of probable decreased subcomponent reliability due to shorter length.
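For readers unfamiliar with the correction for attenuation, a minimal sketch of the classical formula follows: the observed correlation is divided by the square root of the product of the two measures' reliabilities. The input values are hypothetical, and the function name is illustrative rather than taken from any statistical package.

```python
import math

def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Classical disattenuated correlation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical example: a short subcomponent (reliability .66) correlating
# .30 with a criterion scale whose reliability is assumed to be .90.
r_observed = 0.30
r_corrected = correct_for_attenuation(r_observed, rel_x=0.66, rel_y=0.90)

print(f"observed r = {r_observed:.2f}, corrected r = {r_corrected:.2f}")
```

Because the correction grows as reliability falls, it tends to raise the correlations of short subcomponents more than those of longer domain scales, which is why it is useful for separating the effect of scale length from the effect of content.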

METHOD

Participants and Procedure

Sample 1 consisted of 308 young adults drawn from large introductory psychology classes [74% women; age M = 20, SD = 2.2; 66% White]. Sample 2 consisted of 256 middle-age adults drawn from the community [62% women; age M = 49, SD = 5.4; 76% White]. Approximately 70% of Sample 2 was recruited through contacts in undergraduate classes using a snowball technique in which students provided surveys to their parents, who then dispensed surveys to middle-age friends. The remainder of Sample 2 was recruited through churches and a hospital, again using a snowball technique. The middle-age sample contained proportionally fewer women and more Anglo-Americans and was also more educated than the college sample [15.23 years vs. 13.58 years]. Participants confidentially completed a battery of measures for a large-scale project on personality, intelligence, emotional intelligence, and social adjustment in young adulthood and middle age []. A total of 305 young adults and 209 midlife adults provided usable NEO-FFI data. I reverse scored all negatively keyed items prior to analyses. In an earlier article on emotional intelligence, reported means and standard deviations of NEO-FFI domains and the Revised UCLA Loneliness Scale for 291 of the young adult sample.

Other Measures

To assess social desirability, participants completed the 10-item version of the Marlowe-Crowne Social Desirability Scale []. Criterion variables in each sample consisted of the UCLA Loneliness Scale, Revised [], and single item measures from brief surveys. Three items asked middle-aged participants to rate their overall life satisfaction, marital satisfaction, and job satisfaction on a 5-point Likert scale ranging from 1 [not at all satisfied] to 5 [very satisfied]. Another item asked them to report the degree to which they were active in artistic pursuits such as music, painting, or writing as either a producer or consumer on a scale ranging from 1 [no involvement at all] to 5 [frequent involvement]. Four items were adapted for the young adults from the College Adjustment Rating Scales [CARS; , ] to assess the frequency with which young adults had engaged in certain behaviors or experienced certain problems. One behavioral item asked how often in the past semester participants had socialized with the other sex; and three mood items asked how frequently over the past semester they had felt anxious, felt depressed, or contemplated suicide. Response format ranged from 1 [not at all] to 5 [quite a bit]. The use of single-item criterion measures is common in survey research methodology and is consistent with prior bandwidth-fidelity research [cf. ; , ].

Overview of Analyses

I included only participants with complete data in the analyses, which proceeded in several steps. CFAs modeled the item clusters as oblique factors within each NEO-FFI domain separately, with factor variances fixed to unity for scaling purposes and all item loadings, residuals, and factor covariances freely estimated [cf. ]. I estimated models from polychoric correlation matrices, which approximate the continuous distributions underlying response scales involving ordinal categories [] and have been suggested for use with the ordinal response scales of personality instruments [Panter, Swygert, Dahlstrom, & Tanaka, 1997]. I utilized maximum likelihood [ML] robust estimation [] to produce a chi-square fit statistic scaled to correct for departures from multivariate normality [the Satorra-Bentler Chi-Square].

CFA models proceeded in the following order. First, I tested models including only latent trait factors for each domain separately in each sample. Then I tested a second set of models in which each item loaded on a corresponding trait factor and on a secondary method factor representing common variance among either the positively or negatively keyed items of each domain [cf. ]. Figure 2 depicts the parameterization of this model for the Conscientiousness item clusters. I evaluated the extent to which these models fit the data better than models without method factors with the Akaike information criterion [AIC; ], which can be used to compare nonnested factor models [lower values indicate better fit]. Finally, in a third series of nested models, I assessed the subcomponents’ measurement invariance across age groups. These models incrementally constrained trait factor loadings, method factor loadings, residual variances, and factor covariances [cf. ; ] to ascertain whether items functioned differently across age groups. I gauged fit degradations by use of the Satorra-Bentler [] Chi-Square Difference Test only in cases in which baseline models had yielded nonsignificant chi-squares []. In all other cases, I made nested comparisons on the basis of the comparative fit index [CFI] difference test suggested by ; e.g., CFI changes greater than .01].
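The two comparison rules described in this paragraph can be summarized in a short sketch: the model with the lower AIC is preferred among nonnested models, and a constrained model is treated as fitting meaningfully worse when its CFI drops by more than .01. The function names and the fit values passed to them are hypothetical and serve only to illustrate the decision logic.

```python
def prefer_by_aic(aic_a: float, aic_b: float) -> str:
    """Return which of two (possibly nonnested) models has the lower AIC."""
    if aic_a == aic_b:
        return "tie"
    return "A" if aic_a < aic_b else "B"

def cfi_fit_degraded(cfi_baseline: float, cfi_constrained: float,
                     threshold: float = 0.01) -> bool:
    """Flag a constrained model when CFI falls by more than the threshold."""
    return (cfi_baseline - cfi_constrained) > threshold

# Hypothetical fit values, not taken from Table 2.
choice = prefer_by_aic(aic_a=12.5, aic_b=-20.3)   # A: traits only, B: traits + method
small_drop = cfi_fit_degraded(0.950, 0.945)       # drop of .005
large_drop = cfi_fit_degraded(0.950, 0.935)       # drop of .015

print(f"preferred model: {choice}")
print(f"CFI drop of .005 flagged: {small_drop}")
print(f"CFI drop of .015 flagged: {large_drop}")
```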

FIGURE 2

In this trait-method model, variance in the 12 items of the NEO-FFI Conscientiousness domain is partitioned into three latent trait factors, two method factors corresponding to the direction of item keying, and residual or item-specific variance.

recommended that the root mean square error of approximation [RMSEA], an absolute fit index, be given the heaviest weight in assessing models based on personality data because it is less affected by violations of multivariate normality typically seen in personality data. Similarly, after extensive simulation studies intended to reflect conditions typical of personality data [i.e., simple structure violations, non-normality, smaller sample sizes], also suggested the RMSEA as a trustworthy index of model fit. The RMSEA was supplemented with the CFI, a commonly used and well-supported incremental fit index. RMSEA values below .05 indicate good fit, .05 to .08 acceptable fit, and .10 or above poor fit []; whereas CFI values greater than .90 are generally considered indicative of adequate fit [].
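The cutoffs just listed can be written as a small classification helper. This is a sketch of the stated rules only; the text assigns no label to RMSEA values between .08 and .10, so the function reports that range as unlabeled rather than guessing. The function names are illustrative.

```python
def rmsea_label(rmsea: float) -> str:
    """Classify an RMSEA value using the cutoffs stated in the text."""
    if rmsea < 0.05:
        return "good"
    if rmsea <= 0.08:
        return "acceptable"
    if rmsea >= 0.10:
        return "poor"
    # The text gives no label for values between .08 and .10.
    return "unlabeled (between .08 and .10)"

def cfi_adequate(cfi: float) -> bool:
    """CFI values greater than .90 are treated as adequate fit."""
    return cfi > 0.90

for value in (0.048, 0.065, 0.086, 0.110):
    print(f"RMSEA {value:.3f}: {rmsea_label(value)}")
print(f"CFI .93 adequate: {cfi_adequate(0.93)}")
```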

The reliability analyses also proceeded in multiple steps. Because , demonstrated that Cronbach’s α underestimates reliability in composites that are not essentially tau equivalent [i.e., in which items differ in amount of true score variance, regardless of intercept], I first tested each subcomponent for essential tau equivalence. These tests consisted of specifying a simple one-factor model in which item loadings were free to vary and then a similar model in which I constrained all item loadings to equality [cf. ]. Significant decreases in fit of the more restrictive model, which I identified in the same manner as previously, identified scales that did and did not meet the assumption of essential tau equivalence. Next, I compared three reliability coefficients for each subcomponent in each sample: Cronbach’s α; reliability coefficient ρ, which is derived from a unidimensional latent-variable model but does not assume essential tau equivalence among items; and , ] dimension-free, lower bound estimate of reliability θ, which does not even assume that a set of items is congeneric [i.e., multiple factors may characterize the item set].
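The relationship between α and ρ can be illustrated with a short sketch. It assumes standardized items following a one-factor model, so that each inter-item correlation equals the product of the two loadings and each residual variance equals one minus the squared loading; the loadings are hypothetical. Under these assumptions the two coefficients coincide when loadings are equal (essential tau equivalence) and α falls below ρ when they differ. The dimension-free estimate θ is not shown.

```python
def coefficient_rho(loadings):
    """Composite reliability from standardized one-factor loadings:
    (sum of loadings)^2 / [(sum of loadings)^2 + sum of residual variances]."""
    true_var = sum(loadings) ** 2
    residual_var = sum(1 - l ** 2 for l in loadings)
    return true_var / (true_var + residual_var)

def cronbach_alpha_from_loadings(loadings):
    """Cronbach's alpha for standardized items whose correlations are
    implied by a one-factor model (r_ij = loading_i * loading_j)."""
    k = len(loadings)
    sum_cov = sum(loadings[i] * loadings[j]
                  for i in range(k) for j in range(i + 1, k))
    total_var = k + 2 * sum_cov          # variance of the summed score
    return (k / (k - 1)) * (1 - k / total_var)

# Hypothetical loadings (assumptions, not estimates from this study).
equal = [0.6, 0.6, 0.6]      # essentially tau-equivalent items
unequal = [0.8, 0.6, 0.4]    # congeneric items with unequal loadings

alpha_equal = cronbach_alpha_from_loadings(equal)
rho_equal = coefficient_rho(equal)
alpha_unequal = cronbach_alpha_from_loadings(unequal)
rho_unequal = coefficient_rho(unequal)

print(f"equal loadings:   alpha = {alpha_equal:.3f}, rho = {rho_equal:.3f}")
print(f"unequal loadings: alpha = {alpha_unequal:.3f}, rho = {rho_unequal:.3f}")
```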

Following reliability analyses, I calculated means, standard deviations, and subcomponent intercorrelations for each sample because adult and college norms for the NEO-FFI domains are different [; see also ]. Finally, I tested criterion correlations of NEO-FFI broad domain scores in each sample. To reduce Type I error rate, I only considered correlations significant at .01. For each significant domain correlation, I then examined the pattern of subcomponent correlations. I then corrected all correlations for measurement error. I conducted analyses in SPSS 10.0 and EQS 6.1.

RESULTS

Subcomponent Replication

Table 2 depicts the fit indexes for factor models of each NEO-FFI domain including the final multigroup model in which I constrained all possible item loadings, residuals, and factor covariances to equality across groups. In general, the most restrictive models specifying only latent traits evidenced adequate fit, according to the RMSEA [e.g., .048 for the Conscientiousness subcomponents to .086 for the Extraversion subcomponents, both in the middle-aged sample]. However, these values suggested that additional interitem covariance remained unexplained by models including only latent trait factors. When I added positive and negative item-keying method factors, RMSEAs decreased considerably. Substantial decreases in the AIC values also indicate that models including method factors provided a better fit. In general, the maximally constrained multigroup models with method factors provided the most general representation of the subcomponents as evidenced by their largely metric invariance, good absolute fit, and low AICs relative to less constrained multigroup models.

TABLE 2

Fit Indexes for Confirmatory Factor Models of NEO–Five Factor Inventory [NEO–FFI] Subcomponents by Domain

NEO-FFI Domain/Factor Model | SB χ2 | df | p | CFI | RMSEA | AIC
Neuroticism [scoring alternative 1]: Self-reproach, negative affect
Young adults, trait factors only | 98.74 | 53
