These analyses of the HLQ in disparate settings using, largely, the Bayesian approach to structural equation modelling have provided a rigorous assessment of its psychometric properties in a sample of clients of a diverse group of community agencies. The principal goal of the paper was to contribute to the development of a sound evidence base for the valid use of the HLQ in community health settings. This goal was to be addressed by replicating the homogeneity, reliability and 9-factor structure of the HLQ scales for use in this setting, investigating further the discriminant validity of the scales, and establishing their measurement invariance across a diverse range of organisations and salient sociodemographic variables. These specific aims are addressed in turn in the following paragraphs.
When a small variance Bayesian prior was used to allow modest correlations among the item residuals, single factor CFA models for all HLQ scales were found to fit the data very well, thus establishing a satisfactory level of scale homogeneity. Additionally, the composite reliability of every scale was >0.8, even though each scale comprises only between 4 and 6 items.
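As an illustration of how a composite reliability of this size can arise from a short scale, the following minimal Python sketch computes the coefficient from standardised factor loadings. It assumes a single-factor model with uncorrelated residuals, which is simpler than the models fitted in this study (these allowed small residual correlations). The function name and the loadings are hypothetical and are not HLQ estimates.

```python
def composite_reliability(loadings):
    """Composite reliability for a one-factor model with standardised loadings.

    Assumption: residuals are uncorrelated, so each item's residual
    variance is 1 - loading**2.
    """
    if not loadings:
        raise ValueError("at least one loading is required")
    true_var = sum(loadings) ** 2                      # variance due to the factor
    error_var = sum(1.0 - l ** 2 for l in loadings)    # summed residual variances
    return true_var / (true_var + error_var)


# Hypothetical standardised loadings for a 5-item scale (not HLQ estimates).
example_loadings = [0.70, 0.75, 0.80, 0.72, 0.78]
cr = composite_reliability(example_loadings)
print(round(cr, 3))
```

With loadings of this magnitude, five items are enough for the coefficient to exceed 0.8; if residual correlations were modelled, their covariances would also need to enter the denominator.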
A 9-factor model using small variance Bayesian priors for both cross-loadings and residual correlations similarly fitted the data very well, thus replicating the hypothesised factor structure. All statistically significant cross-loadings were ≤0.25 and lower than their associated target loading. This model was also used to investigate the discriminant validity of the scales. Comparing the inter-factor correlations of each pair of HLQ scales to the average variance extracted (AVE) by each scale in the pair clearly established the discriminant validity of 6 of the HLQ scales: all ‘agree/disagree’ scales and (even though its AVE was relatively low) Scale 9 from the ‘cannot do/very easy’ group (Understanding health information well enough to know what to do). The three other ‘cannot do/very easy’ scales, however, did not show sufficient discriminant validity to establish a clear psychometric distinction between the constructs. The suggestion was made in the HLQ development paper that a higher-order factor may explain the relatively higher correlations between some of the ‘cannot do/very easy’ scales. The cluster of scales with high inter-factor correlations in the present analysis supports this view. All items in Scales 6, 7 and 8 broadly connote a proactive approach to interactions with the healthcare system in relation to contact and collaboration with healthcare providers, navigating the system and obtaining information.
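The comparison of inter-factor correlations with AVE described above can be sketched as follows. The sketch assumes the Fornell-Larcker form of the rule, in which the squared inter-factor correlation must be smaller than the AVE of both scales in the pair; the exact decision rule applied in the study is not stated in this section. All function names, loadings and correlations are hypothetical.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardised loadings of a scale."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def shows_discriminant_validity(loadings_a, loadings_b, factor_corr):
    """Assumed Fornell-Larcker check for one pair of scales.

    The variance shared by the two factors (squared correlation) must be
    smaller than the AVE of each scale.
    """
    shared_variance = factor_corr ** 2
    return (average_variance_extracted(loadings_a) > shared_variance
            and average_variance_extracted(loadings_b) > shared_variance)


# Hypothetical standardised loadings for two scales (not HLQ estimates).
scale_x = [0.70, 0.75, 0.80, 0.72, 0.78]
scale_y = [0.65, 0.70, 0.72, 0.68]

moderate = shows_discriminant_validity(scale_x, scale_y, factor_corr=0.60)
high = shows_discriminant_validity(scale_x, scale_y, factor_corr=0.85)
print(moderate, high)
```

In this hypothetical case a correlation of 0.60 leaves both AVE values above the shared variance, whereas a correlation of 0.85 does not, which mirrors the situation of a cluster of highly correlated scales.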
Notwithstanding these psychometric indications of insufficient discriminant validity of some scales, extensive field work, clinical interactions, and epidemiological work continue to support the application of the scales as independent indicators of a broad range of personal and social dimensions of health literacy. A recent epidemiological report showed different patterns of association of the HLQ scales with a number of important socio-demographic variables, while studies across three groups of South African residents living in an informal settlement outside Cape Town indicated somewhat different scale scores, and the items and scales were very meaningful to local clinicians and researchers. Additionally, numerous clinical consultations using each scale as part of the Ophelia process indicate that the content of individual scales provides separate and useful information.
Overall, the data from this study and concurrent field work clearly show that HLQ scales measure different concepts. The inter-factor correlations indicate that some scales, namely Scales 6, 7 and 8, are highly correlated. This suggests that either a higher-order factor or underlying causal connections in specific population groups might be present. In some settings high correlations can mean construct overlap and that the items might be best combined. This is unlikely to be the case here for several reasons: (a) the item content is underpinned by the results of concept-mapping that clearly differentiated distinct constructs; (b) the scales tend to be associated differently with important exogenous variables; and (c) clinical and health promotion groups have carefully considered potential interventions related to these scales and quite different interventions have been proposed. Factor analysis does not necessarily fully resolve issues associated with the conceptual structure of psychological measures. The logic of construct validation requires consideration of both the internal structure of a measuring instrument and its relationships with theoretically relevant exogenous variables (its ‘nomological network’). Differentiation both within the structure of a multi-scale measure and between the individual scales of the multi-scale measure and theoretically salient variables in the nomological network provides accumulating evidence to support discriminant validity in varying contexts. The wide variety of studies using the HLQ that are underway will continue to expand the evidence base for the discriminant validity of the nine scales in specific contexts.
Measurement invariance of the HLQ was investigated one scale at a time. A very strict measurement invariance model was studied in which all factor loadings, item intercepts, factor variances and item residual covariances were fixed to equality across groups and which also resulted in equality of item residual variances. When non-invariance was evident, a follow-up alignment optimisation analysis was performed to establish more fully the nature of the non-invariance. All ‘disagree/agree’ scales were found to be fully invariant across the gender, age, educational level and language background of the respondents, as well as across the organisations of which they were clients. Measurement invariance of Scales 6, 7, 8 and 9 was less well established. All four of these ‘cannot do/very easy’ scales were invariant across gender. Scale 6 (Ability to actively engage with healthcare providers) was, however, not fully invariant across organisation; Scales 7 and 9 (Navigating the healthcare system, Understanding health information well enough to know what to do) were not fully invariant across education and organisation; and Scale 8 (Ability to find good health information) was not fully invariant across education, home language and organisation. The follow-up alignment analysis indicated, however, that all non-invariance detectable by this method was metric (non-invariance of factor loadings) rather than scalar (non-invariance of item intercepts). A recent simulation study has shown that scalar non-invariance is a much more important source of bias than metric non-invariance when composite scale scores are compared across groups by, for example, ANOVA and ‘t’ tests.
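The mechanism behind the differing impact of metric and scalar non-invariance on composite scores can be illustrated with the model-implied item means of a linear factor model, in which the expected score on an item equals its intercept plus its loading multiplied by the factor mean. The sketch below is a deterministic illustration under that assumption and does not reproduce the cited simulation study; all parameter values and names are hypothetical, and the relative size of the two effects depends on the values chosen.

```python
def expected_composite_mean(intercepts, loadings, factor_mean):
    """Model-implied mean of the equally weighted item average.

    Assumes a linear factor model: E[item] = intercept + loading * factor_mean.
    """
    item_means = [nu + lam * factor_mean
                  for nu, lam in zip(intercepts, loadings)]
    return sum(item_means) / len(item_means)


# Hypothetical reference-group parameters for a 5-item scale.
ref_intercepts = [3.0, 3.1, 2.9, 3.0, 3.0]
ref_loadings = [0.6, 0.7, 0.8, 0.7, 0.7]
latent_mean = 0.5  # identical in both groups, so any difference below is bias

reference = expected_composite_mean(ref_intercepts, ref_loadings, latent_mean)

# Metric non-invariance: the first loading differs in the comparison group.
metric = expected_composite_mean(ref_intercepts,
                                 [0.9, 0.7, 0.8, 0.7, 0.7], latent_mean)

# Scalar non-invariance: the first intercept differs in the comparison group.
scalar = expected_composite_mean([3.5, 3.1, 2.9, 3.0, 3.0],
                                 ref_loadings, latent_mean)

metric_bias = metric - reference
scalar_bias = scalar - reference
print(round(metric_bias, 3), round(scalar_bias, 3))
```

A shift in an intercept passes directly into the composite mean, whereas a shift in a loading is scaled by the factor mean and vanishes when that mean is zero.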
From the perspective of the causal interpretation of the factor model, factor loadings are interpretable as validity coefficients [32, 39] in that they represent the “direct structural relation” between the latent variable and the indicator (p. 197). Non-invariance of factor loadings therefore reflects variation in the validity of the item as a measure of the latent construct in particular population sub-groups. In most instances this variation in HLQ item validity is readily interpretable. For example, the item ‘Get health information … you understand’ was found to have a higher factor loading in the group where English was not typically spoken at home, suggesting that ‘understanding’ health information has enhanced validity as an indicator of the ‘Ability to find good health information’ compared with the other factor indicators for this specific population group. Similarly, two items (‘Ask healthcare providers questions …’ and ‘Work out what the best care is …’) had comparatively enhanced validity for respondents from municipal community services, whose clients may, generally, have had less familiarity and ease engaging with health practitioners. In contrast, both items that referred to understanding information delivered by healthcare providers had lower validity for clients of a domiciliary nursing service, where regular contact with a specific provider may have reduced the salience of these items in comparison with the other items in the scale that referred to written health information. Such variations in item validity will be an important consideration for group comparisons if items are weighted in relation to their factor loadings in the generation of composite scores, but will be of limited concern if items are equally weighted, as is typical with HLQ scoring.
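The contrast between equally weighted and loading-weighted composite scores can be made concrete with a short sketch. It assumes that equal weighting takes the form of a simple item mean and that loading weights are normalised to sum to one; the responses, loadings and function names are hypothetical and are not taken from the HLQ data.

```python
def unit_weighted_score(responses):
    """Equally weighted composite: the simple mean of the item responses."""
    return sum(responses) / len(responses)


def loading_weighted_score(responses, loadings):
    """Composite in which each item is weighted by its factor loading.

    Weights are normalised so the score stays on the item response metric.
    """
    total_weight = sum(loadings)
    return sum(r * w for r, w in zip(responses, loadings)) / total_weight


# One hypothetical response pattern on a 5-item scale.
responses = [4, 3, 5, 2, 4]

# Hypothetical group-specific loadings that differ only on the first item.
loadings_group_a = [0.6, 0.7, 0.8, 0.7, 0.7]
loadings_group_b = [0.9, 0.7, 0.8, 0.7, 0.7]

equal = unit_weighted_score(responses)
weighted_a = loading_weighted_score(responses, loadings_group_a)
weighted_b = loading_weighted_score(responses, loadings_group_b)
print(equal, round(weighted_a, 3), round(weighted_b, 3))
```

The same response pattern yields one equally weighted score regardless of group, but two different loading-weighted scores, which is why metric non-invariance matters chiefly when loading-based weights are used.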
When researchers, program managers and policymakers wish to make decisions on services or program needs of specific groups from data obtained from questionnaires, measurement invariance, particularly scalar invariance, is critically important. A questionnaire that is invariant returns unbiased estimates of mean differences or similarities of groups and unbiased estimates of other associations with exogenous variables. This study has demonstrated that the HLQ scales were invariant across the great majority of population subgroups and that, where non-invariance was found, it was very likely to involve factor loadings rather than the more critical item intercepts. This suggests that unbiased estimates of health literacy differences using composite scores can be obtained to support program and policy decisions.
While this study sought to provide evidence to support the valid use of the HLQ in the community setting, it was limited to the use of the English-language HLQ and to data provided by clients of 8 organisations in one state in Australia. Additionally, while care was taken to select organisations from regions in the state with diverse sociodemographic and geographic characteristics, the healthcare organisations studied were in a sense self-selected in that they all responded positively to invitations to participate. Furthermore, while the organisations recruited for the study were encouraged to collect HLQ data from a sample that was as representative as possible of their target group, with substantial efforts to collect data from the ‘harder-to-reach’ clients, these efforts may not have been fully successful. These study characteristics potentially restrict the generalisability of the results and should be kept in mind by organisations in other regions in Australia and other English-speaking countries that intend to use the HLQ to study their client intakes. In particular, the study may have under-represented respondents with lower health literacy. It is arguable that such under-representation might influence the positive findings of measurement invariance across variables such as respondent education and home language. Accumulated experience with the use of the HLQ with difficult-to-reach client groups should assist in addressing this issue in the future.
Similarly, the present results may not be directly applicable to the use of the HLQ in other languages and cultures. Validity studies of translations of the HLQ are underway in German, Dutch, Czech, French, Spanish and other languages and, for the Danish version, have recently been published. The process of cross-cultural validation of the use, interpretations and recommendations for action derived from the questionnaire is thus underway in these other settings.
Additionally, the present study does not address the issue of the sensitivity of the HLQ to change anticipated to derive from health-literacy focussed interventions, nor does it address whether any observed change is reliable and clinically meaningful. Sensitivity studies of this kind require longitudinal data, at least baseline to follow-up, and should include preliminary investigation of the longitudinal invariance of the scales.
Finally, the length of the HLQ might be seen as a limitation when used in some settings such as clinical locations. While the HLQ itself consists of 44 items and is typically accompanied by 13 or more sociodemographic and health status questions, it was successfully administered in the present study by busy clinicians in the course of their usual clinical work. This suggests that the HLQ is written using words and concepts that respondents find straightforward to understand and that its items can be answered quite quickly. As there are 9 scales in the HLQ, the 44 items are necessary to ensure scale reliability while maintaining comprehensive coverage of the multidimensional health literacy concept. While comprehensive coverage of the health literacy construct is required in many studies, some, such as national surveys and studies seeking to answer questions about select aspects of health, may use only one or a few of the scales.