FUNCTIONAL STATUS


Functional status is usually conceptualized as the “ability to perform self-care, self-maintenance and physical activities.” Most indices of physical health, and some psychological scales, build their operational definitions of health on the concept of functioning: how far is the individual able to function normally and to carry on his or her typical daily activities? Alterations in function are commonly assessed at three sequential stages, termed impairment, disability, and handicap. Although functional status has primarily been evaluated in the general population, in chronic illness groups of various ages (e.g., arthritis), and in the community-dwelling elderly, we reviewed the literature in the Medline database using the search terms functional status and quality of life, measurement, instrument, and terminal care. The instruments reviewed here were selected solely for their potential applicability to the physical/functional assessment of patients within the last 30 days of life. Multidimensional health and quality of life instruments were evaluated by others.

I. LISTING OF POTENTIAL INSTRUMENTS

a. The PULSES Profile (Moskowitz 1957).
b. Index of Independence in Activities of Daily Living (ADL) (Katz 1963).
c. The Barthel Index (Mahoney 1958).
d. The Kenny Self-Care Evaluation (Schoening 1965; Schoening 1968).
e. The Physical Self-Maintenance Scale (Lawton 1969).
f. The Medical Outcomes Study Physical Functioning Measure (Stewart 1992).
g. A Rapid Disability Rating Scale (Linn 1982).
h. The Dartmouth COOP Functional Health Assessment Charts (Nelson 1996).
i. The Functional Status Index (Jette 1978; Jette 1980).
j. The Edmonton Functional Assessment Tool (Kaasa 1997).
k. The Self-Evaluation of Life Function Scale (Linn 1984).
l. The Functional Activities Questionnaire (Pfeffer 1982; Pfeffer 1984).
m. The Lambeth Disability Screening Questionnaire (Patrick 1981).
n. Stanford Health Assessment Questionnaire (Fries 1982).
o. FIM Instrument (Hamilton 1987).

Based on a review of the above functional assessment instruments, we selected six potential instruments for consideration, which are listed below.

II. LISTING OF SELECTED INSTRUMENTS

a. Index of Independence in Activities of Daily Living (ADL)

b. The Barthel Index

c. The Physical Self-Maintenance Scale

d. A Rapid Disability Rating Scale

e. Stanford Health Assessment Questionnaire

f. FIM™ Instrument

 

III. REVIEW OF SELECTED INSTRUMENTS

a. Index of Independence in Activities of Daily Living (ADL)

i. Conceptual and Measurement model. (Does the scale represent a single domain or do its subscales measure distinct domains? Is the variability of the scale reported? If so, please document it. What is the intended level of measurement, i.e., ordinal, interval, ratio, or categorical?)

The Index of Independence in Activities of Daily Living is an ordinal index designed to assess the physical functioning of elderly and chronically ill patients. It is also used as an indicator of the severity of chronic illness and to evaluate the effectiveness of treatment. The index gives a dichotomous rating (dependent/independent) of six ADL functions, listed in order of decreasing dependency: bathing, dressing, going to the toilet, transferring from bed to chair, continence, and feeding. Each function is first observed on a three-point scale of independence and then dichotomized. The most dependent degree of performance during a two-week period is recorded.

Empirically, the six activities included in the index were found to lie in a hierarchical order while other items, such as mobility, walking, or stair climbing, did not. The scale represents a natural progression in both the loss of ADL capacities and the return of these abilities upon recovery or rehabilitation.

ii. Reliability (Did developers address internal consistency; reproducibility?)

Little formal reliability (or validity) testing has been reported. Katz et al. assessed inter-rater reliability, reporting that differences between observers occurred once in 20 evaluations or less. Guttman analyses on 100 patients in Sweden yielded coefficients of scalability ranging from 0.74 to 0.88, suggesting that the index forms a successful cumulative scale.


iii. Validity (how did they address content validity? construct validity? criterion validity?)

Despite the widespread use of the scale, there is little evidence of its validity. Content validity: Katz presents some theoretical justification for the selection and inclusion of items on the scale. Construct validity: Katz et al. applied the index to 270 patients at discharge from a hospital for the chronically ill. Index scores correlated 0.50 with a mobility scale and 0.39 with a house confinement scale, indicating only a modest degree of validity against instruments that are themselves not well established. The index of ADL was shown to predict the long-term course and social adaptation of patients with a number of conditions, including strokes and hip fractures, and was used to evaluate out-patient treatment for rheumatoid arthritis (Katz 1964; 1966; 1968).

iv. Responsiveness (Has scale been used as an outcome measure? What populations?)

Asberg (1987) used the scale to predict length of hospital stay, likelihood of discharge home, and death (N=129). In predicting mortality, sensitivity was 73% and specificity 80%; in predicting discharge, sensitivity was 90% and specificity 63%. Similar results were obtained from ratings by physicians. Kane (1985) reports an unpublished study by Sherwood et al (1977) that showed the scale to be highly reproducible (coefficients of .948 for patients in the Worcester Home Care Study and .976 for the Fall River Sheltered Housing sample). However, such concise indices tend to be insensitive to small changes in disease severity and to focus on physical-performance measures that do not take adaptation to the environment into account.
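Sensitivity and specificity, as reported for Asberg's predictions, are simple proportions from a 2x2 table of predicted versus observed outcomes. The minimal Python sketch below shows that arithmetic. The class name `Confusion` and the counts in the usage line are hypothetical; they are not Asberg's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 table of predicted vs. observed outcomes (e.g., death)."""
    tp: int  # event predicted, event occurred
    fp: int  # event predicted, no event
    fn: int  # no event predicted, event occurred
    tn: int  # no event predicted, no event

    def sensitivity(self) -> float:
        # Share of actual events that the prediction flagged
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-events that the prediction correctly cleared
        return self.tn / (self.tn + self.fp)


if __name__ == "__main__":
    # Illustrative counts only
    table = Confusion(tp=22, fp=20, fn=8, tn=80)
    print(f"sensitivity={table.sensitivity():.2f}, specificity={table.specificity():.2f}")
```

The same two ratios underlie the sensitivity and specificity figures quoted for other instruments in this review.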

Donaldson, Wagner and Gresham (1973) evaluated 100 patients using the Kenny Self-Care measure, the Barthel Self-Care measure, and the Katz ADL measure. The patients were re-evaluated one month later. The three scales moved in parallel for 68 of the patients. Among the 32 patients with divergent scores, the expected hierarchy of sensitivity prevailed: the Kenny was most sensitive to change, followed by the Barthel Index (see below) and then by the Katz. Most of the 8 unexpected patterns were explained by the fact that the Kenny Index does not include continence.

v. Interpretability (What populations has it been applied to? Is the score translated into a clinically relevant event? Does the score predict outcome events?)

The patient’s overall performance is summarized on an eight-point scale that considers the number of areas of dependency and their relative importance. An alternative system counts the number of activities in which the individual is dependent, on a scale from 0 through 6, where 0 = independent in all six functions and 6 = dependent in all functions. Although the eventual rating is dichotomous, the form on which observations are made distinguishes those able to perform the activity with human help; for scoring, however, only those who can perform it without human help are rated as independent. The scale has also been adapted as a Likert-type scale, with each item assigned points according to a defined decision rule (e.g., 0 = no help needed; 1 = uses a device; 2 = needs human assistance; 3 = completely dependent). The sum of all items is then used to describe ADL performance (Kane, 1985).
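Both the count-of-dependencies system and the Likert-type variant are simple tallies, and the Python sketch below illustrates each one. The function names and item keys are hypothetical, and the labels in `LIKERT_POINTS` simply restate the example decision rule above. The eight-point grading system is not implemented, because its rules are not given here.

```python
from typing import Mapping

# Hypothetical keys for the six Katz functions
KATZ_FUNCTIONS = ("bathing", "dressing", "toileting", "transferring", "continence", "feeding")

# Example Likert-type decision rule (Kane, 1985); wording varies between adaptations
LIKERT_POINTS = {
    "no help needed": 0,
    "uses a device": 1,
    "needs human assistance": 2,
    "completely dependent": 3,
}


def _check(ratings: Mapping[str, object]) -> None:
    # Refuse to score when any of the six functions is unrated
    missing = [f for f in KATZ_FUNCTIONS if f not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")


def dependency_count(independent: Mapping[str, bool]) -> int:
    """Count-of-dependencies score: 0 = independent in all six, 6 = dependent in all."""
    _check(independent)
    return sum(1 for f in KATZ_FUNCTIONS if not independent[f])


def likert_total(ratings: Mapping[str, str]) -> int:
    """Sum of per-item points under the example rule (range 0 to 18)."""
    _check(ratings)
    return sum(LIKERT_POINTS[ratings[f]] for f in KATZ_FUNCTIONS)


if __name__ == "__main__":
    patient = {f: True for f in KATZ_FUNCTIONS}
    patient.update(bathing=False, dressing=False)
    print(dependency_count(patient))  # dependent in bathing and dressing
```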

Scoring of the Katz version has been criticized by Chen and Bryant (1975) and Kane (1985). Scores on ADL are usually based on the degree of independence attained for each function. Decision rules need to be established for subjects who do not perform the activity or have no opportunity to perform it. In addition, Williams et al. (1976) questioned whether the Index actually functions as a Guttman scale.

vi. Burden (time/cost of administration)

A short, six-item rating scale. Time of administration is not reported.

vii. Alternate Forms (What are the modes of administration? Alternatives? If any, what is known for each of the above on the alternate versions?)

Katz’s version or some variant has been used in the ill elderly, stroke patients, polio patients, rehabilitation patients, rheumatoid arthritis patients, individuals with Down’s syndrome, home care patients, and hip fracture patients, to name a few.

viii. Cultural and Language Adaptations

None reported.

ix. Conclusion

The measure is most appropriate for severely ill patients, since minor disability frequently does not translate into the limitations in basic activities of daily living covered by the scale. It is not suitable for health surveys or general practice, as it is not sensitive to minor deviations from complete well-being. Issues specific to institutional settings can be addressed in the response set; for example, PACE II for nursing homes includes the response “against nursing home policy” for items such as bathing. It is a useful index for a restricted range of patients, although the range of disabilities it covers is not comprehensive. Reducing performance to a single index of dichotomous scores loses information about variability. Not Recommended.


b. The Barthel Index (Mahoney 1965)

i. Conceptual and Measurement model. (Does the scale represent a single domain or do its subscales measure distinct domains? Is the variability of the scale reported? If so, please document it. What is the intended level of measurement, i.e., ordinal, interval, ratio, or categorical?)

The Barthel Index is an ordinal scale that measures functional independence in the domains of personal care and mobility. It was designed to monitor the performance of chronic patients and long-term hospital patients with paralytic conditions before and after treatment. It has been used with rehabilitation patients to predict length of stay and to indicate the amount of nursing care needed. Two main versions exist: the original 10-item form and expanded 15-item versions (Granger 1979, 1984; Fortinsky 1981). Granger’s 15-item version added a four-point rating scale (intact/limited/helper required/null).

ii. Reliability (Did developers address internal consistency; reproducibility?)

10-item version: Shah reported alpha internal consistency coefficients of 0.87 to 0.92 (admission and discharge) for the original scoring system and 0.90 to 0.93 for a revised scoring system. Wartski and Green (1971) retested 41 patients after a three-week delay. For 35 patients, scores fell within ten points; of the six discrepant cases, two could be explained. Collin et al. (1988) studied agreement among four ways of administering the scale: self-report, clinical observation by an RN, testing by a nurse, and testing by a physiotherapist. Concordance among the four rating methods was 0.93 (no major disagreement for 60% of patients, disagreement on one rating for 28%, and more discrepancies for 12%). Self-report agreed least well with the other methods; agreement was lowest for items on transfers, feeding, dressing, grooming, and toileting. Roy et al. (1988) found an inter-rater correlation of 0.99 and a correlation of 0.88 with patient self-report. Sherwood et al (1977) reported high alpha reliabilities (ranging from .953 to .965) for three samples of hospital patients, suggesting that the test is internally consistent as a measure of self-care activities.

15-item version: Granger et al (1979) reported a test-retest reliability of 0.89 with severely disabled adults; inter-rater agreement exceeded 0.95. Shinar et al (1987) obtained an inter-rater agreement of 0.99 and a Cronbach’s alpha of 0.98 on 18 patients. They also compared a telephone interview with observation on 72 outpatients: total scores correlated 0.97, and rho correlations exceeded 0.85 for all but one item.
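Several reliability figures above are Cronbach's alpha coefficients. The minimal Python sketch below applies the standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), to a respondents-by-items matrix, as a reminder of what alpha summarizes. The function name and the toy data are our own.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for rows = respondents, columns = items.

    Population variances are used throughout; sample variances would scale
    numerator and denominator by the same factor, leaving alpha unchanged.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose into item columns
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    toy = [[1, 2], [2, 1], [3, 3]]  # three respondents, two items
    print(round(cronbach_alpha(toy), 3))
```

If every respondent has the same total score, the variance of totals is zero and alpha is undefined. The sketch then raises `ZeroDivisionError` instead of returning a value.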

iii. Validity (how did they address content validity? construct validity? criterion validity?)

10-item version: Wade (1987) reported validity correlations between 0.73 and 0.77 with an index of motor ability for 976 stroke patients. A factor analysis identified two factors, which approximate the mobility and personal-care groupings. Construct validity: Granger et al (1979) found that the 15-item version correlated with the PULSES Profile (-0.74 to -0.90). Wylie and White (1964) and Wylie (1967) found that the Barthel Index correlated well with clinical judgment and predicted mortality and the ability to be discharged to a less restrictive environment.

iv. Responsiveness (Has scale been used as an outcome measure? What populations?)

See above.

v. Interpretability (What populations has it been applied to? Is the score translated into a clinically relevant event? Does the score predict outcome events?)

The original ten activities cover personal care and mobility, omitting everyday tasks essential for life in the community (e.g., cooking, shopping). The original Index is a ten-item ordinal rating scale completed by a therapist or other observer in 2-5 minutes. Based on observation, each item is rated according to whether the patient can perform the task independently, with some assistance, or is dependent on help (0=unable, 1=needs help, 2=independent). An overall score is formed by adding the scores on each rating. Scores range from 0 to 100, in steps of 5, with higher scores indicating greater independence. Items are weighted, and the form includes instructions for assessing the time a subject takes to perform a task as a dimension of ability (although the utility of this score is open to question).

Scoring: different values are assigned to different activities. Individuals are scored on ten (or fifteen) activities, and the scores are summed to give a total from 0 (totally dependent) to 100 (fully independent). The scores are designed to reflect the amount of time and assistance a patient requires. However, the scoring method is inconsistent: a change of a given number of points does not reflect an equivalent change in disability across different activities.
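The weighted summation described above can be sketched in Python. The per-item maxima in `BARTHEL_MAX` follow the commonly published original ten-item form, but they are an assumption here and should be checked against the version actually in use. The item keys and the function name are hypothetical.

```python
from typing import Mapping

# Assumed per-item maximum points (commonly published ten-item form); they sum to 100.
BARTHEL_MAX = {
    "feeding": 10, "bathing": 5, "grooming": 5, "dressing": 10, "bowels": 10,
    "bladder": 10, "toilet_use": 10, "transfers": 15, "mobility": 15, "stairs": 10,
}


def barthel_total(scores: Mapping[str, int]) -> int:
    """Sum weighted item scores into a 0 (dependent) to 100 (independent) total."""
    total = 0
    for item, max_points in BARTHEL_MAX.items():
        s = scores[item]  # KeyError if an item is unrated
        # Each item is scored in steps of 5 up to its own maximum
        if s % 5 != 0 or not 0 <= s <= max_points:
            raise ValueError(f"{item}: {s} is not valid (0-{max_points} in steps of 5)")
        total += s
    return total


if __name__ == "__main__":
    patient = dict(BARTHEL_MAX, stairs=0, bathing=0)  # fully able except stairs and bathing
    print(barthel_total(patient))
```

In this example, losing the bathing item costs 5 points and losing stairs costs 10. That unequal weighting is why equal point changes need not mean equal changes in disability.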

Several authors have proposed guidelines for interpreting Barthel scores. Shah et al suggested that scores of 0-20 (on either the ten- or 15-item version) indicate ‘total’ dependency, 21-60 ‘severe’ dependency, 61-90 ‘moderate’ dependency, and 91-99 ‘slight’ dependency. Lazar et al proposed the following bands for the 15-item version: 0-19, dependent; 20-59, self-care assisted; 60-79, wheelchair assisted; 80-89, wheelchair independent; 90-99, ambulatory assisted; and 100, independent. Granger takes a score of 60 as the threshold between more marked dependence and independence. A score of 40 or below indicates severe dependence, with a markedly diminished likelihood of living in the community, and 20 or below reflects total dependence in self-care and mobility. Most studies apply the 60/61 cutting point, recognizing that the Barthel Index should not be used alone to predict outcomes.
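The interpretive bands of Shah et al and Lazar et al can be expressed as simple lookups. The Python sketch below encodes the ranges exactly as quoted. One assumption needs flagging: the Shah bands as given stop at 99, so the sketch labels a score of 100 as "independent". The function names are hypothetical.

```python
# Upper bound (inclusive) and label for each band, in ascending order
SHAH_BANDS = [(20, "total"), (60, "severe"), (90, "moderate"), (99, "slight"),
              (100, "independent")]  # assumption: 100 is not banded in the source
LAZAR_BANDS = [(19, "dependent"), (59, "self-care assisted"), (79, "wheelchair assisted"),
               (89, "wheelchair independent"), (99, "ambulatory assisted"), (100, "independent")]


def _band(score: int, bands: list[tuple[int, str]]) -> str:
    if not 0 <= score <= 100:
        raise ValueError("Barthel totals range from 0 to 100")
    for upper, label in bands:
        if score <= upper:
            return label
    raise AssertionError("unreachable: bands cover 0-100")


def shah_band(score: int) -> str:
    """Dependency level per Shah et al (ten- or 15-item version)."""
    return _band(score, SHAH_BANDS)


def lazar_band(score: int) -> str:
    """Functional category per Lazar et al (15-item version)."""
    return _band(score, LAZAR_BANDS)


if __name__ == "__main__":
    for s in (15, 60, 85, 100):
        print(s, shah_band(s), "|", lazar_band(s))
```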

vi. Burden (time/cost of administration)

It takes 5 minutes to complete.

vii. Alternate Forms (What are the modes of administration? Alternatives? If any, what is known for each of the above on the alternate versions?)

Many modifications have been made to the Barthel scale, of which two have been commonly used. Collin and Wade (1988) proposed a variant of the ten-item version that reordered the original ten items, clarified the rating instructions, and modified the scores for each item (3-point score); total scores range from 0 to 20. This version moves from a capacity rating to a performance rating. Shah et al (1989) retained the original ten items but proposed five-point rating scales for each item to improve sensitivity to change. There is no consensus on which items constitute the “definitive” ten-item version.

Granger et al (1977; 1979; 1982) proposed a 15-item version, the Modified Barthel Index, which extended the index to cover 15 topics. The 1981 Granger version uses four-point response scales for most items, with overall scores ranging from 0 to 100, and appears to be the more widely used version.

viii. Cultural and Language Adaptations

It has been translated into Japanese.

ix. Conclusion

The Barthel Index is widely used, has been refined, and has been incorporated into broader evaluation instruments such as the Long-Range Evaluation System, developed by Granger et al., and the Uniform Data System for Medical Rehabilitation. The Barthel Index should always be applied as a rating scale, because self-reports may differ from professionals’ ratings.

The major criticism of the Index concerns its scoring system. While the original scale is simple to administer and focuses on physical limitations, the middle categories of the scale are difficult to interpret and are applied inconsistently. The scoring approaches of Collin and Wade and of Shah improve on it, but a definitive scoring approach is still needed.

Moreover, the scale is restricted in that low levels of disability may not be detected, reflecting its origins as a measure for severely ill patients. Thus, while a score of 100 indicates independence in all ten areas, assistance may still be required with some IADLs, which are not included in the Barthel Index. Other scales also cover broader topics, such as communication, psychosocial, and situational factors. The 15-item adaptations hold more promise. Lawton has suggested that the original weighted scale is most useful for rehabilitation-center patients, while the Likert self-care scales are a better measure for geriatric patients. Validity data are more extensive than those for many other ADL scales, and the results appear superior to those of other scales reviewed. Further testing is required before the index can be recommended for community surveys and evaluations. However, there is literature supporting its use with specific groups of patients, such as those with neurological disability (Collin 1988; Wade 1988). While useful, the Barthel Index is not recommended, since the FIM (Functional Independence Measure) incorporates the Barthel Index.


c. The Physical Self-Maintenance Scale (PSMS) (Lawton 1969)


i. Conceptual and Measurement model. (Does the scale represent a single domain or do its subscales measure distinct domains? Is the variability of the scale reported? If so, please document it. What is the intended level of measurement, i.e., ordinal, interval, ratio, or categorical?)

The Physical Self-Maintenance Scale is a Guttman scale containing 6 items of self-care. The PSMS was designed as a disability measure for use in planning and evaluating treatment in elderly people living in the community or in institutions. The scale is based on the theory that human behavior can be ordered in a hierarchy of complexity, an approach similar to that used by Katz for the Index of ADL. The hierarchy runs from physical health through self-maintenance ADL and IADL, cognition, time use (hobbies), and social interaction. Within each category, a further hierarchy of complexity runs from basic to complex activities. The PSMS includes both ADL and IADL items.

ii. Reliability (Did developers address internal consistency; reproducibility?)

Little validity or reliability information is reported on the 8 IADL items. Rating-scale and self-administered versions of the scale have been developed. Both were developed for people over 60 years of age and concentrate on observable behaviors. The rating version of the ADL scale may be administered by a variety of professionals. Inter-rater reliability for the 6-item ADL scale was demonstrated between pairs of nurses who rated 36 patients (Pearson r=0.87) and between research assistants rating 14 patients (r=0.91). The 6 items fell on a Guttman scale when cutting points were set between independent (code 1 on each item) and all levels of dependency. The rank order of the items was feeding (77% independent), toilet (66%), dressing (56%), bathing (43%), grooming (42%), and ambulation (27%). The Guttman reproducibility coefficient was 0.96 (N=263).
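A Guttman coefficient of reproducibility compares each respondent's pass/fail pattern with the ideal cumulative pattern. The Python sketch below shows one common way to compute it: count mismatches against the perfect pattern with the same total, then take 1 - errors / total responses. Other error-counting conventions exist, and we do not know which one the PSMS developers used. The item order follows the percentages above. The respondent data are hypothetical.

```python
from typing import Sequence

# PSMS ADL items from most to least often independent, per the reported percentages
ITEM_ORDER = ("feeding", "toilet", "dressing", "bathing", "grooming", "ambulation")


def guttman_errors(pattern: Sequence[int]) -> int:
    """Mismatches between a 0/1 pattern (1 = independent, items in ITEM_ORDER) and the
    ideal cumulative pattern with the same total, i.e. the easiest `total` items passed."""
    total = sum(pattern)
    ideal = [1] * total + [0] * (len(pattern) - total)
    return sum(1 for observed, expected in zip(pattern, ideal) if observed != expected)


def coefficient_of_reproducibility(patterns: Sequence[Sequence[int]]) -> float:
    """1 - (total errors / total item responses)."""
    responses = sum(len(p) for p in patterns)
    errors = sum(guttman_errors(p) for p in patterns)
    return 1 - errors / responses


if __name__ == "__main__":
    sample = [  # hypothetical respondents
        (1, 1, 1, 0, 0, 0),
        (1, 1, 1, 1, 1, 1),
        (1, 0, 1, 0, 0, 0),  # passes dressing but fails the easier toilet item
        (0, 0, 0, 0, 0, 0),
    ]
    print(round(coefficient_of_reproducibility(sample), 3))
```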

iii. Validity (how did they address content validity? construct validity? criterion validity?)

The PSMS was tested on elderly persons, some in institutions and others living at home. It correlated moderately with a physician’s rating of functional health (r=0.62, N=130) and with an IADL scale (r=0.61, N=77). It correlated less well with the Kahn Mental Status Questionnaire (r=0.38) and with a behavioral rating of social adjustment (r=0.38).

iv. Responsiveness (Has scale been used as an outcome measure? What populations?)

Not useful for predicting outcomes.

v. Interpretability (What populations has it been applied to? Is the score translated into a clinically relevant event? Does the score predict outcome events?)

A 5-point rating scale, ranging from total independence to total dependence, is used for all 6 ADL items, which fall on a Guttman scale. There are two scoring methods: a count of the number of items on which any degree of disability is identified, or a severity scale that sums the response codes for each item, giving an overall score ranging from 6 to 30. No cutpoints were reported.
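The two PSMS scoring methods can be computed from the same six item codes. The Python sketch below assumes items are coded 1 (total independence) to 5 (total dependence), consistent with code 1 denoting independence above. The function name is hypothetical.

```python
from typing import Sequence


def psms_scores(codes: Sequence[int]) -> tuple[int, int]:
    """Return (disability count, severity sum) for the six PSMS ADL items.

    Assumes each item is coded 1 (total independence) to 5 (total dependence).
    """
    if len(codes) != 6 or any(not 1 <= c <= 5 for c in codes):
        raise ValueError("expected six item codes in the range 1-5")
    count = sum(1 for c in codes if c > 1)  # items showing any degree of disability
    severity = sum(codes)                   # ranges from 6 to 30
    return count, severity


if __name__ == "__main__":
    print(psms_scores([1, 1, 2, 3, 1, 4]))
```

Two patients can have the same disability count but very different severity sums, which is the main reason to report both.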

vi. Burden (time/cost of administration)

Not reported.

vii. Alternate Forms (What are the modes of administration? Alternatives? If any, what is known for each of the above on the alternate versions?)

The PSMS ADL items were expanded for the OARS (both were based on the Katz ADL items), and a self-rating version of the PSMS has also been developed (Lawton 1988). The PSMS IADL items were also included in the OARS and further adapted in the Multilevel Assessment Instrument.

viii. Cultural and Language Adaptations

None reported.

ix. Conclusion

The PSMS appears to be a reliable and valid ADL scale for clinical and survey research. However, it has rarely been reported on in the literature on its own, appearing primarily in combination with other instruments. Not Recommended.


d. The Rapid Disability Rating Scale (RDRS) (Sherwood 1977)

i. Conceptual and Measurement model. (Does the scale represent a single domain or do its subscales measure distinct domains? Is the variability of the scale reported? If so, please document it. What is the intended level of measurement, i.e., ordinal, interval, ratio, or categorical?)

Based on Linn et al’s Cumulative Illness Rating scale, the instrument contains an ordinal scale of 16 items that are rated by medical staff on a three-point scale: no impairment or no special help (1); moderate impairment or assistance needed (2); and substantial or complete impairment or assistance needed (3). RNs can complete the scale from first-hand knowledge of the patient in two minutes. The scale was developed as a research tool for summarizing the functional capacity and mental status of elderly chronic patients in hospital or in the community.

A revised 18-item scale, the RDRS-2, was published by Linn and Linn in 1982. It added three items covering mobility, toileting, and adaptive tasks (e.g., managing money, telephoning, shopping) and omitted a question on safety supervision. A four-point scale replaced the earlier three-point scale. The RDRS-2 includes eight items on activities of daily living, three on mental capacity, and one each on dietary changes, continence, medications, and confinement to bed.

ii. Reliability (Did developers address internal consistency; reproducibility?)

Inter-rater reliability of the original version was assessed by comparing ratings of 20 patients made independently by three raters, resulting in a Kendall’s W concordance coefficient of 0.91 (Linn, 1967). Test-retest reliability was investigated by repeating ratings of 238 patients before and after admission to nursing homes. With a mean delay of three and a half days, the correlation between ratings was 0.83, and the mean scores of the two sets of ratings were within one point of each other (Linn, 1967). Linn et al (1977) reported a one-week test-retest correlation of 0.89 on 1,000 male patients for the original version. Linn and Linn (1982) reported item reliability results for the revised version: two nurses independently rated 100 patients; item correlations ranged from 0.62 to 0.98, and the three lowest correlations were for the mental status items. Test-retest reliability on 50 patients after three days produced correlations between 0.58 and 0.96.

iii. Validity (how did they address content validity? construct validity? criterion validity?)

A factor analysis of ratings of 120 hospitalized patients provided a three-factor solution reflecting activities of daily living, disability, and psychological problems (Linn 1982). Ratings of 845 men were used to predict subsequent mortality using multiple regression and discriminant function analysis; the ratings explained 20% of the variance in mortality and correctly classified 72% of the patients who eventually died. Correlations with other measures were low: 0.27 with a physician’s 13-item rating scale of impairment (N=172 community-dwelling elderly) and 0.43 with a six-point self-report scale of health (Linn 1982).

iv. Responsiveness (Has scale been used as an outcome measure? What populations?)

See above.

v. Interpretability (What populations has it been applied to? Is the score translated into a clinically relevant event? Does the score predict outcome events?)

No formal reference standards are available. Scores range from 18 to 72, with higher values indicating greater disability. Items may be combined to provide three subscores indicating the degree of assistance required with activities of daily living, physical disabilities, and psychosocial problems. Linn and Linn (1982) reported average RDRS-2 scores of 21 to 22 for elderly community residents with minimal disabilities, around 32 for hospitalized elderly patients, and about 36 for those transferred to nursing homes. Response categories are phrased in terms of the amount of assistance the patient requires, so the instrument indicates handicap rather than impairment.
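The 18-72 range follows from 18 items each rated on a four-point scale. The Python sketch below assumes ratings are coded 1 to 4 and validates and sums them. The function name is hypothetical. The three subscores are not computed, because the item-to-subscore assignment is not given here.

```python
from typing import Sequence

RDRS2_ITEMS = 18


def rdrs2_total(ratings: Sequence[int]) -> int:
    """Sum 18 item ratings (assumed coded 1-4); totals run from 18 to 72, higher = more disability."""
    if len(ratings) != RDRS2_ITEMS or any(not 1 <= r <= 4 for r in ratings):
        raise ValueError("expected 18 ratings in the range 1-4")
    return sum(ratings)


if __name__ == "__main__":
    # Hypothetical patient: mostly unimpaired, with moderate needs on four items
    ratings = [1] * 14 + [2, 2, 3, 3]
    print(rdrs2_total(ratings))
```

For context only, the averages reported by Linn and Linn (1982) are about 21-22, 32, and 36 for the three groups described above. They are not cutpoints.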

vi. Burden (time/cost of administration)

Very short: 2 minutes.

vii. Alternate Forms (What are the modes of administration? Alternatives? If any, what is known for each of the above on the alternate versions?)

The RDRS-2. See above.

viii. Cultural and Language Adaptations

A French version has been used.

ix. Conclusion

This is a broad scale that rates the amount of assistance required in 18 activities, broader in scope than the PULSES, the Barthel, and most ADL scales. It has been used in several evaluative studies. Its research orientation is reflected in its reliability and validity testing, which is more thorough than for most scales. However, the validity of the measure could be improved (e.g., correlations with physicians’ ratings were low). Predictive validation is useful but rarely attempted with such scales, so it is difficult to judge whether 20% explained variance is high or low. By comparison, Granger expressed the predictive validity of the Barthel Index as the percentages of patients with low scores who died.

Moreover, the scoring system has been criticized because the same weight is given to different degrees of disability (permanent confinement to bed and following a special diet both score 3 points). This limits the validity of the scale as an absolute indicator of disability, although the problem may be less serious when the scale is used to monitor change over time (McDowell 1996). Kane argues that the RDRS lacks the definitional rigor preferred in a research instrument; its utility may lie instead in its ease of administration. Recommended because of its broad scope of items and short administration time.


e. The Stanford Health Assessment Questionnaire (Fries 1980)

i. Conceptual and Measurement model. (Does the scale represent a single domain or do its subscales measure distinct domains? Is the variability of the scale reported? If so, please document it. What is the intended level of measurement, i.e., ordinal, interval, ratio, or categorical?)

The Health Assessment Questionnaire (HAQ) measures difficulty in performing activities of daily living. It was designed for the clinical assessment of adults with arthritis but has been used in a wide range of research settings to evaluate care. The questionnaire is based on a hierarchical model that considers the effect of a disease in terms of death, disability, discomfort, the side effects of treatment, and medical costs. Except for death, these dimensions are divided into two sub-dimensions: upper/lower limb problems for the disability dimension and physical and psychological problems for the discomfort dimension. The sub-dimensions are then divided into components, which are further divided into individual question items.

The disability scale contains twenty items on daily functioning during the past week, grouped into eight components: dressing and grooming, rising, eating, walking, hygiene, reach, grip, and outdoor activities. Each component includes two or three questions. The scale can be self-administered or administered by telephone or in a personal interview. It can be completed in 5-8 minutes and scored in less than one minute. Each response is scored on a four-point scale of ability patterned after the American Rheumatism Association functional classification, from “without any difficulty” to “unable to do”, and a checklist records any aids used or assistance received. The highest score in each of the eight components is added to form a total (0-24); this is divided by 8 to provide a 0-3 continuous score, termed the Functional Disability Index.
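The Functional Disability Index calculation described above (highest item score per component, summed, then divided by 8) is sketched below in Python. The component keys and function name are hypothetical labels. The sketch assumes item scores of 0 ("without any difficulty") to 3 ("unable to do"). It applies no adjustment for the aids-and-assistance checklist, because the description above does not specify one.

```python
from typing import Mapping, Sequence

# Hypothetical keys for the eight HAQ disability components
HAQ_COMPONENTS = ("dressing_grooming", "rising", "eating", "walking",
                  "hygiene", "reach", "grip", "outdoor_activities")


def functional_disability_index(responses: Mapping[str, Sequence[int]]) -> float:
    """Highest item score (0-3) in each component, summed (0-24) and divided by 8."""
    total = 0
    for comp in HAQ_COMPONENTS:
        scores = responses[comp]  # KeyError if a component is missing
        if not scores or any(not 0 <= s <= 3 for s in scores):
            raise ValueError(f"{comp}: need one or more item scores in 0-3")
        total += max(scores)  # only the worst item in the component counts
    return total / len(HAQ_COMPONENTS)


if __name__ == "__main__":
    patient = {
        "dressing_grooming": [1, 0], "rising": [2, 1], "eating": [0, 0, 0],
        "walking": [3, 2], "hygiene": [1, 1, 0], "reach": [0, 0],
        "grip": [2, 1, 0], "outdoor_activities": [1, 0, 0],
    }
    print(functional_disability_index(patient))
```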

ii. Reliability (Did developers address internal consistency; reproducibility?)

Fries (1980) compared interview and self-administered versions of the disability scale (N=20). The Spearman correlation was 0.85, while correlations for individual sections ranged from 0.5 (IADL activities and hygiene) to 0.85 (eating). An abbreviated version with highly correlated items removed produced moderate item-total correlations, ranging from 0.51 to 0.81. Pincus et al (1983) reported alpha coefficients ranging from 0.71 (reaching) to 0.89 (eating). Milligan et al (1993) found an alpha coefficient of 0.94 for the complete instrument, with maximum inter-item correlations of 0.75. Two-week test-retest reliability of the disability section was examined in 37 patients with rheumatoid arthritis, showing no significant difference by t-test and a Spearman correlation of 0.87 (Fries 1990).

Goeppinger et al (1985) reported a one-week test-retest reliability of 0.95 (N=30 rheumatoid arthritis patients) and 0.93 (N=30 osteoarthritis patients). Fries et al (1982) administered the HAQ on successive occasions and obtained a retest correlation of 0.98 after six months.

iii. Validity (how did they address content validity? construct validity? criterion validity?)

Fries (1980) compared the self-administered HAQ responses to observations of performance made during a home visit (N=25). The Spearman correlation for the overall score was 0.88, while correlations for component scores ranged from 0.47 (arising) to 0.88 (walking). Various studies have demonstrated the validity of the HAQ in predicting health services utilization, clinical progression, and mortality. Wolfe et al (1988) showed that each one-point increase in baseline disability score was associated with a relative risk of mortality of 1.77. Ramey et al (1992) listed several dozen studies that have compared the HAQ with clinical and laboratory variables and also cited several studies that have used it as an outcome measure in randomized trials.

Principal component analyses have broadly confirmed the dimensions originally postulated by Fries: one main factor underlay 15 of the disability questions. The eight disability subscales are substantially correlated with each other; a median correlation of 0.44 among them has been reported (Daltroy 1990). Brown et al (1984) compared the HAQ with the Arthritis Impact Measurement Scale (AIMS). The correlations were 0.91 for the disability dimension and 0.64 for the pain questions. The two scales correlated 0.89 in another study (Hakala 1994). Liang et al (1985) compared the HAQ to the Sickness Impact Profile (SIP), the Functional Status Index (FSI), the AIMS, and the Quality of Well-Being Scale (QWB). The overall score on the HAQ correlated 0.84 with the AIMS, 0.78 with the SIP, 0.75 with the FSI, and 0.60 with the QWB. For the mobility scale, correlations between the HAQ and the other instruments were lower than the correlations among the other four scales.

Liang et al (1985) compared the relative efficiency of five measures, that is, their ability to identify change within subjects before and after hip or knee surgery. The HAQ ranked fourth among the five measures. Other studies confirm that the overall and mobility scores on the HAQ require larger sample sizes than the AIMS or the SIP to demonstrate a significant effect of treatment. However, the HAQ may be more sensitive to change than physical measures such as erythrocyte sedimentation rate (ESR), grip strength, or morning stiffness. Hawley and Wolfe (1992) showed that the HAQ was more responsive than physical measures or depression scores following rheumatoid arthritis treatment, and that the HAQ pain score was especially sensitive. Fitzpatrick et al (1989) found the sensitivity to improvement in disease state over 15 months to be modest, at 65% (specificity 61%); sensitivity to deterioration was 60% (specificity 73%).
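
The link between relative efficiency and sample size can be sketched as follows. One common definition of relative efficiency is the squared ratio of the paired t statistics that two instruments produce for the same within-subject change; the source does not spell out the formula, so this definition, the helper names, and all numbers below are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def paired_t(before: list[float], after: list[float]) -> float:
    """Paired t statistic for within-subject change (after minus before)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least two subjects")
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def relative_efficiency(t_instrument: float, t_reference: float) -> float:
    """Squared ratio of paired t statistics (one common definition)."""
    return (t_instrument / t_reference) ** 2


def sample_size_ratio(t_instrument: float, t_reference: float) -> float:
    """Approximate factor by which the instrument needs more subjects than the reference."""
    return 1.0 / relative_efficiency(t_instrument, t_reference)


# Hypothetical scores for four patients before and after surgery.
print(paired_t([1, 2, 3, 4], [2, 4, 5, 5]))
# Hypothetical t statistics: instrument 2.8, reference 4.0.
print(relative_efficiency(2.8, 4.0), sample_size_ratio(2.8, 4.0))
```

With these hypothetical values the relative efficiency is about 0.49, so the less efficient instrument would need roughly twice as many subjects to detect the same change.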

iv. Responsiveness (Has scale been used as an outcome measure? What populations?)

See above.

v. Interpretability (What populations has it been applied to? Is the score translated into a clinically relevant event? Does the score predict outcome events?)

The hierarchic model expresses results at various levels of generality: question scores may be combined to form component (e.g., eating or dressing) and dimension (e.g., disability) scores. However, Fries et al argue against combining dimension scores into an overall score, as this involves value judgments about the relative importance of dimensions that may not hold across patients. Empirically, correlations across dimensions are lower than correlations within dimensions, so disability, discomfort, psychological outcomes, cost, and death are separable outcomes. The full number of dimensions probably lies between five and eight.

Siegert et al (1984) suggested the following interpretations of overall scores: 0.0-0.5, the patient is completely self-sufficient; 0.5-1.25, the patient is reasonably self-sufficient but experiences some minor and even some major difficulties in performing ADL; 1.25-2.0, the patient is still self-sufficient but has many major problems with ADL; 2.0-3.0, the patient may be called severely handicapped.
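
The Siegert bands can be applied mechanically as a simple lookup. The published ranges share their endpoints (0.5, 1.25, 2.0), so this sketch assumes, as a labeled choice, that a score falling exactly on a boundary belongs to the lower band; the function name is hypothetical.

```python
def interpret_haq(score: float) -> str:
    """Map an overall HAQ score (0.0-3.0) to the Siegert et al interpretation.

    Boundary scores (0.5, 1.25, 2.0) are assigned to the lower band; this is
    an assumption, since the published ranges overlap at their endpoints.
    """
    if not 0.0 <= score <= 3.0:
        raise ValueError("HAQ score must lie between 0.0 and 3.0")
    if score <= 0.5:
        return "completely self-sufficient"
    if score <= 1.25:
        return "reasonably self-sufficient; some minor and even major ADL difficulties"
    if score <= 2.0:
        return "still self-sufficient, but many major ADL problems"
    return "severely handicapped"


for s in (0.25, 1.0, 1.75, 2.5):
    print(s, interpret_haq(s))
```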

The discomfort dimension of the HAQ includes a single question on physical pain in the past week, using a 15 cm visual analogue scale. Fries noted that attempts to elaborate the pain assessment by the part of the body involved, the times of day that were painful, and the severity of pain in different body parts failed to yield indexes that outperformed a simple analogue scale.

vi. Burden (time/cost of administration)

The scale may be self-administered or applied in a telephone or personal interview. It can be completed in five to eight minutes.

vii. Alternate Forms (What are the modes of administration? Alternatives? If any, what is known for each of the above on the alternate versions?)

Pincus et al (1983) abbreviated the HAQ to create the Modified HAQ (MHAQ), retaining only one question in each of the eight disability components and adding questions on satisfaction with activities and change in activities. An AIDS-HAQ includes 14 items from the Medical Outcomes Study instruments and 16 items from the HAQ; the items cover physical function, mental health, cognitive function, energy levels, and general health.

viii. Cultural and Language Adaptations

The HAQ has been adapted for use in Britain, Sweden, Spanish-speaking countries, Holland and France.

ix. Conclusion

The HAQ is widely used, including in the American Rheumatism Association Medical Information System (ARAMIS) and the National Health and Nutrition Examination Survey. Its design offers a scale that is broad in scope yet brief. Compared with the Functional Limitations Profile, the HAQ showed similar agreement with clinical indicators and similar sensitivity to change over time but was much quicker to administer. The available evidence shows the HAQ to have strong reliability and validity (e.g., a correlation of 0.91 between the AIMS and HAQ physical scales).

There is a need to establish population reference standards for people with a range of disabilities, and there is little information on the HAQ's adequacy in people without arthritis. The major criticisms focus on the HAQ’s scoring, which was designed for simplicity at some loss of precision. Because only the highest score in each section is counted, the HAQ summarizes the patient’s major difficulty but does not use all the information collected. When scores are compared over time, improvements in less severely affected areas of functioning may therefore be missed; this may account for the combination of high test-retest reliability and comparative insensitivity to change. Liang et al concluded that the HAQ and the Index of Well-Being are poor candidates for use when mobility change is a major functional outcome. Thus, the HAQ is a good descriptive instrument but may be less appropriate as a tool for measuring clinical change in outcome studies. Recommended with caution.
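
The information loss described above can be illustrated with a simplified scoring sketch. It assumes the disability index is the mean of the highest item score (0 = no difficulty to 3 = unable to do) in each answered category, and it deliberately omits the aids/devices adjustment used in practice. The category names follow the eight usual HAQ categories; the function name and all patient data are hypothetical.

```python
# Hypothetical, simplified HAQ disability index: mean of each category's worst item.
# The aids/devices adjustment used in practice is deliberately omitted.
HAQ_CATEGORIES = ("dressing", "arising", "eating", "walking",
                  "hygiene", "reach", "grip", "activities")


def haq_disability_index(responses: dict[str, list[int]]) -> float:
    category_scores = []
    for category in HAQ_CATEGORIES:
        items = responses.get(category, [])
        if any(not 0 <= s <= 3 for s in items):
            raise ValueError(f"item scores for {category} must be between 0 and 3")
        if items:  # skip unanswered categories
            category_scores.append(max(items))  # only the worst item counts
    if not category_scores:
        raise ValueError("no categories answered")
    return sum(category_scores) / len(category_scores)


before = {"dressing": [2, 2], "arising": [1, 1], "eating": [1, 1, 1],
          "walking": [2, 2], "hygiene": [1, 1, 1], "reach": [1, 1],
          "grip": [1, 1, 1], "activities": [1, 1, 1]}
# Every non-worst item improves to 0, but each category's maximum is unchanged.
after = {"dressing": [2, 0], "arising": [1, 0], "eating": [1, 0, 0],
         "walking": [2, 0], "hygiene": [1, 0, 0], "reach": [1, 0],
         "grip": [1, 0, 0], "activities": [1, 0, 0]}

print(haq_disability_index(before), haq_disability_index(after))
```

Both calls return 1.25, although the sum of all item scores falls from 24 to 10: the improvement in less severely affected items is invisible to the index.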

g. FIM Instrument (Hamilton 1987).

i. Conceptual and Measurement model. (Does the scale represent a single domain or do model scales measure distinct domains? Is the variability of the scale reported? If so, please document it. What is the intended level of measurement, i.e., ordinal, interval, ratio or category?)

The FIM Instrument assesses physical and cognitive disability in terms of burden of care. It has been used to monitor patient progress and to assess the outcomes of rehabilitation. It is a rating scale that clinicians or non-clinicians can apply to patients of all ages and diagnoses, and it has been widely adopted by rehabilitation facilities in the US and Europe.

The FIM™ was developed as part of the Uniform Data System for Medical Rehabilitation (UDS), which was designed to measure disability and to estimate payments for rehabilitative medicine. The FIM™ is the central measure in the UDS, which also records demographic data, diagnosis, impairment group, length of hospital stay, and charges. The UDS distinguishes between alterations in structure or function (impairment), activity (disability), and role (handicap). The FIM™ covers the activity and role levels, termed life functions. The FIM™ is not a comprehensive instrument but a basic indicator of disability that focuses on the burden of care: the level of a patient’s disability indicates the burden of caring for them, and items are scored according to how much assistance the individual requires to carry out activities of daily living. Several stages of rehabilitation are identified, and the efficiency of care may be estimated by dividing the gain in life function (e.g., the improvement in FIM™ scores) by the cost of the rehabilitation services.
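
The efficiency-of-care ratio described above is a simple quotient. The sketch below assumes FIM totals on the 18-126 scale and a cost in any single currency unit; the function name, patient values, and cost are hypothetical.

```python
FIM_MIN_TOTAL, FIM_MAX_TOTAL = 18, 126


def fim_efficiency(admission_total: int, discharge_total: int, cost: float) -> float:
    """FIM points gained per unit of rehabilitation cost (hypothetical helper)."""
    for total in (admission_total, discharge_total):
        if not FIM_MIN_TOTAL <= total <= FIM_MAX_TOTAL:
            raise ValueError("FIM totals must lie between 18 and 126")
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (discharge_total - admission_total) / cost


# Hypothetical patient: FIM rises from 60 to 90 over a stay costing 15,000 units.
points_per_unit = fim_efficiency(60, 90, 15_000)
print(f"{points_per_unit * 1000:.1f} FIM points per 1,000 units of cost")
```

Here the patient gains 2.0 FIM points per 1,000 units of cost; comparing such ratios across services assumes that FIM points are roughly comparable across the scale, which the ordinal scoring does not guarantee.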

The FIM™ includes 18 items covering independence in self-care, sphincter control, mobility, locomotion, communication, and social cognition. The physical items were based on the Barthel Index. Three cognition items cover social interaction, problem solving, and memory. Ratings consider performance rather than capacity and may be based on observation, patient interview, or medical records. A decision tree is available for telephone interview. Evaluators are usually physicians, nurses, or therapists. Laypersons can be trained in 1 hour. The FIM™ takes 30 minutes to administer and score for each patient.

ii. Reliability (Did developers address internal consistency; reproducibility?)

Inter-rater tests conducted on patients from 25 facilities by physicians, nurses, and therapists produced intraclass correlations (four-point rating version) of 0.86 for 303 pairs of clinical assessments at admission and 0.88 for 184 pairs at discharge. Kappa indices of agreement for the 18 items averaged 0.54. For the seven-point rating version, intraclass correlations for pairs of clinicians rating 263 patients ranged from 0.93 (locomotion subscale) to 0.96 (self-care and mobility subscales), and the mean kappa index of agreement for individual items was 0.71. Alpha coefficients of 0.93 (admission) and 0.95 (discharge) were also reported in a sample of 11,102 rehabilitation patients. The internal consistency of the locomotion subscale was lower (0.68).
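
For readers less familiar with kappa, the sketch below computes an unweighted Cohen's kappa for two raters scoring the same patients on one item. The source does not say whether the reported kappas were weighted or unweighted, so the unweighted form is an assumption, and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa: agreement beyond chance for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical seven-point ratings of one FIM item for eight patients.
rater_1 = [7, 6, 5, 5, 4, 3, 7, 6]
rater_2 = [7, 6, 5, 4, 4, 3, 6, 6]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

The raters agree on 75% of patients, but about 20% agreement would be expected by chance alone, so kappa is lower (about 0.686).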

iii. Validity (how did they address content validity? construct validity? criterion validity?)

Content validity has been assessed. A Rasch analysis supported the division into motor and cognitive components; contrasting patterns of responses across patient groups reflected the types of disability to be expected. Granger et al examined the predictive validity of the FIM™ in multiple sclerosis patients over a seven-day period. The FIM™ items predicted the time required to provide help with personal care tasks (R² = 0.77); correlations for several items exceeded 0.80, and a change of one point in the FIM™ total score represented 3.8 minutes of care per day. A similar analysis for stroke patients (N=21) yielded an R² of 0.65. In a study of 11,102 patients, Dodds et al (1993) found that FIM™ scores improved between admission and discharge and reflected the patients’ discharge destinations. Scores also reflected the presence of coexisting conditions and the severity of impairments.
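
The Granger et al conversion of 3.8 minutes of care per day per FIM point can be turned into a rough estimate of care time. This sketch assumes that the relationship is linear across the whole scale and transfers beyond the multiple sclerosis sample in which it was derived; both are assumptions, and the function name is hypothetical.

```python
MINUTES_PER_FIM_POINT = 3.8  # Granger et al, multiple sclerosis patients


def estimated_care_minutes_change(fim_change: float,
                                  minutes_per_point: float = MINUTES_PER_FIM_POINT) -> float:
    """Estimated change in daily minutes of help for a given change in FIM total.

    Higher FIM scores mean more independence, so a FIM gain (positive input)
    maps to fewer minutes of care (negative output). Linearity is assumed.
    """
    return -fim_change * minutes_per_point


print(estimated_care_minutes_change(10))   # a 10-point gain
print(estimated_care_minutes_change(-5))   # a 5-point decline
```

Under these assumptions, a 10-point gain corresponds to about 38 fewer minutes of help per day, and a 5-point decline to about 19 additional minutes.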

Correlations with other measures include 0.84 with the Barthel Index, 0.68 with Katz’s Index of ADL, and 0.45 with Spitzer’s Quality of Life Index.

v. Interpretability (What populations has it been applied to? Is the score translated into a clinically relevant event? Does the score predict outcome events?)

The scale has been applied to adults and, in a pediatric version (the WeeFIM™), to children aged 6 months to 7 years. Seven-point ordinal ratings represent gradations of independence and reflect the amount of assistance a patient requires. For each item, two levels of independent functioning distinguish complete independence from modified independence (the activity is performed with some delay, safety risk, or an assistive device). Two dependent levels refer to the provision of assistance: modified dependence, in which the assistant provides less than half the effort required to complete the task, and complete dependence, in which the assistant provides more than half the effort. Finer gradations can be made within each level. The total score sums the individual item ratings; higher scores indicate more independent function, and scores range from 18 to 126.
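
The total score and broad levels described above can be sketched as follows. The grouping of the seven rating points into levels (7 = complete independence, 6 = modified independence, 3-5 = modified dependence, 1-2 = complete dependence) is the usual FIM convention, but it is stated here as an assumption; the function names are hypothetical.

```python
FIM_ITEM_COUNT = 18


def fim_total(ratings: list[int]) -> int:
    """Sum of the 18 item ratings (each 1-7), giving a total of 18-126."""
    if len(ratings) != FIM_ITEM_COUNT:
        raise ValueError(f"expected {FIM_ITEM_COUNT} item ratings, got {len(ratings)}")
    if any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("each item is rated on a 1-7 scale")
    return sum(ratings)


def fim_level(rating: int) -> str:
    """Broad level for one item rating, assuming the grouping 7 | 6 | 5-3 | 2-1."""
    if rating == 7:
        return "complete independence"
    if rating == 6:
        return "modified independence"
    if 3 <= rating <= 5:
        return "modified dependence"   # assistant provides less than half the effort
    if rating in (1, 2):
        return "complete dependence"   # assistant provides more than half the effort
    raise ValueError("rating must be between 1 and 7")


print(fim_total([1] * 18), fim_total([7] * 18), fim_level(4))
```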

Granger et al outline an analysis that translates the ordinal score into an interval scale. As an alternative scoring approach, Linacre and Heinemann (1994) argue that the 13 physical (motor) items should be scored separately from the five cognitive items.
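
Separate motor and cognitive subscores can be computed as shown below. The item names follow the commonly used FIM item list, but they and the function name are assumptions for illustration. Under this split the motor subscore ranges from 13 to 91 and the cognitive subscore from 5 to 35.

```python
# Assumed item names, following the commonly used FIM item list.
MOTOR_ITEMS = ("eating", "grooming", "bathing", "dressing_upper", "dressing_lower",
               "toileting", "bladder", "bowel", "transfer_bed_chair",
               "transfer_toilet", "transfer_tub_shower", "walk_wheelchair", "stairs")
COGNITIVE_ITEMS = ("comprehension", "expression", "social_interaction",
                   "problem_solving", "memory")


def fim_subscores(ratings: dict[str, int]) -> tuple[int, int]:
    """Return (motor, cognitive) subscores instead of a single 18-126 total."""
    missing = [item for item in MOTOR_ITEMS + COGNITIVE_ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    if any(not 1 <= ratings[item] <= 7 for item in MOTOR_ITEMS + COGNITIVE_ITEMS):
        raise ValueError("each item is rated on a 1-7 scale")
    motor = sum(ratings[item] for item in MOTOR_ITEMS)
    cognitive = sum(ratings[item] for item in COGNITIVE_ITEMS)
    return motor, cognitive


# Hypothetical patient rated 4 on every item.
print(fim_subscores({item: 4 for item in MOTOR_ITEMS + COGNITIVE_ITEMS}))
```

Reporting two subscores keeps a large cognitive deficit from being masked by good motor function in a single summed total.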

vi. Burden (time/cost of administration)

Evaluators are usually physicians, nurses, or therapists. Laypersons can be trained in 1 hour. The FIM™ takes 30 minutes to administer and score for each patient.

vii. Alternate Forms (What are the modes of administration? Alternatives? If any, what is known for each of the above on the alternate versions?)

The FIM™ has been used with children as young as 7 years old; the WeeFIM™ is a version for children aged 6 months to 7 years.

viii. Cultural and Language Adaptations

The FIM™ has been translated into French, German, Japanese, and Swedish.

ix. Conclusion

The FIM™ is a widely used scale with proven reliability and validity. Documentation for the FIM™ is outstanding. The major strength of the FIM™ lies in its basis in the UDS. The physical components of the FIM™ appear comparable to the best among the other ADL instruments. The cognitive and social communication dimensions may have low sensitivity. Overall, the FIM™ is a sound instrument which benefits from outstanding support services. Viewed as a brief disability measure rather than a general health instrument, it can be used as a patient assessment tool and also as an evaluative instrument. Recommended.

Conclusion

The six highlighted instruments each have something to recommend them for the measurement of functional status in the last month of life. Reliability, validity, brevity as well as breadth of dimensions were important considerations in the evaluation of these instruments. Of the six instruments, three had good potential for application in the toolkit. Those instruments are: the Rapid Disability Rating Scale (RDRS), the Health Assessment Questionnaire (HAQ), and the Functional Independence Measure (FIM™).

The Rapid Disability Rating Scale (RDRS-2) includes 8 items on activities of daily living, 3 on mental capacity, and one each on dietary changes, continence, medications, and confinement to bed, each rated on a four-point scale. It is therefore broader in scope than the PULSES, the Barthel Index, and most ADL scales. It is easy to administer: registered nurses can complete the scale from first-hand knowledge of the patient in two to five minutes. Reliability is fairly high (0.68 to 0.98), with the three lowest correlations found for the mental status items. Test-retest reliability on 50 patients after 3 days ranged from 0.58 to 0.96. However, the validity of the scale could be improved. A correlation of 0.27 was obtained between the RDRS-2 and a physician’s rating scale of impairment, and a correlation of only 0.43 with a six-point self-report scale of health. Predictive validation was a useful approach, but the 20% of variance explained is difficult to interpret. The scoring system has been criticized for giving the same weight to different degrees of disability. This limits the validity of the scale as an absolute indicator of disability, although the problem may be less serious when the scale is used to monitor change over time.

The Health Assessment Questionnaire (HAQ) is a widely used instrument with strong reliability and validity (e.g., a correlation of 0.91 between the AIMS and HAQ physical scales). The major criticisms focus on its scoring. By counting only the highest score in each section, the HAQ summarizes the patient’s major difficulty but does not use all the information collected. When scores are compared over time, improvements in less severely affected areas of functioning may therefore be missed. The HAQ is a good descriptive instrument but may be less appropriate as a tool for measuring clinical change in outcome studies. Moreover, there is little information on its adequacy in people without arthritis.

The FIM™ is an ordinal scale covering 18 items measuring physical and cognitive disability in terms of burden of care. It uses a 7-point rating scale and takes approximately 30 minutes to administer and score for each patient. The FIM™ has proven reliability and validity. For the seven-point rating version, intraclass correlations for pairs of clinicians rating 263 patients ranged from 0.93 (locomotion subscale) to 0.96 (self-care and mobility). The mean kappa index of agreement between ratings for each item was 0.71. Correlations with other measures include 0.84 with the Barthel Index, 0.68 with Katz’s Index of ADL, and 0.45 with Spitzer’s Quality of Life Index.

It has been applied to adults as well as children. The rating scale distinguishes complete independence from modified independence, in which the activity is performed with some delay, safety risk, or an assistive device. The physical components of the FIM™ appear comparable to the best among the other ADL instruments. The cognitive and social communication dimensions may have low sensitivity. Overall, the FIM™ is a sound instrument. Viewed as a brief disability measure rather than a general health instrument, it can be used as a patient assessment tool and also as an evaluative instrument.

All three scales are recommended. They differ in the items included and in the time needed to administer them. The FIM™ is the most established (reliability, validity, and discrimination). However, if time is a major consideration, the HAQ or the RDRS is recommended for its brevity. Also recommended is the inclusion of an overall rating of “perceived” physical condition, such as the following item from the EORTC QLQ-C30:

“How would you rate your overall physical condition during the past week?”
Very Poor 1 2 3 4 5 6 7 Excellent

For more information go to http://www.udsmr.org


IV. Priorities for Future Research

1. More theoretical development and psychometric testing of the functional status measures in terminally ill patients is needed. Information about the sensitivity and scope, validity, reliability, and conceptual clarity of the functional status measures in this special population is required.


References

Asberg KH. Disability as a predictor of outcome for the elderly in a department of internal medicine. Scand J Soc Med 1987;15:261-265.

Chen MK, Bryant BE. The measurement of health: a critical and selective overview. Int J Epidemiol 1975;4:257-264.

Collin C, Wade DT. The Barthel ADL Index: a reliability study. Int Disabil Stud 1988;10:64-7.

Daltroy LH, Larson MG, Roberts NW, et al. A modification of the Health Assessment Questionnaire for the spondyloarthropathies. J Rheumatol 1990;17:946-50.

Dodds TA, Martin DP, Stolov WC, et al. A validation of the Functional Independence Measurement and its performance among rehabilitation inpatients. Arch Phys Med Rehabil 1993;74:531-536.

Donaldson SW, Wagner CC, Gresham GE. A unified ADL evaluation form. Arch Phys Med Rehabil 1973;54:175-9.

Fitzpatrick R, Newman S, Lamb R, et al. A comparison of measures of health status in rheumatoid arthritis. Br J Rheumatol 1989;28:201-206.

Fortinsky RH, Granger CV, Seltzer GB. The use of the functional assessment in understanding home care needs. Med Care 1981;19:489-97.

Fries JF, Spitz PW. The hierarchy of patient outcomes. In: Spilker B, ed. Quality of life assessments in clinical trials. New York: Raven Press, 1990: 25-35.

Fries JF, Spitz P, Kraines RG, et al. Measurement of patient outcome in arthritis. Arthritis Rheum 1980;23:137-145.

Fries JF, Spitz PW, Young DY. The dimensions of health outcomes: the Health Assessment Questionnaire, disability and pain scales. J Rheumatol 1982;9:789-793.

Goeppinger J, Doyle M, Murdock B. Self-administered function measures: the impossible dream? Arthritis Rheum 1985;28:145.

Granger CV, McNamara MA. Functional assessment utilization: the Long-Range Evaluation System (LRES). In: Granger CV, Gresham GE, eds. Functional assessment in rehabilitation medicine. Baltimore, Maryland: Williams and Wilkins, 1984:99-121.

Granger CV. Health accounting-functional assessment of the long-term patient. In: Kottke FJ, Stillwell GK, Lehman JF, eds. Krusen’s handbook of physical medicine and rehabilitation. 3rd ed. Philadelphia, Pennsylvania: WB Saunders, 1982: 253-274.

Granger CV, Albrecht GL, Hamilton BB. Outcome of comprehensive medical rehabilitation: measurement by PULSES Profile and the Barthel Index. Arch Phys Med Rehabil 1979;60:145-154.

Granger CV, Sherwood CC, Greer DS. Functional status measures in a comprehensive stroke care program. Arch Phys Med Rehabil 1977;58:555-561.

Hakala M, Nieminen P, Manelius J. Joint impairment is strongly correlated with disability as measured by self-report questionnaires. Functional status assessment in individuals with rheumatoid arthritis in a population based series. J Rheum 1985;28:542-547.

Hamilton BB, Granger CV, Sherwin FS, et al. A uniform national data system for Rehabilitation outcomes: analysis and measurement. Baltimore, MD: Paul H. Brookes, 1987:137-147.

Hawley DJ, Wolfe F. Sensitivity to change of the health assessment questionnaire (HAQ) and other clinical and health status measures in rheumatoid arthritis. Arthritis Care Res 1992;5:130-6.

Jette AM, Deniston OL. Inter-observer reliability of a functional status assessment instrument. J Chronic Dis 1978;31:573-580.

Jette AM. Functional capacity evaluation: an empirical approach. Arch Phys Med Rehabil 1980;61:85-89.

Kaasa T, Loomis J, Gillis K, et al. The Edmonton functional assessment tool: preliminary development and evaluation for use in palliative care. J Pain Symptom Manage 1997;13:10-9.

Katz S, Vignos PJ, Moskowitz RW, et al. Comprehensive outpatient care in rheumatoid arthritis: a controlled study. JAMA 1968;206:1249-1254.

Katz S, Ford AB, Chinn AB, et al. Prognosis after strokes: II. Long-term course of 159 patients with stroke. Medicine 1966;45:236-246.

Katz S, Ford AB, Moskowitz RW, et al. Studies of illness in the aged. The Index of the ADL: a standardized measure of biological and psychosocial function. JAMA 1963;185:914-919.

Liang MH, Larson MG, Cullen KE, et al. Comparative measurement efficiency and sensitivity of five health status instruments for arthritis research. Arthritis Rheum 1985;28:542-547.

Lawton MP. Scales to measure competence in everyday activities. Psychopharmacol Bull 1988;24:609-614.

Lawton MP, Brody EM. Assessment of older people: self-maintaining and instrumental activities of daily living. Gerontologist 1969;9:179-186.


Lazar RB, Yarkony GM, Ortolano D, et al. Prediction of functional outcome by motor capability after spinal cord injury. Arch Phys Med Rehabil 1989;70:819-822.

Linacre JM, Heinemann AW. The structure and stability of the Functional Independence Measure. Arch Phys Med Rehabil 1994;75:133-43.

Linn MW, Gurel L, Linn BS. Patient outcomes as a measure of quality of nursing home care. Am J Public Health 1977;67:337-44.

Linn MW, Linn BS. Physical resistance in the aged. Geriatrics 1967;22:134-8.

Linn MW, Linn BS. Self-evaluation of life function (SELF) scale: a short, comprehensive self-report of health for elderly adults. J Gerontol 1984;19:603-12.

Linn MW, Linn BS. The Rapid Disability Rating Scale-2. J Am Geriatr Soc 1982;30:378-382.

Mahoney FI, Wood OH, Barthel DW. Rehabilitation of chronically ill patients: the influence of complications on the final goal. South Med J 1958;51:605-609.

Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Md State Med J 1965;14:61-65.

McDowell I, Newell C. Measuring health: a guide to rating scales and questionnaires. 2nd ed. New York: Oxford University Press, 1996.

Milligan SE, Hom DL, Ballou SP, et al. An assessment of the Health Assessment Questionnaire functional ability index among women with systemic lupus erythematosus. J Rheumatol 1993;20:972-976.

Moskowitz E, McCann CB. Classification of disability in the chronically ill and aging. J Chronic Dis 1957;5:342-346.

Nelson EC, Wasson JH, Johnson DJ, Hays RD. Dartmouth COOP functional health assessment charts: brief measures for clinical practice. In: Spilker B, ed. Quality of life and pharmacoeconomics in clinical trials. Philadelphia: Lippincott-Raven Publishers, 1996.

Patrick DL, Darby SC, Green S, et al. Screening for disability in the inner city. J Epidemiol Community Health 1981;35:65-70.

Pfeffer RI, Kurosaki TT, Chance JM, et al. Use of the Mental Function Index in older adults: reliability, validity, and measurement of change over time. Am J Epidemiol 1984;120:922-935.

Pfeffer RI, Kurosaki TT, Harrach CH, et al. Measurement of functional activities in older adults in the community. J Gerontol 1982;37:323-329.

Pincus T, Summey JA, Soraci SA, et al. Assessment of patient satisfaction in activities of daily living using a modified Stanford Health Assessment Questionnaire. Arthritis Rheum 1983;26:1346-53.

Ramey DR, Raynauld JP, Fries JF. The health assessment questionnaire 1992: status and review. Arthritis Care Res 1992;5:119-29.

Roy CW, Togneri J, Hay E, et al. An interrater reliability study of the Barthel Index. Int J Rehabil Res 1988;11:67-70.

Schoening HA, Anderegg L, Bergstrom D, et al. Numerical scoring of self-care status of patients. Arch Phys Med Rehabil 1965;46:689-697.

Schoening HA, Iversen IA. Numerical scoring of self-care status: a study of the Kenny Self-Care Evaluation. Arch Phys Med Rehabil 1968;49:221-229.

Shah S, Vanclay F, Cooper B. Improving the sensitivity of the Barthel Index for stroke rehabilitation. J Clin Epidemiol 1989;42:703-9.

Sherwood S, Morris J, Mor V, et al. Compendium of measures for describing and assessing long-term care populations. Boston: Hebrew Rehabilitation Center for the Aged, 1977.

Shinar R, Gross CR, Bronstein KS, et al. Reliability of the activities of daily living scale and its use in telephone surveys. Arch Phys Med Rehabil 1987;68:723-8.

Siegert CE, Vleming LJ, Vandenbroucke JP, et al. Measurement of disability in Dutch rheumatoid arthritis patients. Clin Rheumatol 1984;3:305-9.

Stewart A, Kamberg CJ. Physical functioning measures. In: Stewart A, Ware JE, eds. Measuring functioning and well-being: the Medical Outcomes Study approach. Durham, North Carolina: Duke University Press, 1992:86-101.

Wade DT, Hewer RL. Functional abilities after stroke: measurement, natural history and prognosis. J Neurol Neurosurg Psychiatry 1987;50:177-82.

Wade DT, Collin C. The Barthel ADL Index: a standard measure of physical disability? Int Disabil Stud 1988;10:64-7.

Wartski SA, Green DS. Evaluation in home-care program. Med Care 1971;9:352-64.

Williams RE, Johnston RH, Al-Badran RH, et al. Disability: a model and measurement technique. Br J Prev Soc Med 1976;30:261-9.

Wolfe F, Kleinheksel SM, Cathey MA, et al. The clinical value of the Stanford Health Assessment Questionnaire Functional Disability Index in patients with rheumatoid arthritis. J Rheumatol 1988;15:1480-8.

Wylie CM. Measurement and results of rehabilitation of patients with stroke. Public Health Rep 1967;82:893-8.

Wylie CM, White BK. A measure of disability. Arch Environ Health 1964;8:834-839.



Section prepared by Anne Wilkinson

 



Funding for this project was provided by the Robert Wood Johnson Foundation.

This web site is published by the Center for Gerontology and Health Care Research, Brown Medical School. For further information, e-mail Dr. Joan Teno or contact her at Brown Medical School, Box G-HLL, Providence, RI, 02912, USA. For questions or comments regarding this website, please e-mail the webmaster. Last edited February 17, 2004.