The Multiple Resources Questionnaire
David B. Boles
University of Alabama
Introduction
The Multiple Resources Questionnaire (MRQ) is an easily administered paper-and-pencil rating instrument for measuring subjective workload both process-wise and in aggregate. It was developed out of recognition that alternative subjective workload instruments such as NASA-TLX and SWAT, while valuable as measures of overall workload, are not diagnostic with respect to workload within specific mental processes. Most of the resources included in the MRQ are factor-analytically derived and orthogonal to one another. The currently recommended, expanded-scale instrument is available here, but if desired the original instrument can be downloaded here.
Since its first publication in 2001, the questionnaire has been described in book chapters, and has been the subject of reliability and validity studies that have also refined its administration, scoring, and interpretation. In the text following, the references to these chapters and papers are given along with descriptions of content.
Background
An encyclopedia chapter by Boles (2006), revising an earlier chapter (Boles, 2001), placed the multiple resources construct within historical context as a refinement of resource pool theory. It extensively cited the seminal work of C.D. Wickens and colleagues, and noted that Boles and Law (1998) proposed a reinterpretation and expansion of Wickens' Multiple Resource Theory (MRT). The reinterpretation was that resources should be identified with particular mental processes, regardless of whether they are better regarded as attentional or structural in nature. The proposed expansion was based on the assertion that "orthogonal mental processes have orthogonal resources", which if true would require a substantial expansion of the resources encompassed by MRT. Boles and Law presented original data supporting the assertion.
The 2006 chapter continued by describing the resources included in the MRQ, giving as references Boles and Law (1998) for 11 of them, Boles (2002) for an additional 3, and Wickens (1984) for the remaining 3. The first 14 ultimately derived from factor analytic research indicating that all involved largely orthogonal lateralized processes (besides Boles, 2002, also Boles, 1991, 1992, 1996).
Although originally published in 2001 (Boles & Adair, 2001a), the MRQ was republished in more widely accessible form in 2007 (Boles, Bursk, Phillips, & Perdelwitz, 2007). On that occasion it was accompanied by a critique by Vidulich and Tsang (2007), a reply by Boles and Phillips (2007), and a commentary by Wickens (2007).
Administration
Boles and Adair (2001a) raised two issues with respect to the administration of the MRQ. One issue concerned the means by which the instructions are administered. Although written instructions are provided as part of the MRQ, some users appear not to read them carefully, and as a result are inaccurate in their ratings. It is suggested that orally reading the instructions to the user may circumvent some of these problems. The second issue raised in the article is the possibility that it might be desirable to delete items from the questionnaire under some circumstances, for example where an item is clearly irrelevant. Boles and Adair (2001b) even suggested that items could be added if believed appropriate to a particular work environment.
Scoring
Boles and Adair (2001b) scored the MRQ by summing ratings across all 17 items. This worked well in Experiment 1 when correlating overall workload on the MRQ to overall workload derived from other instruments. However, when the MRQ was used to predict dual-task interference in Experiment 2 it was not obvious a priori whether overall workload should be used, or some measure taking into account similarities and differences between tasks across the 17 items. Experiment 2 therefore empirically contrasted the metrics of (a) overall demand (the sum of ratings), (b) overlap similarity (the sum of the minimum rating of the two tasks on each item), and (c) profile similarity (the correlation between ratings of the two tasks, across the 17 items). In this experiment none of the metrics proved a better predictor than the others.
However, in a later study using computer-based tasks the overlap similarity metric was found to be the best predictor (Boles et al., 2007). This article also presented a computational example for each metric. It is currently recommended that the overlap similarity metric be used in dual-task applications of the MRQ.
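The three dual-task metrics described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the computational example published in Boles et al. (2007): the two 17-item rating vectors are hypothetical, the function names are invented for illustration, and "overall demand" is assumed here to mean the sum of ratings over both tasks.

```python
from math import sqrt

def overall_demand(task_a, task_b):
    """Sum of all ratings over both tasks (assumed reading of 'sum of ratings')."""
    return sum(task_a) + sum(task_b)

def overlap_similarity(task_a, task_b):
    """Sum, across items, of the smaller of the two tasks' ratings."""
    return sum(min(a, b) for a, b in zip(task_a, task_b))

def profile_similarity(task_a, task_b):
    """Pearson correlation between the two rating profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(task_a)
    mean_a = sum(task_a) / n
    mean_b = sum(task_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(task_a, task_b))
    ss_a = sum((a - mean_a) ** 2 for a in task_a)
    ss_b = sum((b - mean_b) ** 2 for b in task_b)
    return cov / sqrt(ss_a * ss_b)

# Hypothetical mean ratings (0-to-100 scale) of two tasks on the 17 MRQ items.
task_a = [0, 0, 20, 0, 80, 0, 60, 0, 0, 40, 0, 0, 0, 100, 0, 0, 0]
task_b = [0, 0, 0, 0, 60, 0, 80, 0, 0, 0, 0, 20, 0, 40, 0, 0, 0]

demand = overall_demand(task_a, task_b)        # 500
overlap = overlap_similarity(task_a, task_b)   # 160
profile = profile_similarity(task_a, task_b)

print(demand, overlap, round(profile, 3))
```

Note that overlap similarity only accumulates on items that both tasks use, which is why it tracks shared resource demand more directly than the overall sum does.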
Although, as noted, Boles and Adair (2001b) summed ratings to compare overall workload across instruments, an alternative scoring method was proposed by Finomore, Warm, Matthews, Riley, Dember, Shaw, et al. (2006) for use in such situations. Using a vigilance task, they identified those items rated with nonzero values by at least 50% of the participants, and assumed that those items reflected the resources used in the task. Overall workload was then measured as the sum of ratings across those items. An argument in favor of this method is that it proved reproducible: of 8 "at least 50%" resources identified in Experiment 1, 7 were also "at least 50%" in a very similar Experiment 2. In a later vigilance study involving detection of changes in barrel lengths of simulated tanks, Finomore, Shaw, Warm, Matthews, Weldon, and Boles (2009) found that the same criterion identified between 8 and 13 resources as important depending on whether the task involved simultaneous or successive discriminations, or was performed alone or together with a second task.
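As a rough illustration of the "at least 50%" criterion, the sketch below selects items rated nonzero by at least half of the participants and then sums each participant's ratings over the selected items. The data are hypothetical and abbreviated to 5 items for readability, the function names are invented, and scoring per participant is an assumption about how the sum is taken.

```python
def items_meeting_criterion(ratings, threshold=0.5):
    """Return indices of items rated nonzero by at least `threshold` of participants.

    `ratings` is a list with one list of item ratings per participant.
    """
    n_participants = len(ratings)
    n_items = len(ratings[0])
    selected = []
    for item in range(n_items):
        n_nonzero = sum(1 for participant in ratings if participant[item] > 0)
        if n_nonzero / n_participants >= threshold:
            selected.append(item)
    return selected

def workload_scores(ratings, selected):
    """Per-participant sum of ratings across the selected items only."""
    return [sum(participant[i] for i in selected) for participant in ratings]

# Hypothetical ratings: 4 participants x 5 items (a real MRQ has 17 items).
ratings = [
    [50, 0, 30, 0, 0],
    [70, 0, 0, 0, 10],
    [60, 20, 40, 0, 0],
    [0, 0, 10, 0, 0],
]

selected = items_meeting_criterion(ratings)   # [0, 2]
scores = workload_scores(ratings, selected)   # [80, 70, 100, 10]
print(selected, scores)
```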
However, a complication of using the "at least 50%" criterion emerged in a vigilance study reported by Finomore, Shaw, Warm, Matthews, Riley, Boles, et al. (2008). In this study, both visual and auditory conditions were used, and somewhat different resources met the criterion across the two modalities. Measuring overall workload as the sum of ratings across those resources would therefore have resulted in potentially noncomparable scores. Thus, in order to compare modalities, Finomore et al. used what they termed a joint resource profile that counted only those resources rated "at least 50%" in both the visual and auditory vigilance tasks (specifically, the manual process and short term memory items). This profile was found to be sensitive to event rate, both alone and in interaction with sensory modality. Finomore et al. (2009) used the same approach when examining common resources used in simultaneous vs. successive and single- vs. dual-task conditions of a vigilance task, identifying 7 joint resources.
Klein, Warm, Riley, Matthews, Gaitonde, Donovan, and Doarn (2008) used a conceptually similar approach to reducing the number of resources. They subjected MRQ ratings of workload in a simulated surgical task to Bonferroni-corrected t-tests against a null hypothesis of zero. In other words, they assessed usage that was significantly nonzero. This approach identified 8 resources used in the task. Klein, Lio, Grant, Carswell, and Strup (2009) used the same procedure to identify 10 resources used in simulated surgery with 2D and 3D displays. One point in favor of this approach is that it is statistically more rigorous than the "at least 50%" criterion. However, it may also run the risk of overidentifying resources. Thus Fincannon, Evans, Phillips, Jentsch, and Keebler (2009) found that all 17 resources were identified as significant in a military operations teamwork setting.
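The "significantly nonzero" approach can be sketched as a one-sample t-test per item against zero, with the per-test alpha divided by the number of items tested. The sketch below uses only the Python standard library, so the critical t value must be supplied by the user. The ratings are hypothetical, only 5 items are shown, the function names are invented, and the choice of a two-sided critical value is an assumption, since the sidedness of the tests is not stated above.

```python
from math import sqrt, inf
from statistics import mean, stdev

def one_sample_t(values, mu0=0.0):
    """t statistic for H0: population mean equals mu0."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)
    if sd == 0:
        # No variability (e.g., every rating is zero): t is undefined,
        # so report 0 when the mean equals mu0 and +/- infinity otherwise.
        if m == mu0:
            return 0.0
        return inf if m > mu0 else -inf
    return (m - mu0) / (sd / sqrt(n))

def significantly_nonzero(ratings_by_item, t_critical):
    """Items whose mean rating exceeds zero by more than the critical t."""
    return [item for item, values in ratings_by_item.items()
            if one_sample_t(values) > t_critical]

# Hypothetical 0-to-100 ratings from 6 participants on 5 items.
ratings_by_item = {
    "manual": [40, 60, 50, 70, 55, 45],
    "facial motive": [0, 0, 10, 0, 0, 0],
    "vocalization": [0, 0, 0, 0, 0, 0],
    "short term memory": [30, 0, 40, 20, 50, 10],
    "spatial attentive": [80, 90, 70, 85, 75, 95],
}

# Bonferroni: alpha = .05 over 5 tests gives a per-test alpha of .01.
# The two-sided critical t for alpha = .01 with 5 degrees of freedom is 4.032
# (standard t table).
per_test_alpha = 0.05 / len(ratings_by_item)
used = significantly_nonzero(ratings_by_item, t_critical=4.032)
print(per_test_alpha, used)
```

In this hypothetical data set the short term memory item is rated nonzero by 5 of 6 participants, so it would pass the "at least 50%" criterion, yet it does not survive the corrected t-test. This illustrates that the two criteria need not agree.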
Two generalizations emerge from this literature. First, the scoring of the MRQ should emphasize the process-related similarities of tasks when used to predict dual-task performance. Thus far it appears that overlap similarity, the sum across the items of minimum ratings, is at least as good as other alternatives. Second, when assessing the overall workload of tasks, paring down the number of resources that are included in the scoring provides a more focused comparison that eliminates potential "noise" deriving from rarely-used items. There is presently no consensus on how best to achieve the reduction, although both the "at least 50%" and "significantly nonzero" approaches appear useful. Another possibility yet to be explored is that either one of these approaches could be used to identify critical resources separately for each condition in the study, with all of the ones so identified then included in comparisons between conditions. For example, if resources 1, 8, and 12 are identified for Condition A, and resources 2, 8, and 15 are identified for Condition B, the overall workload index for purposes of comparison would be based on resources 1, 2, 8, 12, and 15.
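The difference between this proposed pooling of resources and the joint resource profile of Finomore et al. (2008) is the difference between a set union and a set intersection. The sketch below uses the resource numbers from the example above. The single participant's ratings and the helper name `workload_index` are hypothetical.

```python
# Resources identified as critical in each condition (from the example above).
condition_a = {1, 8, 12}
condition_b = {2, 8, 15}

# Proposed approach: include every resource identified in either condition.
pooled_items = sorted(condition_a | condition_b)   # [1, 2, 8, 12, 15]

# Joint resource profile: include only resources identified in both conditions.
joint_items = sorted(condition_a & condition_b)    # [8]

def workload_index(ratings, items):
    """Sum one participant's ratings over the chosen items.

    `ratings` maps item number to rating; unrated items count as 0.
    """
    return sum(ratings.get(item, 0) for item in items)

# Hypothetical ratings from one participant in Condition A (0-to-100 scale).
ratings_a = {1: 60, 2: 0, 8: 40, 12: 30, 15: 10}

pooled_index = workload_index(ratings_a, pooled_items)   # 140
joint_index = workload_index(ratings_a, joint_items)     # 40
print(pooled_items, joint_items, pooled_index, joint_index)
```

Either way, the same item set is applied to every condition, so the resulting scores stay comparable across conditions.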
Interpretation
Although a 17-item scale may be capable of measuring workload within particular resources, it is difficult to characterize tasks by referring to all 17 items. Boles and Adair (2001b) and Boles et al. (2007) suggested characterizing them by identifying the 3 top-rated resources. However, as indicated, Finomore et al. (2006) used an alternative criterion involving identifying those items rated with nonzero values by at least 50% of participants, and Klein et al. (2008) used a significance criterion to achieve a similar aim. Their alternatives can be viewed as striking a reasonable balance between the sparseness of items identified among the "top three", which may not sufficiently distinguish between tasks, and the superfluity of items provided by the full 17-item scale.
Reliability
Reliability data were published by Boles and Adair (2001a). In that article, pairwise interrater reliabilities were reported to range from +.57 to +.83 across a variety of computer-based video games, and from +.62 to +.63 for two simple laboratory-based perceptual tasks. However, in most envisaged uses of the MRQ, it is likely that multiple raters will complete the MRQ. In that regard, the article reported that reliability is approximately +.9 when the results are aggregated over 8 or more raters.
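One way to see why aggregation over raters raises reliability is the Spearman-Brown formula, which projects the reliability of an average of k raters from the reliability of a single rater. Whether Boles and Adair (2001a) computed their aggregate figure this way is not stated here, so the sketch below is only an illustration. It assumes that the pairwise interrater correlations quoted above can be treated as single-rater reliabilities.

```python
def spearman_brown(r_single, k):
    """Projected reliability of the mean of k raters, given single-rater reliability."""
    return k * r_single / (1 + (k - 1) * r_single)

# Lowest and highest pairwise reliabilities quoted above for the video games.
low = spearman_brown(0.57, 8)
high = spearman_brown(0.83, 8)
print(round(low, 3), round(high, 3))
```

Under these assumptions, even the lowest quoted pairwise value projects to a reliability just above .9 with 8 raters, which is in line with the figure reported in the article.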
Validity
The initial validity data were published by Boles and Adair (2001b). Experiment 1 of that study used two computer-based video games and examined intercorrelations between workload measures based on (a) the MRQ, (b) Overall Workload (OW), and (c) a Demand Workload (DW) scale that incorporated key dimensions from NASA-TLX and SWAT. Intercorrelations ranged from +.27 to +.65, and all were significant, results supporting the construct validity of the MRQ. In Experiment 2, the MRQ was found to predict the amount of mutual interference in reaction time and errors between dual tasks constructed from the possible pairings of 4 laboratory tasks. These results were taken as supporting the criterion validity of the MRQ.
Boles et al. republished the laboratory task data in more detail in 2007, and added a new dual-task study using 3 computer-based games. In the latter, the MRQ was found to predict dual-task interference with a correlation of +.83, using the overlap similarity metric. This was again supportive of the criterion validity of the questionnaire.
Finomore et al. (2006) reported a significant correlation (r = +.29) between the mental demand subscale of the NASA-TLX and overall workload as measured by the MRQ, within a vigilance task. Finomore et al. (2008) reported a rather similar, significant correlation (r = +.35) between their joint resource profile score and the global NASA-TLX. They also found several significant correlations between MRQ items and the NASA-TLX subscales, mostly involving the mental demand subscale.
Although Fincannon et al. (2009) found no correlation between workload measured by the MRQ and by the NASA-TLX, both were measured globally, using all 17 items of the MRQ and all subscales of the NASA-TLX. This may have resulted in less powerful assessment than a more targeted approach would have provided, using only the most critical resources of the MRQ and the mental demand subscale of the NASA-TLX.
While the results are not entirely consistent, when taken together they indicate that the MRQ has construct validity. Future research on the relationship of the MRQ to the NASA-TLX might best focus on specific resources and subscales.
In addition to construct and criterion validity, the content validity of the MRQ is attested by its diagnosticity in identifying critical resources, an issue discussed separately below.
Sensitivity
Finomore et al. (2006) compared the sensitivity of the MRQ to the NASA-TLX workload measure, using a vigilance task. In Experiment 1, using the original 0-to-4 rating scale of the MRQ, the NASA-TLX but not the MRQ was sensitive to the higher workload involved in target-absent trials relative to target-present trials. However, the authors suspected that the small scoring range of the original MRQ might have been responsible for its insensitivity. In Experiment 2, they used an expanded 0-to-100 scale, and the MRQ proved sensitive to the absent-present workload difference. Because of this outcome, the 0-to-100 scale is recommended in order to maximize the sensitivity of the MRQ.
Klein et al. (2009) used the 0-to-100 scale and found that the MRQ, unlike the NASA-TLX, was not sensitive to a workload difference between 2D and 3D simulated surgery tasks. However, the data were subjected to an analysis of variance (ANOVA) using all the items of the MRQ as one factor (17 levels) and display as a second factor (2 levels). It is likely that this resulted in a less powerful analysis than would have been the case if only the critical items of the MRQ were included.
Consistent with that conclusion, Finomore et al. (2009) confined their analysis to the 7 resources meeting the "at least 50%" criterion across all conditions of their study. The 0-to-100 scale was used. It was found that while both the NASA-TLX and the MRQ were sensitive to simultaneous vs. successive discriminations, with simultaneous discriminations showing lower workload, only the MRQ found workload to be lower in the single-task than in the dual-task condition.
Obviously these data are limited, but they suggest that when the items on the MRQ are restricted to those most important in a task, the sensitivity of the MRQ to workload manipulations is at least equal to that of the NASA-TLX.
Diagnosticity
The major motivation for development of the MRQ was to provide greater diagnosticity in mental workload measurement than is available using other instruments.
Klein, Riley, Warm, and Matthews (2005) used the MRQ to measure workload during use of an endoscopic surgery simulator, and found ratings that were significantly greater than zero for the manual, short term memory, spatial attentive, spatial categorical, spatial concentrative, and spatial positional processes. All were judged appropriately diagnostic of task demands. Three other workload items were found to be significant in some but not all conditions of the study. While all three had relatively low ratings (less than 2 units in all conditions, on the original 4-unit scale), they were also viewed as diagnostic. For example, the facial motive item was significant when the simulator was used with a TV display, corresponding to the observation that participants grimaced when performing under such conditions.
Finomore et al. (2006) reported that of 8 "at least 50%" resources identified by the MRQ in a vigilance task, 6 matched expectations for such a task. They concluded that the instrument was diagnostic and possessed content validity.
Boles et al. (2007), Experiment 2, also presented diagnosticity data. Three computer-based games were found to differ in resource demand as measured by the MRQ, in several ways that matched game characteristics. For example, two continuous-navigation games, Greebles and Super Maze Wars, placed significantly greater demand on the manual resource than did Word Tracer, a task requiring intermittent responses. Conversely, Word Tracer placed significantly higher demand on the spatial emergent and visual lexical resources, presumably because it required "picking out" letters from a matrix and forming words.
Finomore et al. (2008) further supported the diagnosticity of the MRQ by showing that in a visual vigilance task, more visual resources were identified by the "at least 50%" criterion than were identified in an auditory vigilance task. Manual and short-term memory resources were also identified in both tasks, and both were viewed by the researchers as diagnostic.
Finally, Finomore et al. (2009) pointed out that in their barrel length spatial discrimination study, as well as in a previous study emphasizing spatial processing (Finomore et al., 2006), it was the spatial items on the MRQ that were dominant in the workload estimates. Yet this was not true in a study involving detection of changes in the duration of auditory and visual pulses (Finomore et al., 2008). They concluded that these differential outcomes speak well for the content validity of the MRQ.
Together these findings suggest that the MRQ is substantially diagnostic of the specific mental demands imposed by a variety of tasks. However, most if not all of the supportive studies have a retroactive quality to them. This could be addressed in future studies by predicting the resources that will vary in usage in response to task manipulations.
Limitations
Although the MRQ has proved useful in a variety of lab task, computer gaming, vigilance, and simulated surgery situations, involving both single-task and dual-task performance, there are some recognized limitations of the instrument.
One limitation concerns the interpretability of the MRQ as a relative versus absolute measure of workload. Boles, Phillips, Perdelwitz, and Bursk (2004) found that individual regressions relating an MRQ-derived similarity metric to the amount of dual-task interference varied widely, with some research participants showing a rapid increase in interference, and others only a modest increase, for the same increase in task similarity. Boles et al. (2007) suggested that this indicates that the MRQ, like other workload measures, is a relative measure of workload, not an absolute one. In other words, it can be used to predict relative amounts of workload or dual-task interference between tasks but cannot be assumed to measure some absolute amount of mental work involved in a task. It was also pointed out that some interference appears to occur whenever two tasks are paired, regardless of their resource structure, supporting previous suggestions by others that there may be coordination costs in managing dual tasks, or some generalized resource that produces interference regardless of the characteristics of the tasks. By introducing influences on dual-task interference outside of resource structure, these factors also suggest that the MRQ is a relative measure of workload.
Finomore et al. (2008) highlighted a resource limitation within the MRQ, namely a paucity of auditory resources compared to visual resources. Although the auditory emotional resource was found to be engaged at high auditory event rates, presumably reflecting the participants' own emotional reaction to closely-spaced events, the MRQ really contains no items reflecting the white noise stimuli used in the study, or (for example) pertaining to auditory attention. This is a reflection of a broader limitation deriving from the origin of the MRQ in factor analyses of lateralized processes. Processes that were not investigated in the original factor analytic studies were generally not considered for inclusion in the MRQ, with the exception of 3 resources derived from the dual-task literature (the manual process, short term memory, and vocalization).
Recognizing this limitation, Boles and Adair (2001b) suggested that other task-related resources could be added to the MRQ if believed appropriate to a particular work environment. Boles et al. (2007) and Boles and Phillips (2007) elaborated this position, suggesting that additional items can be added if motivated by valid multiple resource considerations, and envisioning the possibility of a future in which there is an emergence of different versions of the MRQ, each emphasizing resources most important to a particular work domain.
References
Boles, D.B. (1991). Factor analysis and the cerebral hemispheres: Pilot study and parietal functions. Neuropsychologia, 29, 59-91. [abstract & access]
Boles, D.B. (1992). Factor analysis and the cerebral hemispheres: Temporal, occipital, and frontal functions. Neuropsychologia, 30, 963-988. [abstract & access]
Boles, D.B. (1996). Factor analysis and the cerebral hemispheres: "Unlocalized" functions. Neuropsychologia, 34, 723-736. [abstract & access]
Boles, D.B. (2001). Multiple resources. In W. Karwowski (Ed.), International Encyclopedia of Ergonomics and Human Factors. Taylor & Francis, 271-275. [volume preview & ordering]
Boles, D.B. (2002). Lateralized spatial processes and their lexical implications. Neuropsychologia, 40, 2125-2135. [abstract & access]
Boles, D.B. (2006). Multiple resources. In W. Karwowski (Ed.), International Encyclopedia of Ergonomics and Human Factors. Taylor & Francis, 2nd edition, v. 1, 442-448. [partial chapter & ordering]
Boles, D.B., & Adair, L.P. (2001a). The Multiple Resources Questionnaire (MRQ). Proceedings of the Human Factors and Ergonomics Society, 45, 1790-1794. [abstract & access]
Boles, D.B., & Adair, L.P. (2001b). Validity of the Multiple Resources Questionnaire (MRQ). Proceedings of the Human Factors and Ergonomics Society, 45, 1795-1799. [abstract & access]
Boles, D.B., Bursk, J.H., Phillips, J.B., & Perdelwitz, J.R. (2007). Predicting dual-task performance with The Multiple Resources Questionnaire (MRQ). Human Factors, 49, 32-45. [abstract & access]
Boles, D.B., & Law, M.B. (1998). A simultaneous task comparison of differentiated and undifferentiated hemispheric resource theories. Journal of Experimental Psychology: Human Perception and Performance, 24, 204-215. [abstract link & access]
Boles, D.B., & Phillips, J.B. (2007). A reply to the methodological and theoretical concerns of Vidulich and Tsang. Human Factors, 49, 50-52. [access]
Boles, D.B., Phillips, J.B., Perdelwitz, J.R., & Bursk, J.H. (2004). Application of the Multiple Resources Questionnaire (MRQ) to a complex gaming environment. Proceedings of the Human Factors and Ergonomics Society, 48, 1968-1972. [abstract & access] [Note: Incorrect conclusions regarding workload metrics were drawn in this article because of a spreadsheet error.]
Fincannon, T.A., Evans, W., Phillips, E., Jentsch, F., & Keebler, J. (2009). The influence of team size and communication modality on team effectiveness with unmanned systems. Proceedings of the Human Factors and Ergonomics Society, 53, 419-423. [abstract & access]
Finomore, V.S., Shaw, T.H., Warm, J.S., Matthews, G., Riley, M.A., Boles, D.B., & Weldon, D. (2008). Measuring the workload of sustained attention: Further evaluation of the Multiple Resources Questionnaire. Proceedings of the Human Factors and Ergonomics Society, 52, 1209-1213. [abstract & access]
Finomore, V.S., Shaw, T.H., Warm, J.S., Matthews, G., Weldon, D., & Boles, D.B. (2009). On the workload of vigilance: Comparison of the NASA-TLX and the MRQ. Proceedings of the Human Factors and Ergonomics Society, 53, 1057-1061. [abstract & access]
Finomore, V.S., Warm, J.S., Matthews, G., Riley, M.A., Dember, W.N., Shaw, T.H., Ungar, N.R., & Scerbo, M.W. (2006). Measuring the workload of sustained attention. Proceedings of the Human Factors and Ergonomics Society, 50, 1614-1618. [abstract & access]
Klein, M.I., Lio, C.H., Grant, R., Carswell, C.M., & Strup, S. (2009). A mental workload study on the 2d and 3d viewing conditions of the da Vinci surgical robot. Proceedings of the Human Factors and Ergonomics Society, 53, 1186-1190. [abstract & access]
Klein, M.I., Riley, M.A., Warm, J.S., & Matthews, G. (2005). Perceived mental workload in an endoscopic surgery simulator. Proceedings of the Human Factors and Ergonomics Society, 49, 1014-1018. [abstract & access]
Klein, M.I., Warm, J.S., Riley, M.A., Matthews, G., Gaitonde, K., Donovan, J.F., & Doarn, C.R. (2008). Performance, stress, workload, and coping profiles in 1st year medical students' interaction with endoscopic/laparoscopic and robot-assisted surgical techniques. Proceedings of the Human Factors and Ergonomics Society, 52, 885-889. [abstract & access]
Phillips, J.B., & Boles, D.B. (2004). Multiple Resources Questionnaire and Workload Profile: Application of competing models to subjective workload measurement. Proceedings of the Human Factors and Ergonomics Society, 48, 1963-1967. [abstract & access] [Note: Incorrect conclusions regarding workload metrics were drawn in this article because of a spreadsheet error.]
Vidulich, M.A., & Tsang, P.S. (2007). Methodological and theoretical concerns in multitask performance: A critique of Boles, Bursk, Phillips, and Perdelwitz. Human Factors, 49, 46-49. [access]
Wickens, C.D. (1984). Engineering Psychology and Human Performance. Columbus, OH, Charles E. Merrill Publishing Co.
Wickens, C.D. (2007). How many resources and how to identify them? Commentary on Boles et al. and Vidulich and Tsang. Human Factors, 49, 53-56. [access]
ver. 5/10