Measuring Intrinsic Quality of Human Decisions

Tamal T. Biswas

2015

Abstract

Research on judging decisions made by fallible (human) agents is less advanced than research on finding optimal decisions and on human supervision of AI agents' decisions. Human decisions are often influenced by factors such as risk, uncertainty, time pressure, and depth of cognitive capability, whereas decisions by an AI agent can be effectively optimal without these limitations. The concept of 'depth', a well-defined term in game theory (including chess), does not have a clear formulation in decision theory. To quantify 'depth' in decision theory, we can configure an AI agent of supreme competence to 'think' at depths beyond the capability of any human, and in the process collect evaluations of decisions at various depths. One research goal is to create an intrinsic measure of the depth of thinking required to answer certain test questions, toward a reliable means of assessing their difficulty apart from item-response statistics. We relate the depth of cognition by humans to depths of search, and use this information to infer the quality of the decisions made, so as to judge decision-makers from their decisions. Our research extends the model of Regan and Haworth to quantify depth, plus related measures of complexity and difficulty, in the context of chess. We use large data sets from real chess tournaments and evaluations from chess programs (AI agents) whose strength exceeds that of all human players. We then seek to transfer the results to other decision-making fields in which effectively optimal judgements can be obtained from hindsight, answer banks, or powerful AI agents. In some applications, such as multiple-choice tests, we establish an isomorphism of the underlying mathematical quantities, which induces a correspondence between various measurement theories and the chess model. We provide results toward the objective of applying the correspondence in reverse to obtain and quantify measures of depth and difficulty for multiple-choice tests, stock market trading, and other real-world applications.
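As a rough illustration of collecting evaluations at various depths, the following Python sketch takes a table of per-depth evaluations for each candidate decision and reports the shallowest depth from which the eventual best option stays on top. This is only one simple proxy for the depth a decision requires; it is not the measure defined in the paper. The function name `stabilization_depth`, the option labels, and the numbers are all hypothetical.

```python
from typing import Dict, List


def stabilization_depth(evals_by_depth: List[Dict[str, float]]) -> int:
    """Return the smallest 1-based depth from which the option rated best
    at the deepest level is also rated best at every deeper level.

    evals_by_depth[i] maps each candidate option to its evaluation at
    depth i + 1 (higher is better). This is an illustrative proxy only.
    """
    if not evals_by_depth:
        raise ValueError("need evaluations for at least one depth")

    # Best-rated option at each depth (ties go to the first key listed).
    best_at = [max(evals, key=evals.get) for evals in evals_by_depth]
    final_best = best_at[-1]

    # Walk back from the deepest level while the choice is unchanged.
    depth = len(best_at)
    while depth > 1 and best_at[depth - 2] == final_best:
        depth -= 1
    return depth


# Hypothetical evaluations of three options at search depths 1 to 4.
example = [
    {"a": 0.50, "b": 0.20, "c": 0.10},   # depth 1: "a" looks best
    {"a": 0.40, "b": 0.30, "c": 0.10},   # depth 2: "a" still ahead
    {"a": 0.10, "b": 0.35, "c": 0.05},   # depth 3: "b" overtakes
    {"a": -0.20, "b": 0.40, "c": 0.00},  # depth 4: "b" confirmed
]
print(stabilization_depth(example))
```

For this made-up table the function returns 3: option "a" looks best at shallow depths, and only from depth 3 onward does the deepest search's choice "b" lead. A decision-maker who picks "a" here could be read as having reasoned to a depth below 3, which is the kind of inference the abstract describes.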

References

Allis, L. V. (1994). Searching for solutions in games and artificial intelligence. PhD thesis, Rijksuniversiteit Maastricht, Maastricht, the Netherlands.
Andersen, E. (1973). Conditional inference for multiple-choice questionnaires. Brit. J. Math. Stat. Psych., 26:31-44.
Andrich, D. (1978). A rating scale formulation for ordered response categories. Psychometrika, 43:561-573.
Andrich, D. (1988). Rasch Models for Measurement. Sage Publications, Beverly Hills, California.
Baker, F. (2004). Item response theory: parameter estimation techniques. Marcel Dekker, New York.
Baker, F. B. (2001). The Basics of Item Response Theory. ERIC Clearinghouse on Assessment and Evaluation.
Biswas, T. and Regan, K. (2015). Quantifying depth and complexity of thinking and knowledge. In Proceedings, International Conference on Agents and Artificial Intelligence (ICAART).
Busemeyer, J. R. and Townsend, J. T. (1993). Decision field theory: a dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100(3):432.
Chabris, C. and Hearst, E. (2003). Visualization, pattern recognition, and forward search: Effects of playing speed and sight of the position on grandmaster chess errors. Cognitive Science, 27:637-648.
DiFatta, G., Haworth, G., and Regan, K. (2009). Skill rating by Bayesian inference. In Proceedings, 2009 IEEE Symposium on Computational Intelligence and Data Mining (CIDM'09), Nashville, TN, March 30-April 2, 2009, pages 89-94.
Fox, C. R. (1999). Strength of evidence, judged probability, and choice under uncertainty. Cognitive Psychology, 38(1):167-189.
Fox, C. R. and Tversky, A. (1998). A belief-based account of decision under uncertainty. Management Science, 44(7):879-895.
Gladwell, M. (2002). The tipping point: how little things can make a big difference. Back Bay Books, Boston.
Gladwell, M. (2011). Outliers: the story of success. Back Bay Books, New York.
Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica: Journal of the Econometric Society, pages 263-291.
Linacre, J. M. (2006). Rasch analysis of rank-ordered data. Journal of Applied Measurement.
Loomes, G. and Sugden, R. (1982). Regret theory: An alternative theory of rational choice under uncertainty. The Economic Journal, pages 805-824.
Lopes, L. L. (1987). Between hope and fear: The psychology of risk. Advances in Experimental Social Psychology, 20:255-295.
Maas, H. v. d. and Wagenmakers, E.-J. (2005). A psychometric analysis of chess expertise. American Journal of Psychology, 118:29-60.
Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47:149-174.
Mauboussin, M. (2013). More than you know: finding financial wisdom in unconventional places. Columbia University Press, New York.
Morris, G. A., Branum-Martin, L., Harshman, N., Baker, S. D., Mazur, E., Dutta, S. N., Mzoughi, T., and McCauley, V. (2005). Testing the test: Item response curves and test quality. Am. J. Phys., 74:449-453.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2):159-176.
Ostini, R. and Nering, M. (2006). Polytomous Item Response Theory Models. Sage Publications, Thousand Oaks, California.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research, Copenhagen.
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings, Fourth Berkeley Symposium on Mathematical Statistics and Probability, pages 321-334. University of California Press.
Regan, K. and Biswas, T. (2013). Psychometric modeling of decision making via game play. In Proceedings, IEEE Conference on Computational Intelligence in Games.
Regan, K., Biswas, T., and Zhou, J. (2014). Human and computer preferences at chess. In Proceedings of the 8th Multidisciplinary Workshop on Advances in Preference Handling (MPref 2014).
Regan, K. and Haworth, G. (2011). Intrinsic chess ratings. In Proceedings of AAAI 2011, San Francisco.
Regan, K., Macieja, B., and Haworth, G. (2011). Understanding distributions of chess performances. In Proceedings of the 13th ICGA Conference on Advances in Computer Games. Tilburg, Netherlands.
Shannon, C. E. (1950). XXII. Programming a computer for playing chess. Philosophical Magazine, 41(314):256-275.
Thorpe, G. L. and Favia, A. (2012). Data analysis using item response theory methodology: An introduction to selected programs and applications. Psychology Faculty Scholarship, page 20.
Tversky, A. and Fox, C. R. (1995). Weighing risk and uncertainty. Psychological Review, 102(2):269.
Wichmann, F. and Hill, N. J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception and Psychophysics, 63:1293-1313.
WikiBooks (2012). Bestiary of behavioral economics/satisficing - Wikibooks, the free textbook project. [Online; accessed 7-August-2014].

Paper Citation

in Harvard Style

Biswas, T. T. (2015). Measuring Intrinsic Quality of Human Decisions. In Doctoral Consortium - DCAART, (ICAART 2015), pages 40-51.

in Bibtex Style

@conference{dcaart15,
author={Tamal T. Biswas},
title={Measuring Intrinsic Quality of Human Decisions},
booktitle={Doctoral Consortium - DCAART, (ICAART 2015)},
year={2015},
pages={40-51},
publisher={SciTePress},
organization={INSTICC},
doi={},
isbn={},
}

in EndNote Style

TY - CONF
JO - Doctoral Consortium - DCAART, (ICAART 2015)
TI - Measuring Intrinsic Quality of Human Decisions
SN -
AU - Biswas, Tamal T.
PY - 2015
SP - 40
EP - 51
DO -