Decoding reward–curiosity conflict in decision-making from irrational behaviors

Konaka, Yuki; Naoki, Honda. Kyoto University. DOI: 10.1038/s43588-023-00439-w

2023.05

Abstract

Humans and animals are not always rational. They not only rationally exploit rewards but also explore an environment owing to their curiosity. However, the mechanism of such curiosity-driven irrational behavior is largely unknown. Here, we developed a decision-making model for a two-choice task based on the free energy principle, which is a theory integrating recognition and action selection. The model describes irrational behaviors depending on the curiosity level. We also proposed a machine learning method to decode temporal curiosity from behavioral data. By applying it to rat behavioral data, we found that the rat had negative curiosity, reflecting conservative selection sticking to more certain options and that the level of curiosity was upregulated by the expected future information obtained from an uncertain environment. Our decoding approach can be a fundamental tool for identifying the neural basis for reward–curiosity conflicts. Furthermore, it could be effective in diagnosing mental disorders.

References

Estimation of parameters in iFEP

The ReCU model has several parameters: $\sigma_w^2$, $\alpha$, $\beta$, $P_o$ and $\epsilon$. In the estimation, we set $\epsilon$ to 1, which was the optimal value for estimation in the artificial data (Supplementary Fig. 3). We assumed unit reward intensity, that is, $\ln\frac{P_o}{1 - P_o} = 1$, because $P_o$ and $\beta$ cannot both be estimated: they enter the expected net utility only through the product of $\beta$ and $\ln\frac{P_o}{1 - P_o}$ (equations (30) and (42)). This treatment is suitable for relative comparison between the curiosity meta-parameter and the reward. In addition, because $\beta$ also multiplies $c_t$ in the expected net utility (equations (30) and (42)), we treated $\beta c_t$ as a single latent variable, $\hat{c}_t = \beta c_t$. Thus, the estimate of $c_t$ can be obtained by dividing the estimated $\hat{c}_t$ by the estimated $\beta$. Therefore, the hyperparameters to be estimated were $\sigma_w^2$, $\alpha$ and $\beta$.
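The identifiability argument above can be illustrated with a small sketch. It is not the authors' code: `reward_term` is a hypothetical stand-in for the only way the text says $\beta$ and $P_o$ enter the expected net utility (as a product with the log-odds), and every function name and numerical value below is an assumption chosen for illustration.

```python
import math

def log_odds(p):
    """Log-odds ln(p / (1 - p)) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def prob_from_log_odds(l):
    """Inverse of log_odds: the probability whose log-odds equal l."""
    return 1.0 / (1.0 + math.exp(-l))

def reward_term(beta, p_o):
    """Hypothetical stand-in: beta and P_o appear only as this product."""
    return beta * log_odds(p_o)

def recover_curiosity(c_hat, beta):
    """Recover c_t from the latent variable c_hat_t = beta * c_t."""
    return [c / beta for c in c_hat]

# Two different (beta, P_o) pairs give the same product, so behavior that
# depends only on the product cannot tell them apart.
term_a = reward_term(2.0, prob_from_log_odds(1.0))
term_b = reward_term(1.0, prob_from_log_odds(2.0))

# Fixing the log-odds to 1 (unit reward intensity) pins P_o at e / (1 + e),
# about 0.731, and leaves beta as the only free scale.
p_o_unit = prob_from_log_odds(1.0)

# Illustrative estimates (made-up numbers): c_t = c_hat_t / beta.
c_t = recover_curiosity([-4.0, -2.0, 1.0], beta=2.0)
```

With the log-odds fixed, an estimated $\beta$ is best read as a reward weight relative to that unit, which matches the "relative comparison" remark in the text.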

To estimate these parameters $\theta = \{\sigma_w^2, \alpha, \beta\}$, we extended the observer-SSM to a self-organizing SSM (ref. 44) in which $\theta$ was treated as constant latent variables:

$$P(z_t, \theta \mid x_{1:t}) \propto P(x_t \mid z_t) \int P(z_t \mid z_{t-1}, \theta)\, P(z_{t-1}, \theta \mid x_{1:t-1})\, \mathrm{d}z_{t-1}, \tag{78}$$

where $P(\theta) = \mathrm{Uni}(\sigma_w^2 \mid a_\sigma, b_\sigma)\, \mathrm{Uni}(\alpha \mid a_\alpha, b_\alpha)\, \mathcal{N}(\beta \mid m_\beta, v_\beta)$. To sequentially calculate the posterior $P(z_t, \theta \mid x_{1:t})$ using the particle filter, we used 100,000 particles and augmented the state vector of every particle with the parameter $\theta$, which was sampled randomly at initialization and not updated afterwards.
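The augmented-state idea can be sketched on a toy model. What follows is a minimal bootstrap particle filter under stated assumptions, not the ReCU or iFEP implementation: the latent state is a scalar Gaussian random walk, the observation is a binary choice with a logistic likelihood, and the only static parameter is the random-walk variance. The function names, the toy model and the default particle count are hypothetical. Only the uniform prior bounds (0.2 and 0.7) are borrowed from the hyperparameters reported in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def self_organizing_pf(x, n_particles=5000, a_sigma=0.2, b_sigma=0.7, seed=0):
    """Bootstrap particle filter over the augmented state (z_t, sigma2).

    Toy model (an assumption, not the paper's):
      z_t ~ Normal(z_{t-1}, sigma2),  x_t ~ Bernoulli(sigmoid(z_t)).
    sigma2 is drawn once from Uniform(a_sigma, b_sigma) and never perturbed.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 1.0, size=n_particles)                 # initial latent states
    sigma2 = rng.uniform(a_sigma, b_sigma, size=n_particles)   # static parameter
    z_means = np.empty(len(x))
    for t, x_t in enumerate(x):
        # Predict: propagate each particle with its own parameter value.
        z = z + rng.normal(0.0, np.sqrt(sigma2))
        # Weight: likelihood P(x_t | z_t) of the observed binary choice.
        p = sigmoid(z)
        w = p if x_t == 1 else 1.0 - p
        w = w / w.sum()
        # Resample state and parameter together so they stay paired.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        z, sigma2 = z[idx], sigma2[idx]
        z_means[t] = z.mean()
    return z_means, sigma2

# Usage on a short made-up choice sequence.
choices = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])
z_means, sigma2_particles = self_organizing_pf(choices)
sigma2_estimate = sigma2_particles.mean()
```

Because the parameter values are never perturbed, resampling can only thin the initial set of values, so a large particle count helps keep enough distinct parameter values alive until the end of the sequence.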

The hyperparameter values used in this estimation were $\mu_0 = 0$, $\sigma_\mu^2 = 0.012$, $a_g = 10$, $b_g = 0.001$, $a_u = -15$, $b_u = 15$, $a_\sigma = 0.2$, $b_\sigma = 0.7$, $a_\alpha = 0.04$, $b_\alpha = 0.06$, $a_\beta = 0$ and $b_\beta = 50$. These values were chosen heuristically so that the parameters were correctly estimated from the artificial data (Supplementary Fig. 2).

Statistical testing with Monte Carlo simulations

Supplementary Fig. 5 shows the statistical test of the negative curiosity estimated in Fig. 5. The null hypothesis is that the agent has no curiosity (that is, $c_t = 0$) and makes its choice depending only on its recognition of the reward probability. Under the null hypothesis, model simulations were repeated 1,000 times under the same experimental conditions as in Fig. 5, and the curiosity was estimated for each simulation using iFEP. We adopted the temporal average of the estimated curiosity as the test statistic and plotted its null distribution. By comparing this distribution with the curiosity estimated from the rat behavior, we computed the P value for a one-sided left-tailed test.
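The last step of this procedure, turning a simulated null distribution into a left-tailed P value, can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: the null statistics here are synthetic Gaussian averages standing in for iFEP estimates from null-model simulations, the observed value is made up, and the plain-fraction convention is one common choice (the text does not say which convention was used).

```python
import numpy as np

def temporal_average(curiosity_trace):
    """Test statistic: the time average of an estimated curiosity trace."""
    return float(np.mean(curiosity_trace))

def left_tailed_p_value(null_stats, observed):
    """Fraction of null statistics that are less than or equal to the observed one."""
    null_stats = np.asarray(null_stats, dtype=float)
    return float(np.mean(null_stats <= observed))

# Synthetic stand-in for 1,000 null simulations (hypothetical data):
# each trace is noise around zero, mimicking estimates when c_t = 0.
rng = np.random.default_rng(0)
null_stats = [temporal_average(rng.normal(0.0, 1.0, size=200)) for _ in range(1000)]

observed_stat = -0.5  # made-up observed temporal average
p_value = left_tailed_p_value(null_stats, observed_stat)
```

A small P value under this convention means that few null simulations produced an average curiosity as negative as the observed one.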

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Source data for Figs. 2, 3, 5 and 6 are available with this paper. Source

data for Supplementary Figures are available in Supplementary Data.

Nature Computational Science | Volume 3 | May 2023 | 418–432

1. Helmholtz, H. Handbuch der Physiologischen Optik (Andesite Press, 1867).
2. Yuille, A. & Kersten, D. Vision as Bayesian inference: analysis by synthesis? Trends Cogn. Sci. 10, 301–308 (2006).
3. Millett, J. D. & Simon, H. A. Administrative behavior: a study of decision-making processes in administrative organization. Polit. Sci. Q. 62, 621 (1947).
4. Dubey, R. & Griffiths, T. L. Understanding exploration in humans and machines by formalizing the function of curiosity. Curr. Opin. Behav. Sci. 35, 118–124 (2020).
5. Kidd, C. & Hayden, B. Y. The psychology and neuroscience of curiosity. Neuron 88, 449–460 (2015).
6. Klein, U. & Nowak, A. J. Characteristics of patients with autistic disorder (AD) presenting for dental treatment: a survey and chart review. Spec. Care Dentist. 19, 200–207 (1999).
7. Lockner, D. W., Crowe, T. K. & Skipper, B. J. Dietary intake and parents’ perception of mealtime behaviors in preschool-age children with autism spectrum disorder and in typically developing children. J. Am. Diet. Assoc. 108, 1360–1363 (2008).
8. Schreck, K. A. & Williams, K. Food preferences and factors influencing food selectivity for children with autism spectrum disorders. Res. Dev. Disabil. 27, 353–363 (2006).
9. Esposito, M. et al. Sensory processing, gastrointestinal symptoms and parental feeding practices in the explanation of food selectivity: clustering children with and without autism. Int. J. Autism Relat. Disabil. 2, 1–12 (2019).
10. Hobson, R. P. Autism and the development of mind. Essays Dev. Psychol. (Routledge, 1993).
11. Burke, R. Personalized recommendation of PoIs to people with autism. Commun. ACM 65, 100 (2022).
12. Ghanizadeh, A. Educating and counseling of parents of children with attention-deficit hyperactivity disorder. Patient Educ. Couns. 68, 23–28 (2007).
13. Sedgwick, J. A., Merwood, A. & Asherson, P. The positive aspects of attention deficit hyperactivity disorder: a qualitative investigation of successful adults with ADHD. ADHD Atten. Deficit Hyperact. Disord. 11, 241–253 (2019).
14. Redshaw, R. & McCormack, L. ‘Being ADHD’: a qualitative study. Adv. Neurodev. Disord. 6, 20–28 (2022).
15. Sutton, R. S. & Barto, A. G. Reinforcement Learning: an Introduction (MIT Press, 1998).
16. Friston, K. A theory of cortical responses. Philos. Trans. R. Soc. B 360, 815–836 (2005).
17. Friston, K., Kilner, J. & Harrison, L. A free energy principle for the brain. J. Physiol. Paris 100, 70–87 (2006).
18. Friston, K. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
19. Lindley, D. V. On a measure of the information provided by an experiment. Ann. Math. Stat. 27, 986–1005 (1956).
20. MacKay, D. J. C. Information-based objective functions for active data selection. Neural Comput. 4, 590–604 (1992).
21. Berger, J. O. Statistical Decision Theory and Bayesian Analysis, Springer Series in Statistics (Springer, 2011).

22. Friston, K. et al. Active inference and epistemic value. Cogn. Neurosci. 6, 187–214 (2015).
23. Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P. & Pezzulo, G. Active inference: a process theory. Neural Comput. 29, 1–49 (2017).
24. Attias, H. Planning by probabilistic inference in Proc. 9th Int. Work. Artif. Intell. Stat. 4, 9–16 (2003).
25. Botvinick, M. & Toussaint, M. Planning as inference. Trends Cogn. Sci. 16, 485–488 (2012).
26. Kaplan, R. & Friston, K. J. Planning and navigation as active inference. Biol. Cybern. 112, 323–343 (2018).
27. Matsumoto, T. & Tani, J. Goal-directed planning for habituated agents by active inference using a variational recurrent neural network. Entropy 22, (2020).
28. Schwartenbeck, P. et al. Computational mechanisms of curiosity and goal-directed exploration. eLife 8, 1–45 (2019).
29. Millidge, B., Tschantz, A. & Buckley, C. L. Whence the expected free energy? Neural Comput. 33, 447–482 (2021).
30. Houthooft, R. et al. VIME: variational information maximizing exploration. Adv. Neural Inf. Process. Syst. 0, 1117–1125 (2016).
31. Smith, R. et al. Greater decision uncertainty characterizes a transdiagnostic patient sample during approach-avoidance conflict: a computational modelling approach. J. Psychiatry Neurosci. 46, E74–E87 (2021).
32. Smith, R. et al. Long-term stability of computational parameters during approach-avoidance conflict in a transdiagnostic psychiatric patient sample. Sci. Rep. 11, 1–13 (2021).
33. Schwartenbeck, P. & Friston, K. Computational phenotyping in psychiatry: a worked example. eNeuro 3, 1–18 (2016).
34. Daunizeau, J. et al. Observing the observer (I): meta-Bayesian models of learning and decision-making. PLoS ONE 5, e15554 (2010).
35. Patzelt, E. H., Hartley, C. A. & Gershman, S. J. Computational phenotyping: using models to understand individual differences in personality, development, and mental illness. Personal. Neurosci. 1, e18 (2018).
36. Ito, M. & Doya, K. Validation of decision-making models and analysis of decision variables in the rat basal ganglia. J. Neurosci. 29, 9861–9874 (2009).
37. Samejima, K., Doya, K., Ueda, Y. & Kimura, M. Estimating internal variables and parameters of a learning agent by a particle filter. Adv. Neural Inf. Process. Syst. 16 (2003).
38. Samejima, K., Ueda, Y., Doya, K. & Kimura, M. Neuroscience: representation of action-specific reward values in the striatum. Science 310, 1337–1340 (2005).
39. Ortega, P. A. & Braun, D. A. Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. London. A 469, 20120683 (2013).
40. Gottwald, S. & Braun, D. A. The two kinds of free energy and the Bayesian revolution. PLoS Comput. Biol. 16, (2020).
41. Parr, T. & Friston, K. J. Generalised free energy and active inference. Biol. Cybern. 113, 495–513 (2019).
42. Kitagawa, G. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Stat. 5, 1–25 (1996).
43. Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).
44. Kitagawa, G. A self-organizing state-space model. J. Am. Stat. 93, 1203–1215 (1998).
45. Konaka, Y. & Naoki, H. Codes for Konaka and Honda 2023. Zenodo https://doi.org/10.5281/zenodo.7722905 (2023).

Acknowledgements

We are grateful to K. Doya and M. Ito for providing rat behavioral data. We thank the organizers of the tutorial on the free energy principle in 2019, which inspired this research, and I. Higashino and M. Fujiwara-Yada for carefully checking all the equations in the manuscript. This study was supported in part by a Grant-in-Aid for Transformative Research Areas (B) (no. 21H05170), AMED (grant no. JP21wm0425010), Moonshot R&D–MILLENNIA program (grant no. JPMJMS2024-9) by JST, the Cooperative Study Program of Exploratory Research Center on Life and Living Systems (ExCELLS) (program no. 21-102) and the grant of Joint Research by the National Institutes of Natural Sciences (NINS program no. 01112102).

Author contributions

H.N. conceived of the project. Y.K. and H.N. developed the method, and Y.K. implemented the model simulation. Y.K. and H.N. wrote the manuscript.

Competing interests

The authors declare no competing interests.

Additional information

Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s43588-023-00439-w.

Correspondence and requests for materials should be addressed to Honda Naoki.

Peer review information Nature Computational Science thanks Junichiro Yoshimoto and Karl Friston for their contribution to the peer review of this work. Primary Handling Editors: Ananya Rastogi and Jie Pan, in collaboration with the Nature Computational Science team.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
