Convolutional Neural Network Models for Subcortical Processing of Facial Expression

Lim, Chanseok, Osaka University. DOI: 10.18910/92999

2023.09.25

Overview

Title: Convolutional Neural Network Models for Subcortical Processing of Facial Expression
Author(s): Lim, Chanseok
Citation: Osaka University, 2023, doctoral dissertation
Version Type: VoR
URL: https://doi.org/10.18910/92999
Rights: Reproduced with permission from Springer Nature

Osaka University Knowledge Archive : OUKA
https://ir.library.osaka-u.ac.jp/
Osaka University

Form 3: Abstract of Thesis

Name: Lim, Chanseok (林 燦碩)

Title: Convolutional Neural Network Models for Subcortical Processing of Facial Expression
Japanese title: 皮質下顔表情処理のための畳込みニューラルネットワークモデル

Abstract of Thesis
Perception of facial expression is crucial in the social life of primates. This visual information is processed along the ventral
cortical pathway and the subcortical pathway. They process information in parallel, and finally meet at the amygdala. The
ventral cortical pathway consists of a network of areas in the occipito-temporal region of the cerebral cortex (e.g., visual areas
V1, V2, V4, and inferior temporal cortex, IT), and its processing of face information is slow but precise. The subcortical
pathway is composed of a few processing stages in phylogenetically ancient regions: the superior colliculus in the midbrain,
the pulvinar nucleus of the posterior thalamus, and the amygdala in the medial limbic system. Subcortical processing is
fast but coarse. Although the difference in processing speed is explained by the difference in the number of processing stages
between the two pathways, it is unclear whether the difference in the number of stages also leads to the difference in
processing granularity. What computational properties of the subcortical pathway make its processing coarse-grained? What
computational models faithfully mimic subcortical processing?
To address these questions, I constructed convolutional neural networks incorporating three prominent properties of the
subcortical pathway: a shallow layer architecture, concentric receptive fields at the first stage, and a greater degree of spatial
pooling. I trained these networks, referred to as shallow neural networks (SNNs), and their modified versions to classify seven
facial expressions (angry, disgusted, fearful, happy, sad, surprised, and neutral), analyzed their performance, and examined the
internal representation of spatial frequency (SF) information across computational units of the final processing layer.
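The contrast between concentric first-stage receptive fields and the oriented Gabor-type filters of the cortical pathway can be illustrated with a minimal NumPy sketch. The abstract does not give kernel sizes or parameters, so the function names (`dog_kernel`, `gabor_kernel`) and all numeric values below are assumptions chosen only for illustration, not the settings used in the thesis.

```python
import numpy as np

def _grid(size):
    """Return x, y coordinate grids centred on the middle of a size x size patch."""
    half = (size - 1) / 2.0
    coords = np.arange(size) - half
    return np.meshgrid(coords, coords)

def dog_kernel(size=9, sigma_center=1.0, sigma_surround=2.0):
    """Difference-of-Gaussians: narrow positive centre minus broad surround.

    All parameter values are illustrative assumptions.
    """
    x, y = _grid(size)
    r2 = x**2 + y**2
    center = np.exp(-r2 / (2 * sigma_center**2)) / (2 * np.pi * sigma_center**2)
    surround = np.exp(-r2 / (2 * sigma_surround**2)) / (2 * np.pi * sigma_surround**2)
    return center - surround

def gabor_kernel(size=9, sigma=2.0, wavelength=4.0, theta=0.0):
    """Gabor: sinusoidal carrier at orientation theta under a Gaussian envelope.

    All parameter values are illustrative assumptions.
    """
    x, y = _grid(size)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * x_rot / wavelength)

dog = dog_kernel()
gabor = gabor_kernel()

# A concentric kernel is unchanged by a 90-degree rotation; an oriented one is not.
dog_is_rotation_symmetric = np.allclose(dog, np.rot90(dog))
gabor_is_rotation_symmetric = np.allclose(gabor, np.rot90(gabor))
print(dog_is_rotation_symmetric, gabor_is_rotation_symmetric)
```

The rotation check is only a convenient way to show that the DoG kernel carries no orientation preference, whereas the Gabor kernel does.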
The SNNs could be trained to classify the seven facial expressions with a correct rate of 51% (chance level, 14%). This
performance was well above chance but substantially below perfect. It improved gradually when the three properties were
replaced, one at a time, two together, or all three together, with the corresponding features of the cortical
pathway: an additional convolution layer, Gabor-type filters at the first convolution layer, and narrower pooling windows. The
results indicate that all three subcortical features are essential for the coarse processing. The effects of the three features on
classification performance were partially additive, suggesting that the three features exerted their effects partially
independently.
A previous study (Inagaki and Fujita, 2011) revealed a prominent difference between the two pathways in the reference frame of
neuronal tuning to SFs. Neurons in the IT, the final stage of the ventral cortical pathway, are tuned to object-based SFs
(cycles/object) and represent face patterns in a size-invariant, hence distance-invariant, manner. In contrast, responses of a
major population of amygdala neurons are affected by retina-based SFs (cycles/degree). Some units in the final layer of the
SNNs were sensitive to SFs in the retina-based reference frame, whereas others were sensitive to object-based SFs, in a similar
way to neurons in the amygdala. Replacement of any one of the three properties changed the reference frames of units in the
final layer. The modified models with added layers or Gabor-type filters had fewer units with retina-based SF selectivity.
On the other hand, units in the models with shallowness, DoG-type (difference-of-Gaussians) filters, and narrow pooling showed
various selectivities between the two reference frames. The results suggest that both the shallow architecture and DoG-type
filters were necessary for preserving the sensitivity of the final-layer units to retina-based SFs, and that a larger pooling
window hindered the creation of intermediate representations between the two reference frames.
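The two reference frames are linked by the angular size of the object: cycles/object equals cycles/degree multiplied by the object's extent in degrees of visual angle. A short sketch of that conversion follows. The function names and the example values (a face pattern of 8 cycles/object viewed at 4 and 8 degrees) are hypothetical and are not taken from the thesis.

```python
def object_based_sf(retinal_sf_cpd, object_size_deg):
    """Convert retina-based SF (cycles/degree) to object-based SF (cycles/object)."""
    return retinal_sf_cpd * object_size_deg

def retina_based_sf(object_sf_cpo, object_size_deg):
    """Convert object-based SF (cycles/object) to retina-based SF (cycles/degree)."""
    return object_sf_cpo / object_size_deg

# Hypothetical example: the same face pattern (8 cycles across the face)
# seen at two angular sizes, e.g. because the viewing distance changed.
face_sf_cpo = 8.0
retinal_sfs = [retina_based_sf(face_sf_cpo, size_deg) for size_deg in (4.0, 8.0)]
print(retinal_sfs)
```

A unit tuned in the object-based frame would respond to this pattern at both sizes, while a unit tuned in the retina-based frame would see two different SFs and respond differently.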

In the SNNs and the narrow-pooling models, units tuned to low SFs encoded object-based SFs, whereas units tuned to high
SFs encoded retina-based SFs. Reasoning that a non-linear operation causes these effects on the unit responses, I
developed a formal description of max pooling. My mathematical analysis revealed that the shift invariance produced by the max
pooling operation led to size and homogenous invariance. I verified this effect experimentally: outputs from the
pooling layer were more similar across different sizes (i.e., became size-invariant) than the inputs, only when the inputs
were composed of low SFs.
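The shift invariance that this analysis starts from can be seen in a toy one-dimensional case. The sketch below is not the formal description developed in the thesis; it assumes non-overlapping pooling windows and uses a hypothetical helper `max_pool_1d` and made-up signals. It shows only that a small shift inside a pooling window leaves the pooled output unchanged, while a shift across a window boundary does not.

```python
import numpy as np

def max_pool_1d(x, window):
    """Non-overlapping 1-D max pooling; samples that do not fill a window are dropped."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // window) * window
    return x[:n].reshape(-1, window).max(axis=1)

# Toy input: a single peak at index 5 in a 16-sample signal (illustrative values).
signal = np.zeros(16)
signal[5] = 1.0

small_shift = np.roll(signal, 1)   # peak moves to index 6, same window (4..7)
large_shift = np.roll(signal, 3)   # peak moves to index 8, next window (8..11)

pooled = max_pool_1d(signal, window=4)
pooled_small = max_pool_1d(small_shift, window=4)
pooled_large = max_pool_1d(large_shift, window=4)

print(pooled, pooled_small, pooled_large)
```

Because the invariance holds only within a window, wider pooling windows tolerate larger shifts, which fits the role the abstract assigns to the degree of spatial pooling.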
I provided the first computational model for facial expression processing in the subcortical pathway. Despite the celebrated
success of deep neural networks (DNNs) in modeling visual processing in the ventral cortical pathway, it has remained unclear
whether and how the convolutional neural network architecture can be adapted to processing in the subcortical pathway. I
demonstrated that the SNNs implemented with the three computational properties of the subcortical pathway successfully learn
facial expression discrimination with a modest correct rate. The three properties are all essential for reproducing the modest
performance of V1-lesioned patients, who discriminate facial expressions with the subcortical pathway. These properties are also
necessary for reproducing the representation of SFs in retina-based coordinates observed in a population of amygdala neurons.
Research interest in the role of subcortical structures in cognitive functions has recently surged, but physiological data are
still much sparser for the subcortical structures than for the cerebral cortex. Computational approaches such as the one I
present here are expected to compensate for the sparseness of the data and guide future research.

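Form 7

Summary of Thesis Examination Results and Examiners

Name: Lim, Chanseok (林 燦碩)

Thesis examiners (role, position, name)
Chief examiner: Professor 西本 伸志
Co-examiner: Professor 北澤 茂
Co-examiner: Professor 八木 健
Co-examiner: Associate Professor 田村 弘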

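Summary of Thesis Examination Results
This thesis uses computational models to analyze and discuss the processing of face information in the subcortical visual
pathway, which consists of phylogenetically old regions (superior colliculus, pulvinar, and amygdala). Compared with the
cortical visual pathway, the subcortical visual pathway has a shallower hierarchical structure, a concentric representation
of information in its early layer, and broader spatial pooling. In this thesis, convolutional neural networks with these
properties were constructed and trained, and the characteristics acquired by the networks were analyzed quantitatively. The
results show that the networks reproduce physiological findings on face representation and spatial-frequency selectivity, that
the handling of visual information in the low spatial-frequency band is important, and that the above properties play an
important role in the acquisition of these network characteristics.
This thesis brings new insight into the computational theory of the subcortical visual pathway and is recognized as worthy of
the conferral of a doctoral degree.
In addition, the thesis has been checked for plagiarism, missing citations, duplicate submission, and similar issues using the
checking tool "iThenticate".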
