Convolutional Neural Network Models for Subcortical Processing of Facial Expression
Overview
Title
Convolutional Neural Network Models for Subcortical Processing of Facial Expression
Author(s)
Lim, Chanseok
Citation
Osaka University, 2023, doctoral dissertation
Version Type
VoR
URL
https://doi.org/10.18910/92999
rights
Reproduced with permission from Springer Nature
Note
Osaka University Knowledge Archive : OUKA
https://ir.library.osaka-u.ac.jp/
Osaka University
Form 3
Abstract of Thesis
Name: Lim, Chanseok (林 燦碩)
Thesis title
Convolutional Neural Network Models for Subcortical Processing of Facial Expression
(Japanese title: 皮質下顔表情処理のための畳込みニューラルネットワークモデル)
Abstract
Perception of facial expression is crucial in the social life of primates. This visual information is processed along two
routes: the ventral cortical pathway and the subcortical pathway. The two pathways process information in parallel and
converge at the amygdala. The ventral cortical pathway consists of a network of areas in the occipito-temporal region of the
cerebral cortex (e.g., visual areas V1, V2, V4, and the inferior temporal cortex, IT), and its processing of face information
is slow but precise. The subcortical pathway is composed of a few processing stages in phylogenetically ancient regions: the
superior colliculus in the midbrain, the pulvinar nucleus of the posterior thalamus, and the amygdala in the medial limbic
system. Subcortical processing is fast but coarse. Although the difference in processing speed is explained by the difference
in the number of processing stages between the two pathways, it is unclear whether the number of stages also accounts for the
difference in processing granularity. What computational properties of the subcortical pathway make its processing
coarse-grained? What computational models faithfully mimic subcortical processing?
To address these questions, I constructed convolutional neural networks incorporating three prominent properties of the
subcortical pathway: a shallow layer architecture, concentric receptive fields at the first stage, and a greater degree of spatial
pooling. I trained these networks, referred to as shallow neural networks (SNNs), and their modified versions to classify seven
facial expressions (angry, disgusted, fearful, happy, sad, surprised, and neutral), analyzed their performance, and examined the
internal representation of spatial frequency (SF) information across computational units of the final processing layer.
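A concentric (centre-surround) receptive field of the kind described above is commonly modelled as a difference of Gaussians (DoG), and the abstract itself refers to DoG-type filters. The following is a minimal sketch of such a filter in plain Python. The kernel size and the two standard deviations are illustrative assumptions, not values taken from the thesis, and the function names are hypothetical.

```python
import math

def gaussian_2d(size, sigma):
    """Return a size x size Gaussian kernel normalised to sum to 1."""
    half = size // 2
    kernel = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
               for x in range(size)]
              for y in range(size)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def dog_kernel(size=9, sigma_center=1.0, sigma_surround=2.0):
    """Difference of Gaussians: narrow excitatory centre minus broad surround.

    All default parameter values are assumptions chosen for illustration.
    """
    center = gaussian_2d(size, sigma_center)
    surround = gaussian_2d(size, sigma_surround)
    return [[center[y][x] - surround[y][x] for x in range(size)]
            for y in range(size)]

kernel = dog_kernel()
mid = len(kernel) // 2
print("centre weight:", kernel[mid][mid])   # positive (excitatory centre)
print("edge weight:  ", kernel[mid][0])     # negative (inhibitory surround)
print("sum of weights:", sum(sum(row) for row in kernel))  # close to zero
```

Because both Gaussians are normalised before subtraction, the weights sum to approximately zero, so the filter responds to local contrast rather than to uniform luminance. Unlike a Gabor-type filter, it has no preferred orientation.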
The SNNs could be trained to classify the seven facial expressions with a correct rate of 51% (chance level, 14%). This
performance was well above chance, but substantially below perfect. The modest performance improved gradually when the
three properties were replaced, one at a time, two together, or all three together, with the corresponding features of the
cortical pathway: an additional convolution layer, Gabor-type filters at the first convolution layer, and narrower pooling
windows. The results indicate that all three subcortical features are essential for the coarse processing. The effects of the
three features on classification performance were partially additive, suggesting that the three features exerted their effects
partially independently.
A previous study (Inagaki and Fujita, 2011) revealed a prominent difference between the two pathways in the reference frame of
neuronal tuning to SFs. Neurons in IT, the final stage of the ventral cortical pathway, are tuned to object-based SFs
(cycles/object) and represent face patterns in a size-invariant, hence distance-invariant, manner. In contrast, responses of a
major population of amygdala neurons are affected by retina-based SFs (cycles/degree). Some units in the final layer of the
SNNs were sensitive to SFs in the retina-based reference frame, whereas others were sensitive to object-based SFs, similar
to neurons in the amygdala. Replacing any one of the three properties changed the reference frames of units in the
final layer. In the modified models with an added layer or Gabor-type filters, fewer units showed retina-based SF selectivity.
On the other hand, units in the models with a shallow architecture, DoG-type filters, and narrow pooling showed various
selectivities intermediate between the two reference frames. The results suggest that both the shallow architecture and the
DoG-type filters were necessary for preserving the sensitivity of the final-layer units to retina-based SFs, and that a wider
pooling window hindered the formation of representations intermediate between the two reference frames.
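The two reference frames are related through the angular size of the stimulus: an object-based SF (cycles/object) equals the retina-based SF (cycles/degree) multiplied by the object's size in degrees. The short sketch below works through this relation. The function names and the numerical values (an 8 cycles/object pattern viewed at 4 and 8 degrees) are hypothetical examples, not data from the thesis.

```python
def to_cycles_per_degree(cycles_per_object, object_size_deg):
    """Retina-based SF of a pattern, given its object-based SF and angular size."""
    return cycles_per_object / object_size_deg

def to_cycles_per_object(cycles_per_degree, object_size_deg):
    """Object-based SF of a pattern, given its retina-based SF and angular size."""
    return cycles_per_degree * object_size_deg

# Hypothetical stimulus: a face pattern at 8 cycles/object shown at two sizes.
face_sf = 8.0
sizes_deg = [4.0, 8.0]

# The same object-based SF maps to different retina-based SFs as size changes.
retinal_sfs = [to_cycles_per_degree(face_sf, s) for s in sizes_deg]

# A unit tuned to a fixed retina-based SF (here 2 cycles/degree) therefore
# prefers a different object-based SF at each stimulus size.
preferred_object_sfs = [to_cycles_per_object(2.0, s) for s in sizes_deg]

print(retinal_sfs)           # [2.0, 1.0]
print(preferred_object_sfs)  # [8.0, 16.0]
```

A unit whose preferred cycles/object stays fixed as size changes is object-based (size-invariant), whereas a unit whose preferred cycles/degree stays fixed is retina-based.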
In the SNNs and the narrow-pooling models, units tuned to low SFs encoded object-based SFs, whereas units tuned to high
SFs encoded retina-based SFs. Reasoning that a non-linear operation causes these effects on the unit responses, I
developed a formal description of max pooling. My mathematical analysis revealed that the shift invariance conferred by the
max pooling operation leads to size and homogenous invariance. I verified this effect experimentally: outputs from the
pooling layer were more similar across different sizes (i.e., became size-invariant) than the inputs, only when the inputs
were composed of low SFs.
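The shift tolerance of max pooling, on which the argument above builds, can be illustrated with a toy one-dimensional example. This is a minimal sketch and not the formal description developed in the thesis: it shows only that a wider pooling window absorbs a small shift that a narrower window does not, and it does not demonstrate the size-invariance result. The signals and window widths are assumptions chosen for illustration.

```python
def max_pool_1d(signal, window):
    """Non-overlapping max pooling (stride equal to the window width)."""
    return [max(signal[i:i + window]) for i in range(0, len(signal), window)]

# Toy input: a single peak, and the same peak shifted by one sample.
signal = [0, 1, 0, 0, 0, 0, 0, 0]
shifted = [0, 0, 1, 0, 0, 0, 0, 0]

# Narrow window (2 samples): the peak crosses a window boundary,
# so the pooled outputs differ.
print(max_pool_1d(signal, 2))   # [1, 0, 0, 0]
print(max_pool_1d(shifted, 2))  # [0, 1, 0, 0]

# Wide window (4 samples): both peaks fall in the same window,
# so the pooled outputs are identical.
print(max_pool_1d(signal, 4))   # [1, 0]
print(max_pool_1d(shifted, 4))  # [1, 0]
```

The shift is absorbed only when both peak positions fall within one pooling window, which is why a greater degree of spatial pooling tolerates larger displacements.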
I provide the first computational model for facial expression processing in the subcortical pathway. Despite the celebrated
success of deep neural networks (DNNs) in modeling visual processing in the ventral cortical pathway, it has remained unclear
whether and how the convolutional neural network architecture can be adapted to processing in the subcortical pathway. I
demonstrated that SNNs implementing the three computational properties of the subcortical pathway successfully learn
to discriminate facial expressions with a modest correct rate. The three properties are all essential for reproducing the
modest performance of V1-lesioned patients, who discriminate facial expressions via the subcortical pathway. These properties
are also necessary for reproducing the representation of SFs in retina-based coordinates observed in a population of amygdala
neurons. Research interest in the role of subcortical structures in cognitive functions has recently surged, but physiological
data are still much sparser for the subcortical structures than for the cerebral cortex. A computational approach such as the
one I present here is expected to compensate for the sparseness of the data and guide future research.
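Form 7
Summary of Thesis Examination Results and Examiners
Name: Lim, Chanseok (林 燦碩)
Thesis examiners (role, position, name)
Chief examiner, Professor, 西本 伸志
Co-examiner, Professor, 北澤 茂
Co-examiner, Professor, 八木 健
Co-examiner, Associate Professor, 田村 弘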
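Summary of thesis examination results
This thesis uses computational models to analyze and discuss the processing of face information in the subcortical visual
pathway, which consists of phylogenetically old regions (superior colliculus, pulvinar, and amygdala). Compared with the
cortical visual pathway, the subcortical visual pathway has a shallower hierarchical structure, a concentric information
representation in its early layer, and broader spatial pooling, among other properties. In this thesis, convolutional neural
networks with these properties were constructed and trained, and the characteristics acquired by the networks were analyzed
quantitatively. The results show that the networks reproduce physiological findings on face representation and spatial
frequency selectivity, that the handling of visual information in the low spatial frequency band is important, and that the
properties listed above play an important role in the acquisition of these network characteristics.
This thesis brings new insights into the computational theory of the subcortical visual pathway and is recognized as worthy of
the award of a doctoral degree.
In addition, we note that checks for plagiarism, missing citations, duplicate submission, and the like have been completed
using the checking tool "iThenticate".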