Automated Handwriting Pattern Recognition for Multi-Level Personality

Classiﬁcation Using Transformer OCR (TrOCR)

Marzieh Adeli Shamsabad

and Ching Yee Suen

Centre for Pattern Recognition and Machine Intelligence (CENPARMI), Concordia University, Montreal, Quebec, Canada

Keywords:

Handwriting Analysis, Automated Feature Extraction, Imbalanced Dataset, Multi-Level Classiﬁcation,

TrOCR, Big Five Personality Traits.

Abstract:

Automated personality trait assessment from handwriting analysis offers applications in psychology, human-

computer interaction, and personal proﬁling. However, accurately classifying different levels of personality

traits remains challenging due to class imbalances in real-world datasets. This study addresses the issue by

comparing multi-class and multi-label binary classiﬁcation methods to predict levels of the Big Five person-

ality traits: Extraversion, Neuroticism, Agreeableness, Conscientiousness, and Openness, each categorized as

low, average, and high in an imbalanced dataset of 873 French handwriting samples. A new approach is intro-

duced by adapting the TrOCR pre-trained model for feature extraction, modifying its encoder to capture local

and global handwriting features relevant to personality classiﬁcation. This model is compared with three other

pre-trained models: ResNet50 and Vision Transformer base 16 with input resolutions of 224 and 384. Results

demonstrate that multi-label binary classiﬁcation, which treats each trait level as an independent binary task,

effectively addresses data imbalance, enhancing accuracy and generalization. The proposed TrOCR model

achieves the highest performance, with an accuracy of 84.26%, an F1-score of 83.26%, and an AUROC of

91% on the test set. These ﬁndings emphasize the effectiveness of the presented framework for automated

multi-level personality trait classiﬁcation from handwriting in imbalanced datasets.

1 INTRODUCTION

Personality represents the unique patterns of thoughts,

emotions, and behaviors that deﬁne an individual’s

character and inﬂuence their interactions with the

world (Costa and McCrae, 1997). Understanding

personality traits can provide insights into how in-

dividuals respond in various situations, shaping per-

sonal, social, and professional outcomes (Roberts

and Mroczek, 2008). Traditionally, personality as-

sessment is achieved through self-reported question-

naires, such as the Myers-Briggs Type Indicator

(MBTI) and the Big Five personality traits model,

which includes Openness, Conscientiousness, Ex-

traversion, Agreeableness, and Neuroticism. How-

ever, these methods are often time-consuming and can

be inﬂuenced by the individual’s self-awareness and

response biases (Alshouha et al., 2024). As a result,

alternative methods for personality prediction gained

attention like handwriting.

Handwriting analysis, or graphology, suggests

https://orcid.org/0009-0005-7680-6689

https://orcid.org/0000-0003-1209-7631

that an individual’s handwriting reﬂects their psycho-

logical state and personality traits. This idea arises

from the observation that handwriting is a psycho-

motor activity, where the brain guides the hand’s

movements, resulting in unique patterns in letter for-

mation, slant, pressure, and spacing. Thus, handwrit-

ing is shaped by physiological factors, socio-cultural

inﬂuences, and personal experiences (Rahman and

Halim, 2022). Manual handwriting analysis for per-

sonality assessment is subjective and requires exper-

tise, leading researchers to explore automated solu-

tions through image processing and machine learning

techniques (Gavrilescu and Vizireanu, 2018).

Many studies relied on manual feature extraction

and traditional machine learning classiﬁers. Mukher-

jee et al. (Mukherjee et al., 2022) approached per-

sonality prediction by extracting character-based fea-

tures such as speciﬁc letters (e.g., ’a’, ’g’, ’n’, ’t’)

and the word ”of,” from handwritten samples and

used classiﬁers like K-nearest neighbor (KNN) and

Multi-layer Perceptron (MLP) for prediction. Nair

et al. (B J et al., 2024) manually extracted features

such as stroke pressure and letter slant, then classiﬁed

Adeli Shamsabad, M. and Suen, C. Y.

Automated Handwriting Pattern Recognition for Multi-Level Personality Classiﬁcation Using Transformer OCR (TrOCR).

DOI: 10.5220/0013318800003905

In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2025), pages 141-150

ISBN: 978-989-758-730-6; ISSN: 2184-4313

141

personality traits using algorithms like Support Vector

Machine (SVM), Decision Trees, and Random Forest.

Chin et al. (Chin et al., 2021) used a Histogram of Ori-

ented Gradients to extract handwriting features and

classiﬁed them using multiclass Support Vector Ma-

chines, with logistic regression. Another early work

by Gavrilescu and Vizireanu (Gavrilescu and Vizire-

anu, 2018) employed a semi-automatic approach in-

volving handwriting feature extraction followed by a

neural network-based model for classifying the Big

Five personality traits.

With artiﬁcial intelligence and deep learning ad-

vancements, automatic feature extraction methods

have become more prevalent. Ahmed et al. (Sayed

et al., 2024) proposed a deep learning framework us-

ing Convolutional Neural Networks (CNNs) such as

VGG16, DenseNet201, ResNet, and InceptionV3 to

automatically extract handwriting features like let-

ter size, slant, and pressure on the IAM handwrit-

ing database. Similarly, Nair et al. (Nair et al., 2021)

compared the performance of ResNet50 and CNN for

handwriting analysis to predict personality traits, in-

dicating that deep learning models provide a robust

solution for capturing handwriting patterns. Addi-

tionally, Puttaswamy et al. (Puttaswamy and Thilla-

iarasu, 2025) employed Fine DenseNet with attention

mechanisms to extract intricate handwriting features

for personality classiﬁcation, showcasing the role of

attention-based feature extraction in enhancing clas-

siﬁcation accuracy.

Dhumal et al. (Dhumal et al., 2023) applied Trans-

former and LSTM networks to automatically extract

handwriting features and predict personality traits, ex-

ploring the multi-label classiﬁcation approach. Shree

et al. (Shree and Dr.Siddaraju, 2022) used YOLO v5

and ResNet34 for feature extraction and personality

classiﬁcation, demonstrating an effective strategy to

improve accuracy and efﬁciency. Recent advance-

ments in attention mechanisms and vision transform-

ers have further improved the feature extraction capa-

bilities (Koepf et al., 2022), allowing for more analy-

sis of handwriting features and improved personality

prediction performance.

Another line of research focused on semi-

supervised and hybrid models for personality classi-

ﬁcation. Rahman and Halim (Rahman and Halim,

2022) employed a Semi-supervised Generative Ad-

versarial Network (SGAN) to classify personality

traits based on handwriting samples, utilizing a com-

bination of labeled and unlabeled data to improve

classiﬁcation accuracy. This approach highlighted the

efﬁcacy of semi-supervised learning in addressing the

challenges posed by limited labeled data.

Previous studies have advanced the ﬁeld, but many

of them focused on independent trait classiﬁcation

or relied on manual feature extraction, which limits

their scalability and adaptability. Our research aims

to overcome these limitations by developing an end-

to-end deep learning framework with fully automated

feature extraction. Building on our previous work,

where only two personality traits were analyzed, this

research expands the dataset to have all ﬁve traits out-

lined by the Big Five Factor Model (BFFM). This

broader approach enables multi-level classiﬁcation of

all ﬁve traits simultaneously, facilitating a more com-

prehensive personality analysis.

A French handwriting dataset from the CEN-

PARMI lab is used in this study, expanded from

873 full-page handwriting images to 5,765 line-

segmented images to increase dataset size while pre-

serving essential handwriting patterns. The model

implicitly learns handwriting patterns through deep

learning mechanisms like convolutional ﬁlters or at-

tention heads. Based on our previous research, Fo-

cal Loss is applied to effectively address class imbal-

ance (Adeli Shamsabad and Suen, 2024), in combi-

nation with Softmax for multi-class classiﬁcation and

BCE with logit loss for multi-binary classiﬁcation.

A new method for handwriting feature extrac-

tion is proposed, using the base version of TrOCR, a

model with an encoder-decoder structure pre-trained

on general text data and ﬁne-tuned on the IAM Hand-

writing Database with an input size of 384 to adapt

to handwriting-speciﬁc features. Instead of using the

decoder to generate text, TrOCR is modiﬁed to output

encoder features and then fed for classiﬁcation. To the

best of our knowledge, this represents the ﬁrst use of

TrOCR in a classiﬁcation framework, speciﬁcally for

predicting different levels of personality traits from

handwriting images, showcasing its potential beyond

traditional OCR applications.

To evaluate the proposed approach, three

other pre-trained deep learning models are com-

pared: ResNet50, chosen for its efﬁcient CNN

architecture and proven performance in prior

work (Adeli Shamsabad and Suen, 2024), and Vision

Transformer (ViT) base 16 with input sizes of 224

and 384 that are pre-trained on ImageNet by Google

that allows for an analysis of performance differences

between transformer models tailored for OCR tasks

and those designed for general-purpose vision tasks.

This paper is organized as follows: Section 2 de-

tails the materials, methods, and evaluation metrics.

Section 3 analyzes the results and compares them

with existing methods. Section 4 concludes the study

and suggests future work.

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

142

2 MATERIALS AND METHODS

2.1 Data Collection

For this research, a custom dataset is created to enable

automatic handwriting analysis for predicting multi-

level personality traits based on the BFFM. Since

no public dataset is available to meet these speciﬁc

requirements, data collection is conducted with eth-

ical approval from Concordia University’s Human

Research Ethics Committee. The dataset is com-

posed of two main parts: responses to a BFFM per-

sonality questionnaire and corresponding handwriting

samples. Participants are recruited from Concordia

University to ensure a diverse dataset, and to main-

tain consistency and minimize external inﬂuences on

handwriting, data collection is carried out in a con-

trolled environment within a dedicated lab at Concor-

dia University’s CENPARMI research center.

Participants are asked to complete the BFFM

questionnaire, which assesses ﬁve personality traits:

Extraversion (EX), Neuroticism (NE), sometimes re-

ferred to as Emotional Stability in its inverse form,

Agreeableness (AG), Conscientiousness (CO), and

Openness to Experience (OE) (Costa and McCrae,

1997). Scores for each trait are categorized into

three levels: low, average, and high. Each handwrit-

ing sample is manually analyzed by a professional

graphologist for speciﬁc features, such as slant, spac-

ing, and letter formation, which correspond to each

of the BFFM traits. Each sample is labeled with all

ﬁve personality traits, with a unique level assigned to

each trait, indicating that every sample reﬂects differ-

ent levels for each of the ﬁve traits. In total, 1110

handwriting samples are collected, digitized at a high

resolution of 600 dots per inch (DPI), and processed

to remove noise and irrelevant marks.

The dataset includes handwriting samples in mul-

tiple languages, with the majority in French (873 sam-

ples), followed by English (181 samples), along with

smaller quantities in languages such as Persian and

Korean. However, the distribution of personality traits

is severely imbalanced, with the medium level being

predominant in each trait category. Due to the large

number of French samples, this study primarily fo-

cuses on analyzing this subset and its personality trait

distribution, as follows:

• EX: Low - 125, Average - 333, High - 415;

• NE: Low - 88, Average - 473, High - 312;

• AG: Low - 96, Average - 703, High - 74 ;

• CO: Low - 44, Average - 319, High - 510;

• OE: Low - 38, Average - 558, High - 277.

In our previous study, the dataset included labels

for only two personality traits, Extraversion and Con-

scientiousness (Adeli Shamsabad and Suen, 2024).

For this research, the dataset has been expanded to

include all ﬁve BFFM traits, enabling comprehen-

sive analysis through multi-level classiﬁcation and al-

lowing simultaneous prediction of all ﬁve personality

traits.

2.2 Data Preprocessing

The primary goal of this research is to develop an

end-to-end automated system for handwriting pattern

recognition using deep learning techniques. To en-

hance the model’s learning capacity and ensure effec-

tive generalization across various handwriting styles,

it is essential to increase the number of handwrit-

ing samples and address dataset imbalance (Shorten

and Khoshgoftaar, 2019). Additionally, the original

TrOCR base model is designed for single-line text

segments, optimizing its performance for line-by-line

tasks rather than continuous multi-line or paragraph

text (Li et al., 2021). To address these limitations,

each handwriting sample is segmented line by line us-

ing OpenCV, with careful attention to preserving the

original text structure. This segmentation approach

ensures that the most signiﬁcant features of each let-

ter shape are retained, as illustrated in Figure 1.

Figure 1: Line Segmentation Process.

Through line-level segmentation, the dataset is ex-

panded to 5,765 handwriting subsamples. This seg-

mentation enhances the distribution of traits, partic-

ularly for those traits that initially had fewer sam-

ples (Table 1).

The handwriting samples are digitized through

scanning, with some images resulting in low qual-

ity, which necessitates further preprocessing. Based

on our previous ﬁndings, Otsu’s binarization method

combined with bilateral ﬁltering is the most effec-

Automated Handwriting Pattern Recognition for Multi-Level Personality Classiﬁcation Using Transformer OCR (TrOCR)

143

Table 1: Trait Distribution After Line Segmentation.

Total Number of Sub-Samples: 5765

Trait Low Average High

EX 1058 2287 2420

NE 392 3019 2354

AG 512 4569 684

CO 233 1803 3729

OE 339 3765 1661

tive preprocessing approach (Adeli Shamsabad and

Suen, 2024). Otsu’s binarization converts the images

into a binary format, enhancing the contrast between

text and background, while bilateral ﬁltering reduces

noise while preserving important edge details (Fig-

ure 1). These methods signiﬁcantly improve hand-

writing clarity, thereby facilitating more effective fea-

ture detection by the model (Xu et al., 2024).

After processing, the images are converted into

a three-channel format to ensure compatibility with

neural networks, which typically require RGB input

channels. The dataset is then divided into 60% for

training, 20% for validation, and 20% for testing.

This preprocessing pipeline including line segmenta-

tion and image enhancement ensures that the dataset

is optimized for training a neural network model,

promoting improved generalization capabilities and

greater potential for accurate personality trait predic-

tion.

2.3 Model Development

A new approach is introduced to automatically ex-

tract relevant features for classifying multi-level per-

sonality traits by adapting the TrOCR model for

classiﬁcation tasks. For a baseline comparison, the

performance of the proposed model is evaluated

against three pre-trained models: ResNet50 and Vi-

sion Transformer (ViT) base 16 at two input resolu-

tions (224 × 224 and 384 × 384). Two classiﬁcation

approaches: multi-class and multi-label binary clas-

siﬁcation, are investigated to identify the method that

best addresses data imbalance and supports effective

learning for traits with limited samples. In both ap-

proaches, focal loss is employed to minimize the im-

pact of well-classiﬁed instances, focus the model on

challenging samples, and manage the imbalanced dis-

tribution of trait levels (Lin et al., 2017).

2.3.1 Classiﬁcation Layer

Multi-Class Classiﬁcation: In this approach, each

personality trait is treated as a separate task with

three mutually exclusive levels: low, average, and

high. Five classiﬁcation heads are used, each ded-

icated to one trait, producing logits for these three

levels. Softmax activation followed by cross-entropy

loss is applied to assign probabilities across the levels

for each trait (Goodfellow et al., 2016). To address the

dataset imbalance, targeted data augmentation tech-

niques including random rotation, afﬁne transforma-

tions, perspective distortion, color jitter, and Gaus-

sian blur are selectively applied to minority classes to

enrich their representation, thus promoting more bal-

anced learning and improving generalization (Shorten

and Khoshgoftaar, 2019).

To further reduce the effects of imbalance, over-

sampling is implemented alongside sample weight-

ing based on class frequencies. This approach in-

creases the presence of underrepresented classes dur-

ing training and encourages the model to learn their

features effectively, enhancing its ability to differenti-

ate among traits (Luo et al., 2024).

Multi-Label Binary Classiﬁcation: The model in

this approach is conﬁgured to use 15 binary classiﬁca-

tion heads, one for each level across all ﬁve traits. Bi-

nary Cross-Entropy with Logits Loss (BCEWithLog-

itsLoss) combined with sigmoid activation is applied,

allowing each level within each trait to be treated as

an independent binary classiﬁcation task (Nam et al.,

2014). In this framework, data augmentation is ap-

plied uniformly across all classes, allowing the model

to develop a balanced understanding of each trait

level, regardless of class distribution. This conﬁgu-

ration enables the model to learn each trait’s charac-

teristics independently, without being inﬂuenced by

the distribution of other levels, making it particularly

effective for traits with imbalanced levels.

These two approaches establish a structure for

comparing the efﬁcacy of multi-class and multi-label

binary classiﬁcation in managing imbalanced data.

By evaluating both methods, ﬁndings are provided

into which approach better supports balanced learn-

ing and improves classiﬁcation performance across all

personality traits.

2.3.2 CNN Architecture: ResNet50

Convolutional Neural Networks (CNNs) are well-

known for their success in extracting meaningful

features from images due to their ability to handle

spatial data processing effectively (Vargoorani and

Suen, 2024). ResNet50, a deep CNN architecture,

is selected in this study for its powerful feature ex-

traction capabilities and its proven effectiveness in

handwriting classiﬁcation tasks shown in our previ-

ous research. It demonstrates strong performance

in capturing meaningful features from handwriting

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

144

data, outperforming other models in simpler classi-

ﬁcation settings (Adeli Shamsabad and Suen, 2024).

ResNet50 offers a favorable balance between perfor-

mance and computational efﬁciency, making it a prac-

tical choice for handwriting feature extraction, es-

pecially when compared to more resource-intensive

models like transformers (Raghu et al., 2021).

2.3.3 Vision Transformer: ViT Base 16

The Vision Transformer (ViT) is a transformer-based

model adapted speciﬁcally for computer vision tasks.

Unlike CNNs, which rely on convolutional ﬁlters to

capture local patterns, ViT splits images into patches,

treats each patch as a token, and processes these to-

kens using a transformer encoder. This structure en-

ables ViT to capture global dependencies across the

entire image, which can be particularly advantageous

for tasks requiring an understanding of spatial rela-

tionships, such as handwriting analysis (Koepf et al.,

2022).

To explore the effectiveness of ViT at different

scales and to compare its performance with TrOCR,

two conﬁgurations are evaluated in this study: ViT

base 16 with input resolutions of 224 x 224 and 384 x

384 (Zhai et al., 2021). The ViT base 16-224 is cho-

sen as a computationally efﬁcient baseline, offering

faster processing and a general overview of handwrit-

ing features. In contrast, the ViT base 16-384 allows

for the capture of ﬁner handwriting details, provid-

ing insights into how higher resolutions impact fea-

ture details and classiﬁcation performance (Dosovit-

skiy, 2020).

These conﬁgurations evaluate the applicability of

vision transformers for handwriting-based personal-

ity trait classiﬁcation, focusing on computational ef-

ﬁciency and feature extraction compared to TrOCR.

Both models use a transformer encoder with self-

attention mechanisms to process patch embeddings,

identifying spatial and stylistic handwriting patterns.

2.3.4 The Proposed Transformer Model:

TrOCR

TrOCR, or Transformer Optical Character Recogni-

tion, is a transformer-based model developed by Mi-

crosoft speciﬁcally for OCR applications. Unlike

traditional OCR systems that rely on CNNs for im-

age processing and RNNs for sequential text gener-

ation (Campiotti and Lotufo, 2022), TrOCR is de-

signed as an end-to-end transformer model that inte-

grates a ViT encoder, initialized with BEiT weights

for image encoding, and a RoBERTa-based text de-

coder for autoregressive text generation. The encoder

processes images by dividing them into 16x16 ﬁxed-

size patches, embedding each patch as a sequence

token, and using absolute positional embeddings to

retain spatial information (Li et al., 2021). This

architecture effectively captures local(e.g., character

styles, ligatures) and global (e.g., ink width, slant)

dependencies within an image, demonstrating state-

of-the-art performance for OCR tasks like printed

and handwritten text recognition without requiring

complex pre- or post-processing steps (Bhunia et al.,

2021).

In this study, the pre-trained TrOCR model, ﬁne-

tuned on the IAM handwriting dataset, is repurposed

for personality trait classiﬁcation from handwriting

images. Through this modiﬁcation, instead of con-

verting handwriting images into text, TrOCR’s en-

coder processes handwriting images by transforming

them into a sequence of visual tokens, capturing es-

sential handwriting characteristics such as stroke pat-

terns, shapes, and distinctive features. These visual

tokens serve as the basis for identifying patterns as-

sociated with different personality traits. The text de-

coder is replaced by a custom classiﬁcation head that

outputs predictions for each personality trait, allowing

it to perform both multi-class and multi-label classi-

ﬁcation depending on the task requirements (Figure

2).

Figure 2: Proposed TrOCR Model for Classiﬁcation.

This adaptation highlights TrOCR’s ﬂexibility,

showing that it can go beyond OCR tasks to han-

dle complex classiﬁcation. The model’s transformer-

based design captures detailed handwriting features,

making it useful for analyzing personality traits from

handwriting images.

2.4 Performance Evaluation Metrics

The effectiveness of each model in classifying person-

ality traits from handwriting images is evaluated using

a range of performance metrics. Given the dataset’s

imbalance and the multi-class, multi-label classiﬁca-

tion structure, metrics such as accuracy, precision, re-

call, F1-score, and Area Under the Receiver Oper-

Automated Handwriting Pattern Recognition for Multi-Level Personality Classiﬁcation Using Transformer OCR (TrOCR)

145

ating Characteristic Curve (AUROC) are applied to

ensure a comprehensive assessment (He and Garcia,

2009). These metrics are derived from values in the

confusion matrix, which reﬂects model performance

by displaying the counts of true positives, true neg-

atives, false positives, and false negatives for each

class i:

• True Positives (TP

): Instances that are Correctly

identiﬁed as class i.

• True Negatives (TN

): Instances that are Correctly

identiﬁed as not class i.

• False Positives (FP

): Instances that are Incor-

rectly identiﬁed as class i.

• False Negatives (FN

): Instances of class i Incor-

rectly identiﬁed as another class.

2.4.1 Accuracy

This metric measures the proportion of correct predic-

tions among the total predictions. However, due to the

imbalanced dataset, accuracy alone may not reﬂect

the model’s true performance across all classes, espe-

cially on minority traits (Tanha et al., 2020). There-

fore, accuracy is considered alongside other metrics

for a more balanced evaluation.

2.4.2 Precision

Precision calculates the ratio of correctly predicted

positive instances to the total predicted positives.

High precision indicates a low false positive rate,

which is important in this context to ensure that traits

are not misclassiﬁed as other traits. Precision is

particularly valuable for evaluating the model’s per-

formance on minority classes, where false positives

could have a more signiﬁcant impact.

2.4.3 Recall

Recall (or sensitivity) is the ratio of correctly pre-

dicted positive instances to all actual positives. High

recall means the model effectively identiﬁes the tar-

get class, minimizing false negatives. This metric is

essential for ensuring that all personality traits, espe-

cially those with fewer samples, are accurately de-

tected by the model.

2.4.4 F1-Score

The F1-score, calculated as the harmonic mean of pre-

cision and recall, provides a single metric that bal-

ances both measures. The F1-score is particularly rel-

evant in the case of imbalanced data, as it offers a

more comprehensive view of a model’s performance

when precision and recall are equally signiﬁcant. For

the multi-class and multi-label classiﬁcation tasks, a

weighted average F1-score is calculated across all

traits to assess overall performance.

2.4.5 AUROC

AUROC is used to evaluate the model’s ability to dis-

tinguish between classes, a high AUROC score re-

ﬂects strong performance, balancing sensitivity (true

positive rate) and speciﬁcity (true negative rate) in-

dicating better separability. In this study, AUROC

is calculated separately for each trait to assess the

model’s performance in differentiating between lev-

els(low, average, high) within each trait which allows

for a detailed evaluation of the model’s strengths and

weaknesses in classifying handwriting features linked

to different personality traits.

Each of these metrics is calculated individually for

each personality trait, and the average performance

across all traits is reported. This approach provides

a clear assessment of the proposed method’s effec-

tiveness and allows for meaningful comparisons with

other deep-learning models. It ensures the evalua-

tion captures the challenges of classifying individual

traits while highlighting how well each model handles

class imbalance and distinguishes between personal-

ity traits.

3 RESULTS AND DISCUSSION

This section presents the result of two classiﬁca-

tion strategies: multi-class classiﬁcation using Cross-

Entropy with Softmax and multi-label binary classi-

ﬁcation using BCEWithLogitsLoss, to predict multi-

level personality traits from imbalanced handwriting

data. The performance of the proposed TrOCR model

is analyzed in comparison to three baseline models:

ResNet50 and ViT base 16 with input resolutions of

224 and 384. All models were trained for 100 epochs

using an NVIDIA A100 Tensor Core GPU and the

outcomes of these experiments are detailed in the sub-

sequent subsections.

3.1 Multi-Label vs. Multi-Class

Classiﬁcation

In the multi-class classiﬁcation approach, each per-

sonality trait is predicted as a single multi-class

problem using Softmax and cross-entropy loss, with

class weighting and focal loss emphasizing minority

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

146

classes. In contrast, the multi-binary classiﬁcation ap-

proach treats each class (low, average, high) as an in-

dependent binary problem, using BCE loss with fo-

cal loss to handle imbalances. Results indicate that

the multi-binary method captures patterns more effec-

tively, improving performance across all four models.

Based on the results indicated in table 2, in

the multi-class classiﬁcation approach, ResNet-50

achieves the highest accuracy of 65.80% and an F1-

score of 0.616, showcasing its capability in handling

multi-class predictions. However, the overall perfor-

mance of all models in this approach remains rela-

tively constrained, with TrOCR achieving an accu-

racy of 61.47% and an F1-score of 0.600, which are

slightly lower than ResNet-50 but still competitive.

In contrast, the multi-label binary classiﬁca-

tion approach signiﬁcantly improves the performance

metrics across all models, underscoring the advan-

tages of independently optimizing each trait. ResNet-

50 shows a marked improvement, achieving an ac-

curacy of 81.22% and an F1-score of 0.712, reﬂect-

ing its enhanced ability to handle imbalanced data

when traits are treated as independent binary prob-

lems. Similarly, ViT models exhibit notable gains in

performance, with ViT-384 attaining an accuracy of

80.89% and an F1-score of 0.776. TrOCR outper-

forms all other models in the multi-label binary con-

ﬁguration, achieving the highest accuracy of 84.46%

and an F1-score of 0.810.

Table 2: Comparative Evaluation of Classiﬁcation Methods

on a Validation Dataset.

Multi-Class Classiﬁcation with Cross-Entropy with Softmax

Models Loss Accuracy Precision Recall F1-Score

ResNet50 0.293 65.80 % 0.636 0.648 0.616

ViT-224 0.499 61.81 % 0.577 0.608 0.576

ViT-384 0.354 63.03 % 0.560 0.620 0.577

TrOCR 0.343 61.47 % 0.566 0.634 0.600

Multi-Label Binary Classiﬁcation with BCELogitLoss

Models Loss Accuracy Precision Recall F1-Score

ResNet50 0.136 81.22 % 0.727 0.699 0.712

ViT-224 0.178 77.18 % 0.769 0.762 0.765

ViT-384 0.124 80.89 % 0.771 0.772 0.776

TrOCR 0.106 84.46 % 0.808 0.807 0.810

The AUROC scores in table 3 further illustrate

the advantages of multi-binary classiﬁcation in cap-

turing trait-speciﬁc distinctions, with consistent im-

provements across all traits compared to the multi-

class approach. The AUROC for the CO trait in the

proposed TrOCR model increases substantially from

0.5393 in the multi-class approach to 0.8943 in the

multi-binary approach. Similarly, notable gains are

observed for EX, NE, AG, and OE traits. These

signiﬁcant improvements across all traits highlight

the effectiveness of the multi-binary classiﬁcation ap-

proach in addressing data imbalance, optimizing each

trait independently, and capturing patterns unique to

each personality factor, thereby enhancing the overall

model performance.

Table 3: Comparison of AUROC Scores for Classiﬁcation

Methods Across Models.

Model Traits Multi-Class AUROC Multi-Binary AUROC

ResNet-50

EX 0.8307 0.8678

NE 0.7638 0.8255

AG 0.7273 0.8770

CO 0.8412 0.8827

OE 0.8863 0.9291

ViT-224

EX 0.7532 0.8178

NE 0.6340 0.7827

AG 0.6117 0.8434

CO 0.7694 0.8498

OE 0.8330 0.8971

ViT-384

EX 0.7716 0.8585

NE 0.7553 0.8238

AG 0.7015 0.8738

CO 0.7813 0.8591

OE 0.8470 0.9192

TrOCR

EX 0.6419 0.9179

NE 0.6493 0.8850

AG 0.6642 0.9138

CO 0.5393 0.8943

OE 0.6719 0.9334

3.2 Performance Comparison of

TrOCR vs. Other Models

To provide a reliable comparison for evaluating the

performance of the proposed TrOCR model, three

pre-trained deep learning models, ResNet50, ViT-

224, and ViT-384, are trained using the same classi-

ﬁcation approach and dataset. This approach ensures

a reasonable and consistent comparison, as one of the

primary objectives of this study is the multi-level clas-

siﬁcation of personality traits. Since no directly com-

parable work is found addressing multi-level person-

ality classiﬁcation from handwriting using a similar

methodology, these models are selected to benchmark

the effectiveness of TrOCR.

The results reveal distinct differences in the mod-

els’ abilities to process handwriting data. As shown in

the training accuracy plot (Figure 3), TrOCR achieves

the highest training accuracy, exceeding 95% in later

epochs. ViT-384 demonstrates competitive accuracy

but remains below TrOCR. ResNet50 shows slower

improvements and surpasses ViT-224, but both mod-

els perform lower than TrOCR and ViT-384.

Automated Handwriting Pattern Recognition for Multi-Level Personality Classiﬁcation Using Transformer OCR (TrOCR)

147

The training loss plot (Figure 3) further empha-

sizes TrOCR’s superior performance. The lowest ﬁnal

loss values are achieved by TrOCR, reﬂecting its abil-

ity to learn handwriting features effectively and mini-

mize errors efﬁciently. While ViT-384 shows moder-

ate performance with lower loss values than ResNet50

and ViT-224, these baseline models exhibit higher

loss values, indicating less effective learning and fea-

ture extraction.

Figure 3: Training Performance Comparison Across Mod-

els in Multi-Label Binary Classiﬁcation.

The AUROC comparison, shown in Figure 4,

highlights the performance differences among the

models in distinguishing between classes in the multi-

label binary classiﬁcation task. The TrOCR model

achieves the highest AUROC of 0.91, demonstrating

its strong capability in extracting handwriting features

and separating classes effectively. This score reﬂects

the model’s ability to handle the complexities of hand-

writing data with precision.

ResNet50 achieves an AUROC of 0.88, showing

competitive performance but still falling short of the

TrOCR model. Among the Vision Transformer mod-

els, ViT-384 performs slightly better with an AUROC

of 0.87, while ViT-224 achieves a lower AUROC of

0.84. This performance suggests that the higher input

resolution used by ViT-384 aids in capturing more de-

tailed handwriting features compared to ViT-224.

Looking at the test metrics in Table 4, TrOCR

once again stands out as the top performer, with an

accuracy of 84.26%, precision of 0.823, recall of

0.842, and F1-score of 0.832. These results show

that TrOCR handles unseen data more effectively.

Figure 4: AUROC Score Comparison Across Models in

Multi-Label Binary Classiﬁcation.

ResNet50 follows with an accuracy of 80.52% and

an F1-score of 0.778, showing reasonable but lower

performance. ViT-224 and ViT-384 perform less ef-

fectively, with ViT-224 achieving the lowest test ac-

curacy at 77.71% and an F1-score of 0.742. ViT-

384 performs slightly better, with an accuracy of

80.07% and an F1-score of 0.772, close to the results

of ResNet50. The consistently high performance of

TrOCR across training accuracy (Figure 3), AUROC

score (Figure 4), and test metrics (Table 4) indicate

that TrOCR captures handwriting patterns more ef-

fectively than the other models.

Table 4: Performance of Models on Test Dataset for Multi-

Label Binary Classiﬁcation.

Models Loss Accuracy Precision Recall F1-Score

ResNet50 0.121 80.52 % 0.781 0.775 0.778

ViT-224 0.171 77.71 % 0.750 0.737 0.742

ViT-384 0.138 80.07 % 0.777 0.768 0.772

TrOCR 0.106 84.26 % 0.823 0.842 0.832

The confusion matrices (Figure 5) demonstrate

that TrOCR performs effectively in capturing patterns

for most personality traits, with notable TP rates such

as 466 for AG in the Low class and 389 for OE in the

Low class. High TN values are also observed across

traits, including 678 for OE in the High class, re-

ﬂecting robust class separation. However, higher FN

counts in traits like CO and NE, particularly in the

Low class, indicate challenges in distinguishing these

levels due to the smaller number of samples available.

These results highlight TrOCR’s capability to extract

distinctive handwriting patterns and classify personal-

ity traits accurately, even when faced with imbalanced

data.

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

148

Figure 5: TrOCR Confusion Matrix per Trait.

4 CONCLUSION

The study introduces a new approach by adapting the

pre-trained TrOCR model for automatic handwriting

pattern recognition and multi-level personality trait

classiﬁcation. The results show that TrOCR con-

sistently outperforms ResNet50, ViT-224, and ViT-

384 across all metrics. On the test dataset, TrOCR

achieves an AUROC score of 0.91, the highest ac-

curacy of 84.26%, a precision of 0.823, a recall of

0.842, and an F1-score of 0.832, demonstrating its

superior capability to learn and generalize handwrit-

ing patterns associated with personality traits. The

model also records the highest training accuracy and

lowest training loss, further emphasizing its effective-

ness. Confusion matrix analysis highlights its strong

True Positive (TP) and True Negative (TN) rates for

traits like Agreeableness and Openness to Experience,

though challenges persist for low-class samples in

traits such as Conscientiousness and Neuroticism due

to limited representation.

The ﬁndings underline the effectiveness of the

multi-label binary classiﬁcation approach in address-

ing class imbalances, which are common in personal-

ity trait datasets. By treating each trait independently,

this approach enhances the model’s ability to learn

from underrepresented classes, thereby improving its

overall performance.

Future work will focus on expanding the diver-

sity and size of handwriting datasets to improve the

model’s robustness and generalizability. Combining

the strengths of CNNs and transformers through en-

semble modeling could also help achieve better accu-

racy and stability. Since the model learns handwrit-

ing patterns through mechanisms like convolutional

ﬁlters and attention heads, techniques such as Grad-

CAM or SHAP could be used to highlight which

handwriting features or regions inﬂuence predictions

the most. Aligning these ﬁndings with graphology

principles could clarify how handwriting relates to

personality traits, making the model more practical

and transparent.

Additionally, applying these techniques would

build trust in applications like psychological assess-

ments and forensic analysis. Collaborating with ex-

perts in psychology and linguistics to tailor the model

for real-world use will help validate its effectiveness

and reﬁne its design, ultimately making handwriting-

based personality analysis more scalable and reliable.

REFERENCES

Adeli Shamsabad, M. and Suen, C. Y. (2024). Deep

multi-label classiﬁcation of personality with hand-

writing analysis. In IAPR Workshop on Artiﬁcial Neu-

ral Networks in Pattern Recognition, pages 218–230.

Springer.

Alshouha, B., Serrano-Guerrero, J., Chiclana, F., Romero,

F. P., and Olivas, J. A. (2024). Personality trait de-

tection via transfer learning. Computers, Materials &

Continua, 78(2).

B J, B., S, K., Suraj, A., K, N., and Venkitesan, P. (2024).

Handwriting analysis for classiﬁcation of human per-

sonality. pages 1–6.

Bhunia, A., Khan, S., Cholakkal, H., Anwer, R., Khan, F.,

and Shah, M. (2021). Handwriting transformers.

Campiotti, I. and Lotufo, R. (2022). Optical character

recognition with transformers and ctc. pages 1–4.

Chin, X. Y., Lau, H. Y., Chong, Z. X., Chow, M. P., and

Salam, Z. A. A. (2021). Personality prediction using

machine learning classiﬁers. Journal of Applied Tech-

nology and Innovation (e-ISSN: 2600-7304), 5(1):1.

Costa, P. T. and McCrae, R. R. (1997). Chapter 11 - lon-

gitudinal stability of adult personality. In Hogan, R.,

Johnson, J., and Briggs, S., editors, Handbook of Per-

sonality Psychology, pages 269–290. Academic Press,

San Diego.

Dhumal, Y. R., Shinde, A., Chaudhari, K., Oza, S., Sapkal,

R., and Itkarkar, S. (2023). Automatic handwriting

analysis and personality trait detection using multi-

task learning technique. In 2023 10th International

Conference on Computing for Sustainable Global De-

velopment (INDIACom), pages 348–354.

Dosovitskiy, A. (2020). An image is worth 16x16 words:

Transformers for image recognition at scale. arXiv

preprint arXiv:2010.11929.

Gavrilescu, M. and Vizireanu, N. (2018). Predicting the

Automated Handwriting Pattern Recognition for Multi-Level Personality Classiﬁcation Using Transformer OCR (TrOCR)

149

big ﬁve personality traits from handwriting. EURASIP

Journal on Image and Video Processing, 2018(1):57.

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep

Learning. MIT Press. http://www.deeplearningbook.

org.

He, H. and Garcia, E. A. (2009). Learning from imbalanced

data. IEEE Transactions on Knowledge and Data En-

gineering, 21(9):1263–1284.

Koepf, M., Kleber, F., and Sablatnig, R. (2022). Writer

Identiﬁcation and Writer Retrieval Using Vision

Transformer for Forensic Documents, pages 352–366.

Springer International Publishing.

Li, M., Lv, T., Cui, L., Lu, Y., Florencio, D., Zhang, C.,

Li, Z., and Wei, F. (2021). Trocr: Transformer-based

optical character recognition with pre-trained models.

Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll

ar, P.

(2017). Focal loss for dense object detection. In 2017

IEEE International Conference on Computer Vision

(ICCV), pages 2999–3007.

Luo, J., Yuan, Y., and Xu, S. (2024). Improving gbdt per-

formance on imbalanced datasets: An empirical study

of class-balanced loss functions.

Mukherjee, S., Ghosh, I., and Mukherjee, D. (2022). Big

Five Personality Prediction from Handwritten Char-

acter Features and Word ‘of’ Using Multi-label Clas-

siﬁcation, pages 275–299.

Nair, G., Rekha, V., and Krishnan, M. S. (2021). Handwrit-

ing Analysis Using Deep Learning Approach for the

Detection of Personality Traits, pages 531–539.

Nam, J., Kim, J., Gurevych, I., and F

urnkranz, J. (2014).

Large-scale multi-label text classiﬁcation — revisiting

neural networks. In Machine Learning and Knowl-

edge Discovery in Databases, pages 437–452, Berlin,

Heidelberg. Springer Berlin Heidelberg.

Puttaswamy, B. S. and Thillaiarasu, N. (2025). Fine

densenet based human personality recognition using

english hand writing of non-native speakers. Biomed-

ical Signal Processing and Control, 99:106910.

Raghu, M., Unterthiner, T., Kornblith, S., Zhang, C., and

Dosovitskiy, A. (2021). Do vision transformers see

like convolutional neural networks? Advances in neu-

ral information processing systems, 34:12116–12128.

Rahman, A. and Halim, Z. (2022). Predicting the big

ﬁve personality traits from hand-written text features

through semi-supervised learning. Multimedia Tools

and Applications, 81:1–17.

Roberts, B. W. and Mroczek, D. (2008). Personality trait

change in adulthood. Current Directions in Psycho-

logical Science, 17(1):31–35. PMID: 19756219.

Sayed, A. M. A., Selim, A. W. G., Ashraf, A., and Emam,

E. (2024). Analyzing handwriting to infer personal-

ity traits: A deep learning framework. In 2024 In-

telligent Methods, Systems, and Applications (IMSA),

pages 58–63.

Shorten, C. and Khoshgoftaar, T. (2019). A survey on image

data augmentation for deep learning. Journal of Big

Data, 6:60.

Shree, N. and Dr.Siddaraju (2022). Analysis of personality

based on handwriting using deep learning.

Tanha, J., Abdi, Y., Samadi, N., Razzaghi, N., and Asad-

pour, M. (2020). Boosting methods for multi-class im-

balanced data classiﬁcation: an experimental review.

Journal of Big Data, 7(1):70.

Vargoorani, Z. E. and Suen, C. Y. (2024). License plate de-

tection and character recognition using deep learning

and font evaluation. In Suen, C. Y., Krzyzak, A., Ra-

vanelli, M., Trentin, E., Subakan, C., and Nobile, N.,

editors, Artiﬁcial Neural Networks in Pattern Recog-

nition, pages 231–242. Springer Nature Switzerland.

Xu, Y., Tang, Y., and Suen, C. Y. (2024). Two key factors

in handwriting analysis for personality prediction. In

Fifth International Conference on Image, Video Pro-

cessing, and Artiﬁcial Intelligence (IVPAI 2023), vol-

ume 13074, pages 102–107. SPIE.

Zhai, X., Kolesnikov, A., Houlsby, N., and Beyer, L.

(2021). Scaling vision transformers. arXiv preprint

arXiv:2106.04560.

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

150