
for large datasets or real-time applications. Therefore,
to address this challenge, instead of using the point at
which the regression coefficient for a group, $\vec{\beta}_j$, first
becomes zero as the threshold, we propose estimating
the threshold near the point where $\vec{\beta}_j$ first becomes
zero. Building on this, the essence of our approach
can be captured from Equation (7). If $\gamma\lambda > \|\vec{\beta}_j\|_2$,
$\vec{\beta}_j$ remains non-zero. Conversely, if $\gamma\lambda < \|\vec{\beta}_j\|_2$,
$\vec{\beta}_j$ becomes zero. This implies that $\gamma\lambda = \|\vec{\beta}_j\|_2$ serves
as a crucial threshold in our framework. For our
study, we adopt $\gamma\lambda \approx \|\vec{\beta}_j\|_2$ as the threshold for the
group. Further, to estimate this threshold, we compute
multiple regression coefficients $\vec{\beta}$ for varied $\lambda$ values.
Our objective is to locate the threshold $\|\vec{\beta}_j\|_2 \approx \gamma\lambda$
where $\vec{\beta}$ remains non-zero. With $\gamma$ being constant
at this threshold, $\lambda$ provides a measure of group importance.
We define $\lambda_j$ for each group $j$ such that
$\|\vec{\beta}_j\|_2 \approx \gamma\lambda$. This value, $\lambda_j$, is indicative of the
importance of group $j$ in our proposed method.
To facilitate a more comprehensive comparison
among groups, we introduce an importance measure
$o_j$, which quantifies the relative importance of group
$j$. Formally, it is defined as:
$$o_j = \frac{\lambda_j}{\sum_{k=1}^{J} \lambda_k} \tag{8}$$
A thorough step-by-step explanation of this method is
provided in Algorithm 2.
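The procedure described above can be sketched in code. The following is a minimal illustration and not a reproduction of Algorithm 2. It assumes a standard group LASSO objective, (1/2n)||y - Xb||^2 + lam * sum_j sqrt(p_j) ||b_j||_2, solved by proximal gradient descent, with the constant $\gamma$ folded into the penalty weight. The function names `group_lasso` and `group_importance`, the λ grid, and the toy data are all hypothetical. For each group, the sketch records the largest λ on the grid at which the group coefficient is still non-zero, uses it as $\lambda_j$, and normalizes as in Equation (8).

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500, beta0=None):
    """Proximal gradient for (1/2n)||y - Xb||^2 + lam * sum_j sqrt(p_j) ||b_j||_2.
    This objective and solver are assumptions, not the paper's Algorithm 2."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y)) / n  # gradient step
        for idx in groups:  # group-wise soft thresholding
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(len(idx))
            z[idx] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[idx]
        beta = z
    return beta

def group_importance(X, y, groups, lambdas):
    """Return (lambda_j, o_j): lambda_j is the largest grid value of lambda at
    which group j is still non-zero; o_j normalizes it as in Equation (8)."""
    lambdas = np.sort(np.asarray(lambdas, dtype=float))[::-1]  # large to small
    lam_j = np.zeros(len(groups))
    found = np.zeros(len(groups), dtype=bool)
    beta = None
    for lam in lambdas:
        beta = group_lasso(X, y, groups, lam, beta0=beta)  # warm start
        for j, idx in enumerate(groups):
            if not found[j] and np.linalg.norm(beta[idx]) > 0.0:
                lam_j[j], found[j] = lam, True
    return lam_j, lam_j / lam_j.sum()

# Toy data (hypothetical): three groups of two features with decreasing effect.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 6))
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
beta_true = np.array([1.0, 1.0, 0.3, 0.3, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(400)

lam_j, o = group_importance(X, y, groups, np.geomspace(2.0, 0.01, 40))
print("lambda_j:", lam_j)
print("o_j:", o)
```

Because $\lambda_j$ is read off a finite grid, its resolution is limited by the grid spacing; a finer grid near each threshold would tighten the estimate at the cost of more fits.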
4 EXPERIMENTS
In this section, we conduct experiments using both
generated data and real data to demonstrate the efficacy
of the group importance estimated by our
proposed method. Section 4.1 provides a detailed
description of the generated and real-world datasets.
Section 4.2 presents the common experimental conditions
for both sets of experiments. Finally, Section
4.3 shows the results from both the generated and real
data experiments.
4.1 Experimental Data
4.1.1 Generated Data
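We generated data using specific parameters. For the
vector $\vec{v}[\theta, \eta]$, we define:
$$\sum_{n=1}^{400} \vec{v}[\theta, \eta]_n = \theta, \qquad \sum_{n=1}^{400} \vec{v}[\theta, \eta]_n^2 = \eta$$
From this, we derive the vector $\vec{x}_m$ as:
$$\vec{x}_m = \vec{v}[0, 1] \quad (m = 1, \cdots, 15)$$
The target variable, $\vec{y}$, is then generated as:
$$\vec{y} = X\vec{\beta} + \vec{v}[0, 2]$$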
The regression coefficients $\vec{\beta}$ are detailed in Table 1.
As indicated in Table 1, the coefficients for group 1,
$\vec{\beta}_1$, are defined as $\beta_{\{1\}m_j} = 0.4$ for $m_j = 1, 2, 3$. By
setting the elements of the regression coefficients for
each group to the same value in this experiment, the
values presented in the third row of Table 1 represent
the importance of each group.
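The construction of the generated data can be sketched as follows. This is a minimal illustration under stated assumptions: the source fixes only the sum and the sum of squares of $\vec{v}[\theta, \eta]$, so drawing the entries from a Gaussian before centering and rescaling is an assumption, and the helper name `make_v` is hypothetical. Only the group 1 coefficients (0.4 for the first three variables) are taken from the text; the remaining entries are left at zero as placeholders to be filled in from Table 1.

```python
import numpy as np

def make_v(theta, eta, n=400, rng=None):
    """Random length-n vector whose entries sum to theta and whose squared
    entries sum to eta. The Gaussian base draw is an assumption."""
    rng = np.random.default_rng() if rng is None else rng
    resid = eta - theta ** 2 / n  # squared norm left for the zero-sum part
    if resid < 0:
        raise ValueError("eta must be at least theta**2 / n")
    u = rng.standard_normal(n)
    u -= u.mean()                              # entries now sum to zero
    u *= np.sqrt(resid) / np.linalg.norm(u)    # squared norm equals resid
    return u + theta / n                       # shift so entries sum to theta

rng = np.random.default_rng(0)

# x_m = v[0, 1] for m = 1, ..., 15, stacked as the columns of X.
X = np.column_stack([make_v(0.0, 1.0, rng=rng) for _ in range(15)])

# Placeholder coefficients: only group 1 (first three entries) is from the text.
beta = np.zeros(15)
beta[:3] = 0.4

# y = X beta + v[0, 2]
noise = make_v(0.0, 2.0, rng=rng)
y = X @ beta + noise
```

Each call to `make_v` draws a fresh vector, so the 15 columns of `X` satisfy the same two constraints but are otherwise independent draws.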
4.1.2 Real Data
The data used in this study are sourced from the
open datasets made publicly available by the Ministry
of Health, Labour and Welfare in Japan. Our experiments
span data points collected from May 10, 2020,
to May 8, 2023, totaling 1094 entries. The target variable
(dependent variable) is the number of daily
deaths in Japan due to the novel coronavirus (COVID-19)
infection. The independent variables (explanatory
variables) are the numbers of daily COVID-19
infections, broken down by prefecture
in Japan. Out of all the prefectures, 12 were selected
for this study based on the criterion that they each
accounted for at least 2% of the total infections in Japan
as of May 8, 2023. These prefectures are Hokkaido,
Saitama, Chiba, Tokyo, Kanagawa, Shizuoka, Aichi,
Kyoto, Osaka, Hyogo, Hiroshima, and Fukuoka. The
rationale behind this selection is to ensure the model's
appropriateness by excluding prefectures whose infection
counts are very low compared to the national
total as of May 8, 2023.
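The selection criterion can be expressed as a simple share filter. The sketch below is illustrative only: the function name `select_prefectures` and the toy counts for regions "A" to "D" are hypothetical and are not the actual infection figures.

```python
def select_prefectures(cumulative_infections, min_share=0.02):
    """Keep entries whose share of the overall total is at least min_share."""
    total = sum(cumulative_infections.values())
    return [name for name, count in cumulative_infections.items()
            if count / total >= min_share]

# Toy counts (hypothetical, not real data): shares are 50%, 30%, 19%, and 1%.
toy_counts = {"A": 500, "B": 300, "C": 190, "D": 10}
selected = select_prefectures(toy_counts)
print(selected)
```

The input must cover every prefecture, not just the candidates, so that the denominator is the national total.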
In Japan, prefectures are often categorized into
regions: Hokkaido, Tohoku, Kanto, Chubu, Kinki,
Chugoku, Shikoku, Kyushu, and Okinawa. To define
the group importance in our experiments, we used the
total number of deaths due to COVID-19 as of May 8,
2023, in the selected 12 prefectures, and aggregated
these death counts according to the aforementioned
regional groupings. The total death count reflects the
severity and impact of the pandemic in each region,
so we use it as an approximation of the importance of
the corresponding group. The defined group importance
is presented in the fourth column of Table 2.
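The regional aggregation can be sketched as below. The prefecture-to-region mapping follows the standard regional classification listed above and is assumed to match the grouping in Table 2, which is not reproduced here. The death counts are uniform placeholders, not the real figures, and the names `REGION_OF` and `deaths_by_region` are hypothetical.

```python
from collections import defaultdict

# Standard regional classification of the 12 selected prefectures
# (assumed to match the grouping used in Table 2).
REGION_OF = {
    "Hokkaido": "Hokkaido",
    "Saitama": "Kanto", "Chiba": "Kanto", "Tokyo": "Kanto", "Kanagawa": "Kanto",
    "Shizuoka": "Chubu", "Aichi": "Chubu",
    "Kyoto": "Kinki", "Osaka": "Kinki", "Hyogo": "Kinki",
    "Hiroshima": "Chugoku",
    "Fukuoka": "Kyushu",
}

def deaths_by_region(deaths_by_prefecture):
    """Sum prefecture-level death counts within each region."""
    totals = defaultdict(int)
    for prefecture, deaths in deaths_by_prefecture.items():
        totals[REGION_OF[prefecture]] += deaths
    return dict(totals)

# Placeholder counts (NOT real data): 100 deaths for every prefecture.
placeholder_deaths = {prefecture: 100 for prefecture in REGION_OF}
regional_deaths = deaths_by_region(placeholder_deaths)
print(regional_deaths)
```

With these 12 prefectures, only six of the nine regions are represented; Tohoku, Shikoku, and Okinawa have no selected prefecture.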
In this study, we conducted statistical tests using
Ridge regression to assess whether the real data
are suitable for the regression model. Specifically,
we evaluated whether the model's residuals were
homoscedastic by conducting the Breusch-Pagan test.
The results showed a p-value of 0.078, indicating
Group Importance Estimation Method Based on Group LASSO Regression