Hardware-oriented Algorithm for Human Detection using GMM-MRCoHOG Features

Ryogo Takemoto (1, a), Yuya Nagamine, Kazuki Yoshihiro, Masatoshi Shibata, Hideo Yamada, Yuichiro Tanaka (3, b), Shuichi Enokida (4, c) and Hakaru Tamukoh (1, 3, d)

(1) Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, 2-4 Hibikino, Wakamatsu-ku, Kitakyushu, Fukuoka, 808-0196, Japan

(2) AISIN CORPORATION, 2-1 Asahi-machi, Kariya, Aichi, 448-8650, Japan

(3) Research Center for Neuromorphic AI Hardware, Kyushu Institute of Technology, 2-4 Hibikino, Wakamatsu-ku, Kitakyushu, Fukuoka, 808-0196, Japan

(4) Department of Artificial Intelligence, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan

Keywords:

Image Processing, Human Detection, HOG, MRCoHOG, GMM-MRCoHOG, FPGA.

Abstract:

In this research, we focus on Gaussian mixture model-multiresolution co-occurrence histograms of oriented

gradients (GMM-MRCoHOG) features using luminance gradients in images and propose a hardware-oriented

algorithm of GMM-MRCoHOG to implement it on a ﬁeld programmable gate array (FPGA). The proposed

method simpliﬁes the calculation of luminance gradients, which is a high-cost operation in the conventional

algorithm, by using lookup tables to reduce the circuit size. We also designed a human-detection digital

architecture of the proposed algorithm for FPGA implementation using high-level synthesis. The veriﬁcation

results showed that the proposed architecture was approximately 123 times faster than

an FPGA implementation of the binarized VGG-16.

1 INTRODUCTION

The demand for home service robots and self-driving cars has been increasing in response to the recent acceleration in population aging and the declining birthrate. Because these robots and cars with artificial intelligence are expected to operate near humans, high-precision and high-speed human detection functions are required from the viewpoint of safety. However, the more accurate the human detection, the more complex the computation and the longer the computation time. Parallelization is an effective way to accelerate the computation.

A typical device for parallel processing is a graphics processing unit (GPU). However, GPUs are not suitable for embedded systems such as home service robots and self-driving cars in terms of power consumption and heat dissipation. Instead of software implementation on GPUs, hardware implementation, in which a dedicated circuit with a parallel architecture is designed for a specific computation, can achieve a low-power system with high-speed processing because operations on the dedicated circuit can be more efficient than those on GPUs. Therefore, we aim to design a dedicated circuit for human detection and implement it on a field-programmable gate array (FPGA). Because FPGAs have limited physical circuit resources, we need a hardware-oriented algorithm that reduces the number of complex operations in the original algorithm to efficiently utilize the limited resources.

a https://orcid.org/0000-0002-6795-0794

b https://orcid.org/0000-0001-6974-070X

c https://orcid.org/0000-0001-6309-3185

d https://orcid.org/0000-0002-3669-1371

For high-accuracy human detection, histograms of oriented gradients (HOG) features have been proposed (Dalal and Triggs, 2005) and used in multiple applications. This method extracts features of object shapes from luminance gradients in images and represents the features as histograms of the gradients. To achieve higher accuracy with fewer memory resources than HOG features, Gaussian mixture model-multiresolution co-occurrence histograms of oriented gradients (GMM-MRCoHOG) features have been proposed (Higashi et al., 2018; Nagamine et al., 2019); they approximate the conventional histogram-based state space with a mixed Gaussian distribution and optimize the feature space. However, the algorithm still requires a large number of complex operations that are not suitable for FPGA implementation.

Takemoto, R., Nagamine, Y., Yoshihiro, K., Shibata, M., Yamada, H., Tanaka, Y., Enokida, S. and Tamukoh, H.

Hardware-oriented Algorithm for Human Detection using GMM-MRCoHOG Features.

DOI: 10.5220/0010848100003124

In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP, pages 749-757

ISBN: 978-989-758-555-5; ISSN: 2184-4321

Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

In this study, we propose a hardware-oriented

algorithm of GMM-MRCoHOG that simpliﬁes the

complex operation in the original algorithm, such

as the calculation of luminance gradients by using a

lookup table (LUT); we then design a dedicated cir-

cuit for human recognition integrating the hardware-

oriented GMM-MRCoHOG with a binarized neural

network (BNN) (Hubara et al., 2016), and implement

it on an FPGA to achieve a high-accuracy, high-speed,

and low-power system.

2 RELATED WORKS

MRCoHOG (Iwata and Enokida, 2014), a derivative

work of HOG, extracts features by down-sampling

an image in two steps; it represents the gradient co-

occurrence between images of three resolutions as a

two-dimensional co-occurrence histogram. Feature

extraction methods using gradient histograms, such

as HOG and MRCoHOG, require a manual determi-

nation of the optimal class width of the histogram to

discretize the luminance gradients. This is difﬁcult

because the discretization error of the gradient infor-

mation and the generalization ability of the features

vary depending on the class width. Moreover, these

methods require many memory resources to represent

gradient histograms.

In contrast, GMM-MRCoHOG constructs an optimal state space by approximating the co-occurrence histogram with a mixed Gaussian distribution, as shown in Fig. 1, and performs feature extraction based on this state space. The approximation reduces the memory resources required for the gradient histograms of the original algorithm because only a small amount of memory is needed to represent the mixed Gaussian distribution.

Figures 2 and 3 show the processing flow of GMM-MRCoHOG. First, the co-occurrences of the luminance gradient pairs (36 gradient directions for each axis in Fig. 2) of the positive and negative training images are mapped to the feature space as continuous values, and each feature is approximated by a mixed Gaussian distribution. Then, using the Jensen–Shannon (JS) information content (Michishita et al., 2018), only the features that can effectively separate the positive and negative data are extracted from the respective mixed Gaussian distributions and approximated by a mixed Gaussian distribution using the EM algorithm (Dempster et al., 1977). The resulting mixed Gaussian distribution is then used as the feature space, and the responsibility (described as “resp” in Fig. 3) of each Gaussian distribution is calculated and used as the feature value. In GMM-MRCoHOG, the final number of feature dimensions is determined by the number of Gaussian distributions in the 2D space, not by the number of gradient quantization levels.
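The responsibility used as the feature value can be illustrated with a small sketch. The example below is a minimal illustration and not the authors' implementation: it assumes isotropic 2D Gaussians, and the means, variances, and mixture weights are hypothetical values chosen only for demonstration. It computes the posterior probability of each mixture component for one point in the co-occurrence plane.

```python
import math

def responsibilities(x, means, variances, weights):
    """Return the posterior probability of each isotropic 2D Gaussian for point x."""
    likelihoods = []
    for (mx, my), var, w in zip(means, variances, weights):
        d2 = (x[0] - mx) ** 2 + (x[1] - my) ** 2
        # Isotropic 2D Gaussian density: exp(-d2 / (2 var)) / (2 pi var)
        density = math.exp(-d2 / (2.0 * var)) / (2.0 * math.pi * var)
        likelihoods.append(w * density)
    total = sum(likelihoods)
    return [value / total for value in likelihoods]

# Hypothetical 3-component mixture over a (direction, direction) plane.
means = [(5.0, 5.0), (20.0, 10.0), (30.0, 30.0)]
variances = [4.0, 9.0, 4.0]
weights = [0.5, 0.3, 0.2]

resp = responsibilities((6.0, 5.0), means, variances, weights)
```

With one responsibility per Gaussian, the length of the resulting vector equals the number of mixture components, which matches the statement that the feature dimension follows the number of Gaussians rather than the number of histogram bins.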

Figure 1: Sample of Gaussian Mixture Model.

Figure 2: Training Process of State Space in GMM-

MRCoHOG.

Figure 3: Feature Extraction Process in GMM-MRCoHOG.

GMM-MRCoHOG is difficult to implement in hardware because it includes an arctangent function for deciding the luminance gradient angle and a responsibility calculation for deciding the feature value, both of which are complex operations that require considerable circuit resources. Nagamine et al. proposed a hardware-oriented algorithm that approximates these calculations to reduce the circuit resources (Nagamine et al., 2019). The algorithm determines the luminance gradient angles by using a condition branch on the horizontal and vertical luminance gradients $f_x$ and $f_y$. Figure 4 shows the first quadrant of the luminance gradient space of $f_x$ and $f_y$, which is divided into several areas at intervals of 16 in Manhattan distance. The condition branch determines an angle by subtracting $f_x$ and $f_y$ according to the divided area; therefore, the angle decision does not require complex operations. For the feature value decision, the algorithm infers a responsibility from the distance between the input vector and each Gaussian distribution. The algorithm also approximates the width of each Gaussian distribution as a power of two and changes the Gaussian shape to a rectangle so that the computation can be represented by bit-shift operations and fuzzy inferences. Although the hardware-oriented algorithm removes most of the circuit resources required by the original algorithm, the condition branch for the angle calculation still requires many LUTs, and its imprecise angle approximation degrades the performance of the algorithm.

VISAPP 2022 - 17th International Conference on Computer Vision Theory and Applications

Figure 4: Condition Branch for Luminance Gradient Angle

Decision in (Nagamine et al., 2019).

3 PROPOSED METHODS

To improve the method proposed by Nagamine et al.,

we propose a novel coarse angle calculation method

using a ﬁxed-point tan θ table. We then construct a

hardware-oriented GMM-MRCoHOG-based human

recognition circuit using the method for a high-speed

and low-power human detection system.

3.1 Coarse Angle Calculation Method

using Fixed-point Tangent Table

In the GMM-MRCoHOG algorithm, the luminance gradient angle θ is calculated as $\theta = \tan^{-1}(f_y / f_x)$ and discretized in 36 directions. Here, assuming that the angle θ lies in the first quadrant of the luminance gradient space, we calculate tan θ from tan 0° to tan 80° in advance, as given by Eq. (1), and discretize it, as shown in Fig. 5.

Figure 5: Overview of Discretized tan θ.

$$
\text{direction} =
\begin{cases}
1 \ (\theta : 0^\circ \sim 10^\circ) & \text{if } \tan 0^\circ \le f_y / f_x < \tan 10^\circ \\
2 \ (\theta : 10^\circ \sim 20^\circ) & \text{else if } \tan 10^\circ \le f_y / f_x < \tan 20^\circ \\
\quad \vdots & \\
9 \ (\theta : 80^\circ \sim 90^\circ) & \text{else if } \tan 80^\circ \le f_y / f_x
\end{cases}
\tag{1}
$$

Then, we create a tan θ table representing the relationship between the discretized tan θ, $f_x$, and $f_y$, which enables us to obtain rough angles of the luminance gradients. By utilizing the symmetry of the trigonometric functions, the tan θ table can also be applied to the second through fourth quadrants.
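The use of symmetry can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit described in the paper: the function names, the 0-based numbering of the 36 bins, and the handling of gradients lying exactly on an axis are all hypothetical choices. The signed gradients are folded into the first quadrant, the nine-way decision is made there, and the result is mirrored back into the correct quadrant.

```python
import math

# Boundaries tan(10 deg) .. tan(80 deg) between the nine first-quadrant directions.
TAN_BOUNDS = [math.tan(math.radians(d)) for d in range(10, 90, 10)]

def direction_q1(ax, ay):
    """First-quadrant direction 1..9 for ax, ay >= 0, using multiply-and-compare."""
    return 1 + sum(1 for t in TAN_BOUNDS if ax * t <= ay)

def direction_36(fx, fy):
    """Map signed gradients to a 0-based bin 0..35 (10 degrees each) by mirroring."""
    k = direction_q1(abs(fx), abs(fy))  # 1..9 in the folded first quadrant
    if fx >= 0 and fy >= 0:
        return k - 1        # bins 0..8,   0 to 90 degrees
    if fx < 0 and fy >= 0:
        return 18 - k       # bins 9..17,  mirrored about the vertical axis
    if fx < 0 and fy < 0:
        return 17 + k       # bins 18..26, rotated by 180 degrees
    return 36 - k           # bins 27..35, mirrored about the horizontal axis
```

For example, `direction_36(-10, 1)` corresponds to an angle of about 174 degrees and falls in bin 17. Only one nine-entry table is needed for all four quadrants, at the cost of two absolute-value operations and a small index remapping.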

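Additionally, we eliminate the divisions, which require the most circuit resources, from the conditional branches on the tan θ table. Because $f_x \ge 0$ and $f_y \ge 0$ in the first quadrant, multiplying each inequality in Eq. (1) by $f_x$ preserves its direction, so we can replace Eq. (1) with Eq. (2), where no division is required.

$$
\text{direction} =
\begin{cases}
1 \ (\theta : 0^\circ \sim 10^\circ) & \text{if } f_x \times \tan 0^\circ \le f_y < f_x \times \tan 10^\circ \\
2 \ (\theta : 10^\circ \sim 20^\circ) & \text{else if } f_x \times \tan 10^\circ \le f_y < f_x \times \tan 20^\circ \\
\quad \vdots & \\
9 \ (\theta : 80^\circ \sim 90^\circ) & \text{else if } f_x \times \tan 80^\circ \le f_y
\end{cases}
\tag{2}
$$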

The values in the tan θ table are then approximated with fixed-point numbers, which enable faster computation and require fewer circuit resources than floating-point numbers.
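The division-free comparison with a fixed-point table can be sketched in a few lines. The following is a minimal software model under stated assumptions, not the synthesized circuit: the rounding mode (round to nearest), the constant names, and the choice of six fraction bits are illustrative, and the horizontal and vertical gradients are written `fx` and `fy`. The vertical gradient is scaled by the same power of two as the table, so every comparison uses only integer multiplications and shifts.

```python
import math

FRAC_BITS = 6  # fraction bits of the fixed-point table entries (illustrative choice)

# tan(10 deg) .. tan(80 deg) stored as integers scaled by 2**FRAC_BITS.
# tan(80 deg) is about 5.67, so three integer bits are enough for every entry.
TAN_TABLE = [round(math.tan(math.radians(d)) * (1 << FRAC_BITS))
             for d in range(10, 90, 10)]

def coarse_direction_q1(fx, fy):
    """Direction 1..9 for non-negative integer gradients, without any division."""
    fy_scaled = fy << FRAC_BITS  # bring fy to the same scale as fx * table entry
    direction = 1
    for t in TAN_TABLE:
        # fx * tan(boundary) <= fy means the angle is at or beyond this boundary.
        if fx * t <= fy_scaled:
            direction += 1
        else:
            break
    return direction
```

For instance, `coarse_direction_q1(10, 10)` (a 45 degree gradient) yields direction 5, and a purely vertical gradient such as `coarse_direction_q1(0, 5)` yields direction 9. Because the table is monotonically increasing, counting the satisfied boundaries is equivalent to the if/else-if chain of the division-free conditional.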


3.2 Human Recognition Circuit

Integrating Hardware-oriented

GMM-MRCoHOG and BNN

We designed a dedicated human recognition circuit

using the proposed coarse angle calculation algorithm

and the responsibility inference method proposed by

Nagamine et al. (Nagamine et al., 2019), as shown in

Fig. 6.

This circuit receives a 32 × 64 pixel image as input and continuously transfers one pixel per clock cycle, from the top-left to the bottom-right pixel of the image, to the image buffers. Here, we configure GMM-MRCoHOG to extract features from images of three resolutions: the original-size image, a 1/2-resized image, and a 1/4-resized image; therefore, we implemented three image buffers, one for each resolution. Each buffer is a three-line buffer so that the luminance gradient can be calculated from 3 × 3 pixels in the image. The derivative filter blocks receive three lines of pixels and calculate the horizontal and vertical luminance differences. The angle calculation blocks calculate the angles of the luminance gradients, and the results are stored in the two-line buffers of the second stage. Then, the gradient co-occurrence is calculated, and the GMM-MRCoHOG feature is extracted. The obtained feature is fed into the BNN, which classifies the input image as human or not human. The synaptic weights and activations of the BNN are binarized so that the circuit requires only small memory resources.
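The role of the three-line buffer can be illustrated with a small software model. This sketch makes several assumptions that are not stated in the paper: the derivative filter is taken to be a simple central difference, the sign convention of the vertical difference is arbitrary, and the model emits gradients one completed row at a time rather than one pixel per clock cycle. The function name is hypothetical.

```python
from collections import deque

def stream_gradients(pixels, width):
    """Yield (x, y, fx, fy) for interior pixels from a raster-order pixel stream."""
    lines = deque(maxlen=3)  # three-line buffer: the oldest row is dropped automatically
    row = []
    y = 0
    for p in pixels:
        row.append(p)
        if len(row) == width:           # a full image row has arrived
            lines.append(row)
            row = []
            if len(lines) == 3:
                top, mid, bot = lines
                cy = y - 1              # centre row of the 3 x 3 window
                for x in range(1, width - 1):
                    fx = mid[x + 1] - mid[x - 1]  # horizontal luminance difference
                    fy = bot[x] - top[x]          # vertical luminance difference
                    yield x, cy, fx, fy
            y += 1

# A 4 x 3 test image whose luminance is x + 10 * y.
image = [x + 10 * y for y in range(3) for x in range(4)]
gradients = list(stream_gradients(image, width=4))
```

Only three rows are held at any time, so the buffer size depends on the image width and not on its height, which is the property that makes line buffers attractive for streaming hardware.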

Here, the number of mixtures of the Gaussian distribution used in the GMM-MRCoHOG is 6. The BNN has three layers (input, hidden, and output layers), and the number of neurons in the hidden layer is 1.
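Why binarization keeps the classifier small can be seen from how a binarized neuron is commonly evaluated. The sketch below shows the widely used XNOR-popcount formulation of a dot product between vectors with entries in {-1, +1}; it is an assumption that a similar formulation applies here, since the internal arithmetic of the BNN is not described in this text, and the function names and bit encoding (bit 1 means +1) are hypothetical.

```python
def binary_dot(x_bits, w_bits, n):
    """Dot product of two {-1,+1}^n vectors stored as n-bit integers (bit 1 = +1)."""
    mask = (1 << n) - 1
    agree = bin(~(x_bits ^ w_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * agree - n  # agreements contribute +1, disagreements -1

def binary_neuron(x_bits, w_bits, n):
    """Sign activation of a binarized neuron: +1 if the dot product is non-negative."""
    return 1 if binary_dot(x_bits, w_bits, n) >= 0 else -1

# x = (+1, -1, +1, +1) and w = (+1, +1, -1, +1), most significant bit first.
example = binary_dot(0b1011, 0b1101, 4)
```

Each weight occupies a single bit, and the multiply-accumulate reduces to bitwise logic and a bit count, which is consistent with the small memory footprint described above.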

4 EXPERIMENT

We veriﬁed the proposed coarse angle calculation

method, implemented a human recognition circuit

integrating the hardware-oriented GMM-MRCoHOG

and the BNN using high-level synthesis, and esti-

mated the processing speed and circuit size. The ex-

perimental environment is presented in Table 1.

4.1 Coarse Angle Calculation Method

using the Fixed-point Tangent Table

In this experiment, we verified the proposed coarse angle calculation method with respect to the circuit size, the matching rate between the estimated and true angles, the processing speed of the circuit, and the effect of the approximation on the accuracy of human recognition tasks.

First, we veriﬁed the circuit sizes of the tanθ table

when the integer part of the ﬁxed-point numbers in

the table was ﬁxed to three bits, and the fraction part

was varied from zero to seven bits. The target device

was a Xilinx Zynq XC7Z020 FPGA on a ZedBoard

with a clock frequency of 200 MHz.

Next, we verified the matching rate between the angles estimated using the proposed method and the true angle values. The bit-width settings of the fixed-point numbers in the table were the same as in the circuit size verification. The true angle values were calculated by feeding $f_x$ and $f_y$ into the atan2 function of the C/C++ standard math library (cmath) and discretizing the result in 36 directions. In addition, we compared the proposed method with the angle approximation method from the previous study (Nagamine et al., 2019).
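A software version of this matching-rate measurement might look like the sketch below. It is an illustration under stated assumptions rather than the authors' test bench: the grid of test gradients (first quadrant, integer values up to a hypothetical `max_val`), the round-to-nearest table entries, and all function names are choices made for this example, so the printed rates are not expected to reproduce the values reported in the paper.

```python
import math

def tan_table(frac_bits):
    """tan(10 deg) .. tan(80 deg) rounded to fixed point with frac_bits fraction bits."""
    scale = 1 << frac_bits
    return [round(math.tan(math.radians(d)) * scale) for d in range(10, 90, 10)]

def approx_direction(fx, fy, table, frac_bits):
    """Division-free direction 1..9 for fx, fy >= 0 (first quadrant)."""
    fy_scaled = fy << frac_bits
    return 1 + sum(1 for t in table if fx * t <= fy_scaled)

def true_direction(fx, fy):
    """Reference direction 1..9 from atan2, with exactly 90 degrees folded into 9."""
    deg = math.degrees(math.atan2(fy, fx))
    return min(int(deg // 10) + 1, 9)

def matching_rate(frac_bits, max_val=63):
    """Return (matching rate, maximum bin error) over a first-quadrant grid."""
    table = tan_table(frac_bits)
    matches = total = max_err = 0
    for fx in range(max_val + 1):
        for fy in range(max_val + 1):
            if fx == 0 and fy == 0:
                continue  # the angle is undefined at the origin
            err = abs(approx_direction(fx, fy, table, frac_bits) - true_direction(fx, fy))
            matches += (err == 0)
            max_err = max(max_err, err)
            total += 1
    return matches / total, max_err

for bits in range(8):
    rate, err = matching_rate(bits)
    print(f"fraction bits={bits}: match={rate:.3f}, max error={err}")
```

The error is measured in direction bins, so a value of 1 means that the approximate and reference directions differ by one neighbouring bin.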

Next, we compared processing speeds of angle

calculations of the following three methods:

1. Software implementation of angle calculation by

atan2 function

2. Software implementation of angle calculation by

the proposed method

3. Hardware implementation of angle calculation by

the proposed method

In the software implementation, the average calculation time over all 261,121 input luminance gradients, executed on an Intel Core i7-8700K central processing unit (CPU), was used as the angle calculation time. In the hardware implementation, the number of clock cycles required by the circuit to calculate an angle, multiplied by the clock cycle time, was used as the angle calculation time. Here, the fraction part of the fixed-point numbers in the table was set to six bits, and the target board and its clock frequency were the same as in the circuit size verification. Thus, the clock cycle time was 5 ns.
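The hardware timing estimate is a product of a cycle count and a clock period, which can be written down directly. In the sketch below the helper names are hypothetical, and the cycle count of 6 is an inference (30 ns reported for the hardware angle calculation divided by the 5 ns period), not a figure stated in the paper.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

def hardware_time_ns(cycles, freq_mhz):
    """Estimated processing time: number of clock cycles times the clock period."""
    return cycles * clock_period_ns(freq_mhz)

def cycles_from_time(time_ns, freq_mhz):
    """Inverse relation: recover the cycle count implied by a reported time."""
    return time_ns / clock_period_ns(freq_mhz)

period_200mhz = clock_period_ns(200)         # 5 ns, as stated for the ZedBoard target
implied_cycles = cycles_from_time(30, 200)   # cycles implied by a 30 ns result
```

The same relation with a 100 MHz clock gives the 10 ns period used for the ZCU102 target.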

Next, we verified the effect of the approximation in the proposed method on the accuracy of human recognition tasks. To avoid the effect of the binarization of the discriminator using the BNN, we used a support vector machine (SVM) (Cristianini and Shawe-Taylor, 2000), which is a floating-point model, as the discriminator. Here, we compared the accuracy of three algorithms for GMM-MRCoHOG: the original algorithm, the hardware-oriented algorithm of the previous study (Nagamine et al., 2019), and the proposed algorithm. We set the number of mixtures of the Gaussian distribution to 16 and 32. The datasets used in this experiment were the Daimler Pedestrian Classification Benchmark Dataset (Gavrila and Enzweiler, 2008) and the INRIA Person Dataset (Dalal and Triggs, 2005), which consist of human and non-human images of size 32 × 64 pixels. The details of these datasets are summarized in Table 2, and example images are shown in Figs. 7 and 8.

Figure 6: Human Recognition Circuit Integrating Hardware-Oriented GMM-MRCoHOG and BNN.

Table 1: Experimental Environment.

CPU                            Intel Core i7-8700K, 3.70 GHz
Memory                         16 GB
OS                             Windows 10
Circuit synthesis environment  Vivado HLS 2018.2, GUINNESS
FPGA board                     ZedBoard, XC7Z020CLG484-1 (200 MHz)
FPGA board                     ZCU102, XCZU9EG-2FFVB1156 (100 MHz)

Figure 7: Examples of Daimler Pedestrian Classification Benchmark Dataset.

Figure 8: Examples of INRIA Person Dataset.

We also veriﬁed the accuracy of a human recog-

nition system using the BNN as a discriminator and

compared it with that of a binarized version of the

VGG-16 network (Simonyan and Zisserman, 2015).

Table 2: Dataset.

Split   Dataset          Images              Resolution
Train   Daimler, INRIA   human: 10,000       32 × 64 pixels
                         not human: 10,000
Test    Daimler, INRIA   human: 1,126        32 × 64 pixels
                         not human: 4,840

4.2 Human Recognition Circuit

Integrating Hardware-oriented

GMM-MRCoHOG and BNN

The designed human recognition circuit was synthe-

sized using Vivado HLS 2018.2, to estimate the pro-

cessing speed and circuit size. The target device was a

Xilinx Zynq UltraScale+ MPSoC XCZU9EG FPGA

on a ZCU102 board with a clock frequency of 100

MHz. For comparison, we also implemented the bina-

rized VGG-16 in the XCZU9EG FPGA using GUIN-

NESS (Nakahara et al., 2019).

For the speed comparison between the software and hardware implementations of the human recognition system, the average software execution time to process 5,955 images of size 32 × 64 pixels on an Intel Core i7-8700K CPU was used as the image processing time for the software implementation. For the hardware implementation, the number of clock cycles required to process an image of 32 × 64 pixels, estimated by C Synthesis of Vivado HLS 2018.2, multiplied by the clock cycle time of 10 ns, was used as the image processing time. For the binarized VGG-16, the number of clock cycles required to process an image of 48 × 48 pixels, estimated by GUINNESS, multiplied by the clock cycle time of 10 ns, was used as the image processing time.

We also estimated the circuit size of the human

recognition system using the Export RTL of Vivado

HLS 2018.2, and the circuit size of the binarized

VGG-16 using GUINNESS. Moreover, we estimated

the power consumption of the circuit using Vivado

2018.2.

5 RESULTS

5.1 Coarse Angle Calculation Method

by using Fixed-point Tangent Table

Figures 9 and 10 show the circuit resource utilization of the tan θ table. As shown in Fig. 9, both the LUT and flip-flop (FF) utilization increased almost linearly as the bit width of the fraction part of the fixed-point numbers increased from zero to six bits. However, the seven-bit model used fewer LUTs and FFs than the six-bit model. As shown in Fig. 10, a digital signal processor (DSP) was required only in the seven-bit model, whereas no DSP was required in the range of zero to six bits.

Figure 9: Circuit Resource Utilization of LUTs and FFs.

Figure 10: Circuit Resources Utilization of DSPs.

Figure 11 shows the angle matching rate between the angles approximated by the proposed method and the true angles obtained by the atan2 function. According to the previous study (Nagamine et al., 2019), the matching rate of its method was 91 %. Therefore, the matching rate of the proposed method was higher than that of the previous study when the bit width of the fraction part of the fixed-point numbers was four or more, and it was approximately 99 % when the bit width was six or more. The maximum error of the angle in the figure represents the maximum absolute difference, in units of direction bins, between the angles approximated by the proposed method and the true angles. For example, if an angle is classified by the atan2 function as the third direction but by the proposed method as the fourth direction, the error is 1. From the figure, the maximum error of the angle was 1 when the fraction part of the fixed-point numbers had more than two bits.

Table 3 shows the processing time of the angle calculation. As shown in the table, the proposed hardware-oriented algorithm on the CPU required approximately 14 times longer than the atan2 function. The proposed hardware-oriented algorithm on the FPGA was approximately twice as fast as the atan2 function and approximately 28 times faster than the proposed algorithm on the CPU.

Table 3: Processing Time of Angle Calculation.

Methods Time [ns]

atan2 (software) 59.6

Proposed (software) 837.7

Proposed (hardware) 30
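The ratios quoted in the text follow directly from the three entries of Table 3; the worked arithmetic below uses only those values.

```latex
\[
\frac{837.7\ \text{ns}}{59.6\ \text{ns}} \approx 14.1, \qquad
\frac{59.6\ \text{ns}}{30\ \text{ns}} \approx 1.99, \qquad
\frac{837.7\ \text{ns}}{30\ \text{ns}} \approx 27.9
\]
```

The first ratio is the slowdown of the proposed method in software relative to atan2, the second is the speedup of the hardware over atan2 in software, and the third is the speedup of the hardware over the proposed method in software.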

Figures 12 and 13 show the accuracy of the human

recognition system with 16 and 32 Gaussian mixtures

with the SVM implemented by MATLAB. As shown

in these ﬁgures, the proposed method improved the

accuracy of the human recognition task from the pre-

vious study in both mixture cases.

Table 4 presents the human recognition accuracy

of the proposed method with the BNN where the num-

ber of mixtures was set as six, and Table 5 shows the

human recognition accuracy of the binarized VGG-

16. The proposed human recognition system with a

BNN having one neuron in the hidden layer was able

to classify humans with high accuracy and outperform

the binarized VGG-16.

Table 4: Human Recognition Accuracy by Hardware-

Oriented GMM-MRCoHOG with BNN.

Accuracy rate

train 99.4 [%]

test 97.1 [%]


Figure 11: Angle Matching Rate and Maximum Error between Angles by the Proposed Method and atan2 Function.

Figure 12: Human Recognition Accuracy in the case of 16

Mixtures.

Table 5: Human Recognition Accuracy of Binarized VGG-16.

Accuracy rate

train 77.4 [%]

test 44.3 [%]

5.2 Human Recognition Circuit

Integrating Hardware-oriented

GMM-MRCoHOG and BNN

Table 6 presents the estimated processing time of hu-

man recognition. The proposed hardware was ap-

proximately 118 times faster than the software im-

plementation and approximately 123 times faster than

the hardware implementation of the binarized VGG-

16.

Figure 13: Human Recognition Accuracy in the case of 32

Mixtures.

Table 6: Processing Time of Human Recognition.

Methods Time[ms]

Proposed (software) 5.2

Proposed (hardware) 0.044

Binarized VGG-16 (hardware) 5.4
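The speedup factors quoted for Table 6 can be reproduced with a short calculation. In the sketch below the dictionary keys and the helper name are hypothetical, and the throughput figure is a derived quantity inferred from the table (one image at a time, no overlap between images), not a value reported in the paper.

```python
# Processing times per image from Table 6, in milliseconds.
times_ms = {
    "proposed_software": 5.2,
    "proposed_hardware": 0.044,
    "binarized_vgg16_hardware": 5.4,
}

def speedup(slow_ms, fast_ms):
    """How many times faster the second implementation is than the first."""
    return slow_ms / fast_ms

sw_vs_hw = speedup(times_ms["proposed_software"], times_ms["proposed_hardware"])
vgg_vs_hw = speedup(times_ms["binarized_vgg16_hardware"], times_ms["proposed_hardware"])

# Derived quantity (an inference): images processed per second by the hardware.
images_per_second = 1000.0 / times_ms["proposed_hardware"]

print(f"software / hardware: {sw_vs_hw:.1f}x")
print(f"binarized VGG-16 / hardware: {vgg_vs_hw:.1f}x")
print(f"hardware throughput: {images_per_second:.0f} images/s")
```

Because the hardware time is given to only two significant digits, the derived throughput should be read as a rough estimate.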

Table 7 presents the estimated circuit resource utilization of the proposed human recognition circuit, and Table 8 shows that of the binarized VGG-16. As presented in Table 7, the proposed circuit can be implemented in the XCZU9EG FPGA, whereas it could not be implemented in the XC7Z020 FPGA owing to a lack of resources. The dominant resource in the circuit was the block random access memory (BRAM), whose usage was determined by the number of center coordinates and widths of the mixed Gaussian distribution and by the synaptic weights of the BNN. Compared with the binarized VGG-16, the proposed human recognition circuit consumed fewer FFs and LUTRAMs but more BRAMs and LUTs.

Table 7: Circuit Resource Utilization of the Proposed Human Recognition Circuit.

Resource   Used     Available   Utilization [%]
BRAM       154      912         16.9
DSP48E     0        2,520       0
FF         11,529   548,160     2.1
LUT        27,331   274,080     10.0
LUTRAM     111      144,000     0.1

Table 8: Circuit Resource Utilization of the Binarized VGG-16.

Resource   Used     Available   Utilization [%]
BRAM       148      912         16.2
DSP48E     0        2,520       0
FF         21,751   548,160     3.9
LUT        21,765   274,080     7.9
LUTRAM     1,934    144,000     1.3

Table 9 lists the estimated power consumption of

the circuit. As shown in the table, the power con-

sumption of the proposed circuit is 0.923 [W]. It

is noteworthy that this power was for only the pro-

grammable logic on the XCZU9EG chip, not for the

entire FPGA board, including the processing system

on the chip and dynamic RAMs on the board.

Table 9: Estimated Power Consumption of the Circuit.

Power [W]

Proposed circuit 0.923

Binarized VGG-16 0.949

6 DISCUSSION

6.1 Coarse Angle Calculation Method

by using Fixed-point Tangent Table

As shown in the experimental results (Figs. 9 and 10), the numbers of LUTs and FFs increased almost linearly while the fraction part of the fixed-point numbers ranged from zero to six bits. In the seven-bit model, the numbers of LUTs and FFs decreased and the number of DSPs increased because the high-level synthesis compiler estimated that using a DSP was more efficient than using LUTs and FFs to implement the multiplications.

Table 10 summarizes the FF and LUT utilization for the $\tan^{-1}$ function obtained with the high-level synthesis of the atan2 function, the method of the previous study (Nagamine et al., 2019), and the proposed method. As presented in the table, the proposed method, even with six bits for the fraction part, which was the most resource-intensive configuration among those listed, required approximately 1/30 of the FFs and LUTs used by the high-level synthesis of the atan2 function. Moreover, the number of LUTs in the proposed circuit was significantly smaller than that in the previous study. Therefore, the proposed method succeeded in reducing the size of the circuit.

Table 10: Circuit Resource Utilization of the Original Algorithm, Previous Study, and Proposed Method.

Method             FF      LUT
tan^-1 (atan2)     6,000   10,000
Previous study     76      3,087
Proposed (0 bit)   52      97
Proposed (1 bit)   75      112
Proposed (2 bit)   100     167
Proposed (3 bit)   119     197
Proposed (4 bit)   130     236
Proposed (5 bit)   137     266
Proposed (6 bit)   183     297
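The "approximately 1/30" figure and the LUT reduction relative to the previous study can be checked against the rows of Table 10; the arithmetic below uses the six-bit configuration of the proposed method.

```latex
\[
\frac{6{,}000}{183} \approx 32.8 \ \text{(FF)}, \qquad
\frac{10{,}000}{297} \approx 33.7 \ \text{(LUT)}, \qquad
\frac{3{,}087}{297} \approx 10.4 \ \text{(LUT, previous study vs. proposed)}
\]
```

The first two ratios compare the high-level synthesis of atan2 with the proposed method, and the third shows that the proposed method uses roughly one tenth of the LUTs required by the previous study, although its FF count in the six-bit configuration (183) is higher than that of the previous study (76).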

The accuracy of the proposed method for the human recognition task was better than those of the binarized VGG-16 and the previous study (Nagamine et al., 2019). According to the previous study, the accuracy for the same task was 92.4 %, whereas the accuracy of the proposed method was 97.1 %. Additionally, the discrepancy in the angle calculation of the previous method was 9 %. Therefore, the proposed method extracted more precise features, resulting in better performance in the human recognition task.

6.2 Human Recognition Circuit

Integrating Hardware-oriented

GMM-MRCoHOG and BNN

Although there was no significant difference between the proposed circuit and the binarized VGG-16 in terms of circuit size and power consumption, the proposed circuit outperformed the binarized VGG-16 in the human recognition task, and its processing time was significantly shorter than that of the binarized VGG-16 because the proposed circuit computed the algorithm in parallel using an effective pipeline architecture with line buffers. Therefore, we concluded that the proposed circuit is more suitable for a human detection system than the binarized VGG-16.

7 CONCLUSIONS

For robots and self-driving cars operating near hu-

mans, a high-accuracy, high-speed, and low-power

human detection function is required. In this study,

we designed a dedicated circuit of GMM-MRCoHOG

with high human recognition performance and imple-

mented it in an FPGA to realize a high-speed and low-

power human recognition system. Using the tanθ ta-

ble, the proposed hardware-oriented algorithm sim-

pliﬁes the calculation of luminance gradients, which

is a high-cost operation in the original algorithm. The

experimental results show that the proposed method

improves the accuracy and processing speed of the

human recognition task while reducing the circuit re-

sources.

In future work, we plan to implement a human de-

tection system on an FPGA by feeding multiple re-

gions of interest from an image to the proposed circuit

for human recognition. Because the processing speed

of the circuit is high, the realization of a real-time hu-

man detection system can be expected.

REFERENCES

Cristianini, N. and Shawe-Taylor, J. (2000). An introduction to support vector machines. Cambridge University Press.

Dalal, N. and Triggs, B. (2005). Histograms of oriented gra-

dients for human detection. In Proc. IEEE Computer

Vision and Pattern Recognition (CVPR), volume 1,

pages 886–893.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39:1–38.

Gavrila, D. M. and Enzweiler, M. (2008). Monocular pedes-

trian detection: Survey and experiments. In IEEE

Transactions on Pattern Analysis and Machine Intel-

ligence (TPAMI), volume 31, pages 2179–2195.

Higashi, S., Michishita, Y., Enokida, S., Shibata, M., and Yamada, H. (2018). Pedestrian detection based on gaussian mixture model multiresolution cohog. In Proc. 4th World Congress on Electrical Engineering and Computer Systems and Sciences.

Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., and

Bengio, Y. (2016). Binarized neural networks. In

Advances in Neural Information Processing Systems

(NIPS), volume 29, pages 4107–4115.

Iwata, S. and Enokida, S. (2014). Object detection based

on multiresolution cohog. In Proc. 10th International

Symposium on Visual Computing, pages 427–437.

Michishita, Y., Higashi, S., Shibata, M., Muramatsu, R., Ya-

mada, H., and Enokida, S. (2018). Autonomous state

space construction method based on mixed normal

distributions for pedestrian detection. In IEEJ Trans-

actions on Electronics, Information and Systems, vol-

ume 138, pages 1100–1107.

Nagamine, Y., Yoshihiro, K., Enokida, S., Shibata, M., Yamada, H., and Tamukoh, H. (2019). Human detection using hardware oriented gmm-mrcohog. In 35th Fuzzy System Symposium, pages 715–719.

Nakahara, H., Yonekawa, H., Fujii, T., Shimoda, M., and

Sato, S. (2019). Guinness: A gui based binarized deep

neural network framework for software programmers.

In IEICE Transactions on Information and Systems,

volume E102.D, pages 1003–1011.

Simonyan, K. and Zisserman, A. (2015). Very deep con-

volutional networks for large-scale image recognition.

In Proc. International Conference on Learning Repre-

sentations (ICLR).
