Fast Detection of Jitter Artifacts in Human Motion Capture Models
Mateusz Pawłowicz, Witold Alda and Krzysztof Boryczko
AGH University of Krakow, Cracow, Poland
{mpawlowicz, alda, boryczko}@agh.edu.pl
Keywords:
Character Animation, Motion Capture, Jitter, Animation Datasets, BVH.
Abstract:
Motion capture is the standard when it comes to acquiring detailed motion data for animations. The method is
used for high-quality productions in many industries, such as filmmaking and game development. The quality
of the outcome and the time needed to achieve it are incomparable with the keyframe-based manual method.
However, the motion capture data sometimes gets corrupted, which results in animation artifacts that make it
unrealistic and unpleasant to watch. An example of such an artifact is a jitter, which can be defined as the rapid
and chaotic movement of a joint. In this work, we focus on detecting jitter in animation sequences created
using motion capture systems. To achieve that, we propose a multilevel analysis framework that consists
of two metrics: Movement Dynamics Clutter (MDC) and Movement Dynamics Clutter Spectrum Strength
(MDCSS). The former measures the dynamics of a joint, while the latter metric allows the classification of a
sequence of frames as a jitter. The framework was evaluated on popular datasets to analyze the properties of
the metrics. The results of our experiments revealed that two of the popular animation datasets, LAFAN1 and
Human3.6M, contain instances of jitter, which was not known before inspection with our method.
1 INTRODUCTION
Motion capture (MoCap) is currently the most potent
method for realistic human animations in movies and
video games. It has many advantages compared to tra-
ditional frame-by-frame creation and procedural an-
imation formulas. With motion capture techniques,
animators can quickly obtain even very complex and
unique motion, thus drastically reducing overall costs
and production time. However, motion capture also has its drawbacks. It requires expensive and complex equipment, including cameras, sensors, and sophisticated software. Moreover, getting the most out of this technology also requires cooperation with professionally trained motion capture actors.
The method itself brings about several technical
issues that may limit its effectiveness. The first well-
known problem is the need for proper calibration of
the whole system, including the configuration of cam-
eras and sensors and the correct illumination of the scene. The second common issue, especially with marker-based solutions, is the occlusion of a sensor and/or its too fast movement, which may cause the software to "lose" it. The actor also
has physical limitations, but we omit them here.
Instead, we notice that the recorded motion cap-
ture animation is often noisy and contains spikes, jit-
ter, gaps, and other errors and artifacts. Such data
must be cleaned up in a post-processing procedure
for smooth and realistic movement. Many tools can help with this clean-up; popular ones include Blender, Maya, and MotionBuilder. There are
also several denoising algorithms, starting from rel-
atively simple ones and ending with sophisticated
machine learning-based approaches (Holden, 2018).
Still, the processing of raw data is long and often
painstaking. Even though post-processing algorithms are widely used, we can easily find final animations stored in databases that have errors and produce unnatural motion. Checking the quality of a recorded animation simply by carefully observing the motion is a natural process, but it takes much time.
This paper aims to address this problem by finding
measures that can help us detect jitter in animation
sequences through automatic analysis of the dataset
itself. Our contributions are as follows:
- We propose two metrics: the Movement Dynamics Clutter (MDC) metric to detect dynamic and irregular movement of a joint, and the DFT-based Movement Dynamics Clutter Spectrum Strength (MDCSS) metric for detecting jitter in these
movements;
- We suggest an analysis method that allows the inspection of joint movement dynamics in animation datasets at various levels: dataset level, sequence level, and frame level;
- We evaluate our analysis method to verify its properties and compare popular animation datasets concerning jitter presence;
- Using our proposed metrics, we also detect instances of jitter in two widely-used animation datasets: LAFAN1 (Harvey et al., 2020) and Human3.6M (Ionescu et al., 2014).
2 RELATED WORK
2.1 Errors and Animation Data
Clean-up
The technical aspect of MoCap technology is not crucial for our paper. We assume that we get the data "as is" and do not make improvements in the data acquisition process. However, at least a general analysis of the technology used is needed, as it has an important impact on the types of errors and artifacts in the animation sequences.
We may distinguish two main approaches to
gathering motion data: marker-based and marker-
less. Optical-based motion capture (OMC) (Callejas-
Cuervo et al., 2023) is the most popular and reli-
able method in the first group. Currently, it has out-
performed others, such as inertial or magnetic mark-
ers. It uses either passive, reflective markers or active
LED markers. Despite their advantages, optical markers have certain flaws, such as occlusion, when markers are hidden from the cameras, or overly fast marker movement, both of which cause gaps and noisy animation. The occasional change of marker position (e.g., when a marker slips or detaches) adds extra distortion. It is also well known that optical marker systems are excellent at capturing the coarse movement of the entire body, while losing control of tiny details (such as fingers) and gestures, which often move unnaturally. A com-
prehensive overview of the detection and classifica-
tion of errors in optical MoCap systems can be found
in (Skurowski and Pawlyta, 2022).
In the second group, the markerless approach, we
have depth-sensing cameras that can capture motion
without physical markers, thus being convenient and
comfortable for the actors. However, this method
needs a clean-up process to improve the quality of
motion. It can be based on smoothing and denois-
ing algorithms, beneficial for cheap home motion cap-
ture systems with, say, a single RGBD camera (Hoxey
and Stephenson, 2018), where smoothing is achieved
in two steps: by getting rid of positions that differ
by more than 5 percent from the average and sub-
sequently using the Kalman filter. Similar solutions
based on moving average, B-spline smoothing, and
Kalman filter are presented in (Ardestani and Yan,
2022).
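To make the general idea concrete, the sketch below illustrates such a two-step clean-up. This is our own illustration under stated assumptions, not the cited implementation; the moving-average window, the per-coordinate treatment, and the simple one-dimensional Kalman model are all ours.

```python
import numpy as np

def reject_outliers(track, window=15, threshold=0.05):
    """Replace samples that deviate by more than `threshold` (5%) from the
    local moving average with that average. `track` has shape (frames, 3)."""
    cleaned = track.copy()
    for t in range(len(track)):
        lo, hi = max(0, t - window), min(len(track), t + window + 1)
        avg = track[lo:hi].mean(axis=0)
        deviation = np.abs(track[t] - avg) / (np.abs(avg) + 1e-8)
        cleaned[t] = np.where(deviation > threshold, avg, track[t])
    return cleaned

def kalman_smooth(signal, process_var=1e-4, measurement_var=1e-2):
    """Minimal constant-value Kalman filter applied to a 1-D signal."""
    estimate, variance = signal[0], 1.0
    out = np.empty_like(signal)
    for t, z in enumerate(signal):
        variance += process_var                         # predict step
        gain = variance / (variance + measurement_var)  # Kalman gain
        estimate += gain * (z - estimate)               # update with measurement z
        variance *= (1.0 - gain)
        out[t] = estimate
    return out

# usage: cleaned = reject_outliers(raw_track)
#        smooth = np.stack([kalman_smooth(cleaned[:, c]) for c in range(3)], axis=1)
```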
2.2 Animation Dataset Analysis
Animation data collected by motion capture systems has a spatiotemporal nature. It consists of poses that sample the continuous movement performed by an actor at discrete frames (timesteps). Frames can be described in various file formats, one of which is the BVH format (Meredith and Maddock, 2001). It represents a hierarchy of joints, as well as animation details such as framerate (FPS) and number of frames, followed by the global position of the skeleton root and the local rotations of all joints in each frame. This format is one of the most popular, as it has a public specification and describes human movements well.
Such animation created from motion capture data
has many applications in the modern world. It appears
in movies, computer games (Geng and Yu, 2003) and
in the entertainment industry (Bregler, 2007). Less
recognizable applications are analyses for automatic
recognition and classification of the type of human
movement (Kadu and Kuo, 2014; Ijjina and Mohan,
2014; Patrona et al., 2018). This technology is also
used in industry (Menolotto et al., 2020), where it is
most often a component of real-time systems. Re-
gardless of the application, the quality of the final ef-
fect largely depends on the quality of the data that
constitute information about the movement sequence.
Hence, a preliminary analysis is often carried out to
detect potential errors and imperfections.
Even with the best motion capture data, some post-processing is usually performed. The role of data denoising is of increasing importance because the quality of animation is something that the human eye can verify instantly. It is always a non-trivial task; however, the process itself is becoming more and more automatic, and several elaborate algorithms exist. In (Liu et al., 2014), the authors present a sophisticated, yet classical, approximation algorithm. Holden (Holden, 2018) uses an innovative neural-network-based method that maps the raw positions of markers captured by a motion capture system to the positions and rotations of an animated skeleton.
Some well-known utility programs that pro-
cess animation data include Matlab MoCap Tool-
box (Burger and Toiviainen, 2013) and RMo-
Cap (Hachaj and Ogiela, 2020). The first is a set of
Matlab functions for visualization and statistical anal-
ysis (e.g., calculation of mean, standard deviation) of
various metrics in motion capture data (e.g., velocity,
acceleration). The toolbox is still being maintained
and developed. It also allows us to perform Princi-
pal Component Analysis (PCA) on the animation se-
quence to derive complexity-related movement fea-
tures. The second solution uses the R programming
language package with a similar purpose, i.e., to vi-
sualize, analyze, and perform statistical processing. It
allows motion correction to reduce foot skating and
motion averaging to remove random errors, provided
that a motion has been recorded multiple times. The
solution also includes utility for conversions between
hierarchical and direct kinematic models.
Other work focused on comparing data from mo-
tion capture for particular purposes. Four metrics
were proposed to distinguish the motion of the hand
of a subject who suffered cerebral palsy from the reg-
ular movement of the hand (Montes et al., 2014).
Those metrics are logarithmic dimensionless jerk,
mean arrest period ratio, peaks metric, and spectral
arc length, among which the latter achieved the best
results. Another piece of research compared various
motion capture systems by collecting treadmill walk-
ing animations (Manns et al., 2016). The authors per-
formed the analysis using PCA and calculated Shan-
non entropy to compare four different systems: three
marker-based and one markerless. The analysis was consistent with the observation that the markerless solution gives a lower quality of captured animation than the other solutions.
Recently, a new framework for analysis of lo-
comotive datasets has been proposed that could be
used, e.g., for the motion matching approach (Abiche-
quer Sangalli et al., 2022). It focuses on the cover-
age of linear and angular speeds of animated charac-
ters, frame usage frequency, planned locomotion path,
used and unused animations, and transition cost and count. The solution also incorporates visualization of
linear and angular speed coverage across the anima-
tion dataset. It allows for identifying types of transi-
tions in motion matching that may be difficult to per-
form due to the lack of animation sequences with cor-
responding speeds.
2.3 Anomaly Detection in Time Series
Jitter detection in animation can also be viewed as
an anomaly detection problem applied to time se-
ries. Among the commonly used methods for this
purpose are PCA, Savitzky-Golay, and Kalman filters
and recurrent neural networks, such as those based on
LSTM and GRU layers. As qualitative artifact de-
tection in animation is a rarely addressed topic, we
would like to briefly present approaches used in mul-
tivariate time series anomaly detection problems re-
lated to other fields. Many learning-based approaches
suffer from inaccuracies at an early stage due to ini-
tialization time. An outlier-resistant sampling can be
used in conjunction with domain-specific clustering
algorithms to mitigate such an issue in online ser-
vices systems (Ma et al., 2021). Recent literature also
uses a weighted hybrid algorithm that combines mul-
tiple methods (e.g., moving median, Kalman filter,
Savitzky-Golay filter) for anomaly detection in long-
term cloud resource usage planning (Nawrocki and
Sus, 2022). Moreover, deep ensemble models were
successfully applied in intrusion monitoring applica-
tions and fraud detection (Iqbal et al., 2024).
3 DATASET ANALYSIS METHOD
3.1 Definition of Movement Dynamics
Clutter (MDC) Metric
Animation sequences based on data collected by mo-
tion capture systems allow modeling movement in
great detail. However, this data can also contain some
artifacts that impact the realism of the actions cap-
tured by the actor. An example of such errors is jit-
ter, which can be described as a rapid and chaotic
movement of a joint. The magnitude of the noise in-
troduced by this phenomenon can vary from barely
noticeable to very disruptive. As a result, on many
occasions, rotations of joints change rapidly, and the
skeleton’s bones twist unnaturally. Although there are
some automatic ways to mitigate this problem (e.g.,
applying Savitzky-Golay smoothing), it is often the
case that animators need to perform a manual review
or statistical analysis to take care of the issue.
To address this inconvenience, we decided to de-
sign a metric that can be used to detect jitter and com-
pare animation sequences and datasets in terms of the
presence of jitter. In order to achieve that, this metric
also needs to consider various technical aspects re-
lated to this data. The spatiotemporal data is multi-
dimensional, which makes it hard to analyze quickly
without any aggregation. The structure of a skeleton
can also be different between datasets, which trans-
lates to different numbers of joints and their hier-
archy. Two animation sequences from two datasets
could vary in terms of recording framerate.
Figure 1: A visualization of the MDC metric calculation method. Positions of joint $j$ in the 3 consecutive frames are indicated as $p_{j,t}$, $p_{j,t+1}$ and $p_{j,t+2}$. These positions are used to derive velocities $v_{j,t}$ and $v_{j,t+1}$ to calculate the length of $dv_{j,t:t+1}$ and the angle $\alpha_{j,t}$, which are core components of the suggested metric.
Finally, the scale of the skeleton can differ between datasets, as most animation formats define poses simply using abstract floats. All these problems must be considered when designing a metric that allows us to compare any two animation sequences.
Therefore, we suggest a Movement Dynamics
Clutter (MDC) metric that considers all the mentioned
technical aspects. The definition of calculation for a
single joint can be seen in Equation 1. We identified
that the core nature of the jitter is chaotic changes in
both the velocity of a joint ($v_{j,t}$, $v_{j,t+1}$) and the angle of these velocities ($\alpha_{j,t}$) between consecutive frames $t$ and $t+1$ (illustrated in Figure 1). The angle is defined
in radians and squared to smooth the metric in case of
minor changes and amplify significant changes. This
information can be derived from most animation for-
mats, as all that is needed are the global positions of
all the joints. We also normalize the metric by mul-
tiplying by FPS (F) and dividing by the sum of the
lengths of all the bones of the skeleton (S). These two
operations allow us to standardize the calculated val-
ues concerning the time and space dimensions of the
datasets. Normalization of space dimensions could
also be achieved by calculating the height of the skele-
ton instead of the sum of the bone lengths. However,
this proved to be cumbersome without manual work,
as the rest of the poses of skeletons in MoCap datasets
are not always T-poses or A-poses, making it hard to
calculate the distance from feet to head.
$MDC_{j,t} = \frac{F}{S}\,\alpha_{j,t}^{2}\,\lVert dv_{j,t:t+1}\rVert \quad (1)$
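For concreteness, a minimal NumPy sketch of Equation 1 is given below. The positions are assumed to be a (frames, joints, 3) array of global joint positions, and the finite-difference velocity estimate scaled by the framerate is our reading of the definition, not something spelled out in the text.

```python
import numpy as np

def mdc_per_joint(positions, fps, skeleton_length, eps=1e-8):
    """MDC_{j,t} for every joint j and frame t (Equation 1).

    positions:       (T, J, 3) global joint positions.
    fps:             framerate F used for time normalization.
    skeleton_length: sum of all bone lengths S used for space normalization.
    Returns a (T-2, J) array (two velocity vectors are needed per value).
    """
    v = fps * np.diff(positions, axis=0)          # velocity estimates v_{j,t}, shape (T-1, J, 3)
    dv = np.diff(v, axis=0)                       # dv_{j,t:t+1} = v_{j,t+1} - v_{j,t}
    v0, v1 = v[:-1], v[1:]
    cos_a = (v0 * v1).sum(-1) / (np.linalg.norm(v0, axis=-1)
                                 * np.linalg.norm(v1, axis=-1) + eps)
    alpha = np.arccos(np.clip(cos_a, -1.0, 1.0))  # angle alpha_{j,t} in radians
    return (fps / skeleton_length) * alpha ** 2 * np.linalg.norm(dv, axis=-1)
```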
The metric can also be generalized for the calcula-
tion at the frame level by simply aggregating the val-
ues from all joints using the maximum (Equation 2);
as for a frame, the crucial information is whether there
is jitter in any of the joints. Although this loses some
detail, it allows for easy, high-dimensional data anal-
ysis. Moreover, the metric can also be averaged over
consecutive frames W (a window) to further aggregate
the MDC metric (Equation 3) and, e.g., apply it to the
whole dataset.
$MDC_{t} = \max_{j \in J} MDC_{j,t} \quad (2)$

$MDC_{W} = \frac{1}{|W|} \sum_{t=1}^{|W|} MDC_{t} \quad (3)$
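Continuing the sketch above, the aggregations of Equations 2 and 3 reduce to a maximum over joints and a mean over a window of frames:

```python
import numpy as np

def mdc_per_frame(mdc_joint):
    """MDC_t = max over all joints j (Equation 2); mdc_joint has shape (T, J)."""
    return mdc_joint.max(axis=1)

def mdc_window_average(mdc_frame, start, length):
    """Mean MDC over a window of consecutive frames (Equation 3)."""
    return mdc_frame[start:start + length].mean()
```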
3.2 Movement Dynamics Clutter
Spectrum Strength (MDCSS)
Metric
While the proposed MDC metric can detect chaotic
movement successfully, it does not distinguish it from
a single rapid joint movement. An example of such a
rapid movement might be a dynamic stomp when a
character's foot bounces off the floor. As we sample continuous movement, the velocity vectors in consecutive frames could form an angle of almost 180°. According to Equation 1, this results
in a very high metric value for the joint and, conse-
quently, for the entire frame. Our analysis indicated
this problematic scenario when comparing the regular
animation sequence from the LAFAN1 dataset with
the jittery sequence from Contemporary Dance (Aris-
tidou et al., 2017). The dynamic stomp of the charac-
ter in the first sequence achieves an indistinguishable
value of the MDC metric from the jitter that occurs on
the left hand of the second animated character, which
is demonstrated in Figure 2. The highlighted window
refers to these examples, and as can be seen, the MDC
metric has an even higher value for stomp than for jit-
ter.
To address this shortcoming, we decided to treat
our metric as a signal and look for periodicity in it.
We achieve this by calculating the discrete Fourier
transform (DFT) using the FFT algorithm on a sliding
time window W , which consists of multiple consecu-
tive frames. For most of our experiments, we used a
window that reflected a third of a second, as it gives
a satisfying sample size for all the common FPS con-
figurations in motion capture animation sequences.
The actual values of DFT represent the energy dis-
tribution between frequencies, so this mechanism is
expected to differentiate the problematic cases. In
regular movement, we do not expect a peak of en-
ergy on any particular frequency, as our MDC metric
usually has low values. On the other hand, when jit-
ter occurs, we could expect that some frequencies will
dominate in terms of energy value, since jitter causes
regular spikes in the MDC metric.
Therefore, as our second metric for jitter analysis,
we use a maximum of real DFT components, omitting
the base frequency component (since it is just a sum
of the values of the signal samples). We call that met-
ric Movement Dynamics Clutter Spectrum Strength
(MDCSS).
Figure 2: Comparison of MDC (blue) and MDCSS (green) metrics on a dynamic stomp in a LAFAN1 example (aiming1_subject1) and a short jitter at the left hand in Contemporary Dance (Sophie Afraid-01). The animation of these examples is visualized on the right using joint colors. The gray frames precede the event, while the colored frames are related to the event. For LAFAN1 (30 FPS), the visualization has a step of 1 frame. For Contemporary Dance, the preceding frames are 12 frames apart, while the event frames are two frames apart. The trajectories of the joints are also visualized using colored lines (green: preceding trajectory, red: following trajectory), with points indicating joint locations in subsequent frames.
Assuming that the sequence of values of our MDC metric in window $W$ is defined as $MDC_W$ (according to Equation 4), the MDCSS metric can be expressed as in Equation 5.

$MDC_{W} = \{MDC_{t}\}_{t=0,1,\ldots,|W|-1} \quad (4)$

$MDCSS_{W} = \max_{k=1}^{|W|-1} \left\{ \operatorname{Re}\left( \sum_{t=0}^{|W|-1} MDC_{t}\, e^{-i 2\pi k t / |W|} \right) \right\} \quad (5)$
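A minimal sketch of Equation 5 using NumPy's FFT is shown below; the sliding-window wrapper is our own convenience addition.

```python
import numpy as np

def mdcss(mdc_window):
    """MDCSS_W (Equation 5): maximum real DFT component over
    non-zero frequencies of a 1-D array of per-frame MDC values."""
    spectrum = np.fft.fft(mdc_window)
    return spectrum.real[1:].max()     # skip k = 0, the base frequency component

def mdcss_sliding(mdc_frames, window_frames):
    """MDCSS on every sliding window of `window_frames` consecutive frames,
    e.g. window_frames = fps // 3 for a third-of-a-second window."""
    return np.array([mdcss(mdc_frames[t:t + window_frames])
                     for t in range(len(mdc_frames) - window_frames + 1)])
```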
With the MDCSS metric defined in this manner, we empirically assigned metric thresholds so that the characteristics of the movement in the window can be categorized. A window with value $MDCSS_W < 8$ can be considered a window with regular movement, without any significant jitter artifacts or extraordinary dynamics. Beyond that value, we observed some instances of the artifact as well as many rapid movements performed by the character. A window with $MDCSS_W \in [8, 20]$ is at the warning level. Such a fragment of an animation is likely to contain some dynamic move or, on some occasions, could contain some less chaotic instances of jitter. It is advised to inspect such parts of an animation sequence. Lastly, any window with $MDCSS_W > 20$ should be treated as one with erroneous data with jitter. This threshold was determined by inspecting the most dynamic joint moves in the tested datasets, which were not jitter.
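With these empirically chosen cut-offs, classifying a window becomes a simple lookup, as in the sketch below:

```python
def classify_window(mdcss_value, warning=8.0, error=20.0):
    """Map an MDCSS value to the empirically chosen categories."""
    if mdcss_value > error:
        return "error"      # almost certainly jitter
    if mdcss_value >= warning:
        return "warning"    # dynamic move or mild jitter; inspect manually
    return "normal"         # regular movement
```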
3.3 Dataset Analysis Methodology
By using both metrics in conjunction, we can ana-
lyze animation data at various levels of detail: dataset
level, animation sequence level, and frame level. As
the metrics are based solely on the global positions of
joints, they can be applied to most formats of anima-
tion data.
At the dataset level, the MDC and MDCSS met-
rics can be averaged with respect to the total number
of frames. This statistic can be used to quantify the
jitter and unpredictability of the dataset. Properly cal-
culated values can then be used to compare different
datasets, independently of the joint hierarchy, fram-
erate, or scale used for the skeleton representing the
animated character.
The metrics can also be used at the sequence level, i.e., by averaging them over each animation sequence. This method allows pinpointing a sequence that is an outlier when it comes to quality. Such problematic
animation could be, e.g., re-recorded while avoiding
the issue that caused the jitter or fixed in the post-
processing stage. Moreover, this approach can be
used to compare jitter in animation sequences from
different datasets using a single number.
In some cases, excluding entire animation se-
quences would result in having a small dataset. Here,
frame-level analysis of the MDCSS metric of anima-
tion sequences can be used. The metric calculated
on a sliding window makes distinguishing parts with
jitter from those without it possible. The jittery parts
(error level) could then be filtered out from the dataset
instead of removing the whole sequence, if such a
solution fits the purpose of use. On the other hand,
the parts at a warning level with dynamic movement
should be inspected, provided that they are detected
in an unexpected fragment of animation (e.g., a non-
dynamic one). This warning may indicate a slightly
noticeable jitter there. Furthermore, frame-level anal-
ysis can also be used to compare two animation se-
quences from two different datasets in greater detail
than using a single number (as in Figure 2).
Finally, the MDC metric allows us to perform the
jitter inspection at the joint level in a single frame. If a
given window has a high value of the MDCSS metric,
we can inspect the MDC metric values in that window
and find the frames with values that are outliers. In
these frames, we can inspect the distribution of MDC
values across the skeleton’s joints. It allows for the se-
lection of the subset of joints that cause a high metric
value and are likely to be jittery. This level of analysis
helps when analyzing warning levels of the MDCSS
metric. It also allowed us to detect the subsampling
problem in the case of a stomp in one of the anima-
tion sequences in the LAFAN1 dataset (Figure 2).
4 EXPERIMENTS AND
DISCUSSION
4.1 Popular Evaluated Datasets
For the evaluation of the suggested metrics, we de-
cided to use several popular datasets. Their quantita-
tive details are presented in Table 1.
The LAFAN1 dataset (Harvey et al., 2020) consists of
animation sequences captured by five actors in a
production-grade motion capture studio in collabora-
tion with Ubisoft. The subjects perform various ac-
tions, such as moving and aiming weapons, walk-
ing, running, fighting, or navigating obstacles. The
authors have published the dataset in the BVH file
format. It has been used in various research publi-
cations, most commonly to test the performance of
neural networks used for motion in-betweening (Qin
et al., 2022; Ren et al., 2023; Oreshkin et al., 2024).
Human3.6M (Ionescu et al., 2014) is a dataset
dedicated to the estimation of human poses, but it also
contains animation data captured from motion capture
to match these poses. The shared animation part of
the dataset was captured at 50 FPS by seven actors
performing different actions (e.g., walking, phoning,
eating). The data is available as CDF (Common Data
Format) files in various parameterization formats. We
exported the animation in BVH format based on pose
data with angles and subsampled it to 25 FPS, as this
setup was also used to benchmark results for machine
learning models in research (Harvey et al., 2020).
Two more datasets were collected by Bandai
Namco Research (Kobayashi et al., 2023) that con-
tain short animation sequences performed in vari-
ous styles. The first dataset consists of 17 activities
(e.g., fighting, dancing, waving hands) recorded in 15
styles. The other has 10 action types (mostly loco-
motion and hand actions) performed in 7 styles but in
much larger quantities. Both datasets were published
in the BVH animation format.
Contemporary Dance (Aristidou et al., 2017) is
another dataset that contains dances performed by
nine actors in various moods (e.g., bored, afraid, an-
gry, or relaxed). The dataset is also a part of a much
larger AMASS dataset (Mahmood et al., 2019). It was
shared in various formats (BVH, FBX, C3D) to increase accessibility, but the BVH files contain some very noticeable instances of jitter. We used this animation format precisely because its animation quality is much worse than that of the other datasets.
4.2 Experiments Setup
All the metric evaluation experiments were performed
on a machine with AMD Ryzen 9 5950X, 64 GB
DDR4 RAM, and RTX 3080 TI. Due to large vol-
umes of data, we calculated metrics using PyTorch
with GPU acceleration. We used Python 3.10.14 for
experiments, along with PyTorch 2.4.0 and NumPy
1.26.4. We derived the positions of joints based on the
BVH format of animation using forward kinematics.
All animation and motion path visualizations were
created using Blender 4.2.2. We used a time window of 1/3 s when calculating the MDCSS metric in the experiments and the same thresholds as defined in Section 3.2.
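As a rough illustration of the forward-kinematics step (BVH parsing and the conversion of Euler angles to rotation matrices are omitted, and the data layout is an assumption of ours), global joint positions can be accumulated down the joint hierarchy as follows:

```python
import numpy as np

def forward_kinematics(parents, offsets, local_rot, root_pos):
    """Global joint positions for a single frame.

    parents:   length-J list of parent indices, -1 for the root,
               ordered so that every parent precedes its children.
    offsets:   (J, 3) local bone offsets from the BVH HIERARCHY section.
    local_rot: (J, 3, 3) local rotation matrices built from the frame's Euler angles.
    root_pos:  (3,) global root translation for the frame.
    """
    joints = len(parents)
    global_rot = np.empty((joints, 3, 3))
    global_pos = np.empty((joints, 3))
    for j in range(joints):
        p = parents[j]
        if p < 0:
            global_rot[j] = local_rot[j]
            global_pos[j] = root_pos
        else:
            global_rot[j] = global_rot[p] @ local_rot[j]
            global_pos[j] = global_pos[p] + global_rot[p] @ offsets[j]
    return global_pos
```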
4.3 Comparison of Popular Datasets
We benchmark our metrics by evaluating them on all
the datasets mentioned in Section 4.1. We perform
dataset-level analysis to calculate the average metric values for these datasets and count the number of errors and warnings detected. We also count error and warning windows, defined as chains of consecutive frames with the same classification. These window counts can be used to estimate the number of instances of jitter or dynamic movement.
The experiment outcome was collected in Table 2
and Table 3. As expected, Contemporary Dance has
Table 1: Comparison of datasets’ properties used for the experiments.
Dataset Joints FPS Animation sequences Total frames Total time [s]
LAFAN1 22 30 77 496,672 16,556
Human3.6M 25 25 210 263,743 10,550
Bandai Namco Dataset 1 22 30 175 36,665 1,222
Bandai Namco Dataset 2 22 30 2,902 384,931 12,831
Contemporary Dance 31 120 133 989,150 8,242
the highest metrics scores due to multiple jitter in-
stances. The average value of the MDCSS metric
strongly deviates from the other datasets and suggests
that every frame in that dataset is at a warning level.
The large numbers of error and warning frames and windows further confirm this. Bandai Namco Dataset 2 achieves the best results, as most animations there do not contain much character movement and are not as dynamic as some sequences in other datasets. Neither Bandai Namco dataset contains errors according to our MDCSS metric.
Table 2: Average metrics values per frame for tested
datasets.
Dataset MDC MDCSS
Contemporary 0.408 8.109
LAFAN1 0.113 0.462
Bandai 1 0.106 0.431
Bandai 2 0.084 0.409
Human3.6M 0.099 0.396
Table 3: Number of error and warning frames (f) and win-
dows (w) in the compared datasets.
Dataset Errors (f/w) Warnings (f/w)
Contemporary 85372/7543 77892/17563
LAFAN1 16/6 206/140
Bandai 1 0/0 45/43
Bandai 2 0/0 469/458
Human3.6M 21/17 355/205
All the datasets, except for Contemporary Dance,
achieve relatively similar average values for our met-
rics. However, in both LAFAN1 and Human3.6M,
our metrics detected errors. We conducted further in-
vestigation using frame-level analysis and found that
both datasets contain short instances of jitter on these
error windows. Human3.6M contains very short errors in many animation sequences (e.g., in the sequence "Sitting Down" performed by subject 5), which look like short occlusions. Artifacts are most commonly found in the hands. In the case of LAFAN1, all the errors are present in a single animation sequence, "Obstacles 5", performed by subject 3 (obstacles5_subject3). The sequence contains dynamic
maneuvers, such as running, jumping, and tripping
over. In some instances, it looks like the motion cap-
ture system mismatched the position of all the joints,
resulting in a visible discontinuity and jumps in the
motion path of the character. On other occasions, it
again looks like the reason is occlusion. Two warn-
ings are also detected in this sequence: a barely no-
ticeable short foot jitter and a dynamic landing from
a jump.
We decided to manually remove the instances of
jitter at error levels from the LAFAN1 example by
applying interpolation between keyframes to compare
the metrics’ values with and without artifacts in the
animation. While the manual adjustment of the posi-
tions of character joints seemed to give a more real-
istic result, we would like to disclose that we are not
professional animators, and the actual realism of the
fixed animation could be further improved. The main
objective of the adjustment was to remove the jitter
from the sequence, which was achieved. The output
of the frame-level analysis of this animation clip is
presented in Figure 3. We can see that the application
of changes causes our MDCSS metric to return to the
normal range of values. This reinforces the usabil-
ity of our framework in detecting instances of jitter in
animation sequences.
4.4 Detecting Artificial Jitter in
LAFAN1 Dataset
To evaluate our MDCSS metric comparatively, we de-
cided to benchmark it on the detection of artificial
jitter added to a subset of the LAFAN1 dataset. We
are unaware of any other research that focused on de-
tecting this type of artifact, so we chose to compare
against some commonly used methods for anomaly
detection in multidimensional time series. We se-
lected the PCA and Savitzky-Golay filter to serve as a
baseline for comparison.
The methods were benchmarked on animation se-
quences performed by subject 3, as real jitter is also
present. Artificial jitter was added to all the ani-
mation sequences except the one that contains ac-
tual instances of the artifact. This was achieved by
corrupting every 10-second window of the animation
with a 0.25-1s jittery fragment applied to a single
skeleton joint. The displacement of the joint loca-
Figure 3: Comparison of metrics before and after manually attempting to fix animation sequence obstacles5_subject3 from
LAFAN1 dataset. The yellow and red zones on the plots correspond to warning and error event intervals. The visualization
on the right shows a jitter instance around frame 4700. Yellow and red frames are one frame apart. In the original sequence,
the left leg of the character is unnaturally twisted in the yellow frame and makes a sudden correction in the red frame. The
fixed version does not have this artifact. Some preceding frames with jitter have been omitted to improve visibility in the top
example.
tion is added to all its coordinates according to the
normal distribution N (0, σ). We parameterized σ us-
ing skeleton length and tested the following values of
σ: 0.02S, 0.0175S, 0.015S, 0.0125S, 0.01S, 0.0075S
and 0.005S. This methodology contains a simplification, as displacing the joint location does not preserve bone lengths. The sequence with real jitter was not altered in any way, so as to benchmark the methods in a real-life scenario.
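The sketch below reflects our reading of this corruption procedure; the random placement of the jittery fragment within each 10-second window and the choice of one joint per window are simplifying assumptions.

```python
import numpy as np

def add_artificial_jitter(positions, fps, sigma, rng=None):
    """Corrupt every 10-second window with a 0.25-1 s jittery fragment
    on one randomly chosen joint by displacing all its coordinates
    with N(0, sigma) noise. positions: (T, J, 3); a copy is returned."""
    if rng is None:
        rng = np.random.default_rng()
    corrupted = positions.copy()
    window = int(10 * fps)
    for start in range(0, len(positions) - window + 1, window):
        joint = rng.integers(positions.shape[1])                  # single joint per window
        length = rng.integers(int(0.25 * fps), int(1.0 * fps) + 1)
        offset = rng.integers(0, window - length + 1)
        begin = start + offset
        corrupted[begin:begin + length, joint] += rng.normal(0.0, sigma, (length, 3))
    return corrupted
```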
The detection was carried out on 10-frame clips.
A jittery window was classified as detected if a 10-
frame clip intersects it, and the method labeled it as
anomalous. A normal clip was defined as one that
does not intersect any jitter window.
Our MDCSS metric has a naturally defined error
level, so we used that to classify anomalous clips.
We decided on 22 principal components while per-
forming PCA and using the quadratic reconstruction
error as the detection metric. Dimensionality reduc-
tion is performed on the joint acceleration vectors in
each frame (reduction of 660-dimensional data), and
we extract the maximum reconstruction error over all
joints and clip frames. This offers an opportunity
to pinpoint the joint that caused the jitter, similar to
what our MDC metric allows. The clip is classified
as anomalous when the reconstruction error exceeds
1.5 times the maximum seen in the training dataset.
We used the rest of the LAFAN1 dataset as training
data to fit the PCA model to regular data. Finally,
the Savitzky-Golay filter was configured to analyze
10 frames and smooth using second-degree polyno-
mials. We use the same jitter detection method as for
PCA, setting the error threshold to 5.5. All acceler-
ation vectors were normalized by skeleton scale and
framerate, similarly to Equation 1.
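For reference, a minimal sketch of the PCA baseline as described above (22 components, maximum per-joint quadratic reconstruction error, threshold at 1.5 times the training maximum); scikit-learn is used purely for illustration, and the exact flattening of the per-clip acceleration data is our assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_pca_baseline(train_clips, n_components=22):
    """train_clips: (N, 10, J, 3) normalized joint accelerations per 10-frame clip."""
    flat = train_clips.reshape(len(train_clips), -1)
    pca = PCA(n_components=n_components).fit(flat)
    recon = pca.inverse_transform(pca.transform(flat))
    err = ((flat - recon) ** 2).reshape(train_clips.shape).sum(-1)  # per joint, per frame
    return pca, 1.5 * err.max()

def clip_is_anomalous(pca, threshold, clip):
    """clip: (10, J, 3) accelerations of a single 10-frame clip."""
    flat = clip.reshape(1, -1)
    recon = pca.inverse_transform(pca.transform(flat))
    err = ((flat - recon) ** 2).reshape(clip.shape).sum(-1)
    return err.max() > threshold
```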
The results of the experiment are averaged over
10 repetitions. All the methods managed to detect
the real jitter instances with the described configura-
tions. We present a comparison of the F1 score from
the experiment in Table 4, as well as the recall and
precision in Table 5. Our metric remains quite com-
parable in terms of the F1 score and even outperforms
other methods on most values of σ, although only
slightly. Further inspection reveals that precision is
the strong side of our MDCSS metric while remaining
only slightly behind when it comes to recall. In terms
of recall, the Savitzky-Golay filter is the best for most
noise levels. We also performed frame-level analysis
to visualize the behavior of MDCSS metric on a sin-
gle example from that dataset. Figure 4 presents the
sample animation sequence for σ = 0.0125S, which
shows that spikes in our metric align with artificial
jitter windows.
Table 4: F1 score of compared methods on LAFAN1 dataset
for different values of σ in N (0, σ). Best results are high-
lighted in bold.
σ MDCSS PCA Savitzky-Golay
0.02S 0.832 0.778 0.778
0.0175S 0.852 0.824 0.823
0.015S 0.857 0.870 0.868
0.0125S 0.823 0.883 0.884
0.01S 0.683 0.723 0.714
0.0075S 0.332 0.200 0.189
0.005S 0.012 0.000 0.001
Table 5: Precision and recall of compared methods on LAFAN1 dataset for different values of σ in N (0, σ). Best results are
highlighted in bold.
Recall Precision
σ MDCSS PCA Savitzky-Golay MDCSS PCA Savitzky-Golay
0.02S 0.955 0.996 0.997 0.737 0.639 0.637
0.0175S 0.927 0.990 0.991 0.789 0.705 0.704
0.015S 0.869 0.973 0.970 0.846 0.787 0.786
0.0125S 0.757 0.884 0.884 0.904 0.882 0.884
0.01S 0.536 0.586 0.574 0.942 0.945 0.950
0.0075S 0.200 0.112 0.105 0.978 0.974 0.969
0.005S 0.006 0.000 0.0004 0.900 0.000 0.100
Figure 4: Metric values for the aiming2_subject3 example from the LAFAN1 dataset. Highlighted windows in the top chart indicate where artificial jitter was added according to the N(0, 0.0125S) distribution.
4.5 Scale Invariance
We also decided to verify the skeleton-scale invariance of the proposed metrics. To verify this property, we conducted frame-level and dataset-level analyses on the Contemporary Dance dataset. For the frame-level analysis, we used an example animation sequence called "Sophie Afraid-01". We artificially scaled the size of the skeleton as well as the root position from the source BVH files using scales x0.1 and x2.
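This check can be reproduced directly on any array of joint positions, reusing the mdc_per_joint sketch from Section 3.1; uniform scaling affects both the positions and the bone-length sum S, which cancel in Equation 1.

```python
import numpy as np

def check_scale_invariance(positions, fps, skeleton_length, scale=0.1):
    """Returns True if MDC is unchanged under uniform scaling of the skeleton."""
    base = mdc_per_joint(positions, fps, skeleton_length)
    scaled = mdc_per_joint(positions * scale, fps, skeleton_length * scale)
    return np.allclose(base, scaled)
```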
Figure 5 presents the results of the frame-level
analysis in the example for all test scales. The met-
ric plots are indistinguishable, having the same values
and warning and error windows. The dataset-level
analysis presented in Table 6 further confirms this.
For all given scales, the value of our metrics is the
same when calculated across the whole dataset.
Table 6: Average metrics values per frame on the Contem-
porary Dance dataset for different skeleton scales.
Scale MDC MDCSS
x1 (baseline) 0.408 8.109
x0.1 0.408 8.109
x2 0.408 8.109
4.6 Analysis of Different MDCSS Time
Window Lengths
To further analyze the properties of our MDCSS
metric, we decided to compare how it changes
for different lengths of time windows used. We
achieved that by conducting frame-level analysis for
two examples from LAFAN1 (aiming1_subject1 and fight1_subject2) and one example from Contemporary Dance (Sophie Afraid-01). The first example
from LAFAN1 represents regular, non-dynamic an-
imation, while the other example from that dataset
contains plenty of dynamic punches, kicks, and jump-
kicks. On the other hand, the example from Contem-
porary Dance is filled with jittery fragments of anima-
tion and is used to refer to jitter detection properties.
We analyze the following time windows: 0.1s, 0.2s,
0.25s, 0.33s, 0.5s, 0.67s, and 1s. Since we operate on
frames, we round down the number of frames in the
case of FPS indivisibility.
The MDCSS metric values for these examples
Figure 5: Comparison of metrics for Contemporary Dance example (sequence Sophie Afraid-01) for different scales of the
skeleton.
are plotted in Figure 6. Our analysis suggests that
increasing the time window also magnifies the val-
ues of the metric. The values that are magnified the
most are the ones that correspond to jittery or dy-
namic fragments. This observation makes sense be-
cause a longer time window contains more samples,
and therefore, more energy to be distributed. In the
case of a jitter, more energy might accumulate under
the same frequency, magnifying the value of the MD-
CSS metric.
4.7 Limitations
Our current definition of the animation analysis
method has some limitations. The most important
limitation of our metrics is that a jitter that does not
change the core trajectory of the joint will probably
not be detected. Although this limits the detection ca-
pabilities of our metrics, it also reduces the visibility
of such a jitter.
Our metric also achieves its best values if all of the character's joints move in the same direction. An example of such a scenario is a character in a T-pose with the root joint moving in a straight line. Some animation formats also allow for the rotation of a bone around its own axis. Such a rotation would also not be detected by our metrics, as they operate solely on the positions of joints, not on the rotations of bones.
While these shortcomings of our metrics are concern-
ing, they are unlikely to occur in most motion capture
animation sequences.
Figure 6: Comparison of MDCSS values calculated on 3
chosen animation sequences. Each sequence is represented
as a column and each row represents a different length of
time window.
5 DISCUSSION
As the results of the method comparison suggest, our
framework proves to be quite comparable with base-
lines while requiring no fitting to the dataset, in con-
trast to PCA. Moreover, the PCA method requires a
constant number of input variables, which makes it
unfit for working on multiple skeletons with various
hierarchies. The MDCSS metric also proved its use-
fulness in real cases, as we managed to detect undis-
covered jitter in the LAFAN1 dataset. This dataset
is widely used in research to evaluate neural models.
The found jitter instances are located in the training
part of the dataset and negatively influence the train-
ing process of such models, thus affecting the output
result used in the evaluation. We think that our framework could be used to perform preprocessing checks on motion capture datasets and to filter out problematic fragments in such applications. The proposed metrics account for the problematic technical aspects discussed earlier, which should make such integration relatively easy.
6 FUTURE WORK
In our subsequent work, we would like to address
some of the limitations of our work, especially the
problem of detecting jitter that does not change the
joint trajectory. We identified that this scenario is
problematic for our analysis framework. Address-
ing this issue could greatly increase the number of
detected jitter instances. One possible solution is to
monitor the rotations of the bones instead of the po-
sitions of the joints. While this approach is likely to
address this problem, it would restrict applications of
our framework, as such data is not always available
for animation sequences. Instead, we would like to
try to solve this problem by inspecting the characteris-
tics of the trajectory. We also aim to experiment with
automatic correction using smoothing algorithms and
neural models in sequences classified as errors by our
metrics.
In addition, we plan to evaluate the proposed
metrics against the performance of neural models to
find a correlation between the value of the metric
and the performance of state-of-the-art neural models.
This work will focus mainly on the warning level of
our MDCSS metric, as we hypothesize that such se-
quences contain more irregular joint movements and
could be harder to predict by neural networks. This
could potentially help us to understand the shortcom-
ings and problematic animation sequences of current
research related to machine learning applications in
frame generation.
7 CONCLUSIONS
In this work, we proposed a novel framework for jit-
ter detection in animation datasets that consists of two
metrics: MDC and MDCSS. The framework can op-
erate at multiple levels of detail and allows datasets to
be compared with different skeleton scales, numbers
of joints, and FPS. We also evaluated this framework
on several popular datasets to prove its usefulness.
Our experiments found that two of the popular ani-
mation datasets, LAFAN1 and Human3.6M, contain
instances of jitter, which was not known before. This
further emphasizes the need for jitter detection frame-
works in professional motion capture environments,
such as the one proposed in this work. The compar-
ison with commonly used anomaly detection meth-
ods also proved that our framework is well-suited for
this task requiring no adjustments or fitting to anima-
tion data. We hope that our metrics can contribute
in the future to cleaning motion capture datasets used
for machine learning purposes, as well as improving
the overall quality of animation by detecting jittery
sequences early in motion capture recordings.
ACKNOWLEDGEMENTS
The research presented in this article was partially
supported by the funds of the Polish Ministry of Sci-
ence and Higher Education, assigned to the AGH Uni-
versity of Krakow (Faculty of Computer Science).
REFERENCES
Abichequer Sangalli, V., Hoyet, L., Christie, M., and Pettré,
J. (2022). A new framework for the evaluation of
locomotive motion datasets through motion match-
ing techniques. In Proceedings of the 15th ACM
SIGGRAPH Conference on Motion, Interaction and
Games, MIG ’22, New York, NY, USA. Association
for Computing Machinery.
Ardestani, M. M. and Yan, H. (2022). Noise reduction
in human motion-captured signals for computer an-
imation based on b-spline filtering. Sensors (Basel,
Switzerland), 22(12).
Aristidou, A., Zeng, Q., Stavrakis, E., Yin, K., Cohen-Or,
D., Chrysanthou, Y., and Chen, B. (2017). Emotion
control of unstructured dance movements. In Proceed-
ings of the ACM SIGGRAPH / Eurographics Sympo-
sium on Computer Animation, SCA ’17, New York,
NY, USA. Association for Computing Machinery.
Bregler, C. (2007). Motion capture technology for enter-
tainment [in the spotlight]. IEEE Signal Processing
Magazine, 24(6):160–158.
Burger, B. and Toiviainen, P. (2013). MoCap Toolbox – A
Matlab toolbox for computational analysis of move-
ment data. In Bresin, R., editor, Proceedings of the
10th Sound and Music Computing Conference, pages
172–178, Stockholm, Sweden. KTH Royal Institute of
Technology.
Callejas-Cuervo, M., Espitia-Mora, L. A., and Vélez-
Guerrero, M. A. (2023). Review of optical and inertial
technologies for lower body motion capture. Journal
of Hunan University Natural Sciences, 50(6).
Geng, W. and Yu, G. (2003). Reuse of motion capture data
in animation: A review. In International Conference
on Computational Science and Its Applications, pages
620–629. Springer.
Hachaj, T. and Ogiela, M. (2020). Rmocap: an r language
package for processing and kinematic analyzing mo-
tion capture data. Multimedia Systems, 26.
Harvey, F. G., Yurick, M., Nowrouzezahrai, D., and Pal, C.
(2020). Robust motion in-betweening. ACM Trans.
Graph., 39(4).
Holden, D. (2018). Robust solving of optical motion cap-
ture data by denoising. ACM Transactions on Graph-
ics (TOG), 37(7):1–12.
Hoxey, T. and Stephenson, I. (2018). Smoothing noisy
skeleton data in real time. In EG 2018 - Posters. The
Eurographics Association.
Ijjina, E. P. and Mohan, C. K. (2014). Human action recog-
nition based on mocap information using convolution
neural networks. In 2014 13th International Confer-
ence on Machine Learning and Applications, pages
159–164.
Ionescu, C., Papava, D., Olaru, V., and Sminchisescu, C.
(2014). Human3.6m: Large scale datasets and pre-
dictive methods for 3d human sensing in natural envi-
ronments. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 36(7):1325–1339.
Iqbal, A., Amin, R., Alsubaei, F. S., and Alzahrani, A.
(2024). Anomaly detection in multivariate time se-
ries data using deep ensemble models. PLOS ONE,
19(6):1–25.
Kadu, H. and Kuo, C.-C. J. (2014). Automatic human mo-
cap data classification. IEEE Transactions on Multi-
media, 16(8):2191–2202.
Kobayashi, M., Liao, C.-C., Inoue, K., Yojima, S., and
Takahashi, M. (2023). Motion capture dataset for
practical use of ai-based motion editing and styliza-
tion.
Liu, X., Ming Cheung, Y., Peng, S.-J., Cui, Z., Zhong,
B., and Du, J.-X. (2014). Automatic motion capture
data denoising via filtered subspace clustering and low
rank matrix approximation. Signal Process., 105:350–
362.
Ma, M., Zhang, S., Chen, J., Xu, J., Li, H., Lin, Y., Nie,
X., Zhou, B., Wang, Y., and Pei, D. (2021). Jump-
Starting multivariate time series anomaly detection for
online service systems. In 2021 USENIX Annual Tech-
nical Conference (USENIX ATC 21), pages 413–426.
USENIX Association.
Mahmood, N., Ghorbani, N., Troje, N. F., Pons-Moll, G.,
and Black, M. J. (2019). AMASS: Archive of motion
capture as surface shapes. In International Conference
on Computer Vision, pages 5442–5451.
Manns, M., Otto, M., and Mauer, M. (2016). Measuring
motion capture data quality for data driven human mo-
tion synthesis. Procedia CIRP, 41:945–950. Research
and Innovation in Manufacturing: Key Enabling Tech-
nologies for the Factories of the Future - Proceedings
of the 48th CIRP Conference on Manufacturing Sys-
tems.
Menolotto, M., Komaris, D.-S., Tedesco, S., O’Flynn, B.,
and Walsh, M. (2020). Motion capture technology in
industrial applications: A systematic review. Sensors,
20(19):5687.
Meredith, M. and Maddock, S. C. (2001). Motion capture
file formats explained. Department of Computer Sci-
ence, University of Sheffield.
Montes, V. R., Quijano, Y., Chong Quero, J. E., Ayala,
D. V., and Pérez Moreno, J. C. (2014). Comparison
of 4 different smoothness metrics for the quantitative
assessment of movement’s quality in the upper limb
of subjects with cerebral palsy. In 2014 Pan American
Health Care Exchanges (PAHCE), pages 1–6.
Nawrocki, P. and Sus, W. (2022). Anomaly detection in the
context of long-term cloud resource usage planning.
Knowl. Inf. Syst., 64(10):2689–2711.
Oreshkin, B. N., Valkanas, A., Harvey, F. G., Ménard, L.-
S., Bocquelet, F., and Coates, M. J. (2024). Mo-
tion in-betweening via deep δ-interpolator. IEEE
Transactions on Visualization and Computer Graph-
ics, 30(8):5693–5704.
Patrona, F., Chatzitofis, A., Zarpalas, D., and Daras, P.
(2018). Motion analysis: Action detection, recogni-
tion and evaluation based on motion capture data. Pat-
tern Recognition, 76:612–622.
Qin, J., Zheng, Y., and Zhou, K. (2022). Motion in-
betweening via two-stage transformers. ACM Trans.
Graph., 41(6).
Ren, T., Yu, J., Guo, S., Ma, Y., Ouyang, Y., Zeng, Z.,
Zhang, Y., and Qin, Y. (2023). Diverse motion in-
betweening from sparse keyframes with dual posture
stitching. IEEE Transactions on Visualization & Com-
puter Graphics, (01):1–12.
Skurowski, P. and Pawlyta, M. (2022). Detection and classi-
fication of artifact distortions in optical motion capture
sequences. Sensors, 22(11).