Table 4: Classification Benchmark I. Best hyperparameters for the PhysioNet-a dataset.
ITTS: n-blocks=3, bottleneck-channels=2, n-filters=12, kernel-sizes=[4, 8, 16], activation=hardswish, out-activation=sigmoid, n-embeddings=15, linear-hidden=[256], use-residual=True, min-max-scaling=False
GRUTS: rnn-hidden=55, rnn-depth=5, activation=gelu, linear-hidden=[65, 33], n-embeddings=69, min-max-scaling=False
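To reuse the Table 4 settings programmatically, they can be transcribed into typed records. This is a minimal sketch. The class names `ITTSConfig` and `GRUTSConfig` and the underscore field names are hypothetical renderings of the hyphenated keys above, not the authors' code.

```python
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class ITTSConfig:
    # Hypothetical container; fields mirror the hyphenated ITTS keys in Table 4.
    n_blocks: int
    bottleneck_channels: int
    n_filters: int
    kernel_sizes: Tuple[int, ...]
    activation: str
    out_activation: str
    n_embeddings: int
    linear_hidden: Tuple[int, ...]
    use_residual: bool
    min_max_scaling: bool


@dataclass(frozen=True)
class GRUTSConfig:
    # Hypothetical container; fields mirror the hyphenated GRUTS keys in Table 4.
    rnn_hidden: int
    rnn_depth: int
    activation: str
    linear_hidden: Tuple[int, ...]
    n_embeddings: int
    min_max_scaling: bool


# Values copied from Table 4 (PhysioNet-a).
PHYSIONET_A_ITTS = ITTSConfig(
    n_blocks=3, bottleneck_channels=2, n_filters=12, kernel_sizes=(4, 8, 16),
    activation="hardswish", out_activation="sigmoid", n_embeddings=15,
    linear_hidden=(256,), use_residual=True, min_max_scaling=False,
)
PHYSIONET_A_GRUTS = GRUTSConfig(
    rnn_hidden=55, rnn_depth=5, activation="gelu", linear_hidden=(65, 33),
    n_embeddings=69, min_max_scaling=False,
)

if __name__ == "__main__":
    # asdict() turns a record into a plain dict, e.g. for logging or kwargs.
    print(asdict(PHYSIONET_A_ITTS))
    print(asdict(PHYSIONET_A_GRUTS))
```

Tuples are used instead of lists so that the frozen records stay immutable.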
Table 5: Classification Benchmark II. Best hyperparameters for the PhysioNet-b dataset.
ITTS: n-blocks=1, bottleneck-channels=12, n-filters=16, kernel-sizes=[4, 8, 16], activation=tanh, out-activation=linear, n-embeddings=20, linear-hidden=[128], use-residual=True, min-max-scaling=False
GRUTS: rnn-hidden=235, rnn-depth=7, activation=tanh, linear-hidden=[424], n-embeddings=176, min-max-scaling=…
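The ITTS keys (bottleneck-channels, n-filters, kernel-sizes, use-residual) resemble those of an InceptionTime-style module. The tables do not confirm this, so the sketch below simply assumes that layout: a 1×1 bottleneck convolution, then one convolution per kernel size plus one max-pool branch, each emitting n-filters maps, concatenated along the channel axis. Under that assumption it shows how the Table 4 and Table 5 settings translate into module width and the weight count of the wide convolutions. The helper names are hypothetical.

```python
from typing import Sequence


def inception_out_channels(n_filters: int, kernel_sizes: Sequence[int]) -> int:
    """Channels produced by one module, assuming one conv branch per kernel
    size plus one max-pool branch, each emitting n_filters maps."""
    return n_filters * (len(kernel_sizes) + 1)


def wide_conv_weights(bottleneck_channels: int, n_filters: int,
                      kernel_sizes: Sequence[int]) -> int:
    """Weights (biases ignored) of the parallel convolutions that read the
    bottleneck output; each branch maps bottleneck_channels -> n_filters."""
    return sum(bottleneck_channels * n_filters * k for k in kernel_sizes)


# ITTS settings from Tables 4 and 5: (bottleneck-channels, n-filters, kernel-sizes)
CONFIGS = {
    "PhysioNet-a": (2, 12, [4, 8, 16]),
    "PhysioNet-b": (12, 16, [4, 8, 16]),
}

if __name__ == "__main__":
    for name, (b, f, ks) in CONFIGS.items():
        print(name, inception_out_channels(f, ks), wide_conv_weights(b, f, ks))
```

Under these assumptions each PhysioNet-a module outputs 48 channels with 672 wide-conv weights, while PhysioNet-b outputs 64 channels with 5376 weights: the wider bottleneck (12 vs 2 channels) makes each block about eight times more expensive, which is partly offset by using a single block instead of three.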
Table 6: Regression Benchmark. Best hyperparameters for the VECA dataset.
Phased-LSTM: n-units=256, use-peepholes=True, leak=0.001, period-init-max=1000.0
GRU-D: n-units=120, dropout=0.0, recurrent-dropout=0.01
IPNet: n-units=60, dropout=0.0, recurrent-dropout=0.1, imputation-stepsize=1.0, reconst-fraction=0.01
SeFT: n-phi-layers=1, phi-width=165, phi-dropout=0.0, n-psi-layers=3, psi-width=28, psi-latent-width=121, dot-prod-dim=90, n-heads=7, attn-dropout=0.1, latent-width=40, n-rho-layers=4, rho-width=24, rho-dropout=0.0, n-positional-dims=4, max-timescale=100.0
mTAND: query-steps=196, rec-hidden=94, embed-time=88, num-heads=4, freq=9, learn-emb=True, regressor-layer-size=134
ITTS: n-blocks=6, bottleneck-channels=2, n-filters=13, kernel-sizes=[4, 32, 128], activation=relu, out-activation=sigmoid, n-embeddings=1, linear-hidden=[256], use-residual=True, min-max-scaling=True
GRUTS: rnn-hidden=64, rnn-depth=5, activation=hardswish, linear-hidden=[128, 64, 32, 16], n-embeddings=1, min-max-scaling=True
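In Table 6 both ITTS and GRUTS use min-max-scaling=True, whereas the PhysioNet-a configurations in Table 4 leave it off. The sketch below illustrates per-channel min-max scaling to [0, 1]. It assumes, without confirmation from the tables, that the statistics are fit on training data only and that NaN marks an unobserved value in an irregularly sampled series. The function names are hypothetical.

```python
import math
from typing import List, Tuple

# A series is a list of time steps, each a list of channel values.
# NaN marks a missing observation (assumption for this sketch).
Series = List[List[float]]


def fit_min_max(train: Series) -> List[Tuple[float, float]]:
    """Per-channel (min, max) over observed (non-NaN) training values."""
    bounds = []
    for c in range(len(train[0])):
        observed = [row[c] for row in train if not math.isnan(row[c])]
        # A channel with no observations gets degenerate bounds (0, 0).
        bounds.append((min(observed), max(observed)) if observed else (0.0, 0.0))
    return bounds


def apply_min_max(x: Series, bounds: List[Tuple[float, float]]) -> Series:
    """Map each value to (v - lo) / (hi - lo); missing values stay NaN."""
    out = []
    for row in x:
        scaled = []
        for v, (lo, hi) in zip(row, bounds):
            if math.isnan(v):
                scaled.append(v)        # keep missing values missing
            elif hi == lo:
                scaled.append(0.0)      # constant channel: avoid division by zero
            else:
                scaled.append((v - lo) / (hi - lo))
        out.append(scaled)
    return out


if __name__ == "__main__":
    train = [[1.0, 10.0], [3.0, math.nan], [5.0, 30.0]]
    bounds = fit_min_max(train)          # [(1.0, 5.0), (10.0, 30.0)]
    print(apply_min_max(train, bounds))  # [[0.0, 0.0], [0.5, nan], [1.0, 1.0]]
```

Fitting the bounds on training data alone keeps validation and test statistics out of the preprocessing; values outside the training range then simply fall outside [0, 1].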
