Estimating Uncertainty in Time-difference and Doppler Estimates

Gabrielle Flood, Anders Heyden and Kalle

Astr

Centre for Mathematical Sciences, Lund University, Lund, Sweden

Keywords:

Time-difference of Arrival, Sub-sample Methods, Doppler Effect, Uncertainty Measure.

Abstract:

Sound and radio can be used to estimate the distance between a transmitter and a sender by correlating the

emitted and received signal. Alternatively by correlating two received signals it is possible to estimate dis-

tance difference. Such methods can be divided into methods that are robust to noise and reverberation, but

give limited precision and sub-sample reﬁnements that are sensitive to noise, but give higher precision when

initialized close to the real translation. In this paper we develop stochastic models that can explain the limits

in the precision of such sub-sample time-difference estimates. Using such models we provide new methods

for precise estimates of time-differences as well as Doppler effects. The method is veriﬁed on both synthetic

and real data.

1 INTRODUCTION

Audio and radio sensors are becoming ubiquitous

among our everyday tools, e.g. smartphones, laptops,

and tablet pc’s. They also form the backbone of

internet-of-things, e.g. small low-power units that can

run for years on batteries or use energy harvesting to

run for extended periods. Given the location of each

one of these units, it is possible to use them as an ad-

hoc acoustic sensor network. Such sensor networks

can be used for many interesting applications. One

use-case is localization, cf. (Brandstein et al., 1997;

Cirillo et al., 2008; Cobos et al., 2011; Do et al.,

2007). Another application is to improve the sound

quality using so called beam-forming, (Anguera et al.,

2007). A third application is so called speaker di-

arization, i.e. to determine who spoke when, (Anguera

et al., 2012). If the sensor positions are unknown or

only known to a certain accuracy, the results of such

applications are inferior as is shown in (Plinge et al.,

2016). However, even without any prior information

it is possible to estimate both sender and receiver po-

sitions up to a choice of coordinate system, (Polle-

feys and Nister, 2008; Crocco et al., 2012; Kuang

et al., 2013; Kuang and

Astr

om, 2013a; Zhayida et al.,

2014), thus providing accurate sensor positions. A

key component for all of these methods is the pro-

cess of obtaining features such as time-difference es-

timates from pairwise channels. In this paper we will

primarily focus on sound. However the same princi-

ples are applicable for radio (Batstone et al., 2016).

Sound source

Microphones

Figure 1: Precise time-difference of arrival estimation can

be used for many purposes, e.g. beam-forming, diarization,

anchor free node calibration and positioning. The ﬁgure

illustrates its use for anchor free node calibration, sound

source movement and room reconstruction.

All of these applications depend on accurate ways

of comparing the sound (or radio) signals so as to

extract information. The most common approach is

to estimate time-differences, which are then used for

subsequent processing. For the applications it is of

interest to obtain the most precise estimate possible.

In (Zhayida and

Astr

om, 2016) sub-sample methods

were used to improve on the time-difference estimates

and it is empirically shown to give better estimates of

the receiver-sender conﬁgurations. However, no anal-

ysis of the sub-sample time-difference uncertainties

was provided.

Flood, G., Heyden, A. and Åström, K.

Estimating Uncertainty in Time-difference and Doppler Estimates.

DOI: 10.5220/0006722002450253

In Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2018), pages 245-253

ISBN: 978-989-758-276-9

245

Mirrored microphones

Figure 2: The ﬁgure exampliﬁes one usage of precise time-

difference of arrival estimation. The image illustrates the

estimated microphone positions (green dots), estimated mir-

rored microphone positions (red dots) and sound source

motion (green curve) from Fig 1. The estimated reﬂective

planes are also shown in the ﬁgure. These three planes cor-

respond to the ﬂoor, the ceiling and the wall.

The main contributions of this paper are:

• A framework for estimating time-difference es-

timates and for estimating the precision of such

time-difference estimates.

• An extension of the method to also estimate

minute Doppler effects.

• In practice there is also signiﬁcant difference in

amplitude of the signals. Accurate results require

that also these amplitude changes are estimated

and accounted for.

• A synthetic evaluation that demonstrates both the

validity of the models, but also provides knowl-

edge on the failure modes of the method.

• Evaluation on real data that demonstrate that

there is useful information not only in the time-

difference estimates but also in the amplitude

changes and in the minute Doppler effects, even

for speeds as small as 0.1 m/s.

2 MODELING PARADIGM

2.1 Measurement and Error Model

In this paper, we study discretely sampled signals,

such as audio signals. We assume that signals have

been sampled at a ﬁxed and known sampling rate. A

reasonable measurement model is that the measured

signal y is the result of ideal sampling and added

noise, i.e.

y(k) = Y (k) +e(k) . (1)

Here Y : R 7→ R is the original, continuous signal and

e is a discrete stationary stochastic process.

Let B denote the set of continuous functions

Y : R → R that are also square integrable and such

that the Fourier transform is zero outside the inter-

val [−π,π]. Let ` denote the set of discrete func-

tions y : Z → R that are also square integrable. For

such functions we introduce the discretization opera-

tor D : B → `, such that

y(i) = D(Y )(i) = Y (i).

We also introduce the interpolation operator I

` → B, such that

Y (x) = I

(y)(x) =

∞

∑

i=−∞

g(x −i)y(i).

The ideal interpolation operator I : ` → B is in-

terpolation using the normalized sinc function, with

g(x) = sinc(x). We have that ideal interpolation re-

stores the sampled function, (Shannon, 1949), i.e.

I(D(Y )) = Y .

Other interpolation methods can often be written

in a similar way. E.g. by substituting sinc by

(x) =

√

2πa

/2a

we get Gaussian interpolation.

2.2 Scale-space Smoothing and Ideal

Interpolation

The measured and interpolated signal can be

smoothed to decrease the impact of measurement

noise. This also makes it easier to capture patterns

on a coarser scale, (Lindeberg, 1994). We have used

a Gaussian kernel G

, with standard deviation a

, for

the smoothing. Later, a

will also be referred to as the

smoothing parameter.

Given a sampled signal y, the ideally interpolated

and smoothed signal can be expressed

Y (x) = (G

∗I

sinc

(y))(x) = I

∗sinc

(y)(x).

For a sufﬁciently large a

we have G

∗sinc ≈

. Thus ideal interplation followed by Gaussian

smoothing can be approximated to interpolation us-

ing the Gaussian (

Astr

om and Heyden, 1999), s.t.

Y (x) = I

∗sinc

(y)(x) ≈ (I

(y))(x). (2)

What sufﬁciently large means will be discussed later,

in Section 4.1.

Furthermore we will use that discrete w.s.s. Gaus-

sian noise interpolates to continuous w.s.s. Gaussian

noise and vice versa, as is shown in (

Astr

om and Hey-

den, 1999).

ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods

246

3 TIME-DIFFERENCE AND

DOPPLER ESTIMATION

Assume that we have two signals, W (t) and

W (t). The

signals are measured and interpolated as described in

the previous section. Also assume that the two signals

are similar, but not identical. This can e.g. arise when

a single audio signal is picked up by two different re-

ceivers. Then, one of the signals can be described by

the other and a few parameters. We describe the rela-

tion as follows

W (t) =

W (αt + h). (3)

Here, h is a translation or the time-difference of ar-

rival. If the distances from the sound source to the

two microphones are the same, h = 0. The second pa-

rameter, α, is a Doppler factor. This is interesting if

either the sound source or the microphones are mov-

ing. For a stationary setup α = 1.

When the two signals are picked up by the micro-

phones they are disturbed by Gaussian w.s.s. noise.

Thus, the received signals are better described by

V (t) = W (t) + E(t),

V (t) =

W (t) +

E(t) , (4)

where E(t) and

E(t) denotes the two independent

noise signals after interpolation.

Now, assume that the signals V and

V are given.

Also, denote z

z =







h α



. Then, the

parameters for which (3) is true can be estimated by

the z

z that minimizes the integral

F(z

z) =

(V (t) −

V (z

t + z

))

dt . (5)

3.1 Estimating Standard Deviation of

the Parameters

If z





is the true parameter and

z is the

parameter that has been estimated by (5), the estima-

tion error can be expressed as

X =

z −z

Assume, without loss of generality, that z



0 1



. The standard deviation of

z will be the same

as the standard deviation of X and the mean of those

two will only differ by z

. Thus, we can study X to

get statistical information about

Linearizing F(z

z) around the true displacement



0 1



we get

F(X) =

aX +bX + f ,

with

a = ∇

F(z

), b = ∇F(z

), f = F(z

Using (4) and (3), this gives

f =F(



0 1



) =

(V (t) −

V (1 ·t −0))

(W (t) + E(t) −(

W (t) +

E(t)))

(E −

dt =

+ 2E

E +

dt.

Straightforward calculations give

b = −2



ϕdt



with

ϕ = E

+ E

−

Furthermore,

∇

F =



φdt

tφdt

φdt



using

φ(z

z) = (

(αt + h))

−

V (t)

(αt + h) +

V (αt + h)

(αt + h).

If we let

φ =φ(z

) = (

(t) +

(t))

−(W(t) + E(t))(

(t) +

(t))

+ (

W (t) +

E(t))(

(t) +

(t))

)

+ 2

+ (

)

−E

we get

a = ∇

F(z

) =



φdt



We have that F(X ) = 1/2 ·X

aX + bX + f . To

minimize this function, one should ﬁnd the X for

which the derivative of F(X) is zero.

∇F(X) = aX + b = 0 ⇔ X = g(a, b) = −a

−1

In the calculations we will assume that a is invertible.

Now we want to ﬁnd the mean and covariance of

X. For this, Gauss’ approximation formula is used.

If we denote the expected value of a and b with µ

E[A] and µ

= E[b] respectively the expected value of

X can be approximated to

E[X] =E[g(a,b)] ≈ E[g(µ

,µ

) + (a −µ

(µ

,µ

)

+ (b −µ

(µ

,µ

)]

=g(µ

,µ

) + (E[a] −µ

(µ

,µ

)

+ (E[b] −µ

(µ

,µ

)

=g(µ

,µ

) = −µ

−1

= −E[a]

−1

E[b].

(6)

Estimating Uncertainty in Time-difference and Doppler Estimates

247

In the same manner the covariance of X is

C[X] =C[g(a,b)] ≈g

(µ

,µ

)

C[a]g

(µ

,µ

)

+ g

(µ

,µ

)

C[b]g

(µ

,µ

)

+ 2g

(µ

,µ

)

C[a,b]g

(µ

,µ

(7)

where C[a,b] denotes the cross-covariance between a

and b. For further computations g

(a,b) = −(a

−1

)

(a,b) = −a

−1

, E[a], E[b], C[b] and C[a,b] are

needed.

By computing the expected value of

ϕ] =E[E

+ E

−

]

=E[E]

+ E[E]E[

] −E[

−E[

E]E[

]

we get

E[b] =E



−2



ϕ dt

ϕdt



= −2



ϕ]dt

tE[

ϕ]dt



= −2



0dt

t ·0 dt







In the second step of the computation of E[

ϕ] we have

used that for a weakly stationary process the process

and its derivative at a certain time are uncorrelated,

and thus E[

] = E[

E]E[

], (Lindgren et al., 2013).

Hence,

E[X] = −E[a]

−1

E[b] = −E[a]

−1









Since E[b] = 0

0, we get that g

(µ

,µ

) = 0

0. Thus

the ﬁrst and the last term in (7) cancel, leaving

C[X] =g

(µ

,µ

)

C[b]g

(µ

,µ

)

=(−E[a]

−1

)

C[b](−E[a]

−1

)

=(E[a]

−1

)

C[b]E[a]

−1

(8)

To ﬁnd the expected value of a we need the ex-

pected value of

φ. This is

φ] =(

)

+ 2

] + E[(

)

] −

E[E]

−E[E]E[

] +

E] + E[

]

)

+ E[(

)

] + E[

] = (

)

In the last equality we have used that E[

] =

−E[(

)

], (Lindgren et al., 2013). Thus, the two last

terms cancel. The expected value of a is therefore

E[a] =2



φ]dt

E[t

φ]dt

E[t

φ]dt

E[t

φ]dt





(

)

(

)



Now, since the expected value of b is zero, the

covariance of b is

C[b] = (−2)





with

= E



ϕ(t

)dt

ϕ(t

)dt



= E



ϕ(t

)dt

ϕ(t

)dt



= E



ϕ(t

)dt

ϕ(t

)dt



= E



ϕ(t

)dt

ϕ(t

)dt



Note that by changing the order of the terms in C

is clear that C

= C

. Furthermore we get



ϕ(t

)dt

ϕ(t

)dt



h

(E −

E)(

)dt





(E −

E)(

)dt

i

(E(t

) −

E(t

))(

) +

))

·(E(t

) −

E(t

))(

) +

))dt

Denoting E[(E(t

) −

E(t

))(E(t

) −

E(t

))] =

E−

−t

) and assuming that E[

)

)] is

small yields



ϕ(t

)dt

ϕ(t

)dt



E[(E(t

) −

E(t

))(E(t

) −

E(t

))]

·(

)

)E[

)] + E[

)]

)

+ E[

)

)])dt

≈

E−

−t

)

)dt

)(

∗r

E−

)(t

)dt

The time t is not stochastic and the other elements in

C[b] can be computed similarly. Finally

(t)(

∗r

E−

)(t)dt

= C

(t)(

∗r

E−

)(t)dt

(t)((t

) ∗r

E−

)(t)dt

and through (8) an expression for the variance and

thus also the standard deviation of X is found.

ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods

248

-5

-4

-3

-2

-1

Figure 3: The simulated signal that was used for the experi-

mental validation. Later, noise of different levels was added

to achieve a more realistic signal.

3.2 Expanding the Model

The model (3) can easily be changed or expanded to

contain more (or fewer) parameters. One example is

the addition of an amplitude parameter γ, such that

W (t) = γ

W (αt + h). (9)

The integral (5) would then be changed accord-

ingly and one would instead optimize over z

z =







h α γ



In practice, the parameter computations will not

be harder for more parameters. However the analy-

sis carried out in the previous section does get more

complex.

4 EXPERIMENTAL VALIDATION

For validation we use both synthetic data and real

data. When we use synthetic data, the purpose is both

to demonstrate the validity of the model and the ap-

proximations, but also to explore at what signal-to-

noise ratio the approximations no longer hold. For

the real data experiments, the purpose is to show that

there is useful information in the estimated parame-

ters. For time-difference this is well-known, but for

the Doppler effects it is less known.

4.1 Synthetic Data - Validation of

Method

Initially we tested the model on simulated data. This

was done to investigate when the approximations in

the model are valid. Examples of such are the lin-

earization from Gauss’ approximation formula, e.g.

(3.1) and (7), and that ideal interpolation followed by

convolution with a Gaussian can be approximated to

Gaussian interpolation (2).

To do this we compared the theoretical standard

deviations of the parameters calculated according to

0.2 0.4 0.6 0.8

0.5

1.5

2.5

Theoretical std

Empirical std

0.2 0.4 0.6 0.8

0.5

1.5

2.5

3.5

×10

-3

Theoretical std

Empirical std

Figure 4: The plots show the standard deviation of the pa-

rameters in z

z for different values of the smoothing param-

eter a

. The red stars (∗) represent the theoretical values

and the blue crosses (x) the empirical values

. The

left plot shows the results for the translation z

= h and the

right plot for the Doppler factor z

= α. It is clear that the

approximation is valid approximately when a

> 0.55.

Section 3.1 with empirically computed standard de-

viations. While these agree our approximations are

assumed to be valid.

An original continuous signal W(x) was simu-

lated, see Fig 3. The second signal was created ac-

cording to (3) s.t.

W = W(1/α ·(x −h)). Thereafter

both signals were ideally sampled and then Gaussian

white discrete noise with standard deviation σ

was

added to the discrete signals. During the analysis

both signals were interpolated using a Gaussian ker-

nel with standard deviation a

, according to Section

2.2. At this point the signals could be described by

V (t) and

V (t) as before.

To study the effect of a

and σ

the signals V and

V were re-simulated 1000 times using the same orig-

inal signals W and

W , but different noise. The the-

oretical standard deviation of the parameter vector z





, was then computed according to the

presented theory, while the empirical standard devi-

ation,





, was computed from the 1000

achieved parameter estimations.

First the effect of changing a

was studied. The

noise level was set to σ

= 0.03, the translation was

h = 3.63 and the Doppler factor was set to α = 1.02 –

though the actual numbers are not of importance. The

standard deviation of z

z was then computed according

to above. This was done for several different a

∈

[0.3,0.8].

The results are shown in Fig 4. It can be seen

that the theoretical and empirical values σ

and

disagree for both parameters when a

is lower than

≈ 0.55. Thus the approximation (2) of ideal inter-

polation should only be used for a smoothing param-

eter a

> 0.55.

Secondly we investigated the impact of the noise.

Estimating Uncertainty in Time-difference and Doppler Estimates

249

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6

0.5

1.5

2.5

Theoretical std

Empirical std

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6

×10

-3

Theoretical std

Empirical std

Figure 5: The standard deviation of the translation (on top)

and Doppler factor (bottom) for different levels of noise in

the signal. The red stars (∗) mark the theoretical values σ

and the blue crosses (x) the empirical

. For the translation

the values agree for signals with a noise level up to σ

≈

0.8. For the Doppler factor the theoretical values follow the

empirical values for σ

< 1.1.

The smoothing parameter was now ﬁxed at a

= 2 and

the translation and the Doppler factor were again set

to h = 3.63 and α = 1.02. Then σ

and

were com-

puted as before. This time it was done for several

different σ

∈ [0,1.6].

The results are shown in Fig 5. The uppermost

plot contains the results for the translation parame-

ter h and the plot below the corresponding results for

the Doppler parameter α. Each plot shows the stan-

dard deviation of the parameter for different levels of

noise. The theoretical and empirical standard devi-

ation of the translation agree for values lower than

≈ 0.8, after which they start to differ. Concern-

ing the Doppler factor the theoretical and empirical

values agree well until σ

≈ 1.1, after which the esti-

mation is poor.

Thus, by studying the plots we can conclude that

the system can handle noise with a standard deviation

up to σ

≈ 0.8. The amplitude for the original signal

varies between 1 and 3.5. Using the standard devi-

ation of the original signal, σ

, the signal-to-noise

ratio is

SNR =

≈ 0.21.

4.2 Real Data - Validation of Method

For the real data experiments we used 8 T-Bone MM-

1 microphones, connected to an audio interface (M-

Audio Fast Track Ultra 8R), which was connected

to a computer. The sound was recorded at f = 96

kHz and we made the experiments in an anaechoic

chamber. The eight microphones were placed approx-

imately 0.3-1.5 meters away from each other and so

that they spanned 3D. We generated sounds by play-

ing a song on a mobile phone connected to a simple

loudspeaker, while moving around in the room.

Using the sound generated from all eight micro-

phones we estimated the sound source path, i.e. a 3D

trajectory s(t) and the eight 3D positions of the mi-

crophones r

,...,r

. This was done using the tech-

nique described in (Zhayida et al., 2014) and re-

ﬁned in (Zhayida et al., 2016). The method builds

on RANSAC algorithms based on minimal solvers

(Kuang and

Astr

om, 2013b) to ﬁnd initial estimates

of the sound trajectory s(t) and microphone positions

,...,r

. These estimates are then reﬁned using non-

linear optimization of a robust error norm, but also

including a smooth motion prior.

For the validation of the method presented in this

article we used data from two of the microphones.

The played song lasted for approximately 29 s and the

loudspeaker moved during the whole time. Further-

more, the motion was not constant – both the speed

and direction were changed.

Since our method assume a constant parameter z

in a window the recording was divided into patches

for which the parameters were approximately con-

stant. We divided the signal into 2834 patches of 1000

samples (i.e. approximately 0.01 s) each. Each of

these patches was then investigated separately. For

each patch i a constant loudspeaker position was

given from ground truth. Thus, we had one sender

position s

(i)

, its derivative

∂s

(i)

∂t

(i) and two receiver po-

sitions r

and r

to compare our results with.

4.2.1 Estimating the Parameters

As mentioned in Section 3.2 it is in practice not harder

to estimate three model parameters. Therefore, to get

a more precise solution, we have used (9) as model for

the received signals. In this case V

(i)

(t) will be signal

patch i from the ﬁrst microphone and

(i)

(t) the cor-

responding signal patch from the second microphone.

Our method is developed to estimate small trans-

lations, s.t. h ∈ [−10,10] samples and the delays in

the experiments were larger than that. Therefore we

pre-estimated the translation using GCC-PHAT. For

a description of the method, see (Knapp and Carter,

1976). The pre-estimation gave an integer delay

(i)

whereas our method did a subsample reﬁnement, es-

timated the Doppler parameter and the amplitude fac-

ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods

250

-4

-2

×10

-3

-4

-2

×10

-3

Figure 6: One example of the received signal patches at a

certain time – the ﬁrst signal in red and the second in green.

The top image shows the signals before any modiﬁcation.

The bottom image shows the same patches after modiﬁca-

tions using the optimal parameters.

tor. This was done by minimizing the intergral

(i)

(t) −γ

(i)

(α

(i)

t +

(i)

+ h

(i)

))

dt.

Note that

(i)

should be viewed as a constant in the

equation above, while we were optimizing over h

(i)

and γ

(i)

This optimization was performed for all different

patches. The result from one of these can be seen in

Fig 6.

4.2.2 Comparing to Ground Truth

Using r

, r

and s

(i)

we calculated the distances d

(i)

and d

(i)

from the loudspeaker to the microphones,

(i)

= |r

−s

(i)

|, d

(i)

= |r

−s

(i)

The difference between these two distances, ∆d

(i)

−d

(i)

, is connected to the time difference of ar-

rival, and thus our computed translation h

(i)

. While

(i)

in measured in samples, ∆d

(i)

is a measure in me-

ters. Thus, we multiplied h

(i)

with a scaling factor

c/ f , where c = 340 m/s is the speed of sound and

f = 96 kHz is the recording frequency. By that we

had an estimation of ∆d

(i)

∆d

(i)

·h

(i)

In a similar manner the time in samples can be ex-

pressed in seconds using f . Thereafter we could com-

pare our estimation to ground truth. In Fig 7 ground

truth ∆d

(i)

is plotted together with our estimations for

all patches, i.e. over time. It is clear that these agree

well.

0 5 10 15 20 25 30

time [s]

-0.8

-0.6

-0.4

-0.2

0.2

0.4

0.6

-d

[m]

Our results

Ground truth

Figure 7: The ﬁgure shows the difference between the dis-

tances from receiver 1 to the sender (d

) and receiver 2 to

the sender (d

) over time. Each dot represents the value for

one signal patch. The ground truth is plotted in green and

our estimations in blue. It is hard to distinguish any green

dots since the estimations agree well with ground truth.

Concerning the Doppler parameter this is a mea-

sure of the change of distance differences, i.e.

∂∆d

∂t

∂d

∂t

−

∂d

∂t

Here d

and d

denotes the distances over time, i.e.

for all patches. If we look at one of the derivatives we

have that d

(t) = |r

−s(t)| and

∂d

∂t

−s

−s|

∂s

∂t

where · denotes the scalar product between the two

time dependent vectors. The derivative of d

is found

similarly. Let n

(i)

and n

(i)

be unit vectors in the direc-

tion from s

(i)

to r

and r

respectively, i.e.

(i)

−s

(i)

−s

(i)

, n

(i)

−s

(i)

−s

(i)

Then, for a certain time step, the derivatives will be

∂d

(i)

∂t

= n

(i)

∂s

(i)

∂t

∂d

(i)

∂t

= n

(i)

∂s

(i)

∂t

and thus

∂∆d

(i)

∂t

= n

(i)

∂s

(i)

∂t

−n

(i)

∂s

(i)

∂t

These ground truth Doppler values show how

much ∆d changes each second. However, our esti-

mated Doppler factor α is a unitless constant. The

relation between the two values is

∂∆d

∂t

= (α −1) ·c,

with c still denoting the speed of sound. In Fig 8 we

have plotted the ground truth value in green together

Estimating Uncertainty in Time-difference and Doppler Estimates

251

0 5 10 15 20 25 30

time [s]

-3

-2

-1

∂ ∆d/

∂ t [m/s]

Our results

Ground truth

Figure 8: The derivative of the distance differences ∆d plot-

ted over time. The blue dots are our estimations and the

solid green line is computed from ground truth. We see

that even though the estimations are noisy they agree with

ground truth.

with our estimations in blue. Even though the estima-

tions are noisy the pattern and the similarities are well

distinguishable.

There is also a relation between the amplitude fac-

tor for the two received signals and the distances d

(i)

and d

(i)

. Our estimated amplitude γ

(i)

should be re-

lated to the distance quotient d

(i)

. Furthermore,

the sound from the loudspeaker spreads as on the sur-

face of a sphere. Thus, the distance quotient is pro-

portional to the square root of the amplitude γ

(i)

= C ·

(i)

We estimated the proportionality constant using data

and got C = 1.3.

The distance quotient is plotted over time in Fig 9

– our estimations as blue dots and ground truth as a

green line. Again we see that they clearly follow the

same pattern.

The results from above show that our estimated

parameters do contain relevant information. Though,

the estimations in Fig 7, 8 and 9 are all quite noisy.

This can be reduced by computing a moving aver-

age of the estimations. Fig 10 is similar to Fig 8 but

instead of plotting the estimated derivative for each

patch we have plotted the 20-patch moving average

of the distance difference derivative. This means we

have averaged over approximately 0.2 s. A compar-

ison between the two ﬁgures shows that the moving

average substantially reduces the noise in the esti-

mates. The same method can be applied to the es-

timations of the translation h and the amplitude γ to

reduce noise.

0 5 10 15 20 25 30

time [s]

0.6

0.7

0.8

0.9

1.1

1.2

1.3

1.4

1.5

Our results

Ground truth

Figure 9: The distance quotient d

plotted over time.

The green solid line represents the ground truth and each

blue dot is the estimation for a certain patch. While the

estimations are somewhat noisy there is no doubt that the

pattern is the same.

0 5 10 15 20 25 30

time [s]

-3

-2

-1

∂ ∆d/∂ t [m/s]

Our results

Ground truth

Figure 10: This plot shows essentially the same thing as

Fig 8, i.e. ∂∆d/∂t, but with a 20-patches moving average

over the estimations. The averaging substantially reduces

the noise.

Still after averaging, the estimations are noisy and

poor in the beginning of the signal. This can be ex-

plained by the song that was played. The song starts

with occasional drum beats with silence in between.

Since the sound is not persistent the information is not

sufﬁcient to make good estimates. When the song is

more continuous, after 5-6 s, this shows as the esti-

mates get better.

5 CONCLUSIONS

In this paper we study the estimation of time-

differences, amplitude changes and minute Doppler

effects from two signals. We also study how to esti-

mate the uncertainty in these estimated parameters.

ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods

252

The results are useful both for simultaneous determi-

nation of sender and receiver positions, but also for

localization, beam-forming and diarization. In the pa-

per we use previous results on stochastic analysis of

interpolation and smoothing in order to give explicit

formulas for the covariance matrix of the estimated

parameters. We show that the approximations used

are valid as long as the smoothing is at least 0.55 sam-

ple points and as long as the signal-to-noise ratio is

smaller than 0.21. Furthermore we show in experi-

mental studies on real data that these estimates pro-

vide useful information for subsequent analysis.

ACKNOWLEDGEMENTS

This work is supported by the strategic research

projects ELLIIT and eSSENCE, Swedish Foundation

for Strategic Research project ”Semantic Mapping

and Visual Navigation for Smart Robots” (grant no.

RIT15-0038) and Wallenberg Autonomous Systems

and Software Program (WASP).

REFERENCES

Anguera, X., Bozonnet, S., Evans, N., Fredouille, C., Fried-

land, G., and Vinyals, O. (2012). Speaker diarization:

A review of recent research. IEEE Transactions on

Audio, Speech, and Language Processing, 20(2):356–

370.

Anguera, X., Wooters, C., and Hernando, J. (2007). Acous-

tic beamforming for speaker diarization of meetings.

IEEE Transactions on Audio, Speech, and Language

Processing, 15(7):2011–2022.

Astr

om, K. and Heyden, A. (1999). Stochastic analysis of

scale-space smoothing. Advances in Applied Proba-

bility, 30(1).

Batstone, K., Oskarsson, M., and

Astr

om, K. (2016). Ro-

bust time-of-arrival self calibration and indoor local-

ization using wi-ﬁ round-trip time measurements. In

proc. of International Conference on Communication.

Brandstein, M., Adcock, J., and Silverman, H. (1997). A

closed-form location estimator for use with room en-

vironment microphone arrays. Speech and Audio Pro-

cessing, IEEE Transactions on, 5(1):45 –50.

Cirillo, A., Parisi, R., and Uncini, A. (2008). Sound map-

ping in reverberant rooms by a robust direct method.

In Acoustics, Speech and Signal Processing, IEEE In-

ternational Conference on, pages 285 –288.

Cobos, M., Marti, A., and Lopez, J. (2011). A modi-

ﬁed SRP-PHAT functional for robust real-time sound

source localization with scalable spatial sampling.

Signal Processing Letters, IEEE, 18(1):71 –74.

Crocco, M., Del Bue, A., Bustreo, M., and Murino, V.

(2012). A closed form solution to the microphone po-

sition self-calibration problem. In ICASSP.

Do, H., Silverman, H., and Yu, Y. (2007). A real-time SRP-

PHAT source location implementation using stochas-

tic region contraction(SRC) on a large-aperture micro-

phone array. In ICASSP 2007, volume 1, pages 121–

124.

Knapp, C. and Carter, G. (1976). The generalized correla-

tion method for estimation of time delay. IEEE Trans-

actions on Acoustics, Speech and Signal Processing,

24(4):320–327.

Kuang, Y. and

Astr

om, K. (2013a). Stratiﬁed sensor net-

work self-calibration from tdoa measurements. In EU-

SIPCO.

Kuang, Y. and

Astr

om, K. (2013b). Stratiﬁed sensor net-

work self-calibration from tdoa measurements. In 21st

European Signal Processing Conference 2013.

Kuang, Y., Burgess, S., Torstensson, A., and

Astr

om, K.

(2013). A complete characterization and solution to

the microphone position self-calibration problem. In

ICASSP.

Lindeberg, T. (1994). Scale-space theory: A basic tool for

analyzing structures at different scales. Journal of ap-

plied statistics, 21(1-2):225–270.

Lindgren, G., Rootz

en, H., and Sandsten, M. (2013). Sta-

tionary Stochastic Processes for Scientists and Engi-

neers. CRC Press.

Plinge, A., Jacob, F., Haeb-Umbach, R., and Fink, G. A.

(2016). Acoustic microphone geometry calibration:

An overview and experimental evaluation of state-of-

the-art algorithms. IEEE Signal Processing Magazine,

33(4):14–29.

Pollefeys, M. and Nister, D. (2008). Direct computa-

tion of sound and microphone locations from time-

difference-of-arrival data. In Proc. of ICASSP.

Shannon, C. E. (1949). Communication in the presence of

noise. Proceedings of the IRE, 37(1):10–21.

Zhayida, S., Andersson, F., Kuang, Y., and

Astr

om, K.

(2014). An automatic system for microphone self-

localization using ambient sound. In 22st European

Signal Processing Conference.

Zhayida, S. and

Astr

om, K. (2016). Time difference estima-

tion with sub-sample interpolation. Journal of Signal

Processing, 20(6):275–282.

Zhayida, S., Segerblom Rex, S., Kuang, Y., Andersson, F.,

and

Astr

om, K. (2016). An Automatic System for

Acoustic Microphone Geometry Calibration based on

Minimal Solvers. ArXiv e-prints.

Estimating Uncertainty in Time-difference and Doppler Estimates

253