• A single-path expansion (SE) in the remaining
$n_T - L$ tree levels. The SE stage starts from
each retained path and proceeds down the
tree, calculating the solution of the remaining
successive-interference-cancellation (SIC) problem
(Berenguer and Wang, 2003) as:
$$\hat{s}_i = Q\left( \frac{\tilde{y}_i - \sum_{j=i+1}^{n_T} R_{ij}\,\hat{s}_j}{R_{ii}} \right), \quad i = n_T, \ldots, 1. \qquad (14)$$
The function $Q(\cdot)$ assigns the closest constellation
value. Note that the efficient PED calculation
using matrix T can also be used to accelerate
the computation of the summation $\sum_{j=i+1}^{n_T} R_{ij}\,\hat{s}_j$ in
the SIC problem. The symbols are detected following
a specific ordering, also proposed by the
authors in (Barbero and Thompson, 2008). As
shown in (Jalden et al., 2009), the FSD achieves the
maximum detection diversity if L is chosen such that:
$$L \geq \sqrt{n_T} - 1 \qquad (15)$$
• A Soft-Output extension (SOE) that provides soft
information by obtaining an improved list of candidates
(Barbero et al., 2008). Figure 2(b) shows
the search tree of the SFSD for the case with
$n_T = 4$ and QPSK symbols. The method starts
from the list of candidates obtained by the hard-output
FSD (composed of the FE and SE stages, shown
in Fig. 2(a)) and adds new candidates to
provide more information about the counter bits.
Note that, since the first level of the HFSD tree is
already fully expanded, all the values needed
to compute the LLRs of the symbol bits in that
level are available. Therefore, the list extension
must start from the second level of the path. To
begin the list extension, the best $N_{iter}$ paths are
selected from the initial hard-output FSD list (in this
example, $N_{iter} = 2$). This is based on the heuristic
that the remaining low-distance paths are likely to
differ from the best paths in only a few bits.
The symbols belonging to these $N_{iter}$ paths
are kept from the root down to a certain level $l$,
and, at level $l - 1$, $\log_2 M$ additional branches are
explored, each of them having one of the bits of
the initial path symbol negated. Afterwards, these
new partial paths are completed following the SIC
path, as done in the hard-output FSD scheme. The
same operation is repeated until the lowest level
of the tree is reached.
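The SIC completion of Eq. (14) and the bound of Eq. (15) can be illustrated with a small NumPy sketch. It is not the implementation evaluated in this paper: the function names, the 0-based row indexing (row i corresponds to tree level i + 1), the example matrix R and the noise-free received vector are all assumptions made for illustration.

```python
import math
import numpy as np

def quantize(z, constellation):
    """Q(.): return the constellation point closest to z."""
    return constellation[np.argmin(np.abs(constellation - z))]

def sic_complete(R, y_tilde, s_hat, start_row, constellation):
    """Complete a path by SIC from row `start_row` down to row 0 (Eq. 14).

    Entries of s_hat in rows above start_row must already be fixed.
    """
    s_hat = s_hat.copy()
    for i in range(start_row, -1, -1):
        # Interference of the symbols already decided in this path
        interference = R[i, i + 1:] @ s_hat[i + 1:]
        s_hat[i] = quantize((y_tilde[i] - interference) / R[i, i], constellation)
    return s_hat

def min_expanded_levels(n_t):
    """Smallest integer L satisfying L >= sqrt(n_T) - 1 (Eq. 15)."""
    return math.ceil(math.sqrt(n_t) - 1)

# Hypothetical 4 x 4 example with QPSK symbols and an upper-triangular R
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
R = np.array([[2.0, 0.5 - 0.3j, 0.1 + 0.2j, -0.4j],
              [0.0, 1.5, 0.3 + 0.1j, 0.2],
              [0.0, 0.0, 1.2, -0.1 + 0.5j],
              [0.0, 0.0, 0.0, 0.9]], dtype=complex)
s_true = QPSK[[0, 3, 1, 2]]
y_tilde = R @ s_true  # noise-free observation, so SIC recovers s_true

s_hat = sic_complete(R, y_tilde, np.zeros(4, dtype=complex), 3, QPSK)
print(np.allclose(s_hat, s_true), min_expanded_levels(4))
```

In the FSD, the same routine would be called once per retained path, with the symbols of the fully expanded levels already written into the top rows of `s_hat` and `start_row` set to the first row below them.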
Figure 2: Decoding trees of the SFSD algorithm for a 4 × 4
MIMO system with QPSK symbols, $N_{iter} = 2$ and $L = 1$: (a)
Hard-Output stage and (b) Soft-Output Extension.
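The list extension depicted in Figure 2(b) can be sketched as follows. This is a simplified illustration under stated assumptions, not the scheme as implemented by the authors: paths are stored as symbol indices, the bit label of a QPSK symbol is taken to be its index (so negating bit k is `index ^ (1 << k)`), rows are 0-based with the top tree level in the last row, and all names and the example data are hypothetical.

```python
import numpy as np

M_BITS = 2  # m = log2(M) bits per QPSK symbol
# Assumed labelling: index bits (b1, b0) select the signs of the real/imaginary parts
QPSK = np.array([(1 - 2 * (i >> 1)) + 1j * (1 - 2 * (i & 1)) for i in range(4)]) / np.sqrt(2)

def sic_complete(R, y, idx, start_row):
    """Fill idx[start_row], ..., idx[0] by SIC; rows above start_row are kept."""
    idx = idx.copy()
    for i in range(start_row, -1, -1):
        z = (y[i] - R[i, i + 1:] @ QPSK[idx[i + 1:]]) / R[i, i]
        idx[i] = np.argmin(np.abs(QPSK - z))
    return idx

def distance(R, y, idx):
    """Squared Euclidean distance of a complete path."""
    return float(np.sum(np.abs(y - R @ QPSK[idx]) ** 2))

def soft_output_extension(R, y, hard_list, n_iter, n_full):
    """Return the N_iter * m * (n_T - L) bit-negated paths of the SOE stage."""
    n_t = R.shape[0]
    order = np.argsort([distance(R, y, p) for p in hard_list])
    new_paths = []
    for best in hard_list[order[:n_iter]]:          # best N_iter hard-output paths
        for row in range(n_t - n_full - 1, -1, -1):  # levels below the fully expanded ones
            for k in range(M_BITS):                  # one new branch per negated bit
                cand = best.copy()
                cand[row] ^= 1 << k
                new_paths.append(sic_complete(R, y, cand, row - 1))
    return np.array(new_paths)

# Hypothetical 4 x 4 example: L = 1 fully expanded level, N_iter = 2
R = np.array([[2.0, 0.5 - 0.3j, 0.1 + 0.2j, -0.4j],
              [0.0, 1.5, 0.3 + 0.1j, 0.2],
              [0.0, 0.0, 1.2, -0.1 + 0.5j],
              [0.0, 0.0, 0.0, 0.9]], dtype=complex)
y = R @ QPSK[np.array([0, 3, 1, 2])] + 0.05 * (1 + 1j)

# Hard-output stage: expand all M symbols at the top level, then SIC below
hard_list = np.array([sic_complete(R, y, np.array([0, 0, 0, c]), 2) for c in range(4)])
extended = soft_output_extension(R, y, hard_list, n_iter=2, n_full=1)
print(hard_list.shape, extended.shape)
```

Each new path keeps the symbols of the selected path above the modified level, negates one bit at that level and lets SIC decide the levels below, which mirrors the description of the SOE stage given above.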
4.2.1 SFSD CUDA Implementation
Algorithm 3 shows the steps needed to perform the
SFSD detection. First, the input and output
variables are allocated and copied into the GPU
global memory (GPU-GM). In this case, the matrices gray and neg
and the constellation symbols Ω are copied into constant memory.
The Ω variable is needed to perform the quantization
$Q(\cdot)$ in the SIC problem. Matrices D and s contain
the information of the $P = M^L + N_{iter} \cdot m \cdot (n_T - L)$
computed paths: the $M^L$ branches of the Hard-Output
stage and the $N_{iter} \cdot m \cdot (n_T - L)$ new branches of the
Soft-Output extension (SOE) stage.
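As a quick check of the path count, the following sketch evaluates $P = M^L + N_{iter} \cdot m \cdot (n_T - L)$ for the configuration of Figure 2. The function name and its argument names are hypothetical, and m is taken to be $\log_2 M$ with M a power of two.

```python
def sfsd_path_count(M, n_t, L, n_iter):
    """Return (hard-output paths, SOE paths, total P) for the SFSD."""
    m = M.bit_length() - 1          # m = log2(M), valid for power-of-two M
    hard = M ** L                   # branches of the hard-output stage
    soft = n_iter * m * (n_t - L)   # new branches of the SOE stage
    return hard, soft, hard + soft

# 4 x 4 MIMO, QPSK (M = 4), L = 1, N_iter = 2, as in Figure 2
print(sfsd_path_count(4, 4, 1, 2))  # (4, 12, 16)
```

Under these assumptions, D and s would need room for 16 paths per received vector in the QPSK example.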
In Kernel 4, each thread calculates one of the $M^L$
branches of the HFSD stage. After the hard-output
part is finished, the CPU is in charge of calculating the
$N_{iter}$ minimum distances and storing them in the matrix min
in ascending order. This matrix is copied to the GPU
global memory. Then, the $N_{iter} \cdot m \cdot (n_T - L)$ new
candidates to be obtained per time index n are equally
distributed among all the threads of the grid using
Kernel 5. As mentioned, in the SOE stage, m additional
branches are explored in each of the remaining $(n_T - L)$
levels. Each of them has one of the bits of the initial
path symbol negated. In order to accelerate this
expansion, a matrix (neg) is built before the detection.
This matrix contains, for each constellation symbol
$\Omega_i$, a list of the m constellation symbols obtained by
negating each of its bits (the kth entry corresponds to the
negation of the kth bit). For example using QPSK con-
PECCS 2015 - 5th International Conference on Pervasive and Embedded Computing and Communication Systems