relationship between two imaging coordinate sys-
tems. The relative orientation, represented by a rota-
tion matrix, and position, represented by a unit vector,
of two successive panoramic images can be derived
from the essential matrix. The camera trajectory can be
recovered through point cloud reconstruction of the
scene and bundle adjustment based on the available
GPS information. The camera path can be refined
through the process of registering it to the aerial im-
age. After each panoramic image’s position with re-
spect to the aerial image’s coordinate system is estab-
lished, the building rooftop edges on the panoramic
images can be identified through a matching process guided by the building footprint outlines on the aerial image. The height of a building can then be calculated based on the detected building rooftop edge and the location of the panoramic image.
Finally, the building texture can be extracted from the panoramic image; it must undergo warping and rectification before it can be used for texture mapping. The remaining sections describe the techniques used in the framework, followed by some 3D modeling results and conclusions.
3 CAMERA TRAJECTORY
RECOVERY
The first half of the framework deals with the camera trajectory recovery task, which can be regarded as a preparation step that enables the registration of the two types of input images. Currently, registering
the recovered camera trajectory to the aerial image is
done manually by specifying two end point locations
of the path on the map. Consequently, the image coordinates representing the position of each source panoramic image on the aerial image can be derived.
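As a concrete illustration, a two-point registration of this kind amounts to a 2D similarity transform (scale, rotation and translation) between the trajectory's own frame and the aerial-image pixel frame. The sketch below assumes the trajectory has already been projected onto the ground plane; the names (similarity_from_endpoints, traj, click_start, click_end) are illustrative and not taken from the paper.

```python
import numpy as np

def similarity_from_endpoints(traj_start, traj_end, map_start, map_end):
    """Map 2D trajectory coordinates onto aerial-image pixel coordinates,
    given the two manually specified end points of the path.
    A 2D similarity q = s * e^{i*theta} * p + t is fully determined by
    two point correspondences (complex-number form used for brevity)."""
    p0, p1 = complex(*traj_start), complex(*traj_end)
    q0, q1 = complex(*map_start), complex(*map_end)
    a = (q1 - q0) / (p1 - p0)        # combined scale and rotation
    t = q0 - a * p0                  # translation

    def to_map(pt):
        z = a * complex(*pt) + t
        return np.array([z.real, z.imag])
    return to_map

# Hypothetical usage: 'traj' is the recovered camera path projected onto
# the ground plane; 'click_start'/'click_end' are the two points picked
# on the Google Map image.
# to_map = similarity_from_endpoints(traj[0], traj[-1], click_start, click_end)
# map_positions = [to_map(p) for p in traj]
```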
In order to recover the relative image capturing
positions and orientations of the source panoramas,
we first estimated the spatial relationship between
each adjacent pair of panoramic images, and then in-
tegrated those pairwise results. The spatial relation-
ship in terms of a rotation matrix and a translation
vector, referred to as the external camera parameters,
can be derived from the essential matrix describing
the epipolar constraint between the image correspon-
dences in two panoramas.
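For reference, the epipolar relation assumed throughout is the standard one; for panoramic imagery it is usually written on unit bearing vectors (pixels back-projected onto the viewing sphere) rather than homogeneous image coordinates:

$\hat{x}_2^{\top} E \,\hat{x}_1 = 0, \qquad E = [T]_{\times} R,$

where $\hat{x}_1$ and $\hat{x}_2$ are the bearing vectors of a matched point pair in the two panoramas and $[T]_{\times}$ is the skew-symmetric matrix of the unit translation vector $T$.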
The image point correspondences can be established by Scale-Invariant Feature Transform (SIFT) feature detection followed by SIFT descriptor matching. A single threshold $D_{SIFT}$ was used to determine whether a match was acceptable in the SIFT-based matching algorithm. The
smaller the value, the more image correspondences were identified, and the higher the possibility that the result would include false matches.

Figure 4: An intermediate result of the reconstructed scene point cloud for determining the external camera parameters. Blue dots represent the calculated scene points and red circles indicate the camera positions.

The eight-point al-
gorithm was employed to estimate the essential ma-
trix. A two-pass approach was proposed to obtain the
final essential matrix. First, an initial essential matrix was derived from a smaller set of corresponding image points, namely the matching result associated with a relatively large threshold value $D_{SIFT}$.
Those sparse corresponding points were believed to
be more accurate but less descriptive. Next, a smaller
threshold value was assigned to obtain a larger set of
point matches. The initial essential matrix then served as a constraint to filter out incorrect matches; in other words, the matching outliers were removed by the epipolar constraint. The remaining point
matches were then used to compute the final essential
matrix.
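The paper does not give implementation details for this two-pass procedure; the sketch below shows one plausible realization in Python, using OpenCV's SIFT and a plain linear eight-point solve on unit bearing vectors. The equirectangular back-projection, the reading of $D_{SIFT}$ as a distinctiveness ratio (second-best over best descriptor distance), and the residual threshold eps are assumptions of this sketch, not details from the paper.

```python
import cv2
import numpy as np

def bearing(uv, width, height):
    """Back-project equirectangular pixels (Nx2) to unit bearing vectors.
    Assumes a full 360x180-degree panorama; the axis convention is one
    common choice, not necessarily the paper's."""
    lon = (uv[:, 0] / width) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (uv[:, 1] / height) * np.pi
    return np.stack([np.cos(lat) * np.sin(lon),
                     -np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=1)

def eight_point(x1, x2):
    """Linear eight-point estimate of E from matched bearing vectors."""
    A = np.einsum('ni,nj->nij', x2, x1).reshape(len(x1), 9)
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    U, _, Vt = np.linalg.svd(E)                 # project onto the essential
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt    # manifold: singular values 1,1,0

def match(des1, des2, d_sift):
    """SIFT-based matching with a single acceptance threshold.  D_SIFT is
    read here as a distinctiveness ratio (second-best / best descriptor
    distance, accepted when the ratio is at least D_SIFT), so a larger
    value is stricter -- an assumption, not the paper's stated rule."""
    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    return [p[0] for p in knn
            if len(p) == 2 and p[1].distance >= d_sift * p[0].distance]

def two_pass_essential(img1, img2, d_strict=1.6, d_loose=1.2, eps=1e-3):
    """Two-pass estimation: initial E from sparse, reliable matches, then a
    denser match set filtered by the epipolar residual of the initial E.
    img1/img2 are grayscale equirectangular panoramas; the threshold values
    are illustrative."""
    sift = cv2.SIFT_create()
    k1, des1 = sift.detectAndCompute(img1, None)
    k2, des2 = sift.detectAndCompute(img2, None)
    h, w = img1.shape[:2]

    def bearings(matches):
        p1 = np.float64([k1[m.queryIdx].pt for m in matches])
        p2 = np.float64([k2[m.trainIdx].pt for m in matches])
        return bearing(p1, w, h), bearing(p2, w, h)

    x1, x2 = bearings(match(des1, des2, d_strict))   # pass 1: sparse set
    E0 = eight_point(x1, x2)

    x1, x2 = bearings(match(des1, des2, d_loose))    # pass 2: dense set
    residual = np.abs(np.einsum('ni,ij,nj->n', x2, E0, x1))
    inliers = residual < eps                         # epipolar filtering
    return eight_point(x1[inliers], x2[inliers])
```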
The derived essential matrix was used to solve for
the external camera parameters R and T, which stand
for the rotation matrix and the translation vector, re-
spectively. Pairwise external camera parameters were
first determined and then integrated one by one to
obtain the global camera motion and thus the cam-
era’s moving trajectory. During the integration pro-
cess, the scene points observed in the already processed panoramic images were reconstructed with respect to the 3D world coordinate system and used as references to estimate the next camera location.
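One possible shape of this decomposition-and-chaining step is sketched below. The essential matrix encodes four candidate (R, T) pairs; the physically valid one is usually chosen by a cheirality test on triangulated points, which is omitted here, and the per-pair scale (T is only a unit direction) is assumed to come from the reconstructed scene points, as described above. The pose convention X_next = R_rel X_prev + s T_rel is an assumption of this sketch, not taken from the paper.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])

def decompose_essential(E):
    """Return the four candidate (R, T) pairs encoded by an essential
    matrix (standard SVD-based decomposition)."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:  U  *= -1.0   # keep proper rotations
    if np.linalg.det(Vt) < 0: Vt *= -1.0
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    T = U[:, 2]                            # translation up to sign and scale
    return [(R1, T), (R1, -T), (R2, T), (R2, -T)]

def chain_poses(pairwise, scales):
    """Integrate pairwise (R_rel, T_rel) into global rotations and camera
    centres, assuming X_next = R_rel @ X_prev + s * T_rel.  'scales' holds
    the per-pair scale recovered from the reconstructed scene points."""
    R_g, C = np.eye(3), np.zeros(3)
    rotations, centres = [R_g], [C]
    for (R_rel, T_rel), s in zip(pairwise, scales):
        R_g = R_rel @ R_g                  # accumulate global rotation
        C = C - R_g.T @ (s * T_rel)        # new camera centre in world frame
        rotations.append(R_g)
        centres.append(C)
    return rotations, centres
```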
An example illustrating an intermediate result of the
reconstructed scene point cloud is given in Fig. 4. The
major drawback of such a method is that the camera parameter estimation error propagates through the integration process. One way to correct such drift is a path-closing strategy, which, however, does not always work well. Moreover, identifying two or more
panoramic images captured at the same street inter-
section but at different locations and times is a very