Here you can find the benchmark dataset from the paper:

The following video gives some indication of the dataset contents.

Six different objects are available under three conditions: noise-free, noisy, and occluded.

Due to the size of the dataset, compressed versions of the sequences are provided (about 140 MB for each object):

These sequences were compressed using high-quality (but lossy) H.264 video encoding. The raw dataset is also available (about 2 GB for each object):

The 3D models corresponding to these objects can be found here: 3D models. The ground-truth pose trace and some example Matlab code for evaluating the tracking error are available here: code.
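For readers who do not use Matlab, the following Python sketch shows one common way to compare an estimated pose against a ground-truth pose. It is only an illustration: it assumes each pose is given as a 3x3 rotation matrix and a translation vector, which may not match the format of the pose trace, and the error measures here (rotation angle and Euclidean distance) are not necessarily the ones used in the provided Matlab code or in the paper. The function names are hypothetical.

```python
import math

def rotation_error_deg(R_gt, R_est):
    """Angle (in degrees) of the relative rotation between two 3x3 rotation matrices.

    Assumption: poses are stored as rotation matrices (nested lists, row-major).
    """
    # trace(R_gt^T * R_est) equals the sum of element-wise products.
    trace = sum(R_gt[i][j] * R_est[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error(t_gt, t_est):
    """Euclidean distance between two translation vectors (same unit as the input)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_gt, t_est)))
```

Averaging these two quantities over all frames of a sequence would give a simple per-sequence summary of the tracking error.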

The following calibration info corresponds to the rectified sequences:

focal_length = 500.6795; % (in pixels)
baseline = 70.7722; % (in mm)
nodal_point_x = 352.1633; % column (in pixels)
nodal_point_y = 260.3113; % row (in pixels)

Pixels are square, and the focal length and nodal point are identical in both (rectified) images.
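As a rough illustration of how these values can be used, the Python sketch below builds the shared intrinsic matrix and converts a stereo disparity into depth with the standard rectified-stereo relation Z = f * B / d. The constants are copied from the calibration info above. The function names are hypothetical, and the conventions are assumptions: a pinhole model without distortion, disparity defined as left column minus right column (positive, in pixels), and camera coordinates centered on the left camera with x to the right and y downward.

```python
# Calibration values copied from the listing above (rectified sequences).
FOCAL_LENGTH_PX = 500.6795  # focal length, in pixels
BASELINE_MM = 70.7722       # stereo baseline, in mm
NODAL_X_PX = 352.1633       # nodal point column, in pixels
NODAL_Y_PX = 260.3113       # nodal point row, in pixels

def intrinsic_matrix():
    """3x3 pinhole intrinsic matrix; identical for both rectified images."""
    return [
        [FOCAL_LENGTH_PX, 0.0, NODAL_X_PX],
        [0.0, FOCAL_LENGTH_PX, NODAL_Y_PX],
        [0.0, 0.0, 1.0],
    ]

def depth_from_disparity(disparity_px):
    """Depth in mm from a disparity in pixels (assumed: left col - right col)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return FOCAL_LENGTH_PX * BASELINE_MM / disparity_px

def backproject(col, row, disparity_px):
    """3D point (x, y, z) in mm, in the assumed left-camera coordinate frame."""
    z = depth_from_disparity(disparity_px)
    x = (col - NODAL_X_PX) * z / FOCAL_LENGTH_PX
    y = (row - NODAL_Y_PX) * z / FOCAL_LENGTH_PX
    return x, y, z
```

Under these assumptions, a disparity of 50 pixels corresponds to a depth of about 709 mm, and a pixel at the nodal point back-projects onto the optical axis (x = y = 0).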