# 6DoF Visual Positioning

It is possible to position and orientate an object relative to a camera, given only three of its points (e.g. LEDs) in image-space. In fact, this is the minimal visual positioning system. It infers depth from the object’s size, rather than using a stereo camera system. It also infers the object’s 3-DOF orientation (with a single 2-way ambiguity). The pose (position + orientation) of an object is a 6-DOF system (see Expressing Rotation). The three points in image-space each provide two coordinate variables, giving 6 variables in total.

The transformation from the 3×2 coordinate variables to 6-DOF pose is a non-trivial, non-linear transformation. By making a “far-field” approximation (that the object’s extent is much smaller than its distance from the camera) we can derive closed-form formulas that accomplish this transformation extremely efficiently.
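As an illustration of the far-field idea (not the article’s actual closed-form formulas), the range can be read off from apparent size alone: far from the camera, an object of real extent `h` subtends roughly `h / r` radians, so a camera scale constant converts a pixel separation into a range. The names `h` and `k` here are assumptions for the sketch.

```python
import math

def estimate_range(p1, p2, h, k):
    """Estimate range to an object under the far-field approximation.

    p1, p2 : (x, y) image positions, in pixels, of two points on the object
    h      : known real-world separation of those two points
    k      : camera scale constant, in pixels per radian (hypothetical name)

    Far from the camera the pair subtends about h / r radians, which
    appears as k * h / r pixels, so r ≈ k * h / pixel_separation.
    """
    d_pixels = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    return k * h / d_pixels
```

The same proportionality is what makes the closed-form transformation possible: image-space quantities map to pose quantities without any iterative solving.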

These formulas provide the position in spherical coordinates ( $\phi, \theta, r$) and the orientation as a quaternion ( $a + b \boldsymbol{i} + c \boldsymbol{j} + d \boldsymbol{k}$). The video below demonstrates this positioning system in action. It uses 4 LEDs rather than 3. Within this set of 4 LEDs, there are four different triples of LEDs that can each be positioned using the technique above. This redundancy gives the system additional reliability, in case it temporarily loses track of one of the LEDs.
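The redundancy from the four triples can be sketched as follows, with a hypothetical `solve_pose` function standing in for the closed-form formulas: enumerate every triple of currently tracked LEDs and solve each one, so losing a single LED still leaves one usable triple.

```python
from itertools import combinations

def poses_from_triples(led_points, solve_pose):
    """Solve the pose from every available triple of tracked LEDs.

    led_points : dict mapping LED id -> (x, y) image position; LEDs that
                 have been lost are simply absent from the dict
    solve_pose : function taking three image points and returning a pose
                 (hypothetical stand-in for the closed-form formulas)

    With all 4 LEDs tracked there are C(4, 3) = 4 triples; with one LED
    lost, one triple remains.
    """
    tracked = sorted(led_points)
    return [solve_pose(*(led_points[i] for i in triple))
            for triple in combinations(tracked, 3)]
```

The per-triple poses could then be cross-checked or averaged for extra robustness, though the article does not say how the system combines them.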

### 3 Responses to “6DoF Visual Positioning”

• Wang Yu says:

nice~

• Max DeVos says:

Could you help me understand how the coordinates are returned? I understand the quaternion, but I don’t understand how all three spherical coordinates are returned, and I don’t understand how “r” is found. Could you, or someone, please help me with this.

Anything is appreciated,
-Max DeVos

• admin says:

‘h’ is the known length reflecting the size of your target/jig, i.e. the real-world distance separating the points/LEDs nearest to each other. (See definitions of v1, v2, v3.)

Multiply this ‘h’ by ‘r/h’ (from third formula) to get r.

‘phi’ and ‘theta’ come directly from the image-space position of point 2 (the blue point in the diagram), but scaled by a camera-determined constant reflecting the camera’s ‘zoom’. You can get this constant by comparing pixel distances with real-world radians. You could also calibrate/refine it afterward by comparing the ‘r’ you get from these formulae with the actual range, and adjusting accordingly until they match.

Point 2 is essentially the object we are positioning, with points 1 and 3 being there to provide the info we need to calculate range (to point 2) and object orientation (about point 2).
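The steps in the reply above can be sketched in a few lines, assuming a camera with principal point `(cx, cy)` and a calibration constant `k` in pixels per radian (names are hypothetical, chosen for the sketch):

```python
def recover_position(p2, cx, cy, k, h, r_over_h):
    """Recover the spherical position (phi, theta, r) of point 2.

    p2       : (x, y) pixel position of point 2 in the image
    cx, cy   : pixel position of the image centre (principal point)
    k        : camera constant, pixels per radian, from calibration
    h        : known real-world separation of the nearest points/LEDs
    r_over_h : the dimensionless 'r/h' ratio from the third formula

    phi and theta come from point 2's offset from the image centre,
    scaled from pixels to radians; r is h multiplied by the r/h ratio.
    """
    phi = (p2[0] - cx) / k
    theta = (p2[1] - cy) / k
    r = h * r_over_h
    return phi, theta, r
```

This is only the position half; the orientation comes separately as the quaternion, with points 1 and 3 supplying the information about point 2 that the sketch takes as given.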