Pose Estimation to Detect People in Scenes | Fritz AI

Pose Estimation

Pose Estimation identifies and tracks the position of a person's body. Using the Fritz Vision API, app developers can build AI-powered coaches for sports and fitness, immersive AR experiences, and more.

Getting Started

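import Fritz

// Create the pose estimation model once and reuse it for every frame.
let poseModel = FritzVisionPoseModel()

// sampleBuffer is the CMSampleBuffer delivered by your camera capture delegate.
let image = FritzVisionImage(buffer: sampleBuffer)

poseModel.predict(image) { result, error in
  guard error == nil, let poseResult = result else { return }

  // Overlays the detected pose on the input image
  let imageWithPose = poseResult.drawPose()
  // Display imageWithPose in your UI
}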


The Swift sample above shows how little code it takes to add pose estimation to your app. The Fritz AI documentation provides additional code samples and tutorials to help you get started.

Support for Unity

We support 2D human pose estimation in Unity. Gaming and AR app developers can now create immersive experiences by integrating Unity's powerful framework with Fritz AI. For more information, visit our docs.


Detect 17 Body Parts

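Coordinates for 17 keypoints (body parts) are provided for each detected skeleton.

Our mobile-friendly model was trained on COCO, a large-scale dataset that includes human keypoint annotations. It predicts the following body parts:

  • Nose
  • Eyes (left and right)
  • Ears (left and right)
  • Shoulders (left and right)
  • Elbows (left and right)
  • Wrists (left and right)
  • Hips (left and right)
  • Knees (left and right)
  • Ankles (left and right)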
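If you want to work with these keypoints in your own code, one way to model a detected skeleton is sketched below. The `BodyPart`, `Keypoint`, and `Skeleton` types, the normalized 0...1 coordinates, and the 0.5 confidence cutoff are hypothetical illustrations, not types or defaults from the Fritz SDK; the enum simply follows the standard COCO keypoint order.

```swift
/// The 17 COCO keypoints in standard COCO annotation order.
/// Hypothetical type for illustration, not the Fritz SDK's own enum.
enum BodyPart: Int, CaseIterable {
    case nose, leftEye, rightEye, leftEar, rightEar
    case leftShoulder, rightShoulder, leftElbow, rightElbow
    case leftWrist, rightWrist, leftHip, rightHip
    case leftKnee, rightKnee, leftAnkle, rightAnkle
}

/// One detected keypoint: assumed normalized (0...1) coordinates plus a confidence score.
struct Keypoint {
    let part: BodyPart
    let x: Double
    let y: Double
    let confidence: Double
}

/// One detected person: up to 17 keypoints.
struct Skeleton {
    let keypoints: [Keypoint]

    /// Returns the keypoint for `part` if it was detected with at least `minConfidence`.
    func keypoint(_ part: BodyPart, minConfidence: Double = 0.5) -> Keypoint? {
        keypoints.first { $0.part == part && $0.confidence >= minConfidence }
    }
}

// Usage: look up the left wrist, ignoring low-confidence detections.
let skeleton = Skeleton(keypoints: [
    Keypoint(part: .leftWrist, x: 0.42, y: 0.61, confidence: 0.87),
    Keypoint(part: .rightWrist, x: 0.58, y: 0.60, confidence: 0.21),
])
if let wrist = skeleton.keypoint(.leftWrist) {
    print("Left wrist at (\(wrist.x), \(wrist.y))")
}
```

Filtering by confidence at lookup time keeps low-quality detections (for example, an occluded wrist) from driving app logic.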

Single and Multi-pose

Track one person or several: both single-pose and multi-pose estimation are supported.
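The difference between the two modes can be sketched as a post-processing step over scored detections: single-pose keeps only the most confident person, while multi-pose keeps everyone above a threshold. The `Pose` type, the 0.5 threshold, and the five-person cap are assumptions for illustration, not the SDK's API.

```swift
/// A hypothetical detected person: an identifier and an overall pose score in 0...1.
struct Pose {
    let id: Int
    let score: Double
}

/// Single-pose mode: keep only the most confident person, or nil if nobody clears the threshold.
func singlePose(from candidates: [Pose], minScore: Double = 0.5) -> Pose? {
    candidates
        .filter { $0.score >= minScore }
        .max { $0.score < $1.score }
}

/// Multi-pose mode: keep everyone above the threshold, most confident first, capped at `maxPoses`.
func multiPose(from candidates: [Pose], minScore: Double = 0.5, maxPoses: Int = 5) -> [Pose] {
    Array(
        candidates
            .filter { $0.score >= minScore }
            .sorted { $0.score > $1.score }
            .prefix(maxPoses)
    )
}

let detections = [Pose(id: 1, score: 0.9), Pose(id: 2, score: 0.3), Pose(id: 3, score: 0.7)]
print(singlePose(from: detections)?.id ?? -1)     // 1
print(multiPose(from: detections).map { $0.id })  // [1, 3]
```

The count returned by `multiPose` corresponds to the "number of people detected" output, under these assumed thresholds.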

Model Variants

Fast: Optimized for speed; best for processing video streams in real time or on older devices.

Accurate: Optimized for accuracy, for apps where prediction quality matters more than speed.

Small: Optimized for size, to keep your app bundle small and conserve bandwidth.
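One way to turn the variant guidance into an app-side decision is a small selection function. The `ModelVariant` enum, the `DeploymentNeeds` fields, and the priority order (size first, then speed, otherwise accuracy) are hypothetical; adjust them to your own constraints.

```swift
/// Hypothetical mirror of the three model variants.
enum ModelVariant {
    case fast, accurate, small
}

/// Hypothetical description of an app's constraints.
struct DeploymentNeeds {
    var processesLiveVideo = false
    var targetsOlderDevices = false
    var bundleSizeIsCritical = false
}

/// Picks a variant using an assumed priority order: size first, then speed, else accuracy.
func chooseVariant(for needs: DeploymentNeeds) -> ModelVariant {
    if needs.bundleSizeIsCritical { return .small }
    if needs.processesLiveVideo || needs.targetsOlderDevices { return .fast }
    return .accurate
}

print(chooseVariant(for: DeploymentNeeds(processesLiveVideo: true)))  // fast
print(chooseVariant(for: DeploymentNeeds()))                          // accurate
```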

Runs On-Device

All predictions (model inferences) run entirely on-device.

No internet connection is required to process images or video.

With no network round trip, results arrive with low latency.

Cross-Platform SDKs

Supported mobile platforms:

  • Android Pose Estimation
  • iOS Pose Estimation
  • Unity Pose Estimation

Live Video Performance

Runs on live video at high frame rates.

Exact FPS varies by device, but modern mobile devices can run this feature on live video.
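A common way to keep live video smooth when inference is slower than the camera is to drop frames that arrive while a prediction is still running, rather than queueing them. The `FrameThrottler` class below is a hypothetical helper, not part of the Fritz SDK; it is a sketch of that frame-dropping pattern only.

```swift
import Foundation

/// Hypothetical helper (not part of the Fritz SDK): allows one prediction in flight
/// at a time and counts the camera frames dropped while the model is busy.
final class FrameThrottler {
    private let lock = NSLock()
    private var isBusy = false
    private var dropped = 0

    /// Number of frames skipped so far because a prediction was still running.
    var droppedFrames: Int {
        lock.lock()
        defer { lock.unlock() }
        return dropped
    }

    /// Call for every incoming frame; returns true if this frame should be sent to the model.
    func tryBegin() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if isBusy {
            dropped += 1
            return false
        }
        isBusy = true
        return true
    }

    /// Call from the prediction's completion handler so the next frame can be processed.
    func end() {
        lock.lock()
        defer { lock.unlock() }
        isBusy = false
    }
}

// Simulated sequence: frame 1 starts a prediction, frame 2 arrives while it is running.
let throttler = FrameThrottler()
let first = throttler.tryBegin()    // true: model is idle
let second = throttler.tryBegin()   // false: frame dropped
throttler.end()                     // prediction for frame 1 finished
let third = throttler.tryBegin()    // true: model is idle again
print(first, second, third, throttler.droppedFrames)  // true false true 1
```

In a camera delegate you would skip the frame when `tryBegin()` returns false and call `end()` in the prediction's completion handler. On the capture side, `AVCaptureVideoDataOutput`'s `alwaysDiscardsLateVideoFrames` property provides a related form of frame dropping.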

Technical Specifications


Architecture

Uses a MobileNet backbone

Model Size

~2 MB


1,600 M

Input

353x257-pixel image

Output

  • Position of each person and body part detected
  • Number of people detected
  • The confidence associated with each detection

Formats

Core ML, TensorFlow, TensorFlow Mobile, TensorFlow Lite, Keras

Benchmarks

  • 25 FPS on iPhone X
  • 7 FPS on Pixel 2
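Two small calculations help when working with these specifications. The first maps a point from model-input pixels back to the source frame, assuming "353x257" means width x height and that frames are stretched (not letterboxed or cropped) to that size; neither detail is stated in the specifications, so check the SDK documentation before relying on it. The second converts a frame rate into a per-frame time budget: 25 FPS leaves 40 ms per frame, and 7 FPS about 143 ms. The `Size` type and both function names are hypothetical.

```swift
/// Width x height in pixels (hypothetical helper type).
struct Size {
    let width: Double
    let height: Double
}

extension Size {
    /// ASSUMPTION: "353x257" read as width x height.
    static let poseModelInput = Size(width: 353, height: 257)
}

/// Maps a point from model-input pixels back to source-frame pixels.
/// ASSUMPTION: the frame was stretched (not letterboxed or cropped) to the input size.
func toSourcePixels(x: Double, y: Double, source: Size,
                    input: Size = .poseModelInput) -> (x: Double, y: Double) {
    (x * source.width / input.width, y * source.height / input.height)
}

/// Time available to process each frame, in milliseconds, at a given frame rate.
func frameBudgetMilliseconds(fps: Double) -> Double {
    1000.0 / fps
}

let frame = Size(width: 1920, height: 1080)
let center = toSourcePixels(x: 176.5, y: 128.5, source: frame)
print(center)                            // (x: 960.0, y: 540.0)
print(frameBudgetMilliseconds(fps: 25))  // 40.0
```

If the SDK letterboxes instead of stretching, the mapping must also subtract the padding offset before scaling.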