Sleep tracking devices are everywhere. Oura Ring claims to measure sleep stages and HRV. Apple Watch tracks sleep duration and consistency. Whoop measures sleep quality and recovery. They're all using heart rate, heart rate variability, and sometimes temperature sensors to estimate what's happening in your brain while you sleep.

The problem: the only way to truly measure sleep stages is with electroencephalography (EEG), which requires electrodes on your scalp. Everything else is an estimate based on proxy signals. How good are these estimates?

The validation problem

Several papers have tested wearables against laboratory sleep studies that used actual EEG. The results are consistent: wearables are pretty good at detecting whether you're asleep or awake, but much worse at distinguishing between sleep stages.

A 2018 study in JAMA comparing Fitbit to polysomnography found that Fitbit overestimated sleep duration by an average of 45 minutes per night and was poor at detecting time spent awake. More recent studies of other wearables show similar patterns: high accuracy for total sleep duration (within 10–15%), moderate accuracy for REM vs. non-REM, and poor accuracy for deep sleep specifically.
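
To see how a device can score well on overall sleep/wake accuracy and still be poor at detecting wake, it helps to look at how validation studies typically score agreement: epoch by epoch against the PSG labels. The sketch below is illustrative only. The function name, the "S"/"W" labels, and the night of data are all made up for the example and do not come from any of the studies mentioned.

```python
def epoch_agreement(psg, device):
    """Compare per-epoch labels ("S" = sleep, "W" = wake) from PSG and a device.

    PSG is treated as ground truth. Both inputs are hypothetical label lists,
    one entry per scoring epoch (commonly 30 seconds).
    """
    if not psg or len(psg) != len(device):
        raise ValueError("need two non-empty label lists of equal length")

    pairs = list(zip(psg, device))
    sleep_as_sleep = sum(1 for p, d in pairs if p == "S" and d == "S")
    wake_as_wake = sum(1 for p, d in pairs if p == "W" and d == "W")
    wake_as_sleep = sum(1 for p, d in pairs if p == "W" and d == "S")
    sleep_as_wake = sum(1 for p, d in pairs if p == "S" and d == "W")

    true_sleep = sleep_as_sleep + sleep_as_wake
    true_wake = wake_as_wake + wake_as_sleep
    return {
        "accuracy": (sleep_as_sleep + wake_as_wake) / len(pairs),
        # Sensitivity: share of true sleep epochs the device called sleep.
        "sensitivity": sleep_as_sleep / true_sleep if true_sleep else float("nan"),
        # Specificity: share of true wake epochs the device called wake.
        "specificity": wake_as_wake / true_wake if true_wake else float("nan"),
        # Positive value means the device credits more sleep than PSG did.
        "extra_sleep_epochs": wake_as_sleep - sleep_as_wake,
    }


# Made-up night: PSG scores 90 sleep epochs and 10 wake epochs.
# The device gets every sleep epoch right but misses 7 of the 10 wake epochs.
psg_labels = ["S"] * 90 + ["W"] * 10
device_labels = ["S"] * 97 + ["W"] * 3
result = epoch_agreement(psg_labels, device_labels)
print(result)
```

Because most of the night is sleep, a device that labels nearly everything as sleep still reaches 93% accuracy here while catching only 30% of wake epochs, and the missed wake shows up as overestimated sleep duration.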

Oura Ring (2021 and earlier models)

Oura uses infrared photoplethysmography (PPG) sensors to measure heart rate and HRV, plus a skin temperature sensor and an accelerometer. In head-to-head studies against PSG (polysomnography, the clinical gold standard), Oura shows:

  • Sleep/wake detection: ~85% accurate
  • Sleep duration: Usually within 10–20 minutes of actual
  • REM vs. deep sleep: Moderate accuracy, often overestimating REM
  • Deep sleep specifically: Frequently overstated

The ring's advantage is the non-intrusive form factor, which means more consistent wearing and fewer movement artifacts. Its main limitation is that it can't directly measure brain activity, so it infers sleep stages from heart rate, HRV, movement, and temperature patterns.

Whoop (2019 generation and later)

Whoop uses a wrist-worn strap (it can also be worn on the upper arm) with continuous optical HR and HRV sampling. In smaller validation studies, Whoop shows:

  • Sleep/wake accuracy: ~88–90% in controlled settings
  • Sleep duration: Usually within 5–15 minutes
  • Sleep stage accuracy: Better than Oura on some measures, but still limited by the absence of EEG

Whoop's advantage is continuous, high-frequency sampling (100+ Hz), which can catch transient HRV changes. The disadvantage is the strap itself, which can slip or chafe.

Apple Watch (all generations through Series 5)

Apple didn't ship native sleep tracking until watchOS 7, which launched alongside Series 6 in 2020, so validation data on earlier models is limited. The Series 5 and earlier use standard accelerometers and optical HR sensors. Studies using Series 5 and earlier typically show:

  • Sleep/wake detection: ~80–85%
  • Sleep duration: Often overestimated by 20–60 minutes because the accelerometer misses quiet wakefulness and counts it as sleep
  • Sleep stages: Not provided natively in this period; Apple added stage estimates later, in watchOS 9 (2022)

The watch is convenient, since many people already own one, but less accurate than dedicated sleep trackers. Movement detection is decent, but HRV is sampled only intermittently in the background on these models, which limits sleep stage inference.

What these trackers actually measure well

  • Consistency: All of them are reasonably good at tracking whether you got more or less sleep night-to-night, which is useful for spotting trends.
  • Total duration: Within 15–20 minutes most nights.
  • Gross sleep/wake: Which hours you were asleep vs. awake, at a coarse level.
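
Since the trend matters more than any single night, one simple way to read the data is a trailing average over a week. This is a minimal sketch, assuming you have exported nightly totals as a plain list of hours. The function name, the 7-night window, and the sample values are illustrative choices, not anything a particular tracker's app does.

```python
from statistics import mean


def rolling_mean(values, window=7):
    """Trailing mean over the last `window` nights.

    Returns None for nights that do not yet have a full window behind them.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(mean(values[i + 1 - window : i + 1]))
    return smoothed


# Made-up nightly sleep totals in hours, oldest first.
nightly_hours = [7.0, 6.5, 7.5, 6.0, 8.0, 7.0, 6.5, 6.0, 6.0, 5.5]
trend = rolling_mean(nightly_hours, window=7)
print(trend)
```

A per-night error of 15–20 minutes mostly averages out over a week, so a sustained drop in the smoothed value is more likely to be real than any one short night.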

What they measure poorly

  • Deep sleep duration: Routinely overestimated across all devices. A claim of "2 hours of deep sleep" is often inflated.
  • REM sleep: Better than deep sleep, but still subject to error. Wearables often see HRV spikes and attribute them to REM when they actually reflect brief awakenings or movement.
  • Wake time within sleep: All devices struggle to distinguish brief arousals from sustained wakefulness.

The clinical significance

For someone diagnosing sleep disorders, a wearable's data is almost useless; a clinical sleep study with EEG is required. For someone trying to optimize personal sleep health, a wearable's trends are moderately useful as long as you don't over-interpret the stage data.

If your Oura Ring reports 1.5 hours of deep sleep and your Apple Watch reports 1 hour, the difference is probably noise, not signal.
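
If you wear two devices, one rough way to judge whether a given gap is noise is to look at how much they disagree across many nights, in the style of a Bland-Altman analysis: the mean difference (bias) and the range within which about 95% of night-to-night differences would fall if they were roughly normally distributed. The sketch below uses made-up deep sleep minutes; neither device is ground truth, so this only describes how the two disagree with each other.

```python
from statistics import mean, stdev


def limits_of_agreement(device_a, device_b):
    """Return (bias, lower, upper) for paired nightly readings.

    bias is the mean of (a - b); lower and upper are bias -/+ 1.96 standard
    deviations of the differences. Needs at least two paired nights.
    """
    if len(device_a) != len(device_b) or len(device_a) < 2:
        raise ValueError("need at least two paired readings of equal length")
    diffs = [a - b for a, b in zip(device_a, device_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical deep sleep minutes for the same five nights on two devices.
ring_minutes = [90, 75, 110, 60, 95]
watch_minutes = [60, 85, 70, 80, 65]
bias, lower, upper = limits_of_agreement(ring_minutes, watch_minutes)
print(f"bias={bias:.1f} min, limits=({lower:.1f}, {upper:.1f}) min")
```

With these invented numbers, a 30-minute gap on a single night sits well inside the limits, which is the situation described above: the two devices routinely disagree by that much.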

What to actually use them for

  1. Track total sleep duration night-to-night. This is reasonably accurate and genuinely useful.
  2. Spot patterns. Do you sleep more on nights after exercise? The tracker will probably detect this correctly.
  3. Monitor HRV and resting HR trends. This is one thing wearables can measure directly (no stage inference needed), and it's a useful biomarker.
  4. Ignore the deep sleep minutes. Treat stage-specific numbers with heavy skepticism.
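
For item 3, it can help to know what an HRV number is computed from. A common time-domain metric is RMSSD, the root mean square of successive differences between beat-to-beat (RR) intervals. The sketch below assumes you already have clean RR intervals in milliseconds; the sample values are made up, and different devices may report a different HRV metric or apply their own filtering.

```python
import math
from statistics import mean


def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_heart_rate(rr_ms):
    """Average heart rate in beats per minute from RR intervals in ms."""
    return 60000 / mean(rr_ms)


# Made-up RR intervals in milliseconds.
rr_intervals = [800, 810, 790, 805, 795]
hrv = rmssd(rr_intervals)
hr = mean_heart_rate(rr_intervals)
print(f"RMSSD={hrv:.1f} ms, HR={hr:.0f} bpm")
```

Unlike sleep stages, both values come straight from the measured heartbeat timing, which is why their trends deserve more trust than stage minutes.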

The best sleep tracker is one you'll actually wear consistently. If an Oura Ring or Apple Watch motivates you to notice sleep patterns and adjust your schedule accordingly, it's worth it. Just don't mistake a proxy-based estimate for high-resolution truth.