Turning Apple Watch into Keynote presenter with Deep Learning

by Viktor Zamaruiev

The other day I was trying to hit a “perfect week” goal with my Apple Watch. It’s amazing how this device has become a big thing for me, even though, frankly speaking, I’m not the fitness kind of person. I try to close all the rings every day, commit to all the challenges, and rely on the watch to determine which type of sports activity I perform.

Such usage of wearable devices is not something new. In 2014, I took a data science course in which we used accelerometer and gyroscope data collected from smartphones to predict the subject's activity. The data was collected in 2012 from 30 subjects between 19 and 48 years old, each performing one of six standard activities while wearing a waist-mounted smartphone that recorded the movement data. Today, you can find this dataset on multiple websites, notably on Kaggle as ‘Human Activity Recognition with Smartphones’.


It was quite a fun experience. The model’s accuracy was around 95% with a simple tree ensemble. In 2020, neural networks demonstrate state-of-the-art results in almost all areas, including time-series data from IoT sensors. Recurrent neural networks allow us to achieve incredible accuracy, predicting not only overall activities but individual gestures as well.

Nowadays, I have fewer fun things to do and a lot of grown-up things, like reports, presentations and tech talks. When I am on a stage or at a whiteboard, I typically gesticulate a lot. So I thought, “Why not embrace this trait and make the presentation process less boring?” What if I wear a magic hat and control a presentation using gestures only? Can we marry an Apple Watch, Deep Learning, finger-snapping and a presentation? Let’s see how it works.

Overview

We will start with a simple architecture: the watch streams IoT sensor data over the network to a laptop, where a neural network is responsible for gesture recognition. Once that’s ready, I’ll show how to train the model in the cloud using AWS SageMaker and run it on the edge, directly on your phone.
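The transport between the watch and the laptop is not the interesting part, so here is only a rough watch-side sketch, assuming a plain WebSocket endpoint on the laptop; the URL, payload format and batch size are illustrative assumptions, not the actual protocol from the repo.

import Foundation

// Hypothetical sketch: batch accelerometer samples and stream them to the
// laptop over a WebSocket. The endpoint URL and JSON payload are assumptions.
struct AccelerometerSample: Codable {
    let timestamp: TimeInterval
    let x: Double
    let y: Double
    let z: Double
}

final class SampleStreamer {
    private let socket: URLSessionWebSocketTask
    private var buffer: [AccelerometerSample] = []

    init(endpoint: URL) {
        socket = URLSession.shared.webSocketTask(with: endpoint)
        socket.resume()
    }

    // Collect samples and flush them in small batches to limit network chatter.
    func append(_ sample: AccelerometerSample) {
        buffer.append(sample)
        guard buffer.count >= 16 else { return }
        let batch = buffer
        buffer.removeAll()
        if let data = try? JSONEncoder().encode(batch) {
            socket.send(.data(data)) { error in
                if let error = error { print("send failed: \(error)") }
            }
        }
    }
}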


To begin with, I will use just one gesture: a finger snap that moves Keynote to the next slide. Later on, we can extend the list, combining finger snaps and hand waves to navigate back and forth in the presentation.

Data analysis

As is usually the case in the real world, I started with an idea that needed to be proven with data. I wanted to check whether finger snaps are visible in the data and can be differentiated from other gestures, so I hastened to prototype a watch application that collects accelerometer data. I’m not proud of my Swift skills, but for the sake of consistency, you can find the source code on GitHub.

Accelerometers measure acceleration, the rate of change of velocity, along the x, y and z axes.


In the iOS world, CMMotionManager reports movement detected by the device's onboard sensors. It provides access to accelerometer, gyroscope, magnetometer and device-motion data. To keep things simple, I’ll only be using the accelerometer data at the beginning. You might also want to try device-motion data, which offers a simple way to get motion-related data for your app: raw accelerometer and gyroscope data must be processed to remove bias from other factors, such as gravity, and the device-motion service does this processing for you, giving you refined data that you can use right away.

The frequency of the data updates depends on the device, but typically it’s capped at 100 Hz. A higher frequency means better resolution, but it also results in larger volumes of data and higher battery usage, so we need to find a balance that allows us to identify snaps with a reasonable volume of data.
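To make the setup concrete, here is a minimal sketch of reading the accelerometer at 50 Hz via CMMotionManager; the handler here just prints the values, whereas a real app would buffer them for streaming and labeling.

import CoreMotion

// Minimal sketch: read raw accelerometer data at 50 Hz with CMMotionManager.
let motionManager = CMMotionManager()

func startAccelerometerUpdates() {
    guard motionManager.isAccelerometerAvailable else { return }
    motionManager.accelerometerUpdateInterval = 1.0 / 50.0  // 50 Hz

    motionManager.startAccelerometerUpdates(to: OperationQueue()) { data, _ in
        guard let a = data?.acceleration else { return }
        // Acceleration is reported in g along the x, y and z axes.
        print(a.x, a.y, a.z)
    }
}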

Chart 1: finger snaps recorded at various sampling frequencies (25 Hz, 50 Hz and 100 Hz)

An average finger snap takes 100-150 ms. At 50 Hz, that gives us 5-8 data points specific to the snap. A rolling window of 16 points, used for model training, guarantees that any snap fits within the window and that we have enough data on both edges to identify its characteristic points.
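As an illustration, maintaining such a 16-point rolling window over the live stream could look like the sketch below; the struct name and the stride of one sample per step are assumptions made purely for illustration.

// Hypothetical sketch: keep the latest 16 samples and hand a full window
// to the classifier every time a new sample arrives.
struct RollingWindow<Sample> {
    let size: Int
    private var buffer: [Sample] = []

    init(size: Int = 16) {
        self.size = size
    }

    // Appends a sample and returns the current window once it is full.
    mutating func append(_ sample: Sample) -> [Sample]? {
        buffer.append(sample)
        if buffer.count > size {
            buffer.removeFirst()
        }
        return buffer.count == size ? buffer : nil
    }
}

// Usage: feed every accelerometer reading in and classify the returned window.
var window = RollingWindow<(x: Double, y: Double, z: Double)>(size: 16)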

Various snaps as they appear in the accelerometer data, recorded at 50 Hz

I started collecting the data, moving from simple cases with a static hand position to more complex scenarios where I waved my hands like a fly hunter. I soon found out that it’s hard to distinguish snaps from other movements without some kind of clue, so I adjusted the application to record an audio stream along with the accelerometer data, which simplified the markup process a lot. The audio data is used only for labelling purposes and is not included in the model.
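For reference, recording a parallel audio track takes only a few lines with AVAudioRecorder. This is a hedged sketch, not the app’s actual code: the file URL, format and audio session/permission handling are assumed or omitted, and the recording serves purely to find snap timestamps during markup.

import AVFoundation

// Hypothetical sketch: record a mono 12 kHz audio track next to the sensor
// data, purely as a labeling aid. Audio session setup and microphone
// permission handling are omitted.
func makeLabelingRecorder(to fileURL: URL) throws -> AVAudioRecorder {
    let settings: [String: Any] = [
        AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
        AVSampleRateKey: 12_000.0,
        AVNumberOfChannelsKey: 1
    ]
    let recorder = try AVAudioRecorder(url: fileURL, settings: settings)
    recorder.record()  // call stop() when the data collection session ends
    return recorder
}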

The audio signal (12 kHz / 10) aligned with the accelerometer data

Watch a live demo of the Apple Watch accelerometer data:

 
Visual labeling tool

One of the downsides of using deep learning is that you need a lot of data to cover diverse scenarios. I created a handy visual labeling tool to precisely mark up the collected data. It can easily be adapted to any time-series data and is available, along with the rest of the code, in the GitHub repo.

Control Keynote with AppleScript

Before I dived too deep into the machine learning stuff, I did a quick feasibility check and verified that it’s possible to manage Keynote slides programmatically. AppleScript allows controlling Keynote from an external application. The following snippet shows how to play the next slide in the presentation; it’s also available in my GitHub repo.

tell application "Keynote"
    activate
    try
        -- Start the presentation if it is not already playing
        if playing is false then start the front document
    end try
    tell the front document
        -- Advance only if we are not on the last slide yet
        if slide number of current slide is less than slide number of last slide then
            show next
        end if
    end tell
end tell

There is a macOS command-line tool, osascript, that runs such a script from the shell; it needs to be authorized to control Keynote on its first invocation. You can find more information and additional examples on iWork Automation.
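On the laptop side, whichever process runs the model only needs to shell out to osascript when a snap is detected. Here is a minimal Swift sketch; the script file name is a placeholder for wherever you saved the snippet above.

import Foundation

// Minimal macOS sketch: run the AppleScript above via osascript when the
// model reports a snap. The script path is a placeholder.
func advanceKeynoteSlide() {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/osascript")
    process.arguments = ["next_slide.applescript"]
    do {
        try process.run()
        process.waitUntilExit()
    } catch {
        print("Failed to run osascript: \(error)")
    }
}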

Conclusion

In the next part, I’ll show you how to prepare a training dataset, train a Bi-directional Long Short-Term Memory (Bi-LSTM) neural network to achieve 99% accuracy, and use it in an end-to-end setup to control a Keynote presentation from an Apple Watch using gestures.

Viktor Zamaruiev
Head of Data Science at Star

Viktor has over 15 years of professional experience in IT, revolving around solving business challenges with data. At Star, Viktor has worked on a number of projects as a software engineer, software architect, data scientist, technical project manager, solution architect and data engineer. Today all these roles are blended together in the Head of Data Science title. Viktor grows Star’s data science team and actively participates in customer and internal R&D projects.
