Crowdsourced Visual Sensing

The Internet of Things is practically littered with sensors—small, purpose-built devices that are exceptionally good at detecting a particular aspect of their environment, like temperature or motion. But turning IoT sensor data into useful information (How cold is it outside? How fast am I going?) requires some kind of program that knows, or can be configured to know, what the data means and how it should be interpreted. Wouldn’t it be nice if instead you could just, y’know, ask a question?

That’s the idea behind Zensors, a proof-of-concept project from computer science grad students and professors at Carnegie Mellon University. Zensors repurposes unused smartphones as camera-equipped visual sensors, and uses a combination of crowdsourcing and machine learning to answer natural-language questions about what the camera sees. The work was presented at the CHI ’15 conference on human-computer interaction in Seoul earlier this month.

In a live demonstration, PhD student Gierad Laput showed how the Zensors app lets users mark an area of interest in a camera’s field of view by drawing a circle with their fingertip, then type a question in plain English about what’s happening in that part of the image. You might point your camera out the window and ask “How many cars are in the parking lot?” or “Is it snowing?” Or you might point it at a room to ask things like “Where’s the dog?” or “How large is the pile of dishes on the kitchen counter?”

The CMU team knew that answering any one of these questions with a specially built machine vision application could cost thousands of dollars. Instead, they turned to crowdsourcing through Amazon’s Mechanical Turk platform, which offers small payments to human workers who complete simple on-screen tasks.

Humans have the advantage of being able to immediately understand the question being asked and make a quick visual assessment. With reasonably frequent camera readings, a week of Mechanical Turk image assessments might cost only a few dollars. At the same time, the Zensors platform learns from the human evaluations, and can take over as soon as its own accuracy is up to snuff. Then it’s just a matter of having humans check in occasionally to make sure the algorithm remains accurate over time and adapts to changing conditions (if you teach it to count cars in a parking lot during nice weather, the first snowfall may throw off the data).
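The handoff described above can be sketched in a few lines of Python. This is purely illustrative — the function and class names are assumptions, not the actual Zensors implementation — but it captures the loop: pay crowd workers for answers, train a model on those answers, switch to the model once it agrees with the crowd often enough, and keep occasional human spot-checks to catch drift.

```python
def answer_stream(images, crowd_label, model, threshold=0.95,
                  window=20, spot_check_every=50):
    """Yield (image, answer, source) triples, switching from crowd
    labels to model predictions once agreement passes the threshold.
    All names here are hypothetical, not the real Zensors API."""
    labeled = []      # (image, crowd answer) pairs used for training
    agreements = []   # rolling record of model-vs-crowd agreement
    use_model = False
    for i, img in enumerate(images):
        if not use_model:
            answer = crowd_label(img)            # paid human evaluation
            labeled.append((img, answer))
            model.train(labeled)
            agreements.append(model.predict(img) == answer)
            recent = agreements[-window:]
            if len(recent) == window and sum(recent) / window >= threshold:
                use_model = True                 # accuracy is up to snuff
            yield img, answer, "crowd"
        elif i % spot_check_every == 0:
            # occasional human check guards against drift
            # (e.g. the first snowfall in the parking lot)
            answer = crowd_label(img)
            if model.predict(img) != answer:
                labeled.append((img, answer))
                use_model = False                # retrain on new conditions
            yield img, answer, "crowd-check"
        else:
            yield img, model.predict(img), "model"
```

The economics follow directly: humans are only paid during the initial labeling phase and for the periodic spot-checks, while the model handles the bulk of the readings for free.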

A phone or wireless camera running the Zensors app can answer multiple questions about a single image, and privacy-conscious users can use blurs and other filters to obscure parts of the image that are sent off to the crowd. The platform also includes a rules engine, so the answer to each question you ask it can trigger text or email alerts or prompt behaviors in other apps and connected devices.
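A rules engine of the kind described is conceptually simple: each rule ties a question to a condition on its answer and an action to fire when the condition holds. The sketch below is an assumption about how such a system could be structured, not the actual Zensors code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Rule:
    question: str                     # e.g. "Is it snowing?"
    condition: Callable[[Any], bool]  # test applied to each new answer
    action: Callable[[Any], None]     # e.g. send a text or email alert

@dataclass
class RulesEngine:
    rules: list = field(default_factory=list)

    def on_answer(self, question: str, answer: Any) -> None:
        """Called whenever the crowd or model produces a new answer;
        fires the action of every matching rule whose condition holds."""
        for rule in self.rules:
            if rule.question == question and rule.condition(answer):
                rule.action(answer)
```

Hooking it up might look like:

```python
alerts = []
engine = RulesEngine()
engine.rules.append(Rule(
    "Is it snowing?",
    condition=lambda a: a == "yes",
    action=lambda a: alerts.append("Snow alert!"),
))
engine.on_answer("Is it snowing?", "yes")  # appends "Snow alert!" to alerts
```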

Of course, it’s not a perfect system. “Some sensors are just not going to work,” Laput said in the CHI presentation. “Subjective questions tend to yield very poor accuracy.” Teaching a computer to recognize an orderly line at a cashier’s station, for instance, is a lot harder than teaching it to count the number of people waiting.

There are plenty of uses for visual sensors, from home security and proximity/presence detection to more sophisticated machine vision applications, and there are plenty of IoT products offering these capabilities to varying extents. If Zensors can be commercialized, it would offer a simple and user-friendly alternative that reuses hardware—old phones—which many consumers already have lying around the house. To learn more, read through the full research paper or have a look at the video below.

Related: JanOS, Placemeter
