Secure IoT Data Stream MGMT: Interview with PubNub co-founder and CEO Todd Greene
First off, can you give us a little background on PubNub?
PubNub is a Data Stream Network. We are focused on all types of applications on the Internet that need to consume or generate data streams.
It’s a relatively broad, abstract concept, but if you drill it down into different industry verticals, it turns out that we’ve been pretty well-timed in the market. More and more applications, both in and out of the Internet of Things (IoT), are ones where there are streams of data coming off of devices and streams of data that devices are needing to consume. Devices used to send data once in a while or request data once in a while from a server, but now there’s this constant emitting of streams of data.
What we wanted to do was build a global network—like a Content Delivery Network, but specifically for streams of data—and offer services on the streams.
We can replicate those streams around the world, with low latency, and deliver them to everything from connected car applications, to Smart Home applications, to things outside of IoT like multi-player games, financial services applications and telecom applications.
It’s based on this idea of leaving connections open to devices and having a really scalable way of synchronizing and communicating between those devices. That was the core philosophy behind PubNub: Can we create this network in a way that many different kinds of companies can leverage it in an easy, scalable way?
That’s played out really well as we passed a thousand customers, using us to the tune of about 200 million devices a month now. We’re averaging about 3 million transactions a minute. So a lot of data is going through our system.
What kinds of devices and data is PubNub designed for? Is there a “sweet spot” in terms of the amount, or type of device data?
We wanted to provide a single network that could work for a lot of different use cases, so we specifically tried to design PubNub in a way that could support any kind of device. In fact, we have a 10-person team just focused on creating software development kits (SDKs) for different kinds of devices. Whether for a Python server, or a PIC32 chipset, or an Atmel-based Wink device, whatever kind of embedded or large-scale device you have, we’ve got an SDK that works.
We have some customers who are sending one message a day per device and some customers who are doing 1.5 megabits per second. It’s pretty broad, in that you don’t have to have different kinds of solutions for these different streaming applications. In the same way that you can use a browser for a request-response application, whether it’s one hit a day or a million, we wanted the same thing for streaming applications.
So there really hasn’t been a specific sweet spot, but as you look across different industries you start to see patterns, like what gets done in financial services versus what gets done in connected car, versus Smart Home, and so on.
What does the device footprint for PubNub look like?
We put in a lot of time making sure we can support really tiny devices and are way under 4KB of memory. What’s interesting is that we require a device to have an IP address and at the end of the day, it’s just speaking TCP—so theoretically any device can connect to PubNub, which is absolutely the case.
In order to make it easy for the developer, we provide out of the box a lot of SDKs and the source code for SDKs to make them very easy to use. That publish-subscribe source code that’s on our homepage, we want to make it as simple as that even when you’re doing an embedded chipset. So that’s where we’ve been spending all of our time, because the developers working on IoT are under the same time pressures as everyone else. They’re not going to have an infinite amount of time to do very subtle network programming to make sure the devices can reliably connect to PubNub, so we want to handle that stuff for them automatically.
PubNub is obviously good at things like distributed computing and storage. Do you see the company moving into other areas to help with specific IoT functions like permissioning and security?
A lot of what you just described we already do, in terms of having a specific security model around IoT and ways to do things like device provisioning and firmware updates through our network. In fact, if you talk to a company like Insteon, their entire device provisioning and firmware update methodology is done through PubNub.
The way we see ourselves evolving is we don’t purport to be an “IoT platform”. In any kind of new industry or new technology, when it comes out, you first see platform providers show up, and they’re great, right? They give you an A-to-Z solution with custom firmware and custom hardware design and big data solutions. But as an industry matures, you no longer want this kind of brittle, single-vendor solution. You start to see people say, “What are the layers that I need for my IoT product?” We saw the same thing in the Internet. Going back to web development, it used to be these big monolithic platforms and then it became the best free database, the best free app server, the best free web server and so on.
So PubNub is always going to remain focused on the communication layer of the IoT, and that means that we don’t do custom firmware, we don’t do hardware design, we don’t do lots of business logic. Where we sit is as that global network that connects the devices together. But what’s interesting about us is that because we have your data going through our network, and because we know a lot about that data about the devices which are connected to our network, we can offer our customers an increasing set of services on that data. For example: multiplexing that data, storing that data, filtering that data, doing stream processing on that data, even running little bits of business logic inside the streams so that you’re doing the processing in the network itself and not at the device or on your big servers. We see ourselves really focused in that area, but offering an increasing set of services. Over the next five years we expect to really blow that up in terms of what we can do with those real-time streams going across our network.
What concerns do you hear about from prospective clients at the moment?
It really depends on the vertical. The people doing connected car are very different from the people doing Smart Home, who are very different from a lot of people doing industrial IoT, and the list goes on. What I think is the biggest trend is this new growth in consumer IoT. A lot of larger consumer electronics companies are planning on making devices that will all connect to the Internet, to the tune of millions of devices. Then they really start thinking, “Gosh, how do we scale these devices and how do we make them work on global networks?” Industrially, I think you see a little bit more maturity in that space because they’ve done a lot, they’ve invested a lot, but now they’re in a world where the demands are getting higher and they’re looking for ways to improve their time-to-market.
One thing that’s ubiquitous across all of the verticals is the need for security. They want to be able to receive data from devices and control devices remotely, but they want to do those things in a secure way. That’s a need that we jumped on last fall when we realized that a lot of people were doing these crazy things trying to invent IoT security on their own. It seemed crazy to us because when you build a banking application, you don’t reinvent security. People who build banking apps know how to do security, so they don’t have to reinvent the wheel every time. So that’s another area we put a lot of focus on, to make security a de facto thing out of the box.
How does PubNub handle security?
Security in IoT really breaks down into four things. The first is data encryption, making sure that the data you’re sending out can’t be intercepted by a “man in the middle” attack; and that sounds easy, I mean, encryption has been around forever. Where it starts to get interesting is when PubNub wants to act on the data you send us. In many cases, companies don’t want us to see the data because they want it to be encrypted. So we have a really nice solution for that where we let them encrypt the data with a very strong encryption called AES. We don’t own the key so we never see the data, we can’t decrypt it even if we wanted to. Then we let them take small pieces of the data and put it outside the AES encrypted data so that we can act on it; kind of a nice hybrid model for encryption.
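As a rough illustration of that hybrid model, here is a minimal sketch of a message envelope that carries a few cleartext fields next to an opaque encrypted payload. It is not PubNub's API: the names Envelope and matching_subscribers and the filter format are hypothetical, and the AES step itself is left out, with random bytes standing in for ciphertext the customer would produce with their own key.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical message shape: cleartext metadata plus opaque ciphertext."""
    meta: dict         # small cleartext fields the network is allowed to read
    ciphertext: bytes  # AES output produced by the customer; never decrypted here


def matching_subscribers(envelope, filters):
    """Pick subscribers using only the cleartext metadata.

    filters maps a subscriber name to the metadata key/value pairs it requires.
    The ciphertext is never inspected.
    """
    return sorted(
        name
        for name, required in filters.items()
        if all(envelope.meta.get(key) == value for key, value in required.items())
    )


# Stand-in ciphertext: random bytes in place of real AES output (assumption).
message = Envelope(meta={"kind": "alarm", "region": "eu"}, ciphertext=os.urandom(32))

filters = {
    "ops_dashboard": {"kind": "alarm"},
    "us_archive": {"region": "us"},
    "eu_alarms": {"kind": "alarm", "region": "eu"},
}

recipients = matching_subscribers(message, filters)
```

The point of the split is that routing and filtering decisions depend only on meta, so the network can act on a message it has no ability to read.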
The second one is, and there’s different ways to say this, that a device should have no open network ports. There should be no way for a hacker to find the device and hack into it. But one of the nice things about IoT is controlling a device from your iPhone or from a server. The way you would normally think about doing that is by leaving a network connection open on the device so the device is listening for a command. But as soon as you do that, you’re dead, because a hacker’s going to be able to scan an IP address, see that open port, and eventually they’re going to be able to hack into it, no matter how great you think your security is.
The way that we solve this, and the way people really should solve this across the board, is that we’ve turned it on its head. We start by making an outbound connection from the device, and then create an encrypted tunnel to PubNub. That way if a hacker scans the IP address of the device, it doesn’t even look like there’s a device on the network. It literally looks like there’s nothing at the end of that IP address; and yet, we have this quarter-second latency down to the device by starting with an outbound connection tunnel to our network. That’s the second area: no open ports, nothing listening.
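The outbound-connection idea can be sketched with plain sockets. This is only an illustration under stated assumptions: a loopback server stands in for the network, the function names and the one-line command format are hypothetical, and the encryption layer (for example TLS) is omitted for brevity. The part that matters is that the device side only dials out and never calls bind() or listen().

```python
import socket
import threading


def run_stand_in_network(server_sock, command, acks):
    """Stand-in for the network side: accept one device and push one command."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(command + b"\n")
        acks.append(conn.recv(1024))  # wait for the device's acknowledgement


def device_session(host, port):
    """Device side: dial out, then receive commands on that same connection.

    The device opens no listening socket, so a port scan finds nothing.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        with sock.makefile("rb") as reader:
            command = reader.readline().strip()
        sock.sendall(b"ack:" + command)
        return command


def demo():
    """Run one command round trip over loopback and return (command, ack)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    acks = []
    network = threading.Thread(
        target=run_stand_in_network, args=(server, b"lights_on", acks)
    )
    network.start()
    command = device_session("127.0.0.1", port)
    network.join()
    server.close()
    return command, acks[0]
```

Calling demo() pushes a single command down a connection that the device itself initiated, which is the reversal described above.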
Then the third area is around legislative security, which is something people don’t often think about until they get to large-scale deployments. It turns out how you can store your data, how you’re allowed to route it, what kind of encryption you have to use depending on what kind of data it is—that all depends on if you’re in the E.U. versus the U.S. versus Asia. Every country has different laws. Based on the way you configure our system, we can comply with the relevant legislative rules. I think you’re going to see more and more people thinking about legislative security, because it’s a key point for IoT security. But right now people aren’t talking about that one so much.
The fourth area is access control. Imagine now that you have a million devices and they’re all emitting and consuming streams of data that are going through your network. Now for every stream, you want to be able to decide who should have access to it, so you can give anyone or any device the ability to subscribe or publish on that stream. We did this with a product called PubNub Access Manager where you can issue tokens for each device and it allows the device to subscribe or publish to a stream.
What I think is cool about that is if you extrapolate that out, it’s no longer an access control layer. It’s a data syndication layer. We’re starting to see companies do this with IoT data today with PubNub. You may want to monetize that stream of data, and rather than building a complex relationship between you and the business partner, you’re simply publishing data to a public channel. You’re giving access tokens to the business partner, and they simply go to the published website, use whatever SDK they want, and consume the data. Then if you don’t like your business partner, you just turn off the stream or revoke the token. So that’s been an interesting model, not just for security, but for secure syndication of data as well. These are the kinds of things we’ve seen resonating really well with our customers.
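To make the token idea concrete, here is a minimal in-memory sketch of per-stream access grants with revocation. It is not PubNub Access Manager or its API: the class StreamAccessTable, its method names, and the channel name are all hypothetical, and a real service would also handle token expiry and persistence.

```python
import secrets


class StreamAccessTable:
    """Hypothetical in-memory table of per-token stream permissions."""

    def __init__(self):
        self._grants = {}  # token -> {channel: set of permissions}

    def issue_token(self, channel, read=False, write=False):
        """Create a random token granting the requested rights on one channel."""
        permissions = set()
        if read:
            permissions.add("read")
        if write:
            permissions.add("write")
        token = secrets.token_hex(16)
        self._grants[token] = {channel: permissions}
        return token

    def is_allowed(self, token, channel, permission):
        """Check a subscribe ("read") or publish ("write") request."""
        return permission in self._grants.get(token, {}).get(channel, set())

    def revoke(self, token):
        """Cut off a partner or device by discarding its token."""
        self._grants.pop(token, None)


table = StreamAccessTable()
partner_token = table.issue_token("sensor.temperature", read=True)

can_subscribe = table.is_allowed(partner_token, "sensor.temperature", "read")
can_publish = table.is_allowed(partner_token, "sensor.temperature", "write")

table.revoke(partner_token)
can_subscribe_after_revoke = table.is_allowed(
    partner_token, "sensor.temperature", "read"
)
```

In this sketch a partner holding a read-only token can consume the stream but not publish to it, and revoking the token ends access without touching the stream itself.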
There seems to be an evolution going on in the marketplace, as companies that have been really focused on the mobile world start to realize that they have an ability to scale up infrastructure in a way that’s really applicable to the IoT. You see this with Amazon picking up 2lemetry and then Facebook with their Parse acquisition and roll-out. What are your thoughts about how the market will play out and where you see PubNub within this emerging ecosystem?
I think the reality is that the IoT is not a distinct, easily-defined category that you can say is different from mobile players. I mean, absolutely everyone and their brother is an IoT company now, right? Last year I saw the CEO of Evernote saying “We’re an IoT company”, so there’s no doubt that everyone wants to jump on the trend, the buzz, the hype, whatever you want to call it.
But we have all these embedded devices and now we want to hook them up, not just to a network, but specifically to the Internet. That means global accessibility, security issues, massive scale, all the things that make the Internet unique from any other network in the world. There’s a lot of crossover from other systems.
Mobile and IoT devices are both small devices that connect to the Internet. So it absolutely makes sense to me that solutions that work well for mobile will have a lot of crossover.
The distinction is that there’s a pretty mature stack for mobile development and a pretty mature stack for how we build mobile and deploy mobile applications, but there’s a lot of quick evolution happening in the IoT space. It’s one thing to announce yourself as an IoT platform, but it’s another thing to invest the time and the resources to get to know all the idiosyncrasies of the different chipsets and the different hardware.
A nice thing about mobile apps is that you write one piece of code and it works on every Android device pretty much everywhere. But in today’s IoT, you write a piece of code and it’s going to work on this one chipset from this one manufacturer, but not even the other chipsets from the same manufacturer, much less other manufacturers. So it’s a much more fragmented space.
It seems like there’s a sort of blending going on with companies like yours that are trying to figure out exactly where they sit among the various layers of the IoT tech stack. What are your thoughts on how that might evolve over the next couple years?
I’ve seen this now three times in my career, this evolution from platforms to stacks. Everyone that today declares themselves as an IoT platform will eventually find out what they’re really good at, and they’ll become that layer of the stack. If they’re really good at big data and analysis, they’ll become a big data analysis company. For the guys who are really, really good at hardware design and firmware, they’ll stop worrying about the connectivity pieces or big data. As you see industries evolve, all the vendors start to focus on their core competencies and stop claiming that they do everything from A to Z; or they become consulting firms and they use the layers of the stack that exist out there to piece things together as solutions for customers.
Where we see ourselves evolving is really as a computation platform for data streams. I think that’s very different from taking your code and running it in our network, because it’s not about that. It’s about saying, “Hey, because we’re a new kind of network, we know something. We know a lot of context about this data stream, we know the devices it’s connecting to, we know something about the data, and we can do smart things to the data.” If you start thinking about doing 3 million transactions a minute, or being able to manage a mass amount of data streams, it doesn’t make sense to do all that computation on your servers.
You can’t really do that computation on devices themselves because they don’t have enough power and/or context. The question is, where do you start doing all that computation? I think the answer is you start to see the computation happen as the data traverses the network. Now, a dumb IP network, whether Verizon or Comcast or any kind of general ISP, doesn’t know enough about the data to do anything smart with it. But because we sit on top of that and we know something about the data, as it goes through our network we can provide really interesting functionality on it. And that’s just one of the many layers that will be in the IoT stack.
Just to give you a real-world example, let’s say you’ve got 10,000 sensors and they’re sending out a temperature reading every second. Does it make sense to ingest all that raw data into your servers all the time? Or maybe you just care about the streams where the average temperature changes more than five degrees in any 10-minute period. So the network has the capability of looking at the data and providing access. Or imagine a world where you’ve got millions of people expressing their sentiment while watching a TV show—they’re all pressing buttons on their mobile phones. Do you really need all of the sentiment data, all the raw likes and dislikes, coming through? Or do you really just need something like trends over 5 minute periods?
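The temperature example can be sketched as a small sliding-window filter. This is one possible reading of the rule, not a description of how PubNub implements it: a stream is flagged when the gap between the highest and lowest reading seen in the last 10 minutes exceeds five degrees. The class name and constants are hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 600      # 10-minute window from the example
THRESHOLD_DEGREES = 5.0   # change that makes a stream worth forwarding


class ChangeDetector:
    """Tracks one sensor stream and reports when it changes enough to matter."""

    def __init__(self, window=WINDOW_SECONDS, threshold=THRESHOLD_DEGREES):
        self.window = window
        self.threshold = threshold
        self.readings = deque()  # (timestamp_seconds, temperature) pairs

    def add(self, timestamp, temperature):
        """Record a reading; return True if the stream should be forwarded."""
        self.readings.append((timestamp, temperature))
        # Drop readings that have fallen out of the window.
        while timestamp - self.readings[0][0] > self.window:
            self.readings.popleft()
        temperatures = [value for _, value in self.readings]
        return max(temperatures) - min(temperatures) > self.threshold


steady = ChangeDetector()
steady_flags = [steady.add(second, 20.0) for second in range(0, 1200, 60)]

jumpy = ChangeDetector()
jumpy.add(0, 20.0)
jumpy_flag = jumpy.add(300, 26.0)   # six degrees within five minutes

slow = ChangeDetector()
slow.add(0, 20.0)
slow_flag = slow.add(700, 26.0)     # same change, but spread over more than 10 minutes
```

Running a filter like this close to the stream means only the flagged streams need to reach the customer's servers, rather than every raw reading from every sensor.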
From your perspective, are there any missing pieces to help tie the IoT space together going forward? Or do you see the components being there but they just haven’t gelled yet?
It’s an interesting question. What’s different about IoT, compared to software, is that it requires an investment. In other words, any two kids in mom’s garage can build the next WhatsApp or the next Instagram, right? But when you start talking about IoT, you’ve got real tangible products and they involve manufacturing, they involve chipsets, they involve circuit boards, they involve a lot more things that you just don’t do in your mom’s garage. The virality of IoT and the speed that it grows is somewhat hindered by the realities of dealing with hardware versus software.
Two things have to happen. One is that the price points for the actual components need to drop, and that’s happening. Just two years ago, you were spending $50 to $60 for the components to get a device online on a Wi-Fi network. Now, for example, Atmel makes a chip for under $5 a pop that has a full TCP Wi-Fi stack on it, and I think many of their competitors have similar chips. I can now stick a Wi-Fi chip in a light bulb.
The other hurdle is making it easier to build the hardware. We’re starting to see some really cool grassroots manufacturing techniques, but we’re still not there with 3D printers. Any kind of even low-capacity mass production is still done in China, is still done on the assembly line, and to some extent still requires a large five- or six-figure investment. On the other hand, it’s amazing what Kickstarter companies are doing now versus five years ago. It’s just jaw-dropping, the quality of products they’re able to produce and design.
I totally agree. In terms of the component and deployment costs, I think it was Chris Anderson that said the current prices and availability are “the peace dividends of the smartphone wars.”
That’s really cool, I hadn’t heard that. I think that’s part of it, absolutely. A friend of mine runs strategy for a large Chinese telco, and right next door is a large manufacturer that makes baseband chips, and the number of units that they ship is just staggering. It’s really hard to imagine a million of anything, but when you start hearing numbers like 100 million, it just boggles the mind. That they can do something like that in a year, it’s just unbelievable.
Let’s wrap up with a question we always like to ask. What are you currently working on that you’re most excited about?
One of the things I’m pretty excited about in general has been dealing with the scale that we see at PubNub. I gave you some of our “vanity metrics”—200 million devices, 3 million transactions a minute and all that good stuff—but those numbers are true, and what it means is that in essence, we’re running a massively distributed computing platform globally. I’m really proud of our team for having gotten to where we are. It seems to be scaling linearly, just as we planned. It is amazing to me—we had a customer come on board recently with five million new devices and the expectation was: “Hey, it’s all just going to work.” It’s an API, so you expect that it’s all going to work and it does. But the expectation of taking five million of anything and sticking them on a network is such an amazing thing to me.
Seven years ago you couldn’t just flip a switch and have five million of any application running. We’ve seen a massive change even in just the last two years with respect to how many devices come online, step-by-step-by-step, and in the expectations around this real-time stream connectivity with them. We now generate so much telemetry data that a difference of a couple of bytes in a log file can mean literally hundreds of gigabytes by the end of the month. So the level of scale has been pretty exciting, and that’s just been fun to watch.
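A quick back-of-the-envelope check shows how a couple of bytes turns into hundreds of gigabytes. The rate of 3 million transactions a minute comes from the interview; the rest are assumptions made for the arithmetic: one log entry per transaction, two extra bytes per entry, and a 30-day month.

```python
TRANSACTIONS_PER_MINUTE = 3_000_000   # figure quoted in the interview
EXTRA_BYTES_PER_ENTRY = 2             # assumption: "a couple of bytes"
MINUTES_PER_MONTH = 60 * 24 * 30      # assumption: 30-day month


def extra_log_bytes_per_month(
    transactions_per_minute=TRANSACTIONS_PER_MINUTE,
    extra_bytes=EXTRA_BYTES_PER_ENTRY,
    minutes=MINUTES_PER_MONTH,
):
    """Extra log volume per month, assuming one log entry per transaction."""
    return transactions_per_minute * extra_bytes * minutes


extra_bytes_total = extra_log_bytes_per_month()
extra_gigabytes = extra_bytes_total / 1e9  # decimal gigabytes

print(f"{extra_gigabytes:.1f} GB of extra logs per month")
```

Under those assumptions the total is 259,200,000,000 bytes, or about 259 GB a month, which is consistent with the "hundreds of gigabytes" figure.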
Thanks for taking the time to talk to us.