We had video calls in science fiction, and we had video conferencing in the 1990s, just as the web was taking off, as a very expensive and impractical tool for big companies. It was proposed as a use case for 3G, where it didn’t happen at all, and with the growth of consumer broadband we got all sorts of tools that could do it, but it never really became a mass-market consumer behaviour. Now, suddenly, we’re all locked down, and we’re all on video calls all the time, doing team stand-ups, play dates and family birthday parties, and suddenly Zoom is a big deal. At some point many of those meetings will turn back into coffees, we hope, but video will remain.
Will it still be Zoom, though?
As a breakthrough product, I think it’s useful to compare Zoom with two previous products – Dropbox and Skype.
Part of the founding legend of Dropbox is that Drew Houston told people what he wanted to do, and everyone said ‘there are hundreds of these already’ and he replied ‘yes, but which one do you use?’ That’s what Zoom did – video calls are nothing new, but Zoom solved a lot of the small pieces of friction that made it fiddly to get into a call.
The other comparison, though, is Skype. Just as for video, VOIP had been around for a long time, but Skype solved a lot of pieces of friction, in both engineering and user experience, and by doing so made VOIP a consumer product.
Two things happened to Skype after that, though. The first is that the product drifted for a long time, and the quality of the user experience declined. But the second is that everything has voice now. Imagine trying to do a market map today of which apps on a smartphone, Mac or PC might have voice – it would be absurd. Everything can have voice. And though there’s still a lot of engineering under the hood, it became a commodity. Whether you buy it from Twilio or someone else, saying ‘our app has free computer voice’ is meaningless – what matters is how you wrap it. Why do you have voice? Hence, Clubhouse is built on ideas about psychology and behaviour, not VOIP, and it’s not trying to win ‘voice’. Equally, Pindrop looks at every call going into a call centre and tries to work out which might be fraudulent. If you’d looked at Skype in 2004 and argued that it would own ‘voice’ on ‘computers’, that would not have been the right mental model.
I think this is where we’ll go with video – there will continue to be hard engineering, but video itself will be a commodity and the question will be how you wrap it. There will be video in everything, just as there is voice in everything, and there will be a great deal of proliferation into industry verticals on one hand and into unbundling pieces of the tech stack on the other. On one hand video in healthcare, education or insurance is about the workflow, the data model and the route to market, and lots more interesting companies will be created, and on the other hand Slack is deploying video on top of Amazon’s building blocks, and lots of interesting companies will be created here as well. There’s lots of bundling and unbundling coming, as always. Everything will be ‘video’ and then it will disappear inside.
An important part of this is that there seem to be few real network effects in a video call per se. You don’t necessarily need an account to join a call, and you generally don’t need an application either, especially on the desktop – you just click on a link in your calendar and the call opens in the browser. Indeed, the calendar is often the aggregation layer – you don’t need to know what service the next call uses, just when it is. Skype needed both an account and an app, so had a network effect (and lost even so). WhatsApp uses the telephone numbering system as an address and so piggybacked on your phone’s contact list – effectively it used the PSTN as the social graph rather than having to build its own. But a group video call is a URL and a calendar invitation – it has no graph of its own.
Incidentally, one of the ways that this all feels very 1.0 is the rather artificial distinction between calls that are based on a ‘room’, where the addressing system is a URL and anyone can join without an account, and calls that are based on ‘people’, where everyone joining needs their own address, whether it’s a phone number, an account or something else. Hence Google has both Meet (URLs) and Duo (people) – Apple’s FaceTime is only people (no URLs).
Taking this one step further, a big part of the friction that Zoom removed was that you don’t need an account, an app or a social graph to use it: Zoom made network effects irrelevant. But that means Zoom doesn’t have those network effects either. It grew by removing defensibility.
I compared Zoom with Dropbox and Skype, but another useful comparison is with photo sharing. There have always been hundreds of things that did this, but we saw a succession of companies that worked out something new around user experience and psychology that took them beyond ‘photos’ to some deeper insight – first Flickr, then Facebook and Instagram, and then Snap.
When Snap launched, there were infinite ways to share images, but Snap asked a bunch of weird questions that no-one had really asked before. Why do you have to press the camera button – why doesn’t the app open in the camera? Why are you saving your messages – isn’t that like saving all your phone calls? Fundamentally, Snap asked ‘why, exactly, are you sending a picture? What is the underlying social purpose?’ You’re not really sending someone a sheet of pixels – you’re communicating.
That’s the question Zoom and all its competitors haven’t really answered. Zoom has done a good job of asking why it was hard to get into a call, but hasn’t really asked why you’re in the call in the first place. Why, exactly, are you sending someone a video stream and watching another one? Why am I looking at a grid of little thumbnails of faces? Is that the purpose of this moment? What is the ‘mute’ button for – background noise, or so I can talk to someone else, or is it so I can turn it off to raise my hand? What social purpose is ‘mute’ actually serving? What should screen-sharing do? What other questions could one ask? And so if Zoom is the Dropbox or Skype of video, we are waiting for the Snap, Clubhouse and Yo.