VR – Mass or niche market?

Posted 6 Feb 2018

The Interface is a Key to the Successful Deployment of VR.

When you use a computer these days mostly it is quite straightforward, at least for familiar everyday programs. When you try a new application, again it is pretty familiar. If you have to consult a manual it is either because you’re trying something very unusual, or because the user interface to the program is not too good. Manuals are often incomprehensible, and anyway these days we don’t have much time to consult them. We expect everything to just ‘work’ and be easy to understand. The interface is typically so good that we don’t even think about it – mostly it feels like we are operating with physical objects. Just as I might pick up an empty bottle of water and move it from a table to the bin, so on the computer I ‘pick up’ an icon representing a file and move it to the ‘bin’. I’m hardly aware that to do this I’m using a mouse (or touchpad) to move a cursor over an icon, I press and hold down the mouse button, I move the mouse so that the icon attached to the cursor moves across the screen until it is over another icon representing a bin, check that the bin is highlighted in some way, and then let go of the mouse button, and ‘poof!’ the file is in the bin and deleted. Every so often, just like the bin in my kitchen, I have to ‘empty’ the bin, so that finally the things in it are ‘really’ gone – but just like in physical reality they are not really gone, but rather they are scattered, and no one would be able any more to pinpoint where a particular item is located.

This direct interface, this transparency between our intentions, actions and what is ‘really’ going on underneath is not a natural order of things. These ideas were invented, researched, argued about, invested in, fought over, rejected, advanced and struggled over for many years. Many of them go back to the 1960s and are based on work by Douglas Engelbart. Xerox PARC put about a decade of investment in the 1970s into these ideas, working on bitmapped screens, the idea of a mouse, the cursor, menus, windows, icons etc. Apple carried these ideas much further and eventually transformed the computer interface into the form we know today by building the first really mass consumer products that embodied these concepts. The first Mac was released in January 1984. There is an excellent long read in The New Yorker which gives some of the history.

There are two important lessons here. The first is that there was a radical transformation of computers and how they were used, from huge machines serviced by white-coated priest-like technicians locked away in special rooms with whirling tapes, interfaced through punched paper tape or cards – through to consumer items that can sit on your desk, and in fact that you can carry in your pocket in the form of a smartphone. The development of how we use computers and what computers are in terms of devices went hand in hand, and the everyday language or grammar that we seemingly naturally employ to use computers took decades of research and development. It didn’t just happen. Second, even when the ideas were developed and known, like the tremendous work done at Xerox PARC, it took yet another way of thinking to turn those ideas into products that could be (even happily!) used by millions of people who know almost nothing about computers.

Virtual Reality is going through a similar process. It was started in the 1960s by Ivan Sutherland, and developed in the 1980s and early 1990s by, for example, Scott Fisher at NASA Ames and Jaron Lanier’s VPL, with its EyePhone head-mounted display and data glove. It then disappeared from public view into University labs and some industries around the world. To do anything with VR you had to have lots of money, and be involved in computer science or have access to technical expertise in the field. VR also had its artistic moments, such as the unforgettable Osmose by Char Davies in 1995. Unfortunately VR did not have the equivalent of a Xerox PARC, a single group working on common goals for a decade to really explore and create this new medium – rather the research was scattered in labs and small companies around the world. Among the most significant of these were the University of Southern California and Fakespace Labs, which had the common link of Mark Bolas. At the University’s Institute for Creative Technologies Mark and colleagues experimented over many years with, amongst other things, new types of head-mounted display, and the company Fakespace produced the Wide5 HMD. A line can be traced from all of this work to the Oculus HMD, the formation of the company and its eventual takeover by Facebook with huge investment, marking the beginning of the new resurgence of VR and its move towards becoming a mass consumer product. But for VR to actually become a mass consumer product, where people do really use it in their homes, the same kind of completely new thinking has to be fostered, just like the new thinking at Xerox PARC paved the way for the personal computer revolution.

For VR to become a mass consumer product it has to be as usable, with interfaces that are as transparent as the examples given in the first paragraph. There are two aspects to this. The first is the launching of the VR program itself, and the second is what happens while it is running.

Take a look at an example of what you had to do just to delete a file under the DOS (PC) operating system. It is pages long. If you don’t want to follow the link, here is another example (only partially reproduced here):

In my experience starting up a non-trivial VR program is very similar. Typically, I am given a page full of instructions on what I have to do in order to execute a VR program, and this is apart from what has to be done with the hardware itself (like plugging all the things together, making sure cameras are aligned, and so on).

Here is a recent example (some details have been replaced by ‘…’ for reasons of confidentiality).

Now none of this is difficult, and may be easier to understand than typical DOS commands, and to follow these instructions you don’t have to be much of an expert, but this is still very far from being a consumer product. Also, if one of these items in the sequence is done incorrectly or not done, you would typically have to restart the program.

Now one reason for this state of affairs is that up until recently the people who used VR were either experts, or they were being managed by people who were experts. For example, in a University the VR group would run an experiment and the participants would simply experience the scenario and not be at all involved in setting things up or running the actual program. Hence the interface was unimportant, because the program was typically being run by the people who wrote the code, or by people who had practiced running it many times during pilot studies. Obviously, this won’t do for a consumer product.

Setting up and running the program is a traditional 2D user interface problem. This may not be part of VR in itself (though it can be, as discussed below). If you type “User Interface Design” into Google Scholar there are about 94,000 hits. Of course these won’t all be unique articles, but this gives an idea of the scale of research that has been carried out in this area. A lot is known about how to design interfaces that people can use. There are tons of articles on the web about this, for example “User Interface Design Guidelines: 10 Rules of Thumb” which also gives away a free book, and there is one sponsored by the US government. There is an excellent one by Apple on how to design for iOS. Actually this is a good way to think about it – an interface to start up a program should be doable just from a mobile phone. It should be that simple (the mobile could connect by WiFi or Bluetooth to the actual computer running the application – or indeed the application could be running on the phone itself). So, the message is that starting up a VR program should, in principle, be no more difficult than moving an icon to the bin. It might have more features and need more actions or selections to be made, but it should be a transparent, direct user interface, without modes, with feedback so that you know everything you have selected so far, and with an ‘undo’ and ‘redo’ selection, just like any modern, normal user interface. The fact that it is starting a VR program is beside the point.

Apart from the very act of starting up the program it should though be possible to move the entire interface to be part of the VR experience itself. For example, it may be that there are some elements that can only be done once the VR has started, such as a calibration for tracking. Now if the interface for that is split between a computer screen and what the participant in the VR is seeing, then this cannot be done by one person alone. Therefore this has to be done ‘inside’ the VR if it is a single person program. So if there are some elements that must be done inside the VR then why not all of them, apart from the actual fact of starting the program itself? In this case the ‘user interface’ shifts to becoming an issue in VR, and in these circumstances a lot of specific practices from 2D interfaces no longer apply.

This reflects what I’ve said before: In VR we are not ‘users’ but participants. We are in an alternate world, and the whole point of VR is the sense of presence, plausibility and place illusion that it can engender. A 2D menu that pops up doesn’t belong in that world. There is no ‘cursor’, there are no ‘windows’. However, this highlights another problem. We can think of a user interface as a language. At the base of the language are words – fundamental elements that go into any modern screen-based user interface. Move the cursor (via the mouse or trackpad), point and click: these can be considered as the verbs. There are abstractions such as files and folders, applications, and representations of these in the form of icons, which can be considered as nouns. A sentence is ‘open a folder’ which strings together the words: move the cursor [to the icon], point and click, and then the folder opens to a window. Any user interface is like a set of sentences. In a way any particular instance of using an interface is a story: “I opened this file, I put the file in that folder, I moved this file to the bin, I moved the cursor to that icon, I clicked it twice, it opened into a window, which displayed another user interface to an application, …”. The elements of this language, in fact this universal language since it is understood anywhere in the world, were developed over many decades as we have seen.

What is interesting, and remarkable, is that in VR there is only one language element that is universally understood: you can look around in VR by moving your head. This is the only universal! When you enter a VR application for the first time this is the only thing that you can expect. Nothing else at all is certain, and you will have to learn it. For example, you enter a VR and you want to change your position – move somewhere else in the virtual environment. How do you do it? If you’re lucky and have a wide area tracked space you can just walk. This has several problems: (i) It is the instinctive thing to do but it might not work simply because there is no support for it in the particular configuration being used. In fact, walking when it is not supported may break various aspects of how the program is supposed to work. For example, if you have a virtual body, and you physically walk but there is no support for this, then the body won’t move with you, so you would have ‘walked out of your body’ – probably not desirable for most applications. (ii) You walk and it is supported, but the size of the virtual space you are in is greater than the physical space you are in. Therefore, there has to be some sort of interface that allows walking beyond the bounds of the physical space. There are tricks to overcome this – such as redirected walking, where the environment is imperceptibly swivelled around so that you think you’re walking in a straight line but in fact are walking in a circle. But this still requires some elements to be introduced into the program to make this work. (iii) Walking is supported, but of course you are still in a physical environment, and if you keep walking you are bound eventually to hit a boundary such as a wall. Some systems (e.g., the HTC VIVE) will warn you about this by throwing up a wire-frame grid to let you know when you’re close to a wall. This is a safe but not an ideal solution – you may still have to get to the other side of the physical wall (since in VR the wall is not there) and of course it is not great for maintaining presence.
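
The wire-frame warning in point (iii) can be sketched in a few lines. This is only an illustration of the general idea, not how the HTC VIVE or any other system actually implements it: the rectangular play area centred on the origin, the function names and the 0.5 m warning distance are all assumptions made for the example.

```python
def distance_to_boundary(x, z, half_width, half_depth):
    """Distance (m) from floor position (x, z) to the nearest edge of a
    rectangular play area centred on the origin. Negative if outside."""
    return min(half_width - abs(x), half_depth - abs(z))


def grid_opacity(x, z, half_width=1.5, half_depth=1.5, warn_distance=0.5):
    """Opacity of the warning grid: 0.0 (hidden) when far from every wall,
    rising linearly to 1.0 (fully visible) at or beyond the boundary."""
    d = distance_to_boundary(x, z, half_width, half_depth)
    if d >= warn_distance:
        return 0.0
    if d <= 0.0:
        return 1.0
    return 1.0 - d / warn_distance


if __name__ == "__main__":
    # Walk from the centre of a 3 m x 3 m area towards the +x wall.
    for x in (0.0, 1.0, 1.25, 1.5):
        print(f"x = {x:4.2f} m -> grid opacity {grid_opacity(x, 0.0):.2f}")
```

Fading the grid in gradually, rather than switching it on abruptly, is one possible way to soften the break in presence, though it does not remove it.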

Physical walking is the best solution for locomotion through a virtual environment, but it is not always a practical solution. Hence metaphors have to be used. One popular one is the idea of walking-in-place, which dates from 1995 (with a more recent paper in 2016) – where you stay physically in one spot but pretend you’re walking. This could be on a treadmill, or it could be on the ground, but your foot movements are tracked in some way so that the system can know when you’re doing this. When it recognizes these walking motions it moves you forward in the virtual space. Another possibility is much simpler in terms of implementation – the participant holds a wand and points it in a direction, and when a button on the wand is pressed they are moved forward in that direction. Another possibility is the ‘leaning’ metaphor (1993). Imagine a circle drawn around the participant, say at chest height. When the participant leans in a direction, their body will cut the circle. The system will then move them in the direction of the lean, and the speed of movement will be a function of how far they lean.
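
The leaning metaphor is simple enough to sketch in code. What follows is a minimal illustration rather than the method from the 1993 work: it assumes the tracker reports the horizontal offset of the chest from its resting position, and the circle radius, maximum lean and maximum speed are invented values. The linear mapping from lean to speed is also just one possible choice of "function of how far they lean".

```python
import math


def lean_velocity(offset_x, offset_z,
                  circle_radius=0.10, max_lean=0.40, max_speed=2.0):
    """Map a lean to a horizontal velocity (vx, vz) in metres per second.

    offset_x, offset_z: horizontal chest offset (m) from the rest position.
    Inside the circle nothing happens; beyond it, speed grows linearly with
    the lean and is capped at max_speed. Assumes max_lean > circle_radius.
    """
    lean = math.hypot(offset_x, offset_z)
    if lean <= circle_radius:
        return (0.0, 0.0)  # the body has not cut the circle
    fraction = min((lean - circle_radius) / (max_lean - circle_radius), 1.0)
    speed = max_speed * fraction
    # Move in the direction of the lean.
    return (speed * offset_x / lean, speed * offset_z / lean)


if __name__ == "__main__":
    print(lean_velocity(0.05, 0.0))   # small sway, no movement
    print(lean_velocity(0.0, -0.25))  # moderate lean along -z
    print(lean_velocity(0.40, 0.0))   # full lean along +x
```

The dead zone inside the circle matters: without it, the small sway of someone standing still would make them drift through the scene.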

These are perennial issues in VR that have been discussed and researched for 25 years. There is no universal solution. We found when comparing ‘really walking’, ‘walking in place’ and the ‘pointing’ method that in terms of presence there was not much difference between ‘really walking’ and ‘walking-in-place’ but that of course the former was more accurate. However, that particular advantage of ‘really walking’ is probably lost today, since the devices we can use to detect the leg movements (e.g., just a Wii board would be enough) are accurate and very cheap. Another problem with any interface that moves you through the virtual world is that you might get sick.

When you enter a virtual environment you will not know how to move – you have to learn it. Unlike moving a file from one folder to another on a 2D interface, which everyone can do without thinking about it, there is no universal method for walking in VR. The same is true of another basic operation: selecting an object. So, you’re in the virtual environment and you want to select an object – it may be for the purposes of grabbing it, or just to select it for some other purpose. How should you do this? It won’t be obvious. One possibility is that if one of your hands is represented in the VR and tracked, you can try to move your hand towards the object and touch it, and if you succeed in touching it, it changes colour momentarily, or makes a sound, so that you have feedback. If the object is outside of your reaching space you can’t do that – either you have to walk toward the object (problems as above) or some other method is needed. One, for example, is that a ray comes out of your hand (or out of your head between your eyes) and as you move your hand around (or look around) the ray will intersect objects until you find the one that you want.
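
The ray method can be sketched as follows. To keep the example short it assumes every selectable object is approximated by a bounding sphere, and it picks the nearest object that the ray hits; the `Sphere` class, the function names and the scene are hypothetical, and a real engine would normally provide its own ray-casting call.

```python
import math
from dataclasses import dataclass


@dataclass
class Sphere:
    name: str
    center: tuple   # (x, y, z) in metres
    radius: float


def ray_hit_distance(origin, unit_dir, sphere):
    """Distance along the ray to the sphere, or None if the ray misses.
    unit_dir must have length 1."""
    to_center = tuple(c - o for c, o in zip(sphere.center, origin))
    # Distance along the ray to the point closest to the sphere centre.
    t_closest = sum(a * b for a, b in zip(to_center, unit_dir))
    # Squared distance from the sphere centre to the ray's line.
    d2 = sum(a * a for a in to_center) - t_closest ** 2
    r2 = sphere.radius ** 2
    if d2 > r2:
        return None
    half_chord = math.sqrt(r2 - d2)
    t = t_closest - half_chord
    if t < 0:
        t = t_closest + half_chord  # the ray starts inside the sphere
    return t if t >= 0 else None    # None if the sphere is behind us


def pick(origin, direction, objects):
    """Return the nearest object intersected by the ray, or None."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit_dir = tuple(d / norm for d in direction)
    best = None
    for obj in objects:
        t = ray_hit_distance(origin, unit_dir, obj)
        if t is not None and (best is None or t < best[0]):
            best = (t, obj)
    return best[1] if best else None


if __name__ == "__main__":
    scene = [Sphere("vase", (0, 0, -5), 0.5), Sphere("cup", (0, 0, -2), 0.5)]
    hit = pick((0, 0, 0), (0, 0, -1), scene)
    print(hit.name if hit else "nothing selected")
```

Whatever `pick` returns is the object that should give feedback, for example by changing colour, so that the participant knows what the ray is currently selecting.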

Now suppose you want to ‘grab’ the object. How you do this depends on the hardware. For example, if your hand is being tracked by something like the Microsoft Kinect, then you could make a grabbing motion with your hand once you have selected the object. The Kinect will recognise that action, and the computer program can then attach the object to your virtual hand until you let it go by opening your hand again. If you’re using a traditional wand then it will have at least one button, such as a trigger button. If you know which is the correct button to press then this will grab the object until you let go of it. Be careful not to press the wrong button though, since this could have been programmed for some other action, and you might suddenly find yourself floating 10m above the (virtual) ground.

So, to reiterate the basic point: VR has no universal grammar that everyone can naturally understand. Putting this another way, apart from the ability to look around by moving your head, there are no recognised conventions about how to accomplish actions in VR. This has to be built from the ground up, and eventually those ‘words’ which are the basic building blocks of any language must be universally recognised. When I go into a VR scenario I should be able to do something as basic as moving through the space without someone first having to explain it to me, or worse still, having to read about it in the manual. Of course, one of the nice advantages of VR is that the scenario itself can contain a short introductory piece that informs the participant about how to do things. If this is done well it will work, since it only requires the one basic thing that every VR supports – to look around and listen. Of course you may not want to go through this tutorial every time that you enter the environment, but with modern speech recognition systems it should be easy for the program to recognise a “Please shut up!” once the tutorial starts.

In the physical world we have affordances – we see a flat surface of a certain height, we know we can sit on it, no one has to tell us this (apart from a notice that might say “Please do not sit on the …”). We see a window, we know we can look through it. A switch or a button, we know we can press it, and would normally know what would happen as a result. We function very well without people having to explain to us how to do things. We are on a train in a foreign country, and we wonder how to open the door when the train stops – we instinctively look for a button to press or a lever to pull – we know the objects we are looking for that offer us the affordances that we want.

Since VR often depicts scenes from reality (even if they are fantastic scenes that could not happen in reality they still typically have some intersection with reality – like a floor, or ground, rooms, trees, or whatever) there will be natural affordances visible in the environment. For example, a round object on a vertical flat surface that seems to be cut out a bit from the surrounding flat surface is likely to be a handle to a door, and probably if you touch it something will happen like the door opening. An archway beyond which you can see another environment beckons you to go through it.

The lesson here is that in the VR participants should be able to simply infer what they are supposed to be doing and how they are supposed to act from the affordances and events in the environment. Do not make selections through a menu, but exploit what the environment itself offers. If you want people to pick up an object, give it a handle. If you want people to go to a specific place then make that place the obvious place to go through events in the scenario itself. Perhaps objects have certain associated actions or properties. Therefore, an object can be given an inviting button to press, so that when the participant is tempted to touch it, the object gives out its information with explanations or examples about how to use it.

Although these are not really affordances, participants can also learn to do actions by example. For example, if the metaphor for moving is the leaning method as described above, then participants can learn this by accident (not a good idea), or a virtual human character can demonstrate it, and most people will quickly understand that this is the way to move. Here the underlying program must be aware of how well the participant has succeeded in learning the method, so that it can determine the moment when the tutorial can stop.

In an environment we recently built, the participants had a choice of several scenarios to join. One way that was shown to us by a traditional 2D interface designer was for the HMD to display a 2D screen whereby participants would select the environment by ‘looking’ (i.e., where you look is associated with a cursor that highlights the object you are looking at) and clicking. Then you would enter that scenario until it had finished and then you were back to the 2D interface whereby you could make another selection. So the participant was repeatedly moving from 3D (the scenarios) to 2D (the interface). While the 2D interface looked very beautiful, and served its function, somehow this is not appropriate for VR. It requires participants to become ‘users’ in the traditional 2D sense, but even this is broken since they do not move a mouse around to move a cursor and then select the scenario they want by clicking a mouse button. In my view once you are in VR you stay in it until it is time to finish altogether. Use the affordances offered in the scenario itself.

The method we came up with was a little better. The participant started in an office (which was itself naturally part of the scenario). Around the office were little pictures – like front facing books scattered around on book shelves. The participants could use the ‘looking’ method to select a book, click the wand button and then that would open up into that particular scenario. When that was finished they were back in the office and could choose another one. Still that one had some learning associated with it – and also participants had to read the text on the little ‘books’ to know what they were choosing. Also, there is no reason why participants would have thought that choosing a book would lead them to another scenario. This is something that they had to be told.

Perhaps a better method would have been to use the idea of portals. Since this is VR we can be adventurous and place some arches around the office with scenarios that can be seen through them – portals to other scenarios. When a participant spends enough time staring at one of the arches, this is an indication that this is where they want to go. So they are placed there. It is not obvious that this would work well, but it does have the idea that this is not something that people have to learn, but a ‘natural’ part of the environment, with an obvious affordance.
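
A rough sketch of the ‘staring’ rule shows how little is needed. It assumes the program can tell, on every frame, which arch (if any) the participant is looking at; the class name, the two-second dwell time and the scenario names are made up for the example.

```python
class DwellSelector:
    """Selects a portal once the participant has looked at it continuously
    for dwell_seconds. Looking away, or at another portal, resets the timer."""

    def __init__(self, dwell_seconds=2.0):
        self.dwell_seconds = dwell_seconds
        self._target = None
        self._elapsed = 0.0

    def update(self, gazed_portal, dt):
        """Call once per frame with the portal being looked at (or None) and
        the frame time in seconds. Returns the portal to enter, or None."""
        if gazed_portal != self._target:
            # Gaze moved to something else: start timing from zero.
            self._target = gazed_portal
            self._elapsed = 0.0
            return None
        if gazed_portal is None:
            return None
        self._elapsed += dt
        if self._elapsed >= self.dwell_seconds:
            self._target = None
            self._elapsed = 0.0
            return gazed_portal
        return None


if __name__ == "__main__":
    selector = DwellSelector(dwell_seconds=1.0)
    gaze = ["beach", "beach", "forest", "forest", "forest", "forest", "forest"]
    for frame, portal in enumerate(gaze):
        chosen = selector.update(portal, dt=0.25)
        if chosen:
            print(f"frame {frame}: entering the {chosen} scenario")
```

The dwell time is exactly the kind of parameter that has to be tuned by testing with participants: too short and people are whisked away by a passing glance, too long and they will not realise that staring does anything.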

Let’s give one more example. In order to make body tracking work, some programs will require a calibration where the participant has to stand in a certain stance for a while. How though will they know what to do? One obvious solution is that there is an operator of the program, who tells them how to stand and gives them the instructions once the person is in the VR. This works very well – but requires an operator. Another possibility is that you go into the VR and you see a screen with written instructions informing you, with a picture, about how you are supposed to stand. This could work, but does require you to read a lot of text, and currently this is not something that is particularly comfortable to do with the resolution of today’s HMDs. Of course the text could be replaced by a voice to make it easier.

But this is VR! Show a virtual human body, in the same space as you, near you, and standing as you are supposed to stand. The virtual human could be saying “Adopt the same posture as me” or whatever. The program could show a reflection of your body standing next to the virtual demonstrator, as far as its current calibration allows, and this process of adjusting your body and testing the calibration continues until convergence. This is using VR in order to set up the VR. Of course there will be many other possibilities, but the important point is this one – use VR itself as the interface to VR.
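
The ‘until convergence’ part can be sketched as a loop. This is an assumption about how such a check might look, not a description of any particular tracking system: poses are taken to be dictionaries of joint positions in metres, the stance counts as reached when every joint is within a tolerance of the demonstrator’s, and it must be held for a few consecutive frames. All names and numbers are illustrative.

```python
import math


def pose_error(current, target):
    """Largest distance (m) between corresponding joints.
    Both arguments map a joint name to an (x, y, z) position."""
    return max(math.dist(current[joint], target[joint]) for joint in target)


def calibrate(pose_stream, target, tolerance=0.05, hold_frames=3):
    """Consume tracked poses until the participant has held the target
    stance for hold_frames consecutive frames. Returns the number of
    frames used, or None if the stream ends first."""
    held = 0
    for frame_count, pose in enumerate(pose_stream, start=1):
        if pose_error(pose, target) <= tolerance:
            held += 1
        else:
            held = 0  # the stance was lost, start counting again
        if held >= hold_frames:
            return frame_count
    return None


if __name__ == "__main__":
    t_pose = {"left_hand": (-0.8, 1.4, 0.0), "right_hand": (0.8, 1.4, 0.0)}
    arms_down = {"left_hand": (-0.3, 1.0, 0.0), "right_hand": (0.3, 1.0, 0.0)}
    frames = [arms_down, t_pose, arms_down, t_pose, t_pose, t_pose]
    print(calibrate(frames, t_pose))
```

While the loop is running, the same error value could drive the feedback given to the participant, for instance by highlighting the limb that is furthest from the demonstrator’s.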

A most important message is that whatever interface you choose, someone will break it. Hence a lot of testing with participants is absolutely vital. People will come to interpret things in ways that the designers would never suspect, and they cannot know this without testing. Use natural affordances wherever possible, but still they have to be tested with lots of participants.

To conclude: VR has made enormous strides towards becoming a consumer product in recent years. However, where it is really missing out is that there is no universally recognised interface by which people can carry out activities within the virtual environment. There are many ideas, a lot of past research, there are many poor practices that are automatically carried over from 2D interfaces to 3D, but typically a participant has to learn how to operate in any particular application. This is so unlike normal everyday use of computers, where there is a universally recognised language with its own grammar – where typically you can start up any computer and have a pretty good understanding of how to use it without anyone explaining anything to you. And, dare we say it, without reading a manual.

For VR to become a mass consumer product that is straightforward to enjoy, this universal language has to be developed. Companies developing VR applications must put a huge amount of thought and effort into this, with experimentation and testing with naive participants, and also with participants who have become used to the application (since the needs of each group will be different). Companies must do this if VR is going to break out of its restricted shell into the mass market.

One day some company will get it spot on and develop an interface that everyone will adopt. It could be your company, it should be your company, unless of course you want to be paying royalties to another one.