Vol. 26, No. 5, May 2020

Front Matter

Front Cover

Table of Contents

Introducing the IEEE Virtual Reality 2020 Special Issue


IEEE Visualization and Graphics Technical Committee (VGTC)

Conference Committee

Paper Reviewers for Journal Papers

International Program Committee for Journal Papers

The 2019 VGTC Virtual Reality Best Dissertation Award

Session: IEEE Virtual Reality 2020 Papers

1841 Teleporting through Virtual Environments: Effects of Path Scale and Environment Scale on Spatial Updating

Jonathan Kelly, Iowa State University, USA

Alec Ostrander, Iowa State University, USA

Alex Lim, Iowa State University, USA

Lucia Cherep, Iowa State University, USA

Stephen Gilbert, Iowa State University, USA

Virtual reality systems typically allow users to physically walk and turn, but virtual environments (VEs) often exceed the available walking space. Teleporting has become a common user interface, whereby the user aims a laser pointer to indicate the desired location, and sometimes orientation, in the VE before being transported without self-motion cues. This study evaluated the influence of rotational self-motion cues on spatial updating performance when teleporting, and whether the importance of rotational cues varies across movement scale and environment scale. Participants performed a triangle completion task by teleporting along two outbound path legs before pointing to the unmarked path origin. Rotational self-motion reduced overall errors across all levels of movement scale and environment scale, though it also introduced a slight bias toward under-rotation. The importance of rotational self-motion was exaggerated when navigating large triangles and when the surrounding environment was large. Navigating a large triangle within a small VE brought participants closer to surrounding landmarks and boundaries, which led to greater reliance on piloting (landmark-based navigation) and therefore reduced - but did not eliminate - the impact of rotational self-motion cues. These results indicate that rotational self-motion cues are important when teleporting, and that navigation can be improved by enabling piloting.
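As a worked illustration of the triangle completion task described above, the following minimal sketch computes the correct homing turn and distance after two outbound legs, and the signed pointing error of a response. The function names, the counterclockwise-positive sign convention, and the example leg lengths are assumptions made for illustration; they are not taken from the study.

```python
import math


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def homing_response(leg1, leg2, turn_deg):
    """Correct turn (degrees) and distance back to the path origin.

    Hypothetical setup: start at the origin facing +x, travel leg1,
    turn by turn_deg (counterclockwise positive), then travel leg2.
    """
    heading = math.radians(turn_deg)
    x = leg1 + leg2 * math.cos(heading)
    y = leg2 * math.sin(heading)
    # Bearing from the final position back to the origin.
    bearing = math.atan2(-y, -x)
    turn_to_origin = wrap_deg(math.degrees(bearing - heading))
    return turn_to_origin, math.hypot(x, y)


def pointing_error(response_deg, correct_deg):
    """Signed pointing error; under this convention a negative value
    means the response fell short of a positive correct turn."""
    return wrap_deg(response_deg - correct_deg)


turn, dist = homing_response(1.0, 1.0, 90.0)
print(round(turn, 3), round(dist, 3))
print(pointing_error(120.0, turn))
```

With equal legs and a 90 degree turn, the origin lies 135 degrees to the left of the final heading, so a 120 degree response would count as an under-rotation of about 15 degrees in this sketch.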

Digital Object Identifier: 10.1109/TVCG.2020.2973051

1851 Weakly Supervised Adversarial Learning for 3D Human Pose Estimation from Point Clouds

Zihao Zhang, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Lei Hu, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Xiaoming Deng, Beijing Key Laboratory of Human Computer Interactions, Institute of Software, Chinese Academy of Sciences

Shihong Xia, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Point cloud-based 3D human pose estimation, which aims to recover the 3D locations of human skeleton joints, plays an important role in many AR/VR applications. The success of existing methods is generally built upon large-scale data annotated with 3D human joints. However, annotating 3D human joints from input depth images or point clouds is a labor-intensive and error-prone process, due to the self-occlusion between body parts as well as the tedious annotation process on 3D point clouds. Meanwhile, it is easier to construct human pose datasets with 2D human joint annotations on depth images. To address this problem, we present a weakly supervised adversarial learning framework for 3D human pose estimation from point clouds. Compared to existing 3D human pose estimation methods from depth images or point clouds, we exploit both weakly supervised data with only 2D human joint annotations and fully supervised data with 3D human joint annotations. To relieve the human pose ambiguity due to weak supervision, we adopt adversarial learning to ensure the recovered human pose is valid. Instead of using either 2D or 3D representations of depth images as in previous methods, we exploit both point clouds and the input depth image. We adopt a 2D CNN to extract 2D human joints from the input depth image; these 2D joints help us obtain the initial 3D human joints and select effective sampling points, which reduces the computation cost of 3D human pose regression with a point cloud network. The point cloud network narrows the domain gap between the network input (i.e., point clouds) and the 3D joints. Thanks to the weakly supervised adversarial learning framework, our method can achieve accurate 3D human pose estimation from point clouds. Experiments on the ITOP and EVAL datasets demonstrate that our method achieves state-of-the-art performance efficiently.

Digital Object Identifier: 10.1109/TVCG.2020.2973076

1860 Getting There Together: Group Navigation in Distributed Virtual Environments

Tim Weissker, Bauhaus-Universität Weimar, Germany

Pauline Bimberg, Bauhaus-Universität Weimar, Germany

Bernd Froehlich, Bauhaus-Universität Weimar, Germany

Supplemental Material: Video

We analyzed the design space of group navigation tasks in distributed virtual environments and present a framework consisting of techniques to form groups, distribute responsibilities, navigate together, and eventually split up again. To improve joint navigation, our work focused on an extension of the Multi-Ray Jumping technique that allows adjusting the spatial formation of two distributed users as part of the target specification process. The results of a quantitative user study showed that these adjustments lead to significant improvements in joint two-user travel, which is evidenced by more efficient travel sequences and lower task loads imposed on the navigator and the passenger. In a qualitative expert review involving all four stages of group navigation, we confirmed the effective and efficient use of our technique in a more realistic use-case scenario and concluded that remote collaboration benefits from fluent transitions between individual and group navigation.

Digital Object Identifier: 10.1109/TVCG.2020.2973474

1871 Factored Occlusion: Single Spatial Light Modulator Occlusion-capable Optical See-through Augmented Reality Display

Brooke Krajancich, Stanford University, USA

Nitish Padmanaban, Stanford University, USA

Gordon Wetzstein, Stanford University, USA

Supplemental Material: Supplemental PDF

Occlusion is a powerful visual cue that is crucial for depth perception and realism in optical see-through augmented reality (OST-AR). However, existing OST-AR systems additively overlay physical and digital content with beam combiners – an approach that does not easily support mutual occlusion, resulting in virtual objects that appear semi-transparent and unrealistic. In this work, we propose a new type of occlusion-capable OST-AR system. Rather than additively combining the real and virtual worlds, we employ a single digital micromirror device (DMD) to merge the respective light paths in a multiplicative manner. This unique approach allows us to simultaneously block light incident from the physical scene on a pixel-by-pixel basis while also modulating the light emitted by a light-emitting diode (LED) to display digital content. Our technique builds on mixed binary/continuous factorization algorithms to optimize time-multiplexed binary DMD patterns and their corresponding LED colors to approximate a target augmented reality (AR) scene. In simulations and with a prototype benchtop display, we demonstrate hard-edge occlusions, plausible shadows, and also gaze-contingent optimization of this novel display mode, which only requires a single spatial light modulator.

Digital Object Identifier: 10.1109/TVCG.2020.2973443

1880 The Security-Utility Trade-off for Iris Authentication and Eye Animation for Social Virtual Avatars

Brendan John, University of Florida

Sophie Jörg, Clemson University

Sanjeev Koppal, University of Florida

Eakta Jain, University of Florida

The gaze behavior of virtual avatars is critical to social presence and perceived eye contact during social interactions in Virtual Reality. Virtual Reality headsets are being designed with integrated eye tracking to enable compelling virtual social interactions. This paper shows that the near infra-red cameras used in eye tracking capture eye images that contain iris patterns of the user. Because iris patterns are a gold standard biometric, the current technology places the user's biometric identity at risk. Our first contribution is an optical defocus based hardware solution to remove the iris biometric from the stream of eye tracking images. We characterize the performance of this solution with different internal parameters. Our second contribution is a psychophysical experiment with a same-different task that investigates the sensitivity of users to a virtual avatar's eye movements when this solution is applied. By deriving detection threshold values, our findings provide a range of defocus parameters where the change in eye movements would go unnoticed in a conversational setting. Our third contribution is a perceptual study to determine the impact of defocus parameters on the perceived eye contact, attentiveness, naturalness, and truthfulness of the avatar. Thus, if a user wishes to protect their iris biometric, our approach provides a solution that balances biometric protection while preventing their conversation partner from perceiving a difference in the user's virtual avatar. This work is the first to develop secure eye tracking configurations for VR/AR/XR applications and motivates future work in the area.

Digital Object Identifier: 10.1109/TVCG.2020.2973052

1891 3D Hand Tracking in the Presence of Excessive Motion Blur

Gabyong Park, KAIST, Republic of Korea

Antonis Argyros, University of Crete and FORTH, Greece

Juyoung Lee, KAIST, Republic of Korea

Woontack Woo, KAIST, Republic of Korea

Supplemental Material: Video

We present a sensor-fusion method that exploits a depth camera and a gyroscope to track the articulation of a hand in the presence of excessive motion blur. In the case of slow and smooth hand motions, existing methods estimate the hand pose fairly accurately and robustly, despite challenges due to the high dimensionality of the problem, self-occlusions, uniform appearance of hand parts, etc. However, the accuracy of hand pose estimation drops considerably for fast-moving hands because the depth image is severely distorted due to motion blur. Moreover, when hands move fast, the actual hand pose is far from the one estimated in the previous frame; therefore, the assumption of temporal continuity on which tracking methods rely is not valid. In this paper, we track fast-moving hands with the combination of a gyroscope and a depth camera. As a first step, we calibrate a depth camera and a gyroscope attached to a hand so as to identify their time and pose offsets. Following that, we fuse the rotation information of the calibrated gyroscope with model-based hierarchical particle filter tracking. A series of quantitative and qualitative experiments demonstrates that the proposed method performs more accurately and robustly in the presence of motion blur than state-of-the-art algorithms, especially in the case of very fast hand rotations.

Digital Object Identifier: 10.1109/TVCG.2020.2973057

1902 DGaze: CNN-Based Gaze Prediction in Dynamic Scenes

Zhiming Hu, Peking University, China

Sheng Li, Peking University, China

Congyi Zhang, The University of Hong Kong, China

Kangrui Yi, Peking University, China

Guoping Wang, Peking University, China

Dinesh Manocha, University of Maryland, USA

Supplemental Material: Video

We conduct novel analyses of users' gaze behaviors in dynamic virtual scenes and, based on our analyses, we present a novel CNN-based model called DGaze for gaze prediction in HMD-based applications. We first collect 43 users' eye tracking data in 5 dynamic scenes under free-viewing conditions. Next, we perform statistical analysis of our data and observe that dynamic object positions, head rotation velocities, and salient regions are correlated with users' gaze positions. Based on our analysis, we present a CNN-based model (DGaze) that combines object position sequence, head velocity sequence, and saliency features to predict users' gaze positions. Our model can be applied to predict not only realtime gaze positions but also gaze positions in the near future and can achieve better performance than prior method. In terms of realtime prediction, DGaze achieves a 22.0% improvement over prior method in dynamic scenes and obtains an improvement of 9.5% in static scenes, based on using the angular distance as the evaluation metric. We also propose a variant of our model called DGaze_ET that can be used to predict future gaze positions with higher precision by combining accurate past gaze data gathered using an eye tracker. We further analyze our CNN architecture and verify the effectiveness of each component in our model. We apply DGaze to gaze-contingent rendering and a game, and also present the evaluation results from a user study.
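The abstract above evaluates gaze prediction by angular distance. As a rough illustration, the sketch below computes the angle between a predicted and a measured gaze direction, and the mean over a sequence. It assumes gaze is given as 3D direction vectors; the paper's exact formulation is not stated here, and the function names are hypothetical.

```python
import math


def angular_distance_deg(a, b):
    """Angle in degrees between two non-zero 3D direction vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(predicted, measured):
    """Mean angular distance over paired sequences of gaze directions."""
    errors = [angular_distance_deg(p, m) for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors)


predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
measured = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
print(mean_angular_error_deg(predicted, measured))
```

A lower mean angular error indicates predictions closer to the measured gaze; clamping the cosine avoids a domain error when two vectors are numerically parallel.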

Digital Object Identifier: 10.1109/TVCG.2020.2973473

1912 Superhuman Hearing - Virtual Prototyping of Artificial Hearing: A Case Study on Interactions and Acoustic Beamforming

Michele Geronazzo, Aalborg University, Denmark

Luis S. Vieira, Khora VR, Denmark

Niels Christian Nilsson, Aalborg University, Denmark

Jesper Udesen, GNAudio A/S, Ballerup, Denmark

Stefania Serafin, Aalborg University, Denmark

Directivity and gain in microphone array systems for hearing aids or hearable devices allow users to acoustically enhance the information of a source of interest. This source is usually positioned directly in front. This feature is called acoustic beamforming. The current study aimed to improve users' interactions with beamforming via a virtual prototyping approach in immersive virtual environments (VEs). Eighteen participants took part in experimental sessions composed of a calibration procedure and a selective auditory attention voice-pairing task. Eight concurrent speakers were placed in an anechoic environment in two virtual reality (VR) scenarios. The scenarios were a purely virtual scenario and a realistic 360 degrees audio-visual recording. Participants were asked to find an individual optimal parameterization for three different virtual beamformers: (i) head-guided, (ii) eye gaze-guided, and (iii) a novel interaction technique called dual beamformer, where head-guided is combined with an additional hand-guided beamformer. None of the participants were able to complete the task without a virtual beamformer (i.e., in normal hearing condition) due to the high complexity introduced by the experimental design. However, participants were able to correctly pair all speakers using all three proposed interaction metaphors. Providing superhuman hearing abilities in the form of a dual acoustic beamformer guided by head and hand movements resulted in statistically significant improvements in terms of pairing time, suggesting the task-relevance of interacting with multiple points of interests.

Digital Object Identifier: 10.1109/TVCG.2020.2973059

1923 Augmented Virtual Teleportation for High-Fidelity Telecollaboration

Taehyun Rhee, Victoria University of Wellington, New Zealand

Stephen Thompson, Victoria University of Wellington, New Zealand

Daniel Medeiros, Victoria University of Wellington, New Zealand

Rafael dos Anjos, Victoria University of Wellington, New Zealand

Andrew Chalmers, Computational Media Innovation Centre, New Zealand

Supplemental Material: Video

Telecollaboration involves the teleportation of a remote collaborator to another real-world environment where their partner is located. The fidelity of the environment plays an important role for allowing corresponding spatial references in remote collaboration. We present a novel asymmetric platform, Augmented Virtual Teleportation (AVT), which provides high-fidelity telepresence of a remote VR user (VR-Traveler) into a real-world collaboration space to interact with a local AR user (AR-Host). AVT uses a 360° video camera (360-camera) that captures and live-streams the omni-directional scenes over a network. The remote VR-Traveler watching the video in a VR headset experiences live presence and co-presence in the real-world collaboration space. The VR-Traveler’s movements are captured and transmitted to a 3D avatar overlaid onto the 360-camera which can be seen in the AR-Host’s display. The visual and audio cues for each collaborator are synchronized in the Mixed Reality Collaboration space (MRC-space), where they can interactively edit virtual objects and collaborate in the real environment using the real objects as a reference. High fidelity, real-time rendering of virtual objects and seamless blending into the real scene allows for unique mixed reality use-case scenarios. Our working prototype has been tested with a user study to evaluate spatial presence, co-presence, and user satisfaction during telecollaboration. Possible applications of AVT are identified and proposed to guide future usage.

Digital Object Identifier: 10.1109/TVCG.2020.2973065

1934 Effects of Depth Information on Visual Target Identification Task Performance in Shared Gaze Environments

Austin Erickson, University of Central Florida

Nahal Norouzi, University of Central Florida

Kangsoo Kim, University of Central Florida

Joseph J. LaViola Jr., University of Central Florida

Gerd Bruder, University of Central Florida

Gregory F. Welch, University of Central Florida

Augmented reality (AR) setups have the capability of facilitating collaboration for collocated and remote users by augmenting and sharing their virtual points of interest in each user's physical space. With gaze being an important communication cue during human interaction, augmenting the physical space with each user's focus of attention through different visualizations such as ray, frustum, and cursor has been studied in the past to enhance the quality of interaction. Understanding each user's focus of attention is susceptible to error since it has to rely on both the user's gaze and depth information of the target to compute the endpoint of the user's gaze. Such information is computed by eye trackers and depth cameras respectively, which introduces two sources of errors into the shared gaze experience. Depending on the amount of error and type of visualization, the augmented gaze can negatively mislead a user's attention during their collaboration instead of enhancing the interaction. In this paper, we present a human-subjects study to understand the effects of eye tracking errors, depth camera accuracy errors, and gaze visualization on users' performance and subjective experience during a collaborative task with a virtual human partner, where users were asked to identify a target within a dynamic crowd. We simulate seven different levels of eye tracking error as a horizontal offset to the intended gaze point and seven different levels of depth accuracy errors that make the gaze point appear in front of or behind the intended gaze point. In addition, we examine four different visualization styles for shared gaze information, including an extended ray that passes through the target and extends to a fixed length, a truncated ray that halts upon reaching the target gaze point, a cursor visualization that appears at the target gaze point, as well as a combination of both cursor and truncated ray display modes.

Digital Object Identifier: 10.1109/TVCG.2020.2973054

1945 Mind the Gap: The Underrepresentation of Female Participants and Authors in Virtual Reality Research

Tabitha C. Peck, Davidson College

Laura E. Sockol, Davidson College

Sarah M. Hancock, Davidson College

A common goal of human-subject experiments in virtual reality (VR) research is evaluating VR hardware and software for use by the general public. A core principle of human-subject research is that the sample included in a given study should be representative of the target population; otherwise, the conclusions drawn from the findings may be biased and may not generalize to the population of interest. In order to assess whether characteristics of participants in VR research are representative of the general public, we investigated participant demographic characteristics from human-subject experiments in the Proceedings of the IEEE Virtual Reality Conferences from 2015-2019. We also assessed the representation of female authors. In the 325 eligible manuscripts, which presented results from 365 human-subject experiments, we found evidence of significant underrepresentation of women as both participants and authors. To investigate whether this underrepresentation may bias researchers’ findings, we then conducted a meta analysis and meta-regression to assess whether demographic characteristics of study participants were associated with a common outcome evaluated in VR research: the change in simulator sickness following head-mounted display VR exposure. As expected, participants in VR studies using HMDs experienced small but significant increases in simulator sickness. However, across the included studies, the change in simulator sickness was systematically associated with the proportion of female participants. We discuss the negative implications of conducting experiments on non-representative samples and provide methodological recommendations for mitigating bias in future VR research.

Digital Object Identifier: 10.1109/TVCG.2020.2973498

1955 A Steering Algorithm for Redirected Walking Using Reinforcement Learning

Ryan R. Strauss, Davidson College

Raghuram Ramanujan, Davidson College

Andrew Becker, Bank of America

Tabitha C. Peck, Davidson College

Redirected Walking (RDW) steering algorithms have traditionally relied on human-engineered logic. However, recent advances in reinforcement learning (RL) have produced systems that surpass human performance on a variety of control tasks. This paper investigates the potential of using RL to develop a novel reactive steering algorithm for RDW. Our approach uses RL to train a deep neural network that directly prescribes the rotation, translation, and curvature gains to transform a virtual environment given a user’s position and orientation in the tracked space. We compare our learned algorithm to steer-to-center using simulated and real paths. We found that our algorithm outperforms steer-to-center on simulated paths, and found no significant difference on distance traveled on real paths. We demonstrate that when modeled as a continuous control problem, RDW is a suitable domain for RL, and moving forward, our general framework provides a promising path towards an optimal RDW steering algorithm.

Digital Object Identifier: 10.1109/TVCG.2020.2973060

1964 The Impact of a Self-Avatar, Hand Collocation, and Hand Proximity on Embodiment and Stroop Interference

Tabitha C. Peck, Davidson College

Altan Tutar, Davidson College

Understanding the effects of hand proximity to objects and tasks is critical for hand-held and near-hand objects. Even though self-avatars have been shown to be beneficial for various tasks in virtual environments, little research has investigated the effect of avatar hand proximity on working memory. This paper presents a between-participants user study investigating the effects of self-avatars and physical hand proximity on a common working memory task, the Stroop interference task. Results show that participants felt embodied when a self-avatar was in the scene, and that the subjective level of embodiment decreased when a participant’s hands were not collocated with the avatar’s hands. Furthermore, a participant’s physical hand placement was significantly related to Stroop interference: proximal hands produced a significant increase in accuracy compared to non-proximal hands. Surprisingly, Stroop interference was not mediated by the existence of a self-avatar or level of embodiment.

Digital Object Identifier: 10.1109/TVCG.2020.2973061

1972 Eye-dominance-guided Foveated Rendering

Xiaoxu Meng, University of Maryland, College Park

Ruofei Du, Google LLC

Amitabh Varshney, University of Maryland, College Park

Supplemental Material: Supplemental PDF

Optimizing rendering performance is critical for a wide variety of virtual reality (VR) applications. Foveated rendering is emerging as an indispensable technique for reconciling interactive frame rates with ever-higher head-mounted display resolutions. Here, we present a simple yet effective technique for further reducing the cost of foveated rendering by leveraging ocular dominance – the tendency of the human visual system to prefer scene perception from one eye over the other. Our new approach, eye-dominance-guided foveated rendering (EFR), renders the scene at a lower foveation level (with higher detail) for the dominant eye than the non-dominant eye. Compared with traditional foveated rendering, EFR can be expected to provide superior rendering performance while preserving the same level of perceived visual quality.

Digital Object Identifier: 10.1109/TVCG.2020.2973442

1981 ThinVR: Heterogeneous Microlens Arrays for Compact, 180 degree FOV VR Near-Eye Displays

Joshua Ratcliff, Intel Labs, USA

Alexey Supikov, Intel Labs, USA

Santiago Alfaro, Intel Labs, USA

Ronald Azuma, Intel Labs, USA

Supplemental Material: Video

Supplemental Material: Supplemental Material

Today’s Virtual Reality (VR) displays are dramatically better than the head-worn displays offered 30 years ago, but today’s displays remain nearly as bulky as their predecessors in the 1980’s. Also, almost all consumer VR displays today provide 90-110 degrees field of view (FOV), which is much smaller than the human visual system’s FOV which extends beyond 180 degrees horizontally. In this paper, we propose ThinVR as a new approach to simultaneously address the bulk and limited FOV of head-worn VR displays. ThinVR enables a head-worn VR display to provide 180 degrees horizontal FOV in a thin, compact form factor. Our approach is to replace traditional large optics with a curved microlens array of custom-designed heterogeneous lenslets and place these in front of a curved display. We found that heterogeneous optics were crucial to make this approach work, since over a wide FOV, many lenslets are viewed off the central axis. We developed a custom optimizer for designing custom heterogeneous lenslets to ensure a sufficient eyebox while reducing distortions. The contribution includes an analysis of the design space for curved microlens arrays, implementation of physical prototypes, and an assessment of the image quality, eyebox, FOV, reduction in volume and pupil swim distortion. To our knowledge, this is the first work to demonstrate and analyze the potential for curved, heterogeneous microlens arrays to enable compact, wide FOV head-worn VR displays.

Digital Object Identifier: 10.1109/TVCG.2020.2973064

1991 Scene-Aware Audio Rendering via Deep Acoustic Analysis

Zhenyu Tang, University of Maryland, USA

Nicholas J. Bryan, Adobe Research

Dingzeyu Li, Adobe Research

Timothy R. Langlois, Adobe Research

Dinesh Manocha, University of Maryland, USA

Supplemental Material: Video

We present a new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar sounding sources with virtual models. Given the captured audio and an approximate geometric model of a real-world room, we present a novel learning-based method to estimate its acoustic material properties. Our approach is based on deep neural networks that estimate the reverberation time and equalization of the room from recorded audio. These estimates are used to compute material properties related to room reverberation using a novel material optimization objective. We use the estimated acoustic material characteristics for audio rendering using interactive geometric sound propagation and highlight the performance on many real-world scenarios. We also perform a user study to evaluate the perceptual similarity between the recorded sounds and our rendered audio.

Digital Object Identifier: 10.1109/TVCG.2020.2973058

2002 Physically-inspired Deep Light Estimation from a Homogeneous-Material Object for Mixed Reality Lighting

Jinwoo Park, KAIST, Republic of Korea

Hunmin Park, KAIST, Republic of Korea

Sung-eui Yoon, KAIST, Republic of Korea

Woontack Woo, KAIST, Republic of Korea

Supplemental Material: Supplemental PDF

Supplemental Material: Video

In mixed reality (MR), augmenting virtual objects consistently with real-world illumination is one of the key factors that provide a realistic and immersive user experience. For this purpose, we propose a novel deep learning-based method to estimate high dynamic range (HDR) illumination from a single RGB image of a reference object. To obtain illumination of a current scene, previous approaches inserted a special camera in that scene, which may interfere with user’s immersion, or they analyzed reflected radiances from a passive light probe with a specific type of materials or a known shape. The proposed method does not require any additional gadgets or strong prior cues, and aims to predict illumination from a single image of an observed object with a wide range of homogeneous materials and shapes. To effectively solve this ill-posed inverse rendering problem, three sequential deep neural networks are employed based on a physically-inspired design. These networks perform end-to-end regression to gradually decrease dependency on the material and shape. To cover various conditions, the proposed networks are trained on a large synthetic dataset generated by physically-based rendering. Finally, the reconstructed HDR illumination enables realistic image-based lighting of virtual objects in MR. Experimental results demonstrate the effectiveness of this approach compared against state-of-the-art methods. The paper also suggests some interesting MR applications in indoor and outdoor scenes.

Digital Object Identifier: 10.1109/TVCG.2020.2973050

2012 Live Semantic 3D Perception for Immersive Augmented Reality

Lei Han, HKUST, Hong Kong, China

Tian Zheng, Tsinghua University, China

Yinheng Zhu, Tsinghua University, China

Lan Xu, Tsinghua University, China

Lu Fang, Tsinghua University, China

Supplemental Material: Video

Semantic understanding of 3D environments is critical for both unmanned systems and human-involved virtual/augmented reality (VR/AR) immersive experiences. Spatially-sparse convolution, taking advantage of the intrinsic sparsity of 3D point cloud data, makes high resolution 3D convolutional neural networks tractable with state-of-the-art results on 3D semantic segmentation problems. However, the exhaustive computation limits the practical usage of semantic 3D perception for VR/AR applications in portable devices. In this paper, we identify that the efficiency bottleneck lies in the unorganized memory access of the sparse convolution steps, i.e., the points are stored independently based on a predefined dictionary, which is inefficient due to the limited memory bandwidth of parallel computing devices (GPUs). With the insight that points are continuous as 2D surfaces in 3D space, a chunk-based sparse convolution scheme is proposed to reuse neighboring points within each spatially organized chunk. An efficient multi-layer adaptive fusion module is further proposed for employing the spatial consistency cue of 3D data to further reduce the computational burden. Quantitative experiments on public datasets demonstrate that our approach works 11× faster than previous approaches with competitive accuracy. By implementing both semantic and geometric 3D reconstruction simultaneously on a portable tablet device, we demonstrate a foundation platform for immersive AR applications.

Digital Object Identifier: 10.1109/TVCG.2020.2973477

2023Using Facial Animation to Increase the Enfacement Illusion and Avatar Self-Identification

Mar Gonzalez-Franco, Microsoft Research

Anthony Steed, Microsoft Research

Steve Hoogendyk, Microsoft Research

Eyal Ofek, Microsoft Research

Supplemental Material: Video

Through avatar embodiment in Virtual Reality (VR) we can achieve the illusion that an avatar is substituting our body: the avatar moves as we move and we see it from a first person perspective. However, self-identification, the process of identifying a representation as being oneself, poses new challenges because a key determinant is that we see and have agency in our own face. Providing control over the face is hard with current HMD technologies because face tracking is either cumbersome or error prone. However, limited animation is easily achieved based on speaking. We investigate the level of avatar enfacement, that is, believing that a picture of a face is one’s own face, with three levels of facial animation: (i) one in which the facial expressions of the avatars are static, (ii) one in which we implement a synchronous lip motion, and (iii) one in which the avatar presents lip-sync plus additional facial animations, with blinks, designed by a professional animator. We measure self-identification using a face morphing tool that morphs from the face of the participant to the face of a gender-matched avatar. We find that self-identification on avatars can be increased through pre-baked animations even when these are not photorealistic and do not look like the participant.

Digital Object Identifier: 10.1109/TVCG.2020.2973075

2030FibAR: Embedding Optical Fibers in 3D Printed Objects for Active Markers in Dynamic Projection Mapping

Daiki Tone, Osaka University, Japan

Daisuke Iwai, Osaka University, Japan

Shinsaku Hiura, University of Hyogo, Japan

Kosuke Sato, Osaka University, Japan

Supplemental Material: Video

This paper presents a novel active marker for dynamic projection mapping (PM) that emits a temporal blinking pattern of infrared (IR) light representing its ID. We used a multi-material three-dimensional (3D) printer to fabricate a projection object with optical fibers that can guide IR light from LEDs attached to the bottom of the object. The aperture of an optical fiber is typically very small; thus, it is unnoticeable to human observers under projection and can be placed on a strongly curved part of a projection surface. In addition, the working range of our system can be larger than that of previous marker-based methods, as the blinking patterns can theoretically be recognized by a camera placed at a wide range of distances from the markers. We propose an automatic marker placement algorithm to spread multiple active markers over the surface of a projection object such that its pose can be robustly estimated using captured images from arbitrary directions. We also propose an optimization framework for determining the routes of the optical fibers in such a way that collisions of the fibers can be avoided while minimizing the loss of light intensity in the fibers. Through experiments conducted using three fabricated objects containing strongly curved surfaces, we confirmed that the proposed method can achieve accurate dynamic PMs in a significantly wide working range.

Digital Object Identifier: 10.1109/TVCG.2020.2973444

2041On Motor Performance in Virtual 3D Object Manipulation

Alexander Kulik, Virtual Reality and Visualization Research, Bauhaus-Universität Weimar, Germany

André Kunert, Virtual Reality and Visualization Research, Bauhaus-Universität Weimar, Germany

Bernd Froehlich, Virtual Reality and Visualization Research, Bauhaus-Universität Weimar, Germany

Fitts’s law facilitates approximate comparisons of target acquisition performance across a variety of settings. Conceptually, the index of difficulty of 3D object manipulation with six degrees of freedom can also be computed, which allows the comparison of results from different studies. Prior experiments, however, often revealed much worse performance than one would reasonably expect on this basis. We argue that this discrepancy stems from confounding variables and show how Fitts’s law and related research methods can be applied to isolate and identify relevant factors of motor performance in 3D manipulation tasks. The results of a formal user study (N=21) demonstrate competitive performance in compliance with Fitts’s model and provide empirical evidence that simultaneous 3D rotation and translation can be beneficial.
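
For readers unfamiliar with the model, the sketch below computes the index of difficulty in the widely used Shannon formulation, ID = log2(D / W + 1), and the predicted movement time MT = a + b * ID, for a simple one-dimensional pointing task. It is not the six-degree-of-freedom formulation discussed in the paper, and the default coefficients `a` and `b` are placeholder values chosen only for illustration.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation of Fitts's index of difficulty, in bits."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return math.log2(distance / width + 1.0)


def movement_time(distance: float, width: float,
                  a: float = 0.1, b: float = 0.15) -> float:
    """Predicted movement time MT = a + b * ID.

    a (seconds) and b (seconds per bit) are placeholder coefficients;
    in practice they are fitted by regression to measured data.
    """
    return a + b * index_of_difficulty(distance, width)
```

A target at distance 7 with width 1 (same units) has an index of difficulty of log2(8) = 3 bits.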

Digital Object Identifier: 10.1109/TVCG.2020.2973034

2051IlluminatedFocus: Vision Augmentation using Spatial Defocusing via Focal Sweep Eyeglasses and High-Speed Projector

Tatsuyuki Ueda, Osaka University, Japan

Daisuke Iwai, Osaka University, Japan

Takefumi Hiraki, Osaka University, Japan

Kosuke Sato, Osaka University, Japan

Supplemental Material: Video

Aiming at realizing novel vision augmentation experiences, this paper proposes the IlluminatedFocus technique, which spatially defocuses real-world appearances regardless of the distance from the user’s eyes to observed real objects. With the proposed technique, a part of a real object in an image appears blurred, while the fine details of the other part at the same distance remain visible. We apply Electrically Focus-Tunable Lenses (ETL) as eyeglasses and a synchronized high-speed projector as illumination for a real scene. We periodically modulate the focal lengths of the glasses (focal sweep) at more than 60 Hz so that a wearer cannot perceive the modulation. A part of the scene to appear focused is illuminated by the projector when it is in focus of the user’s eyes, while another part to appear blurred is illuminated when it is out of the focus. As the basis of our spatial focus control, we build mathematical models to predict the range of distance from the ETL within which real objects become blurred on the retina of a user. Based on the blur range, we discuss a design guideline for effective illumination timing and focal sweep range. We also model the apparent size of a real scene altered by the focal length modulation. This leads to an undesirable visible seam between focused and blurred areas. We solve this unique problem by gradually blending the two areas. Finally, we demonstrate the feasibility of our proposal by implementing various vision augmentation applications.

Digital Object Identifier: 10.1109/TVCG.2020.2973496

2062Avatar and Sense of Embodiment: Studying the Relative Preference Between Appearance, Control and Point of View

Rebecca Fribourg, Inria, Rennes, France

Ferran Argelaguet, Inria, Rennes, France

Anatole Lécuyer, Inria, Rennes, France

Ludovic Hoyet, Inria, Rennes, France

Supplemental Material: Supplemental PDF

Supplemental Material: Video

Digital Object Identifier: 10.1109/TVCG.2020.2973077

2073Animals in Virtual Environments

Hemal Naik, Max Planck Institute of Animal Behavior, Centre for the Advanced Study of Collective Behaviour, University of Konstanz, Technische Universitaet Muenchen

Renaud Bastien, Max Planck Institute of Animal Behavior, Centre for the Advanced Study of Collective Behaviour, University of Konstanz

Nassir Navab, Technische Universitaet Muenchen

Iain Couzin, Max Planck Institute of Animal Behavior, Centre for the Advanced Study of Collective Behaviour, University of Konstanz

Supplemental Material: Supplemental PDF

The core idea in an XR (VR/MR/AR) application is to digitally stimulate one or more sensory organs (e.g., visual, auditory, and olfactory) of the user in an interactive way to achieve an immersive experience. Since the early 2000s, biologists have been using Virtual Environments (VEs) to investigate the mechanisms of behavior in non-human animals, including insects, fish, and mammals. VEs have become reliable tools for studying vision, cognition, and sensory-motor control in animals. In turn, the knowledge gained from studying such behaviors can be harnessed by researchers designing biologically inspired robots, smart sensors, and multi-agent artificial intelligence. VEs for animals are becoming a widely used application of XR technology, but such applications have not previously been reported in the technical literature related to XR. Biologists and computer scientists can benefit greatly from deepening interdisciplinary research in this emerging field, and together we can develop new methods for conducting fundamental research in behavioral sciences and engineering. To support our argument, we present this review, which provides an overview of animal behavior experiments conducted in virtual environments.

Digital Object Identifier: 10.1109/TVCG.2020.2973063

2084EarVR: Using Ear Haptics in Virtual Reality for Deaf and Hard-of-Hearing People

Mohammadreza Mirzaei, Vienna University of Technology, Austria

Peter Kán, Vienna University of Technology, Austria

Hannes Kaufmann, Vienna University of Technology, Austria

Virtual Reality (VR) has great potential to improve the skills of Deaf and Hard-of-Hearing (DHH) people. Most VR applications and devices are designed for persons without hearing problems. Therefore, DHH persons have many limitations when using VR. Adding special features to a VR environment, such as subtitles or haptic devices, will help them. Previously, it was necessary to design a special VR environment for DHH persons. We introduce and evaluate a new prototype called “EarVR” that can be mounted on any desktop or mobile VR Head-Mounted Display (HMD). EarVR analyzes 3D sounds in a VR environment and locates the direction of the sound source that is closest to a user. It notifies the user about the sound direction using two vibro-motors placed on the user’s ears. EarVR helps DHH persons to complete sound-based VR tasks in any VR application with 3D audio and a mute option for background music. Therefore, DHH persons can use all VR applications with 3D audio, not only those applications designed for them. Our user study shows that DHH participants were able to complete a simple VR task significantly faster with EarVR than without. The completion time of DHH participants was very close to that of participants without hearing problems. The study also shows that DHH participants were able to finish a complex VR task with EarVR, while without it, they could not finish the task even once. Finally, our qualitative and quantitative evaluation among DHH participants indicates that they preferred to use EarVR and that it encouraged them to use VR technology more.
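
The abstract does not say how the sound direction is mapped to the two vibro-motors. The sketch below is one possible reading, under stated assumptions: it picks the closest source on the horizontal plane and signals the left or right motor depending on which side of the head the source lies. The coordinate convention (yaw 0 faces +z, positive yaw turns right), the function names, and the `dead_zone` parameter are all hypothetical.

```python
import math


def nearest_source(listener, sources):
    """Return the (x, z) source position closest to the listener."""
    return min(sources, key=lambda s: math.dist(listener, s))


def motor_for_source(listener, yaw_rad, source, dead_zone=1e-6):
    """Pick which ear motor to drive: 'left', 'right', or 'both'.

    Assumed convention: positions are (x, z) on the horizontal plane,
    yaw 0 faces +z, and positive yaw turns the head to the right.
    """
    # Unit vector pointing out of the listener's right ear.
    right = (math.cos(yaw_rad), -math.sin(yaw_rad))
    to_source = (source[0] - listener[0], source[1] - listener[1])
    # Signed lateral offset: positive means the source is on the right.
    lateral = to_source[0] * right[0] + to_source[1] * right[1]
    if lateral > dead_zone:
        return "right"
    if lateral < -dead_zone:
        return "left"
    return "both"  # source is straight ahead or directly behind
```

A real device would also need to distinguish front from back and to handle several simultaneous sources, neither of which this sketch attempts.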

Digital Object Identifier: 10.1109/TVCG.2020.2973441

2094Pseudo-Haptic Display of Mass and Mass Distribution During Object Rotation in Virtual Reality

Run Yu, Virginia Tech, USA

Doug Bowman, Virginia Tech, USA

We propose and evaluate novel pseudo-haptic techniques to display mass and mass distribution for proxy-based object manipulation in virtual reality. These techniques are specifically designed to generate haptic effects during the object’s rotation. They rely on manipulating the mapping between visual cues of motion and kinesthetic cues of force to generate a sense of heaviness, which alters the perception of the object’s mass-related properties without changing the physical proxy. First, we present a technique to display an object’s mass by scaling its rotational motion relative to its mass. A psychophysical experiment demonstrates that this technique effectively generates correct perceptions of relative mass between two virtual objects. We then present two pseudo-haptic techniques designed to display an object’s mass distribution. One of them relies on manipulating the pivot point of rotation, while the other adjusts rotational motion based on the real-time dynamics of the moving object. An empirical study shows that both techniques can influence perception of mass distribution, with the second technique being significantly more effective.
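
To make the idea of scaling rotational motion relative to mass concrete, the sketch below applies a simple control-display gain: the heavier the virtual object, the smaller the visual rotation shown for a given rotation of the physical proxy. The inverse-proportional mapping, the clamping limits, and all names are assumptions made for illustration; the abstract does not specify the mapping the authors used.

```python
def displayed_rotation(proxy_delta_deg: float, mass: float,
                       reference_mass: float = 1.0,
                       min_gain: float = 0.2,
                       max_gain: float = 1.0) -> float:
    """Scale the proxy's rotation step by a mass-dependent gain.

    Hypothetical mapping: gain = reference_mass / mass, clamped to
    [min_gain, max_gain] so the visual object never freezes or overshoots.
    """
    if mass <= 0 or reference_mass <= 0:
        raise ValueError("masses must be positive")
    gain = reference_mass / mass
    gain = max(min_gain, min(max_gain, gain))
    return proxy_delta_deg * gain
```

With the defaults, an object twice the reference mass rotates half as far on screen as the proxy does in the hand.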

Digital Object Identifier: 10.1109/TVCG.2020.2973056

2104Immersive Process Model Exploration in Virtual Reality

André Zenner, German Research Center for Artificial Intelligence (DFKI), Germany

Akhmajon Makhsadov, German Research Center for Artificial Intelligence (DFKI), Germany

Sören Klingner, German Research Center for Artificial Intelligence (DFKI), Germany

David Liebemann, German Research Center for Artificial Intelligence (DFKI), Germany

Antonio Krüger, German Research Center for Artificial Intelligence (DFKI), Germany

Supplemental Material: Supplemental Material

Supplemental Material: Video

In many professional domains, relevant processes are documented as abstract process models, such as event-driven process chains (EPCs). EPCs are traditionally visualized as 2D graphs and their size varies with the complexity of the process. While process modeling experts are used to interpreting complex 2D EPCs, in certain scenarios, such as professional training or education, novice users inexperienced in interpreting 2D EPC data also face the challenge of learning and understanding complex process models. To communicate process knowledge in an effective yet motivating and interesting way, we propose a novel virtual reality (VR) interface for non-expert users. Our proposed system turns the exploration of arbitrarily complex EPCs into an interactive and multi-sensory VR experience. It automatically generates a virtual 3D environment from a process model and lets users explore processes through a combination of natural walking and teleportation. Our immersive interface leverages basic gamification in the form of a logical walkthrough mode to motivate users to interact with the virtual process. The generated user experience is entirely novel in the field of immersive data exploration and supported by a combination of visual, auditory, vibrotactile and passive haptic feedback. In a user study with N = 27 novice users, we evaluate the effect of our proposed system on process model understandability and user experience, while comparing it to a traditional 2D interface on a tablet device. The results indicate a tradeoff between efficiency and user interest as assessed by the UEQ novelty subscale, while no significant decrease in model understanding performance was found using the proposed VR interface. Our investigation highlights the potential of multi-sensory VR for less time-critical professional application domains, such as employee training, communication, education, and related scenarios focusing on user interest.

Digital Object Identifier: 10.1109/TVCG.2020.2973476

2115Presence, Mixed Reality, and Risk-Taking Behavior: A Study in Safety Interventions

Sogand Hasanzadeh, Virginia Tech, USA

Nicholas F. Polys, Virginia Tech, USA

Jesus M. de la Garza, Clemson University, USA

Supplemental Material: Video

Immersive environments have been successfully applied to a broad range of safety training in high-risk domains. However, very little research has used these systems to evaluate the risk-taking behavior of construction workers. In this study, we investigated the feasibility and usefulness of providing passive haptics in a mixed-reality environment to capture the risk-taking behavior of workers, identify at-risk workers, and propose injury-prevention interventions to counteract excessive risk-taking and risk-compensatory behavior. Within a mixed-reality environment in a CAVE-like display system, our subjects installed shingles on a (physical) sloped roof of a (virtual) two-story residential building on a morning in a suburban area. Through this controlled, within-subject experimental design, we exposed each subject to three experimental conditions by manipulating the level of safety intervention. Workers’ subjective reports, physiological signals, psychophysical responses, and reactionary behaviors were then considered as promising measures of Presence. The results showed that our mixed-reality environment was a suitable platform for triggering behavioral changes under different experimental conditions and for evaluating the risk perception and risk-taking behavior of workers in a risk-free setting. These results demonstrated the value of immersive technology to investigate natural human factors.

Digital Object Identifier: 10.1109/TVCG.2020.2973055

2126Toward Standardized Classification of Foveated Displays

Josef Spjut, NVIDIA Corporation

Ben Boudaoud, NVIDIA Corporation

Jonghyun Kim, NVIDIA Corporation

Trey Greer, NVIDIA Corporation

Rachel Albert, NVIDIA Corporation

Michael Stengel, NVIDIA Corporation

Kaan Akşit, NVIDIA Corporation

David Luebke, NVIDIA Corporation

Emergent in the field of head-mounted display design is a desire to leverage the limitations of the human visual system to reduce the computation, communication, and display workload in power- and form-factor-constrained systems. Fundamental to this reduced workload is the ability to match display resolution to the acuity of the human visual system, along with a resulting need to follow the gaze of the eye as it moves, a process referred to as foveation. A display that moves its content along with the eye may be called a Foveated Display, though this term is also commonly used to describe displays with non-uniform resolution that attempt to mimic human visual acuity. We therefore recommend a definition for the term Foveated Display that accepts both of these interpretations. Furthermore, we include a simplified model for human visual Acuity Distribution Functions (ADFs) at various levels of visual acuity across wide fields of view, and propose comparing this ADF with the Resolution Distribution Function of a foveated display to evaluate its resolution at a particular gaze direction. We also provide a taxonomy to allow the field to meaningfully compare and contrast various aspects of foveated displays in a display and optical technology-agnostic manner.
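
The comparison between an acuity distribution and a display's resolution distribution can be sketched as follows. The acuity falloff used here is a common textbook approximation in which resolvable spatial frequency drops as e2 / (e2 + eccentricity); it is not the paper's ADF model. The default peak acuity of 30 cycles per degree corresponds to 20/20 vision, and the value of `e2_deg` is an assumed illustrative constant.

```python
def acuity_cpd(ecc_deg: float, peak_cpd: float = 30.0,
               e2_deg: float = 2.3) -> float:
    """Approximate resolvable spatial frequency (cycles/degree).

    Assumed falloff: acuity halves at an eccentricity of e2_deg degrees.
    """
    return peak_cpd * e2_deg / (e2_deg + abs(ecc_deg))


def display_meets_acuity(display_ppd: float, ecc_deg: float,
                         peak_cpd: float = 30.0,
                         e2_deg: float = 2.3) -> bool:
    """True if the display resolution (pixels/degree) at this eccentricity
    satisfies the Nyquist limit of two pixels per resolvable cycle."""
    return display_ppd >= 2.0 * acuity_cpd(ecc_deg, peak_cpd, e2_deg)
```

Under these assumptions a 20 pixels-per-degree display falls short at the fovea, where about 60 pixels per degree would be needed, but is sufficient at 10 degrees of eccentricity.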

Digital Object Identifier: 10.1109/TVCG.2020.2973053