Surgeon's Magic Wand: A Screen Pointing Interactive Method
Naren Vira, Professor
Department of Mechanical Engineering, Howard University, Washington, D.C.

Shaleen Vira, Student
College of Arts and Science, New York University, New York, N.Y.

Abstract: A novel, non-touch, screen-pointing "magic wand" interface is proposed for a surgeon's use in an environment requiring simultaneous display of several patients' data over a continuous period of time. The magic wand, a passive pointing device, does not have any active energy source within it (as opposed to a laser pointer) and thus cannot easily be detected or identified. The modeling and simulation task is therefore carried out by generating high-resolution color images of the pointer, as viewed by two digital cameras, using a popular three-dimensional (3D) computer graphics and animation program, Studio 3D Max by Discreet. These images are then retrieved for analysis by a Microsoft Visual C++ program developed on the theory of image triangulation. The program outputs the precise coordinates of the surgeon's wand in 3D space along with its projection on a large view screen. The computed pointer projections are compared with the known locations specified in Studio 3D Max for different simulated configurations. High pointing accuracy is achieved: a pointer kept 30 feet away correctly hits the target location within a few inches. This preliminary work will lead to the development of a more complex interactive hand-pointing gesture recognition system and its application to a large viewing display environment.

Keywords: surgeon's wand, screen pointing interface, passive pointer, image processing, hand pointing gesture

1.0 Introduction

Advances in medical research and technology drive the future of medicine, holding the promise of earlier and more accurate diagnosis of disease as well as safer and more effective treatments. Access to information remains crucial for delivering the most advanced care to patients. Despite a revolution in medical technology in the past few decades, access to mission-critical patient data continues to be inadequate in most settings for most physicians. As the health care information technology industry has evolved, information has been divided among numerous disparate systems with no common viewing portal. A physician confronted with the need to make fast, critical patient care decisions is often forced to spend time hunting and gathering clinically relevant data [1]. Data in the form of electronic medical records, digital images, video streams, radiology reports, patient records, and pharmacy documents are available to the medical team via disparate systems. When the clinician must make patient care decisions outside the medical facility, from home, office, or any other remote location, access to information is even more limited. But advances in computer and communication technologies make it possible to craft 21st-century solutions to these problems, bridging the gap between the clinician and vital medical data. Thus, to address the issue of bringing the necessary patient information together in one place, common-viewing large displays are now commercially available [1]. Rapid improvements in CPU performance, storage density, and network bandwidth have provided sufficient bandwidth and computational resources to support high-resolution displays and natural human-computer interaction.
Nowadays, the main bandwidth bottleneck in an interactive computer system occurs in the link between computer and human, not between computer components within the system. Large display devices, such as projectors and flat panels, are rapidly becoming commodity items. Figure 1 shows usage of a large display system at a typical traffic management center. Meanwhile, new display technologies, such as organic light-emitting diodes (OLED), will soon become available at inexpensive prices. They can be attached to almost any kind of surface, allowing unlimited freedom of design for the interiors and exteriors of rooms and buildings. We believe that new display technologies will revolutionize the way we use computers, making us rethink the relationship between information technology and our society. As an example, consider how wall-sized displays enable qualitatively different human-computer interactions than traditional desktop displays.

The research community and design industry have long been interested in interactions with large-format displays, with much of the early research focusing on single whiteboard-sized displays. More recently, the rapidly decreasing cost of projectors has spurred construction of wall-sized displays that tile multiple projectors to form a single virtual image [2-8]. These multiple-projector displays are particularly interesting from an interaction perspective in that the high resolution provided by tiling lets users view high-quality images even when they are up close to the display. The scenario of interaction with a large-scale display system spanning an entire wall is completely different from single-user desktop applications. Specifically, one cannot rely on traditional control mechanisms using keyboard and mouse. Laser pointers are commonly used as pointing devices to indicate specific positions on viewing screens. A laser pointer employs an active energy source, a concentrated photon beam that streams from the device to the nearest physical object, ideally the slide or screen. Accidental pointing can occasionally be hazardous, so laser pointers are restricted in use as an interactive tool. Alternatively, the present research demonstrates the use of a passive device, one that does not require any energy source. However, an external detecting mechanism is required to precisely identify where the pointer is pointing. To achieve this, two high-resolution color cameras and an image triangulation methodology for pointer detection were employed. Figure 2 depicts conceptual utilization of the surgeon's magic wand as an interactive pointing device.

In a standard graphical user interface, keyboard and mouse are the typical input devices used to type commands, write text, and point at and select graphical elements at specific locations of a display. Graphical user interfaces can usually be categorized as two-dimensional (clickable icons represent procedures or pieces of multimedia information) or three-dimensional (3D mouse or joystick) environments. The primary restriction of these interfaces is that users have to sit in front of the computer monitor or view screen, which limits user mobility. Advanced user interfaces, such as those employed in many augmented and virtual reality applications, achieve higher user mobility [9].
The input devices for advanced interfaces can be either wearable devices (gloves, glasses, or body markers) or non-wearable devices (microphones, laser pointers, or vision cameras). Note that non-wearable devices are of the non-contact type. Audio processing and voice recognition software is needed in the case of microphones, whereas vision cameras require image processing software. These devices are non-intrusive and can support natural interaction, as the user can express commands and actions through voice and gestures in the same way as in everyday life. An active research trend in computer vision is the development of robust, environment-independent tracking methodologies for effective human-computer interfaces [10, 11]. Several vision-based interaction approaches have been presented so far. The aim is to derive a semantic interpretation of human hand gestures and facial expressions [12]. Interaction by means of gesture languages recognized by computer vision techniques provides several advantages: (1) there is no mechanical part in the interaction, avoiding the problems caused by degradation of hardware parts; (2) recognition is remote: the user can be located far away from the perceptual and computational apparatus, and no contact interaction is required; and (3) the definition of the interaction language is very flexible: many variations can be defined for the same task in search of an optimal one. Among human-computer interfaces, vision-based hand pointing systems appear to be particularly promising. Because hand pointing is an everyday operation reflecting a specific interest in a specific portion of the visible space, it does not require any a priori skills or training, and it is a perfect candidate for the design of a natural interaction device based on computer vision. Developing such an interface is the focus of the present research. However, we first develop the interface technology using a magic wand as an interactive tool for a large display environment. This initial development can easily be adapted, with minor modification, to a more complex interactive hand pointing interface and gesture recognition system.

Magic wands have a presence in the history and legends of human cultures from thousands of years ago all the way to the present day, and are surrounded by rich systems of belief. Often carried by wizards, these sticks, or in some cases large rods, focus magical strength. Some anthropologists believe that Stone Age cave paintings showing people with sticks are meant to portray leaders of clans holding wands to attest to their power. That is only a guess, but strong evidence goes back at least to the time of ancient Egypt, in which hieroglyphs show priests holding small rods. In Greek mythology, the messenger of the Greek gods carries a special wand called a caduceus. This is a rod with wings, around which two serpents are twisted, meant to signify wisdom and healing powers. Physicians adopted it as their symbol hundreds of years ago and still use it today. This and more historical background on magic wands can be found in Ref. [13]. Magic wands are simple objects that respond to human gesture, speech, emotion, and even thought, and thanks to modern-day books and movies, they are widely understood from an early age as symbols of great empowerment. Many who have experienced these stories may have gained a mental model, or an intuitive sense, for how to use a magic wand, i.e.,
what kinds of gestures can be made with it or what words should be said to cast a spell ("abracadabra!", "hocus pocus!", ...). Considering these factors, the magic wand presents an interesting design opportunity as a form for a tangible computer interface.

This work is also a stepping stone for developing an intelligent non-touch computer screen interface. Visualize two web cameras mounted on top of a computer screen, viewing the computer user. The cameras can track a non-touch passive pointer or the user's finger as it approaches the screen. Once the cameras and associated interface identify the pointing location on the screen, the system can zoom in or out, showing details as the finger moves toward or away from the screen. A simple example would be viewing a geographical map with zoom-in and zoom-out capability. The interface can also pop up or display additional details or information, if needed, in another window at the pointing location. An example of this scenario is the Physician's Dashboard, where medical staff can instantly access a patient's status and all relevant medical data by single-point clicking in the corresponding patient's viewing window.

2.0 Description

The main system components of the human-computer interface for an intelligent interactive environment are shown in Figure 3. The computing system receives input from a pair of color cameras placed so that the user's pointing action can be viewed for most of the work setting (to avoid occlusion). The user holds a passive pointer, or magic wand, coded with two distinct colors as depicted in Figure 4. These colors are chosen for quick image processing and do not compromise the underlying methodology. The pointer will be replaced by the user's finger in future work. The computing system performs image analysis using an image triangulation technique and outputs interaction parameters reflecting the user's status. Based on the computational analysis, the system also sends a specific color beam to a digital projector to identify the point of hand projection. Different color beams are activated on the viewing screen when multiple users are present (distinguishing who is pointing where).

3.0 Analysis

The basic geometric problem to be solved is how to compute the location of interest Pi in Figure 3 from image observations. The intensity of each image pixel in RGB color space for a pair of images can be represented in vector form as

P(I, J) = R(I, J) e1 + G(I, J) e2 + B(I, J) e3    (1)

The symbols I and J stand for pixel coordinates, and e1, e2, and e3 are unit vectors along the R, G, and B color axes, respectively. The terms R(I, J), G(I, J), and B(I, J) represent the red, green, and blue color intensities, respectively. As opposed to a stereo matching algorithm (in which a correspondence is found for every image pixel), here we are only interested in identifying the pixels that correspond to the pointer in one image and their matching pixels in the other image, viewed from the second camera. More precisely, if we mark the pointer's ends with two distinct colors, then only those pixels need to be matched in both images. Without loss of generality, let us say that one end is marked with red and the other with blue. Because we are only interested in matching the pointer's red or blue end pixels in each image, Equation (1) reduces to P(I, J) = R(I, J) for the red-end pixels and P(I, J) = B(I, J) for the blue-end pixels.
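To make this color-end reduction concrete, the sketch below scans an interleaved 8-bit RGB image buffer and collects the pixel coordinates belonging to the red and blue marker ends. The buffer layout, the dominance threshold, and the names findMarkerPixels and PixelIJ are illustrative assumptions; the paper does not list its scanning code.

```cpp
#include <cstdint>
#include <vector>

// A pixel coordinate (I, J) as used in Equation (1).
struct PixelIJ { int i; int j; };

// Classify marker pixels in an interleaved 8-bit RGB buffer.
// 'dominance' is an assumed threshold: a pixel belongs to the red end when
// its R channel exceeds both G and B by at least this margin (and likewise
// for the blue end). The thresholds actually used in the paper are not given.
void findMarkerPixels(const uint8_t* rgb, int width, int height,
                      int dominance,
                      std::vector<PixelIJ>& redEnd,
                      std::vector<PixelIJ>& blueEnd)
{
    for (int j = 0; j < height; ++j) {
        for (int i = 0; i < width; ++i) {
            const uint8_t* px = rgb + 3 * (j * width + i);
            int r = px[0], g = px[1], b = px[2];
            if (r - g >= dominance && r - b >= dominance)
                redEnd.push_back({i, j});   // red-end pixel: P(I,J) ~ R(I,J)
            else if (b - g >= dominance && b - r >= dominance)
                blueEnd.push_back({i, j});  // blue-end pixel: P(I,J) ~ B(I,J)
        }
    }
}
```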
Alternatively, we scan the whole image to identify all pixel coordinates I and J that represent either the red or the blue end of the pointer. From this information, we compute the centroid of each color end; that is, P1(I, J) and P2(I, J) are the centroids of the red color end (one in each image), as shown in Figure 5. The terms centroid and midpoint of the color end are interchangeable because of the two-dimensional coordinate representation. We assume that the centroid points P1(I, J) and P2(I, J) represent the matching points. This assumption is valid because the pointer dimensions are relatively small with respect to the physical dimensions of the room. The implication is that the disparity analysis needed for stereo matching is not required, and the task of finding matching pixels is considerably simplified. The same analysis can be applied to find the matching points corresponding to the blue end of the pointer. It should be emphasized that we deliberately chose two distinct color ends to simplify and speed up the image scanning process. One can choose other pixel matching methods depending upon the application. Knowing the x and y coordinates of each centroid point of the pointer in a single image (after correlating image pixels with the space coordinates), we can mathematically pass a line through these two points to describe the pointer in 2D space. The process of triangulation is then required to compute the three-dimensional coordinates of the pointer from the two images (i.e., from the four centroid points).

3.1 Three-dimensional Triangulation Technique

We apply ray casting analysis to triangulate the three-dimensional coordinates of each image pixel point in space as it is viewed by two cameras with respect to a chosen reference frame. Without loss of generality, the reference frame could be at one of the camera centers; we have chosen the center of camera 2 as the frame of reference. Each ray is cast from the viewpoint (in this case, the center of the camera) through each pixel of the projection plane (in this case, image planes 1 and 2) into the volume dataset. Wherever the two rays intersect in 3D space, they determine the coordinates of a point viewed by both cameras, as shown in Figure 6. By connecting all intersecting points in the volume dataset, we can generate a 3D point cloud floating in space. We use four points at a time (two in each image) to compute the three-dimensional coordinates of a pointer end. Thus, the location of the pointer can be identified in 3D space from the knowledge of its two ends. The computation of a common point from two rays reduces to the intersection of two lines, each radiating from the center of a camera. Each ray line is generated by two points, as shown in Figure 7: one point is the camera center, and the second is the centroidal pixel of the pointer end in the image plane (i.e., P1 or P2 in Figures 5, 6, and 7). Note that P1 and P2 are image-matched points. The point sets (C1, P1) and (C2, P2) lie on ray lines 1 and 2, respectively, in the frame of reference (x, y, z). Since the points P1(I, J) and P2(I, J) are identified by pixel coordinates, they need to be converted into physical space by the transformation

x distance per pixel = [f * tan(half view angle of camera)] / [(image width in pixels) / 2]    (2)

The y distance per pixel can be correlated similarly. Note that f denotes the camera focal length.
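A minimal sketch of the centroid step and the pixel-to-physical conversion of Equation (2) follows. Measuring the pixel offset from the image center and using the same scale factor for x and y (square pixels) are illustrative assumptions, as are the helper names.

```cpp
#include <cmath>
#include <vector>

struct PixelIJ { int i; int j; };
struct Point2D { double x; double y; };

// Centroid (midpoint) of one color end in pixel coordinates, per Section 3.0.
Point2D centroidOf(const std::vector<PixelIJ>& pixels)
{
    if (pixels.empty()) return {0.0, 0.0};      // no marker pixels found
    double sumI = 0.0, sumJ = 0.0;
    for (const PixelIJ& p : pixels) { sumI += p.i; sumJ += p.j; }
    return { sumI / pixels.size(), sumJ / pixels.size() };
}

// Convert a centroid from pixel coordinates to physical image-plane
// coordinates using Equation (2). Offsets are taken from the image center,
// which is an assumption made for illustration.
Point2D pixelToImagePlane(const Point2D& c, double focalLength,
                          double halfViewAngleRad,
                          int imageWidthPx, int imageHeightPx)
{
    // Equation (2): distance per pixel = f * tan(half view angle) / (width / 2)
    double xPerPixel = focalLength * std::tan(halfViewAngleRad) / (imageWidthPx / 2.0);
    double yPerPixel = xPerPixel;   // assumes square pixels; Eq. (2) applied per axis
    return { (c.x - imageWidthPx  / 2.0) * xPerPixel,
             (c.y - imageHeightPx / 2.0) * yPerPixel };
}
```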
Because we are interested in computing the coordinates of the common point P, let us define each point on the lines in the (x, y, z) reference frame as P = x i + y j + z k. That is,

P1 = Px1 i + Py1 j + Pz1 k,  P2 = Px2 i + Py2 j + Pz2 k,
C1 = Cx1 i + Cy1 j + Cz1 k,  C2 = Cx2 i + Cy2 j + Cz2 k    (3)

where i, j, and k are unit vectors along the x, y, and z axes, respectively. With the condition that the four points must be coplanar (when the lines are not skewed), we can write

(C2 - C1) . [(P1 - C1) x (P2 - C2)] = 0    (4)

where the symbols . and x represent the vector dot and cross products, respectively. If s and t are scalar quantities, then the common point P can be expressed parametrically as

P = C1 + s (P1 - C1) = C1 + s A = C2 + t (P2 - C2) = C2 + t B    (5)

Simultaneous solution of equations (5) yields the value of s as

s = {[(C2 - C1) x B] . (A x B)} / |A x B|^2    (5b)

3.2 Accounting for the Camera's Rotations

Six degrees of freedom are required to uniquely describe a point in three-dimensional space; one can choose three linear and three rotational coordinates. Determination of the pointer's position defined by the three linear coordinates (x, y, z) is presented above, whereas the orientation of the pointer specified by three rotations (θ, φ, ψ) is given in this section. Thus the rotational motion of the camera is accounted for in the pointer's position and orientation analysis. Define the camera's axes of rotation as pitch, yaw, and roll about the x, y, and z axes, respectively, as depicted in Figure 8. Using the notation S(angle) = sin(angle) and C(angle) = cos(angle), each axis transformation is given by the rotation matrices

Pitch(θ) = [ 1     0      0    ]
           [ 0    C(θ)  -S(θ)  ]
           [ 0    S(θ)   C(θ)  ]    (6)

Yaw(φ)   = [  C(φ)   0   S(φ)  ]
           [   0     1    0    ]
           [ -S(φ)   0   C(φ)  ]    (7)

Roll(ψ)  = [ C(ψ)  -S(ψ)   0   ]
           [ S(ψ)   C(ψ)   0   ]
           [  0      0     1   ]    (8)

The combined pitch-yaw-roll transformation can be written as

PYR = Pitch(θ) Yaw(φ) Roll(ψ)    (9)

The world coordinates (x, y, z) are thus related to the camera's view coordinates (x', y', z') as

[x  y  z]^T = (PYR)^-1 [x'  y'  z']^T    (10)

Note that the inverse transformation is used to account for camera rotations.

3.3 Point of Projection on a View Screen

Knowing the three-dimensional coordinates of the common point corresponding to each end of the pointer (after triangulation of the red and blue centroids), we can represent the pointer in 3D space by a line passing through these two points. Figure 9 depicts the pointer connecting the red and blue centroidal points Pr and Pb, respectively. The projection of this line onto the plane described by the view screen is of interest. Thus, the problem is now reduced to finding the coordinates of the intersection point between a line and a plane, shown as point Pi in the figure.

Equation of a Plane Describing the Screen

The standard equation of a plane in 3D space is

Ax + By + Cz + D = 0    (11)
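The two geometric computations of Sections 3.1 and 3.3, the common point of the two rays via Equation (5b) and the intersection of the pointer line with the screen plane of Equation (11), can be sketched as follows. The Vec3 type, the function names, and the midpoint convention used when the two rays are slightly skewed are assumptions for illustration, not the paper's own code.

```cpp
#include <cmath>

struct Vec3 {
    double x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator*(double s)      const { return {x * s, y * s, z * s}; }
};
double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

// Common point of the two rays (C1 -> P1) and (C2 -> P2), Equations (5)/(5b):
// s = [(C2 - C1) x B] . (A x B) / |A x B|^2, with A = P1 - C1 and B = P2 - C2.
// The analogous parameter t on ray 2 is computed the same way so the result
// can be taken as the midpoint of the closest-approach segment when the rays
// are slightly skewed; that midpoint convention is an assumption.
Vec3 triangulate(const Vec3& C1, const Vec3& P1, const Vec3& C2, const Vec3& P2)
{
    Vec3 A = P1 - C1;
    Vec3 B = P2 - C2;
    Vec3 AxB = cross(A, B);
    double denom = dot(AxB, AxB);                      // |A x B|^2
    double s = dot(cross(C2 - C1, B), AxB) / denom;    // Equation (5b)
    double t = dot(cross(C2 - C1, A), AxB) / denom;    // same derivation on ray 2
    Vec3 onRay1 = C1 + A * s;
    Vec3 onRay2 = C2 + B * t;
    return (onRay1 + onRay2) * 0.5;
}

// Intersection of the pointer line through the red and blue ends (Pr, Pb)
// with the screen plane Ax + By + Cz + D = 0 of Equation (11).
// Assumes the pointer line is not parallel to the screen plane.
Vec3 projectOntoScreen(const Vec3& Pr, const Vec3& Pb,
                       double A, double B, double C, double D)
{
    Vec3 dir = Pb - Pr;                                // pointing direction
    double u = -(A*Pr.x + B*Pr.y + C*Pr.z + D) /
                (A*dir.x + B*dir.y + C*dir.z);         // line parameter at the plane
    return Pr + dir * u;                               // point of projection Pi
}
```

In use, triangulate would be called once for the two red centroids and once for the two blue centroids, and the resulting 3D end points Pr and Pb passed to projectOntoScreen to obtain the pointing location Pi on the view screen.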