Journal of Young Investigators
    Undergraduate, Peer-Reviewed Science Journal
Volume Two
    RESEARCH ARTICLE
Issue 1, June 1999

Engineering & Applied Sciences
Evaluation of an Optical Human Motion Tracking System

Mark Palatucci
University of Pennsylvania

Abstract

We present an evaluation of the ExpertVision HiRES optical motion capture system. Using a spreadsheet, we performed a statistical analysis on motion data from a female dancer. This analysis provided insight into the major sources of error in the HiRES device. We found the major causes of error to be movement (slippage) of the optical markers and occlusion. We also discuss how this error inhibits the animation of anatomically correct human models.

Introduction

Movement is perhaps the most significant interaction that humans have with their environment. Through movement, people convey messages, manipulate objects, and traverse their surroundings. Virtually all human processes, both internal and external, are the result of some sort of movement. Given this significance, it is no surprise that movement must be an integral part of any computer-generated or virtual environment (VE). Providing these realistic interactions requires a good virtual reality (VR) system, one that captures the motions of its users while simulating the movements of various virtual objects (Matsuba and Roehl, 1996).

VR systems use tracking to capture and produce motion data. Tracking methods include mechanical, ultrasonic, magnetic, and optical systems. While each type has its own advantages and disadvantages, recent advances in technology have made optical capture the method of choice for many animators and developers.

One of the most popular optical motion capture systems is the Motion Analysis Corporation (MAC) ExpertVision HiRES. This system uses multiple cameras to record the motion of actors wearing reflective markers. The included software then analyzes the motion data from the various camera views to produce 3-D coordinates for each marker. These coordinates can then be exported to animation packages such as Alias/Wavefront, SoftImage, Prism, and Nichimen Graphics. This export capability allows HiRES users to produce highly realistic motion in a fraction of the time that key-frame animation would take to produce similar movements.

Another time-saving aspect of the HiRES system is what MAC calls "Accumulation of Digital Resources." This feature allows HiRES users to capture one set of motion data and then apply it to any number of figures regardless of age, size, race, clothing, etc. While many people praise this feature, some criticize it, saying that it may produce movements that are unrealistic for the given character. Others consider it useful in some situations and unhelpful in others.

The debate raises several issues about the overall usefulness of the HiRES motion capture system. These issues include: 1) When should key-frame animation¹ be used over motion capture? When should motion capture be used over key-frame animation? 2) What are the ideal uses of the HiRES system? 3) To what level of realism can the motion data be trusted? What kind of characters can be realistically animated with one set of data? 4) What repercussions are there for animating only a portion of a human figure with the HiRES system?

While this paper analyzes the HiRES device, the discussion is relevant to all reflector-based optical systems. The evaluation first presents an analysis of the error involved in the device. Based on this analysis, we then discuss the issues described above.

Methods

The optical capture device used for this study was the Motion Analysis Corporation ExpertVision HiRES. The system uses a combination of high-speed cameras and reflective markers to capture the motion of a performer. As the performer executes a movement, light from each camera is reflected back by the markers. The cameras filter this light to isolate the marker reflections and record their 2-D image positions, and the software combines the views from several cameras to compute a 3-D location for each marker.
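Combining two or more camera views into one 3-D marker position is usually done by triangulation. As a minimal sketch, assuming each camera has already been calibrated so that a 2-D marker image can be converted into a 3-D ray (a camera center plus a direction), the midpoint method below returns the point halfway between two rays at their closest approach. This is not MAC's algorithm, which the paper does not describe; the function name and the camera positions are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a marker position from two camera rays (midpoint method).

    c1, c2: camera centers; d1, d2: ray directions toward the marker.
    These inputs are assumed to come from a prior calibration step.
    Returns the midpoint of the shortest segment joining the two rays,
    which tolerates rays that miss each other slightly because of noise.
    """
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # distance parameter along ray 1
    t = (a * e - b * d) / denom  # distance parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((x + y) / 2 for x, y in zip(p1, p2))

# Two hypothetical cameras whose rays both pass through a marker at (1, 2, 3).
marker = triangulate_midpoint((0, 0, 0), (1, 2, 3), (5, 0, 0), (-4, 2, 3))
print(marker)  # (1.0, 2.0, 3.0)
```

With more than two cameras, a least-squares fit over all visible rays is the usual generalization, and it is one reason additional camera angles improve accuracy.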

The trajectories of the reflective (real) markers are used to locate the internal skeleton of the performer. In practice, the skeleton is defined by "virtual markers," which are simply estimated joint centers derived from the real markers. A virtual marker is defined by physically estimating the distance between a real marker and the corresponding joint center. After the virtual markers are defined, the rest of the setup consists of forming a body segment hierarchy and defining the motion area.
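Because a joint center sits at a fixed offset from its real marker in the segment's own frame, that offset has to rotate with the segment. The sketch below assumes the segment's global orientation is already known as a 3×3 rotation matrix (for example, from other markers on the same segment) and that the offset was measured during setup; `virtual_marker` and its arguments are hypothetical names, not part of the HiRES software.

```python
def virtual_marker(real_marker, offset_local, segment_rotation):
    """Estimate a joint center (virtual marker) from one real marker.

    real_marker: global XYZ position of the reflective marker.
    offset_local: marker-to-joint offset measured at setup, in the segment's frame.
    segment_rotation: 3x3 rotation matrix of the segment in the global frame
    (assumed to be known; how it is obtained is outside this sketch).
    """
    # Rotate the setup offset into the global frame, then add it to the marker.
    offset_global = [
        sum(segment_rotation[i][k] * offset_local[k] for k in range(3))
        for i in range(3)
    ]
    return tuple(m + o for m, o in zip(real_marker, offset_global))

# Hypothetical segment rotated 90 degrees about the vertical (z) axis.
rot_90_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
joint = virtual_marker((0.5, 0.0, 1.0), (0.05, 0.0, 0.0), rot_90_z)
print(joint)  # (0.5, 0.05, 1.0)
```

Any error in the hand-measured offset is carried directly into the joint center, which is why the setup crew's estimates matter.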

The setup process is the most difficult aspect of the HiRES system. Since this paper is an evaluation of the system, we wanted the setup to be as accurate as possible. Rather than risk an inaccurate configuration of our own, we analyzed data provided by Motion Analysis Corporation. The data show a female dancer and were recorded at 60 frames per second (fps). The motion lasts 15 seconds, for a total of 900 frames.

This dance demo links the virtual markers together to create a 20-segment human "stick" figure. Each segment has its own origin and local coordinate system, which enables calculation of joint translations and rotations. A segment hierarchy is then defined to interpret these coordinate systems. The hierarchy describes the parent-child relationships among the segments: a child's coordinate system is expressed relative to its parent's. The top parent of the hierarchy is the root, which in our demo is the lower torso. The root's translations and rotations are expressed relative to the global coordinate system (the capture space).
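Under this parent-child rule, a segment's global position is found by composing transforms from the root downward (forward kinematics). A minimal sketch, assuming each segment stores its parent's name, a translation relative to the parent, and a 3×3 rotation matrix relative to the parent; it does not reproduce or parse the .htr file layout, and the segment names and numbers are hypothetical.

```python
import math

def rot_z(degrees):
    """Rotation matrix about the z axis (used here only to build example poses)."""
    r = math.radians(degrees)
    c, s = math.cos(r), math.sin(r)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def global_pose(name, segments):
    """Return (rotation, origin) of a segment in the global (capture-space) frame.

    segments maps name -> (parent name or None, local translation, local rotation).
    The root (parent None) is expressed directly in global coordinates.
    """
    parent, local_t, local_r = segments[name]
    if parent is None:
        return local_r, list(local_t)
    parent_r, parent_origin = global_pose(parent, segments)
    # The child's offset is expressed in the parent's frame, so rotate it first.
    origin = [p + d for p, d in zip(parent_origin, mat_vec(parent_r, local_t))]
    return mat_mul(parent_r, local_r), origin

# Hypothetical three-segment chain rooted at the lower torso (units: meters).
segments = {
    "lower_torso": (None, (0.0, 0.0, 1.0), rot_z(90)),
    "right_thigh": ("lower_torso", (0.1, 0.0, 0.0), rot_z(0)),
    "right_shank": ("right_thigh", (0.0, 0.0, -0.45), rot_z(0)),
}
_, shank_origin = global_pose("right_shank", segments)
print([round(x, 3) for x in shank_origin])  # [0.0, 0.1, 0.55]
```

Turning the root by 90 degrees swings the whole leg with it, which is what makes a single captured hierarchy reusable across models.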

Defining virtual markers and a segment hierarchy is necessary only when using the HiRES .htr (hierarchical translation and rotation) format. There are currently two formats that the HiRES uses for animation purposes. The first, the track row column (.trc) output file, contains XYZ position values for the reflective markers relative to the global coordinate system. With this approach, the animation software must use inverse kinematics to compute joint translations and rotations (Badler et al., 1993). The second is the .htr format mentioned above, whose output file contains translation and rotation information for each segment relative to its parent segment. For our study, we consider the .htr format because it allows the segment-based motion of a single subject to animate a variety of different models.

The .htr format separates the body into twenty distinct segments. In each frame of data, the length of each segment is determined by measuring the distance between joint centers (virtual markers). Rather than report actual segment lengths for each frame, the software supplied with the HiRES system records each segment length as a "scale factor." For each segment, the software calculates the actual length in the first frame of data (scale factor 1.0). The segment length in each subsequent frame is then reported as a fraction of the length measured in the first frame. For example, if the segment length in frame ten is 97% of the length measured for that segment in frame one, the scale factor for frame ten is reported as 0.97.
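The scale-factor bookkeeping described above can be sketched as follows. This is a minimal illustration, assuming the two joint centers bounding one segment are already available for every frame; `scale_factors` and the coordinates are hypothetical, not output from the HiRES software (requires Python 3.8+ for `math.dist`).

```python
import math

def scale_factors(proximal, distal):
    """Per-frame segment scale factors relative to frame one.

    proximal, distal: lists of XYZ joint centers (virtual markers), one per frame.
    Frame one defines the reference length, so its scale factor is 1.0.
    """
    lengths = [math.dist(p, d) for p, d in zip(proximal, distal)]
    reference = lengths[0]
    return [length / reference for length in lengths]

# Hypothetical segment whose measured length drifts over three frames.
proximal = [(0.0, 0.0, 1.0)] * 3
distal = [(0.0, 0.0, 0.60), (0.0, 0.0, 0.612), (0.0, 0.0, 0.58)]
print([round(f, 3) for f in scale_factors(proximal, distal)])  # [1.0, 0.97, 1.05]
```

A rigid bone would give 1.0 in every frame; any departure from 1.0 is measurement error.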

We performed our statistical analysis on the 18,000 (900 frames × 20 segments) scale factors recorded during the dance demo, calculating the mean, mode, and standard deviation for each body segment. The standard deviation is the most important of these values: because every scale factor is measured relative to 1.0, the standard deviation directly approximates the percentage error for each body part (a standard deviation of 0.05 means the segment length typically varies by about 5%).
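The per-segment statistics can be reproduced outside a spreadsheet with Python's `statistics` module. A minimal sketch, assuming scale factors are grouped by segment name; rounding to two decimals before taking the mode is an assumption (continuous values rarely repeat exactly), and whether the original spreadsheet used the sample or population standard deviation is not stated, so swap in `statistics.pstdev` for the latter. The five values below are hypothetical.

```python
import statistics

def summarize(scale_factors_by_segment):
    """Mean, mode, and sample standard deviation of each segment's scale factors."""
    summary = {}
    for segment, factors in scale_factors_by_segment.items():
        summary[segment] = {
            "mean": statistics.mean(factors),
            # Assumption: round to two decimals so the mode is meaningful.
            "mode": statistics.mode([round(f, 2) for f in factors]),
            "stdev": statistics.stdev(factors),
        }
    return summary

def average_deviation(summary):
    """Body-wide error: the mean of the per-segment standard deviations."""
    return statistics.mean([s["stdev"] for s in summary.values()])

# Hypothetical scale factors for one segment over five frames.
stats = summarize({"left_foot": [1.0, 0.97, 1.03, 1.0, 0.99]})
print(round(stats["left_foot"]["stdev"], 4))  # 0.0217, i.e. about 2.2% error
```

Run on all twenty segments, `average_deviation` gives the mean of the per-segment standard deviations, which is how we read the "average segment deviation" reported in the results.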

Analysis & Results

We performed this statistical error analysis of the dance demo data using a simple spreadsheet. This analysis raised several important questions that need to be considered.

Question 1: What are the major sources of error in the HiRES motion capture system?

We found that the average error over the entire body was 5.3% (an average segment deviation of 0.053). Where realism matters, this error may produce noticeable differences. Suppose, hypothetically, that the most common (modal) height for an adult male is 6'. A 5.3% deviation from that height spans roughly 5' 8'' to 6' 4'', the range into which the majority of men would fall. If realism is the prime objective, motion data from a 6' male may not be accurate enough to animate either a 5' 8'' or a 6' 4'' male properly. We consider this realism problem further in the Discussion section.
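The height range quoted above follows from applying the 5.3% figure to a hypothetical 72-inch (6') male:

```latex
\begin{aligned}
72\,\text{in} \times 0.053 &\approx 3.8\,\text{in} \\
72\,\text{in} - 3.8\,\text{in} = 68.2\,\text{in} &\approx 5'\,8'' \\
72\,\text{in} + 3.8\,\text{in} = 75.8\,\text{in} &\approx 6'\,4''
\end{aligned}
```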

What are the sources of these segment deviations? We believe they are a direct result of the HiRES motion capture process. Recall that segment lengths are simply distances between joint centers (virtual markers). These virtual marker locations are estimated when the performer is suited up, which requires the setup crew to approximate the distance from each real marker to the corresponding joint center. These human approximations can cause inaccurate segment lengths to be reported.

Moreover, the real markers placed on the performer are not entirely static. Some may be attached with rubber bands and others with glue; some may be attached directly to the skin while others are fastened to clothing. The point is that the real markers do not define a rigid human skeleton. When they move, the virtual markers move with them, and the segment lengths are reported incorrectly.
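The effect of slippage on a reported segment length can be shown with a toy calculation. This sketch assumes each joint center is its real marker plus a fixed setup offset (kept in the global frame for simplicity, ignoring segment rotation); the helper names, the 3 cm offset, and the 2 cm slide are all hypothetical.

```python
import math

def joint_center(real_marker, offset):
    """Virtual marker = real marker + offset fixed at setup (global frame, for simplicity)."""
    return tuple(m + o for m, o in zip(real_marker, offset))

def segment_length(marker_a, offset_a, marker_b, offset_b):
    return math.dist(joint_center(marker_a, offset_a), joint_center(marker_b, offset_b))

offset = (0.03, 0.0, 0.0)  # hypothetical 3 cm marker-to-joint offset
frame1 = segment_length((0.0, 0.0, 1.0), offset, (0.0, 0.0, 0.60), offset)
# The distal marker slides 2 cm up the skin; the bone itself has not changed.
frame2 = segment_length((0.0, 0.0, 1.0), offset, (0.0, 0.0, 0.62), offset)
print(round(frame2 / frame1, 3))  # 0.95
```

A 2 cm slide on a 40 cm segment is enough to report a scale factor of 0.95, a 5% error, even though the underlying bone length is unchanged.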

Another major source of error is the optical nature of the device. When a marker cannot be seen by enough cameras to compute its 3-D position, it is said to be occluded (obscured). The HiRES system does account for this: unlike many less developed systems, it incorporates a best-fit algorithm (spline polynomial) that estimates the location of a real marker from its locations in nearby frames. Unfortunately, this algorithm requires post-processing and therefore cannot help with occlusion in real-time applications.
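The paper does not describe how MAC's spline best fit is implemented, so the sketch below substitutes a simpler polynomial gap filler: each occluded frame is estimated from a cubic through the two valid frames on either side of the gap (Lagrange interpolation). `fill_gaps` and the test trajectory are hypothetical. Like the HiRES algorithm, it is a post-processing step, because it needs frames recorded after the gap.

```python
def lagrange(xs, ys, x):
    """Evaluate the polynomial passing through points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fill_gaps(track, neighbors=2):
    """Fill occluded frames (None) in one coordinate of a marker trajectory.

    Uses up to `neighbors` valid frames on each side of a missing frame.
    Frames with no valid data on one side (gaps at either end) stay None.
    """
    valid = [i for i, v in enumerate(track) if v is not None]
    filled = list(track)
    for i, v in enumerate(track):
        if v is not None:
            continue
        before = [k for k in valid if k < i][-neighbors:]
        after = [k for k in valid if k > i][:neighbors]
        if not before or not after:
            continue  # would require extrapolation
        support = before + after
        filled[i] = lagrange(support, [track[k] for k in support], i)
    return filled

# Hypothetical x-coordinate following x = t**3 - 2t, with frames 3 and 4 occluded.
track = [t**3 - 2 * t for t in range(8)]
track[3] = track[4] = None
filled = fill_gaps(track)
print(round(filled[3], 6), round(filled[4], 6))  # 21.0 56.0
```

The test trajectory is recovered exactly only because it is itself a cubic. As a gap grows, the supporting frames lie farther from the missing ones and the estimate becomes less trustworthy.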

Question 2: How does the error vary with respect to individual body segments?

Although the average segment deviation is 5.3%, individual body parts range from as little as 0.5% (the head segment) to 13.8% (the neck segment). We attribute this spread to occlusion and to differences in the skin surface.

Like the method of attachment (glue, rubber bands, etc.), the attachment surface appears to cause error by allowing the real markers to slip. As a result, skin variations over the body affect the segment length estimates. Consider the head: the skin near the temples, where the real head markers are attached, stretches very little during normal body movement, and the segment deviation of the head is very small (0.5%). Other segments whose surfaces stretch little during movement include the torso, lower legs, and pelvis; these segments have deviations of less than 4%.

The skin over other segments can stretch a relatively large amount. The skin around the neck stretches and compresses enough to cause deviations of up to 13.8%. Other areas with large skin variation are the wrists, collars, and ankles, and the corresponding body segments have deviations of over 9%. In general, the closer a real marker is to a joint, the greater the deviation in the corresponding body segment.

The other major factor behind the spread in segment deviations is occlusion: some markers are obscured more often than others. The markers placed on the head are unlikely to be blocked during a simple motion. Even simple motions, however, involve many complex translations and rotations of the "extremity" segments such as the hands and feet, and markers on these body parts lose tracking more readily than those elsewhere. When tracking is lost, the HiRES system applies the best-fit algorithm to estimate the real marker location. The accuracy of this estimate depends on the number of frames for which the marker is occluded.

Question 3: How does the error vary with respect to body symmetry?

Segment error is greater for most segments on the left side of the body. For segments with moderate to heavy deviations (5.0% or more), the left segment deviates 1 to 4 percentage points more than the corresponding segment on the right side. Consider the collarbones: the left collarbone deviates by 12.7% while the right deviates by 8.9%. Other large discrepancies occur in the hands (9.2% vs. 6.7%) and feet (8.0% vs. 5.2%).
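The left/right comparison is simple arithmetic on the deviations quoted in this section. The sketch below repeats it for the three pairs given in the text (values for the other segments are not reported here); `left_right_gap` is an illustrative helper, and its 5.0% cut-off mirrors the "moderate to heavy" threshold above.

```python
# Segment deviations (%) reported in the text, as (left, right) pairs.
deviations = {
    "collar": (12.7, 8.9),
    "hand": (9.2, 6.7),
    "foot": (8.0, 5.2),
}

def left_right_gap(pairs, threshold=5.0):
    """Left-minus-right deviation, in percentage points, for segments whose
    larger side deviates by at least `threshold` percent."""
    return {
        name: round(left - right, 1)
        for name, (left, right) in pairs.items()
        if max(left, right) >= threshold
    }

print(left_right_gap(deviations))  # {'collar': 3.8, 'hand': 2.5, 'foot': 2.8}
```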

This asymmetry was puzzling until we viewed the dance sequence from a "bird's-eye view." Viewed relative to the camera locations, most of the motion was concentrated toward the right portion of the motion area. This may explain why the left body segments were occluded more often.


Question 4: How can the accuracy of the device be increased?

After considering the sources and causes of error, we can suggest several methods for increasing the accuracy of the HiRES device. Unfortunately, most of these suggestions add setup time, cost, or both to an already expensive optical process. It is important to decide what level of accuracy is required before adopting them.

The easiest, zero-cost way to increase the accuracy of the device is to raise the frame capture rate; the HiRES cameras can capture motion data at 60 to 240 fps. Unfortunately, increasing the frame rate has little effect on the error unless the motion is extremely fast and complex. For normal human motion, 60 fps is usually sufficient, and capturing at higher rates produces more information than animation typically needs. Although this is the cheapest way to reduce error, other low-cost options are much more effective.

We noted above that a major source of error is marker slippage. It is important to keep this in mind when attaching the real markers to the performer: if marker movement can be reduced, the segment length calculations will be much more accurate. We see two ways to reduce marker movement. The first is to attach the markers directly to the skin wherever possible. Although skin is not rigid, it is much more resistant to stretching and compressing than clothing. When clothing must be used, spandex is a good choice because it hugs the body. The second is to attach the markers as firmly as possible, using glue or adjustable straps instead of rubber bands. A firmly attached marker moves only as much as the surface beneath it. For those on small or no budgets, careful marker attachment is the most cost-effective way to reduce segment deviation.

For those with money to spend, there are other options for reducing error. The most expensive, and by far the best, is to add more capture cameras. If more angles are covered, markers are less likely to be occluded, and eliminating occlusion is the best way to reduce error in any optical capture system. This option is not always practical because high-speed cameras are very expensive. A lower-cost alternative is to add more optical markers to the performer. The particular advantage of this method is that it can increase the accuracy of individual segments: the additional markers give more information about the location of each joint center, and the more markers that define a joint center, the less effect any single occluded marker has on the segment length calculation.
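The benefit of extra markers can be illustrated by fusing several per-marker estimates of one joint center and skipping any marker that is occluded in the current frame. Plain averaging is an assumption made for this sketch, not a description of the HiRES software; `fuse_joint_estimates` is a hypothetical helper, and each input is assumed to be a real marker position plus its own setup offset.

```python
def fuse_joint_estimates(estimates):
    """Average the joint-center estimates from every marker that is visible.

    estimates: one XYZ tuple per marker (each already marker + setup offset),
    or None for a marker that is occluded in this frame.
    """
    visible = [p for p in estimates if p is not None]
    if not visible:
        return None  # no marker on this joint is visible in this frame
    n = len(visible)
    return tuple(sum(p[axis] for p in visible) / n for axis in range(3))

# Three hypothetical markers around one joint; the third is occluded in this frame.
center = fuse_joint_estimates([(1.0, 2.0, 3.0), (1.2, 2.0, 3.0), None])
print(center)
```

With a single marker, one occlusion leaves no estimate at all; with several, the joint center is still available, and a slip of any one marker moves the average by only a fraction of the slip.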

Discussion

We now relate our analysis back to the general issues presented in the Introduction.

Issue 1: When should key-frame animation be used over motion capture? When should motion capture be used over key-frame animation?

The contention between key-frame animators and motion capture professionals has never been greater. While key-frame animators argue that motion capture devices will never replace human creativity and imagination, motion capture professionals (MCPs) stress that animators could never duplicate the realism of natural movement. Both parties have valid points.

Using systems like the HiRES, motion capture professionals can produce highly realistic animations in a short amount of time. "A small production team and a single actor can produce many complex animations in an afternoon" (MAC, 1997). For jobs in which complex movements must be produced quickly, the HiRES system is the best option. Another advantage of systems like the HiRES is the ability to separate the motion data from the character, which allows MCPs to animate many different characters with a single set of data. This feature saves production houses a great deal of time and money.

While motion capture often reduces overall expense, there are certain cases in which key-frame animation is the more appropriate choice. This occurs most often in computer storytelling. Whereas traditional animators are limited only by their imaginations, MCPs are limited by what their performers can do. Computer animators often create interesting characters that defy the laws of physics and anatomy; if animators used only motion capture devices to add movement to their characters, stories would be much less interesting and creative.

Issue 2: What are the ideal uses of the HiRES system?

The HiRES device is not a replacement for traditional computer animation. It is a tool to help animators be more productive and efficient. Like any tool, the HiRES has ideal applications.

The HiRES is very well suited for capturing extremely realistic motion. This is especially important in VR applications where realism is the prime objective. A VR user is more easily convinced when virtual objects behave just like their natural counterparts. By using the HiRES, animators can eliminate the human error involved in reproducing an object's motion.

The HiRES system can also eliminate human error from multiple character animations. When animating many characters with identical motions such as in a synchronized dance, the HiRES system can save animators a massive amount of time.

The HiRES is best suited for any application in which identical and extremely realistic movements are required.

Issue 3: To what level of realism can the motion data be trusted? What kind of characters can be realistically animated with one set of data?

This question deserves a philosophical discussion rather than a simple answer. The easiest answer is that captured motion data is realistic only for the particular performer. If the computer character is not an accurate representation of the performer, then the motion data is not realistic relative to that character (Hodgins and Pollard, 1997). Consider data captured from an average female jogging. If this data is used to animate an average female jogger, the motion is extremely realistic. With the HiRES system, the same data may also be used to animate an average male jogger. Although the male figure is easy to animate by importing the female data, the resulting motion is noticeably unrealistic for that figure.

When realism is a significant goal, it is important to choose a performer who accurately represents what the computer character is portraying. As we mentioned in our analysis, it is inaccurate to animate a 6' 4'' character and a 5' 8'' character with the same data. The animator must consider what level of realism is required: some animators may be satisfied with identical motion for differently shaped characters, while others may want each character to have its own realistic movement.

Unfortunately, when total realism is the ultimate goal, the HiRES system falls a little short. The segment length deviations pose a major problem for realistically animating anatomically correct human figures. When real humans move, their body segments do not change in length (except by a very small amount due to joint geometry). Animating a correct human figure with motion data that deviates by as much as 13.8% is therefore very difficult. Even if the motion data can be used to animate such a figure, there is no guarantee that the motion is realistic for a correct human model.

Issue 4: What repercussions are there for animating only a portion of a human figure with the HiRES system?

This is another question that relates back to realism demands. When a human moves any part of the body, the other body parts react and respond to this motion. If motion over the entire body is captured, this is not a problem, because the captured translations and rotations of every marker already reflect these responses. When only a section of the body is captured, a number of realism problems can occur.

Consider the captured motion of a single segment, the head. If the performer remains still and merely rotates the head, this does not pose a major problem. But if the performer leans forward slightly, a large behavioral problem arises. Without the rest of the segments for reference, the computer has no way of telling how the head got from one place to the other. Did the performer just lean forward, or did he walk forward a few inches? There is no way to know for sure. The only option is to develop behavioral constraints that instruct the human model how to move to the new location. A great deal of planning is required for the resulting movements to be realistic; it is a complex problem with no easy solution.

Future Work

In future work we plan to develop a method for using the .htr (hierarchical translation and rotation) format to properly animate anatomically correct human figures. Our immediate goal is to find a way to deal with the large segment length deviations. Once we can animate our models, we will evaluate the relative accuracy of the movement. We would then like to define a set of behavioral constraints that allow realistic movement of an entire human figure from captured motion of only certain body segments. As a long-term goal, we would like to develop methods for using these motion techniques in real-time applications such as interactive VR environments.

Acknowledgements

We would like to thank the Army Research Lab at Aberdeen, Maryland for use of the ExpertVision HiRES system. We would also like to thank Motion Analysis Corporation for supplying the dance demo data.


References

Badler, N. I., Hollick, M., Granieri, J. (1993). Real-time control of a virtual human using minimal sensors. Presence, 2(1).

FX Fighter - Motion Capture vs. Keyframing Page. http://www.im.gte.com/FXF/fxfwhat2.html

Hodgins, J., Pollard, N. (1997). Adapting simulated behaviors for new characters. SIGGRAPH Proceedings.

Ko, H., Badler, N. I. (1993). Straight line walking animation based on kinematic generalization that preserves the original characteristics. Graphics Interface '93.

Matsuba, S., Roehl, B. (1996). Using VRML. Indianapolis: Que Publishing.

Motion Analysis Corporation (1997). EVa HiRES Version 4.0 User's Manual.

Mulder, S. (1994). Human Movement Tracking Technology. Hand Centered Studies of Human Movement Project, Simon Fraser University, Technical Report 94-1, July 1994.

Pixar's RenderMan. http://www.pixar.com/renderman/

Zeltzer, D. (1992). Autonomy, interaction, and presence. Presence, 1(1).

Footnotes:
1 Keyframing is a method by which an animator specifies where an object should be at certain frames in an animation. The computer then interpolates between these "key-frames" (for example, linearly) to generate the entire motion sequence.

Journal of Young Investigators. 1999. Volume Two.
Copyright © 1999 by Mark Palatucci and JYI. All rights reserved.
 