Society News

Introduction
Microsecond Prognostics and Health Management is the application of prognostics and health management (PHM) methods to very fast failure mechanisms. The objective of microsecond PHM is to sense, analyze, and mitigate damage mechanisms that propagate to failure on time scales on the order of 1/1000th of a second. The required rate of decision making rules out any human-in-the-loop methodology. Target applications for fast PHM include domains with highly dynamic environments, potentially unstable runaway reactions, and explosively driven mechanisms, where unplanned failure results in unacceptable loss of human life or economic cost.

Wear is a common colloquialism used in PHM literature to describe the degradation mechanism that will eventually cause a system to fail. For example, the tires on a car wear away until there is no tread left. With no remaining tread the tire ceases to provide traction and perform as required, a condition that is considered to be a failure. Problems arise when trying to apply the informal definition of wear to a product, like a circuit board, that does not have as direct a link between wear and failure. Elevated temperature exposure can change the microstructure of a solder joint on a circuit board. As a result of this modified microstructure, a relatively benign impact loading that wouldn’t normally cause a failure can initiate and propagate a crack through a solder joint [Mattila 2011]. When the solder joint cracks and no longer passes electrical signals, the circuit board does not function as required and is considered failed.

PHM on a microsecond time scale aspires to expand the applicability of current PHM methods to include the many failure modes and mechanisms that are not easily captured using traditional approaches. The failure mechanisms that traditional methods cannot currently address are relatively fast. The next sections of this paper provide rough order-of-magnitude estimates to help illustrate the technical capabilities that will be required to sense, analyze, and mitigate fast failure mechanisms. Two rare examples of fielded technologies that fulfill the rapid sense/analyze/mitigate paradigm are discussed to highlight the state of the art in microsecond PHM. Finally, perceived limitations of microsecond PHM and fundamental areas of future research that could enable more widespread application of microsecond PHM are discussed.

Back of Napkin Calculation for Microsecond PHM Feasibility
Two simplistic order-of-magnitude estimates are presented in this section to provide quantitative examples of possible microsecond PHM applications. The examples are considered back-of-the-napkin calculations because they are intended to be brief enough to fit onto a small cocktail napkin, but realistic enough to generate meaningful quantitative arguments. For a more structured introduction to this style of estimation see [Weinstein 2009, Roam 2009].

One of the most severe environmental conditions that a circuit board can experience is a very high-g impact loading. Assume that a small circuit board (~1.5” between supports) is exposed to a 50,000 g acceleration due to a metal-on-metal impact [Lowe 2014]. The small, stiff circuit board responds with an oscillation at about 5 kHz. Under such large-amplitude loadings, low cycle fatigue failures can occur in 10 to 100 cycles. Based on the simplistic relation below, the resulting time to failure will be on the order of 2 to 20 x 10^-3 seconds.

time to failure = (cycles to failure) / (oscillation frequency) = (10 to 100 cycles) / (5,000 cycles per second) = 2 to 20 x 10^-3 seconds

Budgeting for delays in sensing (100 x 10^-6 seconds) and analysis (peak/amplitude counting algorithm, 500 x 10^-6 seconds), there may be sufficient time to mitigate failure by reprogramming electrical functionality to redundant circuitry (500 x 10^-6 seconds), for a total impact-to-mitigation budget of 1.1 x 10^-3 seconds.
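
The arithmetic behind this estimate can be checked with a few lines of Python. This is only a sketch of the napkin calculation: the numbers are the ones quoted above, while the function and variable names are illustrative choices and not part of any established tool.

```python
def time_to_failure_s(cycles_to_failure, frequency_hz):
    """Time needed to accumulate a number of load cycles at a fixed frequency."""
    return cycles_to_failure / frequency_hz

FREQUENCY_HZ = 5_000.0          # board oscillation frequency quoted in the text
CYCLES_TO_FAILURE = (10, 100)   # low cycle fatigue range quoted in the text

# Delay estimates quoted in the text, in seconds.
BOARD_BUDGET_S = {
    "sensing": 100e-6,
    "analysis": 500e-6,
    "mitigation": 500e-6,
}

fastest, slowest = (time_to_failure_s(n, FREQUENCY_HZ) for n in CYCLES_TO_FAILURE)
total_budget = sum(BOARD_BUDGET_S.values())

print(f"time to failure: {fastest * 1e3:.1f} to {slowest * 1e3:.1f} ms")
print(f"response budget: {total_budget * 1e3:.1f} ms")
print(f"margin, fastest failure: {(fastest - total_budget) * 1e3:.1f} ms")
```

With these inputs the 1.1 x 10^-3 second budget fits inside even the shortest estimated time to failure of 2 x 10^-3 seconds, leaving roughly 0.9 x 10^-3 seconds of margin.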

Another back-of-the-napkin example is the subject of a patent [Yue 2011] pertaining to the automatic mitigation of a tire blowout in an automobile. The sudden loss of tire pressure in one wheel can result in a very dangerous torque that steers the vehicle away from its intended path. The loss of a front driver-side tire results in the car steering into oncoming traffic. Under good conditions an automobile traveling at 50 mph requires 120 ft to come to a complete stop. A tire suffering a catastrophic blowout deflates in approximately 60 x 10^-3 seconds [Orengo 2003], or about 4 ft of travel at 50 mph. After the blowout the dynamics of the car are drastically different than with four inflated tires, resulting in a dangerous situation that is not intuitively easy to control. For example, steering away from the skid may increase damage to the tire and result in less control of the vehicle. Pressure, temperature, and speed sensors would identify the loss of tire integrity (100 x 10^-6 seconds), an algorithm would have to evaluate the system and rule out false positives like hitting a pothole (500 x 10^-6 seconds), and then an active safety system would have to apply braking in a manner that will maintain course (5000 x 10^-6 seconds for initial activation). The total event-to-mitigation time budget is 5.6 x 10^-3 seconds. It is interesting to note that the rights to the patent that motivated this example were forfeited in 2013.
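
The same kind of check applies to the tire example. The sketch below converts the quoted speed to feet per second and compares the distance traveled during deflation with the distance traveled during the response budget. The numbers come from the paragraph above; the helper name and dictionary keys are illustrative assumptions.

```python
MPH_TO_FT_PER_S = 5280.0 / 3600.0  # 1 mile = 5280 ft, 1 hour = 3600 s

def distance_ft(speed_mph, duration_s):
    """Distance covered at constant speed over a given duration."""
    return speed_mph * MPH_TO_FT_PER_S * duration_s

SPEED_MPH = 50.0
DEFLATION_S = 60e-3  # blowout deflation time quoted in the text

# Delay estimates quoted in the text, in seconds.
TIRE_BUDGET_S = {
    "sensing": 100e-6,
    "analysis": 500e-6,
    "initial braking activation": 5000e-6,
}

tire_budget = sum(TIRE_BUDGET_S.values())
deflation_travel_ft = distance_ft(SPEED_MPH, DEFLATION_S)
budget_travel_ft = distance_ft(SPEED_MPH, tire_budget)

print(f"response budget: {tire_budget * 1e3:.1f} ms")
print(f"travel during deflation: {deflation_travel_ft:.1f} ft")
print(f"travel during response budget: {budget_travel_ft:.2f} ft")
```

At 50 mph the car covers about 4.4 ft while the tire deflates, but well under 1 ft during the 5.6 x 10^-3 second response budget, so on these rough numbers initial mitigation could begin while the tire is still losing pressure.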

Fielded examples
Like most ideas in engineering, sensing, analyzing, and mitigating a fast failure mechanism is not new. Two fielded examples of this methodology are airbags (circa 1964) and the Apollo launch escape system (circa 1958).

During the launch phase of space flight a multitude of components are supposed to function together under extreme pressures and temperatures. Launch escape systems have been in use since as early as 1958 to automatically detect impending failures and safely separate the crew module from a malfunctioning rocket before it explodes. In his book Digital Apollo, David Mindell explores what was an uncomfortable friction between the engineers and the test pilots (astronauts) while designing the appropriate level of human control required for spacecraft operation. He argues that simply solving the technical control challenges does not ensure that an effective human-machine system is realized. The launch phase of a mission is a good example of a situation where a human cannot react quickly enough to mitigate an impending failure in the rocketry systems. At the other extreme, it was discovered that the unique mixture of orbital mechanics, velocity, and range prevented a human pilot from easily docking with another spacecraft, but that a fully automated system was too computationally expensive to be practical. Ultimately a hybrid system with a computer and a human-in-the-loop was the optimal solution. Only recently have increases in computer power solved the problems of automating a spacecraft docking sequence, but the same logic still applies to many modern control problems: removing a human from the control loop is not always optimal.

Another fielded example where sensors and algorithms mitigate a potentially dangerous situation is an automobile's airbag. Modern airbag systems deploy within 8 to 40 x 10^-3 seconds of initial impact [Car & Driver 2011]. Avoiding false positives (airbag deployment when not needed) is important in automobiles, so a complicated set of sensors and algorithms works quickly to distinguish, for example, accelerations caused by driving over a large pothole from an impact with a bridge abutment. The history of automotive airbags illustrates how long new technology that interacts intimately with humans can take to be accepted. Reported details vary, but one source [Car & Driver 2011] indicates that R&D started on airbags in 1964. In the US, airbags were originally offered as an upgrade option by General Motors in 1976. In 1979 GM canceled its initial offering of the airbag upgrade due to limited consumer demand for the new technology. Not until 1998 were airbags a standard requirement in US automobiles per federal law.

Both of the examples listed above utilize a mitigation scheme that potentially saves lives, but totally abandon the initial mission of each respective system. The author is unaware of any mainstream commercial implementation of microsecond PHM that mitigates failures while still ensuring the success of the original mission.

Limitations on the implementation of microsecond PHM
The previous illustrations point to a significant number of technical challenges that could motivate basic research in microsecond PHM specifically and in PHM in general.

Cost
Both development cost and implementation cost are limiting factors for microsecond PHM. The benefits of PHM are not universally accepted, and without established engineering design tools it is difficult to promise a fielded system on time and on budget. For applications that require fast sensing and mitigation, removing humans from the control loop also has unknown cost consequences.

Physics of Failure
Failure mechanisms that are not defined by a wear level are often complicated. Further compounding the problem is the transient nature of many failure mechanisms targeted as candidates for microsecond PHM. At least partial information about the physics of failure is required to formulate effective sensing, analysis, and mitigation strategies. Learning how to develop a sufficient, but not overly detailed, understanding of the physics of failure is a formidable challenge limiting the development of microsecond PHM.

Complexity
Modern engineering systems are very complex structures. A typical circuit board in an electrical subsystem contains thousands of solder joints. A failure at any single location will render the board non-functional. Similar analogs can be illustrated for the mechanical and software elements of a design. Designing PHM methods to handle the complexity that results from the combination of so many disparate elements is a challenging technical hurdle.

Speed
The order-of-magnitude estimates are intended to make believable the assertion that, in some cases, it is feasible to build a system that could sense, process, and react quickly enough to mitigate impending failures. The simulation tools needed to demonstrate that an electro-mechanical system can actuate quickly enough to mitigate fast failure mechanisms are believed to be quite advanced. Without an initial prototype, however, honest criticism could be leveled at the practicality of responding quickly enough for useful PHM implementations.

Uncertainty
In a physical system uncertainty is unavoidable. Material variation, geometric tolerances, component lots, and environmental conditions directly affect the mechanical response of a system to external loads and the resulting failure modes and mechanisms. Further compounding the challenges of uncertainty management are sensor and digitization noise, necessary simplifications in control loop models, and the inability to observe all possible variables. To successfully implement microsecond PHM, methods will be needed to reliably quantify and control the uncertainty inherent in monitoring and actuating physical systems.
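
One simple way to explore how such uncertainty affects a timing margin is a Monte Carlo sketch. The example below reuses the nominal circuit board numbers from the napkin calculation (5 kHz, 10 to 100 cycles, and 100/500/500 x 10^-6 second delays). The spreads placed on each quantity are purely hypothetical assumptions chosen for illustration, not measured values, and the function name is likewise invented for this sketch.

```python
import random

# Nominal values from the circuit board estimate in this article.
NOMINAL_FREQUENCY_HZ = 5_000.0
NOMINAL_DELAYS_S = (100e-6, 500e-6, 500e-6)  # sensing, analysis, mitigation

def sample_margins(n_trials=10_000, seed=0):
    """Return sorted samples of (time to failure - response time), in seconds."""
    rng = random.Random(seed)
    margins = []
    for _ in range(n_trials):
        # Assumed 5% standard deviation on frequency (illustrative only).
        frequency_hz = rng.gauss(NOMINAL_FREQUENCY_HZ, 0.05 * NOMINAL_FREQUENCY_HZ)
        # Assumed log-uniform spread over the 10 to 100 cycle range.
        cycles_to_failure = 10.0 ** rng.uniform(1.0, 2.0)
        # Assumed 10% standard deviation on each delay (illustrative only).
        response_s = sum(rng.gauss(d, 0.10 * d) for d in NOMINAL_DELAYS_S)
        margins.append(cycles_to_failure / frequency_hz - response_s)
    return sorted(margins)

margins = sample_margins()
worst = margins[0]
fifth_percentile = margins[len(margins) // 20]
median = margins[len(margins) // 2]

print(f"worst sampled margin: {worst * 1e3:.2f} ms")
print(f"5th percentile margin: {fifth_percentile * 1e3:.2f} ms")
print(f"median margin: {median * 1e3:.2f} ms")
```

A negative margin would mean the failure outruns the response. How often that happens depends entirely on the assumed spreads, which is why quantifying them for a real system is the hard part.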

Mitigation with no human-in-the-loop
Beyond the purely technical control and automation challenges of mitigating impending failures without a human-in-the-loop, attention must be paid to the nuances of human operators interacting with highly automated electro-mechanical systems. Specifying requirements that strike an appropriate balance in the human-machine relationship will require more understanding before microsecond PHM methods can be introduced and accepted into mainstream products.

Conclusion
The approach of quickly sensing, analyzing, and mitigating impending failures was introduced in the context of reliability engineering as microsecond PHM. Two rough order-of-magnitude calculations argued for the feasibility of such fast-acting methods. This type of sensing and failure mitigation is not necessarily a new idea, but access to the technology needed to implement such a system may previously have been lacking. A number of significant technical problems and challenging research areas were identified to help bridge the gap between concept and implementation for microsecond PHM.

References
Mattila, T.T., Jue Li, and J.K. Kivilahti. “On the Effects of Temperature on the Drop Reliability of Electronic Component Boards.” Microelectronics Reliability, no. 0 (2011).

Weinstein, Lawrence, and John A. Adam. Guesstimation: Solving the World’s Problems on the Back of a Cocktail Napkin. Princeton University Press, 2009.

Roam, Dan. Back of The Napkin: Solving Problems and Selling Ideas with Pictures. Marshall Cavendish, 2009.

Lowe, Ryan D., Jason R. Foley, David W. Geissler, and Jennifer A. Cordes. “Operating Mode Shapes of Electronic Assemblies Under Shock Input.” In Topics in Modal Analysis II, Volume 8, 179–84. Springer, 2014.

Orengo, Fabio, Malcolm H. Ray, and Chuck A. Plaxico. “Modeling Tire Blow-out in Roadside Hardware Simulations Using LS-DYNA.” In ASME 2003 International Mechanical Engineering Congress and Exposition, 71–80. American Society of Mechanical Engineers, 2003.

Car and Driver. “The Physics Of: Airbags - Feature.” Courtesy of the Insurance Institute for Highway Safety. Accessed May 23, 2014. http://www.caranddriver.com/features/the-physics-of-airbags-feature.

许恩乐 (Wang Yue), 陈兴 (Liu Hui), and 陈建国 (Wang Shichang). “Emergency Mechanism for Flat Tire Accidents of Automobile Tires,” April 6, 2011. Publication Number: CN203035629 U.

Microsecond Prognostics and Health Management

Ryan Lowe

Jacob Dodson, Jason Foley