IEEE Reliability Society Newsletter, Vol. 60, No. 2, May 2014
Letters in Reliability
Microsecond Prognostics and Health Management
Ryan Lowe, Applied Research Associates, Littleton, CO
Jacob Dodson and Jason Foley, Air Force Research Laboratory, Eglin Air Force Base, FL
Introduction
Microsecond Prognostics and Health Management is the application of prognostics and health management (PHM) methods to very fast failure mechanisms. The objective of microsecond PHM is to sense, analyze, and mitigate damage mechanisms that propagate to failure on time scales on the order of 1/1000th of a second. Decisions at this rate rule out any human-in-the-loop methodology. Target applications for fast PHM include domains with highly dynamic environments, potentially unstable runaway reactions, and explosively driven mechanisms, where unplanned failure carries an unacceptable cost in human life or economic loss.
Wear is a common colloquialism used in PHM literature to describe the degradation mechanism that will eventually cause a system to fail. For example, the tires on a car wear away until there is no tread left. With no remaining tread the tire ceases to provide traction and perform as required, a condition that is considered to be a failure. Problems arise when trying to apply the informal definition of wear to a product, like a circuit board, that does not have as direct a link between wear and failure. Elevated temperature exposure can change the microstructure of a solder joint on a circuit board. As a result of this modified microstructure, a relatively benign impact loading that would not normally cause a failure can initiate and propagate a crack through a solder joint [Mattila 2011]. When the solder joint cracks and no longer passes electrical signals, the circuit board does not function as required and is considered failed.
PHM on a microsecond time scale aspires to extend current PHM methods to the many failure modes and mechanisms that are not easily captured using traditional approaches. The mechanisms that traditional methods cannot currently address are relatively fast. The next sections of this paper provide rough order of magnitude estimates to illustrate the technical capabilities required to sense, analyze, and mitigate fast failure mechanisms. Two rare examples of fielded technologies that fulfill the rapid sense/analyze/mitigate paradigm are discussed to highlight the state of the art in microsecond PHM. Finally, perceived limitations of microsecond PHM and fundamental areas of future research that could enable its more widespread application are discussed.
Back of Napkin Calculation for Microsecond PHM Feasibility
Two simplistic order of magnitude estimates are presented in this section to provide quantitative examples of possible microsecond PHM applications. The examples are considered back of the napkin calculations because they are intended to be brief enough to fit onto a small cocktail napkin, but realistic enough to generate meaningful quantitative arguments. For a more structured introduction to this method of assertion see [Weinstein 2009, Roam 2009].
One of the most severe environmental conditions that a circuit board can experience is a very high-g impact loading. Assume that a small circuit board (~1.5” between supports) is exposed to a 50,000 g acceleration due to a metal on metal impact [Lowe 2014]. The small, stiff circuit board responds with an oscillation at about 5 kHz. Under such large amplitude loadings, low cycle fatigue failures can occur in 10 to 100 cycles. Dividing the number of cycles to failure by the oscillation frequency (t = N / f), the resulting time to failure will be on the order of 2-20 x 10^-3 seconds.
Budgeting for delays in sensing (100 x 10^-6 seconds) and analysis (peak/amplitude counting algorithm, 500 x 10^-6 seconds), there may be sufficient time to mitigate failure by reprogramming electrical functionality to redundant circuitry (500 x 10^-6 seconds), for a total impact to mitigation budget of 1.1 x 10^-3 seconds.
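The circuit board estimate above can be checked with a few lines of Python. This is only a sketch of the napkin arithmetic: the numbers are those quoted in the text, while the function and variable names are illustrative assumptions and do not come from any fielded system.

```python
# Napkin estimate for the high-g circuit board example.
# All numeric values are taken from the text; all names are hypothetical.

FREQ_HZ = 5_000.0               # board oscillation frequency (about 5 kHz)
CYCLES_TO_FAILURE = (10, 100)   # low cycle fatigue range, in cycles

# Delay budget in seconds: sensing, analysis, mitigation
BUDGET_S = {"sense": 100e-6, "analyze": 500e-6, "mitigate": 500e-6}


def time_to_failure(cycles, freq_hz):
    """Time to failure t = N / f, in seconds."""
    return cycles / freq_hz


def total_budget(budget):
    """Sum of all delays between impact and completed mitigation, in seconds."""
    return sum(budget.values())


def margin(cycles, freq_hz, budget):
    """Time left over after mitigation completes; negative means too slow."""
    return time_to_failure(cycles, freq_hz) - total_budget(budget)


for n in CYCLES_TO_FAILURE:
    t_fail_ms = time_to_failure(n, FREQ_HZ) * 1e3
    slack_ms = margin(n, FREQ_HZ, BUDGET_S) * 1e3
    print(f"{n:>3} cycles: failure in {t_fail_ms:.1f} ms, margin {slack_ms:.1f} ms")
```

Even at the pessimistic end of the range (10 cycles, 2 x 10^-3 seconds to failure), the 1.1 x 10^-3 second budget leaves a positive margin, which is the basis of the feasibility argument.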
Another back of the napkin example is the subject of a patent [Yue 2011] on the automatic mitigation of a tire blowout in an automobile. The sudden loss of pressure in one tire can produce a very dangerous torque that steers the vehicle away from its intended path. The loss of a front driver-side tire can steer the car into oncoming traffic. Under good conditions an automobile traveling at 50 mph requires 120 ft to come to a complete stop. A tire suffering a catastrophic blowout deflates in approximately 60 x 10^-3 seconds [Orengo 2003], or about 4 ft of travel at 50 mph. After the blowout the dynamics of the car differ drastically from those with four inflated tires, creating a dangerous situation that is not intuitively easy to control. For example, steering away from the skid may increase damage to the tire and reduce control of the vehicle. In a microsecond PHM scheme, pressure, temperature, and speed sensors would identify the loss of tire integrity (100 x 10^-6 seconds), an algorithm would evaluate the system and rule out false positives such as hitting a pothole (500 x 10^-6 seconds), and an active safety system would then apply braking in a manner that maintains course (5000 x 10^-6 seconds for initial activation). The total event to mitigation time budget is 5.6 x 10^-3 seconds. It is interesting to note that the rights to the patent that motivated this example were forfeited in 2013.
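The tire blowout numbers can be sanity-checked the same way. The sketch below uses only the figures quoted in the text; the names are illustrative assumptions, and the comparison of the budget against the deflation time is an added convenience rather than part of the cited patent.

```python
# Napkin estimate for the tire blowout example.
# All numeric values are taken from the text; all names are hypothetical.

MPH_TO_FT_PER_S = 5280.0 / 3600.0   # 1 mile = 5280 ft, 1 hour = 3600 s

SPEED_MPH = 50.0
DEFLATION_TIME_S = 60e-3            # catastrophic blowout deflation time

# Delay budget in seconds: sensing, false-positive check, initial braking
TIRE_BUDGET_S = {"sense": 100e-6, "analyze": 500e-6, "mitigate": 5000e-6}


def distance_traveled_ft(speed_mph, seconds):
    """Distance covered at constant speed, in feet."""
    return speed_mph * MPH_TO_FT_PER_S * seconds


total_s = sum(TIRE_BUDGET_S.values())
print(f"travel during deflation: {distance_traveled_ft(SPEED_MPH, DEFLATION_TIME_S):.1f} ft")
print(f"event to mitigation budget: {total_s * 1e3:.1f} ms")
print(f"budget as fraction of deflation time: {total_s / DEFLATION_TIME_S:.2f}")
```

At 50 mph the car covers about 4.4 ft while the tire deflates, consistent with the "about 4 ft" quoted above, and the 5.6 x 10^-3 second budget is a little under a tenth of the deflation time.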
Fielded examples
Like most ideas in engineering, sensing, analyzing, and mitigating a fast failure mechanism is not new. Two fielded examples of this methodology are airbags (circa 1964) and the Apollo launch escape system (circa 1958).
During the launch phase of space flight a multitude of components must function together under extreme pressures and temperatures. Launch escape systems have been in use since as early as 1958 to automatically detect impending failures and safely separate the crew module from a malfunctioning rocket before it explodes. In his book Digital Apollo, David Mindell explores the uncomfortable friction between the engineers and the test pilots (astronauts) over the appropriate level of human control in spacecraft operation. He argues that simply solving the technical control challenges does not ensure that an effective human-machine system is realized. The launch phase of a mission is a good example of a situation where a human cannot react quickly enough to mitigate an impending failure in the rocketry systems. At the other extreme, it was discovered that the unique mixture of orbital mechanics, velocity, and range prevented a human pilot from easily docking with another spacecraft, but that a fully automated system was too computationally expensive to be practical. Ultimately a hybrid system with a computer and a human in the loop was the optimal solution. Only recently have increases in computing power made automated spacecraft docking practical, but the same logic still applies to many modern control problems: removing a human from the control loop is not always optimal.
Another fielded example where sensors and algorithms mitigate a potentially dangerous situation is an automobile's air bag. Modern air bag systems deploy within 8 to 40 x 10^-3 seconds of initial impact [Car & Driver 2011]. Avoiding false positives (air bag deployment when not needed) is important in automobiles, so a complicated set of sensors and algorithms works quickly to distinguish, for example, accelerations caused by driving over a large pothole from an impact with a bridge abutment. The history of automotive airbags illustrates how long new technology that interacts intimately with humans can take to be accepted. Reported details vary, but one source [Car & Driver 2011] indicates that R&D on airbags started in 1964. In the US, airbags were originally offered as an upgrade option by General Motors in 1976. In 1979 GM canceled its initial offering of the air bag upgrade due to limited consumer demand for the new technology. Not until 1998 were airbags a standard requirement in US automobiles per federal law.
Both of the examples above use a mitigation scheme that potentially saves lives but totally abandons the initial mission of the respective system. The authors are unaware of any mainstream commercial implementation of microsecond PHM that mitigates failures while still ensuring the success of the original mission.
Limitations on the implementation of microsecond PHM
Based on the previous illustrations of microsecond PHM there are a significant number of technical challenges that could motivate basic research in the immediate area of microsecond PHM, and for all PHM in general.
Conclusion
The approach of quickly sensing, analyzing, and mitigating impending failures was introduced in the context of reliability engineering as microsecond PHM. Two rough order of magnitude calculations/examples argued for the feasibility of such fast acting methods. This type of sensing and failure mitigation is not necessarily a new idea, but the access to technology needed to implement such a system may have previously been lacking. A number of significant technical problems and challenging research areas were identified to help bridge the gap between concept and implementation for microsecond PHM.
References
Mattila, T.T., Jue Li, and J.K. Kivilahti. “On the Effects of Temperature on the Drop Reliability of Electronic Component Boards.” Microelectronics Reliability, no. 0 (2011).
Weinstein, Lawrence, and John A. Adam. Guesstimation: Solving the World’s Problems on the Back of a Cocktail Napkin. Princeton University Press, 2009.
Roam, Dan. Back of The Napkin: Solving Problems and Selling Ideas with Pictures. Marshall Cavendish, 2009.
Lowe, Ryan D., Jason R. Foley, David W. Geissler, and Jennifer A. Cordes. “Operating Mode Shapes of Electronic Assemblies Under Shock Input.” In Topics in Modal Analysis II, Volume 8, 179–84. Springer, 2014.
Orengo, Fabio, Malcolm H. Ray, and Chuck A. Plaxico. “Modeling Tire Blow-out in Roadside Hardware Simulations Using LS-DYNA.” In ASME 2003 International Mechanical Engineering Congress and Exposition, 71–80. American Society of Mechanical Engineers, 2003.
Car and Driver (courtesy of the Insurance Institute for Highway Safety). “The Physics Of: Airbags - Feature.” Car and Driver. Accessed May 23, 2014. http://www.caranddriver.com/features/the-physics-of-airbags-feature.
许恩乐 (Wang Yue), 陈兴 (Liu Hui), and 陈建国 (Wang Shichang). “Emergency Mechanism for Flat Tire Accidents of Automobile Tires,” April 6, 2011. Publication Number: CN203035629 U.