Vol. 64, No. 2, May 2018

PHM 2018 Keynote

Enabling Autonomous Computing – Technology solutions that anticipate and avert failure

Dr. William R. Tonti, IEEE Fellow, Past IEEE Reliability Society President, USA

Note: This keynote address was given at the 2018 IEEE PHM conference (http://www.phmconf.org).

Dr. Tonti holds a BSEE from Northeastern University, an MSEE and a Ph.D. from the University of Vermont, and an MBA from St. Michael's College. He retired from IBM in 2009 after more than 30 years of service, working for a large part of his career as the lead semiconductor technologist responsible for IBM's advanced node development. Dr. Tonti holds more than 290 issued patents and has been recognized as an IBM Master Inventor. He was honored by having his 250th patent issue transcribed into the U.S. Congressional Record. Dr. Tonti is recognized as one of the world's leading inventors, having developed over 200 patent families. He is a lead inventor of the modern-day embedded electronic fuse.

Dr. Tonti is a Fellow of the IEEE, a past IEEE Reliability Society President, and a recipient of the IEEE Reliability Engineer of the Year award and the IEEE Third Millennium Medal. Dr. Tonti joined IEEE in 2009 as the Director of IEEE Future Directions, where he works alongside staff and volunteers to incubate new technologies within the IEEE.


Abstract: Current and future computing solutions demand an architecture that guarantees zero failures throughout a system's useful life. Meeting this requirement with traditional technology solutions has become difficult as field use conditions have moved away from the design nominal and now approach the maximum allowed. Figure 1 shows the system-level instantaneous failure rate as a function of field-use power-on hours. Region II is the expected intrinsic failure rate during a system's useful life. The issue described in this talk is the compression of Region III toward Region I, which shortens the useful-life window of Region II.


Figure 1: Bathtub curve, instantaneous system failure rate

Region I: Time "0" extrinsic failures
Region II: Useful life, intrinsic failures
Region III: End of life, Wearout
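
As a rough illustration of the three regions, the bathtub shape can be sketched as the sum of a decreasing early-failure hazard, a constant intrinsic hazard, and an increasing wear-out hazard. This is a minimal sketch and not the model behind Figure 1: the Weibull form, the function names, and every parameter value below are assumptions chosen only to produce the familiar shape.

```python
def weibull_hazard(t, shape, scale):
    """Weibull instantaneous failure rate h(t) = (k/s) * (t/s)**(k-1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


INTRINSIC_RATE = 1e-6   # assumed constant Region II failure rate (per power-on hour)
WEAROUT_SHAPE = 6.0     # assumed shape > 1, so the wear-out hazard increases with time


def bathtub_hazard(t, wearout_scale=80_000.0):
    """Sum of three competing hazards; all parameter values are illustrative."""
    early = weibull_hazard(t, shape=0.5, scale=1e9)                 # Region I: decreasing
    wearout = weibull_hazard(t, WEAROUT_SHAPE, wearout_scale)       # Region III: increasing
    return early + INTRINSIC_RATE + wearout


def wearout_onset(wearout_scale):
    """Power-on hours at which the wear-out hazard equals the intrinsic rate.

    Solves (k/s) * (t/s)**(k-1) = INTRINSIC_RATE for t.
    """
    k, s = WEAROUT_SHAPE, wearout_scale
    return s * (INTRINSIC_RATE * s / k) ** (1.0 / (k - 1.0))


if __name__ == "__main__":
    for hours in (10, 10_000, 100_000):
        print(hours, bathtub_hazard(hours))
    # A harsher use condition is modeled here as a smaller wear-out scale,
    # which pulls the onset of Region III toward Region I.
    print(wearout_onset(80_000.0), wearout_onset(40_000.0))
```

In this sketch, halving the assumed wear-out scale moves the onset of Region III to fewer power-on hours, which is one simple way to picture the compression the abstract describes.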

On-chip techniques that anticipate a failure, or react to a measured one, by implementing a repair solution are the subject of this talk. Figure 2 (USP 7,966,537) describes a topology that anticipates failure and autonomously executes an in-system repair using on-chip, one-time programmable e-Fuses (Figure 3). Integrating in-die, field-programmable e-Fuses with on-board diagnostics coupled to a repair solution is one method that leads to autonomous computing.


Figure 2: USP 7,966,537, autonomic computing solution for use in an MCIOT implementation

In this example, circuit repair is autonomically enabled based on a use model that tracks active cycles. e-Fuse technology is used to implement the repair by replacing internal elements at their end of life.
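
The use model described above can be pictured in software as a cycle counter that programs a one-time fuse to steer the circuit to a spare element before the active element reaches its end of life. The following is a behavioral sketch only, not the circuit claimed in USP 7,966,537: the class names `EFuse` and `RepairableBlock`, the `cycle_limit` budget, and the spare-selection scheme are all hypothetical.

```python
class EFuse:
    """One-time programmable bit: starts un-programmed and can be blown only once."""

    def __init__(self):
        self.blown = False

    def program(self):
        if self.blown:
            raise RuntimeError("e-Fuse is already programmed")
        self.blown = True


class RepairableBlock:
    """Hypothetical block with one active element plus spares selected by e-Fuses."""

    def __init__(self, cycle_limit, spares):
        self.cycle_limit = cycle_limit                    # assumed end-of-life budget per element
        self.fuses = [EFuse() for _ in range(spares)]     # one fuse per spare element
        self.cycles = 0                                   # active cycles on the current element

    @property
    def active_element(self):
        # Each blown fuse steers the block to the next spare element.
        return sum(fuse.blown for fuse in self.fuses)

    def tick(self, n=1):
        """Record n active cycles; return False only if a repair was needed but no spare remains."""
        self.cycles += n
        if self.cycles >= self.cycle_limit:
            return self._repair()
        return True

    def _repair(self):
        for fuse in self.fuses:
            if not fuse.blown:
                fuse.program()      # irreversible: selects the next spare
                self.cycles = 0     # the fresh element starts a new cycle count
                return True
        return False                # spares exhausted; report rather than fail silently


if __name__ == "__main__":
    block = RepairableBlock(cycle_limit=3, spares=1)
    for _ in range(6):
        ok = block.tick()
        print(block.active_element, block.cycles, ok)
```

Because each fuse can be programmed only once, the number of repairs in this sketch is bounded by the number of spares, so the final `tick` reports that no further repair is available.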


Figure 3: Electronic fuse (e-Fuse) shown in the 90 nm silicon node

Programmed mode (left) and un-programmed mode (right). One-time programming is accomplished through a controlled high current that drives electromigration in the e-Fuse. Programming changes the e-Fuse impedance from a low state to a high state.