Industrial Computing Systems: A Case Study of Fault Tolerance Analysis

International Journal of Computer Trends and Technology (IJCTT)          
© 2015 by IJCTT Journal
Volume-21 Number-1
Year of Publication : 2015
Authors : Andrey A. Shchurov
DOI : 10.14445/22312803/IJCTT-V21P110


Andrey A. Shchurov "Industrial Computing Systems: A Case Study of Fault Tolerance Analysis". International Journal of Computer Trends and Technology (IJCTT) V21(1):50-55, March 2015. ISSN:2231-2803. Published by Seventh Sense Research Group.

Abstract -
Fault tolerance is a key factor in the design of industrial computing systems. In practice, however, these systems, like every commercial product, are subject to tight financial constraints, and they must stay in an operational state for as long as possible to remain commercially attractive. This work analyzes the instantaneous failure rate of these systems at the end of their lifetime period. On the basis of this analysis, we identify the effect of a critical increase in the system failure rate and the basic condition under which it occurs. We then determine a maintenance schedule that can help to avoid this effect and to extend the system lifetime in fault-tolerant mode.
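To make the notion of an instantaneous failure rate (hazard rate) h(t) = f(t)/R(t) of a redundant system more concrete, here is a minimal sketch. It is not the author's model. It assumes a hypothetical 1-out-of-n parallel system of identical, independent components whose lifetimes follow a Weibull distribution with shape >= 1 (constant or wear-out failure rate), plus a hypothetical maintenance rule that triggers service when the system hazard first reaches a chosen threshold. The function names, parameters, and numbers are illustrative only.

```python
import math
from typing import Optional


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Component hazard h(t) = (shape/scale) * (t/scale)**(shape - 1).

    Assumes shape >= 1 so that h(0) is finite (constant or wear-out failure rate).
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def parallel_system_hazard(t: float, n: int, shape: float, scale: float) -> float:
    """Hazard of a hypothetical 1-out-of-n parallel system of i.i.d. Weibull components.

    With component unreliability F = 1 - R, the system has R_sys = 1 - F**n and
    f_sys = n * F**(n-1) * f, so
        h_sys = f_sys / R_sys = h * n * F**(n-1) / (1 + F + ... + F**(n-1)).
    The last form avoids computing 1 - F**n, which cancels badly when F is near 1.
    """
    unrel = -math.expm1(-((t / scale) ** shape))  # F(t) = 1 - exp(-(t/scale)**shape)
    ratio = n * unrel ** (n - 1) / sum(unrel ** k for k in range(n))
    return weibull_hazard(t, shape, scale) * ratio


def time_to_hazard_threshold(threshold: float, n: int, shape: float, scale: float,
                             t_max: float, tol: float = 1e-9) -> Optional[float]:
    """Earliest t in [0, t_max] at which the system hazard reaches `threshold`.

    Hypothetical maintenance trigger. Bisection is valid because h_sys is
    non-decreasing here: h is non-decreasing for shape >= 1, and the ratio in
    parallel_system_hazard grows with F. Returns None if t_max is too early.
    """
    if parallel_system_hazard(t_max, n, shape, scale) < threshold:
        return None
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if parallel_system_hazard(mid, n, shape, scale) >= threshold:
            hi = mid  # threshold already reached at mid: search earlier
        else:
            lo = mid  # not reached yet: search later
    return hi


if __name__ == "__main__":
    # Hypothetical parameters: characteristic life 1.0 (arbitrary units), shape 3.
    for t in (0.5, 1.0, 1.5):
        print(f"t={t}: single={weibull_hazard(t, 3.0, 1.0):.3f}, "
              f"duplex={parallel_system_hazard(t, 2, 3.0, 1.0):.3f}")
    print("duplex hazard reaches 2.0 at t =",
          time_to_hazard_threshold(2.0, 2, 3.0, 1.0, t_max=5.0))
```

Under these assumptions the system hazard is the component hazard multiplied by n F^(n-1) / (1 + F + ... + F^(n-1)). That factor is small early in life and approaches 1 as the components wear out, so redundancy keeps the failure rate low at first but lets it converge to the single-component value near the end of the lifetime period.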

Keywords - reliable computing system, fault tolerance, maintenance scheduling.