A Comparative Study of Various Distance Measures for Software fault prediction

International Journal of Computer Trends and Technology (IJCTT)          
© 2014 by IJCTT Journal
Volume-17 Number-3
Year of Publication : 2014
Authors : Deepinder Kaur
DOI :  10.14445/22312803/IJCTT-V17P122


Deepinder Kaur. "A Comparative Study of Various Distance Measures for Software fault prediction". International Journal of Computer Trends and Technology (IJCTT) V17(3):117-120, Nov 2014. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Various distance measures have been used to predict software faults efficiently at early stages of software development. K-means clustering is a widely used approach to software fault prediction because of its computational efficiency; it partitions a dataset into K clusters using a chosen distance measure. Distance measures, applied to a set of metrics, are used to identify similar data objects, which helps in developing efficient algorithms for clustering and classification. In this paper, we study K-means clustering with three distance measures, Euclidean, Sorensen, and Canberra, using datasets collected from the NASA MDP (Metrics Data Program). Results are presented using ROC curves. The experimental results show that K-means clustering with the Sorensen distance performs better than with the Euclidean or Canberra distance.
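To make the comparison concrete, the following is a minimal Python sketch of the three distance measures named in the abstract, plugged into a basic K-means loop. It is an illustration under stated assumptions, not the paper's implementation: the Sorensen distance is written in its common Bray-Curtis form (which assumes non-negative metric values), the centroid update is a plain arithmetic mean for all three distances, the first K points serve as initial centroids, and the sample data (two metrics per module) is hypothetical.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two metric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def sorensen(x, y):
    """Sorensen (Bray-Curtis) distance; assumes non-negative values."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def canberra(x, y):
    """Canberra distance; terms with a zero denominator are skipped."""
    total = 0.0
    for a, b in zip(x, y):
        den = abs(a) + abs(b)
        if den:
            total += abs(a - b) / den
    return total


def kmeans(points, k, distance, iterations=20):
    """Basic K-means with a pluggable distance function.

    Assumptions for this sketch: the first k points are the initial
    centroids, and centroids are updated as the arithmetic mean of
    their members regardless of the distance used.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical modules described by (lines of code, cyclomatic complexity).
modules = [(10, 1), (12, 2), (200, 30), (220, 35)]

for name, dist in [("Euclidean", euclidean),
                   ("Sorensen", sorensen),
                   ("Canberra", canberra)]:
    labels, _ = kmeans(modules, 2, dist)
    print(name, labels)
```

With K = 2, the two resulting clusters can be read as fault-prone and not fault-prone groups, and the distance function is the only part that changes between the three variants being compared.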

Keywords -
Distance measures; K-means clustering; Fault prediction; Euclidean distance; Sorensen distance; Canberra distance.