CLASSIFICATION BASED ON DATA GRAVITATION AND POSTERIOR PROBABILITY

Muhamad Arief Hidayat, Arif Djunaidy

Abstract


The data gravitation-based classification (DGC) method is a recent classification technique that uses data gravitation as its classification criterion. In DGC, an object is assigned to the class that exerts the largest gravitation on it. However, DGC may produce inaccurate results when the training data suffer from class imbalance. A class with an excessively large mass produces a high degree of data gravitation, so an unknown object tends to be classified as a member of that class, and the reverse holds for a class with a very small mass.
In this research, the DGC method is modified to build a classification method based on both data gravitation and posterior probability (DGCPP). In DGCPP, the mass concept, which DGC defines as the prior probability, is replaced by the posterior probability. This modification is expected to make the data gravitation calculation more accurate than in DGC. In turn, the improved gravitation calculation is expected to let DGCPP produce more accurate classification results than DGC on both normal datasets and datasets with class imbalance.
Classification accuracy was evaluated using ten-fold cross-validation on several datasets, both normal and class-imbalanced. The results showed that DGCPP produced a positive average accuracy difference relative to DGC. In the tests on all normal datasets, the average accuracy difference was statistically significant at the 95% confidence level. In the tests on the four class-imbalanced datasets, the average accuracy difference was also statistically significant at the 95% confidence level. Finally, tests of the computing time required by the classification program showed that, across all datasets used, the additional computing time DGCPP needs compared with DGC is insignificant and shorter than human response time.
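As a rough illustration of the class-imbalance effect described in the abstract, the Python sketch below implements a deliberately simplified gravitation-style classifier. The formula (mass divided by squared Euclidean distance, summed per class), the function names `gravitation` and `classify`, the toy data, and both mass assignments are assumptions made for illustration only. The abstract does not give the exact DGC or DGCPP formulas, and the second mass assignment is a placeholder, not the posterior probability used by DGCPP.

```python
def gravitation(x, particles):
    """Total pull of one class on x: sum of mass / squared distance (simplified)."""
    total = 0.0
    for point, mass in particles:
        r2 = sum((a - b) ** 2 for a, b in zip(x, point))
        total += mass / max(r2, 1e-12)  # guard against a zero distance
    return total


def classify(x, particles_by_class):
    """Return the class label whose particles exert the largest gravitation on x."""
    return max(
        particles_by_class,
        key=lambda label: gravitation(x, particles_by_class[label]),
    )


# Hypothetical imbalanced training set: 50 majority points at distance 2
# from the query, 2 minority points at distance 1.
majority_points = [(2.0, 0.0)] * 50
minority_points = [(0.0, 1.0), (0.0, -1.0)]
query = (0.0, 0.0)

# Assumption 1: every particle has unit mass, so a class's total mass
# grows with its size.
unit_mass = {
    "majority": [(p, 1.0) for p in majority_points],
    "minority": [(p, 1.0) for p in minority_points],
}

# Assumption 2: placeholder mass of 1 / class size. This only shows that the
# mass definition changes the decision; it is NOT the DGCPP posterior.
rescaled_mass = {
    "majority": [(p, 1.0 / len(majority_points)) for p in majority_points],
    "minority": [(p, 1.0 / len(minority_points)) for p in minority_points],
}

label_unit = classify(query, unit_mass)          # 50 * 1/4 = 12.5 vs 2 * 1/1 = 2.0
label_rescaled = classify(query, rescaled_mass)  # 0.25 vs 1.0
print(label_unit, label_rescaled)
```

With unit masses the majority class wins purely because of its size, even though the two nearest training points belong to the minority class. Changing only the per-particle mass flips the decision, and mass is the quantity the abstract says DGCPP redefines.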
 
Keywords: data gravitation-based classification, class imbalance problem, posterior probability

 






DOI: http://dx.doi.org/10.53567/spirit.v7i1.23



Copyright (c) 2016 Jurnal SPIRIT




 

 




SPIRIT : Sarana Penunjang Informasi Terkini 

Published by the Faculty of Information Technology, Institut Teknologi dan Bisnis Yadika Pasuruan
Editorial address: Jl. Bader No.9, Kwangsan, Kalirejo, Kec. Bangil, Pasuruan, Jawa Timur 67153
Tel/Fax: (0343) 742070, Email: lppm@stmik-yadika.ac.id


This work is licensed under a Creative Commons Attribution 4.0 International License.