Toward optimal feature selection using ranking methods and classification algorithms

Authors

  • Jasmina Novaković, Megatrend University, Faculty of Computer Science, Belgrade
  • Perica Strbac, Megatrend University, Faculty of Computer Science, Belgrade
  • Dusan Bulatović, Megatrend University, Faculty of Computer Science, Belgrade

DOI:

https://doi.org/10.2298/YJOR1101119N

Keywords:

Feature selection, feature ranking methods, classification algorithms, classification

Abstract

We present a comparison of several feature ranking methods on two real datasets. We consider six ranking methods, which fall into two broad categories: statistical and entropy-based. Four supervised learning algorithms are used to build models: IB1, Naive Bayes, the C4.5 decision tree, and the RBF network. We show that the choice of ranking method can be important for classification accuracy: in our experiments, different combinations of ranking methods and supervised learning algorithms yield quite different balanced accuracies. Our cases confirm that, to be confident that the feature subset giving the highest accuracy has been selected, several different ranking indices should be consulted.
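
Balanced accuracy, the criterion used in the comparison, is the average of per-class recall (for two classes, the mean of sensitivity and specificity), so it does not reward a classifier that simply predicts the majority class. The sketch below is a minimal illustration of the kind of pipeline the abstract describes, not the authors' code: the scikit-learn stand-ins are our assumptions, with chi2 as a statistical ranking index, mutual_info_classif as an entropy-based one, a 1-nearest-neighbour classifier for IB1, DecisionTreeClassifier in place of C4.5, and an RBF-kernel SVM as a rough substitute for an RBF network.

```python
# Minimal sketch (not the authors' code) of the pipeline the abstract
# describes: rank features with a statistical and an entropy-based index,
# keep the top k, and compare learners by cross-validated balanced accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; the paper uses two real datasets not reproduced here.
X, y = load_breast_cancer(return_X_y=True)

rankers = {"chi2": chi2, "info_gain": mutual_info_classif}
learners = {
    "IB1": KNeighborsClassifier(n_neighbors=1),  # instance-based, k = 1
    "NaiveBayes": GaussianNB(),
    "C4.5-like": DecisionTreeClassifier(random_state=0),
    "RBF": SVC(kernel="rbf"),  # substitute for an RBF network
}

for r_name, ranker in rankers.items():
    for l_name, clf in learners.items():
        # Scale to [0, 1] so chi2 receives non-negative inputs, rank the
        # features, keep the 10 best, then score by 10-fold balanced accuracy.
        pipe = make_pipeline(MinMaxScaler(), SelectKBest(ranker, k=10), clf)
        score = cross_val_score(pipe, X, y, cv=10,
                                scoring="balanced_accuracy").mean()
        print(f"{r_name:9s} + {l_name:10s}: {score:.3f}")
```

Rerunning the loop with other values of k and on a second dataset shows how the best-performing ranking index shifts from one learner and dataset to another, which is exactly why the abstract recommends consulting several indices.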

Published

2011-03-01

Section

Research Articles