EDITORIAL BOARD
Editor-In-Chief
Ngo Thanh Long,Le Quy Don Technical University
Deputy Editor-In-Chief
Tran Xuan Nam, Le Quy Don Technical University
Editors
Pham The Long, Le Quy Don Technical University
Nguyen Thanh Thuy, Vietnam National University, Ha Noi
Vu Duc Thi, Vietnam National University, Ha Noi
Le Hoai Bac, Vietnam National University, Ho Chi Minh
Tu Minh Phuong, Posts and Telecommunications Institute of Technology
Huynh Quyet Thang, Hanoi University of Science and Technology
Nguyen Huu Thanh, Hanoi University of Science and Technology
Luong Chi Mai, Vietnam Academy of Science and Technology
Tran Xuan Tu, Vietnam National University, Ha Noi
Truong Trung Kien, Posts and Telecommunications Institute of Technology
Tran Nguyen Ngoc, Le Quy Don Technical University
Hoang Van Phuc, Le Quy Don Technical University
Scientific Secretary
Tran Cao Truong, Le Quy Don Technical University
An empirical study of early stopping in genetic programming.

Nguyen Thi Hien, Nguyen Xuan Hoai, Nguyen Quang Uy, Nguyen Thi Quyen.

Keywords: Genetic Programming, Early Stopping, Generalisation, Over-fitting.

Abstract: Genetic Programming (GP) is an evolutionary method based on the principles of natural genetic systems. In GP, a population of individuals are randomly generated. This population is evolved through a number of generations by applying some genetic operators such as crossover, mutation and selection. Although GP has been successful applied to many real world problems, there are very few guidelines for determining when to stop GP algorithm. Traditionally, a predefined number of generation is set and GP is stopped when it reaches to the last generation. In this article, we present an empirical study of the impact of early stopping to GP performance. We proposes some early stopping criteria for GP. We tested the proposed methods on a number of symbolic regression problems. Our experiment results show that using early stopping helps to maintain the generalisation capacity of GP while significantly reducing its solutions complexity and training time.

Issues 9 (12/2016)

References

[1] R. Poli and W. L. N. McPhee, A Field Guide to Genetic Programming. http://lulu.com, 2008.

[2] G. Paris, D. Robilliard, and C. Fonlupt, "Exploring overfitting in genetic programming," in Evolution Artificielle, 6th International Conference, ser. Lecture Notes in Computer Science, vol. 2936. Springer, 2003, pp. 267-277.

[3] U. N. Quang, T. H. Nguyen, X. H. Nguyen, and M. O'Neill, "Improving the generalisation ability of genetic programming with semantic similarity based crossover," in Proceedings of the 13th European Conference on Genetic Programming, EuroGP 2010, ser. LNCS, vol. 6021. Istanbul: Springer, 7-9 Apr. 2010, pp. 184-195.

[4] I. Kushchu, "Genetic programming and evolutionary generalization," IEEE Transactions on Evolutionary Computation, vol. 6, no. 5, pp. 431-442, Oct. 2002.

[5] T. M. Mitchell, Machine Learning. McGraw Hill, 1997.

[6] M. Amir Haeri, M. M. Ebadzadeh, and G. Folino, "Improving GP generalization: a variance-based layered learning approach," Genetic Programming and Evolvable Machines, vol. 16, no. 1, pp. 27-55, Mar. 2015.

[7] D. Costelloe and C. Ryan, "On improving generalisation in genetic programming," in Proceedings of the 12th European Conference on Genetic Programming, EuroGP 2009, ser. LNCS, L. Vanneschi, S. Gustafson, A. Moraglio, I. De Falco, and M. Ebner, Eds., vol. 5481. Tuebingen: Springer, Apr. 15-17 2009, pp. 61-72.

[8] N. Foreman and M. Evett, "Preventing overfitting in GP with canary functions," in Proceedings of the 2005 conference on Genetic and Evolutionary Computation (GECCO'05). ACM, 2005, pp. 1779-1780.

[9] C. Gagné, M. Schoenauer, M. Parizeau, and M. Tomassini, "Genetic programming, validation sets, and parsimony pressure," in Proceedings of the 9th European Conference on Genetic Programming, ser. Lecture Notes in Computer Science, vol. 3905. Springer, 2006, pp. 109-120.

[10] L. Panait and S. Luke, "Methods for evolving robust programs," in Genetic and Evolutionary Computation - GECCO-2003, ser. LNCS, vol. 2724. Chicago: Springer-Verlag, 12-16 Jul. 2003, pp. 1740-1751.

[11] L. Vanneschi and S. Gustafson, "Using crossover based similarity measure to improve genetic programming generalization ability," in Proceedings of the 11th Annual conference on Genetic and evolutionary computation (GECCO'09). ACM, 2009, pp. 1139-1146.

[12] L. Prechelt, "Early stopping - but when?" in Neural Networks: Tricks of the Trade, volume 1524 of LNCS, chapter 2. Springer-Verlag, 1997, pp. 55-69.

[13] W. Finno, F. Hergert, and H. Zimmermann, "Improving model selection by nonconvergent methods," Neural Networks, vol. 6, pp. 771-783, 1993.

[14] C. Tuite, A. Agapitos, M. O'Neill, and A. Brabazon, "Early stopping criteria to counteract overfitting in genetic programming," in Proceedings of the 13th annual conference companion on Genetic and evolutionary computation, ser. GECCO '11. New York, NY, USA: ACM, 2011, pp. 203-204.

[15] N. T. Hien, N. X. Hoai, R. McKay, and N. Q. Uy, "Where should we stop? - an investigation on early stopping for gp learning," in Proceedings of the 9th Conference on Simulated Evolution And Learning (SEAL-2012). Springer-Verlag, 2012, pp. 391-399.

[16] K. Shafi and H. Abbass, "The role of early stopping and population size in xcs for intrusion detection," in Proceedings of the 6th International Conference on Simulated Evolution and Learning. Springer, 2006, pp. 50-57.

[17] G. Raskutti, M. J. Wainwright, and B. Yu, "Early stopping and non-parametric regression: An optimal data-dependent stopping rule," J. Mach. Learn. Res., vol. 15, no. 1, pp. 335-366, Jan. 2014. [Online]. Available: http://dl.acm.org/citation.cfm?id=2627435.2627446

[18] N. T. Hien, N. X. Hoai, N. Q. Uy, and R. McKay, "Where should we stop? - an investigation on early stopping for gp learning," Strutural Complexity Laboratory, Seoul National University, Seoul, Korea, Tech. Rep. TRSNUSC:2011:001, February 2011.

[19] L. Vanneschi, M. Castelli, and S. Silva, "Measuring bloat, overfitting and functional complexity in genetic programming," in GECCO '10: Proceedings of the 12th annual conference on Genetic and evolutionary computation. Portland, Oregon, USA: ACM, 7-11 Jul. 2010, pp. 877-884.

[20] C. Blake, E. Keogh, and C. J. Merz, "Uci repository of machine learning databases," Department of Information and Computer Science, University of California, Irvine, CA, 1998. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html

[21] P. Vlachos, "Statlib project repository, http://lib.stat.cmu.edu," 2000. [Online]. Available: http://lib.stat.cmu.edu

[22] S. Mahler, D. Robilliard, and C. Fonlupt, "Tarpeian bloat control and generalization accuracy," in Proceedings of the 8th European Conference on Genetic Programming, EuroGP'2005, ser. LNCS, vol. 3447. Springer, 2005, pp. 203-214.

[23] G. Peskir and A. Shiryaev, Optimal Stopping Theory and Free-Boundary Problems, ser. Lectures in Maths. (ETH). Birkhauser, 2006.