HISTORICAL ANALYSIS OF MESSAGE CONTENTS TO RECOMMEND ISSUES TO OPEN SOURCE SOFTWARE CONTRIBUTORS

Igor Fabio Steinmacher, Igor S Wiese, Andre Luis Schwerz, Rafael Liberato Roberto, João Eduardo Ferreira, Marco Aurélio Gerosa
DOI: https://doi.org/10.21529/RESI.2014.1302005

Abstract

Developers of distributed open source projects use issue trackers to coordinate their work. These tools store valuable information, maintaining a log of relevant decisions and bug solutions. Finding appropriate issues to contribute to can be hard, as the high volume of data increases contributors’ overhead. This paper shows the importance of the content of issue tracker discussions in an open source project for building a classifier that predicts whether a contributor will participate in an issue. To design this prediction model, we used two machine learning algorithms, Naïve Bayes and J48. We evaluated the algorithms on data from the Apache Hadoop Commons project. Applying them to the ten most active contributors of this project, we achieved an average recall of 66.82% with Naïve Bayes and 53.02% with J48. J48 achieved 64.31% precision and 90.27% accuracy. We also conducted an exploratory study with five contributors who took part in fewer issues and achieved 77.41% precision, 48% recall, and 98.84% accuracy with the J48 algorithm. The results indicate that the content of comments in issues of open source projects is a relevant factor for recommending issues to contributors.
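The reported metrics can look inconsistent at first: accuracy near 99% alongside recall below 50%. A minimal sketch of how precision, recall, and accuracy are computed from a confusion matrix may help. The counts below are hypothetical, chosen only for illustration, and are not taken from the paper; the names `Confusion`, `precision`, `recall`, and `accuracy` are likewise assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one contributor (hypothetical values)."""
    tp: int  # issues predicted as "participates" that the contributor did join
    fp: int  # issues predicted as "participates" that the contributor did not join
    fn: int  # issues the contributor joined but the classifier missed
    tn: int  # issues correctly predicted as "does not participate"


def precision(c: Confusion) -> float:
    # Share of recommended issues that were actually relevant.
    predicted_pos = c.tp + c.fp
    return c.tp / predicted_pos if predicted_pos else 0.0


def recall(c: Confusion) -> float:
    # Share of relevant issues that were actually recommended.
    actual_pos = c.tp + c.fn
    return c.tp / actual_pos if actual_pos else 0.0


def accuracy(c: Confusion) -> float:
    # Share of all issues classified correctly, positives and negatives alike.
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical contributor who joined only 25 of 1000 issues.
example = Confusion(tp=12, fp=4, fn=13, tn=971)

if __name__ == "__main__":
    print(f"precision={precision(example):.2%}")
    print(f"recall={recall(example):.2%}")
    print(f"accuracy={accuracy(example):.2%}")
```

When a contributor takes part in few issues, true negatives dominate the total, so accuracy stays high even if many relevant issues are missed. For this reason precision and recall are the more informative figures for contributors with low participation.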

Keywords

open source; recommendation system; issue tracker; mining software repositories

