TY - JOUR
T1 - A catalog of metrics at source code level for vulnerability prediction
T2 - A systematic mapping study
AU - Codabux, Zadia
AU - Zakia Sultana, Kazi
AU - Chowdhury, Md Naseef Ur Rahman
N1 - Publisher Copyright: © 2023 John Wiley & Sons, Ltd.
PY - 2024/7
Y1 - 2024/7
AB - Industry practitioners assess software from a security perspective to reduce the risks of deploying vulnerable software. Besides following security best practice guidelines during the software development life cycle, predicting vulnerability before roll-out is crucial. Software metrics are popular inputs for vulnerability prediction models. The objective of this study is to provide a comprehensive review of the source code-level security metrics presented in the literature. Our systematic mapping study started with 1451 studies obtained by searching four digital libraries: ACM, IEEE, ScienceDirect, and Springer. After applying our inclusion/exclusion criteria as well as the snowballing technique, we narrowed these down to 28 studies for in-depth analysis to answer four research questions pertaining to our goal. We extracted a total of 685 code-level metrics. For each study, we identified the empirical methods, quality measures, types of vulnerabilities of the prediction models, and shortcomings of the work. We found that standard machine learning models, such as decision trees, regressions, and random forests, are most frequently used for vulnerability prediction. The most common quality measures are precision, recall, accuracy, and F-measure. Based on our findings, we conclude that the list of software metrics for measuring code-level security is not universal or generic yet. Nonetheless, the results of our study can be used as a starting point for future studies aiming at improving existing security prediction models and as a catalog of metrics for vulnerability prediction for software practitioners.
KW - code level metrics
KW - software metrics
KW - software security
KW - software vulnerability
KW - systematic mapping study
KW - vulnerability prediction
UR - http://www.scopus.com/inward/record.url?scp=85176959237&partnerID=8YFLogxK
U2 - 10.1002/smr.2639
DO - 10.1002/smr.2639
M3 - Review article
AN - SCOPUS:85176959237
SN - 2047-7481
VL - 36
JO - Journal of Software: Evolution and Process
JF - Journal of Software: Evolution and Process
IS - 7
M1 - e2639
ER -