Developing robust arsenic awareness prediction models using machine learning algorithms

Sushant K. Singh, Robert Taylor, Mohammad Mahmudur Rahman, Biswajeet Pradhan

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Arsenic awareness plays a vital role in ensuring the sustainability of arsenic mitigation technologies. Thus far, however, few studies have dealt with the sustainability of such technologies and their associated socioeconomic dimensions. As a result, arsenic awareness prediction has not yet been fully conceptualized. Accordingly, this study evaluated arsenic awareness among arsenic-affected communities in rural India, using a structured questionnaire to record socioeconomic, demographic, and other sociobehavioral factors in order to assess their association with and influence on arsenic awareness. First, a logistic regression model was applied, and its results were compared with those produced by six state-of-the-art machine-learning algorithms (Support Vector Machine [SVM], Kernel-SVM, Decision Tree [DT], k-Nearest Neighbor [k-NN], Naïve Bayes [NB], and Random Forests [RF]), as measured by their accuracy at predicting arsenic awareness. Most (63%) of the surveyed population was found to be arsenic-aware. Significant arsenic awareness predictors were divided into three types: (1) socioeconomic factors: caste, education level, and occupation; (2) water and sanitation behavior factors: number of family members involved in water collection, distance traveled and time spent for water collection, places for defecation, and materials used for handwashing after defecation; and (3) social capital and trust factors: presence of anganwadi and people's trust in other community members, NGOs, and private agencies. Moreover, individuals with stronger social networks contributed positively to arsenic awareness in the communities. Results indicated that both the SVM and the RF algorithms outperformed the other methods at overall prediction of arsenic awareness, a nonlinear classification problem. Lower-caste, less educated, and unemployed members of the population were found to be the most vulnerable, requiring immediate arsenic mitigation. To this end, local social institutions and NGOs could play a crucial role in arsenic awareness and outreach programs. Use of SVM or RF or a combination of the two, together with use of a larger sample size, could enhance the accuracy of arsenic awareness prediction.
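
The abstract compares models by accuracy and reports that 63% of respondents were arsenic-aware. As a rough aid to reading such accuracy figures, the sketch below computes the accuracy of a trivial majority-class baseline, which always predicts "aware". The 100 labels are hypothetical and only mirror the reported 63% share; they are not the study's data, and the function names are illustrative rather than taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one label")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_true):
    """Predict the most common label for every respondent."""
    most_common_label, _ = Counter(y_true).most_common(1)[0]
    return [most_common_label] * len(y_true)


# Hypothetical labels (assumption): 63 aware (1) and 37 unaware (0),
# chosen only to mirror the 63% share reported in the abstract.
labels = [1] * 63 + [0] * 37

baseline_acc = accuracy(labels, majority_baseline(labels))
print(f"majority-class baseline accuracy: {baseline_acc:.2f}")
```

Under this assumption, a classifier adds predictive value only to the extent that its accuracy exceeds the baseline's 0.63.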

Original language: English
Pages (from-to): 125-137
Number of pages: 13
Journal: Journal of Environmental Management
Volume: 211
DOI: 10.1016/j.jenvman.2018.01.044
State: Published - 1 Apr 2018

Fingerprint

Arsenic
Learning algorithms
Learning systems
prediction
Support vector machines
defecation
caste
machine learning
nongovernmental organization
Sustainable development
mitigation
sustainability
Water
Sanitation
social capital
social network
Decision trees
occupation

Keywords

  • Arsenic
  • Awareness
  • Demographics
  • GIS
  • India
  • Machine learning algorithms
  • RF
  • SVM
  • Sociobehavioral
  • Socioeconomic

Cite this

Singh, Sushant K.; Taylor, Robert; Rahman, Mohammad Mahmudur; Pradhan, Biswajeet. Developing robust arsenic awareness prediction models using machine learning algorithms. In: Journal of Environmental Management. 2018; Vol. 211, pp. 125-137.
@article{cfcaea7042b9482aa93ac9ed98724baa,
title = "Developing robust arsenic awareness prediction models using machine learning algorithms",
abstract = "Arsenic awareness plays a vital role in ensuring the sustainability of arsenic mitigation technologies. Thus far, however, few studies have dealt with the sustainability of such technologies and their associated socioeconomic dimensions. As a result, arsenic awareness prediction has not yet been fully conceptualized. Accordingly, this study evaluated arsenic awareness among arsenic-affected communities in rural India, using a structured questionnaire to record socioeconomic, demographic, and other sociobehavioral factors in order to assess their association with and influence on arsenic awareness. First, a logistic regression model was applied, and its results were compared with those produced by six state-of-the-art machine-learning algorithms (Support Vector Machine [SVM], Kernel-SVM, Decision Tree [DT], k-Nearest Neighbor [k-NN], Na{\"i}ve Bayes [NB], and Random Forests [RF]), as measured by their accuracy at predicting arsenic awareness. Most (63{\%}) of the surveyed population was found to be arsenic-aware. Significant arsenic awareness predictors were divided into three types: (1) socioeconomic factors: caste, education level, and occupation; (2) water and sanitation behavior factors: number of family members involved in water collection, distance traveled and time spent for water collection, places for defecation, and materials used for handwashing after defecation; and (3) social capital and trust factors: presence of anganwadi and people's trust in other community members, NGOs, and private agencies. Moreover, individuals with stronger social networks contributed positively to arsenic awareness in the communities. Results indicated that both the SVM and the RF algorithms outperformed the other methods at overall prediction of arsenic awareness, a nonlinear classification problem. Lower-caste, less educated, and unemployed members of the population were found to be the most vulnerable, requiring immediate arsenic mitigation. To this end, local social institutions and NGOs could play a crucial role in arsenic awareness and outreach programs. Use of SVM or RF or a combination of the two, together with use of a larger sample size, could enhance the accuracy of arsenic awareness prediction.",
keywords = "Arsenic, Awareness, Demographics, GIS, India, Machine learning algorithms, RF, SVM, Sociobehavioral, Socioeconomic",
author = "Singh, {Sushant K.} and Robert Taylor and Rahman, {Mohammad Mahmudur} and Biswajeet Pradhan",
year = "2018",
month = "4",
day = "1",
doi = "10.1016/j.jenvman.2018.01.044",
language = "English",
volume = "211",
pages = "125--137",
journal = "Journal of Environmental Management",
issn = "0301-4797",
publisher = "Academic Press Inc.",
}

TY - JOUR

T1 - Developing robust arsenic awareness prediction models using machine learning algorithms

AU - Singh, Sushant K.

AU - Taylor, Robert

AU - Rahman, Mohammad Mahmudur

AU - Pradhan, Biswajeet

PY - 2018/4/1

Y1 - 2018/4/1

N2 - Arsenic awareness plays a vital role in ensuring the sustainability of arsenic mitigation technologies. Thus far, however, few studies have dealt with the sustainability of such technologies and their associated socioeconomic dimensions. As a result, arsenic awareness prediction has not yet been fully conceptualized. Accordingly, this study evaluated arsenic awareness among arsenic-affected communities in rural India, using a structured questionnaire to record socioeconomic, demographic, and other sociobehavioral factors in order to assess their association with and influence on arsenic awareness. First, a logistic regression model was applied, and its results were compared with those produced by six state-of-the-art machine-learning algorithms (Support Vector Machine [SVM], Kernel-SVM, Decision Tree [DT], k-Nearest Neighbor [k-NN], Naïve Bayes [NB], and Random Forests [RF]), as measured by their accuracy at predicting arsenic awareness. Most (63%) of the surveyed population was found to be arsenic-aware. Significant arsenic awareness predictors were divided into three types: (1) socioeconomic factors: caste, education level, and occupation; (2) water and sanitation behavior factors: number of family members involved in water collection, distance traveled and time spent for water collection, places for defecation, and materials used for handwashing after defecation; and (3) social capital and trust factors: presence of anganwadi and people's trust in other community members, NGOs, and private agencies. Moreover, individuals with stronger social networks contributed positively to arsenic awareness in the communities. Results indicated that both the SVM and the RF algorithms outperformed the other methods at overall prediction of arsenic awareness, a nonlinear classification problem. Lower-caste, less educated, and unemployed members of the population were found to be the most vulnerable, requiring immediate arsenic mitigation. To this end, local social institutions and NGOs could play a crucial role in arsenic awareness and outreach programs. Use of SVM or RF or a combination of the two, together with use of a larger sample size, could enhance the accuracy of arsenic awareness prediction.

KW - Arsenic

KW - Awareness

KW - Demographics

KW - GIS

KW - India

KW - Machine learning algorithms

KW - RF

KW - SVM

KW - Sociobehavioral

KW - Socioeconomic

UR - http://www.scopus.com/inward/record.url?scp=85041536071&partnerID=8YFLogxK

U2 - 10.1016/j.jenvman.2018.01.044

DO - 10.1016/j.jenvman.2018.01.044

M3 - Article

C2 - 29408061

AN - SCOPUS:85041536071

VL - 211

SP - 125

EP - 137

JO - Journal of Environmental Management

JF - Journal of Environmental Management

SN - 0301-4797

ER -