TY - JOUR
T1 - Structuring Nutrient Yields throughout Mississippi/Atchafalaya River Basin Using Machine Learning Approaches
AU - Zhen, Yi
AU - Feng, Huan
AU - Yoo, Shinjae
N1 - Publisher Copyright: © 2023 by the authors.
PY - 2023/9
Y1 - 2023/9
N2 - To minimize the eutrophication pressure along the Gulf of Mexico or reduce the size of its hypoxic zone, it is important to understand the underlying temporal and spatial variations and correlations in excess nutrient loads, which are strongly associated with the formation of hypoxia. This study’s objective was to reveal and visualize structures in high-dimensional datasets of nutrient yield distributions throughout the Mississippi/Atchafalaya River Basin (MARB). For this purpose, annual mean nutrient concentrations were collected from thirty-three US Geological Survey (USGS) water stations scattered across the upper and lower MARB from 1996 to 2020. Eight surface water quality indicators were selected to make comparisons among water stations along the MARB over the past two decades. Principal component analysis (PCA) was used to comprehensively evaluate the nutrient yields across the thirty-three USGS monitoring stations and identify the major contributing nutrient loads. The results showed that all samples could be analyzed using two main components, which accounted for 81.6% of the total variance. The PCA results showed that yields of orthophosphate (OP), silica (SI), nitrate–nitrites (NO3-NO2), and total suspended sediment (TSS) are major contributors to nutrient yields. They also showed that land-planted crops, population density, domestic and industrial discharges, and precipitation are fundamental causes of excess nutrient loads in the MARB. These factors are of great significance for managing excess nutrient loads and controlling pollution in the Mississippi River. It was found that the average nutrient yields were stable within the sub-MARB area, but the large nitrogen yields in the upper MARB and the large phosphorus yields in the lower MARB were of great concern. t-distributed stochastic neighbor embedding (t-SNE) revealed interesting nonlinear and local structures in nutrient yield distributions. Clustering analysis (CA) showed the detailed development of similarities in the nutrient yield distribution. Moreover, PCA, t-SNE, and CA showed consistent clustering results. This study demonstrated that integrating the dimension reduction techniques PCA and t-SNE with CA techniques in machine learning is an effective way to visualize the structures of the correlations in high-dimensional datasets of nutrient yields and provides a comprehensive understanding of the correlations in the distributions of nutrient loads across the MARB.
AB - To minimize the eutrophication pressure along the Gulf of Mexico or reduce the size of its hypoxic zone, it is important to understand the underlying temporal and spatial variations and correlations in excess nutrient loads, which are strongly associated with the formation of hypoxia. This study’s objective was to reveal and visualize structures in high-dimensional datasets of nutrient yield distributions throughout the Mississippi/Atchafalaya River Basin (MARB). For this purpose, annual mean nutrient concentrations were collected from thirty-three US Geological Survey (USGS) water stations scattered across the upper and lower MARB from 1996 to 2020. Eight surface water quality indicators were selected to make comparisons among water stations along the MARB over the past two decades. Principal component analysis (PCA) was used to comprehensively evaluate the nutrient yields across the thirty-three USGS monitoring stations and identify the major contributing nutrient loads. The results showed that all samples could be analyzed using two main components, which accounted for 81.6% of the total variance. The PCA results showed that yields of orthophosphate (OP), silica (SI), nitrate–nitrites (NO3-NO2), and total suspended sediment (TSS) are major contributors to nutrient yields. They also showed that land-planted crops, population density, domestic and industrial discharges, and precipitation are fundamental causes of excess nutrient loads in the MARB. These factors are of great significance for managing excess nutrient loads and controlling pollution in the Mississippi River. It was found that the average nutrient yields were stable within the sub-MARB area, but the large nitrogen yields in the upper MARB and the large phosphorus yields in the lower MARB were of great concern. t-distributed stochastic neighbor embedding (t-SNE) revealed interesting nonlinear and local structures in nutrient yield distributions. Clustering analysis (CA) showed the detailed development of similarities in the nutrient yield distribution. Moreover, PCA, t-SNE, and CA showed consistent clustering results. This study demonstrated that integrating the dimension reduction techniques PCA and t-SNE with CA techniques in machine learning is an effective way to visualize the structures of the correlations in high-dimensional datasets of nutrient yields and provides a comprehensive understanding of the correlations in the distributions of nutrient loads across the MARB.
KW - Mississippi/Atchafalaya River Basin
KW - clustering analysis (CA)
KW - nutrient yields
KW - principal component analysis (PCA)
KW - surface water quality
KW - t-distributed stochastic neighbor embedding (t-SNE)
UR - http://www.scopus.com/inward/record.url?scp=85172150890&partnerID=8YFLogxK
U2 - 10.3390/environments10090162
DO - 10.3390/environments10090162
M3 - Article
AN - SCOPUS:85172150890
SN - 2076-3298
VL - 10
JO - Environments - MDPI
JF - Environments - MDPI
IS - 9
M1 - 162
ER -