# Modeling of the spatial determinants of teenage pregnancy in Ethiopia; geographically weighted regression | BMC Women’s Health

### Data source, study design and framework

This study used secondary data from the 2016 Ethiopian Demographic and Health Survey (EDHS). The survey data were downloaded from the MEASURE DHS website after a reasonable request, and permission to use the data was granted. The 2016 EDHS is part of the global MEASURE DHS project, which was funded by the United States Agency for International Development (USAID) and implemented by the Ethiopian Central Statistical Agency. A DHS is carried out every 5 years, and the 2016 survey is the fourth demographic and health survey in Ethiopia; it covers the nine regions and two city administrations.

### Sample size and sampling procedure

Ethiopia’s Demographic and Health Survey (EDHS) program collected data on nationally representative samples of all age groups and key indicators. Information on sociodemographic, socioeconomic and maternal variables was included in the survey. A two-stage stratified cluster sampling procedure was used to select study participants. In the 2016 survey, a total of 645 enumeration areas (EAs; 202 urban and 443 rural) were selected. From these EAs, 18,008 households were selected, and from these households a total of 15,683 women of reproductive age were included in the survey. Relevant information on the sampling procedure and data quality can be found elsewhere [4]. For the present study, a weighted sample of 3381 adolescents (15-19 years) was included.

### Study variables

#### Dependent variable

Teenage Pregnancy: This is a composite binary outcome variable that refers to the pregnancy experience of a woman aged 15-19. A history of birth before the age of 19 or of being pregnant at the time of the interview was considered a teenage pregnancy. Therefore, it was categorized such that 0 = no pregnancy before age 19 and 1 = pregnancy experienced before age 19. Finally, the weighted proportion of teenage pregnancies by cluster, which is a continuous variable, was used for the spatial analysis, including the spatial regression analysis.
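As an illustration of how the cluster-level outcome can be derived, the sketch below computes the weighted proportion of teenage pregnancy per cluster from individual records. It is a minimal Python sketch, not the authors' Stata code; the function name `weighted_cluster_proportions`, the record layout `(cluster_id, weight, outcome)` and the toy values are assumptions made for the example.

```python
from collections import defaultdict


def weighted_cluster_proportions(records):
    """Return {cluster_id: weighted proportion of outcome == 1}.

    records: iterable of (cluster_id, weight, outcome) tuples, where
    outcome is 1 (teenage pregnancy) or 0 (no teenage pregnancy).
    """
    weighted_cases = defaultdict(float)
    total_weight = defaultdict(float)
    for cluster_id, weight, outcome in records:
        weighted_cases[cluster_id] += weight * outcome
        total_weight[cluster_id] += weight
    return {c: weighted_cases[c] / total_weight[c] for c in total_weight}


# Hypothetical toy data: two clusters with made-up sampling weights.
records = [
    (1, 1.0, 1), (1, 1.0, 0), (1, 2.0, 0),
    (2, 0.5, 1), (2, 1.5, 1),
]
proportions = weighted_cluster_proportions(records)
print(proportions)
```

The resulting per-cluster proportion is the continuous variable that would then be joined to the cluster coordinates for the spatial analysis.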

#### Independent variables

Aggregate community-level variables were considered as candidate independent variables for the spatial regression models: community poverty (proportion in the two lowest wealth quintiles), community contraceptive use (proportion of women who have not used any type of contraceptive), community traditional contraceptive use (proportion of women who use traditional contraceptive methods), community women's education (proportion of women with no education), community female employment (proportion of unemployed women), community media exposure (proportion of women who have not been exposed to television, radio or newspapers), community health insurance coverage (proportion of women not covered by health insurance) and community illiteracy (proportion of women unable to read or write).

### Data management and analysis

Descriptive analyses were performed using Stata version 14, while spatial analysis was performed using ArcGIS 10.7. Prior to the spatial analysis, the weighted teenage pregnancy proportions (outcome variable) and the candidate predictor variables were computed in Stata and exported to ArcGIS. A detailed explanation of the weighting procedure can be found elsewhere [23].

### Spatial analysis

#### Spatial autocorrelation

Spatial autocorrelation arises from the concept of correlation or dependence: geographically close areas are more related than distant areas. Global autocorrelation assumes stationarity, meaning that the correlation between nearby or connected observations remains the same across the study area. Moran's I is an indicator of spatial autocorrelation that ranges from -1 to 1. A positive value shows that nearby areas have similar values, while a negative value indicates dissimilarity between adjacent values [24]. The global Moran's I was calculated as follows [25]:

$$ I = \frac{n \sum\nolimits_{i}^{n} \sum\nolimits_{j}^{n} w_{ij} \left( y_{i} - \bar{y} \right) \left( y_{j} - \bar{y} \right)}{\left( \sum\nolimits_{i}^{n} \sum\nolimits_{j}^{n} w_{ij} \right) \sum\nolimits_{i} \left( y_{i} - \bar{y} \right)^{2}} $$

where \(y_i\) is the observation at location \(i\) (\(i = 1, \ldots, n\)), \(\bar{y}\) is the mean of the observations, and \(w_{ij}\) are the elements of a spatial weight matrix.
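The formula can be checked on a small example. The following is a minimal pure-Python sketch of the global Moran's I, not the ArcGIS implementation; the function name `morans_i`, the binary adjacency weights and the four-location toy data are assumptions made for illustration.

```python
def morans_i(y, w):
    """Global Moran's I for values y and an n x n spatial weight matrix w."""
    n = len(y)
    mean = sum(y) / n
    dev = [value - mean for value in y]
    # Numerator: weighted cross-products of deviations from the mean.
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    # Sum of all weights and sum of squared deviations.
    weight_sum = sum(w[i][j] for i in range(n) for j in range(n))
    squared = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / squared)


# Hypothetical toy data: four locations on a line, each linked to its
# immediate neighbours with weight 1 (binary adjacency).
line_w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_trend = morans_i([1, 2, 3, 4], line_w)          # similar neighbours
i_alternating = morans_i([1, -1, 1, -1], line_w)  # dissimilar neighbours
print(i_trend, i_alternating)
```

A smooth trend gives a positive value, while a strictly alternating pattern gives the lower bound of -1, matching the interpretation given above.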

#### Hot spot analysis

Hot spot analysis identifies statistically significant spatial clusters by calculating the Getis-Ord Gi\* statistic for each feature; the resulting Z score and *p* value show where high or low values cluster spatially. A hot spot is an area where high values are surrounded by similarly high values; conversely, a cold spot is an area where low values are surrounded by similarly low values [26].
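To make the statistic concrete, the sketch below computes a Gi\* Z score for each location using the commonly documented form of the Getis-Ord Gi\* statistic, in which each location counts as its own neighbour. It is an illustrative pure-Python version, not the ArcGIS tool; the function name `getis_ord_gi_star`, the weight matrix and the toy values are assumptions.

```python
import math


def getis_ord_gi_star(x, w):
    """Return the Gi* Z score for every location.

    x: list of values; w: n x n weight matrix whose diagonal is non-zero,
    so that each location is included in its own neighbourhood.
    """
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    scores = []
    for i in range(n):
        w_sum = sum(w[i])
        w_sq_sum = sum(v * v for v in w[i])
        numerator = sum(w[i][j] * x[j] for j in range(n)) - mean * w_sum
        denominator = std * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical toy data: five locations on a line; high values on the left.
values = [10, 9, 1, 1, 1]
weights = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
z_scores = getis_ord_gi_star(values, weights)
print(z_scores)
```

Positive Z scores flag candidate hot spots and negative Z scores flag candidate cold spots; significance would still be judged from the associated *p* values.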

#### Spatial scan statistics

SaTScan analyzes spatial, temporal and space-time data using spatial, temporal or space-time scan statistics. It is used to perform geographic disease surveillance and to detect areas of significantly high or low rates. In the Bernoulli-based model, pregnant adolescents were taken as cases and non-pregnant adolescents as controls to determine the geographic locations of statistically significant clusters of teenage pregnancy using Kulldorff's SaTScan software version 9.6. […] population was used. Primary and secondary clusters were detected and ranked according to the likelihood ratio test, based on 999 Monte Carlo replications [27].
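The likelihood ratio behind the Bernoulli model can be sketched for a single candidate window. The code below is a simplified illustration that scans for high rates only and is not SaTScan itself; the function names `bernoulli_llr` and `monte_carlo_p_value` and the counts used are assumptions for the example.

```python
import math


def _xlogp(count, prob):
    """count * log(prob), with the convention 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * math.log(prob)


def bernoulli_llr(c, n, total_cases, total_pop):
    """Log-likelihood ratio for a window with c cases among n individuals.

    total_cases and total_pop are the totals over the whole study area
    (cases plus controls); requires 0 < n < total_pop.
    Returns 0.0 when the rate inside the window is not higher than outside.
    """
    p_in = c / n
    p_out = (total_cases - c) / (total_pop - n)
    if p_in <= p_out:
        return 0.0
    outside_pop = total_pop - n
    outside_cases = total_cases - c
    log_alt = (_xlogp(c, p_in) + _xlogp(n - c, 1 - p_in)
               + _xlogp(outside_cases, p_out)
               + _xlogp(outside_pop - outside_cases, 1 - p_out))
    p_all = total_cases / total_pop
    log_null = (_xlogp(total_cases, p_all)
                + _xlogp(total_pop - total_cases, 1 - p_all))
    return log_alt - log_null


def monte_carlo_p_value(observed, simulated):
    """Rank-based p value from the maxima of randomly permuted data sets."""
    exceed = sum(1 for value in simulated if value >= observed)
    return (1 + exceed) / (1 + len(simulated))


# Hypothetical counts: 30 cases among 50 adolescents inside the window,
# 60 cases among 200 adolescents in the whole study area.
llr_high = bernoulli_llr(30, 50, 60, 200)
llr_low = bernoulli_llr(10, 50, 60, 200)
print(llr_high, llr_low)
```

With 999 replications, an observed value larger than every simulated maximum yields the smallest attainable *p* value, 1/1000.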

### Spatial regression analysis

#### Ordinary Least Squares (OLS) Regression

After detecting the hot spots of teenage pregnancy, spatial regression modeling was performed to identify predictors of the observed spatial clustering. First, ordinary least squares (OLS) regression was performed. OLS regression results are only reliable if the regression model satisfies all of the assumptions required by this method. The coefficients of the explanatory variables in a correctly specified OLS model should be statistically significant and have the expected sign (positive or negative). In addition, the explanatory variables should not be correlated with one another (no multicollinearity). The model should be unbiased (no heteroskedasticity or non-stationarity). The residuals should be normally distributed, show no spatial pattern and be free from spatial autocorrelation, and the model should include the key explanatory variables [28]. These assumptions were therefore checked accordingly. The OLS regression equation [29] is given as:

$$ Y_{i} = \beta_{0} + \sum\limits_{k = 1}^{p} \beta_{k} X_{ik} + \epsilon_{i} $$

where \(i = 1, 2, \ldots, n\); \(\beta_0, \beta_1, \beta_2, \ldots, \beta_p\) are the model parameters; \(Y_i\) is the outcome variable for observation \(i\); \(X_{ik}\) are the explanatory variables; and \(\epsilon_1, \epsilon_2, \ldots, \epsilon_n\) are the error terms (residuals) with zero mean and homogeneous variance \(\sigma^2\).
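The global model can be illustrated by solving the normal equations directly. This is a minimal pure-Python sketch under simplifying assumptions (a small, well-conditioned design matrix) and not the ArcGIS OLS tool; the helper names `solve_linear_system` and `ols_fit` and the toy data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols_fit(x_rows, y):
    """Return [beta_0, beta_1, ..., beta_p] minimising squared residuals."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(d[a] * d[b] for d in design) for b in range(p)]
           for a in range(p)]
    xty = [sum(d[a] * yi for d, yi in zip(design, y)) for a in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data generated exactly from Y = 0.1 + 0.5*X1 + 0.2*X2.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [0.1, 0.6, 0.3, 0.8, 1.3]
coefficients = ols_fit(x_rows, y)
print(coefficients)
```

Because the toy outcome has no noise, the fitted coefficients recover the generating values; real cluster data would also require the residual diagnostics described above.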

Exploratory regression was used to identify a model that fulfills the assumptions of the OLS method. It identifies models with high adjusted \(R^2\) values that also meet all of the OLS assumptions [30].
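The \(R^2\) criterion used to rank candidate models is usually the adjusted \(R^2\), which penalizes the ordinary \(R^2\) for the number of explanatory variables. The standard textbook definition is given here for reference, with \(n\) the number of observations (clusters) and \(p\) the number of explanatory variables; it is assumed, not confirmed by the source, that the exploratory regression tool reports this quantity.

$$ R^{2}_{\text{adj}} = 1 - \left( 1 - R^{2} \right) \frac{n - 1}{n - p - 1} $$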

#### Geographically Weighted Regression (GWR)

A variable that is a strong predictor in one cluster may not necessarily be a strong predictor in another cluster. This type of variation across clusters (non-stationarity) can be identified using GWR. In this context, GWR can help answer the question: “Does the association vary across space?” Unlike OLS, which fits a single linear regression equation to all data in the study area, GWR creates a separate equation for each DHS cluster. While the OLS equation is calibrated using data from all features (clusters in this case), the GWR equation for each feature is calibrated using data from nearby features. Thus, the GWR coefficients take different values for each cluster [31]. The coefficient maps for each explanatory variable, produced using GWR, provide guidance for targeted interventions. The GWR model [32] can be written as:

$$ Y_{i} = \beta_{0}(u_{i}, v_{i}) + \sum\limits_{k = 1}^{p} \beta_{k}(u_{i}, v_{i}) X_{ik} + \epsilon_{i} $$

where \(Y_i\) are the observations of the response \(y\); \((u_i, v_i)\) are the geographic coordinates (longitude, latitude) of location \(i\); \(\beta_k(u_i, v_i)\) (\(k = 0, 1, \ldots, p\)) are unknown functions of the geographic location \((u_i, v_i)\); \(X_{ik}\) are the explanatory variables at location \((u_i, v_i)\), \(i = 1, 2, \ldots, n\); and \(\epsilon_i\) are the error terms (residuals) with zero mean and homogeneous variance \(\sigma^2\).
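One way to see how GWR produces a separate equation per cluster is to fit a distance-weighted least squares model at each location. The sketch below is a simplified pure-Python illustration, not the ArcGIS GWR tool: the fixed Gaussian kernel, the bandwidth value, planar distances, the function names and the toy data are all assumptions, since the source does not state the kernel or bandwidth used.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def gwr_local_coefficients(coords, x_rows, y, bandwidth):
    """Return one coefficient list [beta_0, ..., beta_p] per location."""
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    results = []
    for ui, vi in coords:
        # Assumed Gaussian kernel: nearer observations get larger weights.
        weights = [
            math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
            for uj, vj in coords
        ]
        xtwx = [[sum(w * d[a] * d[b] for w, d in zip(weights, design))
                 for b in range(p)] for a in range(p)]
        xtwy = [sum(w * d[a] * yi for w, d, yi in zip(weights, design, y))
                for a in range(p)]
        results.append(solve_linear_system(xtwx, xtwy))
    return results


# Hypothetical toy data: eight clusters on a west-east line. The true slope
# is 1 in the four western clusters and 3 in the four eastern clusters.
coords = [(float(k), 0.0) for k in range(8)]
x_values = [0.1, 0.4, 0.2, 0.8, 0.5, 0.9, 0.3, 0.7]
x_rows = [(x,) for x in x_values]
y = [x if k < 4 else 3 * x for k, x in enumerate(x_values)]
local = gwr_local_coefficients(coords, x_rows, y, bandwidth=1.5)
print(local[0][1], local[-1][1])
```

The local slope at the westernmost cluster stays close to 1 and the one at the easternmost cluster close to 3, which a single global OLS slope would average away. Mapping such local coefficients is what produces the coefficient maps described above.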