1.1 Subsidies and the African Green Revolution: Direct effects and social network spillovers of randomized input subsidies in Mozambique
Carter, M., Laajaj, R., & Yang, D. (2021). Subsidies and the African Green Revolution: Direct effects and social network spillovers of randomized input subsidies in Mozambique. American Economic Journal: Applied Economics, 13(2), 206‑229.https://doi.org/10.1257/app.20190396
[Link to the full original replication package paper from Open ICPSR]
Highlights
The paper investigates the short-term and long-term effects (i.e. diffusion through farmers’ networks) of an input subsidy program on agricultural yields.
The authors rely on a standard Randomized Controlled Trial in which farmers are randomly assigned to the treatment and control groups, eliminating potential selection bias.
Yet, it departs from the standard by exploring spillover effects through social networks, assessing long-term impacts, and addressing informational market failures. The study emphasizes learning processes as part of the intervention’s impact, extending beyond the typical scope of a standard RCT.
This document provides a detailed explanation of the original replication package. But you will also learn:
- how to create a beautiful coefficient plot tailored to your needs, with estimates stored in a matrix (this is how the authors display the main results of their paper) using
mkmat
together withcoefplot
, - how to create a result table with
outreg2
, while adding a few statistical tests tailored to your needs in the bottom part withtest
andscalar
, - how to automatically generate a new label by keeping / removing certain parts of an existing label with
subinstr
, - how to use a
local
macro together withreplace
andappend
commands in order to loop over a list of variables and store all their corresponding results into a fresh table every time the code is run.
- how to create a beautiful coefficient plot tailored to your needs, with estimates stored in a matrix (this is how the authors display the main results of their paper) using
1.1.1 Background elements
This study examines the impact of a one-off input subsidy program implemented in Mozambique in 2010 within the context of the Green Revolution and the Alliance for a Green Revolution in Africa. While the Green Revolution transformed Asian and Latin American agriculture, Sub-Saharan Africa lagged behind. In response, the Maputo Declaration (2003) committed African nations to invest 10% of their budgets in agriculture. Launched in 2006, the Alliance aimed to catalyze the Green Revolution in Africa through input subsidy programs (ISPs), providing technologies at discounted rates to randomly selected farmers. Focusing on Mozambique’s 2010 Programa de Suporte aos Produtores, this paper not only assesses subsidy impacts on 704 farmers but also explores post-subsidy persistence and spillover effects through social networks. Findings reveal increased technology adoption, sustained maize yield growth post-program, and notable impacts on farmers’ networks. The program also fostered enhanced beliefs about technology returns, mitigating information-related market failures.
1.1.2 Replication of Table 2 - Regressions with spillover effects
This table (Table 2 in the paper) is the output you will get from the code we explain below.
Table 2 examines the impact of the government-implemented input subsidy program (ISP) on various agricultural outcomes: fertilizer use on maize, adoption of improved maize seeds, maize yield, daily consumption per capita, and expected yield with the technology package. The objective of this analysis is to determine whether the ISP has a positive and statistically significant impact on the use of fertilizer on maize and on maize yields, measured as average maize yield per hectare.
How to read this table. The first two lines of this table present the program’s results estimated on treated farmers (those who received the subsidy). The effects of the program are assessed both during and after the ISP implementation. This table reveals that the program has a positive impact on all farmers’ outcomes, both during and after the program. The effects are more pronounced during the treatment period for fertilizer use, improved maize seeds, and maize yields.
The next lines present the impact of the ISP on social network spillovers, which is estimated on members of treated farmers’ social networks during (round 2) and after (rounds 3 and 4) the treatment period, respectively. The general pattern observed is that the coefficients for social network variables are positive and significant in the post-intervention period but not in the intervention period. The magnitude of the coefficients increases as the number of social network contacts in the treatment group increases from one to two, and then stabilizes. This pattern suggests that the effect of social networks on technology adoption and yield increases sharply at two or more social network contacts in the treatment group. For example, after the program, having one contact in the treatment group increases maize yield by 0.18 kilogram per hectare, while having two contacts increases maize yield by 0.53 kilogram per hectare. This substantial increase in yield with two or more social network contacts justifies the authors’ estimation of spillover effects on individuals having above-median (two or more) social network members in the treatment group in Figure 2.
Replication code for Table 2
Data importation
After having downloaded the original dataset and saved this dataset under your working directory defined with command “cd” (for current directory), you can generate a shorter version of the dataset, named here Dataset_CLY.dta, for which we provide a detailed codebook.
*A/ Download and save data "Moz1234panel.dta" from: https://www.openicpsr.org/openicpsr/project/116761/version/V2/view?path=/openicpsr/116761/fcr:versions/V2.1/Rep-file-Moz-Input-Subsidy/data/original/Moz1234panel.dta&type=file
*B/ Set your working directory where Moz1234panel.dta already is:
cd "XXX" /*XXX=the directory where you just saved the 'Moz1234panel.dta' dataset*/
*C/ Create a shorter dataset:
use "Moz1234panel.dta", clear
* selecting the necessary avriables for replicating the Table 2, Figure 2, and Table A7
keep sn5up_sub_dur sn2up_sub_aft nw_talkedagmoder lfertmaizr sn2_sub_dur sn2up_sub_dur sn1_sub_aft lyieldr sn3_sub_aft lexp_yield_fertr vouch_aft_r3 sn2up_sub_aft_r3 limprovedseedsr vouch_aft sn1_sub_dur vlgid_round respid sn2up_sub_aft_r4 vouch_aft_r4 vouch_dur ldailyconsr sn3_sub_dur sn4_sub_aft sn5up_sub_aft sn4_sub_dur sn2_sub_aft fertmaizr vouch round
* exponentiation of the winsorized variables back to their original forms
foreach x in improvedseedsr yieldr dailyconsr exp_yield_fertr{
// it is necessary in order to add the means of the control to the table 2. This way it uses the value that is not Winsorized.
gen `x' = exp(l`x')
}
save "Dataset_CLY.dta", replace
To start the replication, one should always start by deleting any potential data in memory using clear all. Then, open the dataset of interest, and the command use “…”, clear to indicate the dataset you want to import in Stata.
Because this project uses panel data, you need to inform Stata of the structure of the data. Use the command xtset to indicate that you are working with panel data, followed by the name of the variable you want to set as the panel variable. Here, vlgid_round is an identifier for every possible combination of locality and round.
clear all
use "Dataset_CLY.dta", clear
xtset vlgid_round
global sn_treatments sn1_sub_dur sn2_sub_dur sn3_sub_dur sn4_sub_dur sn5up_sub_dur sn1_sub_aft sn2_sub_aft sn3_sub_aft sn4_sub_aft sn5up_sub_aft
Now, you want to modify the labels for a set of variables that were in log format in the original dataset, to indicate that they are in their original scale.
Start with creating a loop to iterate the same procedure over a set of variables, using the foreach command.
Stata trick: Then, you can easily modify the labels using variable label to extract the label for each variable in the loop, remove the text “log” from the label using subinstr, remove the excess spaces using trim, and update the variable label with the modified text.
Regressions, and table creation
You are now set to running the regressions and creating a table presenting results.
Stata trick: Start with initializing a local macro using local, named rep_app, and set its value to replace. The purpose of this macro is to control whether the replace option is used when exporting regression results in an Excel file using the command outreg2. In other words, it means that if the file already exists in your computer, it will be replaced.
Again, use a loop to iterate regressions over the same list of variables using foreach.
You want to apply several commands to the variables included in the loop only if they meet the following conditions: round==2 (survey round 2, that is, during the program) and vouch==0 (farmers who did not win the voucher, that is, the control group). Then:
qui sum
calculates the mean, and results are stored using the local commandareg
runs a fixed effects regression on panel data, usingabsorb
to absorb fixed effects, andcluster
to take into account the intra-cluster correlationsoutreg2
creates the regression table and saves it into an Excel file (.xls extension)
Then, create a nice table using key commands:
bracket
generates brackets for standard errorslabel
includes variable labels in the results table to display full variable namesnocons
,nor2
, andnoni
are used to avoid displaying the constant, the R-squared and missing values in the tableless(1)
means that only the first row of the result table will be included (only the first regression)keep
allows you to keep only the variables you wantadds()
is used to add control variables into the brackets
foreach x in fertmaizr improvedseedsr yieldr dailyconsr exp_yield_fertr {
qui sum `x' if vouch==0 & round==2
local control_mean=r(mean)
qui xi: areg l`x' vouch_dur vouch_aft ${sn_treatments} i.nw_talkedagmoder*i.round, ///
absorb(vlgid_round) cluster(respid)
outreg2 using "table 2.xls", `rep_app' bracket label nocons noni ///
less(1) nor2 keep(vouch_dur vouch_aft ${sn_treatments}) adds(mean_control,`control_mean')
local rep_app = "append"
}
1.1.3 Replication of Figure 2 - Direct and Spillover Impacts of Subsidies
This figure (Figure 2 in the paper) is the output you will get from the code we explain below.
This figure highlights the Intention-to-Treat estimates through the direct and indirect effects on five outcomes of the subsidy program on the farmers and on their social networks. The outcomes of interest include the adoption of fertilizer and adoption of improved maize seeds, the maize yields, the consumption of the household (an indirect measure of agricultural profit), and the expected yields to the technology package. In this figure, indirect effects are estimated on individuals having at least two contacts in the treatment group, because they are the ones for whom effects are the largest, as demonstrated in Table 2.
How to read this figure. The coefficient of the regressions of interest are represented with dots, while the lines represent 95 percent confidence intervals.
Direct impacts of the subsidies on the voucher recipient group are estimated for all five outcomes for both the “during” (subsidized) and the “after” (post-subsidy) periods. The Figure shows that the program had a large and significant impact during the subsidy period on the adoption of fertilizers and improved seeds, also allowing for higher maize yields. The effects on consumption were only significant after the subsidy period. Yield expectations were stable across periods for treated farmers. While the effect on fertilizer use decreases in magnitude after the subsidy period, it remains substantial and statistically significant. The impact on agricultural yields also persists over time.
Regarding spillover effects, impacts of the program are positive and significant on the adoption of fertilizers, on improved seeds, maize yields, and expected returns to the technology package.
Replication code for Figure 2
Data preparation and confidence intervals calculation
So far, a lot of modifications have been made to the variables. To construct the figure, you need the original data, without saving the transformations to the variables made until now.
Install the parmest package that allows for semi-parametric estimation of partially linear models.
Then, regress the outcome variables on the treated individuals (farmers who won the vouchers) and on individuals who have at least two contacts in the treatment group. Regressions are run separately for each of the five outcomes, using a loop - see section 1.2.2 for more information on how to create a loop.
Cluster the standard error at the level of respid (respid is the variable for the respondent’s identifier, that is, cluster at the individual level), and use parmest to save the regression results in the .dta format. For each variable, you will obtain the estimated coefficients and standard errors for the periods during and after the program, for treated farmers and those with at least two contacts in the treatment group.
foreach x in lfertmaizr limprovedseedsr lyieldr ldailyconsr lexp_yield_fertr {
qui xi: reg `x' vouch_dur vouch_aft sn2up_sub_dur sn2up_sub_aft ///
i.nw_talkedagmoder*i.round i.vlgid_round, cluster(respid)
parmest, saving(`x', replace)
}
To generate the final figure, you must create a new dataset. To do so, you need to iterate through variable names and perform certain operations on our variables.
First, use the local
command to define a local macro y with a value of 1.
Then create a loop using foreach
which includes our variables of interest. Then, apply the following commands on variables in the loop:
use `x'
.dta loads the dataset corresponding to the variables in the loop- use the command gen to create a time variable using the observation number (_n), drop the observations for which time is greater than 4 using
drop
- This graph aiming to generate a visually informative figure by plotting mean estimates along with their corresponding 95% confidence intervals for a set of variables. Only keep the variables estimate, stderr and time
- Use the
round
command to round the values of the variables to 2 decimal points - Generate, using
gen
, two new variables with missing values which capture the estimates of each regression after the treatment and the standard error of these coefficients (respectfully estimate_NV and sd_NV, followed by the value taken by y). The commandreplace
allows us to assign specific values to the new variables. - With the command
rename
, include a suffix based on the local macroy’
to the selected variables. - Drop all periods after round 2 because you only want the first two periods in the figure
- Creating a
local
y equal to y + 1, increment y to the next rank to move on to the next variable estimate_Vy+1 and estimate_NV+1
Eventually, save the variables created with the command save.
local y = 1
foreach x in lfertmaizr limprovedseedsr lyieldr ldailyconsr lexp_yield_fertr {
use `x'.dta
gen time=_n
drop if time>4
keep estimate stderr time
replace estimate=round(estimate,0.01)
replace stderr=round(stderr,0.01)
gen estimate_NV`y'=.
replace estimate_NV`y'=estimate[3] in 1
replace estimate_NV`y'=estimate[4] in 2
gen sd_NV`y'=.
replace sd_NV`y'=stderr[3] in 1
replace sd_NV`y'=stderr[4] in 2
rename estimate estimate_V`y'
rename stderr sd_V`y'
local y = `y'+1
drop if time>2
save, replace
}
Merge the quantity of fertilizer used with each variable of interest (again, using a loop) based on the time variable.
use lfertmaizr, clear
foreach x in limprovedseedsr lyieldr ldailyconsr lexp_yield_fertr{
merge 1:1 time using `x'.dta, nogen
save figure2.dta, replace
}
Clear existing matrices and results in Stata’s memory, and load the dataset named “figure2.dta”.
Calculate lower and upper limits of the 95% confidence interval:
- Create a foreach loop to iterate over a list of variables(V1, V2, …, NV5).
- Use the mean estimate (
estimate_
) and standard deviation (sd_
) to calculate the lower and upper bounds of the 95% confidence interval for each variable by subtracting 1.96 times the standard deviation from the values taken by each variable in the loop. - Set the upper limit to 1 if it’s greater than 1, in other words, substitute 1 for any variable value exceeding 1.
- Create a matrix with
mkmat
to store the results (estimate, lower limit, and upper limit) for each variable in the loop.
Package installation and graph style settings
These preprocessing steps are crucial for ensuring accurate and visually compelling representations of the estimated coefficients with their associated confidence intervals. The resulting figure is a valuable tool for conveying the uncertainty surrounding the mean estimates, aiding in the interpretation and communication of statistical findings.
Install and initialize additional packages (grstyle
and coefplot
) for graph styling and coefficient plotting.
Set graph style settings, such as background color and major grid color.
Graph creation and exportation
The command coefplot
creates a graphic displaying the coefficients of the regression and the confidence interval.
- matrix
selects the values to be used for the graph in the matrix.
- ci
selects the upper and lower bounds of the confidence interval
- msize
is used to select the size of the graph lines.
- xline(0, lpattern(solid) lw(thick))
adds a vertical line to the x-axis at position 0, with a line style of “solid” and a line thickness of “thick”.
- xlabel(-0.5(0.5)1)
adds axis labels from -0.5 to 1, with a step of 0.5.
- bylabel
is used to define labels and byopts is used to define the chart title.
- ciopts(lwidth(thick thick))
defines the line thickness for confidence interval.
Reproduce 5 times this process for each variable, during and after the treatment.
coefplot /*
*/ (matrix(V1[,1]), ci((V1[,2] V1[,3])) msize(large) xline(0, lpattern(solid) lw(thick)) ///
xscale(range(-0.5(0.5)1)) xlabel(-0.5(0.5)1)) || /*
*/ (matrix(NV1[,1]), ci((NV1[,2] NV1[,3])) msize(large) ///
xline(0, lpattern(solid) lw(thick)) xscale(range(-0.5(0.5)1)) xlabel(-0.5(0.5)1))|| /*
*/ ||, bylabel( ) byopts(title("Fertilizer on maize")) xsize(2) scale(1.35) ///
ylabel(,labsize(vlarge)) ciopts(lwidth(thick thick)) /*
*/ coeflabels(r1="During" r2="After" ) mlabel mlabposition(1) mlabsize(vlarge) ///
format(%04.2f) name(fertilizer)
coefplot /*
*/ (matrix(V2[,1]), ci((V2[,2] V2[,3])) msize(large) xline(0, lpattern(solid) lw(thick)) ///
xscale(range(-0.5(0.5)1)) xlabel(-0.5(0.5)1)) || /*
*/ (matrix(NV2[,1]), ci((NV2[,2] NV2[,3])) msize(large) ///
xline(0, lpattern(solid) lw(thick)) xscale(range(-0.5(0.5)1)) xlabel(-0.5(0.5)1)) || /*
*/ ||, byopts( title("Improved maize seeds")) bylabel( ) xsize(2) scale(1.35) ///
ylabel(,labsize(vlarge)) ciopts(lwidth(thick thick)) /*
*/ coeflabels(r1="During" r2="After" ) mlabel mlabposition(1) mlabsize(vlarge) ///
format(%04.2f) name(improved)
coefplot /*
*/ (matrix(V3[,1]), ci((V3[,2] V3[,3])) msize(large) xline(0, lpattern(solid) lw(thick)) ///
xscale(range(-0.5(0.5)1)) xlabel(-0.5(0.5)1)) || /*
*/ (matrix(NV3[,1]), ci((NV3[,2] NV3[,3])) msize(large) ///
xline(0, lpattern(solid) lw(thick)) xscale(range(-0.5(0.5)1)) xlabel(-0.5(0.5)1))|| /*
*/ ||, byopts( title("Maize yield")) bylabel( ) xsize(2) scale(1.35) ///
ylabel(,labsize(vlarge))ciopts(lwidth(thick thick) ) /*
*/ coeflabels(r1="During" r2="After" ) mlabel mlabposition(1) mlabsize(vlarge) ///
format(%04.2f) name(yield)
coefplot /*
*/ (matrix(V4[,1]), ci((V4[,2] V4[,3])) msize(large) xline(0, lpattern(solid) lw(thick)) ///
xscale(range(-0.5(0.5)1)) xlabel(-0.5(0.5)1)) || /*
*/ (matrix(NV4[,1]), ci((NV4[,2] NV4[,3])) msize(large) ///
xline(0, lpattern(solid) lw(thick)) xscale(range(-0.5(0.5)1)) xlabel(-0.5(0.5)1)) || /*
*/ ||, byopts( title("Consumption")) bylabel( ) xsize(2) scale(1.35) ///
ylabel(,labsize(vlarge))ciopts(lwidth(thick thick) ) /*
*/ coeflabels(r1="During" r2="After" ) mlabel mlabposition(1) mlabsize(vlarge) ///
format(%04.2f) name(consumption)
coefplot /*
*/ (matrix(V5[,1]), ci((V5[,2] V5[,3])) msize(large) xline(0, lpattern(solid) lw(thick)) ///
xscale(range(-0.5(0.5)1)) xlabel(-0.5(0.5)1)) || /*
*/ (matrix(NV5[,1]), ci((NV5[,2] NV5[,3])) msize(large) ///
xline(0, lpattern(solid) lw(thick)) xscale(range(-0.5(0.5)1)) xlabel(-0.5(0.5)1)) || /*
*/ ||, byopts( title("Expected yield with technology package")) bylabel( ) xsize(2) ///
scale(1.35) ylabel(,labsize(vlarge)) ciopts(lwidth(thick thick)) /*
*/ coeflabels(r1="During" r2="After" ) mlabel mlabposition(1) mlabsize(vlarge) ///
format(%04.2f) name(expected)
Combine all the graphs in one graph and edit it:
b1title(Estimated coefficients, size(small) color(black))
adds a title to the top of the combined chart, stating “Estimated coefficients.” The title size is “small” and the text color is “black.”cols(1)
charts are combined into a single column.altshrink
altering scaling of textiscale(*3.2)
is used to select the graph scale. The graphs here are enlarged by 3.2 times.xcommon
means that the x is common to all graphs.imargine
define the interior margins.
graph combine fertilizer improved yield consumption expected, ///
b1title(Estimated coefficients, size(small) color(black)) cols(1) altshrink iscale(*3.2) ///
xsize(3) xcommon imargin(b=1 t=1) title("Direct impact" "on treatment group", ///
size(small) color(black) position(11)) subtitle("Spillover impact via social" ///
"network contacts", size(small) color(black) position(1))
Finally, export the graph in .png and .pdf with the function gr export.
1.1.4 Replication of Table 7 - Separated Estimation of Spillover Effects for the First and Second Years After the Program
In both Table 2 and Figure 2 presented above, researchers estimate a single “after” treatment effect, pooling the two years after the intervention period. In Table 7, the researchers distinguish between spillover effects one year after, and two years after the program, separately. The objective of this table is to push the analysis of spillovers further, and is similar to conduct a robustness check in the context of this study.
To dissect spillover effects over time, researchers introduce two different “after” indicators for each of the post-subsidy years. As expected, due to reduced power, fewer coefficients reach statistical significance. There is no clear systematic pattern to the coefficients across the two years, and the hypothesis that the direct and spillover effects are consistent between the first and second “after” years cannot be rejected.
Replication code for Table 7
Data preparation
Start with initializing a local macro using local, named rep_app and set its value to replace. For more information, see section explaining Table 2 above.
Begin by correctly labeling the variables that will appear in Table 7.
local rep_app = "replace"
label var vouch_dur "Direct impacts during"
label var vouch_aft_r3 "Direct impacts 1 year after"
label var vouch_aft_r4 "Direct impacts 2 year after"
label var sn2up_sub_dur "Spillover impacts during"
label var sn2up_sub_aft_r3 "Spillover impacts 1 year after"
label var sn2up_sub_aft_r3 "Spillover impacts 2 year after"
Regressions, and table creation
The code for creating Table 7 is very similar to that for creating Table 2 (see above). The interesting novelties are the following:
Using a loop, run a regression with areg
, adding the prefix xi-
to indicate that the regression includes indicator values (equal to either 0 or 1), and including fixed effects which will be absorbed for the variable vlgid_round with the option absorb
. Standard errors are clustered at the individual level (respid variable). The command qui
, for “quietly”, means that the results will not be displayed.
Then test the equality of coefficients between vouch_aft_r3 and vouch_aft_r4, and between sn2up_sub_aft_r3 and sn2up_sub_aft_r4 using the test
command. This command performs Wald tests of simple and composite linear hypotheses about the parameters of the most recently fitted model. This allows us to test whether there is a significant difference between the coefficients for period 3 and period 4.
Next, store the p-value results using scalar
.
The last step is to export the results you have just obtained into our results folder using outreg2
. To create a nice table easily, refer to explanations from Table 2.
foreach x in lfertmaizr limprovedseedsr lyieldr ldailyconsr lexp_yield_fertr {
qui xi: areg `x' vouch_dur vouch_aft_r3 vouch_aft_r4 sn2up_sub_dur sn2up_sub_aft_r3 ///
sn2up_sub_aft_r4 i.nw_talkedagmoder*i.round, absorb(vlgid_round) cluster(respid)
test vouch_aft_r3 = vouch_aft_r4
scalar vou_di = r(p)
test sn2up_sub_aft_r3 = sn2up_sub_aft_r4
scalar sn_di = r(p)
outreg2 using "table A7.xls", `rep_app' bracket label nocons noni less(1) nor2 ///
keep(vouch_dur vouch_aft_r3 vouch_aft_r4 sn2up_sub_dur sn2up_sub_aft_r3 ///
sn2up_sub_aft_r4) adds("vouch_aft dif (r3-r4) p-value", vou_di, ///
"sn2up_sub_aft dif (r3-r4) p-value" , sn_di )
local rep_app = "append"
}
Authors: Elvire Jégu, Ambre Delaunay, Chiara Balducci, Yagmur Helin Aslan, students in the Master program in Development Economics and Sustainable Development (2023-2024), Sorbonne School of Economics, Université Paris 1 Panthéon Sorbonne.
Date: December 2023