The recent Covid-19 crisis demonstrated how useful high-frequency data are for tracking the current state of the economy in a rapidly changing environment. However, to quantify what a change in a high-frequency variable means for a lower-frequency variable such as the growth rate of gross domestic product (GDP), we need methods that exploit the information contained in the high-frequency indicator and link it to the low-frequency variable. One such method is the mixed data sampling (MIDAS) regression developed by Ghysels et al. (2002). Because there are few good online tutorials on applying MIDAS regressions in R, this post presents a fairly flexible script for doing so.
The objective is to predict quarterly U.S. GDP growth using two monthly variables, the growth rate of industrial production and the growth rate of the number of workers in the nonfarm sector, and one weekly variable, the growth rate of continued claims, also referred to as insured unemployment. We want to find out whether these variables improve nowcasting performance compared with a simple autoregressive model of order 1 (AR(1) model) and, if so, at which horizon the improvement becomes statistically significant. To this end, we use a pseudo real-time out-of-sample nowcasting procedure. It is only pseudo real-time because we make two simplifying assumptions. First, we assume that every variable is published on the last day of its reference period (i.e., there is no ragged edge). In reality, macroeconomic variables are published with a substantial delay; for example, GDP is published only 1-2 months after the end of the quarter. Second, we ignore the fact that the variables are revised over time and use only the latest vintage.
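Under these assumptions, a nowcast horizon can be read as the number of weeks of the current quarter that have already been observed. Because every series is released on the last day of its period and each month is treated as four weeks, the number of available current-quarter observations follows directly from the week count. The helper below (`available_obs` is a hypothetical name, not part of the original script) is a minimal sketch of that bookkeeping.

```r
# Hypothetical helper (not from the original script): number of weekly and
# monthly observations of the current quarter available after h weeks,
# assuming 4 weeks per month and publication on the last day of each period.
available_obs <- function(h, weeks_per_month = 4, months_per_quarter = 3) {
  stopifnot(h >= 0, h <= weeks_per_month * months_per_quarter)
  c(weeks = h, months = h %/% weeks_per_month)
}

# One row per nowcast horizon, from the start (h = 0) to the end (h = 12) of the quarter
horizons <- 0:12
info_set <- t(sapply(horizons, available_obs))
rownames(info_set) <- paste0("h=", horizons)
info_set
```

Two weeks into the quarter, for instance, two weekly but no monthly observations of the current quarter are available.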
More technically, the variable of interest is quarterly GDP growth, denoted \(y_{t_q}\), where \(t_q = 1,2,\dots,T_y\) is the quarterly time index and \(T_y\) is the last quarter for which GDP figures are available. The aim is to nowcast GDP growth in the following quarter, \(y_{T_y+1}\). We assume that, in addition to the available GDP observations, the information set for nowcasting includes two stationary monthly indicators \(x_{t_m}\) and one stationary weekly indicator \(x_{t_w}\). For simplicity, we assume that every quarter has \(M=3\) months and \(W=12\) weeks. Hence, the time index for the monthly observations is defined as a fraction of the low-frequency quarter, \(t_m = 1-2/3, 1-1/3, 1, 2-2/3, \dots, T_{x_m}-1/3, T_{x_m}\), where \(T_{x_m}\) is the last period for which the monthly indicator is available. Accordingly, the time index for the weekly variable is \(t_w = 1-11/12, 1-10/12, \dots, 1, 2-11/12, \dots, T_{x_w}-1/12, T_{x_w}\). The sample spans from January 1, 1968 to December 31, 2021, and the evaluation starts in January 1980.
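The fractional indexing can be reproduced in a few lines of R. The sketch below (the function name `make_time_index` is our own) generates \(t_m\) and \(t_w\) for the first two quarters, so that the last observation of each quarter falls exactly on the integer quarter index.

```r
# Hypothetical helper: fractional time index with m high-frequency observations
# per quarter, i.e. t - (m-1)/m, ..., t - 1/m, t for quarters t = 1, ..., n_q
make_time_index <- function(n_q, m) {
  offsets <- rev(seq_len(m) - 1) / m          # (m-1)/m, ..., 1/m, 0
  as.vector(sapply(seq_len(n_q), function(t) t - offsets))
}

t_m <- make_time_index(n_q = 2, m = 3)    # monthly, M = 3
t_w <- make_time_index(n_q = 2, m = 12)   # weekly,  W = 12
round(t_m, 3)
```

For two quarters, `t_m` runs 1/3, 2/3, 1, 4/3, 5/3, 2, and `t_w` has 24 entries ending in 1 and 2 at the quarter boundaries.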
The MIDAS approach is a direct multi-step forecasting tool. We use the following model
where \(c\) is a constant, \(S\) is the AR lag order, \(P\) denotes the number of low-frequency lags, and \(K\) and \(L\) denote the number of high-frequency lags per low-frequency lag (including lag zero) for the monthly and the weekly indicators, respectively. We set \(S=1\), \(P=2\), \(K=3\) and \(L=12\), meaning that the dependent variable depends on all 3 monthly and all 12 weekly values of the current and the previous quarter. The lag operator is defined as \(L^{1/3}x_{t_m} = x_{t_m - 1/3}\), and analogously \(L^{1/12}x_{t_w} = x_{t_w - 1/12}\) for the weekly indicator. Because \(x_{t_w}\) is sampled at a much higher frequency than \(y_{t_q}\), we potentially have to include many high-frequency lags to obtain an adequate specification, which easily leads to overparameterization in the unrestricted linear case. To avoid parameter proliferation, we use the non-linear weighting schemes given by the polynomials \(b(k, \theta)\) and \(b(l, \theta)\). Note that we use the same polynomial specification for all low-frequency lags included in the model.
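The model equation itself is not reproduced in this version of the text. One way to write it that is consistent with the definitions above is sketched below. The coefficients \(\alpha_s\), \(\beta_{i,p}\), \(\gamma_p\), the indicator index \(i\), the superscripts \((m)\) and \((w)\), the weekly parameter vector \(\theta_w\) and the error term \(\varepsilon_{t_q}\) are our own labels, and \(x^{(m)}_{i,\,t_q-p}\) and \(x^{(w)}_{t_q-p}\) denote the last monthly and weekly observations of quarter \(t_q-p\):

\[
y_{t_q} = c + \sum_{s=1}^{S} \alpha_s \, y_{t_q - s}
+ \sum_{i=1}^{2} \sum_{p=0}^{P-1} \beta_{i,p} \sum_{k=1}^{K} b(k, \theta_i)\, L^{(k-1)/3} x^{(m)}_{i,\,t_q - p}
+ \sum_{p=0}^{P-1} \gamma_p \sum_{l=1}^{L} b(l, \theta_w)\, L^{(l-1)/12} x^{(w)}_{t_q - p}
+ \varepsilon_{t_q}
\]

With \(K=3\) and \(L=12\), each inner sum covers all monthly or weekly observations of one quarter, \(P=2\) adds the previous quarter, and the weights do not depend on \(p\), in line with using the same polynomial for all low-frequency lags.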
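The polynomial we use is the exponential Almon lag polynomial of order two. It has the following form

\[
b(k, \theta) = \frac{\exp(\theta_1 k + \theta_2 k^2)}{\sum_{j=1}^{K} \exp(\theta_1 j + \theta_2 j^2)}, \qquad k = 1, \dots, K,
\]

with \(\theta = (\theta_1, \theta_2)\), and analogously for \(b(l, \theta)\) with \(L\) in place of \(K\).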
This functional form allows for many different shapes: the weighting scheme can, for instance, be hump-shaped, declining or flat. By construction, the weights sum to one. Moreover, the scheme parsimoniously summarizes a large number of high-frequency lags with only two parameters. The parameters are estimated by non-linear least squares (NLS).
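To see these shapes, the weights can be computed directly. The function below (`exp_almon` is a hypothetical name) is a minimal R sketch of the normalized exponential Almon polynomial of order two, \(b(k,\theta) = \exp(\theta_1 k + \theta_2 k^2) / \sum_{j=1}^{K} \exp(\theta_1 j + \theta_2 j^2)\), with lags indexed \(k = 1, \dots, K\) (an assumed convention).

```r
# Normalized exponential Almon lag polynomial of order two (hypothetical name):
# b(k, theta) = exp(theta1*k + theta2*k^2) / sum_j exp(theta1*j + theta2*j^2),
# for lags k = 1, ..., n_lags
exp_almon <- function(theta, n_lags) {
  k <- seq_len(n_lags)
  z <- theta[1] * k + theta[2] * k^2
  w <- exp(z - max(z))  # subtracting max(z) avoids overflow and cancels in the ratio
  w / sum(w)
}

round(exp_almon(c(0, 0), 12), 3)       # flat: all weights equal 1/12
round(exp_almon(c(-0.5, 0), 12), 3)    # declining in k
round(exp_almon(c(1, -0.1), 12), 3)    # hump-shaped with its peak at k = 5
```

Setting \(\theta = (0, 0)\) gives equal weights, a negative \(\theta_1\) with \(\theta_2 = 0\) gives geometrically declining weights, and a positive \(\theta_1\) with a negative \(\theta_2\) produces a hump at \(k = -\theta_1 / (2\theta_2)\).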
To program the whole thing in R, we start by loading the necessary packages and defining useful functions:
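The original package list and helper functions are not shown here. As an illustration of the kind of helpers such a script needs, the base-R sketch below defines two hypothetical functions: `growth()` for percentage growth rates (assumed here to be log differences) and `hf_lag_matrix()`, which turns a high-frequency series into one row per quarter containing its current and lagged high-frequency values, the role played by the lag operators \(L^{1/3}\) and \(L^{1/12}\) in the model.

```r
# Hypothetical helper: period-on-period growth rate in percent, computed as
# 100 times the log difference (an assumed definition)
growth <- function(x) 100 * diff(log(x))

# Hypothetical helper: one row per low-frequency period, column j holding the
# high-frequency observation j - 1 steps before the period's last observation.
# With m = 3 and n_lags = 6 (K * P), a row contains the 3 months of the current
# quarter followed by the 3 months of the previous quarter, most recent first.
# Rows without a complete history are set to NA.
hf_lag_matrix <- function(x, m, n_lags) {
  stopifnot(length(x) %% m == 0)
  n_low <- length(x) / m
  out <- matrix(NA_real_, nrow = n_low, ncol = n_lags)
  for (t in seq_len(n_low)) {
    idx <- t * m - (seq_len(n_lags) - 1)   # positions of lags 0, 1, ..., n_lags - 1
    if (min(idx) >= 1) out[t, ] <- x[idx]
  }
  out
}

hf_lag_matrix(1:9, m = 3, n_lags = 6)
```

In this toy example the first quarter has no previous quarter, so its row is `NA`; the second row is 6, 5, 4, 3, 2, 1.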
The next step is to define a set of parameters:
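A minimal sketch of such a parameter block, using the values stated in the text; the list name and element names are hypothetical and need not match the original script.

```r
# Model and evaluation settings from the text (names are illustrative only)
params <- list(
  S = 1,                                  # AR lag order
  P = 2,                                  # low-frequency lags: current and previous quarter
  K = 3,                                  # monthly lags per quarter (including lag zero)
  L = 12,                                 # weekly lags per quarter (including lag zero)
  sample_start = as.Date("1968-01-01"),
  sample_end   = as.Date("2021-12-31"),
  eval_start   = as.Date("1980-01-01")
)

# Total number of high-frequency lags entering each MIDAS term
n_lags_monthly <- params$K * params$P   # 3 months x 2 quarters
n_lags_weekly  <- params$L * params$P   # 12 weeks x 2 quarters
```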
Then we get the data. We also adjust the series so that every quarter consists of 3 months and 12 weeks, and we set the dates so that the fourth week of every month has the same date as the month itself.
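The alignment step can be sketched as follows. The rule used here is an assumption, not necessarily the original script's: each weekly observation is assigned to the month of its date, only the last four weeks of every month are kept (every month contains at least four weekly dates), and each kept week is tagged with its month, dated on the first of the month, and its position 1 to 4, so that week 4 lines up with the monthly observation. The function name `align_weekly` is hypothetical.

```r
# Hypothetical helper: keep exactly 4 weeks per month (the last 4) and tag each
# kept week with its month (dated on the 1st, an assumed convention) and its
# position within the month; week 4 is the one matched to the monthly value.
align_weekly <- function(dates, values) {
  ord <- order(dates)
  dates <- dates[ord]
  values <- values[ord]
  month_key <- format(dates, "%Y-%m")
  # Count weeks backwards within each month: 1 = last week of the month
  from_end <- ave(seq_along(dates), month_key, FUN = function(i) rev(seq_along(i)))
  keep <- from_end <= 4
  data.frame(
    month = as.Date(paste0(month_key[keep], "-01")),
    week  = 5L - from_end[keep],   # 1, 2, 3, 4 within each month
    value = values[keep]
  )
}

# Usage with simulated weekly data: Saturdays in Q1 2021 (January has 5 of them)
dates <- seq(as.Date("2021-01-02"), as.Date("2021-03-27"), by = "week")
aligned <- align_weekly(dates, values = seq_along(dates))
table(aligned$month)   # 4 weeks per month, 12 per quarter
```

Dropping the earliest week of a five-week month keeps the weeks closest to the month's end, which matches the assumption that values are published on the last day of their period.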
Then we estimate the model and calculate a nowcast within a for loop.
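The loop below is a compact, self-contained sketch of this step, not the original script. It uses simulated data, a single monthly indicator instead of two monthly indicators and one weekly indicator, and base R's `optim()` for the NLS step; all function names are hypothetical. In every quarter of the evaluation period it re-estimates, on an expanding window, a MIDAS model with normalized exponential Almon weights shared by the current and the previous quarter and an AR(1) benchmark, and then nowcasts the current quarter.

```r
set.seed(1)

# Normalized exponential Almon weights of order two
exp_almon <- function(theta, n) {
  k <- seq_len(n)
  z <- theta[1] * k + theta[2] * k^2
  w <- exp(z - max(z))
  w / sum(w)
}

# --- Simulated data (assumption: one monthly indicator, 3 months per quarter) ---
n_q <- 120
m <- 3
x <- rnorm(n_q * m)
# Row q holds the months of quarter q, most recent first (lags 0, 1/3, 2/3) ...
X0 <- t(sapply(seq_len(n_q), function(q) x[q * m - 0:(m - 1)]))
# ... and X1 the same for quarter q - 1 (low-frequency lag p = 1)
X1 <- rbind(NA, X0[-n_q, ])
w_true <- exp_almon(c(0.5, -0.3), m)
y <- numeric(n_q)
for (q in 2:n_q) {
  y[q] <- 0.5 + 0.3 * y[q - 1] + sum(w_true * X0[q, ]) +
    0.5 * sum(w_true * X1[q, ]) + rnorm(1, sd = 0.5)
}

# MIDAS fitted values: par = (c, alpha, beta_0, beta_1, theta_1, theta_2);
# the same weights are used for the current and the previous quarter
midas_fit <- function(par, y_lag, X0, X1) {
  w <- exp_almon(par[5:6], ncol(X0))
  par[1] + par[2] * y_lag + par[3] * drop(X0 %*% w) + par[4] * drop(X1 %*% w)
}

# NLS via optim(), started from OLS estimates with flat weights
estimate_midas <- function(y, y_lag, X0, X1) {
  flat <- rep(1 / ncol(X0), ncol(X0))
  a0 <- drop(X0 %*% flat)
  a1 <- drop(X1 %*% flat)
  start <- c(unname(coef(lm(y ~ y_lag + a0 + a1))), 0, 0)
  ssr <- function(par) sum((y - midas_fit(par, y_lag, X0, X1))^2)
  optim(start, ssr, method = "BFGS")$par
}

# --- Expanding-window pseudo out-of-sample loop ---
first_eval <- 60
res <- data.frame(quarter = first_eval:n_q, actual = NA_real_,
                  midas = NA_real_, ar1 = NA_real_)
for (i in seq_len(nrow(res))) {
  q <- res$quarter[i]
  est <- 2:(q - 1)   # estimation sample: quarters observed before q
  par <- estimate_midas(y[est], y[est - 1],
                        X0[est, , drop = FALSE], X1[est, , drop = FALSE])
  res$midas[i] <- midas_fit(par, y[q - 1], X0[q, , drop = FALSE], X1[q, , drop = FALSE])
  b <- coef(lm(y[est] ~ y[est - 1]))   # AR(1) benchmark
  res$ar1[i] <- b[1] + b[2] * y[q - 1]
  res$actual[i] <- y[q]
}
head(res)
```

This sketch only nowcasts with the full quarter of monthly data, i.e. at the end-of-quarter horizon. In the setup described in the text, the loop would additionally run over the 12 weekly horizons and restrict the current-quarter observations to those available at each horizon.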
Finally, we calculate the nowcasting errors, conduct Diebold-Mariano-West (DMW) tests and plot the results.
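The evaluation step can be sketched in base R as follows. The function `dm_test()` is a hypothetical helper, not the original script's: it computes the Diebold-Mariano statistic for squared-error loss with a Bartlett (Newey-West) long-run variance and a normal approximation. It applies no small-sample correction and no adjustment for parameter estimation or for the benchmark being nested in the larger model. The errors in the usage example are simulated, not results from the application.

```r
# Hypothetical helper: Diebold-Mariano test for equal squared-error loss.
# d_t = e1_t^2 - e2_t^2; negative values favour forecast 1 (e.g. MIDAS).
# Long-run variance: Bartlett (Newey-West) kernel with h - 1 autocovariances.
dm_test <- function(e1, e2, h = 1) {
  d <- e1^2 - e2^2
  n <- length(d)
  d_bar <- mean(d)
  dc <- d - d_bar
  lrv <- sum(dc^2) / n
  for (j in seq_len(h - 1)) {
    gamma_j <- sum(dc[(j + 1):n] * dc[1:(n - j)]) / n
    lrv <- lrv + 2 * (1 - j / h) * gamma_j
  }
  stat <- d_bar / sqrt(lrv / n)
  # One-sided p-value for H1: forecast 1 is more accurate, i.e. E[d] < 0
  c(statistic = stat, p_value = pnorm(stat))
}

rmse <- function(e) sqrt(mean(e^2))

# Usage with simulated nowcast errors
set.seed(42)
e_ar    <- rnorm(100, sd = 1.0)
e_midas <- rnorm(100, sd = 0.6)
c(rmse_midas = rmse(e_midas), rmse_ar = rmse(e_ar))
dm_test(e_midas, e_ar)
```

In the application, the test would be run once per weekly horizon on the MIDAS and AR(1) nowcast errors at that horizon, and the resulting statistics or p-values plotted against the horizon.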
The results show that the selected high-frequency variables indeed provide valuable information about current-quarter GDP growth. Once about two weeks of the current quarter have been observed, the MIDAS nowcast statistically significantly outperforms the AR(1) model.
Ghysels E, Santa-Clara P, Valkanov R (2002). “The MIDAS Touch: Mixed Data Sampling Regression Models.” Working paper, UNC and UCLA.