Step 1
I get the total column ozone (TCO) field for day N, day N−1 and day N+1, as well
as their associated uncertainty fields. If the fields from yesterday and
tomorrow are not available, I create them and fill them with null values. To demonstrate the application of the data filling algorithm, I use the filling of 3 January 1980 as an example.
The left-hand column of the figure above shows the TCO fields for 2, 3 and 4 January, while the right-hand column shows the associated uncertainty field for each daily TCO field.
Step 2
I next create a first order statistical model of the TCO field on day N. I start by making a copy of the TCO field on day N (together with its uncertainty field) and apply a 'nearest neighbour' interpolation to fill in as many missing values as I can. I do this by looking for cells with null values which are neighboured, either to the north and south, or to the east and west, by non-null values. If such a non-null pair is found, I take that pair of TCO values, together with their uncertainties, and fit a straight line (y = Mx + C) to the two values to estimate the average and the uncertainty on the average. I use this approach, rather than just calculating the mean of the two neighbouring values, because I also need the uncertainty on the interpolated value, which is calculated as:

σ_interp = √((x·σ_M)² + σ_C²)

where σ_M is the uncertainty on the slope and σ_C is the uncertainty on the intercept of the straight line fit.

After doing the nearest neighbour interpolation, I look for cases where, for a cell containing a null value on day N, there are non-null values in the same cell on day N−1 and day N+1. I then interpolate between yesterday and tomorrow using the same straight line fit approach that was used for the nearest neighbour interpolation. I then repeat the nearest neighbour interpolation (since there may now be other regions amenable to that approach after the interpolation between yesterday and tomorrow) and do a longitude interpolation. The longitude interpolation works by finding two non-null values at the same latitude that are separated by two or more null values. In this case, I impose the additional constraint that the non-null values cannot be separated by more than 30° in longitude. Linear interpolation between the two non-null values, including propagation of uncertainties, is used to determine the interstitial values and their uncertainties.
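As a concrete sketch, the two-point straight-line fit and its propagated uncertainty might be implemented as follows (a minimal Python illustration; the function name and coordinate convention are my own, and the covariance between the slope and intercept is neglected, with the two uncertainties simply combined in quadrature):

```python
import math

def interpolate_pair(x1, y1, s1, x2, y2, s2, x):
    """Fit y = M*x + C exactly through two points (y1 +/- s1 at x1,
    y2 +/- s2 at x2) and evaluate the fit, and its uncertainty, at x."""
    dx = x2 - x1
    M = (y2 - y1) / dx                        # slope
    C = (x2 * y1 - x1 * y2) / dx              # intercept
    sM = math.sqrt(s1**2 + s2**2) / abs(dx)   # uncertainty on the slope
    sC = math.sqrt((x2 * s1)**2 + (x1 * s2)**2) / abs(dx)  # on the intercept
    y = M * x + C
    # Combine slope and intercept uncertainties in quadrature
    # (covariance between M and C is neglected).
    sy = math.sqrt((x * sM)**2 + sC**2)
    return y, sy

# Two east-west neighbours of a null cell, one cell either side:
val, unc = interpolate_pair(-1.0, 300.0, 5.0, 1.0, 310.0, 5.0, 0.0)
```

At the midpoint of two equally uncertain neighbours this reduces to the mean of the two values with an uncertainty smaller than either input, which is the behaviour the straight-line fit is meant to capture.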
Nearest neighbour interpolation and longitudinal interpolation are repeated until no further data are added to the first order statistical model field. A plot of the first order model for 3 January 1980 and its uncertainty field is shown below.
The region of missing values over South America and the east coast of North America has been filled but with elevated uncertainties. The same is true for the missing data in the western Pacific.
Step 3
The goal of this step is to create a second, less constrained, statistical model of the TCO field on day N (also called the second order model). The second order model is derived by fitting an equation of the form:
TCO(θ, φ, t) = A(θ, φ) + B(θ, φ)·t    (1)
to all TCO measurements from day N−1, N and N+1, where A and B are expanded in spherical harmonics of order (15,15) to account for latitudinal and longitudinal structure in the TCO fields. If the number of data points on day N−1 is more than a factor of 5 greater than the number of data points on day N+1, then the trend term is omitted from equation (1) and only the coefficient A is fitted.

I start by going back to the original unfilled TCO fields and uncertainty fields on day N−1, N and N+1. For each of those three fields, I repeat the two-step process of a nearest neighbour interpolation, followed by a longitudinal interpolation (but now with a 60° longitude separation constraint), until no additional data have been added. I then perform an 'over the pole' interpolation. This interpolation works by locating two non-null values, each no more than 30° away from the pole, which are separated by two or more null values. Linear interpolation between these two non-null values, over the pole, is used to fill in missing data over the pole.

The sole purpose of these various interpolation steps is to provide as many data as possible to the regression model to constrain the fit of equation (1). Every non-null value from the now partially filled TCO fields from day N−1, N and N+1 is now passed, along with its uncertainty, to the regression model. Values filled by over the pole interpolation are assumed to have an uncertainty of 30 DU. This reduces the statistical importance of these data in the regression model fit while still providing some constraint for the fit at high latitudes. A total of at least 1000 data values must be passed to the regression model for the fit to be considered valid.
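A sketch of the longitudinal interpolation pass over a single latitude row might look like this (Python; the names and the linear-weight uncertainty propagation are my own reading of the description, with the 60° separation constraint used at this stage):

```python
import math

def fill_longitude_row(vals, sigs, cell_deg=1.0, max_sep_deg=60.0):
    """Fill runs of two or more nulls (None) in one latitude row by
    linear interpolation between the bounding non-null values, provided
    those values are no more than max_sep_deg apart in longitude.
    Endpoint uncertainties are propagated through the linear weights.
    (Longitude wrap-around at the dateline is ignored for simplicity.)"""
    vals, sigs = list(vals), list(sigs)
    n = len(vals)
    i = 0
    while i < n:
        if vals[i] is None:
            i += 1
            continue
        j = i + 1                      # scan east for the next non-null value
        while j < n and vals[j] is None:
            j += 1
        gap = j - i - 1
        if j < n and gap >= 2 and (j - i) * cell_deg <= max_sep_deg:
            for k in range(i + 1, j):
                w = (k - i) / (j - i)  # linear weight toward the eastern value
                vals[k] = (1 - w) * vals[i] + w * vals[j]
                sigs[k] = math.sqrt(((1 - w) * sigs[i]) ** 2
                                    + (w * sigs[j]) ** 2)
        i = j
    return vals, sigs

# A 4-cell row at 1 degree resolution with a two-cell gap:
vals, sigs = fill_longitude_row([300.0, None, None, 306.0],
                                [3.0, None, None, 3.0])
```

The same routine, with `max_sep_deg=30.0`, matches the tighter constraint used when building the first order model in Step 2.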
Once equation (1) has been fitted to the data available from the TCO fields on day N−1, N and N+1, it can be evaluated at every latitude and longitude on day N to create the second order model of the TCO field on day N. The approach for estimating the uncertainties on the second order model TCO fields is described below. If the maximum TCO value in the second order model field exceeds the maximum TCO value passed to the regression model, or the minimum TCO value in the second order model field is less than the minimum TCO value passed to the regression model, then the order of the spherical harmonic expansions is reduced from (15,15) to (14,14), on the assumption that the regression model was overfitting. This reduction in the spherical harmonic expansions is repeated until the range of the regression modelled TCO values falls within the range of the values passed to the regression model. The expansion order cannot go below (5,5) for the second order modelled TCO field to be considered valid. The second order model field for 3 January 1980 and its associated uncertainty field are shown in the figure below.
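The overfitting guard can be sketched as a loop that lowers the expansion order until the modelled range is bracketed by the input data (Python; `fit_and_evaluate` is a hypothetical stand-in for fitting equation (1) at a given order and evaluating it on the day-N grid):

```python
def constrain_fit_order(fit_and_evaluate, data_values,
                        max_order=15, min_order=5):
    """Lower the spherical-harmonic expansion order until the fitted
    field's range lies within the range of the data passed to the
    regression. Returns (order, field), or (None, None) if no order
    down to min_order yields a valid second order model."""
    lo, hi = min(data_values), max(data_values)
    for order in range(max_order, min_order - 1, -1):
        field = fit_and_evaluate(order)
        if min(field) >= lo and max(field) <= hi:
            return order, field
    return None, None

# Mock fit in which higher orders overshoot the data range
# (for illustration only; a real fit returns a full grid of values):
order, field = constrain_fit_order(
    lambda n: [250.0 + n, 400.0 + 5.0 * n], [200.0, 450.0])
```

Returning `(None, None)` corresponds to the case where no valid second order model is available and Step 5 proceeds with the first order model alone.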
Because the second order model field is the output from a regression model fit, which can be evaluated at every latitude and longitude, these fields contain no null values. The uncertainties on the second order modelled fields are calculated by...
Step 4
If a second order modelled TCO field is available, it is 'blended' with the first order model field. Two fields, A (the primary field) and B (the secondary field), are blended as follows: if there is a null value in some cell in field A and a non-null value in the same cell in field B, then I find a proxy value for field A and combine it with the value from field B as follows:
Blend = W·A_proxy + (1 − W)·B_value    (2)
where Blend is the blended value, W is a weight calculated as detailed below, A_proxy is a proxy value for the missing value in the A field, determined as detailed below, and B_value is the non-null value from the B field. To derive an A_proxy value, I look in 60° sectors around the missing grid node for non-null values from the A field which are within 20 grid cells in the east–west and north–south directions. I then take a weighted mean of the 6 values from those 6 search sectors, where the weighting is calculated as:
W = 1 − D/10⁶    (3)
where D is the distance to the nearest non-null value in that sector, measured in metres. The weight is set to zero when D is greater than 1000 km. A_proxy is then set to the weighted mean of the non-null values across all 6 sectors. When calculating the blended value using equation (2), the weight (W) is calculated using the distance to the nearest non-null value across all 6 search sectors. If no A_proxy value can be found, then W in equation (2) is set to zero. Standard error propagation rules are used to determine an uncertainty on the blended value.

In the execution of Step 4, the first order model is used as the primary field and the second order model as the secondary field in the blending process. This creates a best estimate of the TCO field on day N, where values from the first order model are used where available, blending into values from the second order model where they are not. An example of the blended first and second order models for 3 January 1980, and its associated uncertainty field, is shown in the figure below.
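The weighting and blending steps might be sketched like this (Python; the linear fall-off of the weight with distance, reaching zero at 1000 km, is my reading of the text, and the names are my own):

```python
def blend_weight(distance_m, max_dist_m=1.0e6):
    """Distance-based weight: 1 at zero distance, falling linearly
    (an assumed form) to 0 at 1000 km, and 0 beyond that."""
    return max(0.0, 1.0 - distance_m / max_dist_m)

def blend(a_proxy, b_value, distance_m):
    """Combine the primary-field proxy with the secondary-field value.
    If no proxy exists, W = 0 and the secondary value is used unchanged."""
    if a_proxy is None:
        return b_value
    w = blend_weight(distance_m)
    return w * a_proxy + (1.0 - w) * b_value

# A cell 250 km from the nearest non-null primary-field value:
value = blend(310.0, 290.0, 250.0e3)
```

The linear taper is what produces the smooth transition from primary-field values to secondary-field values with increasing distance from the available data.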
The available first order model values close to the Arctic now blend into the values obtained from the second order model where first order model values are not available over the Arctic. If a second order modelled TCO field is not available (for any of the reasons outlined above), then the first order modelled field is used in Step 5 without any blending with the second order modelled field.
Step 5
Finally, the original ozone field on day N is blended with the modelled ozone field produced in Step 4 to create the filled TCO field for day N, with the original TCO field being used as the primary field and the Step 4 model being used as the secondary field in the blending process. The final filled TCO field for 3 January 1980 and its associated uncertainty field are shown in the figure below.
The result is a TCO field that replicates measured values where they are available and relaxes to an optimal modelled value in regions away from measured values.
