
### Step 1

I get the total column ozone (TCO) field for day N, day N-1 and day N+1, as well as their associated uncertainty fields. If the fields from yesterday and tomorrow are not available, I create them and fill them with null values. To demonstrate the application of the data filling algorithm, I use the filling for 3 January 1980 as an example.
The left-hand column of the figure above shows the TCO fields for 2, 3 and 4 January, while the right-hand column shows the associated uncertainty fields for each daily TCO field.
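The creation of null-filled placeholder fields can be sketched as follows. The `.npy` storage, file naming, and the 1° grid shape are assumptions made for illustration, not part of the method itself:

```python
import numpy as np

def load_day_or_null(path, shape=(180, 360)):
    """Load a daily TCO (or uncertainty) grid for one day. If the file for
    yesterday or tomorrow does not exist, return a grid of the same shape
    filled with null values (NaN) so the algorithm can still run.

    The .npy format and the 1-degree (180 x 360) grid are assumptions of
    this sketch."""
    try:
        return np.load(path)
    except FileNotFoundError:
        return np.full(shape, np.nan)
```

Using NaN as the null value lets later steps test for missing data with `numpy.isnan` rather than a sentinel magnitude.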

### Step 2

I next create a first order statistical model of the TCO field on day N. I start by making a copy of the TCO field on day N (together with its uncertainty field) and apply a 'nearest neighbour' interpolation to fill in as many missing values as I can. I do this by looking for cells with null values which are neighboured, either to the north and south, or to the east and west, by non-null values. If such a non-null pair is found, I take that pair of TCO values, together with their uncertainties, and fit a straight line ($y = Mx + C$) to the two values to estimate the average and the uncertainty on the average. I use this approach rather than simply calculating the mean of the two neighbouring values because I also need the uncertainty on the interpolated value, which is calculated as $\sqrt{\sigma_M^2+\sigma_C^2}$, where $\sigma_M$ is the uncertainty on the slope and $\sigma_C$ is the uncertainty on the intercept of the straight line fit.

After the nearest neighbour interpolation, I look for cases where a cell containing a null value on day N has non-null values in the same cell on day N-1 and day N+1. I then interpolate between yesterday and tomorrow using the same straight line fit approach that was used for the nearest neighbour interpolation. I then repeat the nearest neighbour interpolation (since there may now be other regions amenable to that approach after the interpolation between yesterday and tomorrow) and perform a longitude interpolation.

The longitude interpolation works by finding two non-null values at the same latitude that are separated by two or more null values. In this case, I impose the additional constraint that the non-null values cannot be separated by more than 30° in longitude. Linear interpolation between the two non-null values, including propagation of uncertainties, is used to determine the interstitial values and their uncertainties.
Nearest neighbour interpolation and longitudinal interpolation are repeated until no further data are added to the first order statistical model field. A plot of the first order model for 3 January 1980 and its uncertainty field is shown below.
The region of missing values over South America and the east coast of North America has been filled but with elevated uncertainties. The same is true for the missing data in the western Pacific.
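The two-point straight-line fit at the heart of the nearest neighbour interpolation can be sketched as follows. Placing the two neighbouring values at $x = -1$ and $x = +1$, the interpolated value is the intercept $C$ of the fit and its uncertainty is $\sqrt{\sigma_M^2+\sigma_C^2}$ (the function name is illustrative):

```python
import math

def fit_pair(y1, s1, y2, s2):
    """Fit y = M*x + C through two values placed at x = -1 and x = +1,
    then evaluate at the midpoint x = 0.

    Returns the interpolated value (the intercept C) and its uncertainty
    sqrt(sigma_M**2 + sigma_C**2), with sigma_M and sigma_C propagated
    from the uncertainties s1 and s2 on the two input values."""
    M = (y2 - y1) / 2.0                      # slope of the two-point line
    C = (y1 + y2) / 2.0                      # intercept = midpoint value
    s_M = math.sqrt(s1**2 + s2**2) / 2.0     # uncertainty on the slope
    s_C = math.sqrt(s1**2 + s2**2) / 2.0     # uncertainty on the intercept
    return C, math.sqrt(s_M**2 + s_C**2)
```

For example, two neighbours of 300 ± 10 DU and 310 ± 10 DU give a filled value of 305 DU with an uncertainty of 10 DU. The same primitive serves the yesterday/tomorrow interpolation, with the two days taking the roles of the two neighbours.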

### Step 3

The goal of this step is to create a second, less constrained, statistical model of the TCO field on day N (also called the second order model). The second order model is derived by fitting an equation of the form:

$\alpha + \beta \times t$            (1)

to all TCO measurements from day N-1, N and N+1, where $\alpha$ and $\beta$ are expanded in spherical harmonics of order (15,15) to account for latitudinal and longitudinal structure in the TCO fields. If the number of data points on day N-1 is more than a factor of 5 greater than the number of data points on day N+1, then the trend term is omitted from equation (1) and only the $\alpha$ coefficient is fitted.

I start by going back to the original unfilled TCO fields and uncertainty fields on day N-1, N and N+1. For each of those three fields, I repeat the two-step process of a nearest neighbour interpolation, followed by a longitudinal interpolation (but now with a 60° longitude separation constraint), until no additional data have been added. I then perform an 'over the pole' interpolation. This interpolation works by locating two non-null values, each no more than 30° away from the pole, which are separated by two or more null values. Linear interpolation between these two non-null values, over the pole, is used to fill in missing data over the pole. The sole purpose of these various interpolation steps is to provide as many data as possible to the regression model to constrain the fit of equation (1).

Every non-null value from the now partially filled TCO fields from day N-1, N and N+1 is then passed, along with its uncertainty, to the regression model. Values filled by over the pole interpolation are assumed to have an uncertainty of 30 DU. This reduces the statistical importance of these data in the regression model fit while still providing some constraint for the fit at high latitudes. A total of at least 1000 data values must be passed to the regression model for the fit to be considered valid.
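The weighted regression fit of equation (1) can be sketched as below. To keep the sketch dependency-free, a low-order basis of powers of sin(latitude) times a truncated Fourier series in longitude stands in for the operational (15,15) spherical-harmonic expansion; the function names and the basis choice are assumptions of this sketch:

```python
import numpy as np

def basis(lat, lon, order):
    """Low-order latitude/longitude basis: powers of sin(lat) times a
    truncated Fourier series in longitude. A simplified stand-in for the
    (15,15) spherical-harmonic expansion used operationally."""
    mu, lam = np.sin(np.radians(lat)), np.radians(lon)
    cols = []
    for p in range(order + 1):
        cols.append(mu ** p)
        for m in range(1, order + 1):
            cols.append(mu ** p * np.cos(m * lam))
            cols.append(mu ** p * np.sin(m * lam))
    return np.column_stack(cols)

def fit_second_order_model(lat, lon, t, y, sigma, order=3):
    """Weighted least-squares fit of TCO = alpha(lat,lon) + beta(lat,lon)*t
    (equation 1) to all values from days N-1, N and N+1."""
    if y.size < 1000:                        # minimum data count for a valid fit
        return None
    B = basis(lat, lon, order)
    A = np.hstack([B, B * t[:, None]])       # alpha block, then beta block
    w = 1.0 / sigma                          # weight by inverse uncertainty
    coef, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
    n = B.shape[1]
    return coef[:n], coef[n:]                # alpha and beta coefficients
```

Evaluating `basis(lat, lon, order) @ alpha` over the whole grid (i.e. at t = 0 for day N) then gives the second order model field at every latitude and longitude.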

Once equation (1) has been fitted to the data available from the TCO fields on day N-1, N and N+1, it can be evaluated at every latitude and longitude on day N to create the second order model of the TCO field on day N. The approach for estimating the uncertainties on the second order model TCO fields is described below. If the maximum TCO value in the second order model field exceeds the maximum TCO value passed to the regression model, or the minimum TCO value in the second order model field is less than the minimum TCO value passed to the regression model, then the order of the spherical harmonic expansions is reduced from (15,15) to (14,14) on the assumption that the regression model was over-fitting. This reduction in the spherical harmonic expansions is repeated until the range of the regression modelled TCO values falls within the range of the values passed to the regression model. For the second order modelled TCO field to be considered valid, this reduction cannot go below (5,5). The second order model field for 3 January 1980 and its associated uncertainty field are shown in the figure below.
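The order-reduction check can be sketched as follows. Here `fit_at_order` is a hypothetical callable standing in for the actual regression fit and evaluation; only the range test and the (5,5) floor come from the method described above:

```python
import numpy as np

def reduce_until_in_range(fit_at_order, data_min, data_max, start=15, floor=5):
    """Lower the spherical-harmonic order until the modelled field's range
    lies within the range of the data passed to the regression model.

    fit_at_order(order) is a placeholder callable returning the evaluated
    model field (a numpy array) for a given expansion order. Returns
    (order, field) on success, or None if even the floor order still
    produces values outside the data range."""
    for order in range(start, floor - 1, -1):
        field = fit_at_order(order)
        if data_min <= field.min() and field.max() <= data_max:
            return order, field
    return None
```

Returning `None` mirrors the case where no valid second order model is available, so Step 4's blending can be skipped.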
Because the second order model field is the output from a regression model fit which can be evaluated at every latitude and longitude, these fields contain no null values. The uncertainties on the second order modelled fields are calculated by...

### Step 4

If a second order modelled TCO field is available, it is 'blended' with the first order model field. Two fields, A (the primary field) and B (the secondary field), are blended as follows: if there is a null value in some cell in field A and a non-null value in the same cell in field B, then I find a proxy value for field A and combine it with the value from field B as follows:

$\mathrm{Blend} = W \times A_{proxy} + (1-W) \times B_{value}$     (2)

where Blend is the blended value, $W$ is a weight calculated as detailed below, $A_{proxy}$ is a proxy value for the missing value in the A field determined as detailed below, and $B_{value}$ is the non-null value from the B field. To derive an $A_{proxy}$ value, I look in 60° sectors around the missing grid node for non-null values from the A field which are within 20 grid cells in the east-west and north-south directions. I then take a weighted mean of the values from those 6 search sectors, where the weighting is calculated as:
$W_i = \cos (\frac{D \times \pi}{2 \times 10^6})$     (3)

where $D$ is the distance to the nearest non-null value in that sector, measured in metres. The weight is set to zero when $D$ is greater than 1000 km. $A_{proxy}$ is then set to the weighted mean of the non-null values across all 6 sectors. When calculating the blended value using equation (2), the weight ($W$) is calculated using the distance to the nearest non-null value across all 6 search sectors. If no $A_{proxy}$ value can be found, then $W$ in equation (2) is set to zero. Standard error propagation rules are used to determine an uncertainty on the blended value.

In the execution of Step 4, the first order model is used as the primary field and the second order model as the secondary field in the blending process. This creates a best estimate of the TCO field on day N where values from the first order model are used where available, blending into values from the second order model where they are not. An example of the blended first and second order models for 3 January, and its associated uncertainty field, is shown in the figure below.
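The blend of equations (2) and (3) for a single cell can be sketched as follows (the function name is illustrative; the sector search that produces the proxy value and distance is assumed to have already run):

```python
import math

def blend(a_proxy, b_value, d_nearest_m):
    """Blend the A-field proxy value with the B-field value (equation 2),
    with the weight W from the cosine taper of equation (3).

    d_nearest_m is the distance in metres to the nearest non-null A value
    across the 6 search sectors. Beyond 1000 km, or when no proxy could be
    found (a_proxy is None), the weight is zero and the B value is used
    unchanged."""
    if a_proxy is None or d_nearest_m > 1.0e6:
        return b_value
    w = math.cos(d_nearest_m * math.pi / 2.0e6)   # equation (3)
    return w * a_proxy + (1.0 - w) * b_value      # equation (2)
```

The cosine taper gives $W = 1$ on top of A-field data and rolls smoothly to $W = 0$ at 1000 km, so the blended field transitions between the two models without a visible seam.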
The available first order model values close to the Arctic now blend into the values obtained from the second order model where first order model values are not available over the Arctic.
If a second order modelled TCO field is not available (for any of the reasons outlined above), then the first order modelled field is used in Step 5 without any blending with the second order modelled field.

### Step 5

Finally, the original ozone field on day N is blended with the first order modelled ozone field to create the filled TCO field for day N, with the original TCO field being used as the primary field and the first order model being used as the secondary field in the blending process. The final filled TCO field for 3 January 1980 and its associated uncertainty field are shown in the figure below.
The result is a TCO field that replicates measured values where they are available and relaxes to an optimal modelled value in regions away from measured values.