Average

Visibility data are fundamental meteorological data widely used in many fields such as climate change, atmospheric radiation, atmospheric pollution, and environmental health. Calculating the average visibility is typically the first step when using visibility data. However, this study proves that the algorithm previously used to calculate average visibility is incorrect, leading to a non-negligible error in average visibility data. Moreover, the use of this incorrect algorithm not only artificially reduces the reliability of visibility data, but also affects the credibility and even the correctness of the conclusions reached in previous studies using visibility data. Therefore, we present the correct algorithm for average visibility, which should be applied to both future and previous research to significantly increase the reliability and application scope of visibility data.

according to the content of improvements, while some are just general improvements. To clarify the difference between the two, we start with an example.

Example: A car is travelling on a road. The average speed of the car is measured to be v1, v2 and v3 when travelling uphill, on a flat road and downhill, respectively. What is the average speed of the car ($\bar{v}$)?

Student A proposed the first method of calculating the average speed, as shown in Eq. 1.

$$\bar{v} = \frac{v_1 + v_2 + v_3}{3} \qquad (1)$$

Student B thought that the measurement error of the speed of the car is related to the slope and should be corrected. Therefore, student B suggested that the average speed should be calculated using Eq. 2, where c1, c2 and c3 are the correction factors.

$$\bar{v} = \frac{c_1 v_1 + c_2 v_2 + c_3 v_3}{3} \qquad (2)$$

Student C thought that student A had misunderstood the concept of speed, and that the correct average speed should be calculated by dividing the total distance travelled by the time taken, as shown in Eq. 3, where t1, t2 and t3 are the times the car runs at speeds of v1, v2 and v3, respectively, and t is the total running time of the car.

$$\bar{v} = \frac{v_1 t_1 + v_2 t_2 + v_3 t_3}{t}, \qquad t = t_1 + t_2 + t_3 \qquad (3)$$
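The difference between the approaches of student A and student C can be checked numerically. The following is a minimal sketch only: the speeds and durations are hypothetical values chosen for illustration, and the function names are not from the manuscript.

```python
def unweighted_mean_speed(speeds):
    """Student A's approach: plain average of the segment speeds."""
    return sum(speeds) / len(speeds)


def time_weighted_mean_speed(speeds, times):
    """Student C's approach: total distance divided by total time."""
    total_distance = sum(v * t for v, t in zip(speeds, times))
    return total_distance / sum(times)


# Hypothetical values: uphill, flat and downhill speeds (km/h)
# and the time spent on each segment (h).
speeds = [30.0, 60.0, 90.0]
times = [3.0, 2.0, 1.0]

mean_a = unweighted_mean_speed(speeds)
mean_c = time_weighted_mean_speed(speeds, times)
print(f"unweighted: {mean_a} km/h, time-weighted: {mean_c} km/h")
```

With these values the car spends most of its time on the slow uphill segment, so the time-weighted average is lower than the unweighted one. The two agree only when the car spends equal time on every segment.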
We think that Eq. 2 and Eq. 3 are both improvements to Eq. 1, but Eq. 2 is only a general improvement, whereas Eq. 3 is an improvement from "incorrect" to "correct". This is because Eq. 3 corrects the misunderstanding of the concept of average speed in Eq. 1 and clearly points out the cause of the error: the "weight" of the values should be considered when calculating the average value. In contrast, Eq. 2 does not improve the understanding of the concept and is only a technical improvement. The improvement of the proposed algorithm for average visibility over the old algorithm is identical in nature to the improvement of Eq. 3 over Eq. 1. The proposed algorithm is derived by considering the "weight" of the values when calculating the average visibility, whereas the old one is not. This improvement is not a technical one, but rather a cognitive one, and we therefore consider our improvement a change from "incorrect" to "correct".

Question 2:
Why do you think that the new algorithm is "correct" and the old one is "incorrect"?

Response 2:
We have presented a rigorous proof in the manuscript. Here we use an extreme example to illustrate why the new algorithm is "correct" and the old one is "incorrect". Suppose there are two kinds of boxes of the same volume: box A is filled with gases and aerosols with a horizontal visibility of v, and box B is a perfect vacuum, so that its visibility is infinite. We uniformly mix a certain number of boxes A with boxes B, and then calculate the average visibility after mixing using the new algorithm and the old one, respectively. The results are given in Table R1 and Table R2. First, we mix one box B with a different number of boxes A. The average visibility calculated using the new algorithm and the old algorithm is given in Table R1.
It can be seen from Table R1 that as the number of boxes A increases, the average visibility after mixing calculated by the new algorithm gradually converges to the visibility of box A, while the average visibility calculated by the old algorithm is always infinite. Then, we mix one box A with a different number of boxes B. The average visibility calculated by the two algorithms is given in Table R2. It can be seen from Table R2 that as the number of boxes B increases, the average visibility calculated by the new algorithm gradually converges to the visibility of box B, while the average visibility calculated by the old algorithm remains infinite. Clearly, the results calculated by the new algorithm are more reasonable than those of the old algorithm.

The difference between the old and new algorithms is essentially a matter of the weight given to the observed visibility values. The visibility is determined by the extinction coefficient of the medium through which the light propagates, so the weight should match the extinction coefficient of the medium when calculating the average of visibility data. Large weighting factors should be given to the relatively small visibility values, which correspond to large extinction coefficients. The old algorithm does the opposite, giving large weighting factors to the large visibility values, which correspond to relatively small extinction coefficients.
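The box-mixing thought experiment can be reproduced with a short sketch. It assumes that visibility is inversely proportional to the extinction coefficient and that mixing equal volumes averages the extinction coefficients, so the new algorithm reduces to a harmonic mean of the visibilities. The value of v and the box counts are hypothetical and are not the entries of Table R1 or Table R2.

```python
import math


def mixed_visibility_new(visibilities):
    """New algorithm: average the inverse visibilities (proportional to the
    extinction coefficients), then invert the result."""
    mean_inverse = sum(1.0 / v for v in visibilities) / len(visibilities)
    return math.inf if mean_inverse == 0.0 else 1.0 / mean_inverse


def mixed_visibility_old(visibilities):
    """Old algorithm: arithmetic mean of the visibility values."""
    return sum(visibilities) / len(visibilities)


v = 10.0  # hypothetical visibility of box A, in km

# One vacuum box B mixed with an increasing number of boxes A.
for n_a in (1, 10, 100):
    boxes = [v] * n_a + [math.inf]
    print("A x", n_a, "+ B x 1:",
          mixed_visibility_new(boxes), mixed_visibility_old(boxes))

# One box A mixed with an increasing number of vacuum boxes B.
for n_b in (1, 10, 100):
    boxes = [v] + [math.inf] * n_b
    print("A x 1 + B x", n_b, ":",
          mixed_visibility_new(boxes), mixed_visibility_old(boxes))
```

In this sketch the harmonic-mean result approaches v as boxes A dominate and grows without bound as boxes B dominate, while the arithmetic mean is infinite as soon as a single vacuum box is present.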

Question 3:
Discussion of the relationship between the evidence and the conclusion.

Response 3:
The argument that "there is no evidence that previous studies miscalculated visibility" does not lead to the conclusion that the algorithm for calculating the average visibility in the past is correct, nor to the conclusion that the title of the manuscript is misleading. This is because in many cases people only look for evidence when they realize that there exists a problem. A well-known example is that before Galileo, it was a common belief that "heavier objects fell faster than lighter ones". No one could give conclusive evidence denying the above conclusion at that time until Galileo's thought experiment.
Returning to the issue of the algorithm for average visibility in this manuscript, we think that we should not presume from the start that the old algorithm is correct and reject the new algorithm on that basis, but should rather examine the proof to determine which algorithm is correct. The commonly used old algorithm has not been rigorously verified, a point that has probably been neglected in past research. In contrast, we not only present the new algorithm for average visibility, but also prove that the new algorithm is correct and the old one is incorrect. The rigorous proof is presented in the manuscript. In brief, the weight should be considered when calculating the average. The visibility is determined by the extinction coefficient of the medium through which the light propagates. Therefore, the weight should match the extinction coefficient of the medium when calculating the average of visibility data. The answers to Question 1 and Question 2 in this response can help to understand the difference between the old and new algorithms, i.e., the new algorithm considers the weight of the observed visibility values, whereas the old one does not. If no flaw can be found in the proof, we should conclude that the new algorithm and the old algorithm cannot both be correct, and that the new algorithm is the correct one.
To summarize, neither of the two reviewers denied the significance of discussing the algorithm for average visibility, nor did they raise any objections to the proof of the new algorithm in the referee comments. In other words, the two reviewers did not object to the manuscript at a substantive level, but expressed doubts about the conclusions of the manuscript out of caution or habitual thinking. We hope that this response will dispel the doubts of the two reviewers.
Calculating the average visibility is the most frequently performed task when using visibility data (Kessner et al., 2013; Zhang et al., 2010). Two methods of calculating the average visibility arise from Eq. 1. The first method directly calculates the average of visibility data using the algorithm shown in Eq. 2. The second method calculates the average extinction coefficient data first, then substitutes the averaged extinction coefficient into Eq. 1 to obtain the average visibility; the corresponding algorithm is shown in Eq. 3.
where $\bar{v}_2$ and $\bar{v}_3$ represent the average visibility calculated using Eq. 2 and Eq. 3, respectively, $\bar{b}$ is the average extinction coefficient, n is the number of measurements, and $v_i$ denotes the visibility obtained in the i-th measurement.
The question arises as to whether the average visibility values calculated by the algorithms of Eq. 2 and Eq. 3 are the same and, if not, which algorithm is correct. Unfortunately, these questions have not previously been seriously discussed. Eq. 2 has been adopted intuitively as the correct algorithm for calculating the average visibility in previous studies (Kessner et al., 2013; Rosenfeld et al., 2007; Singh et al., 2017; Zhang et al., 2017), and Eq. 3 has never been discussed as a way to calculate the average visibility. However, this study proves that Eq. 2 is incorrect, and that its results should not be used to estimate other parameters, such as the concentration of PM2.5 (Chen et al., 2005), aerosol optical depth (Wu et al., 2021), mortality (Huang et al., 2009), etc. Eq. 3 is instead the correct algorithm for calculating average visibility. Therefore, the reliability of both visibility observations and the results of previous studies using visibility data has been artificially reduced by the continuous use of an incorrect algorithm to calculate the average visibility.
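The two averaging routes can be compared with a minimal sketch. It assumes that Eq. 1 has the Koschmieder form v = K / b. The constant K = 3.912 is an assumption of this sketch, since the equation itself is not reproduced here, and the readings and function names are hypothetical.

```python
KOSCHMIEDER_K = 3.912  # assumed constant; the form of Eq. 1 is an assumption


def visibility_to_extinction(v, k=KOSCHMIEDER_K):
    """Extinction coefficient from visibility, assuming v = k / b."""
    return k / v


def extinction_to_visibility(b, k=KOSCHMIEDER_K):
    """Visibility from extinction coefficient, assuming v = k / b."""
    return k / b


def mean_visibility_direct(visibilities):
    """First method: arithmetic mean of the visibility values."""
    return sum(visibilities) / len(visibilities)


def mean_visibility_via_extinction(visibilities, k=KOSCHMIEDER_K):
    """Second method: average the extinction coefficients, then convert
    the averaged coefficient back to a visibility."""
    b = [visibility_to_extinction(v, k) for v in visibilities]
    b_mean = sum(b) / len(b)
    return extinction_to_visibility(b_mean, k)


readings_km = [2.0, 5.0, 20.0]  # hypothetical visibility readings

v2 = mean_visibility_direct(readings_km)
v3 = mean_visibility_via_extinction(readings_km)
print(f"direct: {v2:.2f} km, via extinction: {v3:.2f} km")
```

Under this assumed form of Eq. 1 the constant cancels, so the second method gives the same result for any positive K. Only the inverse proportionality between visibility and extinction coefficient matters.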

Inferences
To determine the correct algorithm between Eq. 2 and Eq. 3, it is necessary to discuss the physical meaning of both algorithms. Because atmospheric visibility is mainly determined by aerosol particles (Wang et al., 2009), to simplify the problem, only the effect of aerosol particles on visibility is considered in this study. Assuming that a total of n measurements are made at the same site with the same time interval, Eq. 4 relates the mass concentration (m) and the mass extinction coefficient (M) of aerosol particles to the extinction coefficient, and to the visibility in the i-th observation (Cheng et al., 2013).

It should be noted that it is the mass concentration and mass extinction coefficient of aerosol particles that determine the extinction coefficient and visibility of the atmosphere, not the other way around. Similarly, it is the average mass concentration and average mass extinction coefficient of aerosol particles during the observation period that determine the average extinction coefficient and average visibility during the observation period, not the other way around. Therefore, to calculate the average visibility during the observation period, we should first calculate the average mass concentration and the average mass extinction coefficient during the observation period, as shown in Eq. 5.

A comparison of Eq. 6 and Eq. 3 indicates that they are identical. Therefore, the algorithm of Eq. 3 is the correct algorithm for calculating the average visibility. The following is a discussion of whether the algorithm of Eq. 2 is the correct algorithm, which is characterized by direct calculation of the average visibility using observed visibility data. Equation 7 shows the relationship between the average visibility calculated from the algorithm of Eq. 2 and aerosol particles. Equation 8 gives the relationship between the average extinction coefficient and aerosol particles.
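The link between aerosol properties and the two averages can be illustrated with a simplified sketch. It assumes that the extinction coefficient is the product m × M, that visibility follows v = K / b with K = 3.912, and that the mass extinction coefficient M is constant across measurements. None of these specifics are taken from Eqs. 4 to 8, which are not reproduced here, and all numerical values are hypothetical.

```python
K = 3.912          # assumed Koschmieder-type constant
M_CONST = 3.0e-3   # hypothetical mass extinction coefficient (km^-1 per ug/m^3)


def visibility_from_aerosol(m, big_m=M_CONST, k=K):
    """Visibility for mass concentration m, assuming b = m * M and v = k / b."""
    return k / (m * big_m)


concentrations = [20.0, 50.0, 200.0]  # hypothetical mass concentrations (ug/m^3)
visibilities = [visibility_from_aerosol(m) for m in concentrations]

# Visibility implied by the average aerosol loading over the period.
mean_concentration = sum(concentrations) / len(concentrations)
v_from_mean_aerosol = visibility_from_aerosol(mean_concentration)

# Harmonic and arithmetic means of the individual visibility values.
n = len(visibilities)
v_harmonic = n / sum(1.0 / v for v in visibilities)
v_arithmetic = sum(visibilities) / n

print(f"from mean aerosol: {v_from_mean_aerosol:.2f} km")
print(f"harmonic mean:     {v_harmonic:.2f} km")
print(f"arithmetic mean:   {v_arithmetic:.2f} km")
```

Under these assumptions, the visibility implied by the average aerosol concentration coincides with the harmonic mean of the individual visibilities, whereas the arithmetic mean is dominated by the clean-air measurement.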
The relationship of the average visibility and the average extinction coefficients to aerosol particles in Eq. 7 is significantly different from that in Eq. 6; therefore, the algorithm of Eq. 2 is incorrect. The error in Eq. 2 occurs because visibility is treated as an independent variable rather than a function of aerosol particles. This affects the average value of visibility data by increasing the weight of visibility data at low aerosol concentrations and decreasing the weight of visibility data at high aerosol concentrations. As an extreme example, if the concentration of aerosol particles was zero in the i-th measurement, it follows from Eq. 7 and Eq. 8 that the average visibility obtained from the algorithm of Eq. 2 would be infinitely large and the average extinction coefficient would be infinitely small, regardless of the concentration of aerosol particles in the other n-1 measurements, which is clearly illogical.
This proves that Eq. 3 is the correct algorithm for calculating the average visibility, whereas Eq. 2 is incorrect. However, this does not necessarily indicate that previous average visibility values calculated using Eq. 2 are not credible. Actual visibility observation data are required to compare the differences between the average visibility values calculated by Eq. 2 and Eq. 3. If the difference is negligible, the average visibility obtained from Eq. 2 is also reliable. If the difference is considerable, then not only should the algorithm of Eq. 2 not be used for future calculations of average visibility, but the corresponding results of previous studies should be revised.

Relative error caused by the erroneous algorithm
To develop an intuitive understanding of the magnitude of the relative error in average visibility values calculated using Eq. 2, we analyze the visibility data measured at 1-min resolution by a CJY-1 visibility meter (CAMA Measurement & Control Equipments Co., Ltd) on the campus of the Nanjing University of Information Science and Technology in Nanjing, China, during 2010-2017. The details regarding the observation site and instruments are given in Zhang et al. (2017).

Typically, the output of a visibility meter is the value of visibility. Therefore, the average visibility is calculated directly from the output visibility by the algorithm of Eq. 2. However, more steps are required to derive the average visibility using the algorithm of Eq. 3. First, the extinction coefficient in the i-th measurement ($b_i$) is derived by substituting the measured value of visibility ($v_i$) into Eq. 1. Then, the average extinction coefficient is calculated using a total of n extinction coefficients. The specific derivation process and results are shown in Eq. 9.
The hourly, daily, monthly, and yearly average visibility values calculated using Eq. 2 and Eq. 3 are shown in Fig. 1a and 1b, respectively. It is clear from the above discussion that Fig. 1a shows the erroneous average visibility calculated by the incorrect algorithm, whereas Fig. 1b shows the average visibility calculated by the correct algorithm. By substituting the values of average visibility during the corresponding period shown in Fig. 1a and Fig. 1b into Eq. 10, we obtain the relative error of the hourly, daily, monthly, and yearly average visibility calculated by Eq. 2. Figure 1c shows the resulting relative errors.

As shown in Fig. 1, the average visibility calculated using Eq. 2 (Fig. 1a) is always higher than that calculated using Eq. 3 (Eq. 9) (Fig. 1b); therefore, all values of the relative error are greater than zero. The results in Fig. 1 are not a coincidence arising from the particular measurement data, but an inevitable result that will appear when calculating the average of any visibility measurement data using the algorithms of Eq. 2 and Eq. 3. A more in-depth look at Eq. 2 and Eq. 9 (Eq. 3) reveals that Eq. 2 calculates the arithmetic mean of visibility, whereas Eq. 9 calculates the harmonic mean of visibility.
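The relative error between the arithmetic and harmonic means can be sketched as follows. The definition used here, the difference between the two means divided by the harmonic mean and expressed in percent, is an assumption about the form of Eq. 10, which is not reproduced here. The readings are hypothetical.

```python
from statistics import harmonic_mean, mean


def relative_error_percent(visibilities):
    """Relative error of the arithmetic mean with respect to the harmonic
    mean, in percent (assumed form of the relative error)."""
    v_arithmetic = mean(visibilities)
    v_harmonic = harmonic_mean(visibilities)
    return 100.0 * (v_arithmetic - v_harmonic) / v_harmonic


steady = [10.0, 10.0, 10.0]   # hypothetical: no variation in visibility
variable = [2.0, 5.0, 20.0]   # hypothetical: strongly varying visibility

print(f"steady:   {relative_error_percent(steady):.1f} %")
print(f"variable: {relative_error_percent(variable):.1f} %")
```

With identical readings the two means coincide and the error vanishes. Once the readings differ, the arithmetic mean exceeds the harmonic mean and the error is positive.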
It has been mathematically proven that, unless all values used to calculate the average are the same, the arithmetic mean is always greater than the harmonic mean; the greater the variation in the data, the greater the difference between the two (Ferger, 1931).

The relationship between the arithmetic mean and harmonic mean can explain the distribution of the relative error values in Fig. 1c. The range of the measured visibility values is typically related to the observation period. The longer the duration of observations, the larger the range of the measured visibility data. Therefore, the longer the observation period used to calculate the average visibility, the larger the relative error caused by the algorithm of Eq. 2. This explains why, according to the distribution of the relative error shown in Fig. 1c, the relative error of the yearly average is larger than that of the monthly average, which is in turn larger than that of the hourly average. Regarding the relative errors of yearly and monthly average visibility caused by the algorithm of Eq. 2 (Fig. 1c), most of the values fall within the range of 30% to 70%, which is far greater than the typical range of measurement error of visibility meters (WMO, 2018). Therefore, the error caused by the incorrect algorithm of Eq. 2 cannot be ignored. Regarding the relative error of hourly and daily average visibility, although most of the values are less than 30%, this does not mean that the average visibility can be calculated by the algorithm of Eq. 2 for short observation periods. Because the atmospheric visibility may sometimes change significantly in a short time, the relative error of the average visibility calculated by Eq. 2 can be large over such a period.
The largest relative errors in Fig. 1c caused by the algorithm of Eq. 2 fall into this category.

The only way to conclude that the average relative error caused by Eq. 2 is sufficiently small to continue using this algorithm, despite knowing that Eq. 2 is incorrect for calculating the average visibility, would be to perform statistical analysis of a large amount of visibility data obtained from different sites at different times. However, to reject this conclusion, a single counterexample is logically sufficient. That is, the relative error range of the average visibility values calculated by Eq. 2 in this study (Fig. 1) is sufficient to show that the error in average visibility arising from the incorrect algorithm is not negligible.
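The dependence of the relative error on the averaging period can be illustrated with synthetic data. This is a sketch only: the series below is hypothetical and is not the Nanjing record, and the relative error is again assumed to be the difference between the arithmetic and harmonic means divided by the harmonic mean.

```python
from statistics import harmonic_mean, mean


def relative_error_percent(visibilities):
    """Relative error of the arithmetic mean with respect to the harmonic
    mean, in percent (assumed form of the relative error)."""
    v_harmonic = harmonic_mean(visibilities)
    return 100.0 * (mean(visibilities) - v_harmonic) / v_harmonic


def blockwise_errors(series, block):
    """Relative error within consecutive, non-overlapping blocks."""
    return [relative_error_percent(series[i:i + block])
            for i in range(0, len(series), block)]


# Hypothetical series (km): a clear period followed by a hazy period.
series = [9.0, 10.0, 11.0, 1.8, 2.0, 2.2]

short_period_errors = blockwise_errors(series, 3)    # each period on its own
long_period_error = relative_error_percent(series)   # both periods together

# Hypothetical short period containing an abrupt drop in visibility.
abrupt_change_error = relative_error_percent([10.0, 10.0, 1.0, 1.0])

print("short periods:", [round(e, 2) for e in short_period_errors])
print("long period:  ", round(long_period_error, 2))
print("abrupt change:", round(abrupt_change_error, 2))
```

In this synthetic case each short period has a small error because visibility varies little within it, the combined period has a much larger error, and a short period containing an abrupt change also produces a large error.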

Conclusions
This study proves that the algorithm that has been used to calculate the average visibility is incorrect, and proves that the error in average visibility caused by the incorrect algorithm is not negligible. On this basis, the correct visibility algorithm is proposed in this study. The average visibility has so far been calculated from the incorrect algorithm, which will not only artificially reduce the reliability of visibility data, but also affect the credibility and even the correctness of the conclusions reached in the previous studies using visibility data. Therefore, not only should the correct algorithm be used to calculate the average visibility in the future, but also the past visibility data should be revised, as this will significantly increase the reliability of the visibility data and thus extend the range of applicability of the visibility data. In addition, the error in the algorithm for average visibility occurs because of inconsistencies between the measurement parameters and the target parameters. It cannot be excluded that similar problems occur in other instruments, so it is necessary to analyze the measurement principles of different instruments to avoid the recurrence of such errors.