Paper on Data Quality Assessment

1 May 2007

The quality of statistics can be defined with reference to several criteria:

relevance of statistical concepts
accuracy
timeliness
accessibility and clarity of information
comparability of statistics
coherence
completeness/coverage

The criteria above are those most frequently considered by Eurostat, and they are also widely acknowledged elsewhere.

Although not measures of quality, the resources available for the production of statistics and the burden of form-filling placed on respondents act as constraints on quality. When assessing the ability of a Statistical Office to comply with quality guidelines, it is necessary to take these into account.

In the following analysis, the general description of each quality criterion is followed by a more specific comment relating particularly to the Joint Oil Data Exercise.

1. Relevance

Relevance in statistics is assured when statistical concepts meet users' needs. The identification of users and their expectations is therefore necessary.

In the case of the Joint Oil Data Exercise, the relevance of the statistical concepts to users' expectations is assured, as the questionnaire was drafted in close collaboration with users (Bangkok, 2001). It contains the minimum set of the most important (relevant) data on oil supply, taking into account the resources available and the burden on respondents (see above).

2. Accuracy

Accuracy is defined as the closeness between the estimated value and the (unknown) true population value. Assessing the accuracy of an estimate involves analysing the total error associated with the estimate.

The accuracy of the information supplied can be assessed at two levels.

First, at the level of the international organisations, time-series revisions may be evaluated by comparing the m-2 data for a given month with the m-1 data for the same month provided during the previous month. The number and size of revisions give a measure of the accuracy of the data at m-1. Data collected through this exercise can also be compared with other data sources. In the EU and OECD Member States, the Monthly Oil Statistics (MOS) system provides a basis for error evaluation. OPEC Member States also collect monthly oil statistics on a regular basis. Other organisations, however, do not have such a reference system (APEC collects quarterly oil statistics), and in such cases evaluating the accuracy of estimates is more difficult. Finally, internal questionnaire checks could be performed on the information supplied to ensure its coherence (production + imports - exports +/- stock change, for both upstream and downstream data). However, because it lacks detail (NGL transfers, product rebrands, refinery fuel use, etc.), the information collected by means of the questionnaire cannot fully relate upstream and downstream data to obtain a balance, as is done in MOS.
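The revision check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure used by any of the organisations: each release is assumed to be held as a dictionary keyed by hypothetical (country, flow, reference month) tuples, and each revision is measured as a percentage of the later (m-2) figure.

```python
from statistics import mean


def revision_summary(first_release, later_release):
    """Summarise revisions between two releases of the same monthly data.

    Both arguments map a hypothetical (country, flow, reference_month) key
    to a value. `first_release` holds the figures as first reported at m-1;
    `later_release` holds the figures for the same months reported one month
    later, at m-2.
    """
    common = first_release.keys() & later_release.keys()
    revisions = []  # absolute size of each revision, as % of the later figure
    for key in common:
        first, later = first_release[key], later_release[key]
        if first == later:
            continue  # entry was not revised
        # Use the first figure as the base only when the later one is zero.
        base = abs(later) if later != 0 else abs(first)
        revisions.append(100.0 * abs(later - first) / base)
    return {
        "compared": len(common),
        "revised": len(revisions),
        "mean_abs_revision_pct": mean(revisions) if revisions else 0.0,
    }


# Illustrative figures only (not real data).
first = {("A", "production", "2007-01"): 100.0, ("A", "imports", "2007-01"): 50.0}
later = {("A", "production", "2007-01"): 104.0, ("A", "imports", "2007-01"): 50.0}
print(revision_summary(first, later))
```

In this example one of the two entries compared is revised, by about 3.8% of the later figure. Tracking the count and the mean size separately keeps frequent small revisions distinguishable from rare large ones.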

Second, at country level, the data submitted monthly contain both reported information (hard data) and estimates. Knowing the split (%) between the two helps in assessing the accuracy of the overall data supplied. Knowing the method or assumptions on which any estimates are based, for each country and each entry, is also very useful for assessing accuracy. One task for the international organisations could be to promote the exchange of information on estimation practices between Member States.
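As a rough sketch of how the hard-data/estimate split could be computed, the Python below assumes that each questionnaire entry carries a hypothetical flag saying whether the figure is an estimate; the current questionnaire does not necessarily include such a flag. The share is counted over reported (non-blank) entries only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    name: str
    value: Optional[float]  # None when the cell is left blank
    estimated: bool  # hypothetical flag: True if the figure is an estimate


def estimated_share(entries: List[Entry]) -> float:
    """Percentage of the reported (non-blank) entries that are estimates."""
    reported = [e for e in entries if e.value is not None]
    if not reported:
        return 0.0
    n_estimated = sum(1 for e in reported if e.estimated)
    return 100.0 * n_estimated / len(reported)


# Illustrative questionnaire for one country and month (not real data).
entries = [
    Entry("crude oil production", 510.0, False),
    Entry("crude oil imports", 120.0, True),
    Entry("crude oil exports", 80.0, False),
    Entry("crude oil closing stocks", 300.0, True),
    Entry("fuel oil demand", None, False),
]
print(estimated_share(entries))  # 50.0
```

A count-based share is used here because volumes of different flows (e.g. stocks and production) are not directly additive; a volume-weighted share within a single flow would be an alternative.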

3. Timeliness

Most users want up-to-date figures that are published frequently and on time (according to users’ needs) at pre-established dates.

Timeliness depends on data availability at national level. The whole process of data collection, editing, consolidation and dissemination has to be kept under control in order to minimise the processing time. There is a trade-off between timeliness and accuracy, which must be balanced to obtain the best possible results.

The aim of the Joint Oil Data Exercise is to provide timely data. Many countries are able to report m-2 data for most flows and products, although trade data cause some difficulty. Apart from the OECD countries, only a few countries, albeit large ones (e.g. Saudi Arabia, Russia and Brazil), are currently in a position to report m-1 data. Information at m-1 is available in all EU/OECD Member States. The real value of this exercise, however, lies in having data for all of the organisations' Member States available at m-1, in order to provide data users with timely oil market information.
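The m-1/m-2 distinction used above can be made concrete with a small Python sketch. It assumes, as a simplification, that timeliness is judged by the calendar month in which a submission arrives (data for month m-1 received during month m count as m-1); actual deadlines may be set on a specific day of the month instead.

```python
from datetime import date


def reporting_lag(reference_month: date, submitted: date) -> int:
    """Whole calendar months between the reference month and submission."""
    return (submitted.year - reference_month.year) * 12 + (
        submitted.month - reference_month.month
    )


def timeliness_class(reference_month: date, submitted: date) -> str:
    """Label a submission as m-1, m-2, ... (assumed calendar-month convention)."""
    lag = reporting_lag(reference_month, submitted)
    return "m-1" if lag <= 1 else f"m-{lag}"


print(timeliness_class(date(2007, 1, 1), date(2007, 2, 20)))  # m-1
print(timeliness_class(date(2006, 12, 1), date(2007, 2, 5)))  # m-2
```

Counting whole months through the year boundary (December 2006 to February 2007 gives a lag of two) avoids treating the turn of the year as a special case.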

A useful step towards improving timeliness would be to remove administrative and communication burdens on data transmission, where such burdens exist.

4. Accessibility and clarity of information

Statistical data have most value when they are easily accessible by users, are available in the form users desire and are adequately documented. Assistance in using and interpreting the statistics should also be forthcoming from the providers.

As agreed at the meeting in Vienna last year, a mechanism is to be put in place to enable the exchange of the responses and data received from countries. However, it is acknowledged that a confidentiality rule should be established and that the data should not be disclosed for any other purpose or to any other party. No particular problems have been reported in the exchange of data between international organisations. Eurostat is, however, examining the possibility of disseminating data for EU countries together with other short-term economic indicators in the near future.

5. Comparability of statistics

Statistics for a given characteristic have the greatest usefulness when they enable reliable comparisons of values across space and over time.

The definitions and conversion factors needed to collect harmonised, comparable statistics have been harmonised. The methods and definitions used by each international organisation were compared and extensively documented. On this basis, common definitions were agreed and proposed for the data collected with the oil questionnaire for this exercise. To ensure comparability, however, the international organisations must be sure that Member States understand, respect and apply these definitions when filling in the questionnaire. Regular contact with national statisticians, as well as training, can help ensure that the methodology and definitions to be used are understood. Finally, time-series analysis is also a useful tool for checking the comparability of some of the data requested.

6. Coherence

Coherence measures the extent to which different sets of statistics agree with each other and can be used together or as alternatives to each other. The messages that the statistics convey to users will then be coherent, or at least will not contradict each other.

To assess the coherence of the statistics collected, the Joint Oil Data Exercise data could be compared with other related statistics, e.g. Monthly Oil Statistics and, at a later stage, Annual Oil Statistics for EU/OECD Member States, or the monthly and quarterly statistics of other international organisations (also described under 2. 'Accuracy').
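A cross-source comparison of this kind might be sketched in Python as follows. The sketch assumes both sources are available as dictionaries keyed by hypothetical (country, flow, month) tuples, treats the other source (e.g. MOS) as the reference, and uses an arbitrary 2% tolerance as a placeholder for whatever error margin is eventually judged acceptable to data users.

```python
def coherence_flags(jodi, reference, tolerance_pct=2.0):
    """Return entries whose difference from the reference exceeds the tolerance.

    Both arguments map a hypothetical (country, flow, month) key to a value.
    The difference is expressed as a % of the reference figure.
    """
    flagged = {}
    for key in sorted(jodi.keys() & reference.keys()):
        ref = reference[key]
        if ref == 0:
            continue  # relative difference undefined; such cases need manual review
        diff_pct = 100.0 * (jodi[key] - ref) / ref
        if abs(diff_pct) > tolerance_pct:
            flagged[key] = round(diff_pct, 2)
    return flagged


# Illustrative figures only (not real data).
jodi = {("B", "demand", "2007-03"): 95.0, ("B", "production", "2007-03"): 200.0}
mos = {("B", "demand", "2007-03"): 100.0, ("B", "production", "2007-03"): 201.0}
print(coherence_flags(jodi, mos))  # {('B', 'demand', '2007-03'): -5.0}
```

Keeping the sign of the difference shows whether the exercise data sit systematically above or below the reference source, which a purely absolute measure would hide.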

7. Completeness/coverage

The component of completeness reflects the extent to which the statistical system in place answers the users’ needs and priorities by comparing all user demands with the availability of statistics.

Two aspects of completeness are considered here: on the one hand, coverage, i.e. the participation of Member Countries in the exercise; on the other, completeness in terms of the data filled in on the monthly questionnaire. Although coverage would, by definition, fall under the heading of accuracy, it is examined here because this is considered more appropriate for this particular exercise.

As far as coverage is concerned, reports from the organisations show that 55 countries have already participated in the exercise, representing over 70% of world oil production and over 80% of world oil demand. Given that the exercise was launched only in June, this result is positive and encouraging. However, the exercise will be of real value only when over 90% of both production and consumption is covered.

For APEC, 15 out of 21 Member States participated in the exercise. Key consuming and producing countries such as Russia and China joined; however, Indonesia, Malaysia and Singapore have not yet participated. For Eurostat/IEA, all 29 Member States are now participating. For OLADE, 10 out of 26 Latin American countries participated. Although this is a relatively low number, almost all the key OLADE countries participated (Mexico, Brazil, Argentina, Chile); the only key country that has not yet participated is Venezuela. For OPEC, five out of 11 Member States (Iran, Libya, Nigeria, Qatar and Saudi Arabia) participated, while six countries, including Venezuela and Indonesia, have not yet done so. Finally, the Statistics Office of the United Nations is in charge of collecting data for nine countries that do not belong to any of the five other organisations. Despite communication difficulties, the UN succeeded in securing some involvement from Angola, Egypt, Gabon, India, Syria and South Africa.
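The coverage figures quoted above are shares of world totals accounted for by participating countries. A minimal Python sketch of that calculation, and of the "over 90% of both production and consumption" objective, is given below; the countries and volumes are hypothetical, and the world total is approximated by the sum over the countries listed.

```python
def coverage_pct(volumes, participants):
    """Share (%) of the total volume accounted for by participating countries.

    `volumes` maps country -> volume; the world total is approximated here
    by the sum over all countries listed (an assumption of this sketch).
    """
    total = sum(volumes.values())
    covered = sum(v for country, v in volumes.items() if country in participants)
    return 100.0 * covered / total


# Illustrative volumes for four hypothetical countries (not real data).
production = {"A": 45.0, "B": 30.0, "C": 17.0, "D": 8.0}
demand = {"A": 50.0, "B": 10.0, "C": 25.0, "D": 15.0}
participants = {"A", "B", "C"}

prod_cov = coverage_pct(production, participants)
dem_cov = coverage_pct(demand, participants)
target_met = prod_cov > 90.0 and dem_cov > 90.0  # the "over 90%" objective
print(prod_cov, dem_cov, target_met)  # 92.0 85.0 False
```

The example shows why both measures must be checked: the same set of participants can clear the threshold for production while falling short on demand.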

Besides coverage, completeness can also be understood in terms of the number of entries filled in on the questionnaires received. In this respect, the international organisations should examine particular areas where information is unavailable and propose solutions (clearer or more precise definitions, training of statisticians, estimation methods, etc.). The completeness of the questionnaire ranges from very good for many OECD and other countries to poor for some of the UN countries.
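Completeness in this second sense can be measured as the share of questionnaire cells actually filled in, with blank cells grouped by flow to show where information is missing. The Python below is a sketch assuming a questionnaire held as a dictionary keyed by hypothetical (product, flow) pairs, with None marking a blank cell.

```python
from collections import Counter


def completeness_pct(questionnaire):
    """Percentage of questionnaire cells that contain a value (None = blank)."""
    if not questionnaire:
        return 0.0
    filled = sum(1 for v in questionnaire.values() if v is not None)
    return 100.0 * filled / len(questionnaire)


def blanks_by_flow(questionnaire):
    """Count blank cells per flow, to locate areas where data are unavailable."""
    return Counter(flow for (_, flow), v in questionnaire.items() if v is None)


# Illustrative questionnaire keyed by (product, flow); not real data.
q = {
    ("crude oil", "production"): 500.0,
    ("crude oil", "closing stocks"): None,
    ("gasoline", "demand"): 80.0,
    ("gasoline", "closing stocks"): None,
}
print(completeness_pct(q), blanks_by_flow(q))
```

Here half the cells are filled, and both blanks fall under closing stocks; aggregating such counts across countries would point to the flows where definitions, training or estimation methods most need attention.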

8. The way forward

Transparency requires not only availability but also reliability of data. In the first phase of the Joint Oil Data Exercise, most efforts were concentrated on obtaining the required information. Quality control of available data should be the focus of the second phase.

The aim of assessing the quality of the information collected is twofold. On the one hand, it will give users the information they need to evaluate the reliability of the analyses and conclusions drawn from the basic data. On the other hand, assessing each quality component will identify specific, systematic actions to be taken to improve the information collected. A task for the international organisations is to identify the areas where improvement is required, propose solutions and assist the transfer of knowledge and methods (e.g. estimation methods) between countries.

Based on the examination of each quality component, the following actions are proposed:

Relevance

No particular problems are identified here so no action is proposed.

Accuracy

Member States could be asked to indicate the estimated part of each entry reported. This could be a one-off exercise. As production data are easier to obtain (they come from discrete sources of information: fields, terminals, refineries), this exercise could concentrate on stocks and trade. The error involved in short-term estimates may be evaluated through revision analyses (m-1, m-2), analyses of the size of statistical differences and comparisons with other data sources. The exchange of information on national estimation methods may also be helpful.
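The statistical difference mentioned above is the gap between the supply implied by the reported flows and the reported use. The Python sketch below assumes one common sign convention, stock change = closing stocks minus opening stocks (so a stock build reduces the supply available); "reported use" stands for refinery intake in the case of crude oil or demand in the case of products. Figures are hypothetical.

```python
def statistical_difference(production, imports, exports, stock_change, reported_use):
    """Calculated supply minus reported use for one product and month.

    Assumed sign convention: stock_change = closing - opening stocks, so a
    stock build (positive value) reduces the supply available for use.
    """
    calculated_supply = production + imports - exports - stock_change
    return calculated_supply - reported_use


# Illustrative figures in thousand barrels (not real data).
diff = statistical_difference(
    production=500.0, imports=120.0, exports=300.0, stock_change=10.0, reported_use=305.0
)
print(diff, round(100.0 * diff / 305.0, 2))  # 5.0 1.64
```

Expressing the difference as a percentage of reported use makes results comparable across countries of different size; persistent differences of the same sign are more informative than a single month's value.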

Other points for discussion:

Propose a methodology for m-1 oil data estimation, without ruling out speedier estimation systems.
Determine the acceptable error margin for data users.

Timeliness

Additional effort is required from some countries to respect the m-1 deadline.

Accessibility and clarity of the information

Organisations not disseminating data collected should examine how to do so.

Comparability of statistics

Training sessions may be organised for national statisticians.

Coherence

Compare the data with other statistics (MOS, quarterly statistics, etc.).

Completeness

All flows need to be reported. The main problem is with the availability of data on stocks.

Coverage

The remaining key players need to be brought on board, particularly Malaysia and Singapore for APEC; the six remaining OPEC countries; and a few Caribbean countries where large storage capacities exist.