Benchmark problems in statistical modelling and analysis involving the simplex as sample space, concomitant space and design space.

By John Aitchison

Rosemount, Carrick Castle, Lochgoilhead, Argyll, UK, PA24 8AF

The following collection of problems has been developed as a challenge to statisticians and others whose modelling involves the simplex in one of three roles: as the sample space, in what has come to be known as compositional data analysis; as a concomitant space, where there is interest in how some variable feature may depend on a composition associated with the case; or as a design space, as for example in an experimental design where some response depends on a mixture of ingredients.

These benchmark problems are not real, in the sense that the data sets have not arisen directly from real practical problems. This has been a deliberate choice by the presenter of the challenge. Experience with developing a methodology for sensible statistical analysis involving the simplex has revealed that workers with such problems often have preconceived ideas, based on previous unsound methodology, of what a 'satisfactory' statistical analysis should produce. The presenter would claim that the problems and data are realistic and, since they are new to any reader, any preconceived ideas as to their solution should be absent.

Readers taking up the challenge should not confine themselves to problems in their own discipline. No special knowledge of any of the disciplines involved is likely to give any advantage in resolving the problems. And it is surely true that statistical solutions found for problems in one discipline often transfer with little difficulty to similar problems in another. A main objective of the challenge is to stretch the model-building expertise and analytical skills of workers in the field.

The reader should attempt to view the problems as a challenge to consultative ability. In a consultative situation there is always the opportunity for dialogue between consultant and consultee to clarify aspects of the problem and so ensure good modelling. That opportunity is not available in this website setting, but the composer has made every effort to explain the problems fully, in the hope that there is no serious loss from the lack of personal discussion.

The problems are not presented in any specific order, either by discipline or by difficulty. And it should be emphasized that the presenter of the problems does not have a set of perfect solutions.

So good modelling!

Benchmark problems:

Benchmark problem 1.- Manuscript analysis
Benchmark problem 2.- A differential diagnostic problem
Benchmark problem 3.- Economic estimation of a composition
Benchmark problem 4.- Anthropological time budgets
Benchmark problem 5.- Describe dental amalgam compositional variability
Benchmark problem 6.- Fossil pollen zonation