Sankhya: The Indian Journal of Statistics

1999, Volume 61, Series B, Pt. 3, pp. 496-513

AN EFFICIENT APPROACH TO CONSISTENT SET ESTIMATION

By

A. RAY CHAUDHURI, Visva Bharati, Santiniketan

and

A. BASU, S. K. BHANDARI, and B. B. CHAUDHURI, Indian Statistical Institute, Calcutta

SUMMARY. Determining the shape of a point pattern in the real plane is a problem of considerable practical interest, with applications in many branches of science. Nonparametric set estimators that may also serve as shape descriptors should have several desirable properties. The most important of these are: (a) the estimator should be consistent, i.e. the Lebesgue measure of the symmetric difference between the actual region and the set estimator should converge to zero in probability as the number of sample points increases without bound; (b) it should be computationally efficient; and (c) it should be automatic, in the sense that the method should detect the number of disjoint components making up the true region rather than require this number to be known in advance. None of the currently known estimators combines all these properties.
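Property (a) can be written out as a formula. The notation here is assumed for illustration and is not taken from the paper: S is the true region, Ŝ_n the set estimator built from n sample points, and λ the Lebesgue measure on the plane.

```latex
S \,\triangle\, \hat{S}_n = (S \setminus \hat{S}_n) \cup (\hat{S}_n \setminus S),
\qquad
\lambda\bigl(S \,\triangle\, \hat{S}_n\bigr) \xrightarrow{\;P\;} 0 \quad (n \to \infty),
```

that is, for every ε > 0, P(λ(S △ Ŝ_n) > ε) → 0 as n → ∞. The symmetric difference counts both kinds of error: area of S that the estimator misses, and area the estimator claims outside S.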

Ray Chaudhuri et al. (1997) introduced a shape descriptor called the s-shape in the context of extracting the perceived border of dot patterns. In this paper we develop a related idea to construct a class of set estimators that have all three properties stated above. The emphasis of the paper is on establishing the consistency of the proposed set estimator. It is shown that the s-shape is a consistent estimator not only under the uniform distribution but also when the points are drawn according to any continuous distribution.
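The consistency criterion and property (c) can be made concrete with a small simulation. The Python sketch below uses a deliberately simple stand-in estimator: the union of d × d grid cells that contain at least one sample point. This is an assumption for illustration, not the s-shape construction of the paper. The true region (two disjoint disks), the non-uniform sampling density proportional to 1 + x, and all function names are hypothetical. The sketch approximates λ(S △ Ŝ_n) numerically and counts the 8-connected components of the occupied cells without being told how many there are.

```python
import random
from collections import deque

# Hypothetical true region S: two disjoint disks (centre_x, centre_y, radius).
DISKS = [(0.30, 0.50, 0.15), (0.75, 0.50, 0.15)]


def in_region(x, y):
    """Indicator of the true region S (union of the disks)."""
    return any((x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in DISKS)


def sample(n, rng):
    """Draw n points from a non-uniform density proportional to 1 + x on S,
    by rejection sampling from the unit square (an arbitrary choice)."""
    pts = []
    while len(pts) < n:
        x, y = rng.random(), rng.random()
        if in_region(x, y) and rng.random() < (1.0 + x) / 2.0:
            pts.append((x, y))
    return pts


def grid_estimate(points, d):
    """Stand-in set estimate: indices of d x d cells holding at least one point."""
    return {(int(x // d), int(y // d)) for x, y in points}


def symmetric_difference_area(cells, d, m=400):
    """Approximate the Lebesgue measure of S symmetric-difference S_hat
    with the midpoint rule on an m x m grid over the unit square."""
    h = 1.0 / m
    count = 0
    for i in range(m):
        x = (i + 0.5) * h
        for j in range(m):
            y = (j + 0.5) * h
            in_est = (int(x // d), int(y // d)) in cells
            if in_est != in_region(x, y):
                count += 1
    return count * h * h


def count_components(cells):
    """Number of 8-connected components among the occupied cells (BFS)."""
    seen, components = set(), 0
    for start in cells:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    nb = (i + di, j + dj)
                    if nb in cells and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
    return components


if __name__ == "__main__":
    rng = random.Random(0)
    for n, d in [(200, 0.10), (2000, 0.05), (20000, 0.02)]:
        cells = grid_estimate(sample(n, rng), d)
        err = symmetric_difference_area(cells, d)
        print(f"n={n:6d} d={d:.2f} error={err:.4f} components={count_components(cells)}")
```

In runs of this sketch the error should shrink as n grows and the cell size d shrinks together, and the component count should settle at 2. This only illustrates the criterion; it proves nothing about the s-shape, whose consistency is the subject of the paper.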

The method is illustrated with several examples, and the role of d, the only parameter controlling the structure of the s-shape, is discussed. Values of d that appear to be justified both intuitively and experimentally are proposed. A bound on the order of the error is computed. Possible directions for future research are also mentioned.

AMS (1991) subject classification. Primary 62G05; secondary 62P99, 68T10, 68U10.

Key words and phrases. Consistent set estimator, dot pattern, shape description, s-shape, symmetric difference, Lebesgue measure, minimum spanning tree, digital image, mathematical morphology.
