Sankhya: The Indian Journal of Statistics

1999, Volume 61, Series B, Pt. 3, pp. 496--513

AN EFFICIENT APPROACH TO CONSISTENT SET ESTIMATION

By

A. RAY CHAUDHURI, *Visva Bharati, Santiniketan*

and

A. BASU, S. K. BHANDARI, and B. B. CHAUDHURI, *Indian Statistical Institute, Calcutta*

*SUMMARY.* Determining the shape of a point pattern on the real plane is a
problem of considerable practical interest and has applications in many
branches of science. Set estimators of a nonparametric nature which may
also be used as shape descriptors should have several desirable properties.
The more important ones among them are the following: (a) The estimator
should be consistent, i.e. the Lebesgue measure of the symmetric difference
of the actual region and the set estimator should go to zero in probability
as the number of sample points increases arbitrarily; (b) it should be
computationally efficient; and (c) it should be automatic, in the sense that
the method should be able to detect the number of independent disjoint
components making up the true region and should not depend on this number
being known. None of the currently known estimators combine all these
properties.
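
Property (a) can be stated in symbols as follows. The notation is introduced here only for illustration and is an assumption, not taken from the paper: $A$ denotes the true region, $\hat{A}_n$ the set estimator built from $n$ sample points, $\lambda$ the Lebesgue measure on the plane, and $\triangle$ the symmetric difference.

$$
A \,\triangle\, \hat{A}_n = (A \setminus \hat{A}_n) \cup (\hat{A}_n \setminus A),
\qquad
\lim_{n \to \infty} P\bigl( \lambda(A \,\triangle\, \hat{A}_n) > \varepsilon \bigr) = 0
\quad \text{for every } \varepsilon > 0 .
$$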

Ray Chaudhuri et al. (1997) have introduced a shape descriptor called
*s-shape* in the context of perceived border extraction of dot patterns.
In this paper we develop a related idea to construct a class of set
estimators which have all three properties stated above. The emphasis
of the paper is on establishing the consistency results of the proposed set
estimator. It is shown that the *s-shape* is a consistent estimator not
just under the uniform distribution, but also when the points are drawn
according to any continuous distribution.

The method is illustrated with several examples, and the role of d,
the only parameter controlling the structure of the *s-shape*, is discussed.
Values of d which appear to be intuitively and experimentally justified
are proposed. A bound for the order of error is computed. Possible
directions for future research are also mentioned.

*AMS (1991) subject classification.* Primary 62G05; secondary 62P99, 68T10, 68U10.

*Key words and phrases.* Consistent set estimator, dot pattern, shape
description, s-shape, symmetric difference, Lebesgue measure,
minimum spanning tree, digital image, mathematical morphology.