Availability, informativity and burstiness: Why average corpus measures are an inaccurate guide to surprisal in language

Abstract

It has been proposed that Chinese classifiers facilitate efficient communication by reducing noun uncertainty in context. Although recent evidence has undermined this proposal, that evidence was obtained using the common method of equating noun occurrence probabilities with corpus frequencies. This method implicitly assumes that words occur uniformly across contexts, an assumption inconsistent with empirical findings showing that word distributions are bursty. We hypothesized that if language users are sensitive to burstiness, and if classifiers provide information about upcoming nouns, this information will be less important in reducing uncertainty about nouns after their first mention. We show that classifier usage provides more information at earlier mentions of nouns and less information at later mentions, and that the actual classifier distribution appears inconsistent with previous proposals. These results support the idea that classifiers facilitate efficient communication and indicate that language users' representations of lexical probabilities in context are dynamic.

Publication
in Proceedings of the 46th Annual Meeting of the Cognitive Science Society
