

Adding generalization to statistical learning: The induction of phonotactics from continuous speech

Frans Adriaans, René Kager

Abstract

Emerging phonotactic knowledge facilitates the development of the mental lexicon, as demonstrated by studies showing that infants use the phonotactic patterns of their native language to extract words from continuous speech. The present study provides a computational account of how infants might induce phonotactics from their immediate language environment, which consists of unsegmented speech. Our model, StaGe, implements two learning mechanisms that are available to infant language learners: statistical learning and generalization. StaGe constructs phonotactic generalizations on the basis of statistically learned biphone constraints. In a series of computer simulations, we show that such generalizations improve the segmentation performance of the learner, as compared to models that rely solely on statistical learning. Our study thus provides an explicit proposal for a combined role of statistical learning and generalization in the induction of phonotactics by infants. Furthermore, our simulations demonstrate a previously unexplored potential role for phonotactic generalizations in speech segmentation.

(2010, JML) [Pre-print version]

StaGe Software Package (SSP)

To allow replication of our findings, and to encourage further research on phonotactic learning and speech segmentation, we have created a software package, which can be downloaded below. The software can be used to run the phonotactic learning and speech segmentation simulations described in Adriaans and Kager (2010). It also allows the user to train models on new data sets, optionally with other statistical measures, other thresholds, and a different inventory of phonological segments and features. The package implements several learning models (StaGe, transitional probability, observed/expected ratio) and segmentation models (Optimality Theory, threshold-based segmentation, trough-based segmentation). Please see Adriaans and Kager (2010) for details about these models. A downloadable manual describes the user-defined parameters and explains how the models can be applied to new data sets. Note that, due to copyright restrictions, we are not able to make the corpora available on this website.
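For readers unfamiliar with the statistical baselines named above, the following is a minimal Python sketch of transitional probability, observed/expected ratio, and threshold-based and trough-based segmentation. It is not the SSP implementation and does not cover StaGe's generalization or the Optimality Theory segmentation model. The function names, the toy corpus, and the threshold value are assumptions made for illustration, and the formulas follow the common count-based definitions, which may differ in detail from those used in SSP.

```python
from collections import Counter


def biphone_stats(utterances):
    """Count biphones inside each unsegmented utterance (a string of segments)."""
    pair, first, second = Counter(), Counter(), Counter()
    for utt in utterances:
        for x, y in zip(utt, utt[1:]):
            pair[(x, y)] += 1
            first[x] += 1    # x seen as the first member of a biphone
            second[y] += 1   # y seen as the second member of a biphone
    return pair, first, second


def transitional_probability(pair, first, x, y):
    """Estimate P(y | x) as count(xy) / count(x as first member)."""
    return pair[(x, y)] / first[x] if first[x] else 0.0


def observed_expected(pair, first, second, x, y):
    """Observed count of xy divided by the count expected if x and y were independent."""
    total = sum(pair.values())
    denom = first[x] * second[y]
    return pair[(x, y)] * total / denom if denom else 0.0


def segment_by_threshold(utt, score, threshold):
    """Insert a boundary inside every biphone whose score is below the threshold."""
    words, start = [], 0
    for i in range(1, len(utt)):
        if score(utt[i - 1], utt[i]) < threshold:
            words.append(utt[start:i])
            start = i
    words.append(utt[start:])
    return words


def segment_by_trough(utt, score):
    """Insert a boundary where a biphone scores lower than both of its neighbours."""
    scores = [score(x, y) for x, y in zip(utt, utt[1:])]
    words, start = [], 0
    for k in range(1, len(scores) - 1):
        if scores[k] < scores[k - 1] and scores[k] < scores[k + 1]:
            words.append(utt[start:k + 1])
            start = k + 1
    words.append(utt[start:])
    return words


# Hypothetical toy corpus: each character stands for one phonological segment.
corpus = ["abab", "abcd", "cdcd", "cdab"]
pair, first, second = biphone_stats(corpus)


def tp(x, y):
    return transitional_probability(pair, first, x, y)


print(segment_by_threshold("abcd", tp, threshold=0.75))
print(segment_by_trough("abcd", tp))
```

In this toy corpus the biphones "ab" and "cd" are frequent and "bc" is rare, so both segmentation strategies place a boundary between "b" and "c". Passing a function based on `observed_expected` as `score` swaps the statistical measure without changing the segmentation code.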

Please cite the following paper when publishing results obtained with SSP:

Adriaans, F., & Kager, R. (2010). Adding generalization to statistical learning: The induction of phonotactics from continuous speech. Journal of Memory and Language, 62, 311–331.

Downloads

- StaGe Software Package (SSP)
- SSP manual


If you have any questions about SSP that are not answered by the paper or the manual, please send an e-mail to: