I'm interested in understanding the role of modeling assumptions in models of human cognition. I work with models of infant speech segmentation. By around 6 or 7 months of age, infants are already learning to identify boundaries in speech, breaking up the stream of sounds into meaningful chunks (along the lines of what we'd think of as words). My work adapts existing Bayesian models of segmentation to better match experimental evidence.
Unit of Representation:
How do infants process the sounds they hear? Traditionally, models have assumed that infants break speech up into individual sounds, or phonemes. This contrasts with experimental evidence suggesting that infants segment using syllables. My work investigates syllables as a possible unit of representation for Bayesian speech segmentation. Testing on seven languages, I showed that syllables provide a good basis for a relatively simple Bayesian learner. Non-Bayesian models, used for comparison, did not succeed on all languages. To help other researchers in the field, the syllabified corpora are available on GitHub.
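To make the difference between the two units concrete, here is a minimal sketch that enumerates every possible segmentation of one short utterance under each representation. The transcriptions of "the kitty" are rough, hand-written illustrations (not taken from the corpora), and the function name is hypothetical.

```python
from itertools import combinations

def all_segmentations(units):
    """Return every way to split a sequence of units into contiguous words."""
    n = len(units)
    result = []
    for k in range(n):  # k = number of word-internal boundaries placed
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            result.append([tuple(units[a:b]) for a, b in zip(bounds, bounds[1:])])
    return result

# Illustrative (assumed) transcriptions of "the kitty".
phonemes = ["dh", "ah", "k", "ih", "t", "iy"]
syllables = ["dhah", "kih", "tiy"]

print(len(all_segmentations(phonemes)))   # 32 candidate segmentations
print(len(all_segmentations(syllables)))  # 4 candidate segmentations
```

With n units there are n - 1 possible boundary positions and so 2^(n-1) candidate segmentations. Under these assumed transcriptions, a syllable-based learner considers far fewer hypotheses for the same utterance than a phoneme-based one.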
Bayesian models rely on complex inference procedures to learn model parameters. This is often done as a "batch" process in which all the data are seen at once. I build on the work of Pearl et al. (2011) to show that "online" inference, which processes the input utterance by utterance, is still feasible for syllable-based learners. I find that properties of English interact with online inference in interesting ways, underscoring the importance of evaluating on multiple languages.
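The idea of processing utterance by utterance can be sketched with a toy learner. This is a simplified stand-in, not the model from the papers listed below: the scoring function, the parameter values (alpha, p_stop, syll_prob), and all function names are assumptions chosen only for illustration. Each utterance is segmented with a dynamic-programming search using only the word counts gathered so far, and the counts are then updated before the next utterance is seen.

```python
import math
from collections import Counter

def word_prob(word, counts, total, alpha=1.0, p_stop=0.3, syll_prob=0.1):
    """Simplified unigram score: known words are favored by their counts,
    novel words get a small base probability that shrinks with length.
    All parameter values here are illustrative assumptions."""
    base = p_stop * (1 - p_stop) ** (len(word) - 1) * syll_prob ** len(word)
    return (counts[word] + alpha * base) / (total + alpha)

def best_segmentation(syllables, counts, total):
    """Dynamic-programming search for the highest-scoring segmentation."""
    n = len(syllables)
    best = [(-math.inf, 0)] * (n + 1)  # (log score, start of last word)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(end):
            word = tuple(syllables[start:end])
            score = best[start][0] + math.log(word_prob(word, counts, total))
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(tuple(syllables[start:end]))
        end = start
    return words[::-1]

def online_learner(utterances):
    """Segment one utterance at a time, updating the lexicon after each."""
    counts, total, output = Counter(), 0, []
    for utterance in utterances:
        words = best_segmentation(utterance, counts, total)
        counts.update(words)
        total += len(words)
        output.append(words)
    return output

segmented = online_learner([["kit", "ty"], ["the", "kit", "ty"]])
print(segmented)
```

In this toy run, "kitty" is stored as a single word after the first utterance, which then lets the learner split "the" from "kitty" in the second. If the two utterances are presented in the opposite order, this sketch leaves "the kitty" unsegmented, which illustrates how an online learner can be sensitive to the order of its input in a way a batch learner is not.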
Traditionally, models of language acquisition are evaluated against gold standards that reflect adult linguistic knowledge. In early segmentation, however, infants have not yet achieved adult knowledge, so these comparisons are difficult to interpret. I've investigated multiple methods to avoid these gold standard problems. First, I performed a detailed error analysis for each learner, demonstrating that the Bayesian approach produces reasonable errors that vary according to the language being learned. Second, I treat segmentation as one stage of the language learning pipeline, in which the output of segmentation is used as input to later learning tasks. I showed that strong performance against a gold standard does not necessarily indicate that the output is useful for tasks such as word-object mapping or stress pattern acquisition.
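For readers unfamiliar with gold standard evaluation, the sketch below computes word token precision, recall, and F-score for a single utterance, a common way of scoring segmentation output. It is a generic illustration rather than the exact evaluation code used in my papers; the example utterance and function names are made up.

```python
def word_spans(words):
    """Map a segmentation to a set of (start, end) positions over its units."""
    spans, start = set(), 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans

def token_scores(predicted, gold):
    """Token precision, recall, and F-score: a predicted word counts as
    correct only if both of its boundaries match the gold segmentation."""
    pred_spans, gold_spans = word_spans(predicted), word_spans(gold)
    correct = len(pred_spans & gold_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Words are tuples of syllables; the learner has merged "at" and "the".
gold = [("look",), ("at",), ("the",), ("kit", "ty")]
predicted = [("look",), ("at", "the"), ("kit", "ty")]

precision, recall, f_score = token_scores(predicted, gold)
print(round(precision, 3), round(recall, 3), round(f_score, 3))  # 0.667 0.5 0.571
```

The predicted segmentation treats "at the" as a single unit. The gold standard counts this as an error, even though it is the kind of reasonable mistake that may not matter for later learning tasks.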
Phillips, L. & Pearl, L. (Under Review). Evaluating language acquisition strategies: A cross-linguistic look at early segmentation.
Phillips, L. & Pearl, L. (2015). The utility of cognitive plausibility in language acquisition modeling: Evidence from word segmentation. Cognitive Science, 1-31.
Bent, T., Loebach, J.L., Phillips, L., & Pisoni, D.B. (2011). Perceptual adaptation to sine-wave vocoded speech across languages. Journal of Experimental Psychology: Human Perception and Performance, 37(5), 1607-1616.
Phillips, L. & Pearl, L. (2015). Utility-based evaluation metrics for models of language acquisition: A look at speech segmentation. Workshop on Cognitive Modeling and Computational Linguistics, NAACL.
Phillips, L. & Pearl, L. (2014). Bayesian inference as a viable cross-linguistic word segmentation strategy: It's all about what's useful. Proceedings of the 36th Annual Conference of the Cognitive Science Society. Quebec City, Canada, 2775-2780.
Phillips, L. & Pearl, L. (2014). Bayesian inference as a cross-linguistic word segmentation strategy: Always learning useful things. Proceedings of the Computational and Cognitive Models of Language Acquisition and Language Processing Workshop, EACL. Gothenburg, Sweden, 9-13.
Phillips, L. & Pearl, L. (2012). 'Less is More' in Bayesian word segmentation: When cognitively plausible learners outperform the ideal. Proceedings of the 34th Annual Conference of the Cognitive Science Society.