Predicting reading behavior is a difficult task. Reading behavior depends on various linguistic factors (e.g., sentence length and structural complexity) and on other factors (e.g., an individual's reading style and age). Ideally, a reading model should resemble a language model, which is built on a fixed number of overlapping word sequences (n-grams). However, it is difficult to decide which representation of gaze data (i.e., which unit of the n-grams) correlates best with the cognitive effort associated with reading. Moreover, the randomness of gaze data leads to data sparsity, which makes it difficult for gaze-based n-gram models to handle real test scenarios. It has already been observed that some important eye-movement phenomena are captured better by scanpaths than by individual fixations, saccades and pauses considered in isolation. In this talk, we propose and validate an n-gram based gaze model for reading in which the units contributing to each n-gram are scanpaths, taken in temporal order. We describe different scanpath extraction techniques and choose the one that minimizes the entropy/perplexity of the system. To handle data sparsity, we cluster the scanpaths into several groups, assign each group an ID, and build n-grams over cluster IDs instead of over exact scanpaths.
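The modeling step above (n-grams over scanpath cluster IDs, scored by perplexity) can be illustrated with a minimal sketch. It assumes that scanpaths have already been extracted and clustered, so that each reading trial is a temporally ordered list of integer cluster IDs in the range 0..num_clusters-1. The bigram order, the add-one smoothing, the start symbol -1 and all function names are illustrative assumptions, not the authors' method. Under this setup, each candidate extraction technique would yield its own cluster-ID sequences, and the one with the lowest held-out perplexity would be preferred.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

START = -1  # hypothetical start-of-trial symbol, never used as a cluster ID


def train_bigram(sequences: Sequence[Sequence[int]]) -> Tuple[Counter, Counter]:
    """Count bigrams of cluster IDs (and their left contexts) over all trials."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for seq in sequences:
        padded: List[int] = [START] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts


def bigram_prob(prev: int, cur: int, context_counts: Counter,
                bigram_counts: Counter, num_clusters: int) -> float:
    # Add-one (Laplace) smoothing: unseen cluster-ID pairs still get a
    # nonzero probability, a simple stand-in for handling data sparsity.
    return (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + num_clusters)


def perplexity(sequences: Sequence[Sequence[int]], context_counts: Counter,
               bigram_counts: Counter, num_clusters: int) -> float:
    """Per-token perplexity, 2 ** (cross-entropy in bits), on held-out trials."""
    log_prob = 0.0
    n_tokens = 0
    for seq in sequences:
        padded = [START] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            p = bigram_prob(prev, cur, context_counts, bigram_counts, num_clusters)
            log_prob += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)


if __name__ == "__main__":
    # Toy cluster-ID sequences, made up purely for illustration.
    train = [[0, 1, 2, 1], [0, 1, 1, 2]]
    held_out = [[0, 1, 2]]
    ctx, bi = train_bigram(train)
    print(perplexity(held_out, ctx, bi, num_clusters=3))
```

Clustering is what makes this tractable. The vocabulary is the small, fixed set of cluster IDs rather than the open-ended set of exact scanpaths, so bigram counts are far less sparse. Smoothing is still needed for pairs that never occur in training.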
Journal of Eye Movement Research: 17th European Conference on Eye Movements, 11-16 August 2013, Lund, Sweden, 2013