Though one might think of media as an audiovisual stream of consciousness, we frequently encode frames of video and waves of sound into strings of text. Language allows us both to share our internal representations of what we perceive as mental concepts and to categorize them as distinct states in the continuous ebb and flow of emotions underlying consciousness. Whether it is a soundscape of structured peaks or tiny black characters lined up across a page, we rely on syntax to parse sequences of symbols; its hierarchically nested structures allow us to express and share the meaning contained within a sentence or a melodic phrase. Because both the low-level semantic structure of texts and our affective responses can be encoded in words, a simplified cognitive model can be constructed that uses latent semantic analysis (LSA) to emulate how we perceive the emotional context of media, based on lyrics, synopses, subtitles, blogs or web pages associated with the content. In the proposed model, the bottom-up sensory input is a matrix of tens of thousands of words co-occurring within multiple contexts, which are in turn represented as vectors in a semantic space of reduced dimensionality. Top-down, patterns of emotional categorization emerge from the distances between term vectors and affective adjectives, which constrain the latent semantic structures according to the neurophysiological dimensions of valence and arousal. The thesis thus combines elements of machine learning with aspects of cognitive linguistics that could potentially be used in applications ranging from information retrieval and media personalization to emotional brand building and neuroscientific modeling of syntax and semantics.
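To make the bottom-up and top-down steps concrete, the following is a minimal sketch rather than the model used in the thesis. It assumes a tiny hypothetical corpus, a raw count matrix without term weighting, a plain truncated SVD as the LSA step, and only two affective adjectives ("happy" and "sad") as anchors; the function names are illustrative. It does not attempt to place the adjectives along valence and arousal.

```python
import numpy as np

# Hypothetical toy corpus; a realistic model would cover tens of thousands of terms.
DOCS = [
    "happy sunny bright dance",
    "happy bright joy dance",
    "happy joy sunny song",
    "sad dark rain tears",
    "sad rain lonely tears",
]
# Assumed affective anchors, chosen only for illustration.
AFFECTIVE_ADJECTIVES = ["happy", "sad"]


def build_term_document_matrix(docs):
    """Bottom-up input: raw counts of each term (row) in each context (column)."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            counts[index[word], j] += 1.0
    return counts, index


def lsa_term_vectors(counts, k):
    """Reduce dimensionality with a truncated SVD; one k-dimensional row per term."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def affect_profile(text, term_vectors, index, adjectives):
    """Top-down step: compare the centroid of a text's known terms to each adjective."""
    rows = [term_vectors[index[w]] for w in text.split() if w in index]
    if not rows:
        return {adj: 0.0 for adj in adjectives}
    centroid = np.mean(rows, axis=0)
    return {adj: cosine(centroid, term_vectors[index[adj]]) for adj in adjectives}


counts, index = build_term_document_matrix(DOCS)
term_vectors = lsa_term_vectors(counts, k=2)
profile = affect_profile("rain tears lonely", term_vectors, index, AFFECTIVE_ADJECTIVES)
print(profile)
```

On this deliberately separable corpus, the text "rain tears lonely" should score close to 1 for "sad" and close to 0 for "happy", because the two groups of documents share no terms. Real lyrics or subtitles would give graded similarities, and a weighting scheme such as TF-IDF would normally precede the SVD.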