Gupta, Ramneek [2]; Jung, Eva [5]; Gooley, Andrew A. [5]; Williams, Keith L. [5]; Brunak, Søren [2]; Hansen, Jan Erik [4]
[1] Department of Biotechnology, Technical University of Denmark; [2] Department of Bio and Health Informatics, Technical University of Denmark; [3] Macquarie University; [4] unknown; [5] Macquarie University
Dictyostelium discoideum has been suggested as a eukaryotic model organism for glycobiology studies. At present, the characteristics of acceptor sites for the N-acetylglucosaminyl-transferases in D. discoideum, which link GlcNAc in an alpha linkage to hydroxyl residues, are largely unknown. This motivates the development of a species-specific method for predicting O-linked GlcNAc glycosylation sites in secreted and membrane proteins of D. discoideum. The method presented here employs a jury of artificial neural networks. These networks were trained to recognize the sequence context and protein surface accessibility of 39 experimentally determined O-alpha-GlcNAc sites found in D. discoideum glycoproteins expressed in vivo. In cross-validation, 97% of the glycosylated and nonglycosylated sites were correctly identified. Based on the currently limited data set, an abundant periodicity of two (positions -3, -1, +1, +3, etc.) in proline residues alternating with hydroxyl amino acids was observed upstream and downstream of the acceptor site. This was a consequence of the spacing of the glycosylated residues themselves, which were, peculiarly, found only at even positions with respect to each other, indicating that they may be located within beta-strands. The method has been used for a rapid, ranked scan of the fraction of the Dictyostelium proteome available in public databases; remarkably, 25-30% of these proteins were predicted to be glycosylated. The scan revealed acceptor sites in several proteins known experimentally to be O-glycosylated at unmapped sites. The available proteome was classified into functional and cellular compartments to study any preferential patterns of glycosylation. A sequence-based prediction server for GlcNAc O-glycosylation in D. discoideum proteins has been made available through the WWW at http://www.cbs.dtu.dk/services/DictyOGlyc/ and via e-mail to DictyOGlyc@cbs.dtu.dk.
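
The period-two pattern described in the abstract can be illustrated with a minimal sketch. This is not the neural-network method itself. It only checks, for a given acceptor site, which odd offsets carry proline and which even offsets carry a hydroxyl amino acid, and whether a set of sites is spaced at even distances. The function names, the restriction of hydroxyl amino acids to Ser/Thr, and the toy sequence are assumptions made for illustration.

```python
# Assumption: "hydroxyl amino acids" is taken here to mean Ser (S) and Thr (T).
HYDROXYL = {"S", "T"}


def period_two_profile(seq, site, flank=5):
    """Return (pro_odd, hydroxyl_even) offsets around a 0-based site index.

    pro_odd:       odd offsets (-3, -1, +1, +3, ...) that hold proline
    hydroxyl_even: even non-zero offsets (-2, +2, ...) that hold Ser/Thr
    """
    pro_odd, hydroxyl_even = [], []
    for offset in range(-flank, flank + 1):
        pos = site + offset
        if offset == 0 or not 0 <= pos < len(seq):
            continue  # skip the site itself and positions off the sequence
        residue = seq[pos]
        if offset % 2 != 0 and residue == "P":
            pro_odd.append(offset)
        elif offset % 2 == 0 and residue in HYDROXYL:
            hydroxyl_even.append(offset)
    return pro_odd, hydroxyl_even


def evenly_spaced(sites):
    """True if every pair of consecutive sites is an even distance apart."""
    ordered = sorted(sites)
    return all((b - a) % 2 == 0 for a, b in zip(ordered, ordered[1:]))


# Hypothetical toy sequence, not taken from any D. discoideum protein.
toy = "PTPTPTPTP"
print(period_two_profile(toy, site=3, flank=3))
print(evenly_spaced([1, 3, 5, 7]))
```

On the toy sequence, every odd offset within the flank holds proline and every even offset holds threonine, which is the alternating arrangement the abstract attributes to evenly spaced glycosylated residues.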