

  • Çağın KANDEMİR ÇAVAŞ Dokuz Eylül University, Graduate School of Natural and Applied Sciences, Department of Computer Science
  • Çağdaş KÜÇÜK Dokuz Eylül University, Graduate School of Natural and Applied Sciences


Keywords:

PseAAC, support vector machine, neural network, random forest, signaling proteins


Cells exchange signals to start or stop biological activities. These signals can be molecules such as proteins and hormones. Receptors on the surface of the cell and inside it capture these molecules and support the signal transduction process. Three types of receptors sit on the cell surface: G-protein-linked receptors, enzymic receptors and chemically gated ion channels.

The function of a protein depends on its structure. Signaling proteins play a crucial role in many biological activities, such as the functioning of the brain and the tongue, and they are important for drug discovery. It is therefore important to determine whether or not a protein is a signaling protein. Because experimental studies are expensive and time-consuming, machine learning techniques are used for this purpose.

Besides the choice of machine learning technique, the protein encoding scheme also matters for obtaining good results. Proteins are made of amino acids, and the amino acids of a protein written in order form its amino acid sequence. Protein encoding schemes turn such sequences into inputs for machine learning. Amino acid composition is one such scheme: it encodes the frequency of each amino acid as a percentage, but the sequence-order information is lost in the process. To address this, Kuo-Chen Chou proposed pseudo amino acid composition (PseAAC) in 2001. The aim of this study is to classify signaling proteins encoded by pseudo amino acid composition.

Protein sequences in FASTA format were downloaded and encoded using pseudo amino acid composition to create a dataset. The dataset contained 1867 signaling and 3317 non-signaling proteins and was split into training (75%) and test (25%) sets. Random forest, support vector machine and deep neural network models were applied to classify the signaling proteins.

Random forest, support vector machine and deep neural network gave accuracy values of 0.749, 0.748 and 0.763, respectively.
The AUROC values were similar to each other, but the random forest algorithm had the highest value, 0.745. The accuracy of the random forest, support vector machine and artificial neural network algorithms was higher than 0.70, which shows that these models are effective in classifying signaling proteins encoded by pseudo amino acid composition.
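
The difference between amino acid composition and pseudo amino acid composition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It follows the general form of Chou's 2001 scheme (20 composition terms plus λ sequence-order correlation terms), but it assumes a single physicochemical property, the Kyte-Doolittle hydropathy index, as a stand-in for the properties used in the original formulation. The function names `amino_acid_composition` and `pse_aac`, the parameter names `lam` and `w`, and their default values are illustrative assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. ASSUMPTION: used here as the only
# physicochemical property; the study's actual property set is not stated.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Standardize the property to zero mean and unit standard deviation
# over the 20 amino acids.
_values = list(KD.values())
_mu, _sd = mean(_values), pstdev(_values)
PROP = {aa: (v - _mu) / _sd for aa, v in KD.items()}


def amino_acid_composition(seq):
    """Return the 20 amino acid frequencies; sequence order is discarded."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def pse_aac(seq, lam=2, w=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    lam is the number of sequence-order correlation tiers and w is the
    weight of those tiers; both defaults are illustrative.
    """
    if any(aa not in PROP for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    freqs = amino_acid_composition(seq)

    # Tier-k correlation factor: mean squared property difference between
    # residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(PROP[seq[i + k]] - PROP[seq[i]]) ** 2
                 for i in range(len(seq) - k)]
        thetas.append(mean(diffs))

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Same residues in a different order: the composition vectors are
# identical, while the pseudo amino acid composition vectors differ.
same_aac = amino_acid_composition("AACC") == amino_acid_composition("ACAC")
same_pse = pse_aac("AACC", lam=1) == pse_aac("ACAC", lam=1)
```

In this sketch the two toy sequences "AACC" and "ACAC" are indistinguishable under plain amino acid composition, whereas the extra correlation terms in `pse_aac` separate them. Vectors of this kind, one per protein, would then serve as the input features for classifiers such as those named in the abstract.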




How to Cite

KANDEMİR ÇAVAŞ, Çağın, & KÜÇÜK, Çağdaş. (2020). CLASSIFICATION OF SIGNALING PROTEINS USING COMPUTER-BASED METHODS. Euroasia Journal of Mathematics, Engineering, Natural & Medical Sciences, 7(8), 53–62.