
Accuracy Results

Our dataset is the standard dataset used by PSIST [3] and by other algorithms such as ProgRESS and geometric hashing; please refer to [3] for additional details. The dataset consists of 181 superfamilies based on the SCOP [2] classification, each with at least 10 protein structures, chosen so that any two proteins from the same superfamily have less than 30% sequence homology. Our database therefore consists of around 2000 proteins. The query sample is a set of 176 proteins selected randomly from these 2000 proteins (PSIST used a sample size of 176 proteins, so we kept the same sample size).

Once the sample is selected, we run both our algorithm and PSIST, and classify each query by the most frequently occurring superfamily and class among the top-20 proteins ranked by each algorithm. We selected the top 20 because we also want to measure the sensitivity of both algorithms: by sensitivity we mean that increasing the number of top-ranked proteins should not affect the classification, since the classification is based on the most frequently occurring superfamily and class. We ran the experiment several times to measure the average number of correct and false classifications.

The results indicate that our algorithm achieves an average accuracy of 84.09% (superfamily) and 86.93% (class); see Table 1 for additional details. Tables 2 and 3 show the top-ranked proteins for the query protein 1c2n: in this example our algorithm gives a positive (+ve) class classification, whereas PSIST gives a false classification for both class and superfamily. Similarly, for the protein 1hsm (Tables 4 and 5) we achieve positive classifications for both class and superfamily, whereas PSIST achieves only a positive class classification. These are only a few examples from some of the runs performed while computing the average classification accuracy for a random sample of 176 proteins from the 2000-protein database, as mentioned previously.
The complete lists of results are available at http://trinity.engr.uconn.edu/~vamsik/cgalgo.res.txt (for the accuracy of our algorithm) and http://trinity.engr.uconn.edu/~vamsik/psist.20.176.0725.txt (for the accuracy of PSIST). The reader is encouraged to verify these results; note that the dataset is the same for both algorithms.
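
The classification rule described above (assign the most frequent superfamily and class among the top-K ranked proteins, then count correct predictions over the query sample) can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names, the `(superfamily, class)` pair layout and the toy labels are all assumptions, and ties are resolved in favour of the label encountered first, which the text does not specify.

```python
from collections import Counter

def majority_label(labels):
    # Most frequent label; on a tie, Counter returns the label seen first
    # (a tie-breaking assumption, not taken from the text).
    return Counter(labels).most_common(1)[0][0]

def classify(hits, k=20):
    # hits: ranked list of (superfamily, scop_class) pairs, best match first.
    top = hits[:k]
    return (majority_label([sf for sf, _ in top]),
            majority_label([cl for _, cl in top]))

def evaluate(queries, k=20):
    # queries: list of (true_sf, true_class, hits).
    # Returns the number of correct superfamily and class predictions.
    correct_sf = correct_cl = 0
    for true_sf, true_cl, hits in queries:
        sf, cl = classify(hits, k)
        correct_sf += (sf == true_sf)
        correct_cl += (cl == true_cl)
    return correct_sf, correct_cl

# Toy data with made-up labels (not SCOP identifiers), for illustration only.
toy_queries = [
    (1, 10, [(1, 10), (1, 10), (2, 10), (3, 20)]),  # superfamily and class correct
    (4, 10, [(5, 10), (5, 10), (4, 10), (6, 30)]),  # class correct, superfamily wrong
]
print(evaluate(toy_queries, k=4))  # (1, 2)
```

Dividing each of the two counts by the number of queries gives the superfamily and class accuracies.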


Table 1: Accuracy comparison between PSIST and CG_ALGO
Algorithm   Correct (SF)   Correct (Class)   Top-K   Accuracy (SF)               Accuracy (Class)
PSIST       120            129               K=20    $\frac{120}{176}=68.18\%$   $\frac{129}{176}=73.29\%$
CG_ALGO     148            153               K=20    $\frac{148}{176}=84.09\%$   $\frac{153}{176}=86.93\%$
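
The accuracies in Table 1 are the correct-classification counts divided by the 176 queries. The short sketch below reproduces that arithmetic; the dictionary layout and the helper name `percent` are illustrative choices, not part of either algorithm.

```python
N_QUERIES = 176  # size of the query sample

# Correct classifications reported in Table 1 (K = 20).
correct = {
    "PSIST":   {"sf": 120, "class": 129},
    "CG_ALGO": {"sf": 148, "class": 153},
}

def percent(n_correct, n_total=N_QUERIES):
    # Accuracy as a percentage of the query sample.
    return 100.0 * n_correct / n_total

for name, counts in correct.items():
    print(name,
          f"SF {percent(counts['sf']):.2f}%",
          f"class {percent(counts['class']):.2f}%")

# Additional queries classified correctly by CG_ALGO compared with PSIST.
gain_sf = correct["CG_ALGO"]["sf"] - correct["PSIST"]["sf"]
gain_class = correct["CG_ALGO"]["class"] - correct["PSIST"]["class"]
print(gain_sf, gain_class)  # 28 24
```

Note that 129/176 is 73.295...%, so rounding to two decimals prints 73.30%, whereas the table lists the truncated value 73.29%.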


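Tables 2 through 5 list the top-ranked proteins returned by our algorithm and by PSIST for the two example queries. In the Classification column, *c* marks a protein in the same class as the query and sf marks a protein in the same superfamily.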


Table 2: Top-scored proteins for query 1c2n, sf(46626) cl(46456), using our algorithm
Classification Length Cost PDB SF Class
*c* 44 24.128731 pdb1mbj- 46689 46456
*c* 34 21.989410 pdb2bby- 46785 46456
*c* 40 26.387238 pdb1jtb- 47699 46456
*c* 46 31.961258 pdb1hsn- 47095 46456
*c* 36 25.842737 pdb1nhm- 47095 46456
*c* 32 23.051079 pdb1mbe- 46689 46456
  37 26.797998 pdb2cjo- 54292 53931
*c* 35 25.395412 pdb1uxd- 47413 46456
*c* 36 26.392666 pdb1aab- 47095 46456
*c* 39 28.690279 pdb1mbk- 46689 46456
  40 29.537001 pdb1eot- 54117 53931
*c* 37 27.860340 pdb1etd- 46785 46456
*c* 46 35.429729 pdb1nhn- 47095 46456
  47 36.750553 pdb1e09-A 55961 53931
*c* 47 38.046467 pdb2new- 48695 46456
  45 36.703476 pdb1bt7- 50494 48724
*c* 50 41.156513 pdb1gjt-A 46997 46456
  32 26.897558 pdb4ull- 50203 48724
*c* 37 31.491304 pdb1a2i- 48695 46456
*c* 47 40.159309 pdb1wjd-B 46919 46456
*c* 47 40.571198 pdb1wjd-A 46919 46456
+ve class classification (46456) occurs 16 times; false sf classification



Table 3: Top-scored proteins for query 1C2N, sf(46626) cl(46456), using PSIST
Classification Score PDB SF Class
*c*sf 114 1C2N_ 46626 46456
*c*sf 84 1COT_ 46626 46456
*c* 63 1PFRA 47240 46456
*c* 61 1PFRB 47240 46456
  60 1GEQB 51366 51349
*c* 60 2AV8A 47240 46456
  60 1GEQA 51366 51349
*c* 60 2AV8B 47240 46456
*c* 60 1AV8A 47240 46456
*c* 60 1AV8B 47240 46456
  59 1BL5_ 53659 51349
  58 1GRP_ 53659 51349
  58 1F8IA 51621 51349
  58 1D3GA 51395 51349
  58 7CEL_ 49899 48724
  58 1QOQA 51366 51349
  58 1CW2A 51366 51349
  58 1D3HA 51395 51349
  58 1C8VA 51366 51349
  58 1QOPA 51366 51349
False class classification; false sf classification
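
To see why Table 3 yields a false classification at both levels, the SF and Class columns can be tallied directly. The sketch below copies those two columns from the table; the variable names are illustrative.

```python
from collections import Counter

# (superfamily, class) columns of Table 3, in the order listed.
table3 = [
    (46626, 46456), (46626, 46456), (47240, 46456), (47240, 46456),
    (51366, 51349), (47240, 46456), (51366, 51349), (47240, 46456),
    (47240, 46456), (47240, 46456), (53659, 51349), (53659, 51349),
    (51621, 51349), (51395, 51349), (49899, 48724), (51366, 51349),
    (51366, 51349), (51395, 51349), (51366, 51349), (51366, 51349),
]

class_counts = Counter(cl for _, cl in table3)
sf_counts = Counter(sf for sf, _ in table3)

# Class 51349 occurs 11 times against 8 for the query's class 46456,
# so the majority class is not the query's class.
print(class_counts[51349], class_counts[46456])

# Superfamilies 47240 and 51366 each occur 6 times, while the query's
# superfamily 46626 occurs only twice.
print(sf_counts[47240], sf_counts[51366], sf_counts[46626])
```

Whichever way the tie between superfamilies 47240 and 51366 is broken, neither is the query's superfamily, so the superfamily classification is false as well.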



Table 4: Top-scored proteins for query 1hsm, sf(47095) cl(46456), using our algorithm
Classification Length Cost PDB SF Class
*c*sf 168 72.649147 pdb1hsn- 47095 46456
*c* 33 19.644470 pdb2bby- 46785 46456
  31 19.394936 pdb2cjo- 54292 53931
*c*sf 145 95.475159 pdb1nhm- 47095 46456
*c* 38 26.182356 pdb1ba5- 46689 46456
*c* 38 28.572844 pdb1edj- 46997 46456
*c* 39 30.662464 pdb1wtu-B 47729 46456
*c*sf 127 102.177414 pdb1nhn- 47095 46456
*c*sf 107 86.437531 pdb1hmf- 47095 46456
*c*sf 110 89.275589 pdb1hme- 47095 46456
  33 27.142843 pdb1bc6- 54862 53931
  35 29.192152 pdb1grx- 52833 51349
  42 35.985783 pdb2cjn- 54292 53931
*c* 37 31.731483 pdb1tnt- 46785 46456
  41 35.815174 pdb1mit- 54654 53931
*c*sf 52 46.024429 pdb1hma- 47095 46456
*c* 36 32.188934 pdb1bqv- 47769 46456
*c* 46 41.358707 pdb1hue-A 47729 46456
*c* 42 38.252728 pdb1mbg- 46689 46456
*c* 35 31.920086 pdb1bdc- 46997 46456
  33 30.160261 pdb1svq- 55753 53931
+ve class classification (46456) occurs 15 times; +ve sf classification (47095) occurs 6 times
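
Table 4 can also be used to illustrate the sensitivity notion used in this section, namely that the majority label should not change as more top-ranked proteins are considered. The sketch below assumes that the rows of Table 4 are listed in rank order and recomputes the majority superfamily and class for several cut-offs K; the helper names and the chosen values of K are illustrative.

```python
from collections import Counter

# (superfamily, class) columns of Table 4, in the order listed
# (assumed here to be the rank order).
table4 = [
    (47095, 46456), (46785, 46456), (54292, 53931), (47095, 46456),
    (46689, 46456), (46997, 46456), (47729, 46456), (47095, 46456),
    (47095, 46456), (47095, 46456), (54862, 53931), (52833, 51349),
    (54292, 53931), (46785, 46456), (54654, 53931), (47095, 46456),
    (47769, 46456), (47729, 46456), (46689, 46456), (46997, 46456),
    (55753, 53931),
]

def majority(labels):
    # Most frequent label; ties go to the label seen first (assumption).
    return Counter(labels).most_common(1)[0][0]

def predictions_by_k(rows, ks):
    # Majority (superfamily, class) among the first k rows, for each k.
    return {k: (majority([sf for sf, _ in rows[:k]]),
                majority([cl for _, cl in rows[:k]])) for k in ks}

stable = predictions_by_k(table4, ks=(5, 10, 15, 20))
for k, (sf, cl) in stable.items():
    print(k, sf, cl)
```

For each of these cut-offs the prediction is superfamily 47095 and class 46456, which matches the query 1hsm. Over all rows listed, class 46456 occurs 15 times and superfamily 47095 occurs 6 times, in agreement with the summary line of the table.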



Table 5: Top-scored proteins for query 1HSM, sf(47095) cl(46456), using PSIST
Classification Score PDB SF Class
*c*sf 77 1HSM_ 47095 46456
*c*sf 60 1NHN_ 47095 46456
*c* 56 1PFRB 47240 46456
*c* 56 1PFRA 47240 46456
*c* 54 1AV8A 47240 46456
*c* 54 2AV8A 47240 46456
*c* 54 2AV8B 47240 46456
*c* 53 1AV8B 47240 46456
*c* 52 1I4ZE 47188 46456
*c* 52 1I4ZC 47188 46456
*c* 51 1I4ZD 47188 46456
*c* 51 1I4ZG 47188 46456
*c* 51 1ITF_ 47266 46456
*c* 50 1I4ZB 47188 46456
*c* 50 1VLK_ 47266 46456
*c* 50 2LBD_ 48508 46456
*c* 50 4LBD_ 48508 46456
  50 1ICRB 55469 53931
  50 1ICUB 55469 53931
  50 1ICUA 55469 53931
+ve class classification (46456) occurs 17 times; false sf classification




Vamsi Kundeti 2007-10-10