Identifying Outliers in a Large Biological Database

The Pro-Kmeans algorithm begins by randomly partitioning the data set into clusters. It then uses the Smith-Waterman algorithm to compare the proteins within each cluster and to compute each protein's SumScore. The sequence with the highest SumScore in a cluster is taken as that cluster's centroid (Tatusov, 2003). The Smith-Waterman algorithm is then applied again, this time to compare every protein in the data set with the centroids just found, and each protein is assigned to the cluster whose centroid it matches with the highest score, that is, the centroid most similar to it. The algorithm repeats these steps iteratively in order to maximize its objective function. The number of clusters is the input parameter, and the output is the most suitable partition of the entire data set (Sasson and Linial, 2002).
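
The steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published implementation. The function names (`smith_waterman`, `sum_score`, `pro_kmeans`) are hypothetical. The simple match/mismatch/linear-gap scoring stands in for the substitution matrix and gap model a real protein comparison would typically use. SumScore is assumed to be the sum of a protein's Smith-Waterman scores against the other members of its cluster. The loop is assumed to stop when assignments no longer change or after `max_iter` passes.

```python
import random


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings a and b.

    Assumption: toy scoring (match/mismatch, linear gap penalty) instead of
    a protein substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def sum_score(index, members, seqs):
    """Assumed SumScore: total similarity of one sequence to its cluster mates."""
    return sum(smith_waterman(seqs[index], seqs[m]) for m in members if m != index)


def pro_kmeans(seqs, k, max_iter=20, seed=0):
    """Cluster sequences into at most k groups; return (labels, centroid indices)."""
    if not 1 <= k <= len(seqs):
        raise ValueError("k must be between 1 and the number of sequences")
    rng = random.Random(seed)

    # Step 1: random initial partition (shuffle, then deal into k clusters).
    order = list(range(len(seqs)))
    rng.shuffle(order)
    labels = [0] * len(seqs)
    for pos, idx in enumerate(order):
        labels[idx] = pos % k

    centroids = []
    for _ in range(max_iter):
        clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
        # Step 2: the centroid of each non-empty cluster is the member
        # with the highest SumScore.
        centroids = [
            max(members, key=lambda i: sum_score(i, members, seqs))
            for members in clusters
            if members
        ]
        # Step 3: assign every sequence to its most similar centroid.
        new_labels = [
            max(range(len(centroids)),
                key=lambda c: smith_waterman(seqs[i], seqs[centroids[c]]))
            for i in range(len(seqs))
        ]
        # Step 4: repeat until the partition stops changing.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

Calling `pro_kmeans(sequences, k=2)` on a list of sequence strings returns one cluster label per sequence together with the indices of the chosen centroids. Because every pass recomputes pairwise Smith-Waterman scores, the sketch is only practical for small inputs; caching the scores would be a natural refinement.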

