
Breast Cancer Diagnosis Using Machine Learning Techniques

(This is an introduction to a machine learning project I have been working on. Suggestions are welcome.)

INTRODUCTION

Breast cancer is one of the leading causes of death among women. Early detection and classification of tumors can save lives. Medical professionals can make mistakes when identifying a disease, and this is where technology can help reduce the risk. Using machine learning and data mining techniques, diagnostic accuracy for breast cancer has reached up to 91.1%, whereas a diagnosis made by an experienced doctor is on average 79.9% accurate.

In simple terms, it is well known that the accuracy of cancer prediction strongly influences the outcome for the patient. Research across a variety of fields has identified several biomarkers that are predictors of cancer. It has also been established that combinations of these biomarkers are more predictive than any single marker. When information about these biomarkers is combined with macro-scale clinical data (tumor type, size, etc.), the accuracy and robustness of cancer prognoses increase even further.

As we just saw, the number of parameters needed to predict cancer is quite large, which makes it very difficult for a human to take into account all the factors that matter for prognosis. This is where machine learning techniques come to our rescue.

DATA

The data I used is the “Breast Cancer Wisconsin Diagnostic” dataset from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). It contains measurements taken from images of fine needle aspirates of breast masses. The values represent characteristics of the cell nuclei present in each image.

The data set includes 569 examples of cancer biopsies, each with 32 columns. Thirty of these are features representing the mean, standard error, and largest value of 10 different characteristics of the cell nuclei in the image (radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension). The other two columns are the id and the diagnosis (benign or malignant).
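To make the column layout concrete, here is a minimal sketch that builds the 32 column names from the 10 characteristics and 3 statistics described above. The exact spellings (such as `radius_mean` or the `se` and `worst` suffixes) and the column order are my assumptions for illustration, not names taken from the dataset file.

```python
# Ten nucleus characteristics listed in the text (spellings are assumed).
CHARACTERISTICS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# Three statistics per characteristic: mean, standard error, largest value.
# The suffixes "se" and "worst" are hypothetical labels.
STATISTICS = ["mean", "se", "worst"]


def build_column_names():
    """Return id, diagnosis, and the 30 feature names (10 x 3)."""
    feature_columns = [
        f"{characteristic}_{statistic}"
        for statistic in STATISTICS
        for characteristic in CHARACTERISTICS
    ]
    return ["id", "diagnosis"] + feature_columns


columns = build_column_names()
print(len(columns))   # total number of columns
print(columns[:4])    # first few column names
```

The count works out as 10 characteristics times 3 statistics, which gives 30 features, plus the id and diagnosis columns.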

In this work, along with some discussion of exploiting some of the less hidden patterns in the data, I applied an Artificial Neural Network, K-Nearest Neighbours, and Support Vector Machines to classify the breast cancer data and analyzed the performance of each classifier. I used sensitivity, specificity, and accuracy as the performance measures for comparing the classifiers.
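The comparison described above could be sketched roughly as follows. This is not the code behind my results: it assumes scikit-learn is installed, uses its bundled copy of the Wisconsin Diagnostic data, and the split ratio, scaling step, `n_neighbors=5`, and a single hidden layer of 5 units are all illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
# scikit-learn encodes malignant as 0 and benign as 1;
# flip the labels so that malignant is the positive class (1).
y_malignant = 1 - y

X_train, X_test, y_train, y_test = train_test_split(
    X, y_malignant, test_size=0.3, stratify=y_malignant, random_state=0
)

# All hyperparameters below are assumptions made for this sketch.
models = {
    "neural_net": MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000,
                                random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "svm_radial": SVC(kernel="rbf"),
    "svm_sigmoid": SVC(kernel="sigmoid"),
}


def evaluate(model):
    """Fit a scaled pipeline and return sensitivity, specificity, accuracy."""
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    predictions = pipeline.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, predictions, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {metric: round(float(value), 3) for metric, value in scores.items()})
```

Scaling the features first matters for all three model families here, since K-Nearest Neighbours and SVM kernels depend on distances and the features span very different ranges.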

RESULTS

Confusion matrix obtained for the neural-net predictor with 5 hidden layers:

hidden_layer: This column in the results table denotes the number of hidden layers used when training the neural net model. Sigmoid and Radial are the two kernel functions used when implementing the Support Vector Machine classifier.

TPR (True Positive Rate), FPR (False Positive Rate), Precision, Specificity, and Accuracy have their usual meanings.
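For readers who want those usual meanings spelled out, here is a small sketch that computes each measure from the four cells of a confusion matrix. The function name and the counts in the example call are made up for illustration and are not results from this work.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the standard measures from confusion-matrix counts.

    tp: malignant cases predicted malignant (true positives)
    fp: benign cases predicted malignant (false positives)
    tn: benign cases predicted benign (true negatives)
    fn: malignant cases predicted benign (false negatives)
    """
    return {
        "TPR": tp / (tp + fn),            # also called sensitivity or recall
        "FPR": fp / (fp + tn),            # equals 1 - specificity
        "Precision": tp / (tp + fp),
        "Specificity": tn / (tn + fp),
        "Accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
example = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In a diagnosis setting, TPR (sensitivity) deserves particular attention, because a false negative means a malignant tumor was classified as benign.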

In brief: "I investigated the utility of machine learning methods for detecting cancer by applying machine learning algorithms to measurements of biopsied cells from women with abnormal breast masses."
