Sequence Mining in DNA chips data for Diagnosing Cancer Patients

Date

2011-1

Type

Conference paper

Conference title

SELECTED TOPICS in APPLIED COMPUTER SCIENCE

Author(s)

Mariam Abojela Msaad
Zakaria Suliman Zubi

Abstract

Deoxyribonucleic acid (DNA) micro-arrays are a powerful means of observing the expression levels of thousands of genes at the same time. They produce high-dimensional datasets, which challenge conventional clustering methods, and this high dimensionality calls for Self-Organizing Maps (SOMs) to cluster DNA micro-array data. DNA micro-array datasets are stored in large biological databases for several purposes. The proposed methods are based on the idea that selecting a gene subset that distinguishes all classes is a more effective way to solve a multi-class problem, and we will propose a genetic programming (GP) based approach to analyze multi-class micro-array datasets. The biological dataset will be derived from multiple biological databases by an extraction procedure called DNA-Aggregator. We will design this biological aggregator to aggregate various datasets via a community-developed DNA micro-array ontology, based on the concept of the Semantic Web, for integrating and exchanging biological data. The aggregator is composed of modules that retrieve data from various biological databases, and it will also accept queries from other applications in order to recognize genes. The genes will be categorized into groups by a classification method that collects similar expression patterns. A clustering method such as k-means is required to discover groups of similar objects in the biological database and to characterize the underlying data distribution.
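To make the SOM idea in the abstract more concrete, the following is a minimal sketch of a self-organizing map that clusters gene expression profiles (one vector of expression levels per gene) on a small rectangular grid, written in plain Python. It is not the authors' implementation. The function names, the toy profiles, the initialization from randomly chosen profiles, the Gaussian neighborhood, and the linear decay schedules are all illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]  # (row, column) position of a unit on the SOM grid


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights: Dict[Node, List[float]], profile: Sequence[float]) -> Node:
    """Return the grid unit whose weight vector is closest to the profile."""
    return min(weights, key=lambda node: squared_distance(weights[node], profile))


def train_som(
    data: Sequence[Sequence[float]],
    rows: int,
    cols: int,
    epochs: int = 100,
    lr0: float = 0.5,
    seed: int = 0,
) -> Dict[Node, List[float]]:
    """Train a rows x cols self-organizing map on expression profiles (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: each unit starts as a copy of a randomly chosen profile.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0  # initial neighborhood radius, in grid units
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present profiles in a new random order each epoch
        for i in order:
            profile = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighborhood shrinks, with a floor
            bmu = best_matching_unit(weights, profile)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))  # Gaussian neighborhood
                for j in range(len(w)):
                    # Pull the unit toward the profile, more strongly near the BMU.
                    w[j] += lr * h * (profile[j] - w[j])
            step += 1
    return weights


if __name__ == "__main__":
    # Toy two-condition profiles for six hypothetical genes (not real data).
    profiles = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    som = train_som(profiles, rows=1, cols=2)
    print([best_matching_unit(som, p) for p in profiles])
```

After training, genes that map to the same grid unit form one cluster, and neighboring units tend to hold similar expression patterns, which is what makes the map useful for visualizing high-dimensional micro-array data.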
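As a rough illustration of the k-means step mentioned in the abstract, the sketch below groups gene expression profiles with Lloyd's algorithm in plain Python. It assumes each gene is represented by a fixed-length vector of expression levels and uses random initialization from the data. The function names and the toy profiles are hypothetical and do not come from the paper's aggregator or datasets.

```python
import random
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(profile: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to the profile."""
    return min(range(len(centroids)), key=lambda c: squared_distance(profile, centroids[c]))


def kmeans(
    data: Sequence[Sequence[float]], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means (hypothetical helper): returns (centroids, label per profile)."""
    if not 0 < k <= len(data):
        raise ValueError("k must be between 1 and the number of profiles")
    rng = random.Random(seed)
    # Start from k distinct profiles chosen at random.
    centroids = [list(data[i]) for i in rng.sample(range(len(data)), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every profile to its nearest centroid.
        new_labels = [nearest_centroid(p, centroids) for p in data]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its member profiles.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    # Toy two-condition profiles for six hypothetical genes (not real data).
    profiles = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    centroids, labels = kmeans(profiles, k=2)
    print(labels, centroids)
```

The cluster numbering depends on the random initialization, so only the grouping itself is meaningful. In practice k has to be chosen in advance, and several restarts with different seeds are commonly used to avoid poor local optima.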