Classification, one of the most common tasks in data mining and knowledge discovery, maps each item of the selected data onto one of a given set of classes. It has applications in many fields, including finance, insurance, medicine, and the social and biological sciences. Improving the performance and capabilities of classifiers has long attracted attention in this field. Feature selection is a preprocessing procedure in pattern recognition and data mining. This thesis uses rough set theory as an effective feature selection method. A tree of the subsets of the original feature set is developed and searched minimally, with branches pruned on the basis of a monotonic property. Starting the search from a greedy solution yields an effective and exact rough-set feature selection algorithm for categorical datasets. The capability of the algorithm is compared with that of a full search. Furthermore, its solution and computation time are compared with those of a meta-heuristic algorithm. The strengths and weaknesses of the algorithm are described.
The classification models developed in this thesis handle numerical, categorical, and mixed features without transforming them. To this end, distance or similarity measures for a case-based reasoning model are constructed. These measures assign a weight to each feature and treat categorical and numerical features differently: they use the Euclidean distance for numerical features and the co-occurrence of values for categorical features. The proportional distribution of the categorical values of a feature is computed only with respect to the values of the class feature, in two settings: without and with considering the class of the cases. The proposed case-based reasoning models are implemented on categorical and mixed datasets, and their performance is evaluated in comparison with well-known classification tools.
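As a rough illustration of a weighted distance measure for mixed features, the following Python sketch combines a squared numerical difference with a categorical distance based on class-conditional value proportions. It is not the measure proposed in the thesis: the categorical part is a value-difference-style stand-in, only the class-conditional setting is shown, and all function names, the weights, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def class_proportions(column, labels):
    """For one categorical feature, return {value: {class: P(class | value)}}."""
    counts = defaultdict(Counter)
    for value, label in zip(column, labels):
        counts[value][label] += 1
    return {
        value: {c: n / sum(ctr.values()) for c, n in ctr.items()}
        for value, ctr in counts.items()
    }


def categorical_distance(a, b, proportions, classes):
    """Assumed value-difference-style distance: sum over classes of |P(c|a) - P(c|b)|."""
    pa, pb = proportions.get(a, {}), proportions.get(b, {})
    return sum(abs(pa.get(c, 0.0) - pb.get(c, 0.0)) for c in classes)


def mixed_distance(x, y, weights, categorical, proportions, classes):
    """Weighted Euclidean-style combination of per-feature distances.

    `categorical` is the set of indices of categorical features;
    `proportions[j]` holds the class proportions for categorical feature j.
    Numerical features are used unscaled here, which is a simplification.
    """
    total = 0.0
    for j, (xj, yj) in enumerate(zip(x, y)):
        if j in categorical:
            d = categorical_distance(xj, yj, proportions[j], classes)
        else:
            d = xj - yj
        total += weights[j] * d * d
    return sqrt(total)


# Hypothetical toy case base: one numerical and one categorical feature.
cases = [(1.0, "red"), (2.0, "red"), (4.0, "blue"), (5.0, "blue")]
labels = ["ok", "ok", "defect", "defect"]
classes = sorted(set(labels))
categorical = {1}
proportions = {1: class_proportions([c[1] for c in cases], labels)}
weights = [1.0, 1.0]

d_same = mixed_distance(cases[0], cases[1], weights, categorical, proportions, classes)
d_diff = mixed_distance(cases[0], cases[2], weights, categorical, proportions, classes)
```

In this toy case base the two colours occur in disjoint classes, so the categorical term reaches its maximum for a red/blue pair and vanishes for a red/red pair. In practice the numerical features would usually be normalized before being combined with the categorical term.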
To address the practical side of the thesis, the sticker defect on cold-rolled coils at Mobarakeh Steel Complex is investigated as a classification problem. For this purpose, the features that contribute to the defect are determined from the research literature and expert opinion, and the available data are collected. After the dataset is refined and an initial analysis is performed, the proposed classifiers and some other well-known methods are applied to the datasets and their performance is examined. Accordingly, the important features responsible for the sticker defect are identified, followed by the extraction of high-accuracy classification rules that can be used to set the process parameters so as to reduce, or possibly eliminate, the sticker defect.