Abstract: Text data accounts for a large proportion of all data in the era of big data, and text classification, as an effective method of managing and organizing text data, has attracted much attention. KNN is a classic classification algorithm, but it is difficult for it to achieve high classification speed and high accuracy at the same time. To address this shortcoming, an improved KMedoids clustering algorithm is used to prune the training samples that contribute little to classification, which reduces the number of KNN similarity computations. A representativeness function is defined so that the K nearest neighbor samples of a test text are treated differently, which improves the accuracy of KNN. The results show that the improved method outperforms the traditional method in both classification speed and accuracy.
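
As an illustration only, the two ideas summarized above (pruning the training set by clustering, then weighting neighbors by how representative they are) can be sketched in Python. The abstract does not specify the improved KMedoids procedure or the representativeness function, so everything below is an assumption: a plain alternating k-medoids with cosine similarity stands in for the improved clustering, `keep_ratio` is a hypothetical pruning parameter, and a sample's representativeness is assumed to be its cosine similarity to its cluster medoid.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_to_medoids(points, medoids):
    """Map every point to the index of its most similar medoid."""
    return [max(medoids, key=lambda m: cosine(p, points[m])) for p in points]


def k_medoids(points, k, iters=20):
    """Plain alternating k-medoids (a stand-in, not the paper's improved variant)."""
    medoids = list(range(min(k, len(points))))  # assumption: first k points as seeds
    for _ in range(iters):
        assign = assign_to_medoids(points, medoids)
        new_medoids = []
        for m in medoids:
            members = [i for i, a in enumerate(assign) if a == m] or [m]
            # New medoid: the member most similar, in total, to its own cluster.
            new_medoids.append(max(
                members,
                key=lambda c: sum(cosine(points[c], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_to_medoids(points, medoids)


def prune_training_set(samples, labels, k=2, keep_ratio=0.5):
    """Cluster each class and keep only the samples closest to their medoid.

    Returns (vector, label, representativeness) triples, where
    representativeness is assumed to be similarity to the cluster medoid.
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    kept = []
    for y, pts in by_class.items():
        medoids, assign = k_medoids(pts, k)
        for m in set(medoids):
            members = [i for i, a in enumerate(assign) if a == m]
            members.sort(key=lambda i: cosine(pts[i], pts[m]), reverse=True)
            n_keep = max(1, math.ceil(keep_ratio * len(members)))
            for i in members[:n_keep]:
                kept.append((pts[i], y, cosine(pts[i], pts[m])))
    return kept


def classify(x, kept, k=3):
    """KNN vote where each neighbor counts as similarity * representativeness."""
    neighbors = sorted(kept, key=lambda t: cosine(x, t[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for vec, label, rep in neighbors:
        votes[label] += cosine(x, vec) * rep
    return max(votes, key=votes.get)


# Toy data: two classes of 2-D "document vectors" (purely illustrative).
samples = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.8, 0.3],
           [0.1, 1.0], [0.2, 0.9], [0.0, 1.0], [0.3, 0.8]]
labels = ["a"] * 4 + ["b"] * 4
kept = prune_training_set(samples, labels, k=2, keep_ratio=0.5)
prediction = classify([1.0, 0.05], kept, k=3)
```

In this sketch the speed gain comes from `classify` comparing a test vector only against the retained samples rather than the full training set, and the weighted vote is one possible way to treat the K neighbors differently.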