Document clustering is becoming increasingly important given the abundance of text documents available through the World Wide Web and corporate document management systems. It is the process of grouping text documents into clusters such that documents in the same cluster are similar to one another and dissimilar to documents in other clusters. This survey reviews data mining clustering techniques for unstructured data.
Keywords: Data mining, text mining, document clustering, clustering techniques.