UNSTRUCTURED DATA ANALYSIS TECHNIQUE FOR MEASURING COHESION IN SOURCE CODE USING MACHINE LEARNING

Authors

  • Furqan Ashraf
  • Muhammad Nasir
  • Zakia Jalil

Abstract

One of the major challenges for the software community is finding source code quality metrics for the object-oriented paradigm. With advances in Machine Learning and Natural Language Processing (NLP), it is now possible to evaluate code using unstructured data analysis. Many attempts have been made to capture cohesion, an important software quality attribute, using traditional structured analysis methods. This study investigates whether unstructured data analysis can be used to calculate the cohesion of source code classes. We designed an experiment to evaluate cohesion scores on two source code datasets. We present an unstructured way of measuring high cohesion in source code classes, based on semantic analysis of class names against method names and of class descriptions against method descriptions. We also calculate LCOM (Lack of Cohesion in Methods) for each class in the datasets and compare the cohesion scores obtained from this technique with the traditional LCOM scores. The experimental results indicate that unstructured data analysis can be applied to find the cohesion of classes. Because it is common practice in the software industry to follow naming conventions for classes and to write proper descriptions of classes and methods, unstructured data analysis can be effectively applied to calculate the cohesion score of classes.
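
The two kinds of score contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which LCOM variant or which semantic model was used. The sketch assumes the original Chidamber-Kemerer definition of LCOM and substitutes a simple token-overlap (Jaccard) measure for the semantic analysis of class and method names. The function names `lcom1`, `tokens`, and `name_cohesion`, and the input format, are hypothetical.

```python
import re
from itertools import combinations


def lcom1(method_attrs):
    """Chidamber-Kemerer LCOM: max(P - Q, 0) over all method pairs.

    method_attrs maps each method name to the set of instance attributes
    it uses (assumed to be extracted beforehand by some parser).
    """
    p = q = 0
    for a, b in combinations(method_attrs.values(), 2):
        if a & b:
            q += 1  # the pair shares at least one attribute
        else:
            p += 1  # the pair shares no attribute
    return max(p - q, 0)


def tokens(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return {part.lower() for part in parts}


def name_cohesion(class_name, method_names):
    """Mean Jaccard overlap between class-name and method-name tokens.

    A lexical stand-in (an assumption) for the semantic similarity
    described in the abstract; higher values suggest higher cohesion.
    """
    cls = tokens(class_name)
    scores = []
    for name in method_names:
        union = cls | tokens(name)
        scores.append(len(cls & tokens(name)) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical class: three methods touch cart state, one does not.
    usage = {
        "addCartItem": {"items"},
        "removeCartItem": {"items"},
        "cartTotal": {"items", "tax_rate"},
        "sendEmail": {"mailer"},
    }
    print("LCOM:", lcom1(usage))
    print("name cohesion:", name_cohesion("ShoppingCart", list(usage)))
```

Note that the two scores point in opposite directions: a higher LCOM indicates lower cohesion, while a higher name-overlap score indicates higher cohesion. A real semantic comparison would replace the Jaccard step with a learned similarity, for example over word embeddings.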

Published

2026-03-09

How to Cite

Furqan Ashraf, Muhammad Nasir, & Zakia Jalil. (2026). UNSTRUCTURED DATA ANALYSIS TECHNIQUE FOR MEASURING COHESION IN SOURCE CODE USING MACHINE LEARNING. Spectrum of Engineering Sciences, 4(3), 125–144. Retrieved from https://www.thesesjournal.com/index.php/1/article/view/2162