The Effect of Using Numerical Data Scaling on Supervised Machine Learning

Authors

  • Mona Ali Mohammed, University of Omar Almukhtar

DOI:

https://doi.org/10.37376/glj.vi67.5903

Keywords:

Feature scaling, supervised machine learning, Support Vector Machine classifier, Naïve Bayes classifier, Decision Tree classifier, K-Nearest Neighbors classifier.

Abstract

Before building machine learning models, the dataset must be prepared so that it is of high quality and represents the data as well as possible. Different features may have different scales, which can make the problem harder to formulate, and a model may perform poorly during learning when feature values span different scales. Our study examines numerical data scaling as a data preprocessing step, with the aim of showing how effective these methods are at improving the accuracy of learning algorithms. In particular, three numerical data scaling methods were compared across four machine learning classification algorithms used to predict disease severity. The experiments were built on a Coronavirus 2 (SARS-CoV-2) dataset comprising 1206 patients seen between June 2020 and April 2021. The diagnosis of every case was confirmed by RT-PCR, and baseline data and medical characteristics were collected for all patients. The results indicate that all of the algorithms used perform well with numerical data scaling, with a substantial improvement in their performance on the test data. We therefore conclude that classification algorithms improve when numerical data scaling is applied: these methods help the algorithms learn the underlying patterns better, which in turn yields more accurate models.
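The abstract does not name the three scaling methods that were compared, so the following is only an illustrative sketch: a minimal pure-Python implementation of three widely used numerical scaling schemes (min-max scaling, z-score standardization, and max-abs scaling), applied to hypothetical patient features measured on very different scales.

```python
def min_max_scale(values):
    """Rescale values linearly onto the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0            # avoid division by zero for a constant feature
    return [(v - lo) / span for v in values]

def z_score_scale(values):
    """Center on the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = (var ** 0.5) or 1.0
    return [(v - mean) / std for v in values]

def max_abs_scale(values):
    """Divide by the largest absolute value, preserving sign and zero."""
    m = max(abs(v) for v in values) or 1.0
    return [v / m for v in values]

# Hypothetical features on very different scales (not from the paper's dataset):
ages = [25, 40, 55, 70]                # years
crp = [3.0, 12.5, 48.0, 96.0]          # mg/L

print(min_max_scale(ages))             # all features now lie in [0, 1]
print(z_score_scale(crp))              # zero mean, unit variance
```

After such a transformation, features like age and a lab value contribute on comparable scales, which is why distance- and margin-based classifiers such as KNN and SVM are usually the most sensitive to whether scaling was applied.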


Author Biography

Mona Ali Mohammed, University of Omar Almukhtar

Department of Computer Science, Faculty of Sciences, University of Omar Almukhtar, Albaida, Libya



Published

2024-06-17

How to Cite

Ali Mohammed, M. (2024). The Effect of Using Numerical Data Scaling on Supervised Machine Learning. Global Libyan Journal, (67), 1–21. https://doi.org/10.37376/glj.vi67.5903

Issue

Section

Articles