Department of Mathematical Statistics
The Department trains Bachelor’s and Master’s level specialists in data analysis, data science, and statistics who can apply statistical methods in various fields of research and practice. It conducts theoretical research on limit theorems in probability theory, as well as applied statistical research in areas ranging from technological processes to economics and genetics.
About the Department
Partners
The Department of Mathematical Statistics maintains active cooperation with social and business partners.
Department Staff
The staff of the Department of Mathematical Statistics are highly qualified specialists in mathematics, statistics, and data analysis. They conduct research and deliver study courses to students.
Administration
- Assoc Prof Dr Rūta Simanavičienė, Head
- Edita Dombrovskienė, Administrator
Thesis abstracts
Determining the efficiency of air purification systems using statistical methods
Student:
Dominykas Jasas
Supervisor:
Dr Tadas Žvirblis
Department:
Department of Mathematical Statistics
Thesis abstract (EN)
This final thesis examines the efficiency of an air purification system using statistical methods. The aim is to determine the efficiency of the system based on statistical methods. The study analysed 57 experimental trials in which particle concentration was measured before and after the acoustic agglomeration stage of the air purification system. The particles were divided into fine and coarse groups, and the percentage change in concentration was calculated to evaluate the system’s efficiency. The study applied descriptive statistics, the Wilcoxon signed-rank test, two-way ANOVA with Tukey post-hoc comparisons, multiple linear regression, random forest regression and support vector regression. The prediction models were compared using MAE, RMSE and the coefficient of determination. The results showed a significant decrease in fine particle concentration; however, the efficiency of the acoustic agglomeration process was not unambiguously confirmed statistically. In the multiple linear regression models, air flow velocity and supplied particle concentration were statistically significant parameters; however, because the residual assumptions were not met, the results were not accepted as unambiguously reliable. The comparison of prediction models showed that the random forest regression model achieved the best results, explaining approximately 70% of the variation in the test sample. Nevertheless, because of the small number of experimental trials, these results also could not be unambiguously confirmed.
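The efficiency measure and the model-comparison metrics named in the abstract can be illustrated with a minimal Python sketch. The concentration values and predictions below are made up for illustration and are not taken from the thesis; the function names are hypothetical.

```python
import math

def percent_change(before, after):
    """Percentage change in concentration; negative values mean a reduction."""
    return 100.0 * (after - before) / before

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up particle concentrations before and after the agglomeration stage.
before = [200.0, 150.0, 120.0]
after = [150.0, 120.0, 108.0]
changes = [percent_change(b, a) for b, a in zip(before, after)]

# Made-up model predictions of the percentage change for the same trials.
predicted = [-22.0, -20.0, -13.0]
print(changes, mae(changes, predicted), rmse(changes, predicted), r_squared(changes, predicted))
```

Lower MAE and RMSE and a higher coefficient of determination on held-out trials indicate a better predictive model, which is how the regression models in the abstract are ranked.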
Explainability and Interpretation of Peer-to-Peer (P2P) Loan Risk Assessment Models
Student:
Edvinas Kurmis
Supervisor:
Dr Mindaugas Jasas
Department:
Department of Mathematical Statistics
Thesis abstract (EN)
This bachelor's thesis examines the assessment of loan default risk in a peer-to-peer (P2P) lending platform using logistic regression and artificial neural network models. The aim of the study is to compare the predictive performance of these models and to interpret their decisions using explainable artificial intelligence methods. The research is based on P2P lending data containing borrowers’ demographic, social, and financial characteristics. Four predictive models were developed, and their performance was evaluated using classification metrics. The SHAP method was applied to explain model decisions, enabling both global and local interpretation of predictions. The results showed that logistic regression outperformed neural networks in the analyzed task. The findings also demonstrated that the SHAP method can identify the most important factors influencing predictions and explain individual credit risk assessments; however, its results depend on the selected background dataset. Overall, the study indicates that explainable artificial intelligence methods can contribute to a more transparent and interpretable credit risk assessment process.
Thesis volume: 44 pages of text excluding appendices, 14 figures, 7 tables, and 12 references. Appendices are provided separately.
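The dependence of SHAP results on the background dataset can be seen in the simplest case, a linear model on the log-odds scale. The sketch below assumes features are treated as independent, in which case the SHAP value of feature i is the coefficient times the distance of the feature from its background mean. The coefficients, the loan record, and both background sets are hypothetical and are not taken from the thesis.

```python
def linear_shap(weights, x, background):
    """SHAP values of a linear model (log-odds scale), assuming independent features.

    phi_i = w_i * (x_i - mean of feature i over the background data)
    """
    n = len(background)
    means = [sum(row[i] for row in background) / n for i in range(len(weights))]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]

weights = [0.5, -2.0]             # hypothetical logistic regression coefficients
x = [4.0, 1.0]                    # hypothetical loan to explain

bg_a = [[2.0, 0.0], [4.0, 1.0]]   # background set A (feature means 3.0 and 0.5)
bg_b = [[4.0, 1.0], [6.0, 1.0]]   # background set B (feature means 5.0 and 1.0)

phi_a = linear_shap(weights, x, bg_a)
phi_b = linear_shap(weights, x, bg_b)
print(phi_a, phi_b)
```

The same loan and the same model receive different attributions, and even a different sign for the first feature, purely because the reference population changed.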
Statistical analysis of the gap between men's and women's salaries in Lithuania
Student:
Elzė Viltrakytė
Supervisor:
Assoc Prof Dr Tomas Rekašius
Department:
Department of Mathematical Statistics
Thesis abstract (EN)
The final bachelor’s thesis performs a statistical analysis of the gender pay gap in Lithuania. Based on the theoretical framework and descriptive analysis, linear regression and analysis of variance (ANOVA) models are developed and used to assess the impact of various variables on average wages and to determine the significance of gender in explaining wage differences. To explain the gender pay gap, a Blinder–Oaxaca decomposition is performed, distinguishing between the statistically explained and unexplained components of this phenomenon. The thesis also constructs a time series of the relative gender wage gap, followed by its analysis and forecasting. Finally, the results and conclusions of the thesis are presented.
The thesis consists of the following parts: introduction, literature review, theoretical-methodological part, practical part, conclusions, list of references, and appendices. Thesis length: 40 pages without appendices, 15 figures, 6 tables, 12 bibliographic sources.
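A twofold Blinder–Oaxaca decomposition can be sketched with a single predictor. The version below uses group B's coefficients as the reference, which is one common convention and an assumption here; the abstract does not say which variant the thesis used. The wage and experience values are made up.

```python
def ols_fit(x, y):
    """Simple least-squares fit y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - slope * mx, slope

def oaxaca_twofold(x_a, y_a, x_b, y_b):
    """Split mean(y_a) - mean(y_b) into explained and unexplained parts,
    using group B's coefficients as the reference."""
    a_a, b_a = ols_fit(x_a, y_a)
    a_b, b_b = ols_fit(x_b, y_b)
    mx_a, mx_b = sum(x_a) / len(x_a), sum(x_b) / len(x_b)
    explained = (mx_a - mx_b) * b_b                  # due to different characteristics
    unexplained = (a_a - a_b) + mx_a * (b_a - b_b)   # due to different coefficients
    return explained, unexplained

# Made-up data: x = years of experience, y = hourly wage.
x_men, y_men = [1.0, 2.0, 3.0], [12.0, 14.0, 16.0]
x_women, y_women = [0.0, 1.0, 2.0], [8.0, 9.0, 10.0]

explained, unexplained = oaxaca_twofold(x_men, y_men, x_women, y_women)
print(explained, unexplained, explained + unexplained)
```

Because a least-squares line with an intercept passes through the group means, the two parts always add up to the raw difference in mean wages.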
Development of Digital Learning Tasks for Teaching Small Area Estimation in Sampling Methods Courses
Student:
Liana Radeckaja
Supervisor:
Dr Vilma Nekrašaitė-Liegė
Department:
Department of Mathematical Statistics
Thesis abstract (EN)
This bachelor's thesis focuses on the development of digital educational exercises for the topic of small area estimation in sampling methods courses. The work is part of the international "Nordplus Horizontal" project "Innovative Sampling Methods: A Baltic-Nordic Collaboration", whose goal is to create a comprehensive sampling methods course tailored to the higher education needs of the Baltic and Nordic countries.
The theoretical part presents a literature review, describes the main concepts of sampling theory, and examines two classical small area estimation models: the Fay–Herriot area-level model and the Battese–Harter–Fuller unit-level model. The practical part introduces a three-level exercise structure: an explanatory illustrative example, a semi-structured practical exercise, and a complex applied exercise using real data. The exercises are implemented in the R programming environment using the sae package. Empirical results are obtained by applying the Fay–Herriot and Battese–Harter–Fuller models to the 2024 turnover data of Lithuanian service sector enterprises, classified according to the NACE economic activity classifier. The conclusions discuss the achieved accuracy improvements of the models and the applicability of the exercises to the teaching process.
The thesis consists of 6 parts: introduction, theoretical part, applied part, conclusions, list of references, and appendices. The R programming language is used.
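The core idea of the Fay–Herriot area-level model is a composite estimate that shrinks each area's direct estimate towards a regression-based synthetic estimate. The sketch below assumes the model variance and the synthetic estimates are already known; in practice both are estimated from the data, for example by the R sae package used in the thesis. This Python function is hypothetical and does not reproduce the sae API; all numbers are made up.

```python
def fay_herriot_blup(direct, synthetic, sampling_var, model_var):
    """Composite small area estimates under the Fay-Herriot model.

    For each area: gamma = model_var / (model_var + sampling_var),
    estimate = gamma * direct + (1 - gamma) * synthetic.
    """
    estimates = []
    for y, s, psi in zip(direct, synthetic, sampling_var):
        gamma = model_var / (model_var + psi)
        estimates.append(gamma * y + (1.0 - gamma) * s)
    return estimates

# Two made-up areas with the same direct and synthetic estimates
# but different sampling variances of the direct estimate.
estimates = fay_herriot_blup(
    direct=[100.0, 100.0],
    synthetic=[80.0, 80.0],
    sampling_var=[1.0, 3.0],
    model_var=1.0,
)
print(estimates)
```

The area with the noisier direct estimate is pulled further towards the synthetic value, which is where the accuracy gain for small areas comes from.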
Investigation of Car Price Dependency on Accessories and Other Characteristics Using Regression Models
Student:
Rytis Mažeika
Supervisor:
Dr Mindaugas Jasas
Department:
Department of Mathematical Statistics
Thesis abstract (EN)
This bachelor’s thesis examines the relationship between used car prices, vehicle equipment and other technical characteristics. The aim of the thesis is to evaluate which vehicle features are statistically significantly related to price and to determine whether the inclusion of additional equipment variables improves the quality of the regression model. The study uses data collected from a German car advertisement portal, consisting of approximately 24 thousand observations and 106 variables. Two multiple linear regression models were applied: a baseline model based on the main vehicle characteristics and an extended model including additional equipment features. Model quality was evaluated using the coefficient of determination, MAE, RMSE, MAPE, AIC and BIC, and model diagnostics were also performed. The results showed that engine power, mileage and vehicle age explain the largest part of car price variation. Including additional equipment features improved the model’s predictive accuracy: MAE decreased from EUR 4,095.23 to EUR 3,641.26, while MAPE decreased from 14.34% to 12.93%. The results indicate that although additional equipment is not the main price-determining factor, its inclusion provides additional information about the vehicle’s equipment level and improves the accuracy of used car price prediction.
Thesis length: 65 pages of text excluding appendices, 14 illustrations, 15 tables, and 16 bibliographic sources. Appendices are attached separately.
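Two of the comparison criteria in the abstract, MAPE and the information criteria, can be written down in a few lines. The sketch below uses the Gaussian least-squares form of AIC and BIC with additive constants dropped, which is enough to compare models fitted to the same data; the function names and the example prices are hypothetical, and only the two MAE values come from the abstract.

```python
import math

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def aic_bic(rss, n, k):
    """AIC and BIC of a Gaussian linear model, up to an additive constant.

    rss: residual sum of squares, n: observations, k: estimated parameters.
    """
    base = n * math.log(rss / n)
    return base + 2 * k, base + k * math.log(n)

# Made-up prices and predictions.
print(mape([100.0, 200.0], [90.0, 220.0]))

# Relative MAE improvement reported in the abstract (baseline vs extended model).
mae_base, mae_ext = 4095.23, 3641.26
print(100.0 * (mae_base - mae_ext) / mae_base)  # roughly an 11% reduction
```

BIC penalises each extra parameter by ln(n) rather than 2, so with about 24 thousand observations it is the stricter test of whether the many equipment variables earn their place.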
Analysis of the time required to perform customs inspection tasks and the development of a mathematical model for their optimal allocation
Student:
Rūta Jurkevičiūtė
Supervisor:
Assoc Prof Dr Rūta Simanavičienė
Department:
Department of Mathematical Statistics
Thesis abstract (EN)
The primary objective of this thesis is to develop a mathematical model for optimising the execution of customs inspection tasks. The study analyses data obtained from a customs inspection management system comprising 241,929 records. An iterative algorithm is created to determine the individual durations of inspection tasks, and graph analysis methods are used to visualise the relations between these tasks. Principal component analysis revealed that the data are very sparse. K-means clustering identified three groups of declarations based on task duration and complexity. Two linear regression models were created for each cluster: one including dummy variables (day of the week and season) and one without them. Model diagnostics were performed, and recommendations for optimising the customs inspection process are provided. The thesis concludes with a summary of findings.
The thesis consists of the following sections: introduction, theoretical and methodological framework, empirical analysis, conclusions, and references.
Scope: 64 pages of main text (excluding appendices), 43 figures, 18 tables, 37 references.
The appendices are attached separately.
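The k-means step can be illustrated in one dimension with Lloyd's algorithm. This is a minimal sketch under simplifying assumptions: a single feature (task duration), fixed starting centers, and made-up durations; the thesis clustered declarations on duration and complexity, and its actual implementation is not described in the abstract.

```python
def assign(values, centers):
    """Index of the nearest center for each value."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j])) for v in values]

def kmeans_1d(values, centers, iterations=20):
    """Lloyd's algorithm in one dimension, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        labels = assign(values, centers)
        for j in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = sum(members) / len(members)
    return centers, assign(values, centers)

# Made-up inspection durations in minutes.
durations = [2.0, 3.0, 4.0, 20.0, 22.0, 24.0, 60.0, 62.0]
centers, labels = kmeans_1d(durations, centers=[0.0, 30.0, 50.0])
print(centers, labels)
```

A separate regression can then be fitted within each label group, mirroring the per-cluster models described in the abstract.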
Application of Data Analysis Methods for Suspicious User Labeling in the Telecommunications Sector
Student:
Simona Sakalauskaitė
Supervisor:
Assoc Prof Dr Viktoras Chadyšas
Department:
Department of Mathematical Statistics
Thesis abstract (EN)
The aim of this Bachelor's thesis is to investigate and practically apply data analysis methods for identifying and flagging suspicious user behavior in telecommunications Call Detail Record (CDR) data. The study is motivated by the declining effectiveness of rule-based detection methods against constantly evolving fraud schemes and the lack of labeled historical data, which limits the application of supervised learning approaches. Therefore, an unsupervised machine learning methodology was adopted to identify statistically anomalous user behavior without relying on known fraud examples. The research is based on 1,778,973 call records from 9,603 unique users, supplemented with payment data. The theoretical part reviews telecommunications fraud, the evolution of anomaly detection, and the fundamental principles of anomaly detection algorithms. In the practical part, a set of 28 behavioral features was constructed from the CDR data, covering activity intensity, communication structure, temporal activity, time-of-day patterns, contact concentration, call outcome, and communication technology indicators. Two unsupervised learning algorithms were then applied: Isolation Forest (iForest) and Local Outlier Factor (LOF), each identifying 97 suspicious users (1.01% of the population). The Jaccard similarity coefficient between the methods was J = 0.054, confirming that the algorithms detect different types of anomalies and complement each other: iForest identifies globally unusual behavior, while LOF detects anomalies within a local context. Notably, 10 users were identified by both methods; these users exhibited exceptionally high call volumes, with up to 96% of their calls being unsuccessful. The results provide a statistically driven, prioritized risk list for further expert analysis in telecommunications fraud and risk management. The thesis consists of an introduction, two chapters, conclusions, a list of references, and seven appendices. 
The thesis comprises 74 pages (excluding appendices), 15 figures, 17 tables, and 20 references.
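The reported overlap can be checked directly from the counts in the abstract: two sets of 97 flagged users sharing 10 users give a Jaccard coefficient of 10 / 184. The user IDs in this sketch are hypothetical placeholders chosen only to reproduce those counts.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical user IDs, arranged so that each method flags 97 users
# and 10 users are flagged by both, as reported in the abstract.
iforest_flagged = set(range(0, 97))
lof_flagged = set(range(87, 184))

j = jaccard(iforest_flagged, lof_flagged)
print(round(j, 3))  # 10 / 184, which rounds to 0.054
```

A value this close to zero means the two detectors agree on very few users, which supports reading them as complementary rather than redundant.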
Urbanization and Obesity: A Time Series Analysis (1990–2020)
Student:
Toma Griniūtė
Supervisor:
Assoc Prof Dr Jolita Norkūnienė
Department:
Department of Mathematical Statistics
Thesis abstract (EN)
This bachelor’s thesis examines the dynamics of urbanization and obesity indicators over the period 1990–2020 using time series analysis. The study relies on World Bank World Development Indicators: urbanization is measured as the share of the urban population in the total population (%), while obesity is measured as the prevalence of obesity among adults (%). The analysis focuses on four regions: Africa, Asia, Europe, and North America. Regional series are constructed as population-weighted averages of selected countries. First, descriptive analysis and trend assessment are performed, followed by stationarity evaluation and the selection of appropriate transformations. ARIMA models are fitted to generate forecasts, and model adequacy is assessed through residual diagnostics (ACF plots and the Ljung–Box test). Finally, the relationship between urbanization and obesity is evaluated using the Pearson correlation coefficient, with additional analysis of year-to-year changes to reduce the influence of common trends. The results indicate that both indicators increased across all regions, while the growth rates and the strength and direction of the relationship differ by region.
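Why the year-to-year changes are analysed in addition to the levels can be shown with two made-up series that both trend upwards. The numbers below are invented for illustration and are not World Bank data; the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def diff(series):
    """Year-to-year changes (first differences)."""
    return [b - a for a, b in zip(series, series[1:])]

# Made-up annual series that both rise over time.
urban = [50.0, 51.0, 53.0, 54.0, 56.0, 57.0]
obesity = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0]

r_levels = pearson(urban, obesity)
r_changes = pearson(diff(urban), diff(obesity))
print(r_levels, r_changes)
```

The levels are almost perfectly correlated simply because both series grow, while the correlation of the annual changes can be very different, here even negative.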
Health data research
Student:
Dominyka Gruodytė
Supervisor:
Assoc Prof Dr Nomeda Bratčikovienė
Department:
Department of Mathematical Statistics
Thesis abstract (EN)
The aim of this bachelor's thesis is to develop and compare convolutional neural network and decision tree models for classifying images of skin diseases. The study analyzes data obtained from the ISIC dataset, which contains 2,357 images of various skin lesions. To build the models, convolutional neural networks (CNN) with the EfficientNetB0 architecture and decision tree methods were used. The performance of the models was evaluated based on classification accuracy and confusion matrices. During the experiment, class weight balancing and early stopping were also applied. The results showed that the CNN model achieved higher accuracy than the decision tree, especially in more complex cases. Finally, recommendations for model improvement and practical application possibilities are presented.
The thesis consists of the following parts: introduction, literature review, theoretical-methodological part, practical part, conclusions, list of references, and appendices.
Thesis length: 34 pages without appendices, 11 figures, 2 tables, 20 sources.
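Class weight balancing and early stopping can both be expressed as small rules. The sketch below uses the common "balanced" weighting heuristic, n_samples / (n_classes * class_count), and a patience-based stopping rule; both are assumptions about typical practice, not a description of the thesis code. The labels and validation losses are made up.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def early_stopping_epoch(val_losses, patience):
    """Epoch at which training stops: the first epoch where the validation
    loss has not improved for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

# Made-up imbalanced labels and validation losses.
weights = balanced_class_weights(["nevus"] * 6 + ["melanoma"] * 2)
stop = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.74, 0.6], patience=3)
print(weights, stop)
```

Rare classes receive larger weights so that errors on them cost more during training, and the stopping rule ends training before the later epochs are reached once validation loss stalls.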
The Impact of Nonresponse on Population Parameter Estimation in Asymmetric Data: A Comparison of Methods
Student:
Elena Strelčiūnaitė
Supervisor:
Dr Vilma Nekrašaitė-Liegė
Department:
Department of Mathematical Statistics
Thesis abstract (EN)
The final bachelor’s thesis examines the impact of missing data on the estimation of population parameters in asymmetric data and compares different nonresponse evaluation methods. Based on the theoretical part and using simulations, two types of nonresponse are analyzed: item and unit. For unit nonresponse evaluation, various levels of missing data are generated and reweighting methods are applied, such as the probability of completely random nonresponse, evaluation of equal nonresponse probabilities within groups, and the random forest method. Item nonresponse is assessed at a single level of nonresponse, but using two different nonresponse generation mechanisms, and imputation methods such as hot deck, k-nearest neighbors, and linear and logistic regression are applied. The best-performing methods are selected based on accuracy measures such as relative bias, coefficient of variation, and relative root mean square error. The analysis shows that in order to select an appropriate method for nonresponse evaluation, it is crucial to correctly identify the cause of nonresponse.
The thesis consists of an introduction, a review of scientific literature, theoretical–methodological and practical parts, conclusions, and a list of references.
The volume of the thesis is 42 pages of text excluding appendices, 15 figures, 11 tables, and 12 bibliographic sources. The appendices are provided separately.
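The three accuracy measures used to rank the methods can be computed from the estimates obtained across simulation replicates. The definitions below are one common convention and an assumption here (the abstract does not give the formulas); the estimates and the true value are made up.

```python
import math

def accuracy_measures(estimates, true_value):
    """Relative bias, coefficient of variation and relative RMSE
    of an estimator over simulation replicates."""
    r = len(estimates)
    mean_est = sum(estimates) / r
    variance = sum((e - mean_est) ** 2 for e in estimates) / r
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    rel_bias = (mean_est - true_value) / true_value
    cv = math.sqrt(variance) / mean_est
    rrmse = math.sqrt(mse) / true_value
    return rel_bias, cv, rrmse

# Made-up estimates of a population mean whose true value is 10.
rel_bias, cv, rrmse = accuracy_measures([9.0, 11.0, 10.0, 12.0], true_value=10.0)
print(rel_bias, cv, rrmse)
```

Relative RMSE combines the other two sources of error, since the mean squared error equals the variance plus the squared bias.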