New insight into machine-learning error estimation
Date:
March 9, 2022
Source:
Texas A&M University
Summary:
Scientists are evaluating machine-learning models using transfer
learning principles.
FULL STORY ==========================================================================
Omar Maddouri, a doctoral student in the Department of Electrical
and Computer Engineering at Texas A&M University, is working with
Dr. Byung-Jun Yoon, professor, and Dr. Edward Dougherty, Robert M. Kennedy
'26 Chair Professor, to evaluate machine-learning models using transfer learning principles. Dr.
Francis "Frank" Alexander with Brookhaven National Labs and Dr. Xiaoning
Qian from the Department of Electrical and Computer Engineering at Texas
A&M University are also involved with the project.
==========================================================================
In data-driven machine learning, models are built to make predictions
and estimations for what's to come in any given data set. One important
field within machine learning is classification, which allows a data
set to be assessed by an algorithm and then classified or broken down
into classes or categories. When the data sets provided are very small,
it can be very challenging to not only build a classification model
based on this data but also to evaluate the performance of this model,
ensuring its accuracy. This is where transfer learning comes into play.
"In transfer learning, we try to transfer knowledge or bring data from
another domain to see whether we can enhance the task that we are doing
in the domain of interest, or target domain," Maddouri explained.
The target domain is where the models are built, and their performance is evaluated. The source domain is a separate domain that is still relevant
to the target domain from which knowledge is transferred to make the
analysis within the target domain easier.
Maddouri's project utilizes a joint prior density to model the relatedness between the source and target domains and offers a Bayesian approach
to apply the transfer learning principles to provide an overall error
estimator of the models. An error estimator will deliver an estimate of
how accurate these machine-learning models are at classifying the data
sets at hand.
What this means is that before any data is observed, the team creates
a model using their initial inferences about the model parameters in
the target and source domains and then updates this model with enhanced accuracy as more evidence or information about the data sets becomes
available.
This technique of transfer learning has been used to build models in
previous works; however, no one has ever before used this transfer
learning technique to propose novel error estimators to evaluate the performance of these models. For an efficient utilization, the devised estimator has been implemented using advanced statistical methods
that enabled a fast screening of source data sets which enhances the computational complexity of the transfer learning process by 10 to
20 times.
This technique can help serve as a benchmark for future research within academia to build upon. In addition, it can help with identifying
or classifying different medical issues that would otherwise be very
difficult.
For example, Maddouri utilized this technique to classify patients
with schizophrenia using transcriptomic data from brain tissue samples originally acquired by invasive brain biopsies. Because of the nature and
the location of the brain region that can be analyzed for this disorder,
the data collected is very limited. However, using a stringent feature selection procedure that comprises differential gene expression analysis
and statistical testing for assumptions validity, the research team
identified transcriptomic profiles of three genes from an additional
brain region found to be highly relevant to the desired brain tissue as reported by independent research studies from other literature.
This knowledge allowed them to utilize the transfer learning technique to leverage samples collected from the second brain region (source domain) to
help with the analysis and significantly boost the accuracy of diagnosis
within the original brain region (target domain). The data gathered from
the source domain can be exploratory in the absence of information from
the target domain, allowing the research team to enhance the quality of
their conclusion.
This research has been funded by the Department of Energy and the National Science Foundation.
========================================================================== Story Source: Materials provided by Texas_A&M_University. Original
written by Rachel Rose.
Note: Content may be edited for style and length.
========================================================================== Journal Reference:
1. Omar Maddouri, Xiaoning Qian, Francis J. Alexander, Edward
R. Dougherty,
Byung-Jun Yoon. Robust importance sampling for error estimation
in the context of optimal Bayesian transfer learning. Patterns,
2022; 100428 DOI: 10.1016/j.patter.2021.100428 ==========================================================================
Link to news story:
https://www.sciencedaily.com/releases/2022/03/220309154831.htm
--- up 1 week, 2 days, 10 hours, 51 minutes
* Origin: -=> Castle Rock BBS <=- Now Husky HPT Powered! (1:317/3)