THESIS
2023
1 online resource (xi, 43 pages) : illustrations (some color)
Abstract
Machine learning has become a priority for organizations as they move towards data
monetization strategies involving predictive analytics applied across the enterprise. Despite the
potential for immense value, many aspirational firms’ predictive capabilities are impeded by
data management constraints, including data quality issues, privacy regulations, organizational
silos that hinder data integration, and the need to monetize an array of structured and unstructured
data types, all while dealing with increasingly complex and dynamic environments. Domain generalization
is the branch of transfer-based machine learning designed to handle such situations where
training data is unavailable in the application domain, and where the domain patterns might
shift (known as training-serving skew). However, domain generalization research has
split into learning-strategy and data augmentation approaches, and neither branch
is well suited to prediction in many real-world enterprise settings.
We propose a robust domain generalization framework capable of addressing training-serving skew
in the unseen, noisy, structured and unstructured data environments routinely encountered in
enterprise analytics, thereby allowing predictive models to be effective and
valuable in settings constrained by data management capabilities. We evaluate our framework
in three important contexts encompassing different data modalities: customer analytics
(structured tabular), patient outcome forecasting (time series), and user modeling (text). When
applied to unseen domains, our framework outperforms state-of-the-art methods, with
performance gains often becoming even more pronounced on noisier data. Interestingly, the
framework performs better than models trained within those domains. Importantly, cost-benefit
analyses reveal that our prediction error costs are generally only 5-10% of those attained by the
best comparison techniques, underscoring the downstream economic value of more
generalizable predictive analytics in real-world enterprise settings. Our work has implications
for IS research on the design and management of AI in organizations, and practical implications
for managers looking to monetize predictive analytics in enterprise environments.
Keywords: Predictive Analytics, Domain Generalization, Machine Learning, Data
Augmentation