THESIS
2023
1 online resource (xiv, 97 pages) : color illustrations
Abstract
In this thesis, we take a statistical learning perspective on the problem of learning graph
structures from data. More precisely, we model the precision matrix of a multivariate random
variable as the Laplacian matrix of a graph whose node features
are observable. We are particularly motivated by the scenario where the nodes of a graph
represent financial instruments such as stocks or (digital) currencies, and the node features
represent observable quantities about those instruments such as their price changes over time.
That said, the methods developed in this thesis also apply to settings beyond financial
time series data.
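To make the modeling assumption concrete: for a graph with a symmetric, nonnegative edge-weight matrix W, the combinatorial Laplacian is L = diag(W1) - W, which is positive semidefinite and singular (every row sums to zero). Using it as the precision matrix of a zero-mean Gaussian gives, up to additive constants and a factor n/2, the log-likelihood log pdet(L) - tr(SL), where S is the sample covariance and pdet is the product of the nonzero eigenvalues. The NumPy sketch below is illustrative only: the helper names laplacian_from_weights and laplacian_gaussian_loglik are hypothetical, and the pseudo-determinant form is a common convention for Laplacian-constrained models, not necessarily the exact objective used in the thesis.

```python
import numpy as np


def laplacian_from_weights(W):
    """Combinatorial Laplacian L = diag(W 1) - W of a weighted undirected graph."""
    W = np.asarray(W, dtype=float)
    if not np.allclose(W, W.T) or np.any(W < 0) or np.any(np.diag(W) != 0):
        raise ValueError("W must be symmetric and nonnegative with a zero diagonal")
    return np.diag(W.sum(axis=1)) - W


def laplacian_gaussian_loglik(L, S, tol=1e-9):
    """log pdet(L) - tr(S L): Gaussian log-likelihood with precision L, up to
    additive constants and the factor n/2 (pdet = product of nonzero eigenvalues)."""
    eigvals = np.linalg.eigvalsh(L)
    log_pdet = np.sum(np.log(eigvals[eigvals > tol]))  # skip the zero eigenvalue(s)
    return float(log_pdet - np.trace(S @ L))


# Usage: a 3-node path graph with edge weights 1 and 2.
W = np.array([[0, 1, 0], [1, 0, 2], [0, 2, 0]])
L = laplacian_from_weights(W)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))  # 200 samples of 3 node features (hypothetical data)
S = X.T @ X / X.shape[0]           # sample covariance, mean assumed to be zero
print(laplacian_gaussian_loglik(L, S))
```

As a sanity check, the nonzero eigenvalues of this path Laplacian multiply to 6 (three nodes times the weight 1 x 2 of its single spanning tree), so with S = 0 the sketch returns log 6.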
Motivated by extreme events or outliers in financial datasets, we consider the problem of
learning a graph under heavy-tailed assumptions. Heavy-tailed statistical distributions have
long been regarded as more realistic models of the data-generating process in financial
markets than their Gaussian counterparts. Nonetheless, mathematical
difficulties, including nonconvexity, involved in estimating graphs in heavy-tailed settings
pose a significant challenge to the practical design of graph learning algorithms. To address
this challenge, we present graph learning estimators based on the Markov random field
framework and assuming a Student-t data-generating process. We design scalable numerical
algorithms based on the alternating direction method of multipliers (ADMM) to learn both
connected and k-component graphs, and we establish their theoretical convergence guarantees.
The proposed methods
outperform state-of-the-art benchmarks in an extensive series of experiments on publicly
available data from US stock markets, foreign exchange markets, and cryptocurrencies.
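Two building blocks behind heavy-tailed and k-component graph learning can be illustrated in a few lines. Under a multivariate Student-t model with nu degrees of freedom and precision L, majorizing the negative log-likelihood around a current estimate turns each sample x_t into a weighted sample with weight w_t = (p + nu) / (nu + x_t^T L x_t), so observations far out in the tails are down-weighted before a Gaussian-like update with the weighted covariance S_w = (1/n) sum_t w_t x_t x_t^T is solved. Separately, a graph has k connected components exactly when its Laplacian has k zero eigenvalues, i.e. rank(L) = p - k, which is what makes k-component constraints expressible through the spectrum. The sketch assumes NumPy and uses hypothetical helper names; it shows these ingredients only, not the ADMM estimators developed in the thesis, whose exact constants may differ.

```python
import numpy as np


def laplacian(W):
    """Combinatorial Laplacian diag(W 1) - W of a symmetric nonnegative weight matrix."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W


def student_t_weights(X, L, nu):
    """Per-sample weights w_t = (p + nu) / (nu + x_t^T L x_t) of a multivariate
    Student-t model with nu degrees of freedom and precision matrix L."""
    p = X.shape[1]
    quad = np.einsum("ti,ij,tj->t", X, L, X)  # x_t^T L x_t for each row t
    return (p + nu) / (nu + quad)


def weighted_covariance(X, w):
    """S_w = (1/n) sum_t w_t x_t x_t^T, the statistic used after reweighting."""
    return (X * w[:, None]).T @ X / X.shape[0]


def num_components(L, tol=1e-8):
    """Number of connected components = multiplicity of the zero eigenvalue of L."""
    return int(np.sum(np.linalg.eigvalsh(L) < tol))


# Usage: a 4-node unit-weight path graph, 50 Gaussian samples plus one extreme row.
rng = np.random.default_rng(0)
W_path = np.diag(np.ones(3), k=1)
W_path = W_path + W_path.T
L_path = laplacian(W_path)
X = np.vstack([rng.standard_normal((50, 4)), 10 * np.array([1.0, -1.0, 1.0, -1.0])])
w = student_t_weights(X, L_path, nu=4.0)
S_w = weighted_covariance(X, w)

# Two disjoint edges (0-1 and 2-3): a 2-component graph.
W_two = np.zeros((4, 4))
W_two[0, 1] = W_two[1, 0] = W_two[2, 3] = W_two[3, 2] = 1.0
print(num_components(L_path), num_components(laplacian(W_two)))  # prints: 1 2
```

Here the extreme row has x_t^T L x_t = 1200 on the path graph, so its weight is 8/1204, far below the weights of the Gaussian samples.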
A recurrent task in stock markets is that of grouping stocks into sectors. In the industry,
companies such as Standard & Poor's (also known as S&P) periodically release classification
systems that map each stock to the sector it belongs to. In practice, however, a stock may
belong to or affect several sectors. Such information would be helpful for investors who wish
to diversify their investments across different industries. Motivated by this challenge, we
investigate the problem of learning an undirected,
weighted bipartite graph under the Gaussian Markov random field model, for which we present
an optimization formulation along with an efficient algorithm based on projected gradient
descent. Motivated by practical applications in which outliers or heavy-tailed events are present,
we extend the proposed learning scheme to the case in which the data follow a multivariate
Student-t distribution. As a result, the optimization program is no longer convex; nevertheless,
we develop a provably convergent iterative algorithm based on the majorization-minimization
framework. Finally, we propose an efficient and provably convergent algorithm for learning
k-component bipartite graphs that leverages rank constraints on the underlying graph Laplacian
matrix. The proposed estimators outperform state-of-the-art methods for bipartite graph
learning, as evidenced by real-world experiments using financial time series data.
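For the bipartite case, a graph with r nodes on one side and q on the other can be parameterized by a nonnegative r x q weight matrix B, whose Laplacian has the block form [[diag(B1), -B], [-B^T, diag(B^T 1)]]. A projected gradient step moves B against the gradient of a Gaussian objective and then clips negative entries to zero, which is the Euclidean projection onto B >= 0. The sketch below assumes the objective f(B) = tr(S L(B)) - log det(L(B) + J) with J = 11^T / p, which equals tr(SL) - log pdet(L) for connected graphs; for the edge between left node i and right node k = r + j, the partial derivative is (S_ii + S_kk - 2 S_ik) - (M_ii + M_kk - 2 M_ik) with M = (L + J)^(-1). The helper names, the fixed step size and the iteration count are hypothetical choices for illustration, not the algorithm or settings of the thesis.

```python
import numpy as np


def bipartite_laplacian(B):
    """Laplacian of a bipartite graph; B[i, j] >= 0 weights the edge (left i, right j)."""
    B = np.asarray(B, dtype=float)
    top = np.hstack([np.diag(B.sum(axis=1)), -B])
    bottom = np.hstack([-B.T, np.diag(B.sum(axis=0))])
    return np.vstack([top, bottom])


def objective(B, S):
    """f(B) = tr(S L) - log det(L + J), with J = 11^T / p (assumes a connected graph)."""
    L = bipartite_laplacian(B)
    p = L.shape[0]
    _, logdet = np.linalg.slogdet(L + np.full((p, p), 1.0 / p))
    return float(np.trace(S @ L) - logdet)


def gradient(B, S):
    """Partial derivatives of f with respect to every entry B[i, j]."""
    r = B.shape[0]
    p = S.shape[0]
    M = np.linalg.inv(bipartite_laplacian(B) + np.full((p, p), 1.0 / p))
    ds, dm = np.diag(S), np.diag(M)
    # Edge (i, r + j) adds B[i, j] * a a^T to L with a = e_i - e_{r+j}, so
    # d tr(S L) / dB[i, j] = a^T S a and d log det(L + J) / dB[i, j] = a^T M a.
    grad_trace = ds[:r, None] + ds[None, r:] - 2.0 * S[:r, r:]
    grad_logdet = dm[:r, None] + dm[None, r:] - 2.0 * M[:r, r:]
    return grad_trace - grad_logdet


def projected_gradient_descent(S, r, eta=0.01, iters=10, B0=None):
    """Fixed-step projected gradient descent over B >= 0 (hypothetical settings)."""
    q = S.shape[0] - r
    B = np.full((r, q), 0.5) if B0 is None else np.array(B0, dtype=float)
    for _ in range(iters):
        # Gradient step followed by projection onto the nonnegative orthant.
        B = np.maximum(B - eta * gradient(B, S), 0.0)
    return B


# Usage: 200 zero-mean samples of 2 + 3 = 5 variables (hypothetical data).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
S = X.T @ X / X.shape[0]
B0 = np.full((2, 3), 0.5)
B = projected_gradient_descent(S, r=2, B0=B0)
print(objective(B0, S), objective(B, S))
```

Because every iterate is built through bipartite_laplacian, the bipartite block structure holds by construction. With a small enough fixed step the objective should decrease; a practical implementation would rather use a line search or a step size tied to the curvature of the log-determinant term.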