THESIS
2022
1 online resource (xiii, 50 pages) : illustrations (some color)
Abstract
Tremendous amounts of data are created by global financial exchanges every day, and such time-series data needs to be analyzed in real time to extract maximum value. Moreover, with the continuous progress of machine learning technology in recent years, more and more machine learning models are being applied to financial data. Such scenarios require new computing frameworks, as traditional frameworks such as pandas and TA-Lib have shown performance and adaptability problems with financial data.
In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for time-series data. Miscellaneous acceleration techniques, such as streaming algorithms, SIMD instruction sets, and memory optimization, are used, and various functions for time-series data, such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise and column-wise operations, shape transformation, and alignment, are also implemented.
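The streaming approach mentioned above can be illustrated with a minimal sketch. This is not HXPY's actual API; the function below is a hypothetical illustration of the general technique, in which a rolling mean is maintained incrementally with a running sum so each incoming tick costs constant time instead of re-scanning the whole window.

from collections import deque

def rolling_mean_stream(values, window):
    """Single-pass rolling mean: each new observation updates a running
    sum in O(1) rather than re-summing the window."""
    buf = deque()   # last `window` observations
    total = 0.0     # running sum of the buffer
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()
        # emit NaN until the first full window is available
        out.append(total / window if len(buf) == window else float("nan"))
    return out

# usage: a 3-tick rolling mean over a short (made-up) price series
prices = [10.0, 11.0, 12.0, 13.0, 11.5]
print(rolling_mean_stream(prices, window=3))
# [nan, nan, 11.0, 12.0, 12.166666666666666]

The same incremental idea extends to other window statistics (sums, variances, extrema with a monotonic deque), which is what makes streaming evaluation attractive for real-time financial feeds.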
Although HXPY is still at a relatively preliminary stage, the results of the benchmark and incremental analysis show that HXPY performs better than its counterparts. On data sets ranging from MiBs to GiBs, it significantly outperforms other in-memory computing rivals.