Data Science Portfolio
Education
- Ph.D., Mathematics | City University of New York (2018-present)
- B.S., Mathematics | University of Rochester (2013-2017)
- B.A., Financial Economics | University of Rochester (2013-2017)
Technical Skills:
- Python, SQL, R, scikit-learn, PyTorch, PyTorch Geometric, Pandas, MATLAB
Projects
GNN Predictive Modelling and Graph Data Mining
- Proposed a novel model that integrates graph discovery data mining techniques with Graph Neural Networks (GNNs) to overcome limitations in graph structures for specific modeling tasks. Using graph discovery models also removed the need for domain expertise in graph generation, saving significant time and effort compared to manual graph creation.
- Achieved at least a 15% improvement over the baseline Light Gradient Boosting Machine (LightGBM) and neural network models on a short-term intraday Realized Volatility (RV) prediction task, using 150 million records of Limit Order Book (LOB) and trade data.
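For readers unfamiliar with the prediction target, here is a minimal sketch of realized volatility, assuming the standard definition (square root of the sum of squared log returns) applied to a weighted average price built from the top of the order book. The function names `wap` and `realized_volatility` are illustrative and are not taken from the project code.

```python
import numpy as np

def wap(bid_price, ask_price, bid_size, ask_size):
    """Weighted average price from the best bid and ask (illustrative helper)."""
    return (bid_price * ask_size + ask_price * bid_size) / (bid_size + ask_size)

def realized_volatility(prices):
    """Square root of the sum of squared log returns over one time window."""
    log_returns = np.diff(np.log(prices))
    return float(np.sqrt(np.sum(log_returns ** 2)))

# Toy order book snapshots (made-up numbers, not project data).
bid = np.array([99.0, 99.5, 99.2, 99.8])
ask = np.array([101.0, 100.5, 100.8, 100.2])
bid_size = np.array([10.0, 20.0, 15.0, 10.0])
ask_size = np.array([10.0, 10.0, 15.0, 30.0])

prices = wap(bid, ask, bid_size, ask_size)
rv = realized_volatility(prices)
print(rv)
```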
This is a network discovered using the Glasso algorithm from the Kaggle Optiver Realized Volatility Prediction dataset.
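As a rough illustration of graph discovery with Glasso, the sketch below fits scikit-learn's `GraphicalLasso` to synthetic data and reads edges off the nonzero off-diagonal entries of the estimated precision matrix. The data, the penalty `alpha=0.1`, and the `1e-4` edge threshold are assumptions chosen for the example. This is not the project's actual pipeline.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n_obs, n_assets = 500, 5

# Synthetic "returns": asset 1 is made to depend on asset 0 (assumed toy data).
x = rng.standard_normal((n_obs, n_assets))
x[:, 1] += 0.8 * x[:, 0]
x = (x - x.mean(axis=0)) / x.std(axis=0)  # standardize each column

# The L1 penalty shrinks weak partial correlations to exactly zero.
model = GraphicalLasso(alpha=0.1).fit(x)
precision = model.precision_

# An edge (i, j) is kept when the precision entry is non-negligible.
adjacency = (np.abs(precision) > 1e-4) & ~np.eye(n_assets, dtype=bool)
edges = [(i, j) for i in range(n_assets)
         for j in range(i + 1, n_assets) if adjacency[i, j]]
print(edges)
```

The resulting edge list can then serve as the graph structure for a GNN, which is the role graph discovery plays in the project description above.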
Graph Discovery Python Library
Recommender System for Advertising
- Developed a Graph Neural Network (GNN) based recommender system that connects businesses with social media influencers by learning from several aspects of their social media behavior, such as text captions, images, and the social relationships among influencers and brands.
- Specifically, the model learned representations of businesses and social media influencers using a Light Graph Convolution Network (LightGCN).
- Collected and cleaned a 50 GB dataset of raw user profiles and posts from over 1,427 businesses and 16,774 influencers.
- Extracted features from social media user behaviors by employing pretrained Natural Language Processing (NLP) and computer vision (CV) models.
This is a network consisting of selected brands and influencers from our dataset. An interactive version of this network is also available.
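The LightGCN step mentioned above can be sketched in a few lines of NumPy, assuming a binary brand-influencer interaction matrix and randomly initialized base embeddings. In a real model the base embeddings would be trained (LightGCN typically uses a ranking loss), and the function name `lightgcn_embeddings` and the toy data are hypothetical.

```python
import numpy as np

def lightgcn_embeddings(interactions, emb0, num_layers=2):
    """Propagate embeddings over the bipartite graph, LightGCN style.

    interactions: (n_brands, n_influencers) binary matrix.
    emb0: (n_brands + n_influencers, dim) base embeddings.
    """
    n_b, n_i = interactions.shape
    n = n_b + n_i
    # Symmetric adjacency of the bipartite brand-influencer graph.
    adj = np.zeros((n, n))
    adj[:n_b, n_b:] = interactions
    adj[n_b:, :n_b] = interactions.T
    # Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes stay at zero.
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    norm_adj = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    # No feature transforms or nonlinearities: each layer is a plain propagation.
    layers = [emb0]
    for _ in range(num_layers):
        layers.append(norm_adj @ layers[-1])
    final = np.mean(layers, axis=0)  # uniform average over layers 0..K
    return final[:n_b], final[n_b:]

rng = np.random.default_rng(0)
interactions = np.array([[1, 0, 1, 0],
                         [0, 1, 1, 0],
                         [0, 0, 0, 1]], dtype=float)  # 3 brands, 4 influencers
emb0 = rng.standard_normal((7, 4))
brand_emb, infl_emb = lightgcn_embeddings(interactions, emb0)
scores = brand_emb @ infl_emb.T  # higher score = stronger recommendation
print(scores.shape)
```

Ranking the influencers for a brand then amounts to sorting the corresponding row of `scores`.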
Latent Stock Network Discovery Dashboard
Estimation and Optimization of Time Series Models
- Used a Wasserstein Distance Estimator (WDE), a Simulated Minimum Distance (SMD) estimator that minimizes the Wasserstein distance between the model-simulated distribution and the empirical distribution.
- The method addressed the common problem of analytically intractable likelihood functions in complicated time series models, producing accurate estimates where the Maximum Likelihood Estimator (MLE) fell short.
- The method outperformed existing estimation methods in both accuracy and computation time (2-100x faster). Estimated models include an AR(1) model, an ARMA(2, 2)-ARCH(2) model, random walks with a structural break, and a model fitted to 10 years of S&P 500 data.
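The idea behind a simulated minimum distance estimator can be illustrated on the simplest case, an AR(1) model. The sketch below is a toy version under strong assumptions: unit-variance Gaussian shocks, a grid search restricted to nonnegative coefficients, and a comparison of marginal distributions only (which cannot identify the sign of the coefficient). The helper names and settings are hypothetical and are not the project's implementation.

```python
import numpy as np

BURN_IN = 200  # initial draws discarded so the series is close to stationary

def simulate_ar1(phi, shocks):
    """Simulate y_t = phi * y_{t-1} + e_t and drop the burn-in period."""
    y = np.empty(len(shocks))
    prev = 0.0
    for t, e in enumerate(shocks):
        prev = phi * prev + e
        y[t] = prev
    return y[BURN_IN:]

def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1-D samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def estimate_phi(observed, grid, seed=1):
    """Pick the phi whose simulated sample is closest to the observed one."""
    rng = np.random.default_rng(seed)
    # The same shocks are reused for every candidate (common random numbers),
    # so the objective changes smoothly with phi.
    shocks = rng.standard_normal(len(observed) + BURN_IN)
    distances = [wasserstein_1d(observed, simulate_ar1(phi, shocks))
                 for phi in grid]
    return float(grid[int(np.argmin(distances))])

# "Observed" data generated with a known coefficient for demonstration.
rng = np.random.default_rng(0)
observed = simulate_ar1(0.6, rng.standard_normal(4000 + BURN_IN))

grid = np.linspace(0.0, 0.95, 96)  # candidate values in steps of 0.01
phi_hat = estimate_phi(observed, grid)
print(phi_hat)
```

No likelihood is evaluated anywhere: the estimator only needs the ability to simulate from the model, which is why this family of methods applies when the likelihood is intractable.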
Work Experience
Adjunct Professor @ Hunter College (August 2020 - Present)
- Instructor for courses including Statistics and Linear Algebra. Course materials cover experimental design, hypothesis testing, dimension reduction, data science applications, and Python demonstrations.
Index Research Intern @ Nasdaq (June 2016 - August 2016)
- Supported internal cross-functional teams and external ETF fund managers through data collection, statistical analysis, and modeling, with a primary focus on summarizing index behavior using summary statistics and linear regression.
- Researched, developed, and presented the Canada Dividend Achiever Index report including index description, calculation, evaluation, maintenance, and rebalancing, tailored for sales initiatives and investors’ comprehensive analysis.
Asset Management Intern @ Huatai Insurance (June 2015 - August 2015)
- Prepared research reports on investment opportunities in Funds of Funds (FoFs).
- Contributed to the development and deployment of high-quality accounting automation to expedite delivery and improve efficiency.