Lingfeng Zhu

EDUCATION


University of Wisconsin-Madison

Master's Degree; Data Science at the School of Computer, Data & Information Sciences.

  • Honors/Awards: Exchange & Visiting International Student Academic Excellence Award (2019), etc.
  • Relevant Coursework: Deep Learning, Machine Learning, Data Science Computing Project, Data Science Practicum, Statistical Learning, Regression Analysis, Statistical Methods, R Programming, etc.

Wuhan University

Bachelor's Degree; Statistics at the School of Mathematics and Statistics.

  • Honors/Awards: Outstanding Student Scholarship (2016), Studying Abroad Special Scholarship (2018), Freshman Scholarship (2015), Mathematical Contest in Modeling Honorable Mention (2017), Excellent Volunteer (2015), etc.
  • Relevant Coursework: Mathematical Analysis, C Programming Language, Numerical Analysis, Probability Theory, Mathematical Modelling, Mathematical Statistics, Real Analysis, Data Structure and Algorithm, Database Programming Technology, etc.

PROJECT


Yelp Review Sentiment Analysis

Research Leader supervised by Prof. Hyunseung Kang

  • Performed statistical and sentiment analysis on 192,609 Yelp reviews using a high-throughput computing system and visualized the results (over 1,000 lines of code)
  • Used TF-IDF to extract 100 keywords related to the review ratings
  • Applied word embedding to the reviews using a pre-trained GloVe model (mapping a 400,001-word vocabulary to 50-dimensional vectors)
  • Built an RNN model with LSTM units and dropout layers, trained with the Adam optimizer, to predict a customer's rating of a Mexican business from their review
  • Built a web-based Shiny app presenting the analysis

Machine Learning Diagnosis of Musculoskeletal Diseases

Major Researcher for Graduation Project

  • Analyzed 40,895 human forelimb musculoskeletal X-ray images
  • Applied data augmentation, normalization, grayscale conversion, edge detection (Sobel and Canny operators) and histogram equalization using OpenCV
  • Trained machine learning models such as KNN, SVM, Random Forest, GBDT and Stacking on 36,808 samples using scikit-learn, and evaluated model performance on 3,197 test cases
  • Applied cross-validation and random search for hyperparameter tuning (increased test accuracy by 10%) and sped up training by more than 50% by reducing dimensionality with PCA
  • Evaluated the models using metrics such as the confusion matrix, ROC curve, AUC, etc.

Face-Painting Machine

Research Leader supervised by Prof. Sebastian Raschka

  • Trained CNN models (VGG, ResNet, Inception, etc.) on the LFW dataset using PyTorch
  • Implemented face recognition based on the FaceNet (triplet loss) method
  • Applied style transfer to generate portraits of a specific person in the style of famous paintings
  • The project report was highly praised by the professor and was displayed on his personal homepage

Emoji Auto-Generator

Research Leader

  • Applied word embedding using a pre-trained GloVe model
  • Trained an RNN model with LSTM units to automatically generate the corresponding emoji for a given text (sentences of about 10 words)
  • Tuned the model using the Adam optimizer and dropout layers

WHU 3D GIS Research Group

Researcher supervised by Prof. Jianya Gong

  • Analyzed Qiantang River tide data using deep learning methods and visualized the results
  • Implemented an NS-fluid real-time tide simulation program

RNA-Disease Data Analysis Research Group

Researcher supervised by Prof. Xing Chen

  • Developed a PWR-based algorithm to predict relationships between complex human diseases and lncRNAs
  • Proposed an improved algorithm based on the WBSMDA and LapRLS methods to predict disease-miRNA relationships

COMPETITION


iFLYTEK A.I. Developer Competition: Multimodal Emotion Analysis and Recognition Challenge

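The challenge database contains physiological, psychological and behavioral data in three modalities (speech, EEG and ECG) from 29 subjects under four induced emotional states (calm, happy, angry and sad). Teams in this track were required to recognize emotions from the ECG data.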

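  • Denoised the ECG signals using wavelet threshold denoising, and detected P-QRS-T waves using the Pan-Tompkins algorithm and a maximum-search algorithm
  • Extracted 153 time-domain features and 21 frequency-domain features from the ECG signals based on the P-QRS-T wave locations
  • Standardized the extracted features, tuned hyperparameters using random search and trained an XGBoost model; the model's final emotion recognition accuracy ranked 2nd in the track (out of 456 participating teams)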

Tencent Advertising Algorithm Competition 2020

Contestants explored a large amount of anonymized data from a real business setting and applied a variety of machine learning techniques to predict users' demographic features (age and gender) from their response behavior in the system.

  • Used Word2Vec to perform embedding/dimension reduction on anonymized IDs (creative_id, ad_id and advertiser_id), reducing these features to 128 dimensions
  • Built BiLSTM models to predict a user's age and gender from their response behavior, and trained the models using 5-fold cross-validation
  • Achieved a best test score of 1.42 and a best rank of 84/900+

EXPERIENCE

Zheshang Securities Research Institute

Industry Research, Machine Group

  • Analyzed financial data of companies in the relevant industries, tabulated the results and completed research reports
  • Automated generation of the Industry Daily News using a Python web crawler, reducing the time spent producing it by 2 hours

DALI Technology

Data Analysis, Directional Business Department

  • Analyzed the accuracy of the positioning instruments and fitted a generalized linear model (Poisson regression) to determine the optimal running time for each type of instrument
  • Recorded the instruments' test accuracy and summarized the test data
  • Compiled the company's internal technical documents

SKILL


  • Programming languages and computing: Python, R, C/C++, Java, shell scripting, HPC, HTC
  • Frameworks: PyTorch, scikit-learn, TensorFlow, PaddlePaddle
  • Database: MySQL