Utilizing LARS for Regression Analysis on SNP Data and Wolbachia Infection Preprocessing
In this project, we perform regression analysis using the LARS (Least Angle Regression) algorithm on SNP data from 163 subjects, each with 5,222,888 SNPs. We preprocess the data by recoding and then normalizing both the SNP sequences and the Wolbachia infection statuses. The raw SNP values (0, 0.5, 1, N) are recoded as 0, 1, 2 and 1 respectively, and X is normalized with a multithreaded algorithm (2048 threads) for efficiency. The results show that a SNP's importance is not related to the number of zeros it contains, which suggests that the reference sequence is not reliable.
Presentation Transcript
Regression Analysis • Dataset • Data Preprocessing • Normalization • LARS
Dataset • X: the SNP sequences of 163 subjects; each sequence has 5,222,888 SNPs • Y: the Wolbachia infection status table
Preprocessing of X • As described in the email, obtain a data array with the values 0, 0.5, 1 and N • Recode the values: 0 -> 0; 0.5 -> 1; 1 -> 2; N -> 1 • The recoded X (dataset) is available at http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/x.rar
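The recoding on this slide can be sketched in a few lines of Python. This is only an illustration of the mapping listed above; the function name `encode_snp_row` and the assumption that the raw values arrive as text tokens are hypothetical and not taken from the original pipeline.

```python
# Mapping copied from the slide: 0 -> 0, 0.5 -> 1, 1 -> 2, N -> 1.
SNP_CODES = {"0": 0, "0.5": 1, "1": 2, "N": 1}


def encode_snp_row(tokens):
    """Recode one subject's raw SNP tokens (assumed to be strings) as integers."""
    return [SNP_CODES[token] for token in tokens]


# Tiny made-up example row, not real data.
print(encode_snp_row(["0", "0.5", "1", "N"]))  # [0, 1, 2, 1]
```

A dictionary lookup keeps the mapping in one place, and an unexpected token raises a `KeyError` instead of being silently recoded.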
Preprocessing of Y • Choose the sheet with the Wolbachia status • Set the values: y -> 1, n -> 0 (Y is normalized afterwards, so y -> 2, n -> 0 would give the same result) • Y is available here: • http://gdm.fudan.edu.cn/attach/lasso_on_GU/y.txt
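The remark that y -> 1 and y -> 2 give the same result after normalization can be checked with a small sketch. It assumes that "normalized" means centering to zero mean and scaling to unit Euclidean norm, which is the usual convention for LARS but is not stated on the slides; the `normalize` helper and the status list are made up for illustration.

```python
import math


def normalize(values):
    """Center to zero mean and scale to unit Euclidean norm (assumed convention)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        return centered  # constant vector: nothing to scale
    return [c / norm for c in centered]


# Made-up infection statuses, not real data.
status = ["y", "n", "n", "y", "n"]
y_coded_1 = normalize([1.0 if s == "y" else 0.0 for s in status])
y_coded_2 = normalize([2.0 if s == "y" else 0.0 for s in status])

# Both codings agree after normalization, up to floating-point rounding.
print(all(math.isclose(a, b) for a, b in zip(y_coded_1, y_coded_2)))
```

Doubling the code for "y" doubles both the centered values and their norm, so the ratio is unchanged.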
Normalize X and Y • Use a multithreaded algorithm (2048 threads) to normalize X (the normalized X is larger than 8 GB) • Normalize Y • The normalized X and Y are packaged here: http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/normalize.rar
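Because each SNP column can be normalized independently, the work splits naturally across threads. The sketch below shows that idea with Python's standard thread pool. It is not the 2048-thread implementation from the slides, it assumes the same centering and unit-norm convention as is usual for LARS, and the names `normalize_column` and `normalize_columns` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def normalize_column(column):
    """Center one SNP column and scale it to unit Euclidean norm."""
    mean = sum(column) / len(column)
    centered = [v - mean for v in column]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        return centered  # constant column: leave it as all zeros
    return [c / norm for c in centered]


def normalize_columns(columns, workers=4):
    """Normalize every column, handing the columns out to a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the columns.
        return list(pool.map(normalize_column, columns))


# Two made-up SNP columns over four subjects (codes 0, 1, 2).
normalized = normalize_columns([[0, 1, 2, 1], [2, 2, 0, 0]])
print(normalized[1])
```

In CPython the global interpreter lock limits the speed-up for pure-Python arithmetic like this, so the sketch shows the decomposition by column rather than the performance of the original run.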
LARS • Run LARS for 163 iterations • Each line of the result contains: the max angle between the remaining error (the residual) and the 5,222,888 SNP vectors, and the SNP at which that max angle is reached in that iteration • The result is here: http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/result.txt
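The per-iteration selection that the result file records can be illustrated as follows. In LARS, each step picks the predictor with the largest absolute correlation with the current residual, which for unit-norm columns is the one at the smallest angle to it. The slide's "max angle" is read here as that extreme value, which is an interpretation. The sketch covers only this selection step, not the full LARS coefficient update, and the function name and toy vectors are hypothetical.

```python
import math


def most_correlated_column(columns, residual):
    """Return (index, correlation) of the column with the largest |correlation|.

    Columns are assumed to be centered and scaled to unit norm, so the
    inner product with the residual equals |residual| * cos(angle).
    """
    best_index, best_corr = -1, 0.0
    for j, column in enumerate(columns):
        corr = sum(c * r for c, r in zip(column, residual))
        if abs(corr) > abs(best_corr):
            best_index, best_corr = j, corr
    return best_index, best_corr


# Two made-up unit-norm, zero-mean columns and a made-up residual.
columns = [[0.5, -0.5, 0.5, -0.5], [0.5, 0.5, -0.5, -0.5]]
residual = [0.6, 0.2, -0.2, -0.6]

index, corr = most_correlated_column(columns, residual)
residual_norm = math.sqrt(sum(r * r for r in residual))
angle_degrees = math.degrees(math.acos(corr / residual_norm))
print(index, corr, angle_degrees)
```

With 5,222,888 columns this scan dominates the cost of each iteration, which is presumably why the original work parallelized it.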
Findings • Is a SNP's importance related to how many 0s it contains? • The result file http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/rstAnd0s.txt shows: NO • This means the reference sequence is not reliable.
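One way to examine the question on this slide is to count the zeros in each SNP column and correlate those counts with a per-SNP importance score. The slides do not say how the comparison in rstAnd0s.txt was made, so this is only a possible approach; the helper names, the toy matrix and the importance values are all hypothetical.

```python
import math


def zero_counts(rows):
    """For each SNP column, count how many subjects have the code 0."""
    n_snps = len(rows[0])
    return [sum(1 for row in rows if row[j] == 0) for j in range(n_snps)]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)


# Made-up recoded matrix: 3 subjects (rows) by 3 SNPs (columns).
rows = [[0, 1, 0], [0, 2, 1], [1, 2, 0]]
# Made-up importance scores, one per SNP (not taken from result.txt).
importance = [0.3, 0.9, 0.5]

counts = zero_counts(rows)
print(counts, pearson(counts, importance))
```

A coefficient near zero on the real data would be consistent with the "NO" reported above, while a value near 1 or -1 would contradict it.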