Datawhale AI Spring Training Camp: AI4S Protein Track Study Notes
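Workflow
Register for the competition at http://competition.sais.com.cn/competitionDetail/532313/format?spm=CHANNEL-0001
After opening the page, create an account, enter your mobile number, and complete real-name verification through Alipay; you can then sign up for the track.
The dataset, baseline code, and other official materials can be downloaded only after you have signed up for the track.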
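- Register for the competition
- Download and install Docker, then start it
- Use free cloud compute to train the model by running the baseline training code
    git lfs install
    git clone https://www.modelscope.cn/datasets/Datawhale/sais_third_synthetic_baseline.git
- Enable the Alibaba Cloud container image service and create an image repository named sais_synthetic
- Download the five files, including the trained model
    model.pkl, ml_baseline.py, Dockerfile, requirements.txt, run.sh
- Build the Docker image locally and push it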
    docker login --username=xx xxxx
  (about 3 minutes)
    docker build -t sais_synthetic:v1 .
  (about 5 minutes)
    docker tag sais_synthetic:v1 xxxxx/sais_synthetic:v1
    docker tag sais_synthetic:v1 crpi-yimn1cg16ys23bar.cn-hangzhou.personal.cr.aliyuncs.com/sais_synthetic_wumao/sais_synthetic:v1
    docker push xxxxx/sais_synthetic:v1
    docker push crpi-yimn1cg16ys23bar.cn-hangzhou.personal.cr.aliyuncs.com/sais_synthetic_wumao/sais_synthetic:v1
- Then submit the image and get a score
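The tag and push steps only work when both commands use exactly the same remote image name. Below is a minimal sketch that assembles the two commands from their parts, so the name cannot drift between them. The helper `build_push_commands` and its parameter names are hypothetical (they are not part of the baseline), and the sketch only prints the commands rather than running Docker.

```python
import shlex


def build_push_commands(local_image, registry, namespace, repository, tag):
    """Return the `docker tag` and `docker push` commands for one image.

    Hypothetical helper for illustration; it does not call Docker itself.
    """
    remote = f"{registry}/{namespace}/{repository}:{tag}"
    return [
        ["docker", "tag", local_image, remote],
        ["docker", "push", remote],
    ]


# Values taken from the example commands in these notes.
commands = build_push_commands(
    local_image="sais_synthetic:v1",
    registry="crpi-yimn1cg16ys23bar.cn-hangzhou.personal.cr.aliyuncs.com",
    namespace="sais_synthetic_wumao",
    repository="sais_synthetic",
    tag="v1",
)

for command in commands:
    # shlex.join renders the argument list as a copy-pasteable shell line.
    print(shlex.join(command))
```

To actually run the commands, each list could be passed to `subprocess.run(command, check=True)` after `docker login` has succeeded.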
Generating model.pkl
model.pkl is produced by a Jupyter notebook. Open the cloned Datawhale baseline repository.
It contains ml_baseline.ipynb; running this notebook generates model.pkl.
!pip install gensim

import pickle
import gensim
import gensim.models
import os
import sys
import random
import numpy as np
import pandas as pd
from joblib import load, dump
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report

# Load the public training data: a list of records with "sequence" and "label".
with open("WSAA_data_public.pkl", "rb") as f:
    datas = pickle.load(f)

# Train a Word2Vec embedding with a random seed and a randomly chosen vector size.
# Each sentence is a space-joined string, so gensim iterates over it character by character.
random_seed = random.randint(0, 10000)
model_w2v = gensim.models.Word2Vec(
    sentences=[' '.join(x["sequence"]) for x in datas],
    vector_size=random.choice([10, 20, 40, 50, 100]),
    min_count=1,
    seed=random_seed,
)

# Feature for residue idx: mean embedding of residues idx-2 .. idx+1 (clipped at the ends).
data_x = []
data_y = []
for data in datas:
    sequence = list(data["sequence"])
    for idx, (_, y) in enumerate(zip(sequence, data['label'])):
        data_x.append(model_w2v.wv[sequence[max(0, idx-2): min(len(sequence), idx+2)]].mean(0))
        data_y.append(y)

# Estimate per-residue performance with cross-validated predictions.
model = GaussianNB()
pred = cross_val_predict(model, data_x, data_y)
print(classification_report(data_y, pred))

# Refit on all data and save the classifier together with the Word2Vec model.
model = GaussianNB()
model.fit(data_x, data_y)
dump((model, model_w2v), "model.pkl")
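The feature step in the notebook averages the embeddings of a small window around each residue. The following is a minimal pure-Python sketch of that windowing, using the same slice bounds as the notebook (`idx-2` up to, but not including, `idx+2`). The function names and the two-dimensional `toy_embedding` values are invented for illustration and stand in for the trained Word2Vec vectors.

```python
from statistics import fmean


def context_window(sequence, idx, before=2, after=2):
    """Return sequence[idx-before : idx+after], clipped to the sequence ends."""
    start = max(0, idx - before)
    stop = min(len(sequence), idx + after)
    return sequence[start:stop]


def window_mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


# Hypothetical 2-dimensional embeddings, not real Word2Vec output.
toy_embedding = {
    "A": [1.0, 0.0],
    "C": [0.0, 1.0],
    "D": [1.0, 1.0],
    "E": [3.0, 3.0],
    "F": [2.0, 0.0],
}

sequence = list("ACDEF")
windows = ["".join(context_window(sequence, i)) for i in range(len(sequence))]
features = [
    window_mean([toy_embedding[r] for r in context_window(sequence, i)])
    for i in range(len(sequence))
]

print(windows)      # ['AC', 'ACD', 'ACDE', 'CDEF', 'DEF']
print(features[0])  # [0.5, 0.5]
```

Because the slice end is exclusive, the window is asymmetric: it covers two residues before the current one but only one after it.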
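Then push the generated model.pkl, together with the accompanying Dockerfile, scripts, and other files, to the Alibaba Cloud image registry as required; after that you can submit.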
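Before building the image, it can help to confirm that all five files listed earlier are in the build directory. A minimal sketch, assuming the files sit directly in the current directory; the helper `missing_files` is hypothetical and not part of the baseline.

```python
from pathlib import Path

# The five files these notes list for the image build.
REQUIRED_FILES = (
    "model.pkl",
    "ml_baseline.py",
    "Dockerfile",
    "requirements.txt",
    "run.sh",
)


def missing_files(build_dir="."):
    """Return the required file names that are absent from build_dir."""
    root = Path(build_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_files(".")
if missing:
    print("Missing before docker build:", ", ".join(missing))
else:
    print("All five files are present.")
```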
After installing Docker, build the image:
docker build -t sais_synthetic:v2 .
docker images
Push:
docker tag sais_synthetic:v2 xxxxx/sais_synthetic:v1
docker push xxxxx/sais_synthetic:v1
# For example:
docker tag sais_synthetic:v1 crpi-yimn1cg16ys23bar.cn-hangzhou.personal.cr.aliyuncs.com/sais_synthetic_wumao/sais_synthetic:v1
docker push crpi-yimn1cg16ys23bar.cn-hangzhou.personal.cr.aliyuncs.com/sais_synthetic_wumao/sais_synthetic:v1
After the upload finishes, you can submit on the official site. Remember to copy the image's public network address.