import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
step1:数据理解和探索
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
Mr 757
Miss 260
Mrs 198
Master 61
Dr 8
Rev 8
Col 4
Ms 2
Major 2
Mlle 2
Don 1
Capt 1
Dona 1
Lady 1
Mme 1
Sir 1
Jonkheer 1
Name: pre_name, dtype: int64
通过仔细的对比数据,Master这里对应的年龄基本在12岁以下,所以我们前面用所有人的均值来替代缺失值是有问题的