# An Introduction to the Random Forest Algorithm (Python)

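### Contents

• 1 What Is a Random Forest

• 1.1 Ensemble Learning

• 1.2 Random Decision Trees

• 1.3 Random Forests

• 1.4 Voting

• 2 Why Use It

• 3 How to Use It

• 3.1 Variable Selection

• 3.2 Classification

• 3.3 Regression

• 4 A Simple Python Example

• Conclusion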

### 2 Why Use It

#### 2.1 A Mapping Example

```python
import numpy as np
import matplotlib.pyplot as plt

# 1000 points drawn uniformly from [1, 100]; y is log(x) plus Gaussian noise (sd 0.3)
x = np.random.uniform(1, 100, 1000)
y = np.log(x) + np.random.normal(0, .3, 1000)

# Noisy samples as dots, the true log curve as a line
plt.scatter(x, y, s=1, label="log(x) with noise")
plt.plot(np.arange(1, 100), np.log(np.arange(1, 100)), c="b", label="log(x) true function")
plt.xlabel("x")
plt.ylabel("f(x) = log(x)")
plt.legend(loc="best")
plt.title("A Basic Log Function")
plt.show()
```
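The plotting code above only draws the data. One way to see why this example motivates random forests is to fit the same kind of noisy log data with a forest and with a straight line and compare their errors on held-out points. The following is a minimal sketch, assuming scikit-learn's `RandomForestRegressor` and `LinearRegression`; the seed, the 75/25 split and `min_samples_leaf=5` are illustrative choices that are not part of the original example.

```python
# Sketch: compare a random forest and a straight-line fit on noisy log(x) data.
# Assumes scikit-learn; seed, split and min_samples_leaf are illustrative choices.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.uniform(1, 100, 1000)
y = np.log(x) + rng.normal(0, 0.3, 1000)

# scikit-learn expects a 2-D feature matrix: one row per sample, one column here
X = x.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# min_samples_leaf=5 keeps each tree's step function from chasing single noisy points
forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
forest.fit(X_train, y_train)
line = LinearRegression().fit(X_train, y_train)

# Mean squared error on the held-out points
forest_mse = mean_squared_error(y_test, forest.predict(X_test))
line_mse = mean_squared_error(y_test, line.predict(X_test))
print(f"random forest MSE: {forest_mse:.3f}")
print(f"linear fit MSE:    {line_mse:.3f}")
```

On data like this the forest's held-out error should sit close to the noise variance (0.3² = 0.09), while the straight line also pays for missing the curvature; exact numbers depend on the seed.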

### 4 A Simple Python Example

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Randomly mark roughly 75% of the rows as training data
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
train, test = df[df['is_train']], df[~df['is_train']]

features = df.columns[:4]  # the four measurement columns
clf = RandomForestClassifier(n_jobs=2)
# Integer labels; the category codes equal iris.target, so they index iris.target_names
y = train['species'].cat.codes
clf.fit(train[features], y)

# Map predicted codes back to species names and tabulate actual vs. predicted
preds = iris.target_names[clf.predict(test[features])]
print(pd.crosstab(test['species'], preds, rownames=['actual'], colnames=['preds']))
```

| actual \ preds | setosa | versicolor | virginica |
|---|---|---|---|
| setosa | 6 | 0 | 0 |
| versicolor | 0 | 16 | 1 |
| virginica | 0 | 0 | 12 |
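The crosstab above is a confusion table: correct predictions sit on the diagonal. As a rough follow-up touching the voting and variable-selection topics listed in the contents, the sketch below summarizes test performance as one accuracy number, prints scikit-learn's impurity-based `feature_importances_`, and checks how the forest combines its trees. Assumptions: it uses `train_test_split` with a fixed `random_state` and a stratified 75/25 split instead of the random `is_train` mask, so its numbers will differ from the table; and the "vote" shown is scikit-learn's soft vote, which averages each tree's class probabilities rather than counting hard majority votes.

```python
# Sketch: accuracy, feature importances and the forest's averaged "vote" on iris.
# Assumes scikit-learn; uses train_test_split instead of the random is_train mask.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# One-number summary of the confusion table: fraction of test rows predicted correctly
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {accuracy:.3f}")

# Impurity-based importance of each feature, largest first (the scores sum to 1)
ranked = sorted(zip(iris.feature_names, clf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name:20s} {score:.3f}")

# The "vote": average the class probabilities of the individual trees in
# clf.estimators_, then pick the class with the highest average probability
tree_probas = np.stack([tree.predict_proba(X_test) for tree in clf.estimators_])
mean_proba = tree_probas.mean(axis=0)
votes = clf.classes_[mean_proba.argmax(axis=1)]
print("matches predict_proba:", np.allclose(mean_proba, clf.predict_proba(X_test)))
print("matches predict:", np.array_equal(votes, clf.predict(X_test)))
```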