HW2

Submission requirements:

Please submit your solutions to our class website.


Part I: written assignment

(figure: training data table for Q1)

a) Compute the Information Gain for Gender, Car Type and Shirt Size. (15 points)

b) Construct a decision tree with Information Gain. (10 points)


Answer A

Overall entropy

| C0 | C1 |
| --- | --- |
| 10 | 10 |

By the entropy formula

$$E(s)=-\sum_{i=1}^{n} p_i \log_2 p_i$$

we have:

$$E(s)=-\frac{10}{20}\log_2\frac{10}{20}-\frac{10}{20}\log_2\frac{10}{20}=1$$

Next we compute the information gain of each attribute.

Gender attribute:

| Gender | C0 | C1 |
| --- | --- | --- |
| F | 6 | 4 |
| M | 6 | 4 |

$$E(F)=-\frac{6}{10}\log_2\frac{6}{10}-\frac{4}{10}\log_2\frac{4}{10}=0.97095$$

$$E(M)=-\frac{6}{10}\log_2\frac{6}{10}-\frac{4}{10}\log_2\frac{4}{10}=0.97095$$

By the information gain formula

$$g(D,A)=E(D)-E(D|A)$$

we obtain:

$$Gain(Gender)=E(s)-\frac{10}{20}E(F)-\frac{10}{20}E(M)=0.029$$

Similarly, for the CarType attribute:

| CarType | C0 | C1 |
| --- | --- | --- |
| F | 1 | 3 |
| L | 1 | 7 |
| S | 8 | 0 |

$$E(F)=-\frac{1}{4}\log_2\frac{1}{4}-\frac{3}{4}\log_2\frac{3}{4}=0.81127$$

$$E(L)=-\frac{1}{8}\log_2\frac{1}{8}-\frac{7}{8}\log_2\frac{7}{8}=0.543564$$

$$E(S)=-1\cdot\log_2 1=0$$

The final result is

$$Gain(CarType)=E(s)-p(F)E(F)-p(L)E(L)-p(S)E(S)=1-\frac{4}{20}E(F)-\frac{8}{20}E(L)-\frac{8}{20}E(S)=0.6203$$

For the Shirt Size attribute:

| Shirt Size | C0 | C1 |
| --- | --- | --- |
| E | 2 | 2 |
| L | 2 | 2 |
| M | 3 | 4 |
| S | 3 | 2 |

$$E(E)=-\frac{2}{4}\log_2\frac{2}{4}-\frac{2}{4}\log_2\frac{2}{4}=1$$

$$E(L)=-\frac{2}{4}\log_2\frac{2}{4}-\frac{2}{4}\log_2\frac{2}{4}=1$$

$$E(M)=-\frac{3}{7}\log_2\frac{3}{7}-\frac{4}{7}\log_2\frac{4}{7}=0.985228$$

$$E(S)=-\frac{3}{5}\log_2\frac{3}{5}-\frac{2}{5}\log_2\frac{2}{5}=0.97095$$

The final result is:

$$Gain(ShirtSize)=E(s)-p(E)E(E)-p(L)E(L)-p(S)E(S)-p(M)E(M)=1-\frac{4}{20}E(E)-\frac{4}{20}E(L)-\frac{5}{20}E(S)-\frac{7}{20}E(M)=0.012432$$

| Attribute | Gain |
| --- | --- |
| Gender | 0.029 |
| CarType | 0.620 |
| ShirtSize | 0.012 |
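These hand computations can be verified with a short Python sketch (the helper names `entropy` and `info_gain` are mine, not part of the assignment):

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(parent, partitions):
    """Information gain of splitting `parent` counts into `partitions`."""
    total = sum(parent)
    remainder = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - remainder

parent = [10, 10]  # (C0, C1) counts for the whole training set
gain_gender = info_gain(parent, [[6, 4], [6, 4]])                 # ~0.029
gain_cartype = info_gain(parent, [[1, 3], [1, 7], [8, 0]])        # ~0.620
gain_shirt = info_gain(parent, [[2, 2], [2, 2], [3, 4], [3, 2]])  # ~0.012
```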

Answer B

From these results, CarType yields the largest information gain with respect to the class attribute, so it is chosen as the root node of the decision tree.

CarType then has three branches, with entropies:

| Branch | Entropy |
| --- | --- |
| Family | 0.81 |
| Luxury | 0.54 |
| Sports | 0 |

We partition the data along these three branches. Note that the Sports branch is already pure, so it needs no further splitting.

The first split is therefore:

image-20221023165024486

For the second iteration, the data subset under the Family branch is:

image-20221023165335924

The subset under the Luxury branch is:

image-20221023165343968

Iterating the computation on the Family subset, its entropy is

$$E(f)=0.81127$$

The conditional entropy after splitting on Gender is

$$E(G)=-\frac{1}{4}\log_2\frac{1}{4}-\frac{3}{4}\log_2\frac{3}{4}=0.811278$$

and after splitting on ShirtSize:

$$E(S)=\frac{1}{4}[0+0+0+0]=0$$

| Attribute | Gain |
| --- | --- |
| Gender | 0 |
| ShirtSize | 0.81127 |

We therefore split on ShirtSize. Gender can provide no further information here, so the resulting children become leaf nodes directly.

image-20221023170534079

Now consider the subset under the Luxury branch:

| Gender | C0 | C1 |
| --- | --- | --- |
| F | 1 | 6 |
| M | 0 | 1 |

$$G(Gender)=E(luxury)-\frac{1}{8}\cdot 0-\frac{7}{8}\left(-\frac{1}{7}\log_2\frac{1}{7}-\frac{6}{7}\log_2\frac{6}{7}\right)=0.026$$

| Shirt Size | C0 | C1 |
| --- | --- | --- |
| E | 0 | 1 |
| L | 1 | 1 |
| M | 0 | 3 |
| S | 0 | 2 |

$$G(ShirtSize)=E(luxury)-\frac{2}{8}\cdot 1=0.294$$

So we split on ShirtSize.
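Recomputing these subset gains exactly (a small sketch; `entropy` and `info_gain` are my helper names) gives Gain(Gender) ≈ 0.026 and Gain(ShirtSize) ≈ 0.294, so ShirtSize is the better split either way:

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(parent, partitions):
    """Information gain of splitting `parent` counts into `partitions`."""
    total = sum(parent)
    remainder = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - remainder

luxury = [1, 7]  # (C0, C1) counts within the Luxury branch
gain_gender_lux = info_gain(luxury, [[1, 6], [0, 1]])                 # ~0.026
gain_shirt_lux = info_gain(luxury, [[0, 1], [1, 1], [0, 3], [0, 2]])  # ~0.294
```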

image-20221023171616919

The next node down is:

image-20221023171725261

At this point, this node can no longer yield any information gain, so we consider pruning it. The final decision tree is:

image-20221023172055218

Q2.

a) Design a multilayer feed-forward neural network (one hidden layer) for the data set in Q1. Label the nodes in the input and output layers. (10 points)

(b) Using the neural network obtained above, show the weight values after one iteration of the back propagation algorithm, given the training instance “(M, Family, Small)". Indicate your initial weight values and biases and the learning rate used. (10 points)


a. The feed-forward neural network is shown below:

image-20221030013510049

b. Choose initial weights.

Hidden layer:

| w11 | w21 | w31 | w41 | w51 | w61 | w71 | w81 | w91 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.1 | 0.2 | 0.3 | -0.4 | -0.1 | -0.2 | -0.3 | 0.4 | 0.5 |

| w12 | w22 | w32 | w42 | w52 | w62 | w72 | w82 | w92 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.2 | 0.4 | 0.1 | -0.2 | -0.4 | 0.3 | 0.2 | 0.4 | -0.2 |

Output layer:

| w1c | w2c |
| --- | --- |
| 0.7 | 0.5 |

The input vector, encoding the instance (M, Family, Small), is

1, 0, 1, 0, 0, 1, 0, 0, 0

Learning rate:

$$lr=0.01$$

Biases:

| $\theta_{h1}$ | $\theta_{h2}$ | $\theta_{c}$ |
| --- | --- | --- |
| -0.1 | 0.2 | 0.1 |

For each node, the net input is

$$I=\theta+\sum_{i=1}^{n}w_i v_i$$

where $\theta$ is the bias, $w_i$ the weight on an incoming branch, and $v_i$ the value on that branch.

The output is

$$O=\frac{1}{1+e^{-I}}$$

Forward-pass results:

| Unit | Net input | Output |
| --- | --- | --- |
| H1 | $-0.1+0.1\cdot1+0.3\cdot1-0.2\cdot1=0.1$ | $\frac{1}{1+e^{-0.1}}=0.525$ |
| H2 | $0.2+0.2\cdot1+0.1\cdot1+0.3\cdot1=0.8$ | $\frac{1}{1+e^{-0.8}}=0.690$ |
| C | $0.1+0.7\cdot0.525+0.5\cdot0.690=0.8125$ | $\frac{1}{1+e^{-0.8125}}\approx0.692$ |
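The forward pass can be reproduced with a few lines of Python (variable names are mine; the numbers follow the weight tables above):

```python
from math import exp

def sigmoid(x):
    """Logistic activation used at every unit."""
    return 1.0 / (1.0 + exp(-x))

# one-hot input for the instance (M, Family, Small): x1 = x3 = x6 = 1
x = [1, 0, 1, 0, 0, 1, 0, 0, 0]
w_h1 = [0.1, 0.2, 0.3, -0.4, -0.1, -0.2, -0.3, 0.4, 0.5]  # weights into H1
w_h2 = [0.2, 0.4, 0.1, -0.2, -0.4, 0.3, 0.2, 0.4, -0.2]   # weights into H2
b_h1, b_h2, b_c = -0.1, 0.2, 0.1                          # biases
w1c, w2c = 0.7, 0.5                                       # weights into C

o_h1 = sigmoid(b_h1 + sum(w * v for w, v in zip(w_h1, x)))  # ~0.5250
o_h2 = sigmoid(b_h2 + sum(w * v for w, v in zip(w_h2, x)))  # ~0.6900
o_c = sigmoid(b_c + w1c * o_h1 + w2c * o_h2)                # ~0.6926
```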

The output-layer error is

$$Err=O(1-O)(T-O)$$

where $O$ is the node output and $T$ the target value.

The hidden-layer error is

$$Err=O(1-O)\sum_{i=1}^{n}Err_i w_i$$

where $Err_i$ is the error propagated back from the layer above.

Errors at each node:

| Unit | Error |
| --- | --- |
| C | $0.692(1-0.692)(1-0.692)=0.066$ |
| H1 | $0.525(1-0.525)\cdot0.066\cdot0.7=0.012$ |
| H2 | $0.69(1-0.69)\cdot0.066\cdot0.5=0.007$ |

The bias update rule is

$$\theta_{new}=\theta_{old}+lr\cdot Err$$

and the weight update rule is

$$w_{new}=w_{old}+lr\cdot Err\cdot O$$

Update the weights and biases. Only the weights and biases on active input paths (those used in the chain-rule computation) are listed:

| Parameter | Update |
| --- | --- |
| $w_{1c}$ | $0.7+0.01\cdot(0.066\cdot0.525)=0.7003465$ |
| $w_{2c}$ | $0.5+0.01\cdot(0.066\cdot0.69)=0.5004554$ |
| $w_{11}$ | $0.1+0.01\cdot0.012\cdot1=0.10012$ |
| $w_{12}$ | $0.2+0.01\cdot0.007\cdot1=0.20007$ |
| $w_{31}$ | $0.3+0.01\cdot0.012\cdot1=0.30012$ |
| $w_{32}$ | $0.1+0.01\cdot0.007\cdot1=0.10007$ |
| $w_{61}$ | $-0.2+0.01\cdot0.012\cdot1=-0.19988$ |
| $w_{62}$ | $0.3+0.01\cdot0.007\cdot1=0.30007$ |
| $\theta_{c}$ | $0.1+0.01\cdot0.066=0.10066$ |
| $\theta_{h1}$ | $-0.1+0.01\cdot0.012=-0.09988$ |
| $\theta_{h2}$ | $0.2+0.01\cdot0.007=0.20007$ |
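The full update step can be scripted as well. This sketch starts from the rounded forward-pass outputs above and assumes the target label is $T=1$ (as implied by the $(1-O)$ factor in the error for C), so the last digits may differ slightly from unrounded arithmetic:

```python
lr = 0.01
t = 1.0  # assumed target for instance (M, Family, Small)

# rounded forward-pass outputs from the table above
o_h1, o_h2, o_c = 0.525, 0.690, 0.692
x1 = x3 = x6 = 1  # the active (one-hot) inputs

# errors: output layer first, then hidden layer
err_c = o_c * (1 - o_c) * (t - o_c)       # ~0.066
err_h1 = o_h1 * (1 - o_h1) * err_c * 0.7  # ~0.012
err_h2 = o_h2 * (1 - o_h2) * err_c * 0.5  # ~0.007

# updates: w += lr * Err * O, theta += lr * Err
w1c = 0.7 + lr * err_c * o_h1  # ~0.70034
w2c = 0.5 + lr * err_c * o_h2  # ~0.50045
w11 = 0.1 + lr * err_h1 * x1   # ~0.10011
w32 = 0.1 + lr * err_h2 * x3   # ~0.10007
theta_c = 0.1 + lr * err_c     # ~0.10066
```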

Q3

a) Suppose the fraction of undergraduate students who smoke is 15% and the fraction of graduate students who smoke is 23%. If one-fifth of the college students are graduate students and the rest are undergraduates, what is the probability that a student who smokes is a graduate student? (5 points)

b) Given the information in part (a), is a randomly chosen college student more likely to be a graduate or undergraduate student? (5 points)

c) Suppose 30% of the graduate students live in a dorm but only 10% of the undergraduate students live in a dorm. If a student smokes and lives in the dorm, is he or she more likely to be a graduate or undergraduate student? You can assume independence between students who live in a dorm and those who smoke. (5 points)

image-20221030021838522
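The three parts can also be checked numerically. This is a minimal sketch using only the probabilities stated in the problem (variable names are mine):

```python
# priors and conditionals given in the problem
p_grad, p_ugrad = 0.2, 0.8
p_smoke_grad, p_smoke_ugrad = 0.23, 0.15
p_dorm_grad, p_dorm_ugrad = 0.30, 0.10

# (a) Bayes' rule: P(grad | smoke)
p_smoke = p_grad * p_smoke_grad + p_ugrad * p_smoke_ugrad  # 0.166
p_grad_given_smoke = p_grad * p_smoke_grad / p_smoke       # ~0.277

# (b) with no other evidence the prior decides: P(ugrad) = 0.8 > 0.2

# (c) assuming dorm and smoking are conditionally independent,
#     compare the unnormalized posteriors
score_grad = p_grad * p_smoke_grad * p_dorm_grad      # 0.0138
score_ugrad = p_ugrad * p_smoke_ugrad * p_dorm_ugrad  # 0.0120
# score_grad > score_ugrad -> more likely a graduate student
```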

Part II: Lab

Question 1

Assume this supermarket would like to promote milk. Use the data in “transactions” as training data to build a decision tree (C5.0 algorithm) model to predict whether the customer would buy milk or not.

  1. Build a decision tree using data set “transactions” that predicts milk as a function of the other fields. Set the “type” of each field to “Flag”, set the “direction” of “milk” as “out”, set the “type” of COD as “Typeless”, select “Expert” and set the “pruning severity” to 65, and set the “minimum records per child branch” to be 95. Hand-in: A figure showing your tree. (5 points)

  2. Use the model (the full tree generated by Clementine in step 1 above) to make a prediction for each of the 20 customers in the “rollout” data to determine whether the customer would buy milk. **Hand-in:** your prediction for each of the 20 customers. (5 points)

  3. Hand-in: rules for positive (yes) prediction of milk purchase identified from the decision tree (up to the fifth level; the root is considered as level 1). (5 points)


1. The decision tree output is as follows:

image-20221029211843694

2. The full predictions are in the attached table1.csv:

```text
pasta water biscuits coffee brioches yoghurt frozen vegetables tunny beer tomato souce coke rice juices crackers oil frozen fish ice cream mozzarella tinned meat $C-milk $CC-milk
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0.633
0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0.56
1 1 1 0 1 1 1 1 1 0 1 0 0 0 0 0 1 0 0 1 0.613
1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0.566
1 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0.633
0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0.673
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.626
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.65
1 0 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 1 0.613
1 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 0 0 0 1 0.521
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.568
1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0.548
1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.53
1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0.629
1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0.53
1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0.633
0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0.55
1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0.633
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.633
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0.62
```

3. The output rule set is as follows:

image-20221029211817778

Q2

1. The confusion matrix of decision tree a (56):

image-20221030004711003

The confusion matrix of decision tree b (15):

image-20221030004511700

The confusion matrix of decision tree c (10):

image-20221030004759601

2. I will use decision tree b as the final prediction model; it achieves higher accuracy than trees a and c and therefore performs better.


3. The table contents are in the attached table2.csv:

| age | sex | region | income | married | children | car | save_act | current_act | mortgage | RECOMMEND PEP (Y/N) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 22 | 0 | 1 | 14000 | 0 | 3 | 0 | 1 | 1 | 0 | 0 |
| 34 | 1 | 0 | 33000 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| 47 | 0 | 0 | 16700 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |
| 54 | 1 | 1 | 43400 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 65 | 1 | 2 | 60000 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 37 | 0 | 0 | 27700 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 44 | 0 | 0 | 38784 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 20 | 1 | 0 | 10200 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
| 46 | 0 | 0 | 22000 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| 40 | 1 | 1 | 37400 | 1 | 2 | 0 | 1 | 1 | 0 | 1 |