A teacher wants to predict a student's letter grade (A, B, or C) on a test based on two features: hours studied ($x_1$) and number of absences ($x_2$). The classes are: A (class 1), B (class 2), C (class 3).
Training data:
| Student | Hours ($x_1$) | Absences ($x_2$) | Grade |
|---|---|---|---|
| Alice | 4 | 1 | A |
| Bob | 2 | 1 | B |
| Carol | 1 | 2 | C |
| Dan | 3 | 0 | A |
A multinomial logistic regression model has the following (partially trained) weight matrix $W$, where each column corresponds to a class and each row corresponds to the bias, $x_1$, and $x_2$:
| | A (class 1) | B (class 2) | C (class 3) |
|---|---|---|---|
| bias | $-1$ | $2$ | $1$ |
| $x_1$ | $1$ | $0$ | $-1$ |
| $x_2$ | $0$ | $-2$ | $1$ |
Note: Use the approximation $e \approx 3$ throughout (and recall that $e^0 = 1$).
Key formulas:
- Score for class $k$: $z_k = \mathbf{w}_k \cdot \mathbf{x}$, where $\mathbf{w}_k$ is the $k$-th column of $W$.
- Softmax: $\hat{y}_k = \dfrac{e^{z_k}}{\sum_{j=1}^{3} e^{z_j}}$
Write the one-hot encoded target vector $\mathbf{y}$ for each training example.
| Student | One-hot vector $\mathbf{y}$ |
|---|---|
| Alice (A) | |
| Bob (B) | |
| Carol (C) | |
| Dan (A) | |
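To check your encodings, here is a minimal Python sketch. The class ordering `["A", "B", "C"]` follows the class numbering given above; the helper name `one_hot` is an assumption for illustration.

```python
# Classes in order: A = class 1, B = class 2, C = class 3.
CLASSES = ["A", "B", "C"]

def one_hot(grade):
    # Vector of zeros with a single 1 at the index of the true class.
    return [1 if c == grade else 0 for c in CLASSES]

for student, grade in [("Alice", "A"), ("Bob", "B"), ("Carol", "C"), ("Dan", "A")]:
    print(student, one_hot(grade))
```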
Write the feature vector $\mathbf{x}$ for Bob, including the bias term. (Recall that the bias term is a 1 prepended to the feature values.)
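As a sanity check, prepending the bias can be sketched in one line of Python (Bob's raw features are read off the training table above):

```python
# Bob's raw features from the training table: hours = 2, absences = 1.
hours, absences = 2, 1

# Prepend the bias term 1 to form the full feature vector [1, x1, x2].
x_bob = [1, hours, absences]
```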
Using Bob's feature vector from part (b) and the weight matrix $W$, compute $z_k = \mathbf{w}_k \cdot \mathbf{x}$ for each class $k$ = A, B, C. Each $\mathbf{w}_k$ is a column of $W$.
$z_A = \mathbf{w}_1 \cdot \mathbf{x}$ =
$z_B = \mathbf{w}_2 \cdot \mathbf{x}$ =
$z_C = \mathbf{w}_3 \cdot \mathbf{x}$ =
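One way to verify your dot products is to write them out in Python. This sketch assumes Bob's feature vector from part (b); each list in `w` is a column of $W$, ordered as [bias weight, $x_1$ weight, $x_2$ weight].

```python
# Columns of W, each ordered [bias weight, x1 weight, x2 weight].
w = {
    "A": [-1, 1, 0],
    "B": [2, 0, -2],
    "C": [1, -1, 1],
}

def dot(u, v):
    # Elementwise product, summed.
    return sum(a * b for a, b in zip(u, v))

x_bob = [1, 2, 1]  # bias, hours, absences for Bob

# z_k = w_k . x for each class k.
z = {k: dot(wk, x_bob) for k, wk in w.items()}
```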
Apply the softmax function to $\mathbf{z} = [z_A, z_B, z_C]$ to obtain the predicted probability vector $\hat{\mathbf{y}}$. Use the approximation $e \approx 3$.
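The approximation $e \approx 3$ turns $e^{z_k}$ into $3^{z_k}$, which makes the softmax easy to evaluate by hand. A minimal sketch of this approximate softmax (the function name is an assumption):

```python
def softmax_approx(z):
    # Using e ~= 3, so e**z_k is approximated by 3**z_k.
    exps = [3 ** zk for zk in z]
    total = sum(exps)
    # Normalize so the probabilities sum to 1.
    return [v / total for v in exps]
```

Note that because softmax normalizes by the sum, the output vector always sums to 1 regardless of the approximation used for $e$.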
Based on your answer to part (d), what class does the model predict for Bob? Is this prediction correct?
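The predicted class is simply the one with the largest probability in $\hat{\mathbf{y}}$ (the argmax). A generic sketch of that final step, with the class tuple an assumption matching the class numbering above:

```python
def predict(probs, classes=("A", "B", "C")):
    # Predicted class = label at the index of the largest probability.
    i = max(range(len(probs)), key=lambda j: probs[j])
    return classes[i]
```

The prediction is correct exactly when the argmax position matches the 1 in the student's one-hot target vector.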