A teacher wants to predict a student's letter grade (A, B, or C) on a test based on two features: hours studied ($x_1$) and number of absences ($x_2$). The classes are: A (class 1), B (class 2), C (class 3).
Training data:
| Student | Hours ($x_1$) | Absences ($x_2$) | Grade |
|---|---|---|---|
| Alice | 4 | 1 | A |
| Bob | 2 | 1 | B |
| Carol | 1 | 2 | C |
| Dan | 3 | 0 | A |
A multinomial logistic regression model has the following (partially trained) weight matrix $W$, where each column corresponds to a class and each row corresponds to the bias, $x_1$, and $x_2$:
| | A (class 1) | B (class 2) | C (class 3) |
|---|---|---|---|
| bias | $-1$ | $2$ | $1$ |
| $x_1$ | $1$ | $0$ | $-1$ |
| $x_2$ | $0$ | $-2$ | $1$ |
Note: Use the approximation $e \approx 3$ throughout (and recall that $e^0 = 1$).
Key formulas:
- Score for class $k$: $z_k = \mathbf{w}_k \cdot \mathbf{x}$, where $\mathbf{w}_k$ is the $k$-th column of $W$.
- Softmax: $\hat{y}_k = \dfrac{e^{z_k}}{\sum_{j=1}^{3} e^{z_j}}$
Write the one-hot encoded target vector $\mathbf{y}$ for each training example.
| Student | One-hot vector $\mathbf{y}$ |
|---|---|
| Alice (A) | |
| Bob (B) | |
| Carol (C) | |
| Dan (A) | |
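To check your encodings, here is a minimal Python sketch. The class ordering `["A", "B", "C"]` follows the class numbering given above; the helper name `one_hot` is an assumption for illustration.

```python
# Classes in order: A = class 1, B = class 2, C = class 3.
CLASSES = ["A", "B", "C"]

def one_hot(grade):
    # Vector of zeros with a single 1 at the index of the true class.
    return [1 if c == grade else 0 for c in CLASSES]

for student, grade in [("Alice", "A"), ("Bob", "B"), ("Carol", "C"), ("Dan", "A")]:
    print(student, one_hot(grade))
```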
Write the feature vector $\mathbf{x}$ for Bob, including the bias term. (Recall that the bias term is a 1 prepended to the feature values.)
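As a sanity check, prepending the bias can be sketched in one line of Python (Bob's raw features are read off the training table above):

```python
# Bob's raw features from the training table: hours = 2, absences = 1.
hours, absences = 2, 1

# Prepend the bias term 1 to form the full feature vector [1, x1, x2].
x_bob = [1, hours, absences]
```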
Using Bob's feature vector from part (b) and the weight matrix $W$, compute $z_k = \mathbf{w}_k \cdot \mathbf{x}$ for each class $k$ = A, B, C. Each $\mathbf{w}_k$ is a column of $W$.
$z_A = \mathbf{w}_1 \cdot \mathbf{x}$ =
$z_B = \mathbf{w}_2 \cdot \mathbf{x}$ =
$z_C = \mathbf{w}_3 \cdot \mathbf{x}$ =
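One way to verify your dot products is to write them out in Python. This sketch assumes Bob's feature vector from part (b); each list in `w` is a column of $W$, ordered as [bias weight, $x_1$ weight, $x_2$ weight].

```python
# Columns of W, each ordered [bias weight, x1 weight, x2 weight].
w = {
    "A": [-1, 1, 0],
    "B": [2, 0, -2],
    "C": [1, -1, 1],
}

def dot(u, v):
    # Elementwise product, summed.
    return sum(a * b for a, b in zip(u, v))

x_bob = [1, 2, 1]  # bias, hours, absences for Bob

# z_k = w_k . x for each class k.
z = {k: dot(wk, x_bob) for k, wk in w.items()}
```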
Apply the softmax function to $\mathbf{z} = [z_A, z_B, z_C]$ to obtain the predicted probability vector $\hat{\mathbf{y}}$. Use the approximation $e \approx 3$.
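The approximation $e \approx 3$ turns $e^{z_k}$ into $3^{z_k}$, which makes the softmax easy to evaluate by hand. A minimal sketch of this approximate softmax (the function name is an assumption):

```python
def softmax_approx(z):
    # Using e ~= 3, so e**z_k is approximated by 3**z_k.
    exps = [3 ** zk for zk in z]
    total = sum(exps)
    # Normalize so the probabilities sum to 1.
    return [v / total for v in exps]
```

Note that because softmax normalizes by the sum, the output vector always sums to 1 regardless of the approximation used for $e$.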
Based on your answer to part (d), what class does the model predict for Bob? Is this prediction correct?
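The predicted class is simply the one with the largest probability in $\hat{\mathbf{y}}$ (the argmax). A generic sketch of that final step, with the class tuple an assumption matching the class numbering above:

```python
def predict(probs, classes=("A", "B", "C")):
    # Predicted class = label at the index of the largest probability.
    i = max(range(len(probs)), key=lambda j: probs[j])
    return classes[i]
```

The prediction is correct exactly when the argmax position matches the 1 in the student's one-hot target vector.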