In Lecture 30, L2 Regularization Theory, you mention the probability of data from 2:09 to 2:37. I believe the term is used for two different but proportional quantities:
P(w | Y,X), stated as "P of w given data". In this case, data is denoted by Y,X which I interpret as the intersection of events.
P(Y | X), stated as "P of data", which I interpret as the conditional statement Y given X.
Thanks in advance for the clarification.
Probability of Data

 Site Admin
Re: Probability of Data
Thanks for your inquiry.
> In this case, data is denoted by Y,X which I interpret as the intersection of events.
The issue is, there is no such thing as p(w | Y | X).
A simpler way to think of it is that "X" is on the "given" side for all terms (and hence effectively ignored).
So a lazy way to think of it is p(w | Y) = p(Y | w) p(w) / p(Y), which is just Bayes' rule.
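To make the "X is on the given side of every term" reading concrete, here is a rough numerical sketch rather than anything taken from the lecture. Written out in full, the rule would read p(w | Y, X) = p(Y | X, w) p(w) / p(Y | X), assuming the prior on w does not depend on X. The toy data, the grid of candidate slopes, the noise level, and the helper names below are all made up for illustration. The zero-mean Gaussian prior is the usual counterpart of L2 regularization.

```python
import math

# Hypothetical toy data (not from the lecture): inputs X and targets Y.
X = [0.0, 1.0, 2.0, 3.0]
Y = [0.1, 0.9, 2.1, 2.9]

# Candidate slopes w on a coarse grid, standing in for a continuous weight.
W_GRID = [i / 10 for i in range(-20, 41)]


def gaussian_pdf(value, mean, std):
    """Density of a normal distribution N(mean, std^2) at `value`."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def likelihood(w, noise_std=0.5):
    """p(Y | X, w): product of per-point Gaussian densities around w * x."""
    p = 1.0
    for x, y in zip(X, Y):
        p *= gaussian_pdf(y, w * x, noise_std)
    return p


def prior(w, prior_std=1.0):
    """p(w): zero-mean Gaussian prior, assumed independent of X."""
    return gaussian_pdf(w, 0.0, prior_std)


def posterior_on_grid():
    """Return grid values of p(w | Y, X) and the normalizer used."""
    unnormalized = [likelihood(w) * prior(w) for w in W_GRID]
    # The sum over w plays the role of p(Y | X), up to the grid spacing.
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized], evidence


posterior, evidence = posterior_on_grid()
best_index = max(range(len(W_GRID)), key=posterior.__getitem__)
w_map = W_GRID[best_index]
print("most probable slope on the grid:", w_map)
```

Note that X only ever enters through the likelihood, and the normalizer is a single number that does not depend on w. That is why dropping X from the notation, as in the lazy version, changes nothing about which w comes out as most probable.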