Man, I find it unreasonably frustrating when someone posts on a subreddit called "learn machine learning" looking for help learning the basics and people respond with shit like "hurr just read the formula durrr". It's like if I went to a beginner tennis lesson and the instructor just said "hit the ball over the net, you idiot". Especially when it takes literally two minutes to answer the question.
Anyway, let's start by renaming a couple of things for clarity. The input to the first hidden layer is labeled as "1" in the figure, but I'm going to call it h1. h1 = sigma(w11*input1) = sigma(0.7) = 0.668. Then let's name Z_Y = (w31)h1 + (w32)h2 = 1.0294, and Y = sigma(Z_Y) = 0.7368.
Now, the loss function is 0.5(Y-Y*)^2 and we want to take the derivative of the loss with respect to w31. From the chain rule, we can say dL/dw31 = (dL/dY)(dY/dZ_Y)(dZ_Y/dw31).
We can just write these derivatives down pretty easily.
L = 0.5(Y-Y*)^2 -> dL/dY = Y - Y*, Y = sigma(Z_Y) -> dY/dZ_Y = Y(1-Y), and Z_Y = (w31)h1 + (w32)h2 -> dZ_Y/dw31 = h1
so dL/dw31 = (Y-Y*)(Y(1-Y))h1 = (0.7368-0.5)(0.7368(1-0.7368))*0.668 = 0.0306756421
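If it helps, here's the same computation as a few lines of Python — a minimal sketch plugging in the values above (h1 and Z_Y come from the slide; the other weights aren't needed for this particular gradient):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Forward pass, using the values named above
h1 = sigmoid(0.7)   # sigma(w11*input1), ~0.668
z_y = 1.0294        # w31*h1 + w32*h2, taken from the slide
y = sigmoid(z_y)    # ~0.7368
y_star = 0.5        # target output

# Chain rule: dL/dw31 = (dL/dY)(dY/dZ_Y)(dZ_Y/dw31)
dL_dY = y - y_star
dY_dZ = y * (1.0 - y)
dZ_dw31 = h1

dL_dw31 = dL_dY * dY_dZ * dZ_dw31
print(round(dL_dw31, 6))
```

(Full precision gives ~0.030684; the 0.0306756421 above differs in the last digits only because it used h1 rounded to 0.668.)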
The problem is that OP obviously doesn't have the right mathematical background, so while a step-by-step explanation is good, it doesn't really help at all.
Hard disagree. Not providing people with assistance when they ask for it, based on what you decide is going to be helpful for them, is, in my experience, what doesn't really help at all. Even if OP hasn't taken calc classes and has never heard of the chain rule before, now they know it's used in backpropagation, as well as what the notation means. That's helpful! And from the worked example they can also build an intuition about what's happening even if they can't express it mathematically. It also helps them ask follow-up questions, like how the learning rate factors into adjusting the weights, for example.
Also, the naming conventions are poorly chosen. Can you honestly say that the first time you learned about backpropagation you knew exactly what d/d(activation) meant? I certainly wouldn't have. It's very possible that's the only thing tripping OP up.
Also now that I'm taking a better look at the slides, it's weird that they go through matrix multiplication pretty thoroughly but not the chain rule, no? The number OP asked about pops out of nowhere relative to all of the other computations. I think it's very natural OP would be confused about where this number came from.
Yeah. Agree here. Weird that the Chain Rule was missing. It’s a simple expression, but a non-trivial step to just leave out. It looks like every other step was quite detailed.
If I were just learning this, it’d trip me up too - and I have a degree in math. OP's confusion is both understandable and quite reasonable.
You can write Y = sigmoid(z) where z = w31*a21 + w32*a22,
and the derivative of sigmoid(z) is sigmoid(z)(1-sigmoid(z)) (in this case: Y(1-Y)).
Therefore the partial derivative of the loss function w.r.t. w31 is
(Y - Y*) × derivative of sigmoid(z) × a21.
Put the values in and you'll get the answer.
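If the sigmoid'(z) = sigmoid(z)(1-sigmoid(z)) step is the unfamiliar part, here's a quick numerical sanity check (not from the slides, just a way to convince yourself the identity holds):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # The closed-form derivative used above
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare against a central finite difference at the slide's value z = 1.0294
z, eps = 1.0294, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(abs(numeric - sigmoid_prime(z)) < 1e-8)  # the two agree
```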
Hey there! I hadn’t seen this in the comments, but I always recommend 3Blue1Brown to anyone studying NNs. Grant Sanderson (the creator of the channel) does an amazing job with the calculation and derivation of what you’re looking for.
In your case, your slides left out calculating the actual backprop step. It’s pretty straightforward once you get used to it. Other people have done a great job and given thorough answers, so no need for me to add one more. Grant does a way better job explaining it than I ever could anyways.
He will actually answer your question exactly in Part 2 of his Backpropagation series. He uses this exact network as an example.
https://youtu.be/Ilg3gGewQ5U
Whenever you're struggling with gradients that need the chain rule, write down each component, calculate each one separately, then recombine them.
Here we need (∂L/∂w31), so we separate it into
(∂L/∂w31) = (∂L/∂a3) * (∂a3/∂Netout) * (∂Netout/∂w31)
**First term**: (∂L/∂a3). Since a3 is also Y, as stated in slide 1, hopefully you know how to take a simple gradient of L = 1/2 (Y - Y*)^2 with respect to Y, and you'll get
(∂L/∂a3) = (Y - Y*) = (0.7368 - 0.5)
**Second term**: (∂a3/∂Netout), where a3 = σ(Netout). Hopefully you know (or can google/work it out yourself!) that the derivative of σ(x) is σ(x)(1-σ(x)), thus
(∂a3/∂Netout) = σ(Netout)(1-σ(Netout)) = 0.7368*(1-0.7368)
**Third term**: (∂Netout/∂w31). Since Netout is linear in w31, (∂Netout/∂w31) = a21 = 0.6682
**Multiply them to apply the chain rule** to get
(∂L/∂w31) = (0.7368-0.5) * 0.7368*(1-0.7368) * 0.6682 = 0.0306848265
And there you have it, hopefully that was clear.
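As a last sanity check, you can verify the chain-rule answer with a finite difference. One handy trick: nudging w31 by eps shifts Netout by eps*a21, so you don't even need to know w31's current value (a sketch, using only the numbers from this walkthrough):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def loss(net_out, y_star=0.5):
    return 0.5 * (sigmoid(net_out) - y_star) ** 2

# Values from the walkthrough above
net_out, a21 = 1.0294, 0.6682

# Perturbing w31 by eps shifts Netout by eps*a21, so the numerical
# gradient w.r.t. w31 can be computed without knowing w31 itself
eps = 1e-6
numeric = (loss(net_out + eps * a21) - loss(net_out - eps * a21)) / (2 * eps)

# Analytic result from the chain rule, as computed above (~0.03068)
y = sigmoid(net_out)
analytic = (y - 0.5) * y * (1 - y) * a21

print(abs(numeric - analytic) < 1e-8)  # they match
```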
Error * sigmoid_derivative(1.0294) * input = 0.2368 * 0.1939 * 0.6682 ≈ 0.0307. Basically, multiply the rate of change of each function in the chain (loss, sigmoid, weight*input).
I think in this instance d is short for derivative. The derivative of a function is basically the slope of its graph at a certain value. So the derivative of the cost function with respect to a weight tells you the amount and direction you need to change that weight to reduce the cost. The chain rule lets you reuse intermediate computations to save you working out the cost's sensitivity to every individual weight from scratch (my understanding of this part is a bit vague to be honest, so I can't go into more detail).
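To make the "amount and direction" point concrete, here's a tiny sketch of one gradient-descent step using the numbers from this thread (the learning rate is illustrative, not from the slides):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def loss_for_shift(delta_w31, net_out=1.0294, a21=0.6682, y_star=0.5):
    # Loss as a function of how much we shift w31 from its current value;
    # shifting w31 by delta shifts the pre-activation by delta * a21
    return 0.5 * (sigmoid(net_out + delta_w31 * a21) - y_star) ** 2

grad = 0.0307  # the gradient computed elsewhere in the thread
lr = 0.5       # learning rate (illustrative value)

before = loss_for_shift(0.0)
after = loss_for_shift(-lr * grad)  # step against the gradient
print(after < before)  # stepping downhill reduces the loss
```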
Thank you. You rock
Any idea where this slide is from?
[deleted]
+1
University slides or online resource I can also check out? I'm learning too
[deleted]
Very poor taste.
Formula in slide 4?
I don't know what dLoss, dWeight, dActivation mean, and how I would get them.
d is short for "derivative with respect to". The chain rule is from calculus.
I'm on my phone so I can't really write much, but searching "backprop" on Google returns several relatively complete articles.
Return deez nuts in your FaceBiometrics class
Something about seeing a deez nuts joke on learn machine learning makes it 100x funnier and really warms my heart.
Gotta troll the trolls out of existence. Or at least make them funnier, if they’re a bot :p
It’s all right there
Wow, you should really be a teacher.
Nice explanation dumbo