Trains in about 0.1 s, though the exact time varies because the weights are initialized randomly. The structure is simple: 2 inputs, a hidden layer with 2 neurons, and 1 output, with no activation function. The secret is in the update code, which I set to update only the weights and biases of the LAST layer, not the input-to-hidden ones. I don't understand the reason, but I am working on another version.
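
For anyone who wants to try the idea, here is a rough sketch in plain Python of a 2-2-1 network with no activation function where only the last layer's weights and bias get updated. It is not the original code. The training data (`DATA`, the four 0/1 input pairs with target `x1 + x2`), the learning rate, the epoch count, the full-batch gradient step, and every name in it (`train_last_layer_only`, `w_h`, `b_h`, `w_o`, `b_o`) are assumptions made up for illustration.

```python
import random

# Hypothetical toy data: four 0/1 input pairs, target = x1 + x2 (an assumption).
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 2)]


def train_last_layer_only(data, epochs=2000, lr=0.05, seed=0):
    """Train a 2-2-1 network with no activation, updating only the output layer."""
    rng = random.Random(seed)
    # Hidden layer (2 inputs -> 2 neurons): random and never updated.
    w_h = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    b_h = [rng.uniform(-1, 1) for _ in range(2)]
    # Output layer (2 hidden -> 1 output): the only trained parameters.
    w_o = [rng.uniform(-1, 1) for _ in range(2)]
    b_o = rng.uniform(-1, 1)

    def hidden(x):
        # No activation function: each neuron is a plain weighted sum.
        return [w_h[j][0] * x[0] + w_h[j][1] * x[1] + b_h[j] for j in range(2)]

    n = len(data)
    losses = []
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        loss = 0.0
        for x, target in data:
            h = hidden(x)
            err = (w_o[0] * h[0] + w_o[1] * h[1] + b_o) - target
            loss += err * err / n
            grad_w[0] += err * h[0] / n
            grad_w[1] += err * h[1] / n
            grad_b += err / n
        losses.append(loss)  # mean squared error measured before this step
        # Gradient step on the last layer only; w_h and b_h stay untouched.
        w_o[0] -= lr * grad_w[0]
        w_o[1] -= lr * grad_w[1]
        b_o -= lr * grad_b

    def predict(x):
        h = hidden(x)
        return w_o[0] * h[0] + w_o[1] * h[1] + b_o

    return predict, losses


if __name__ == "__main__":
    predict, losses = train_last_layer_only(DATA)
    print("first loss:", losses[0], "last loss:", losses[-1])
    for x, target in DATA:
        print(x, "target", target, "prediction", predict(x))
```

A possible reason the last-layer-only update behaves well: with no activation the whole network is just a linear (affine) map of the inputs, and with the hidden layer frozen, training the last layer is ordinary linear regression on two fixed random features, which is a convex problem. How fast it converges still depends on the random hidden weights, which fits with the run-to-run variation.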