Gradient descent is a technique for finding the weights (theta) that yield a prediction model with minimum loss.
The first stage in gradient descent is to pick a starting value (a starting point) for theta. The starting point doesn't matter much; therefore, many algorithms simply set theta to 0 or pick a random value. The following figure shows that we've picked a starting point slightly greater than 0. From that point, gradient descent computes the gradient of the loss at the current theta and takes a step in the opposite direction, since the negative gradient points toward lower loss. Gradient descent then repeats this process, edging ever closer to the minimum.
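The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the full algorithm from the text: the quadratic loss (theta - 3)^2, the learning rate, and the step count are all assumed values chosen for the example.

```python
# Minimal gradient descent sketch on an assumed loss:
#   loss(theta) = (theta - 3)^2, whose minimum sits at theta = 3.
# The learning rate and step count below are illustrative choices.

def gradient_descent(grad, theta=0.0, learning_rate=0.1, steps=100):
    """Repeatedly step theta in the direction opposite the gradient."""
    for _ in range(steps):
        theta -= learning_rate * grad(theta)  # negative gradient lowers the loss
    return theta

# The gradient of (theta - 3)^2 is 2 * (theta - 3).
theta_min = gradient_descent(lambda t: 2 * (t - 3))
print(round(theta_min, 4))  # converges toward 3.0
```

Starting theta at 0 (as the text suggests) and repeating the update drives theta toward the minimizer; a smaller learning rate converges more slowly, while one that is too large can overshoot and diverge.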