This page is largely based on this video:
We use the gradient, with respect to the input image $x$, of the log-probability under the model that describes the data distribution. This quantity is called the score (it was historically known as the Stein score):
$$ \nabla_x \log p_\theta(x) = \nabla_x f_\theta(x) - \nabla_x \log Z_\theta $$
where $p_\theta$ is the probability distribution defined by the model:
$$ p_\theta(x) = \frac{e^{f_\theta(x)}}{Z_\theta} $$
and $Z_\theta$ is the normalizing constant, which is usually intractable:
$$ Z_\theta = \int e^{f_\theta(x)}dx $$
Although $Z_\theta$ is intractable, it is a constant with respect to $x$, and the derivative of a constant is always 0. So by working with $\nabla_x \log p_\theta(x)$ instead of $p_\theta(x)$ itself, we bypass the intractability: the score reduces to $\nabla_x f_\theta(x)$.
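To see why the normalizing constant can be ignored, here is a minimal sketch in PyTorch with a hypothetical toy energy $f_\theta(x) = -x^2/2$ (a standard Gaussian, so $Z_\theta = \sqrt{2\pi}$ is known in closed form): differentiating the full $\log p_\theta(x)$ and differentiating $f_\theta(x)$ alone give the same score.

```python
import math
import torch

def f_theta(x):
    # Toy energy: a standard Gaussian up to normalization, so Z_theta = sqrt(2*pi)
    return -0.5 * x ** 2

log_Z_theta = 0.5 * math.log(2 * math.pi)  # a constant: does not depend on x

# Score computed from the full log p_theta(x) = f_theta(x) - log Z_theta
x = torch.tensor(1.3, requires_grad=True)
score_from_log_p, = torch.autograd.grad(f_theta(x) - log_Z_theta, x)

# Score computed from f_theta alone -- the log Z_theta term contributes nothing
x2 = torch.tensor(1.3, requires_grad=True)
score_from_f, = torch.autograd.grad(f_theta(x2), x2)

print(score_from_log_p.item(), score_from_f.item())  # both equal -1.3
```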
And by checking the direction of the gradient, we can still reach a local optimum of the density with gradient-descent-style updates.
As one can see from the graph, we only use the sign of the gradient to decide whether to increase $x$ or not.
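As a toy illustration of this idea (not from the video), the sketch below uses a standard Gaussian, whose score is simply $-x$: taking fixed-size steps whose direction is given only by the sign of the score still moves an arbitrary starting point toward the mode at 0. The step size 0.05 is an arbitrary choice.

```python
def score(x):
    # Score of a standard Gaussian: d/dx log p(x) = -x
    return -x

x = 3.0                                        # start away from the mode at 0
for _ in range(100):
    direction = 1.0 if score(x) > 0 else -1.0  # only the sign of the score is used
    x += 0.05 * direction                      # fixed-size step toward higher density
print(x)                                       # ends up oscillating near 0, the peak
```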
From now on, we will denote $\nabla_x f_\theta(x)$ as $S_\theta(x)$.
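As a sketch of what $S_\theta(x)$ might look like in practice (the architecture here is a hypothetical small MLP, not the one from the video), the model simply maps an input to an output of the same shape, interpreted as the estimated score vector at that point.

```python
import torch
import torch.nn as nn

class ScoreModel(nn.Module):
    """A toy S_theta(x): the output has the same shape as the input."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),  # one score component per input dimension
        )

    def forward(self, x):
        return self.net(x)

S_theta = ScoreModel()
x = torch.randn(8, 2)       # a batch of 8 two-dimensional points
print(S_theta(x).shape)     # torch.Size([8, 2]): one score vector per point
```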
Intuitively, we need the score of the true data distribution so we can compare it with the deep learning score model $S_\theta(x)$. However, there is no explicit score function for the data distribution.
Fortunately, mathematicians have proved that the following equation holds; this technique is called score matching: