Maximum Likelihood Estimator
Definition
The maximum likelihood estimator (MLE) is an estimator of the parameters of an assumed probability distribution, given some observed data. It is obtained by maximizing a likelihood function so that, under the assumed statistical model, the observed data are most probable.
Suppose that the sample \(X_1, \ldots, X_n\) comes from a population with distribution \(f(x | \theta)\) indexed by \(\theta\), which may be a scalar or a vector of parameters. The elements of the sample are independent, so the joint distribution of \(X_1, \ldots, X_n\) is the product of the individual densities:
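\[ f(x_1, \ldots, x_n | \theta) = \prod^n_{i = 1} f(x_i | \theta). \]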
When the sample is observed, the joint distribution remains dependent upon the parameter:
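\[ L(\theta | X_1, \ldots, X_n) = \prod^n_{i = 1} f(X_i | \theta), \]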
and, as a function of the parameter, \(L\) is called the likelihood. The value of the parameter \(\theta\) that maximizes the likelihood \(L(\theta | X_1, \ldots, X_n)\) is the MLE:
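\[ \hat{\theta}_{mle} = \arg\max_{\theta} L(\theta | X_1, \ldots, X_n). \]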
The solution to the optimization problem can be found as:
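\[ \frac{\partial L(\theta | X_1, \ldots, X_n)}{\partial \theta} = 0. \]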
Intuitively, the likelihood describes "what observed data may be generated in a given state". Since the observed or measured data are already known, the maximum likelihood estimator can be understood as answering the reverse question: "under what state is the data currently being observed most likely to be produced?".
Log-Likelihood
In most cases, maximizing the logarithm of the likelihood, the log-likelihood, is simpler than maximizing the likelihood directly:
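\[ \ell(\theta | X_1, \ldots, X_n) = \log L(\theta | X_1, \ldots, X_n) = \sum^n_{i = 1} \log f(X_i | \theta), \]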
since the logarithm turns the product into a sum, and finding an extremum of a sum is simpler. Since the logarithm is a monotonically increasing function, the maxima of \(L\) and \(\ell\) are achieved at the same value \(\hat{\theta}_{mle}\). Figure 1 shows the likelihood and log-likelihood for the exponential distribution with rate parameter \(\lambda\) when the sample \(X = [0.4, 0.3, 0.1, 0.5]\) is observed. The MLE is \(1 / \bar{X} \approx 3.077\).
Example
Exponential Model
Consider the MLE of \(\lambda\) in the exponential model, \(\text{Exp}(\lambda)\). After \(X_1, \ldots, X_n\) are observed, the likelihood becomes:
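\[ L(\lambda | X_1, \ldots, X_n) = \prod^n_{i = 1} \lambda e^{-\lambda X_i} = \lambda^n e^{-\lambda \sum^n_{i = 1} X_i}. \]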
The likelihood \(L\) is obtained as a product of densities \(f(x_i | \lambda)\) where the arguments \(x_i\) are fixed at the observations \(X_i\). The log-likelihood is:
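\[ \ell(\lambda | X_1, \ldots, X_n) = n \log \lambda - \lambda \sum^n_{i = 1} X_i. \]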
We have:
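\[ \frac{\partial \ell}{\partial \lambda} = \frac{n}{\lambda} - \sum^n_{i = 1} X_i = 0, \]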
and the solution is \(\hat{\lambda}_{mle} = \frac{n}{\sum^n_{i = 1} X_i} = 1 / \bar{X}\). The second derivative of the log-likelihood, \(\frac{\partial^2 \ell}{\partial \lambda^2} = -\frac{n}{\lambda^2}\), is always negative; thus, the solution \(\hat{\lambda}_{mle}\) maximizes \(\ell\).
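As a quick numerical check, the following sketch (assuming NumPy and SciPy are available) maximizes the exponential log-likelihood for the sample from Figure 1 and compares the result with the closed form \(1 / \bar{X}\):

import numpy as np
from scipy.optimize import minimize_scalar

X = np.array([0.4, 0.3, 0.1, 0.5])

def neg_log_likelihood(lam):
    # Negative exponential log-likelihood: -(n * log(lambda) - lambda * sum(X))
    return -(X.shape[0] * np.log(lam) - lam * np.sum(X))

# Numerical maximization of the log-likelihood over a bracket containing the optimum
result = minimize_scalar(neg_log_likelihood, bounds=(0.01, 50), method="bounded")
print(result.x)        # approximately 3.077
print(1 / np.mean(X))  # closed-form MLE, 1 / X-bar

The two values agree up to the optimizer's tolerance.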
Invariance Property of MLEs
Let \(\hat{\theta}_{mle}\) be an MLE of \(\theta\) and let \(\eta = g(\theta)\), where \(g\) is an arbitrary function. Then \(\hat{\eta}_{mle} = g(\hat{\theta}_{mle})\) is an MLE of \(\eta\).
For example, since the MLE for \(\lambda\) in the exponential distribution is \(1 / \bar{X}\), the MLE of the transformed parameter \(\eta = \lambda^2 - \sin \lambda\) is \((1 / \bar{X})^2 - \sin(1 / \bar{X})\).
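As a minimal sketch reusing the sample from Figure 1, the invariance property means the MLE of \(\eta\) is obtained by simply evaluating \(g\) at \(\hat{\lambda}_{mle}\):

import numpy as np

X = np.array([0.4, 0.3, 0.1, 0.5])
lambda_mle = 1 / np.mean(X)                    # MLE of lambda, approximately 3.077
eta_mle = lambda_mle**2 - np.sin(lambda_mle)   # MLE of eta = g(lambda) by invariance
print(lambda_mle, eta_mle)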
Appendix
Plotting Script
import numpy as np
from matplotlib import pyplot as plt


def compute_likelihoods(X, lambdas):
    """Exponential likelihood and log-likelihood of the sample X for each rate in lambdas."""
    num_samples = X.shape[0]

    ## Likelihood: lambda^n * exp(-lambda * sum(X))
    L = np.zeros(lambdas.shape)
    for j in range(lambdas.shape[0]):
        exponent = 0.0
        for i in range(num_samples):
            exponent += -lambdas[j] * X[i]
        L[j] = np.power(lambdas[j], num_samples) * np.exp(exponent)

    ## Log-likelihood: n * log(lambda) - lambda * sum(X)
    l = np.zeros(lambdas.shape)
    for j in range(lambdas.shape[0]):
        exponent = 0.0
        for i in range(num_samples):
            exponent += -lambdas[j] * X[i]
        l[j] = num_samples * np.log(lambdas[j]) + exponent

    return L, l


## Grid of rate parameters and the observed sample
rate_params = np.linspace(1, 5)
X = np.array([0.4, 0.3, 0.1, 0.5], dtype=np.float32)
L, l = compute_likelihoods(X, rate_params)

## MLE of the rate parameter, 1 / mean(X)
max_rate_param = np.array([3.077])
L_mle, l_mle = compute_likelihoods(X, max_rate_param)

fig, ax = plt.subplots()
ax.plot(rate_params, L, label="Likelihood")
ax.plot(rate_params, l, label="Log-likelihood")
ax.plot(max_rate_param, L_mle, "o")
ax.plot(max_rate_param, l_mle, "o")
ax.axvline(x=max_rate_param[0], linestyle="--")
ax.set_xlim((1, 5))
ax.set_ylim((-2, 2))
ax.set_box_aspect(1)
ax.set_xlabel("Rate param")
ax.set_ylabel("Likelihood")
ax.grid(True)
ax.legend()
fig.tight_layout()
fig.savefig("maximum_likelihood.png", dpi=800, bbox_inches="tight")
plt.show()