In this post, I will briefly introduce pointwise maximal leakage (PML), a privacy measure that I developed during my PhD. My guess is that most privacy enthusiasts are already familiar with differential privacy (DP), which has become the de facto standard for privacy. Over the past two decades, DP has been at the center of a lot of theoretical and application-focused research, and things are looking pretty good.
So, why did I decide to develop PML instead of just hopping on the DP bandwagon, doing my PhD on DP, and enjoying the ride? To understand why, we first need to define PML and see what makes it unique. This post will introduce PML, and in future posts, I’ll explore its advantages and how it relates to DP.
Our setup is a pretty standard privacy one. There is some data $X$ that contains sensitive information, like a medical database or a person’s medical records. We also assume there is some probability distribution for $X$, denoted by $P$. (For now, let’s not be too picky about what $P$ is. This has been the subject of much debate in my research, and I plan to discuss it at length in later posts.) Some information related to $X$ is released, and we denote this released information by $Y$. The information release “function” is a conditional probability distribution $Q(\cdot \mid \cdot)$, often called the privacy mechanism, or just mechanism. The question is: how much information about $X$ is leaked through $Y$?
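To make the setup concrete, here is a minimal sketch in Python. The three-letter alphabet, the prior, and the particular mechanism are my own toy choices for illustration, not anything canonical:

```python
import numpy as np

# Toy instance of the setup: X takes one of three values with prior P,
# and the mechanism Q(y | x) is a row-stochastic matrix (row x is the
# distribution of the output Y when the input is x).
P = np.array([0.5, 0.3, 0.2])
Q = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

rng = np.random.default_rng(0)
x = rng.choice(3, p=P)       # draw a secret X ~ P
y = rng.choice(3, p=Q[x])    # release Y ~ Q(. | x)
print(x, y)
```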
In this setup, the PML from $X$ to an outcome $y$ of the mechanism is:
\begin{equation} \label{eq:pml} \ell(X \to y) = \max_{x : P(x) > 0} \, \log \, \frac{Q(y \mid x)}{Q(y)}, \end{equation}
where $Q(\cdot)$ represents the output distribution over $Y$. That is,
\[Q(y) = \sum_x Q(y \mid x) P(x), \quad \forall y.\]

A couple of points should be emphasized:
- For now, we are not being too picky about what $X$ is: it could be an entire database or the data of a single person. This is different from how things are done in the DP world, where we operate either in the “central” model (the data of multiple individuals collected in a database) or the “local” model ($X$ is the data of a single individual). Equation \eqref{eq:pml} is meaningful in both scenarios.
- Equation \eqref{eq:pml} assumes that $X$ is a finite random variable, but PML can also be extended to more general types of $X$.
- Unlike DP, which requires randomized mechanisms, PML is well-defined even for deterministic mechanisms (as long as $X$ is a finite random variable). In fact, PML is well-defined even when $Y = X$, which corresponds to revealing $X$ entirely in the output. Intuitively, this works because, from an information-theoretic perspective, a finite random variable contains only a finite amount of information. (The sketch below checks this numerically.)
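To see \eqref{eq:pml} in action, here is a minimal numerical sketch continuing the toy prior and mechanism from above (all numbers are my own illustrative choices). It computes the PML of every outcome $y$, first for the randomized mechanism and then for the deterministic mechanism $Y = X$:

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])          # prior P(x) over a 3-letter alphabet
Q = np.array([[0.8, 0.1, 0.1],         # Q(y | x): row x, column y
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

def pml(P, Q):
    """PML of each outcome y: max_{x : P(x) > 0} log Q(y|x) / Q(y)."""
    Q_y = P @ Q                        # output distribution Q(y)
    ratios = Q / Q_y                   # broadcasting: Q(y|x) / Q(y)
    return np.log(ratios[P > 0].max(axis=0))

print(pml(P, Q))                       # PML of outcomes y = 0, 1, 2

# Deterministic mechanism Y = X: Q(y|x) = 1 iff y == x.
# Then Q(y) = P(y), so PML(y) = log(1 / P(y)) -- finite, as promised.
print(pml(P, np.eye(3)))               # ~ [0.693, 1.204, 1.609]
```

Note that dividing by $Q(y)$ is safe here because every outcome has positive probability; outcomes with $Q(y) = 0$ simply never occur and can be ignored.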
Looking at \eqref{eq:pml}, it may not be immediately clear why this expression makes sense as a privacy measure. It’s certainly not as intuitive as the definition of DP, which involves checking how the mechanism’s output distribution changes when a single entry in the database is modified. The good news is that it doesn’t really matter if \eqref{eq:pml} isn’t intuitive, because it’s not actually the definition of PML! In fact, \eqref{eq:pml} is a simplified form derived from the definitions of PML (I say definitions, plural, because there are two ways to define PML).
So, if \eqref{eq:pml} is not the definition of PML, then how is PML defined? To answer this, we need to be precise about what we mean by “information leakage” and what exactly we want our privacy measure to quantify. One approach is to consider how useful observing $y$ is for an adversary trying to infer information about $X$. Different adversaries may have different goals, and one way to model the adversary’s objective is through a non-negative function called the gain function. A gain function, denoted by $g(X,\hat X)$, quantifies how beneficial it is for an adversary to construct a guess $\hat X$ about the secret $X$. For example, the identity gain function $g(x, \hat x) = \mathbb{1}[x = \hat x]$ rewards the adversary only when she guesses $X$ exactly.
Now, consider an adversary with a given gain function $g$. We quantify how useful the observation $y$ is for this adversary by comparing two quantities: the maximum posterior expected gain, which represents the expected gain from the adversary’s best guess after observing $y$, and the maximum prior expected gain, which is the expected gain from the adversary’s best guess without any observations. The ratio of these two quantities measures how much observing $y$ helps the adversary in her inference task. Then, to make our definition robust against various adversarial strategies, we maximize this ratio over all possible non-negative gain functions. Formalizing this approach yields the following definition of PML:
\[\ell(X \to y) := \log \, \sup_{g} \frac{\sup\limits_{\hat P} \; \mathbb{E} \left[g(X,\hat{X}) \mid Y=y \right]}{\sup\limits_{\tilde P} \; \mathbb{E}\left[g(X, \hat{X})\right]}.\]

The supremum in the numerator is taken over the distribution of $\hat X$ given $y$, denoted by $\hat P$. This means that the adversary doesn’t just pick any silly $\hat X$; she chooses the one that maximizes her expected gain. Similarly, the supremum in the denominator is taken over the distribution of $\hat X$, denoted by $\tilde P$, when it is statistically independent of $X$. Essentially, this captures the adversary’s best guess without any information about $X$ other than its distribution.
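To get a feel for the definition, here is a small sketch using the same toy prior and mechanism as before; the two gain functions are my own illustrative choices. It evaluates the posterior-to-prior gain ratio for the identity gain $\mathbb{1}[x = \hat x]$ and for an “inverse-prior” gain $\mathbb{1}[x = \hat x]/P(x)$. The first yields a lower bound on $\exp(\ell(X \to y))$, while the second attains it, which hints at why the dense-looking supremum collapses to the simple form \eqref{eq:pml}:

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])            # prior P(x)
Q = np.array([[0.8, 0.1, 0.1],           # mechanism Q(y | x)
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
Q_y = P @ Q                              # output distribution Q(y)
post = Q * P[:, None] / Q_y              # posterior P(x | y), columns indexed by y

def gain_ratio(G, y):
    """Posterior-to-prior ratio of the best guess under gain matrix G.

    G[x, x_hat] = g(x, x_hat). For finite alphabets, expected gain is
    linear in the guess distribution, so the suprema are attained at
    point masses and we simply maximize over x_hat.
    """
    posterior_gain = (post[:, y] @ G).max()   # sup_xhat E[g(X, xhat) | Y=y]
    prior_gain = (P @ G).max()                # sup_xhat E[g(X, xhat)]
    return posterior_gain / prior_gain

identity_gain = np.eye(3)                # g(x, xhat) = 1[x == xhat]
inv_prior_gain = np.eye(3) / P[:, None]  # g(x, xhat) = 1[x == xhat] / P(x)

exp_pml = (Q / Q_y).max(axis=0)          # exp(PML) from the simplified form
for y in range(3):
    print(y, gain_ratio(identity_gain, y), gain_ratio(inv_prior_gain, y), exp_pml[y])
```

For $y = 1$, for instance, the identity-gain ratio comes out to roughly $1.55$, while $\exp(\ell(X \to y)) = 0.8/0.31 \approx 2.58$; the inverse-prior gain matches the latter exactly for every outcome.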
There we have it: the formal definition of PML. I have to admit that it looks a bit dense. Fortunately, it simplifies to \eqref{eq:pml}, which is a much more workable expression. Also, as I mentioned earlier, there are actually two ways to define PML, and in the next post, I’ll explore the second definition. The material covered in this post, along with more details about PML, can be found in my paper.