By WIRED
The original version of this story appeared in Quanta Magazine.
We all know to be careful about the details we share online, but the information we seek can also be revealing. Search for driving directions, and our location becomes far easier to guess. Check for a password in a trove of compromised data, and we risk leaking it ourselves.
These situations fuel a key question in cryptography: How can you pull information from a public database without revealing anything about what you’ve accessed? It’s the equivalent of checking out a book from the library without the librarian knowing which one.
Concocting a strategy that solves this problem—known as private information retrieval—is “a very useful building block in a number of privacy-preserving applications,” said David Wu, a cryptographer at the University of Texas, Austin. Since the 1990s, researchers have chipped away at the question, improving strategies for privately accessing databases. One major goal, still impossible with large databases, is the equivalent of a private Google search, where you can sift through a heap of data anonymously without doing any heavy computational lifting.
Now, three researchers have crafted a long-sought version of private information retrieval and extended it to build a more general privacy strategy. The work, which received a Best Paper Award in June 2023 at the annual Symposium on Theory of Computing, topples a major theoretical barrier on the way to a truly private search.
“[This is] something in cryptography that I guess we all wanted but didn’t quite believe that it exists,” said Vinod Vaikuntanathan, a cryptographer at the Massachusetts Institute of Technology who was not involved in the paper. “It is a landmark result.”
The problem of private database access took shape in the 1990s. At first, researchers assumed that the only solution was to scan the entire database during every search, which would be like having a librarian scour every shelf before returning with your book. After all, if the search skipped any section, the librarian would know that your book is not in that part of the library.
That approach works well enough at smaller scales, but as the database grows, the time required to scan it grows at least proportionally. As you read from bigger databases—and the internet is a pretty big one—the process becomes prohibitively inefficient.
In the early 2000s, researchers started to suspect they could dodge the full-scan barrier by “preprocessing” the database. Roughly, this would mean encoding the whole database as a special structure, so the server could answer a query by reading just a small portion of that structure. Careful enough preprocessing could, in theory, mean that a single server hosting information only goes through the process once, by itself, allowing all future users to grab information privately without any more effort.
For Daniel Wichs, a cryptographer at Northeastern University and a coauthor of the new paper, that seemed too good to be true. Around 2011, he started trying to prove that this kind of scheme was impossible. “I was convinced that there’s no way that this could be done,” he said.
Discussion about this post