Techies crack Social Security code

For all the concern about identity theft, researchers say there’s a surprisingly easy way for the technology-savvy to figure out the precious nine digits of Americans’ Social Security numbers.

"It’s good that we found it before the bad guys," Alessandro Acquisti of Carnegie Mellon University in Pittsburgh said of the method for predicting the numbers.

Acquisti and Ralph Gross report in Tuesday’s edition of Proceedings of the National Academy of Sciences that they were able to make the predictions using data available in public records as well as information, such as birthdates, cheerfully provided on social networks like Facebook.

For people born after 1988 — when the government began issuing numbers at birth — the researchers were able to identify, in a single attempt, the first five Social Security digits for 44 percent of individuals. And they got all nine digits for 8.5 percent of those people in fewer than 1,000 attempts.

Their accuracy was considerably higher for smaller states than for larger ones.

Acquisti said in a telephone interview that he has sent the findings to the Social Security Administration and other government agencies with a suggestion they adopt a more random system for assigning numbers.

Social Security spokesman Mark Lassiter said the public should not be alarmed by the report "because there is no foolproof method for predicting a person’s Social Security number."

"The suggestion that Mr. Acquisti has cracked a code for predicting an SSN is a dramatic exaggeration," Lassiter said via e-mail.

However, he added: "For reasons unrelated to this report, the agency has been developing a system to randomly assign SSNs. This system will be in place next year."

The researchers say their report omits some details to make sure they aren’t providing criminals a blueprint for obtaining the numbers.

The predictability of the numbers increases the risk of identity theft, which cost Americans almost $50 billion in 2007 alone, Acquisti said.

A problem in the battle against identity thieves is that many businesses use Social Security numbers as passwords or for other forms of authentication, something that was not anticipated when Social Security was devised in the 1930s. The Social Security Administration has long cautioned educational, financial and health care institutions against using the numbers as personal identifiers.

"In a world of wired consumers, it is possible to combine information from multiple sources to infer data that is more personal and sensitive than any single piece of original information alone," Acquisti said, warning against providing too much data on social network sites.

Acquisti, who researches the economics of privacy, said he got interested in what could be learned from easily available information by looking at social networks, which he termed "a great experiment in self-revelation."

People were willing to include their date of birth and hometown, he said, and he already knew those details were part of the information used in issuing Social Security numbers.

So the researchers turned to the SSA’s "Death Master File," which lists the numbers of people who have died. The purpose of making that file public is to prevent impostors from assuming the Social Security numbers of deceased people.

But by plotting the data for people listed on the file between 1973 and 2003, the researchers were able to identify patterns in how numbers were issued.

"I was surprised by the accuracy of certain predictions," Acquisti said.

The system can produce a range of possibilities for the last four numbers, making it easier for a computer to test the possibilities until the correct number is found for an individual, Acquisti explained.

In addition, "attackers can exploit various public- and private-sector online services, such as online 'instant' credit approval sites, to test subsets of variations to verify which number corresponds to an individual with a given birth date."

It was well known that the numbers have a geographic component, and past studies have used the patterns plus other data to estimate when and where a specific number may have been issued.

"Our work focuses on the inverse, harder, and much more consequential inference: it shows that it is possible to exploit the presumptive time and location of SSN issuance to estimate, quite reliably, unknown SSNs," Acquisti said.

The research was supported by the National Science Foundation, the U.S. Army Research Office, Carnegie Mellon University and the Pittsburgh Supercomputing Center.




  1. gazelle1929

    “For people born after 1988 — when the government began issuing numbers at birth — the researchers were able to identify, in a single attempt, the first five Social Security digits for 44 percent of individuals. And they got all nine digits for 8.5 percent of those people in fewer than 1,000 attempts.”

    Let’s see. After they have the first five, there are four left, which comes out to 9999 numbers to guess from. Let’s assume 900 guesses for each. 900/9999 comes out to .09 more or less. If I have 900 guesses I should be able to guess correctly in 9 percent of the cases. At 999, which is still less than the 1000 they allowed themselves, I should guess correctly just about 10 percent of the time.
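The back-of-the-envelope figures in the paragraph above can be checked with a short sketch. It assumes, as the comment does, that the first five digits are already known and that each guess for the last four is a distinct blind pick from 0001 to 9999. The name `blind_hit_rate` is made up for illustration.

```python
from fractions import Fraction

# The comment counts 9999 possible endings (0001 through 9999).
SERIAL_SPACE = 9999


def blind_hit_rate(guesses: int, space: int = SERIAL_SPACE) -> Fraction:
    """Chance that `guesses` distinct blind picks include the one right ending."""
    if not 0 <= guesses <= space:
        raise ValueError("guesses must be between 0 and the size of the space")
    return Fraction(guesses, space)


# The two cases worked through in the comment.
print(f"{float(blind_hit_rate(900)):.3f}")  # 0.090
print(f"{float(blind_hit_rate(999)):.3f}")  # 0.100
```

This reproduces only the comment's own baseline of roughly 9 to 10 percent. It says nothing about how the researchers' method works.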

    And the best they could do with their fancy shmancy algorithms was 8.5 percent. I am SO unimpressed.

    As to the SSANs, it used to be that when you wanted a SSAN (typically when you got your first job, at least it was in my case), you hied yourself off to the local SSA office with your birth certificate. There was a distinct formula that determined your SSAN.

    The first three numbers were the SSA office’s particular designation. The next two numbers were the book number, and the four numbers after that were assigned this way:

    They started at 0001, issued odd numbers up to 9999, then even numbers back down to 0002.

    When they got finished with the book number (the fourth and fifth) they went to the next book in numerical order.

    The SSA office in Alexandria, VA was 226, and my brother and I got our SSANs at the same time. His ends in 91, mine in 93. Other than that the numbers are the same because we were both in the same book.
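The issuing order recalled above can be sketched in a few lines. This follows the commenter's recollection only and is not a documented SSA procedure. The starting point of the downward pass (9998) and the example endings 1291 and 1293 are assumptions chosen for illustration, since the comment gives only the last two digits.

```python
from itertools import chain


def serial_order() -> list[str]:
    """Last-four-digit serials in the order the comment describes."""
    odds_up = range(1, 10000, 2)     # 0001, 0003, ..., 9999
    evens_down = range(9998, 0, -2)  # 9998, 9996, ..., 0002 (assumed start)
    return [f"{n:04d}" for n in chain(odds_up, evens_down)]


order = serial_order()
print(order[:3])   # ['0001', '0003', '0005']
print(order[-1])   # 0002
print(len(order))  # 9999

# Two applicants served back to back would get neighboring odd endings,
# such as the hypothetical pair below.
print(order.index("1293") - order.index("1291"))  # 1
```

Under this scheme, endings of 91 and 93 within the same book are consecutive issues, which matches the two brothers applying at the same time.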

    Both of my children, though, were born in the same hospital two years apart (after 1988). Their numbers begin with 521 and 753. I fail to see how the people who did this analysis can use place of birth and year of birth to come up with those two numbers, which are just NOT that close together.