
Personal Data is more than PII


In a GDPR context, people sometimes confuse similar-sounding terms such as "PII" and "personal data". This can cause them to misapply the GDPR, because they are relying on concepts from other jurisdictions.

In brief:

  • Personally Identifiable Information (PII) is a term strongly associated with the US regulatory environment, and generally describes information that helps you identify someone: given PII, you can identify the data subject.

  • Personal data is a term associated with EU privacy laws, in particular the GDPR. It describes any information that relates to an identifiable person. This definition encompasses not only PII that helps you identify that person, but also any information that can then be linked to that person.

  • Personal information is a term associated with California privacy laws, in particular the CCPA. It is fairly similar to the GDPR personal data concept.

    The term personal information is also used colloquially by the UK ICO to mean the UK GDPR concept of "personal data".

How the GDPR defines personal data

The GDPR defines personal data in Art 4(1):

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

Unfolding this a bit, the main part is: personal data is any information relating to an identifiable person (the data subject).

There are two criteria here (compare the WP29 opinion):

  1. the data subject must be identifiable
  2. the personal data must relate to the data subject, i.e. must be "about" that person or affect that person

The identifiability criterion is tricky, so the GDPR's definition provides some guidance:

  • It is not necessary for the data subject to be already identified or directly identified. It still counts as personal data if the data subject is only indirectly identifiable.

  • The data subject might be identifiable through an identifier (compare the US PII concept). Examples of identifiers:

    • name
    • identification number
    • location data
    • online identifier

  • The data subject might be identifiable through various factors:

    • physical identity
    • physiological identity
    • genetic identity
    • mental identity
    • economic identity
    • cultural identity
    • social identity

  • These are examples, not exhaustive lists.

Some of the provided examples like "identification numbers" or "online identifiers" might be pseudonyms, but that doesn't change that they are personal data (see GDPR Recital 26).

It is not necessary that a given piece of personal data contributes to identification: it might instead be data that is merely linked to an identifiable person.
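For readers who think in terms of data schemas, the two criteria can be sketched as a toy classification. This is an illustration only, not a legal test: the `Field` type, its `identifies` and `relates_to_subject` flags, the function names, and the example record are all hypothetical. In practice, identifiability depends on context and on the means reasonably likely to be used (see Recital 26), not on a hand-assigned flag per field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """A hypothetical column in a dataset, annotated by hand."""
    name: str
    identifies: bool          # helps single out a person, directly or indirectly
    relates_to_subject: bool  # is "about" the person or affects them


def is_identifying_information(field: Field) -> bool:
    """Narrow reading: only fields that help identify someone count."""
    return field.identifies


def is_personal_data(field: Field, record: list[Field]) -> bool:
    """Toy version of the two criteria: the data subject is identifiable
    (simplified here to: through some field in the same record), and
    this particular field relates to that data subject."""
    subject_identifiable = any(f.identifies for f in record)
    return subject_identifiable and field.relates_to_subject


# Hypothetical example record; the annotations are assumptions.
record = [
    Field("user_id", identifies=True, relates_to_subject=True),         # pseudonymous identifier
    Field("last_purchase", identifies=False, relates_to_subject=True),  # linked data
    Field("schema_version", identifies=False, relates_to_subject=False),
]

for f in record:
    print(f.name, is_identifying_information(f), is_personal_data(f, record))
```

In this sketch, `last_purchase` does not help identify anyone, yet it still counts as personal data because it relates to a person who is identifiable through `user_id`. That is the gap between the narrow "identifying information" reading and the GDPR concept.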

How the US defines PII

At the time of writing, there is no single US federal privacy law to point to, at least when it comes to the private sector. Instead, the US has privacy laws specific to a subject matter, guidance by government agencies, and state privacy laws.

An example of a privacy law for a specific subject matter is HIPAA, which has shaped the public's perception of PII. It defined the term "individually identifiable health information" as follows:

The term 'individually identifiable health information' means any information, including demographic information collected from an individual, that—
(A) is created or received by a health care provider, health plan, employer, or health care clearinghouse; and
(B) relates to the past, present, or future physical or mental health or condition of an individual, the provision of health care to an individual, or the past, present, or future payment for the provision of health care to an individual, and—
(i) identifies the individual; or
(ii) with respect to which there is a reasonable basis to believe that the information can be used to identify the individual.

Aside from the subject matter restriction in (A) and (B), note the key difference that points (B)(i) and (B)(ii) reveal:

  • the GDPR concept of personal data focuses on the data subject
  • the HIPAA concept of individually identifiable health information focuses on whether information itself is identifying

As an example of guidance from government agencies, consider NIST SP 800-122, which references the definition established in a footnote of a US Government Accountability Office report:

For purposes of this report, the terms personal information and personally identifiable information are used interchangeably to refer to any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual’s identity, such as name, Social Security number, date and place of birth, mother’s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.

Here, PII is used to mean both directly identifying information and linkable information. This is fairly close to the GDPR definition. However, the examples given by NIST for "linked or linkable" data include:

date of birth, place of birth, race, religion, weight, activities, geographical indicators, employment information, medical information, education information, financial information

The GDPR would generally consider these categories to be identifiers or identity factors in their own right, rather than merely linked data. So the NIST concept of PII is much narrower than the GDPR concept of personal data.

I am not a lawyer, and this article is not legal advice. The article discusses general concepts, not concrete fact patterns.