Data Classification and Categorization

Data Classification

types of classification:

  • state
  • use
  • importance

classification supports risk management: it reduces the cost of protecting data by focusing protection where it is needed

  • match level of protection and cost with value of asset

Data States

  • at rest
  • being created
  • in transit
  • being changed/deleted

data storage location should be considered as well

Data usage

  • determining how it is used can help determine how it should be shared (or not)

Usage classifications

  • Internal - created, computed, stored (in memory) within application
  • Input - read into system and possibly stored
  • Output - written to output destination

Sensitivity categories

  • Security-sensitive - subset of data, very valuable to attacker
  • PII
  • Hidden - concealed using obfuscation to protect from unauthorized disclosure

Data Risk Impact

  • 📝 labeled according to risk if lost (high, medium, low)
  • will differ from firm to firm
  • 📝 impact considerations include cost, operation impact, people impact, customer impact

Data ownership

  • data is not owned by a person, it is owned by the enterprise
  • data is assigned to people for stewardship/ownership for practical reasons
  • ownership is business driven

Data owner

  • acts in the interests of the enterprise
  • determines who has what access
  • owner != custodian (e.g. CFO owns accounting records but DBA can make direct changes)
  • 📝 data owners define data classification, authorized users, access criteria, and security controls

Data custodian

  • ensure processes safely transport, manipulate, and store data
  • ensure data management processes (set by owner) are followed
  • 📝 perform backups, data retention, disposal
  • 📝 manage anything else (e.g. security controls) defined by owner
  • custodians may not need read access
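
The owner/custodian split can be sketched as an owner-defined policy that the custodian's tooling checks. This is a minimal illustration only: `AccessPolicy`, the user names, and the action names are hypothetical and not taken from any standard.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical policy defined by the data owner."""
    classification: str
    # Maps each user to the set of actions the owner has authorized.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def allows(self, user: str, action: str) -> bool:
        # Anything the owner has not explicitly granted is denied.
        return action in self.grants.get(user, set())


# The owner (e.g. the CFO) can read and write; the custodian (e.g. a DBA)
# can back up and dispose of the data but is not granted read access.
policy = AccessPolicy(
    classification="high",
    grants={"cfo": {"read", "write"}, "dba": {"backup", "dispose"}},
)

print(policy.allows("dba", "backup"))  # True
print(policy.allows("dba", "read"))    # False
```

Denying by default keeps the owner's decisions explicit: the custodian can do its job without being able to read the data.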



  • built around business purpose

Impact

  • wider concern than sensitivity
  • includes loss, disclosure, and alteration
  • three levels (high, medium or moderate, low) - each level should be clearly defined
  • 📝 NIST FIPS 199 and SP 800-18 provide a framework for classifying data based on impact to confidentiality, integrity, and availability (CIA)
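
A three-level, CIA-based rating can be sketched as follows. This is an illustration, not text from FIPS 199: the `Impact` and `SecurityCategory` names are hypothetical, and using the highest of the three ratings as the overall level is an assumption made for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    """Three impact levels; each firm should define them clearly."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class SecurityCategory:
    """Hypothetical record: one impact rating per CIA objective."""
    confidentiality: Impact
    integrity: Impact
    availability: Impact

    def overall(self) -> Impact:
        # Assumption for this sketch: the overall level is the highest
        # of the three individual ratings.
        return max(self.confidentiality, self.integrity, self.availability)


payroll = SecurityCategory(Impact.HIGH, Impact.MODERATE, Impact.LOW)
print(payroll.overall().name)  # HIGH
```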

Clearly Defining levels

  • high: set the threshold high enough that only a small number of data elements qualify
  • financial limits and customer impact vary, each company needs to decide for itself

Types of data

Structured data

  • has a defined structure that can be parsed, sorted, searched
  • 📝 determined by how the data is stored, not where
    • 📝 look for relationships between data elements

examples of structured data:

  • tables in a database
  • formatted file structures
  • XML
  • JSON
  • certain text files like log files
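
A short sketch of why formats such as JSON count as structured: the defined structure lets a program parse, sort, and search the data directly. The sample records and field names below are hypothetical.

```python
import json

# Hypothetical sample records; the field names are illustrative only.
raw = (
    '[{"id": 2, "owner": "CFO", "level": "high"},'
    ' {"id": 1, "owner": "HR", "level": "low"}]'
)

records = json.loads(raw)                             # parsed
by_id = sorted(records, key=lambda r: r["id"])        # sorted
high = [r for r in records if r["level"] == "high"]   # searched

print([r["id"] for r in by_id])    # [1, 2]
print([r["owner"] for r in high])  # ['CFO']
```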


Unstructured data

  • not easily parsed/searched
  • more difficult to modify outside the originating application (e.g. Word documents, PDFs)
  • majority of data is unstructured

Data lifecycle

  • storing data is still a resource issue despite falling storage costs
  • must be managed from backup, business continuity, disaster recovery perspective


if data is going to be persistent, it needs to be labeled, classified, and protected


Retention

if retained, data must have metadata defined, such as:

  • data owner
  • purpose of storing
  • level of protection
  • length of storage (retention policy)
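
The metadata items above could be captured in a small record like the one below. This is a sketch only: `RetentionMetadata`, its field names, and the choice to express length of storage in days are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionMetadata:
    """Hypothetical metadata attached to retained data."""
    data_owner: str        # who is accountable for the data
    purpose: str           # why the data is being stored
    protection_level: str  # e.g. "high", "medium", "low"
    retention_days: int    # length of storage (retention policy)
    stored_on: date        # when the data was stored

    def retain_until(self) -> date:
        # Date on which the retention period ends.
        return self.stored_on + timedelta(days=self.retention_days)


meta = RetentionMetadata(
    data_owner="CFO",
    purpose="financial reporting",
    protection_level="high",
    retention_days=365,
    stored_on=date(2023, 1, 1),
)
print(meta.retain_until())  # 2024-01-01
```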

protection must also cover backups and copies kept for disaster recovery (DR)

System logs: considered important from a legal/compliance perspective; they often contain sensitive information that needs to be protected


Disposal

primary purposes of disposal:

  1. conserve resources
  2. limit liabilities

📝 Length of storage is determined by:

  1. business purpose
  2. compliance

Legal hold data is not subject to normal disposal procedures
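
The disposal rule can be sketched as a simple check: data is disposed of once its retention period has ended, unless it is under legal hold. The function name `may_dispose` and its parameters are hypothetical.

```python
from datetime import date


def may_dispose(retain_until: date, legal_hold: bool, today: date) -> bool:
    """Return True only if normal disposal applies and retention has ended."""
    if legal_hold:
        # Data under legal hold is exempt from normal disposal procedures.
        return False
    return today > retain_until


print(may_dispose(date(2024, 1, 1), legal_hold=False, today=date(2024, 6, 1)))  # True
print(may_dispose(date(2024, 1, 1), legal_hold=True, today=date(2024, 6, 1)))   # False
```

Checking the legal hold first means a hold always wins, whatever the retention policy says.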