Data Classification and Categorization
Data Classification
types of classification:
- state
- use
- importance
classification supports risk management: it reduces the cost of protecting data
- match level of protection and cost with value of asset
Data States
- at rest
- being created
- in transit
- being changed/deleted
data storage location should be considered as well
Data usage
- knowing how data is used helps determine whether and how it should be shared
Usage classifications
- Internal - created, computed, stored (in memory) within application
- Input - read into system and possibly stored
- Output - written to output destination
Sensitivity categories
- Security-sensitive - subset of data that is highly valuable to an attacker
- PII (personally identifiable information)
- Hidden - concealed using obfuscation to protect from unauthorized disclosure
Data Risk Impact
- 📝 data is labeled according to the risk impact if it is lost (high, medium, low)
- will differ from firm to firm
- 📝 impact considerations include cost, operation impact, people impact, customer impact
Data ownership
- data is not owned by a person; it is owned by the enterprise
- data is assigned to people for stewardship/ownership for practical reasons
- ownership is business driven
Owner
- acts in the interests of the enterprise
- determines who has what access
- owner != custodian (e.g. CFO owns accounting records but DBA can make direct changes)
- 📝 data owners define data classification, authorized users, access criteria, and security controls
Custodian
- ensure processes safely transport, manipulate, and store data
- ensure data management processes (set by owner) are followed
- 📝 perform backups, data retention, disposal
- 📝 manage anything else (e.g. security controls) defined by owner
- custodians may not need read access
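The owner/custodian split above can be sketched in code. This is a minimal illustration, not a real access-control system: the `DataAsset` class, its method names, and the role names ("cfo", "dba", "auditor") are all hypothetical, and it assumes the owner can always read the data.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    name: str
    owner: str          # accountable for classification and access decisions
    custodian: str      # handles backups, retention, disposal
    classification: str = "unclassified"
    readers: set = field(default_factory=set)

    def grant_read(self, actor: str, user: str) -> None:
        # Only the owner determines who has what access.
        if actor != self.owner:
            raise PermissionError(f"{actor} is not the owner of {self.name}")
        self.readers.add(user)

    def can_read(self, user: str) -> bool:
        # The custodian gets no read access unless the owner grants it.
        return user == self.owner or user in self.readers

    def backup(self, actor: str) -> str:
        # Custodial task: does not require read access to the contents.
        if actor != self.custodian:
            raise PermissionError(f"{actor} is not the custodian of {self.name}")
        return f"backup of {self.name} completed by {actor}"


ledger = DataAsset("accounting_records", owner="cfo", custodian="dba")
ledger.grant_read("cfo", "auditor")
print(ledger.can_read("auditor"))  # True
print(ledger.can_read("dba"))      # False
print(ledger.backup("dba"))
```

The point of the sketch: the custodian can do its job (backup) while `can_read` stays false for it.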
Labeling
Sensitivity
- built around business purpose
Impact
- wider concern than sensitivity
- includes loss, disclosure, and alteration
- three levels (high, medium or moderate, low) - each level should be clearly defined
- 📝 NIST FIPS 199 and SP 800-18 provide a framework for classifying data based on impact to confidentiality, integrity, and availability (CIA)
Clearly Defining levels
- high: set the threshold high enough that only a small number of data elements qualify
- financial limits and customer impact vary, each company needs to decide for itself
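A CIA-based impact label can be sketched as follows. The `Impact` enum and `overall_impact` function are hypothetical names, and taking the highest of the three ratings as the overall level (a "high-water mark" rule) is an assumption made for this sketch; each firm still has to define what high, moderate, and low mean for it.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


def overall_impact(confidentiality: Impact, integrity: Impact, availability: Impact) -> Impact:
    # Assumed rule: the overall label is the highest of the three ratings.
    return max(confidentiality, integrity, availability)


# Hypothetical example: public price list (low confidentiality,
# high integrity, moderate availability).
label = overall_impact(Impact.LOW, Impact.HIGH, Impact.MODERATE)
print(label.name)  # HIGH
```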
Types of data
Structured
- has a defined structure that can be parsed, sorted, searched
- 📝 whether data is structured is determined by how it is stored, not where
- 📝 look for relationships between data elements
examples:
- tables in a database
- formatted file structures
- XML
- JSON
- certain text files like log files
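A short sketch of why structured data is easy to work with: because the layout is defined, it can be parsed, sorted, and searched with a few lines. The JSON records and the log line below are made-up sample data, and the log layout (date, time, level, component, message) is an assumed format.

```python
import json

# Hypothetical JSON records with a known structure.
raw = '[{"user": "bob", "failed_logins": 3}, {"user": "alice", "failed_logins": 7}]'
records = json.loads(raw)

# Sort by a named field.
by_failures = sorted(records, key=lambda r: r["failed_logins"], reverse=True)

# Search on a named field.
flagged = [r["user"] for r in records if r["failed_logins"] > 5]

# A log line with a fixed field order can be split the same way.
log_line = "2024-01-15 10:32:07 WARN auth login failed user=alice"
date, time, level, component, message = log_line.split(" ", 4)

print(by_failures[0]["user"])  # alice
print(flagged)                 # ['alice']
print(level, "|", message)     # WARN | login failed user=alice
```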
Unstructured
- not easily parsed/searched
- more difficult to modify outside the originating application (Word docs, PDFs)
- majority of data is unstructured
Data lifecycle
- storing data is still a resource issue even though storage costs have fallen
- must be managed from a backup, business continuity, and disaster recovery perspective
Generation
if data is going to be persistent, it needs to be labeled, classified, and protected
Retention
if retained, must have metadata defined such as:
- data owner
- purpose of storing
- level of protection
- length of storage (retention policy)
protection must also cover backups and copies kept for disaster recovery (DR)
System logs are considered important from a legal/compliance perspective and often contain sensitive info that needs to be protected
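The retention metadata listed above could be captured in a small record. This is a sketch only: `RetentionRecord`, its field names, and the sample values are hypothetical, and expressing the retention policy as a number of days is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRecord:
    data_owner: str         # who is accountable for the data
    purpose: str            # why it is being stored
    protection_level: str   # e.g. high / medium / low
    stored_on: date
    retention_days: int     # length of storage (assumed unit: days)

    def expires_on(self) -> date:
        # Date after which the retention period has run out.
        return self.stored_on + timedelta(days=self.retention_days)


record = RetentionRecord(
    data_owner="CFO",
    purpose="quarterly financial reporting",
    protection_level="high",
    stored_on=date(2024, 1, 1),
    retention_days=365,
)
print(record.expires_on())  # 2024-12-31
```

Backups and DR copies would need the same metadata so that they inherit the same protection level and retention period.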
Disposal
primary purposes of disposal:
- conserve resources
- limit liabilities
📝 Length of storage is determined by:
- business purpose
- compliance
Legal hold data is not subject to normal disposal procedures
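The disposal rules above can be sketched as a single check. `eligible_for_disposal` is a hypothetical helper, and treating data as disposable on (not only after) its expiry date is an assumption.

```python
from datetime import date


def eligible_for_disposal(expires_on: date, legal_hold: bool, today: date) -> bool:
    # Data under legal hold is exempt from normal disposal, whatever its age.
    if legal_hold:
        return False
    # Otherwise dispose once the retention period has run out.
    return today >= expires_on


today = date(2025, 1, 1)
print(eligible_for_disposal(date(2024, 12, 31), legal_hold=False, today=today))  # True
print(eligible_for_disposal(date(2024, 12, 31), legal_hold=True, today=today))   # False
print(eligible_for_disposal(date(2025, 6, 30), legal_hold=False, today=today))   # False
```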