Data Integrity
- A checksum only detects corruption; it cannot repair it.
- Corruption is healed through replication: a corrupt copy is replaced with a copy of an intact replica.
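
The two points above can be illustrated with a minimal sketch. It is not HDFS code: the class name ReplicaHealingSketch and its methods are hypothetical, replicas are modeled as in-memory byte arrays, and java.util.zip.CRC32 stands in for whatever checksum the storage system uses. The checksum only reports that a replica is bad; the repair comes from copying an intact replica over it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical illustration: detect corruption with a checksum, heal it from a replica.
final class ReplicaHealingSketch {

    // Computes a CRC-32 checksum over the whole byte array.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Detection only: reports whether the data still matches the expected checksum.
    static boolean isIntact(byte[] data, long expected) {
        return checksum(data) == expected;
    }

    // Healing: overwrites each corrupt replica with a copy of the first intact one.
    // Returns the number of replicas that were replaced.
    static int heal(byte[][] replicas, long expected) {
        byte[] good = null;
        for (byte[] replica : replicas) {
            if (isIntact(replica, expected)) {
                good = replica;
                break;
            }
        }
        if (good == null) {
            throw new IllegalStateException("no intact replica left to copy from");
        }
        int healed = 0;
        for (int i = 0; i < replicas.length; i++) {
            if (!isIntact(replicas[i], expected)) {
                replicas[i] = Arrays.copyOf(good, good.length);
                healed++;
            }
        }
        return healed;
    }

    public static void main(String[] args) {
        byte[] original = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        long expected = checksum(original);
        byte[][] replicas = { original.clone(), original.clone(), original.clone() };

        replicas[1][0] ^= 0x01; // simulate a single flipped bit in one replica

        System.out.println(isIntact(replicas[1], expected)); // false
        System.out.println(heal(replicas, expected));        // 1
        System.out.println(isIntact(replicas[1], expected)); // true
    }
}
```

If every replica is corrupt, the sketch throws, which mirrors the point that a checksum alone cannot recover data.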
Data Integrity in HDFS
- Each datanode runs a background daemon that periodically verifies the checksums of the blocks it stores.
- Checksum when READ
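- The client verifies the checksum of the data it reads against the checksum stored on the datanode.
- Each datanode keeps a persistent log of checksum verifications, so it knows when each block was last verified.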
- Checksum when WRITE
- The last datanode in the write pipeline verifies the checksum.
- Datanodes also verify data they receive from other datanodes during replication.
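
The write-then-verify flow above can be sketched with per-chunk checksums. This is an illustration under stated assumptions, not the HDFS implementation: the class ChunkChecksumSketch and its methods are hypothetical, the 512-byte chunk size mirrors the usual HDFS default for bytes per checksum, and CRC-32 from java.util.zip is used for simplicity. Checksums are computed when the data is written and recomputed when it is read; a mismatch pinpoints the corrupt chunk.

```java
import java.util.zip.CRC32;

// Hypothetical illustration: one checksum per fixed-size chunk, verified on read.
final class ChunkChecksumSketch {

    // Assumed chunk size for the sketch.
    static final int BYTES_PER_CHECKSUM = 512;

    // "Write" side: computes one CRC-32 per chunk of the data.
    static long[] computeChecksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int offset = i * BYTES_PER_CHECKSUM;
            int length = Math.min(BYTES_PER_CHECKSUM, data.length - offset);
            crc.reset();
            crc.update(data, offset, length);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // "Read" side: recomputes the checksums and returns the index of the
    // first chunk that does not match, or -1 if every chunk is intact.
    static int firstCorruptChunk(byte[] data, long[] stored) {
        long[] actual = computeChecksums(data);
        if (actual.length != stored.length) {
            throw new IllegalArgumentException("checksum count does not match data length");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != stored[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];            // three chunks: 512 + 512 + 276 bytes
        long[] stored = computeChecksums(data);  // computed at write time

        System.out.println(firstCorruptChunk(data, stored)); // -1

        data[600] ^= 0x40;                       // corrupt a byte inside the second chunk
        System.out.println(firstCorruptChunk(data, stored)); // 1
    }
}
```

Keeping one checksum per small chunk, rather than one per file, means a reader can tell which part of the data is bad without rereading everything.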
LocalFileSystem
ChecksumFileSystem
- Has a few methods for working with checksum files, such as getChecksumFile().
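
The idea of a checksum file kept next to the data file can be sketched without Hadoop on the classpath. Hadoop's local filesystem stores checksums in a hidden file named .filename.crc in the same directory; the sketch below borrows only that naming convention. Everything else is an assumption made for illustration: the class SidecarChecksumSketch, its methods, and the sidecar contents (a single 8-byte CRC-32 value) are hypothetical and do not match Hadoop's actual on-disk format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical illustration: a data file with a hidden ".name.crc" sidecar file.
final class SidecarChecksumSketch {

    // Maps "dir/name" to "dir/.name.crc", following the hidden-file naming convention.
    static Path checksumFileFor(Path file) {
        return file.resolveSibling("." + file.getFileName() + ".crc");
    }

    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Writes the data file and its sidecar checksum file.
    static void write(Path file, byte[] data) {
        try {
            Files.write(file, data);
            byte[] sum = ByteBuffer.allocate(Long.BYTES).putLong(crcOf(data)).array();
            Files.write(checksumFileFor(file), sum);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rereads the data file and compares its checksum with the stored one.
    static boolean verify(Path file) {
        try {
            long stored = ByteBuffer.wrap(Files.readAllBytes(checksumFileFor(file))).getLong();
            return crcOf(Files.readAllBytes(file)) == stored;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sidecar-sketch");
        Path file = dir.resolve("part-00000");

        write(file, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(checksumFileFor(file).getFileName()); // .part-00000.crc
        System.out.println(verify(file));                        // true

        // Change the data file behind the checksum's back.
        Files.write(file, "hellp".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(file));                        // false
    }
}
```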