HadoopConceptsNote

Data Integrity

  • A checksum only detects corruption; it cannot repair it.
  • Corruption is healed through replication: a good replica replaces the corrupt one.
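
The two points above can be illustrated with a minimal sketch in plain Java. It is not HDFS code: the class name ChecksumDetectDemo, its helper methods, and the two in-memory "replicas" are all hypothetical, and java.util.zip.CRC32 stands in for whatever checksum the filesystem actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class, not part of Hadoop.
public class ChecksumDetectDemo {

    // Compute a CRC-32 checksum over the whole buffer.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // True if the data no longer matches the checksum recorded at write time.
    static boolean isCorrupt(byte[] data, long expected) {
        return checksum(data) != expected;
    }

    public static void main(String[] args) {
        // Two in-memory copies stand in for two replicas of one block.
        byte[] replicaA = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        byte[] replicaB = replicaA.clone();
        long stored = checksum(replicaA);

        // Simulate a single flipped bit in replica A.
        replicaA[0] ^= 0x01;

        // The checksum detects the damage but says nothing about how to fix it.
        System.out.println("replica A corrupt: " + isCorrupt(replicaA, stored));

        // Healing comes from replication: overwrite the bad copy with a good one.
        if (isCorrupt(replicaA, stored) && !isCorrupt(replicaB, stored)) {
            replicaA = replicaB.clone();
        }
        System.out.println("replica A corrupt after heal: " + isCorrupt(replicaA, stored));
    }
}
```

The checksum itself carries no information for reconstructing the data, which is why a second, intact copy is needed for the repair step.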

Data Integrity in HDFS

  • Each datanode runs a background daemon that periodically verifies the checksums of the blocks it stores.
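  • Checksum on read
    • The client verifies the checksum of the data it reads from a datanode.
    • Each datanode keeps a persistent log of checksum verifications, so it knows when each block was last verified.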
  • Checksum on write
    • The last datanode in the write pipeline verifies the checksum.
    • Datanodes also verify checksums on data received from other datanodes during replication.
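
As a rough sketch of the verification a reader or the last datanode in a pipeline performs, the example below keeps one checksum per fixed-size chunk and reports the first chunk that fails. The assumptions: the class ChunkedChecksumDemo and its methods are hypothetical, the 512-byte chunk size mirrors the usual dfs.bytes-per-checksum default, and plain CRC-32 is used for portability, whereas HDFS defaults to the CRC-32C variant.

```java
import java.util.zip.CRC32;

// Hypothetical demo class, not part of Hadoop.
public class ChunkedChecksumDemo {

    // Compute one CRC-32 per chunk of bytesPerChecksum bytes (the last chunk may be shorter).
    static long[] chunkChecksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Return the index of the first chunk whose checksum does not match, or -1 if all match.
    static int firstCorruptChunk(byte[] data, long[] expected, int bytesPerChecksum) {
        long[] actual = chunkChecksums(data, bytesPerChecksum);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("checksum count does not match data length");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bytesPerChecksum = 512;
        byte[] block = new byte[2000];            // 4 chunks: 512 + 512 + 512 + 464
        long[] stored = chunkChecksums(block, bytesPerChecksum);

        System.out.println(firstCorruptChunk(block, stored, bytesPerChecksum)); // prints -1

        block[1300] ^= 0x01;                      // byte 1300 lies in chunk 2
        System.out.println(firstCorruptChunk(block, stored, bytesPerChecksum)); // prints 2
    }
}
```

Checksumming per chunk rather than per block keeps the overhead small while still narrowing a failure down to a few hundred bytes.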

LocalFileSystem

ChecksumFileSystem

  • Provides a few methods for working with checksum files, such as getChecksumFile().