HadoopConceptsNote

Read

File Read Process

  1. (Step 1 and 2) The client calls open() on DistributedFileSystem, which asks the namenode over RPC for the locations of the first few blocks in the file.
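  2. The namenode returns, for each block, the datanodes that hold a copy of it.
    • Each entry is the address of a datanode that has a replica of the block.
    • The datanodes are sorted by proximity to the client.
    • DistributedFileSystem then returns an FSDataInputStream to the client; it wraps an instance of DFSInputStream, which manages the datanode and namenode I/O.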
  3. Read failover.
    • If DFSInputStream hits an error while talking to a datanode, it tries the next-nearest datanode for that block.
    • DFSInputStream remembers the datanodes that have failed, so it does not retry them for later blocks.
    • DFSInputStream also verifies checksums. If it finds a corrupted block, it tries to read a replica from another datanode.
  4. DFSInputStream closes the connection to the datanode after it finishes reading each block, then finds the best datanode for the next block.
  5. Blocks are read in order.
  6. The client calls close() on FSDataInputStream when it has finished reading.
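
The failover rules in the steps above can be sketched as a small self-contained model. This is not Hadoop code: ReadFailoverSketch, BlockLocation, and the in-memory "cluster" map are hypothetical names invented for illustration, and CRC32 merely stands in for HDFS checksums. As a simplification, a corrupt replica is only skipped here, while an unreachable host is remembered as dead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.zip.CRC32;

// Toy model of the read path; NOT Hadoop code, every name is hypothetical.
final class ReadFailoverSketch {

    // One block as the namenode would describe it: the expected checksum
    // plus the replica hosts, already sorted nearest-first.
    static final class BlockLocation {
        final long expectedCrc;
        final List<String> hostsByProximity;

        BlockLocation(long expectedCrc, List<String> hostsByProximity) {
            this.expectedCrc = expectedCrc;
            this.hostsByProximity = hostsByProximity;
        }
    }

    // Simulated datanodes: host -> (block index -> stored bytes).
    // A host missing from this map stands for an unreachable datanode.
    private final Map<String, Map<Integer, byte[]>> cluster;

    // Hosts that failed earlier in this read; later blocks skip them.
    private final Set<String> deadNodes = new HashSet<>();

    ReadFailoverSketch(Map<String, Map<Integer, byte[]>> cluster) {
        this.cluster = cluster;
    }

    Set<String> deadNodes() {
        return deadNodes;
    }

    static long crc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Try replicas nearest-first until one is reachable and passes the checksum.
    byte[] readBlock(int index, BlockLocation location) {
        for (String host : location.hostsByProximity) {
            if (deadNodes.contains(host)) {
                continue; // failed before, do not retry
            }
            Map<Integer, byte[]> stored = cluster.get(host);
            if (stored == null) {
                deadNodes.add(host); // unreachable: remember it and move on
                continue;
            }
            byte[] data = stored.get(index);
            if (data != null && crc(data) == location.expectedCrc) {
                return data;
            }
            // Missing or corrupt replica: fall through to the next host.
        }
        throw new IllegalStateException("No readable replica for block " + index);
    }

    // Blocks are read strictly in order and concatenated.
    byte[] readFile(List<BlockLocation> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < blocks.size(); i++) {
            byte[] data = readBlock(i, blocks.get(i));
            out.write(data, 0, data.length);
        }
        return out.toByteArray();
    }

    // dn1 is unreachable, dn2 holds a corrupt copy of block 0, dn3 is healthy.
    static String demo() {
        byte[] b0 = "hello ".getBytes(StandardCharsets.UTF_8);
        byte[] b1 = "world".getBytes(StandardCharsets.UTF_8);
        byte[] corrupt = "hellX ".getBytes(StandardCharsets.UTF_8);
        Map<String, Map<Integer, byte[]>> cluster = Map.of(
                "dn2", Map.of(0, corrupt, 1, b1),
                "dn3", Map.of(0, b0));
        List<BlockLocation> blocks = List.of(
                new BlockLocation(crc(b0), List.of("dn1", "dn2", "dn3")),
                new BlockLocation(crc(b1), List.of("dn1", "dn2")));
        ReadFailoverSketch reader = new ReadFailoverSketch(cluster);
        return new String(reader.readFile(blocks), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello world"
    }
}
```

In the demo, block 0 falls through dn1 (unreachable) and dn2 (checksum mismatch) before dn3 serves it. Block 1 skips dn1 without contacting it again and is served by dn2.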