HadoopConceptsNote

Read

File Read Process

(Steps 1 and 2) The client opens the file, and the namenode is asked for the locations of the first few blocks in the file.
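For each block, the namenode returns
- The addresses of the datanodes that have a copy of the block.
- These addresses are sorted according to their proximity to the client.
The client reads through an FSDataInputStream, which wraps an instance of DFSInputStream.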
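
The proximity ordering can be sketched in plain Java. This is only an illustration, not Hadoop's own code: the class ProximitySketch, the Node type, and the node names are hypothetical, and the distance values (0 for the same node, 2 for the same rack, 4 for the same data center, 6 otherwise) are an assumption based on the usual tree-distance convention for a data center / rack / node topology.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Hadoop code: orders replica locations by network distance.
final class ProximitySketch {

    // A position in the topology tree, written as /dataCenter/rack/name.
    static final class Node {
        final String dataCenter;
        final String rack;
        final String name;

        Node(String dataCenter, String rack, String name) {
            this.dataCenter = dataCenter;
            this.rack = rack;
            this.name = name;
        }

        @Override
        public String toString() {
            return "/" + dataCenter + "/" + rack + "/" + name;
        }
    }

    // Assumed convention: sum of the hops from each node to their closest common ancestor.
    static int distance(Node a, Node b) {
        if (!a.dataCenter.equals(b.dataCenter)) return 6; // different data centers
        if (!a.rack.equals(b.rack)) return 4;             // same data center, different racks
        if (!a.name.equals(b.name)) return 2;             // same rack, different nodes
        return 0;                                         // same node
    }

    // Returns a copy of the replica list with the nearest datanode first.
    static List<Node> sortByProximity(Node client, List<Node> replicas) {
        List<Node> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingInt((Node r) -> distance(client, r)));
        return sorted;
    }

    public static void main(String[] args) {
        Node client = new Node("d1", "r1", "n1");
        List<Node> replicas = Arrays.asList(
                new Node("d2", "r3", "n7"),
                new Node("d1", "r2", "n4"),
                new Node("d1", "r1", "n2"));
        System.out.println(sortByProximity(client, replicas));
    }
}
```

The sort is stable, so replicas at the same distance keep the order in which they were listed.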
Read failover
- If communication with a datanode fails, DFSInputStream tries the next nearest datanode for that block.
- DFSInputStream remembers the datanodes that have failed, so it does not retry them for later blocks.
- DFSInputStream also verifies checksums. If it finds a corrupted block, it tries to read a replica from another datanode.
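
The failover rules above can be sketched as a small simulation. This is not the real DFSInputStream: the class FailoverReadSketch, its methods, and the datanode names are hypothetical, a null entry stands in for an unreachable datanode, and CRC32 from the Java standard library is used as the checksum only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.zip.CRC32;

// Hypothetical sketch, not Hadoop code: reads one block with failover across replicas.
final class FailoverReadSketch {

    // Datanodes that failed earlier; they are skipped for later blocks as well.
    private final Set<String> deadNodes = new HashSet<>();

    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // replicas maps datanode name to the bytes it would serve, nearest datanode first.
    // A null value models a datanode that cannot be reached.
    byte[] readBlock(Map<String, byte[]> replicas, long expectedChecksum) {
        for (Map.Entry<String, byte[]> replica : replicas.entrySet()) {
            String node = replica.getKey();
            if (deadNodes.contains(node)) {
                continue;                    // failed before, do not retry
            }
            byte[] data = replica.getValue();
            if (data == null) {
                deadNodes.add(node);         // remember the failure
                continue;                    // try the next nearest datanode
            }
            if (checksum(data) != expectedChecksum) {
                continue;                    // corrupted replica, try another datanode
            }
            return data;
        }
        throw new IllegalStateException("no healthy replica found");
    }

    Set<String> deadNodes() {
        return deadNodes;
    }

    public static void main(String[] args) {
        byte[] good = "block-0".getBytes(StandardCharsets.UTF_8);
        byte[] corrupted = "blocc-0".getBytes(StandardCharsets.UTF_8);

        Map<String, byte[]> replicas = new LinkedHashMap<>(); // keeps nearest-first order
        replicas.put("dn1", null);       // unreachable
        replicas.put("dn2", corrupted);  // fails checksum verification
        replicas.put("dn3", good);

        FailoverReadSketch reader = new FailoverReadSketch();
        byte[] data = reader.readBlock(replicas, checksum(good));
        System.out.println(new String(data, StandardCharsets.UTF_8));
        System.out.println("dead nodes: " + reader.deadNodes());
    }
}
```

In this sketch a corrupted replica is only skipped, not added to the dead list, since the datanode itself is still reachable.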
DFSInputStream closes its connection to the datanode after it finishes reading each block.
Blocks are read in order.
The client closes the FSDataInputStream when it has finished reading.
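
The per-block connection lifecycle can be traced with a toy reader. This is a sketch under simplifying assumptions, not Hadoop code: SequentialBlockReadSketch and its events list are hypothetical, blocks are plain strings, and the best datanode for each block is assumed to be already chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hadoop code: one datanode connection per block, blocks in order.
final class SequentialBlockReadSketch {

    // Trace of what the reader did, in order.
    final List<String> events = new ArrayList<>();

    // blocks holds the file's blocks in file order; datanodes holds the chosen datanode per block.
    String readFile(List<String> blocks, List<String> datanodes) {
        StringBuilder contents = new StringBuilder();
        for (int i = 0; i < blocks.size(); i++) {
            events.add("connect " + datanodes.get(i));
            contents.append(blocks.get(i));            // stream the whole block to the client
            events.add("close " + datanodes.get(i));   // connection closed at the end of the block
        }
        events.add("close stream");                    // the client has finished reading
        return contents.toString();
    }

    public static void main(String[] args) {
        SequentialBlockReadSketch reader = new SequentialBlockReadSketch();
        String contents = reader.readFile(
                Arrays.asList("first-", "second"),
                Arrays.asList("dn1", "dn2"));
        System.out.println(contents);
        System.out.println(reader.events);
    }
}
```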