HadoopConceptsNote

Write

File Write

  • Steps 1 and 2 only create a record of the new file on the namenode. Before doing so, the namenode checks that the client has permission and that the file does not already exist.
  • DFSOutputStream is the main component that handles communication with the datanodes and the namenode. It is wrapped by FSDataOutputStream.
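  • DFSOutputStream (exposed to the client through FSDataOutputStream)
    • Splits the data into packets.
    • Queues the packets in the data queue.
    • Maintains an ack queue of packets that have been sent but not yet acknowledged by the datanodes. This queue is what makes recovery possible when a datanode in the pipeline fails.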
  • DataStreamer
    • Consumes the data queue
    • Asks the namenode to allocate new blocks. The namenode picks a list of suitable datanodes to store the replicas.
    • Streams the packets through the datanode pipeline (steps 4 and 5).
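
The interplay between the data queue and the ack queue is easier to see in a toy model. The sketch below is not Hadoop code: the class `WritePipelineSketch` and all of its methods are hypothetical names made up for illustration, and there are no real datanodes or network I/O. It only shows how packets move between the two queues, and assumes that on a pipeline failure the unacknowledged packets are put back at the front of the data queue so that no packet is lost.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model of the client-side write queues. Hypothetical names, not Hadoop classes.
class WritePipelineSketch {
    final Deque<byte[]> dataQueue = new ArrayDeque<>(); // packets waiting to be sent
    final Deque<byte[]> ackQueue = new ArrayDeque<>();  // packets sent, not yet acknowledged

    // Output-stream role: split the input into fixed-size packets and enqueue them.
    void write(byte[] data, int packetSize) {
        for (int off = 0; off < data.length; off += packetSize) {
            int end = Math.min(off + packetSize, data.length);
            dataQueue.addLast(Arrays.copyOfRange(data, off, end));
        }
    }

    // Streamer role: take the next packet off the data queue and treat it as sent.
    // It stays in the ack queue until every datanode in the pipeline has acknowledged it.
    boolean sendNext() {
        byte[] packet = dataQueue.pollFirst();
        if (packet == null) {
            return false; // nothing left to send
        }
        ackQueue.addLast(packet);
        return true;
    }

    // The oldest in-flight packet has been acknowledged by the whole pipeline.
    void ackOldest() {
        ackQueue.pollFirst();
    }

    // A datanode failed: move unacknowledged packets back to the front of the
    // data queue, preserving their original order, so they are resent.
    void onPipelineFailure() {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast());
        }
    }

    public static void main(String[] args) {
        WritePipelineSketch sketch = new WritePipelineSketch();
        sketch.write(new byte[10], 4); // 10 bytes in 4-byte packets: 3 packets
        sketch.sendNext();
        sketch.sendNext();             // 2 packets in flight, 1 still queued
        sketch.ackOldest();            // first packet fully acknowledged
        sketch.onPipelineFailure();    // second packet goes back to be resent
        System.out.println("data=" + sketch.dataQueue.size()
                + " ack=" + sketch.ackQueue.size());
    }
}
```

In a real client none of this is written by hand: calling `FileSystem.create()` returns an `FSDataOutputStream`, and the queueing and streaming happen behind it.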