Thursday, July 25, 2013

A First Look at Inter-Data Center Traffic Characteristics via Yahoo! Datasets

Authors 
Yingying Chen, Sourabh Jain, Vijay Kumar Adhikari, Zhi-Li Zhang (UMN) 
Kuai Xu (ASU)

! Keep changing. 

- Motivation
Many IT companies now operate their own data centers across widely distributed geographic areas to provide services (main, news, messenger, games...) and to give customers (clients) a better experience by placing data close to them. To do this, data must be continually moved and replicated across multiple data centers according to the clients' needs (low latency, availability, and so on). This paper characterizes the traffic between data centers (inter-data center, D2D) so that data centers can be designed and managed with lower optical cost. Some research exists on intra-data center traffic (traffic within a single data center), but little on the characteristics of inter-data center traffic.

-  How it works
They collected a NetFlow dataset from the Yahoo! data centers located in Dallas (DAX), Washington DC (DCP), Palo Alto (PAO), Hong Kong (HK), and the United Kingdom (UK). Each record includes 1) a timestamp, 2) source and destination IP addresses and transport-layer port numbers, 3) source and destination interfaces on the router, 4) the IP protocol, and 5) the number of bytes and packets exchanged. Because the NetFlow dataset is anonymized, they built their own heuristics to infer which IP addresses belong to Yahoo! and where those addresses are located.

- Traffic Characteristics
D2C traffic : Traffic exchanged between Yahoo! servers and clients.
D2D traffic : Traffic exchanged between Yahoo! servers at different locations.
Transit D2C and D2D traffic : D2C and D2D traffic that a border router at a given location carries on behalf of other locations.
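
The three definitions above can be sketched as a small classifier. This is only an illustration, not the paper's heuristic: the prefix-to-location map, the `Flow` type, and the function names are all hypothetical, and the addresses are documentation-only ranges rather than real Yahoo! prefixes.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical prefix-to-location map (documentation ranges, not real Yahoo! prefixes).
YAHOO_PREFIXES = {
    "DAX": [ip_network("192.0.2.0/24")],
    "PAO": [ip_network("198.51.100.0/24")],
}


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    num_bytes: int


def locate(ip: str) -> Optional[str]:
    """Return the data center location of an IP, or None if it is not a known server."""
    addr = ip_address(ip)
    for location, prefixes in YAHOO_PREFIXES.items():
        if any(addr in prefix for prefix in prefixes):
            return location
    return None


def classify(flow: Flow, router_location: str) -> str:
    """Label a flow seen at the border router of router_location."""
    src = locate(flow.src_ip)
    dst = locate(flow.dst_ip)
    if src is None and dst is None:
        return "other"  # neither endpoint is a known server
    if src is not None and dst is not None:
        if src == dst:
            return "intra"  # both servers at the same location
        kind = "D2D"
        local = router_location in (src, dst)
    else:
        kind = "D2C"
        local = router_location == (src or dst)
    # Traffic whose server endpoints are all elsewhere is transit traffic.
    return kind if local else "transit " + kind
```

For example, a flow from a DAX server to an unknown client is D2C when seen at DAX, and transit D2C when seen at any other location.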

DAX -> 50% D2C traffic : because of replication and efficient placement.
       20% D2D traffic : needs to be reduced.
       25% transit D2C traffic : needs to be reduced.
       little transit D2D traffic : negligible.

D2C -> Some services are strongly correlated (email, messenger, and so on), so strongly correlated services should be served from a single data center to reduce D2D traffic.
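
One way to quantify how strongly two services are correlated is the Pearson correlation of their traffic volumes over time. The sketch below assumes each service is represented by a list of per-interval traffic volumes; the function name and that representation are my own assumptions, not necessarily the measure used in the paper.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length traffic series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value close to 1 means the two services' traffic rises and falls together, which is the case where co-locating them would matter most.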

D2D ->
1) D2D traffic triggered by background batch jobs for maintenance (replication, placement, and so on) dominates the aggregate D2D traffic. It shows no specific trend (small variance).
2) D2D traffic triggered by D2C traffic rises and falls with the time of day, i.e., with the usage pattern (large variance).
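
The "small variance" versus "big variance" distinction above can be made concrete with the coefficient of variation (standard deviation divided by mean) of a traffic series. This is a sketch of one possible measure, not the paper's method, and the two sample series are made-up numbers used only to show the contrast.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(volumes: Sequence[float]) -> float:
    """Relative variability of a traffic series: population std dev / mean."""
    avg = mean(volumes)
    if avg == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return pstdev(volumes) / avg


# Made-up per-interval volumes, for illustration only.
background = [100, 102, 98, 101, 99, 100]      # steady batch-job traffic
client_triggered = [20, 40, 90, 120, 80, 30]   # follows the time of day

steady_cv = coefficient_of_variation(background)
bursty_cv = coefficient_of_variation(client_triggered)
```

With these sample values, `steady_cv` is far smaller than `bursty_cv`, matching the flat versus time-of-day behavior described above.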

- Pro
.Uses real data collected from Yahoo! data centers, which makes the results credible.
.A first look at inter-data center traffic, which gives us some understanding of it.

- Con
.As they say, the results are specific to Yahoo! data centers, so other companies that provide different types of services, e.g. Amazon, may have different traffic patterns.

- Note for me
I am researching wide-area distributed file (storage) systems, especially across data centers (D2D). I read this paper to understand which traffic characteristics I should consider when building an inter-data center file system.
