The Rutherford Lab (RL) computer center, near Oxford, England, is planning to become a BaBar tier 1 remote computing site. This means it will hold a copy of all the BaBar data (currently 1 TByte a day and expected to increase by a factor of 3 in the next year) and will be a major site where BaBar computing is performed. As such it needs excellent connectivity and performance to SLAC and other BaBar collaborator sites.
While making throughput measurements from SLAC (pharlap.slac.stanford.edu) to RL (dev04.hepgrid.clrc.ac.uk) in preparation for the SC2001 bandwidth challenge, it was discovered that the maximum achievable throughput was about 40 Mbits/s. The pipechar from SLAC to RL indicates that the connection is via ESnet and JAnet and that the bottleneck is somewhere in JAnet. The pipechar from RL to SLAC appears to show that the route is OC48 all the way, although the link from SLAC to ESnet at this time was OC3. This discrepancy (OC48 vs OC3 for the SLAC - ESnet link) is probably due to pipechar inaccuracies on more distant links.
Routes
The traceroute from SLAC to RL shows a round trip time (RTT) of about 70 msec between SLAC and ESnet and 139 msec from SLAC to RL. It also clearly identifies the carriers as ESnet and JAnet. The traceroute from RL to SLAC indicates that the route is fairly symmetric.
Follow Up
I remeasured the pipechar from SLAC to RL, but the ~40 Mbits/s bottleneck still appeared in JAnet. We also made more in-depth measurements of the iperf TCP throughput as a function of window size and number of streams. These measurements were made with the iperf client on pharlap.slac.stanford.edu (a Sun E4500 with 6*336MHz cpus running Solaris 5.8 with a GE interface and a 4MByte TCP buffer) and dev04.hepgrid.clrc.ac.uk (a 604 MHz Linux 2.2.16-3 host with a GE interface and an 8MByte TCP buffer). The results are shown below, and indicate that the maxima (the 10% of the measurements that give the greatest throughput) are over 38 Mbits/s. This is well below the expected maximum of about 100 Mbits/s on an OC3 bottleneck link.
Resolution
On Jan 14 '02 I received the following email from Chris Selig at RAL:
"Would it be possible for you to re-run your test between SLAC and RAL. We have identified (and fixed) a routing problem with the network to which the dev04 is connected. This problem affected ONLY the network on which the test machine lived, which explains why none of our measurements show up the problem."
I reran the test and got a maximum throughput of 37 Mbits/s, with the top 10% of measurements being over 29 Mbits/s. I also noticed that the maximum window size was set to 65KBytes and so requested that it be increased.
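To see why a 65KByte window cap matters on this path, here is a rough sketch of the usual window/RTT ceiling for a single TCP stream. It assumes that 65KBytes means 65*1024 bytes and uses the 139 msec RTT from the traceroute; the function names are illustrative only, and the estimate ignores packet loss, slow start and host limits.

```python
# Back-of-envelope ceiling for window-limited TCP throughput.
# Assumptions (not measurements): RTT = 139 msec, window cap = 65 * 1024 bytes.

RTT_S = 0.139               # SLAC to RL round trip time, in seconds
WINDOW_BYTES = 65 * 1024    # assumed meaning of the "65KBytes" maximum window


def window_limited_bps(window_bytes, rtt_s):
    """At most one window of data can be in flight per round trip."""
    return window_bytes * 8 / rtt_s


def aggregate_ceiling_bps(window_bytes, rtt_s, streams):
    """Ideal upper bound for several parallel streams sharing no other limit."""
    return streams * window_limited_bps(window_bytes, rtt_s)


if __name__ == "__main__":
    one = window_limited_bps(WINDOW_BYTES, RTT_S)
    print(f"1 stream:   {one / 1e6:.1f} Mbits/s ceiling")
    for n in (5, 10, 20):
        total = aggregate_ceiling_bps(WINDOW_BYTES, RTT_S, n)
        print(f"{n} streams: {total / 1e6:.1f} Mbits/s ceiling")
```

Under these assumptions a single stream is capped at roughly 3.8 Mbits/s, so approaching the link rate with such a window would need many parallel streams.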
On Friday March 1st Tim Folkes of RL reported: "OK, have made the change, let me know if it improves things."
I then remeasured the throughputs using varying window sizes and streams. I was able to achieve over 100Mbits/s with various streams/window combinations.
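As a rough guide to which streams/window combinations can reach 100 Mbits/s, the sketch below computes the bandwidth-delay product for that rate and splits it evenly across parallel streams. The 139 msec RTT is taken from the traceroute; the target rate, the stream counts and the function names are assumptions chosen for illustration, not the combinations that were actually measured.

```python
import math

# Window needed per stream to keep a path full at a target rate.
# Assumptions (illustrative): RTT = 139 msec, target = 100 Mbits/s.

RTT_S = 0.139
TARGET_BPS = 100e6


def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return rate_bps * rtt_s / 8.0


def window_per_stream_bytes(rate_bps, rtt_s, streams):
    """Even split of the bandwidth-delay product across parallel streams."""
    return math.ceil(bdp_bytes(rate_bps, rtt_s) / streams)


if __name__ == "__main__":
    total = bdp_bytes(TARGET_BPS, RTT_S)
    print(f"Bandwidth-delay product: {total / 1e6:.2f} MBytes")
    for streams in (1, 2, 4, 8, 16):
        w = window_per_stream_bytes(TARGET_BPS, RTT_S, streams)
        print(f"{streams:2d} streams: about {w / 1024:.0f} KBytes per stream")
```

With these numbers the total in-flight data is about 1.7 MBytes, which is within the 4MByte and 8MByte TCP buffers of the two hosts but far above a 65KByte window.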