The following represents data and results gathered from the second research institution connection cloud transfer test and compares results from Azure’s US North Central data center and Azure’s US South Central data center. The methodology applied during this test is detailed here and should be reviewed prior to considering the results or commentary below.
Test Overview:
- 05561 Cloud Transfer Tests: Research Institution Test 02
- Local Connection: Research Networks
- Started: February 9, 2010
- Finished: February 16, 2010
- Origination Point: Oak Ridge, TN
Disclaimer:
- Standard Disclaimer Applies
Test Objectives:
- Standard objectives apply
- Specific to this test: Test a research institution connection as the researcher’s “workstation” and gather data aimed at building a realistic expectation of performance
Test Setup
- Included File Sizes:
- 2KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, 1GB
- Network Connectivity - “research institution”
- Consists of a computer connected to a local network router via 100Mbps hard-wire.
- Multiple switches/routers/firewalls may exist between workstation and the public internet
- There may exist multiple high-speed networks that may be leveraged for connectivity to remote datacenters (ESNet, I2, NLR)
- Reasonable effort has been made to ensure that no other applications or TSRs are running on the source computer for the duration of the test.
- For this test, a newly-installed Windows 7 Professional installation was used, fully patched, with no other applications (beyond the test harness) installed.
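As a rough reference point for the setup above, the 100Mbps local link puts a floor under the transfer time for each file size. The sketch below works that floor out. It assumes binary units (1 KB = 1024 bytes), no protocol overhead, and that the local link is the only limit; the function names are illustrative and are not part of the test harness.

```python
# Best-case transfer time over the 100 Mbps local link described above.
# Assumptions (not from the test harness): binary units, no protocol overhead.
LINK_BITS_PER_SECOND = 100_000_000

UNIT_BYTES = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def size_in_bytes(label: str) -> int:
    """Convert a label such as '750MB' into a byte count."""
    number, unit = label[:-2], label[-2:]
    return int(number) * UNIT_BYTES[unit]


def floor_seconds(label: str) -> float:
    """Shortest possible transfer time if the local link were the only limit."""
    return size_in_bytes(label) * 8 / LINK_BITS_PER_SECOND


if __name__ == "__main__":
    for label in ("2KB", "1MB", "100MB", "750MB", "1GB"):
        print(f"{label:>6}: {floor_seconds(label):.4f} s")
```

Any measured duration will be longer than this floor, since the path to the remote datacenter adds latency and contention that the local link speed does not capture.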
Test Execution:
- Standard execution approach applied, except that Azure was tested in both cases, using two different datacenters (see slides for details)
Report Generation
- Standard report generation approach applied
Conventions:
- Standard conventions apply
Resources:
- Standard resources apply - no test-specific customizations beyond adaptations for the specific file sizes included in the test
Results:
Similar to other tests, there is some variability displayed that is obviously a result of traffic issues. We are continuing to look into this.
In general, the results from the Azure US North Central data center were better than those from US South Central, which is not altogether surprising as we are physically closer to the USNC location.
Slides 171 and 172 remain disturbing as the download values for the 750MB file size continue to be outside of what would be expected.
Slide 172 in particular is of interest as it draws attention to some wide variability across file sizes for the USSC datacenter (not just the 750MB size).
Full results are available in slide form here:
A PDF of the results is available here: http://sciencecloud.us/media/05561_Xfer-Research_02.pdf
The following represents data and results gathered from the first research institution connection cloud transfer test. The methodology applied during this test is detailed here and should be reviewed prior to considering the results or commentary below.
Test Overview:
- 05561 Cloud Transfer Tests: Research Institution Test 01
- Local Connection: Research Networks
- Started: February 8, 2010
- Finished: February 16, 2010
- Origination Point: Oak Ridge, TN
Disclaimer:
- Standard Disclaimer Applies
Test Objectives:
- Standard objectives apply
- Specific to this test: Test a research institution connection as the researcher’s “workstation” and gather data aimed at building a realistic expectation of performance
Test Setup
- Included File Sizes:
- 2KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, 1GB
- Network Connectivity - “research institution”
- Consists of a computer connected to a local network router via 100Mbps hard-wire.
- Multiple switches/routers/firewalls may exist between workstation and the public internet
- There may exist multiple high-speed networks that may be leveraged for connectivity to remote datacenters (ESNet, I2, NLR)
- Reasonable effort has been made to ensure that no other applications or TSRs are running on the source computer for the duration of the test.
- For this test, a newly-installed Windows 7 Professional installation was used, fully patched, with no other applications (beyond the test harness) installed.
Test Execution:
- Standard execution approach applied
Report Generation
- Standard report generation approach applied
Conventions:
- Standard conventions apply
Resources:
- Standard resources apply - no test-specific customizations beyond adaptations for the specific file sizes included in the test
Results:
Across both services there is an interesting amount of variability that is likely due to intermediate traffic or traffic management issues. Even within the same test run (see the various scatter plots) you can detect “walls” of change, where the values hover around a certain value and then hover around a much higher or lower value (e.g., slides 133 and 134).
There is not a consistent “winner” in this report. For various file sizes one platform would clearly outperform the other, only to have the tables completely reversed for the next file size. This hints at network routing issues. A brief conversation with some of our local networking team indicates that some traffic (in particular Amazon’s) appeared to generally leave via the router connected to ESNet, whereas most of the Microsoft traffic would leave via the router connected to Southern Crossing with subsequent connections to I2 and NLR. It may well be that the insertion of some static routes would help address some of the stability issues here.
Of particular interest is the “hump” seen for both services in slide 170. This has been seen in a similar location on the chart in other runs (see slide #82 here: http://www.slideshare.net/rgillen/cloud-storage-upload-tests-02). We don’t yet have a good explanation for this shape in the curve and are hoping to track that down soon.
Further, the shape of the Azure curve in slide 171 is inconsistent with other tests – specifically the data points for the 750MB size. We will continue to compare with other sets/runs to see if this continues or was simply transient.
What remains consistent across all tests so far is that the level of variability tends to be greater with the S3 platform as compared to the Azure Blob storage.
Full results are available in slide form here:
A PDF of the results is available here: http://sciencecloud.us/media/05561_Xfer-Research_01.pdf
The following describes the methodology applied to some of the data transfer tests we are performing for various cloud storage platforms. In each case, the following approach should be assumed with the exception of test-specific details which will be posted with each result set.
Disclaimer:
- The research team understands that any time the public internet is introduced into a test a number of non-controllable factors are introduced. It is the intent of this project to test various scenarios often enough and with enough variance to obtain a reasonable average and thereby allow the team to make general assumptions about the quality of service (given the constraints stated) that one can reasonably expect to encounter when utilizing a given service.
- It is similarly understood that there may exist environmental factors (e.g. routing paths, proxy servers, firewalls) that affect the transfer rates being tested. In general, it is believed that these factors should affect all tested platforms equally. However, in the case of various research institutes where specialty networks (e.g. ESNet, NLR, I2) exist, there may be routing configurations that particularly favor one service or endpoint over another. It is an objective of these tests to expose these anomalies with the goal of addressing them as appropriate.
- Baseline: For the various services tested, these initial tests were performed using no particular optimization techniques. We took the respective vendor’s shipping SDK, integrated it into a very similar wrapper (source code available for verification) and executed it. Subsequent work should focus on optimizations in the SDKs, or the methods in which the libraries are utilized, etc.
- Not A Stand-Alone Work: This data should not be considered in isolation. Rather, it is a portion of a larger data set (some of which may remain to be published) and should be interpreted for what it is – a portion of a larger collection that aims to provide a more complete view of the entire problem domain.
Test Objectives:
- General: Generate data to set expectations for users of various cloud services focusing on a scenario of local compute combined with cloud-hosted data (blob storage). Note: the reverse scenario as well as cloud-hosted compute/cloud-hosted data will be tested separately
- These tests and data are crucial to our overall objective of improving the experience of researchers interacting with cloud computing assets as they provide a baseline against which any optimizations or alterations may be compared.
Test Setup:
- Test Setup
- A collection of random-data files was generated (RandomFileGenerator.exe). For each file size in the test, 50 files were generated and stored on standard disks local to the test computer. The range of file sizes is specific to each test set.
- Network Connectivity: specific to each test set
Test Execution:
- For each file size, AWS_Console_App1.exe was called to upload the files to Amazon’s US Standard Region and record the duration
.\amazon\aws_console_app1.exe .\data\2KB
- For each file size, DownloadFiles.exe was called to download the files just uploaded to Amazon’s US Standard Region and record the duration
.\downloader\DownloadFiles.exe -i .\amazon_2KB.csv -p 6 -m yes
- For each file size, AzureTesting.exe was called to upload the files to Azure’s US North Central region and record the duration
.\azure\azuretesting.exe .\data\2KB
- For each file size, DownloadFiles.exe was called to download the files just uploaded to Azure’s North Central region and record the duration
.\downloader\DownloadFiles.exe -i .\azure_2KB.csv -p 6 -m yes
- NOTE: immediately following each operation for each file size, the resulting file (log.csv) was renamed to represent the source, transfer direction, and file size
ren log.csv azure_ussc_upload_2KB.csv
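The per-size sequence above can be sketched as a small command planner. This is an illustration only, not the harness itself: `plan_for` and `log_name` are hypothetical helpers, the command strings are copied from the examples above, and the log naming follows the pattern of the single `ren` example. How the renamed logs relate to the `.csv` files passed to DownloadFiles.exe is not stated here, so the sketch does not try to reconcile them.

```python
# Hypothetical planner that lists the four commands run for one file size.
# The command strings mirror the examples in the text; nothing is executed.
FILE_SIZES = ["2KB", "32KB", "64KB"]  # example subset; the range is specific to each test set


def plan_for(size: str) -> list[str]:
    """Return the upload/download commands for one file size, in run order."""
    return [
        rf".\amazon\aws_console_app1.exe .\data\{size}",
        rf".\downloader\DownloadFiles.exe -i .\amazon_{size}.csv -p 6 -m yes",
        rf".\azure\azuretesting.exe .\data\{size}",
        rf".\downloader\DownloadFiles.exe -i .\azure_{size}.csv -p 6 -m yes",
    ]


def log_name(source: str, direction: str, size: str) -> str:
    """Build a log file name from source, transfer direction, and file size."""
    return f"{source}_{direction}_{size}.csv"


if __name__ == "__main__":
    for size in FILE_SIZES:
        for command in plan_for(size):
            print(command)
    print(log_name("azure_ussc", "upload", "2KB"))
```

Keeping the plan as data makes it easy to check the run order for every file size before anything is uploaded.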
Report Generation:
- For each service tested, and each file size tested
- For both Uploads and Downloads (separately)
- Scatter plot is generated showing the distribution for the transfer duration (seconds)
- Scatter plot is generated showing the distribution for the transfer rate (Mb/s)
- Transfer duration average (seconds) is calculated
- Transfer duration standard deviation (seconds) is calculated
- Transfer rate average (Mb/s) is calculated
- Transfer rate standard deviation (Mb/s) is calculated
- For each file size tested
- For both Uploads and Downloads (separately)
- A comparison chart (column) is generated showing the average transfer duration (seconds) and error bars indicating one standard deviation (seconds). Also plotted is a dot indicating the associated average transfer rate on the secondary Y axis (Mb/s)
- Summary Charts
- For both Uploads and Downloads (separately)
- A range chart is generated showing the band covered by one standard deviation (per service tested) for the transfer duration (seconds) across the tested file sizes
- A range chart is generated showing the band covered by one standard deviation (per service tested) for the transfer rate (Mb/s) across the tested file sizes
- Presentation
- Once the above charts have been generated, they are assembled into a PowerPoint file
- Once the PowerPoint file has been generated and saved, it is published as a PDF file
- Automation
- All of the above steps are automated via a script (ProcessTransferLogs.ps1)
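The statistics listed above can be sketched as follows. This is not ProcessTransferLogs.ps1; it is a minimal illustration that assumes each log entry yields a file size in bytes and a duration in seconds, that 1 Mb/s means 10^6 bits per second, and that the sample standard deviation is the one intended. The durations in the usage example are made-up values, not test data.

```python
from statistics import mean, stdev


def rate_mbps(size_bytes: int, seconds: float) -> float:
    """Transfer rate in Mb/s, taking 1 Mb as 10**6 bits (an assumption)."""
    return size_bytes * 8 / seconds / 1_000_000


def summarize(size_bytes: int, durations: list[float]) -> dict:
    """Average, standard deviation, and one-sigma band for one file size."""
    rates = [rate_mbps(size_bytes, d) for d in durations]
    d_mean, d_sd = mean(durations), stdev(durations)
    r_mean, r_sd = mean(rates), stdev(rates)
    return {
        "duration_mean_s": d_mean,
        "duration_stdev_s": d_sd,
        "duration_band_s": (d_mean - d_sd, d_mean + d_sd),
        "rate_mean_mbps": r_mean,
        "rate_stdev_mbps": r_sd,
        "rate_band_mbps": (r_mean - r_sd, r_mean + r_sd),
    }


if __name__ == "__main__":
    # Illustrative durations (seconds) for a 1,000,000-byte file.
    for key, value in summarize(1_000_000, [1.0, 2.0, 4.0]).items():
        print(key, value)
```

Note that the average rate is computed from the per-file rates rather than from the average duration. The two are not interchangeable, because rate is inversely proportional to duration.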
Conventions:
- Naming Conventions
- Amazon_USSTD: Amazon’s US Standard region was specified when the bucket was created
- Azure_USNC: Azure’s US North Central region was selected when the storage account was created
- Error Handling
- In most runs, errors were displayed to the screen but not captured to logs.
- Errors (all of which were network-related) show up in the logs as collections of fewer than 50 data points (50 being the number of source files per size)
- Because the download tests are based on the upload source files, a download log containing fewer than 50 entries is not necessarily indicative of download errors; it may simply reflect an input file that had fewer than 50 entries. That said, there were more errors on downloads than on uploads.
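Since errors were not logged directly, the entry counts are the only way to estimate them. The sketch below shows that bookkeeping. It assumes each log holds one row per successful transfer with no header row, which is not confirmed here; `count_entries` and `missing_transfers` are hypothetical names.

```python
EXPECTED_FILES = 50  # files generated per file size


def count_entries(csv_text: str) -> int:
    """Count non-blank rows, assuming one row per successful transfer."""
    return sum(1 for line in csv_text.splitlines() if line.strip())


def missing_transfers(upload_entries: int, download_entries: int) -> dict:
    """Estimate failures per direction from the log entry counts.

    Downloads are compared with the uploads that succeeded, not with 50,
    because only uploaded files could be downloaded.
    """
    return {
        "upload_missing": EXPECTED_FILES - upload_entries,
        "download_missing": upload_entries - download_entries,
    }


if __name__ == "__main__":
    print(missing_transfers(upload_entries=48, download_entries=45))
```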
Resources:
Results: Specific to each test set
I’ve been getting my test harness and reporting tools set up for some performance baselining that I’m doing relative to cloud computing providers, and when I left the office on Friday I set off a test that was uploading a collection of binary files (NetCDF files, if you care) to an Azure container. I was doing nothing fancy: looping through a directory and, for each file found, uploading it to the container using the defaults for BlobBlock, then recording the duration (start/finish) and the file size for that file. The source directory contained 144 files representing roughly 58 GB of data. 32 of the files were roughly 1.5 GB each and the remainder were about 92.5 MB.
I came in this morning expecting to find the script long finished with some numbers to start looking at. Instead, what I found is that, after uploading some 70 files (almost 15 GB), every subsequent upload attempt failed with a timeout error – stating that the operation couldn’t be completed in the default 90-second time window. I started doing some digging into what was happening and so far have uncovered the following:
- By default, the Storage Client that ships with the November CTP breaks your file up into 4 MB blocks (assuming you are using BlobBlock, which you should if your file is over the 64 MB limit).
- The client then manages 4 concurrent threads uploading the data. As each thread completes, another is started, keeping four active for almost the entire time.
- At some point Saturday afternoon (just after 12 noon UTC), the client could no longer successfully upload a 4 MB file (block) in the 90 second window, and all subsequent attempts failed.
- I initially assumed that my computer had simply tripped up or that a local networking event caused the problem so I restarted the tool – only to find every request continuing to fail.
- I then began to wonder if the problem was the new storage client library (not sure why) so I pulled out a tool to manage Azure storage – Cloud Storage Studio (http://www.cerebrata.com/Products/CloudStorageStudio/Default.aspx) and noticed that I was able to successfully upload a file. I remembered that CSS (by default) splits the file into fairly small blocks, so I cracked open Fiddler and began monitoring what was going on. I learned that it was using 256 KB blocks (this is configurable via settings in the app).
- I then adjusted my upload script to set the ServiceClient.WriteBlockSizeInBytes property (ServiceClient is a property of the CloudBlockBlob object) to 256k and re-ran the script. This time, I had no troubles at all (other than a painfully slow experience).
- So, I can upload data (not a service outage) but while 256K blocks work, the 4 MB blocks that worked on Friday no longer work – I’m assuming that there’s a networking issue on my end, or something in the Azure platform. To provide more clarity, I adjusted the tool again, this time using a WriteBlockSizeInBytes value of 1MB and re-ran the tool – again, seeing successful uploads.
While this last step was running, I thought it might be good to go back and do some crunching on the data I had so far. The following chart represents the upload rates for the files that were successfully uploaded on Friday/Saturday, followed by a chart showing the probability density. The mean rate was 2.74 Mbits/sec with a standard deviation of 0.1968. It is interesting to note that there was no upward drift at the end of the collection of successful runs, indicating that the “fault” was more than likely caused by something specific rather than being the result of a gradual shift or failure based on usage (imagine a scenario wherein, as more data is populated in a container, indexes slow down, causing upload speeds to trail off).
Figure: Upload Speeds
Figure: Probability Density
I then ran similar reports against the data from this morning’s runs. I’m still in the process of generating a full report on the data, but a representative sample shows the following: the mean upload rate was 0.15 Mbits/sec with a standard deviation of 0.0375. This is over 17x slower than Friday. The data points represented below are for three batches: the first batch used a WriteBlockSizeInBytes of 256K, the second used 1MB, and the third used 2MB (10 points per size). The file upload did not succeed with the 2MB size; only about a quarter of the full file finished.
Figure: Upload Speeds
Figure: Probability Density
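To put the block sizes and the 90-second window side by side, here is a back-of-the-envelope sketch. It assumes the timeout applies to each block, that a single block gets the whole measured rate (optimistic, given that four threads upload concurrently), binary units for block sizes, and 1 Mbit = 10^6 bits. The function names are mine and are not part of the storage client.

```python
TIMEOUT_SECONDS = 90  # default window mentioned in the timeout error

BLOCK_SIZES = {
    "256 KB": 256 * 1024,
    "1 MB": 1024 ** 2,
    "2 MB": 2 * 1024 ** 2,
    "4 MB": 4 * 1024 ** 2,
}


def block_seconds(block_bytes: int, rate_mbits: float) -> float:
    """Time to push one block at a sustained rate given in Mbits/sec."""
    return block_bytes * 8 / (rate_mbits * 1_000_000)


def fits(block_bytes: int, rate_mbits: float) -> bool:
    """True if the block would finish inside the timeout window."""
    return block_seconds(block_bytes, rate_mbits) <= TIMEOUT_SECONDS


if __name__ == "__main__":
    for rate in (2.74, 0.15):  # mean rates reported for Friday and for this morning
        for name, size in BLOCK_SIZES.items():
            seconds = block_seconds(size, rate)
            print(f"{rate} Mbits/sec, {name}: {seconds:.1f} s, fits={fits(size, rate)}")
    print(f"slowdown: {2.74 / 0.15:.1f}x")
```

Under these assumptions, a 1 MB block needs roughly 56 seconds at 0.15 Mbits/sec while a 2 MB block needs roughly 112 seconds, so the smaller block sizes stay inside the window and the larger ones do not. How the client actually shares bandwidth across its threads would shift these numbers, so treat this as an estimate only.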
I’ve seen a few comments from others today that indicate the slowdown may be widespread. My next course of action is to run the tests from a few different locations, to hopefully eliminate my local network as the source of the problem and to have more data with which to address the issue.