The following describes the methodology applied to some of the data transfer tests we are performing for various cloud storage platforms. In each case, the following approach should be assumed with the exception of test-specific details which will be posted with each result set.
Disclaimer:
- The research team understands that any time the public internet is introduced into a test a number of non-controllable factors are introduced. It is the intent of this project to test various scenarios often enough and with enough variance to obtain a reasonable average and thereby allow the team to make general assumptions about the quality of service (given the constraints stated) that one can reasonably expect to encounter when utilizing a given service.
- It is similarly understood that there may exist environmental factors (e.g. routing paths, proxy servers, firewalls) that affect the transfer rates being tested. In general, it is believed that these factors should affect all tested platforms equally. However, in the case of various research institutes where specialty networks (e.g. ESnet, NLR, I2) exist, there may be routing configurations that particularly favor one service or endpoint over another. It is an objective of these tests to expose these anomalies with the goal of addressing them as appropriate.
- Baseline: For the various services tested, these initial tests were performed using no particular optimization techniques. We took the respective vendor’s shipping SDK, integrated it into a very similar wrapper (source code available for verification) and executed it. Subsequent work should focus on optimizations in the SDKs, or the methods in which the libraries are utilized, etc.
- Not A Stand-Alone Work: This data should not be considered in isolation. Rather, it is a portion of a larger data set (some of which may remain to be published) and should be interpreted for what it is – a portion of a larger collection that aims to provide a more complete view of the entire problem domain.
Test Objectives:
- General: Generate data to set expectations for users of various cloud services, focusing on a scenario of local compute combined with cloud-hosted data (blob storage). Note: the reverse scenario, as well as cloud-hosted compute with cloud-hosted data, will be tested separately.
- These tests and data are crucial to our overall objective of improving the experience of researchers interacting with cloud computing assets as they provide a baseline against which any optimizations or alterations may be compared.
Test Setup:
- Test Setup
- A collection of random-data files was generated (RandomFileGenerator.exe). For each file size tested, 50 files were generated and stored on standard disks local to the test computer. The range of file sizes is specific to each test set.
- Network Connectivity: specific to each test set
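As an illustration of the file-generation step, here is a minimal Python sketch. It is not the source of RandomFileGenerator.exe (which is not shown here); the function name, file naming pattern, and directory layout are assumptions made for the example. Random bytes are used because they are effectively incompressible, so any compression along the network path should not shrink the payload.

```python
import os
from pathlib import Path


def generate_random_files(out_dir, size_bytes, count=50):
    """Write `count` files of `size_bytes` random bytes each into `out_dir`.

    Hypothetical stand-in for RandomFileGenerator.exe; the naming scheme
    (random_000.bin, random_001.bin, ...) is an assumption.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = out / f"random_{i:03d}.bin"
        # os.urandom returns exactly size_bytes of random data
        path.write_bytes(os.urandom(size_bytes))
        paths.append(path)
    return paths
```

For example, `generate_random_files("data/2KB", 2 * 1024)` would produce a 50-file set comparable to the `.\data\2KB` directory used in the commands in this post.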
Test Execution:
- For each file size, AWS_Console_App1.exe was called to upload the files to Amazon’s US Standard Region and record the duration
.\amazon\aws_console_app1.exe .\data\2KB
- For each file size, DownloadFiles.exe was called to download the files just uploaded to Amazon’s US Standard Region and record the duration
.\downloader\DownloadFiles.exe -i .\amazon_2KB.csv -p 6 -m yes
- For each file size, AzureTesting.exe was called to upload the files to Azure’s US North Central region and record the duration
.\azure\azuretesting.exe .\data\2KB
- For each file size, DownloadFiles.exe was called to download the files just uploaded to Azure’s US North Central region and record the duration
.\downloader\DownloadFiles.exe -i .\azure_2KB.csv -p 6 -m yes
- NOTE: immediately following each operation for each file size, the resulting file (log.csv) was renamed to represent the source, transfer direction, and file size
ren log.csv azure_ussc_upload_2KB.csv
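The upload and download tools themselves are not reproduced here, but the pattern they follow (time each transfer, write one log row per successful file, then name the log after the service, direction, and file size) can be sketched in Python. Everything in this sketch is an assumption made for illustration: the `transfer` callable stands in for the vendor SDK call, and the CSV columns are hypothetical rather than the actual layout of log.csv.

```python
import csv
import time
from pathlib import Path


def time_transfers(files, transfer, log_path):
    """Call transfer(path) for each file and log name, size, and duration.

    `transfer` is a placeholder for an SDK upload or download call.
    A failed transfer is printed and produces no log row, so errors
    show up as a log with fewer rows than input files.
    """
    rows = []
    for path in map(Path, files):
        size = path.stat().st_size
        start = time.perf_counter()
        try:
            transfer(path)
        except OSError as err:
            print(f"transfer failed for {path.name}: {err}")
            continue
        rows.append((path.name, size, time.perf_counter() - start))
    with open(log_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "size_bytes", "duration_s"])  # assumed columns
        writer.writerows(rows)
    return rows


def log_name(service, direction, size_label):
    """Build a log file name such as azure_ussc_upload_2KB.csv."""
    return f"{service}_{direction}_{size_label}.csv"
```

Writing the log directly under its final name, as `log_name` allows, would avoid the manual rename step, at the cost of the tool needing to know which service and direction it is testing.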
Report Generation:
- For each service tested, and each file size tested
- For both Uploads and Downloads (separately)
- Scatter plot is generated showing the distribution for the transfer duration (seconds)
- Scatter plot is generated showing the distribution for the transfer rate (Mb/s)
- Transfer duration average (seconds) is calculated
- Transfer duration standard deviation (seconds) is calculated
- Transfer rate average (Mb/s) is calculated
- Transfer rate standard deviation (Mb/s) is calculated
- For each file size tested
- For both Uploads and Downloads (separately)
- A comparison chart (column) is generated showing the average transfer duration (seconds) and error bars indicating one standard deviation (seconds). Also plotted is a dot indicating the associated average transfer rate on the secondary Y axis (Mb/s)
- Summary Charts
- For both Uploads and Downloads (separately)
- A range chart is generated showing the band covered by one standard deviation (per service tested) for the transfer duration (seconds) across the tested file sizes
- A range chart is generated showing the band covered by one standard deviation (per service tested) for the transfer rate (Mb/s) across the tested file sizes
- Presentation
- Once the above charts have been generated, they are assembled into a PowerPoint file
- Once the PowerPoint file has been generated and saved, it is published as a PDF file
- Automation
- All of the above steps are automated via a script (ProcessTransferLogs.ps1)
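The statistics behind these charts are simple to reproduce. The following Python sketch is not ProcessTransferLogs.ps1; it only illustrates the arithmetic, under two assumptions that the description above does not settle: that Mb/s means 10^6 bits per second, and that the standard deviation is the sample (n-1) form.

```python
import statistics


def transfer_rate_mbps(size_bytes, duration_s):
    """Transfer rate in megabits per second (1 Mb = 1,000,000 bits, assumed)."""
    return (size_bytes * 8) / 1_000_000 / duration_s


def summarize(size_bytes, durations_s):
    """Mean and sample standard deviation for duration and rate.

    Needs at least two data points, since statistics.stdev is undefined
    for a single value.
    """
    rates = [transfer_rate_mbps(size_bytes, d) for d in durations_s]
    return {
        "duration_mean_s": statistics.mean(durations_s),
        "duration_stdev_s": statistics.stdev(durations_s),
        "rate_mean_mbps": statistics.mean(rates),
        "rate_stdev_mbps": statistics.stdev(rates),
    }


def one_sigma_band(mean, stdev):
    """Lower and upper edges of the one-standard-deviation band."""
    return (mean - stdev, mean + stdev)
```

Note that the average rate is computed from the per-file rates, not from the average duration; the two differ because rate is inversely proportional to duration.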
Conventions:
- Naming Conventions
- Amazon_USSTD: Amazon’s US Standard region was specified when the bucket was created
- Azure_USNC: Azure’s US North Central region was selected when the storage account was created
- Error Handling
- In most runs, errors were displayed to the screen but not captured to logs.
- Errors (all of which were network-related) show up in the logs as data sets containing fewer than 50 data points (50 being the number of source files per test)
- Because each download test takes the corresponding upload log as its input, a download log containing fewer than 50 entries does not necessarily indicate download errors; the input file may itself have had fewer than 50 entries. That said, there were more errors on downloads than on uploads.
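Since errors were not captured to the logs, the failure counts have to be inferred from row counts. A minimal Python sketch of that inference follows; it assumes each log is a CSV with one header row and one row per successful transfer, which is an assumption about the log format rather than something stated above.

```python
import csv

EXPECTED_FILES = 50  # files generated per file size, per the test setup


def count_entries(log_path, has_header=True):
    """Count data rows in a CSV log, skipping blank lines."""
    with open(log_path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    if has_header and rows:
        return len(rows) - 1
    return len(rows)


def missing_transfers(upload_log, download_log):
    """Infer failure counts from the number of rows in each log.

    Downloads are driven by the upload log, so download failures are
    measured against the uploaded count, not against EXPECTED_FILES.
    """
    uploaded = count_entries(upload_log)
    downloaded = count_entries(download_log)
    return {
        "upload_failures": EXPECTED_FILES - uploaded,
        "download_failures": uploaded - downloaded,
    }
```

With 48 rows in an upload log and 45 in the matching download log, this attributes 2 failures to the upload and 3 to the download, rather than 5 to the download.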
Resources:
Results: Specific to each test set
It’s late on the Friday afternoon before Christmas week, which means things are pretty quiet around the office. This quiet has the net effect of allowing me to get quite a bit done. The last few days have been very productive with respect to our research project and Azure work (more on that coming soon), which is now in full swing. We are currently collecting performance data from our codes running in Azure (and soon in the Amazon cloud) and are also testing data transfer speeds both to and from the cloud and between compute and storage within the cloud.
I’ve been working to automate much of this testing so we can do things in a repeatable fashion as well as have something that others could run (both other users like ourselves and possibly vendors, should we come across something that requires a repro scenario). So far, running tests and generating data in CSV or XML format is pretty simple, but I found myself wanting to automatically generate charts/graphs of the data as part of the test process to allow a quick visualization of how the test performed. I spent a good bit of the day looking at old tools for command-line generation of charts (e.g. RDTool) and none of them were exactly what I was looking for – not to mention my proclivity for C# and VS.NET tools and my desire to have something that looked refined/polished and not overly raw.
Thankfully, I stumbled upon something I should have remembered existed but simply hadn’t had the need to use before – the System.Windows.Forms.DataVisualization.Charting namespace. If you aren’t familiar with this assembly, it was released at PDC08 and has a companion Web namespace for performing similar operations in ASP.NET applications. In my basic testing I was able to build a console application that would ingest the CSV output from my testing harness and then generate some fairly nice looking charts based on that data. The following shows a chart generated from ~1800 data points, with an automatically generated 50% band and 90% band that allow the viewer to very easily ascertain the averages and the spread of the data points. This was generated using a combination of the FastPoint and BoxPlot chart types.
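For readers who want the numbers behind such bands without the charting assembly, here is a small Python sketch. It is not the C# console application described above. It assumes the 50% band spans the 25th to 75th percentiles and the 90% band spans the 5th to 95th percentiles, which is the usual reading but is not spelled out here, and it uses the standard library's inclusive quantile method, so other tools may differ slightly in how they interpolate.

```python
import statistics


def central_bands(values):
    """Median plus the central 50% and 90% bands of a data set.

    statistics.quantiles with n=20 returns 19 cut points at 5% steps:
    index 0 is the 5th percentile, index 4 the 25th, index 9 the median,
    index 14 the 75th, and index 18 the 95th.
    """
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "median": cuts[9],
        "band_50": (cuts[4], cuts[14]),
        "band_90": (cuts[0], cuts[18]),
    }
```

Feeding this the duration column of a transfer log gives the edges a box plot would draw, while the individual points can still be plotted on top as a scatter.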
