[Cross posted from here: http://rob.gillenfamily.net/post/External-File-Upload-Optimizations-for-Windows-Azure.aspx]
I’m wrapping up a bit of the work we’ve been doing on data movement optimizations for cloud computing and the latest set of data yielded some interesting points I thought I’d share. The work done here is not really rocket science but may, in some ways, be slightly counter-intuitive and therefore seemed worthy of posting.
Summary: for those who don’t like to read detailed posts or don’t have time, the synopsis is that if you are uploading data to Azure, block your data (even down to 1MB) and upload the blocks in parallel. Set your block size based on your source file size, but if you must choose a fixed value, use 1MB. Following the above will result in significant performance gains… upwards of 10x-24x and a reduction in overall file transfer time of upwards of 90% (e.g., uploading a 1GB file averaged 46.37 minutes prior to optimizations and averaged 1.86 minutes afterwards).
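For readers who want to check the arithmetic behind those percentages, the 1GB averages quoted above can be turned into a speedup factor and a fractional time reduction. This is a trivial sketch; `speedup_and_reduction` is a hypothetical helper name, not part of the test harness.

```python
def speedup_and_reduction(before_min: float, after_min: float) -> tuple:
    """Return (speedup factor, fractional reduction in transfer time)."""
    speedup = before_min / after_min          # how many times faster
    reduction = 1.0 - after_min / before_min  # share of the original time saved
    return speedup, reduction


s, r = speedup_and_reduction(46.37, 1.86)
print(f"{s:.1f}x faster, {r:.0%} less time")
```

For the 1GB case this works out to roughly a 24.9x speedup and a 96% reduction in transfer time.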
Detail: For those of you who want more detail, or think that the claims at the end of the preceding paragraph are over-reaching, what follows is information and code supporting these claims. As the title would indicate, these tests were run from our research facility to the Azure cloud (specifically US North Central, as it is physically closest to us) and do not represent intra-cloud results. We have performed intra-cloud tests, and while the overall results are similar in notion, the data rates are significantly different, as are the tipping points for the various block sizes; this will be detailed separately.
We started by building a very simple console application that would loop through a directory and upload each file to Azure storage. This application used the shipping storage client library from the 1.1 version of the Azure tools. The only real variation from the client library is that we added code to collect and record the duration (in ms) and size (in bytes) for each file transferred. The code is available here.
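The only change to the client library was timing each file. A minimal sketch of that bookkeeping in Python, assuming the upload is any callable taking a blob name and the file's bytes (`upload`, `timed_upload` and `TransferRecord` are hypothetical names, not the Azure client library). Here the file is read before the timer starts so that only the transfer is timed; the post does not say which choice the original harness made.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class TransferRecord:
    name: str
    size_bytes: int
    duration_ms: float


def timed_upload(path: Path, upload: Callable[[str, bytes], None]) -> TransferRecord:
    """Upload one file via the supplied callable and record its size and duration."""
    data = path.read_bytes()          # read first, so local disk I/O is not timed (assumption)
    start = time.perf_counter()
    upload(path.name, data)           # hypothetical stand-in for the storage client call
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return TransferRecord(name=path.name, size_bytes=len(data), duration_ms=elapsed_ms)
```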
We then created a directory that had a collection of files for the following sizes: 2KB, 32KB, 64KB, 128KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, and 1GB (50 files for each size listed). These files contained randomly-generated binary data and do not benefit from compression (a separate discussion topic). Our file generation tool is available here.
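The generation tool itself is only linked, not shown. A minimal sketch of producing incompressible test files, assuming `os.urandom` as the random source and writing in chunks so a 1GB file is never held in memory at once. The function names and file naming scheme are hypothetical.

```python
import os
import zlib
from pathlib import Path
from typing import List


def generate_files(directory: Path, size_bytes: int, count: int, label: str,
                   chunk_size: int = 4 * 1024 * 1024) -> List[Path]:
    """Write `count` files of exactly `size_bytes` random bytes each."""
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = directory / f"{label}_{i:03d}.bin"   # hypothetical naming scheme
        with path.open("wb") as f:
            remaining = size_bytes
            while remaining > 0:                      # chunked so large files stay off the heap
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
        paths.append(path)
    return paths


def compressed_fraction(path: Path) -> float:
    """Compressed size / original size; close to (or above) 1.0 for random data.

    Reads the whole file, so use it for spot checks on small files only.
    """
    data = path.read_bytes()
    return len(zlib.compress(data)) / len(data)
```

For example, `generate_files(Path("data/1MB"), 1024 * 1024, 50, "1MB")` would produce one of the 50-file sets described above.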
The baseline was established by running the application described above against the directory containing all of the data files. This application uploads the files in a random order so as to avoid transferring all of the files of a given size sequentially, thereby spreading the effects of periodic Internet delays across the collection of results. We then ran some scripts to split the resulting data and generate some reports. The raw data collected for our non-optimized tests is available via the links in the Related Resources section at the bottom of this post.
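Randomizing the upload order is simple to express; a minimal sketch, assuming the harness just lists the files in one directory. `randomized_upload_order` is a hypothetical name, and the optional seed is an addition that makes a run reproducible.

```python
import random
from pathlib import Path
from typing import List, Optional


def randomized_upload_order(directory: Path, seed: Optional[int] = None) -> List[Path]:
    """Return every file in `directory` in a shuffled order.

    Interleaving sizes means a transient network slowdown is spread across
    many file sizes instead of landing on one size's consecutive uploads.
    """
    files = sorted(p for p in directory.iterdir() if p.is_file())  # stable starting order
    random.Random(seed).shuffle(files)  # a seeded RNG makes the order repeatable
    return files
```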
For each file size, we calculated the average upload time (and standard deviation) and the average transfer rate (and standard deviation). As you likely are aware, transferring data across the Internet is susceptible to many transient delays which can cause anomalies in the resulting data. It is for this reason that we randomized the order of source file processing as well as executed the tests 50x for each file size. We expect that these steps will yield a sufficiently balanced set of results.
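The per-size statistics can be computed from the recorded (size, duration) pairs roughly as follows. This sketch assumes sample standard deviation and 1 Mbit = 10^6 bits; the post does not state either. `summarize` is a hypothetical name.

```python
import statistics
from collections import defaultdict


def summarize(records):
    """records: iterable of (size_bytes, duration_ms) pairs, one per transferred file.

    Returns {size_bytes: {...}} with mean/stdev of duration (ms) and of rate (Mbit/s).
    """
    by_size = defaultdict(list)
    for size_bytes, duration_ms in records:
        by_size[size_bytes].append(duration_ms)

    def spread(values):
        # Sample standard deviation; 0.0 when there is only one observation.
        return statistics.stdev(values) if len(values) > 1 else 0.0

    summary = {}
    for size_bytes, durations in sorted(by_size.items()):
        # Per-file transfer rate: bits moved divided by seconds taken, in Mbit/s.
        rates = [size_bytes * 8 / (ms / 1000.0) / 1e6 for ms in durations]
        summary[size_bytes] = {
            "mean_ms": statistics.mean(durations),
            "stdev_ms": spread(durations),
            "mean_mbps": statistics.mean(rates),
            "stdev_mbps": spread(rates),
        }
    return summary
```

Note that this averages the per-file rates, which is not the same as total bytes divided by total time; the former matches "average transfer rate" per file as described above.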
Once the baseline was collected and analyzed, we updated the test harness application with some methods to split the source file into user-defined block sizes and then to upload those blocks in parallel (using the PutBlock() method of Azure storage). The parallelization was handled by simply relying on the Parallel Extensions to .NET to provide a Parallel.For loop (see linked source for specific implementation details in Program.cs, line 173 and following… less than 100 lines total). Once all of the blocks were uploaded, we called PutBlockList() to assemble/commit the file in Azure storage. For each block transferred, the MD5 was calculated and sent, ensuring that the bits that arrived matched what was intended. The timer for the blocked/parallelized transfer method wraps the entire process (source file splitting, block transfer, MD5 validation, file commit). A diagram of the process is as follows:
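As a rough illustration of the blocked, parallel flow described above (the original is .NET against the 1.1 storage client; this is a Python sketch). `FakeBlockStore`, `put_block` and `put_block_list` are hypothetical in-memory stand-ins that only mimic the shape of PutBlock()/PutBlockList(); they are not the Azure API. Two details do reflect the real service: block IDs are Base64 strings that must all have the same length within a blob, and the MD5 is sent as a Base64-encoded digest.

```python
import base64
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_b64(data: bytes) -> str:
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


class FakeBlockStore:
    """Hypothetical in-memory stand-in for block blob storage."""

    def __init__(self):
        self.uncommitted = {}  # block_id -> bytes
        self.blobs = {}        # blob name -> committed bytes

    def put_block(self, block_id: str, data: bytes, content_md5: str) -> None:
        # Reject the block if the bytes received do not match the MD5 the client sent.
        if md5_b64(data) != content_md5:
            raise ValueError(f"MD5 mismatch for block {block_id}")
        self.uncommitted[block_id] = data

    def put_block_list(self, blob_name: str, block_ids: list) -> None:
        # Commit: assemble the blob from the listed blocks, in list order.
        self.blobs[blob_name] = b"".join(self.uncommitted.pop(b) for b in block_ids)


def make_block_id(index: int) -> str:
    # Zero-pad the index so every Base64 block ID has the same length.
    return base64.b64encode(f"{index:08d}".encode("ascii")).decode("ascii")


def parallel_block_upload(store, blob_name: str, data: bytes,
                          block_size: int = 1024 * 1024, max_workers: int = 8) -> int:
    """Split `data` into blocks, upload them in parallel, then commit. Returns block count."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    block_ids = [make_block_id(i) for i in range(len(chunks))]

    def send(i: int) -> None:
        store.put_block(block_ids[i], chunks[i], md5_b64(chunks[i]))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(send, range(len(chunks))))  # list() re-raises any block failure
    store.put_block_list(blob_name, block_ids)   # commit only after every block succeeded
    return len(chunks)
```

The commit order comes from the block ID list, not from the order in which the parallel uploads happen to finish, which is what makes the parallel transfer safe.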
We then tested the effects of blocking and parallelizing the transfers by running the updated application against the same source set and doing a parameter sweep on the block size, including 256KB, 512KB, 1MB, 2MB, and 4MB (our assumption was that anything lower than 256KB wasn’t worth the trouble, and 4MB is the maximum size of a block supported by Azure). The raw data for the parallel tests is available via the links in the Related Resources section at the bottom of this post.
This data was processed and then compared against the single-threaded / non-optimized transfer numbers and the results were encouraging. The Excel version of the results is available here.
Two semi-obvious points need to be made prior to reviewing the data. The first is that if the block size is larger than the source file size, you will end up with a “negative optimization” due to the overhead of attempting to block and parallelize. The second is that as the files get smaller, the clock-time cost of blocking and parallelizing (overhead) is more apparent and can tend towards negative optimizations. For this reason (as supported by the raw data provided in the linked worksheet), the charts and discussion below ignore source file sizes less than 1MB.

The chart above illustrates some interesting points about the results:
- When the block size is smaller than the source file, performance increases but as the block size approaches and then passes the source file size, you see decreasing benefit to the point of negative gains (see the values for the 1MB file size)
- For some of the moderately-sized source files, small blocks (256KB) are best
- As the size of the source file gets larger (see values for 50MB and up), the smallest block size is not the most efficient (presumably due, at least in part, to the increased number of blocks, increased number of individual transfer requests, and reassembly/committal costs).
- Once you pass the 250MB source file size, the difference in rate for 1MB to 4MB blocks is more-or-less constant
- The 1MB block size gives the best average improvement (~16x) but the optimal approach would be to vary the block size based on the size of the source file.
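The two rules of thumb above (skip blocking when the block would be as large as the file; otherwise pick a block size per file size, falling back to 1MB) can be sketched as a small helper. The `measured_best` table is hypothetical: you would fill it from your own parameter sweep. The thresholds in the test-style usage below are placeholders, not values read off these charts.

```python
from typing import Dict, Optional

MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 1 * MB  # the post's recommended fixed value


def choose_block_size(file_size: int,
                      measured_best: Optional[Dict[int, int]] = None) -> Optional[int]:
    """Return the block size to use, or None to upload the file in a single request.

    measured_best (hypothetical) maps a minimum file size in bytes to the block
    size that performed best for files at or above that size in your own sweep.
    """
    block = DEFAULT_BLOCK_SIZE
    if measured_best:
        eligible = [threshold for threshold in measured_best if file_size >= threshold]
        if eligible:
            block = measured_best[max(eligible)]
    # A block as large as the file leaves nothing to parallelize, only overhead.
    if file_size <= block:
        return None
    return block


placeholder_table = {0: 256 * 1024, 50 * MB: 1 * MB}  # placeholder thresholds only
print(choose_block_size(10 * MB, placeholder_table))   # small-ish file: 256KB blocks
print(choose_block_size(100 * MB, placeholder_table))  # larger file: 1MB blocks
```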
The above is another view of the same data as the prior chart, just with the axes changed (the x-axis represents file size and the plotted data shows improvement by block size). It again highlights the fact that the 1MB block size is probably the best overall size, but also shows the benefits of some of the other block sizes at different source file sizes.
This last chart shows the change in total duration of the file uploads for the different block sizes across the source file sizes. Nothing really new here, other than that this view of the data highlights the negative effects of poorly choosing a block size for smaller files.
Summary
What we have found so far is that blocking your file uploads and uploading the blocks in parallel results in significant performance improvements. Further, utilizing extension methods and the Task Parallel Library (.NET 4.0) makes short work of altering the shipping client library to provide this functionality while minimizing the amount of change to existing applications that might be using the client library for other interactions.
Related Resources
The following represents data and results gathered from the second research institution connection cloud transfer test and compares results from Azure’s US North Central data center and Azure’s US South Central data center. The methodology applied during this test is detailed here and should be reviewed prior to considering the results or commentary below.
Test Overview:
- 05561 Cloud Transfer Tests: Research Institution Test 02
- Local Connection: Research Networks
- Started: February 9, 2010
- Finished: February 16, 2010
- Origination Point: Oak Ridge, TN
Disclaimer:
- Standard Disclaimer Applies
Test Objectives:
- Standard objectives apply
- Specific to this test: Test a research institution connection as the researcher’s “workstation” and gather data aimed at building a realistic expectation of performance
Test Setup
- Included File Sizes:
- 2KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, 1GB
- Network Connectivity - “research institution”
- Consists of a computer connected to a local network router via 100Mbps hard-wire.
- Multiple switches/routers/firewalls may exist between workstation and the public internet
- There may exist multiple high-speed networks that may be leveraged for connectivity to remote datacenters (ESNet, I2, NLR)
- Reasonable effort has been made to ensure that no other applications or TSRs are running on the source computer for the duration of the test.
- For this test, a newly-installed Windows 7 Professional installation was used, fully patched, with no other applications (beyond the test harness) installed.
Test Execution:
- Standard execution approach applied with the exception of the fact that Azure was tested for both cases – simply different datacenters (see slides for details)
Report Generation
- Standard report generation approach applied
Conventions:
- Standard conventions apply
Resources:
- Standard resources apply - no test-specific customizations beyond adaptations for the specific file sizes included in the test
Results:
Similar to other tests, there is some variability displayed that is obviously a result of traffic issues. We are continuing to look into this.
In general, the data from the Azure US North Central data center proved better than that of US South Central, which is not altogether surprising as we are physically closer to the USNC location.
Slides 171 and 172 remain disturbing as the download values for the 750MB file size continue to be outside of what would be expected.
Slide 172 in particular is of interest as it draws attention to some wide variability across file sizes for the USSC datacenter (not just the 750MB size).
Full results are available in slide form here:
PDF of results are available here: http://sciencecloud.us/media/05561_Xfer-Research_02.pdf
The following represents data and results gathered from the first research institution connection cloud transfer test. The methodology applied during this test is detailed here and should be reviewed prior to considering the results or commentary below.
Test Overview:
- 05561 Cloud Transfer Tests: Research Institution Test 01
- Local Connection: Research Networks
- Started: February 8, 2010
- Finished: February 16, 2010
- Origination Point: Oak Ridge, TN
Disclaimer:
- Standard Disclaimer Applies
Test Objectives:
- Standard objectives apply
- Specific to this test: Test a research institution connection as the researcher’s “workstation” and gather data aimed at building a realistic expectation of performance
Test Setup
- Included File Sizes:
- 2KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, 1GB
- Network Connectivity - “research institution”
- Consists of a computer connected to a local network router via 100Mbps hard-wire.
- Multiple switches/routers/firewalls may exist between workstation and the public internet
- There may exist multiple high-speed networks that may be leveraged for connectivity to remote datacenters (ESNet, I2, NLR)
- Reasonable effort has been made to ensure that no other applications or TSRs are running on the source computer for the duration of the test.
- For this test, a newly-installed Windows 7 Professional installation was used, fully patched, with no other applications (beyond the test harness) installed.
Test Execution:
- Standard execution approach applied
Report Generation
- Standard report generation approach applied
Conventions:
- Standard conventions apply
Resources:
- Standard resources apply - no test-specific customizations beyond adaptations for the specific file sizes included in the test
Results:
Across both services there exists an interesting amount of variability that is likely due to intermediate traffic or traffic management issues. Even within the same test run (see various scatter plots) you can detect “walls” of change wherein the values hover around a certain value and subsequently hover around a much higher/lower value (e.g., slides 133 and 134).
There is not a consistent “winner” in this report. For various file sizes one platform would clearly outperform the other, only to have the tables completely reversed for the next file size. This hints at network routing issues. A brief conversation with some of our local networking team indicates that some traffic (in particular Amazon’s) appeared to generally leave via the router connected to ESNet, whereas most of the Microsoft traffic would leave via the router connected to Southern Crossing with subsequent connections to I2 and NLR. It may well be that inserting some static routes could help address some of the stability issues here.
Of particular interest is the “hump” seen by both services in slide 170. This has been seen in a similar location on the chart in other runs (see slide #82 here: http://www.slideshare.net/rgillen/cloud-storage-upload-tests-02). We don’t yet have a good explanation for this shape in the curve and are hoping to track that down soon.
Further, the shape of the Azure curve in slide 171 is inconsistent with other tests – specifically the data points for the 750MB size. We will continue to compare with other sets/runs to see if this continues or was simply transient.
What remains consistent across all tests so far is that the level of variability tends to be greater with the S3 platform as compared to the Azure Blob storage.
Full results are available in slide form here:
PDF of results are available here: http://sciencecloud.us/media/05561_Xfer-Research_01.pdf
The following represents data and results gathered from the first consumer connection cloud transfer test. The methodology applied during this test is detailed here and should be reviewed prior to considering the results or commentary below.
Test Overview:
- 05561 Cloud Transfer Tests: Consumer Connection Test 01
- Local Connection: Comcast Residential
- Started: February 9, 2010
- Finished: February 14, 2010
- Origination Point: Knoxville, TN
Disclaimer:
- Standard Disclaimer Applies
Test Objectives:
- Standard objectives apply
- Specific to this test: Test a consumer/commodity connection as the researcher’s “workstation” and gather data aimed at building a realistic expectation of performance
Test Setup
- Included File Sizes:
- 2KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB
- Network Connectivity - “typical home network”
- Consists of a computer connected to a local router via 1GE hard-wire.
- Router is then directly connected to service provider’s modem
- Consumer has a “general” plan for internet connectivity
- Reasonable effort has been made to ensure that no other applications or TSRs are running on the source computer for the duration of the test.
- For this test, a newly-installed Windows 7 Professional installation was used, fully patched, with no other applications (beyond the test harness) installed.
Test Execution:
- Standard execution approach applied
Report Generation
- Standard report generation approach applied
Conventions:
- Standard conventions apply
Resources:
- Standard resources apply - no test-specific customizations beyond adaptations for the specific file sizes included in the test
Results:
In contrast to some other test runs on other networks, in this test Azure seemed to generally (if barely) outperform the Amazon platform and, consistent with other tests, Amazon’s platform showed greater variability within a given file size.
The test was limited to file sizes up to and including 100MB so as to avoid being flagged by the residential ISP for poor traffic habits (an issue to be addressed for large-bandwidth users on consumer connections).
Full results are available in slide form here:
PDF of results are available here: http://sciencecloud.us/media/05561_Xfer-Consumer_01.pdf
I’ve been getting my test harness and reporting tools set up for some performance baselining that I’m doing relative to cloud computing providers, and when I left the office on Friday I set off a test that was uploading a collection of binary files (NetCDF files if you care) to an Azure container. I was doing nothing fancy… looping through a directory and, for each file found, uploading it to the container using the defaults for BlobBlock and then recording the duration (start/finish) and the size of that file. The source directory contained 144 files representing roughly 58 GB of data. 32 of the files were roughly 1.5 GB each and the remainder were about 92.5 MB.
I came in this morning expecting to find the script long finished with some numbers to start looking at. Instead, what I found is that, after uploading some 70 files (almost 15 GB), every subsequent upload attempt failed with a timeout error – stating that the operation couldn’t be completed in the default 90-second time window. I started doing some digging into what was happening and so far have uncovered the following:
- By default, the Storage Client that ships with the November CTP breaks your file up into 4 MB blocks (assuming you are using BlobBlock, which you should if your file is over the 64 MB limit).
- The client then manages 4 concurrent threads uploading the data. As each thread completes, another is started, keeping four active for nearly the entire time.
- At some point Saturday afternoon (just after 12 noon UTC), the client could no longer successfully upload a 4 MB file (block) in the 90 second window, and all subsequent attempts failed.
- I initially assumed that my computer had simply tripped up or that a local networking event caused the problem so I restarted the tool – only to find every request continuing to fail.
- I then began to wonder if the problem was the new storage client library (not sure why) so I pulled out a tool to manage Azure storage – Cloud Storage Studio (http://www.cerebrata.com/Products/CloudStorageStudio/Default.aspx) and noticed that I was able to successfully upload a file. I remembered that CSS (by default) splits the file into fairly small blocks, so I cracked open Fiddler and began monitoring what was going on. I learned that it was using 256 KB blocks (this is configurable via settings in the app).
- I then adjusted my upload script to set the ServiceClient.WriteBlockSizeInBytes property (ServiceClient is a property of the CloudBlockBlob object) to 256 KB and re-ran the script. This time, I had no troubles at all (other than a painfully slow experience).
- So, I can upload data (not a service outage) but while 256K blocks work, the 4 MB blocks that worked on Friday no longer work – I’m assuming that there’s a networking issue on my end, or something in the Azure platform. To provide more clarity, I adjusted the tool again, this time using a WriteBlockSizeInBytes value of 1MB and re-ran the tool – again, seeing successful uploads.
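To make the timeout interaction concrete: if each block request must finish within the 90-second window mentioned above, the throughput it needs scales linearly with block size. A small sketch of that arithmetic, assuming 1 Mbit = 10^6 bits and that concurrent blocks share the link evenly (a simplification). It is not a diagnosis of the Saturday slowdown, only an illustration of why smaller blocks tolerate a slower link. `min_throughput_mbps` is a hypothetical name.

```python
def min_throughput_mbps(block_bytes: int, timeout_s: float, concurrent: int = 1) -> float:
    """Aggregate throughput (Mbit/s) needed so that each of `concurrent`
    equal-sized blocks in flight finishes within `timeout_s` seconds,
    assuming the link is shared evenly between them."""
    return block_bytes * 8 * concurrent / timeout_s / 1e6


for size in (256 * 1024, 1024 * 1024, 4 * 1024 * 1024):
    print(size // 1024, "KB:", round(min_throughput_mbps(size, 90, concurrent=4), 3), "Mbit/s")
```

Under these assumptions a single 4 MB block needs about 0.37 Mbit/s to beat a 90-second timeout (about 1.49 Mbit/s aggregate with four in flight), while a 256 KB block needs only about 0.023 Mbit/s.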
While this last step was running, I thought it might be good to go back and do some crunching on the data I had so far. The following chart represents the upload rates for the files that were successfully uploaded on Friday/Saturday, followed by a chart showing the probability density. The mean rate was 2.74 Mbits/sec with a standard deviation of 0.1968. It is interesting to note that there was no upward drift at the end of the collection of successful runs, indicating that the “fault” was more than likely caused by something specific rather than being the result of a gradual shift or failure based on usage (imagine a scenario wherein, as more data is populated in a container, indexes slow down, causing upload speeds to trail off).
Upload Speeds
Probability Density
I then ran similar reports against the data from this morning’s runs. I’m still in the process of generating a full report on the data, but a representative sample shows the following: the mean upload rate was 0.15 Mbits/sec with a standard deviation of 0.0375. This is over 17x slower than Friday. The data points represented below are for three batches: the first batch used a WriteBlockSizeInBytes of 256K, the second used 1MB, and the third used 2MB (10 points per size). The file upload did not succeed with the 2MB size; it only finished about 1/4th of the full file.
Upload Speeds
Probability Density
I’ve seen a few comments from others today that indicate the slowdown may be widespread. My next course of action is to attempt to run the tests from a few different locations, to hopefully eliminate my local network as the problem and have more data with which to address the issue.
I’ve been working on moving a large collection of data to, from, and around Azure as we are testing the data profile for scientific computing and large-scale experiment post-processing, and, in order to verify that the data we uploaded and processed turned out as we wanted it to, I built a simple visualization app that does a real-time query against the data in Azure and displays it. Originally the app was built as a simple WPF desktop application, but I got to thinking that it would be particularly interesting on the Surface and therefore took a day or two to port it over. The video below is a walkthrough of the app; the dialog is a bit cheesy, but the app is interesting as it provides a very tactile means of interacting with otherwise stale data.
[NOTE: these are simply introductory tests and should not be considered scientifically accurate comparisons of the various platforms]
The second run of the tests used the defaults for the parameters as provided in the original sample with the exception of the “runs” parameter which was increased from 1,000,000 to 10,000,000. The test bed used is detailed here and the results of the first run are here. The net-net is that the calculation is run 100 times and each time contains a loop that runs 10,000,000 times. Each run of the calculation can be run independently of the others as the aggregation/summation is handled in the spreadsheet once the calculations have finished. To provide some protection from anomalies, I ran each test against each platform 10 times with the first run being “cold”.
Results:
The time values on the Y axis are seconds to complete the execution. In this run, a few things jump out:
- As in the first test, the 2-node HPC cluster killed the other two. Again, this isn’t surprising, although I’d like to think that as the compute-to-communication ratio grows, the delta between the HPC cluster and Azure will shrink.
- In contrast to the first run, in this run Azure beat the local run by an average of nearly 200 seconds/run. This is consistent with the theory that the communication overhead in the Azure solution is constant per request regardless of the complexity of the compute, so the communication overhead becomes marginalized as the compute-to-communication ratio increases.
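The constant-overhead theory above can be written as a first-order model. This is a hypothetical sketch, not something fitted to these results: let N be the number of calculations, w the compute time per calculation, c a fixed per-request overhead (submission, queue, table write, retrieval), and k the number of worker roles, assuming the overhead and compute are both spread evenly across the workers and the local run is purely sequential.

$$T_{\text{local}} \approx N\,w \qquad T_{\text{Azure}} \approx \frac{N\,(c + w)}{k}$$

$$\frac{T_{\text{Azure}}}{T_{\text{local}}} \approx \frac{1}{k}\left(1 + \frac{c}{w}\right)$$

Under these assumptions Azure wins when $c < (k-1)\,w$ (for two worker roles, once $w > c$), and as $w$ grows the ratio approaches $1/k$, which is the "overhead becomes marginalized" behavior described above.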
I’m currently performing tests that take the runs parameter to 100,000,000 which should separate the three platforms further and it is expected that unlike the first two runs, Azure will be closer to the HPC results than to the local compute results (although the HPC cluster is still expected to outperform Azure).
[NOTE: these are simply introductory tests and should not be considered scientifically accurate comparisons of the various platforms]
The first run of the tests used the defaults for the parameters as provided in the original sample. The test bed used is detailed here. The net-net is that the calculation is run 100 times and each time contains a loop that runs 1,000,000 times. Each run of the calculation can be run independently of the others as the aggregation/summation is handled in the spreadsheet once the calculations have finished. To provide some protection from anomalies, I ran each test against each platform 10 times with the first run being “cold”.
Results:
The time values on the Y axis are seconds to complete the execution. In this run, a few things jump out:
- The 2-node HPC cluster killed the other two. This is not altogether surprising, especially considering the processor power is significantly better than the other two scenarios and this is the only scenario where the compute is happening on raw hardware (the “local” development machine is virtualized and I believe that the node instances hosted in Azure are also virtualized).
- The HPC cluster took one run to “warm up” after which the performance was remarkably solid.
- The “Local” (sequential) run beat the Azure (parallel) run. At first this might be surprising, but it really isn’t when you think about it. The Azure model has a certain communication overhead for each request (HTTP sending of each request to Azure, communication between the web and worker roles via the queue, storage of results in the Azure tables, and the retrieval of the results). In this test run, the compute per run is not significant (averaging just over 0.5 seconds per calculation on the local box), so the communication overhead stands out more. It is supposed that in subsequent runs where the compute is heavier, the two will trade places.
I’m working through some basic performance testing and comparisons for cloud compute and this post is intended to provide a bit of detail regarding the test platform I’m using for conducting these tests. I should also be clear that the test platforms below are not designed to be perfect-world hardware platforms, but rather proof-of-concept and suitable for order-of-magnitude testing.
General Code Base
For this aspect of testing I have been working with the Asian Options Pricing WCF sample provided by Microsoft on the HPC resource kit site (http://resourcekit.windowshpc.net/). This is a simple VSTO-enabled worksheet that calculates the price of an option on the Asian market. I’m by no means a market analyst, nor can I vouch for the accuracy of the calculation, but it does provide a CPU-intensive operation that can be parallelized easily, making it a good candidate for scale-out compute. The sample is designed to be illustrative of submitting WCF jobs (‘micro jobs’) to a Windows HPC cluster and comes with source and instructions for getting it set up and running.
In the default configuration, the worksheet performs 100 iterations of Monte Carlo pricing runs using a function called PriceAsianOptions (code listed at the bottom of this post). The basic theory may be understood better by reading this article. The function takes the following input parameters:
- Up – the specific factor by which the underlying instrument may move up per step of the binomial price tree
- Down – the specific factor by which the underlying instrument may move down per step of the binomial price tree.
- Interest – the continuously compounded, risk-free interest rate. This number doesn’t affect the complexity of the calculation so I left it constant.
- Initial Price – the starting or current price of the stock. I left this constant as it doesn’t affect the complexity of the calculation.
- Periods – the number of periods (trading days) before the option expires. The default was 20 and I left this constant. Technically it could increase the complexity of the calculation but changing the value of the Runs parameter does a suitable job of this.
- Exercise – the price at which the call should be exercised
- Runs – within a given calculation, how many times should the calculation be done prior to averaging and returning the results. This defaulted to 1,000,000 and is the prime value that I altered during the runs to increase the duration of the calculation.
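The sample's PriceAsianOptions code is not reproduced in this section, so as a rough idea of what a Monte Carlo pricer with these parameters can look like, here is a hypothetical Python analogue. It is not the resource-kit implementation; it assumes `interest` is a continuously compounded rate per period, an arithmetic average over periods 1 through `periods`, and a call payoff.

```python
import math
import random


def price_asian_option(up, down, interest, initial_price, periods, exercise, runs, seed=None):
    """Monte Carlo price of an arithmetic-average Asian call on a binomial price path.

    Hypothetical sketch: `interest` is treated as a continuously compounded
    rate per period, and the average is taken over periods 1..periods.
    """
    growth = math.exp(interest)
    p_up = (growth - down) / (up - down)  # risk-neutral probability of an up move
    if not 0.0 < p_up < 1.0:
        raise ValueError("need down < exp(interest) < up for a valid binomial tree")
    rng = random.Random(seed)
    total_payoff = 0.0
    for _ in range(runs):                 # Runs: paths simulated before averaging
        price = initial_price
        running_sum = 0.0
        for _ in range(periods):          # one up/down step per trading period
            price *= up if rng.random() < p_up else down
            running_sum += price
        total_payoff += max(running_sum / periods - exercise, 0.0)
    # Discount the average payoff back over the whole life of the option.
    return math.exp(-interest * periods) * total_payoff / runs
```

Raising `runs` increases the work per call linearly without changing the shape of the computation, which is why the Runs parameter is a convenient knob for scaling the compute in these tests.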
Once the worksheet has finished calculating a batch (by default 100 such calls to PriceAsianOptions), it calculates and displays the Average of the Monte Carlo runs, the Min, Max, Standard Deviation, Standard Error, and Execution Time in seconds.
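The batch summary listed above can be sketched as follows. It assumes sample standard deviation and standard error = stdev / √n, which are the usual definitions but are not confirmed against the worksheet's formulas; `run_batch` and `calc` are hypothetical names.

```python
import math
import statistics
import time
from typing import Callable, Dict


def run_batch(calc: Callable[[], float], batch_size: int = 100) -> Dict[str, float]:
    """Run `calc` batch_size times (independently) and summarize the results.

    batch_size must be at least 2 for a sample standard deviation.
    """
    start = time.perf_counter()
    results = [calc() for _ in range(batch_size)]
    elapsed_s = time.perf_counter() - start
    stdev = statistics.stdev(results)
    return {
        "average": statistics.mean(results),
        "min": min(results),
        "max": max(results),
        "stdev": stdev,
        "stderr": stdev / math.sqrt(len(results)),
        "seconds": elapsed_s,
    }
```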
NOTE: this test platform is much less about the specific problem being solved and more about taking a CPU/Compute intensive problem and expressing it on different platforms.
Windows HPC Server Environment
Hardware: My test environment consists of a small cluster with one head node and two compute nodes. The head node is a single-proc box running with 2 GB of RAM and 2 NICs. Besides the cluster Head Node role, it is also running DNS, DHCP, AD, SQL and has the WCF broker role (it does *not* have the compute role). Each compute node is a dual core Intel box with 4 GB RAM running Windows 2008 HPC Server. The cluster is configured such that the head node has one leg on the “enterprise network” (in my corner of the universe this is simply my lab network) and another leg on the private network shared with the compute nodes. The individual compute nodes are not accessible from outside this private network.
Software: While the instructions and download would lead you to believe that the code is ready to run out-of-the-box (OOTB), that is not quite true. Beyond the changes detailed in the instructions, I had to make some changes in additional locations for the name of the cluster’s head node as well as the path to the DLL as seen from the compute nodes. Beyond those changes however, the code used in the tests is identical to what is available on the resource kit site. The main logic is a loop to submit n jobs, with each request having an asynchronous event to process the results.
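The submit-n-jobs-with-async-handlers loop can be sketched with Python's `concurrent.futures` as a stand-in. This is not the Windows HPC / WCF broker API; `submit_with_callbacks`, `job` and `on_result` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def submit_with_callbacks(job, n, on_result, max_workers=4):
    """Submit n jobs up front; on_result(index, value) fires as each one completes.

    Results may arrive in any order, which is why each callback is given its index.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i in range(n):
            future = pool.submit(job, i)
            # Bind i now (default argument) so each callback reports its own index.
            future.add_done_callback(lambda f, i=i: on_result(i, f.result()))
    # Leaving the with-block waits for every job (and its callback) to finish.


results = {}
lock = threading.Lock()  # callbacks run on worker threads, so guard shared state


def record(i, value):
    with lock:
        results[i] = value


submit_with_callbacks(lambda i: i * i, 10, record)
```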
Local Compute Environment
Hardware: The platform I’m using for the “local machine” compute tests is a Windows Server 2008 Standard box (64 bit) with 4 GB of RAM and a single 2.4 GHz processor. This certainly isn’t anything fancy but should provide a middle-of-the-road hardware spec for comparison to the other platforms.
Software: I started by taking the worksheet used for the HPC environment and added another button for the “local” runs. Since the hardware I’m running on only has a single processor, I adjusted the logic loop to not be parallelized (additional threads would simply harm performance) but rather sequential. The work is performed on a background thread and as soon as an individual computation is completed the UI is notified/updated.
Azure Environment
Hardware: The configuration for the test bed as hosted in Azure is split over two projects. The first project is the data project (used for queues and tables) and is not part of any affinity group and is configured for a geographic location of USA-Anywhere. The second project is the compute project and is configured with one web role and two worker roles. It is also not part of an affinity group and is configured with a geographic location of USA-Anywhere. NOTE: the lack of specifically selecting an affinity group or geographic location is to provide a sort of worst-case scenario assuming that selecting either of those would, if anything, only improve the performance.
Software: There was a good bit more coding to do here, as I attempted to mimic (in theoretical approach, at least) the general implementation of WCF services on Windows HPC. The web role hosts a WCF service that allows a client (in this case the Excel worksheet) to submit a single pricing request (providing the parameters explained above). The pricing request and its parameters are serialized and placed on the Azure queue to be picked up by one of the running worker roles. (Incidentally, one nice side effect of this approach is that, from a coding standpoint, it makes no difference how many worker roles exist: if more are needed they can simply be configured, and items will be processed off the queue more quickly.) Once a worker role picks up the pricing request/data, it processes the request and places the resulting price (and the request identifier) into a row in an Azure table. At this point it checks the queue again for the next pricing request. An additional method in the WCF service accepts a request ID and looks in the Azure table for a result matching that ID. If one is found, the result is returned and the data is removed from the Azure table.
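The queue/table flow above can be modeled in-process for illustration. In this sketch a `ConcurrentQueue` stands in for the Azure queue and a `ConcurrentDictionary` for the Azure table; `Submit`, `TryGetResult`, `WorkerLoop`, and `Price` are hypothetical names, not the Azure storage client API. Note that running two workers requires no code change, mirroring the side effect described above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// In-memory stand-ins (assumptions) for the Azure queue and Azure table.
var requestQueue = new ConcurrentQueue<(Guid Id, int Periods, int Runs)>();
var resultTable = new ConcurrentDictionary<Guid, double>();

// Hypothetical pricing function standing in for PriceAsianOptions.
static double Price(int periods, int runs) => periods * 0.5 + runs / 1000.0;

// "Web role": accept a request, enqueue it, and hand back its identifier.
Guid Submit(int periods, int runs)
{
    var id = Guid.NewGuid();
    requestQueue.Enqueue((id, periods, runs));
    return id;
}

// "Web role": if a result exists for this id, return it and remove the row.
bool TryGetResult(Guid id, out double price) => resultTable.TryRemove(id, out price);

// "Worker role": pull a request off the queue, price it, store the result keyed by id.
void WorkerLoop(CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        if (requestQueue.TryDequeue(out var req))
            resultTable[req.Id] = Price(req.Periods, req.Runs);
        else
            Thread.Sleep(10); // queue empty: back off before checking again
    }
}

// Two workers, as in the test bed; adding more is just another Task.Run.
using var cts = new CancellationTokenSource();
var workers = new[] { Task.Run(() => WorkerLoop(cts.Token)), Task.Run(() => WorkerLoop(cts.Token)) };

Guid requestId = Submit(periods: 10, runs: 1000);
double result;
while (!TryGetResult(requestId, out result)) Thread.Sleep(10);

cts.Cancel();
Task.WaitAll(workers);
Console.WriteLine($"price = {result}");
```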
On the Excel worksheet side, I modified the same worksheet as before with yet another button for submitting the work to the Azure-hosted WCF service. I used a loop similar to the other two scenarios, but changed it so that it first submits all of the requests and only then begins asking for the results. The idea is to keep the requests for results from clogging the network and slowing down the submission of jobs; I wanted to keep the worker roles as busy as possible until the work was done. I initially implemented this using an approach similar to the HPC code (async anonymous methods handling the results), but this failed as the tests grew bigger because of the default web server timeouts (i.e., with 100 requests queued, it was likely that more than 90 seconds would pass before the results were finished).
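The submit-everything-then-poll approach can be sketched as follows. The fake service (where each result becomes ready only after a few polls) is an assumption made so the example runs on its own, and `Submit`/`TryGetResult` are hypothetical names. The point is the two phases and the short, independent polling calls, none of which has to stay open long enough to hit a server timeout.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical fake service: each result becomes available only after a few polls,
// mimicking worker roles that have not finished yet.
var pollsUntilReady = new Dictionary<int, int>(); // request id -> polls remaining
int nextId = 0;

int Submit()
{
    int id = nextId++;
    pollsUntilReady[id] = id % 3; // arbitrary delay for the demo
    return id;
}

bool TryGetResult(int id, out double price)
{
    price = 0;
    if (pollsUntilReady[id] > 0)
    {
        pollsUntilReady[id]--;
        return false; // not finished yet
    }
    price = id * 2.0; // stand-in result
    return true;
}

// Phase 1: submit every request before asking for any results.
var pending = new List<int>();
for (int i = 0; i < 6; i++) pending.Add(Submit());

// Phase 2: poll with short, independent calls until every result is collected.
var results = new Dictionary<int, double>();
while (pending.Count > 0)
{
    for (int j = pending.Count - 1; j >= 0; j--)
    {
        if (TryGetResult(pending[j], out double value))
        {
            results[pending[j]] = value;
            pending.RemoveAt(j);
        }
    }
    if (pending.Count > 0) Thread.Sleep(50); // back off between polling rounds
}
Console.WriteLine($"collected {results.Count} results");
```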
Sample Code
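// Requires: using System; (for Random and Math)
// initial = starting price, exercise = strike, up/down = per-period move factors,
// interest = gross per-period rate (e.g. 1.05, as implied by the piup formula),
// periods = steps per path, runs = number of simulated paths.
private double PriceAsianOptions(double initial,
    double exercise, double up, double down, double interest,
    int periods, int runs)
{
    double[] pricePath = new double[periods + 1];
    pricePath[0] = initial; // every path starts at the initial price

    // Risk-neutral probabilities
    double piup = (interest - down) / (up - down);
    double pidown = 1 - piup;

    double temp = 0.0; // running sum of payoffs across all runs
    Random rand = new Random();
    double priceAverage = 0.0;
    double callPayOff = 0.0;

    for (int index = 0; index < runs; index++)
    {
        // Generate one random price path
        double sumPricePath = initial;
        for (int i = 1; i <= periods; i++)
        {
            double rn = rand.NextDouble();
            if (rn > pidown)
            {
                // Up move, taken with probability piup
                pricePath[i] = pricePath[i - 1] * up;
            }
            else
            {
                pricePath[i] = pricePath[i - 1] * down;
            }
            sumPricePath += pricePath[i];
        }
        // Asian call payoff is based on the arithmetic average of the path
        priceAverage = sumPricePath / (periods + 1);
        callPayOff = Math.Max(priceAverage - exercise, 0);
        temp += callPayOff;
    }
    // Average payoff over all runs, discounted back over all periods
    return (temp / Math.Pow(interest, periods)) / runs;
}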
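For readers checking the method above, here is what it computes, written out as formulas. This is a reading of the code under the assumption that $R$ is the `interest` argument as a gross per-period rate (which the formula for `piup` implies), with $n$ = `periods`, $M$ = `runs`, $K$ = `exercise`, and $S_0$ = `initial`.

```latex
p_u = \frac{R - d}{u - d}, \qquad p_d = 1 - p_u

S_i^{(k)} =
\begin{cases}
  S_{i-1}^{(k)}\, u & \text{with probability } p_u \\
  S_{i-1}^{(k)}\, d & \text{with probability } p_d
\end{cases}
\qquad i = 1, \dots, n

\bar{S}^{(k)} = \frac{1}{n+1} \sum_{i=0}^{n} S_i^{(k)}

C \approx \frac{1}{R^{n}} \cdot \frac{1}{M} \sum_{k=1}^{M} \max\left(\bar{S}^{(k)} - K,\ 0\right)
```

Each run $k$ is independent of the others, which is why the pricing loop splits so naturally across HPC nodes or Azure worker roles.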
I’ve been thinking quite a bit lately about the role of cloud computing as it applies to scientific research (as hinted at by the title of this site). One possible flaw in my approach is that I’ve been delving into MPI-based compute as much as I can to wrap my mind around how it works with the notion of then applying that paradigm to cloud compute. I list it as a flaw only because I wonder if it is possibly time to think a bit further outside the proverbial box if you will. I’ve been mulling over the following:
How do people actually use multiple machines to solve a problem? – This is really the root question behind all of this work. At one end of the spectrum are high-end shared-memory machines (à la Cray supercomputers); I’m going to eliminate that type of compute from the conversation because it simply can’t be replicated well in the cloud as we currently know it. At the far opposite end is “manual” clustering, essentially a hand-rolled map/reduce: someone figures out a problem they want to solve, divvies it up amongst N nodes, individually runs a program on each node with the appropriate settings, and then manually aggregates the results. This extreme is most likely found in ad-hoc projects or among those not familiar with traditional HPC technologies and approaches. Between the two extremes are Map/Reduce implementations and traditional MPI programs targeted at distributed-memory systems.
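The “manual” approach can be illustrated with the same kind of Monte Carlo workload: split the runs among N nodes, have each node return a partial sum, and combine the partial sums afterwards. This is a hypothetical sketch; `RunOnNode` and `Partition` are illustrative names, `RunOnNode` uses a deterministic stand-in payoff so the split can be checked, and in practice each partition would be a separate program launched on its own machine.

```csharp
using System;
using System.Linq;

// Hypothetical per-node work: sum the payoffs for this node's share of the runs.
// The payoff for run k is a deterministic stand-in so the result can be verified.
static (double Sum, int Count) RunOnNode(int firstRun, int runCount)
{
    double sum = 0;
    for (int k = firstRun; k < firstRun + runCount; k++)
        sum += k % 7; // stand-in payoff for run k
    return (sum, runCount);
}

// Divide totalRuns as evenly as possible among nodeCount nodes.
static (int First, int Count)[] Partition(int totalRuns, int nodeCount)
{
    var parts = new (int First, int Count)[nodeCount];
    int baseSize = totalRuns / nodeCount, extra = totalRuns % nodeCount, next = 0;
    for (int n = 0; n < nodeCount; n++)
    {
        int size = baseSize + (n < extra ? 1 : 0); // first 'extra' nodes take one more run
        parts[n] = (next, size);
        next += size;
    }
    return parts;
}

const int TotalRuns = 1000;
var partials = Partition(TotalRuns, nodeCount: 4)
    .Select(p => RunOnNode(p.First, p.Count)) // in practice: one launch per machine
    .ToArray();

// Aggregate: total of the partial sums divided by the total run count gives the overall mean.
double mean = partials.Sum(p => p.Sum) / partials.Sum(p => p.Count);
Console.WriteLine($"mean payoff = {mean}");
```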
Amazon’s EC2 – very easy to utilize for lower-throughput MPI-based HPC, given that you can get n Linux boxes for $0.10/CPU/hour and, because of the vast community that has grown up around it, there are pre-packaged clusters (via AMIs) and even commercial vendors building businesses on top of providing HPC-style compute in EC2 in an “on-demand” fashion. Further, traditional grid computing platforms such as Nimbus have been radically adapted to provide a rather compelling local-to-cloud story for scientific HPC. It would seem that if you are working in HPC today and simply want to utilize an HPC cluster “in the cloud” (perhaps because you lack access to sufficient hardware), Amazon’s EC2, along with toolsets such as Nimbus (and others) that sit on top of it, is a natural solution.
Microsoft’s Azure – While it is a quickly evolving platform (it hasn’t yet been released) and Microsoft has hinted at plans to adapt the platform based on customer demand, as it currently stands there’s no obvious fit for the traditional HPC model. Azure customers can deploy web or worker roles, and one can imagine using worker roles in a fashion analogous to cluster nodes… but there currently isn’t any built-in infrastructure to bind those nodes into a single group/cluster. As it stands now, Azure seems to lean towards the manual approach to large-scale compute. What could change this story completely is if Microsoft decided to offer HPC Pack-enabled nodes as a type of resource you could request, although nothing so far hints that they are planning anything like this.
Where do we go next? – I’ve been chewing on whether it makes any sense to try to push HPC-style work into Azure, or whether it should simply be relegated to the EC2s of the world… One could conceivably build an implementation of MPI that, rather than relying on the underlying cluster, provides cloud-style/cloud-enabled communications between nodes. This could allow those most comfortable with MPI-style apps (or with large existing code bases of them) to continue to use those libraries and applications. But unless Microsoft’s pricing (to be announced later this summer) is dramatically cheaper than EC2’s, one has to wonder why anyone would bother to build such a thing (other than for academic interest, of course). Again, this could be mitigated by Microsoft providing such an implementation itself, but the platform would also need additional compelling features to pull someone away from what would otherwise be a very comfortable transition (from a local cluster to an EC2-hosted cluster running the same software stack).
What I’ve really been wondering is whether it is time to throw MPI out altogether (or, more accurately, the programming paradigm it represents). Is it time to look for ways to raise the level of abstraction for the computational researcher… and, if so, does something like Azure have a more interesting role? I’m wondering whether some of the abstraction tools (workflow engines, queue services, etc.) will begin to play a role, or whether we need to keep stretching for every raw bit of horsepower from the system (accepting that the cost of abstraction layers is reduced raw system performance). For many of the large simulation models, it seems that the raw horsepower is indeed necessary. You also cannot simply ignore the vast collection of existing tools and libraries that target this paradigm. The flip-side question is: is there a class of computational research for which, if the cost per CPU hour were low enough and the gain in development productivity great enough (assuming the proposed layers of abstraction delivered such a gain), it wouldn’t really matter if the job took an additional 30-50% longer to run? This is, of course, only relevant if we live in a world wherein I can get however many compute nodes I want whenever I want them (no waiting in a queue).
My gut tells me we aren’t quite there yet, but I wonder how far out it really is?