PowerShell and Windows Azure

I am new to PowerShell, but its existence has always intrigued me. On January 16th Michael Washam announced a new version of the Windows Azure PowerShell Cmdlets library.

I’ve installed the snap-in before; however, this new version adds a number of key features that make the commands very easy to use. Even PowerShell beginners like me can easily start exploring the commands. In this post I am going to outline the use of the new Import-Subscription, Get-Subscription, Set-Subscription, and Select-Subscription commands.


Intermediate Azure Storage: Parallel Blob Uploads

The Windows Azure SDK provides an easy interface for uploading small files to Blob Storage. If your files are smaller than 64 MB, you don’t need to specify one of the two modes for manipulating blobs. However, if you need to work with larger files, there are some important differences to consider before deciding between Block Blobs and Page Blobs for storage.

Page blobs behave much like an NTFS volume. They are designed for random-write workloads where specific page locations within a file are modified. Pages are 512 bytes in size, and all of them start out empty when the blob is created with its initial size. Azure Drives, which are just virtual drives stored in blob storage, use Page blobs to store the Virtual Hard Drive because of this similarity to NTFS. Lastly, a Page blob can be up to 1 TB in size, compared to the 200 GB limit of Block blobs.
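Because pages are 512 bytes each, the size of a Page blob has to be a multiple of 512. The following is a minimal sketch of rounding a requested size up to the next page boundary; the PageBlobMath class and its RoundUpToPage method are hypothetical helpers for illustration, not part of the SDK.

```csharp
static class PageBlobMath
{
    // Page size used by Page blobs, in bytes.
    public const long PageSize = 512;

    // Rounds a byte count up to the next multiple of the page size.
    // A count that is already aligned is returned unchanged.
    public static long RoundUpToPage(long bytes)
    {
        return (bytes + PageSize - 1) / PageSize * PageSize;
    }
}
```

For example, a 513-byte payload would need a blob sized for two pages (1024 bytes).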

Block blobs are also split into sections for ease of uploading and downloading. However, unlike Page blobs, the blocks do not need to be the same size. Block blobs are designed for streaming and use a two-phase commit: blocks are uploaded first, then committed. Each block has a size limit of 4 MB and is referenced by its ID in the Block List. The block list controls which blocks are included and in what order. You can even re-use the same block in multiple locations in the list, or update the data for one individual block. The rest of this post will focus on how to upload a file to blob storage in parallel using a Block Blob.

To start, we need a storage account, a blob container, and a blob reference:

var account = CloudStorageAccount.DevelopmentStorageAccount;
var blobClient = account.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("parallel");
container.CreateIfNotExist();
var blockBlob = container.GetBlockBlobReference("test.jpg");
blockBlob.DeleteIfExists();
blockBlob.Properties.ContentType = "image/jpeg";

Even though each block can be up to 4 MB, I am going to set the block size to about 500 KB because my file is only 1.2 MB.

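var blockLength = 500 * 1024;

// Verbatim string, so that "\b" is not read as a backspace escape.
var dataToUpload = File.ReadAllBytes(@"D:\background.jpg");

// Ceiling division: no empty trailing block when the length is an exact multiple of blockLength.
var numberOfBlocks = (dataToUpload.Length + blockLength - 1) / blockLength;

string[] blockIds = new string[numberOfBlocks];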

The last line creates an array of strings. This is crucial to the block upload. The two-phase commit is completed by uploading each block with a unique block ID, then uploading the list of IDs to commit. The blockIds array will hold the collection of IDs.
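The block arithmetic can be checked in isolation, without touching storage. This is a sketch under the assumption that blocks are cut at fixed offsets and only the last one is shorter; BlockMath, CountBlocks and LengthOfBlock are hypothetical names introduced for illustration. The count uses ceiling division so that a file whose length is an exact multiple of the block size does not end up with an empty trailing block.

```csharp
using System;

static class BlockMath
{
    // Number of blocks needed to cover totalLength bytes.
    public static int CountBlocks(int totalLength, int blockLength)
    {
        return (totalLength + blockLength - 1) / blockLength;
    }

    // Length of the block at the given zero-based index.
    // Every block is full-sized except possibly the last one.
    public static int LengthOfBlock(int index, int totalLength, int blockLength)
    {
        return Math.Min(blockLength, totalLength - index * blockLength);
    }
}
```

With a 1,258,291-byte file (about 1.2 MB) and 512,000-byte blocks, this gives three blocks, the last of which is 234,291 bytes.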

Here is the parallel loop, using Parallel.For from the Task Parallel Library.

Parallel.For(0, numberOfBlocks, x =>
{
    var blockId = Convert.ToBase64String(Guid.NewGuid().ToByteArray());
    var currentLength = Math.Min(blockLength, dataToUpload.Length - (x * blockLength));

    using (var memStream = new MemoryStream(dataToUpload, x * blockLength, currentLength))
    {
        blockBlob.PutBlock(blockId, memStream, null);
    }
    blockIds[x] = blockId;
});

To summarize: for each block, in parallel, a MemoryStream is created over that block’s slice of the data and uploaded under a new block ID. The block ID is then stored in the array, ready to be committed.

blockBlob.PutBlockList(blockIds);

The final statement, which is executed after all the uploads are done, commits the blocks to the blob. Block IDs must be Base64-encoded strings. They need to be unique within the blob and all of the same length.
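GUIDs satisfy these rules, but they are not the only option. As an alternative sketch, a block ID can be derived from the block's index, which makes the ID for a given block predictable if an upload has to be repeated. The BlockIds class and FromIndex method below are hypothetical helpers, and the six-digit padding width is an arbitrary assumption.

```csharp
using System;
using System.Text;

static class BlockIds
{
    // Derives a block ID from the block's zero-based index.
    // Zero-padding to a fixed width keeps every pre-encoding string the
    // same length, so every Base64 result has the same length as well.
    public static string FromIndex(int index)
    {
        var raw = Encoding.UTF8.GetBytes(index.ToString("d6"));
        return Convert.ToBase64String(raw);
    }
}
```

Inside the loop this would replace the GUID line, for example `var blockId = BlockIds.FromIndex(x);`.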

This relatively simple code lets clients upload blobs in smaller blocks and gives them a place to handle a put that fails due to connection issues. Downloading a blob block by block is a similar process, starting from the list of blocks returned by the DownloadBlockList() method.
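One way to handle a failed put is to retry just that block. The helper below is a minimal sketch, not something the SDK provides: Retry.Run is a hypothetical name, and a fixed delay between attempts is an assumption (a real client might prefer a backoff policy).

```csharp
using System;
using System.Threading;

static class Retry
{
    // Runs the action, retrying after a fixed delay when it throws.
    // The exception from the final attempt is rethrown to the caller.
    public static void Run(Action action, int maxAttempts, TimeSpan delay)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return;
            }
            catch (Exception)
            {
                if (attempt >= maxAttempts) throw;
                Thread.Sleep(delay);
            }
        }
    }
}
```

When wrapping the block upload, create the MemoryStream inside the action so that each attempt starts reading from the beginning of the block.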

Windows Azure Blob Storage can be very useful in many different application scenarios. Once you get to know the features and options available you can easily harness the power of cloud storage in your applications.

Thanks for reading. Please contact me if you have any comments or questions.


Intermediate Azure Storage: Blob Leasing

Late last year during one of my TechDays sessions I talked about Azure Drives. An Azure Drive is a mechanism used to provide IO state to otherwise stateless virtual machines.

Essentially, while a VM is running in Azure, a .vhd file can be mounted as a drive. The VHD is stored in a Page Blob, which gives the VM persistent storage. Because it is mounted as an NTFS volume directly on the VM, Azure does not allow more than one VM to write to a VHD. To enforce this rule, a lease is acquired on the blob. Blob leases are the topic of this post.

When building a multi-threaded application, a multitude of locks need to be in place to control access between threads. The same is true for a distributed system. Leases provide a mechanism for notifying other clients that a Blob is in use.

Unfortunately, even though this may be a crucial feature for a highly scalable system, it is not yet integrated with the Azure SDK at all levels. The only ways to manipulate blob leases are the REST interface and the low-level objects in the Microsoft.WindowsAzure.StorageClient.Protocol namespace.

Acquiring a Blob Lease

Here is a sample REST request for acquiring a blob lease:

Request Syntax:
PUT http://myaccount.blob.core.windows.net/mycontainer/myblob?comp=lease HTTP/1.1

Request Headers:
x-ms-lease-action: acquire
x-ms-date: Sun, 25 Sep 2011 13:37:35 GMT
x-ms-version: 2011-08-18
Authorization: SharedKey myaccount:J4ma1VuFnlJ7yfk/Gu1GxzbfdJloYmBPWlfhZ/xn7GI=

MSDN Article: http://msdn.microsoft.com/en-us/library/windowsazure/ee691972.aspx

You don’t need to build such a request yourself if you use the Windows Azure SDK. To execute a request like the one above from .NET, first add the following using statements.

using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;
using Microsoft.WindowsAzure.StorageClient.Protocol;

Try running the following code.

var account = CloudStorageAccount.DevelopmentStorageAccount;
var client = account.CreateCloudBlobClient();
var container = client.GetContainerReference("container");
container.CreateIfNotExist();
 
var blob = container.GetBlobReference("myblob");
blob.UploadText("before");
 
var request = BlobRequest.Lease(blob.Uri, 120, LeaseAction.Acquire, null);
account.Credentials.SignRequest(request);
 
string leaseId;
using (var response = request.GetResponse())
{
    leaseId = response.Headers["x-ms-lease-id"];
}
 
blob.UploadText("after");

The final UploadText call should throw an exception, because the blob is now leased and the request does not include the lease ID.

Since leasing is not available in the high-level SDK, we need to create a Blob PUT request that carries the lease ID. Remove blob.UploadText("after"); and enter the following.

var updateText = BlobRequest.Put(blob.Uri, 60, new BlobProperties(), BlobType.BlockBlob, leaseId, 0);
using (var stream = new StreamWriter(updateText.GetRequestStream()))
{
    stream.Write("after");
}
client.Credentials.SignRequest(updateText);
updateText.GetResponse().Close();

In the development storage instance, the blob content should now show “after”.

There is much more you can do with leasing, but for now this has been a quick intro to get started with Blob Leasing.

Thanks for reading.