The Windows Azure SDK provides an easy interface for uploading small files to Blob Storage. If your files are less than 64 MB in size, you don’t need to specify one of the two modes for manipulating blobs. However, if your requirements include access to larger files, there are some important differences to consider before deciding between Block Blobs and Page Blobs for storage.
Page blobs behave much like an NTFS volume. They are designed for random-write workloads where specific page locations within a file are modified. Pages are 512 bytes in size and are empty when the blob is created with its initial size. Azure Drives, which are just virtual drives stored in blob storage, use Page blobs to store the Virtual Hard Drive because of these NTFS-like features. Lastly, a Page blob can be up to 1 TB in size, compared to the 200 GB limit for Block blobs.
Block blobs are also split into sections for ease of uploading and downloading. However, unlike the pages of a Page blob, the blocks do not need to be the same size. Block blobs are designed for streaming and use a two-phase commit: blocks are uploaded first, then committed. Each uploaded or downloaded block has a size limit of 4 MB and is referenced by a key in the block list. The block list controls which blocks are included and in what order. You can even reuse the same block in multiple locations in the list, or update the data for one individual block. The rest of this post will focus on how to upload a file to blob storage in parallel using a Block Blob.
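To make the two-phase commit and the role of the block list more concrete, here is a toy in-memory model. It is only a sketch: ToyBlockBlob and its members are hypothetical names that mirror the SDK method names for readability, nothing here talks to Azure, and the real service does more (for example, it tracks committed and uncommitted blocks separately).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Toy in-memory model of a block blob. This is NOT the Azure SDK.
public class ToyBlockBlob
{
    // Phase 1: uploaded blocks wait here, keyed by block ID.
    private readonly Dictionary<string, byte[]> _blocks = new Dictionary<string, byte[]>();

    // The committed content; it only changes when a block list is committed.
    public byte[] Content { get; private set; } = new byte[0];

    public void PutBlock(string blockId, byte[] data)
    {
        // Uploading under an existing ID replaces that block's data.
        _blocks[blockId] = data;
    }

    // Phase 2: the list decides which blocks are included, and in what order.
    public void PutBlockList(IEnumerable<string> blockIds)
    {
        using (var output = new MemoryStream())
        {
            foreach (var id in blockIds)
            {
                byte[] data;
                if (!_blocks.TryGetValue(id, out data))
                    throw new ArgumentException("Unknown block ID: " + id);
                output.Write(data, 0, data.Length);
            }
            Content = output.ToArray();
        }
    }
}
```

In this model, uploading blocks "a" and "b" and then committing the list b, a, b produces content made of block b, then a, then b again, which is the reuse-and-reorder behavior described above.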
To start we need to create a storage account, blob container, and a blob reference:
// Requires: using Microsoft.WindowsAzure; using Microsoft.WindowsAzure.StorageClient;
var account = CloudStorageAccount.DevelopmentStorageAccount;
var blobClient = account.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("parallel");
container.CreateIfNotExist();
var blockBlob = container.GetBlockBlobReference("test.jpg");
blockBlob.DeleteIfExists();
blockBlob.Properties.ContentType = "image/jpeg";
Even though I mentioned that each block can be up to 4 MB, I am going to set the block size to about 500 KB because my file is only 1.2 MB.
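// Requires: using System.IO;
var blockLength = 500 * 1024;
// Verbatim string (@) so the backslash is not treated as an escape sequence.
var dataToUpload = File.ReadAllBytes(@"D:\background.jpg");
// Round up, so an exact multiple of blockLength does not add an empty block.
var numberOfBlocks = (dataToUpload.Length + blockLength - 1) / blockLength;
string[] blockIds = new string[numberOfBlocks];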
The last line creates an array of strings, which is crucial to the block upload. The two-phase commit is completed by uploading each block with a unique block ID, then uploading a list of IDs to commit. The blockIds array will hold that collection of IDs.
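The block arithmetic can be checked in isolation before any upload happens. The following is a minimal sketch; BlockMath, CountBlocks, and BlockRanges are hypothetical helper names introduced only for illustration, and the count is rounded up so that a file whose length is an exact multiple of the block size does not get a trailing empty block.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers for splitting a byte array into upload blocks.
public static class BlockMath
{
    // Number of blocks needed to cover totalLength bytes (rounds up).
    public static int CountBlocks(int totalLength, int blockLength)
    {
        if (blockLength <= 0) throw new ArgumentOutOfRangeException("blockLength");
        return (totalLength + blockLength - 1) / blockLength;
    }

    // Offset (Key) and length (Value) of each block; only the last may be shorter.
    public static List<KeyValuePair<int, int>> BlockRanges(int totalLength, int blockLength)
    {
        var ranges = new List<KeyValuePair<int, int>>();
        int count = CountBlocks(totalLength, blockLength);
        for (int i = 0; i < count; i++)
        {
            int offset = i * blockLength;
            int length = Math.Min(blockLength, totalLength - offset);
            ranges.Add(new KeyValuePair<int, int>(offset, length));
        }
        return ranges;
    }
}
```

For a 1,200,000-byte file and a block length of 500 * 1024 bytes, this gives three blocks: two full blocks of 512,000 bytes and a final block of 176,000 bytes starting at offset 1,024,000.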
Here is the parallel loop, using Parallel.For from the Task Parallel Library.
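// Requires: using System.Threading.Tasks;
Parallel.For(0, numberOfBlocks, x =>
{
    // A GUID gives each block a unique ID; every ID has the same length once encoded.
    var blockId = Convert.ToBase64String(Guid.NewGuid().ToByteArray());
    // The final block may be shorter than blockLength.
    var currentLength = Math.Min(blockLength, dataToUpload.Length - (x * blockLength));
    using (var memStream = new MemoryStream(dataToUpload, x * blockLength, currentLength))
    {
        blockBlob.PutBlock(blockId, memStream, null);
    }
    // Store the ID at the block's index so the commit order matches the file order.
    blockIds[x] = blockId;
});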
To summarize: for each block, in parallel, a MemoryStream is created over that block’s slice of the data and uploaded under a block ID, and the block ID is then stored in the array, ready to be committed.
blockBlob.PutBlockList(blockIds);
The final statement, which executes after all the uploads are done, commits the blocks to the blob. Block IDs are Base64-encoded strings. They need to be unique within the blob and all the same length.
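GUIDs are one way to meet the uniqueness and uniform-length requirements. Another option, sketched below, is to derive the ID from the block’s index, which makes the IDs predictable if a block has to be uploaded again. BlockIds.FromIndex is a hypothetical helper, and the six-digit width is an arbitrary assumption chosen for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper that derives a block ID from the block's index.
public static class BlockIds
{
    public static string FromIndex(int index)
    {
        // Keep the index within six digits so every ID has the same length.
        if (index < 0 || index > 999999)
            throw new ArgumentOutOfRangeException("index");

        // Zero-pad to a fixed width, then Base64-encode the text.
        string padded = index.ToString("d6");
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(padded));
    }
}
```

With this helper, index 0 becomes "MDAwMDAw" (the Base64 encoding of "000000"), and every other valid index produces an ID of the same eight-character length.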
This relatively simple code lets clients upload blobs in smaller blocks and makes it possible to handle a put that fails due to connection issues by re-uploading only the affected block. On the download side, the DownloadBlockList() method returns the list of blocks that make up a blob.
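One way to handle a failed put is to retry just that block. The helper below is a minimal sketch under that assumption: Retry.Run is a hypothetical name, it retries on any exception with a fixed delay, and the storage client library may already apply a retry policy of its own, so treat this as an illustration of the idea and not a required step.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs an action, retrying a limited number of times.
public static class Retry
{
    public static void Run(Action action, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return; // Success: stop retrying.
            }
            catch (Exception)
            {
                // Out of attempts: let the last failure propagate to the caller.
                if (attempt >= maxAttempts) throw;
                Thread.Sleep(delay);
            }
        }
    }
}
```

Inside the upload loop, the whole using block that creates the MemoryStream and calls PutBlock would go inside the action passed to Retry.Run, so that each attempt starts from a fresh stream positioned at the beginning of the block.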
Windows Azure Blob Storage can be very useful in many different application scenarios. Once you get to know the features and options available you can easily harness the power of cloud storage in your applications.
Thanks for reading. Please contact me if you have any comments or questions.