Debugging Azure Web Roles with Custom HTTP Headers and Self-Registering HttpModules

For the past year and a half, I’ve been helping customers develop web applications targeting Windows Azure PaaS. Customers tend to ask lots of similar questions, usually because they’re facing similar challenges (there really isn’t such a thing as a bad question). I’ve recently had to answer this particular question a few times in succession, so I figured that makes it a good candidate for a blog post! As always, I’d love to get your feedback, and if you find this tip useful I’ll try to share some more common scenarios soon.

The scenario I want to focus on here today is nice and quick. It’s a reasonably common scenario in which you’ve deployed a web application (let’s say, a WebAPI project) to Azure PaaS and have more than a handful of instances serving-up requests.

Sometimes it’s tricky to determine which role instance served-up your request

When you’re developing and testing, you quite often need to locate the particular node which issued an HTTP response to the client.

When the total number of instances serving your application is low, cycling through one or two instances of a web role (and connecting to them via RDP) isn’t a particular problem. But as you add instances, you don’t typically know which server was responsible for servicing the request, so you have more instances to check or ‘hunt through’. This can make it harder to jump quickly to the root of the problem for further diagnosis.

Why not add a custom HTTP header?

In a nutshell, one possible way to help debug calls to an API over HTTP is to have the server inject a custom HTTP header into the response which carries the role instance ID. A switch in cloud configuration (*.cscfg) can be added which allows you to turn this feature on or off, so you’re not always emitting it. The helper itself (as you’ll see below) is very lightweight and you can easily modify it to inject additional headers/detail into the response. Also, emitting the role instance ID (i.e. 0, 1, 2, 3 …) is preferable to emitting the qualified server name for security reasons: it doesn’t really give too much information away to assist a would-be attacker.

How’s it done?

It’s rather simple and quick, really. And, you can borrow the code below to help you out but do remember to check it meets your requirements and test it thoroughly before chucking it into production! We start by creating an HTTP module in the usual way:

public class CustomHeaderModule : IHttpModule
{
    public static void Initialize()
    {
        HttpApplication.RegisterModule(typeof(CustomHeaderModule));
    }

    public void Init(HttpApplication context)
    {
        ConfigureModule(context);
    }

    private void ConfigureModule(HttpApplication context)
    {
        // Check we're running within the RoleEnvironment and that our configuration
        // setting ("EnableCustomHttpDebugHeaders", which must be defined in the cscfg)
        // is set to "true". This is our "switch", effectively...
        bool enabled;
        if (RoleEnvironment.IsAvailable
            && bool.TryParse(RoleEnvironment.GetConfigurationSettingValue("EnableCustomHttpDebugHeaders"), out enabled)
            && enabled)
        {
            context.BeginRequest += ContextOnBeginRequest;
        }
    }

    private void ContextOnBeginRequest(object sender, EventArgs eventArgs)
    {
        var application = (HttpApplication)sender;
        var response = application.Context.Response;

        // Inject custom header(s) into the response: the value is the index of the
        // current instance within the role's Instances collection.
        var roleName = RoleEnvironment.CurrentRoleInstance.Role.Name;
        var index = RoleEnvironment.Roles[roleName].Instances.IndexOf(RoleEnvironment.CurrentRoleInstance);
        response.AppendHeader("X-Diag-Instance", index.ToString());
    }

    public void Dispose()
    {
    }
}

What we’ve got here is essentially a very simple module which injects the custom header, “X-Diag-Instance”, into the server’s response. The value of the header is the index of the current instance within the role’s Instances collection.
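To see it working, you can hit the service a few times from a quick console app and read the header back. This is a minimal sketch; the URL below is a placeholder for your own cloud service endpoint:

using System;
using System.Net;

class HeaderCheck
{
    static void Main()
    {
        // Placeholder URL - substitute your own cloud service / API endpoint.
        const string url = "http://myservice.cloudapp.net/api/values";

        for (var i = 0; i < 5; i++)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                // The header will be absent if the cscfg switch is off or the module isn't registered.
                var instance = response.Headers["X-Diag-Instance"] ?? "(header not present)";
                Console.WriteLine("Request {0} served by instance: {1}", i, instance);
            }
        }
    }
}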

Deploying the module

Then, we want to add a little magic to have the module self-register at runtime (sure, you can put this in config if you really want to). This is great, because you could put the module into a shared library and have it register itself into the pipeline automatically. Of course, you could also replace the config switch with a check for whether the solution is built in debug or release mode (customise it to fit your needs).

To do the self-registration, we rely on a little-known but extremely useful ASP.NET 4 extensibility feature called PreApplicationStartMethod. Decorating the assembly with this attribute tells ASP.NET to call the named method before the application starts, which is where we register the module:

[assembly: PreApplicationStartMethod(typeof(PreApplicationStartCode), "Start")]
namespace MyApplication
{
    public class PreApplicationStartCode
    {
        public static void Start()
        {
            Microsoft.Web.Infrastructure.DynamicModuleHelper.DynamicModuleUtility.RegisterModule(typeof(CustomHeaderModule));
        }
    }

    public class CustomHeaderModule : IHttpModule
    {
      // ....
    }
}

This approach also works well for any custom headers you want to inject into the response, and a great use case for this would be to emit custom data you want to collect as part of a web performance or load test.

I hope you find this little tip and the code snippet useful, and thanks to @robgarfoot for the pointer to the super useful self-registration extensibility feature!

Unattended installation of SQL Server 2008 R2 Express on an Azure role

In certain circumstances, you might find yourself with a need to install SQL Server Express on one of your Windows Azure worker roles. Exercise caution here though folks: this is not a supported design pattern (remember, a restart of your role instance will cause all data to be lost).

It was however exactly what I needed for my scenario and I thought I’d share it in case it serves a purpose for you.

There are a couple of approaches you can take, of course, one of which is ‘startup tasks’ specified in the service definition files. However, these offered me limited configuration options because I needed to customise some of the command line arguments being passed to the installer based on values from the Role Environment itself.

The trickiest part was figuring out the correct command line parameters for SQL Server 2008 R2 Express which, to be honest, wasn’t that fiddly at all. Here are the parameters you’ll need:

/Q /ACTION=Install /FEATURES=SQLEngine,Tools /INSTANCENAME=YourInstanceName /HIDECONSOLE /NPENABLED=1 /TCPENABLED=1 /SQLSVCACCOUNT=".\YourServiceAccount" /SQLSVCPASSWORD="YourServicePassword" /SQLSYSADMINACCOUNTS=".\AdminAccount" /IACCEPTSQLSERVERLICENSETERMS /INSTALLSQLDATADIR="FullyQualifiedPathToFolder"

In the parameters above, we’re specifying a silent install with the /Q parameter, installing the SQL Database Engine and Management Tools (Basic) with the /FEATURES parameter, setting the instance name, enabling named pipes and TCP, and setting the service accounts and the SQL data directory.

The next part, then, is to actually build this as a command line and execute it in the cloud environment. How do we do this? Simples: we use System.Diagnostics to create a new Process object and pass in a ProcessStartInfo object:

var taskInfo = new ProcessStartInfo
{
    FileName = file,
    Arguments = args,
    Verb = "runas",
    UseShellExecute = false,
    RedirectStandardOutput = true,
    RedirectStandardError = true,
    CreateNoWindow = false
};

// Create the process (we start it further down)
_process = new Process { StartInfo = taskInfo, EnableRaisingEvents = true };

For good measure, we’ll also redirect the standard and error output streams from the process so that we can capture those out to our log files:

// Log output
DataReceivedEventHandler outputHandler = (s, e) => Trace.TraceInformation(e.Data);
DataReceivedEventHandler errorHandler = (s, e) => Trace.TraceInformation(e.Data);

// Attach handlers
_process.ErrorDataReceived += errorHandler;
_process.OutputDataReceived += outputHandler;

Then, we’ll execute our task and ask the role to wait for it to complete before continuing with startup:

// Start the process
_process.Start();
_process.BeginErrorReadLine();
_process.BeginOutputReadLine();

// Wait for the task to complete before continuing...
_process.WaitForExit();

Stick all of that into a method that you can re-use, and don’t forget to add parameters called file and args (both strings) that contain the path to the SQL Server Express installation executable and the command line arguments you want to pass in.
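For reference, here’s a minimal sketch of what that reusable helper might look like once the pieces above are assembled. The method name RunProcessAndWait is mine, and it assumes using directives for System.Diagnostics:

private static int RunProcessAndWait(string file, string args)
{
    var taskInfo = new ProcessStartInfo
    {
        FileName = file,
        Arguments = args,
        Verb = "runas",
        UseShellExecute = false,
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        CreateNoWindow = false
    };

    using (var process = new Process { StartInfo = taskInfo, EnableRaisingEvents = true })
    {
        // Capture both output streams into the trace log
        process.OutputDataReceived += (s, e) => Trace.TraceInformation(e.Data);
        process.ErrorDataReceived += (s, e) => Trace.TraceInformation(e.Data);

        process.Start();
        process.BeginErrorReadLine();
        process.BeginOutputReadLine();

        // Block until the installer has finished before letting role startup continue
        process.WaitForExit();
        return process.ExitCode;
    }
}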

How to build your command line argument

If you’re wondering why I didn’t hardcode my command line options, it’s because up in Azure, the standard builds for web and worker roles don’t come preloaded with any administrative accounts – you have to specify those at design time. I actually ‘borrow’ the username of the Remote Desktop user (which is provisioned as an administrator for you when you ask to enable Remote Desktop).
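If you want to do the same, here’s a hedged sketch of reading that username at runtime. The first setting name below is the one the standard Remote Desktop plugin writes into your cscfg; the password setting is hypothetical, because the plugin stores its own password encrypted, so you’ll want a separate setting (or another source) for the service account password:

// Borrow the Remote Desktop user name provisioned for the role (standard plugin setting name).
string username = RoleEnvironment.GetConfigurationSettingValue(
    "Microsoft.WindowsAzure.Plugins.RemoteAccess.AccountUsername");

// Hypothetical setting of your own - the RemoteAccess password is stored encrypted, so
// supply the SQL service account password some other way (e.g. a plain cscfg setting).
string password = RoleEnvironment.GetConfigurationSettingValue("SqlServiceAccountPassword");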

I actually end up with this quick-and-dirty snippet:

string file = Path.Combine(UnpackPath, "SQLEXPRWT_x64_ENU.exe");
string args = string.Format(
    "/Q /ACTION=Install /FEATURES=SQLEngine,Tools /INSTANCENAME={2} /HIDECONSOLE /NPENABLED=1 /TCPENABLED=1 " +
    "/SQLSVCACCOUNT=\".\\{0}\" /SQLSVCPASSWORD=\"{1}\" /SQLSYSADMINACCOUNTS=\".\\{0}\" " +
    "/IACCEPTSQLSERVERLICENSETERMS /INSTALLSQLDATADIR=\"{3}\"",
    username, password, instanceName, dataDir);

So, ultimately, you’ll want to wrap all of this up into your role’s OnStart() method. Include a check to see whether SQL Express is already installed, too.
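Pulling it together, a rough sketch of the OnStart() wiring might look like the following. The registry check is just one simple way to detect an existing instance, the instance name is hypothetical, and BuildSqlSetupArguments() stands in for the quick-and-dirty string.Format call shown above:

public override bool OnStart()
{
    const string instanceName = "MYINSTANCE"; // hypothetical instance name

    // One simple check: does an instance with this name already appear in the registry?
    bool installed;
    using (var key = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(
        @"SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL"))
    {
        installed = key != null && key.GetValue(instanceName) != null;
    }

    if (!installed)
    {
        string file = Path.Combine(UnpackPath, "SQLEXPRWT_x64_ENU.exe");
        string args = BuildSqlSetupArguments(); // the string.Format call from above
        RunProcessAndWait(file, args);
    }

    return base.OnStart();
}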

And, if you’re stuck trying to debug what’s going on with your otherwise silent installation, SQL Server Setup Logs are your friend. You’ll find them by connecting to your role via Remote Desktop and opening the following path:

%programfiles%\Microsoft SQL Server\100\Setup Bootstrap\Log\

Enjoy!

Screencast: “To the cloud!” (In 90 seconds, or less)

I’ve spoken with a lot of developers recently who haven’t yet adopted the Windows Azure platform, often because they think the process is difficult, time-consuming or requires some kind of advanced ninja training to get up and running.

This video will show you that it can be done in 90 seconds or less, without writing a single line of code!

In the screencast, I’ll show you how to create a new Azure project in Visual Studio 2010, add a web role to the project, create a deployment package, upload it to the cloud, and then view it running in the cloud.

Before you begin though, head over to the Windows Azure site, and make sure you’ve activated a Windows Azure subscription. If you’re an MSDN or BizSpark subscriber, you get a free basic subscription anyway which is great for this demo. If you’re not, don’t worry, because until June 30th, Microsoft are giving you a free trial, too. Just get started at http://www.microsoft.com/windowsazure/free-trial/.

It’s seriously easy to do, so what are you waiting for – get going! 🙂


EDIT 30/03/2011:
In this video, I’ve shortened the “deployment” sequence to fit within the timeframe, but it should be noted that it’s normal for this part of the process to take anywhere between 15 and 30 minutes (while the Azure platform does what it does to spin up the resources it needs to run your solution). Thanks to all the watchers who pointed out that this fact was probably worth mentioning! 🙂

As always, feedback is appreciated and welcome.

The Lure of Azure

I gave a talk at Microsoft BizSparkCamp in London on March 25th, and I thought I’d follow that up with a blog post summarising some of the main benefits of deploying your next solution on Windows Azure. These reasons all form part of what I call the “Lure of Azure”.

So here’s a taste of eight of the reasons I broadly covered in the talk:

Reason #1: Financial
Building, deploying and maintaining a Windows Azure solution is likely to cost you far less than acquiring and maintaining your own physical infrastructure. If high availability and redundancy are important to you, I can pretty much guarantee you can’t do it cheaper or quicker than with the Azure platform.

Reason #2: Forget hardware
Very few applications actually need to be run on dedicated physical hardware (“Co-location” etc). Windows Azure abstracts away all the hardware and gives you a platform upon which you deploy code only, and through configuration files, you can determine how much or how little of the resources will be available to your application. Let Windows Azure take away the strain of worrying about load-balancing and redundancy, as all that’s taken care of for you.

Reason #3: Consolidation
The Windows Azure platform allows you to consolidate all your logical services under one account, with one control panel. That means you can instantly provision SQL Azure databases and make use of highly scalable storage resources on demand, from a single point of administration, quickly and easily. Managing Windows Azure applications that utilise SQL Azure, Windows Azure storage and compute is orders of magnitude easier than maintaining the equivalent physical infrastructure yourself.

Reason #4: Scale up (and down)
If you’re going to buy a physical infrastructure, chances are you’ll over-specify and end up with a lot of spare capacity, because you build in a lot of what you don’t need all the time, i.e. you build to accommodate ‘peak’ load. That means you’d be paying for what you don’t need (or use) most of the time with any physical infrastructure. Windows Azure solutions can be scaled up and down on demand, meaning you only ever pay for capacity when you need it.

Reason #5: Build flexibility
If you build on physical infrastructure, when you need more of something (or less, for that matter), you end up messing about with hardware. If you lease (as many of us do), that means contract variations, change requests, maintenance periods and perhaps even downtime, but almost always cost. The Windows Azure platform lets you do all this stuff on demand, with ease.

Reason #6: Better global reach
Locating a physical data centre in one territory is one thing, but if you want truly global scale it pays to geographically distribute your resources to other territories as well. Co-location is an expensive way to do this, and then you have to think about how you’re going to replicate your data between all your data centres yourself; the bottom line is that’s tricky, to say the least. Locating resources globally in Windows Azure is as easy as point-and-click.

Reason #7: If you’re a .NET house already, it’s even easier
Using Visual Studio 2010? Know C#? Most of your existing code can run in the cloud immediately with just a few minor tweaks. Go download the SDK and start today.

Reason #8: Flexibility to utilise all, or a part of the platform
Fed up with maintaining your own SQL database cluster? Running out of resources locally? Hosting company charging you too much for a SQL Server database? Bung your database on SQL Azure and leave your app where it is. You can use the SQL Azure Migration Wizard to move your databases over to the cloud, then it’s just a simple matter of changing the connection strings in your code. Show me a simpler way of creating a triple-redundant SQL Server instance!

There are many other reasons, of course, and I could expand on any of these along the way but the point of this post was to just get you thinking about some of the key advantages by summarising some of the points I discussed in my talk. Feel free to post comments!

Open-source FTP-to-Azure blob storage: multiple users, one blob storage account

A little while ago, I came across an excellent article by Maarten Balliauw in which he described a project he was working on to support FTP directly to Azure’s blob storage. I discovered it while doing some research on a similar concept I was working on. At the time of writing this post, though, Maarten wasn’t sharing his source code, and even if he did decide to at some point soon, his project appears to focus on permitting access to the entire blob storage account. This wasn’t really what I was looking for, but it was very similar…

My goal: FTP to Azure blobs, many users: one blob storage account with ‘home directories’

I wanted a solution to enable multiple users to access the same storage account, but to have their own unique portion of it – thereby mimicking an actual FTP server. A bit like giving authenticated users their own ‘home folder’ on your Azure Blob storage account.

This would ultimately give your Azure application the ability to accept incoming FTP connections and store files directly into blob storage via any popular FTP client – mimicking a file and folder structure and permitting access only to regions of the blob storage account you determine. There are many potential uses for this kind of implementation, especially when you consider that blob storage can feed into the Microsoft CDN…

Features

  • Deploy within a worker-role
  • Support for most common FTP commands
  • Custom authentication API: because you determine the authentication and authorisation APIs, you control who has access to what, quickly and easily
  • Written in C#

How it works

In my implementation, I wanted the ability to literally ‘fake’ a proper FTP server to any popular FTP client, with the server component running on Windows Azure. I wanted to have some external web service do my authentication (you could host yours on Windows Azure, too) and then only allow each user access to their own tiny portion of my Azure Blob Storage account.

It turns out, Azure’s containers did exactly what I wanted, more or less. All I had to do was to come up with a way of authenticating clients via FTP and returning which container they have access to (the easy bit), and write an FTP to Azure ‘bridge’ (adapting and extending a project by Mohammed Habeeb to run in Azure as a worker role).

Here’s how my first implementation works:

A quick note on authentication

When an FTP client authenticates, I grab the username and password sent by the client, pass that into my web service for authentication, and if successful, I return a container name specific to that customer. In this way, the remote user can only work with blobs within that container. In essence, it is their own ‘home directory’ on my master Azure Blob Storage account.

The FTP server code will deny authentication for any user who does not have a container name associated with them, so just return null to the login procedure if you’re not going to give them access (I’m assuming you don’t want to return a different error code for ‘bad password’ vs. ‘bad username’ – which is a good thing).

Your authentication API could easily be adapted to permit access to the same container by multiple users, too.
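To make that concrete, the authentication hook might take a shape like this. This is a hypothetical sketch, not the project’s actual API; returning null is what denies the login:

public interface IFtpAuthenticator
{
    // Returns the blob container acting as the user's 'home directory', or null to deny access.
    string Authenticate(string username, string password);
}

public class WebServiceAuthenticator : IFtpAuthenticator
{
    public string Authenticate(string username, string password)
    {
        // Call your own authentication web service here and map the authenticated user
        // to the container they're allowed to use. Multiple users can quite happily map
        // to the same container if you want shared access.
        return LookupContainerFor(username, password); // your implementation
    }

    private string LookupContainerFor(string username, string password)
    {
        // ... web service call omitted ...
        return null;
    }
}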

Simulating a regular file system from blob storage

Azure Blob Storage doesn’t work like a traditional disk-based system in that it doesn’t actually have a hierarchical directory structure – but the FTP service simulates one so that FTP clients can work in the traditional way. Mohammed’s initial C# FTP server code was superb: he wrote it back in 2007 so that the file system could be swapped out – to my knowledge, before Azure even existed – yet it was so painless to adapt that you could be forgiven for thinking he meant for it to be used this way. Mohammed, thanks!

Now I have my FTP server, modified and adapted to work for Azure, there are many ways in which this project can be expanded…

Over to you (and the rest of the open source community)

It’s my first open source project and I actively encourage you to help me improve it. When I started out, most of this was ‘proof of concept’ for a similar idea I was working on. As I look back over the past few weekends of work, there are many things I’d change but I figured there’s enough here to make a start.

If you decide to use it “as is” (something I don’t advise at this stage), do remember that it’s not going to be perfect and you’ll need to do a little leg work – it’s a work in progress and it wasn’t written (at least initially) to be an open-source project. Drop me a note to let me know how you’re using it though, it’s always fun to see where these things end up once you’ve released them into the wild.

Where to get it

Head on over to the FTP to Azure Blob Storage Bridge project on CodePlex.

It’s free for you to use however you want. It carries all the usual caveats and warnings as other ‘free open-source’ software: use it at your own risk.

If you do use it and it works well for you, drop me an email and it’ll make me happy. 🙂

An introduction to Windows Azure (for Busy People)

I decided to write this post to provide a little technical information aimed at non-programmers (Project Managers, Department Heads and other Busy People) who want to know more about the platform; how it works and what it offers. My goal is that, after reading this article, you’ll have a basic – yet thorough – understanding of how Azure is structured so that you can make informed contributions to discussions regarding the platform. This is a work in progress.

Some of the analogies used in the following article are designed to facilitate understanding on a functional level, and may therefore be technically ‘inaccurate’. If you’ve picked that up, you’re probably more technical than this author had in mind as the intended audience!

As always, we’re all learning – if you have ideas or suggestions for improving this article, please feel free to leave a comment. Thanks!

Table of Contents

  1. Introduction
  2. Web Roles and Worker Roles
  3. Resources
  4. Storage
  5. Databases

An introduction to Windows Azure (for Busy People)

In the Azure world, you can have databases and applications all running in the cloud environment. By now, most of us know that a ‘cloud environment’ in its most basic form describes an environment in which you don’t ever see or touch the physical hardware or infrastructure as these are determined, managed and provided for you by the cloud service provider.

Developing and deploying applications onto the Azure platform requires a different approach to traditional application development, but developers can continue to use all their existing tools (such as Visual Studio 2010) and don’t require any new software to get started. In fact, it’s actually possible to write applications for the Azure platform using the free Express edition products provided by Microsoft.

Physically coding your applications, however, does require developers to change the way in which they build their applications, if only a little. That’s really a topic best left for someone else, or another post, to address.

On Azure, applications are referred to as ‘roles’, and there are two types of role: the “web role” and the “worker role”.

Think of a web role as a web site [1], and a worker role as some repetitive computational task that takes place behind-the-scenes without any user interface at all (a good example would be processing statistical data, or – to use examples from other blogs – a thumbnail generator for images).

Roles

Web Roles are similar to web servers, in that they allow public computers to connect to your application over standard HTTP and HTTPS ports. Typical Azure deployments consist of one – maybe two – web roles, and a number of worker roles. Worker roles can also be made publicly accessible; that is, they can talk to each other, to the outside world and to other Azure services.

It is important to note, however, that one web role is not actually a web server in and of itself. It is simply an instance of your software running on a web server that is publicly accessible.

Azure would not be complete without two other key service offerings: storage (some place to store all your data) and SQL Azure (a variation of SQL Server, which provides relational database capabilities to your cloud applications deployed on the Azure platform).

To recap then, Azure is a platform that provides:

  • Some place to run your applications from (via web and worker roles)
  • Some place to store all your application files
  • SQL Azure – a relational database like SQL Server

Each of these functional areas is referred to as a ‘hosted service’, and as you might expect there are limitations imposed by Microsoft as to the amount of resources available to each service.

Resources 

Though theoretically unlimited, resources are packaged and limited by Azure in order to ensure all customers have resources available when required. Databases, storage and application instances are artificially capped according to the current limits (published online [2], updated regularly, and commonly expected to grow over time).

Web and worker roles come in four sizes: small, medium, large and extra-large. That’s because they are actually virtual machines (VMs – software ‘simulations’ of physical servers, many copies of which can run on a single physical server). Each size represents an increase in price and has a different set of specifications that govern how much RAM, local storage space and how many CPU cores are available to the role, as described below:

Size          CPU cores   Memory    Disk space for local storage
Small         1           1.7 GB    250 GB
Medium        2           3.5 GB    500 GB
Large         4           7 GB      1000 GB
Extra-large   8           14 GB     2000 GB

Each VM is provisioned when required. The ‘magic’ of Windows Azure is that when you provision a VM, the Azure platform actually provisions a further two identically configured VMs. One acts as a recovery image, the other as a failover. If Azure detects a fault condition, it takes appropriate steps to automatically recover the damaged VM.

This is one of the most useful features of Azure, and you get it for ‘free’ – i.e., you don’t need to do anything particularly special to get this to happen, it’s simply a by-product of deploying your applications on to Azure.

Getting to Azure

To utilise Azure, you need an Azure services account (one per customer). Each account has the following overall limitations:

  • Maximum 20 hosted service projects (projects contain instances)
  • Maximum 5 storage accounts
  • Limitation of 5 roles per hosted service project (e.g. three different web roles and two different worker roles, or any such combination)
  • 20 CPU cores across all of the hosted service projects

Configuring the Azure platform involves significant architectural decisions: a deployment requires not only the correct ‘size’ but also the appropriate number of ‘instances’ of that deployment that will run concurrently. It is possible, therefore, to have two instances of a ‘small’ worker role running the same application; this would consume two of your maximum 20 cores. It is worth mentioning at this point that you can, at any time, reconfigure a deployed instance to use a larger VM or a higher instance count, but that some (relatively minor) downtime would be incurred.

Storage

Storage in the cloud doesn’t work like any traditional disk-based system. That is, you’ll never have a “C:\” drive or a “D:\” drive [3] (local storage is a topic I’m not going to cover here). The Azure platform makes disk space available as three distinct entities: Blobs (block and page), Tables and Queues. These three entities essentially abstract space on physical disks away into different logical units, within which programmers will never be able to ‘see’ the underlying disks or access them directly.



Blobs are stored within containers, and you can have as many containers as you can fit within your storage account quota. Containers are a bit like folders, except that you name them when you create them and they cannot contain subfolders (or sub-containers, for that matter). Azure tables aren’t like tables in relational databases such as SQL Server or Microsoft Access. Queues provide a mechanism through which web and worker roles can talk to each other: instance A sends a message to instance B, which might – but doesn’t have to – process the message right away, hence the name ‘queue’.

Block blobs and Page blobs

Block blobs are optimised for streaming, while page blobs are optimised for random read/write operations. Block blobs are targeted towards streaming operations specifically because writing them is a two-step process: first, you upload all of the individual blocks that will comprise the blob; next, you commit the blocks via a block list. During the commit phase, you can add, change or remove blocks from the blob. Page blobs, on the other hand, are updated immediately – no commit phase is required. Page blobs consist of an array of pages, where each page is 512 bytes and the blob size must be a multiple of 512 bytes.

Both block and page blobs can be read from any byte offset, meaning it’s possible to read only a specific ‘chunk’ of either type of blob while it sits in Azure Storage.

Page blobs: primary characteristics

Page blobs are fast and range-based, which means you can read from and write to specific ranges of a blob at a time. Page blobs are initialised with a maximum size, but if only half the blob contains data, you’re only charged for what you actually store in the blob. Page blobs also support leasing, which means your application can ‘lock’ the blob while it updates a range of pages, then release the lock.

The Windows Azure Storage blog has this to say about Page Blobs:

Another use of Page Blobs is to use them for custom logging for their applications.  For example, for a given role instance, when the role starts up a Page Blob can be created for some MaxSize, which is the max amount of log space the role wants to use for a day.   The given role instance can then write its logs using up to 4MB range-based writes, where a header provides metadata for the size of the log entry, timestamp, etc.   When the Page Blob is filled up, then treat the Page Blob as a circular buffer and start writing from the beginning of the Page Blob, or create a new page blob, depending upon how the application wants to manage the log files (blobs).   With this type of approach you can have a different Page Blob for each role instance so that there is just a single writer to each page blob for logging.  Then to know where to start writing the logs on role failover the application can just create a new Page Blob if a role restarts, and GC the older Page Blobs after a given number of hours or days.  Since you are not charged for pages that are empty, it doesn’t matter if you don’t fill the page blob up.

Block blobs: characteristics

Block blobs consist of, well, blocks! I’d say, in my experience, most people would want to be using block blobs over page blobs because they’re a little more flexible in terms of their sizing. For instance, a block blob does not have to declare its size when you create it: you just keep adding blocks to the blob until you’re done. There’s another benefit, too. With block blobs, you can send blocks in any sequence, then arrange them later on when you call your commit function. This makes them ideally suited to transferring large files, where your client is able to use a few threads to send the file in chunks.
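For the developers in the audience, here’s a rough sketch of that two-step upload using the storage client library of the time. Names such as the connection string, the container and the chunks list are placeholders:

// Upload a large file as blocks, then commit the block list to make the blob visible.
var account = CloudStorageAccount.Parse(connectionString); // your storage connection string
var container = account.CreateCloudBlobClient().GetContainerReference("uploads"); // assumed to exist
var blob = container.GetBlockBlobReference("bigfile.dat");

var blockIds = new List<string>();
for (var i = 0; i < chunks.Count; i++) // 'chunks' = the file split into pieces of up to 4 MB each
{
    // Block IDs must be Base64-encoded and all the same length.
    var blockId = Convert.ToBase64String(BitConverter.GetBytes((long)i));
    blob.PutBlock(blockId, new MemoryStream(chunks[i]), null);
    blockIds.Add(blockId);
}

// Blocks can be sent in any order; the committed list defines the final arrangement.
blob.PutBlockList(blockIds);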

Understanding the limitations of block and page blobs

Storage, like the other Azure services, is also subject to some limitations (and its own pricing structure); the current limits are described in the table below:

Characteristic                Limit
Blob (block and page blob)    Maximum 200 GB
Block                         4 MB maximum size, 64 KB minimum size
Overall storage limit         100 TB

You can mix and match block and page blobs within your account, but a block blob cannot suddenly ‘become’ a page blob, or vice versa. Once a blob is created as one particular type, it will always remain that type: a block blob cannot contain pages, and a page blob cannot contain blocks, for instance.

Addressing blobs

Blobs aren’t accessed or written to like traditional file systems, with a nice path-to-folder-and-filename approach (e.g. “C:\My Documents\My File.jpg”). Blobs use URIs to organise their data, e.g.:

http://accountname.blob.core.windows.net/containername/blobname/which/can/have/slashes/but/dont/represent/folders/file.jpg

It is precisely because this system is URI-based that it can be so vast and resilient to failure, as there are many copies of your data spread across many individual physical drives. Therefore, it’s safe to say that when you upload a file to Azure and store it in blob storage, it’s pretty safe!

Earlier, I explained that a blob should be thought of as a container for files. This is not strictly true, but the analogy is close. In actuality, blobs are containers for blocks (chunks of a single file) and pages (discussed above), and blobs are themselves organised into containers. One file may be one block (if it is under 4 MB in size, the maximum size of a block), or it may be several thousand. If the file is over 64 MB in size, it must be split into blocks. Azure, perhaps confusingly, has two varieties of blob storage: block and page.

Let it suffice to say that block blobs can be no larger than 200 GB, and page blobs can be no larger than 1 TB (however the pages within one are combined, a page blob must not exceed 1 TB). You can therefore see that the storage system in Azure is much more complex than the traditional system we are used to, but that it offers significant advantages over it.

Databases: SQL Azure

Microsoft has redesigned some of their core applications (such as SQL Server) to work specifically on the Azure platform, and as such, they have some very appealing advantages over the versions of the products that you can buy commercially. [4]

In typical server-based implementations of SQL Server, it is common to find one server acting as the master while another monitors it, ready to take over should it fail (the slave). This means the database is subject to the limitations of that one server (storage space, processing power and bandwidth). It also means that although you have two servers powered on and dedicated to serving a database, only one is ever actually working at any one time. That represents half the total available computing power, and is a good example of why paying for hardware through a traditional hosting company is a less appealing concept.

On Azure, SQL Server has become SQL Azure – and now, the concept of master/slaves has gone and you have multiple servers all serving the same database, resulting in massively higher processing power and greater throughput capacity. What this ultimately means is that one can work with that database much more quickly than one can with SQL Server.

Now, there are some fundamental differences between SQL Azure and SQL Server. For example, one cannot do everything one can with SQL Server within SQL Azure. Bear that in mind when your developers explain this to you, as the two products are not exactly the same.

Databases require somewhere to store their data. SQL Azure has the following database packages available:

Maximum database size   Monthly standing charge (USD)
5 GB                    $49.95
10 GB                   $99.99
20 GB                   $199.98
30 GB                   $299.97
40 GB                   $399.96
50 GB                   $499.95

In addition, data transfer charges apply to the standing monthly charge:

Region              Direction   Charge per GB (USD)
World (exc. Asia)   Inbound     $0.10
World (exc. Asia)   Outbound    $0.15

SQL Azure offers the opportunity to pay only for what one actually uses. The standing monthly charges are amortised over the month and you only pay for the days on which you actually have the databases in each specific tier. This makes it a very cost-effective way to purchase database space in the cloud.

Also, being based on the Azure platform means that there are a number of additional advantages:

  • Data stored in an automatic high-availability environment
  • Fault tolerance included
  • 99.9% “Monthly Availability” SLA [5]

This concludes our basic high-level introduction to the Windows Azure platform and I hope you have enjoyed reading it. If you have questions, feel free to post them in the comments below and I’ll do my best to answer them.


Footnotes:

1. A web role does not have to be a web site – it could be a web service, such as an API. A web role is publicly accessible via the World Wide Web.

2. Available at http://msdn.microsoft.com/en-us/library/ee814754.aspx. Service quotas are expected to grow over time and automatically become available to hosted services.

3. “Local storage” excepted; in this document I am discussing globally available storage.

4. Azure is a proprietary technology and no company can install their own private instance of it. Microsoft software written purely for Azure is not available to any third party to install and host on their own infrastructure.

5. See http://www.microsoft.com/windowsazure/sla/ for all the Azure platform SLAs.