Setting up a Multi-Cloud Data Platform


Multi-cloud environments can be an effective way to hedge risk and keep future options open for applications. With a secure VPN between two or more sites, you can use the global namespace and high-availability features in MapR (such as mirroring and replication) to support business continuity across a wide variety of use cases.

In this tutorial, I will walk through how to connect Microsoft Azure and Amazon Web Services (AWS) using an IPsec VPN. The VPN provides secure IP connectivity between the two clouds and serves as a Layer 3 connection that you can then use to connect two or more MapR clusters.

Multi-Cloud Architecture with Site-to-Site VPN

Let's first take a look at the end result for which we're aiming.

On the left side of the figure below is an Azure setup with a single Resource Group (MapR_Azure), which we'll deploy in the 'US West' region. On the right side is an Amazon EC2 network in a VPC, which we will deploy in the Northern Virginia region. This is an example of using geographically dispersed regions to lessen the risk of disruptions to operations. Once the VPN is up, we can use MapR replication to ensure that data lives in the right place and that applications can read from and write to both sites seamlessly.

This "site-to-site" VPN will be encrypted and run over the open Internet, with a separate IP subnet in each cloud, using the VPN gateways to route traffic through a tunnel. Note that this is a "Layer 3" VPN, in that traffic is routed between two subnets using IP forwarding. It's also possible to do this at Layer 2, bridging Ethernet frames between the two networks, but I'll leave that for another post (or as an exercise for the curious reader).

Setting up the Azure Side

First, prepare the Resource Group; in our example, we called the group 'MapR_Azure.'

Select it, and find the 'Virtual Network' resource by selecting the '+' icon, then type 'virtual network' in the search box.

Select 'Resource Manager' as the deployment model and press 'Create.' We use the name 'clusternet' for the network.

We will create two subnets, one in each cloud. On Azure, we create the address range 10.11.0.0/16 and a single subnet, 10.11.11.0/24, within that range. The Azure and EC2 address prefixes will be 10.11 and 10.10, respectively, which makes them easy to identify when troubleshooting.
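Distinct prefixes matter because a routed (Layer 3) VPN only works if the two address spaces do not overlap. A quick sanity check of the plan, sketched with Python's standard ipaddress module (check_plan is a hypothetical helper, not part of any Azure or AWS tooling):

```python
import ipaddress

# Address plan from this tutorial: Azure uses the 10.11 prefix, AWS the 10.10 prefix.
AZURE_VNET = ipaddress.ip_network("10.11.0.0/16")
AZURE_CLUSTER_SUBNET = ipaddress.ip_network("10.11.11.0/24")
AWS_VPC = ipaddress.ip_network("10.10.0.0/16")
AWS_CLUSTER_SUBNET = ipaddress.ip_network("10.10.10.0/24")


def check_plan(vnet, vnet_subnet, vpc, vpc_subnet):
    """Return a list of problems with a two-cloud address plan (empty if it looks OK)."""
    problems = []
    if not vnet_subnet.subnet_of(vnet):
        problems.append(f"{vnet_subnet} is not inside {vnet}")
    if not vpc_subnet.subnet_of(vpc):
        problems.append(f"{vpc_subnet} is not inside {vpc}")
    # Overlapping ranges would leave the gateways unable to tell
    # which destinations belong on the far side of the tunnel.
    if vnet.overlaps(vpc):
        problems.append(f"{vnet} overlaps {vpc}")
    return problems


print(check_plan(AZURE_VNET, AZURE_CLUSTER_SUBNET, AWS_VPC, AWS_CLUSTER_SUBNET))  # []
```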

Create a public IP for the VPN connections. Select '+' to add a resource, then type 'public IP address' in the search box. Press 'Create,' then configure it: select 'Use existing' for the Resource Group, keep 'Dynamic' for the IP address assignment, and name it 'AzureGatewayIP.'

Take note of this address; we will use it later.

Next, create a 'Virtual Network Gateway' the same way. This entity will serve as the VPN software endpoint.

Reference the public IP you created in the previous step (in our case, 'AzureGatewayIP').

Note the concept of a 'GatewaySubnet' here. This is a subnet that is to be used exclusively by the Azure VPN software. It must be within the configured address range, and you can't connect any other machines to it. Microsoft says "some configurations require more IP addresses to be allocated to the gateway services than do others." It sounds a little mysterious, but allocating a /24 network seems to work fine for most scenarios.

Select 'clusternet' as the virtual network (what you created in the earlier step), use a Gateway subnet of 10.11.0.0/24, and use the 'AzureGatewayIP' address. This will create a new subnet entry called 'GatewaySubnet' for the 10.11.0.0/24 network.
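The GatewaySubnet has to sit inside the 'clusternet' range without colliding with the cluster subnet. A small sketch of that check, plus a way to list the /24 blocks still free in the VNet for future subnets (the helper names are hypothetical; only the standard ipaddress module is used):

```python
import ipaddress

vnet = ipaddress.ip_network("10.11.0.0/16")             # 'clusternet' address range
gateway_subnet = ipaddress.ip_network("10.11.0.0/24")   # 'GatewaySubnet', used only by the VPN gateway
cluster_subnet = ipaddress.ip_network("10.11.11.0/24")  # subnet for the cluster VMs


def gateway_placement_ok(vnet, gateway, others):
    """GatewaySubnet must lie inside the VNet and share no addresses with other subnets."""
    return gateway.subnet_of(vnet) and not any(gateway.overlaps(o) for o in others)


def free_subnets(vnet, used, new_prefix=24):
    """Yield the /new_prefix blocks of the VNet that overlap none of the used subnets."""
    for candidate in vnet.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(u) for u in used):
            yield candidate


print(gateway_placement_ok(vnet, gateway_subnet, [cluster_subnet]))  # True
print(next(free_subnets(vnet, [gateway_subnet, cluster_subnet])))     # 10.11.1.0/24
```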

For testing purposes, select the VpnGw1 SKU. This allows up to 650 Mbps of network throughput, which is more than enough to connect a couple of small clusters, but you can go up to 1.25 Gbps with the VpnGw3 SKU.
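A SKU's rated throughput is a ceiling rather than a guarantee, but it is handy for a back-of-envelope estimate of how long an initial mirror of a dataset might take. A minimal sketch, where the 1,000 GB dataset and the efficiency factor are illustrative assumptions, not figures from Microsoft or MapR:

```python
def transfer_hours(dataset_gb, link_mbps, efficiency=1.0):
    """Rough hours to move dataset_gb gigabytes (10**9 bytes) over a link_mbps link.

    efficiency < 1.0 can model protocol and encryption overhead; it is a guess,
    not a measured value.
    """
    bits = dataset_gb * 10**9 * 8
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 3600


# Hypothetical 1,000 GB dataset at the VpnGw1 and VpnGw3 ceilings quoted above
print(f"{transfer_hours(1000, 650):.1f} h")   # 3.4 h
print(f"{transfer_hours(1000, 1250):.1f} h")  # 1.8 h
```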

This may take up to 45 minutes (according to Microsoft), but it usually completes in a few minutes.

Setting up the AWS Side

We need to pause here to set up a few things on AWS. First, create a VPC named 'clusterVPC' in the VPC Dashboard of the AWS Console. Here we set the IPv4 address range as 10.10.0.0/16.

Navigate to 'Subnets,' and create a subnet named 'clustersubnet' in the VPC. Here we use 10.10.10.0/24.

Next, create an Internet gateway to connect our VPC to the Internet. This step is important and easily overlooked: without it, traffic cannot be routed between the Elastic IP and the subnet we just created.

Select 'Attach to VPC,' and use the new VPC 'clusterVPC'.

Go back to the EC2 Dashboard and select 'Launch Instance.' We will create an Amazon Linux instance to act as the VPN endpoint. Select the 'Amazon Linux' AMI, and configure the instance details as follows:

Be sure to select the 'clusterVPC' we just created and 'clustersubnet' for the subnet. Select 'Disable' for 'Auto-assign Public IP' because we want to use an Elastic IP that we will associate later.

In the last step, select 'Edit Security Groups,' and then select 'Create a new security group.' Open the group to all traffic coming from the AzureGatewayIP we configured previously (in this case, 40.80.158.169). Optionally, also add any rules you need to connect to the instance via ssh.
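Security groups are allow-lists, and IPsec needs IKE on UDP port 500 plus UDP port 4500 for NAT traversal (the instance sits behind an Elastic IP), so allowing all traffic from the single Azure address is the simplest rule that works. A toy model of that rule set, where the 203.0.113.0/24 admin range and the allowed() function are hypothetical and nothing here calls the AWS API:

```python
import ipaddress

# Toy model of the inbound rules described above; this is NOT the AWS API.
# Each rule is (source CIDR, protocol, (first_port, last_port)); "all" means any traffic.
INBOUND_RULES = [
    ("40.80.158.169/32", "all", None),    # everything from AzureGatewayIP
    ("203.0.113.0/24", "tcp", (22, 22)),  # hypothetical admin network, ssh only
]


def allowed(source_ip, protocol, port, rules=INBOUND_RULES):
    """Security groups only contain allow rules: traffic passes if any rule matches."""
    addr = ipaddress.ip_address(source_ip)
    for cidr, proto, ports in rules:
        if addr not in ipaddress.ip_network(cidr):
            continue
        if proto == "all":
            return True
        if proto == protocol and ports[0] <= port <= ports[1]:
            return True
    return False


# IKE (UDP 500) and NAT traversal (UDP 4500) from the Azure gateway are admitted
print(allowed("40.80.158.169", "udp", 500), allowed("40.80.158.169", "udp", 4500))  # True True
print(allowed("198.51.100.7", "udp", 500))  # False
```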

Click on 'Launch,' and optionally create a new key pair or use an existing one for the instance.

While the instance launches, let's create an Elastic IP for the VPN endpoint. In the 'Network & Security' menu of the AWS Console, select 'Elastic IPs,' and then allocate an address.

Note this address (here, 34.231.217.197) for later.

Associate the address with the instance we just created.

Finalizing the Connection on the Azure Side

Let's return to the Azure setup, and use the information from AWS to complete the connection. Add a Local Network Gateway.

The term 'local' is a bit of a misnomer here because the other network is not a local one; it's another cloud network. Microsoft uses the term 'local' to refer to an on-premises network that you might want to connect to Azure. For 'IP address,' use the Elastic IP you created in the previous section. For 'Address space,' use the 10.10.10.0/24 range (the AWS subnet).

Next, add a Connection. Select 'Site-to-site' VPN, and fill in the remaining details for your Resource Group.

Select the AzureGateway we configured as well as the Local Network Gateway (AWSVPN). Enter a shared key that will be used for the session; the same key must be configured on the AWS side.

Now is a good time to launch an instance for testing. Type 'Ubuntu Server' into the search box, and select Ubuntu Server 14.04 LTS. Configure the instance details, size, and settings.

In the last Settings window, select the 'clusternet' virtual network and its subnet, and choose 'None' for the public IP address (we don't need one because the VPN will handle inbound and outbound connectivity). Select a new or existing network security group.

Finalizing the Connection on the AWS Side

This is a good time to make sure the subnet you created has a default route to the Internet gateway. In the AWS Console, navigate to 'Route Tables,' select the route table associated with your VPC, open the 'Routes' tab, and add a default route (destination 0.0.0.0/0) with the Internet gateway as its target.
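Adding 0.0.0.0/0 does not disturb traffic inside the VPC, because a route table uses the most specific route that matches a destination, and the VPC's implicit 'local' route is more specific than the default route. An illustration of that longest-prefix-match rule on a hypothetical table (igw-EXAMPLE is a placeholder, not a real gateway ID):

```python
import ipaddress

# Hypothetical route table for the AWS subnet: the implicit 'local' route
# for the VPC range plus the default route added above.
ROUTES = {
    "10.10.0.0/16": "local",
    "0.0.0.0/0": "igw-EXAMPLE",
}


def next_hop(destination, routes=ROUTES):
    """Return the target of the most specific (longest-prefix) matching route, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for cidr, target in routes.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None


print(next_hop("10.10.10.5"))     # local
print(next_hop("40.80.158.169"))  # igw-EXAMPLE
```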

Now ssh into the Amazon Linux instance we created earlier, then download and install strongSwan along with a few dependency packages.

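# Build prerequisites: a compiler, make, and the GMP headers used by strongSwan's default build
sudo yum install -y gcc make gmp-devel

# Download and unpack the strongSwan 5.6.0 source
wget https://download.strongswan.org/strongswan-5.6.0.tar.bz2

bzip2 -d strongswan-5.6.0.tar.bz2

tar xvf strongswan-5.6.0.tar

cd strongswan-5.6.0

# Build and install (the default prefix is /usr/local)
./configure && make && sudo make install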

This installs strongSwan under /usr/local (the default prefix), so the configuration files we will edit live in /usr/local/etc.

Edit the file /usr/local/etc/ipsec.conf and add the following entry:

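conn azure
    # Authenticate with the pre-shared key stored in ipsec.secrets
    authby=secret
    type=tunnel
    leftsendcert=never
    # Local (AWS) side: the EC2 instance's private IP and the AWS subnet
    left=10.10.10.222
    leftsubnet=10.10.10.0/24
    # Remote (Azure) side: AzureGatewayIP and the 'clusternet' cluster subnet
    right=40.80.158.169
    rightsubnet=10.11.11.0/24
    keyexchange=ikev2
    ikelifetime=10800s
    keylife=57m
    keyingtries=1
    rekeymargin=3m
    compress=no
    # Bring the tunnel up as soon as ipsec starts
    auto=start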

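The 'left' and 'leftsubnet' options refer to the Amazon (local) side: use the EC2 instance's private IP address (here, 10.10.10.222) and the AWS subnet. For the right side, use the AzureGatewayIP we configured (40.80.158.169) and the 'clusternet' subnet (10.11.11.0/24). Note that the parameter lines under 'conn' must be indented; ipsec.conf treats a non-indented line as the start of a new section.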
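If you end up building several tunnels (for example, to connect more clusters), templating the section avoids copy-and-paste mistakes such as losing the indentation. A minimal sketch that renders the same conn entry as above from its five per-site values; render_conn is a hypothetical helper, not part of strongSwan:

```python
def render_conn(name, left, leftsubnet, right, rightsubnet):
    """Render an ipsec.conf 'conn' section with the settings used in this tutorial.

    Every parameter line is indented, as ipsec.conf requires inside a section.
    """
    options = [
        ("authby", "secret"),
        ("type", "tunnel"),
        ("leftsendcert", "never"),
        ("left", left),
        ("leftsubnet", leftsubnet),
        ("right", right),
        ("rightsubnet", rightsubnet),
        ("keyexchange", "ikev2"),
        ("ikelifetime", "10800s"),
        ("keylife", "57m"),
        ("keyingtries", "1"),
        ("rekeymargin", "3m"),
        ("compress", "no"),
        ("auto", "start"),
    ]
    lines = [f"conn {name}"] + [f"    {key}={value}" for key, value in options]
    return "\n".join(lines) + "\n"


print(render_conn("azure", "10.10.10.222", "10.10.10.0/24", "40.80.158.169", "10.11.11.0/24"))
```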

Finally, edit the file /usr/local/etc/ipsec.secrets, and add your pre-shared key. It must match the shared key you entered for the Azure Connection.

10.10.10.222 40.80.158.169 : PSK "testing123"
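'testing123' is fine for a throwaway test, but a real deployment deserves a long random key. A sketch that generates one with Python's secrets module and formats the ipsec.secrets line; restricting the key to letters and digits is an assumption made to avoid quoting problems and to stay within what both consoles accept:

```python
import secrets
import string

# Letters and digits only (an assumption): avoids quoting issues in ipsec.secrets
# and in the Azure Connection's shared key field.
ALPHABET = string.ascii_letters + string.digits


def make_psk(length=40):
    """Return a random pre-shared key drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def secrets_line(local_ip, remote_ip, psk):
    """Format an ipsec.secrets PSK entry in the same shape as the example above."""
    return f'{local_ip} {remote_ip} : PSK "{psk}"'


psk = make_psk()
print(secrets_line("10.10.10.222", "40.80.158.169", psk))
```

Paste the same generated key into the Azure Connection so both ends agree.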

Start the VPN with:

sudo /usr/local/sbin/ipsec start

You can run sudo tail -f /var/log/messages to follow the connection log, or sudo /usr/local/sbin/ipsec statusall to see the current state of the tunnel.

You should now be able to ping the Azure machine by running ping 10.11.11.4 (substitute the private IP address of your Azure test VM). You can also check the status on the Azure side by viewing the Connection resource.

If the status shows 'Connected,' congratulations: you have a working two-cloud environment!

Additional Notes

Here are a few other concerns to watch when embarking on a multi-cloud adventure.

Ingress and Egress Data Transfers

Inter-site bandwidth is something to factor into your plan. At the time of this writing, most data transfer into EC2 is free, with some exceptions. Data transfer out of EC2 is free to most other Amazon services, like S3 and Glacier, and free up to 1 GB/month to other Internet-connected sites, but it costs a small amount per GB after that.

Data transfer in Azure is similar: all inbound data transfers are free, and there is a schedule of costs for outgoing transfers to the Internet and to other Azure regions.
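Egress adds up when replication runs continuously, since every byte mirrored from one cloud to the other is outbound traffic for the sending side. A back-of-envelope sketch, assuming a single flat per-GB rate after the free allowance; real price lists are tiered and change over time, so the $0.05/GB below is a made-up placeholder, not a quoted price:

```python
def monthly_egress_cost(gb_out, price_per_gb, free_gb=1.0):
    """Estimate one month's Internet egress charge.

    Assumes a flat price_per_gb after a free allowance of free_gb (defaulting to
    the 1 GB/month mentioned above); actual pricing is tiered by volume and region.
    """
    billable_gb = max(0.0, gb_out - free_gb)
    return billable_gb * price_per_gb


# Hypothetical: 500 GB replicated out per month at a placeholder $0.05/GB
print(f"${monthly_egress_cost(500, 0.05):.2f}")  # $24.95
```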

Bandwidth and Latency

Amazon has a page on how to check bandwidth. Running some quick tests with iperf3 between the two sites produced typical results like these:

Accepted connection from 10.99.1.237, port 35688
[  5] local 10.10.10.4 port 5201 connected to 10.99.1.237 port 35690
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  36.7 MBytes   308 Mbits/sec
[  5]   1.00-2.00   sec  39.8 MBytes   334 Mbits/sec
[  5]   2.00-3.00   sec  42.1 MBytes   353 Mbits/sec
[  5]   3.00-4.00   sec  39.7 MBytes   333 Mbits/sec
[  5]   4.00-5.00   sec  30.5 MBytes   256 Mbits/sec
[  5]   5.00-6.00   sec  30.0 MBytes   252 Mbits/sec
[  5]   6.00-7.00   sec  30.9 MBytes   259 Mbits/sec
[  5]   7.00-8.00   sec  36.7 MBytes   308 Mbits/sec
[  5]   8.00-9.00   sec  41.5 MBytes   348 Mbits/sec
[  5]   9.00-10.00  sec  37.0 MBytes   311 Mbits/sec
[  5]  10.00-10.03  sec   977 KBytes   245 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.03  sec   366 MBytes   306 Mbits/sec                  receiver

That's some pretty hefty bandwidth (306 Mbits/sec on average) between the two sites, a little under half of the 650 Mbps that the VpnGw1 SKU allows.
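The numbers in the output above are consistent with iperf3 counting transfer in binary megabytes (2**20 bytes) and bitrate in decimal megabits (10**6 bits): 366 MBytes over 10.03 seconds works out to about 306 Mbits/sec. A small cross-check and a parser for the per-interval lines, useful if you want to collect results from repeated runs; both helpers are hypothetical and not part of iperf3:

```python
import re

# Matches iperf3 interval lines such as:
# "[  5]   0.00-1.00   sec  36.7 MBytes   308 Mbits/sec"
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+(?P<start>[\d.]+)-(?P<end>[\d.]+)\s+sec\s+"
    r"(?P<xfer>[\d.]+)\s+(?P<xunit>[KMG])Bytes\s+"
    r"(?P<rate>[\d.]+)\s+(?P<runit>[KMG])bits/sec"
)
RATE_SCALE = {"K": 1e-3, "M": 1.0, "G": 1e3}  # convert reported rates to Mbit/s


def interval_rates(output):
    """Per-interval bitrates in Mbit/s, skipping sender/receiver summary lines."""
    rates = []
    for line in output.splitlines():
        match = INTERVAL.search(line)
        if match and not line.rstrip().endswith(("sender", "receiver")):
            rates.append(float(match["rate"]) * RATE_SCALE[match["runit"]])
    return rates


def mbytes_to_mbits_per_sec(mbytes, seconds):
    """Binary MBytes (2**20 bytes) transferred in `seconds`, as decimal Mbit/s."""
    return mbytes * 2**20 * 8 / seconds / 1e6


# Cross-check the summary line: 366 MBytes in 10.03 s
print(round(mbytes_to_mbits_per_sec(366, 10.03)))  # 306
```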

Ready to set up a MapR cluster?


This blog post was published August 29, 2017.