Experiments in Infrastructure for Remote Work: Part 1

I’ve been dreaming of working remotely from a mountain cabin in rural Australia for a while now. There are a couple of problems with this dream. First, I have neither a rural property nor a driver’s licence. Second, rural broadband in one of the most sparsely populated countries in the world is awful. Today I got my learner’s licence, and over the weekend I worked on a project that will hopefully mitigate the problem of not having enough internet.

This poor internet service is partly due to some horrible decisions by our conservative government, and partly due to our geography. As a result, the best plan you can get in rural areas is 300GB a month at 25 Mbit/s down and 5 Mbit/s up, with a minimum latency of 300ms. I’m not really complaining, since the data has to shoot up into space and hopefully back down to my log cabin in the wilderness, but it certainly won’t let me watch much Netflix.

If I’m not careful, I’ll blow through my entire monthly allowance well before the month is out. And if I run out of internet, my workplace won’t be too happy with me. Even worse, I love doing data analysis, and any large dataset will take a long time to arrive through that small pipe.
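To put rough numbers on that plan, here is a quick back-of-the-envelope calculation in Python. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits), a 30-day month and a link running flat out with no protocol overhead, so real-world figures will be somewhat worse. The 10 GB dataset is a hypothetical example.

```python
# Back-of-the-envelope maths for the satellite plan described above.
# Assumptions: decimal units (1 GB = 1e9 bytes, 1 Mbit = 1e6 bits),
# a 30-day month, and a saturated link with no protocol overhead.

QUOTA_GB = 300        # monthly allowance
DOWN_MBIT_S = 25      # download speed
DAYS_PER_MONTH = 30


def transfer_seconds(size_gb: float, mbit_s: float) -> float:
    """Seconds to move size_gb gigabytes over an mbit_s Mbit/s link."""
    bits = size_gb * 1e9 * 8
    return bits / (mbit_s * 1e6)


def hours_to_exhaust_quota(quota_gb: float = QUOTA_GB,
                           mbit_s: float = DOWN_MBIT_S) -> float:
    """Hours of full-speed downloading that would use up the whole quota."""
    return transfer_seconds(quota_gb, mbit_s) / 3600


def daily_budget_gb(quota_gb: float = QUOTA_GB,
                    days: int = DAYS_PER_MONTH) -> float:
    """Data available per day if the quota is spread evenly over the month."""
    return quota_gb / days


if __name__ == "__main__":
    print(f"Quota gone after {hours_to_exhaust_quota():.1f} h at full speed")
    print(f"Daily budget: {daily_budget_gb():.0f} GB")
    # A hypothetical 10 GB dataset:
    print(f"10 GB dataset: {transfer_seconds(10, DOWN_MBIT_S) / 60:.0f} min")
```

Under these assumptions, about 27 hours of saturated downloading empties the month's quota, which is roughly 10 GB a day. A single 10 GB dataset would use a whole day's budget and take close to an hour to arrive.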

Initial approach

My initial thought was that perhaps I could use something like VNC to remote into a Linux system running a GUI. What I’ve heard (and please hit me up if I’m wrong) is that by far the best implementation of this is RDP on Windows. I don’t particularly want to use Windows, but I also realised that I probably don’t need a GUI at all. What would be best is to manage and run the dev environment entirely over SSH.

I’m an avid VS Code user, so at this point I learnt about its Remote - SSH extension. I haven’t used it successfully yet (hopefully I’ll do that in part 2 of this article, as yet unwritten), but it immediately got me thinking about the infrastructure I’d need to make a remote development environment easy to set up.

Infrastructure as Code

I’m a bit of a weenie when it comes to infrastructure, and since I’d had some marginal contact with Kubernetes, my first thought was to try to use it. I figured I’d build a bunch of Docker images and run them on a Kubernetes cluster as needed for development, spinning them up and down based on my requirements. Looking into it, I realised I’d need to configure the cluster itself, and then expose each host individually. Again, hit me up if I’m wrong, but this didn’t seem like the right approach. I’d need to add some boilerplate to make each image expose the right ports, and I’d have to find some way to save the state every time I wanted to shut a container down. Then I found Terraform.

I first tackled the Terraform GCP Tutorial. I chose GCP simply because I’ve enjoyed using it more than AWS. I don’t know a goddamn thing about Azure.

The cool thing about the Terraform tutorial is that it taught me some things about how GCP works. I didn’t have a very strong understanding of how networks and firewalls worked, and I came out of it feeling like I knew more. A bad thing about it was that the whole section on networking modules seemed broken. I muddled my way through.

After I completed the tutorial, it didn’t take long to create the first thing I thought I needed: a jumpbox, also known as a bastion host.

The Jumpbox

What is a jumpbox? A jumpbox is a single, hardened entry point to a set of servers: you connect to it first, and only from there to the machines behind it.

I care about security, and I was a little wary of exposing a machine’s port 22 to the world. Google’s documentation states a pretty strong preference for OS Login. This is where I hit my first Terraform stumbling block: my service account, which I’d given to Terraform so it could alter my infrastructure, was unable to create an OS Login role. So I, uh… gave it the security role creation permission. Maybe inadvisable? Let me know.

It also wasn’t initially clear to me that Google needed my SSH keys named in a certain way, so make sure your ~/.ssh/google_compute_engine.pub is set up. Additionally, the project or instance needs metadata that sets enable-oslogin to TRUE. Another layer of configuration!

At this point I created an instance and a firewall rule that exposed port 22, and logged into it. Hooray!
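Once a jumpbox exists, the usual way to hop through it is OpenSSH’s ProxyJump option. VS Code’s Remote - SSH extension picks up hosts defined in ~/.ssh/config. The Python sketch below generates such a config fragment. The host aliases, IP addresses and username are hypothetical placeholders: with OS Login, substitute the username assigned to your Google account. It also assumes the private half of the google_compute_engine key pair is the key you want to use.

```python
# Generate an ~/.ssh/config fragment that reaches internal dev machines
# through the jumpbox. Every name and address passed in is a placeholder.

DEFAULT_KEY = "~/.ssh/google_compute_engine"  # private half of the key pair


def host_block(alias: str, hostname: str, user: str, key: str,
               port: int = 22, proxy: str = "") -> list:
    """One OpenSSH 'Host' stanza, as a list of lines."""
    block = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    Port {port}",
        f"    User {user}",
        f"    IdentityFile {key}",
    ]
    if proxy:
        # Connect to this host by first hopping through the named host.
        block.append(f"    ProxyJump {proxy}")
    return block + [""]


def ssh_config(jump_ip: str, jump_port: int, user: str,
               dev_hosts: dict, key: str = DEFAULT_KEY) -> str:
    """A stanza for the jumpbox plus one per internal dev machine."""
    lines = host_block("jumpbox", jump_ip, user, key, port=jump_port)
    for alias, internal_ip in dev_hosts.items():
        lines += host_block(alias, internal_ip, user, key, proxy="jumpbox")
    return "\n".join(lines)


if __name__ == "__main__":
    # 203.0.113.10 is a documentation address; substitute your own values.
    print(ssh_config("203.0.113.10", 22, "your_oslogin_user",
                     {"dev-myrepo": "10.128.0.5"}))
```

With this fragment appended to ~/.ssh/config, `ssh dev-myrepo` (or Remote - SSH) should reach the internal machine via the jumpbox in one step. Only the jumpbox needs to be reachable from the internet.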

I felt good while I was figuring this out: because I was coding it all, I knew the setup would probably be less painful next time. But it was at this point that I ran into my first serious hurdle.

Terraform Destroy

Run terraform destroy and Terraform will destroy everything you ever loved. If every piece of infrastructure lives in your base directory, that’s exactly how it’ll go. For example, it’ll remove the OS Login permission from your user account.

I put up with this for a while, but I didn’t really want to delete my OS Login permissions every time. Looking through Terraform’s options, I was very confused about what I ought to do to stop this behaviour.

More Security

I got a little wary of using port 22 unconditionally as the entry port to my jumpbox, even with the protection of an SSH key. So I changed the port to something different. This turned out to be an issue too, since sshd on the prebuilt GCP images listens on port 22 by default. What I ended up doing was creating an instance, changing its configured SSH port, and then saving that disk as the reusable base image for the jumpbox.
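The port change itself is a small edit to sshd’s configuration. Below is a Python sketch that rewrites the text of /etc/ssh/sshd_config so exactly one Port directive remains. Port 2222 is just an example. It assumes the image’s sshd takes its port from that file (some distributions handle this differently), and the GCP firewall rule also has to allow whichever port you pick.

```python
import re

# Matches an active or commented-out Port directive: "Port 22", "#Port 22", "# Port 22".
PORT_RE = re.compile(r"^\s*#?\s*Port\s+\d+\s*$")


def set_ssh_port(config_text: str, port: int) -> str:
    """Return sshd_config text with exactly one active 'Port <port>' directive."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    out = []
    replaced = False
    for line in config_text.splitlines():
        if PORT_RE.match(line):
            # Put the new directive where the first old one was, and drop the
            # rest so sshd does not keep listening on 22 as well.
            if not replaced:
                out.append(f"Port {port}")
                replaced = True
            continue
        out.append(line)
    if not replaced:
        # Prepend rather than append: a Port line after a "Match" block
        # would be parsed as part of that block and rejected.
        out.insert(0, f"Port {port}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    stock = "#Port 22\nPermitRootLogin no\nPasswordAuthentication no\n"
    print(set_ssh_port(stock, 2222))
```

After writing the result back, run `sshd -t` to check the configuration before restarting the service. That step matters on a machine you can only reach over SSH.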

Predictable Development Environments

This also gave me another idea: I could use a persistent disk for each repo, and whenever I got a disk into a state that counted as the “base” dev environment, I could make an image from it that I could always boot into!!! This more or less replaced my use case for Docker. The persistent disk would save my work. In the worst case, if I broke my dev environment, I could reboot from the image and pull down the repo again. I’d only lose work that I hadn’t git pushed.

Additionally, if I had multiple repos, I could create a persistent disk for each one. My environments would be completely compartmentalized, and I could pass a single command-line argument to Terraform (the repo name) to boot the image in the configuration I needed for that repo. This sounded great, but it will have to be implemented in part 2.

Before this, my idea was to have a Terraform configuration for every repo and interpolate it into the main Terraform code. This is hard, because Terraform doesn’t allow dynamic imports (a module’s source has to be a literal string), likely with good reason.
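Here is a Python sketch of the one-disk-per-repo, one-argument idea. The repo name is normalised into GCE-safe names for a per-repo persistent disk and base image, then passed to terraform apply as variables. The dev-<repo>-disk / dev-<repo>-base naming scheme and the disk_name and image_name variables are hypothetical, and the Terraform configuration would need to declare matching variables.

```python
import re

# Hypothetical naming scheme: dev-<repo>-disk for the persistent disk and
# dev-<repo>-base for the image. GCE resource names must match
# [a-z]([-a-z0-9]*[a-z0-9])? and be at most 63 characters long.
PREFIX, DISK_SUFFIX, IMAGE_SUFFIX = "dev-", "-disk", "-base"
MAX_SLUG = 63 - len(PREFIX) - len(DISK_SUFFIX)  # both suffixes are 5 chars


def repo_slug(repo: str) -> str:
    """Lower-case the repo name and replace anything not [a-z0-9] with '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", repo.lower()).strip("-")
    slug = slug[:MAX_SLUG].rstrip("-")
    if not slug:
        raise ValueError(f"cannot derive a resource name from {repo!r}")
    return slug


def resource_names(repo: str) -> dict:
    slug = repo_slug(repo)
    return {
        "disk_name": f"{PREFIX}{slug}{DISK_SUFFIX}",
        "image_name": f"{PREFIX}{slug}{IMAGE_SUFFIX}",
    }


def apply_args(repo: str) -> list:
    """Arguments for `terraform apply` that boot the environment for one repo.

    disk_name and image_name are hypothetical variables the Terraform
    configuration would need to declare.
    """
    args = ["terraform", "apply", "-var-file=../terraform.tfvars"]
    for name, value in resource_names(repo).items():
        args += ["-var", f"{name}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(apply_args("My_Repo.git")))
```

Doing the normalisation in a wrapper means a repo name like My_Repo.git can be passed straight through. The resulting resource names still satisfy GCE’s naming rules.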

Avoiding Destruction

Avoiding the unconditional destruction of everything by terraform destroy turned out to be hard. Terraform seems to assume that infrastructure is created once and then either updated piecemeal or destroyed entirely. The reason is that when you run Terraform in a directory, it keeps a single state file (terraform.tfstate, with the default local backend) recording all the infrastructure for that directory’s .tf files. That made me think the best way to handle this was with multiple folders.

It seems to me that modules are a bit of a red herring here. If you include a module in your directory’s .tf files, the infrastructure it creates is recorded in the base directory’s state, not in the module’s directory.

It is possible, however, to create separate directories and run the terraform commands in each of them. That keeps the state separate, and it leaves the configuration you don’t want to destroy untouched when you run terraform destroy elsewhere. The downside is that you have to explicitly pass the variables every folder needs. In my Makefile, that looks like this:

cd jumpbox && terraform apply -var-file="../terraform.tfvars" -auto-approve

This keeps the state compartmentalized, while still loading things like your Google account variables from the base directory.
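The same split could be driven from a small Python script instead of a Makefile. This sketch mirrors the cd-then-apply command above. The iam/ directory name and the static/ephemeral lists are hypothetical, and by default it does a dry run that only prints the commands.

```python
import subprocess

# Hypothetical layout, one Terraform state per directory:
#   terraform.tfvars   shared variables (project, region, credentials)
#   iam/               OS Login grant and other long-lived config
#   jumpbox/           jumpbox instance and firewall rules, torn down daily
STATIC_DIRS = ["iam"]
EPHEMERAL_DIRS = ["jumpbox"]


def terraform_cmd(action: str) -> list:
    if action not in ("apply", "destroy"):
        raise ValueError(f"unsupported action: {action}")
    # The var-file path is relative to the subdirectory, as with `cd jumpbox`.
    return ["terraform", action, "-var-file=../terraform.tfvars", "-auto-approve"]


def run(action: str, dirs: list, dry_run: bool = True) -> list:
    """Run `terraform <action>` in each directory; return (dir, cmd) pairs."""
    plan = [(d, terraform_cmd(action)) for d in dirs]
    for d, cmd in plan:
        if dry_run:
            print(f"cd {d} && {' '.join(cmd)}")
        else:
            # cwd= selects the directory, and therefore which state file is used.
            subprocess.run(cmd, cwd=d, check=True)
    return plan


def up(dry_run: bool = True) -> list:
    return run("apply", STATIC_DIRS + EPHEMERAL_DIRS, dry_run)


def down(dry_run: bool = True) -> list:
    # Only the ephemeral directories are destroyed; iam/ is never touched.
    return run("destroy", EPHEMERAL_DIRS, dry_run)
```

Each directory needs its own `terraform init` before its first apply, because each one has its own state and provider setup.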

With this new directory structure, I had exactly what I wanted: some configuration that was static, and some that was not.

Even better, I could now use Terraform to create my jumpbox along with two firewall rules. One only allowed my current IP to log in to it, and the other made the internal machines accept connections only from the jumpbox! Hooray! On top of that, when I was done working, I could tear down the jumpbox and every firewall rule that let outside connections in, so nobody would be banging around in my stuff while I slept.
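The “only my current IP” rule needs that IP as a CIDR range. Here is a Python sketch of one way to produce it. It assumes the third-party service api.ipify.org (which returns your address as plain text) and a hypothetical my_ip_cidr variable that the jumpbox configuration would use in the firewall rule’s source_ranges.

```python
import ipaddress
import urllib.request
from pathlib import Path

# Assumption: a third-party "what is my IP" service that answers in plain text.
IP_SERVICE = "https://api.ipify.org"


def current_public_ip(url: str = IP_SERVICE, timeout: float = 5.0) -> str:
    """Ask an external service which address our traffic appears to come from."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


def source_range(ip: str) -> str:
    """Single-address CIDR (/32 for IPv4, /128 for IPv6) for a firewall rule."""
    addr = ipaddress.ip_address(ip.strip())  # raises ValueError on garbage
    return f"{addr}/{addr.max_prefixlen}"


def tfvars_text(cidr: str, var_name: str = "my_ip_cidr") -> str:
    # my_ip_cidr is a hypothetical variable the jumpbox config would declare
    # and reference in the firewall rule's source_ranges.
    return f'{var_name} = "{cidr}"\n'


def write_ip_tfvars(cidr: str, directory: str = "jumpbox") -> Path:
    # Terraform loads *.auto.tfvars files from its working directory automatically.
    path = Path(directory) / "my_ip.auto.tfvars"
    path.write_text(tfvars_text(cidr))
    return path
```

Calling `write_ip_tfvars(source_range(current_public_ip()))` just before bringing the jumpbox up keeps the rule in step with your address. Home connections may be assigned a new IP from time to time, so refresh it each session rather than setting it once.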

TODO

Next, I need to create a persistent disk and the dynamic code to attach it to an instance with Terraform, and then configure VS Code to connect to it! Hooray! It’s probably also worth using vnstat or similar to get some idea of how bandwidth-heavy development on a project is over that connection.

After that, it may be worth creating some conditional code to create a persistent disk if none exists.
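For the bandwidth question, a rough first measurement is possible on a Linux machine even before installing vnstat. This Python sketch swaps vnstat for two snapshots of the kernel’s interface counters in /proc/net/dev, taken at the start and end of a work session. Several inputs are assumptions: the interface name, the 160 working hours a month, and counting upload and download together against the quota.

```python
def read_counters(text: str) -> dict:
    """Parse /proc/net/dev into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is bytes received; field 8 is the first transmit field, bytes sent.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def session_bytes(before: str, after: str, iface: str) -> int:
    """Bytes received plus sent on iface between two snapshots."""
    rx0, tx0 = read_counters(before)[iface]
    rx1, tx1 = read_counters(after)[iface]
    return (rx1 - rx0) + (tx1 - tx0)


def projected_monthly_gb(used_bytes: int, session_hours: float,
                         work_hours_per_month: float = 160) -> float:
    """Scale one session's usage up to a month of similar work (decimal GB)."""
    return used_bytes / session_hours * work_hours_per_month / 1e9
```

Read /proc/net/dev once when you start working and again when you finish. Then compare `projected_monthly_gb(session_bytes(before, after, "eth0"), hours)` against the 300GB allowance. The counters reset on reboot, so both snapshots must come from the same boot.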

Thanks for reading!