Deploy to Kubernetes

NOTE: This checklist assumes that the user conducting the deployment has access to an AWS account with sufficient privileges to interact with the services used throughout this checklist (e.g., VPC, EC2, S3, Route53, and ElastiCache).

Create a VPC

Create a new VPC that will host the Kubernetes environment:

IMPORTANT: When choosing a region that will host the Kubernetes cluster, make sure there are enough Elastic IPs available for allocation. One Elastic IP is required for each node that will be a member of the Kubernetes cluster. For example, if the cluster consists of four nodes plus a bastion node, five Elastic IP addresses are required. If there are not enough Elastic IP addresses available, kops will not be able to create the cluster.
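A minimal AWS CLI sketch of creating the VPC and checking Elastic IP availability (the region and CIDR follow the examples used later in this checklist; the AWS console can be used instead):

    # Create the VPC that will host the Kubernetes cluster
    aws ec2 create-vpc --region us-east-1 --cidr-block 170.20.0.0/16

    # Count the Elastic IPs already allocated in the region and compare the result
    # against the account's Elastic IP limit before creating the cluster
    aws ec2 describe-addresses --region us-east-1 --query 'length(Addresses)'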

Create an S3 Bucket to Store Cluster Configuration
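For example, the state bucket can be created with the AWS CLI (the bucket name matches the kops example used later in this checklist; kops recommends enabling versioning on its state bucket):

    # Create the bucket that kops will use to store cluster state
    aws s3 mb s3://kops-tds-cluster-state-store --region us-east-1

    # Enable versioning so previous cluster configurations can be recovered
    aws s3api put-bucket-versioning \
      --bucket kops-tds-cluster-state-store \
      --versioning-configuration Status=Enabled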

Generate Key for Accessing the Ops System

ssh-keygen -t rsa -C "[brief comment that describes the purpose of this file]" -f [the name of the file]

Example:

    
      ssh-keygen -t rsa -C "key for k8s dev environment" -f tds_ops_dev
    
  

Create Kubernetes Cluster

This section covers creating and initializing the base Kubernetes cluster. Decide on the values you will use for each command and keep them consistent throughout, as the examples below do.

  1. Determine the name of the cluster.  For the examples below we are using <ZONE>=us-east-1a and <CLUSTER>=tdsuat.sbtds.org because sbtds.org is hosted on AWS.
  2. Create the cluster configuration:
      
        kops create cluster \
          --cloud=aws \
          --node-count=[the number of nodes desired for the cluster] \
          --zones=[the AWS availability zone(s) where the cluster nodes will run] \
          --master-zones=[a comma-separated list of AWS zones] \
          --dns-zone=[the Route53 hosted zone] \
          --vpc="[the id of the previously created VPC]" \
          --network-cidr="[the CIDR range of the previously created VPC]" \
          --node-size=[The instance size of each agent node in the cluster] \
          --master-size=[The instance size of the master node in the cluster] \
          --ssh-public-key="[The path to the ssh key created in the previous step]" \
          --topology private --networking weave --bastion \
          --state [the path to the S3 bucket that stores the cluster configuration] \
          --name [the name of the cluster]

    Example:

        kops create cluster \
          --cloud=aws \
          --node-count=4 \
          --zones=us-east-1a \
          --master-zones=us-east-1a \
          --dns-zone=sbtds.org \
          --vpc="vpc-2348765b" \
          --network-cidr="170.20.0.0/16" \
          --node-size=m3.large \
          --master-size=m3.large \
          --ssh-public-key="~/.ssh/tds_ops_dev.pub" \
          --topology private --networking weave --bastion \
          --state s3://kops-tds-cluster-state-store \
          --name tdsuat.sbtds.org
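
By default, kops create cluster only registers the configuration in the state store. A sketch of building the cluster and checking that it is ready, using the same cluster name and state bucket as the example above:

    # Build the cluster described by the stored configuration
    kops update cluster tdsuat.sbtds.org \
      --state s3://kops-tds-cluster-state-store \
      --yes

    # Re-run until the cluster and all nodes report as ready
    kops validate cluster --name tdsuat.sbtds.org --state s3://kops-tds-cluster-state-store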
      
    

Create Redis Cluster

Create a single-node AWS ElastiCache for Redis cluster.
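For example, with the AWS CLI (the cluster id, node type, subnet group, and security group below are placeholders for your environment):

    # Create a single-node Redis instance inside the VPC created earlier
    aws elasticache create-cache-cluster \
      --cache-cluster-id tds-redis \
      --engine redis \
      --num-cache-nodes 1 \
      --cache-node-type cache.m3.medium \
      --cache-subnet-group-name <a subnet group in the VPC> \
      --security-group-ids <a security group reachable from the cluster nodes>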

Running kubectl Commands Against the Cluster

From here, all kubectl commands should be executed from a machine that has access to the VPC that was created (e.g. the bastion server that was created as part of the cluster or a “jump box” that has sufficient privileges to interact with the cluster).

If using the bastion server, kops and kubectl will likely need to be installed and configured. After the cluster is successfully created, the output will display the command necessary to ssh into the bastion server. Alternatively, instructions can be found here.

OPTIONAL: Add the Kubernetes dashboard.

OPTIONAL: Add monitoring via Heapster.

Create NGINX

This creates the NGINX ingress controller.
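For example (the manifest name below is an assumption; use whichever NGINX ingress controller manifest is provided in tds_deployment_support_files.zip):

    # Create the NGINX ingress controller resources (file name is hypothetical)
    kubectl create -f nginx-ingress-controller.yml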

Create Rabbit

Create a RabbitMQ cluster that will be used by the application.  For convenience, a RabbitMQ cluster deployment that runs within the Kubernetes environment is included and can be used if a cluster is not already available.

  1. Go to the deployment folder located wherever you unzipped tds_deployment_support_files.zip
  2. Run kubectl create -f rabbit-cluster.yml
    • This will start up three RabbitMQ containers.  Wait for them to come up before moving on to the next steps.
    • All rabbit servers need to be “Running” at 1/1.  You can check this by running kubectl get po -w
  3. Ensure the rabbit cluster started properly
    • Run kubectl logs -f rabbit-server-0
    • The above command will stream logs to standard out.  You should see the rabbit nodes joining the cluster.  The example logs below show the first node joining the cluster.
  4. Setup username & password for Rabbit
    • Run kubectl exec -it rabbit-server-0 /bin/bash to open an interactive bash terminal in the pod.
    • Run export RABBITMQ_NODENAME=rabbit@$(hostname -f)
    • Create the user and give it permissions.  The username will be services, which is consistent with the provided configuration files.
    • Run rabbitmqctl add_user services <PASSWORD> with the password you want to use.  Make sure to remember this password as it will be used when updating configuration files.
    • Run rabbitmqctl set_permissions -p / services ".*" ".*" ".*"

Example Rabbit Cluster Connection Logs

 =INFO REPORT==== 21-Apr-2017::21:05:02 ===
 node 'rabbit@rabbit-server-1.rabbit-service.default.svc.cluster.local' up
 =INFO REPORT==== 21-Apr-2017::21:05:12 ===
 rabbit on node 'rabbit@rabbit-server-1.rabbit-service.default.svc.cluster.local' up

Create ProgMan Configuration

This section assumes there is an existing Progman configuration set up for Student and Proctor.

  1. Go to an existing configuration.
  2. Set the Property Editor to “Property File Entry”
  3. Copy the values in the text area.
  4. Create a new profile with a meaningful name (e.g. tds, tdsuat)
  5. Once again, set the Property Editor to “Property File Entry”
  6. Paste the values you copied in step 3 into the text area and save the configuration.
  7. Go back and edit the newly created configuration.  Use the “Property File Entry” mode.
  8. Add the following properties at the end of the text area.  
    tds.exam.remote.enabled=true
    tds.exam.remote.url=http://tds-exam-service/exam
    tds.exam.legacy.enabled=false
    tds.assessment.remote.url=http://tds-assessment-service/
    tds.session.remote.url=http://tds-session-service/sessions
    tds.session.remote.enabled=true
    tds.session.legacy.enabled=false
    

Update Configuration YML Files 

These files are located in the tds_deployment_support_files.zip and should reside in your private GitLab repository if you followed the previous steps.  After making the edits, make sure you push the changes to the master branch of your private GitLab repository so that Spring Cloud Config can pick up the values.

Type of information being modified

There are comments within the configuration files explaining what certain properties represent.  Any property written in <SOMETHING> style will need to be edited following the steps presented below.  Any property value that starts with {cipher} is one we recommend you secure using the Spring CLI tool mentioned above; the {cipher} keyword should remain and only the value after it should be replaced.  For example, for {cipher}<RABBIT PASSWORD>, you’d replace the <RABBIT PASSWORD> part with the value created via the Spring CLI.

Important: Make sure you remember the encrypt key you use for encryption and make sure to use the same one throughout.  You will also be instructed to add it to a deployment file in a later step.

Example Spring CLI Encrypt command usage

  1. Run spring encrypt my_password --key my_encrypt_key
  2. This results in output such as 604a89112855fc5fd80132c3a6f2a331639a0ffe0c280d65848833276062cebb
  3. Take that value and replace the {cipher} property with this value.  So {cipher}<RABBIT PASSWORD> would be {cipher}604a89112855fc5fd80132c3a6f2a331639a0ffe0c280d65848833276062cebb

S3 Configuration

In the tds-exam-service.yml file you will see a section like this:

    s3:
      bucketName: tds-resources
      itemPrefix: uat/
      accessKey: '{cipher}<REDACTED>'
      secretKey: '{cipher}<REDACTED>'

The accessKey and secretKey should belong to the read-only AWS user.  The bucketName and itemPrefix are whatever you configured when you loaded the items to S3.  The values above are the ones used in the section that covered loading items to S3.

Update Deployment Configuration Files

This section refers to the files in the deployment directory contained in the <ZIPFILE> file.  The information in all files needs to be updated to reflect your local deployment environment.  For example, there are references to Progman (PM) and Single Sign On (SSO), which you should have already configured if you’ve followed the previous steps in the deployment checklist.

Important: Because these files should never be publicly available or hosted, the passwords in them are saved as plain text.

NOTE: Any file that has the header “This file requires per-deployment/environment configuration changes” will have to be updated with information about your deployment environment.  Otherwise, the file does not need to be modified.

Proctor Deployment Configuration

The following settings need to be configured in the tds-proctor-war.yml file. These configuration settings are stored in the env section of the yml file. Each configuration setting is listed as a name/value pair. When making changes to the tds-proctor-war.yml file, only the value needs to be updated. In some cases, the value does not need to be edited.

Shown below is an example of a configured env section of a tds-proctor-war.yml file:

    
    env:
    - name: GET_HOSTS_FROM
      value: dns
    - name: OAUTH_ACCESS_URL
      value: "https://sso-example.org/auth/oauth2/access_token?realm=/sbac"
    - name: PM_OAUTH_CLIENT_ID
      value: "pm"
    - name: PM_OAUTH_CLIENT_SECRET
      value: "[redacted]"
    - name: PM_OAUTH_BATCH_ACCOUNT
      value: "prime.user@example.com"
    - name: PM_OAUTH_BATCH_PASSWORD
      value: "[redacted]"
    - name: PROGMAN_BASEURI
      value: "http://progman-example.org:8080/rest/"
    - name: PROGMAN_LOCATOR
      value: "tds,example"
    - name: SPRING_PROFILES_ACTIVE
      value: "mna.client.null,progman.client.impl.integration,server.singleinstance"
    - name: CATALINA_OPTS
      value: "-XX:+UseConcMarkSweepGC -Xms512m -Xmx1512m -XX:PermSize=256m -XX:MaxPermSize=512m"
    # Note that the values below are used by SAML for SSO.  They should match the values in the associated
    # security files ./security/proctor/*
    - name: PROCTOR_SECURITY_SAML_ALIAS
      value: proctorexample
    - name: PROCTOR_SECURITY_SAML_KEYSTORE_CERT
      value: proctor-deploy-sp
    - name: PROCTOR_SECURITY_SAML_KEYSTORE_PASS
      value: [redacted]
    - name: PROCTOR_WEBAPP_SAML_METADATA_FILENAME
      value: proctor_sp_deployment.xml
    - name: RABBITMQ_ADDRESSES
      value: "rabbit-server-0.rabbit-service:5672,rabbit-server-1.rabbit-service:5672,rabbit-server-2.rabbit-service:5672"
    - name: RABBITMQ_USERNAME
      value: "services"
    - name: RABBITMQ_PASSWORD
      value: "[redacted]"
    - name: CONFIG_SERVICE_URL
      value: "http://configuration-service"
    - name: LOGSTASH_DESTINATION
      value: "logstash.sbtds.org:4560"
    
  

Student Deployment Configuration

The following settings need to be configured in the tds-student-war.yml file. These configuration settings are stored in the env section of the yml file. Each configuration setting is listed as a name/value pair. When making changes to the tds-student-war.yml file, only the value needs to be updated. In some cases, the value does not need to be edited.

Shown below is an example of a configured env section of a tds-student-war.yml file:

    
    env:
    - name: GET_HOSTS_FROM
      value: dns
    - name: OAUTH_ACCESS_URL
      value: "https://sso-example.org/auth/oauth2/access_token?realm=/sbac"
    - name: PM_OAUTH_CLIENT_ID
      value: "pm"
    - name: PM_OAUTH_CLIENT_SECRET
      value: "[redacted]"
    - name: PM_OAUTH_BATCH_ACCOUNT
      value: "prime.user@example.com"
    - name: PM_OAUTH_BATCH_PASSWORD
      value: "[redacted]"
    - name: PROGMAN_BASEURI
      value: "http://progman-example.org:8080/rest/"
    - name: PROGMAN_LOCATOR
      value: "tds,example"
    - name: SPRING_PROFILES_ACTIVE
      value: "mna.client.null,progman.client.impl.integration,server.singleinstance"
    - name: CATALINA_OPTS
      value: "-XX:+UseConcMarkSweepGC -Xms512m -Xmx1512m -XX:PermSize=256m -XX:MaxPermSize=512m"
    - name: RABBITMQ_ADDRESSES
      value: "rabbit-server-0.rabbit-service:5672,rabbit-server-1.rabbit-service:5672,rabbit-server-2.rabbit-service:5672"
    - name: RABBITMQ_USERNAME
      value: "services"
    - name: RABBITMQ_PASSWORD
      value: "[redacted]"
    - name: CONFIG_SERVICE_URL
      value: "http://configuration-service"
    - name: LOGSTASH_DESTINATION
      value: "logstash.sbtds.org:4560"
    - name: PERFORMANCE_REQUEST_LIMIT
      value: "50"
    
  

Cipher Decryption Configuration

You’ll need the my_encrypt_key value you used for password encryption in the following steps.

  1. Navigate to deployment/configuration-service.yml, located wherever you extracted the accompanying zip file.
  2. Update the ENCRYPT_KEY property value to be your my_encrypt_key value
    • This will allow the applications to decrypt the passwords
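
The resulting entry will look roughly like the env name/value pairs shown elsewhere in this document (the exact surrounding structure of configuration-service.yml may differ):

    env:
    - name: ENCRYPT_KEY
      value: "my_encrypt_key"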

HTTPS Communication

The default deployment files assume communication is done over HTTPS. However, if your environment does not support HTTPS, you will need to make a couple of edits to tds-ingress.yml, which resides in the deployment directory: remove both ingress.kubernetes.io/force-ssl-redirect: "true" annotations.
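For reference, the annotation to remove looks like the following; only the annotation name comes from tds-ingress.yml, and the surrounding structure shown here is illustrative:

    metadata:
      annotations:
        # remove this annotation (it appears twice in tds-ingress.yml) when HTTPS is not used
        ingress.kubernetes.io/force-ssl-redirect: "true"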

Create TDS Environment with Kubernetes

The following steps will walk you through standing up the TDS environment using Kubernetes.  All the steps below should be run from the deployment directory containing the deployment configuration yml files edited in the previous step.

  1. Create the Spring Cloud Configuration Service
    • Run kubectl create -f configuration-service.yml
    • Wait for the spring cloud configuration service to initialize.  You can see this by running kubectl get po -w.  Once the service is RUNNING at a 1/1 state it means it is initialized.
  2. Create the remaining TDS Kubernetes resources
    • Run cd .. to change to the parent directory 
    • Run kubectl create -f deployment to create the remaining Kubernetes resources in the deployment directory.
    • Note: You will see “Already Created” errors for Kubernetes resources you created in previous steps (for example rabbit, ingress, and the configuration service).
    • Run kubectl get po -w and wait until all services are “Running” with a 1/1.  If there are errors please refer to the “Troubleshooting” section at the end of this document.

Provide External Access

The following steps walk you through setting up the system to be available for public access.

  1. Find and keep the auto-generated load balancer name for the nginx ingress service.  
    • Run kubectl -n kube-system describe svc nginx-ingress-service | grep 'LoadBalancer Ingress' 
    • Copy and store the id associated with the Ingress ELB (Elastic Load Balancer)
  2. Register a CNAME for your host (referenced in your Ingress routes) pointing to the Ingress ELB
    • We use AWS Route53 to manage this information (a CLI sketch is shown after this list)
  3. Record the ports that the ELB is forwarding to because they are auto-generated
  4. If you’re doing SSL:
    • Register an SSL certificate with the ELB to provide SSL termination at the ELB
    • Change the ELB from TCP/SSL protocols to HTTP/HTTPS forwarding to the same HTTP port that was recorded in step 3.  This ensures the ELB adds the necessary X-Forwarded headers to requests.
    • If you forgot to write those down, you can still get the information:
    • Run kubectl -n kube-system describe svc nginx-ingress-service
    • The ports you want are the NodePort values for HTTP and HTTPS
    • The ELB should be set up to do HTTP and HTTPS but by default may be TCP.
  5. If you’re not doing SSL:
    • Change the ELB from TCP/SSL protocols to HTTP/HTTPS forwarding to the separate ports recorded in step 3.
  6. Verify everything is running by navigating a browser to <YOUR HOST>/student and <YOUR HOST>/proctor
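
A sketch of registering the CNAME from step 2 with the AWS CLI (the hosted zone id, host name, and ELB DNS name are placeholders for your environment):

    aws route53 change-resource-record-sets \
      --hosted-zone-id <your Route53 hosted zone id> \
      --change-batch '{
        "Changes": [{
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "<YOUR HOST>",
            "Type": "CNAME",
            "TTL": 300,
            "ResourceRecords": [{"Value": "<the Ingress ELB DNS name from step 1>"}]
          }
        }]
      }'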

Configuring Proctor

After creating a new Proctor deployment, you will need to perform a couple of extra steps to configure it to work with OpenAM and SAML. The steps below assume you are using SAML and OpenAM for security.

  1. Go to <Your Host>/proctor/saml/metadata
    • If things are working correctly this will give you a spring_saml_metadata.xml file
  2. Log into OpenAM.
    1. Under the Common Tasks tab, under Create SAMLv2 Providers, click Register Remote Service Provider
    2. Configure the Provider
      1. Pick the realm you want to use. There should be a single realm if you followed the provided deployment steps.
      2. Paste the URL from step 1 into the “URL where metadata is located” textbox
      3. You should not have to change the Circle of Trust; it should be whatever you set up in the OpenAM configuration.
      4. Click Configure at the top right.
  3. Verify it is configured in OpenAM
    1. Click the Federation tab
    2. You should see an entity id with a saml2 suffix (e.g. proctorexamplesaml2)
    3. You should have the same entity id in the Entity Providers section
  4. Go to Proctor to verify it is configured properly.

Redeploying a Kubernetes Pod

There are some cases when a specific pod(s) needs to be updated. For example, when a new version of the Student application is released a new Docker image will be made available for deployment. Following the steps below, the environment can be updated without having to re-deploy all pods in the environment.

Deploying a Specific Version of a Kubernetes Pod

To deploy a specific version of a pod, use the following steps:

  1. Identify the version of the pod to deploy (e.g. Student 4.1.0)
  2. Refer to the Version Compatibility matrix to ensure the desired version is compatible with other software deployed in the environment
  3. Find the deployment YAML file that describes the pod to deploy
  4. Find the image element in the deployment YAML file identified in Step 3
  5. Change the value after the : of the image name (referred to as a “tag”) to include the specific version of the Docker image to deploy
  6. Verify the imagePullPolicy is set to Always, which will ensure the image is pulled from the Docker repository
    • The default behavior is to pull the Docker image only if it does not already exist on the node.

An example of choosing a specific version (v 4.0.1) of the tds-student-war.yml file is shown below. Note only the relevant parts of the file are shown; the rest of the file has been omitted for clarity:

    
        # other parts of the tds-student-war.yml file...

        spec:
          containers:
          - name: tds-student-war
            image: fwsbac/student:4.0.1
            imagePullPolicy: Always

        # ... the rest of the tds-student-war.yml file
    
  
  7. Save the changes made to the deployment YAML file
  8. After the deployment YAML file has been updated, the environment can be updated to use the specified version (see the sketch after this list):
    • The existing pods can be deleted via kubectl delete po, causing Kubernetes to re-deploy the missing pods
    • The existing pods can be updated using kubectl replace
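
For example, using the tds-student-war deployment from the example above (pod names are auto-generated, so look them up first):

    # Option 1: apply the updated definition directly
    kubectl replace -f deployment/tds-student-war.yml

    # Option 2: delete the existing pod; its deployment re-creates it with the new image
    kubectl get po                      # find the current pod name
    kubectl delete po <tds-student-war pod name>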

Troubleshooting

Most failures during deployment are caused by configuration issues in either the configuration or deployment yml files.

Resolving CIDR Range Conflicts

It is possible that kops will report a conflict/overlap between CIDR blocks during setup. The error will appear similar to this:

InvalidSubnet.Conflict: The CIDR 'XXX' conflicts with another subnet

NOTE: ‘XXX’ will be the CIDR range that is causing the conflict

To see the existing subnets of your VPC, use the following command:

aws ec2 describe-subnets --filters Name=vpc-id,Values=[the id of the VPC created earlier]

Example: aws ec2 describe-subnets --filters Name=vpc-id,Values=vpc-87f620fe

The output will appear similar to what’s shown below:

{
    "Subnets": [
        {
            "VpcId": "vpc-87f620fe",
            "Tags": [
                {
                    "Key": "Name",
                    "Value": "pri-sectest.sbtds.org"
                }
            ],
            "CidrBlock": "170.20.1.0/24",
            "AssignIpv6AddressOnCreation": false,
            "MapPublicIpOnLaunch": false,
            "SubnetId": "subnet-e0362186",
            "State": "available",
            "AvailableIpAddressCount": 251,
            "AvailabilityZone": "us-west-2a",
            "Ipv6CidrBlockAssociationSet": [],
            "DefaultForAz": false
        },
        {
            "VpcId": "vpc-87f620fe",
            "Tags": [
                {
                    "Key": "Name",
                    "Value": "pub-sectest.sbtds.org"
                }
            ],
            "CidrBlock": "170.20.0.0/24",
            "AssignIpv6AddressOnCreation": false,
            "MapPublicIpOnLaunch": false,
            "SubnetId": "subnet-c03522a6",
            "State": "available",
            "AvailableIpAddressCount": 249,
            "AvailabilityZone": "us-west-2a",
            "Ipv6CidrBlockAssociationSet": [],
            "DefaultForAz": false
        }
    ]
}

Edit the cluster configuration to eliminate the conflicts. A CIDR range calculator may be useful in verifying that the chosen IP address ranges do not overlap.

An example of a cluster configuration edited to avoid CIDR conflicts is shown below:

kops edit cluster sectest.sbtds.org

    
      apiVersion: kops/v1alpha2
      kind: Cluster
      metadata:
        creationTimestamp: 2018-01-12T02:04:16Z
        name: sectest.sbtds.org
      spec:
        api:
          loadBalancer:
            type: Public
        authorization:
          alwaysAllow: {}
        channel: stable
        cloudProvider: aws
        configBase: s3://kops-sbtds-org-state-store/sectest.sbtds.org
        dnsZone: sbtds.org
        etcdClusters:
        - etcdMembers:
          - instanceGroup: master-us-east-1a
            name: a
          name: main
        - etcdMembers:
          - instanceGroup: master-us-east-1a
            name: a
          name: events
        iam:
          allowContainerRegistry: true
          legacy: false
        kubernetesApiAccess:
        - 0.0.0.0/0
        kubernetesVersion: 1.8.4
        masterInternalName: api.internal.sectest.sbtds.org
        masterPublicName: api.sectest.sbtds.org
        networkCIDR: 170.20.0.0/16
        networkID: vpc-2348765b
        networking:
          weave:
            mtu: 8912
        nonMasqueradeCIDR: 100.64.0.0/10
        sshAccess:
        - 0.0.0.0/0
        subnets:
        - cidr: 170.20.32.0/19
          name: us-east-1a
          type: Private
          zone: us-east-1a
        - cidr: 170.20.10.0/22  # <-- this is the change to avoid the conflict; the CIDR range reported in the error was 170.20.0.0/22
          name: utility-us-east-1a
          type: Utility
          zone: us-east-1a
        topology:
          bastion:
            bastionPublicName: bastion.sectest.sbtds.org
          dns:
            type: Public
          masters: private
          nodes: private
    
  

Run kops update cluster sectest.sbtds.org --state s3://kops-sbtds-org-state-store --yes to update the cluster

Handy kubectl commands

Restarting a Kubernetes Pod

This is useful when pushing a configuration yml update.
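
A minimal sketch (the pod name is a placeholder; deleting a pod causes its deployment to re-create it, which picks up the updated configuration):

    # Delete the pod and let Kubernetes re-create it
    kubectl delete po <pod name>

    # Watch the replacement pod come back up to Running 1/1
    kubectl get po -w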

