Autoscaling In OpenShift
So you have scaled pods up manually using the web interface or:
oc scale --replicas=3 dc/<name>
but now you want this to autoscale when you aren't around, increasing and decreasing the replica count based on load.
HorizontalPodAutoscaler
A HorizontalPodAutoscaler object needs to be added to your project.
The horizontal pod autoscaler computes the ratio of the current metric utilization to the desired metric utilization, and scales up or down accordingly.
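The formula (from the Kubernetes documentation) is:

desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

For example, 3 replicas averaging 90% CPU against an 80% target gives ceil(3 * 90 / 80) = ceil(3.375) = 4 replicas.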
You should ensure your pods have passed their readiness checks, so they are ready for scaling.
You should decide on the minimum and maximum number of pods to run beforehand:
oc autoscale dc/<name> --min 1 --max 10 --cpu-percent=80
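This is roughly the object that command creates; a minimal sketch, assuming the autoscaling/v1 API and the blog DeploymentConfig shown in the output below:

cat <<EOF | oc apply -f -
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: blog
spec:
  scaleTargetRef:
    # apiVersion value assumed for OpenShift 3.11 DeploymentConfigs
    apiVersion: apps.openshift.io/v1
    kind: DeploymentConfig
    name: blog
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
EOF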
Get the horizontal pod autoscalers:
oc get hpa
It should show you the min, max, target, and current values:
$ oc get hpa
NAME      REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
blog      DeploymentConfig/blog   <unknown>/80%   1         10        3          1m
More detailed info:
oc describe hpa/blog
If you get errors like:
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
...failed to get cpu utilization: unable to get metrics for resource cpu
then you should ask your OpenShift cluster admin to enable cluster metrics.
Once you have set that up (by modifying the metrics section of the cluster install configuration), you can check the metrics API proxy with:
oc adm diagnostics MetricsApiProxy --loglevel 10
Debugging no access to the metrics API
To check the deployed services use:
kubectl get apiservices
We are looking for metrics.k8s.io in the output; if it is not there, the metrics API service might not be registered.
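A quick way to check (the service name v1beta1.metrics.k8s.io is an assumption; it holds when metrics-server provides the API, but your cluster may differ):

kubectl get apiservices | grep metrics.k8s.io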
Get the metrics related pods with:
kubectl get pods -n kube-system
To check resource consumption you should be able to run:
kubectl top
but in my case it would just print the help text along with this message:
This command requires Heapster to be correctly configured and working on the server.
Using the docs
You can verify the metrics were installed correctly with:
oc adm top node
oc adm top pod
You can also get all the apis with
oc get --raw /apis/
which I think is the same as
kubectl get apiservices
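If metrics.k8s.io does show up, you can also query it directly through the aggregation layer (assuming the v1beta1 version of the API):

oc get --raw /apis/metrics.k8s.io/v1beta1/nodes
oc get --raw /apis/metrics.k8s.io/v1beta1/pods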
I decided to redeploy the metrics API with Ansible, without setting a specific hostname.
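For reference, the redeploy looks roughly like this with the 3.11 openshift-ansible layout (the playbook path and inventory variable are assumptions based on the openshift-ansible docs, not from my notes):

# playbook path and variable per openshift-ansible 3.11; adjust for your version
ansible-playbook playbooks/openshift-metrics/config.yml -e openshift_metrics_install_metrics=True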
Then I ran the diagnostics again:
oc adm diagnostics MetricsApiProxy
and still got this error:
ERROR: [DClu4003 from diagnostic MetricsApiProxy@openshift/origin/pkg/oc/cli/admin/diagnostics/diagnostics/cluster/metrics.go:89]
Unable to access the metrics API Proxy endpoint /api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/metrics:
(*errors.StatusError) the server could not find the requested resource
The Horizontal Pod Autoscaler is not able to retrieve metrics to drive scaling.
You can check that you can get the metrics for a pod with:
oc adm top pod --heapster-namespace='openshift-infra' --heapster-scheme='https' -n demo
You don't actually need those flags, though:
oc adm top pod -n demo
Configure an 80 millicore (80m) CPU limit on the container:
oc patch dc/guestbook -p '{"spec":{"template":{"spec":{"containers":[{"name":"guestbook","resources":{"limits":{"cpu":"80m"}}}]}}}}'
oc autoscale dc/guestbook --min 1 --max 3 --cpu-percent=20
Get the object like any other OpenShift object:
oc get hpa guestbook -o yaml -n myproject
Inside the pod, artificial CPU stress is generated by running three parallel md5sum reads of /dev/zero, each killed after 60 seconds:
seq 3 | xargs -P0 -n1 timeout -t 60 md5sum /dev/zero
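While the stress runs, you can watch the autoscaler react from another terminal:

oc get hpa guestbook --watch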
This Reddit post describes the same issue, and the answers claim it is caused by insecure certificates.
To view the logs on OpenShift I used:
kubectl logs hawkular-metrics-ks6sb --namespace=openshift-infra
It seems to still be the API aggregation issue: metrics.k8s.io was not set up correctly.
You can check diagnostics of the entire cluster with:
oc adm diagnostics
Get metrics pods with:
oc get pods -n openshift-infra
Checking the pod:
oc describe pod hawkular-metrics-ks6sb -n openshift-infra
It gave me a bunch of errors:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1h default-scheduler Successfully assigned openshift-infra/hawkular-metrics-ks6sb to openshift.example.co.za
Normal Pulled 1h kubelet, openshift.example.co.za Container image "docker.io/openshift/origin-metrics-hawkular-metrics:v3.11.0" already present on machine
Normal Created 1h kubelet, openshift.example.co.za Created container
Normal Started 1h kubelet, openshift.example.co.za Started container
Warning Unhealthy 1h kubelet, openshift.example.co.za Liveness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>.
Traceback (most recent call last):
File "/opt/hawkular/scripts/hawkular-metrics-liveness.py", line 48, in <module>
if int(uptime) < int(timeout):
ValueError: invalid literal for int() with base 10: ''
Warning Unhealthy 1h (x3 over 1h) kubelet, openshift.example.co.za Readiness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>. This may be due to Hawkular Metrics not being ready yet. Will try again.
Warning Unhealthy 1h kubelet, openshift.example.co.za Readiness probe failed: Failed to access the status endpoint : timed out. This may be due to Hawkular Metrics not being ready yet. Will try again.
Warning Unhealthy 1h (x3 over 1h) kubelet, openshift.example.co.za Readiness probe failed: The MetricService is not yet in the STARTED state [STARTING]. We need to wait until its in the STARTED state.
So I tried to access Heapster directly with:
curl -X GET https://${KUBERNETES_MASTER}/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/metrics
And I get back:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "services \"https:heapster:\" is forbidden: User \"system:anonymous\" cannot proxy services in the namespace \"openshift-infra\": proxy verb changed to unsafeproxy\nno RBAC policy matched, proxy verb changed to unsafeproxy",
"reason": "Forbidden",
"details": {
"name": "https:heapster:",
"kind": "services"
},
"code": 403
}
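The 403 is because the request is anonymous. Passing a bearer token for a user allowed to proxy services should get past that particular error; a sketch using my current session token (-k skips certificate verification, acceptable on a test cluster):

# assumes the logged-in oc user is allowed to proxy services in openshift-infra
curl -k -H "Authorization: Bearer $(oc whoami -t)" https://${KUBERNETES_MASTER}/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/metrics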
The Red Hat forums also highlight this issue and give the following diagnosis steps:
oc adm top pod -n hooks
oc get --raw="/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/metrics"
oc get --raw="/api/v1/namespaces/openshift-infra/services/https:heapster:/proxy/api/v1/model/metrics"
The second request works; furthermore, the Bugzilla report notes that the incorrect URL is being used.
So the solution is to upgrade to a later version; follow the upgrade guide and make sure to reboot the host:
ansible-playbook playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade.yml -e openshift_certificate_expiry_warning_days=30
So now it works.
oc describe hpa
now gives successful conditions:
Type            Status  Reason            Message
----            ------  ------            -------
AbleToScale     True    ReadyForNewScale  the last scale time was sufficiently old as to warrant a new scale
ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited  True    TooFewReplicas    the desired replica count is increasing faster than the maximum scale rate
and the HPA now shows the current CPU usage:
[root@openshift ~]# oc get hpa
NAME      REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
blog      DeploymentConfig/blog   0%/50%    1         10        1          2d
You need to set a CPU request and a CPU limit on the container, otherwise the HPA cannot calculate a CPU utilization percentage.
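For example, setting both on the guestbook container from earlier (the values are illustrative, not tuned):

# request/limit values are illustrative
oc patch dc/guestbook -p '{"spec":{"template":{"spec":{"containers":[{"name":"guestbook","resources":{"requests":{"cpu":"40m"},"limits":{"cpu":"80m"}}}]}}}}'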
Sources
Remember to get the correct version of the documentation, matching the version of the system you are running.