When serious vulnerabilities are identified, having a Managed Service Provider pays off. Fortunately, your infrastructure is pretty close to bullet-proof on Qbox.
Updated 12/17: If you are on ES version 7.x, 6.x, or 5.x and have restarted your cluster, or have had support upgrade your ES version, on or after December 14, 2021, your cluster has already been mitigated.
ES versions 2.x and 1.x are not impacted, and no mitigation is needed.
Due to the community’s latest guidance, you no longer need to upgrade your ES version to mitigate the vulnerability. Qbox has updated all of its images with the recommended mitigations in all impacted ES versions that we offer. All that is required to pull in these mitigations is a restart or rebuild of your cluster.
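For those who manage several clusters, the 12/17 guidance above can be expressed as a small check. This is only a sketch of the rule as stated in this post: the function name and its inputs are hypothetical, and it is not a Qbox API. It assumes you know each cluster's ES version and the date of its last restart or support-performed upgrade.

```python
from datetime import date

# Date from the 12/17 update: restarts on or after this day pull mitigated images.
MITIGATION_DATE = date(2021, 12, 14)


def needs_restart(es_version: str, last_restart: date) -> bool:
    """Return True if a cluster still needs a restart or rebuild.

    Hypothetical helper that encodes only the rule stated in this post.
    """
    major = int(es_version.split(".")[0])
    if major in (1, 2):
        # 1.x and 2.x are not impacted; no mitigation is needed.
        return False
    if major in (5, 6, 7):
        # Impacted versions are mitigated once restarted on or after the date.
        return last_restart < MITIGATION_DATE
    # The post does not cover other major versions, so refuse to guess.
    raise ValueError(f"ES version {es_version} is not covered by this guidance")
```

For example, `needs_restart("6.8.0", date(2021, 12, 10))` flags the cluster for a restart, while the same version restarted on December 14 or later does not.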
If you would prefer that we schedule the maintenance so that the mitigations can be applied with as little downtime as possible, we have created a Calendar link where you can pick a time. In the form, please include your Qbox account email along with a list of all cluster names and the order in which you would like the clusters to be restarted or rebuilt. If anything is unclear, we will attempt to reach you.
As a reminder, if we are doing a rebuild of your clusters: Clusters that have at least 3 nodes and fully replicated indices will have minimal, if any, downtime. Clusters smaller than this will have downtime.
If we or you are doing a restart of your clusters: There is the possibility of downtime no matter your node count. However, this process is much quicker than a rebuild.
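The downtime expectations above can be summarized in a short sketch. The function name and return labels are hypothetical, and "fully replicated" is assumed here to mean that every index has at least one replica; if your definition differs, adjust the check accordingly.

```python
from typing import Mapping


def downtime_outlook(operation: str, node_count: int,
                     replicas_per_index: Mapping[str, int]) -> str:
    """Classify the downtime to expect, following the rules in this post.

    Returns "minimal", "expected", or "possible". Hypothetical helper.
    """
    if operation == "restart":
        # Restarts may cause downtime regardless of node count.
        return "possible"
    if operation != "rebuild":
        raise ValueError(f"unknown operation: {operation}")
    # Assumption: "fully replicated" means at least one replica per index.
    fully_replicated = all(r >= 1 for r in replicas_per_index.values())
    if node_count >= 3 and fully_replicated:
        return "minimal"
    return "expected"
```

A 3-node cluster whose indices each have one replica would come back as "minimal" for a rebuild, whereas a 2-node cluster, or one with an unreplicated index, would come back as "expected".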
Also, we will rebuild and restart clusters one at a time, in the order given in your list, and some clusters take longer than others to complete. If we are unable to restart or rebuild all clusters within the scheduled window, we will let you know so that you can reschedule the clusters that were not completed.
You can create a ticket with us at any time should you have any questions.
We still recommend you upgrade your ES version. Please submit a support ticket if you would like assistance in upgrading.
Updated 12/15: Overnight, an additional log4j vulnerability (CVE-2021-45046) emerged, showing that the existing fixes were incomplete in certain configurations involving the JNDI Lookup class. After carefully researching this bug and monitoring the Open Source community, we determined that it mostly affects non-default configurations. Therefore, the guidance regarding restarting or migrating versions has not changed. Since this does not require additional action on the part of users, an email will not be sent for this update.
Updated 12/14: As we explore this vulnerability further and synthesize the advice from the Open Source community and the Elasticsearch community, our advice is evolving. Some of this will require action and understanding on the part of our customers. Security is a top priority for us, and we stand by to help. Just fill out a support ticket if you have questions.
- For users of the 5.x releases: If you are on version 5.3.2 or 5.5.1, you should migrate to a version not lower than 5.6.16. If you are already on 5.6.16 or later, you simply need to restart your cluster. This will pull an updated and mitigated version.
- For users of the 6.x releases: If you are on version 6.2.1 or 6.3.2, you should migrate to a version not lower than 6.4.3. If you are already on 6.4.3 or later, you simply need to restart your cluster. This will pull an updated and mitigated version.
- For all Qbox Next-Gen users, a restart is the only thing that is needed.
- The advice from below is reiterated. If you are properly replicated and have at least 3 nodes, a restart may not require downtime.
- For all affected users (anybody using version 5.x, 6.x, or 7.x), we will assess your cluster’s readiness for a restart. If we are unsure, we will attempt to schedule it on your timetable. Eventually, we will need to force a restart for those who are not taking action themselves.
- Those most affected will be those needing to migrate to a later version, since this may require testing your app for breaking changes. We will likewise reach out to you proactively to schedule.
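The 12/14 version guidance in the list above (since relaxed by the 12/17 update at the top of this post) amounts to a simple lookup. The sketch below is illustrative only: the function name and the `next_gen` flag are hypothetical, and it treats any 5.x or 6.x version below the stated floor as needing migration, which generalizes slightly from the specific versions named above.

```python
# Minimum versions named in the 12/14 guidance, keyed by major version.
MIN_MITIGATED = {5: (5, 6, 16), 6: (6, 4, 3)}


def parse_version(version: str) -> tuple:
    """Turn "5.6.16" into (5, 6, 16) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def action_as_of_12_14(es_version: str, next_gen: bool = False) -> str:
    """Return the action the 12/14 update asked for. Hypothetical helper."""
    version = parse_version(es_version)
    major = version[0]
    if major < 5:
        # Pre-5.x versions are not vulnerable.
        return "none"
    if next_gen:
        # Qbox Next-Gen users only need a restart.
        return "restart"
    floor = MIN_MITIGATED.get(major)
    if floor is not None and version < floor:
        return "migrate to " + ".".join(str(n) for n in floor) + " or later"
    # 7.x, and 5.x/6.x at or above the floor, only need a restart.
    return "restart"
```

Comparing parsed tuples rather than raw strings matters here: as strings, "5.6.16" would sort before "5.6.2".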
Over the weekend, the internet was set ablaze with news of a vulnerability in Apache Log4j (CVE-2021-44228), a widely used logging component. Certain versions of Elasticsearch were found to be impacted, requiring a response from us. Refer to this blog post as our official response. We are acutely aware that it is at times like these that having a managed service provider pays off in spades, and we have been working around the clock since the news hit the wires to assess the impact on your infrastructure. You will see that there is ample reason for optimism.
Two vectors of attack make this vulnerability especially insidious: Remote Code Execution (RCE) and Information Leakage. For both vectors, all known mitigation measures require a restart of your cluster. As long as you have at least 3 nodes and are properly replicated, there should be no downtime. We assume that clusters with fewer than 3 nodes are test infrastructure. If you have been running production workloads on test infrastructure, please consider adding more nodes to your cluster. In doing so, you will pull in all known mitigations that are available for your ES version. Fill out a support ticket if you are unclear how to accomplish this.
The information leakage vulnerability is of more concern to Qbox customers because RCE has mostly been patched already. More on this below, but first allow me to allay most fears by explaining some protections that are already in place:
- Pre-5.x versions are not vulnerable. At present, versions 6.x and 7.x are also not vulnerable to RCE, though mitigating information leakage requires another step. Versions of 5.x are still being evaluated, and we will update this post as well as apply mitigations as soon as they are known.
- Even in the worst-case scenario, where you are running a version of Elasticsearch that is vulnerable, it is still extremely unlikely that you could be subject to an attack. Because Qbox clusters authenticate with allowed IPs or with credentials, any attacker would need to bypass that protection first and also guess or discover your endpoint before they could exploit this vulnerability. By definition, public-facing endpoints in the wild running community versions are the ones about which to be concerned. We can surmise that most attackers are looking for the lowest-hanging fruit: public, unprotected Elasticsearch clusters that can be sniffed with off-the-shelf tools. Unless you have connected your front-end directly to your ES cluster, exposing your endpoint, this should not apply to any Qbox cluster.
- We have not been able to replicate this vulnerability on any version running within our walls. When anybody restarts, upgrades, or reinstalls, our provisioner pulls the latest binary which hasn’t included vulnerable JDK versions for some time. Thus, only those customers that have both been live for some time AND never restarted could possibly be vulnerable.
Taken together, we believe these three factors are reason enough to not panic. Nonetheless, we are, of course, acutely aware that “highly unlikely” does not mean “impossible”, so we are monitoring the situation closely and taking many mitigating steps anyway. At the same time, we are balancing this concern with the highly probable risks that mass patching and migrating of versions would result in breaking changes to your apps. The cure can’t be worse than the disease.
Back to the information leakage vulnerability. We are deploying the recommended log4j2 mitigation, applied through Java options, on the versions that support it. So, if you are on version 5.5.1, 5.6.16, or 6.7.2, you should restart your cluster from the dashboard to start using a mitigated version. However, there are some versions on which this does not work, though we are hoping that a universal fix will be available soon. We have identified a double-digit number of clusters where the simple mitigation measures may not be compatible. Four potentially vulnerable versions are present in the Qbox environment. If you are running version 6.4.3, 6.3.2, 6.2.1, or 5.3.2, we will reach out to you separately and arrange for a migration to a later version.
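For readers who run their own Elasticsearch nodes and want to check for this kind of mitigation, here is a rough sketch. It assumes the Java option in question is the widely published system property `-Dlog4j2.formatMsgNoLookups=true`; this post does not name the exact flag Qbox deploys, so treat that as an assumption. The function name is hypothetical.

```python
import shlex

# Assumption: the mitigation is the widely published log4j2 system property.
MITIGATION_PROPERTY = "log4j2.formatMsgNoLookups"


def has_lookup_mitigation(java_opts: str) -> bool:
    """Report whether a JVM options string enables the lookup mitigation."""
    prefix = "-D" + MITIGATION_PROPERTY + "="
    enabled = False
    for token in shlex.split(java_opts):
        if token.startswith(prefix):
            # If the property appears more than once, this sketch keeps
            # the value of the last occurrence.
            enabled = token[len(prefix):].strip().lower() == "true"
    return enabled
```

For instance, `has_lookup_mitigation("-Xms2g -Xmx2g -Dlog4j2.formatMsgNoLookups=true")` reports the mitigation as present, while an options string without the property does not. Note that this only inspects the configured options; it does not prove that a given Elasticsearch version honors the property.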
Regarding our other internal infrastructure, including the Kubernetes clusters that host your clusters and our provisioner app, we have confirmed internally that this vulnerability is not present.
As always, we appreciate your trust in us, and you can be certain that we are working around the clock, monitoring the situation and taking the needed precautions until we are reasonably certain that this vulnerability has been quashed. If you have specific questions, do not hesitate to fill out a support ticket.