
· One min read
Da Yin

Since the Open Application Model was invented in 2020, KubeVela has gone through dozens of releases and evolved advanced features for modern application delivery. Recently, KubeVela has proposed to become a CNCF incubating project and delivered several public talks in the community. As a memorandum, this article looks back at the starting points and gives a comprehensive introduction to the state of KubeVela in 2022.

What is KubeVela?

KubeVela is a modern software platform that makes delivering and operating applications across today's hybrid, multi-cloud environments easier, faster and more reliable. It has three main features:

  • Infrastructure agnostic: KubeVela is able to deploy your cloud-native application to various destinations, such as Kubernetes multi-clusters, cloud provider runtimes (like Alibaba Cloud, AWS or Azure) and edge devices.
  • Programmable: KubeVela has abstraction layers for modeling applications and the delivery process. The abstraction layers allow users to build higher-level reusable modules for application delivery in a programmable way and to integrate arbitrary third-party projects (like FluxCD, Crossplane, Istio, Prometheus) into the KubeVela system.
  • Application-centric: There are rich tools and ecosystems designed around KubeVela applications that add extra capabilities for delivering and operating applications, including CLI, UI, GitOps, Observability, etc.

KubeVela cares about the whole lifecycle of applications, including both the Day-1 Delivery and the Day-2 Operating stages. It is able to connect with a wide range of Continuous Integration tools, like Jenkins or GitLab CI, and help users deliver and operate applications across hybrid environments.

Slide2.png

Why KubeVela?

Challenges and Difficulties

Nowadays, the fast growth of cloud-native infrastructure gives users more and more capabilities for deploying applications, such as High Availability and Security, but it also exposes an increasing amount of complexity directly to application developers. For example, the Ingress resource on Kubernetes enables users to expose their applications easily, but developers need to handle Ingress upgrades when the underlying Kubernetes version shifts, which requires knowledge of the Ingress resource. Hybrid deployment across various cloud providers makes this problem even harder. These difficulties are caused by the lack of operational input in the application definition: developers must face the infrastructure details directly if they want to enjoy the benefits brought by the rich cloud-native community.

Slide3.png

Open Application Model

To tackle the above challenges and bridge the gap between using applications and understanding infrastructure details, the Open Application Model (OAM) was jointly proposed by Alibaba Cloud and Microsoft Azure in 2020. The aim is to define a consistent application model for application delivery, independent of platforms and implementations. The model describes an interface for developers covering what an application consists of and how it should work. The former is known as the Component in OAM, which is usually used to model the workloads of the application. The latter is defined as the Trait in OAM, which attaches extra capabilities to Components.

Slide4.png

KubeVela as OAM

KubeVela is one of the implementations of the Open Application Model. In KubeVela, the abstraction layer is powered by CUE, a novel configuration programming language that can describe complex rendering logic and works as a superset of JSON. The abstraction layer simplifies the configuration of resources in Kubernetes: it hides the implementation details and exposes a limited set of parameters to developers. With a KubeVela application, it is easy for developers to focus on the core logic of the application, like which container image should be used and how the service should be made accessible.

Slide5.png

To achieve that, best practices for using Kubernetes native resources are summarized into KubeVela X-Definitions, which provide rendering templates for resources using CUE. These templates can come from various sources, including official repositories, community addons or even custom implementations by system operators. The templates are mostly infrastructure-implementation agnostic, in other words, not necessarily bound to specific infrastructures. Developers do not need to be aware of the underlying infra when using these templates.
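As a minimal sketch of what this looks like from the developer's side (the application name and image below are illustrative), the built-in webservice component type is backed by such a CUE template, which renders Kubernetes resources behind the scenes while exposing only the parameters developers care about:

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: hello-app                # hypothetical application name
spec:
  components:
    - name: hello
      type: webservice           # X-Definition backed by a CUE template
      properties:
        image: nginx:1.23        # the core concern: which image to run
        ports:
          - port: 80
            expose: true         # how the service is made accessible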

Components & Traits

The application model divides the abstraction of infra into two different aspects. The Component describes the main workload, which in Kubernetes can be Deployments, StatefulSets, Helm Releases, etc. The Trait, on the other hand, describes a capability added to the main workload, such as the scaler trait, which specifies the number of replicas, or the gateway trait, which aggregates the endpoints for access. The separation of concerns in the design of Component and Trait gives high extensibility and reusability to the abstraction.

Slide6.png

For example, the gateway trait can be backed by different infrastructures like Ingress or HTTPRoute. The application developer who uses the trait only needs to care about the exposed parameters, namely the path, port and domain. The trait can be attached to various types of workloads, abstracted by different types of components, such as Deployment, StatefulSet, CloneSet, etc.

Slide7.png

In cases where application developers and SREs are on different teams, KubeVela draws a clear division of responsibilities (an example follows the list below).

  • The platform team, who provide the infrastructure, are responsible for building X-Definitions that encode best practices and deployment confidence.
  • The end users only need to choose the Components and Traits provided by the platform team and use them to assemble applications. They can simply enjoy a PaaS-like experience instead of directly interacting with the infra behind it.

These are made possible by the flexible, extensible and programmable system of KubeVela and can be applied in varying environments.

Slide8.png
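As a concrete, hedged sketch of this experience, an end user might assemble a component with the scaler and gateway traits mentioned above as follows; the image, domain and replica counts are placeholders:

  components:
    - name: web
      type: webservice
      properties:
        image: example/web:v1          # hypothetical image
      traits:
        - type: scaler                 # operational concern: replica count
          properties:
            replicas: 3
        - type: gateway                # may be backed by Ingress, HTTPRoute, etc.
          properties:
            domain: demo.example.com   # placeholder domain
            http:
              "/": 80                  # route path "/" to port 80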

Unified Delivery

Application delivery can happen everywhere. Therefore, another goal of the KubeVela application is to build unified delivery and provide a consistent experience for users across various scenarios.

Hybrid-Cloud & Multi-Cluster

In addition to the abstraction layer, KubeVela supports hybrid-cloud and multi-cluster architectures natively, since modern cloud-native applications are not only about containers but also involve lots of cloud resources. Besides, more and more users and teams face the difficulty of delivering applications to various environments or multiple clusters for different purposes, such as testing or high availability.

Slide9.png

The KubeVela application allows users to define delivery targets and differentiated configurations through policies. The abstraction hides the details of how clusters are registered and connected and provides a runtime-agnostic experience for app developers.

Slide10.png
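For illustration, a sketch of such policies follows; the cluster names are hypothetical, while topology and override are built-in policy types for selecting delivery targets and differentiating configurations:

  policies:
    - name: deploy-to-hangzhou
      type: topology                             # choose which clusters to deliver to
      properties:
        clusters: ["hangzhou-1", "hangzhou-2"]   # hypothetical registered clusters
    - name: more-replicas-in-prod
      type: override                             # differentiated configuration per target
      properties:
        components:
          - name: web
            traits:
              - type: scaler
                properties:
                  replicas: 5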

Addon Integration

To enrich delivery capabilities, users can leverage KubeVela addons to extend their system. Addons are discoverable, reusable and easy-to-install capability bundles. They usually contain capability providers, including a wide range of third-party projects, like FluxCD, ClickHouse, Crossplane, etc. Addons not only install those projects into the system but also create the corresponding definitions for the integration, which extends the types of Components and Traits that application developers are able to use. The KubeVela community currently has more than 50 addons. Platform builders can enjoy these out-of-the-box integrations depending on their customized demands.

Slide11.png

With addons enabled in the system, end users can assemble applications in more customized ways, such as deploying cloud resources or using advanced workloads (see the sketch below).

Slide12.png
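For example, once a Terraform provider addon is enabled, a cloud resource can be declared as an ordinary component. The sketch below assumes the Alibaba Cloud provider addon is installed; the bucket name is illustrative:

  components:
    - name: sample-oss
      type: alibaba-oss              # component type installed by the addon's definitions
      properties:
        bucket: vela-demo-bucket     # hypothetical bucket name
        acl: private
        writeConnectionSecretToRef:
          name: oss-conn             # credentials of the created resource are written to this Secret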

KubeVela Workflow

While the Open Application Model defines the composition of an application, in real cases the delivery process of those compositions can still vary a lot. For example, the different components in one application may have inter-dependencies or pass data to each other, so delivery steps must be executed in a specific order. Furthermore, the delivery process sometimes involves more actions than the delivery of resources, such as rollouts or notifications. An extensible workflow is therefore designed to fulfill the need for process customization in KubeVela.

Slide13.png

Similar to Component and Trait, the KubeVela workflow also leverages CUE to define workflow steps, providing flexibility, extensibility and programmability. It can be seen as another form of Infrastructure as Code (IaC). A set of built-in workflow steps already provides rich out-of-the-box capabilities in KubeVela, such as making multi-cluster deployments and sending notifications through Slack or email. The lightweight engine ensures high performance and safety of step execution, compared to other types of engines that involve running extra containers.

Slide14.png

Unlike the Component and Trait definitions in KubeVela, the WorkflowStep definition does not render templates into resources. Instead, it describes the actions to be executed in the step, which call underlying atomic functions in various providers.

Slide15.png

With the use of workflow and addons, users are able to build arbitrary delivery processes and make customized integrations. For example, it is possible to let Continuous Integration tools trigger the delivery of KubeVela applications and implement GitOps solutions by combining FluxCD and other addons.

Slide16.png
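A hedged sketch of such a workflow is shown below, combining the built-in deploy, suspend and notification step types; the policy names and the webhook URL are placeholders:

  workflow:
    steps:
      - name: deploy-staging
        type: deploy                       # built-in multi-cluster deployment step
        properties:
          policies: ["topology-staging"]   # hypothetical topology policy
      - name: manual-approval
        type: suspend                      # pause until a human resumes the workflow
      - name: deploy-production
        type: deploy
        properties:
          policies: ["topology-production"]
      - name: notify
        type: notification                 # built-in step for Slack/email notifications
        properties:
          slack:
            url:
              value: https://hooks.slack.com/services/xxx   # hypothetical webhook
            message:
              text: "application delivered to production"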

Day-2 Management

KubeVela cares about more than Day-1 Delivery. For all its extensibility, it also provides unified Day-2 application management capabilities. Day-2 management is necessary for system operators and application developers to continuously operate the delivered applications and ensure the applications are always under control.

Resource Management

The most basic application management capabilities concern resources. KubeVela's core controller continuously watches the difference between the current state and the desired state of delivered resources. It makes sure that the live spec accords with the declared spec recorded during the delivery process, and therefore effectively prevents configuration drift.

Slide18.png

Besides, automated garbage collection helps recycle resources that are no longer in use after upgrades or deletion. There are also times when resources need to be shared across multiple applications. All of this is made possible in the KubeVela application through the use of policies (sketched below).

Slide19.png
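A sketch of the two behaviors mentioned above, assuming the built-in garbage-collect and shared-resource policy types:

  policies:
    - name: keep-legacy
      type: garbage-collect          # control how outdated resources are recycled
      properties:
        keepLegacyResource: true     # keep resources from old versions instead of recycling them immediately
    - name: namespace-sharing
      type: shared-resource          # let multiple applications own the same resource
      properties:
        rules:
          - selector:
              resourceTypes: ["Namespace"]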

Version Control

The KubeVela application keeps history records of deliveries. These snapshots are useful when a newly published version does not behave as expected: change inspection can be used to diagnose erroneous changes, and rollback allows fast recovery to the previous successful state.

Slide20.png

Observability

KubeVela treats observability as a first-class citizen. It serves as the users' eyes for monitoring the state of applications and spotting exceptions. There are multiple tools and methods in KubeVela for observation. One of the most straightforward ways is to use the KubeVela CLI. The Vela CLI provides real-time status information for the application at a fine-grained or aggregated level.

Slide21.jpg

For users who prefer web interfaces, VelaUX provides an alternative way to view application status.

Slide22.png

In cases where applications are monitored through third-party projects, such as Grafana, Prometheus or Loki, KubeVela further provides addons for bootstrapping the observability infrastructure and empowers users to customize observability rules as code in applications, through the abstraction layer (a sketch follows below).

Slide23.png

A series of out-of-the-box metrics and dashboards gives users basic automated system observability. These can be used to diagnose system-level exceptions and help improve overall performance.

Slide24.png
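As a sketch of observability as code, assuming the grafana addon is enabled (the uid and the dashboard JSON below are illustrative), a dashboard can be declared as a component in an application:

  components:
    - name: my-dashboard
      type: grafana-dashboard        # definition assumed to be installed by the grafana addon
      properties:
        uid: my-dashboard            # hypothetical dashboard uid
        data: |
          { "title": "My Dashboard", "panels": [] }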

Eco-system

In addition to the above-mentioned tools, KubeVela has several other tools in its ecosystem that facilitate application delivery.

  • Vela CLI: The KubeVela CLI provides various commands that help you operate applications, such as managing definitions, viewing resources, restarting workflows and rolling back versions.
  • VelaUX: VelaUX is the web UI for KubeVela. It incorporates business logic into the fundamental APIs and provides an out-of-the-box user experience for non-Kubernetes-expert users.
  • Terraform Controller: The terraform controller in KubeVela allows users to use Terraform to manage cloud resources through Kubernetes Custom Resources.
  • Cluster Gateway: The gateway that provides a unified multi-cluster access interface. Working as a Kubernetes Aggregated API Server, the gateway leverages the native Authentication and Authorization modules and enforces secure and transparent access to managed clusters.
  • VelaD: Built on top of k3s & k3d, VelaD packages KubeVela with Kubernetes cores, which can be extremely helpful for building dev/test environments.
  • Vela Prism: The extension API server for KubeVela, built upon the Kubernetes Aggregated API Server. It projects native APIs, like creating dashboards on Grafana, into Kubernetes resource APIs, so that users can manage third-party resources as Kubernetes native resources.
  • Vela Workflow: The workflow engine translates CUE-based steps and executes them. It works as a pure delivery tool and can be used alongside the KubeVela application. Compared to Tekton, it mainly organizes the process in CUE style, instead of using Pods and Jobs directly.

Slide25.png

Stability

To ensure KubeVela is able to handle a large number of applications with limited resources, multiple load tests have been conducted under various circumstances. The experiments demonstrate that KubeVela is capable of dealing with thousands of applications in an ordinary-sized cluster. The observability infrastructure further exposes KubeVela's bottlenecks and guides system operators in customized tuning to improve performance in specific environments.

Slide26.png

In a nutshell

Currently, KubeVela has already been applied in production by a number of adopters from various areas. Some mainly use KubeVela's abstraction capability to simplify the use and deployment of applications. Some build application-centric management systems upon KubeVela. Some use the customized workflow to orchestrate the delivery process. It is especially welcomed in high-tech industries and has proven helpful for delivering and managing enormous numbers of applications.

Slide27.png

The KubeVela community has attracted worldwide contributors and has evolved continuously over the past two years. Nowadays, over 200 contributors from various countries have participated in the development of KubeVela. Thousands of issues have been raised, and 85% of them are already solved. There are also bi-weekly community meetings held in both the English and Chinese communities.

Slide28.png

With more and more people coming into the community, KubeVela keeps upgrading itself to fit more complex and varied use cases and scenarios.

· One min read
Daniel Higuero

Application Delivery on Kubernetes

The cloud-native landscape is formed by a fast-growing ecosystem of tools with the aim of improving the development of modern applications in a cloud environment. Kubernetes has become the de facto standard to deploy enterprise workloads by improving development speed, and accommodating the needs of a dynamic environment.

Kubernetes offers a comprehensive set of entities that enables any potential application to be deployed into it, independent of its complexity. This, however, has a significant impact from the point of view of adoption. Kubernetes is becoming as complex as it is powerful, and that translates into a steep learning curve for newcomers to the ecosystem. This has generated a new trend focused on providing developers with tools that improve their day-to-day activities without losing the capabilities of the underlying system.

Napptive Application Platform

The NAPPTIVE Playground offers a cloud-native application platform focused on providing a simplified method to operate on Kubernetes clusters and deploy complex applications without needing to work with low-level Kubernetes entities. This is especially important to accommodate non-developer user personas as the computing resources of a company are consolidated into multi-purpose, multi-tenant clusters. Data scientists, business analysts, modeling experts and many more can benefit from simple solutions that do not require any knowledge of cloud computing infrastructure or orchestration systems such as Kubernetes, enabling them to run their applications with ease on the existing infrastructure.

Any tool that works in this space must start by analyzing the existing abstractions. In particular, at Napptive, we focus on Applications. This well-understood abstraction is not present in Kubernetes, requiring users to manually tag, identify and reason about the different components involved in an application. The Open Application Model provides an excellent abstraction to represent any type of application independently of the cloud provider, containerization technology, or deployment framework. The model is highly customizable by means of adding Traits, Policies, or new Component Definitions.

KubeVela in Napptive

The Napptive Playground leverages KubeVela as the OAM runtime for Kubernetes deployments. Our Playground provides an environment abstraction with multi-tenant guarantees that is equivalent to partitioning a shared cluster by means of differentiated namespaces. The benefit of our approach is the transparency of its configuration, using a higher abstraction level that does not require any Kubernetes knowledge.

The following diagram describes the overall architecture of Napptive and KubeVela deployed in a Kubernetes cluster.

napptive-arch

The user can interact with the cluster by using the Napptive user interface (CLI or Web UI), or by means of the standard Kubernetes API with kubectl. Isolated environments can easily be created to establish logical separations such as the type of environment (e.g., development, staging, production), purpose (e.g., projectA, projectB), or any other approach to differentiate where to deploy an application. Once the OAM application is deployed on Kubernetes, KubeVela is in charge of managing the low-level application lifecycle, creating the appropriate Kubernetes entities as a result. One of the main differences from other adopters of KubeVela is the fact that we use KubeVela in a multi-tenant environment. The following figure shows the specifics of an application deployment in this type of scenario.

napptive-arch

The Napptive Playground is integrated with Kubernetes RBAC and offers a native user management layer that works in both on-premise and cloud deployments. User identity is associated with each environment, and KubeVela is able to use that information to ensure that users can only access their allowed resources. Once a user deploys an OAM application, KubeVela intercepts the call and attaches the user identity as an annotation. After that, the rendering process ensures that the user has access to the entities (e.g., trait, component, policy, workflow step definitions) and that the target namespaces are also accessible to the user. Once the application render is ready, the workflow orchestrator creates the different entities in the Kubernetes namespace.

Napptive in the Community

In terms of our involvement with the OAM/KubeVela community, it has evolved over time from being passive members simply exploring the possibilities of OAM in Kubernetes to becoming active contributors in different areas. In particular, we have worked closely with the core KubeVela development team to test and overcome the different challenges related to running a multi-tenant, RBAC-compliant installation of KubeVela. Security in this type of installation is critical, and it is one of our main focuses within the KubeVela community. We are committed to continuing to work with the community, not only to ensure that multi-tenancy is maintained throughout the different releases, but also to add our own perspective representing our customer use cases.

The KubeVela community is growing at a fast pace, and we try to contribute our adaptations, features, feedback, and thoughts back to the community. This type of framework is deployed in a multitude of environments; the most common one is probably one or more clusters inside a company. In our particular application, we are interested in exploring methods to offer computing capabilities to a variety of user personas in a shared cluster, and we contribute our view and experience to the community.

Future Evolution

In terms of future evolution, we believe the Napptive Playground is a great tool to experience working with the Open Application Model without the need to install new clusters and frameworks. From the point of view of our contributions to the community, we are internally working on QA mechanisms that ensure customer applications keep working after upgrades, and on identifying potential incompatibilities between releases. Once that work is ready, we plan to contribute our testing environment setup back to the community so that it can be adopted in the main branch. We are also excited about new functionalities that have recently been added to KubeVela, such as multi-cluster support, and are exploring methods to adopt them inside the Playground. Moreover, we are actively working on an OAM-compatible application catalog that will simplify the way organizations store and make accessible application definitions so that they can be deployed into a cluster from a central repository. The catalog focuses on the OAM entities and relies on existing container registries for storing the images.

· One min read
孙健波

Today, at the Apsara Conference, Ding Yu, General Manager of Alibaba Cloud's Cloud-Native Application Platform, announced a brand-new upgrade of KubeVela! This upgrade is a qualitative leap built on KubeVela's steady evolution from application delivery to application management, and it pioneers an industry-first unified platform for application delivery and management built on an extensible model.

· One min read
姜洪烨

KubeVela addons make it easy to extend KubeVela's capabilities. As we know, KubeVela is a microkernel, highly extensible platform: users can extend the system's capabilities through Definitions, and addons are the core feature for packaging and distributing these custom extensions together with their dependencies. Moreover, the KubeVela community's addon catalog keeps growing and now contains more than 50 addons, covering observability, microservices, FinOps, cloud resources, security and many other scenarios.

This blog gives a full introduction to the core mechanism of KubeVela addons and teaches you how to write a custom addon. At the end, we show the end user's experience of using addons and how addons fit into the KubeVela platform to provide a consistent experience.

· One min read
曾庆国

KubeVela 1.5 has been released recently. This version brings the community more out-of-the-box application delivery capabilities, including built-in system observability; a new Cloud Shell terminal that brings the Vela CLI to the browser; enhanced canary releases; and an optimized multi-environment application delivery workflow. It further improves and polishes KubeVela's highly extensible experience as an application delivery platform. In addition, the community has formally started promoting the project to the CNCF Incubation stage, and several benchmark users shared their practices in community meetings, which proves the healthy growth of the community. The project's maturity and adoption have both reached milestones, thanks to the contributions of more than 200 developers in the community.

· One min read

Background

Helm is a widely adopted client-side application packaging and deployment tool in the cloud-native world. Its concise design and easy-to-use experience have been recognized by users and have formed an ecosystem of their own: nearly ten thousand applications are now packaged as Helm Charts. Helm's design philosophy is concise enough to be summarized in the following two points:

  1. It packages and templates complex Kubernetes APIs, abstracting and simplifying them into a small number of parameters.
  2. It provides an application lifecycle solution: creating, uploading (hosting), versioning, distributing (discovering), and deploying.

These two design principles keep Helm flexible yet simple enough to cover all Kubernetes APIs, which nicely solves the one-off delivery of cloud-native applications. However, for enterprises of a certain scale, using Helm for continuous software delivery poses significant challenges.

The Challenge of Continuous Delivery with Helm

Helm was designed for simplicity and ease of use from the very beginning, giving up complex component orchestration. So when deploying an application, Helm delivers all resources to the Kubernetes cluster in one shot, expecting that Kubernetes' eventual-consistency, self-healing capabilities will automatically resolve the application's dependencies and orchestration. Such a design may work fine for the first deployment, but it is too idealistic for the production environments of enterprises of any reasonable scale.

On the one hand, updating all resources at once during an upgrade can easily cause an overall service outage because some services become briefly unavailable; on the other hand, if the software has a bug, it cannot be rolled back in time, which easily widens the blast radius and makes it hard to control. In even worse scenarios, for example when some production configurations have been manually modified by operators, a one-shot Helm deployment overwrites all those modifications, while Helm's previous release may not match the production environment, so even a rollback cannot restore it, causing an even larger outage.

It is thus clear that once you reach a certain scale, the ability to do canary releases and rollbacks in production becomes extremely important, and Helm alone cannot guarantee sufficient stability.

How to Run a Canary Release with Helm?

Usually, a rigorous software upgrade process follows a flow like this, roughly divided into three phases. The first phase upgrades a small portion (say 20%) of the instances and switches a small share of traffic to the new version, then pauses. After manual confirmation, the second phase upgrades a larger proportion (say 90%) of instances and traffic, then pauses again for confirmation. The final phase upgrades everything to the new version and verifies it, completing the whole release. If any anomaly is found during the upgrade, including in business metrics, such as abnormally high CPU or memory usage or too many 500-error logs, you can roll back quickly.

image

The above is a typical canary release scenario. So for a Helm Chart application, how do we implement this flow? There are generally two typical approaches in the industry:

  1. Modify the Helm Chart to duplicate the workload, expose different Helm parameters for each copy, and keep adjusting the images, replica counts and traffic ratios of the two workloads during the release to achieve a canary rollout.
  2. Modify the Helm Chart to replace the original basic workload with a custom workload that has the same functionality plus canary capabilities, expose Helm parameters, and manipulate these canary CRDs during the release.

Both solutions are complex and carry a non-trivial retrofit cost, especially when your Helm Chart is a third-party component that you cannot modify, or when you lack the ability to maintain the Helm Chart yourself. Even if you do retrofit it, compared to the original simple workload model there is a considerable stability risk. The root cause is that Helm itself is positioned only as a package manager; canary releases and workload management were never part of its design.

In fact, after in-depth conversations with a large number of community users, we found that most users' applications are not complex: they are classic types such as Deployment and StatefulSet. Therefore, using KubeVela's ( http://kubevela.net/ ) powerful addon mechanism, together with the OpenKruise (https://openkruise.io/) community, we built a canary release addon for these specific workload types. This addon lets you do a canary release of a Helm Chart easily, without any migration or retrofitting. Moreover, if your Helm Chart is more complex, you can customize an addon for your own scenario and get the same experience.

Below, we walk you through the complete flow with a practical example (using a Deployment workload).

Canary Release with KubeVela

Prepare the Environment

  • Install KubeVela
$ curl -fsSl https://static.kubevela.net/script/install-velad.sh | bash
velad install

See this document for more installation details.

  • Enable the related addons
$ vela addon enable fluxcd
$ vela addon enable ingress-nginx
$ vela addon enable kruise-rollout
$ vela addon enable velaux

This step enables the following addons:

  1. fluxcd, which gives us the ability to deliver Helm charts;
  2. ingress-nginx, which provides traffic management for the canary release;
  3. kruise-rollout, which provides the canary release capability;
  4. velaux, which provides the UI and visualization.
  • Forward the port of the nginx ingress-controller to your local machine
$ vela port-forward addon-ingress-nginx -n vela-system

First Deployment

Run the command below to release the Helm application for the first time. In this step we deploy with the vela CLI; if you are familiar with Kubernetes, you can also deploy with kubectl apply, with exactly the same effect.

cat <<EOF | vela up -f -
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: canary-demo
  annotations:
    app.oam.dev/publishVersion: v1
spec:
  components:
    - name: canary-demo
      type: helm
      properties:
        repoType: "helm"
        url: "https://wangyikewxgm.github.io/my-charts/"
        chart: "canary-demo"
        version: "1.0.0"
      traits:
        - type: kruise-rollout
          properties:
            canary:
              # The first batch of Canary releases 20% Pods, and 20% traffic imported to the new version, require manual confirmation before subsequent releases are completed
              steps:
                - weight: 20
                # The second batch of Canary releases 90% Pods, and 90% traffic imported to the new version.
                - weight: 90
              trafficRoutings:
                - type: nginx
EOF

In the example above, we declare an application named canary-demo containing a helm-type component (KubeVela also supports deploying other component types), whose properties include the chart URL and version.

We also declare the kruise-rollout trait on this component, a capability that becomes available after the kruise-rollout addon is installed. It specifies the Helm upgrade strategy: the first phase upgrades 20% of the instances and traffic, then, after manual confirmation, 90%, and finally everything is upgraded to the latest version.

Note that to make the demo intuitive (showing the version change), we prepared a dedicated chart. The body of this Helm chart contains a Deployment and an Ingress, the most common setup when authoring Helm charts. If your Helm chart contains these resources too, you can run a canary release with it just like in this example.

After the deployment succeeds, access the gateway address inside your cluster with the command below and you will see the following:

$ curl -H "Host: canary-demo.com" http://localhost:8080/version
Demo: V1

In addition, on VelaUX's resource topology page we can see that all five V1 instances are ready.

image

Upgrade the Application

Apply the YAML below to upgrade your application.

cat <<EOF | vela up -f -
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: canary-demo
  annotations:
    app.oam.dev/publishVersion: v2
spec:
  components:
    - name: canary-demo
      type: helm
      properties:
        repoType: "helm"
        url: "https://wangyikewxgm.github.io/my-charts/"
        chart: "canary-demo"
        # Upgrade to version 2.0.0
        version: "2.0.0"
      traits:
        - type: kruise-rollout
          properties:
            canary:
              # The first batch of Canary releases 20% Pods, and 20% traffic imported to the new version, require manual confirmation before subsequent releases are completed
              steps:
                - weight: 20
                # The second batch of Canary releases 90% Pods, and 90% traffic imported to the new version.
                - weight: 90
              trafficRoutings:
                - type: nginx
EOF

Notice that the new application differs from the first deployment in only two places:

  1. The app.oam.dev/publishVersion annotation is upgraded from v1 to v2, indicating that this change is a new version.
  2. The Helm chart version is upgraded to 2.0.0; in that version of the chart, the deployment image tag is upgraded to V2.

After a while, we will find that the upgrade process stops at the first batch we defined above, i.e., only 20% of the instances and traffic are upgraded. At this point, if you run the gateway-access command above repeatedly, you will see Demo: V1 and Demo: V2 appear alternately, with roughly a 20% chance of getting Demo: V2.

$ curl -H "Host: canary-demo.com" http://localhost:8080/version
Demo: V2

Looking at the application's resource topology again, we can see that the rollout CR created by the kruise-rollout trait has created one new-version instance for us, while the five old-version instances created by the previous workload remain unchanged.

image

Next, run the following command with the vela CLI to approve and resume the upgrade:

$ vela workflow resume canary-demo

After a while, the resource topology shows that five new-version instances have been created. If we access the gateway again now, the probability of seeing Demo: V2 increases greatly, approaching 90%.

Fast Rollback

In a real-world release it often happens that, after manual review, the new version is found to be abnormal, and you want to terminate the current upgrade and quickly roll the application back to the version before the upgrade started.

We can run the command below to suspend the current release workflow first:

$ vela workflow suspend canary-demo
Rollout default/canary-demo in cluster suspended.
Successfully suspend workflow: canary-demo

Then roll back to the version before the release, V1:

$ vela workflow rollback canary-demo
Application spec rollback successfully.
Application status rollback successfully.
Rollout default/canary-demo in cluster rollback.
Successfully rollback rollout
Application outdated revision cleaned up.

If we access the gateway again now, we will see that all request results have returned to the V1 state.

$ curl -H "Host: canary-demo.com" http://localhost:8080/version
Demo: V1

At this point, the resource topology shows that all canary instances have been deleted, and throughout the whole process the five V1 instances, serving as the stable version, never changed at all.

image

If you replace the rollback above with a resume, the remaining upgrade steps will continue and the release will be completed in full.

For the complete walkthrough of the demo above, please refer to the documentation.

If you want to implement the above process directly with native Kubernetes resources, refer to the documentation. Besides Deployment, the kruise-rollout addon also supports StatefulSet and OpenKruise's CloneSet; if the workload type in your chart is any of these three, you can run a canary release just like in the example above.

You may also have noticed that the example above uses a layer-7 traffic-splitting solution based on the nginx-Ingress-controller. We also support the Kubernetes Gateway API, which enables more gateway types and layer-4 traffic-splitting solutions.

How Is the Stability of the Release Process Guaranteed?

After the first deployment, the kruise-rollout addon (hereafter rollout) watches the resources deployed by the Helm Chart, which in our example are the deployment, service and ingress (StatefulSet and OpenKruise CloneSet are also supported). rollout then takes over the subsequent upgrades of this deployment.

During an upgrade, the new Helm release takes effect first and updates the deployment's image to v2. However, the deployment's upgrade process is taken over by rollout from the controller-manager, so the Pods under the deployment are not upgraded. Meanwhile, rollout copies a canary version of the deployment with the v2 image tag, creates a service that selects only its instances and an ingress pointing to this service, and finally sets the corresponding annotations on that ingress so that it receives the canary share of the traffic (see the documentation for details), thereby achieving traffic splitting.
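The traffic split in the last step relies on the standard nginx-ingress canary annotations. As a hedged sketch, the canary ingress managed by rollout carries annotations along these lines (the values are illustrative):

  metadata:
    annotations:
      nginx.ingress.kubernetes.io/canary: "true"        # mark this ingress as the canary entry
      nginx.ingress.kubernetes.io/canary-weight: "20"   # route 20% of the traffic to the canary service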

After all manual confirmation steps are passed and the full release is performed, rollout hands control of the stable deployment's upgrade back to the controller-manager. The stable-version instances are then upgraded to the new version one after another, and only after all of them are ready are the canary deployment, service and ingress destroyed. This guarantees that request traffic is never routed to instances that are not ready, achieving a lossless canary release.

We will keep iterating in the following areas to support more scenarios and bring an even more stable and reliable upgrade experience:

  1. Integrate the upgrade process with KubeVela's workflow system, introducing a richer ecosystem of intermediate steps, such as sending notifications through the workflow during an upgrade. Even at each pause stage, connect to external observability systems and automatically decide whether to continue the release or roll back by checking logs or monitoring metrics, achieving unattended release strategies.
  2. Integrate istio and more addons to support serviceMesh traffic-splitting solutions.
  3. Besides percentage-based traffic splitting, support header- or cookie-based traffic rules, as well as features such as blue-green deployment.

Summary

As mentioned earlier, KubeVela's support for Helm canary releases is implemented entirely through the addon system: the fluxcd addon helps us deploy and manage the lifecycle of Helm charts, while the kruise-rollout addon upgrades workload instances and switches traffic during the upgrade. By combining the two addons, full lifecycle management and canary upgrades of Helm applications are achieved without any changes to your Helm Chart. You can also write an addon for your own scenario to cover more specialized cases or processes.

Thanks to KubeVela's powerful extensibility, you can not only combine these addons flexibly, but also dynamically swap the underlying capability implementations for different platforms or environments while keeping the upper-layer application unchanged. For example, if you prefer argocd over fluxcd for deploying Helm applications, you can enable the argocd addon to achieve the same functionality, with no changes or migration needed for the upper-layer Helm application.

The KubeVela community already provides dozens of addons that help the platform extend capabilities in observability, GitOps, FinOps, rollout and many other areas.

image

The addon repository is at https://github.com/kubevela/catalog. If you are interested in addons, you are very welcome to submit your own custom addons to this repository and contribute new ecosystem capabilities to the community!

· One min read

You may have learned from this blog that we can use vela with the terraform addon to manage cloud resources (such as s3 buckets, AWS EIPs, etc.). We can create an application that contains some cloud-resource components; the application will provision these cloud resources, and then we can manage them with vela.

Sometimes we already have Terraform cloud resources that were created and are managed by the Terraform binary or another program. To gain the benefits of managing cloud resources with KubeVela, or simply to stay consistent in how we manage cloud resources, we may want to import these existing Terraform cloud resources into KubeVela and manage them with vela. But if we simply create an application describing these cloud resources, they will be re-created, possibly causing errors. To solve this problem, we built a simple backup_restore tool. This blog will show you how to use the backup_restore tool to import existing Terraform cloud resources into KubeVela.

· One min read

Helm Charts are a very popular way of packaging software today; in its marketplace you can find nearly ten thousand applications packaged for cloud-native environments. However, in today's hybrid-cloud, multi-cluster environments, businesses increasingly need to deploy to different clusters and different environments with different configurations. In such environments, relying on the Helm tool alone may not achieve flexible deployment and delivery.

In this article, we introduce how KubeVela solves Helm Chart deployment in multi-cluster environments. If you don't have multiple clusters at hand, don't worry: we introduce a lightweight deployment method that depends only on Docker or a Linux system, letting you easily experience the multi-cluster capabilities. Of course, KubeVela also fully supports Helm Chart delivery in a single cluster.

· One min read

If you are looking for something to glue the Terraform ecosystem together with the Kubernetes world, congratulations! This blog has the answer you want.

As major cloud providers expand their product portfolios, basic computing facilities, middleware services, big-data/AI services, and application operation services can all be used directly by enterprises and developers. We have noticed that quite a few enterprises build their own infrastructure platforms on top of services from different cloud providers. To manage cloud services more efficiently and uniformly, the IaC philosophy has flourished in recent years, and Terraform in particular has been adopted and supported by almost all cloud providers, forming a cloud-service IaC ecosystem centered on the Terraform model. Now that Kubernetes is prevalent, IaC is given even larger imagination space: if Terraform's IaC capabilities and ecosystem achievements are integrated into the Kubernetes world, we see this as a powerful combination of strengths.

· One min read
孙健波,曾庆国

KubeVela is a modern software delivery control plane, aiming to make application deployment and operation simpler, more agile and more reliable in today's hybrid multi-cloud environments. Since the 1.1 release, KubeVela's architecture has natively addressed the delivery challenges enterprises face in hybrid multi-cloud environments, and it provides ample extensibility around the OAM model, winning the love of a large number of enterprise developers, which in turn has kept accelerating KubeVela's iteration speed.

In version 1.2 we released an out-of-the-box visual console, letting end users publish and manage diverse workloads through a UI. Version 1.3 completed the extension system centered on the OAM model, provided rich addon capabilities, delivered a large number of enterprise-grade features including LDAP authentication, and brought great convenience for enterprise integration. So far, you can obtain more than 30 addons from the KubeVela community's addon center, including well-known CNCF projects such as argocd, istio and traefik, database middleware such as flink and mysql, and hundreds of cloud-provider resources ready for direct use.

In the 1.4 release, centered on making application delivery safer, easier to get started with, and more transparent, we added core features including multi-cluster authentication and authorization, complex resource topology display, and one-click installation of the control plane, comprehensively hardening delivery security in multi-tenant scenarios, improving the consistency of the application development and delivery experience, and making the delivery process more transparent.