We started evaluating Google Cloud for one of our customers, who was new to the cloud and wanted a comprehensive comparison of the services and pricing of the leading cloud providers. We ended up choosing GCP for their needs and, in the process, delved deep into the world of cloud computing and Google's cloud offerings. With the knowledge gained, our team designed and conducted an internal workshop to onboard developers to cloud computing, followed by an introduction to GCP components and hands-on sessions on key GCP components.
This blog is the brief transcript of the presentation on the Overview of GCP components. We’ve put together the most important details and resources of the presentation in this blog to help developers and architects better understand Google’s role as a cloud provider.
Google Cloud Platform (GCP) has grown into one of the premier cloud computing platforms on the market today. While it still trails its top competitors, Amazon Web Services (AWS) and Microsoft Azure, Google is holding its own in the cloud wars and continues to make investments in GCP that make the product more attractive to big customers and SMBs alike.
What is Google Cloud Platform (GCP)?
Google Cloud Platform, as the name implies, is a cloud computing platform that provides infrastructure tools and services for users to build on top of.
In 2008, to capture the growing interest in web applications, Google launched Google App Engine, a Platform as a Service (PaaS) cloud tool that allowed developers to build and host their apps on Google’s infrastructure. App Engine struggled early on, due to the fact that it didn’t support certain key developer languages.
Google then released a host of complementary tools, such as its data storage layer and its Infrastructure as a Service (IaaS) component known as the Google Compute Engine, which supports the use of virtual machines. After growing as an IaaS provider, Google added additional products including a load balancer, DNS, monitoring tools, and data analysis services, making GCP better able to compete in the cloud market and increasing its market share.
When was GCP announced?
Google announced its first cloud tool, Google App Engine, back in 2008, and it continued to add more tools and services until they collectively became known as the Google Cloud Platform.
How can I use GCP?
Google has provided documentation for getting started and a frequently asked questions page for developers and IT leaders to investigate the platform.
Who does GCP affect?
Any organization in need of cloud computing should consider Google Cloud Platform for their needs—especially SMBs, which the platform was initially geared toward.
Why does GCP matter?
Google Cloud Platform is regarded as the third biggest cloud provider in terms of revenue, behind AWS in first place and Microsoft Azure in second.
GCP is primarily a public cloud provider. Google does have a network of private cloud providers that can help users build out a hybrid cloud deployment, but its proprietary space is the public cloud. The platform also has a host of other partners that provide additional services.
While AWS and Microsoft consistently push each other to lower prices, Google follows its own pricing pattern and routinely boasts that it offers the lowest cost of the three providers. However, Google really differentiates itself in its services.
10,000 Feet overview of GCP components
The overview of GCP components in this blog is a bird’s eye view of all the services that GCP offers.
Current GCP products span the following 13 categories:
- Compute – App Engine, Compute Engine, Kubernetes Engine, Cloud Functions (beta)
- Storage & Databases – Cloud Storage, Cloud Bigtable, Cloud SQL, Cloud Datastore, and more
- Networking – Virtual Private Cloud (VPC), Cloud Load Balancing, Network Service Tiers, Cloud Armor, and more
- Big Data – BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Pub/Sub, and more
- Cloud AI – Cloud Machine Learning Engine, Cloud TPU, Cloud AutoML (beta), various machine learning APIs
- Identity & Security – Cloud Identity, Cloud IAM, Security Key Enforcement, Cloud Security Scanner, Cloud Resource Manager, and more
- Management Tools – Stackdriver Overview, Monitoring, Trace, Logging, Debugger, Cloud Console, and more
- Developer Tools – Cloud SDK, Container Registry, Container Builder, Cloud Test lab, and more
- API Platform and Ecosystems – Google Maps Platform, API Analytics, API Monetization, Cloud Endpoints, and more
- Data Transfer – Google Transfer Appliance, Cloud Storage Transfer Service, Google BigQuery Transfer Service
- Productivity Tools – G Suite, Hire, Chrome, Android
- Professional Services – Consulting, Technical Account Management, Training, Certification, and more
- Internet of Things – Cloud IoT Core
Scenario – Building an e-commerce application in GCP
What we might be used to thinking of as software and hardware products, become services in cloud computing. These services provide access to the underlying resources. The list of available GCP services is long, and it keeps growing. When we develop our website or application on GCP, we mix and match these services into combinations that provide the infrastructure we need, and then add our code to enable the scenarios we want to build.
Let us take, for instance, a scenario where we want to build an eCommerce application from scratch.
Let us take the typical requirements of such an application one by one and see how we can map them to GCP components.
An e-commerce application will need storage in multiple forms. First, to store structured information such as product details: category, name, description, price, specifications, and so on.
This product information can be stored in either a SQL or a NoSQL database. It's an architectural decision, and GCP provides services for both SQL and NoSQL solutions.
Cloud SQL is a fully managed PostgreSQL and MySQL database service that makes it easy to set up, maintain, manage, and administer our relational databases in the cloud. Cloud SQL offers high performance, scalability, and convenience. Hosted on Google Cloud Platform, Cloud SQL provides a database infrastructure for applications running anywhere.
Cloud Datastore is a highly scalable NoSQL database for our applications. Cloud Datastore automatically handles sharding and replication, providing us with a highly available and durable database that scales automatically to handle our applications' load. Cloud Datastore provides a myriad of capabilities such as ACID transactions, SQL-like queries, indexes, and much more.
Second, storage pertains to product images, videos, and other large objects that are not structured information per se, but can be organized into buckets and referenced from the application directly or from the databases.
Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. We can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.
The data is stored in buckets and can be imported either online through APIs/SDKs or offline through disks. The data can be protected with access control lists (ACL) and supports versioning and change notification.
A bucket has three properties that we specify when we create it: a globally unique name, a location where the bucket and its contents are stored, and a default storage class for objects added to the bucket.
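Since bucket names live in a single global namespace, it helps to check a candidate name before trying to create the bucket. The following is a simplified sketch of the naming rules; the full rules have additional constraints (for example, names cannot begin with a `goog` prefix), so treat the regex as an approximation:

```python
import re

# Simplified Cloud Storage bucket naming rules: 3-63 characters,
# lowercase letters, digits, dots, dashes, and underscores, starting
# and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")

def is_plausible_bucket_name(name: str) -> bool:
    """Loose local check; the service itself is the final authority."""
    return bool(_BUCKET_RE.fullmatch(name))

print(is_plausible_bucket_name("my-product-images"))  # True
print(is_plausible_bucket_name("MyBucket"))           # False: uppercase
```

A check like this catches obviously invalid names early, before a round trip to the API.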
Cloud Storage offers four storage classes: Multi-Regional Storage, Regional Storage, Nearline Storage, and Coldline Storage. All storage classes offer low latency (time to first byte typically tens of milliseconds) and high durability. The classes differ by their availability, minimum storage durations, and pricing for storage and access.
Nearline and Coldline storage classes can be used for less frequently accessed data such as backup and disaster recovery archives. These provide very low-cost storage but charge for data access.
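The trade-off between the four classes can be captured in a small helper. The thresholds below are illustrative assumptions based on the guidance above (Nearline for data accessed less than once a month, Coldline for data accessed roughly less than once a year), not official rules:

```python
def pick_storage_class(accesses_per_month: float, multi_region: bool) -> str:
    """Rough mapping of access frequency to a Cloud Storage class.

    The thresholds are illustrative, not official guidance.
    """
    if accesses_per_month >= 1:
        # Frequently accessed ("hot") data
        return "Multi-Regional" if multi_region else "Regional"
    if accesses_per_month >= 1 / 12:  # roughly once a year or more
        return "Nearline"
    return "Coldline"

print(pick_storage_class(30, multi_region=True))    # frequently accessed, global
print(pick_storage_class(0.5, multi_region=False))  # less than once a month
print(pick_storage_class(0.01, multi_region=False)) # rarely accessed archive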
Comparing storage options
As we saw here, whatever our application is, we’ll need to store some data. GCP provides a variety of storage services, including:
- A SQL database in Cloud SQL, which provides either MySQL or PostgreSQL databases.
- A fully managed, mission-critical, relational database service in Cloud Spanner that offers transactional consistency at global scale, schemas, SQL querying, and automatic, synchronous replication for high availability.
- Two options for NoSQL data storage: Cloud Datastore and Cloud Bigtable.
- Consistent, scalable, large-capacity data storage in Cloud Storage, which comes in several flavors:
  - Multi-Regional provides maximum availability and geo-redundancy.
  - Regional provides high availability and a localized storage location.
  - Nearline provides low-cost archival storage ideal for data accessed less than once a month.
  - Coldline provides the lowest-cost archival storage for backup and disaster recovery.
- Persistent disks on Compute Engine, for use as primary storage for our instances. Compute Engine offers both hard-disk-based persistent disks, called standard persistent disks, and solid-state persistent disks (SSDs).
Here is a flowchart on choosing a storage option for our application based on requirements:
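The flowchart's branching logic can be sketched as a small decision function. This is an illustrative reading of GCP's guidance with simplified inputs, not an official decision procedure:

```python
def choose_storage_service(structured: bool, relational: bool,
                           global_scale: bool, analytical: bool) -> str:
    """Simplified sketch of the storage decision flow."""
    if not structured:
        return "Cloud Storage"       # blobs: images, videos, backups
    if relational:
        # Cloud Spanner for globally consistent, horizontally scalable
        # relational data; Cloud SQL otherwise.
        return "Cloud Spanner" if global_scale else "Cloud SQL"
    # NoSQL: heavy analytical/time-series workloads suit Bigtable,
    # general application data suits Datastore.
    return "Cloud Bigtable" if analytical else "Cloud Datastore"

# Product catalog: structured, relational, single region
print(choose_storage_service(True, True, False, False))   # Cloud SQL
# Product images: unstructured blobs
print(choose_storage_service(False, False, False, False)) # Cloud Storage
```

For our e-commerce scenario, the product catalog maps to Cloud SQL (or Datastore, per the architectural decision above) and the media assets map to Cloud Storage.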
Computing and Hosting Services
GCP gives us options for computing and hosting. We can choose to:
- Work in a serverless environment.
- Use a managed application platform.
- Leverage container technologies to gain lots of flexibility.
- Build our own cloud-based infrastructure to have the most control and flexibility.
We can imagine a spectrum where, at one end, we have most of the responsibilities for resource management and, at the other end, Google has most of those responsibilities.
GCP’s unmanaged compute service is Google Compute Engine. We can think of Compute Engine as providing infrastructure as a service (IaaS), because the system provides a robust computing infrastructure, but we must choose and configure the platform components that we want to use. With Compute Engine, it’s our responsibility to configure, administer, and monitor the systems. Google will ensure that resources are available, reliable, and ready for us to use, but it’s up to us to provision and manage them. The advantage here is that we have complete control of the systems and unlimited flexibility.
When we build on Compute Engine, we can:
- Use virtual machines (VMs), called instances, to build our application, much like we would if we had our own hardware infrastructure. We can choose from a variety of instance types to customize our configuration to meet our needs and our budget.
- Choose which global regions and zones to deploy our resources in, giving us control over where our data is stored and used.
- Choose which operating systems, development stacks, languages, frameworks, services, and other software technologies we prefer.
- Create instances from public or private images.
- Use GCP storage technologies or any third-party technologies we prefer.
- Use Google Cloud Launcher to quickly deploy pre-configured software packages. For example, we can deploy a LAMP or MEAN stack with just a few clicks.
- Create instance groups to more easily manage multiple instances together.
- Use auto-scaling with an instance group to automatically add and remove capacity.
- Attach and detach disks as needed.
- Use SSH to connect directly to our instances.
With container-based computing, we can focus on our application code, instead of on deployments and integration into hosting environments. Google Kubernetes Engine, GCP’s containers as a service (CaaS) offering, is built on the open source Kubernetes system, which gives us the flexibility of on-premises or hybrid clouds, in addition to GCP’s public cloud infrastructure.
When we build with Kubernetes Engine, we can:
- Create and manage groups of Compute Engine instances running Kubernetes, called clusters. Kubernetes Engine uses Compute Engine instances as nodes in a cluster. Each node runs the Docker runtime, a Kubernetes node agent that monitors the health of the node, and a simple network proxy.
- Declare the requirements for our Docker containers by creating a simple JSON configuration file.
- Use Google Container Registry for secure, private storage of Docker images. We can push images to our registry and then pull images to any Compute Engine instance or our own hardware by using an HTTP endpoint.
- Create single- and multi-container pods. Each pod represents a logical host that can contain one or more containers. Containers in a pod work together by sharing resources, such as networking resources. Together, a set of pods might comprise an entire application, a micro-service, or one layer in a multi-tier application.
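As an illustration of the declarative configuration mentioned above, here is a sketch that builds a minimal single-container pod manifest as JSON. The field names follow the Kubernetes v1 Pod API; the service name, image path, and port are hypothetical values for our e-commerce scenario:

```python
import json

def pod_manifest(name: str, image: str, port: int) -> str:
    """Build a minimal single-container Kubernetes Pod spec as JSON."""
    manifest = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "ports": [{"containerPort": port}],
            }]
        },
    }
    return json.dumps(manifest, indent=2)

# Hypothetical order service image hosted in Container Registry
print(pod_manifest("order-service", "gcr.io/my-project/order-service:v1", 8080))
```

In practice we would hand a manifest like this to the cluster (e.g. via `kubectl apply`), and Kubernetes Engine would schedule the pod onto one of the cluster's Compute Engine nodes.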
Google App Engine is GCP’s platform as a service (PaaS). With App Engine, Google handles most of the management of the resources for us. For example, if our application requires more computing resources because traffic to our website increases, Google automatically scales the system to provide those resources. If the system software needs a security update, that’s handled for us, too.
When we build our app on App Engine, we can:
- Build our app on top of the App Engine standard environment runtimes in the languages that the standard environment supports, including Python 2.7, Java 7 and 8, PHP 5.5, and Go 1.6 and 1.8.
- Build our app on top of the App Engine flexible environment runtimes in the languages that App Engine flexible supports, including: Python 2.7/3.6, Java 8, Go 1.8, Node.js, PHP 5.6, 7, .NET, and Ruby. Or use custom runtimes to use an alternative implementation of a supported language or any other language.
- Let Google manage app hosting, scaling, monitoring, and infrastructure for us.
- Use the App Engine SDK to develop and test on our local machine in an environment that simulates App Engine on GCP.
- Easily use the storage technologies that App Engine is designed to support in the standard and flexible environments.
- Google Cloud SQL is a SQL database, supporting either MySQL or PostgreSQL. Google Cloud Datastore is a schemaless, NoSQL datastore. Google Cloud Storage provides space for our large files.
- In the standard environment, we can also choose from a variety of third-party databases to use with our applications such as Redis, MongoDB, Cassandra, and Hadoop.
- In the flexible environment, we can easily use any third-party database supported by our language, if the database is accessible from the Google App Engine instance.
- In either environment, these third-party databases can be hosted on Compute Engine, hosted on another cloud provider, hosted on-premises, or managed by a third-party vendor.
Comparing Compute options
Here is a flowchart on comparing compute options in GCP based on our requirements:
Our eCommerce application is going to host multiple microservices (Order Service, Inventory Service, Notification Service, User service) which need to talk to each other asynchronously and need a channel for their communication.
Cloud Pub/Sub is an asynchronous messaging service. Our application can send messages, as JSON data structures, to a publishing unit called a topic. Because Cloud Pub/Sub topics are a global resource, other applications in projects that we own can subscribe to the topic to receive the messages in HTTP request or response bodies.
Cloud Pub/Sub’s usefulness isn’t confined to big data. We can use Cloud Pub/Sub in many circumstances where we need an asynchronous messaging service.
The following diagram illustrates the Topic and Subscription part of Cloud Pub/Sub and asynchronous one to many and many to one communication of the e-commerce services.
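The mechanics above can be made concrete with a small sketch. Pub/Sub topics are identified by fully qualified resource names, and message payloads are raw bytes, so we serialize the JSON ourselves. The project and topic names below are hypothetical; the actual publish call (via the google-cloud-pubsub client) is only described in a comment:

```python
import json

def topic_path(project_id: str, topic_id: str) -> str:
    """Fully qualified Pub/Sub topic names follow a fixed format."""
    return f"projects/{project_id}/topics/{topic_id}"

def encode_order_event(order_id: str, status: str) -> bytes:
    """Pub/Sub payloads are bytes, so we JSON-encode the event ourselves."""
    return json.dumps({"order_id": order_id, "status": status}).encode("utf-8")

# With the google-cloud-pubsub client library, publishing would look
# roughly like: publisher.publish(topic_path(...), encode_order_event(...)),
# and the Inventory and Notification services would each hold a
# subscription on the same topic.
print(topic_path("ecommerce-demo", "order-events"))
print(encode_order_event("ORD-1001", "PLACED"))
```

Because the topic is a global resource, the Order service can publish once and every subscribed service receives its own copy of the message.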
We have built our application, decided on storage, used Cloud Pub/Sub for asynchronous communication, and deployed the microservices using the compute options discussed. How are all these components connected in Google’s cloud? How is the network managed?
Let us look at GCP’s network components to see how all these are connected.
Geography of the cloud
GCP consists of a set of physical assets, such as computers and hard disk drives, and virtual resources, such as virtual machines (VMs), that are contained in Google’s data centers around the globe. Each data center location is in a global region. Regions include Central US, Western Europe, and East Asia. Each region is a collection of zones, which are isolated from each other within the region. Each zone is identified by a name that combines a letter identifier with the name of the region. For example, zone a in the East Asia region is named asia-east1-a.
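The naming convention described above is mechanical enough to parse: a zone name is its region name plus a single-letter suffix. A small sketch makes the relationship explicit:

```python
def region_of(zone: str) -> str:
    """Derive the region from a zone name like 'asia-east1-a'.

    Zone names append a single-letter identifier to the region name,
    so stripping the final '-<letter>' yields the region.
    """
    region, _, letter = zone.rpartition("-")
    assert len(letter) == 1, f"unexpected zone name: {zone}"
    return region

print(region_of("asia-east1-a"))   # asia-east1
print(region_of("us-central1-f"))  # us-central1
```

Keeping this region/zone structure in mind matters when placing resources, as the scoping rules below show.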
This distribution of resources provides several benefits, including redundancy in case of failure and reduced latency by locating resources closer to clients. This distribution also introduces some rules about how resources can be used together.
Some resources can be accessed by any other resource, across regions and zones. These global resources include pre-configured disk images, disk snapshots, and networks. Some resources can be accessed only by resources that are located in the same region. These regional resources include static external IP addresses. Other resources can be accessed only by resources that are located in the same zone. These zonal resources include VM instances, their types, and disks.
The scope of an operation varies depending on what kind of resources you’re working with. For example, creating a network is a global operation because a network is a global resource while reserving an IP address is a regional operation because the address is a regional resource.
As we start to optimize our GCP applications, it’s important to understand how these regions and zones interact. For example, even if we could, we wouldn’t want to attach a disk in one region to a computer in a different region, because the latency we’d introduce would make for very poor performance. Thankfully, GCP won’t let us do that; disks can only be attached to computers in the same zone.
Depending on the level of self-management required for the computing and hosting service we choose, we might or might not need to think about how and where resources are allocated.
Virtual Private Cloud (VPC)
We can scale our applications on Google Compute Engine from zero to full-throttle with Google Cloud Load Balancing, with no pre-warming needed. We can distribute our load-balanced compute resources in single or multiple regions, close to our users and to meet our high availability requirements.
Virtual Private Cloud (VPC) gives us the flexibility to scale and control how workloads connect regionally and globally. When we connect our on-premises or remote resources to GCP, we will have global access to our VPCs without needing to replicate connectivity or administrative policies in each region.
A single Google Cloud VPC can span multiple regions without communicating across the public Internet. A single connection point between the VPC and on-premises resources provides global VPC access, reducing cost and complexity.
With a single VPC for an entire organization, teams can be isolated within projects, with separate billing and quotas, yet still maintain a shared private IP space and access to commonly used services such as VPN or Cloud Interconnect.
If our website or application is running on Compute Engine, the time might come when we are ready to distribute the workload across multiple instances. Compute Engine’s server-side load balancing features provide us with the following options:
- Network load balancing lets us distribute traffic among server instances in the same region based on incoming IP protocol data, such as address, port, and protocol. Network load balancing is a great solution if, for example, we want to meet the demands of increasing traffic to our website.
- HTTP/HTTPS load balancing enables us to distribute traffic across regions so we can ensure that requests are routed to the closest region or, in the event of a failure or over-capacity limitations, to a healthy instance in the next closest region. We can also use HTTP/HTTPS load balancing to distribute traffic based on content type. For example, we might set up our servers to deliver static content, such as images and CSS, from one server and dynamic content, such as PHP pages, from a different server. The load balancer can direct each request to the server that provides each content type.
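The content-based routing idea can be sketched as a tiny function, in the spirit of what an HTTP(S) load balancer's URL map does. The backend names and suffix list are hypothetical, chosen only to mirror the static/dynamic split described above:

```python
def pick_backend(path: str) -> str:
    """Route a request path to a backend by content type (illustrative)."""
    static_suffixes = (".png", ".jpg", ".css", ".js")
    if path.endswith(static_suffixes):
        return "static-content-backend"   # images, stylesheets, scripts
    return "dynamic-content-backend"      # application-generated pages

print(pick_backend("/img/logo.png"))  # static-content-backend
print(pick_backend("/checkout"))      # dynamic-content-backend
```

In a real deployment this mapping lives in the load balancer's configuration rather than in application code, so the split can change without redeploying the services.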
Google Cloud DNS is a scalable, reliable, and managed authoritative Domain Name System (DNS) service running on the same infrastructure as Google. It has low latency and high availability, and is a cost-effective way to make our applications and services available to our users. Cloud DNS translates requests for domain names like www.google.com into IP addresses. Cloud DNS is programmable. We can easily publish and manage millions of DNS zones and records using its simple user interface, command-line interface, or API.
Google Cloud CDN leverages Google’s globally distributed edge points of presence to accelerate content delivery for websites and applications served out of Google Compute Engine and Google Cloud Storage. Cloud CDN lowers network latency, offloads origins, and reduces serving costs. Once you’ve set up HTTP(S) Load Balancing, simply enable Cloud CDN with a single checkbox.
IAM – security
Cloud Identity & Access Management (Cloud IAM) lets administrators authorize who can take action on specific resources, giving us full control and visibility to manage cloud resources centrally. For established enterprises with complex organizational structures, hundreds of workgroups, and potentially many more projects, Cloud IAM provides a unified view into security policy across our entire organization, with built-in auditing to ease compliance processes.
The IAM policies are defined by ‘WHO’ ‘CAN DO’ ‘WHAT’ on a resource.
The WHO could be any user with a valid Google, G Suite, GCP, or Google Groups account, or a service account. The CAN DO defines what the user can do, in terms of the roles and permissions they possess.
The WHAT defines the resource on which these roles and permissions are applied.
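The WHO / CAN DO / WHAT triple maps directly onto the structure of an IAM policy: members (who), a role (can do), and the resource the policy is attached to (what). Below is a sketch of that JSON shape; the member identities and project name are hypothetical:

```python
import json

# An IAM policy is a list of bindings; each binding grants one role
# (CAN DO) to a set of members (WHO). The policy itself is attached
# to a resource (WHAT), such as a project or a Cloud Storage bucket.
policy = {
    "bindings": [
        {
            "role": "roles/storage.objectViewer",  # CAN DO
            "members": [                            # WHO
                "user:alice@example.com",
                "serviceAccount:ci@my-project.iam.gserviceaccount.com",
            ],
        }
    ]
}
print(json.dumps(policy, indent=2))
```

Granting the role on a bucket (rather than the whole project) keeps the grant as narrow as the WHAT allows.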
Any GCP resources that we allocate and use must belong to a project. We can think of a project as the organizing entity for what we’re building. A project is made up of the settings, permissions, and other metadata that describe our applications. Resources within a single project can work together easily, for example by communicating through an internal network, subject to the regions-and-zones rules. The resources that each project contains remain separate across project boundaries; we can only interconnect them through an external network connection.
Each GCP project has:
- A project name, which we provide.
- A project ID, which we can provide or GCP can provide for us.
- A project number, which GCP provides.
As we work with GCP, we will use these identifiers in certain command lines and API calls.
Once we have created a project, we can delete it, but its ID can never be used again. A project serves as a namespace: every resource within each project must have a unique name, but we can usually reuse resource names if they are in separate projects. Some resource names must be globally unique.
IAM Role Types
There are three types of roles in Cloud IAM:
- Primitive roles, which include the Owner, Editor, and Viewer roles that existed prior to the introduction of Cloud IAM
- Predefined roles, which provide granular access for a specific service and are managed by GCP
- Custom roles, which provide granular access according to a user-specified list of permissions
Higher level services
Google continues to add higher-level services, such as those related to big data and machine learning, to its cloud platform. Google big data services include those for data processing and analytics, such as Google BigQuery for SQL-like queries made against multi-terabyte data sets. In addition, Google Cloud Dataflow is a data processing service intended for analytics; extract, transform and load (ETL); and real-time computational projects. The platform also includes Google Cloud Dataproc, which offers Apache Spark and Hadoop services for big data processing.
For artificial intelligence (AI), Google offers its Cloud Machine Learning Engine, a managed service that enables users to build and train machine learning models. Various APIs are also available for the translation and analysis of speech, text, images and videos.
Google also provides services for IoT, such as Google Cloud IoT Core, which is a series of managed services that enables users to consume and manage data from IoT devices.
The Google Cloud Platform suite of services is always evolving, and Google periodically introduces, changes or discontinues services based on user demand or competitive pressures. Google’s main competitors in the public cloud computing market include Amazon Web Services (AWS) and Microsoft Azure.
GCP Pricing
GCP provides sustained use discounts starting from a month of usage, per-second billing, and various other concessions: innovative pricing that is compelling both to newcomers to the cloud and to existing cloud users considering a migration to Google Cloud.
Corresponding AWS components
For developers who are familiar with AWS components, here is a list of their corresponding counterparts in GCP.
Cloud SQL <-> RDS
Cloud Datastore <-> DynamoDB
Cloud Storage <-> S3
Compute Engine <-> EC2 instances
App Engine <-> Elastic Beanstalk
Kubernetes Engine <-> ECS / EKS
When compared to Amazon and Microsoft, Google is different in its approach to open source. Unlike its competitors, Google is one of the largest contributors to OSS. During the last decade, it created over 2000 open source projects. Android, Angular, Chromium and Go are some of the most successful OSS projects from Google.
When Google got serious about getting a foothold in the cloud industry, it turned to open source as the key differentiator. Three OSS projects that Google announced in the recent past have become the foundation of Google Cloud Platform: Kubernetes, Apache Beam, and TensorFlow. They are not only acknowledged by the industry and competitors but are also driving adoption of its cloud platform. Right from day one of announcing these projects, Google had a clear strategy to align them with its cloud offerings.