However, some ADBMSs have a minimum charge policy. For example, Snowflake bills a minimum of one minute of compute time, even if a query takes only 10 seconds. Other systems, such as Redshift, require users to pay upfront regardless of whether they are running queries at all. Such pricing models force users to consider the economic implications of resource allocation and query execution time, and they may not justify using more processors simply to keep reducing the overall execution time. That is, a minimum cost structure introduces an economic constraint that must be considered when determining the optimal allocation of computing resources.
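The effect of a minimum charge is easy to state concretely. The sketch below is a minimal Python model of a per-second billing rate combined with a minimum billing window; the 60-second minimum mirrors the Snowflake policy described above, but the rate itself is an illustrative assumption, not a vendor quote.

```python
# Minimal model of a per-second compute rate with a minimum billing
# window. The 60-second minimum mirrors the Snowflake policy above;
# the rate is an illustrative assumption, not a vendor quote.

def billed_cost(exec_seconds: float,
                rate_per_second: float = 0.0008,
                minimum_seconds: float = 60.0) -> float:
    """Charge for the actual runtime, but never less than the minimum."""
    return max(exec_seconds, minimum_seconds) * rate_per_second

# A 10-second query is billed as a full minute, so it costs exactly
# as much as a 60-second query:
assert billed_cost(10) == billed_cost(60)
```

Under such a policy, shaving a 10-second query down to 5 seconds saves nothing, while the same speedup applied to a query running well past the minimum window saves real money.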
Gustafson-Barsis' Law focuses on maximizing the
benefits of scaling up to improve performance.
However, in the presence of a minimum cost, it
becomes crucial to strike a balance between the
desired performance gains and the associated costs.
When deciding on the allocation of computing
resources, it is essential to consider the cost
implications at different scales. Scaling up resources
may lead to better performance, but it also increases
costs proportionally.
For some systems, underutilization of resources leads to unnecessary waste and drives up costs, while for others, overutilization results in diminishing returns relative to cost. Gustafson-Barsis' Law only tells us to add more processors to reduce the overall execution time; we must keep in mind that adding processors incurs additional cost. However, when the problem size increases, not adding more processors may also incur unnecessary additional costs because of the impact of storage limitations on query execution time. This is primarily because smaller warehouses, or smaller processor counts, have less storage capacity, which can cause data to spill into secondary storage: lower-cost, higher-capacity storage options with higher latency and slower read/write speeds. Such spillage can significantly degrade query performance and can even crash the entire ADBMS server.
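To make the storage-limitation argument concrete, here is a deliberately simplified back-of-the-envelope model (our own assumption, not any vendor's published formula): bytes that fit in the warehouse's memory are scanned at a fast rate, and bytes that spill to secondary storage are scanned at a much slower rate.

```python
# Illustrative spill model: data that fits in memory is scanned at a
# fast rate; the excess spills to secondary storage and moves at a
# much slower rate. All throughput numbers are assumptions.

def scan_time_seconds(data_gb: float,
                      memory_gb: float,
                      mem_throughput_gbps: float = 10.0,
                      spill_throughput_gbps: float = 1.0) -> float:
    """Estimate scan time; spilled bytes move at the slower rate."""
    in_memory = min(data_gb, memory_gb)
    spilled = max(data_gb - memory_gb, 0.0)
    return in_memory / mem_throughput_gbps + spilled / spill_throughput_gbps

# Once spilling starts, doubling memory cuts time by far more than 2x:
t_small = scan_time_seconds(100, memory_gb=50)   # 5 s in memory + 50 s spilled
t_large = scan_time_seconds(100, memory_gb=100)  # 10 s, no spill
```

In this toy model the larger warehouse is more than five times faster on the same 100 GB scan, which is exactly the nonlinear penalty that makes "fewer processors is always cheaper" a false economy.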
We therefore need to evaluate the cost-effectiveness of different resource allocation strategies. This evaluation should be performed on a case-by-case basis, making informed choices that maximize performance while minimizing costs. The bottom line is that neither Amdahl's Law nor Gustafson's Law is sufficient to guide us in answering the question "how many processors should we use to gain better performance economically?"
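The open question can at least be framed as a small optimization. The sketch below combines a Gustafson-style scaled-speedup model with a hypothetical per-processor-second price and a one-minute billing minimum, then picks the cheapest processor count that still meets a deadline; every number in it is an illustrative assumption.

```python
# Frames "how many processors, economically?" as a tiny search.
# Speedup follows Gustafson-Barsis' Law, S(p) = p - alpha*(p - 1);
# the price, deadline, and one-minute minimum are all hypothetical.

def gustafson_speedup(p: int, alpha: float) -> float:
    """Scaled speedup with serial fraction alpha."""
    return p - alpha * (p - 1)

def runtime(p: int, base_seconds: float, alpha: float) -> float:
    return base_seconds / gustafson_speedup(p, alpha)

def cost(p: int, base_seconds: float, alpha: float,
         price_per_proc_second: float = 0.001,
         min_billed_seconds: float = 60.0) -> float:
    """Each processor is billed for at least the minimum window."""
    t = runtime(p, base_seconds, alpha)
    return p * max(t, min_billed_seconds) * price_per_proc_second

# Cheapest processor count that still meets a 120-second deadline for
# a 600-second single-processor job with a 5% serial fraction:
candidates = [1, 2, 4, 8, 16, 32, 64]
feasible = [p for p in candidates if runtime(p, 600, 0.05) <= 120]
best = min(feasible, key=lambda p: cost(p, 600, 0.05))
```

In this toy setting the answer is 8 processors: 16 or more finish even faster, but each processor is then billed for the full minimum window, so the extra speed only raises the bill. The point is not these particular numbers but that the speedup law alone cannot produce the answer; the pricing model must enter the calculation.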
3 POPULAR ADBMSs
The authors have worked with ADBMSs for more than 10 years, starting with Vertica, an ADBMS that runs either on-premises or in an enterprise’s own Virtual Private Cloud; that is, it does not deliver database functionality as a service. ADBMSs are not designed to support OLTP operations. Instead, they provide very fast response times for aggregate queries, known as OLAP queries (Heavy.AI, 2023). ADBMSs are generally more scalable and distributed, use columnar data stores, and heavily compress their data. In addition, because the data is distributed, they naturally exploit concurrency for performance improvement (Heavy.AI, 2023).
In this section, we briefly discuss three extremely popular cloud-based ADBMSs: Google’s BigQuery, Amazon’s Redshift, and Snowflake, all of which are fully managed cloud services, meaning their vendors handle the underlying infrastructure, such as hardware provisioning, software patching, and backup management. We select these three because we have either used them in the past or are still using them. Performance- and cost-related discussions and analyses are available elsewhere (Fraser, 2022), so we will not repeat the same experiments. Instead, we provide our position on selecting an ADBMS at the end of the section.
All three ADBMSs discussed here, and most of the other systems on the market, leverage columnar storage and Massively Parallel Processing (MPP) to deliver high-performance analytics. Columnar storage offers advantages for analytical workloads because it enables efficient compression, improves query performance, and reduces I/O requirements. By storing each column's values together, ADBMSs can read and process only the necessary columns and apply algorithms that exploit the compression type, leading to faster query execution. ADBMSs use an MPP architecture to distribute query processing across multiple computing nodes. This parallel processing capability allows ADBMSs to handle large datasets and complex analytical queries efficiently. By combining columnar storage and MPP, ADBMSs achieve faster query performance, efficient data compression, and the scalability to handle large volumes of data. These features make ADBMSs well suited for analytical workloads that require complex queries, data aggregations, and ad hoc analysis.
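A toy example illustrates why the columnar layout pays off for aggregates. The pure-Python sketch below (not a real ADBMS engine; all data and names are made up for illustration) contrasts a row layout with a column layout and shows a simple run-length encoding of a low-cardinality column.

```python
# Toy illustration of columnar storage: an aggregate touches only the
# column it needs, and run-length encoding (RLE) compresses a sorted,
# low-cardinality column well. Data and names are made up.

rows = [("us", 10), ("us", 12), ("eu", 7), ("eu", 9)]

# Row store: computing sum(amount) still walks every full row.
row_total = sum(amount for _country, amount in rows)

# Column store: each column lives contiguously; the query reads just
# the 'amount' column and never touches 'country'.
columns = {
    "country": ["us", "us", "eu", "eu"],
    "amount": [10, 12, 7, 9],
}
col_total = sum(columns["amount"])

# Run-length encode the sorted 'country' column as (value, count) pairs.
def rle(values):
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

encoded = rle(columns["country"])  # [("us", 2), ("eu", 2)]
```

Real engines push this much further, but the shape of the win is the same: less data read per query, and compression schemes the executor can operate on directly.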