Internet Scale Application using Oracle Database Sharding

Trushar Borse
6 min readApr 22, 2021

Yes, you read it right. I am talking about a huge, incredibly busy Internet/Web Scale application running with Oracle Relational Database at its backend.

While Oracle database has been time tested over decades to run the biggest of OLTP and OLAP implementations (packaged and bespoke stacks) across the industries right from Banks, Stock Exchanges, Insurance, Telecom, Aviation, Manufacturing up to Pharmaceuticals and many more, there have been theories doing the round on whether it (or for that matter the whole community of RDBMS databases that exist) can handle the new age Internet Scale web applications handling E-Commerce, IoT and similar ones with lightning speed and humungous scales.

Much of this thought process has to do with the advent of polyglot persistence (choosing a right persistence store depending on nature of data) and the variety of point functionality NoSQL databases mostly in the last decade or so.

Everything that survives the test of time (even if it’s for a limited span of few years) in this ever evolving technology space, has merits attached to its existence and there is no denying that.

It, therefore, makes it highly necessary to understand the choices we make with such huge and critical implementations from long term roadmap and sustainability perspective.

Photo by Tobias Fischer on Unsplash

Some myths around Oracle Database…

So, what are those myths surrounding Oracle Database when it comes to contemplating it as a serious option for running the modern Internet scale web application.

  • Oracle (RDBMS) does not scale like a NoSQL Database does
  • Oracle does not give the performance expected from a large scale Web Application
  • Oracle does not give the flexibility of dealing with semi and un-structured data whereas NoSQL databases provide the same
  • Oracle does not gel well with Microservices and containerized applications which are the quintessential part of agile development and DevOps practices for modern Web/Mobile Apps.

While all these myths can be debunked with facts, I would oblige to the current topic.

In remainder of this article, I would cover the Scalability and Performance aspect of Oracle Database which are addressed using the concept of Sharding.

(You can read more on Oracle’s Converged Database (Multi-model) story here and how Microservices based applications can be smoothly and in a scalable manner handled by Oracle Database here. Both blogs have been written by a colleague of mine and give a good glimpse in those respective topics.)

Some bits on Sharding in general….

Sharding, as we are aware of, is the most fundamental ingredient of building a Distributed Database.

It’s all about distributing load (scaling out), faster response, more write bandwidth..

It is an architectural pattern wherein data is partitioned horizontally so that each of the data holding nodes participating in the sharded system hosts different subset of data.

Shards implement shared-nothing architecture from infrastructure point of view and hence independent.

Conventionally, Sharding has been implemented by various NoSQL databases to achieve their main goal of Low Latency and High Throughput.

NoSQL databases achieve this by laying highest emphasis on de-normalized data thus allowing repetition and higher storage footprint but ensuring the response time SLAs to the tune of single digit milliseconds are adhered to.

One of the most common observation with the likes of Web, Mobile, Ecommerce applications with massive userbase is their well-defined Data Distribution and Application Model strategies.

In my view, that covers two very important aspects of a Distributed Database.

1. Criterion for distribution of data using a sharding key

2. Data access pattern (purpose built queries e.g. product catalog, search etc.)

Sharding in Oracle Database:

Brings in Linear Scalability, Fault Isolation and Geo-distribution.

There are mainly two types of tables in an Oracle Sharded database.

A sharded table is a table that is partitioned into smaller and more manageable pieces among multiple databases, called shards.

A duplicated table is a table with the same contents in each shard.

The combination of sharded and duplicated tables enables all transactions associated with a sharding key to be processed by a single shard. This technique enables linear scalability and fault isolation.

Sharded & Duplicated Tables

It comes with the flexibility to choose the right sharding method based on application & data-model:

· System Managed — Partition data across shards by CONSISTENT HASH. Data sharded/re-sharded automatically. Data is evenly distributed across shards.

· User Defined (Range, List) — Partition data across shards by RANGE or LIST. Ranges or lists of sharding key values are assigned to shards by the user. User needs to maintain balanced data distribution. More efficient range queries

· Composite — Data is partitioned across Shardspaces by LIST or RANGE on super-sharding key (e.g. geography). Within each shardspace, data is partitioned across shards by CONSISTENT HASH using sharding key (e.g. customer id). For geo-distribution or hybrid clouds with linear scalability

Automated deployment along with replication

Direct and Proxy routing for OTLP & Reporting requirements

Direct Route (Single Shard Query):

Fast path for sharding key based access (OLTP)

Connection pool maintains the shard topology cache

Upon first connection to a shard-

-Connection pool retrieves all sharding key ranges in the shard

-Connection pool caches the key range mappings

DB request for a key that’s in any of the cached key ranges goes directly to the shard (i.e. bypasses shard director)

Direct Routing using Shard Key

Proxy Route (Multi-shard query):

Multi-shard query or query without sharding key

Applications connect to Query coordinator/Shard Catalog

Coordinator rewrites the query to do most processing on the shards

Supports shard pruning and scatter-gather

Final aggregation performed on the coordinator

Proxy Routing (without a shard key)

It is important to note that while Oracle database becomes a distributed database by implementing Sharding, it is logically still a single database for the application world and external connections.

Moreover, the core functionalities and options that are available in an Oracle Database can be continued to be leveraged in the same fashion.

Best of both the worlds (Relational + Distributed)

Adopting Oracle Database Sharding makes tremendous sense as it brings in a lot of value to the solution by making it more scalable, adaptable and efficient.

· Enterprise features: Advanced Security, RMAN, Data Guard, Advanced Compression, Partitioning, etc.

· Leverage Oracle Innovations: SMP scalability, RAC, Exadata, in-memory columnar, online redefinition, JSON document store, etc.

· Future-Proof: Features not part of initial product releases (e.g. started as purely OLTP but now need to perform Reporting/Analytics) can be easily accommodated later in future because of converged nature of Oracle Database.

· Easier application maintenance — schema in database instead of application

· SQL and all the programmatic interfaces (PL-SQL, OCI, JDBC, etc.)

· Better consistency than NoSQL databases (strictly consistent within shard) without extra efforts

· Leverage the existing Oracle DBA skillset

· Enterprise-standard support

Conclusion:

Oracle Database Sharding can be embraced with supreme confidence because it emanates from the same stable of Oracle Database and is well placed to handle the scale, performance, security, future-readiness, innovations and support needed for modern day agile Internet scale applications.

Reference:

https://www.oracle.com/in/database/technologies/high-availability/sharding.html

https://blogs.oracle.com/database/oracle-bluekai-data-management-platform-scales-to-1-million-transactions-per-second-with-oracle-database-sharding-deployed-in-oracle-cloud-infrastructure

Disclaimer: I am currently employed with Oracle Corporation. All views expressed in this article are in my personal capacity and do not necessarily reflect my employer’s views.

--

--