+bytes from my programming experiences: June 2024

DynamoDB is a NoSQL database from Amazon.

Let us assume we are building a REST API that deals with three main resources: City, Team, Player. Each resource has different attributes.

The following are the endpoints for each resource:

City:

POST: AddCity
PATCH: UpdateCity
DELETE: DeleteCity
GET: ListCities and GetCityByID

Team:

POST: AddTeam
PATCH: UpdateTeam
DELETE: DeleteTeam
GET: ListTeams and GetTeamByID

Player:

POST: AddPlayer
PATCH: UpdatePlayer
DELETE: DeletePlayer
GET: ListPlayers and GetPlayerByID

Based on the endpoints, it is evident that we will mostly be doing simple read/write operations, with no real need for complex queries. There can sometimes be a high volume of read operations, but that's about it. So our main criteria are fast reads and writes, scalability, and of course keeping the costs of software, hardware and usage low.

Implementation with a SQL database (e.g., SQL Server)

We will have to provision the servers and the software.
We will have to maintain the servers in the future.
We will have to manage the scaling as data grows, which might require additional servers.
We will have to manage the replication, partitioning, etc.
We will need three tables for the three entities (city, team and player), with referential integrity relationships between them. The database can generate the keys for the tables, and support for constraints is built in.
Updates to the database schema are non-trivial. Adding attributes to a city or a team would require a schema change and possibly a migration of the existing data.
There is no special support for fast read/write operations.
We can write complex queries (which is not our requirement).
We will be charged for the servers and software irrespective of the amount of usage.

Implementation with DynamoDB

It is a managed database, so there is no software or hardware to install.
There is no software or hardware to maintain either. Amazon will take care of this.
We can use just one table for all three entities. DynamoDB stores each row as a set of attribute name-value pairs, so rows in the same table can have different attributes. We can therefore store a row for the city, a row for the team and a row for the player in the same table. The only requirement is that each row has a unique primary key (which is defined in the table's key schema). This is my favorite aspect of DynamoDB.
Since each row is just a set of attribute name-value pairs, adding new attributes or removing existing ones (other than the primary key) is trivial. This makes DynamoDB very powerful for scenarios where the requirements keep changing and flexibility matters.
Supports fast read/write operations, which is our main requirement.
Amazon will scale the table as it grows; we do not need to worry about it.
We do not have to worry about partitioning or replication of data.
Amazon charges primarily for the number of reads and writes (data storage is billed separately, per GB). Each read or write is billed in units that depend on the size of the row, so we can optimize the costs by designing the schema based on our access patterns. This is the most important aspect to keep in mind when dealing with DynamoDB. For example, updating a row with 100 KB of data consumes about 100 times as many write units as updating a row with 1 KB of data. So we can perform optimizations like storing the data that is updated frequently in a separate, small row.
We will have to generate the keys for the table ourselves. Other than enforcing the primary key, DynamoDB does not support any other kind of uniqueness or constraints by default. We will need to implement those on our own. The beauty is that we can manage additional constraints by adding extra rows to the same table and performing atomic operations in our code. For example, suppose we had a requirement that a city cannot have more than one team with the same name. When we add a new team for a city, within the same transaction we add another row whose primary key value is a combination of the cityID and the team name, with a condition that no row with that key already exists. The condition matters because a plain write to an existing key silently replaces the row instead of failing. The next time we try to add another team with the same name in that city, the condition fails and the whole transaction is rejected. So it is pretty easy to implement custom constraints in DynamoDB.
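
To put rough numbers on the 100 KB versus 1 KB example, here is a small sketch of the arithmetic. It assumes the documented billing rules: a standard write consumes one write unit per 1 KB of row size (rounded up), a strongly consistent read consumes one read unit per 4 KB, an eventually consistent read costs half of that, and a transactional write costs double. The function names are made up for illustration, and the price of a unit depends on the region and capacity mode, so the sketch only counts units.

```python
import math

WRITE_UNIT_BYTES = 1024      # one write unit covers up to 1 KB of row size
READ_UNIT_BYTES = 4 * 1024   # one read unit covers up to 4 KB of row size


def write_units(item_size_bytes: int, transactional: bool = False) -> int:
    """Write units consumed by writing one row of the given size."""
    units = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    # Transactional writes consume twice the units of a standard write.
    return 2 * units if transactional else units


def read_units(item_size_bytes: int, strongly_consistent: bool = True) -> float:
    """Read units consumed by reading one row of the given size."""
    units = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    # Eventually consistent reads consume half the units.
    return float(units) if strongly_consistent else units / 2


def total_write_units(item_size_bytes: int, number_of_updates: int) -> int:
    """Write units consumed by updating the same row repeatedly."""
    return write_units(item_size_bytes) * number_of_updates
```

With these rules, 1,000 updates to a 100 KB row come to `total_write_units(100 * 1024, 1000)`, which is 100,000 write units, while the same 1,000 updates against a separate 1 KB row come to 1,000. That difference is the reason for moving frequently updated data into its own small row.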
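
The team name constraint described above could be sketched roughly as follows. The table name `SportsTable`, the single key attribute `PK`, the key prefixes and the helper names are all assumptions made up for illustration. Only the shape of the request (the `TransactItems` list that boto3's `transact_write_items` accepts) comes from the DynamoDB API. The code only builds the request, so it runs without an AWS account.

```python
from typing import Any, Dict, List

TABLE_NAME = "SportsTable"  # hypothetical table name
KEY_ATTR = "PK"             # hypothetical primary key attribute


def team_key(team_id: str) -> str:
    """Primary key value for a team row."""
    return f"TEAM#{team_id}"


def team_name_marker_key(city_id: str, team_name: str) -> str:
    """Primary key value for the extra row that reserves a team name in a city."""
    # Normalise the name so that "Lions" and " lions " reserve the same key.
    return f"TEAMNAME#{city_id}#{team_name.strip().lower()}"


def conditional_put(item: Dict[str, Any]) -> Dict[str, Any]:
    """A Put that fails if a row with the same primary key already exists."""
    return {
        "Put": {
            "TableName": TABLE_NAME,
            "Item": item,
            "ConditionExpression": f"attribute_not_exists({KEY_ATTR})",
        }
    }


def build_add_team_transaction(
    city_id: str, team_id: str, team_name: str
) -> List[Dict[str, Any]]:
    """Build the two writes that must succeed or fail together."""
    team_row = {
        KEY_ATTR: {"S": team_key(team_id)},
        "cityID": {"S": city_id},
        "teamName": {"S": team_name},
    }
    marker_row = {
        KEY_ATTR: {"S": team_name_marker_key(city_id, team_name)},
        "teamID": {"S": team_id},
    }
    return [conditional_put(team_row), conditional_put(marker_row)]
```

With boto3, the returned list would be passed as `boto3.client("dynamodb").transact_write_items(TransactItems=items)`. If either condition fails, DynamoDB cancels the whole transaction (boto3 raises `TransactionCanceledException`), so neither row is written. Deleting or renaming a team would need to remove or replace the marker row in the same way, inside one transaction.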

It is clear that DynamoDB is better suited to our REST API requirements than a SQL database or the other NoSQL databases. There are other NoSQL options like MongoDB and Cassandra, with their own distinct advantages, but those advantages do not matter for our requirements. MongoDB, for example, supports a lot of data types, aggregate queries and transactions, while Cassandra supports rows with varying columns.

6/19/24

A Use Case for DynamoDB