9/22/24

Building a REST API from scratch

Designing and building a REST API requires a lot of planning and hard work. For the past year, I have been part of a team that has been building an API to meet a certain requirement in my company. It has been quite a learning process and immensely satisfying. I have decided to document the journey so that the information is available for the future. 

Based on my experience, the following are the steps involved in designing and building a good API:
  • Understand the requirements
    • The first step in designing an API is to understand the business requirements. What problem are we trying to solve? Who are the consumers? What is the scope and scale? Do we really need a new API, or is there an existing API that can be enhanced to meet our needs? It is extremely important to collaborate with the product managers and product owners regularly to clarify all the questions that you need answered. This is THE most critical phase of API development.
  • Identify resources and operations (defining endpoints)
    • At the core of a REST API are resources. Through the API endpoints, we are figuring out the best way for consumers to access these resources in a user-friendly manner. The endpoints need to be intuitive, and there should be a certain consistency to the design (see the endpoint sketch after this list).
  • Create Stoplight documentation for the API endpoints
    • Documenting each of the endpoints in detail adds clarity and exposes potential gaps in our design. It helps us streamline the request and response payloads associated with each endpoint. Once we have the documentation, we can share it with other team members and solicit early feedback. We now have a source of truth for development.
  • Pick the appropriate database (if one is needed)
    • Based on the service requirements, we need to figure out what kind of database makes sense. If we are looking at structured data with many relationships and there is going to be a lot of analysis of the data (beyond basic queries), a SQL database would probably make sense. If we want more flexibility in the kind of data that will be stored and we expect frequent schema changes, we should probably go with a NoSQL database that allows storing unstructured data. Other considerations include team members' expertise with various databases and the support for various databases within the company: if more teams are using a database, you have more people to ask questions, and if there is already existing infrastructure for its setup and deployment, it makes things much easier.
  • Data model design
    • Once the database is decided, we need to define the schema. To arrive at a well-defined schema, it helps a lot to understand all the potential access patterns. We then define the entities, the relationships between them, the constraints, logging/auditing requirements, replication, security of the data, etc. We need to account for future flexibility, scalability, and performance. Once the initial design draft is in place, document it in a wiki and ask the team for feedback so that any design flaws are caught early.
  • Security design
    • Create a Continuous Threat Modeling (CTM) document to capture all the information pertaining to the security of the service. Most companies have a CTM template that is used across the company. You will need to define the authentication and authorization for each of the API endpoints, mention any encoding/decoding used anywhere in the service (including protection of data at rest), explain how you plan to deal with bad data in request payloads or URIs, and describe the transport security in place (such as TLS/SSL). If you are using a cloud provider like AWS, you will also have to explain the security apparatus in the cloud. In addition, we need to restrict access to Stoplight and other API-related wiki pages to ensure the integrity of the data. Request feedback from the team as soon as the initial draft is in place, so that changes can be made at an early stage.
  • Code Implementation
    • Pick the right tech stack. In most cases, this will be based on what is already being used in the company. Nowadays most tech stacks offer similar features, and unless there is a special requirement that warrants a specific language or tool, it makes sense to stick with the most popular ones within the company. Design the operations in a modular, extensible, and reusable manner, with attention to encoding/decoding wherever necessary, including string encodings. Do not use user-supplied information to build keys in the backend, since those values are subject to change. Validate the input for each operation before allowing the operation to proceed, and log messages as needed. Design the exception handling carefully to account for all kinds of exceptions with appropriate messaging, so that it becomes easy to troubleshoot issues when they arise (see the validation and error-handling sketch after this list). Add unit, integration, contract, functional, and smoke tests for the endpoints, and build the testing infrastructure in a modular, extensible, and reusable manner as well. Aim for good code coverage; coverage, style, and security checks can be strictly enforced in the Jenkins pipeline as well.
  • Development and deployment environments
    • It is important to have a good understanding of the various environments in place for the API. We can have a dev environment, where all the development happens, followed by staging, where the QA team tests, and finally the prod environment. During development, if we add any backend resources or configurations, we need to do it in a way that lets them be deployed to each environment consistently. We also need notifications in place for success/failure during deployment, and we need to work with the DevOps teams to ensure that appropriate security is in place for each environment.
  • Testing
    • Work with the QA teams in parallel to ensure that test plans are in place for each of the endpoints. In addition to manual testing, the QA teams should have their own automated testing in place. If the API is consumed inside an application, there need to be end-to-end tests covering that flow as well, and there can be UI tests to validate the responses in the front end.
  • Performance and Load testing
    • It is important to test the performance of our APIs to make sure the latency is within the acceptable range; tools like BlazeMeter can be used for this purpose (a rough latency-measurement sketch appears after this list). It is also important to determine the maximum load our APIs can handle so that we can configure the backend servers appropriately. Based on the test results, we can look into additional measures like code optimizations, altering the database schema, or implementing caching. It also helps to have a good understanding of observability tools like Splunk and Kibana so that we can query the vast amounts of logs and troubleshoot problems effectively.
  • Document the entire process in a wiki so that we have a blueprint in place the next time we decide to build another API.
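A few quick sketches for the steps above. First, for the endpoint-design step, here is a minimal sketch of consistent, resource-oriented endpoints. It assumes a hypothetical Flask service with an in-memory cities store; the framework and names are illustrative placeholders, not the actual stack behind the API described here.

    import uuid

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    cities = {}  # in-memory store, just for illustration

    @app.get("/cities")              # ListCities
    def list_cities():
        return jsonify(list(cities.values()))

    @app.get("/cities/<city_id>")    # GetCityByID
    def get_city(city_id):
        city = cities.get(city_id)
        return (jsonify(city), 200) if city else (jsonify({"error": "not found"}), 404)

    @app.post("/cities")             # AddCity
    def add_city():
        payload = request.get_json()
        city_id = str(uuid.uuid4())  # key is generated by the server
        cities[city_id] = {"id": city_id, **payload}
        return jsonify(cities[city_id]), 201

The same pattern repeats for every resource: plural nouns in the path, the HTTP verb carrying the operation, and IDs in the path rather than the query string. That repetition is what gives the API its consistency.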
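Next, for the code-implementation step, a minimal sketch of input validation, server-generated keys, and exception handling with logging. The names (ValidationError, add_city, save_city) are hypothetical, not the real service code.

    import logging
    import uuid

    logger = logging.getLogger(__name__)

    class ValidationError(Exception):
        """Raised when a request payload fails validation."""

    _cities = {}  # stand-in for the real data store

    def save_city(city: dict) -> None:
        _cities[city["id"]] = city

    def add_city(payload: dict) -> dict:
        # Validate the input before the operation is allowed to proceed.
        name = payload.get("name")
        if not isinstance(name, str) or not name.strip():
            raise ValidationError("'name' is required and must be a non-empty string")

        # Generate the key on the server; never build backend keys from
        # user-supplied values, since those are subject to change.
        city = {"id": str(uuid.uuid4()), "name": name.strip()}
        try:
            save_city(city)
        except Exception:
            logger.exception("failed to save city %s", city["id"])
            raise
        logger.info("created city %s", city["id"])
        return city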
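Finally, for the performance and load-testing step, a rough latency-measurement sketch, not a substitute for a proper tool like BlazeMeter: it fires concurrent GET requests at a placeholder URL (using the third-party requests library) and reports p50/p95 latency.

    import statistics
    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "https://api.example.com/cities"  # placeholder endpoint

    def timed_get(_):
        start = time.perf_counter()
        requests.get(URL, timeout=10)
        return (time.perf_counter() - start) * 1000  # milliseconds

    with ThreadPoolExecutor(max_workers=20) as pool:
        latencies = sorted(pool.map(timed_get, range(200)))

    print(f"p50 = {statistics.median(latencies):.1f} ms")
    print(f"p95 = {latencies[int(len(latencies) * 0.95)]:.1f} ms")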

6/19/24

A Use Case for DynamoDB

DynamoDB is a NoSQL database from Amazon.

Let us assume we are building a REST API that deals with three main resources: City, Team, Player. Each resource has different attributes.

The following are the endpoints for each resource:

  • City:
    • POST: AddCity
    • PATCH: UpdateCity
    • DELETE: DeleteCity
    • GET: ListCities and GetCityByID
  • Team:
    • POST: AddTeam
    • PATCH: UpdateTeam
    • DELETE: DeleteTeam
    • GET: ListTeams and GetTeamByID
  • Player:
    • POST: AddPlayer
    • PATCH: UpdatePlayer
    • DELETE: DeletePlayer
    • GET: ListPlayers and GetPlayerByID


Based on the endpoints, it is evident that we will mostly be doing simple read/write operations and there will be no real need for complex queries. Sometimes there can be a high volume of read operations, but that's about it. So fast reads and writes and scalability are our main criteria, along with cost optimization for software, hardware, and usage.
  • Implementation with a SQL database (e.g., SQL Server)
    • You will have to provision the servers and the software.
    • Maintain the servers in the future.
    • Manage the scaling as data grows. Might require additional servers.
    • We will have to manage the replication, partitioning, etc.
    • We will need three tables for the three entities (city, team, and player) and the associated referential integrity relationships between them. The database generates the keys for the tables, and support for constraints is built in.
    • Updates to the database schema are non-trivial. Adding additional attributes to a city or a team would require a good deal of work.
    • No special support for fast read/write operations
    • We can write complex queries (which is not our requirement)
    • We will be charged irrespective of the amount of usage
  • Implementation with DynamoDB
    • It is a managed database, so no need for any software or hardware installations
    • No maintenance of the software or hardware. Amazon will take care of this.
    • We can use just one table for all three entities. We store data as key-value pairs and, as a result, we can store rows with varying attributes in DynamoDB. So we can store a row for a city, a row for a team, and a row for a player in the same table. The only requirement is that each row has a unique primary key attribute (which is defined in the table's key schema). This is my favorite aspect of DynamoDB (a small sketch appears right after this list).
    • Since we are storing the data as key-value pairs, adding new attributes or removing existing ones (other than the primary key) is trivial. This makes DynamoDB very powerful for scenarios with dynamic requirements that need flexibility.
    • Supports fast read/write operations, which is our main requirement.
    • Amazon will scale the table as it grows; we do not need to worry about it.
    • We do not have to worry about partitioning or replication of data.
    • Amazon charges primarily for the number of reads/writes (and their size) rather than simply for the amount of data stored, so we can optimize costs by designing the schema around our access patterns. This is the most important aspect to keep in mind when dealing with DynamoDB. For example, updating a row with 100 KB of data costs a lot more than updating a row with 1 KB of data, so we can perform optimizations like storing the frequently updated data in a separate row, and so on.
    • We will have to generate the keys for the table. Other than enforcing the primary key constraint, DynamoDB does not support any other kind of uniqueness or constraints by default; we will need to implement those on our own. The nice part is that we can manage additional constraints by adding extra rows to the same table and performing atomic operations in our code. For example, suppose a city cannot have more than one team with the same name. When we add a new team for a city, within the same transaction we can add another row whose primary key is a combination of the city ID and the team name. The next time we try to add a team with the same name in that city, the write will violate the primary key constraint. So it is fairly easy to implement custom constraints in DynamoDB (a sketch of this appears at the end of this post).
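As a sketch of the single-table idea above: the table name, key names, and attribute values are made up for illustration, and it assumes boto3 plus an existing table whose only key is a string partition key named pk.

    import boto3

    table = boto3.resource("dynamodb").Table("sports-api")

    # Rows for all three entities live in the same table; only the
    # primary key attribute ("pk") is required by the key schema.
    table.put_item(Item={"pk": "CITY#nyc", "type": "city", "name": "New York"})
    table.put_item(Item={"pk": "TEAM#knicks", "type": "team", "cityId": "CITY#nyc", "name": "Knicks"})
    table.put_item(Item={"pk": "PLAYER#42", "type": "player", "teamId": "TEAM#knicks", "name": "Some Player"})

    # GetTeamByID becomes a single key lookup.
    team = table.get_item(Key={"pk": "TEAM#knicks"}).get("Item")
    print(team)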
It is clear that DynamoDB is better suited to our REST API requirements than the SQL and other NoSQL options considered here. Databases like MongoDB and Cassandra have their own distinct advantages, but they do not make sense for our requirements: MongoDB, for example, supports rich data types, aggregation queries, and transactions, while Cassandra supports wide rows with varying columns.
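And a sketch of the custom-constraint idea mentioned above: the team row and a marker row keyed by city ID plus team name are written in one transaction, so adding a second team with the same name in the same city fails the condition check. Table and attribute names are again hypothetical.

    import boto3
    from botocore.exceptions import ClientError

    client = boto3.client("dynamodb")

    def add_team(team_id: str, city_id: str, name: str) -> None:
        try:
            client.transact_write_items(
                TransactItems=[
                    # The team row itself.
                    {"Put": {"TableName": "sports-api",
                             "Item": {"pk": {"S": f"TEAM#{team_id}"},
                                      "type": {"S": "team"},
                                      "cityId": {"S": city_id},
                                      "name": {"S": name}}}},
                    # Marker row enforcing "one team name per city".
                    {"Put": {"TableName": "sports-api",
                             "Item": {"pk": {"S": f"CITYTEAM#{city_id}#{name.lower()}"}},
                             "ConditionExpression": "attribute_not_exists(pk)"}},
                ]
            )
        except ClientError as err:
            if err.response["Error"]["Code"] == "TransactionCanceledException":
                raise ValueError(f"city {city_id} already has a team named {name}") from err
            raise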