9/22/24

Building a REST API from scratch

Designing and building a REST API requires a lot of planning and hard work. For the past year, I have been part of a team building an API to meet a specific requirement at my company. It has been quite a learning process and immensely satisfying. I have decided to document the journey so that the information is available for future reference.

Based on my experience, the following are the steps involved in designing and building a good API:
  • Understand the requirements
    • The first step in the process of designing an API is to understand the business requirements. What problem are we trying to solve? Who are the consumers? What is the scope and scale? Do we really need a new API, or is there an existing API that can be enhanced to meet our needs? It is extremely important to collaborate with the product managers and product owners regularly to clarify all the questions that need answering. This is THE most critical phase of API development.
  • Identify resources and operations (defining endpoints)
    • At the core of a REST API are resources. Through the API endpoints, we figure out the best way for consumers to access these resources in a user-friendly manner. The endpoints need to be intuitive, and there should be a certain consistency to the design (see the endpoint sketch after this list).
  • Create Stoplight documentation for the API endpoints
    • Documenting each of the endpoints in detail adds clarity and exposes potential gaps in the design. It helps us streamline the request and response payloads associated with each endpoint. Once we have the documentation, we can share it with other team members and solicit early feedback. We now have a source of truth for development.
  • Pick the appropriate database (if one is needed)
    • Based on the service requirements, we need to figure out what kind of database makes sense. If we are looking at structured data with a lot of relationships, and there is going to be a lot of analysis of the data (beyond basic queries), a SQL database would probably make sense. If we want more flexibility in the kind of data that will be stored, and we expect frequent schema changes, we should probably go with a NoSQL database that allows storing unstructured data. Other considerations include team members' expertise with the various databases and the support for them within the company. If more teams are using a database, you have more people to ask questions. If there is already existing infrastructure for database setup and deployment, it makes things much easier.
  • Data model design
    • Once we have decided on the database, we need to define the schema. In order to have a well-defined schema, it helps a lot to understand all the potential access patterns. We then define the entities, the relationships between them, the constraints, logging/auditing requirements, replication, security of the data, and so on. We need to account for future flexibility, scalability and performance. Once the initial design draft is in place, document it in a wiki and ask the team for feedback at an early stage so that any design flaws can be caught early.
  • Security design
    • Create a Continuous Threat Modeling (CTM) document to capture all the information pertaining to the security of the service. Most companies have a CTM template that is used across the company. You will need to define the authentication and authorization for each of the API endpoints. You will need to mention any encoding/decoding used anywhere in the service (including data at rest). You will need to explain how you plan to deal with bad data in request payloads or URIs. You will need to explain the transport security in place (such as SSL/TLS). If you are using a cloud provider like AWS, you will also have to explain the security apparatus in the cloud. In addition, we need to restrict access to Stoplight and other API-related wiki pages to ensure the integrity of the data. Request feedback from the team as soon as the initial draft is in place, so that changes can be made at an early stage.
  • Code Implementation
    • Pick the right tech stack. In most cases, this is probably going to be based on what is already being used in the company. Nowadays most tech stacks offer similar features, and unless there is a really special requirement that warrants a specific language or tool, it makes sense to stick with the most popular ones within the company. Design the operations in a modular, extensible and reusable manner, with attention to encoding/decoding (including string encodings) wherever necessary. Do not use user-supplied information to build keys in the backend, since it is subject to change. Validate the input for each operation before allowing the operation to proceed, and log messages as needed. Design the exception handling carefully to account for all kinds of exceptions with appropriate messaging, so that it becomes easy to troubleshoot issues when they arise (see the validation and exception-handling sketch after this list). Add unit, integration, contract, functional and smoke tests for the endpoints, and build the testing infrastructure in a modular, extensible and reusable manner as well. Make sure to have good code coverage. Code coverage, styling and security checks can be strictly enforced in the Jenkins pipeline as well.
  • Development and Deployment Environments:
    • It is important to have a good understanding of the various environments in place for the API. We can have different environments like dev, where all the development happens, followed by staging, where the QA team tests, and finally the prod environment. During development, if we add any backend resources or configurations, we need to do so in a way that lets them be deployed to each environment consistently. We also need notifications in place for success/failure during deployment, and we need to work with the DevOps teams to ensure that appropriate security is in place for each environment.
  • Testing:
    • Work with the QA teams in parallel to ensure that test plans are in place for each of the endpoints. In addition to manual testing, the QA teams should have their own automated testing in place. If the API is being consumed inside an application, there need to be end-to-end tests for those flows. There can also be UI tests to validate the responses in the front end.
  • Performance and Load testing
    • It is important to test the performance of our APIs to make sure the latency is within the acceptable range. We can use tools like Blazemeter for this purpose (a simple latency-measurement sketch also follows this list). It is also important to determine the maximum load our APIs can handle so that we can configure the backend servers appropriately. Based on the test results, we can look into additional measures like code optimizations, altering the database schema or implementing caching. It also helps to have a good understanding of observability tools like Splunk and Kibana so that we can query the vast amounts of error logs and troubleshoot problems effectively.
  • Document the entire process in a wiki so that we have a blueprint in place the next time we decide to build another API.
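
To make the endpoint design point above concrete, here is a minimal sketch of intuitive, consistent resource-style endpoints. It uses FastAPI with a hypothetical "widget" resource and an in-memory store; the names and the ID scheme are placeholders, not part of the actual API.

from fastapi import FastAPI, HTTPException

app = FastAPI()
widgets = {}  # in-memory stand-in for a real data store

@app.get("/widgets")
def list_widgets():
    # Collection endpoint: plural noun in the path, the HTTP verb carries the operation.
    return list(widgets.values())

@app.get("/widgets/{widget_id}")
def get_widget_by_id(widget_id: str):
    # Item endpoint: the ID identifies the resource, nothing verb-like in the URL.
    if widget_id not in widgets:
        raise HTTPException(status_code=404, detail="widget not found")
    return widgets[widget_id]

@app.post("/widgets", status_code=201)
def add_widget(payload: dict):
    widget_id = str(len(widgets) + 1)  # placeholder ID generation
    widgets[widget_id] = {"id": widget_id, **payload}
    return widgets[widget_id]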
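
And here is a small sketch of the validate-then-operate and exception-handling pattern described in the code implementation step. The exception types, field names and logger setup are hypothetical, not taken from the real service.

import logging

logger = logging.getLogger("widget_service")

class ValidationError(Exception):
    """Raised when a request payload fails validation."""

class NotFoundError(Exception):
    """Raised when a requested resource does not exist."""

def update_widget(store: dict, widget_id: str, payload: dict) -> dict:
    # Validate the input before allowing the operation to proceed.
    if not widget_id:
        raise ValidationError("widget_id is required")
    if not isinstance(payload.get("name", ""), str):
        raise ValidationError("name must be a string")
    if widget_id not in store:
        raise NotFoundError(f"widget {widget_id} does not exist")

    store[widget_id].update(payload)
    # Log enough context to troubleshoot later, but never raw secrets.
    logger.info("updated widget %s with fields %s", widget_id, sorted(payload))
    return store[widget_id]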
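
Finally, a rough latency-measurement sketch for the performance step. This is not Blazemeter, just a quick sanity check using Python's concurrent.futures and the requests library; the URL, request count and concurrency level are placeholders.

import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://example.com/widgets"  # placeholder endpoint

def timed_call(_):
    # Time a single GET request and return the latency in milliseconds.
    start = time.perf_counter()
    requests.get(URL, timeout=5)
    return (time.perf_counter() - start) * 1000

with ThreadPoolExecutor(max_workers=10) as pool:
    latencies = list(pool.map(timed_call, range(100)))

print(f"p50={statistics.median(latencies):.1f} ms")
print(f"p95={statistics.quantiles(latencies, n=20)[18]:.1f} ms")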

6/19/24

A Use Case for DynamoDB

DynamoDB is a NoSQL database from Amazon.

Let us assume we are building a REST API that deals with three main resources: City, Team, Player. Each resource has different attributes.

The following are the endpoints for each resource:

  • City:
    • POST: AddCity
    • PATCH: UpdateCity
    • DELETE: DeleteCity
    • GET: ListCities and GetCityByID
  • Team:
    • POST: AddTeam
    • PATCH: UpdateTeam
    • DELETE: DeleteTeam
    • GET: ListTeams and GetTeamByID
  • Player:
    • POST: AddPlayer
    • PATCH: UpdatePlayer
    • DELETE: DeletePlayer
    • GET: ListPlayers and GetPlayerByID


Based on the endpoints, it is evident that we will mostly be doing simple read/write operations and there will be no real need for complex queries. Sometimes there can be a high volume of read operations, but that's about it. So fast reads/writes and scalability are our main criteria, along with optimizing the costs of software, hardware and usage.
  • Implementation with a SQL database (e.g., SQL Server)
    • You will have to provision the servers and the software.
    • Maintain the servers in the future.
    • Manage the scaling as data grows. Might require additional servers.
    • We will have to manage the replication, partitioning etc.
    • We will need three tables for the three entities (city, team and player), with referential integrity relationships between them. The database generates the keys for the tables, and support for constraints is built in.
    • Updates to the database schema are non-trivial. Adding additional attributes to a city or a team would require a good deal of work.
    • No special support for fast read/write operations
    • We can write complex queries (which is not our requirement)
    • We will be charged irrespective of the amount of usage
  • Implementation with DynamoDB
    • It is a managed database, so no need for any software or hardware installations
    • No maintenance of the software or hardware. Amazon will take care of this.
    • We can use just one table for all three entities. We store data as key-value pairs and, as a result, we can store rows with varying attributes in DynamoDB. So we can store a row for the city, a row for the team and a row for the player in the same table. The only requirement is that each row has a unique primary key attribute (which is defined in the schema definition). This is my favorite aspect of DynamoDB.
    • Since we are storing the data as key-value pairs, adding additional attributes or removing existing attributes (other than the primary key) is trivial. This makes DynamoDB very powerful for scenarios that require flexibility (such as dynamic requirements).
    • Supports fast read/write operations, which is our main requirement.
    • Amazon will scale the table as it grows, we do not need to worry about it.
    • We do not have to worry about partitioning or replication of data.
    • Amazon charges primarily for the number and size of reads/writes rather than for the amount of data stored. We can optimize costs by designing the schema around our access patterns. This is the most important aspect to keep in mind when dealing with DynamoDB. For example, updating a row with 100 KB of data costs a lot more than updating a row with 1 KB of data, so we can perform optimizations like storing the data that will be updated in a separate row, and so on.
    • We will have to generate the keys for the table. Other than enforcing the primary key constraint, DynamoDB does not support any other kind of uniqueness or constraints by default; we will need to implement those on our own. The beauty is that we can manage additional constraints by adding extra rows to the same table and performing atomic operations in our code (see the sketch after this list). For example, suppose a city cannot have more than one team with the same name. When we add a new team for a city, within the same transaction we can add another row whose primary key is a combination of the cityID and the team name. The next time we try to add another team with the same name in that city, it will violate the primary key constraint. So it is pretty easy to implement custom constraints in DynamoDB.
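
Here is the sketch mentioned above: a minimal boto3 example of the single-table idea together with the custom uniqueness constraint (one team name per city) enforced through an atomic transaction. The table name, the pk key scheme and the attribute names are assumptions for illustration, not the actual schema.

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
TABLE = "SportsData"  # hypothetical single table; "pk" is its partition key

def add_team(city_id: str, team_id: str, team_name: str) -> bool:
    """Add a team row plus a uniqueness-marker row in one atomic transaction."""
    try:
        dynamodb.transact_write_items(
            TransactItems=[
                {
                    "Put": {
                        "TableName": TABLE,
                        "Item": {
                            "pk": {"S": f"TEAM#{team_id}"},
                            "cityId": {"S": city_id},
                            "name": {"S": team_name},
                        },
                        "ConditionExpression": "attribute_not_exists(pk)",
                    }
                },
                {
                    # Marker row whose key combines the city and the team name.
                    # A duplicate name in the same city violates this condition
                    # and the whole transaction is rolled back.
                    "Put": {
                        "TableName": TABLE,
                        "Item": {"pk": {"S": f"CITY#{city_id}#TEAMNAME#{team_name}"}},
                        "ConditionExpression": "attribute_not_exists(pk)",
                    }
                },
            ]
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "TransactionCanceledException":
            return False  # duplicate team name (or team id) in this city
        raise
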
It is clear that DynamoDB is better suited to our REST API requirements than the other SQL and NoSQL databases we considered. Other NoSQL options like MongoDB and Cassandra have their own distinct advantages, but they do not make sense for our requirements. MongoDB, for example, supports a rich set of data types, aggregation queries and transactions, while Cassandra supports wide rows with varying columns.

3/5/22

Microservices - the good and the not-so-good

A microservice can be considered a self-contained unit of functionality, usually small to medium in size. Since the beginning of last year, I have been working on a large web application that is powered by microservices. This was my first exposure to microservices, and I have come to realize that they have their own advantages and disadvantages.

The good:

  • A microservice is a well defined unit of functionality. It does only one thing and does it well.
  • A microservice is usually small in size and is easy to develop and test and maintain.
  • A microservice is an independent unit of functionality, so any tech stack can be used to develop it.
  • If you decide to use the same tech stack for new microservices, you can copy an existing microservice and reuse the configuration and deployment scripts, and maybe even some code.
  • Each microservice can have its own security requirements.
  • Each microservice can be developed by a small dedicated team of engineers.
  • It is easy to make changes and deploy a microservice without fear of impacting other applications or services. A great deal of regression or smoke testing is not required, unlike with monolithic applications.
The not-so-good:
  • Troubleshooting an error in an application powered by microservices is not straightforward. Each call can potentially traverse several microservices before the required results are returned. It becomes necessary to pass certain request-related data (like a request ID) from the application to all the services in the chain, so that all the errors can be logged against the same request and used to identify and troubleshoot issues (see the sketch after this list).
  • From a development point of view, each microservice is a separate project, and a small dedicated team of engineers usually manages a few microservices. So it becomes hard to get up to speed and become familiar with many different projects. This becomes even more difficult if different tech stacks are used for different microservices.
  • All interactions with microservices happen via HTTP, and it is a totally different paradigm compared to regular application development. The testing and debugging is done via tools like Postman. It takes some getting used to.
  • Since microservices are well-defined units of functionality, they make calls to other microservices for additional information. As a result, there is a lot of dependency between microservices. This can get frustrating, especially during the development stage, since some teams keep changing their interfaces constantly. This can also result in a lot of code rewrites.
  • Since microservices depend on other microservices, the capability of a given microservice is limited by those dependencies. For example, a microservice may be able to process 50 requests per minute (RPM), but if one of its dependencies can only handle 20 RPM, then it is effectively limited to that rate.
  • Since different microservices could potentially be developed and maintained by different teams in different time zones, a great deal of collaboration and understanding is required for completing a project. This can be very frustrating at times and can lead to a lot of friction.
  • You will need to use tools like Splunk (there are several others) for logging and querying information about the various microservices. This represents a learning curve, especially if you need to write complex queries for displaying information on dashboards. Splunk, for example, has its own query language (with regular-expression support) for extracting meaningful insights from the data. This is again a totally different paradigm compared to typical monolithic applications.
  • You will need to document the functionality provided by each microservice via documentation tools like Swagger or Stoplight (there are others as well). This requires a good deal of work and good attention to detail since your documentation is your source of truth for the service users. Any changes need to be updated promptly as well.
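
Here is the sketch mentioned in the troubleshooting point above: a minimal example of accepting a request ID (or generating one) and passing it along to a downstream microservice, so that logs from every service in the chain can be tied to the same request. The header name, service names and URL are assumptions for illustration.

import logging
import uuid

import requests

logger = logging.getLogger("order_service")
REQUEST_ID_HEADER = "X-Request-ID"  # assumed header name

def handle_incoming(headers: dict) -> dict:
    # Reuse the caller's request ID if present, otherwise start a new one.
    request_id = headers.get(REQUEST_ID_HEADER, str(uuid.uuid4()))
    logger.info("request_id=%s msg=%s", request_id, "handling order lookup")

    # Pass the same ID to every downstream microservice in the chain.
    response = requests.get(
        "https://inventory.internal/items/42",  # placeholder downstream call
        headers={REQUEST_ID_HEADER: request_id},
        timeout=5,
    )
    logger.info("request_id=%s downstream_status=%s", request_id, response.status_code)
    return {"request_id": request_id, "inventory": response.json()}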

2/22/22

Planning for Error Logging

The purpose of error logging is to log information about errors in a way which makes it easy to diagnose and troubleshoot issues when they arise. While error logging is necessary for all applications, it is indispensable when it comes to large distributed applications. 

In the past, I have written some error logging code, but I had never worked on an extensive error logging requirement, let alone from scratch. Last month, as I started work on error logging for a large batch project, I quickly realized that a lack of planning and insight can lead to logs that are hard to track, analyze and troubleshoot, which could have disastrous consequences for the outcome of the project.

Here are some questions and thoughts to keep in mind while planning and coding for it:

  • Questions
    • Why do we want to log errors?
    • What do we want to do with the logs?
    • Are we going to just search and view them in a tool like Splunk?
    • Are we going to build dashboards on top of the logs? If yes, what types of statistics do we want to show on the dashboards?
    • What are the different types of errors that need to be logged to account for the various stats that we are interested in?
    • What information needs to be logged with each error?
    • If we are going to build dashboards, we need to be able to query the logs. So what format makes sense for querying?
  • Thoughts
    • Ensure that the format and structure of the data is concise and meaningful
    • Ensure that the messages and formats are consistent across the board
    • Try to use common libraries and classes to centralize access to the logging code, for consistency and ease of maintenance.
    • Create specific exception classes for each scenario, so that we can track, analyze and troubleshoot errors faster (see the sketch after this list).
    • Write extensive unit tests with excellent coverage, including tests on specific error messages, because it is very difficult to manually verify all scenarios when changes are made to the error logging code.
    • If you are dealing with microservices, make sure to log all information required to track errors that span several services. RequestId is an example of one such piece of information that can be used to connect a request across various services and help troubleshoot issues.
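
A minimal sketch tying several of these thoughts together: specific exception classes, a shared logging helper, and a consistent structured format that carries a request ID. The field names and JSON layout are assumptions for illustration, not a prescribed format.

import json
import logging

logger = logging.getLogger("batch_job")

class RecordValidationError(Exception):
    """A record failed validation and was skipped."""

class UpstreamServiceError(Exception):
    """A dependent service returned an error."""

def log_error(error: Exception, request_id: str, **context) -> None:
    # One helper so every error is logged with the same structure,
    # which keeps Splunk/Kibana queries and dashboards simple.
    logger.error(json.dumps({
        "request_id": request_id,
        "error_type": type(error).__name__,
        "message": str(error),
        **context,
    }))

# Usage:
try:
    raise RecordValidationError("missing customer_id")
except RecordValidationError as err:
    log_error(err, request_id="req-123", record_id="rec-42", stage="parse")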


12/28/19

Designing a Ticket Turnstile System at a Subway Train Station

Recently, I interviewed at a company where the interviewer asked me to explain how I would go about designing a ticket turnstile system at a subway train station. For some inexplicable reason, I just talked about it at a very high level, making incorrect assumptions along the way. I never made an attempt to first write down my thoughts in the notebook in front of me and then explain my ideas to the interviewer (and to add to my bad luck, the interviewer also did not ask me to draw or explain anything on the whiteboard).

After I came home, I decided to work on that design again, starting with a basic use case, with a pen and a notebook in hand. As I started writing down my thoughts, I realized that the requirements were not complex at all. By not writing down my ideas on paper, I had just complicated things for myself, and that made me come across as an incompetent engineer, incapable of designing software systems. Enough venting and crying. In under 10 minutes, this is what I was able to come up with:

  • Consider a simple use case of a person who has just bought a ticket for a ride from station X to station Y. I am going to assume that there is a unique barcode generator of some sort, and that when the user bought a ticket, he was assigned a ticket with a unique barcode.
  • Now that the user has bought the ticket, I am going to assume that there is a separate database of all tickets that are current, and this particular ticket has been moved to that database. I am assuming that there is one row of information per ticket in the database. This row would include fields like the serial number, location, payment details, IsCheckedIn and IsCheckedOut.
  • My assumption is that once the user scans the ticket at the starting point, the IsCheckedIn field is set to true, and when the user exits from the destination station, the IsCheckedOut field is set to true. So, considering a happy path, the user would check in and check out without any issues, the database would be updated accordingly, and that would signal a completed transaction for that ticket (a small sketch of this flow follows this list).
  • Because the data for each ticket requires just one row, and given that there is no other relational data, I am assuming that we do not need a relational database. We can use a NoSQL database like RavenDB, where all the information can be stored in a single document or a JSON blob.
  • Because each ticket is unique and there is nothing relational, there shouldn't be any performance issues due to locking of resources. All that needs to happen is the updating of the IsCheckedIn and IsCheckedOut fields for each ticket. As a result, the performance of the system would be a function of the processing power of the database server and the number of connections it can handle. There will be no need for any kind of database replication to keep the information up to date.
  • Now, let us say the user has bought a ticket and for some reason is not able to get into the station (or get out) because the scanner won't recognize the ticket. I am going to assume that the user will go to an attendant at the station, who will try to look up the ticket on a computer. So the design would need to include a client that points to the same database of all current tickets.
  • Once the user has checked out from the destination station, the IsCheckedOut field will be set to true, meaning that the transaction is complete. This record can then be moved to another database of all completed transactions, where it can stay for a mandated (by the government, is my assumption) time period before the same barcode can be recycled.
  • For security reasons, I am also assuming that both the IsCheckedIn and IsCheckedOut fields need to be set to true for a transaction to be considered complete. My assumption is that the user can scan out only if the user has scanned in.
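
Here is the small sketch referred to above: the happy-path check-in/check-out flow, with an in-memory dict standing in for the document database. The field names mirror the design (IsCheckedIn, IsCheckedOut); everything else is a placeholder.

tickets = {
    "BARCODE-001": {
        "serialNumber": "BARCODE-001",
        "location": "Station X",
        "IsCheckedIn": False,
        "IsCheckedOut": False,
    }
}

def check_in(barcode: str) -> bool:
    ticket = tickets.get(barcode)
    if ticket is None or ticket["IsCheckedIn"]:
        return False  # unknown ticket, or already used to enter
    ticket["IsCheckedIn"] = True
    return True

def check_out(barcode: str) -> bool:
    ticket = tickets.get(barcode)
    # A ticket can only scan out if it has scanned in (the security rule above).
    if ticket is None or not ticket["IsCheckedIn"] or ticket["IsCheckedOut"]:
        return False
    ticket["IsCheckedOut"] = True
    return True  # transaction complete; the record can be archived

# Usage:
assert check_in("BARCODE-001")
assert check_out("BARCODE-001")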

I still feel bad about not having come up with something along these lines during the interview. This is how I normally approach all my design projects (writing down my thoughts on paper); I am not sure why I kept talking in the air that day. Was I tense or careless or stressed out or just plain stupid? It still haunts me.

12/2/19

Azure App Services: Serverless functions

Azure offers a set of PaaS (Platform-as-a-Service) offerings known as App Services. Serverless functions are one of the services in the App Services category.

A serverless function is a piece of code that can be run independently. It can be triggered by certain events or invoked explicitly, depending on the business requirements. Every time a serverless function runs, it uses a certain amount of memory, and Azure makes sure to run it on an appropriate server (depending on the language used to create the function; you can create serverless functions in different languages like C#, Java, JavaScript, Python and some others). So when you configure serverless functions in the Azure portal, you just specify the amount of memory you want to allocate. You do not have to be bothered about managing or scaling the server instances. That is why they are known as serverless functions. It is not because they don't run on servers. They do.

Here is an example of a serverless function:

There is a cool online shoe store where you go to buy your shoes. You go to their website because you want to get the brand new model xyz100, but that model is out of stock. So the store offers an option for you to be notified when the stock is replenished. You just need to provide your email, which gets stored in their database. Behind the scenes, the store will create a serverless function that will be invoked whenever model xyz100's stock is replenished. This function will then go to the database, get the list of people (including you) who have requested to be notified when this model is in stock, and send out emails to all of them.
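
A rough sketch of how that restock notification could look as an Azure Function, using the Python programming model. The HTTP trigger (called by the inventory system when stock arrives), the route, and the helper functions get_subscribers and send_email are assumptions for illustration, not the store's actual design.

import azure.functions as func

app = func.FunctionApp()

def get_subscribers(model: str) -> list[str]:
    # Placeholder: would query the store's database for saved email addresses.
    return ["customer@example.com"]

def send_email(address: str, model: str) -> None:
    # Placeholder: would call an email service.
    print(f"Notifying {address}: {model} is back in stock")

@app.route(route="restocked/{model}", auth_level=func.AuthLevel.FUNCTION)
def notify_restock(req: func.HttpRequest) -> func.HttpResponse:
    model = req.route_params.get("model")  # e.g. "xyz100"
    for address in get_subscribers(model):
        send_email(address, model)
    return func.HttpResponse(f"Notified subscribers for {model}", status_code=200)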


In the case of serverless functions, you are charged only when the function runs. You can always increase the amount of memory depending on your business needs. Since serverless functions are stand-alone and can be created in several different languages, they provide a lot of flexibility and freedom in the development process. You can have separate teams building and managing different functions, and you can choose the technology that best suits each requirement.

App Services in Microsoft Azure: Logic Apps

Azure offers a set of PaaS (Platform-as-a-Service) offerings known as App Services. Logic Apps are one of the services in the App Services category. They are similar to workflows and can be used to perform a set of connected operations. Here is an example of a logic app:

Let us say there is a set of tables on two different database servers, and you are replicating data from one server to the other. Your goal is to monitor the destination tables and perform two specific actions when new data comes in:

  • Call a stored proc that processes the new data and returns a result set.
  • Invoke an API and pass the result set from the previous step.

Within a few minutes, you can create a logic app that can be scheduled to run at specified intervals. This logic app will have two connected nodes: the first calls a stored procedure that queries the destination tables for new data, does some business-rules processing and returns a result set; the second node then takes this result set and passes it to an API.
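
This is not a Logic App definition itself, but a plain-Python sketch of what the two connected nodes do, to make the data flow concrete: call the stored procedure, then pass its result set to an API. The connection string, stored procedure name and API URL are placeholders.

import pyodbc
import requests

CONN_STR = "Driver={ODBC Driver 18 for SQL Server};Server=dest-server;Database=reporting;Trusted_Connection=yes;"

def run_once() -> None:
    # Node 1: call the stored proc that finds and processes the new rows.
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        cursor.execute("EXEC dbo.ProcessNewData")
        columns = [col[0] for col in cursor.description]
        # Real result sets may need date/decimal conversion before JSON serialization.
        payload = [dict(zip(columns, row)) for row in cursor.fetchall()]

    # Node 2: pass the result set to the API.
    if payload:
        requests.post("https://api.example.com/ingest", json=payload, timeout=30)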

We can also scale up (i.e. increase the processing power) or scale out (increase the number of instances) very easily via the Azure portal. We do, however, have to deal with any concurrency issues that may arise when several instances of the logic app are running simultaneously.