Database Architecture
Database architecture refers to the design and structure of a database system. A database’s architecture dictates how an organization can store, access, manage, and secure data.
Choosing the wrong architecture inevitably leads to future issues and likely rebuilds. In this article, we’ll help you avoid that fate by guiding you through the basics, including what a database architecture is and what it’s composed of. We’ll also show you some of the practical elements of database architecture, including the importance of uptime and the role databases play in microservices.
What is database architecture?
Database architecture refers to how a company and its developers have set up and configured database systems to support their apps, websites, and infrastructures.
The architecture you choose dictates how data is stored, secured, managed, and accessed, making it an important decision to make early on – one that can have lasting effects on the functionality of your systems.
A database architecture primarily comprises four components, though each component can vary depending on the database in question.
- Data model: The data model structures how data is organized, stored, and manipulated.
- Types/layers: Databases come in three types or layers – 1-tier, 2-tier, or 3-tier.
- Database Management System (DBMS): Software for managing the database.
- Schemas: Logical or physical ways of structuring the tables, fields, and relationships within a database.
In a complex system, companies often adopt multiple databases with different versions of these components and different data management methodologies.
For example, a company might use Snowflake as an analytical database for storing analytical data and supporting the writing of long, complex queries, Redis as an in-memory database for quick operations to support microservices, Kafka as a streaming database to support streaming data from location to location, and a range of production databases, such as MySQL, for application data storage.
Types of database architecture
When people are searching for the right database, they often get caught up in nuance and overcomplicate the questions they should ask.
You might know, for instance, that the right database needs to be able to structure data in a granular way to make for efficient retrieval or that the right database needs to support numerous ways to query it. But when you’re starting your search, it’s best to first rewind, look at the basic typology, and filter options from a high level down to lower, more nuanced levels.
Before you get into the weeds, determine whether the database architecture you need is relational or non-relational and whether it should be one-tier, two-tier, or three-tier.
Relational vs. non-relational databases
The choice between relational databases and non-relational databases (the latter more commonly known as NoSQL) comes down to a few major factors:
- Data structure: Relational databases tend to be better suited for structured data with well-defined relationships between entities, whereas NoSQL databases tend to be best for unstructured or complex data that requires flexible schema.
- Scalability: Relational databases tend to be better at vertical scaling, whereas NoSQL databases tend to be better at horizontal scaling.
- Data consistency: Relational databases offer strong consistency through ACID (Atomicity, Consistency, Isolation, Durability), whereas NoSQL databases typically offer eventual consistency through BASE (Basically Available, Soft state, Eventual consistency).
- Performance: Relational databases tend to have predictable performance levels for structured data use cases but slow down with write-heavy operations, whereas NoSQL databases tend to be a better fit for use cases that require high-performance read/write operations.
- Flexibility: Relational databases require a fixed schema, so changing the data structure can be difficult, whereas NoSQL databases are flexible or schema-less, making them a better fit for evolving data models.
Day-to-day usage of each database type also differs: in relational databases, you write SQL queries that relate data across and between tables; in most NoSQL databases, there are no tables in the relational sense, and the database doesn’t define or enforce a particular way of relating data.
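The relational side of this can be sketched with Python’s built-in `sqlite3` module. This is a minimal, self-contained illustration – the table and column names (`customers`, `orders`) are hypothetical, not taken from any particular system:

```python
import sqlite3

# In-memory relational database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), total REAL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)")

# A JOIN relates data across tables through the shared key.
rows = conn.execute(
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(rows)  # [('Ada', 65.0), ('Grace', 15.0)]
```

In a schema-less document store, the same data might live as nested records inside each customer document, with the application code – not the database – responsible for keeping the relationship coherent.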
1-Tier Architecture
In a one-tier architecture, all the components of the database—including the data itself, the user interface, and the application logic—reside on the same server.
Databases with single-tier architectures are rare in enterprise environments. They are more common for applications that only need to run at a small scale, or for teams that need to prioritize saving on costs.
2-Tier Architecture
In a 2-tier architecture (also known as client-server architecture), the system is split into two tiers. The client (or, more often, multiple clients) connects directly to the server where the database resides, but the two are logically and physically separated.
Two-tier architectures are also not common in modern enterprise environments. In the past, these architectures were common when use cases were as simple as connecting a desktop application to a database hosted on an on-premises server. Now, however, long after the rise of the cloud, SaaS, and microservices, 2-tier architectures aren’t as popular.
3-Tier Architecture
In a 3-tier architecture, the system is split into three tiers. Multiple clients connect to a backend, and the backend connects to the database. The backend acts as an intermediary, which has numerous advantages, making this the most common architecture in enterprise environments.
For example, enterprises can limit access and make breaches less likely by ensuring the database only connects to a single backend. Similarly, by designing their databases with this level of separation as a first principle, enterprises can ensure developers can operate the layers independently, making scalability easier.
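The separation described above can be sketched in a few lines of Python. This is a toy illustration, not a real deployment: the `users` table and the `get_user_email` function are hypothetical, and each tier would normally run on separate infrastructure rather than in one process:

```python
import sqlite3

# Data tier: clients never touch this connection directly.
_db = sqlite3.connect(":memory:")
_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
_db.execute("INSERT INTO users VALUES (1, 'a@example.com')")

# Application tier (backend): the only component allowed to query the
# database, exposing a narrow, validated interface instead of raw SQL.
def get_user_email(user_id):
    if not isinstance(user_id, int):
        raise TypeError("user_id must be an int")
    row = _db.execute(
        "SELECT email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None

# Presentation tier (client): calls the backend API, never the database.
print(get_user_email(1))  # a@example.com
```

Because only the backend holds a database connection, access control, input validation, and query patterns are enforced in one place, and each layer can be scaled or replaced independently.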
Why do keys matter in database architecture?
If you’re designing a database architecture, understanding keys – and the nuances of how they work within different architectures – is essential to building a database that supports your needs. In short, keys are how databases identify records within a table and create links between tables.
Keys matter for three primary reasons:
- Keys enforce data integrity because they are unique: no two rows in a table can share the same primary key, so every record is uniquely identifiable.
- Keys improve database performance by creating an index for the corresponding columns. With these indexes, the database can efficiently locate rows without scanning the entire table, making queries faster.
- Keys organize database structure by providing ways for tables to relate to other tables. This allows users to rely on more efficient joins and better query planning. Keys also enable hierarchical data organization, which allows databases to be organized into clear parent-child relationships.
Keys come in several different types, including primary keys, which uniquely identify each record; foreign keys, which create links between tables; and composite keys, which combine multiple columns in a single table.
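All three key types can be demonstrated with SQLite, run here through Python’s `sqlite3` module. The tables (`authors`, `books`, `enrollments`) are hypothetical examples chosen to show each key type in turn:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs with this on

# Primary key: uniquely identifies each author.
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")

# Foreign key: links each book to exactly one author.
conn.execute("""CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title TEXT)""")

# Composite key: the pair of columns, taken together, identifies a row.
conn.execute("""CREATE TABLE enrollments (
    student_id INTEGER,
    course_id INTEGER,
    PRIMARY KEY (student_id, course_id))""")

conn.execute("INSERT INTO authors VALUES (1, 'Ursula K. Le Guin')")
conn.execute("INSERT INTO books VALUES (1, 1, 'The Dispossessed')")

# A dangling foreign key (author 99 does not exist) is rejected:
try:
    conn.execute("INSERT INTO books VALUES (2, 99, 'No Such Author')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print("dangling foreign key rejected:", fk_rejected)
```

The rejected insert is the integrity guarantee in action: the database itself, not application code, refuses records that would break the relationships the keys define.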
Why is uptime important in a database system?
Today, users expect levels of uptime that can only be described with multiple nines (“high nines,” or 99.999% uptime). Even though software has become much more complex and interdependent, companies that want to maintain user trust must build uptime and availability considerations into their earliest database decisions.
The CAP theorem and its implications for uptime
You won’t be searching for database options for long before you hear about the CAP theorem. CAP is an acronym for Consistency, Availability, and Partition tolerance. The theorem is simple on the surface but complex once you dig into the details.
In short, the CAP theorem is a classic “three options, but you can only pick two” problem: A distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance.
Here, consistency means that a read operation that starts after a write completes must return that written value; availability means every request received by a non-failing node in the system must result in a response; and partition tolerance means the system must keep working even when the network loses messages sent from one node to another.
Eric Brewer, now vice-president of infrastructure at Google and a professor emeritus of computer science at the University of California, Berkeley, formulated the original theorem and presented it in 2000. Even decades later, the theorem remains an important anchor point for people thinking through their database needs.
Each choice among the three poses tradeoffs, and companies building database systems need to embrace the harsh reality that guaranteeing all three at once is impossible – when a network partition occurs, a system must give up either consistency or availability.
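The tradeoff can be made concrete with a toy simulation. This sketch is a deliberate simplification – two in-process replicas stand in for database nodes, and the “CP” and “AP” labels mark which guarantee each read mode sacrifices during a partition:

```python
# Toy illustration of the CAP tradeoff during a network partition.
# Two replicas of a key-value store; replication stops when partitioned.

class Replica:
    def __init__(self):
        self.data = {}

primary, secondary = Replica(), Replica()
partitioned = False

def write(key, value):
    primary.data[key] = value
    if not partitioned:          # replication only succeeds without a partition
        secondary.data[key] = value

def read(key, mode):
    """Read from the secondary. mode='CP' sacrifices availability;
    mode='AP' sacrifices consistency."""
    if partitioned and mode == "CP":
        raise RuntimeError("unavailable: cannot confirm latest value")
    return secondary.data.get(key)   # may be stale in AP mode

write("x", 1)
partitioned = True
write("x", 2)                        # secondary misses this update

print(read("x", "AP"))               # 1 — available, but stale
try:
    read("x", "CP")
except RuntimeError as e:
    print(e)                         # consistent systems refuse instead
```

Real systems sit on a spectrum between these two extremes (quorum reads, tunable consistency levels), but the underlying constraint is the same one the theorem names.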
The costs of downtime
For most companies, availability – the lack of which can result in downtime – stands out as the part of the theorem that’s too costly to deprioritize.
Splunk research shows, for example, that Global 2000 companies lose $400 billion annually to downtime and service degradation. Even the largest companies have faced this issue: Meta lost $100 million in revenue and Amazon lost $34 million in sales during downtime events.
Businesses invest a lot of effort and money into scale, but if their infrastructure and the databases that support it can’t handle that scale, increased customer reach can turn into increased customer disappointment. Early-stage companies often have more flexibility, but enterprises, whose products have become essential tools for so many businesses and users, need to prioritize availability to maintain the trust their customers depend on.
How databases maintain uptime
The essential puzzle of database availability is ensuring a database has enough resources to complete the requests already in flight while keeping enough headroom to quickly serve new ones.
As new users and requests pour in, a database has two options: Scale vertically or horizontally. Scaling vertically means, in short, that enterprises make their servers bigger or faster. Scaling horizontally means enterprises distribute the database in question across multiple, smaller servers.
There is no “best way” here because each direction poses tradeoffs. There are nuances, but Justin Gage, author of the Technically newsletter, summarizes the tradeoffs like this: “Scaling vertically is easy, but scaling horizontally is efficient.”
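Horizontal scaling usually means partitioning (sharding) the data across servers. The sketch below uses plain Python dicts as stand-ins for shard servers, and a hash-based routing function – names like `shard_for` and the `user:` key format are illustrative, not any database’s actual API:

```python
# Minimal sketch of horizontal scaling: hash-based sharding spreads
# keys across several small "servers" (dicts stand in for servers).

NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]

def shard_for(key):
    # A stable hash routes each key to the same shard every time,
    # so reads always find the shard that holds the write.
    return shards[hash(key) % NUM_SHARDS]

def put(key, value):
    shard_for(key)[key] = value

def get(key):
    return shard_for(key).get(key)

for i in range(1000):
    put(f"user:{i}", {"id": i})

print(get("user:42"))                # {'id': 42}
print([len(s) for s in shards])      # load spread across the shards
```

The “easy vs. efficient” tension shows up even here: vertical scaling changes nothing in this code (a bigger dict on a bigger machine), while horizontal scaling adds routing logic and, in real systems, hard problems like resharding and cross-shard queries.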