How OpenAI is scaling the PostgreSQL database to 800 million users



While vector databases still have many legitimate use cases, organizations including OpenAI are leaning on PostgreSQL to get things done.

In a blog post on Thursday, OpenAI disclosed how it is using the open-source PostgreSQL database.

OpenAI runs ChatGPT and its API platform for 800 million users on a single-primary PostgreSQL instance: not a distributed database, not a sharded cluster. One Azure PostgreSQL Flexible Server handles all writes. Nearly 50 read replicas spread across multiple regions handle reads. The system processes millions of queries per second while maintaining low double-digit millisecond p99 latency and five-nines availability.
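An architecture like this depends on routing every query to the right place: writes to the one primary, reads across the replica pool. The sketch below is a minimal client-side router with hypothetical names, not OpenAI's actual client code, and real routers would also handle replica lag and failover.

```python
import itertools

class RoutedDatabase:
    """Illustrative read/write splitter for a single-primary setup.

    ``primary`` and each replica are callables that execute SQL; reads
    rotate round-robin across replicas, writes always hit the primary.
    """

    def __init__(self, primary, replicas):
        self.primary = primary                      # sole write target
        self._replicas = itertools.cycle(replicas)  # round-robin reads

    def run(self, sql, params=()):
        first_word = sql.lstrip().split(None, 1)[0].upper()
        # Crude classification; production systems route per endpoint,
        # since e.g. a WITH ... INSERT statement also writes.
        is_read = first_word == "SELECT"
        target = next(self._replicas) if is_read else self.primary
        return target(sql, params)
```

In practice the same split is often done in a proxy tier rather than in application code, which keeps routing policy out of every service.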

The setup challenges conventional scaling wisdom and gives enterprise architects insight into what actually works at massive scale.

The lesson here isn't to copy OpenAI's stack. It's that architectural decisions should be driven by workload patterns and operational constraints, not by scale panic or fashionable infrastructure choices. OpenAI's PostgreSQL setup shows how far proven systems can stretch when teams optimize deliberately instead of re-architecting prematurely.

“For years, PostgreSQL has been one of the most important, under-the-hood data systems powering core products like ChatGPT and OpenAI’s API,” OpenAI engineer Bohan Zhang wrote in a technical disclosure. “Over the past year, our PostgreSQL load has grown by more than 10x, and it continues to rise quickly.”

The company achieved this scale through targeted optimizations, including connection pooling that cut connection time from 50 milliseconds to 5 milliseconds, and cache locking to prevent ‘thundering herd’ problems where cache misses trigger database overload.
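Cache locking here means collapsing concurrent misses for the same key into a single database load, a pattern often called single-flight. A minimal threaded sketch of that idea follows; the structure is an assumption for illustration, not OpenAI's implementation.

```python
import threading

class SingleFlightCache:
    """Only one thread recomputes a missing key; the rest wait for it.

    This is the core of the 'thundering herd' defense: without it, N
    concurrent misses on a hot key become N identical database queries.
    """

    def __init__(self):
        self._values = {}
        self._locks = {}
        self._meta = threading.Lock()

    def get(self, key, load):
        if key in self._values:                 # fast path: cache hit
            return self._values[key]
        with self._meta:                        # one lock object per key
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                              # only one loader runs
            if key not in self._values:         # re-check after waiting
                self._values[key] = load(key)   # the single DB round trip
            return self._values[key]
```

A production version would also expire entries and drop per-key locks once the value is populated, but the double-checked pattern is the essential piece.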

Why PostgreSQL matters for enterprises

PostgreSQL handles operational data for ChatGPT and OpenAI’s API platform. The workload is heavily read-oriented, which makes PostgreSQL a good fit. However, PostgreSQL’s multiversion concurrency control (MVCC) creates challenges under heavy write loads.

When updating data, PostgreSQL copies entire rows to create new versions, causing write amplification and forcing queries to scan through multiple versions to find current data.
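The version-copy behavior is easy to see in a toy model. The sketch below simulates MVCC-style updates; it is heavily simplified (real PostgreSQL uses transaction snapshots and vacuum, not a simple "dead" flag) but shows why updates amplify writes and leave dead tuples behind.

```python
class MVCCTable:
    """Toy model of PostgreSQL-style row versioning (for intuition only)."""

    def __init__(self):
        self.versions = []  # every tuple version ever written

    def insert(self, row_id, data):
        self.versions.append({"id": row_id, "data": dict(data), "dead": False})

    def update(self, row_id, changes):
        current = self.read(row_id)
        for v in self.versions:               # old version becomes a dead tuple
            if v["id"] == row_id and not v["dead"]:
                v["dead"] = True
        # The WHOLE row is copied forward, not just the changed field:
        # this is the write amplification described above.
        self.versions.append({"id": row_id,
                              "data": dict(current, **changes),
                              "dead": False})

    def read(self, row_id):
        # A reader must skip dead versions to find the live row.
        for v in reversed(self.versions):
            if v["id"] == row_id and not v["dead"]:
                return v["data"]

    def dead_tuples(self):
        return sum(v["dead"] for v in self.versions)
```

One insert plus two updates leaves three stored versions, two of them dead, which is the bloat that autovacuum must later reclaim.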



Rather than fighting this limitation, OpenAI built its strategy around it. At OpenAI’s scale, these tradeoffs aren’t theoretical; they determine which workloads stay on PostgreSQL and which ones must move elsewhere.

How OpenAI is optimizing PostgreSQL

At large scale, conventional database wisdom points to one of two paths: shard PostgreSQL across multiple primary instances so writes can be distributed, or migrate to a distributed SQL database like CockroachDB or YugabyteDB designed to handle massive scale from the start. Most organizations would have taken one of these paths years ago, well before reaching 800 million users.

Sharding or moving to a distributed SQL database eliminates the single-writer bottleneck. A distributed SQL database handles this coordination automatically, but both approaches introduce significant complexity: application code must route queries to the right shard, distributed transactions become harder to manage and operational overhead increases considerably.
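Even the simplest piece of that complexity, deciding which shard owns a key, becomes code the application team must own. A hash-based router in miniature (the function and key format are illustrative; real systems also need resharding plans, hot-key handling and cross-shard transactions):

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Deterministically map a key to a shard index.

    Hashing keeps the placement stable and roughly uniform, but note
    what it costs: any query touching rows on two shards can no longer
    be a single local transaction.
    """
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % shard_count
```

The modulo scheme shown here also means changing `shard_count` remaps almost every key, which is why production systems typically use consistent hashing or directory-based placement instead.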

Instead of sharding PostgreSQL, OpenAI established a hybrid strategy: no new tables in PostgreSQL. New workloads default to sharded systems like Azure Cosmos DB. Existing write-heavy workloads that can be horizontally partitioned get migrated out. Everything else stays in PostgreSQL with aggressive optimization.
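The policy reads as a simple decision rule, which can be paraphrased directly in code. This is a restatement of the article's description, not OpenAI tooling; the function name and return strings are made up.

```python
def placement(in_postgres: bool, write_heavy: bool, partitionable: bool) -> str:
    """Paraphrase of the hybrid strategy as a decision rule."""
    if not in_postgres:
        # Rule 1: no new tables in PostgreSQL.
        return "sharded store (e.g. Azure Cosmos DB)"
    if write_heavy and partitionable:
        # Rule 2: migrate out write-heavy, horizontally partitionable tables.
        return "migrate out of PostgreSQL"
    # Rule 3: everything else stays, with aggressive optimization.
    return "stay on PostgreSQL, optimize aggressively"
```

Encoding the policy as a function is more than an illustration trick: teams that write placement rules down this explicitly can apply them in design reviews without relitigating the architecture each time.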

This approach gives enterprises a practical alternative to wholesale re-architecture. Rather than spending years rewriting hundreds of endpoints, teams can identify specific bottlenecks and move only those workloads to purpose-built systems.



Why this matters

OpenAI’s experience scaling PostgreSQL reveals several practices that enterprises can adopt regardless of their scale.

Build operational defenses at multiple layers. OpenAI’s approach combines cache locking to prevent “thundering herd” problems, connection pooling (which dropped their connection time from 50ms to 5ms), and rate limiting at the application, proxy and query levels. Workload isolation routes low-priority and high-priority traffic to separate instances, ensuring a poorly optimized new feature cannot degrade core services.
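Of these defenses, rate limiting is the easiest to show in isolation. The token bucket below is a standard technique that can sit at any of the three layers mentioned; the parameters and injectable clock are for illustration.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter.

    Tokens refill at ``rate`` per second up to ``capacity``; each
    request spends tokens, so sustained load is capped while short
    bursts up to ``capacity`` are still allowed.
    """

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.clock = clock          # injectable for testing
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False    # caller should reject or queue the request
```

The same mechanism stacks naturally: an application-level bucket per user, a proxy-level bucket per service, and a query-level cost budget can all share this shape with different rates.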

Review and monitor ORM-generated SQL in production. Object-Relational Mapping (ORM) frameworks like Django, SQLAlchemy, and Hibernate automatically generate database queries from application code, which is convenient for developers. However, OpenAI found one ORM-generated query joining 12 tables that caused several high-severity incidents when traffic spiked. The convenience of letting frameworks generate SQL creates hidden scaling risks that only surface under production load. Make reviewing these queries a standard practice.
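One way to make that review routine is to lint generated SQL automatically. The toy check below counts joined tables in a query string; the threshold and names are made up, and a serious audit would inspect `EXPLAIN` output rather than text, but a check like this can run in CI or over query logs.

```python
import re

JOIN_LIMIT = 6  # arbitrary threshold for illustration

def audit_query(sql: str, limit: int = JOIN_LIMIT) -> dict:
    """Flag SQL that joins more than ``limit`` tables.

    Counts JOIN keywords: a query with N joins touches N + 1 tables.
    A 12-table join like the one in the incident would trip this check
    long before a traffic spike surfaces it in production.
    """
    joins = len(re.findall(r"\bJOIN\b", sql, flags=re.IGNORECASE))
    return {"tables": joins + 1, "flagged": joins + 1 > limit}
```

Most ORMs expose the SQL they emit (e.g. via query logging), so a check like this can be wired in without touching application logic.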

Enforce strict operational discipline. OpenAI allows only lightweight schema changes; anything triggering a full table rewrite is prohibited. Schema changes have a 5-second timeout. Long-running queries get automatically terminated to prevent blocking database maintenance operations. When backfilling data, they enforce rate limits so aggressive that operations can take over a week.
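A migration gate can enforce the no-rewrite rule mechanically before DDL ever reaches production. The sketch below blocks a few statement shapes that commonly rewrite the whole table in PostgreSQL (changing a column's type, `VACUUM FULL`, `CLUSTER`); the pattern list is illustrative and deliberately incomplete. The 5-second budget itself maps to PostgreSQL's `SET statement_timeout = '5s'`.

```python
import re

# DDL shapes that commonly force a full table rewrite in PostgreSQL.
# Illustrative, not exhaustive: e.g. ADD COLUMN with a constant default
# does NOT rewrite the table on PostgreSQL 11+ and passes this gate.
REWRITE_PATTERNS = [
    r"ALTER\s+TABLE\s+\S+\s+ALTER\s+COLUMN\s+\S+\s+(SET\s+DATA\s+)?TYPE",
    r"ALTER\s+TABLE\s+\S+\s+SET\s+TABLESPACE",
    r"VACUUM\s+FULL",
    r"\bCLUSTER\b",
]

def check_migration(sql: str) -> str:
    """Reject DDL likely to rewrite the table; return it unchanged if OK."""
    for pattern in REWRITE_PATTERNS:
        if re.search(pattern, sql, flags=re.IGNORECASE):
            raise ValueError(f"blocked: statement may rewrite the table: {sql!r}")
    return sql
```

In practice such a gate runs in the migration pipeline, paired with `lock_timeout` and `statement_timeout` settings so that even an approved change cannot hold locks or run past its budget.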

Read-heavy workloads with burst writes can run on single-primary PostgreSQL longer than commonly assumed. The decision to shard should depend on workload patterns rather than user counts.

This approach is particularly relevant for AI applications, which often have heavily read-oriented workloads with unpredictable traffic spikes. These characteristics align with the pattern where single-primary PostgreSQL scales effectively.

The lesson is simple: identify actual bottlenecks, optimize proven infrastructure where possible, and migrate selectively when necessary. Wholesale re-architecture is not always the answer to scaling challenges.




