October 2, 2016 · springboot hibernate

Spring-boot Schema based multi tenancy

TLDR; This article explains multi tenancy, focusing on the SCHEMA strategy and how to implement it in two simple steps using Spring Boot and Hibernate.

Multi-tenancy is the efficient sharing of processes and infrastructure across multiple customers, or tenants. The alternative is a siloed application environment per customer, which brings its own problems.

Ofttimes multi tenancy offers the easiest way to scale customer growth while minimizing infrastructure and operational costs.

There are a few principal approaches to multi tenancy:

Discriminator Strategy

The discriminator pattern works on a single database service and a single schema for all tenants. Tenants are distinguished by a specific strategy, such as a tenant_id field embedded in tables containing tenant-specific data. Beyond the pros and cons below, this strategy is a non-starter for use cases that legally require 'air-space' between tenants.
Pros

Cons

Scaling can be problematic, depending on the underlying storage technology, because of the monolithic nature of the schema. If a traditional RDBMS is chosen, replicas can be employed for read scaling and a sharding strategy for write scaling. With an RDBMS, this strategy lends itself well to use cases where historic data can be archived, leaving just hot data in the primary database system. These considerations change with a managed service such as Amazon Aurora or a NoSQL store such as MongoDB, where much of the read/write scaling is handled in the storage service layer rather than by the application itself. In addition, schema upgrades can be challenging because of the potential volume of data and because all customers are affected simultaneously. Even with a backing technology that supports 'online schema updates', the application may have to support multiple data schema versions until the schema update is complete.

Schema Strategy

The schema strategy employs a single database server, like the DISCRIMINATOR strategy, but uses a separate schema instance per tenant, so each tenant is completely isolated from other tenants at the data layer.
Pros

Cons

Database Strategy

The database strategy takes the SCHEMA strategy one step further whereby each tenant has a separate schema instance on a separate database.

Pros

Cons

Concluding strategy thoughts

The pros and cons of each strategy depend entirely on the use case under consideration. From a general standpoint I personally favor the SCHEMA approach, having seen it work successfully in production many times. I also believe it strikes the right balance of pragmatic pros and cons while offering architectural escape routes should performance and scaling problems arise.

Further reading can be found on the Hibernate website. Hibernate is the default ORM for Spring Boot applications.

Implementing the SCHEMA strategy

Now that we've taken a quick high-level tour of the main multi-tenant strategies, let's run through what it takes to add one to a typical Spring Boot application. Here we'll be employing the SCHEMA strategy. It's actually surprising how trivial and flexible it is.

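As a quick side note, while the SCHEMA and DATABASE strategies are supported as of Hibernate 4.1, support for the DISCRIMINATOR pattern was slated for 5.x, so check the Hibernate documentation for your version to see whether it is actually implemented (see here for more details).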

Step 1. Tenant awareness

So, first things first. For an application to be multi tenant it must have a way to detect and store the correct tenant for the transaction it is serving.

For the purposes of this entry we will assume a simple tenant naming scheme where the tenant id matches the name of the tenant schema in the database. We will also assume we are starting from a simple Spring Boot MVC CRUD application with a RESTful API. A basic example can be found on the Spring Boot guide page here or you can look at the full working example documented here on GitHub.
The following will serve as our tenant storage interface, storing the tenant as data against the current thread (see here for more information on ThreadLocal usage).

public class TenantContext {

  final public static String DEFAULT_TENANT = "test";

  private static ThreadLocal<String> currentTenant = new ThreadLocal<String>()
  {
    @Override
    protected String initialValue() {
      return DEFAULT_TENANT;
    }
  };

  public static void setCurrentTenant(String tenant) {
    currentTenant.set(tenant);
  }

  public static String getCurrentTenant() {
    return currentTenant.get();
  }

  public static void clear() {
    currentTenant.remove();
  }
}

One thing to note here is the DEFAULT_TENANT. This is necessary from a Spring framework point of view to initialize the connection pool to the database, and Hibernate (see later) will complain on initial startup of the application if this is null and a multi-tenant strategy is in place. This can be implemented much more cleanly in Java 8+ than in the code sample above. The DEFAULT_TENANT could be a real tenant, but if that makes you uneasy you could use a demo/empty tenant, or your architecture may have the concept of a shared 'master' database for centralized tenant and shared dictionary management.
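As a rough illustration of the Java 8+ simplification, the anonymous ThreadLocal subclass can be replaced with ThreadLocal.withInitial. This is only a sketch; the class name TenantContextJava8 is made up here purely to keep it distinct from the class above.

```java
/** Java 8+ variant of the tenant holder; the class name is hypothetical. */
final class TenantContextJava8 {

  static final String DEFAULT_TENANT = "test";

  // withInitial supplies the default lazily, once per thread, replacing the anonymous subclass.
  private static final ThreadLocal<String> CURRENT_TENANT =
      ThreadLocal.withInitial(() -> DEFAULT_TENANT);

  private TenantContextJava8() {
    // static holder, not meant to be instantiated
  }

  static void setCurrentTenant(String tenant) {
    CURRENT_TENANT.set(tenant);
  }

  static String getCurrentTenant() {
    return CURRENT_TENANT.get();
  }

  // After remove(), the next get() on this thread falls back to DEFAULT_TENANT again.
  static void clear() {
    CURRENT_TENANT.remove();
  }
}
```

The behaviour is intended to match the original class: an unset thread sees the default tenant, and clear() returns the thread to that state.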

But how does this get set? Our tenant could be passed in a header, a subdomain (e.g. http://tenantid.myapp.com/....), the URI (e.g. http://myapp.com/tenant_id/....), a cookie or, ideally, as part of the authentication strategy, such as a property in a JWT.
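To make the subdomain and URI options a little more concrete, here is a minimal sketch of pulling a tenant id out of a request URL using only the JDK. The class and method names, and the idea of passing in a base domain such as myapp.com, are assumptions for illustration and are not part of the example application.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical helpers showing two ways a tenant id could be derived from a URL. */
final class TenantExtractionSketch {

  private TenantExtractionSketch() {
  }

  /** Subdomain style: http://tenantid.myapp.com/... yields "tenantid" for base domain "myapp.com". */
  static Optional<String> fromSubdomain(URI uri, String baseDomain) {
    String host = uri.getHost();
    String suffix = "." + baseDomain;
    if (host == null || !host.endsWith(suffix)) {
      return Optional.empty();
    }
    String subdomain = host.substring(0, host.length() - suffix.length());
    // Only accept a single label directly in front of the base domain.
    if (subdomain.isEmpty() || subdomain.contains(".")) {
      return Optional.empty();
    }
    return Optional.of(subdomain);
  }

  /** URI style: http://myapp.com/tenant_id/... yields "tenant_id" (the first path segment). */
  static Optional<String> fromPath(URI uri) {
    String path = uri.getPath();
    if (path == null) {
      return Optional.empty();
    }
    // "/tenant_id/person/1" splits into ["", "tenant_id", "person", "1"].
    String[] segments = path.split("/");
    if (segments.length > 1 && !segments[1].isEmpty()) {
      return Optional.of(segments[1]);
    }
    return Optional.empty();
  }
}
```

Whichever source is used, the extracted value would still need the same validation as any other untrusted input before it is treated as a tenant.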

For this example we will use a simple HTTP header (X-TenantID). You should absolutely not use this strategy in any production application under any circumstances, because any client can claim to be any tenant simply by changing the header. This approach is purely to simplify the concepts.

Regardless of the vehicle for the tenant data, it is desirable to keep the multi-tenant mechanics isolated from, and as invisible as possible to, the main application. For instance, no tenant-specific business logic should ever be visible in the controllers. To this end, Spring MVC's HandlerInterceptorAdapter class is a good fit and requires two additions to our application: the interceptor itself and the configuration to hook the interceptor in.

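import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;
import org.springframework.util.StringUtils;
import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;

@Component
public class TenantInterceptor extends HandlerInterceptorAdapter {

  private static final String TENANT_HEADER = "X-TenantID";

  @Override
  public boolean preHandle(HttpServletRequest req, HttpServletResponse res, Object handler)
      throws Exception {

    String tenant = req.getHeader(TENANT_HEADER);

    // Reject the request outright if no tenant was supplied.
    if (StringUtils.isEmpty(tenant)) {
      res.setStatus(HttpServletResponse.SC_BAD_REQUEST);
      res.setContentType(MediaType.APPLICATION_JSON_VALUE);
      res.getWriter().write("{\"error\": \"No tenant supplied\"}");
      res.getWriter().flush();
      return false;
    }

    // Store the tenant against the current thread for the rest of the request.
    TenantContext.setCurrentTenant(tenant);
    return true;
  }

  // afterCompletion runs even when the handler throws (postHandle does not), so the
  // tenant is always cleared before the thread goes back to the pool.
  @Override
  public void afterCompletion(
      HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex)
          throws Exception {
    TenantContext.clear();
  }
}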

In the interceptor above, note the logic to return an appropriate response code and message body if a tenant is missing. This logic becomes unnecessary if the tenant is part of the authentication scheme and securely transmitted, for instance in a JWT, which is generated by a trusted entity.
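The interceptor also clears the tenant once the request has been handled. A small sketch of why that matters when threads are pooled, as they are in a servlet container: two simulated "requests" run back to back on the same worker thread, and the second reads whatever tenant it finds. Everything here (class name, method name, the single-thread executor standing in for the container's pool) is an assumption for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo of a tenant leaking between requests that share a pooled thread. */
final class StaleTenantDemo {

  private static final ThreadLocal<String> TENANT = ThreadLocal.withInitial(() -> "test");

  private StaleTenantDemo() {
  }

  /** Returns the tenant observed by a second request that never set one itself. */
  static String tenantSeenBySecondRequest(boolean clearAfterFirst) {
    // One worker thread, so both tasks are guaranteed to share it.
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Runnable firstRequest = () -> {
        TENANT.set("foo");
        if (clearAfterFirst) {
          TENANT.remove();
        }
      };
      pool.submit(firstRequest).get();

      Callable<String> secondRequest = TENANT::get;
      return pool.submit(secondRequest).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Without the clear, the second request inherits "foo" from the first; with it, the second request sees the default tenant. That is the leak the interceptor's cleanup is there to prevent.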

And finally the configuration to wire the interceptor in:

@Configuration
public class WebMvcConfig extends WebMvcConfigurerAdapter {

  @Autowired
  HandlerInterceptor tenantInterceptor;

  @Override
  public void addInterceptors(InterceptorRegistry registry) {
    registry.addInterceptor(tenantInterceptor);
  }
}

It's interesting to note that interceptors can be applied to specific URL path patterns, which opens up the possibility of different tenant strategies for different parts of the application. For instance, everything under /admin could be handled by a different tenant interceptor which could force the tenant id to ADMIN and use a schema dedicated to centralized management of all the tenants in the system.
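In Spring MVC the registration returned by addInterceptor can be narrowed with addPathPatterns, so each interceptor only sees its own part of the URL space. The decision itself can be sketched independently of Spring. The following is a hypothetical resolver, not code from the example application, that forces ADMIN for anything under /admin and otherwise uses the tenant supplied by the caller.

```java
/** Hypothetical sketch of choosing a tenant based on the request path. */
final class TenantRoutingSketch {

  static final String ADMIN_TENANT = "ADMIN";

  private TenantRoutingSketch() {
  }

  /** Returns the tenant for the request, or null when none can be determined. */
  static String resolveTenant(String requestPath, String headerTenant) {
    // Everything under /admin is forced onto the central management schema.
    if (requestPath.equals("/admin") || requestPath.startsWith("/admin/")) {
      return ADMIN_TENANT;
    }
    // All other paths rely on the tenant supplied with the request.
    if (headerTenant == null || headerTenant.isEmpty()) {
      return null;
    }
    return headerTenant;
  }
}
```

Matching on "/admin/" rather than a bare "/admin" prefix avoids accidentally capturing unrelated paths such as /administrator.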

At this point, you can test your progress with curl and a simple endpoint responding to GET.

Without an X-TenantID header

$ curl -v localhost:8080/person/1 | jq .
*   Trying 127.0.0.1...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /person/1 HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.45.0
> Accept: */*
> 
< HTTP/1.1 400  
< X-Application-Context: application  
< Content-Type: application/json;charset=ISO-8859-1  
< Transfer-Encoding: chunked  
< Date: Thu, 29 Sep 2016 15:04:36 GMT  
< Connection: close  
<  
{ [37 bytes data]
100    31    0    31    0     0   1880      0 --:--:-- --:--:-- --:--:--  2066  
* Closing connection 0
{
  "error": "No tenant supplied"
}

X-TenantID doesn't do anything at this point; we are simply detecting and storing the desired tenant context. So with any X-TenantID header you should see the following

$ curl -v -H "X-TenantID:test" localhost:8080/person/1 | jq .
*   Trying 127.0.0.1...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /person/1 HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.45.0
> Accept: */*
> X-TenantID:test
> 
< HTTP/1.1 200  
< X-Application-Context: application  
< Content-Type: application/hal+json;charset=UTF-8  
< Transfer-Encoding: chunked  
< Date: Thu, 29 Sep 2016 15:04:30 GMT  
<  
{ [244 bytes data]
100   238    0   238    0     0   5803      0 --:--:-- --:--:-- --:--:--  5950  
* Connection #0 to host localhost left intact
{
  "_links": {
    "self": {
      "href": "http://localhost:8080/person/1"
    }
  },
  "lastName": "Baggins",
  "firstName": "Frodo",
  "updatedAt": "2016-09-25T23:01:10.000+0000",
  "createdAt": "2016-09-25T23:01:10.000+0000"
}

Step 2. Hibernate schema changing

Now that we have the tenant context, we need to change the schema transparently and reliably. Remember, we do not want to burden developers with the concern of interacting with the correct context to the detriment of business logic and feature simplicity and scope. This is a cross-cutting concern of the kind Aspect Oriented Programming (AOP) is meant to address. To this end, as mentioned previously, Hibernate natively supports SCHEMA based multi tenancy and requires three main components: a CurrentTenantIdentifierResolver, a MultiTenantConnectionProvider, and the configuration that wires both into Hibernate.

The CurrentTenantIdentifierResolver is remarkably straightforward and, in this case, essentially a proxy to our TenantContext class. This would be an appropriate place to handle any transformations necessary between the tenant id and the database schema name for the tenant. In this example there is a one-to-one match between the tenant id and schema name, so no transformation is necessary, but that would most likely not be true in a real production app. Often a naming convention that clearly identifies tenant schemas will be useful in a growing production application.
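As a sketch of what such a transformation might look like, the helper below normalizes a tenant id, rejects anything outside a conservative character set, and applies a naming convention. The tenant_ prefix, the 32 character limit and the class itself are assumptions for illustration. The validation is worth having because the resolved name ends up concatenated into a USE statement later on.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical mapping from a tenant id to a schema name following a naming convention. */
final class TenantSchemaNames {

  // Assumed convention: every tenant schema is called tenant_<id>.
  static final String SCHEMA_PREFIX = "tenant_";

  // Conservative whitelist so the result is safe to embed in a SQL statement.
  private static final Pattern SAFE_ID = Pattern.compile("[a-z0-9_]{1,32}");

  private TenantSchemaNames() {
  }

  static String toSchemaName(String tenantId) {
    if (tenantId == null) {
      throw new IllegalArgumentException("Tenant id is required");
    }
    String normalized = tenantId.trim().toLowerCase(Locale.ROOT);
    if (!SAFE_ID.matcher(normalized).matches()) {
      throw new IllegalArgumentException("Invalid tenant id: " + tenantId);
    }
    return SCHEMA_PREFIX + normalized;
  }
}
```

A resolver could call something like TenantSchemaNames.toSchemaName(TenantContext.getCurrentTenant()) instead of returning the raw value, at the cost of the default tenant needing a schema that follows the same convention.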

@Component
public class CurrentTenantIdentifierResolverImpl implements CurrentTenantIdentifierResolver {

  @Override
  public String resolveCurrentTenantIdentifier() {
    return TenantContext.getCurrentTenant();
  }

  @Override
  public boolean validateExistingCurrentSessions() {
    return true;
  }
}

The MultiTenantConnectionProvider is again remarkably simple. Here we are using MySQL as the backing store and the standard USE database; SQL statement to change schemas, which is very cheap from a database cost/performance standpoint. Errors such as the tenant database not existing are propagated up the stack in this example.

@Component
public class MultiTenantConnectionProviderImpl implements MultiTenantConnectionProvider {  
  private static final long serialVersionUID = 6246085840652870138L;

  @Autowired
  private DataSource dataSource;

  @Override
  public Connection getAnyConnection() throws SQLException {
    return dataSource.getConnection();
  }

  @Override
  public void releaseAnyConnection(Connection connection) throws SQLException {
    connection.close();
  }

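  @Override
  public Connection getConnection(String tenantIdentifier) throws SQLException {
    final Connection connection = getAnyConnection();
    // Switch the borrowed connection to the tenant's schema (needs an import of java.sql.Statement).
    // The identifier is concatenated into SQL, so it must be validated before it reaches this point.
    try (Statement statement = connection.createStatement()) {
      statement.execute("USE " + tenantIdentifier);
    }
    catch (SQLException e) {
      // Do not leak a connection that could not be switched.
      connection.close();
      throw new HibernateException(
          "Could not alter JDBC connection to specified schema [" + tenantIdentifier + "]",
          e);
    }
    return connection;
  }

  @Override
  public void releaseConnection(String tenantIdentifier, Connection connection) throws SQLException {
    // Reset to the default schema before the connection goes back to the pool.
    try (Statement statement = connection.createStatement()) {
      statement.execute("USE " + TenantContext.DEFAULT_TENANT);
    }
    catch (SQLException e) {
      throw new HibernateException(
          "Could not reset JDBC connection to default schema [" + TenantContext.DEFAULT_TENANT + "]",
          e);
    }
    finally {
      connection.close();
    }
  }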

  @SuppressWarnings("rawtypes")
  @Override
  public boolean isUnwrappableAs(Class unwrapType) {
    return false;
  }

  @Override
  public <T> T unwrap(Class<T> unwrapType) {
    return null;
  }

  @Override
  public boolean supportsAggressiveRelease() {
    return true;
  }

}

And finally the configuration class to wire Hibernate correctly.

@Configuration
public class HibernateConfig {

  @Autowired
  private JpaProperties jpaProperties;

  @Bean
  public JpaVendorAdapter jpaVendorAdapter() {
    return new HibernateJpaVendorAdapter();
  }

  @Bean
  public LocalContainerEntityManagerFactoryBean entityManagerFactory(DataSource dataSource,
      MultiTenantConnectionProvider multiTenantConnectionProviderImpl,
      CurrentTenantIdentifierResolver currentTenantIdentifierResolverImpl) {
    Map<String, Object> properties = new HashMap<>();
    properties.putAll(jpaProperties.getHibernateProperties(dataSource));
    properties.put(Environment.MULTI_TENANT, MultiTenancyStrategy.SCHEMA);
    properties.put(Environment.MULTI_TENANT_CONNECTION_PROVIDER, multiTenantConnectionProviderImpl);
    properties.put(Environment.MULTI_TENANT_IDENTIFIER_RESOLVER, currentTenantIdentifierResolverImpl);

    LocalContainerEntityManagerFactoryBean em = new LocalContainerEntityManagerFactoryBean();
    em.setDataSource(dataSource);
    em.setPackagesToScan("com.srai");
    em.setJpaVendorAdapter(jpaVendorAdapter());
    em.setJpaPropertyMap(properties);
    return em;
  }
}

Of particular note, the multi tenant strategy is set to SCHEMA and our multiTenantConnectionProviderImpl and currentTenantIdentifierResolverImpl beans are supplied to the configuration to satisfy that strategy's requirements. You will also note that we are reusing the default Hibernate jpaProperties that Spring Boot uses. This is important to get things like the default naming strategy, which transparently maps camel case names in the Java entities to snake case names in the database schema (see here).
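To give a feel for the kind of mapping that naming strategy performs, here is a deliberately simplified sketch. It is not Spring Boot's actual implementation, which handles more cases; the class and method names are made up for illustration.

```java
/** Simplified, hypothetical illustration of camel case to snake case name mapping. */
final class SnakeCaseSketch {

  private SnakeCaseSketch() {
  }

  static String toSnakeCase(String camelCase) {
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < camelCase.length(); i++) {
      char c = camelCase.charAt(i);
      // Insert an underscore before each upper case letter except at the start.
      if (Character.isUpperCase(c) && i > 0) {
        result.append('_');
      }
      result.append(Character.toLowerCase(c));
    }
    return result.toString();
  }
}
```

Under this sketch an entity property such as firstName corresponds to a first_name column, which is the convention the default properties give us without any per-entity annotations.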

And that's really all there is to it. When you look at how little code it takes to power this, and how neatly it is abstracted away from your business logic, it is hard to imagine a cleaner and simpler implementation for Hibernate & Spring to provide.

A full implementation of the code samples above can be found on GitHub (https://github.com/singram/spring-boot-multitenant)

I hope you found this useful.

If you want to read further around the topic and differing approaches, the following articles may be of interest and were of great use in the development of the code and this article.