<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Stuart Ingram]]></title><description><![CDATA[Husband, father, Brit, software architect/engineer, nerd enthusiast/herder, occasional wood worker, doing the best I can]]></description><link>http://stuartingram.com:80/</link><generator>Ghost 0.8</generator><lastBuildDate>Mon, 30 Dec 2024 12:57:15 GMT</lastBuildDate><atom:link href="http://stuartingram.com:80/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Record versioning with Mysql]]></title><description><![CDATA[<p>The topic of how to handle record versioning came up recently in a number of projects.  This is a topic known commonly as <strong>slowly changing dimensions</strong>.  There are a number of approaches depending on your requirements; a good overview can be found <a href="https://en.wikipedia.org/wiki/Slowly_changing_dimension">here</a>.</p>

<p>Considerations in design</p>

<ul>
<li>need to persist accurate</li></ul>]]></description><link>http://stuartingram.com:80/2017/02/19/record-versioning-with-mysql-2/</link><guid isPermaLink="false">b525882c-3e7c-4554-be97-5e6d165c1b5a</guid><category><![CDATA[Mysql]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Sun, 19 Feb 2017 15:57:30 GMT</pubDate><content:encoded><![CDATA[<p>The topic of how to handle record versioning came up recently in a number of projects.  This is a topic known commonly as <strong>slowly changing dimensions</strong>.  There are a number of approaches depending on your requirements; a good overview can be found <a href="https://en.wikipedia.org/wiki/Slowly_changing_dimension">here</a>.</p>

<p>Considerations in design</p>

<ul>
<li>need to persist accurate history</li>
<li>number of changing fields within a record</li>
<li>maintainability</li>
<li>placement of business logic and number of actors on the data store</li>
</ul>

<p>This article aims to explore a number of approaches and how <a href="https://www.mysql.com/">Mysql</a>'s capabilities can be used to enforce integrity or automate the strategies where possible.  Generally I'm not a fan of stored procedures or complex triggers, as they essentially divest business logic from the core code base/service layer.  However, this is an area where I wanted to explore how far Mysql triggers could assist.</p>

<p>Regardless of which strategy is the best fit for the requirements at hand, all scenarios should have appropriate integration tests to ensure expected behavior over time, especially as there are multiple ways to achieve these strategies.  Integration tests should cover basic scenarios such as</p>

<ul>
<li>Inserting new records into the system</li>
<li>Updating existing records
<ul><li>Are updates reflected correctly?</li>
<li>Is history of the old record maintained?</li></ul></li>
</ul>

<p>For discussion we'll use Mysql 5.7 and model the common scenario of a supply table with 4 simple properties;</p>

<ul>
<li>Internal primary surrogate key</li>
<li>Natural key</li>
<li>Description</li>
<li>Cost</li>
</ul>

<p>Where the assumption is that the natural key is globally unique.  e.g.  </p>

<pre><code class="language-sql">CREATE TABLE supplies (  
     id INT NOT NULL AUTO_INCREMENT,
     supply_key CHAR(10) NOT NULL,
     description CHAR(30) NOT NULL,
     cost INT DEFAULT 0,
     PRIMARY KEY (id),
     UNIQUE KEY (supply_key)
);
</code></pre>

<p>You can get started very simply if you have <a href="https://www.docker.com/">docker</a> installed with the following;</p>

<pre><code>docker run --name mysql_container --env MYSQL_ALLOW_EMPTY_PASSWORD=YES -p 3306:3306 mysql:5.7  
</code></pre>

<p>This will download and run mysql in an isolated container and the following will allow you to connect  </p>

<pre><code>docker exec -it mysql_container mysql -uroot  
</code></pre>

<p>This allows you to run and utilize Mysql in a completely isolated form without polluting your host system.  It's also extremely simple to experiment between versions by pulling different images.  See <a href="https://hub.docker.com/_/mysql/">here</a> for a complete list of official Mysql docker images.  </p>

<h3 id="type1overwrite">Type 1 - Overwrite</h3>

<p>Essentially, using this strategy, there is only one record per <code>supply_key</code> and fields are updated in place with no historic values retained. <br>
Pros</p>

<ul>
<li>Simple to implement.  Use <code>INSERT IGNORE ...</code> or <code>REPLACE ....</code> statements depending on your needs.  This will overwrite any existing data with new data.</li>
</ul>

<p>Cons</p>

<ul>
<li>Historic trends cannot be extracted from the supplies table.</li>
<li>Referential integrity to the supplies table must be thought through as data can change at any point.</li>
<li>Data must be copied from the supplies table to an instance of an order record to preserve state at a given point in time.</li>
</ul>
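<p>For the supplies table above, a Type 1 overwrite can be a single idempotent statement.  A minimal sketch using <code>INSERT ... ON DUPLICATE KEY UPDATE</code>, which updates in place when the <code>supply_key</code> unique constraint is hit;</p>

<pre><code class="language-sql">-- A second run with new values simply overwrites the row for 'A'
INSERT INTO supplies (supply_key, description, cost)  
VALUES ('A', 'foo', 1)  
ON DUPLICATE KEY UPDATE  
  description = VALUES(description),
  cost = VALUES(cost);
</code></pre>

<p>Unlike <code>REPLACE</code>, which deletes and re-inserts the row (churning the surrogate <code>id</code> and firing any <code>DELETE</code> triggers), this form preserves the existing primary key.</p>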

<h3 id="type2addnewrow">Type 2 - Add new row</h3>

<p>This approach involves creating a new row for a record that has changed and delineating it from existing versions.  The <a href="https://en.wikipedia.org/wiki/Slowly_changing_dimension">Wikipedia</a> article mentions two common approaches; an incrementing <code>version</code> column grouped on <code>supply_key</code> or a combination of <code>start</code> and <code>end</code> dates. <br>
I favor the latter approach for two reasons.  First, it provides temporal relevance, which is useful for a wide variety of reporting and auditing purposes; second, it naturally provides an easy way to determine the current record.  It's a far easier query to find which record has a <code>NULL</code> <code>end</code> date than which <code>version</code> is the largest in a group.  For example, getting a list of current supplies with dates could be as simple as;  </p>

<pre><code class="language-sql">SELECT * FROM supplies WHERE ended_at IS NULL;  
</code></pre>

<p>With incrementing version columns the same result could be achieved with the following more complex statement;  </p>

<pre><code class="language-sql">SELECT *  
FROM supplies  
INNER JOIN  
  (SELECT supply_key, MAX(version) AS version 
   FROM supplies 
   GROUP BY supply_key) AS current
  ON supplies.supply_key = current.supply_key  
 AND supplies.version = current.version
</code></pre>

<p>That being said, let's see how we can automate this and lift the cognitive burden off the developer.  First let's start with a few assumptions to keep this simple.</p>

<ul>
<li>Updates for a particular <code>supply_key</code> will be inserted in order</li>
<li>Only the <code>start</code> date is supplied and is assumed to be the <code>end</code> date of the preceding record version.</li>
<li>There is always a current record for a given <code>supply_key</code> with no end date.  Yes, this isn't realistic as you can never remove supplies with this constraint but we're going to roll with it for the sake of discussion.</li>
</ul>

<p>Given these requirements our new supplies table may look something like;</p>

<pre><code class="language-sql">DROP TABLE IF EXISTS supplies;  
CREATE TABLE supplies (  
     id INT NOT NULL AUTO_INCREMENT,
     supply_key VARCHAR(10) NOT NULL,
     description VARCHAR(30) NOT NULL,
     cost INT DEFAULT 0,
     started_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
     ended_at DATETIME,
     PRIMARY KEY (id),
     UNIQUE KEY (supply_key, started_at),
     KEY (supply_key, ended_at)
);
</code></pre>

<p>Of note;</p>

<ul>
<li>The start date is mandatory, the end date is not since for a given <code>supply_key</code> there is always one record with a NULL <code>ended_at</code> date.</li>
<li>Our <code>UNIQUE</code> key constraint now covers both the natural key for the supply and the start date.</li>
</ul>

<p>Our automation should handle the following forms;</p>

<ul>
<li>A record <a href="https://en.wikipedia.org/wiki/Merge_(SQL)">UPSERT</a> with an assumed start date equal to when the record was inserted</li>
<li>A record UPSERT with an explicit start date</li>
<li>A record UPSERT with an explicit end date</li>
</ul>

<pre><code class="language-sql">INSERT INTO supplies SET supply_key='A', description='foo', cost='1';  
INSERT INTO supplies SET supply_key='A', description='foo', cost='1', started_at=NOW();  
INSERT INTO supplies SET supply_key='A', description='foo', cost='1', started_at=DATE_SUB(NOW(), INTERVAL 1 DAY), ended_at=NOW();  
</code></pre>

<p>Ideally we'd use something like the following trigger to handle this automatically.</p>

<pre><code class="language-sql">DROP TRIGGER IF EXISTS supplies_before_insert;  
DROP TRIGGER IF EXISTS supplies_after_insert;  
delimiter |

CREATE TRIGGER supplies_before_insert BEFORE INSERT ON supplies  
  FOR EACH ROW
  BEGIN
    SET NEW.ended_at=NULL;
  END;
|
CREATE TRIGGER supplies_after_insert AFTER INSERT ON supplies  
  FOR EACH ROW
  BEGIN
    UPDATE supplies SET ended_at=NEW.started_at WHERE supply_key=NEW.supply_key AND ended_at IS NULL AND id!=NEW.id;
  END;
|

delimiter ;  
</code></pre>

<p><strong>Unfortunately Mysql does not permit a trigger to update the same table it was fired from.</strong>  The reasons for this are deadlocks and infinite loops: the update in the trigger would indeed cause the trigger to fire again, and so on and so forth.</p>

<p>Our likely approach here is to push this onto the application using something like the following;  </p>

<pre><code class="language-sql">START TRANSACTION;  
  INSERT INTO supplies 
          SET supply_key='B', description='bar', cost=2; 

  SELECT @end_date:=MAX(started_at) 
    FROM supplies 
   WHERE supply_key='B';

  UPDATE supplies 
     SET ended_at=@end_date 
   WHERE supply_key='B' 
     AND ended_at IS NULL 
     AND id!=last_insert_id(); 
COMMIT;  
</code></pre>
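<p>With start and end dates maintained, a point-in-time lookup becomes a simple range check.  A sketch (the timestamp is illustrative);</p>

<pre><code class="language-sql">-- State of supply 'B' as it was at a given moment
SELECT *  
  FROM supplies  
 WHERE supply_key = 'B'  
   AND started_at &lt;= '2017-01-01 00:00:00'  
   AND (ended_at IS NULL OR ended_at &gt; '2017-01-01 00:00:00');
</code></pre>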

<p>Pros</p>

<ul>
<li>Orders can safely reference the supplies table and remain accurate over time.</li>
<li>Simple reporting and trending</li>
<li>Efficient use of covering index to locate current records.</li>
</ul>

<p>Cons</p>

<ul>
<li>Your supplies table may become very large with historic data depending on the rate of change.  </li>
<li>The business logic to maintain historical records must be observed by the client applications of the database meaning decentralized logic if not fronted by a service API.</li>
<li>The way this particular insert query is written coupled with the automated <code>started_at</code> date field could lead to unnecessary duplication.  Consider a field that may toggle between multiple states over time.</li>
<li>The above INSERT, SELECT, UPDATE combination isn't the most efficient but robustly handles automatic <code>started_at</code> values as well as specified ones.</li>
</ul>

<h3 id="type3addnewattribute">Type 3 - Add new attribute</h3>

<p>In this approach the system only keeps track of the original &amp; current values of selected fields and retains one record per supply.  In the following example the fields <code>description</code> and <code>cost</code> are of particular interest.</p>

<pre><code class="language-sql">DROP TABLE IF EXISTS supplies;  
CREATE TABLE supplies (  
     id INT NOT NULL AUTO_INCREMENT,
     supply_key CHAR(10) NOT NULL,
     description CHAR(30) NOT NULL,
     cost INT DEFAULT 0,
     original_description CHAR(30) DEFAULT '',
     original_cost INT DEFAULT 0,
     PRIMARY KEY (id),
     UNIQUE KEY (supply_key)
);
</code></pre>

<p>We generally want two guarantees from a system with this approach; <br>
1. Inserts automatically fill the <code>original_*</code> fields. <br>
2. Updates preserve the <code>original_*</code> fields.</p>

<p>This can be done with triggers in Mysql with the following.</p>

<pre><code class="language-sql">DROP TRIGGER IF EXISTS supplies_insert;  
DROP TRIGGER IF EXISTS supplies_update;  
delimiter |

CREATE TRIGGER supplies_insert BEFORE INSERT ON supplies  
  FOR EACH ROW
  BEGIN
    SET NEW.original_description = NEW.description;
    SET NEW.original_cost = NEW.cost;
  END;
|

CREATE TRIGGER supplies_update BEFORE UPDATE ON supplies  
  FOR EACH ROW
  BEGIN
    SET NEW.original_description = OLD.original_description;
    SET NEW.original_cost = OLD.original_cost;
  END;
|

delimiter ;  
</code></pre>

<p>With these triggers in place you can safely use standard <code>INSERT</code> and <code>UPDATE</code> statements or use the following <code>UPSERT</code> form negating the need to know upfront whether your application already contains a record for a particular <code>supply_key</code>.</p>

<pre><code class="language-sql">INSERT INTO supplies SET  
    supply_key='B', description='bar', cost=2 
  ON DUPLICATE KEY UPDATE 
    cost=VALUES(cost), description=VALUES(description);
</code></pre>
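<p>With these triggers in place, a quick sanity check (a sketch) shows the <code>original_*</code> fields surviving an update;</p>

<pre><code class="language-sql">INSERT INTO supplies SET supply_key='C', description='baz', cost=5;  
UPDATE supplies SET cost=7 WHERE supply_key='C';  
-- cost is now 7 while original_cost remains 5
SELECT cost, original_cost FROM supplies WHERE supply_key='C';
</code></pre>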

<p>Pros</p>

<ul>
<li>Simple to implement in either Mysql or code space</li>
<li>Guarantees at the database level can be made to preserve the <code>original_*</code> fields regardless of the client interacting with the database.</li>
</ul>

<p>Cons</p>

<ul>
<li>Limited use in terms of accurate record keeping.  A de-normalized copy of the supplies data at the point of use must be made for accurate record keeping.</li>
<li>Trend reporting is impossible.  Only original and current values are tracked.  Trending on de-normalized orders is not recommended since there is no guarantee that an order was made when a supply was at a particular cost.</li>
</ul>

<h3 id="type4addhistorytable">Type 4 - Add history table</h3>

<p>Aside from strategy <a href="http://stuartingram.com:80/2017/02/19/record-versioning-with-mysql-2/#type2addnewrow">Type 2</a>, which retains history in the same table, the other common approach is to separate historic records from current records in separate tables.</p>

<p>The following creates 2 tables, a <code>supplies</code> table and a <code>supplies_archive</code> table based on the structure of the current supplies table.  The current <code>supplies</code> table still needs to know when the current record became relevant and so we need the <code>started_at</code> date.  In the <code>supplies_archive</code> we also need an <code>ended_at</code> date.</p>

<pre><code class="language-sql">DROP TABLE IF EXISTS supplies;  
DROP TABLE IF EXISTS supplies_archive;  
CREATE TABLE supplies (  
     id INT NOT NULL AUTO_INCREMENT,
     supply_key CHAR(10) NOT NULL,
     description CHAR(30) NOT NULL,
     cost INT DEFAULT 0,
     started_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
                                ON UPDATE CURRENT_TIMESTAMP,
     PRIMARY KEY (id),
     UNIQUE KEY (supply_key, started_at)
);
CREATE TABLE supplies_archive LIKE supplies;  
ALTER TABLE supplies_archive  
  ADD COLUMN ended_at DATETIME NOT NULL AFTER started_at;
</code></pre>

<p>Here we can set up a trigger to automatically create a new record in the <code>supplies_archive</code> table.</p>

<pre><code class="language-sql">DROP TRIGGER IF EXISTS supplies_after_update;  
delimiter |

CREATE TRIGGER supplies_after_update AFTER UPDATE ON supplies  
  FOR EACH ROW
  BEGIN
    IF NEW.cost != OLD.cost 
       OR NEW.description != OLD.description THEN
      INSERT INTO supplies_archive 
      SELECT NULL, NEW.supply_key, OLD.description, OLD.cost, OLD.started_at, NEW.started_at ;
    END IF;
  END;
|

delimiter ;  
</code></pre>

<p>Note that this does not protect against <code>UPDATE</code> statements that explicitly set the <code>started_at</code> date and nothing else, which breaks our desired behavior. </p>

<p>As of MySQL 5.5, you can use the SIGNAL syntax to throw an exception to assist in refining TRIGGER behavior:</p>

<pre><code class="language-sql">SIGNAL sqlstate '45000' SET message_text = 'My Error Message';  
</code></pre>

<p>State 45000 is a generic state representing "unhandled user-defined exception".</p>
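<p>For example, a <code>BEFORE UPDATE</code> trigger (a sketch, assuming the Type 4 tables above) could reject updates that move <code>started_at</code> without changing any tracked field;</p>

<pre><code class="language-sql">delimiter |

CREATE TRIGGER supplies_before_update BEFORE UPDATE ON supplies  
  FOR EACH ROW
  BEGIN
    -- No archive row would be written for this change, so refuse it
    IF NEW.started_at != OLD.started_at
       AND NEW.cost = OLD.cost
       AND NEW.description = OLD.description THEN
      SIGNAL SQLSTATE '45000'
        SET MESSAGE_TEXT = 'started_at may not be changed on its own';
    END IF;
  END;
|

delimiter ;
</code></pre>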

<p>In any approach to a complex problem with many entry points there are workarounds.  The above solution is far from robust but is limited by the capabilities of Mysql triggers. The goal here is to provide as much consistency as possible at the database level which remains the lowest common denominator between modes of interaction, whether they are multiple clients or developers with direct access to the database.  Of course this statement is highly dependent on design and deployment environment.  For instance if you have a storage API in front of the database and ban any other method of interaction your design evaluation changes significantly.</p>

<p>If you have a dedicated storage API I would recommend taking the archive logic and encoding it simply in code space, sidestepping the limitations of triggers and the bifurcation of business logic across application and database code spaces.  By encoding business logic at the application level the code is also significantly more portable, assuming the use of an <a href="https://en.wikipedia.org/wiki/Object-relational_mapping">ORM</a>.</p>

<p>If you have multiple clients with direct access to the database, triggers are a useful tool to protect data contract expectations but do have limitations that are sometimes hard to work around.  Dependency on triggers and stored procedures also reduces portability and behavior transparency.</p>

<p>This article is not meant to be definitive by any stretch.  As with all things relating to software design, your mileage will vary depending on your particular needs and requirements.</p>]]></content:encoded></item><item><title><![CDATA[Joy and pain with @Scheduled and @RefreshScope in SpringBoot]]></title><description><![CDATA[<p>| <strong>TLDR;</strong> <code>@Scheduled</code> and <code>@RefreshScope</code> are powerful tools but do not work together out of the box, causing dangerous inconsistencies.  Find out how to get them to play nicely, plus more advanced scheduling opportunities.</p>

<p>So I'm a fan of the <a href="https://projects.spring.io/spring-boot/">SpringBoot</a> framework and its adoption of the <a href="http://rubyonrails.org/">Rails</a> "convention over configuration"</p>]]></description><link>http://stuartingram.com:80/2016/11/07/joy-and-pain-with-schedule-and-refreshscope-in-springboot-2/</link><guid isPermaLink="false">433d3588-ff0a-4a25-bec1-9f09ffa645fe</guid><category><![CDATA[springboot]]></category><category><![CDATA[scheduling]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Mon, 07 Nov 2016 21:39:39 GMT</pubDate><content:encoded><![CDATA[<p>| <strong>TLDR;</strong> <code>@Scheduled</code> and <code>@RefreshScope</code> are powerful tools but do not work together out of the box, causing dangerous inconsistencies.  Find out how to get them to play nicely, plus more advanced scheduling opportunities.</p>

<p>So I'm a fan of the <a href="https://projects.spring.io/spring-boot/">SpringBoot</a> framework and its adoption of the <a href="http://rubyonrails.org/">Rails</a> "convention over configuration" philosophy, as well as its powerful and simple annotations.  All of which make a developer's life more productive and allow the team to focus on delivering functional value to end users.</p>

<p>One of those annotations is <code>@Scheduled</code> which can be applied to any bean method.  Only two pieces of code are required;  </p>

<p>1. the enabling of scheduling in the application, typically achieved with something like the following (note the use of <code>@EnableScheduling</code>):  </p>

<pre><code class="language-java">@SpringBootApplication
@EnableScheduling
public class ExampleApplication {  
  public static void main(String[] args) {
    SpringApplication.run(ExampleApplication.class, args);
  }
}
</code></pre>

<p>2. And the actual method you want repeated.  For instance;  </p>

<pre><code class="language-java">@Slf4j
@Configuration
public class Followers {

  @Value("${follower.count:10}")
  private int followers;

  @Scheduled(fixedRate = 4000)
  public void outputFollowers() {
    log.info("===========&gt; Followers " + followers);
  }
}
</code></pre>

<p>In this example a count of followers is autowired from the <a href="http://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html">spring property hierarchy</a> and written to the log every 4 seconds as per the <code>@Scheduled</code> declaration.  If you're wondering where the <code>log</code> object is defined, this is part of the magic of <a href="https://projectlombok.org/">Project Lombok</a>, which abstracts away a lot of standard Java boilerplate and is provided by the <code>@Slf4j</code> annotation on the class.</p>

<p><code>@Scheduled</code> is a very powerful and simple annotation.  Common forms of it include;</p>

<ul>
<li><strong>Fixed rate</strong> - Repeat every x milliseconds. <code>@Scheduled(fixedRate=1000)</code></li>
<li><strong>Fixed delay</strong> - Repeat every x milliseconds between previous completion and next start. <code>@Scheduled(fixedDelay=1000)</code></li>
<li><strong>Crontab</strong> - Defines a cadence using the same expressions as *nix crontab definitions.  <code>@Scheduled(cron="0 0 * * * *")</code></li>
</ul>

<p>A great introduction to this topic can be found <a href="http://howtodoinjava.com/spring/spring-core/4-ways-to-schedule-tasks-in-spring-3-scheduled-example/">here</a>.</p>

<p>So at this point in the article we change tack over to automatic property updating.  As previously mentioned, SpringBoot has an extensive <a href="http://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html">property hierarchy</a> but no native way to refresh properties should they change.  Enter <a href="https://cloud.spring.io/spring-cloud-config/">Spring Cloud Config</a>.  The main focus of this project is to establish a centralized repository of configuration for a distributed set of applications and have those applications update automatically if properties change.  A great introduction to externalized properties and centralized property management with Spring Cloud Config can be found <a href="https://spring.io/blog/2015/01/13/configuring-it-all-out-or-12-factor-app-style-configuration-with-spring">here</a>.</p>

<p>Spring Cloud Config automatically provides a JMX interface and an HTTP endpoint (<code>/refresh</code>) to refresh all properties in the application in classes marked with the <code>@RefreshScope</code> annotation.  Meaning if the external property source changes, all you have to do is hit <code>/refresh</code> on your application and the configuration changes are automatically pulled in.</p>

<p>And for the most part this works nicely and seamlessly as you would expect... except it doesn't work with the original <code>@Scheduled</code> code example at the start of the article.  In fact, everything but <code>@Scheduled</code> annotated code will refresh, causing system inconsistencies within an application.  The problem here is that the scheduled method (<code>outputFollowers()</code>) has a dependency on a property injected by the Spring framework, and even when refreshed the property change is not propagated down into the scheduled code.  A discussion on this can be found in <a href="https://github.com/spring-cloud/spring-cloud-commons/issues/97">common Spring Cloud issues</a>.</p>

<p>The solution, rather than relying on the magic of the <code>@Scheduled</code> annotation, is to specify the scheduling configuration manually.  For example;</p>

<pre><code class="language-java">@SpringBootApplication
public class ExampleApplication {  
  public static void main(String[] args) {
    SpringApplication.run(ExampleApplication.class, args);
  }
}
</code></pre>

<pre><code class="language-java">@Slf4j
@RefreshScope
@Configuration
public class Followers {

  @Value("${follower.count:10}")
  private int followers;

  public void outputFollowers() {
    log.info("===========&gt; Followers " + followers);
  }
}
</code></pre>

<pre><code class="language-java">@Configuration
@EnableScheduling
public class SchedulingConfiguration implements SchedulingConfigurer {

  @Autowired
  Followers followers;

  @Override
  public void configureTasks(ScheduledTaskRegistrar taskRegistrar) {
    taskRegistrar.addTriggerTask(
        () -&gt; followers.outputFollowers(),
        triggerContext -&gt; {
          Instant nextTriggerTime = Instant.now().plus(4, SECONDS);
          return Date.from(nextTriggerTime);
        });
  }
}
</code></pre>

<p>Tackling the problem this way now yields an application that refreshes properties throughout on demand and consistently.</p>

<p>While initially this is somewhat of a pain, it is first and foremost a solution, and it also enables more complex scheduling.</p>

<p>For instance you could build a trigger context that dynamically calculates the next time to run, potentially throttling an action per time period. See <a href="http://stackoverflow.com/questions/14630539/scheduling-a-job-with-spring-programmatically-with-fixedrate-set-dynamically">here</a> for an example.</p>

<p>Scheduled code is limited to a single thread by default, meaning there is an assumption that the previous call should be finished before the next execution.  When this assumption is incorrect, an execution thread pool is necessary, which again can be manually configured such as;</p>

<pre><code class="language-java"> @Configuration
 @EnableScheduling
 public class AppConfig implements SchedulingConfigurer {

     @Override
     public void configureTasks(ScheduledTaskRegistrar taskRegistrar) {
         taskRegistrar.setScheduler(taskExecutor());
     }

     @Bean(destroyMethod="shutdown")
     public Executor taskExecutor() {
         return Executors.newScheduledThreadPool(100);
     }
 }
</code></pre>

<p>Further reading can be found on the <a href="http://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/scheduling/annotation/EnableScheduling.html">@EnableScheduling</a> java docs.</p>

<p>You can of course do far more than just the simple examples above but this should be enough to firstly resolve any problems between the SpringBoot <code>@Scheduled</code> annotation and the live configuration updates you can attain by incorporating the <code>@RefreshScope</code> annotation from the Spring Cloud Config project.</p>

<p>Enjoy.</p>]]></content:encoded></item><item><title><![CDATA[Testing rake tasks efficiently in JRuby]]></title><description><![CDATA[<p>After writing <a href="https://github.com/singram/cucumber_characteristics">cucumber_characteristics</a> to profile where a large <a href="http://jruby.org/">JRuby</a> cucumber integration suite was taking its time, it soon became apparent that 30% of the time was wrapped up in testing rake tasks.</p>

<p>This is particularly challenging in JRuby as the usual approach to testing rake tasks is to execute</p>]]></description><link>http://stuartingram.com:80/2016/11/05/testing-rake-tasks-effeciently-in-jruby/</link><guid isPermaLink="false">97f16e33-41c2-436a-b3d4-27b4d175041b</guid><category><![CDATA[integration testing]]></category><category><![CDATA[jruby]]></category><category><![CDATA[rake]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Sat, 05 Nov 2016 18:22:06 GMT</pubDate><content:encoded><![CDATA[<p>After writing <a href="https://github.com/singram/cucumber_characteristics">cucumber_characteristics</a> to profile where a large <a href="http://jruby.org/">JRuby</a> cucumber integration suite was taking its time, it soon became apparent that 30% of the time was wrapped up in testing rake tasks.</p>

<p>This is particularly challenging in JRuby as the usual approach to testing rake tasks is to execute them in a new shell process, capturing the output and testing any mutative effects.  Something like;  </p>

<pre><code class="language-ruby">When /^I run the rake task$/ do  
  @output = `rake some_task`
end  
</code></pre>

<p>This approach is problematic specifically when using JRuby.  The above invocation needs to start up a completely new JVM per test, taking several seconds each time.  In the test suite I was working with, this alone accounted for <strong>45 minutes</strong> on an average developer machine.</p>

<p>Clearly a significant problem, even with Continuous Integration.</p>

<p>To tackle this I wrote <a href="https://github.com/singram/cucumber_rake_runner">cucumber_rake_runner</a> which executes a rake task in the same JVM process as the integration test, negating the cost of spinning up a new JVM process per rake test.  The original invocation simply becomes.</p>

<pre><code class="language-ruby">When /^I run the rake task$/ do  
  @output = run_rake_task('some_task')
end  
</code></pre>

<p>The <code>@output</code> captures <code>STDOUT</code>, <code>STDERR</code> and timing information.</p>

<p>This was immensely useful to the project and team I was working with, reducing the time to run the full integration suite by over 30%. Hopefully this will be useful to you as well.</p>]]></content:encoded></item><item><title><![CDATA[Profiling ruby cucumber integration tests]]></title><description><![CDATA[<p>I've been working on a large JRuby project for a number of years now with over a thousand integration tests.  With so many tests it's important to know where time is being spent to see if there are optimizations that can be made to improve the overall performance of the</p>]]></description><link>http://stuartingram.com:80/2016/10/27/profiling-ruby-cucumber-integration-tests/</link><guid isPermaLink="false">14f4c113-377c-48b3-8759-bfbf82bfa7d8</guid><category><![CDATA[Ruby]]></category><category><![CDATA[integration testing]]></category><category><![CDATA[cucumber]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Thu, 27 Oct 2016 18:26:38 GMT</pubDate><content:encoded><![CDATA[<p>I've been working on a large JRuby project for a number of years now with over a thousand integration tests.  With so many tests it's important to know where time is being spent to see if there are optimizations that can be made to improve the overall performance of the test suite.  Locating highly utilized steps and dead code is also a normal part of code curation.</p>

<p>With this in mind, I wrote the <a href="https://github.com/singram/cucumber_characteristics">cucumber_characteristics</a> gem.  The gem requires very little configuration; essentially it is a formatter that should drop into your existing setup transparently, generating HTML and/or JSON reports.  Installation and usage instructions can be found on github <a href="https://github.com/singram/cucumber_characteristics">here</a>.</p>

<p>For each cucumber step definition executed the following is reported;</p>

<ul>
<li>Location of definition &amp; regex</li>
<li>Step usage location and number of times executed (background/outline etc)</li>
<li>Counts for success/failure/pending/etc</li>
<li>Total time taken in test run along with average, fastest, slowest times per step</li>
</ul>

<p>For each feature test, the following is reported;</p>

<ul>
<li>Location and time taken to run feature</li>
<li>Result and number of steps run</li>
<li>Breakdown of feature by individual example run if a Scenario Outline.</li>
</ul>

<p>There is also added support to list out all unused steps in a cucumber test run to aid step curation.   Be aware that if you are only running a specific test set, for example via a tag, you will see a larger number of reported unused steps that are not ‘true’ unused steps.</p>

<p>The gem supports ruby 1.9+ and cucumber 1.x &amp; 2.x</p>

<p>Hope this is useful.  Please get in touch if there are further enhancements that would be useful or, better yet, submit a pull request.</p>]]></content:encoded></item><item><title><![CDATA[Mysql conditional INSERTS]]></title><description><![CDATA[<p>Every now and again it's useful to have slightly more control over a MySQL insert than simply making it idempotent via the <code>IGNORE</code> keyword.  For example;  </p>

<pre><code class="language-sql">INSERT IGNORE INTO foo (id, column_bar) values (1, 'aaa'),  (2, 'bbb');  
</code></pre>

<p>The <code>IGNORE</code> keyword will simply skip over any primary or unique key constraint</p>]]></description><link>http://stuartingram.com:80/2016/10/06/mysql-conditional-inserts/</link><guid isPermaLink="false">ea881e54-4c37-49e8-a880-44664e548042</guid><category><![CDATA[Mysql]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Thu, 06 Oct 2016 02:09:10 GMT</pubDate><content:encoded><![CDATA[<p>Every now and again it's useful to have slightly more control over a MySQL insert than simply making it idempotent via the <code>IGNORE</code> keyword.  For example:  </p>

<pre><code class="language-sql">INSERT IGNORE INTO foo (id, column_bar) values (1, 'aaa'),  (2, 'bbb');  
</code></pre>

<p>The <code>IGNORE</code> keyword will simply skip over any primary or unique key constraint violations, essentially making the above statement idempotent assuming a primary key on <code>id</code>.</p>

<p>However, let us suppose we have a data set without a primary key, or, more precisely, that the data we want to insert has more complex conditional requirements.   Unfortunately MySQL's <code>INSERT</code> statement does not directly allow for greater selectivity, but the <code>SELECT</code> statement does, allowing us to take advantage of the <code>INSERT...SELECT...</code> form.</p>

<pre><code class="language-sql">CREATE TEMPORARY TABLE tmp_users LIKE users;  
INSERT INTO tmp_users VALUES ( ....... *default user list*)

INSERT INTO users SELECT * FROM tmp_users WHERE .... &lt;conditional logic here&gt;

-- Optional as temporary tables only exist for duration of session.
DROP TABLE tmp_users;  
</code></pre>

<p>Admittedly this is a contrived example, where there would likely be a <code>UNIQUE</code> key on <code>username</code> that could be taken advantage of in the <code>INSERT IGNORE...</code> statement format.  However, it illustrates how more complex logic can be wrapped around an <code>INSERT</code> statement when needed, without any supporting code or stored procedures.</p>
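
<p>To make the pattern concrete, here is a hypothetical sketch; the <code>users</code> table, its columns and the condition are invented for illustration:</p>

<pre><code class="language-sql">CREATE TEMPORARY TABLE tmp_users LIKE users;  
INSERT INTO tmp_users (id, username) VALUES (1, 'admin'), (2, 'guest');

-- Only insert default users whose username is not already taken
INSERT INTO users
  SELECT t.* FROM tmp_users t
  WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.username = t.username);

DROP TABLE tmp_users;  
</code></pre>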

<p>See <a href="https://dev.mysql.com/doc/refman/5.7/en/insert.html">here</a> for full documentation on the MySQL <code>INSERT</code> statement.</p>]]></content:encoded></item><item><title><![CDATA[Simple SpringBoot profiles]]></title><description><![CDATA[<blockquote>
  <p><strong>TLDR</strong> SpringBoot profiles: what they are and how to use them, with a simple Flyway example.</p>
</blockquote>

<p>Many frameworks have the concept of scoping application settings together around the concept of environments, examples being dev, test, stage &amp; production.  Largely what you scope around is irrelevant but these examples are the most</p>]]></description><link>http://stuartingram.com:80/2016/10/04/simple-springboot-profiles/</link><guid isPermaLink="false">b19fffe6-630a-40df-85e8-33fcc86c9250</guid><category><![CDATA[springboot]]></category><category><![CDATA[gradle]]></category><category><![CDATA[flyway]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Tue, 04 Oct 2016 17:31:39 GMT</pubDate><content:encoded><![CDATA[<blockquote>
  <p><strong>TLDR</strong> SpringBoot profiles: what they are and how to use them, with a simple Flyway example.</p>
</blockquote>

<p>Many frameworks have the concept of scoping application settings together around the concept of environments, examples being dev, test, stage &amp; production.  Largely what you scope around is irrelevant, but these are the most common examples.</p>

<p>So what do I mean and how is this useful?  Well, for local development you probably want the local database credentials in your application properties, and you may also have threads turned down or certain services disabled.  Whatever is most appropriate to facilitate and accelerate local development for you and your team.  Clearly the application settings you run against in production will be different from performance, debugging and security perspectives, and should thus be managed separately.</p>

<p>The <a href="https://12factor.net">12 Factor Application</a> manifesto also offers a great read on configuration management from a different perspective, which is well worth your time; see <a href="https://12factor.net/config">here</a>.</p>

<p>SpringBoot supports this concept in the form of <em><a href="http://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-profiles.html">profiles</a></em> which are also analogous to the Ruby on Rails runtime environments (see <a href="http://guides.rubyonrails.org/configuring.html#creating-rails-environments">here</a>).</p>

<p><strong>How do I use profiles?</strong></p>

<p>Very simply, a SpringBoot application's default properties are specified in <code>src/main/resources/application.properties</code>.  Profile specific properties can be specified in the same file or in a separate file of the following format: <code>src/main/resources/application-&lt;profilename&gt;.properties</code>, such as <code>src/main/resources/application-dev.properties</code>.  Profile properties override the default properties in much the same way CSS does.  Profile properties do not need to specify all properties, only the ones you wish to change from the default <code>application.properties</code> set.</p>

<p>Assuming you have <a href="http://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#production-ready">Actuator</a> in your class path (if not, why not?!?), any <code>info.*</code> properties are exposed through the <code>/info</code> endpoint.  This makes it especially useful for exposing build &amp; release information as well as the profile under which the application is running.</p>

<p>For instance in <code>application.properties</code> you may have  </p>

<pre><code class="language-properties">info.profile=default  
spring.jackson.serialization.write-dates-as-timestamps=false  
management.context-path=/actuator  
</code></pre>

<p>and in <code>application-dev.properties</code> you may have  </p>

<pre><code class="language-properties">info.profile=dev  
</code></pre>

<p>Meaning that when running your application with the <code>dev</code> profile enabled, the <code>/actuator/info</code> endpoint will yield something like  </p>

<pre><code class="language-json">{
  "profile": "dev"
}
</code></pre>

<p><strong>So how do you pass in the desired profile to your SpringBoot application?</strong>  Very simply:</p>

<pre><code>$ SPRING_PROFILES_ACTIVE=dev gradle bootRun
</code></pre>

<p>Just like any SpringBoot property, there's a hierarchy down which it searches (see <a href="http://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html">here</a>), so this doesn't have to be an environment variable (useful for a containerization strategy) but could also be a property in your default <code>application.properties</code> file or specified some other way.</p>
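
<p>For instance, the same selection can be made with the standard <code>spring.profiles.active</code> property instead of the environment variable:</p>

<pre><code class="language-properties"># Equivalent to SPRING_PROFILES_ACTIVE=dev
spring.profiles.active=dev  
</code></pre>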

<p><strong>Flyway example</strong></p>

<p>Recently a problem arose at work where default development data needed to be automatically loaded into all local development environments but only local development environments.  Using <a href="https://flywaydb.org/">Flyway</a> to manage database migrations and data assets this became trivial to implement with the help of profiles.</p>

<p>Schema migrations were located in the default <code>resources/db/migration</code> location and development environment specific migrations/data assets were located in <code>resources/db/dev</code>.</p>

<p>With the database files in place all that was needed was a <code>application-dev.properties</code> file with  </p>

<pre><code class="language-properties">info.environment=dev  
flyway.locations=classpath:db/migration,classpath:db/dev  
</code></pre>

<p>This did two things:</p>

<ul>
<li>published the runtime profile to the <code>/info</code> endpoint provided by Actuator</li>
<li>overrode the default locations flyway examines for migrations and callback files to include both the standard schema migrations as well as any <code>dev</code> environment specific files.</li>
</ul>

<p>One item of note is that in this particular case, the development need was for default data.  With this in mind, while the data file(s) could be versioned using the standard versioned Flyway <a href="https://flywaydb.org/documentation/migration/sql">naming schema</a>, this requires some consideration to make sure that the versions of the dev data assets and schema migrations do not clash.  Flyway also supports callbacks, which are the perfect solution to this problem (see <a href="https://flywaydb.org/documentation/callbacks">here</a>); in particular the <code>afterMigrate</code> hook.  Be sure to make your migration idempotent, as it will run every time on startup regardless of the number of migrations executed.</p>
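
<p>For example, a callback file such as <code>db/dev/afterMigrate.sql</code> could seed default data idempotently; the table and values here are hypothetical:</p>

<pre><code class="language-sql">-- afterMigrate.sql runs after every successful migration, so it must be idempotent
INSERT IGNORE INTO users (id, username) VALUES (1, 'dev-user');  
</code></pre>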

<p>Simple when you know how, but sometimes the documentation isn't that transparent.  Hope this helps.  A full working example can be found on github <a href="https://github.com/singram/spring-boot-profiles">here</a>.</p>]]></content:encoded></item><item><title><![CDATA[Spring-boot Schema based multi tenancy]]></title><description><![CDATA[<blockquote>
  <p><strong>TLDR;</strong> This article will explain multi tenancy, focusing in on the SCHEMA strategy and how to implement it in two simple steps using Spring Boot and Hibernate.</p>
</blockquote>

<p>Multi-tenancy is the sharing of process and infrastructure across multiple customers or tenants efficiently.  The alternative to this is having a siloed application</p>]]></description><link>http://stuartingram.com:80/2016/10/02/spring-boot-schema-based-multi-tenancy/</link><guid isPermaLink="false">b268eb40-ec7f-4410-8fad-56b5bdf77076</guid><category><![CDATA[springboot]]></category><category><![CDATA[hibernate]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Sun, 02 Oct 2016 13:06:26 GMT</pubDate><content:encoded><![CDATA[<blockquote>
  <p><strong>TLDR;</strong> This article will explain multi tenancy, focusing in on the SCHEMA strategy and how to implement it in two simple steps using Spring Boot and Hibernate.</p>
</blockquote>

<p>Multi-tenancy is the sharing of process and infrastructure across multiple customers or tenants efficiently.  The alternative to this is having a siloed application environment per customer.  This brings its own problems, such as:</p>

<ul>
<li>Linearly scaling infrastructure costs (assuming equal customers).</li>
<li>Inefficient use of infrastructure.</li>
<li>Divergent infrastructure &amp; configuration without strict infrastructure automation and change management.</li>
<li>High overheads to keep multiple environments up to date and in sync.</li>
<li>Opens the door to solution forks under business pressure which incurs huge technical debt and operational overhead as teams now must support multiple solution versions.</li>
</ul>

<p>Ofttimes multi tenancy offers the easiest way to scale customer growth while minimizing infrastructure and operational costs.</p>

<p>There are a few principal approaches to multi tenancy:  </p>

<h4 id="discriminatorstrategy">Discriminator Strategy</h4>

<p>The discriminator pattern works on a single database service and single schema for all tenants. Constituent tenants are discriminated by a specific strategy such as a <code>tenant_id</code> field embedded in tables containing tenant specific data.  Beyond the pros/cons below, this strategy is a non-starter for use cases which legally require 'air-space' between tenants. <br>
<strong>Pros</strong></p>

<ul>
<li>Single database and schema instance to manage</li>
<li>Single schema to backup</li>
<li>Single schema to archive, upgrade etc.</li>
<li>Simple reporting across tenants (e.g. <code>SELECT .... GROUP BY tenant_id</code>)</li>
<li>Single database service account to manage per application.</li>
<li>Single database instance to tune and maintain.</li>
</ul>

<p><strong>Cons</strong></p>

<ul>
<li>Tenant data is interwoven meaning backup &amp; restore is an all or nothing proposition.</li>
<li>Care needs to be taken with every database interaction that the data returned is appropriately scoped.</li>
<li>If your database goes down, all your customers go down, therefore necessitating a high availability strategy, which is generally a good idea but essential in this strategy.</li>
<li>If a table becomes corrupted it becomes corrupted for all users.</li>
<li>If a tenant leaves, it can be tricky to extract and archive the information
<ul><li>If that tenant comes back it can be trickier to reinsert the data and easier to integrate from scratch.</li>
<li>While storage is cheap, performance is not, and an inactive tenant in a single schema will take up database buffer pool resources simply by its existence, through indices alone.</li></ul></li>
<li>Because it is likely that a single service account will be used to access the schema, and all tenants reside in that schema, it can be challenging to trace database load to specific tenant usage.</li>
<li>As a single database service is serving all tenants, performance is subject to "noisy neighbors".</li>
</ul>
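
<p>As a sketch of the scoping concern above, every tenant-specific query must carry the discriminator; the table and columns are hypothetical:</p>

<pre><code class="language-sql">-- Omitting the tenant_id predicate would silently leak other tenants' data
SELECT id, username FROM users WHERE tenant_id = 42;  
</code></pre>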

<p>Scaling can be problematic depending on the underlying storage technology chosen, due to the monolithic nature of the schema.  If a traditional <a href="https://en.wikipedia.org/wiki/Relational_database_management_system">RDBMS</a> is chosen, replicas can be employed for read scaling and a sharding strategy for write scaling.  If using an RDBMS, this particular strategy lends itself well to use cases where historic data can be archived, leaving just hot data in the primary database system.  These considerations change if using a storage service such as <a href="https://aws.amazon.com/rds/aurora/">AWS Aurora</a> or <a href="https://www.mongodb.com/">MongoDB</a>, where read/write scaling is handled transparently as part of the storage service layer and is not a concern of the application itself.  In addition, schema upgrades can be challenging given the volume of potential data and the fact that all customers are affected simultaneously.  Even with a backing technology supporting 'online schema updates', the application may have to consider supporting multiple data schema versions until the schema update is complete.</p>

<h4 id="schemastrategy">Schema Strategy</h4>

<p>The schema strategy employs a single database server, like the <code>DISCRIMINATOR</code> strategy, but specifies a schema instance per tenant, meaning that each tenant has complete isolation at the data layer from other tenants. <br>
<strong>Pros</strong></p>

<ul>
<li>Tenant data is robustly isolated from other tenant data
<ul><li>This in turn makes for simpler, more robust application development.  However, the application must be tenant aware and capable of switching tenants reliably.</li>
<li>Schema &amp; table corruption affects only a single tenant </li>
<li>Ad-hoc queries are automatically scoped to a single tenant.</li></ul></li>
<li>Granular backups can be taken and restored with ease &amp; in parallel.</li>
<li>Tenants can be migrated to and from different environments easily.</li>
<li>Instrumentation is available on a per schema basis allowing the attribution of load and bottlenecks to specific tenant generated load.</li>
<li>Single database service account to manage per application.</li>
<li>Single database instance to tune and maintain.</li>
</ul>

<p><strong>Cons</strong></p>

<ul>
<li>As a single database service is serving all tenants, performance is subject to noisy neighbors similar to the <code>DISCRIMINATOR</code> strategy.  However it is trivial to move problem customers onto dedicated databases should the need arise.</li>
<li>If your database goes down, all your customers go down, again necessitating a good failover strategy.</li>
<li>Tooling needs to be built to handle schema updates, backups and restores of the tenant schemas within an environment.</li>
<li>Reporting across tenants requires additional tooling.</li>
<li>De-normalization of common reference tables may be necessary or a 'common/admin' schema employed and shared by all tenants.  This in itself can assist in some of the maintenance tooling mentioned.</li>
</ul>

<h4 id="databasestrategy">Database Strategy</h4>

<p>The database strategy takes the <code>SCHEMA</code> strategy one step further whereby each tenant has a separate schema instance on a separate database.</p>

<p><strong>Pros</strong></p>

<ul>
<li>Tenant data is robustly isolated from other tenant data
<ul><li>This in turn makes for simpler, more robust application development.  </li>
<li>Schema &amp; table corruption affects only a single tenant</li></ul></li>
<li>Granular backups can be taken and restored with ease &amp; in parallel.</li>
<li>Tenants can be migrated to and from environments easily.</li>
<li>Instrumentation is available on a per schema basis allowing the attribution of load and bottlenecks to specific tenant generated load.</li>
<li>"Noisy neighbor" problems are eliminated at the database layer.</li>
</ul>

<p><strong>Cons</strong></p>

<ul>
<li>Multiple databases instances to tune and maintain.</li>
<li>Additional infrastructure cost of the multiple database instances.</li>
<li>A connection pool per tenant per application is now required (assuming the application layer is multi tenant) which may require additional tuning when considering the number of application instances you need to scale to and the overhead each connection incurs on your storage service.</li>
<li>Multiple database service accounts to manage per application.
<ul><li>This assumes that an application will switch between tenants and therefore needs connection credentials to all databases, making this strategy equal, from a security standpoint, to a single service account.</li></ul></li>
<li>If a database goes down, only a single tenant is affected.</li>
<li>Tooling needs to be built to handle schema updates, backups and restores of the entire environment.</li>
<li>Reporting across tenants requires additional tooling.
<ul><li>This may be complicated by the multiple service accounts to connect with each database.</li></ul></li>
</ul>

<h4 id="concludingstrategythoughts">Concluding strategy thoughts</h4>

<p>The pros/cons of each strategy are entirely subjective to the use-case under consideration.  From a general standpoint I personally favor the <code>SCHEMA</code> approach, having seen it work successfully in production many times.  I also believe it strikes the right balance between pragmatic pros &amp; cons, as well as offering architectural escape routes should performance and scaling problems arise.</p>

<p>Further reading can be found <a href="https://docs.jboss.org/hibernate/orm/5.2/userguide/html_single/Hibernate_User_Guide.html#multitenacy">here</a> on the website of <a href="http://hibernate.org/">Hibernate</a>, the default ORM for <a href="https://projects.spring.io/spring-boot/">SpringBoot</a> applications.</p>

<h3 id="implementingtheschemastrategy">Implementing the SCHEMA strategy</h3>

<p>So now we've taken a quick high-level tour of the main multi tenant strategies, let's run through what it takes to add one to a typical Spring Boot application.  Here we'll be employing the <code>SCHEMA</code> strategy.  It's actually surprising how trivial and flexible it is.</p>

<p>As a quick side note, while the <code>SCHEMA</code> &amp; <code>DATABASE</code> strategies are supported as of Hibernate 4.1, support for the <code>DISCRIMINATOR</code> pattern was introduced in 5.x (see <a href="https://docs.jboss.org/hibernate/orm/4.2/devguide/en-US/html/ch16.html#d5e4780">here</a> for more details).</p>

<h4 id="step1tenantawareness">Step 1. Tenant awareness</h4>

<p>So first things first.  For an application to be multi tenant, it must have a way to detect and store the correct tenant for the transaction it is serving.</p>

<p>For the purposes of this entry we will assume a simple tenant naming schema where the name of the <code>tenant id</code> matches the name of the tenant schema in the database.  We will also assume we are starting from a simple SpringBoot MVC CRUD application with <a href="https://en.wikipedia.org/wiki/Representational_state_transfer">RESTful</a> API.  A basic example can be found on the SpringBoot guide page <a href="http://spring.io/guides/gs/rest-service/">here</a> or you can look at the full working example documented here on <a href="https://github.com/singram/spring-boot-multitenant/">github</a>. <br>
The following will serve as our tenant storage interface, storing the tenant as data against the current thread (see <a href="http://stackoverflow.com/questions/817856/when-and-how-should-i-use-a-threadlocal-variable">here</a> for more information on <code>ThreadLocal</code> usage).</p>

<pre><code class="language-java">public class TenantContext {

  final public static String DEFAULT_TENANT = "test";

  private static ThreadLocal&lt;String&gt; currentTenant = new ThreadLocal&lt;String&gt;()
  {
    @Override
    protected String initialValue() {
      return DEFAULT_TENANT;
    }
  };

  public static void setCurrentTenant(String tenant) {
    currentTenant.set(tenant);
  }

  public static String getCurrentTenant() {
    return currentTenant.get();
  }

  public static void clear() {
    currentTenant.remove();
  }
}
</code></pre>

<p>One thing to note here is the <code>DEFAULT_TENANT</code>.  This is necessary from a Spring framework point of view to initialize the connection pool to the database, and Hibernate (see later) will complain on initial startup of the application if this is null while a multi-tenant strategy is in place.  This can be implemented much more cleanly in Java 8+ than in the code sample above.  The <code>DEFAULT_TENANT</code> could be a real tenant, but if that makes you uneasy you could use a demo/empty tenant, or your architecture may have the concept of a shared 'master' database for centralized tenant and shared dictionary management.</p>
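
<p>As a sketch of that Java 8+ cleanup, <code>ThreadLocal.withInitial</code> replaces the anonymous subclass; the class name here is hypothetical:</p>

<pre><code class="language-java">public class TenantContextJava8 {

  public static final String DEFAULT_TENANT = "test";

  // withInitial supplies the default lazily, replacing the initialValue() override
  private static final ThreadLocal&lt;String&gt; currentTenant =
      ThreadLocal.withInitial(() -&gt; DEFAULT_TENANT);

  public static void setCurrentTenant(String tenant) {
    currentTenant.set(tenant);
  }

  public static String getCurrentTenant() {
    return currentTenant.get();
  }

  public static void clear() {
    currentTenant.remove();
  }
}
</code></pre>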

<p><strong>But how does this get set?</strong>  Our tenant could be passed in the header, subdomain (e.g. <a href="http://tenantid.myapp.com/">http://tenantid.myapp.com/</a>....), URI (e.g. <a href="http://myapp.com/tenant_id/">http://myapp.com/tenant_id/</a>....), a cookie, or ideally as part of the authentication strategy, such as a property in a <a href="https://jwt.io/">JWT</a>.</p>

<p>For this example we will use a simple HTTP header property (<code>X-TenantID</code>).  You should absolutely <strong>not</strong> use this strategy in any production application under any circumstances; this approach is purely to simplify the concepts.</p>

<p>Regardless of the vehicle for the tenant data, it is desirable to have the multi tenant mechanics isolated away from, and as invisible to, the main application as much as possible.  For instance, no tenant specific business logic should ever be visible in the controllers.  To this end, the HTTP <a href="http://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/web/servlet/handler/HandlerInterceptorAdapter.html">HandlerInterceptorAdapter</a> class is perfect for this and requires two additions to our application; the interceptor itself and the configuration to hook the interceptor in.</p>

<pre><code class="language-java">@Component
public class TenantInterceptor extends HandlerInterceptorAdapter {

  private static final String TENANT_HEADER = "X-TenantID";

  @Override
  public boolean preHandle(HttpServletRequest req, HttpServletResponse res, Object handler)
      throws Exception {

    String tenant = req.getHeader(TENANT_HEADER);
    boolean tenantSet = false;

    if(StringUtils.isEmpty(tenant)) {
      res.setStatus(HttpServletResponse.SC_BAD_REQUEST);
      res.setContentType(MediaType.APPLICATION_JSON_VALUE);
      res.getWriter().write("{\"error\": \"No tenant supplied\"}");
      res.getWriter().flush();
    } else {
      TenantContext.setCurrentTenant(tenant);
      tenantSet = true;
    }

    return tenantSet;
  }

  @Override
  public void postHandle(
      HttpServletRequest request, HttpServletResponse response, Object handler, ModelAndView modelAndView)
          throws Exception {
    TenantContext.clear();
  }
}
</code></pre>

<p>In the interceptor above, note the logic to return an appropriate response code and message body if a tenant is missing.  This logic becomes unnecessary if the <code>tenant</code> is part of the authentication schema and securely transmitted in a JWT, for instance, which by definition is generated by a trusted entity.</p>

<p>And finally the configuration to wire the interceptor in;</p>

<pre><code class="language-java">@Configuration
public class WebMvcConfig extends WebMvcConfigurerAdapter {

  @Autowired
  HandlerInterceptor tenantInterceptor;

  @Override
  public void addInterceptors(InterceptorRegistry registry) {
    registry.addInterceptor(tenantInterceptor);
  }
}
</code></pre>

<p>It's interesting to note that interceptors can be applied to specific URL path patterns, which opens up the possibility of different tenant strategies for different parts of the application.  For instance, everything under <code>/admin</code> could be handled by a different tenant interceptor which could force the tenant id to <code>ADMIN</code> and use a schema dedicated to centralized management of all the tenants in the system.</p>

<p>At this point, you can test your progress with <code>curl</code> and a simple endpoint responding to <code>GET</code>.</p>

<p>Without an <code>X-TenantID</code> header  </p>

<pre><code>$ curl -v localhost:8080/person/1 | jq .
*   Trying 127.0.0.1...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to localhost (127.0.0.1) port 8080 (#0)
&gt; GET /person/1 HTTP/1.1
&gt; Host: localhost:8080
&gt; User-Agent: curl/7.45.0
&gt; Accept: */*
&gt; 
&lt; HTTP/1.1 400  
&lt; X-Application-Context: application  
&lt; Content-Type: application/json;charset=ISO-8859-1  
&lt; Transfer-Encoding: chunked  
&lt; Date: Thu, 29 Sep 2016 15:04:36 GMT  
&lt; Connection: close  
&lt;  
{ [37 bytes data]
100    31    0    31    0     0   1880      0 --:--:-- --:--:-- --:--:--  2066  
* Closing connection 0
{
  "error": "No tenant supplied"
}
</code></pre>

<p><code>X-TenantID</code> doesn't do anything at this point; we are simply detecting and storing the desired tenant context.  So with any <code>X-TenantID</code> header you should see the following</p>

<pre><code>$ curl -v -H "X-TenantID:test" localhost:8080/person/1 | jq .
*   Trying 127.0.0.1...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to localhost (127.0.0.1) port 8080 (#0)
&gt; GET /person/1 HTTP/1.1
&gt; Host: localhost:8080
&gt; User-Agent: curl/7.45.0
&gt; Accept: */*
&gt; X-TenantID:test
&gt; 
&lt; HTTP/1.1 200  
&lt; X-Application-Context: application  
&lt; Content-Type: application/hal+json;charset=UTF-8  
&lt; Transfer-Encoding: chunked  
&lt; Date: Thu, 29 Sep 2016 15:04:30 GMT  
&lt;  
{ [244 bytes data]
100   238    0   238    0     0   5803      0 --:--:-- --:--:-- --:--:--  5950  
* Connection #0 to host localhost left intact
{
  "_links": {
    "self": {
      "href": "http://localhost:8080/person/1"
    }
  },
  "lastName": "Baggins",
  "firstName": "Frodo",
  "updatedAt": "2016-09-25T23:01:10.000+0000",
  "createdAt": "2016-09-25T23:01:10.000+0000"
}
</code></pre>

<h4 id="step2hibernateschemachanging">Step 2. Hibernate schema changing</h4>

<p>So now we have the tenant context, we need to change the schema transparently and reliably.  Remember, we do not want to burden developers with the concern of interacting with the correct context to the detriment of business logic and feature simplicity and scope.  This is a great example of Aspect Oriented Programming (<a href="http://docs.spring.io/spring/docs/current/spring-framework-reference/html/aop.html">AOP</a>).  To this end, as mentioned previously, Hibernate natively supports <code>SCHEMA</code> based multi tenancy and requires three main components.</p>

<ul>
<li><strong>CurrentTenantIdentifierResolver</strong> - Class responsible for resolving the correct tenant</li>
<li><strong>MultiTenantConnectionProvider</strong> - Class responsible for providing and closing tenant connections</li>
<li><strong>Configuration</strong> - Wiring up Hibernate correctly</li>
</ul>

<p>The <code>CurrentTenantIdentifierResolver</code> is remarkably straightforward and is essentially, in this case, a proxy to our <code>TenantContext</code> class.  This would be an appropriate place to handle any transformations necessary between the <code>tenant id</code> and the database schema name for the tenant.  In this example there is a one to one match between the tenant id and schema name, so no transformation is necessary, but that would most likely not be true in a real production app.  Often a naming convention to clearly identify tenant schemas will be useful in a growing production application.  </p>

<pre><code class="language-java">@Component
public class CurrentTenantIdentifierResolverImpl implements CurrentTenantIdentifierResolver {

  @Override
  public String resolveCurrentTenantIdentifier() {
    return TenantContext.getCurrentTenant();
  }

  @Override
  public boolean validateExistingCurrentSessions() {
    return true;
  }
}
</code></pre>
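
<p>If a naming convention were adopted, the transformation would live in <code>resolveCurrentTenantIdentifier</code>; a hypothetical prefix scheme might look like:</p>

<pre><code class="language-java">public class SchemaNames {

  // Hypothetical convention: tenant id "Acme" maps to schema "tenant_acme"
  public static String schemaFor(String tenantId) {
    return "tenant_" + tenantId.toLowerCase();
  }
}
</code></pre>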

<p>The <code>MultiTenantConnectionProvider</code> is again remarkably simple.  Here we are using <a href="https://www.mysql.com/">Mysql</a> as the backing store and the standard <code>USE database;</code> SQL statement to change schemas, which is very cheap from a database cost/performance standpoint.  Errors, such as the tenant database not existing, are propagated up the stack in this example.  </p>

<pre><code class="language-java">@Component
public class MultiTenantConnectionProviderImpl implements MultiTenantConnectionProvider {  
  private static final long serialVersionUID = 6246085840652870138L;

  @Autowired
  private DataSource dataSource;

  @Override
  public Connection getAnyConnection() throws SQLException {
    return dataSource.getConnection();
  }

  @Override
  public void releaseAnyConnection(Connection connection) throws SQLException {
    connection.close();
  }

  @Override
  public Connection getConnection(String tenantIdentifier) throws SQLException {
    final Connection connection = getAnyConnection();
    try {
      connection.createStatement().execute( "USE " + tenantIdentifier );
    }
    catch ( SQLException e ) {
      throw new HibernateException(
          "Could not alter JDBC connection to specified schema [" + tenantIdentifier + "]",
          e
          );
    }
    return connection;
  }

  @Override
  public void releaseConnection(String tenantIdentifier, Connection connection) throws SQLException {
    try {
      connection.createStatement().execute( "USE " + TenantContext.DEFAULT_TENANT );
    }
    catch ( SQLException e ) {
      throw new HibernateException(
          "Could not alter JDBC connection to specified schema [" + tenantIdentifier + "]",
          e
          );
    }
    connection.close();
  }

  @SuppressWarnings("rawtypes")
  @Override
  public boolean isUnwrappableAs(Class unwrapType) {
    return false;
  }

  @Override
  public &lt;T&gt; T unwrap(Class&lt;T&gt; unwrapType) {
    return null;
  }

  @Override
  public boolean supportsAggressiveRelease() {
    return true;
  }

}
</code></pre>

<p>And finally the configuration class to wire Hibernate correctly.  </p>

<pre><code class="language-java">@Configuration
public class HibernateConfig {

  @Autowired
  private JpaProperties jpaProperties;

  @Bean
  public JpaVendorAdapter jpaVendorAdapter() {
    return new HibernateJpaVendorAdapter();
  }

  @Bean
  public LocalContainerEntityManagerFactoryBean entityManagerFactory(DataSource dataSource,
      MultiTenantConnectionProvider multiTenantConnectionProviderImpl,
      CurrentTenantIdentifierResolver currentTenantIdentifierResolverImpl) {
    Map&lt;String, Object&gt; properties = new HashMap&lt;&gt;();
    properties.putAll(jpaProperties.getHibernateProperties(dataSource));
    properties.put(Environment.MULTI_TENANT, MultiTenancyStrategy.SCHEMA);
    properties.put(Environment.MULTI_TENANT_CONNECTION_PROVIDER, multiTenantConnectionProviderImpl);
    properties.put(Environment.MULTI_TENANT_IDENTIFIER_RESOLVER, currentTenantIdentifierResolverImpl);

    LocalContainerEntityManagerFactoryBean em = new LocalContainerEntityManagerFactoryBean();
    em.setDataSource(dataSource);
    em.setPackagesToScan("com.srai");
    em.setJpaVendorAdapter(jpaVendorAdapter());
    em.setJpaPropertyMap(properties);
    return em;
  }
}
</code></pre>

<p>Of particular note, you will see the multi-tenant strategy set to <code>SCHEMA</code>, and our <code>multiTenantConnectionProviderImpl</code> and <code>currentTenantIdentifierResolverImpl</code> classes supplied to the configuration to satisfy that strategy's requirements.  You will also note that we are using the default Hibernate <code>jpaProperties</code> that Spring Boot uses.  This is important to get things like the default naming strategy, which transparently converts snake case in database schemas to camel case in the Java entities (see <a href="http://stackoverflow.com/questions/25283198/spring-boot-jpa-column-name-annotation-ignored/25293929#25293929">here</a>).</p>

<p><strong>And that's really all there is to it.</strong>  When you look at how little code is required and how neatly it is abstracted away from your business logic, it is hard to imagine a cleaner or simpler implementation for Hibernate &amp; Spring to provide.</p>

<p>A full implementation of the code samples above can be found on GitHub (<a href="https://github.com/singram/spring-boot-multitenant">https://github.com/singram/spring-boot-multitenant</a>).</p>

<p>I hope you found this useful.</p>

<p>If you want to read further around the topic and differing approaches, the following articles may be of interest and were of great use in the development of the code and this article.</p>

<ul>
<li><a href="http://anakiou.blogspot.com/2015/08/multi-tenant-application-with-spring.html">http://anakiou.blogspot.com/2015/08/multi-tenant-application-with-spring.html</a></li>
<li><a href="http://fizzylogic.nl/2016/01/24/Make-your-Spring-boot-application-multi-tenant-aware-in-2-steps/">http://fizzylogic.nl/2016/01/24/Make-your-Spring-boot-application-multi-tenant-aware-in-2-steps/</a></li>
<li><a href="http://www.greggbolinger.com/tenant-per-schema-with-spring-boot/">http://www.greggbolinger.com/tenant-per-schema-with-spring-boot/</a></li>
<li><a href="http://jannatconsulting.com/blog/?p=41">http://jannatconsulting.com/blog/?p=41</a></li>
<li><a href="http://stackoverflow.com/questions/29928404/internationalization-by-subdomain-in-spring-boot">http://stackoverflow.com/questions/29928404/internationalization-by-subdomain-in-spring-boot</a></li>
<li><a href="https://dzone.com/articles/stateless-session-multi-tenant">https://dzone.com/articles/stateless-session-multi-tenant</a></li>
<li><a href="http://publicstaticmain.blogspot.com/2016/05/multitenancy-with-spring-boot.html">http://publicstaticmain.blogspot.com/2016/05/multitenancy-with-spring-boot.html</a></li>
</ul>]]></content:encoded></item><item><title><![CDATA[Separating Unit from Integration tests in Java using Gradle]]></title><description><![CDATA[<p>Having spent some significant time in the Ruby community and finding a new found appreciation for clean unit and integration tests, it often befuddles me why there isn't such a clean separation of test responsibility and scope in other languages.</p>

<p>Java has learned a lot from other test frameworks over</p>]]></description><link>http://stuartingram.com:80/2016/09/15/separating-unit-from-integration-tests-in-java-using-gradle/</link><guid isPermaLink="false">d760dc12-5456-4307-bd33-b345933a8c71</guid><category><![CDATA[java]]></category><category><![CDATA[gradle]]></category><category><![CDATA[unit testing]]></category><category><![CDATA[integration testing]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Thu, 15 Sep 2016 02:07:32 GMT</pubDate><content:encoded><![CDATA[<p>Having spent some significant time in the Ruby community and finding a new found appreciation for clean unit and integration tests, it often befuddles me why there isn't such a clean separation of test responsibility and scope in other languages.</p>

<p>Java has learned a lot from other test frameworks over the last decade.  The venerable <a href="http://junit.org/">JUnit</a> test framework has matured significantly, and with the addition of <a href="https://en.wikipedia.org/wiki/Fluent_interface">fluent libraries</a> such as <a href="https://github.com/rest-assured/rest-assured">RestAssured</a>, simple mocking frameworks such as <a href="http://mockito.org/">Mockito</a>, and expressive matching capabilities such as <a href="http://hamcrest.org/JavaHamcrest/">Hamcrest</a>, to name a few, it's now possible to write tests with the readability and focused intent touted by many other languages.</p>

<p>So, having established that Java has some really great testing libraries, that still doesn't address how to use them, and this is part of the problem.  There is no simple way to separate unit tests from integration tests in Java, so why should there be a clean understanding of what a unit or integration test is?  Oftentimes I see purported unit tests in JUnit that are truly integration tests, requiring a database and a full application stack to be stood up, with no true unit test in sight.</p>

<p>Very simply stated:</p>

<ul>
<li>If your unit tests require a database, you're doing it wrong.</li>
<li>If your unit tests require an external service or dependency, you're doing it wrong.</li>
<li>If your unit tests start the Spring framework or an application container, you're doing it wrong.</li>
<li>If your unit tests spend a lot of time setting up preconditions in other classes, you're doing it wrong.</li>
<li>If your integration tests are not using publicly exposed interfaces, you're doing it wrong.</li>
<li>If your integration tests are stubbing or mocking parts of the system, you're doing it wrong.  (Stubbing external services, however, makes sense.)</li>
<li>If your integration tests are not hitting a running application, you're doing it wrong.</li>
</ul>
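
<p>One cheap way to keep the first few rules honest is a guard in the build that fails when a unit test pulls in a forbidden dependency.  The following is only a sketch; the package patterns and directory layout are assumptions to adjust for your own tree.</p>

<pre><code class="language-bash"># Sketch: fail if any unit test imports Spring or JDBC.
# The package patterns and directory argument are assumptions.
check_unit_tests() {
  ! grep -rlE 'import (org\.springframework|java\.sql)' "$1"
}
</code></pre>

<p>Wired in before the <code>test</code> task, <code>check_unit_tests src/test/java</code> fails fast when someone smuggles an application context or a database into the unit suite.</p>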

<p>The design, purpose and intent of unit and integration tests is the subject of a much larger discussion and outside the scope of this post.</p>

<p>So back to the problem at hand.  Having a desire to separate fast-running unit tests from integration tests, I struggled for an answer until I came across the <a href="https://github.com/unbroken-dome/gradle-testsets-plugin">gradle-testsets-plugin</a> and these posts from <a href="https://www.petrikainulainen.net/">Petri Kainulainen</a>: <a href="https://www.petrikainulainen.net/programming/gradle/getting-started-with-gradle-integration-testing/">here</a> and <a href="https://www.petrikainulainen.net/programming/gradle/getting-started-with-gradle-integration-testing-with-the-testsets-plugin/">here</a>.</p>

<p>I would strongly recommend reading both posts, but for brevity, here are the main mechanics and some further tips beyond.</p>

<p><strong>Step 1.</strong></p>

<p>Include <code>jcenter</code> as a source for your build script dependencies and pull in the <a href="https://github.com/unbroken-dome/gradle-testsets-plugin">gradle-testsets-plugin</a> dependency</p>

<pre><code class="language-groovy">buildscript {  
  repositories {
    jcenter()
  }
  dependencies {
    classpath 'org.unbroken-dome.gradle-plugins:gradle-testsets-plugin:1.0.2'
  }
}
</code></pre>

<p><strong>Step 2.</strong></p>

<p>Apply the plugin to the build.  Be sure to activate this after the <code>java</code> plugin and before any plugins which may build off the gradle tasks automatically created by the plugin.  </p>

<pre><code class="language-groovy">apply plugin: 'org.unbroken-dome.test-sets'  
</code></pre>

<p><strong>Step 3.</strong></p>

<p>Create the new test set definition and configuration.  Here we want to add an integration test suite but this could be any category of tests you wish to scope together.  </p>

<pre><code class="language-groovy">testSets {  
  integrationTest
}
</code></pre>

<p>Ensure that the <code>check</code> step executes the new test definition and that the new <code>integrationTest</code> step runs after the normal <code>test</code> (unit) step.  </p>

<pre><code class="language-groovy">check.dependsOn integrationTest  
integrationTest.mustRunAfter test  
</code></pre>

<p>Ensure that integration tests are always run, regardless of whether they passed on previous runs  </p>

<pre><code class="language-groovy">project.integrationTest {  
  outputs.upToDateWhen { false }
}
</code></pre>

<p>Finally, ensure that the output for tasks of type <code>Test</code> is namespaced appropriately so reports are separated for the <code>test</code> (unit) and <code>integrationTest</code> tasks  </p>

<pre><code class="language-groovy">tasks.withType(Test) {  
  reports.html.destination = file("${reporting.baseDir}/${name}")
}
</code></pre>

<p><strong>Step 4.</strong></p>

<p>Test compile dependencies should be reviewed and the new <code>integrationTestCompile</code> dependencies declared appropriately <br>
<em>e.g.</em></p>

<pre><code class="language-groovy">testCompile("junit:junit")  
integrationTestCompile("org.springframework.boot:spring-boot-starter-test",  
                       "com.jayway.restassured:json-path:2.8.0",
                       "com.jayway.restassured:rest-assured:2.8.0",
                       "com.jayway.restassured:spring-mock-mvc:2.8.0",
                       "com.jayway.restassured:xml-path:2.8.0")
</code></pre>

<p><strong>Step 5.</strong></p>

<p>Restructure your test file layout.  Your directory structure should look something like the following.  </p>

<pre><code>src/  
  main/
    java/...
    resources/...
  integrationTest/
    java/...
    resources/...
  test/
    java/...
    resources/...
</code></pre>
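
<p>Assuming the standard layout above, the new source set directories can be created in one go:</p>

<pre><code class="language-bash"># Create the integrationTest source set alongside the existing test tree.
mkdir -p src/integrationTest/java src/integrationTest/resources
</code></pre>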

<p>At this point you should be able to run <code>gradle clean build</code> and see your separate <code>test</code> and <code>integrationTest</code> related tasks execute.</p>

<p><strong>Real time test reporting</strong></p>

<p>To see a visual report of test execution and outcome as it happens in the console, add the following  </p>

<pre><code class="language-groovy">test {  
  afterTest { desc, result -&gt;
    println "Executing test [${desc.className}].${desc.name} with result: ${result.resultType}"
    }
}
integrationTest {  
  afterTest { desc, result -&gt;
    println "Executing test [${desc.className}].${desc.name} with result: ${result.resultType}"
    }
}
</code></pre>

<p><strong>Test Coverage</strong></p>

<p>I use <a href="http://www.eclemma.org/jacoco/">Jacoco</a> for test coverage with the help of the <a href="https://docs.gradle.org/current/userguide/jacoco_plugin.html">Jacoco gradle plugin</a>.  While it would be ideal to have separate coverage reports for the integration and unit test suites, I was unable to find a simple method to generate them independently.  However, you can combine the coverage from both suites with the following:  </p>

<pre><code class="language-groovy">apply plugin: 'jacoco'  
.....
jacoco {  
    toolVersion = "0.7.5.201505241946"
}

jacocoTestReport {  
    reports {
        xml.enabled false
        csv.enabled false
        html{
            enabled true
            destination "${buildDir}/reports/jacoco"
        }
    }
    executionData(test, integrationTest)
}

tasks.build.dependsOn(jacocoTestReport)
</code></pre>

<p>I hope this post has proved useful.  The separation of test types has many benefits, including:</p>

<ul>
<li>forcing developers to think about test types &amp; purpose</li>
<li>enforcing unit test conventions.  If you need anything beyond Java or are firing up an application server, it's not a unit test.</li>
<li>separating fail fast unit tests from potentially costly integration tests</li>
<li>allowing finer control over CI builds and the development process.</li>
</ul>]]></content:encoded></item><item><title><![CDATA[Kubernetes local-up-cluster - Heapster Metrics]]></title><description><![CDATA[<p>As it turns out I still didn't quite have the <a href="http://stuartingram.com/2016/09/02/kubernetes-local-up-cluster-addons-in-ubuntu/">local kubernetes</a> setup right.  The documentation around running some of the standard services with local kubernetes is lacking.  There again, it is primarily geared towards kubernetes development and light weight local testing so getting Heapster up and running is a</p>]]></description><link>http://stuartingram.com:80/2016/09/08/local-kubernetes-heapster-metrics/</link><guid isPermaLink="false">e2d2f81c-336d-46a7-b303-f0836b1a374a</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[Install]]></category><category><![CDATA[docker]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Thu, 08 Sep 2016 19:53:51 GMT</pubDate><content:encoded><![CDATA[<p>As it turns out I still didn't quite have the <a href="http://stuartingram.com/2016/09/02/kubernetes-local-up-cluster-addons-in-ubuntu/">local kubernetes</a> setup right.  The documentation around running some of the standard services with local kubernetes is lacking.  There again, it is primarily geared towards kubernetes development and light weight local testing so getting Heapster up and running is a little outside of the wheelhouse so to speak for the targeted audience.</p>

<p>Assuming you have followed my <a href="http://stuartingram.com/2016/09/02/kubernetes-local-up-cluster-addons-in-ubuntu/">previous steps</a> to get a local kubernetes cluster up and functional, you can get Heapster and Grafana running out of the box with the following</p>

<pre><code class="language-bash">kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster-controller.yaml  
kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb-grafana-controller.yaml  
kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb-service.yaml  
kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/grafana-service.yaml  
kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster-service.yaml  
</code></pre>

<p>Running <code>kubectl cluster-info</code> should yield a URL to the Grafana front end, which you can open up in a browser to view various stats at the node and pod level.  Pretty nice!</p>

<p><strong>But it's empty, right?  There is no data!</strong></p>

<p>Finding the Heapster pod (<code>kubectl get po --all-namespaces=true</code>) and displaying the logs (<code>kubectl logs heapster-0sbna --namespace=kube-system</code>) should yield something like  </p>

<pre><code class="language-bash">E0907 18:47:05.041415       1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "http://127.0.0.1:10255/stats/container/": Post http://127.0.0.1:10255/stats/container/: dial tcp 127.0.0.1:10255: getsockopt: connection refused  
</code></pre>

<p>If you run <code>curl http://127.0.0.1:10255/stats/container/</code> from your local host you should see stats returned just fine.</p>

<p><strong>So what's going on?</strong></p>

<p>Well, Heapster has got the list of nodes from Kubernetes and is now trying to pull stats from the kubelet process on each node (each kubelet has a built-in cAdvisor collecting stats on its node).  In this case there's only one node, and it's known to kubernetes as 127.0.0.1.  And there's the problem.  The Heapster container is trying to reach the node at 127.0.0.1, which is the Heapster container itself, and of course it finds no kubelet process to interrogate inside that container.</p>

<p>127.0.0.1 is normally the IP address assigned to the "loopback" or local-only interface. This is a "fake" network adapter that can only communicate within the same host. It's often used when you want a network-capable application to only serve clients on the same host.</p>

<p><strong>So how do we solve this?</strong></p>

<p>As it turns out, two things need to happen: <br>
1. We need to reference the kubelet worker node (our host machine running kubernetes) by something other than the loopback network address of 127.0.0.1 <br>
2. The kubelet process needs to accept traffic from the new network interface/address </p>

<p>Changing the hostname by which the kubelet is referenced is pretty simple.  You can take more elaborate approaches, but setting this to your <code>eth0</code> IP worked fine for me (<code>ifconfig eth0</code>).  The downside is that you need an eth0 interface, and its address is subject to DHCP, so your mileage may vary as to how convenient this is. <br>
<code>export HOSTNAME_OVERRIDE=10.0.2.15</code></p>

<p>Getting the kubelet process to accept traffic from any network interface is just as simple. <br>
<code>export KUBELET_HOST=0.0.0.0</code></p>

<p>So all together the following will start a local kubernetes instance with DNS and the ability for containers to reach and interact with the kubelet process</p>

<pre><code class="language-bash">export KUBERNETES_PROVIDER=local  
export API_HOST=`ifconfig docker0 | grep "inet addr" | awk -F'[: ]+' '{ print $4 }'`  
export KUBE_ENABLE_CLUSTER_DNS=true  
export KUBELET_HOST=0.0.0.0  
export HOSTNAME_OVERRIDE=`ifconfig eth0 | grep "inet addr" | awk -F'[: ]+' '{ print $4 }'`  
hack/local-up-cluster.sh  
</code></pre>
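
<p>The <code>ifconfig | grep | awk</code> pipelines above simply pull the IPv4 address out of the interface description.  You can sanity check the extraction against a captured sample of <code>ifconfig</code> output (the addresses here are made up):</p>

<pre><code class="language-bash"># Verify the awk extraction against a sample of `ifconfig` output.
sample='eth0      Link encap:Ethernet  HWaddr 08:00:27:aa:bb:cc
          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0'
echo "$sample" | grep "inet addr" | awk -F'[: ]+' '{ print $4 }'
# prints 10.0.2.15
</code></pre>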

<p>You will of course need to reload all the replication controller and service definitions for Heapster.  But after doing this and waiting a minute or two you should see data accumulate in the graphs.  Data points are recorded every 60 seconds, so give the system time to prove it's working.  You can also check the Heapster pod logs for errors while you wait to verify everything is working.</p>

<p>As an added bonus if you are running the <a href="https://github.com/kubernetes/dashboard">Kubernetes dashboard</a> (see <a href="http://stuartingram.com/2016/09/02/kubernetes-local-up-cluster-addons-in-ubuntu/">here</a> for instructions) you will also get statistics from Heapster fed through to that automatically.  </p>

<p><strong>Awesome sauce!</strong></p>]]></content:encoded></item><item><title><![CDATA[Kubernetes local-up-cluster - dns fixes on Ubuntu]]></title><description><![CDATA[<p>So as it turns out I didn't get too far beyond the <a href="http://stuartingram.com/2016/08/31/installing-kubernetes-on-ubuntu-14-04/">local kubernetes install</a> without running into some issues.  The first being the lack of DNS (I wanted to run the amazing <a href="https://github.com/kubernetes/dashboard">dashboard UI</a>) and then port forwarding to access pod functionality directly.</p>

<p><strong>Ubuntu prerequisites</strong></p>

<p>As it turns out</p>]]></description><link>http://stuartingram.com:80/2016/09/02/kubernetes-local-up-cluster-addons-in-ubuntu/</link><guid isPermaLink="false">7fdfc5f5-0bd1-4549-8b42-99b946d3a99c</guid><category><![CDATA[Ubuntu]]></category><category><![CDATA[kubernetes]]></category><category><![CDATA[dns]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Fri, 02 Sep 2016 21:47:08 GMT</pubDate><content:encoded><![CDATA[<p>So as it turns out I didn't get too far beyond the <a href="http://stuartingram.com/2016/08/31/installing-kubernetes-on-ubuntu-14-04/">local kubernetes install</a> without running into some issues.  The first being the lack of DNS (I wanted to run the amazing <a href="https://github.com/kubernetes/dashboard">dashboard UI</a>) and then port forwarding to access pod functionality directly.</p>

<p><strong>Ubuntu prerequisites</strong></p>

<p>As it turns out there are a number of Ubuntu 14.04 specific hurdles to overcome before kubernetes will work happily.</p>

<p>First of all <code>dnsmasq</code> needs to be disabled, so comment out the <code>dns=dnsmasq</code> line in <code>/etc/NetworkManager/NetworkManager.conf</code> and restart networking services via the following  </p>

<pre><code class="language-bash">sudo nano /etc/NetworkManager/NetworkManager.conf  
sudo restart network-manager  
</code></pre>
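
<p>If you prefer a non-interactive edit, a small <code>sed</code> wrapper can do the commenting for you.  This is a sketch; it assumes the stock <code>dns=dnsmasq</code> line is present and keeps a <code>.bak</code> backup of the file:</p>

<pre><code class="language-bash"># Comment out the dns=dnsmasq line in the given config file (keeps a .bak copy).
disable_dnsmasq() {
  sed -i.bak 's/^dns=dnsmasq/#dns=dnsmasq/' "$1"
}
</code></pre>

<p>Run it from a root shell against <code>/etc/NetworkManager/NetworkManager.conf</code>, then restart network-manager as above.</p>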

<p>Find out more about <code>dnsmasq</code> and ubuntu <a href="https://help.ubuntu.com/community/Dnsmasq">here</a></p>

<p>Next the tools <code>socat</code> and <code>nsenter</code> are required for kubernetes port forwarding. <br>
To install <code>socat</code> run  </p>

<pre><code class="language-bash">sudo apt-get install socat  
</code></pre>

<p>Installing <code>nsenter</code> is slightly more work due to the lack of 14.04 support, but not much, thanks to the work of <a href="http://jpetazzo.github.io">Jérôme Petazzoni</a>.  </p>

<pre><code class="language-bash">docker run --rm jpetazzo/nsenter cat /nsenter &gt; /tmp/nsenter &amp;&amp; chmod +x /tmp/nsenter  
sudo cp /tmp/nsenter /usr/local/bin  
</code></pre>

<p>Check out the repo <a href="https://github.com/jpetazzo/nsenter">here</a> or this <a href="https://gist.github.com/mbn18/0d6ff5cb217c36419661">gist</a> if you want to go step by step</p>

<p>You can find out more about <code>socat</code> <a href="http://www.dest-unreach.org/socat/doc/README">here</a> and <code>nsenter</code> <a href="http://man7.org/linux/man-pages/man1/nsenter.1.html">here</a>.</p>

<p><strong>Back to kubernetes</strong></p>

<p>After these steps it's hopefully smooth sailing.  So let's start kubernetes with DNS on by default by running the following</p>

<pre><code class="language-bash">export KUBERNETES_PROVIDER=local  
export API_HOST=`ifconfig docker0 | grep "inet addr" | awk -F'[: ]+' '{ print $4 }'`  
export KUBE_ENABLE_CLUSTER_DNS=true  
hack/local-up-cluster.sh  
</code></pre>

<p>Instructions for validating your DNS setup can be found <a href="https://github.com/kubernetes/kubernetes/blob/master/build/kube-dns/README.md">here</a></p>

<p>Let's add the dashboard  </p>

<pre><code class="language-bash">kubectl create -f https://rawgit.com/kubernetes/dashboard/master/src/deploy/kubernetes-dashboard.yaml  
</code></pre>

<p>This can be accessed via  </p>

<pre><code class="language-bash">firefox http://172.17.0.1/ui  
</code></pre>

<p>From this you can view and manage most of the things you can via the <code>kubectl</code> CLI.</p>

<p><strong>Kind of a gotcha, but not really</strong></p>

<p>One thing to note is that when you terminate the kubernetes process all the docker containers remain running (see <code>docker ps</code>).  This at first caused concern, but remember that kubernetes is designed so that the containers it manages are not themselves dependent on kubernetes to function.  If the scheduler dies, only scheduling is affected; the containers keep running.  This philosophy is consistent throughout the kubernetes system, so it makes sense that shutting it down does not remove all running containers.  A few properties to note: <br>
1. If kubernetes is subsequently spun up, it will reconcile the state of the system with the desired state, as you would expect. <br>
2. Other docker containers can be spun up and down locally, independent of those managed by kubernetes. <br>
These properties are possible because kubernetes applies docker labels to the containers it manages.</p>]]></content:encoded></item><item><title><![CDATA[Dangling docker volumes]]></title><description><![CDATA[<p>As anyone who works with docker knows, images and containers accumulate rapidly.</p>

<p>All containers can be cleared down with  </p>

<pre><code class="language-bash">docker rm $(docker ps -a -q)  
</code></pre>

<p>And likewise, all Images with  </p>

<pre><code class="language-bash">docker rmi -f $(docker images -q)  
</code></pre>

<p>What I wasn’t aware of was the dangling volume issue.  While I had</p>]]></description><link>http://stuartingram.com:80/2016/09/01/dangling-docker-volumes/</link><guid isPermaLink="false">4ee25862-d269-414f-ba5d-10c0f860d7c1</guid><category><![CDATA[docker]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Thu, 01 Sep 2016 16:45:07 GMT</pubDate><content:encoded><![CDATA[<p>As anyone who works with docker knows, images and containers accumulate rapidly.</p>

<p>All containers can be cleared down with  </p>

<pre><code class="language-bash">docker rm $(docker ps -a -q)  
</code></pre>

<p>And likewise, all Images with  </p>

<pre><code class="language-bash">docker rmi -f $(docker images -q)  
</code></pre>

<p>What I wasn’t aware of was the dangling volume issue.  While I had no images or containers left after the above, I did, however, have 20GB taken up in dangling volumes, which I was oblivious to until I wondered where all my disk space had disappeared to.</p>

<p>You can check for dangling volumes independent of containers with  </p>

<pre><code class="language-bash">docker volume ls -f dangling=true  
</code></pre>

<p>And remove them with  </p>

<pre><code class="language-bash">docker volume rm $(docker volume ls -qf dangling=true)  
</code></pre>

<p>Alternatively, you can remove a volume together with its associated container by adding the <code>-v</code> flag (e.g. <code>docker rm -v container_name</code>), if you remember to add the flag every time. <br>
I would suggest incorporating these into your team's purge scripts/procedures for a better cleanup.</p>
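
<p>Putting the pieces together, a combined purge might look like the sketch below.  It is destructive (it removes all local containers, images and dangling volumes), so use with care:</p>

<pre><code class="language-bash"># Remove all containers, then all images, then any dangling volumes.
# Destructive!  The || true keeps it going when a list is empty.
docker_purge() {
  docker rm $(docker ps -a -q) || true
  docker rmi -f $(docker images -q) || true
  docker volume rm $(docker volume ls -qf dangling=true) || true
}
</code></pre>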

<p>More information can be found here (recommended reading)</p>

<ul>
<li><a href="http://serverfault.com/questions/683910/removing-docker-data-volumes">http://serverfault.com/questions/683910/removing-docker-data-volumes</a></li>
<li><a href="http://container42.com/2014/11/03/docker-indepth-volumes/">http://container42.com/2014/11/03/docker-indepth-volumes/</a></li>
</ul>]]></content:encoded></item><item><title><![CDATA[Installing Kubernetes on Ubuntu 14.04]]></title><description><![CDATA[<p>I typically run my linux environment via VirtualBox on a Windows host for mainly corporate reasons.  <a href="https://github.com/kubernetes/minikube">MiniKube</a> is the new recommended way to get up and running with Kubernetes for local development, however this requires a host system capable of running a vm and at this time VirtualBox does not</p>]]></description><link>http://stuartingram.com:80/2016/08/31/installing-kubernetes-on-ubuntu-14-04/</link><guid isPermaLink="false">fcea39c2-7a8c-4e87-bde1-1284315dc378</guid><category><![CDATA[docker]]></category><category><![CDATA[kubernetes]]></category><category><![CDATA[Ubuntu]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Wed, 31 Aug 2016 13:24:29 GMT</pubDate><content:encoded><![CDATA[<p>I typically run my linux environment via VirtualBox on a Windows host for mainly corporate reasons.  <a href="https://github.com/kubernetes/minikube">MiniKube</a> is the new recommended way to get up and running with Kubernetes for local development, however this requires a host system capable of running a vm and at this time VirtualBox does not support 64bit nested VM's.  With that in mind here are the steps I took to install kubernetes locally, mostly taken from <a href="https://github.com/kubernetes/kubernetes/blob/release-1.3/docs/devel/running-locally.md">this</a> guide.</p>

<p><strong>Install Docker</strong></p>

<pre><code class="language-bash">apt-get install apparmor lxc cgroup-lite  
wget -qO- https://get.docker.com/ | sh  
sudo usermod -aG docker YourUserNameHere  
sudo service docker restart  
</code></pre>

<p><strong>Install OpenSSL</strong></p>

<pre><code class="language-bash">sudo apt-get install openssl  
</code></pre>

<p><strong>Install etcd</strong></p>

<pre><code class="language-bash">curl -L https://github.com/coreos/etcd/releases/download/v3.0.6/etcd-v3.0.6-linux-amd64.tar.gz -o etcd-v3.0.6-linux-amd64.tar.gz  
tar xzvf etcd-v3.0.6-linux-amd64.tar.gz &amp;&amp; cd etcd-v3.0.6-linux-amd64  
sudo mv etcd /usr/local/bin  
etcd --version  
</code></pre>

<p>Original install instructions <a href="https://github.com/coreos/etcd/releases">here</a></p>

<p><strong>Install Go 1.6+</strong></p>

<p>Remember to remove any previous version installed.</p>

<pre><code class="language-bash">wget https://storage.googleapis.com/golang/go1.7.linux-amd64.tar.gz  
tar xzf go1.7.linux-amd64.tar.gz  
export GOPATH="/home/singram/personal"  
export GOROOT="/home/singram/go/"  
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin

go get -u github.com/jteeuwen/go-bindata/go-bindata  
</code></pre>

<p>Full instructions can be found <a href="https://golang.org/doc/install">here</a></p>

<p><strong>Install Kubernetes</strong></p>

<pre><code class="language-bash">mkdir -p $GOPATH/src  
cd $GOPATH/src  
git clone --depth=1 https://github.com/kubernetes/kubernetes.git  
</code></pre>

<p><strong>Build and Run kubernetes</strong></p>

<pre><code class="language-bash">hack/local-up-cluster.sh  
</code></pre>

<p>Beware: you will most likely be prompted for your root password towards the end of the build process.  If you let this time out, your system will have a number of processes running which are somewhat annoying to clean up.  If this happens, restarting the system proved the simplest method to reset and retry this step.</p>

<p>If successful, you should have a kubernetes system up and running.</p>

<p><strong>Configure Kubectl</strong></p>

<p>From the previous step you should see some output similar to the commands below.  Open up a new shell and execute the following to set up your <code>~/.kube/config</code>  </p>

<pre><code class="language-bash">export KUBERNETES_PROVIDER=local  
cluster/kubectl.sh config set-cluster local --server=http://127.0.0.1:8080 --insecure-skip-tls-verify=true  
cluster/kubectl.sh config set-context local --cluster=local  
cluster/kubectl.sh config use-context local  
cluster/kubectl.sh  
</code></pre>

<p>From this point on you have a working kubernetes system.  You can either use <code>cluster/kubectl.sh</code> or simply install <code>kubectl</code> separately on your system.  The config file in your home directory is the important part; it is what both versions of kubectl will key off.</p>
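
<p>For reference, the resulting <code>~/.kube/config</code> should look roughly like the following (a sketch of what those commands generate, with fields abridged):</p>

<pre><code class="language-yaml">apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    server: http://127.0.0.1:8080
    insecure-skip-tls-verify: true
contexts:
- name: local
  context:
    cluster: local
current-context: local
</code></pre>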

<p>Check out your kubernetes cluster nodes (there'll only be one)  </p>

<pre><code class="language-bash">kubectl get no  
kubectl describe no 127.0.0.1  
</code></pre>

<p>What about your pods  </p>

<pre><code class="language-bash">kubectl get pods  
</code></pre>

<p>And now you should have a fully working locally hosted kubernetes cluster of one.  Superb!</p>]]></content:encoded></item><item><title><![CDATA[Links that caught my eye]]></title><description><![CDATA[<p>First part of a recurring series.</p>

<ul>
<li><a href="http://varianceexplained.org/r/trump-tweets/">Great NLP analysis of tweets from Mr Trump</a></li>
<li><a href="https://peteris.rocks/blog/openstreetmap-city-blocks-as-geojson-polygons/">Interpreting OpenStreet maps to blocks</a></li>
<li><a href="http://mewo2.com/notes/terrain/">Generating fantasy maps</a></li>
<li><a href="https://blog.hartleybrody.com/scrape-amazon/">19 lessons learned scraping Amazon</a></li>
<li><a href="http://roy.red/slitscan-.html">Recreating the Doctor Who Time Tunnel in GLSL</a></li>
<li><a href="http://jvns.ca/blog/2016/08/10/how-does-gdb-work/">How does GDB work?</a>
<ul><li><a href="http://www.brendangregg.com/blog/2016-08-09/gdb-example-ncurses.html">GDB Example</a></li></ul></li>
<li><a href="https://blog.codeship.com/level-up-your-security-in-rails/">Rails security tips</a></li>
<li><a href="https://boxfuse.com/blog/go-aws">Deploy 7 MB Go VMs effortlessly</a></li></ul>]]></description><link>http://stuartingram.com:80/2016/08/15/interesting-links-2/</link><guid isPermaLink="false">82815aa2-b322-455b-a690-7a9908424ed8</guid><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Mon, 15 Aug 2016 17:31:00 GMT</pubDate><content:encoded><![CDATA[<p>First part of a recurring series.</p>

<ul>
<li><a href="http://varianceexplained.org/r/trump-tweets/">Great NLP analysis of tweets from Mr Trump</a></li>
<li><a href="https://peteris.rocks/blog/openstreetmap-city-blocks-as-geojson-polygons/">Interpreting OpenStreetMap into blocks</a></li>
<li><a href="http://mewo2.com/notes/terrain/">Generating fantasy maps</a></li>
<li><a href="https://blog.hartleybrody.com/scrape-amazon/">19 lessons learned scraping Amazon</a></li>
<li><a href="http://roy.red/slitscan-.html">Recreating the Doctor Who Time Tunnel in GLSL</a></li>
<li><a href="http://jvns.ca/blog/2016/08/10/how-does-gdb-work/">How does GDB work?</a>
<ul><li><a href="http://www.brendangregg.com/blog/2016-08-09/gdb-example-ncurses.html">GDB Example</a></li></ul></li>
<li><a href="https://blog.codeship.com/level-up-your-security-in-rails/">Rails security tips</a></li>
<li><a href="https://boxfuse.com/blog/go-aws">Deploy 7 MB Go VMs effortlessly to AWS</a></li>
<li><a href="http://flink.apache.org/news/2016/08/08/release-1.1.0.html">Apache Flink 1.1.0 released</a></li>
</ul>]]></content:encoded></item><item><title><![CDATA[Packaging a git tag]]></title><description><![CDATA[<p>So the other day I was presented with the following requirements. </p>

<p>From a git repository, retrieve a historical tag and its commit history to deliver to a client.  No other branches should be presented to the client, nor any work committed after the tag. </p>

<p>This actually proved to be a little</p>]]></description><link>http://stuartingram.com:80/2013/08/24/packaging_a_git_tag/</link><guid isPermaLink="false">5909decd-9274-4f37-a127-72604519b907</guid><category><![CDATA[Git]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Sat, 24 Aug 2013 22:40:00 GMT</pubDate><content:encoded><![CDATA[<p>So the other day I was presented with the following requirements. </p>

<p>From a git repository, retrieve a historical tag and its commit history to deliver to a client.  No other branches should be presented to the client, nor any work committed after the tag. </p>

<p>This actually proved to be a little tricky, and I'm certain I'm missing some git wizardry, but here's what I did. </p>

<p>Clone the repository (foo) to work on locally  </p>

<pre><code class="language-bash">git clone myname@github:foo  
</code></pre>

<p>Check out the tag and create a branch from it:  </p>

<pre><code class="language-bash">cd foo  
git checkout mytag_1.0.0  
git checkout -b mytag_1.0.0_snapshot  
</code></pre>

<p>Remove all other local branches and clean up the repository:  </p>

<pre><code class="language-bash">git branch -D master  
git gc  
</code></pre>

<p>At this point you should have a local repository with a single local branch representing the tag you want and a number of references to remote branches.  This can be verified with  </p>

<pre><code class="language-bash">git branch -a  
</code></pre>

<p>Now clone your local repository again  </p>

<pre><code class="language-bash">cd ..  
git clone foo foo_final  
</code></pre>

<p>The foo_final repository should contain nothing but the branch representing the tag at this point. <br>
Zip it up, throw it on a flash drive and deliver as appropriate. </p>
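<p>As an aside, newer versions of Git can collapse most of these steps into one: cloning with <code>--branch</code> pointing at a tag plus <code>--single-branch</code> fetches only the history reachable from that tag. A self-contained sketch using a throwaway repository (the repo and tag names here are made up for illustration):</p>

<pre><code class="language-bash"># build a throwaway repo with a tagged commit and a later commit
git init -q demo && cd demo
git -c user.name=me -c user.email=me@example.com commit -q --allow-empty -m "first"
git tag mytag_1.0.0
git -c user.name=me -c user.email=me@example.com commit -q --allow-empty -m "after the tag"
cd ..

# clone only the tag's history; the later commit never comes across
git clone -q --branch mytag_1.0.0 --single-branch demo demo_final
git -C demo_final log --oneline
</code></pre>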

<p>Now I make no claims that this is the best way to do this.  In fact, I'm certain there's a better way, but this is what I ended up doing.</p>]]></content:encoded></item><item><title><![CDATA[Reloading FactoryGirl definitions in a Rails 3.2 console]]></title><description><![CDATA[<p><strong>Problem</strong> </p>

<p>You've developed a rich set of class definitions using FactoryGirl and find them useful while developing and testing in rails console.  The problem is that when you reload! your classes the FactoryGirl definitions are not reloaded causing confusion and errors.  On top of this if, in your application, you</p>]]></description><link>http://stuartingram.com:80/2012/10/14/reloading_factory_girl_definitions_in_a_rails_3_2_console/</link><guid isPermaLink="false">0848430c-5629-4ff7-b89b-364513b5635d</guid><category><![CDATA[Ruby]]></category><category><![CDATA[Factorygirl]]></category><category><![CDATA[Rails]]></category><dc:creator><![CDATA[Stuart Ingram]]></dc:creator><pubDate>Sun, 14 Oct 2012 19:40:00 GMT</pubDate><content:encoded><![CDATA[<p><strong>Problem</strong> </p>

<p>You've developed a rich set of class definitions using FactoryGirl and find them useful while developing and testing in the Rails console.  The problem is that when you <code>reload!</code> your classes, the FactoryGirl definitions are not reloaded, causing confusion and errors.  On top of this, if your application initializes class variables at boot, these are lost as well, adding undue weight to a simple class refresh. </p>

<p>After much searching online, <a href="http://wondible.com/2011/12/30/rails-autoloading-cleaning-up-the-mess/">this article</a> provided useful answers and is well worth a read. </p>

<p><strong>Solution</strong> </p>

<p>Please note that this solution has only been tested in Rails 3.2 with FactoryGirl 4.0.0 </p>

<p>In <code>environments/development.rb</code></p>

<pre><code class="language-ruby">MyApplication.configure do
  # ...
  ActionDispatch::Reloader.to_prepare do
    # On the first boot the factories load normally; on subsequent
    # reloads they must be refreshed explicitly
    FactoryGirl.reload unless FactoryGirl.factories.entries.empty?
    SomeClass.reinitialize unless SomeClass.initialized?
  end
end
</code></pre>
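<p>With that in place, a typical console round-trip looks like this (a sketch; <code>:user</code> stands in for one of your own factories):</p>

<pre><code class="language-ruby"># rails console (development)
FactoryGirl.create(:user)  # built from the definitions loaded at boot
# ... edit the factory file on disk ...
reload!                    # to_prepare fires and FactoryGirl.reload picks up the edit
FactoryGirl.create(:user)  # now reflects the updated definition
</code></pre>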

<p>Reference: <a href="http://wondible.com/2011/12/30/rails-autoloading-cleaning-up-the-mess/">Rails autoloading: cleaning up the mess</a></p>]]></content:encoded></item></channel></rss>