<h1><span class="caps">HA</span> (High-Availability) Setup for InfluxDB</h1>
<p>Khaled Monsoor, 2018-01-18</p>
<p>Create a robust, highly available, time-series InfluxDB cluster with the community (free) version of it</p><p><strong><span class="caps">NOTE</span></strong>
<em>Since I wrote this article, all the components used in the architecture below have gone through many updates and releases. While the general premise involving <code>influxdb-relay</code> and the multiplexing might still hold, please sync up with the latest release docs before jumping into any serious system design.</em></p>
<hr>
<p>Currently, from version 0.9 onward, you cannot create an InfluxDB cluster with the open-source free edition; only the commercially available InfluxDB Enterprise can do that for now. That stirred up the early-adopter enthusiasts, especially those using it in professional setups. They complained that InfluxData, the company behind InfluxDB, is trying to milk the <span class="caps">OSS</span> solution for profit.</p>
<p><img alt="Archiving isn't easy ... tobias-fischer-PkbZahEG2Ng" src="https://i.imgur.com/0IdYOYnl.jpg"></p>
<p>I can’t blame the InfluxData guys much, as they got to pay their bills too. So far, we — the users of open-source systems — couldn’t show much promise about the financial realities of the projects. Continuing development of <span class="caps">OSS</span> products, by only depending on donations, patrons, or enterprise sponsorship, is far too rare and unpredictable, even for the projects that many successful organizations heavily rely on.</p>
<p>Anyway, InfluxData then promised, and later introduced, <code>Influx Relay</code> as a complementary consolation for the missing <span class="caps">HA</span> parts of InfluxDB. You can get the details in its project documentation. </p>
<h2 id="premise">Premise<a class="headerlink" href="#premise" title="Permanent link">¶</a></h2>
<p>For my needs, I had to create a reliable <span class="caps">HA</span> (High-Availability) setup from the available free options, hence InfluxDB and the relay. It’s quite far from an InfluxDB cluster in terms of robustness or ease of setup, but it got the job done, at least for me.</p>
<p>I needed a setup to receive system stats from at least 500+ instances and to store them for a while, but without breaking the bank on the <span class="caps">AWS</span> bill. Meaning, I could ask for, and use, only a couple of instances for my solution.</p>
<p>Here were my trade-offs.</p>
<ul>
<li>Not too many instances for this purpose, and none of the heavyweight lifters, e.g. <span class="caps">AWS</span>’ m3.xlarge. Use only what’s necessary. </li>
<li>To satisfy the budget, avoid pay-per-use solutions as far as possible.</li>
<li>The solution must not be crazy complex, so that handover to the DevOps team would be smooth.</li>
<li>Reads would be rare compared to writes. The related Grafana dashboards would be used by only a handful of people, to investigate issues.</li>
</ul>
<h2 id="overall-design">Overall Design<a class="headerlink" href="#overall-design" title="Permanent link">¶</a></h2>
<h3 id="write">Write<a class="headerlink" href="#write" title="Permanent link">¶</a></h3>
<p>From a bird’s-eye view, I decided to use two server instances running in parallel, each independently hosting InfluxDB, and to send the same data to both for storage. This scheme mostly looks like <a href="https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_1"><span class="caps">RAID</span>-1 systems</a>.</p>
<p><img alt="Overall architecture" src="https://i.imgur.com/ZKYIyOd.png"></p>
<p>That brings up a couple of challenges.</p>
<ul>
<li>
<p>None of the agents I used on the sender side could multiplex their output. That is, they could send data to a single destination, not to multiple ones.
On the Windows front, I used <code>Telegraf</code>, which can randomly switch between pre-listed destinations, but <span class="caps">NOT</span> write to multiple at once.<br>
On the Linux hosts, I used <code>Netdata</code>, which is excellent in its own right, but also unable to send stats to multiple destinations.<br>
Here comes <code>Influx-relay</code>. It can receive a time-series data stream from hosts on a <span class="caps">TCP</span> or <span class="caps">UDP</span> port, buffer it for a while, and then re-send the received and buffered data to multiple receiving ends, each of which can be either an InfluxDB instance or another listening Influx-relay instance.<br>
This chaining can broaden the relaying scheme even further. However, for my purpose, relay-chaining was not necessary. Rather, from each relay, I am sending data to two separate InfluxDB instances, each running on its own server. </p>
</li>
<li>
<p>Now that I had partially multiplexed the output, my hosts (senders) were still able to send to only one destination. So, I needed a proxy as well as a load balancer. For a while, I was torn between <span class="caps">NGINX</span> and HAProxy. Both were new to me. </p>
</li>
</ul>
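<p>To make the fan-out concrete, here is a minimal sketch of an <code>influxdb-relay</code> config (TOML). The names, addresses, and ports below are placeholders of mine, not the real setup; check the relay’s README for the current schema before using it.</p>
<pre><code># relay.toml (sketch; names and addresses are placeholders)
[[http]]
name = "influxdb-http-relay"
bind-addr = "0.0.0.0:9096"
output = [
    { name = "influxdb-a", location = "http://10.0.1.10:8086/write" },
    { name = "influxdb-b", location = "http://10.0.2.10:8086/write" },
]
</code></pre>
<p>Every batch the relay receives on port 9096 is re-posted to both outputs, which is exactly the multiplexing the collector agents were missing.</p>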
<p>However, for a couple of reasons, I went with HAProxy. Firstly, I don’t need <span class="caps">HTTP</span> session management. Secondly, as I wanted to keep the <span class="caps">UDP</span> option open for later, HAProxy was perfectly capable of that.<br>
<span class="caps">NGINX</span> gained that support only recently, so its maturity was a concern. Also, configuring <span class="caps">NGINX</span> seemed a little intimidating (which I know might not be entirely fair). Last but not least, and for what it’s worth, out of the box, HAProxy’s stats page carries much more in-depth information than that of the free version of <span class="caps">NGINX</span>.<br>
Upon receiving the stats stream, HAProxy was to send it on to the different Influx-relays in a load-balanced fashion.</p>
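<p>As a rough sketch (not my exact config; the names and addresses are placeholders), the write path in <code>haproxy.cfg</code> could look like this:</p>
<pre><code># haproxy.cfg (sketch; names and addresses are placeholders)
frontend influxdb_write
    bind *:8086
    mode http
    default_backend relays

# 50/50 round-robin between the two Influx-relay instances
backend relays
    mode http
    balance roundrobin
    server relay-a 10.0.1.10:9096 check
    server relay-b 10.0.2.10:9096 check
</code></pre>
<p>The <code>check</code> keyword makes HAProxy health-check each relay, so a dead relay is taken out of the rotation automatically.</p>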
<p>So, here’s my rough plan. </p>
<p>collector-agent → HAProxy → (50/50 load-balanced) → Influx-relay → (multiplexed) → 2 InfluxDB instances</p>
<p>Now, each received data point goes to both of the InfluxDB instances, or at least to one of them in case of failure (or overload) of any of the relays or Influx instances.
Also, I chose to deploy the Influx-relays as Docker containers, while keeping HAProxy and InfluxDB running as native services. Of course, you can Dockerize HAProxy and InfluxDB, too. </p>
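<p>For the Dockerized relay, a minimal <code>docker-compose.yml</code> sketch could look like the following. Note there is no official relay image, so <code>local/influxdb-relay</code> is a placeholder for an image you build yourself from the influxdb-relay sources:</p>
<pre><code># docker-compose.yml (sketch; image name is a placeholder)
version: "2"
services:
  relay:
    image: local/influxdb-relay
    volumes:
      - ./relay.toml:/etc/influxdb-relay/relay.toml:ro
    ports:
      - "9096:9096"
    restart: always
</code></pre>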
<h3 id="read">Read<a class="headerlink" href="#read" title="Permanent link">¶</a></h3>
<p>As I’ve already noted, reading the data, i.e. fetching it for visualization on the Grafana end, will happen rarely and sporadically; only to investigate alarms or other client-side performance issues. </p>
<p>So, the read requests reaching the HAProxy end need not much routing, other than going directly to InfluxDB itself. Still, to better distribute the load, I decided to balance them on a 50/50 basis.</p>
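<p>In HAProxy terms, that 50/50 read balancing is just a round-robin backend pointing straight at the two InfluxDB instances (again a sketch; names and addresses are placeholders):</p>
<pre><code># reads go straight to the two InfluxDB instances, round-robin
backend influxdb_read
    mode http
    balance roundrobin
    server influx-a 10.0.1.10:8086 check
    server influx-b 10.0.2.10:8086 check
</code></pre>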
<h3 id="ports">Ports<a class="headerlink" href="#ports" title="Permanent link">¶</a></h3>
<ul>
<li>As all the <span class="caps">READ</span> requests are routed through <code>HAProxy</code> running on each of the instances, only HAProxy’s port should be open to the external world for this purpose. </li>
<li>On the other hand, for <span class="caps">WRITE</span> requests, each InfluxDB receives data from the relays, one on its own instance and one on the other instance. So InfluxDB should listen on its own port for <span class="caps">WRITE</span> requests only, and that port must be accessible only from within your own private zone, not from the outside world.</li>
<li>For HAProxy as well as InfluxDB, you can obviously use the default ports, which are 8086 <span class="amp">&</span> 8088, respectively. Or you can choose other ports (security through obscurity). Your call. In this writing, I’ll go with the defaults.</li>
</ul>
<h3 id="authentication-ssl">Authentication, <span class="caps">SSL</span><a class="headerlink" href="#authentication-ssl" title="Permanent link">¶</a></h3>
<p>You can configure <span class="caps">SSL</span> with your own server certificates through the HAProxy configs. You can even use <span class="caps">SSL</span> from the relays to the InfluxDB write endpoints. If your sender hosts connect to your HAProxy over the public internet, you should at least use password-based authentication; better yet, use <span class="caps">SSL</span>. However, for brevity’s sake, I’ll skip both in this post.</p>
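<p>For reference, terminating <span class="caps">SSL</span> at HAProxy is mostly a change to the <code>bind</code> line; the certificate path below is a placeholder for your own combined cert-plus-key <span class="caps">PEM</span> file:</p>
<pre><code># sketch: TLS-terminating frontend (cert path is a placeholder)
frontend influxdb_tls
    bind *:8086 ssl crt /etc/haproxy/certs/influxdb.pem
    mode http
</code></pre>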
<p>**Note: *
Please bear in mind, this is an “in-progress” post; prematurely published to force me to work on it. I have the plan to add all the necessary configurations <span class="amp">&</span> commands, that I used, here.</p>Open Source as-if You Gonna Die Tonight2015-12-22T00:00:00+06:002015-12-23T00:00:00+06:00Khaled Monsoortag:blog.kmonsoor.com,2015-12-22:/open-source-as-if-you-gonna-die-tonight/<p>You should open-source as-if you gonna die tonight. Literally.</p><p><strong><em>[ To keep the spirit of this post honest, I am going to publish this blog, immidiately. No <strong>draft</strong>-ing. This post will be, I hope, under continuous improvement.<br>
This post is <a href="https://github.com/kmonsoor/blog.kmonsoor.com/edit/live/content/articles/open-your-source-as-if-you-gonna-die-tonight.md">available for edit on GitHub</a>, currently in its version 0.0.6 ]</em></strong></p>
<p><strong>Yes, I mean it. Literally.</strong></p>
<p>I see too many posts/comments/blogs in different meeting-places for techies, e.g. Hacker News, Reddit, etc., which say the same thing: </p>
<blockquote>
<p><span class="dquo">“</span>I am working on <strong><em>something</em></strong> which I will open-source/publish
<strong><em>someday</em></strong> after taking it <strong><em>somewhere</em></strong>.”</p>
</blockquote>
<p>See the ambiguity in those words? </p>
<p>Unless the code/script/blog you are working on is something sensitive that will make a mess if published in “draft” form, you should not wait for “someday”. Or unless you have thousands of subscribers to your blog ;)</p>
<p>Or if you decided that you will <strong><em>never</em></strong> publish it in public - that’s an entirely different story.</p>
<p>If it is something of your company’s code-base, commit it to your remote branch, so that your last 19 days of work aren’t gone just because you are “gone”.</p>
<p><strong>As humans, we are far more fragile than we think.</strong></p>
<p>I am not talking about publishing a physical book using a printing press. In that scenario, writers were supposed to write the perfect words, then type them up on a typewriter to avoid any handwriting-related gotchas. Then the text went to the reviewer, then the proofreader; then the typesetter made up a block character by character. Then came printing on paper. The writer had to be sure what he was writing about, <span class="caps">ABSOLUTELY</span>. Else, each of the <em>2000</em> copies of the <em>first edition</em> would carry the same mistakes.</p>
<p>I am also not talking about pushing the critical code in the <code>production</code> server. That stuff should go through rigorous coding practices, code-reviews, testing etc.</p>
<p>Aside from those cases, <strong>this is 2015</strong> - How much does each <code>git push origin gh-pages</code> cost? How much each WordPress post update cost? Or, a single GitHub gist?</p>
<p>Publishing your stuff is free, no matter how many times you update it.</p>
<p><strong>So, why do we think about the paradigm of the printing press when we think of “publishing”?</strong></p>
<h2 id="frequently-shared-confusions-fsq">Frequently shared confusions (<span class="caps">FSQ</span>)<a class="headerlink" href="#frequently-shared-confusions-fsq" title="Permanent link">¶</a></h2>
<ul>
<li>
<p><strong>What if this, my thing, is just plain crap?</strong><br>
<strong>A.</strong> Are you sure? You never know for sure. Throw it into the wild. If it really is crap, nobody will remember it or hold you responsible for it. How many crappy Da Vinci paintings do you know of? I guess <strong><em>none.</em></strong> But the <a href="https://en.wikipedia.org/wiki/Mona_Lisa">Mona Lisa</a> didn’t just appear out of thin air. Did it?</p>
</li>
<li>
<p><strong>This is my toy (or pet) project.</strong><br>
<strong>A.</strong> Don’t be that selfish kid from school who doesn’t let others touch his toys, just because. If you are having fun building something, why not let others join in the fun?</p>
</li>
<li>
<p><strong>This is a one-off script on this ancient <em><span class="caps">COBOL</span></em> platform; no one is going to need it. Ever.</strong><br>
<strong>A.</strong> You never know. You are a human; you can’t imagine what people are going to need. Throw it in a <a href="https://gist.github.com/">gist</a>, just include a suitable title. Add in some comments if you please. Maybe a couple of years later, your script will save someone’s job, so he can still put food on the table. You never know. </p>
</li>
<li>
<p><strong>If I publish this now, some genius with free time will steal my idea and make it into something grand without me.</strong><br>
<strong>A.</strong> Unless you are a big hot-shot with a ground-breaking idea, no one will even notice it. Most geniuses’ minds are already filled with their own to-do lists. Even if they take your idea, let them. Move on.
Don’t be a muddy pond; rather, be like a river. Rivers don’t dry up because peasants “steal” some water. </p>
</li>
<li>
<p><strong>I haven’t collected my thoughts enough to make this post a grand one yet.</strong><br>
<strong>A.</strong> Don’t think too highly of yourself. Let others do that for you.<br>
No project is born grand, and no great man was born great. Your contributions are what go on ahead. You, the person? Not so much. Time carried “Romeo <span class="amp">&</span> Juliet” along, but Shakespeare is dead and gone. </p>
</li>
<li>
<p><strong>I am special. My words/code should be special, perfect, coherent like a pearl-necklace.</strong> <em>(yes, we all feel like that, we just don’t acknowledge it publicly.)</em><br>
<strong>A.</strong> No, you are not. You are not a special unique snowflake. See the previous answer.</p>
</li>
<li>
<p><strong>Who the ***k are you to tell me what to do?</strong><br>
<strong>A.</strong> It’s not about me; I am nobody. Just an open-source enthusiast who wants to see more and more open-source projects, scripts, and blog posts that haven’t gone to the grave with their mortal creators. I’m just sharing my own thoughts about it.
<strong>It is your code on your own personal <span class="caps">PC</span>, after all.</strong></p>
</li>
</ul>
<h2 id="why-should-i-open-my-sourcepostthoughtsetc-tonight">Why should I open my source(post/thoughts/etc) tonight?<a class="headerlink" href="#why-should-i-open-my-sourcepostthoughtsetc-tonight" title="Permanent link">¶</a></h2>
<ul>
<li>
<p>You can literally die tonight.
Then all of your pet projects are just gone, because probably no one in your family is in the coding business, or they aren’t sure about your intentions. </p>
</li>
<li>
<p>Tomorrow morning, your mind will just drift away.<br>
What is a vivid idea tonight, one that could impact thousands of people’s lives, will by tomorrow morning have become a faded, will-do-it-someday idea. One month from tonight, you will probably be oblivious of your own idea, draft, script, or code.</p>
</li>
<li>
<p>It’s a mind-trick to force ourselves to work on something, to avoid public shame. We feel obliged to correct errata that are in public, but about something hidden away in a private, local folder we don’t have to feel bad. </p>
</li>
</ul>
<h2 id="to-avoid-embarrassment">To avoid embarrassment<a class="headerlink" href="#to-avoid-embarrassment" title="Permanent link">¶</a></h2>
<ul>
<li>
<p>Make sure your audience (or colleagues for that matter) know the content’s status. Put a “prelude” section mentioning the half-done condition. Better yet, use some <a href="http://semver.org/">version number</a>.</p>
</li>
<li>
<p>Make your general idea clear. For code, point out what it is and what it is supposed to do. For a blog, present the basic idea at least, even if not with perfect grammar.</p>
</li>
</ul>
<h2 id="how-infrastructure-can-improve">How infrastructure can improve<a class="headerlink" href="#how-infrastructure-can-improve" title="Permanent link">¶</a></h2>
<p>Mainstream open-source hosting platforms, e.g. GitHub, Bitbucket, GitLab, etc., could have an <strong>“Open the source”</strong> trigger-switch for individual projects, where a software developer can enable the trigger with a condition like:</p>
<ul>
<li><span class="dquo">“</span><strong>Open the source</strong>” if I don’t log in to GitHub for <code>1 year</code> (which means I am dead, or I’ve gone crazy trying to remember GitHub)</li>
<li><span class="dquo">“</span><strong>Open the source</strong>” on a pre-set date e.g. <code>2020-02-20</code></li>
</ul>
<p><em>[Dear reader, thanks a lot for reading up to here. I am sure there are many points missing in this post. Also, as English is not my first language, there must be some misused words or phrases. But you get the idea. Please comment/criticize/point out the missing stuff. I will try to discuss, update, and correct it.]</em></p>
<hr>
<h2 id="contributors">Contributors<a class="headerlink" href="#contributors" title="Permanent link">¶</a></h2>
<table>
<thead>
<tr>
<th>Contributor</th>
<th>Role</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://github.com/kmonsoor">Khaled Monsoor</a></td>
<td>initial author, maintainer</td>
</tr>
<tr>
<td><a href="https://github.com/waynew">Wayne Werner</a></td>
<td>editor (v.0.0.3 → v.0.0.4)</td>
</tr>
</tbody>
</table>
<h2 id="some-related-inspirations-from-some-open-source-jedis">Some related inspirations from some open-source jedis<a class="headerlink" href="#some-related-inspirations-from-some-open-source-jedis" title="Permanent link">¶</a></h2>
<ul>
<li><a href="https://rhettinger.wordpress.com/2011/01/28/open-your-source-more/">Raymond Hettinger :: Open Source Challenge: Open Your Source, More</a></li>
<li><a href="https://www.jeffknupp.com/blog/2013/08/16/open-sourcing-a-python-project-the-right-way/">Jeff Knupp :: Open Sourcing a Python Project the Right Way</a></li>
<li><a href="https://archive.org/stream/GuerillaOpenAccessManifesto/Goamjuly2008_djvu.txt">Aaron Swartz :: Guerilla Open Access Manifesto</a></li>
</ul>