I think ... - databasehttps://blog.kmonsoor.com/2018-01-18T00:00:00+06:00HA(High-Availability) Setup for InfluxDB2018-01-18T00:00:00+06:002018-01-18T00:00:00+06:00Khaled Monsoortag:blog.kmonsoor.com,2018-01-18:/ha-setup-for-influxdb/<p>Create a robust, highly-available, time-series InfluxDB cluster with the community(free) version of&nbsp;it</p><p><strong><span class="caps">NOTE</span></strong> <em>Since I have written this article, all the components used in this below architecture have gone through many updates and releases. While the general premise involving <code>influxdb-relay</code> and the multiplexing might still hold, please sync up with the latest release docs before jumping into some serious system&nbsp;design.</em></p> <hr> <p>Currently, from version 0.9, you cannot create an InfluxDB cluster from the open-sourced free edition. Only commercially available InfluxDB Enterprise can do that for now. That stirred up the early-adopter enthusiast users, especially for their usage in professional setups. They complained that InfluxData, the company behind InfluxDB, is trying to milk the <span class="caps">OSS</span> solution for&nbsp;profit.</p> <p><img alt="Archiving isn't easy ... tobias-fischer-PkbZahEG2Ng" src="https://i.imgur.com/0IdYOYnl.jpg"></p> <p>I can&rsquo;t blame the InfluxData guys much, as they got to pay their bills too. So far, we — the users of open-source systems — couldn&rsquo;t show much promise about the financial realities of the projects. Continuing development of <span class="caps">OSS</span> products, by only depending on donations, patrons, or enterprise sponsorship, is far too rare and unpredictable, even for the projects that many successful organizations heavily rely&nbsp;on.</p> <p>Anyways, InfluxDB then promised and later introduced <code>Influx Relay</code> as a complimentary consolation for missing <span class="caps">HA</span> parts of InfluxDB. You can get the details here and here about&nbsp;that. </p> <h2 id="premise">Premise<a class="headerlink" href="#premise" title="Permanent link">&para;</a></h2> <p>For my needs, I have to try to create a reliable <span class="caps">HA</span>(High-Availability) setup from available free options, hence InfluxDB and the relay. It&rsquo;s quite a bit far from an InfluxDB-cluster in terms of robustness or ease of setup, but it&rsquo;s got the job done, at least for&nbsp;me.</p> <p>I needed a setup to receive system-stats from at least 500+ instances and to store them for a while, but without breaking the bank in bills from <span class="caps">AWS</span>. Meaning, I could ask for and could use only couple of instances for my&nbsp;solution.</p> <p>Here were my&nbsp;trade-offs.</p> <ul> <li>Not too many instances for this purpose. Neither, any of the heavyweight lifters e.g. <span class="caps">AWS</span>&rsquo; m3-xlarge etc. To use only what&rsquo;s&nbsp;necessary. </li> <li>To satisfy the budget, hence avoiding pay-per-use solutions as far as it is&nbsp;possible.</li> <li>Solutions must not be crazy complex, so that handover to the DevOps team be&nbsp;smooth.</li> <li>Reading the data would be too rarely w.r.t. writing. The related Grafana dashboards will be only used to investigate issues by a handful of&nbsp;people.</li> </ul> <h2 id="overall-design">Overall Design<a class="headerlink" href="#overall-design" title="Permanent link">&para;</a></h2> <h3 id="write">Write<a class="headerlink" href="#write" title="Permanent link">&para;</a></h3> <p>From a birds&rsquo; eye view, I decided to use two server instances to run parallelly, hosting InfluxDB on them independently and then sending the same data over to them for storing. This scheme mostly looks like <a href="https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_1"><span class="caps">RAID</span>-1 systems</a>.</p> <p><img alt="Overall architecture" src="https://i.imgur.com/ZKYIyOd.png"></p> <p>That brings up a couple of&nbsp;challenges.</p> <ul> <li> <p>None of the agents I used on the sender side could multiplex output. That means, they were able to send data to a single destination, not multiple. On the Windows front, I&rsquo;ve used <code>Telegraf</code> which is able randomly to switch between pre-listed destinations, but <span class="caps">NOT</span> multiple at-once.<br> In the case of Linux hosts, I used <code>Netdata</code> which is excellent in its own right, but unable to send stats to multiple destinations.<br> Here comes <code>Influx-relay</code>. It can receive time-series data-stream from hosts on a <span class="caps">TCP</span> or <span class="caps">UDP</span> port, buffer for a while, and then re-send those received and buffered data to multiple receive ends which can either be an InfluxDB instance or another listening Influx-relay instances.<br> This chaining can broaden the relaying scheme even further. However, for my purpose, this relay-chaining was not necessary. Rather, from the relay, I am sending data to the separate InfluxDB instances, running on two separate&nbsp;instances. </p> </li> <li> <p>Now that I partially multiplexed the output, my hosts (senders) still are able to send to one destination. So, I need a proxy as well as a load-balancer. For a while, I was torn between <span class="caps">NGINX</span> and HAProxy. Both were new to&nbsp;me. </p> </li> </ul> <p>However, for a couple of reasons, I went for HAProxy. Firstly, I don&rsquo;t need <span class="caps">HTTP</span> session management. Secondly, as I wanted to keep my <span class="caps">UDP</span> for later, HAProxy was perfectly capable of that.<br> <span class="caps">NGINX</span> has the support recently, but the maturity was a concern. Also, configuring <span class="caps">NGINX</span> seems a little intimidating (which I know might not be so true). Last but not least, and for what it&rsquo;s worth, out-of-the-box, HAProxy&rsquo;s stat page carries much more in-depth information than that of free-version of <span class="caps">NGINX</span>.<br> Upon receiving the stats stream, HAProxy was supposed to send that to different Influx-relays in a load-balanced&nbsp;fashion.</p> <p>So, here&rsquo;s my rough&nbsp;plan. </p> <p>collector-agent &rarr; HAProxy &rarr; (50/50 load-balanced) &rarr; Influx-relay &rarr; (multiplexed) &rarr; 2 InfluxDB&nbsp;instances</p> <p>Now, each one of the received data is to go to both of the InfluxDB instances, or at least to one in case of failure (or, overload per se) of any the relays or Influx instances.   Also, I have chosen to keep Influx-relays deployed as Dockerized and kept HAProxy and InfluxDB instances running as native services. Of course, you can Dockerize HAProxy and InfluxDB,&nbsp;too. </p> <h3 id="read">Read<a class="headerlink" href="#read" title="Permanent link">&para;</a></h3> <p>As I&rsquo;ve already noted in the section that reading the data, meaning to fetch data to visualize on Grafana end, will happen rarely and sporadically; only to investigate alarms or any other client-side performance&nbsp;issues. </p> <p>So, the read requests, reaching the HAProxy end, needed not much routing, other than directly to InfluxDB itself. Still, to better distribute the load I decided to load-balance it 50/50&nbsp;basis.</p> <h3 id="ports">Ports<a class="headerlink" href="#ports" title="Permanent link">&para;</a></h3> <ul> <li>As all the <span class="caps">READ</span> requests are routed through <code>HAProxy</code> running on each of the instances, to the external world only HAProxy&rsquo;s port should be opened for this&nbsp;purpose. </li> <li>On the other hand, for <span class="caps">WRITE</span> requests, InfluxDBs are receiving data from relays, one of its own instance and another one on other instance, so InfluxDB should listen on its own port for <span class="caps">WRITE</span> requests only. But, this must be accessible only from own <span class="caps">VPS</span> zone, but not open to the outside&nbsp;world.</li> <li>In case of HAProxy as well as InfluxDB, you can use the default ports, obviously, which is 8086 <span class="amp">&amp;</span> 8088 respectively. Or, you can choose to go for other ports (security through obfuscation). Your call. In this writing, I&rsquo;ll go with the&nbsp;defaults.</li> </ul> <h3 id="authentication-ssl">Authentication, <span class="caps">SSL</span><a class="headerlink" href="#authentication-ssl" title="Permanent link">&para;</a></h3> <p>You can configure <span class="caps">SSL</span> with your own server certificates through the HAProxy configs. You can even go for <span class="caps">SSL</span> from the relays to InfluxDB writes. If your sender hosts are connecting to your HAProxy through public internet, you should at least go for password-based authentication, better to utilize <span class="caps">SSL</span>. However, for brevity&rsquo;s sake, I&rsquo;ll skip them in this&nbsp;post.</p> <p>**Note: * Please bear in mind, this is an &ldquo;in-progress&rdquo; post; prematurely published to force me to work on it. I have the plan to add all the necessary configurations <span class="amp">&amp;</span> commands, that I used,&nbsp;here.</p>Generate ER diagram from a SQL-based database2014-12-18T00:00:00+06:002014-12-18T00:00:00+06:00Khaled Monsoortag:blog.kmonsoor.com,2014-12-18:/generate-er-diagram-from-sql-database/<p>When you are &ldquo;study&rdquo;-ing someone else&rsquo;s database with 300+ tables. It&rsquo;s like spaghetti, but not enjoyable. Rather,&nbsp;horrific.</p><p>When you are &ldquo;study&rdquo;-ing (for whatever reason) someone else&rsquo;s database, and the database has more than 20 tables, you might be in trouble to understand what&rsquo;s going&nbsp;where.</p> <p>Now, imagine a database with 300+ tables. It&rsquo;s like spaghetti, just not as&nbsp;enjoyable.</p> <p>I faced a similar challenge recently with a database of 250+ tables. Yes, I felt like in a deep sh*t. And, I started looking for tools that can describe the tables, or at least a decent <span class="caps">ER</span> diagram. If anything more, better. And, preferably&nbsp;free.</p> <p>Then, I found <a href="http://schemaspy.sourceforge.net/">SchemaSpy</a>, originally authored by <a href="https://sites.google.com/site/johncurrier/">John Currier</a>. It generates a complete in-depth <span class="caps">HTML</span>-based description (of course, including clickable <span class="caps">ER</span>-diagram) of the database, which you can then browse with your browser. This post is about its primary&nbsp;usage.</p> <p>The output will be kind of like this: <img alt="schemapy output sample" src="https://i.imgur.com/K1yYBID.png"></p> <p>It is based on Java, but it can work its magic on most of the major database products. However, it would require an appropriate <span class="caps">JDBC</span> connector for that&nbsp;database.</p> <p>Quote from the&nbsp;author:</p> <blockquote> <p><code>SchemaSpy</code> uses <span class="caps">JDBC</span>&rsquo;s database metadata extraction services to gather the majority of its information but has to make vendor-specific <span class="caps">SQL</span> queries to gather some information such as the <span class="caps">SQL</span> associated with a view and the details of check&nbsp;constraints.</p> </blockquote> <p>In this post, as an example, I have shown how to use it with PostgreSQL. But, it&rsquo;s not the only one that&rsquo;s supported. You can use it with any proper <span class="caps">RDBMS</span> system as long as it has a <span class="caps">JDBC</span>-connector. Now, to use it, you need to get these&nbsp;stuffs.</p> <ul> <li><strong>First of all</strong>, your system should have <strong>Java runtime</strong> properly installed. <a href="https://adoptopenjdk.net/">Download from&nbsp;here.</a></li> <li><strong>SchemaSpy, which is a .jar file</strong>. <a href="https://sourceforge.net/projects/schemaspy/files/">Get it here</a>. At the time of writing, it was version&nbsp;5.0.0.</li> <li><strong><span class="caps">JDBC</span> connector to PostgreSQL</strong>. Make sure to match your PostgreSQL version. You can <a href="https://jdbc.postgresql.org/download.html">download it from here</a>.<br> You can check your PostgreSQL version by executing: <code>SELECT version();</code> query on <strong><em>psql</em></strong>&nbsp;prompt.</li> <li>Also, SchemaSpy depends on <strong>GarphViz</strong> to generate the <span class="caps">ER</span> diagrams, so you need to be installed it on your system. Get it from here.(<a href="http://www.graphviz.org/Download..php">http://www.graphviz.org/Download..php</a>)</li> <li>And, of course, make sure the target database instance is running <span class="amp">&amp;</span> serving the database you are trying to&nbsp;visualize.</li> </ul> <p>Quoting from the&nbsp;author:</p> <blockquote> <p>SchemaSpy uses the dot executable from <a href="http://www.graphviz.org/">Graphviz</a> to generate graphical representations of the table/view relationships. The visual representation of the connections is a fundamental feature of the&nbsp;tool. </p> <p>Graphviz is not required to view the output generated by SchemaSpy, but <strong>the dot program should be in your <span class="caps">PATH</span></strong> (not <span class="caps">CLASSPATH</span>) when running SchemaSpy, or none of the entity-relationship diagrams will be generated. Or, maybe <a href="http://schemaspy.sourceforge.net/#gvparam">use the <code>-gv</code></a>&nbsp;option).</p> </blockquote> <p>I kept the .jar files (both the <span class="caps">JDBC</span>-connector and the <code>SchemaSpy</code>) in my home folder for convenience.<br> Now, in my case, my <span class="caps">OS</span> is Linux, and the database is hosted locally; hence address is <code>127.0.0.1</code>, running <code>PostgreSQL-9.3</code> at port <code>5432</code>. So, I run the command like&nbsp;this:</p> <div class="highlight"><pre><span></span><code><span class="linenos" data-linenos="1 "></span> $ java -jar ./schemaSpy_5.0.0.jar -t pgsql -host 127.0.0.1:5432 -db your_database_name \ <span class="linenos" data-linenos="2 "></span> -u your_DB_user_name -p your_password -s public \ <span class="linenos" data-linenos="3 "></span> -dp ./postgresql-9.3-1102.jdbc3.jar \ <span class="linenos" data-linenos="4 "></span> -o output_folder </code></pre></div> <p>It may take a little while, depending on the size of the schema of the database.<br> After that, you will find the output folder/directory named <code>output_folder</code>.<br> You&rsquo;ll see some output when the magic is going on, similar to this&nbsp;below.</p> <div class="highlight"><pre><span></span><code><span class="linenos" data-linenos="1 "></span>Using database properties: [./schemaSpy_5.0.0.jar]/net/sourceforge/schemaspy/dbTypes/pgsql.properties <span class="linenos" data-linenos="2 "></span>Gathering schema details....................(6sec) <span class="linenos" data-linenos="3 "></span>Writing/graphing summary....................(2sec) <span class="linenos" data-linenos="4 "></span>Writing/diagramming detail..................(31sec) <span class="linenos" data-linenos="5 "></span>Wrote relationship details of 113 tables/views to directory &#39;output&#39; in 41 seconds. <span class="linenos" data-linenos="6 "></span> <span class="linenos" data-linenos="7 "></span>View the results by opening output_folder/index.html </code></pre></div> <p>Now, all the generated files are in the <code>output_folder</code>. Start your journey by starting from the <code>index.html</code> in the output folder. Open it by using any&nbsp;browser.</p> <p>Good luck.&nbsp;:)</p> <hr> <p>If you find this post helpful, you can show your support <a href="https://www.patreon.com/kmonsoor">through Patreon</a> or by <a href="https://ko-fi.com/kmonsoor">buying me a coffee</a>. <em>Thanks!</em></p>