<p>I think ... - programming (https://blog.kmonsoor.com/)</p>
<h2>R vs Python — A Summary Comparison</h2>
<p>Khaled Monsoor, 2016-02-08</p>
<p>A tabular, objective, comparative summary of the R and Python programming languages. Facts&nbsp;matter.</p>
<table> <thead> <tr> <th>Aspect</th> <th>R</th> <th>Python</th> </tr> </thead> <tbody>
<tr> <td>Home</td> <td><a href="http://r-project.org">http://r-project.org</a></td> <td><a href="http://python.org">http://python.org</a></td> </tr>
<tr> <td><a href="http://githut.info/">Birth</a></td> <td>1993</td> <td>1991</td> </tr>
<tr> <td>Designer</td> <td>Ross Ihaka, Robert Gentleman</td> <td>Guido van Rossum</td> </tr>
<tr> <td>Package count</td> <td>7,798 (<span class="caps">CRAN</span>)</td> <td>73,402 (PyPI)</td> </tr>
<tr> <td>Purpose</td> <td>Statistical computing</td> <td>General purpose</td> </tr>
<tr> <td>Paradigm</td> <td>array, object-oriented, imperative, functional, procedural, reflective</td> <td>object-oriented, imperative, functional, procedural, reflective</td> </tr>
<tr> <td>License</td> <td><span class="caps">GPL</span></td> <td><span class="caps">PSF</span> License</td> </tr>
<tr> <td>Platform</td> <td>cross-platform</td> <td>cross-platform</td> </tr>
<tr> <td>Implementations</td> <td>Main, <a href="https://github.com/bedatadriven/renjin">Renjin</a>, <a href="https://bitbucket.org/allr/fastr">FastR</a></td> <td><a href="https://github.com/python/cpython">CPython</a>, <a href="https://bitbucket.org/pypy/pypy">PyPy</a>, <a href="http://www.jython.org">Jython</a>, <a href="https://github.com/IronLanguages/main/wiki">IronPython</a></td> </tr>
<tr> <td><a href="http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html"><span class="caps">TIOBE</span><sup><span class="caps">TM</span></sup> index</a></td> <td>19</td> <td>5</td> </tr>
<tr> <td>Dev.
<span class="caps">IDE</span></td> <td>RStudio, Tinn-R, Architect</td> <td>PyCharm, IPython/Jupyter, Spyder, Komodo <span class="caps">IDE</span></td> </tr>
<tr> <td>Graphics support</td> <td>built-in, ggplot2, lattice, Vistat, googleVis</td> <td>matplotlib, Seaborn, Plotly, Bokeh</td> </tr>
<tr> <td>Machine-learning</td> <td>e1071, rpart, igraph, nnet, kernlab, caret</td> <td>scikit-learn, PyML, PyMC, Shogun</td> </tr>
<tr> <td>Deep-Learning</td> <td>H2O, Darch</td> <td>TensorFlow, Theano, Decaf, <span class="caps">PDNN</span>, Keras, Caffe</td> </tr>
<tr> <td>Image-Analysis</td> <td>Raster, EBImage, imager</td> <td>scikit-image, OpenCV, scipy.ndimage</td> </tr>
<tr> <td><a href="http://githut.info/">GitHub project count</a></td> <td>34,268</td> <td>164,852</td> </tr>
<tr> <td>Latest release</td> <td>2015-12-10 (Main: R-3.2.3)</td> <td>2015-12-21 (CPython: Python 3.4.4)</td> </tr>
</tbody> </table>
<p><strong><span class="caps">N.B.</span></strong> <em>Time-dependent data are up to date as of <strong>2016-02-08</strong>.</em></p>
<h2>Se7en Deadly Sins to Do in Python code</h2>
<p>Khaled Monsoor, 2015-08-10 (updated 2015-11-21)</p>
<p>There are many ways to make Python code extremely difficult for yourself and your fellow developers to work with and maintain. Some, however, are destructive by nature. These are the ones on my&nbsp;top list.</p>
<h3 id="prelude">Prelude<a class="headerlink" href="#prelude" title="Permanent link">&para;</a></h3>
<p>I use the word &ldquo;deadly&rdquo; to express the potential to diminish the productivity of a Python programmer, or of the teammates who will work on the same code. Please take all of this with quite a bit of salt, given my limited expertise <span class="amp">&amp;</span> limited experience with different types of Python-based&nbsp;projects.</p>
<p><em>7</em> is just a catchy number.
And, of course, this top list is subject to change along with my experience.<br> You are also most welcome to suggest your own findings for this&nbsp;list.</p>
<h3 id="1-the-try-except-pass-trio"><strong>1. The <code>try: except: pass</code> trio</strong><a class="headerlink" href="#1-the-try-except-pass-trio" title="Permanent link">&para;</a></h3>
<p>You know about design patterns, right? At least, you know a little&nbsp;bit.</p>
<p>From <a href="https://en.wikipedia.org/wiki/Software_design_pattern">Wikipedia</a>:</p>
<blockquote> <p>Design patterns can speed up the development process by providing tested, proven development&nbsp;paradigms.</p> <p>Effective software design requires considering issues that may not become visible until later in the implementation. Reusing design patterns helps to prevent subtle issues that can cause major problems, and it also improves code readability for coders and architects who are familiar with the&nbsp;patterns.</p> </blockquote>
<p>Now, think of the complete opposite of a design pattern. It is called an <em>anti-pattern</em>, and it silently &ldquo;destroys&rdquo; efficiency in code.
The pattern below can be considered the deadliest anti-pattern in Python code.<br> <a href="http://redsymbol.net/">Aaron Maxwell</a> called it the <a href="https://realpython.com/blog/python/the-most-diabolical-python-antipattern/">most diabolical</a>, or &ldquo;evil&rdquo;,&nbsp;anti-pattern.</p>
<div class="highlight"><pre><span></span><code><span class="k">try</span><span class="p">:</span>
    <span class="n">subtle_buggy_operation</span><span class="p">()</span>  <span class="c1"># possibly with I/O or DB operation</span>
<span class="k">except</span><span class="p">:</span>
    <span class="k">pass</span>
</code></pre></div>
<p>You thought you would save some development time by &ldquo;pass&rdquo;ing the errors by. But it will take hours, if not days, to find the bugs inside the block later: every exception is masked by the &ldquo;pass&rdquo;, so the error will surface somewhere outside this <code>try:except</code> block, in what may look like the most innocent&nbsp;code.</p>
<p>Again, quoting from Aaron&nbsp;&hellip;</p>
<blockquote> <p>In my nearly ten years of experience writing applications in Python, both individually and as part of a team, this pattern has stood out as the single greatest drain on developer productivity and application reliability, especially over the long&nbsp;term.</p> </blockquote>
<h3 id="2-wildcard-imports-ie-from-module-import"><strong>2. Wildcard imports, i.e. <code>from module import *</code></strong><a class="headerlink" href="#2-wildcard-imports-ie-from-module-import" title="Permanent link">&para;</a></h3>
<p>This single practice can turn a nice, clean module into a nightmare.
According to <a href="http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#importing">David Goodger</a>, a core Python developer:</p>
<blockquote> <p>Wild-card imports are from the dark side of&nbsp;Python.</p> <p><strong>Never!</strong></p> <p>The <code>from module import *</code> wild-card style leads to namespace pollution. You&rsquo;ll get things in your local namespace that you didn&rsquo;t expect to get. You may see imported names obscuring module-defined local names. You won&rsquo;t be able to figure out where certain names come from. Although a convenient shortcut, this should not be in production&nbsp;code.</p> <p><strong>Moral:</strong> don&rsquo;t use wild-card&nbsp;imports!</p> </blockquote>
<p>Also, in the spirit of <a href="http://www.yodaquotes.net/">Yoda&rsquo;s mythical conversation</a>s, David&nbsp;writes:</p>
<blockquote> <p><span class="caps">LUKE</span>: Is from module import * better than explicit imports?<br> <span class="caps">YODA</span>: No, not better. Quicker, easier, more seductive.<br> <span class="caps">LUKE</span>: But how will I know why explicit imports are better than the wild-card form?<br> <span class="caps">YODA</span>: Know you will when your code you try to read six months from&nbsp;now. </p> </blockquote>
<p>If you use this practice between interconnected modules in a mid-sized project, worry not: you&rsquo;ll start getting errors due to circular references soon&nbsp;enough.</p>
<p>Sounds&nbsp;funny?</p>
<h3 id="3-thinking-that-tryexceptelse-construct-is-not-a-natural-control-flow-in-python"><strong>3. Thinking that the <code>try:except:else</code> construct is not natural control flow in Python</strong><a class="headerlink" href="#3-thinking-that-tryexceptelse-construct-is-not-a-natural-control-flow-in-python" title="Permanent link">&para;</a></h3>
<p>If you are coming from the Java (or a similar) world, I understand your confusion. However, Python treats this construct very differently from Java.
It embodies Python&rsquo;s philosophy, <a href="https://docs.python.org/2/glossary.html#term-eafp">Easier to Ask for Forgiveness than Permission</a>, aka the &ldquo;<span class="caps">EAFP</span>&nbsp;paradigm&rdquo;.</p>
<p>Trying to avoid it will result in messy, unpythonic code. <a href="http://stackoverflow.com/a/16138864/617185">This great answer on StackOverflow</a> by Raymond Hettinger, a core Python developer, nicely portrays the philosophy behind&nbsp;it.</p>
<p>Quoting&nbsp;him:</p>
<blockquote> <p>In the Python world, using exceptions for flow control is common and normal. Even the Python core developers use exceptions for flow-control and that style is heavily baked into the language (i.e. the iterator protocol uses <em>StopIteration</em> to signal loop termination). In addition, the try-except-style is used to prevent the race-conditions inherent in some of the &ldquo;look-before-you-leap&rdquo;&nbsp;constructs. </p> <p>For example, testing <code>os.path.exists</code> results in information that may be out-of-date by the time you use it. Likewise, <code>Queue.full</code> returns information that may be stale. The <code>try:except:else</code> style will produce more reliable code in these cases. In some other languages, that rule reflects their cultural norms as reflected in their libraries. The &ldquo;rule&rdquo; is also based in-part on performance considerations for those&nbsp;languages.</p> </blockquote>
<p>Also, consider checking out <a href="http://stackoverflow.com/a/180974/617185">this Q&amp;A on StackOverflow</a> on the same&nbsp;premise.</p>
<h3 id="4-making-everything-a-class-aka-overusing-classes"><strong>4. Making everything a Class aka Overusing classes</strong><a class="headerlink" href="#4-making-everything-a-class-aka-overusing-classes" title="Permanent link">&para;</a></h3>
<p>What I am referring to is <a href="https://www.youtube.com/watch?v=o9pEzgHorH0">this talk by Jack Diederich</a> at PyCon 2012.
You should watch it a couple of times, and then once every week.<br> His summary goes like this: <strong>stop</strong> creating classes and modules every now and then. Before creating one, think hard. Probably, all you need is a&nbsp;function.</p>
<p>The <a href="https://www.python.org/dev/peps/pep-0020/">Zen of Python</a> puts it as below. Read it again, again, and&nbsp;again.</p>
<blockquote> <ul> <li>Beautiful is better than&nbsp;ugly.</li> <li>Explicit is better than&nbsp;implicit.</li> <li>Simple is better than&nbsp;complex.</li> <li>Flat is better than&nbsp;nested.</li> <li>Readability&nbsp;counts.</li> <li>If the implementation is hard to explain, it&rsquo;s a bad&nbsp;idea.</li> </ul> </blockquote>
<p>Though the class below is perfectly valid, it is a perfect example of a b***sh*t&nbsp;class:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Greeting</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">greeting</span><span class="o">=</span><span class="s1">&#39;hello&#39;</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">greeting</span> <span class="o">=</span> <span class="n">greeting</span>

    <span class="k">def</span> <span class="nf">greet</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="k">return</span> <span class="s1">&#39;</span><span class="si">%s</span><span
class="s1">! </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">greeting</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>

<span class="n">greeting</span> <span class="o">=</span> <span class="n">Greeting</span><span class="p">(</span><span class="s1">&#39;hola&#39;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">greeting</span><span class="o">.</span><span class="n">greet</span><span class="p">(</span><span class="s1">&#39;bob&#39;</span><span class="p">))</span>
</code></pre></div>
<p>It does exactly the same&nbsp;as:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">greet</span><span class="p">(</span><span class="n">greeting</span><span class="p">,</span> <span class="n">target</span><span class="p">):</span>
    <span class="k">return</span> <span class="s1">&#39;</span><span class="si">%s</span><span class="s1">! </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">greeting</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
</code></pre></div>
<p>He also showed a practical example of how he simplified (i.e. refactored) an <span class="caps">API</span>&rsquo;s complete code, consisting&nbsp;of:</p>
<ul> <li>1 Package, 22&nbsp;Modules,</li> <li>20&nbsp;Classes,</li> <li>660 Source Lines of&nbsp;Code</li> </ul>
<p>into the code below, a grand total of 8 lines.
Yes, just 8&nbsp;lines!!!</p>
<div class="highlight"><pre><span></span><code><span class="n">MUFFIN_API</span> <span class="o">=</span> <span class="n">url</span><span class="o">=</span><span class="s1">&#39;https://api.wbsrvc.com/</span><span class="si">%s</span><span class="s1">/</span><span class="si">%s</span><span class="s1">/&#39;</span>
<span class="n">MUFFIN_API_KEY</span> <span class="o">=</span> <span class="s1">&#39;SECRET-API-KEY&#39;</span>

<span class="k">def</span> <span class="nf">request</span><span class="p">(</span><span class="n">noun</span><span class="p">,</span> <span class="n">verb</span><span class="p">,</span> <span class="o">**</span><span class="n">params</span><span class="p">):</span>
    <span class="n">headers</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;apikey&#39;</span> <span class="p">:</span> <span class="n">MUFFIN_API_KEY</span><span class="p">}</span>
    <span class="n">request</span> <span class="o">=</span> <span class="n">urllib2</span><span class="o">.</span><span class="n">Request</span><span class="p">(</span><span class="n">MUFFIN_API</span> <span class="o">%</span> <span class="p">(</span><span class="n">noun</span><span class="p">,</span> <span class="n">verb</span><span class="p">),</span> \
                              <span class="n">urllib</span><span class="o">.</span><span class="n">urlencode</span><span class="p">(</span><span class="n">params</span><span class="p">),</span> <span class="n">headers</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span
class="n">loads</span><span class="p">(</span><span class="n">urllib2</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">request</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
</code></pre></div>
<p><strong>Moral:</strong></p>
<ul> <li>Stop re-inventing the&nbsp;wheel,</li> <li>use more of the built-in library&nbsp;functions,</li> <li>use far fewer long class hierarchies of your&nbsp;own.</li> </ul>
<p>Still want to see a worst-case scenario of creating classes? Check this&nbsp;out:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Flow</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Base class for all Flow objects.&quot;&quot;&quot;</span>
    <span class="k">pass</span>

<span class="k">class</span> <span class="nc">Storage</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">put</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span> <span class="n">_abstract</span><span class="p">()</span>
    <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">_abstract</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">_abstract</span><span class="p">():</span> <span
class="k">raise</span> <span class="ne">NotImplementedError</span>
</code></pre></div>
<p>Yes, this is a real piece of code from the Google <span class="caps">API</span> client (which, in total, has <em>10,000 <span class="caps">SLOC</span>, 115 modules, 207 classes</em>). Meanwhile, <a href="https://github.com/jackdied/python-foauth2">someone implemented the same thing</a>, maybe not as robustly, in <em>135 <span class="caps">SLOC</span> and 3 classes</em> in&nbsp;total.</p>
<p>You see the point, right? Guido did. <a href="https://plus.google.com/+JackDiederich/posts/iPiqWHjwcf3">Check his&nbsp;comment.</a></p>
<p><img alt="guido-google-comment" src="https://blog.kmonsoor.com/images/articles/guido-google-comment.jpg"></p>
<h3 id="5-saving-time-by-not-writing-any-documentation-or-inline-comments"><strong>5. Saving time by not writing any documentation or inline comments</strong><a class="headerlink" href="#5-saving-time-by-not-writing-any-documentation-or-inline-comments" title="Permanent link">&para;</a></h3>
<p>If you skip the comments in your semi-obfuscated code, and skip the docstrings as well, all to save time and meet deadlines, rest assured that within a short period you&rsquo;ll hate yourself when, reading your own code, you cannot remember what you did (&amp;&nbsp;why).</p>
<p>Today or tomorrow, you will leave the company. And that code will haunt every member of your team who comes across it, like a zombie, unless they cut off its head entirely (i.e. replace your&nbsp;code).</p>
<p>There is just no excuse for not writing &ldquo;documentation&rdquo;, except that you just don&rsquo;t care.
If you did care, you would not only write minimal docstrings and comments on complex code sections, but also name your functions, methods, and variables to reflect their purpose, making them&nbsp;&ldquo;self-documenting&rdquo;.</p>
<p>Here is a nice guide to properly <a href="http://docs.python-guide.org/en/latest/writing/documentation/">documenting your Python&nbsp;code.</a></p>
<p>However, there will still be deniers out there&nbsp;&hellip;</p>
<p><img alt="code-quality" src="http://i.imgur.com/Fzb8epA.png"> <em>source: <a href="https://xkcd.com/1513/">https://xkcd.com/1513/</a></em></p>
<h3 id="6-avoiding-unit-tests-and-doc-tests-until-the-doomsday-comes"><strong>6. Avoiding Unit-tests (and doc-tests) until the doomsday comes</strong><a class="headerlink" href="#6-avoiding-unit-tests-and-doc-tests-until-the-doomsday-comes" title="Permanent link">&para;</a></h3>
<p>Yes, judgement day will come.<br> It will happen on the production server, with customer downtime, because a &ldquo;completely&rdquo; manually-tested new feature broke something &ldquo;almost&rdquo;&nbsp;unrelated.</p>
<p>Yes, your company can lose millions and <a href="http://dougseven.com/2014/04/17/knightmare-a-devops-cautionary-tale/">can go out of business.</a> Maybe, after a few sleepless nights for the development team, the &ldquo;bug&rdquo; will be&nbsp;found.</p>
<p>Maybe this whole mess could simply have been avoided if the developer had written <a href="https://docs.python.org/2/library/unittest.html">unit tests</a> as well as <a href="https://docs.python.org/2/library/doctest.html">doctests</a> for the module&rsquo;s functions and methods and, after implementing the feature, had run the tests once across the project. The online book Dive Into Python has an excellent introduction to <code>unittest</code>.
Also, you can start with <a href="http://docs.python-guide.org/en/latest/writing/tests/#unittest">the Hitchhiker&rsquo;s guide&rsquo;s introduction</a>.</p>
<h3 id="7-mixing-tab-and-space-in-the-same-file"><strong>7. Mixing <code>TAB</code> and <code>SPACE</code> in the same file</strong><a class="headerlink" href="#7-mixing-tab-and-space-in-the-same-file" title="Permanent link">&para;</a></h3>
<p>You will need no other reason to curse yourself a little while later. It will haunt you whenever you need to open the source code in any editor other than your usual one. And, for others, &ldquo;oh my! I can&rsquo;t literally&nbsp;even&hellip;&rdquo;.</p>
<p>While Python<strong>3</strong> will simply refuse to interpret this &ldquo;half-breed&rdquo; file, Python<strong>2</strong> interprets a <span class="caps">TAB</span> as if it were converted to spaces using 8-space tab stops. So, while executing, you may have no clue which code block a specific line is being executed as part&nbsp;of.</p>
<p>For any code that you think someone else will someday read or use, to avoid confusion, you should stick with <a href="http://legacy.python.org/dev/peps/pep-0008/#tabs-or-spaces"><span class="caps">PEP</span>-8</a> or your team-specific coding style. <span class="caps">PEP</span>-8 strongly discourages mixing <span class="caps">TAB</span>s and spaces in the same&nbsp;file.</p>
<p>Also, check out this <a href="http://programmers.stackexchange.com/a/197839/74557">Q&amp;A on&nbsp;StackExchange.</a></p>
<blockquote> <p>1. The first downside is that it quickly becomes a&nbsp;mess.</p> <p>&hellip; Formatting should be the task of the <span class="caps">IDE</span>. Developers have already enough work to care about the size of tabs, how much spaces will an <span class="caps">IDE</span> insert, etc. The code should be formatted correctly, and displayed correctly on other configurations, without forcing developers to think about&nbsp;it.
</p> </blockquote>
<p>Also, <a href="http://www.secnetix.de/olli/Python/block_indentation.hawk">remember&nbsp;this</a>:</p>
<blockquote> <p>Furthermore, it can be a good idea to avoid tabs altogether, because the semantics of tabs are not very well-defined in the computer world, and they can be displayed completely differently on different types of systems and editors. <br> Also, tabs often get destroyed or wrongly converted during <em>copy-paste</em> operations, or when a piece of source code is inserted into a web page or other kind of markup&nbsp;code.</p> </blockquote>
<h3 id="fin"><strong>Fin</strong><a class="headerlink" href="#fin" title="Permanent link">&para;</a></h3>
<p>That&rsquo;s all for now. That&rsquo;s my list. I hope it will evolve with my experience and expertise, as well as with the ever-changing collective wisdom of the whole Python&nbsp;community.</p>
<p>What&rsquo;s your take on the worst &ldquo;un-pythonic&rdquo; nightmares in Python code? Please feel free to share your&nbsp;2-cents.</p>
<h2>Generate ER diagram from a SQL-based database</h2>
<p>Khaled Monsoor, 2014-12-18</p>
<p>&ldquo;Study&rdquo;-ing someone else&rsquo;s database with 300+ tables is like spaghetti, but not enjoyable. Rather,&nbsp;horrific.</p>
<p>When you are &ldquo;study&rdquo;-ing (for whatever reason) someone else&rsquo;s database, and the database has more than 20 tables, you might have trouble understanding what&rsquo;s going&nbsp;where.</p>
<p>Now, imagine a database with 300+ tables. It&rsquo;s like spaghetti, just not as&nbsp;enjoyable.</p>
<p>I faced a similar challenge recently with a database of 250+ tables. Yes, I felt like I was in deep sh*t. So I started looking for tools that could describe the tables, or at least produce a decent <span class="caps">ER</span> diagram. If they offered anything more, even better.
And, preferably,&nbsp;free.</p>
<p>Then, I found <a href="http://schemaspy.sourceforge.net/">SchemaSpy</a>, originally authored by <a href="https://sites.google.com/site/johncurrier/">John Currier</a>. It generates a complete, in-depth <span class="caps">HTML</span>-based description of the database (including, of course, a clickable <span class="caps">ER</span> diagram), which you can then explore in your browser. This post is about its basic&nbsp;usage.</p>
<p>The output will look something like this: <img alt="schemapy output sample" src="https://i.imgur.com/K1yYBID.png"></p>
<p>It is based on Java, but it can work its magic on most of the major database products. However, it requires an appropriate <span class="caps">JDBC</span> connector for that&nbsp;database.</p>
<p>Quote from the&nbsp;author:</p>
<blockquote> <p><code>SchemaSpy</code> uses <span class="caps">JDBC</span>&rsquo;s database metadata extraction services to gather the majority of its information but has to make vendor-specific <span class="caps">SQL</span> queries to gather some information such as the <span class="caps">SQL</span> associated with a view and the details of check&nbsp;constraints.</p> </blockquote>
<p>In this post, as an example, I show how to use it with PostgreSQL. But that&rsquo;s not the only database supported. You can use it with any proper <span class="caps">RDBMS</span> as long as it has a <span class="caps">JDBC</span> connector. To use it, you need the&nbsp;following.</p>
<ul> <li><strong>First of all</strong>, your system should have a <strong>Java runtime</strong> properly installed. <a href="https://adoptopenjdk.net/">Download from&nbsp;here.</a></li>
<li><strong>SchemaSpy, which is a .jar file</strong>. <a href="https://sourceforge.net/projects/schemaspy/files/">Get it here</a>. At the time of writing, it was version&nbsp;5.0.0.</li>
<li><strong><span class="caps">JDBC</span> connector to PostgreSQL</strong>.
Make sure to match your PostgreSQL version. You can <a href="https://jdbc.postgresql.org/download.html">download it from here</a>.<br> You can check your PostgreSQL version by executing the <code>SELECT version();</code> query at the <strong><em>psql</em></strong>&nbsp;prompt.</li>
<li>Also, SchemaSpy depends on <strong>Graphviz</strong> to generate the <span class="caps">ER</span> diagrams, so you need to have it installed on your system. Get it from here: <a href="http://www.graphviz.org/Download..php">http://www.graphviz.org/Download..php</a></li>
<li>And, of course, make sure the target database instance is running <span class="amp">&amp;</span> serving the database you are trying to&nbsp;visualize.</li> </ul>
<p>Quoting from the&nbsp;author:</p>
<blockquote> <p>SchemaSpy uses the dot executable from <a href="http://www.graphviz.org/">Graphviz</a> to generate graphical representations of the table/view relationships. The visual representation of the connections is a fundamental feature of the&nbsp;tool. </p> <p>Graphviz is not required to view the output generated by SchemaSpy, but <strong>the dot program should be in your <span class="caps">PATH</span></strong> (not <span class="caps">CLASSPATH</span>) when running SchemaSpy, or none of the entity-relationship diagrams will be generated (or <a href="http://schemaspy.sourceforge.net/#gvparam">use the <code>-gv</code></a>&nbsp;option).</p> </blockquote>
<p>I kept the .jar files (both the <span class="caps">JDBC</span> connector and <code>SchemaSpy</code>) in my home folder for convenience.<br> In my case, the <span class="caps">OS</span> is Linux and the database is hosted locally; hence the address is <code>127.0.0.1</code>, with <code>PostgreSQL-9.3</code> running at port <code>5432</code>.
So, I ran the command like&nbsp;this:</p>
<div class="highlight"><pre><span></span><code>$ java -jar ./schemaSpy_5.0.0.jar -t pgsql -host 127.0.0.1:5432 -db your_database_name \
    -u your_DB_user_name -p your_password -s public \
    -dp ./postgresql-9.3-1102.jdbc3.jar \
    -o output_folder
</code></pre></div>
<p>It may take a little while, depending on the size of the database schema.<br> After that, you will find the results in the directory named <code>output_folder</code>.<br> While the magic is going on, you&rsquo;ll see some output similar to the one&nbsp;below.</p>
<div class="highlight"><pre><span></span><code>Using database properties: [./schemaSpy_5.0.0.jar]/net/sourceforge/schemaspy/dbTypes/pgsql.properties
Gathering schema details....................(6sec)
Writing/graphing summary....................(2sec)
Writing/diagramming detail..................(31sec)
Wrote relationship details of 113 tables/views to directory &#39;output&#39; in 41 seconds.

View the results by opening output_folder/index.html
</code></pre></div>
<p>Now, all the generated files are in <code>output_folder</code>. Start your journey from the <code>index.html</code> in that folder. Open it in any&nbsp;browser.</p>
<p>Good luck.&nbsp;:)</p>
<hr>
<p>If you find this post helpful, you can show your support <a href="https://www.patreon.com/kmonsoor">through Patreon</a> or by <a href="https://ko-fi.com/kmonsoor">buying me a coffee</a>. <em>Thanks!</em></p>
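<p>The SchemaSpy command from this post, together with the Graphviz requirement, can also be sketched in Python. This is only a minimal sketch and not part of SchemaSpy: the helper names <code>build_schemaspy_command</code> and <code>graphviz_available</code> are hypothetical, the flags are simply the ones used in the command in this post, and the file names and credentials are the same placeholders.</p>

```python
import shutil


def build_schemaspy_command(db_name, user, password, *,
                            jar="./schemaSpy_5.0.0.jar",
                            driver="./postgresql-9.3-1102.jdbc3.jar",
                            host="127.0.0.1:5432",
                            schema="public",
                            out_dir="output_folder"):
    """Hypothetical helper: return the java command from this post as an argument list."""
    return [
        "java", "-jar", jar,
        "-t", "pgsql",        # database type, as used in the post
        "-host", host,
        "-db", db_name,
        "-u", user,
        "-p", password,
        "-s", schema,
        "-dp", driver,        # path to the JDBC connector .jar
        "-o", out_dir,
    ]


def graphviz_available():
    """Hypothetical helper: True if the Graphviz `dot` executable is found on PATH."""
    return shutil.which("dot") is not None


if __name__ == "__main__":
    if not graphviz_available():
        print("Warning: 'dot' is not on PATH; the ER diagrams will not be generated.")
    command = build_schemaspy_command(
        "your_database_name", "your_DB_user_name", "your_password")
    # Only print the command here; pass the list to subprocess.run(command, check=True) to run it.
    print(" ".join(command))
```

<p>Keeping the arguments as a list avoids shell-quoting problems if you later hand them to <code>subprocess.run</code>.</p>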