Thursday, March 21, 2013

Libraries, Art, Math, & the Value of Failure

I am going to start in one place & end up somewhere else. Ready? Here we go.

Failure

There's a wonderful trend lately at library conferences of promoting open dialog around failure. I was first acquainted with this at the #drupalfail sessions held by LITA's Drupal Interest Group. Presenters would detail the various ways their projects crashed & burned, or merely did not meet expectations. With Drupal, this is particularly easy: it's a complex CMS, as powerful as it is enigmatic, & you have to be fairly experienced to successfully plan & implement a project with no hiccups along the way.

Related news flash: most librarians don't learn Drupal in library school; it's something they learn on the job, so there are a lot of intermediate failures before anyone gets close to something that vaguely resembles success. But Drupal isn't the only example of this trend: I heard good things about Code4Lib's "Fail4Lib" preconference. There have been scattered talks elsewhere discussing the need to create a culture where taking risks & occasionally failing is welcome. It's certainly a necessary element of innovation.

What stands out about these failure sessions? They're useful. Knowing someone else's mistakes saves you immense amounts of time & often all you have to do is avoid something stupid to gain from it. As a technologist at a small library, I'm constantly bombarded with awesome things I can't use: they require money, or staff, or time, or expertise, or scale that we just don't have. It's cool to hear about Linked Data & Near-Field Communication; it's not something that would be a wise investment on my part. But when I hear someone say that creating a custom theme from scratch in Drupal is a waste of time relative to using a pre-built theme, I instantly am more prepared to do my job. Don't reinvent the wheel with theming, check. Lesson learned, time saved.

Art

When you consider the pedagogical value of failure, some weird issues arise. I had a unique undergraduate career in that I was trained in both the humanities (English) & formal sciences (Mathematics). You know what both of those fields happen to be utterly terrible at? Teaching failure. In math, when a theorem is superseded, it's simply not taught anymore. It might as well have never existed. I never had homework problems phrased "spot the problem with this theorem" or "hey Fermat was a dummy, can you tell why?" Mathematics ignores an entire mode of analysis. You become skilled at deductive reasoning & constructing your own theorem cabins from axiom Lincoln Logs; you never learn how to approach someone else's theoretical edifice other than simply assuming it's true because it's in the textbook.

English is also awful at admitting failure, in its own warped way. We read the classics, but not the failed classics. There are at least two kinds of failed classics: works which were highly regarded in their own time but grew irrelevant & works which were never highly regarded in any time. Either way, why are the canonical works more valued than the telling failures of their contemporaries? While Mathematics education's failure to teach non-deductive modes of logic is troubling, artistic prejudices are even more disturbing.

Everyone has, by now, heard "beauty is in the eye of the beholder." The aesthetic disciplines cling to this maxim as if it somehow places them outside the realm of objective inquiry, paradoxically able to pass judgment without recourse to supporting evidence. If this is true, why do we read Shakespeare? [1] Wouldn't any arbitrarily chosen text suffice, given that the text itself is irrelevant, it's the Beholder that matters? Of course, what you learn in English is that—to paraphrase George Orwell's Animal Farm—some Beholders are more equal than others. Your professors are Beholders, you as a student are but a Beholder-in-training, & art works are unimpeachable: they do not fail, they either go unmentioned or become canonical via mysterious means. Aesthetics masquerades as subjective judgment while never admitting its own folly or interrogating the social conditions that cause certain works to become canonical while others are summarily discarded.

Practicality

Doubtless there are objections that I'm conflating disparate fields. Mathematics is axiomatic logic, English is aesthetics, & the specific vein of librarianship I've mentioned is quite practical. These are library projects that failed & perhaps we cannot say a work of art or a theorem fails in any corresponding sense.

But don't give up on me so easily. Librarians are onto something here. We know that art fails. We don't purchase every book, we write harsh Goodreads reviews about books that didn't please our Beholder's eye. My earlier examples were from library technology events, events that skirt around the practice of programming if not engage it directly. & what is a failed program, or a bug in an algorithm, if not a flawed theorem? There are connections & they might even be meaningful. Otherwise I'm just way off base, a deranged squirrel collecting copper washers for the winter. I never was good at aesthetics as theory. I probably should avoid writing about it. But I'm pretty good at failing & perhaps I'll write more about that.

Footnotes

[1]^ The thing about Shakespeare: he's not a very good writer. He has flaws & they're the sort creative writing teachers spell out in red sharpie at the end of student plays: "Heavy handed." "Deus ex machina." "Did you really back yourself into such a corner that the only way out is to kill every single character for which the audience has a shred of empathy left? Please, go back to the drawing board." I'm still bitter about Shakespeare, a sole author, being a required course for my undergrad degree, which utterly ignored the entire 20th century.

Friday, February 22, 2013

Reflections on Writing JavaScript

I've been working with JavaScript for a little while now & I want to briefly share changes I've made in my coding style. These changes, while seemingly pedantic, can be very meaningful in constructing a maintainable script.

Use Anonymous Functions Sparingly

When I first started writing semi-serious JavaScript using jQuery, I was passing anonymous functions as parameters frequently. It's a pattern that's condoned by Codecademy & all the brief jQuery API examples, but it gets messy & unsustainable quickly. Throwing anonymous functions around all the time misses the entire point of functions, i.e. that they're named, reusable chunks of code. What's clearer here:

// anonymous
$.getJSON( "http://some.api.url/gimmejson", { q: "search+term" }, function ( response ) {
        var len = response.len;
        if ( len > 0 ) {
            console.log( "Well, at least it's not empty..." );
        } else {
            return "ERROR ERROR DEATH FATAL ERROR";
        }
        var dataset = [];
        for ( var i = 0; i < len; i++ ) {
            dataset.push( response.items[ i ].text );
        }
        return dataset;
    }
);

// named
$.getJSON( "http://some.api.url/gimme.json", { q: "search+term" }, processResponse );

Having ten lines of anonymous function pasted into a function call as a parameter is probably the least readable code pattern commonly in use. In particular, if other parameters also span multiple lines (e.g. if I pass a much larger object in the second parameter above) it is a chore to differentiate between commas that separate items within objects & arrays & the commas that separate the parameters you're passing. Debugging is also easier with named functions; you can look back through a call stack that makes sense, rather than discovering that the last function called before an error was but one of the twelve anonymous ones sprinkled throughout your code.
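For reference, the named callback might be defined along these lines. This is only a minimal sketch: the response shape (a `len` count plus an `items` array of objects with a `text` property) is an assumption carried over from the anonymous example, not a real API.

```javascript
// Hypothetical named callback; the response shape mirrors the example above.
function processResponse ( response ) {
    var len = response.len,
        dataset = [];

    if ( len === 0 ) {
        console.log( "Empty response, nothing to process." );
        return dataset;
    }

    for ( var i = 0; i < len; i++ ) {
        dataset.push( response.items[ i ].text );
    }

    return dataset;
}

// Because it has a name, it can be exercised without making a request.
var sample = { len: 2, items: [ { text: "one" }, { text: "two" } ] };
console.log( processResponse( sample ) );
```

Being able to call the function directly with sample data is another small win over burying it in a function call.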

The one disadvantage is that it's not immediately evident that processResponse is a function; it looks like it could be any type of variable. That's why the best, most readable way to use most functions is by passing parameters in an object, which jQuery makes extensive use of:

// passed in an object
$.ajax( {
    url: "http://some.api.url/gimme?json=yesyesyes",
    dataType: "json",
    data: { q: "search+term" },
    success: processResponse,
    error: displayError
} );

This makes the role of processResponse much clearer; it's a callback function called upon a successful request. If the $.getJSON function let me pass in both a success & an error callback, I'd have to look up the function's syntax every time just to figure out which anonymous function was assigned to each. With the object parameter, their roles are doubly evident both from the name of their key as well as the name I've given the function.
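The same pattern works for functions you write yourself. Below is a rough sketch, not jQuery's implementation: `fakeRequest` is a made-up stand-in that touches no network, & it returns the callback's result only to make the example easy to check.

```javascript
// Hypothetical stand-in for an ajax-style function; no network involved.
function fakeRequest ( options ) {
    // Pretend that any URL containing "bad" fails.
    if ( options.url.indexOf( "bad" ) !== -1 ) {
        return options.error( "Request failed: " + options.url );
    }
    return options.success( { len: 0, items: [] } );
}

function processResponse ( response ) {
    return "got " + response.len + " items";
}

function displayError ( message ) {
    return "ERROR: " + message;
}

// The keys document each callback's role at the call site.
var result = fakeRequest( {
    url: "http://some.api.url/gimme.json",
    success: processResponse,
    error: displayError
} );

console.log( result );
```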

&& and ||

&& and || are frequently used in assignment expressions, even though intuitively they seem to belong only inside conditions. It's not something I do a lot but it's incredibly frequent in code libraries so understanding its usage is important. Basically, && and || do not merely produce true or false; they are operators which return one of their two operands. && returns the first value if it is falsey & the second if the first is truthy; || is the opposite in that it returns the second value if the first is falsey & the first if it is truthy. You can see how this works in typical conditions, where && is used to mean "and" & || is used to mean "or". Example:

if ( false && true ) // -> false because 1st is falsey, code won't execute
if ( false || true ) // -> true because 2nd is truthy, code will execute

We know intuitively that these make sense, because "and" usage means that both the first and the second conditions must be true while "or" usage is happy if either the first or the second is true. But what do you think this code, adapted from the Google Analytics snippet, does?

var _gaq = _gaq || {};

Does it make sense to have a || outside of a conditional statement such as if? Here, || returns _gaq if _gaq is truthy (i.e. if it already exists as an object) but it will return an empty object literal if _gaq is falsey. Then, later on in my code, if I add a method or property to _gaq I've guaranteed that it exists so I won't receive an error. So a more verbose but less tricksy rewriting is:

var _gaq; // redeclaring is harmless; an existing value is kept
if ( _gaq ) {
    _gaq = _gaq;
} else {
    _gaq = {};
}

Writing one line as opposed to six makes sense; an if-else condition is overkill here, when we just want to check if our object already exists & initialize it as empty if not.
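To see the operands being returned directly, here is a small sketch with made-up values. It also shows one caveat of the pattern: || tests truthiness rather than existence.

```javascript
// || returns the first operand if it is truthy, otherwise the second.
var a = "existing" || "default"; // "existing"
var b = undefined || "default";  // "default"

// && returns the first operand if it is falsey, otherwise the second.
var c = null && "never reached"; // null
var d = "first" && "second";     // "second"

// Caveat: valid falsey values such as 0 or "" get replaced by the
// default too, even when they were set on purpose.
var count = 0 || 10;             // 10

console.log( a, b, c, d, count );
```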

Spaces

Spaces are good. I like an abundance of spaces in my code. I pad array brackets, object curly braces, & parentheses wrapped around control flow expressions or function parameters with spaces. So I write

var obj = {
    nums: [ "one", 2, "three" ],
    funk: function ( param ) {
        if ( param.toLowerCase() === 'parliament' ) {
            return 'Give up the funk.';
        }
    }
};

instead of

var obj = {nums:["one", 2, "three"],
    funk:function(param){
        if (param.toLowerCase() === 'parliament') return 'Give up the funk.';
}};

One telling space is the parentheses that wrap a function's parameters. I try to always put a space in between the term function & the parameters in a function definition, while there's no space when the function is being executed.

var funk = function ( args ) { ... }; // function assigned to variable
function funkyFunk ( args ) { ... } // function declaration
funk(); // function being executed

Functions are thrown around so frequently in JavaScript that this subtle difference, if consistently enforced, can go a long way towards helping you read whether a piece of code is being executed or defined for later use.

Switch

I generally avoid the switch statement; its syntax is weird. I find it uncharacteristic that the code blocks following "case foo" aren't wrapped in curly braces. If I had to guess how a switch statement would be done, the cases would look more like:

switch ( foo ) {
    case ( bar ) {
        doSomething();
        break;
    }
    case ( bah ) {
        doSomethingElse();
        break;
    }
}

which parallels the control flow operators. switch doesn't save much space over a series of if comparisons & carries the potential hazard of unintentional fallthrough.
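The fallthrough hazard is easy to reproduce with the real syntax. In this sketch (the function & its case values are made up for illustration) the break after the first case is left out on purpose:

```javascript
// Hypothetical example: the missing break after "bar" is deliberate.
function describe ( foo ) {
    var log = [];
    switch ( foo ) {
        case "bar":
            log.push( "did something" );
            // no break here, so execution falls through to the next case
        case "bah":
            log.push( "did something else" );
            break;
        default:
            log.push( "did nothing" );
    }
    return log;
}

console.log( describe( "bar" ) ); // both messages, thanks to fallthrough
console.log( describe( "bah" ) ); // only "did something else"
```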

++ and ?

I follow a lot of Douglas Crockford's advice, but not his avoidance of ++. I use ++ in for or while loops & it hasn't come back to bite me. Sometimes I'll use it to increment a value outside of a loop. I think I understand its usage in these limited contexts & while it isn't a huge gain in terms of saving space, it's nice to put all my loop details in one expression. I also don't think the ternary operator is worth avoiding; it's very handy during variable initialization even if it's a little opaque, much like || and &&. The ternary operator looks like:

var someVariable = ( expression ) ? "value if expression evaluates to true" : "value if expression evaluates to false";

We could rewrite the Google Analytics code:

var _gaq = ( _gaq ) ? _gaq : {};

It does the exact same thing; check to see if _gaq exists, initialize it to an empty object literal if not.
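Two quick illustrations of the points above, using made-up values. The first shows the part of ++ that tends to trip people up outside of loops; the second shows a ternary doing something || can't, i.e. testing one thing & assigning another.

```javascript
// Postfix ++ returns the old value, prefix ++ returns the new one.
var i = 5;
var post = i++; // post is 5, i is now 6
var pre = ++i;  // pre is 7, i is now 7

// The ternary can test something other than the value it assigns.
var items = [ "a", "b" ];
var label = ( items.length === 1 ) ? "item" : "items";

console.log( post, pre, i, label );
```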

You Don't Hate JavaScript, You Hate the DOM

I, as many JavaScript programmers before me, have discovered that JavaScript is really not so bad a language. It has its peculiar errors—the extreme unreliability of typeof & the leading zero issue with parseInt come to mind—but it also has gorgeous features. In particular, the first-class nature of functions is wonderful & I can't live without it. Passing functions as parameters to other functions is mind-blowing once you realize how much you can achieve with it.
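A short sketch of those quirks & of functions as parameters. The helper names (`applyTwice`, `addThree`) are invented for the example.

```javascript
// typeof gives surprising answers for several common values.
console.log( typeof null ); // "object"
console.log( typeof [] );   // "object"
console.log( typeof NaN );  // "number"

// Passing the radix explicitly sidesteps the leading zero issue:
// older engines treated "08" as octal when no radix was given.
var month = parseInt( "08", 10 ); // 8

// First-class functions: one function handed to another as a parameter.
function applyTwice ( fn, value ) {
    return fn( fn( value ) );
}

function addThree ( n ) {
    return n + 3;
}

var result = applyTwice( addThree, 1 ); // 7

console.log( month, result );
```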

But JavaScript's biggest issue isn't the language itself, it's the way it interacts with HTML pages via the DOM. DOM manipulation is tough, the commands are verbose, & cross-browser incompatibilities abound. There's a reason why people love jQuery; it removes the pain of accessing & altering the DOM, scaffolding on top of CSS selectors that most web developers already know. The one biggest piece of advice I give to people who want to learn JavaScript is to start with jQuery. With a nice layer of abstraction, you can actually do something on a website which is amazingly gratifying. The building blocks of the language are easier to acquire when you see their utility on the web, as opposed to repeatedly printing text to the console.

Conclusion: Steps to Learning a Language

There are a few steps you go through when learning a programming language. The very first step is simply understanding what syntax is valid. Writing echo "Hello world!" will result in an error in JavaScript. The next step is understanding the advantages of specific syntax choices; knowing whether a particular situation calls for a particular control flow operator, for instance.

The next step after that is meaningless in terms of how the code executes but of paramount importance to programmers, who tend to be human; knowing how to write clear code. Once I had the basics out of the way, I found myself having lots of opinions on what makes a piece of JavaScript understandable. Now, every time I go back & look at something I wrote previously, I find myself employing all sorts of conventions (spaces! fewer anonymous functions!) that I've discovered or come to appreciate. While much of JavaScript: The Good Parts went over my head initially, I now understand its essence; that deliberate choices when writing JavaScript can not only avoid common programming pitfalls but increase clarity.

Friday, February 15, 2013

Optimizing IIS for Performance & Security

My college uses Microsoft's IIS 7 for its servers instead of the more common Apache. That's fine; IIS is probably a good server. I don't know, I'm not qualified to say which is better. But one thing's for sure: Apache is easier to use & learn simply because of the availability of documentation. If you're a full stack web person starting a new project, please use something with community support & documentation. Apache plays nice with Drupal, there's tons of security & performance tweaks documented online, & it has some great add-ons for any situation.

But hey, I'm stuck with IIS. This post is mostly a note-to-self on how to optimize IIS. I'm not at all a server configuration expert, so please don't take it as gospel. Most especially, if I'm flat-out wrong about something, I'd like to hear about it.

For the tl;dr & the resulting file, see my web.config GitHub repo.

Caching

The hardest part is caching correctly. The goal is to use far-future expires headers, similar to Cache-Control: max-age=9000000. There are many different means of caching in HTTP but far-future expires is both simple (the server just says "hey, you can hang onto this content for X seconds") & effective. Some other caching methods end up sending "conditional get" requests, essentially saying "hey, server, I have version 3.2 of this file, is that current?" & the server sends a response back saying either "yup, carry on" or "nope, here's the current version." That is slightly less error-prone, because you can update a file on the server & it'll still make its way to clients that have cached the content, but that extra HTTP request adds up quickly. To update files with max-age or other far-future type caching schemes, I use filename-based versioning, essentially bumping a version number like "style.1.css" to "style.2.css" every time I change a file. Because remembering to change filenames is tedious, I either have a CMS (Drupal's built-in caching) or a build script (Yeoman) handle it for me.
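The filename bump itself is simple enough to sketch. This is not what Drupal or Yeoman actually do under the hood, just a minimal illustration of the idea, & both function names are made up.

```javascript
// Hypothetical replacer: receives the pieces matched by the pattern below.
function incrementMatch ( match, version, extension ) {
    return "." + ( parseInt( version, 10 ) + 1 ) + "." + extension;
}

// Bumps the number sitting between the name and the extension,
// e.g. "style.1.css" becomes "style.2.css".
function bumpVersion ( filename ) {
    return filename.replace( /\.(\d+)\.([^.]+)$/, incrementMatch );
}

console.log( bumpVersion( "style.1.css" ) ); // "style.2.css"
console.log( bumpVersion( "app.9.js" ) );    // "app.10.js"
console.log( bumpVersion( "index.html" ) );  // unchanged, no version number
```

Every reference to the file has to change along with its name, which is why it's worth leaving to a build step.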

In IIS 7, unfortunately, it looks like you can either set static content caching on or off with little in between (Apache lets you specify expires time by MIME type). If there's a particular static MIME type that you don't want to get cached, too bad. That's problematic for at least two types: text/html & text/cache-manifest. These are both static, text types but the files need to be able to change without changing their name. If you altered your HTML file's name every time it changed, you'd constantly break incoming links. The appcache can't change because it causes this weird loop wherein clients that have previously visited the site & primed their cache can never get an updated version because they always look in the wrong place; Jake Archibald covers this brilliantly in Appcache Douchebag.

So to get around this conundrum, I use two layers of web.config files: in the site's root where HTML, server-side scripts, & the cache manifest reside I use a config with no caching whatsoever, that's <clientCache cacheControlMode="DisableCache" />. Then, in any subdirectory where static content (images, CSS, JavaScript, fonts, etc.) might reside, I override that setting with an aggressive, far-future expires header.

Finally, I remove ETags with a two-part rule. The HTML5 Boilerplate server configs botch this horribly, ruining the X-UA-Compatible header in the process, but some searching around StackOverflow found me the right combination of rules to remove ETags per performance best practice (see Steve Souders' book).

GZIP

I just copied this bit from the HTML5 Boilerplate Server Configs & made sure it worked with YSlow & other external tests. It's super important to GZIP content, arguably the biggest performance win you can get, & yet that's not the default in IIS 7.

Security

I'm not an expert at hardening servers but it makes sense to eliminate headers that unnecessarily expose server information without any added benefit. I blank the X-AspNet-Version, X-Powered-By, & Server headers. Another IIS quirk is that you can't simply remove the Server header, all you can do is set its value to be an empty string which is at least enough to protect against the version number being exposed.

Rendering Engines

Since the X-UA-Compatible meta tag doesn't really work, I send it as an HTTP header. This forces IE to use Chrome Frame if it's available or the latest rendering engine (e.g. no IE 8 using the IE 7 engine) if not.

Saturday, February 9, 2013

Eric Explains URLs (video)

I'm teaching a course entitled "The Nature of Knowledge" and we're specifically focusing on what happens to knowledge in a digitized, networked environment. I gave the class a "technology inventory" survey to complete and the hardest question on it proved to be identifying the top-level domain of a given URL. As such, I made this video to explain URLs a little bit more in-depth.



Weaknesses

I didn't do a particularly good job of explaining a few things in this video. I want to make it clear that it's not a flawless intro. Hopefully I can remake it sometime, but for now here are some caveats:

  • What does a scheme mean? I introduce two of them but don't describe their implications, i.e. that they're transfer protocols.
  • Subdomains are basically everything in the domain that's not the TLD. I don't think that's clear from my example.
  • Search can literally be a file, e.g. search.php, search.html, search.pdf (though that wouldn't have a query string). I know that the idea of URLs pointing to files is mostly an antiquated idea in the days of database-driven CMSs & web frameworks like Ruby on Rails. But it's a good starting point to learn more about them.
  • Google is a bad example. I knew that but I didn't realize quite how poor, because Google doesn't use a ? to distinguish the query string, oddly enough, so a Google search actually contradicts how I'm describing a query string.
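For anyone who wants to poke at the pieces themselves, here is a small sketch using the standard URL class (available in current browsers & in Node.js). The address is a made-up example & the TLD extraction is deliberately naive: it just takes whatever follows the last dot.

```javascript
// The address below is an invented example, not a real search page.
var url = new URL( "http://www.example.com/search.php?q=cats" );

var scheme = url.protocol.replace( ":", "" );  // "http"
var hostname = url.hostname;                   // "www.example.com"
var tld = hostname.split( "." ).pop();         // "com"
var path = url.pathname;                       // "/search.php"
var queryString = url.search;                  // "?q=cats"
var searchTerm = url.searchParams.get( "q" );  // "cats"

console.log( scheme, hostname, tld, path, queryString, searchTerm );
```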

Anything I missed? Open to criticism but I hope this is a decent overview despite its flaws.

Also, I have a git repo of the site I made to demonstrate the different pieces, totally willing to share if someone wants it.

Tuesday, January 29, 2013

A Block to the Head (in Drupal 7)

This is going to be a super specific & technical post, because it took me way too long to figure out how to do this & someone else could probably stand to benefit from it. For the tl;dr skip to the code section.

The Scenario

In general, the <head> of your HTML document is not editable in Drupal. The <title> changes to reflect what page you're on, sure, & tags are added or removed by modules frequently, but you can't tweak the markup in the same way you can the contents of a block or main content. For the most part, that's fine; 90% of the <head> should stay the same from page to page, such as site styles & must-have <meta> tags like viewport. If you need to change something site-wide, that can be done through theme templates. But sometimes that's not enough; what if I want to add different <meta> tags to different pages? Or load variegated builds of Modernizr on variegated pages? Or use a special font on a set of pages? There are numerous reasons why you might want a specific subset of pages to have a bit of custom markup; in short, you want to put one of Drupal's blocks in the document's <head>. Without further ado, here's how you do that.

The Code

In yourtheme.info, find the list of regions & add one for your new <head> region:
regions[html_head] = HTML Head
The name that you'll use in the code has to be a valid PHP variable name, so no spaces.

In your theme's template.php, add the following to the yourtheme_preprocess_html function:
$variables[ 'html_head' ] = block_get_blocks_by_region( 'html_head' );
If you're using a subtheme without a template.php, you can create it & write the function like this:
<?php
function mastertheme_subtheme_preprocess_html(&$variables) {
  /* so that html_head is available in html.tpl.php */
  $variables[ 'html_head' ] = block_get_blocks_by_region( 'html_head' );
}

In html.tpl.php, your theme's HTML template, find the spot where you want the block's markup to be inserted & add:
<?php if ( $html_head ): ?>
  <?php print render( $html_head ); ?>
<?php endif; ?>
If you're using a subtheme where html.tpl.php doesn't exist, copy the master theme's html.tpl.php into the templates directory & then add the above.

Right now, that would be enough to insert some code into the <head> but the code will likely be wrapped in problematic tags that don't belong in the <head> like <div> & <section>. To get rid of that junk, we create a new region-specific block template that only prints out the contents of the block & nothing else. Create a block--html-head.tpl.php file & put it in the templates directory with the following line as its only contents:
<?php print $content; ?>
Now clear the cache & add a block to the new region; it should appear in your site's <head>.

If you've named your region something other than "html_head" then you'll need to change the references throughout, but this should work for any Drupal 7 site. Note that if you rename a region (or perhaps otherwise screw with its templates? not entirely clear to me), all the blocks you had previously assigned to it become unassigned. That was what wasted the majority of my time; I couldn't understand why my block wasn't showing up when the $html_head variable should have been available but the block had been unassigned during my shenanigans.

The References

The Drupal Answers thread Printing regions in html.tpl.php provided most of the special sauce for this one, specifically the idea to store the return value of block_get_blocks_by_region in a variable that's accessible later when html.tpl.php runs.
The template_preprocess_html, block_get_blocks_by_region, & html.tpl.php API documents all provide useful reference material.
Lastly, the drupal_add_html_head function appears to provide another avenue to the same destination. However, it's much more convenient to store markup in a block. I also want to write straight HTML & not Drupal's weird "renderable array" content, which is what the function takes as a parameter.