Tuesday, July 23, 2013

Adding LibGuides to Drupal's Search Results

This will be another super specific post about how to do something useful for libraries in Drupal. The tl;dr is that you can use LibGuides XML Export, the Feeds module, and the Feeds XPath Parser module to make LibGuides show up in your Drupal site search results. So when users search for "english composition" and you don't have any study guides on your Drupal site, something relevant from LibGuides might show up.

I was inspired to do this by the Drupal in Libraries book, though I haven't read it (I saw it mentioned in American Libraries). I didn't see specific details in the book's preview, and Michigan is putting the XML into their Solr search index, which is too sophisticated for my small college, so I thought a brief write-up might benefit other libraries who have LibGuides but don't use Solr. Libraries using other CMSs might still benefit from the general outline, though the specific details won't be useful. I'd be shocked if WordPress libraries couldn't do the same, using WP All Import or other plugins.

These directions are specific to Drupal 7; I bet the same can be achieved in 6 but I can't vouch for any of the settings or code being the same.

Set-up: LibGuides & Modules

In order to do this, you have to do a couple steps first to prepare both LibGuides and Drupal.

  • Purchase the Images and Backups Module from Springshare. In my experience, the pricing is very reasonable, and the "images" part of it means you can upload images to LibGuides which makes adding them to guides much, much easier for authors.
  • Install the Feeds module, a popular and well-maintained module for mass importing nodes from structured data (RSS/Atom feeds, CSV files, OPML files) into Drupal
  • Install the Feeds XPath Parser module, which adds an extra parser to your Feeds installation, allowing you to import nodes from arbitrary XML documents

Once you've done these three steps, download the XML export from LibGuides (Springshare will email you when it's ready) and enable both modules in Drupal.

Process the XML

I don't work with XML much (shame, librarian, shame!) but this is a step where you could edit the LibGuides export to make it more useful as an imported node. In my pre-processing, I only wanted to accomplish one thing: when I import the nodes, I don't want any unpublished or private guides to be published in Drupal. We have a few under-construction or private guides that shouldn't show up in search results.

To do so, there's just one Drupal quirk you have to know: later on, in configuring the way your data maps to Drupal nodes, you'll be able to map the contents of an XML element to a Drupal node's "publication status" field. 1 means published and 0 means unpublished.

Luckily, the LibGuides XML has a <STATUS> element underneath each <GUIDE> which you can easily map to either 0 or 1. To process the XML, I performed a simple pair of search-and-replace operations in Sublime Text:

  • Search for "<STATUS>Published</STATUS>" and replace with "<PUBLISH>1</PUBLISH>"
  • Search for "<STATUS>.*</STATUS>" and replace with "<PUBLISH>0</PUBLISH>"

That second search and replace uses a teeny bit of regex: the period stands for "any character except a line-break" and the asterisk means "zero or more of the preceding character". So I'm searching for any string of text (even an empty one) inside of a <STATUS> element and turning it into <PUBLISH>0</PUBLISH>, which works because all of my published guides no longer have a <STATUS> element after the first search-and-replace.
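If you'd rather script the two replacements than do them by hand, here is a minimal Python sketch of the same idea. The sample XML and the status values other than "Published" are made up for illustration, and the function name status_to_publish is hypothetical. I've also swapped in a non-greedy ".*?" so that two <STATUS> elements on the same line would be matched separately, which differs slightly from the pattern above.

```python
import re

# Hypothetical fragment of a LibGuides export; a real export has more elements,
# and the non-"Published" status values here are assumptions.
sample = """<GUIDE><NAME>English Composition</NAME><STATUS>Published</STATUS></GUIDE>
<GUIDE><NAME>Draft Guide</NAME><STATUS>Unpublished</STATUS></GUIDE>
<GUIDE><NAME>Staff Only</NAME><STATUS>Private</STATUS></GUIDE>"""


def status_to_publish(xml_text):
    """Turn <STATUS> elements into <PUBLISH> flags of 1 or 0."""
    # Step 1: published guides get a 1.
    xml_text = xml_text.replace(
        "<STATUS>Published</STATUS>", "<PUBLISH>1</PUBLISH>"
    )
    # Step 2: every remaining status gets a 0. The non-greedy ".*?" keeps
    # each match inside a single <STATUS> element.
    return re.sub(r"<STATUS>.*?</STATUS>", "<PUBLISH>0</PUBLISH>", xml_text)


processed = status_to_publish(sample)
print(processed)
```

The order matters just as it does in the manual version: the catch-all replacement only works because the published guides have already lost their <STATUS> element.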

Configure the Feeds Importer

Back inside Drupal, we need to create a new content type and set up the Feeds module to receive our XML file.

  • Under the "Structure" menu of the admin toolbar, select Content Types
  • Add content type and then give it a name and description, e.g. "Imported LibGuides"
  • Add fields to your new content type, which at the very least should contain two new fields: an "ugly URL" field for LibGuides that don't have a friendly URL, and a "friendly URL" field. You can make these Text field types with the standard settings.
  • Under the "Structure" menu of the admin toolbar, select Feeds importers (or visit {{drupal root}}/admin/structure/feeds)
  • Add importer and then give it a name and description, e.g. "LibGuides Importer"

There are a lot of settings here, which can seem intimidating but is actually great. The Feeds module gives you control over how data is imported into Drupal, and everything is straightforward if you take the time to read through it. I'll walk through my basic settings, but just know that you could do whatever seems reasonable here and be OK; the only piece of this post you might need to reference is the XPath queries later on.

  • Basic Settings
    • Attach to content type: select your LibGuides content type here
    • Periodic import: off, periodic import is only for grabbing nodes from web feeds, e.g. RSS
    • Import on submission: check
  • Fetcher: File upload
    • Allowed file extensions: you can leave as is, but I put XML since I'll only be uploading XML files
    • Upload directory: leave as is
  • Parser: XPath XML parser (this option only appears if you installed Feeds XPath Parser)
    • Settings: see the section below on the XPath queries, but trust me this won't be that painful
  • Processor: Node processor
    • Bundle: select your LibGuides content type again
    • Update existing nodes: this is a bit of a judgment call, but you'll be fine with either "Replace existing nodes" or "Update existing nodes."
    • Skip hash check: I leave this unchecked but you'd be fine either way
    • Text format: your call, I leave as "Plain text" which is fine for search results
    • Author: anonymous, or your user if you want to brag about how many nodes you made
    • Authorize: probably should leave checked
    • Expire nodes: Never
    • Mapping for Node processor: make the Title, Body, Published status, Friendly URL, and Ugly URL fields all map to an "XPath Expression" source. The two URLs fields are ones we created with our Imported LibGuides content type, so if you chose a different name for them back then they will appear differently in the Target drop-down options here.

Whew, we're done! I know that looks like a lot, but Feeds has a pretty nice UI for such a sophisticated and powerful module.

Parsing XML with XPath

Now for the fun part: we need to map XML elements in LibGuides to Drupal fields using XPath expressions. We also get to say things like that which only .01% of humans understand.

XPath is a query language for XML; if you know SQL or CSS, it's kind of similar. It gives you a way of traversing the structure of an XML document to retrieve the contents of various elements. The LibGuides XML is structured in a pretty logical, simple manner, so writing our queries won't be tough. Back in the Feeds importer settings that we were just editing, select the Settings link under the Parser section. This gives us a menu where we can write our XPath queries. Here's the setup that I use with some English translations:

Context: //GUIDE

We want our queries to run in the context of each <GUIDE> element. We could do without this, but it means we'd be prepending /LIBGUIDES/GUIDES/GUIDE/ to each query below, which is silly.

title: NAME

body: DESCRIPTION

Set the name of the LibGuide to the node's title and the body of the node to its description. The description is the brief sentence which shows up underneath the name of a LibGuide.

field_friendly_url: FRIENDLY_URL

field_ugly_url: URL

Each <GUIDE> element has two URLs, so we map both of those to the two custom fields we set up on our Imported LibGuides content type. Once again, if you named your fields something different, their machine-readable names (which is what you see in this menu, they're just lowercase with underscores instead of spaces) will be different.

status: PUBLISH

Remember when we edited the LibGuides XML to set up a <PUBLISH> element that's either 0 or 1? That's where this mapping comes into play, taking that Boolean value and using it as Drupal's publication status field.
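If you want to sanity-check these queries outside of Drupal, here's a rough Python sketch that runs the same context and field queries with the standard library's ElementTree. The sample document is invented: it follows the /LIBGUIDES/GUIDES/GUIDE structure and element names mentioned above, but the guide names, URLs, and the empty <FRIENDLY_URL> element are assumptions, so check them against your own export.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the structure described above;
# all names and URLs are made up.
SAMPLE = """<LIBGUIDES>
  <GUIDES>
    <GUIDE>
      <NAME>English Composition</NAME>
      <DESCRIPTION>Resources for first-year writing.</DESCRIPTION>
      <URL>http://libguides.example.edu/content.php?pid=123</URL>
      <FRIENDLY_URL>http://libguides.example.edu/english</FRIENDLY_URL>
      <PUBLISH>1</PUBLISH>
    </GUIDE>
    <GUIDE>
      <NAME>Draft Guide</NAME>
      <DESCRIPTION>Not ready yet.</DESCRIPTION>
      <URL>http://libguides.example.edu/content.php?pid=456</URL>
      <FRIENDLY_URL></FRIENDLY_URL>
      <PUBLISH>0</PUBLISH>
    </GUIDE>
  </GUIDES>
</LIBGUIDES>"""

# Drupal field name -> query relative to each <GUIDE>, as in the importer.
QUERIES = {
    "title": "NAME",
    "body": "DESCRIPTION",
    "field_friendly_url": "FRIENDLY_URL",
    "field_ugly_url": "URL",
    "status": "PUBLISH",
}

root = ET.fromstring(SAMPLE)
# ".//GUIDE" is ElementTree's spelling of the "//GUIDE" context.
guides = [
    {field: (guide.findtext(query) or "") for field, query in QUERIES.items()}
    for guide in root.findall(".//GUIDE")
]

for guide in guides:
    print(guide)
```

ElementTree only supports a subset of XPath, but simple child-element queries like these fall within it. This is roughly what the importer's debug options show you: one set of values per <GUIDE>.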

You can leave all the "Select the queries you would like to return raw XML or HTML" options unchecked. Note that this could provide some interesting options if you were doing more sophisticated things with LibGuides, since the XML export contains all the raw HTML of the various boxes in each guide. Debug Options can also be left unchecked, although if you're testing this process I recommend checking them. The debug options show you what Drupal found with each XPath query, which can help you configure the importer properly.

I leave "Allow source configuration override" unchecked as well. Since we just set up our XPath queries the way we wanted, there's no need to override them later. However, you could do something interesting where you set up a generic LibGuides importer in these settings, then have multiple different ways of mapping the XML into nodes.

Redirecting Imported Nodes to LibGuides

Before we actually import our LibGuides, we want to make sure they're handled appropriately. That is, we don't want people clicking on their search results simply to see some lame text and URLs on the screen, we want them to be redirected straight to the LibGuide.

There are probably other ways to do this, for instance the Field Redirection module, but I use node templates, which are PHP templates that apply to only specific node types. Under the Templates folder of your theme (which will be somewhere in sites/all/themes likely) create a file named "node--imported-libguides.tpl.php" where "imported-libguides" is whatever you named your LibGuides content type but with hyphens replacing spaces. Inside that template, paste the following PHP:

<?php
// redirect user to LibGuide rather than node if user is not signed in
// uid 0 means anonymous user
if ( $user->uid == 0 ) {
  // prefer friendly URL if available
  if ( $node->field_friendly_url ) {
    drupal_goto( $node->field_friendly_url[ 'und' ][ 0 ][ 'value' ] );
  } else if ( $node->field_ugly_url ) {
    // ugly_url should always exist but just in case, use a conditional
    drupal_goto( $node->field_ugly_url[ 'und' ][ 0 ][ 'value' ] );
  }
} else {
  print render($content);
}
?>

I've written comments in the code, but essentially here's the path this code steps through:

  • Is the user anonymous? If yes, redirect them. If not, we assume the user is some kind of editor, so we print out the lame text fields. This makes it easier for librarians to edit nodes after they've been imported, but assumes that your users don't have Drupal accounts. If they do, you'll need to consider the first if condition thoroughly to make sure only the right types of users are seeing the plain text.
  • Does the node have a friendly URL? If so, redirect anonymous users to it.
  • If not, the node must have an ugly URL, redirect anonymous users to that.

I noted it above, but because it's so important: if you allow users to create Drupal accounts, this template won't work well. It won't expose confidential data or anything, but it's definitely meant for Drupal sites where all non-editor traffic is anonymous.

Your theme may also have a particular way of printing out nodes that you want to stick to; in that case, you'd be better off copying node.tpl.php or another node type template rather than using my code verbatim. You could put the logic piece of this code at the top of your node template, dropping the else clause at the end. That would work fine as long as it's named appropriately, e.g. "node--imported-libguides.tpl.php".

We're Almost There

Now that our template is set and our importer configured, we need to create an importer node, give it a file, and let it run wild. Go to {{drupal root}}/import to see a list of available importers, including the default ones that come with the Feeds module and your LibGuides Importer. Select LibGuides Importer and you're greeted with the usual node editing form, except this time there's a place to upload a file towards the top. Use that to browse to the processed LibGuides XML, then upload it. You can leave the body and other fields blank.

Once you've created this node, it will have an Import tab with an identically named button. Simply click that and your nodes should be created in Drupal, with whatever debug messages you chose in the importer displaying as well.

Totally screwed up the XPath queries, causing a bunch of broken and useless nodes to be imported? No worries, the importer node that you just created has a Delete items tab which can delete any of the nodes which it imported. This makes trying out a Feeds importer rather risk free; just keep trying until you get it right.

Final Steps

Drupal's internal search index will still need to index the new nodes before they show up in its results. You can run cron a few times depending on how many nodes you just added and they should show up. Try a search for the title of a LibGuide that wouldn't return any of your other pages, and make sure clicking on a LibGuide result from an anonymous session causes you to be redirected to the guide.

As LibGuides are added and removed, you'll have to sync them to their Drupal nodes again. However, once you've done the process once, it only takes a few minutes to grab a new XML export, upload it, and click the import button.

Monday, July 1, 2013

Foreign For-In, or Python as a First Language

...being a brief recap of my experience at the Python Preconference at ALA Annual. In general, the session was a smashing success and I was elated to see a diverse group of people picking up Python so quickly. Without going into details elsewhere, which I think other attendees or organizers will cover, here's one struggle and one pleasant surprise from the preconference.

Explain a For-In Loop

Describing how a for-in loop works was difficult and I repeatedly ran into attendees who just couldn't quite grok it. A Python for-in loop looks like:

for word in wordlist:
    print word

That would loop through the wordlist data structure, which we'll say is a list (similar to an array in other languages), printing each term to the screen. Simple, right? But it's actually pretty weird, because in the above example what exactly is word? It's a local variable that gets a new value each time through the loop. If for-in loops for lists didn't exist in Python, you might implement them like so:

i = 0
while i < len( wordlist ):
    # being super explicit here
    word = wordlist[ i ]
    print word
    i = i + 1

len( wordlist ) here returns the length of the wordlist list, for non-Python people. Otherwise, I assume the syntax is straightforward for anyone who knows a little code. The biggest disadvantage to this implementation is you end up with two variables in the scope—i and word—neither of which is useful after the loop has run.

I'm not sure my explicit for-in loop is more clear to a new programmer, but it's my conceptual model. Students struggled with understanding the for variable's name; where does word come from? In the lecture, Becky Yoose used this example:

for fruit in pies:
    print fruit

The reaction from attendees seemed to be "since pies is a list of different fruits, the variable name has to be 'fruit' here." As if Python was somehow doing natural language processing to figure out a good descriptive term of an individual item in a thematic list. It's a weird thing to grasp conceptually, perhaps the crux being you're getting a variable without any assignment statement. That's a nice convenience for programmers coming from other languages but it obscures what's going on for learners.
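One way to show that the name is arbitrary is to run the same loop twice with different variable names. This is only an illustrative sketch: the contents of pies are invented, and I've collected the items into lists instead of printing them so the two runs can be compared.

```python
pies = ["apple", "cherry", "peach"]

# The loop variable is just a name we pick; Python assigns each item
# of the list to it in turn.
with_descriptive_name = []
for fruit in pies:
    with_descriptive_name.append(fruit)

# Any other valid name behaves identically, even a misleading one.
with_arbitrary_name = []
for spaceship in pies:
    with_arbitrary_name.append(spaceship)

print(with_descriptive_name == with_arbitrary_name)
```

Python never looks at what the name means; "fruit" is a courtesy to human readers, nothing more.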

Nested Loops & First Languages

On the other hand, I found that a lot of our exercises and final projects involved nested loops, sometimes three to four layers deep. Everyone seemed to absorb this without conceptual difficulty. Maybe it's my own experience speaking here, but I get more and more anxious the deeper my indents go. A lot of this anxiety is based in JavaScript, where blocks wrapped in curly braces tend to take up more space and are harder to parse than in whitespace-happy Python. The ugliest code in the world is an immediately-invoked function expression which ends in a bunch of closed code blocks:

            }
        }
    }
}( 'this happens way too often in JavaScript' ) );

Python's conveniences, like range() and how the for-in loop works seamlessly across different data types (lists, dictionaries, even strings. Strings, people!) are a serious boon to beginners. I still think JavaScript makes a great first language for a few reasons: 1) everyone already has it installed via their web browser, so there's zero setup barrier, 2) the web is where data and applications live these days and JavaScript is the language of the web, and 3) a trivial amount of jQuery can make cool things happen. Other languages require more investment before the cool things go down.

But the setup process wasn't an issue for the preconference. We held a help session the night before and only two people came; one of them already had Python installed and on the Windows path, they just needed confirmation that they'd done it right. A number of factors contributed to the ease of setup: many attendees had Macs which typically come with a 2.6.x or 2.7.x version of Python, the Boston Python Workshop docs are great and cross-platform, and a fair portion of attendees were advanced computer users. So with an easy setup, Python (or Ruby) is a sensible choice for a first language.

Tuesday, June 18, 2013

Describe Your Ideal Work Environment

I served on two search committees recently and blogged about the experience. I was struck by how tough it was to frame good interview questions. A lot of the questions we asked ended up being duds, not receiving a single response which illuminated anything about our candidates. Yet once you've asked a question, you're rather obligated to ask it of each person, for fairness' sake.

On the other hand, I also recently interviewed for a position and I was asked an excellent question: "Describe your ideal work environment." Why is this so great? I think it helps both parties, the search committee and the interviewee. The interviewee's answers almost of necessity must be revealing. So much so that the committee might rule them out based upon this question alone, which really aids the interviewee: if your own ideals are so at odds with an institution's, it's better to be ruled out ahead of time than to find that out a few months after you've started.

But what I really want to talk about is how I answered this question. Maybe it wasn't what the committee wanted to hear—I didn't get the position—but it felt good to articulate.

Control Over My Work Environment


Specifically, my computer. I want to run the operating system and software of my choice. Unfortunately, this is all-too-rare at most libraries and educational institutions.
To be fair, I understood that there was no way I'd receive admin privileges at this position. But it's definitely a preference of mine. It's positively unproductive to limit the software available to information professionals. I do lots of development work, I have probably installed forty-plus packages on my Windows (not my first choice) machine at work. It's a waste of IT Support's time to come to my office to type in a password once a week; it's a waste of my time putting off a task because I can't install a requisite tool. I'm incredibly appreciative that MPOW allows me admin privileges.

Every institution should have a simple "admin quiz" one can take to receive appropriate privileges. I understand why we deny everyone by default; running an institution's computers is hard work and ensuring consistent security and software settings is a great aid. But those of us who are capable of administering our own computers, who know to run antivirus software (or just not run Windows...sorry, I'm belaboring the point) and avoid sketchy links in emails, should be given that prerogative.

While I've rambled quite a bit about computers, I also like to control my office environment. Now that I have my office set up the way I like, I'm rather attached to it. I like to have a standing desk, some room for pictures on the wall, some open space. I can do without, but I'd prefer not to.

Data-Driven Decision-making


I like to make decisions based upon data rather than my own feelings or opinions. That data doesn't have to be quantitative; I have a great appreciation for user experience research and I wish I had more time to devote to it. There's no substitute for seeing actual users perform actual tasks, whether it be searching for a peer-reviewed article or trying to find the print card vending machine.

This isn't a personal preference either, it's an institutional one. I love seeing data brought up in meetings, at presentations, in board meetings. It says something about an institution and its commitment to objectivity and success. Again, it's not all that common and that's understandable; collecting and analyzing data is difficult, time-consuming work. But recognizing the importance of those activities isn't.

Failure is Natural


As I've covered before, I have a great appreciation for failure. We cannot be successful in all our ventures and we often learn as much from the crash-and-burn projects as the epic wins. An institution that acknowledges that failure is a natural part of its own evolution is one I want to work for. I want to see presentations that not only say "gee, we really screwed up here" but also "and here's how we'll avoid the same mistakes next time." There's nothing more frustrating than seeing people cover up obvious mistakes because you just know that they'll be repeated in the future.

That's My List


or at least part of it, the main items certainly. What's yours? Is there anything in particular that libraries do well or struggle with?

Again, I think this question is more of a healthy exercise in articulating your own priorities rather than a wish list. I fully expect that I'll never work for an institution that gets flying colors in all three of these categories, but that doesn't mean that I shouldn't recognize my own predilections.

Wednesday, June 5, 2013

Teach While You're Learning Yourself

There's a (pretty reasonable) theory that the best way to learn is from the experts. They know what they're talking about, right? It makes sense, and those who have studied and worked in an area for years have valuable insights to share. They know the pitfalls, the broken assumptions, the brilliant hypotheses and they can communicate them.

But the experts have their disadvantages. The fundamentals are so ingrained in them, so second nature, that they speak a different language. A technical term on their lips has an intricate history, labyrinths of connotations. The neophyte, on the other hand, has but glimpsed the adumbrations. They've learned a term only to find out their understanding was slightly askew. Their confusion is laden with value, with the very undulation of learning. It should be harnessed while it's prime.

Enough Abstraction Already

I'm engaged in a community of librarians who are steadily leveling up their technical skills. A lot of this happens in the Library Codeyear Interest Group (come to the Python Preconference at ALA!), but also on the ACRL Tech Connect blog where our posts are less prophets handing down commandments than regular ol' librarians sharing their inchoate knowledge.

A specific example is the Codeyear IG's GitHub Project, which I started (though feedback from participants and Andromeda Yelton has been invaluable). I started the project despite being mediocre at Git and GitHub. I am not a software developer. Sure, I have a deceptive number of projects on my GitHub account, but I'm thoroughly amateur and still make embarrassing mistakes.[1] But that hasn't hindered the project's efficacy: we've had ten people complete the Getting Started tutorial and many more read the Tech Connect blog posts on it. If nothing else, it's upping the community's exposure to and understanding of awesome tools like GitHub.

Part of the success of the GitHub Project, I hope, is my ability to write for beginners. Having just started using version control myself, I'm hesitant to employ Git terminology which is familiar to people coming from other VC systems but not to people new to the whole class of software. For instance, rather than write something like "git commit does just what it says: it stores the current contents of the index in a new commit along with a log message from the user describing the changes," it's obvious that "the commit command finalizes our changes and adds them to the project's history" is a better explanation. But even with my valuable inexperience, I still assume familiarities that don't necessarily exist. An early participant noted that the keyboard shortcut to exit the git log command was never mentioned (it's the letter q, by the way). This is precisely the sort of key detail that is lost on experienced users. I'm no command line expert, but I know that q exits the less pager. It was a real hangup for me when I was learning, but now that I press it several times a day, it's cognitively absorbed. I forgot that it was something I had to learn, once upon a time.

Old News

Pedagogy has known for a while that experts do not make great teachers. We've all heard of the move away from the "sage on stage" to the "guide on the side," which is related to the critique of top-down knowledge transmission. Other current movements, like "flipping the classroom" where lectures occur outside of class while time in the classroom is used for group projects, also come to mind. But we often fail to carry these lessons over to professional development; when you schedule conference sessions, do you look for Delphic panels like Top Tech Trends or amateur confessionals like Drupal Fail? More importantly, do you stop yourself from writing to a listserv, tweeting, blogging, or proposing conference sessions because you feel too inexperienced, too fraudulent?

A lot of librarianship is learning, whether it's how to teach information literacy or how to code, and we benefit as a community when everyone shares their own lessons. Go forth and edify, ye novices.

Footnotes

1.^ The history of my fork of a dotfiles repo has damning evidence. There's weird looking stuff if you run git log --pretty=oneline -n 50 --graph for what should be a fairly straightforward project.

Tuesday, May 21, 2013

Blacklisting Wikipedia & Information Literacy

I taught an interdisciplinary course this past semester, "The Nature of Knowledge." My co-instructor and I focused specifically on what happens to knowledge in a networked, digital environment. The course was revelatory for me, both because it was the first I've taught as a lead instructor and due to how students reacted to our content. The course is going to inspire a slew of blog posts, but I want to start with a plea to postsecondary educators:

Your attitude towards Wikipedia is destroying students' critical thinking.

I say this because virtually every student in the class had heard that Wikipedia is inappropriate for academic use. And it is; it says so itself. The problem is they have no idea why. The most common reason proffered was "because my professors said so," the very antithesis of critical thinking.

Information Literacy & Lists

The third bullet point in ACRL's information literacy competency standards is "evaluate information and its sources critically." This is where assignments that blacklist or whitelist certain sources fail. Rather than equip students to analyze sources, valid sources are pre-selected and often according to arbitrary criteria. For instance:

No Internet sources is a common theme. Even with an "except the library databases" caveat, this is at best confusing and at worst counterproductive. What about Google Scholar, OAIster, Scirus, and all the other open access aggregators? The web is the primary delivery mechanism for scholarly knowledge. One cannot simply write it off. This is especially harmful because it trains students to ignore so many wonderful sources out there. What will they do when they don't have access to research databases? We're teaching them that the open web is useless for research when it's not.

Then there are the blacklists which specify Wikipedia. The issue here is the discrimination: why is Yahoo! Answers not listed? About.com? Ask.com? Conservapedia? The list goes on. It's a fruitless endeavor to delimit the poor sources from the good ones. And Wikipedia is likely singled out not because it's particularly bad but because it's so common.

Finally, there's the inverse approach of assignments which require peer-reviewed articles. The issue is, at least for the first few years of an undergraduate degree, peer-reviewed sources are too arcane for our students. This is less of an indictment of students' reading than academic writing, which eschews accessibility. I've heard grumbles around the librarian blogosphere about peer-reviewed article requirements (Meredith Farkas' screed against freshman research papers is a must-read) and plenty of people are critical of them. They're still all-too-common in assignments.

The underlying concern throughout all of these approaches is that they rarely explain why. Why is the web so awful, especially since many scholarly sources now appear there? Why is Wikipedia specifically worse than other sites that allow anyone to publish? What the heck is peer review and why do we care about it? I touch on all these when I teach information literacy, but half of the time I'm combating the assignment. The ACRL standard is evaluate information and its sources critically, not uncritically accept whatever unjustified stance is taken by the assignment. These assignments cultivate intellectual laziness, to quote my co-instructor, not the skills to critically evaluate any source, regardless of where one happened to find it.

What's really wrong with Wikipedia?

In my classes, I'll often do comparative searches across Google and a library database, then ask students to evaluate a chosen result using a metric like the CRAAP test. I distinctly recall a class when I brought up a Wikipedia article and asked which elements of the CRAAP test it failed. All of them, a student ventured. Nothing could be further from the truth.

Currency? It varies, but most Wikipedia entries are updated frequently. In fact, this is one area where Wikipedia has a structural advantage over other modes of publication: because there's such a wonderfully low barrier to participation, new information can be added as soon as it's published elsewhere. Compare this to traditional tertiary sources (especially print ones), where editorial and publishing processes delay information becoming available to end users. On the other hand, compare Wikipedia to other websites; every article has, down to the very minute, its last-updated date visible on the "View history" tab. Anyone who has helped a student cite a website knows that determining the publication date is usually an exercise in futility.

Relevance? Wikipedia's enormous breadth virtually ensures it has something relevant to say on any topic.

Authority? This is Wikipedia's only problem in terms of the CRAAP test. We usually don't know who has contributed to any given article; it could be a credentialed academic or anyone else. Wikipedia itself dismisses authority, stating "What is contributed is more important than the expertise or qualifications of the contributor," a provocative stance which I don't have space to explore here.

Accuracy? Wikipedia articles can have hundreds of references. The encyclopedia's insistence on verifiability and citations is a cardinal strength. In fact, the way I recommend most students use Wikipedia articles is to learn important terminology from them and mine their references. I wrote a paper in graduate school on net neutrality; Wikipedia was my first stop and it not only outlined the major issues but also linked directly to pertinent policies and secondary sources. Thanks largely to that excellent start, my paper earned an A.

Purpose? Wikipedia is admirably forthright about what it is and is not. Its goals are noble (as evidenced by its enlightened five pillars), especially relative to for-profit alternatives like About.com, which display ads and lack external references.

Yet no one had walked my students through this kind of analysis. No one had shown them the "View history" tab of an article, or its references section, or any of Wikipedia's fundamental policies. Why Wikipedia is a non-academic source was always left as an exercise for the reader.

"Anyone Can Edit"

It's worth investigating the "anyone can edit" argument further, because it appears to be the main objection to Wikipedia.

First of all, it is not strictly true that anyone can edit any article at any time. Certain articles are protected and can only be edited by a subset of editors, such as administrators or confirmed accounts. These articles tend to be common targets for vandalism. They form a small minority of articles.

Secondly, as my students found out, wiki markup is nontrivial. It takes some familiarity before an editor can do anything other than add unformatted text. This was a large obstacle for most of my students; despite an exercise introducing them to HTML using Codecademy early on, many struggled to understand more complex markup structures such as references and links. It seems unlikely that someone would invest a great deal of time learning wiki markup only to write nonsense into articles. Most would take the time to learn editorial guidelines as well as markup, which we did in our class by reading guidelines and Joseph Reagle's Good Faith Collaboration.

Thirdly, the "anyone can edit" objection often refers to vandalism more so than biased or inaccurate writing. The problem with this argument is...how often do you see actual vandalism on Wikipedia? Even the hypocrites who ban Wikipedia have likely read dozens if not hundreds of articles. I've probably read thousands myself, but I've only ever seen vandalism once. That rarity is largely due to the fact that anyone can edit: anyone who spots vandalism can easily remove it, and Wikipedia also employs bots to detect and delete vandalism. I made a brief video that covers the points in this paragraph, showing an example of vandalism that was reverted within a minute.

Finally, and most importantly, "anyone can edit" does not equate to "anyone writes anything they want." Wikipedia has standards which are enforced by an editorial community. It is not an open forum for any kind of discourse; it's an open encyclopedia written from a neutral point of view. Yes, there are articles which are inaccurate, biased, or incomplete. But they're not the product of a million monkeys hammering away on laptops; they're deliberate steps towards a better and more encyclopedic article.

What's really great about Wikipedia?

Wikipedia has a few advantages over traditional research sources, such as the widely distributed editorship, the speed with which articles can be updated, its strong community norms, and the bots which automate low-level tasks like reverting vandalism. But there has always been one thing that stands out about Wikipedia to me: it is the only source which warns you of its own inadequacies. From inline "citation needed" and "weasel words" warnings to colored boxes up top (unencyclopedic, doesn't represent a worldwide view, personal reflection or opinion, uses out-of-date sources...the sheer variety of these indictments speaks to just how high the encyclopedia's standards are, and how often they're not met), Wikipedia wants you to know it's imperfect. Users should be aware that not all articles are of encyclopedic quality from the start: they may contain false or debatable information.

No one else does this. Not About.com, not Britannica, not brilliant economists who make errors in their Excel spreadsheets. A source detailing its own issues is virtually unheard of and can only come about in a community like Wikipedia, where numerous editors representing diverse viewpoints constantly enforce a set of stringent standards.

To bring this back to assignment structure, it provides instructors with an easy criterion, too. If you must blacklist Wikipedia articles, how about starting with the ones that have issues identified by alert boxes? While this doesn't challenge students to analyze sources by themselves, it at least tells them why a particular article is unusable.

Scaffolding

I'll readily admit that I oversimplify concepts in instruction sessions all the time. It's productive to create a foundation of a few artificial givens upon which students can build. Then, in a later course, those assumptions can be examined and problematized. So, to some extent, black- or whitelists of sources are useful: they wean students off of poor sources until students can analyze them on their own. Scaffolding is tricky and I certainly haven't mastered it yet.

However, college is the time when we should be examining students' perceptions surrounding Wikipedia. The Wikipedia ban is a high school scaffold; it needs to be torn down in the first two years of college. Students can benefit from using Wikipedia articles appropriately, from understanding tertiary sources, from thinking critically about the sorts of issues that pop up in alert boxes at the top of questionable articles. If nothing else, the heresy of crowdsourcing—that a mass of amateurs can produce information as good as or even better than a handful of experts—must be taught. It's too important to today's information economies to be overlooked.