Tuesday, July 23, 2013

Adding LibGuides to Drupal's Search Results

This will be another super specific post about how to do something useful for libraries in Drupal. The tl;dr is that you can use LibGuides XML Export, the Feeds module, and the Feeds XPath Parser module to make LibGuides show up in your Drupal site search results. So when users search for "english composition" and you don't have any study guides on your Drupal site, something relevant from LibGuides might show up.

I was inspired to do this by the Drupal in Libraries book, though I haven't read it (I saw it mentioned in American Libraries). I didn't see specific details in the book's preview, and Michigan is putting the XML into their Solr search index, which is too sophisticated for my small college, so I thought a brief write-up might benefit other libraries that have LibGuides but don't use Solr. Libraries using other CMSs might still benefit from the general outline, though the specific details won't be useful. I'd be shocked if WordPress libraries couldn't do the same using WP All Import or other plugins.

These directions are specific to Drupal 7. I bet the same can be achieved in Drupal 6, but I can't vouch for any of the settings or code being the same.

Set-up: LibGuides & Modules

In order to do this, you have to do a couple steps first to prepare both LibGuides and Drupal.

  • Purchase the Images and Backups Module from Springshare. In my experience, the pricing is very reasonable, and the "images" part of it means you can upload images to LibGuides which makes adding them to guides much, much easier for authors.
  • Install the Feeds module, a popular and well-maintained module for mass importing nodes from structured data (RSS/Atom feeds, CSV files, OPML files) into Drupal.
  • Install the Feeds XPath Parser module, which adds an extra parser to your Feeds installation, allowing you to import nodes from arbitrary XML documents.

Once you've done these three steps, download the XML export from LibGuides (Springshare will email you when it's ready) and enable both modules in Drupal.

Process the XML

I don't work with XML much (shame, librarian, shame!) but this is a step where you could edit the LibGuides export to make it more useful as an imported node. In my pre-processing, I only wanted to accomplish one thing: when I import the nodes, I don't want any unpublished or private guides to be published in Drupal. We have a few under-construction or private guides that shouldn't show up in search results.

To do so, there's just one Drupal quirk you have to know: later on, in configuring the way your data maps to Drupal nodes, you'll be able to map the contents of an XML element to a Drupal node's "publication status" field. 1 means published and 0 means unpublished.

Luckily, the LibGuides XML has a <STATUS> element underneath each <GUIDE> which you can easily map to either 0 or 1. To process the XML, I performed a simple pair of search-and-replace operations in Sublime Text:

  • Search for "<STATUS>Published</STATUS>" and replace with "<PUBLISH>1</PUBLISH>"
  • Search for "<STATUS>.*</STATUS>" and replace with "<PUBLISH>0</PUBLISH>"

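That second search and replace uses a teeny bit of regex, so make sure regular expression mode is turned on for it: the period stands for "any character except a line-break" and the asterisk means "zero or more of the preceding character". So I'm searching for any string of text (even an empty one) inside of a <STATUS> element and turning it into <PUBLISH>0</PUBLISH>, which works because all of my published guides no longer have a <STATUS> element after the first search-and-replace. The order matters: run the "Published" replacement first. Also note that .* is greedy, so this is only safe if no line contains more than one <STATUS> element; if yours does, use .*? instead.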
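The same two replacements can also be scripted rather than done by hand in a text editor. Here's a minimal sketch in Python, assuming Python 3 is available on your machine. The function name and the sample status values other than "Published" are made up for illustration, and the pattern uses the non-greedy .*? rather than .* so that two <STATUS> elements on the same line are never merged into one match.

```python
import re


def mark_publication_status(xml_text):
    """Turn every <STATUS> element into <PUBLISH>1</PUBLISH> or <PUBLISH>0</PUBLISH>."""
    # Step 1: published guides become 1 (a plain string replacement, no regex needed).
    xml_text = xml_text.replace(
        "<STATUS>Published</STATUS>", "<PUBLISH>1</PUBLISH>"
    )
    # Step 2: any <STATUS> element still left is something other than
    # "Published", so it becomes 0. ".*?" is the non-greedy form of ".*".
    return re.sub(r"<STATUS>.*?</STATUS>", "<PUBLISH>0</PUBLISH>", xml_text)


if __name__ == "__main__":
    # Hypothetical sample lines; "Private" is an assumed status value,
    # so check what your own export actually contains.
    sample = (
        "<GUIDE><STATUS>Published</STATUS></GUIDE>\n"
        "<GUIDE><STATUS>Private</STATUS></GUIDE>\n"
    )
    print(mark_publication_status(sample))
```

To run it on a real export, read the file into a string, pass it through mark_publication_status, and write the result to a new file so the original download stays untouched.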

Configure the Feeds Importer

Back inside Drupal, we need to create a new content type and set up the Feeds module to receive our XML file.

  • Under the "Structure" menu of the admin toolbar, select Content Types
  • Add content type and then give it a name and description, e.g. "Imported LibGuides"
  • Add fields to your new content type, which at the very least should contain two new fields: an "ugly URL" field for LibGuides that don't have a friendly URL, and a "friendly URL" field. You can make these Text field types with the standard settings.
  • Under the "Structure" menu of the admin toolbar, select Feeds importers (or visit {{drupal root}}/admin/structure/feeds)
  • Add importer and then give it a name and description, e.g. "LibGuides Importer"

There are a lot of settings here, which can seem intimidating but is actually great. The Feeds module gives you control over how data is imported into Drupal, and everything is straightforward if you take the time to read through it. I'll walk through my basic settings, but just know that you could do whatever seems reasonable here and be OK; the only piece of this post you might need to reference is the XPath queries later on.

  • Basic Settings
    • Attach to content type: select your LibGuides content type here
    • Periodic import: off, periodic import is only for grabbing nodes from web feeds, e.g. RSS
    • Import on submission: check
  • Fetcher: File upload
    • Allowed file extensions: you can leave as is, but I put XML since I'll only be uploading XML files
    • Upload directory: leave as is
  • Parser: XPath XML parser (this option only appears if you installed Feeds XPath Parser)
    • Settings: see the section below on the XPath queries, but trust me this won't be that painful
  • Processor: Node processor
    • Bundle: select your LibGuides content type again
    • Update existing nodes: this is a bit of a judgment call, but you'll be fine with either "Replace existing nodes" or "Update existing nodes."
    • Skip hash check: I leave this unchecked but you'd be fine either way
    • Text format: your call, I leave as "Plain text" which is fine for search results
    • Author: anonymous, or your user if you want to brag about how many nodes you made
    • Authorize: probably should leave checked
    • Expire nodes: Never
    • Mapping for Node processor: make the Title, Body, Published status, Friendly URL, and Ugly URL fields all map to an "XPath Expression" source. The two URLs fields are ones we created with our Imported LibGuides content type, so if you chose a different name for them back then they will appear differently in the Target drop-down options here.

Whew, we're done! I know that looks like a lot, but Feeds has a pretty nice UI for such a sophisticated and powerful module.

Parsing XML with XPath

Now for the fun part: we need to map XML elements in LibGuides to Drupal fields using XPath expressions. We also get to say things like that which only .01% of humans understand.

XPath is a query language for XML; if you know SQL or CSS selectors, it's kind of similar. It gives you a way of traversing the structure of an XML document to retrieve the contents of various elements. The LibGuides XML is structured in a pretty logical, simple manner, so writing our queries won't be tough. Back in the Feeds importer settings that we were just editing, select the Settings link under the Parser section. This gives us a menu where we can write our XPath queries. Here's the setup that I use with some English translations:

Context: //GUIDE

We want our queries to run in the context of each <GUIDE> element. We could do without this, but it means we'd be prepending /LIBGUIDES/GUIDES/GUIDE/ to each query below, which is silly.

title: NAME

body: DESCRIPTION

Set the name of the LibGuide to the node's title and the body of the node to its description. The description is the brief sentence which shows up underneath the name of a LibGuide.

field_friendly_url: FRIENDLY_URL

field_ugly_url: URL

Each <GUIDE> element has two URLs, so we map both of those to the two custom fields we set up on our Imported LibGuides content type. Once again, if you named your fields something different, their machine-readable names will be different. The machine-readable name is what you see in this menu: the field's label in lowercase, prefixed with "field_", with underscores instead of spaces.

status: PUBLISH

Remember when we edited the LibGuides XML to set up a <PUBLISH> element that's either 0 or 1? That's where this mapping comes into play, taking that Boolean value and using it as Drupal's publication status field.

You can leave all the "Select the queries you would like to return raw XML or HTML" options unchecked. Note that this could provide some interesting options if you were doing more sophisticated things with LibGuides, since the XML export contains all the raw HTML of the various boxes in each guide. Debug Options can also be left unchecked, although if you're testing this process I recommend checking them. The debug options show you what Drupal found with each XPath query, which can help you configure the importer properly.

I leave "Allow source configuration override" unchecked as well. Since we just set up our XPath queries the way we wanted, there's no need to override them later. However, you could do something interesting where you set up a generic LibGuides importer in these settings, then have multiple different ways of mapping the XML into nodes.
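The context and queries above can also be previewed outside Drupal before you run an import. Here's a minimal sketch using Python's standard-library ElementTree, which supports a small subset of XPath (the //GUIDE context is written as .//GUIDE there). The sample XML is hypothetical: the element names come from the export as described in this post, but the guide titles, the example.edu URLs, and the empty <FRIENDLY_URL> element are assumptions, so compare against your own file.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-guide export shaped like /LIBGUIDES/GUIDES/GUIDE.
SAMPLE = """<LIBGUIDES>
  <GUIDES>
    <GUIDE>
      <NAME>English Composition</NAME>
      <DESCRIPTION>Resources for first-year writing.</DESCRIPTION>
      <URL>http://libguides.example.edu/content.php?pid=1</URL>
      <FRIENDLY_URL>http://libguides.example.edu/english</FRIENDLY_URL>
      <PUBLISH>1</PUBLISH>
    </GUIDE>
    <GUIDE>
      <NAME>Draft Guide</NAME>
      <DESCRIPTION>Under construction.</DESCRIPTION>
      <URL>http://libguides.example.edu/content.php?pid=2</URL>
      <FRIENDLY_URL></FRIENDLY_URL>
      <PUBLISH>0</PUBLISH>
    </GUIDE>
  </GUIDES>
</LIBGUIDES>"""

# Drupal target -> XPath query, matching the parser settings in this post.
QUERIES = {
    "title": "NAME",
    "body": "DESCRIPTION",
    "field_friendly_url": "FRIENDLY_URL",
    "field_ugly_url": "URL",
    "status": "PUBLISH",
}


def preview(xml_text):
    """Return one dict per <GUIDE>, keyed by Drupal target name."""
    root = ET.fromstring(xml_text)
    rows = []
    # Each query runs relative to a <GUIDE>, like the Feeds context setting.
    for guide in root.findall(".//GUIDE"):
        rows.append({
            target: (guide.findtext(query) or "").strip()
            for target, query in QUERIES.items()
        })
    return rows


if __name__ == "__main__":
    for row in preview(SAMPLE):
        print(row)
```

A guide with an empty or missing element simply gets an empty string for that target, which makes it easy to spot guides that have no friendly URL.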

Redirecting Imported Nodes to LibGuides

Before we actually import our LibGuides, we want to make sure they're handled appropriately. That is, we don't want people clicking on their search results simply to see some lame text and URLs on the screen; we want them to be redirected straight to the LibGuide.

There are probably other ways to do this, for instance the Field Redirection module, but I use node templates, which are PHP templates that apply to only specific node types. Under the Templates folder of your theme (which will be somewhere in sites/all/themes likely) create a file named "node--imported-libguides.tpl.php" where "imported-libguides" is whatever you named your LibGuides content type but with hyphens replacing spaces. Inside that template, paste the following PHP:

<?php
// redirect user to LibGuide rather than node if user is not signed in
// uid 0 means anonymous user
if ( $user->uid == 0 ) {
  // prefer friendly URL if available; LANGUAGE_NONE is Drupal 7's constant for 'und'
  if ( !empty( $node->field_friendly_url[ LANGUAGE_NONE ][ 0 ][ 'value' ] ) ) {
    drupal_goto( $node->field_friendly_url[ LANGUAGE_NONE ][ 0 ][ 'value' ] );
  } elseif ( !empty( $node->field_ugly_url[ LANGUAGE_NONE ][ 0 ][ 'value' ] ) ) {
    // ugly_url should always exist but just in case, use a conditional
    drupal_goto( $node->field_ugly_url[ LANGUAGE_NONE ][ 0 ][ 'value' ] );
  }
} else {
  // signed-in users (assumed to be editors) see the plain fields
  print render($content);
}
?>

I've written comments in the code, but essentially here's the path this code steps through:

  • Is the user anonymous? If yes, redirect them. If not, we assume the user is some kind of editor, so we print out the lame text fields. This makes it easier for librarians to edit nodes after they've been imported, but assumes that your users don't have Drupal accounts. If they do, you'll need to consider the first if condition thoroughly to make sure only the right types of users are seeing the plain text.
  • Does the node have a friendly URL? If so, redirect anonymous users to it.
  • If not, the node must have an ugly URL, so redirect anonymous users to that.

I noted it above, but because it's so important: if you allow users to create Drupal accounts, this template won't work well. It won't expose confidential data or anything, but it's definitely meant for Drupal sites where all non-editor traffic is anonymous.

Your theme may also have a particular way of printing out nodes that you want to stick to; in that case, you'd be better off copying node.tpl.php or another node type template rather than using my code verbatim. You could put the logic piece of this code at the top of your node template, dropping the else clause at the end. That would work fine as long as it's named appropriately, e.g. "node--imported-libguides.tpl.php".

We're Almost There

Now that our template is set and our importer configured, we need to create an importer node, give it a file, and let it run wild. Go to {{drupal root}}/import to see a list of available importers, including the default ones that come with the Feeds module and your LibGuides Importer. Select LibGuides Importer and you're greeted with the usual node editing form, except this time there's a place to upload a file towards the top. Use that to browse to the processed LibGuides XML, then upload it. You can leave the body and other fields blank.

Once you've created this node, it will have an Import tab with an identically named button. Simply click that and your nodes should be created in Drupal, with whatever debug messages you chose in the importer displaying as well.

Totally screwed up the XPath queries, causing a bunch of broken and useless nodes to be imported? No worries, the importer node that you just created has a Delete items tab which can delete any of the nodes which it imported. This makes trying out a Feeds importer rather risk free; just keep trying until you get it right.

Final Steps

Drupal's internal search index will still need to index the new nodes before they show up in its results. You can run cron a few times depending on how many nodes you just added and they should show up. Try a search for the title of a LibGuide that wouldn't return any of your other pages, and make sure clicking on a LibGuide result from an anonymous session causes you to be redirected to the guide.

As LibGuides are added and removed, you'll have to sync them to their Drupal nodes again. However, after you've been through the process once, it only takes a few minutes to grab a new XML export, process it, upload it, and click the import button.
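Before re-importing, it can be handy to see which guides were added or removed since the last export. Here's a minimal sketch in Python, assuming you kept the previous export and that each <GUIDE> has the <NAME> and <URL> children used in the mappings above. The function names are hypothetical, and treating <URL> as a guide's stable identifier is an assumption; if your URLs change between exports, this comparison won't be reliable.

```python
import xml.etree.ElementTree as ET


def guide_names_by_url(xml_text):
    """Map each guide's <URL> text to its <NAME> text."""
    root = ET.fromstring(xml_text)
    return {
        (guide.findtext("URL") or "").strip(): (guide.findtext("NAME") or "").strip()
        for guide in root.findall(".//GUIDE")
    }


def diff_exports(old_xml, new_xml):
    """Return (added, removed) guide names, comparing guides by URL."""
    old = guide_names_by_url(old_xml)
    new = guide_names_by_url(new_xml)
    added = sorted(new[url] for url in new.keys() - old.keys())
    removed = sorted(old[url] for url in old.keys() - new.keys())
    return added, removed
```

Read the old and new export files into strings and pass them to diff_exports. The removed list shows which imported nodes are now stale in Drupal.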

5 comments:

  1. very cool - we're going to use this as a template at the hsl.virginia.edu site - Thanks!

  2. yeah man - just tested = it's great, and added a feeds tamper to do the search and replace (and used the field redirection module too... very cool)

    I'm going to do a small writeup with some pics - mostly just wanted to thank you though!

  3. Hey man - finally got around to making a feature for this - aded a few more fields for search's sake, am glad to add any other fields people need
    https://drupal.org/sandbox/alibama/2103149
