Implementing a Search Feature in Wyam

12/26/2016

I needed to add search to this site, since it is one of the fundamental features a content-filled site should have. Fortunately, Wyam supports it. The feature is packaged in a separate NuGet package named Wyam.SearchIndex, which you must reference in your config.wyam file before you can use it:

#n Wyam.SearchIndex

How does Wyam implement searching? It taps the power of another library, lunr.js (http://lunrjs.com/), which gives your site full-text search capabilities. The searching happens entirely on the client side. lunr.js needs a couple of pieces from you to do its job. First, for lunr.js to search your content, you need to build an index of it. I won't go into the details, but basically the index contains an entry for each document or page on your site, splits it into fields such as title, description, and actual content, and condenses the words to remove duplicates so that only the keywords of each page are retained. Second, you specify a list of stop words: common words such as "the", "a", and "I" that are not treated as keywords.
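To make the idea of an index with fields and stop words more concrete, here is a minimal sketch of a toy inverted index. It is an illustration under simplifying assumptions, not how lunr.js is actually implemented (lunr.js also stems words and ranks matches). The function names and the sample posts are hypothetical.

```javascript
// Illustrative only: a toy inverted index, not lunr.js's real implementation
// (lunr.js also stems words and ranks matches, which this sketch skips).

// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Map "field:term" to the Set of document ids containing that term.
// Stop words are skipped, and the Set stores each document only once per term.
function buildIndex(docs, fields, stopWords) {
  const index = new Map();
  for (const doc of docs) {
    for (const field of fields) {
      for (const token of tokenize(doc[field] || "")) {
        if (stopWords.has(token)) {
          continue;
        }
        const key = field + ":" + token;
        if (!index.has(key)) {
          index.set(key, new Set());
        }
        index.get(key).add(doc.id);
      }
    }
  }
  return index;
}

// Sample data: two hypothetical posts.
const docs = [
  { id: "post-1", title: "Search in Wyam", content: "The index is built when the site is generated." },
  { id: "post-2", title: "Razor tips", content: "A note on the Razor view engine." },
];
const stopWords = new Set(["the", "a", "i", "in", "is", "on", "when"]);
const index = buildIndex(docs, ["title", "content"], stopWords);
```

Looking up `index.get("content:razor")` yields only `post-2`, while common words like "the" never make it into the index at all.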

The Wyam.SearchIndex package generates the index that lunr.js needs through its SearchIndex module. Before you can use that module, though, each input document must have a metadata entry named "SearchIndexItem" containing an instance of Wyam.SearchIndex.SearchIndexItem. On my site I only need to index my blog posts, not other pages like the about page, and I already have a separate Wyam pipeline for processing posts. Notice the use of the Meta module to add the "SearchIndexItem" metadata:

Pipelines.Add("Posts",
    ReadFiles("post/*.md"),
    FrontMatter(Yaml()),
    Markdown(),
    Meta("SearchIndexItem", new SearchIndexItem("/" + @doc.String("RelativeFilePathBase"), @doc.String("Title"), @doc.Content)
            { Description = @doc.String("Description"), Tags = @doc.String("Tags") }),
    Excerpt(),
    Razor(),
    WriteFiles(".html")
);

The Meta module adds a metadata entry to each document. Here we instantiate a new SearchIndexItem, passing the URL, title, and content to its constructor and setting the additional fields with object initializer syntax. Everything we pass becomes part of the index. Note that this runs once per document; on its own it does not create the actual index. To create it, I added another Wyam pipeline:

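Pipelines.Add("SearchIndex",
    // Take every document produced by the Posts pipeline
    Documents("Posts"),
    // Combine their SearchIndexItem metadata into a single index document
    SearchIndex((FilePath)"stopwords.txt"),
    // Write the index to searchindex.js; a null path means nothing is written
    WriteFiles((doc,ctx) => string.IsNullOrEmpty(doc.Content) ? null : "searchindex.js").UseWriteMetadata(false)
);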

This pipeline gets all the documents output by the "Posts" pipeline we defined earlier and passes them to the SearchIndex module. The single parameter I pass to it is the path to a text file containing a list of stop words. As I said earlier, stop words are words you want excluded from the index because they are so common that users will probably not search for them. I created a file named stopwords.txt, which I copied from the SearchIndex example in the Wyam repository (https://github.com/Wyamio/Wyam/blob/develop/examples/SearchIndex/input/stopwords.txt).
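The stop-word file is just a plain list of words. The sketch below shows how such a list can be loaded and applied, assuming one word per line; the function names are hypothetical and this is not the SearchIndex module's own parsing code.

```javascript
// Sketch assuming stopwords.txt holds one word per line.
// Blank lines are skipped and words are lowercased so lookups are case-insensitive.
function parseStopWords(fileText) {
  return new Set(
    fileText
      .split(/\r?\n/)
      .map((line) => line.trim().toLowerCase())
      .filter((line) => line.length > 0)
  );
}

// Keep only the words that are worth indexing.
function removeStopWords(words, stopWords) {
  return words.filter((word) => !stopWords.has(word.toLowerCase()));
}

const stopWords = parseStopWords("a\nthe\r\nI\n\n");
const kept = removeStopWords(["The", "search", "index", "a", "page"], stopWords);
// kept → ["search", "index", "page"]
```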

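SearchIndex reads the "SearchIndexItem" metadata of each input document and combines them into one output document containing the aggregated index. Finally, I call WriteFiles to write the index to a JavaScript file, searchindex.js, that our page can reference. The path delegate returns null when the index document has no content, so nothing gets written, and UseWriteMetadata(false) makes WriteFiles rely on that delegate instead of any write-path metadata the document may carry.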
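The "many documents in, one document out" step can be pictured with the following sketch. It is only an illustration: the `buildIndexScript` name and the `searchData` global it emits are hypothetical, and the file Wyam actually generates has different contents (it is what defines the `searchModule` object used later).

```javascript
// Illustration of aggregating per-post items into a single script file.
// The output format (a global named searchData) is hypothetical, not Wyam's.
function buildIndexScript(items) {
  // Loosely mirrors the WriteFiles guard: nothing to index means no file.
  if (items.length === 0) {
    return null;
  }
  // One entry per post; the array position doubles as the document id.
  const documents = items.map((item, id) => ({
    id,
    url: item.url,
    title: item.title,
    description: item.description || "",
  }));
  return "var searchData = " + JSON.stringify(documents) + ";";
}
```

For example, `buildIndexScript([{ url: "/a", title: "A" }])` produces a single line of JavaScript that a page can load with an ordinary script tag.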

Honestly, I can see this file getting bigger and bigger as I add new content, which could make the initial load of the search page slow. The only alternative I can see is a server-side full-text search service like Apache Solr. This pull request (https://github.com/Wyamio/Wyam/pull/118) on Wyam's GitHub repository suggests that they are planning to load the index file asynchronously.
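One way to soften the load-time cost on the page side is to fetch the index only when the first search actually runs. The sketch below assumes that approach; `createLazyLoader` and `loadScript` are hypothetical helpers, not part of Wyam or lunr.js.

```javascript
// Hypothetical helper, not part of Wyam or lunr.js: defer loading the index until the
// first search, and make sure it is only ever loaded once.
function createLazyLoader(load) {
  let pending = null;
  return function () {
    if (pending === null) {
      // Cache the promise so every caller shares the same load.
      pending = Promise.resolve()
        .then(load)
        .catch((err) => {
          pending = null; // allow a retry after a failed load
          throw err;
        });
    }
    return pending;
  };
}

// Browser-only loader: resolves once the <script> tag for `src` has loaded.
function loadScript(src) {
  return new Promise((resolve, reject) => {
    const script = document.createElement("script");
    script.src = src;
    script.onload = () => resolve();
    script.onerror = () => reject(new Error("Failed to load " + src));
    document.head.appendChild(script);
  });
}

// The index script is requested on the first call only.
const loadSearchIndex = createLazyLoader(() => loadScript("/searchindex.js"));
```

A search handler could then call `loadSearchIndex().then(() => searchModule.search(query))` instead of including the index script tag up front. This moves the download off the initial page load, but the whole index still has to be downloaded before the first result appears.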

Now that generating the search index is part of our Wyam pipeline, we just need to build the actual search page. I created search.cshtml and added script references to lunr.js and the search index file.

<script src="/libs/lunr/lunr.min.js"></script>
<script src="/searchindex.js"></script>

Then I added a jQuery initialization script that makes the search box respond to input events.

$(function () {
    var searchBox = $("#searchbox");

    searchBox.on('input propertychange paste', function () {
        runQuery(searchBox.val());
    });

    var q = getQueryParam("q");

    if (q) {
        q = decodeURIComponent(q);
        searchBox.val(q);
        runQuery(q);
    }
});
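The initialization code above relies on a getQueryParam helper that is not shown in this post. Here is a minimal hypothetical version, written so that the existing `decodeURIComponent` call still does the decoding. It also turns "+" into a space, because forms submitted with GET encode spaces that way and `decodeURIComponent` leaves "+" untouched.

```javascript
// Hypothetical getQueryParam; the real helper lives in the post's search.cshtml.
// Returns the raw, still-encoded value (with "+" turned into a space) so the
// caller's decodeURIComponent call performs the actual decoding.
function getQueryParam(name, search = window.location.search) {
  const pairs = search.replace(/^\?/, "").split("&");
  for (const pair of pairs) {
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    if (key === name) {
      return eq === -1 ? "" : pair.slice(eq + 1).replace(/\+/g, " ");
    }
  }
  return null; // parameter not present
}
```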

When you type something in the search box, the runQuery function is invoked with the box's current value. You can also search by passing the query in the q query-string parameter; in that case the value is decoded, copied into the search box, and run immediately. The runQuery function does the actual searching through searchModule, which is defined by the generated searchindex.js file and wraps lunr.js.

function runQuery(query) {
    var searchQuery = $("#searchQuery"),
        searchResults = $("#searchResults");

    searchResults.empty();
    // Use text() rather than html() so a query taken from the URL cannot inject markup
    searchQuery.text(query);

    // Skip very short queries; they match too much to be useful
    if (query.length < 2) {
        return;
    }

    // searchModule is defined by the generated searchindex.js
    var results = searchModule.search(query);

    if (results.length === 0) {
        searchResults.append($("<p>").append($("<b>").text("No results found for query '" + query + "'")));
    }
    else {
        searchResults.append("<p>Number of matching posts: <b>" + results.length + "</b></p>");

        var list = $("<ul>");

        for (var i = 0; i < results.length; ++i) {
            var res = results[i];
            // attr() and text() escape the URL, title, and description for us
            list.append(
                $("<li>")
                    .append($("<a>").attr("href", res.url).text(res.title))
                    .append($("<p>").text(res.description)));
        }

        searchResults.append(list);
    }
}

We call searchModule.search with the search query, which returns an array of results. If there are no results, we tell the user by appending a message to the search results container. Otherwise, we list each result with its title and description. These values come from the fields we set on the SearchIndexItem inside our Wyam pipeline. You can see the full code of search.cshtml on GitHub (https://github.com/randalvance/RandalVance.Website/blob/master/input/search.cshtml).
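Because the query can come straight from the URL, any code that renders results as HTML strings should escape every interpolated value. Here is a minimal sketch of a pure rendering function that does that. `escapeHtml` and `renderResultsHtml` are hypothetical names, and the results are assumed to carry `url`, `title`, and `description`, as in the post.

```javascript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(value) {
  return (value == null ? "" : String(value))
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render search results as an HTML string, escaping every interpolated value.
function renderResultsHtml(query, results) {
  if (results.length === 0) {
    return "<p><b>No results found for query '" + escapeHtml(query) + "'</b></p>";
  }
  let html = "<p>Number of matching posts: <b>" + results.length + "</b></p><ul>";
  for (const res of results) {
    html += "<li><a href='" + escapeHtml(res.url) + "'>" + escapeHtml(res.title) +
      "</a><p>" + escapeHtml(res.description) + "</p></li>";
  }
  return html + "</ul>";
}
```

Keeping the rendering in a pure function like this also makes it easy to test without a browser or jQuery.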

That's about it; searching now works. Try it by typing something in this site's search box, found below the header, and pressing Enter: you will be redirected to the search page with your input in the query string. On the search page, results update as you type without contacting the server, because the whole index is already loaded in your browser as a JavaScript file.
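For reference, the header search box only needs to send the user to the search page with the query encoded into the URL. Here is a small sketch of that redirect. `buildSearchUrl`, `wireHeaderSearchBox`, and the "/search" path are assumptions for illustration; use whatever URL search.cshtml is published at.

```javascript
// Hypothetical helper: build the search-page URL for a query.
// The "/search" default path is an assumption.
function buildSearchUrl(query, searchPage = "/search") {
  return searchPage + "?q=" + encodeURIComponent(query.trim());
}

// Browser-only: redirect to the search page when Enter is pressed in `input`.
function wireHeaderSearchBox(input) {
  input.addEventListener("keydown", (event) => {
    if (event.key === "Enter" && input.value.trim() !== "") {
      window.location.href = buildSearchUrl(input.value);
    }
  });
}
```

Because `encodeURIComponent` encodes spaces as `%20`, the value round-trips cleanly through the `decodeURIComponent` call on the search page.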