Buzzword spell checking internals
During my “Building Buzzword” talk at Flex Camp Boston on Friday, there was an audience question about how spell checking worked in Buzzword. I built most of the spell check feature in Buzzword, but it hadn’t occurred to me to include it in the talk, so I was pretty excited to get a chance to talk about it briefly. So excited, in fact, that I thought it would be worth it to write an expanded version here.
How it works. At its heart, spell checking in Buzzword is a very simple and very familiar feature: we spell check the document as you type, and we flag each word that doesn’t appear in the dictionary. A potentially misspelled word is flagged visually with a red underline, and you can click on the underline to see suggestions, or to mark the word as correct. To provide “as-you-type” checking, we are looking up one or more words in the dictionary on every keystroke, and we’re doing it synchronously, so the dictionary lookups have to be fast. We are scanning forward and backwards from the selection to figure out which words have become “dirty” as a result of each typed character or editing action, and we are suppressing the red underline in the word currently being edited. When a word is flagged, it is marked in the document as having a “flagged” style, which is stored persistently in the document just like all the other text styles (bold, italic, etc.).
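The dirty-word scan described above can be sketched roughly as follows. This is an illustrative Python sketch, not Buzzword's ActionScript code: the function names, the plain `set` used as the dictionary, and the definition of a word as a run of ASCII letters and apostrophes are all assumptions made for the example.

```python
import re

# Assumption: a "word" is a run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def is_word_char(ch):
    return WORD_RE.fullmatch(ch) is not None


def dirty_range(text, edit_start, edit_end):
    """Widen the edited range outward until it covers whole words."""
    start = edit_start
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    end = edit_end
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return start, end


def flag_dirty_words(text, edit_start, edit_end, caret, dictionary):
    """Return (start, end) spans of dirty words that should be underlined."""
    start, end = dirty_range(text, edit_start, edit_end)
    flagged = []
    for match in WORD_RE.finditer(text, start, end):
        w_start, w_end = match.start(), match.end()
        # Suppress the underline on the word the caret is still touching.
        if w_start <= caret <= w_end:
            continue
        if match.group().lower() not in dictionary:
            flagged.append((w_start, w_end))
    return flagged
```

With a dictionary of `{"the", "cat"}`, typing the last letter of "teh" flags nothing because the caret is still touching the word; typing the following space moves the caret past it, and the word is then looked up and flagged.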
Grant Skinner, SPL. We got a leg up from Grant Skinner’s Spelling Plus Library (SPL), which he was developing around the same time that we were implementing the spell checking feature in Buzzword. He incorporated some of our requirements into his design, which was helpful and rewarding. His library does a lot more than what we use it for: we use only the low-level SpellingDictionary interface that allows us to look up a lot of words very quickly, but for more typical Flex projects, it plugs into off-the-shelf components very nicely to provide automatic as-you-type spell checking, red underlines, etc.
Foreground vs. background spell checking. In a typical desktop word-processing application, spell checking is done on a separate thread from the main user interface thread. You can see this in Microsoft Word just by typing fast: spell checking falls behind, then catches up after a while, typically when the user is idle for a moment. This is a great solution to the problem of doing potentially computationally intensive work without getting in the user’s way. Buzzword needs to do as little work as possible on each keystroke to keep the user interface responsive, but we don’t have the luxury of a background thread, because Flash provides only one thread on which ActionScript can execute. (There are ways to do background processing using timer and enterFrame events, but you have to carefully manage contention between those event handlers and the “main thread” yourself.) In Buzzword, we do all the spell checking work synchronously. We considered doing it in the background, keeping a list of dirty regions of the document and working through that list on timer events while the user is idle, but it would have added a great deal of complexity and nondeterminism to the design, so we decided to try doing it all in the foreground and see how well it worked. It turned out to be an adequate approach.
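For comparison, here is a rough sketch of the background design that was considered and rejected. It is a Python illustration under stated assumptions, not code that ever existed in Buzzword: the class name, the `words_per_tick` budget, and the `user_is_idle` flag are all hypothetical, and a real version would track document regions rather than bare words.

```python
from collections import deque


class BackgroundChecker:
    """Hypothetical dirty-list checker driven by timer ticks."""

    def __init__(self, dictionary, words_per_tick=2):
        self.dictionary = dictionary
        self.words_per_tick = words_per_tick  # work budget per timer event
        self.dirty = deque()                  # words waiting to be checked
        self.flagged = []

    def mark_dirty(self, word):
        # Called on each edit: cheap, no dictionary lookup yet.
        self.dirty.append(word)

    def on_timer_tick(self, user_is_idle):
        # Only do work while the user is idle, and only a bounded amount.
        if not user_is_idle:
            return 0
        checked = 0
        while self.dirty and checked < self.words_per_tick:
            word = self.dirty.popleft()
            if word.lower() not in self.dictionary:
                self.flagged.append(word)
            checked += 1
        return checked
```

Even in this toy form, the flagged state depends on when ticks happen to fire relative to edits, which hints at the indeterminacy that made the foreground approach more attractive.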
Collaborative spell checking. Any spell checking software needs to allow a user to add words to the built-in dictionary. Buzzword is designed with the assumption that most documents will have multiple authors. So we had to consider what happens when two different users, with different user dictionaries, work on the same document. We started from a couple of principles. The first is that when one author completely spell checks a document (i.e., gets rid of all the red underlines), it should appear completely spell checked to all other authors no matter what they have in their user dictionaries. The second principle is that an author with, let’s say, creative ideas about spelling, who puts misspelled words into his own dictionary, should not be able to pollute other authors’ dictionaries. That is, a bad speller should only be able to pollute one document at a time. From those design constraints, we ended up with two different kinds of dictionaries: a user dictionary, which is stored with a given user’s preferences, and applied to every document that that user works on as an author; and a document dictionary, which is stored in the document, and is applied to that document for all authors who are working on it. So when a given user is working on a document, their user dictionary is merged with the document’s dictionary for the purpose of spell checking during that document editing session. When checking a flagged word, the author has the option to mark it as correct for all documents, in which case the word is added to both the user dictionary and the document dictionary, or to mark it as correct for only this document, in which case the word is added only to the document dictionary. If you work through all the possibilities you’ll see that this design meets the two principles described above.
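One way to picture the two kinds of dictionary is the small Python sketch below. It is illustrative only: the class and method names are hypothetical, plain sets stand in for the real dictionaries, and it ignores how the dictionaries are stored and loaded.

```python
class SpellSession:
    """One author's editing session on one document."""

    def __init__(self, main_dictionary, user_dictionary, document_dictionary):
        self.main = main_dictionary          # built-in word list
        self.user = user_dictionary          # stored with the author's preferences
        self.document = document_dictionary  # stored inside the document

    def is_correct(self, word):
        # The user and document dictionaries are merged for this session.
        w = word.lower()
        return w in self.main or w in self.user or w in self.document

    def mark_correct_for_all_documents(self, word):
        # Goes into both dictionaries, so co-authors see it as correct too.
        self.user.add(word.lower())
        self.document.add(word.lower())

    def mark_correct_for_this_document(self, word):
        # Only the document learns the word; the user dictionary is untouched.
        self.document.add(word.lower())
```

Note that nothing in a session ever writes to another author's user dictionary. A word accepted by one author reaches co-authors only through the document dictionary, so it affects only the documents where it was actually accepted.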
Implementation challenges: background dictionary loading. There were a number of really interesting challenges in making spell checking work well in Buzzword. One of the things we do to make sure that Buzzword starts up as quickly as possible is to load and decompress the main spell check dictionary in the background. (Actually, the background loading and decompressing happens inside the SPL library, so any user of that product gets the benefit.) However, loading the dictionary in the background has a downside: it creates a race condition between the loading of the dictionary and the editing of a document. It’s not hard for an agile user to do about five or ten seconds of work on a document before the dictionary is fully available to be used for spell checking. Making the user wait five or ten seconds would be bad. Normally we only recheck the flagged words when the document opens, but in this particular case, since the user was allowed to edit for a while without any “as you type” checking, we have to rescan the whole document as soon as the dictionary loads.
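The shape of the fix can be sketched like this. It is a simplified Python illustration, not the real implementation and not SPL's API: the class, its callbacks, and the word-list model of a document are assumptions made for the example.

```python
class LateDictionaryChecker:
    """Hypothetical checker that tolerates a dictionary that arrives late."""

    def __init__(self, document_words):
        self.document_words = document_words  # shared, mutable list of words
        self.dictionary = None                # None while still loading
        self.flagged = set()                  # indexes of flagged words

    def on_word_edited(self, index):
        if self.dictionary is None:
            # Dictionary not ready: let the user keep editing, check nothing.
            return
        word = self.document_words[index]
        if word.lower() in self.dictionary:
            self.flagged.discard(index)
        else:
            self.flagged.add(index)

    def on_dictionary_loaded(self, dictionary):
        # Edits may have gone unchecked, so rescan the whole document once.
        self.dictionary = dictionary
        self.flagged = {
            i for i, word in enumerate(self.document_words)
            if word.lower() not in dictionary
        }
```

The one-time full rescan costs more than the usual recheck of already-flagged words, but it is paid only in sessions where the user starts editing before the dictionary has loaded.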
Implementation challenges: edit baton passing. Buzzword’s support for simultaneous editing is limited right now to one author at a time: the author who can edit the document has the editing “baton,” no one else can edit until they get the baton, and the baton is automatically released whenever the user saves the document (either manually or in the background). One interesting case is when a user wants to modify the document dictionary as a result of spell checking, but can’t get the edit baton at that moment. This tends to happen when a document is opened by an author (call them Author 2) whose user dictionary is substantially different from another author’s dictionary, and that other author (call them Author 1) has the baton at that moment. The result is that Author 2 would like to add some words to the document dictionary, but has to queue up the operation until they can get the baton.
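A minimal sketch of that queueing behavior, in Python and with hypothetical names throughout (the real baton protocol involves the server and is not shown here):

```python
class DocumentDictionaryUpdater:
    """Hypothetical helper that defers dictionary edits until we hold the baton."""

    def __init__(self, document_dictionary):
        self.document_dictionary = document_dictionary
        self.has_baton = False
        self.pending = []  # words waiting for the baton

    def add_word(self, word):
        if self.has_baton:
            self.document_dictionary.add(word)
        else:
            # Someone else is editing: remember the word for later.
            self.pending.append(word)

    def on_baton_acquired(self):
        self.has_baton = True
        # Flush everything that was queued while another author had the baton.
        for word in self.pending:
            self.document_dictionary.add(word)
        self.pending.clear()

    def on_baton_released(self):
        self.has_baton = False
```

In the scenario above, Author 2's session would call `add_word` as the document opens, and the words would reach the document dictionary only after Author 1 saves and the baton changes hands.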
Other languages. Right now, Buzzword’s spell checker is U.S. English only. We did it this way in order to get the feature into the hands of the largest number of users as quickly as possible. When we do multilingual spell checking, we’ll have some very interesting challenges to solve. It’s easy enough to get dictionaries for other languages, but the hard part is making it work collaboratively. It’s prohibitive to download all possible dictionaries, so each user will have to specify their preferred dictionary or dictionaries, each document will have to keep track of which dictionaries it requires, and we might even need a notion of per-language user dictionaries so that I don’t end up with a document in which some words are erroneously marked as correct because I happened to have them in my user dictionary for some other language.
There’s actually a lot more to say about spell checking, but I think this is a good stopping place. Spell check was definitely one of the most challenging features I worked on in Buzzword, and it’s an interesting case study of working within a lot of competing constraints to produce something that works well enough even if it’s not perfect.

Very interesting. Do you have any long term plans of implementing a grammar-check as well, like those found in MS’s offerings? Along those same lines, could there ever be an Auto-Correct feature?
Nice, it will help non-English speakers write what they want in English.
I second the question about an auto-correct feature being added to Buzzword. Any chance of this happening?