Category Archives: Technical

CUDA – Four

I’ve been busy with other things, but I woke up early and decided to get some CUDA studying in. I did talk with the hiring manager for the position that I’m interested in, who (as I expected) clarified that I didn’t actually need to know CUDA for this position. I’m still interested, though I should focus more on the Leetcode-style exercises that are more likely to come up in the interview.

That said, I haven’t been entirely ignoring this. I’ve been watching some 3Blue1Brown videos in my spare time, like this one on convolution. My calculus is definitely rusty (I don’t fully remember how to take an integral), but I’m mostly just trying to gain some intuition here so that I know what people are talking about if they say things like, “take a convolution”.

For today, I started by looking through the source of the sample code I got running last time. Thanks to the book I’ve been reading, a lot of the code makes sense and I feel like I can at least skim the code and understand what’s going on at a syntax level, for example:

__global__ void increment_kernel(int *g_data, int inc_value) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  g_data[idx] = g_data[idx] + inc_value;
}

Writing this mostly for my own understanding:

The __global__ qualifier marks this as a kernel – code that is called from the host but runs on the device. It takes a pointer to an array g_data and an int inc_value. This kernel is run for each element in the g_data array, and each instance of the kernel operates on the element at the index calculated in idx. Each thread block of blockDim threads has a unique blockIdx, and each thread in that block has a unique threadIdx. Since we are working on 1D data (i.e. a single array, not a 2D or 3D array), we only care about the x property of each of these index variables. Finally, we increment the value at index idx by inc_value.
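To check my understanding, here's a rough Python sketch of my own (not real CUDA) of how the grid of blocks and threads maps onto array indices:

```python
# Rough sketch, not real CUDA: simulating how each GPU thread computes
# its own index from blockIdx, blockDim, and threadIdx. On the device,
# every (block_idx, thread_idx) pair runs concurrently; here we just loop.
def increment_kernel_simulated(g_data, inc_value, block_dim, num_blocks):
    for block_idx in range(num_blocks):
        for thread_idx in range(block_dim):
            idx = block_idx * block_dim + thread_idx
            g_data[idx] += inc_value

data = [0] * 8
increment_kernel_simulated(data, 26, block_dim=4, num_blocks=2)
print(data)  # → [26, 26, 26, 26, 26, 26, 26, 26]
```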

Ok, writing this up I think I have one question, which is about the .x property. The book explains that you can use the .x, .y, .z properties to easily split up 2D or 3D data, but also talks about ways to turn 2D or 3D data into a 1D representation. So are the .y, .z properties just “nice” because they allow us to leave 2D data as 2D, or do they actually allow us to do something that re-representing the 2D data as 1D data and just using .x doesn’t?
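My current guess (worth verifying against the book) is that the index arithmetic is the same either way – here's a little Python sketch, with a made-up 4×3 "image", of flattening a 2D index into a 1D one:

```python
# Sketch: any 2D (row, col) index can be flattened to a single 1D index
# with row * width + col, which is what using only .x over flattened
# data amounts to. The 4x3 grid of values here is made up for illustration.
width, height = 4, 3
grid = [[10 * r + c for c in range(width)] for r in range(height)]
flat = [v for row in grid for v in row]

for r in range(height):
    for c in range(width):
        assert flat[r * width + c] == grid[r][c]
print("2D and flattened indexing agree")
```

So as far as I can tell the math is equivalent; whether .y and .z also buy something extra (scheduling or memory locality, say) is exactly the open question.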

Ok, continuing on:

int main(int argc, char *argv[]) {
  int devID;
  cudaDeviceProp deviceProps;

  printf("[%s] - Starting...\n", argv[0]);

We start the main function, set up some variables, and let the user know that we’re starting.


  // This will pick the best possible CUDA capable device
  devID = findCudaDevice(argc, (const char **)argv);

  // get device name
  checkCudaErrors(cudaGetDeviceProperties(&deviceProps, devID));
  printf("CUDA device [%s]\n", deviceProps.name);

Some questions here. What does it mean by “best”? Fortunately, the source for findCudaDevice is available to us. First it checks to see if a device is specified by command line flag, and if not, grabs the device “with highest Gflops/s”.

  int n = 16 * 1024 * 1024;
  int nbytes = n * sizeof(int);
  int value = 26;

  // allocate host memory
  int *a = 0;
  checkCudaErrors(cudaMallocHost((void **)&a, nbytes));
  memset(a, 0, nbytes);

Setting some variables first, but then we allocate some host memory. I was curious about cudaMallocHost. In the other examples I’d seen, host memory was usually created by just using malloc (or simply assumed to already be allocated, in the book). cudaMallocHost creates “pinned” memory, which is locked into RAM and is not allowed to swap. This allows us to use e.g. cudaMemcpy without the performance overhead of constantly checking to make sure that the host memory has not been swapped to disk.

I’m still not used to the C convention of handling errors via macros like checkCudaErrors instead of language constructs like try/catch or if (err != nil). It just feels like an obsolete way of doing error handling that’s easy to forget.

That’s all I had time for this morning, but it’s fun to understand more and more about this as I continue to learn!

CUDA – Three

I ran a CUDA program 🙂

It was a rough experience 🙃

Honestly, getting started with pretty much any programming language involves a lot of banging your head against the toolchain and slowly untangling old tutorials that reference things that don’t exist anymore. This was easier than some Python setups I’ve done before.

I started with a pretty sparse Windows installation. I keep my computers relatively clean and wipe them entirely about once a year, so all I had to start with was VSCode and … that’s about it. I am lucky that I happen to already have a Windows machine (named Maia) with an RTX 2080, which supports CUDA.

I installed MSVC (the Microsoft C++ compiler) and the NVIDIA toolkit.

Then I tried writing some plain C++ (not even CUDA) in VSCode, and I couldn’t get it to compile. I kept getting an error that #include <iostream> was not valid. As I mentioned, I haven’t written C++ in about 10 years, so I knew I was likely missing something. I putzed around installing and poking various things. Eventually I switched out MSVC for MinGW (G++ for Windows), and this allowed me to compile and run my “hello world” C++ code. Hooray!

Now I tried writing a .cu CUDA file. NVIDIA provides an official extension for .cu files, and I had everything installed according to the CUDA quick start guide, but VSCode just did … nothing when I tried to run the .cu file with the C++ CUDA compiler selected. So I went off searching for other things to do.

Eventually I decided to install Visual Studio, which is basically a heavy version of VSCode and I don’t know why they named them the same thing except that giant corporations love to do that for whatever reason.

I got VS running and also downloaded Git (and then GitHub Desktop, since my CLI Git wasn’t reading my SSH keys for whatever reason).

Then I downloaded the CUDA-samples repo from NVIDIA’s GitHub, and it didn’t run – it turns out the CUDA Toolkit version number is hard-coded in two places in the config files, and it was 12.4 while I had version 12.5. But that was a quick fix, fortunately.

Finally, I was able to run one on my graphics card! I still haven’t *written* any CUDA, but I can at least run it if someone else writes it. My hope for tomorrow is to figure out the differences between my non-running project and their running project to put together a plan for actually writing some CUDA from scratch. Or maybe give up and just clone their project as a template!

 

CUDA – Two

I have an art sale coming up in three days, so I’m spending most of my focus time finishing up the inventory for that. But in my spare time between holding the baby and helping my older kid sell lemonade, I’ve started exploring a few of the topics I’m interested in from the previous post.

Convolutions

Something I was reading mentioned convolutions, and I had no idea what that meant, so I tried to find out! I read several posts and articles, but the thing that made convolutions click for me was a video by 3Blue1Brown. The video has intuitive visualizations. Cheers to good technology and math communicators.

Sliding a kernel over data feels intuitive to me, and it looks like one of the cool things about this is that you can do this with extreme parallelism. I’m pretty sure this is covered early on in the textbook, so I’m not going to worry about understanding this completely yet.

It seems like convolutions are important for image processing, especially things like blur and edge detection, but also in being able to do feature detection – it allows us to search for a feature across an entire image, and not just in a specific location in an image.
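To make sure the sliding-window idea stuck, I tried a tiny Python sketch of my own (a toy example, not from the video): a hand-built 1D edge-detection kernel [1, -1], which responds only where neighboring values differ.

```python
# Toy sketch: 1D convolution with a hand-built edge-detection kernel.
# Each output element depends only on its own window, so every element
# could be computed in parallel - the property GPUs exploit.
def convolve1d(signal, kernel):
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [0, 0, 1, 1, 1, 0, 0]
print(convolve1d(signal, [1, -1]))  # → [0, -1, 0, 0, 1, 0]: nonzero at the edges
```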

One thing I don’t understand yet is how to build a convolution kernel for complicated feature detection. One of the articles I read mentioned that you could use feature-detection convolution for something like eyes, which I assume requires a complicated kernel that’s trained with ML techniques. But I don’t quite understand what that kernel would look like or how you would build it.

Parallel Processing

I started reading Programming Massively Parallel Processors, and so far it’s just been the introduction. I did read it out loud to my newborn, so hopefully he’ll be a machine learning expert by the time he’s one.

Topics covered so far have been the idea of massive parallelism, the difference between CPU and GPU, and a formal definition of “speedup”.

I do like that the book is focused on parallel programming and not ML. It allows me to focus on just that one topic without needing to learn several other difficult concepts at the same time. I peeked ahead and saw a chapter on massively parallel radix sort, and the idea intrigues me.

Differentiation and Gradient Descent

Again, 3B1B had the best video on this topic that I could find. The key new idea here was that you can encode the weights of a neural network as an enormous vector, and then map that vector to a fitness score via a function. Finding the minimum of this function gives us the best neural network for whatever fitness evaluation method we’ve chosen. It hurts my brain a bit to think in that many dimensions, but I just need to get used to that if I’m going to work with ML. I don’t fully understand what differentiation means in this context, but I’m starting to get some of the general concept (we can see a “good direction” to move in).

I haven’t worked with gradients since Calc III in college, which was over a decade ago, but I’ve done it once and I can do it again 💪. It also looks like I need to understand the idea of total derivative versus partial derivative, which feels vaguely familiar.
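As a sanity check on the “good direction” idea, here's a one-dimensional toy in Python (my own example: real networks do this over a vector of millions of weights, but the step rule has the same shape):

```python
# Toy gradient descent: minimize f(w) = (w - 3)^2, whose derivative
# f'(w) = 2 * (w - 3) points "uphill"; we step the other way.
def gradient_descent(start, learning_rate=0.1, steps=100):
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3)
        w -= learning_rate * grad
    return w

print(round(gradient_descent(start=0.0), 4))  # → 3.0, the minimum
```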

Moving Forward

Once the art sale is over, I’ll hopefully have more focus time for this 🙂 For now, it’ll be bits and pieces here and there. For learning CUDA in particular, it looks like working through the textbook is going to be my best bet, so I’m going to focus some energy there.

From Grand Rapids,
Erty

 

CUDA – One

First, some backstory. I was laid off from Google in January and I’ve taken the last six months off, mostly working on art glass and taking care of my kids (one of whom was just born in April, and is sleeping on my chest as I write this). I’m slowly starting to look for work again, with a target start date of early September 2024. If you’re hiring or know people who are, please check out my résumé.

A friend of mine recently let me know about a really interesting job opportunity, which will require working with code written in (with?) CUDA. The job is ML related, so I’ll be focusing my learning in that direction.

I don’t know anything about CUDA. Time to learn! And why not blog about the process as I go along.

First step: come up with some resources to help me learn. I googled something like “learn cuda” and found this Reddit post on the /r/MachineLearning subreddit. It looks like I’ll probably be learning a couple of related topics as I go through this journey:

 

CUDA

This is the goal. It looks like CUDA is a language + toolkit for writing massively parallel programs on graphics cards that aren’t necessarily for graphics. Basically, making the GPU compute whatever we want. If we use this for, say, matrix multiplications, we can accelerate training of ML models.

Python and C++

C++? I haven’t written C++ since college a decade ago. I think I remember some of it, but I’ve always been intimidated by the size of the language, the number of “correct” ways to write it, and the amount of magic introduced by macros. I also don’t like the whole .h / .cc thing, but I suppose I’ll just have to get used to that.

I’m pretty good at Python, having written several tens of thousands of lines of it at Google, so I’m not super worried about that.

PyTorch or TensorFlow

Some folks on the Reddit post linked above recommend a specific tutorial on the PyTorch website, which looks interesting. It seems like PyTorch is an ML library written in Python (based on Torch, which was written in Lua).

PyTorch is Meta’s (now under the Linux Foundation); TensorFlow is Google’s. Both use C++, Python, and CUDA.

Matrix Math

In college, I was only briefly introduced to matrix math, and most of that exposure was a graphics course that I audited. Based on my brief reading about all of this, it seems like the major advantage of using graphics cards to train ML models is that they can do matrix math really, really fast. It’s up to me to brush up on this while I explore the other things. I don’t yet have a specific study plan for this.
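My mental model so far, sketched in plain Python (numbers made up): every cell of a matrix product is an independent dot product, which is presumably what the GPU computes in parallel.

```python
# Sketch: naive matrix multiply. Each output cell depends on no other
# cell, so all cells can be computed in parallel - which (as I
# understand it) is what GPUs are great at.
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[r][k] * b[k][c] for k in range(inner)) for c in range(cols)]
        for r in range(rows)
    ]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # → [[19, 22], [43, 50]]
```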

Parallelism

According to redditor surge_cell in that previously linked thread, “There are three basic concepts – thread synchronization, shared memory and memory coalescing which CUDA coder should know in and out of [sic]”. I’ve done some work with threading and parallelism, but not recently. Most of my work at Google was asynchronous, but I didn’t have to manage the threading and coalescing myself (e.g. async in JS).

Resources

Ok – so, what am I actually going to do?

I browsed some YouTube videos, but the ones that I’ve watched so far have been pretty high level. It looks like NVIDIA has some CUDA training videos … from 12 years ago. I’m sure the language is quite different now. I also want deeper training than free YouTube videos will likely provide, so I need to identify resources that will give me a deep knowledge of the architecture, languages, and toolkits.

First, I’ll try to do the Custom CUDA extensions for PyTorch tutorial. See how far I can get and make notes of what I get stuck on.

Second, one of the Reddit posts recommended a book called Programming Massively Parallel Processors by Hwu, Kirk, and Hajj, so I picked up a copy (4th ed.). I’m going to start working through that. It looks like there are exercises, so I’ll be able to actually practice what I’m learning, which will be fun.

Finally, I’ll try implementing my own text prediction model in ML. I know you can do this cheaply by using something like 🤗 (aka HuggingFace), but the point here is to learn CUDA, and using someone else’s pretrained model is not going to teach me CUDA. I’m optimizing for learning, not for accurate or powerful models.

Questions

There’s a lot I don’t know, but here are my immediate questions.

  1. I have an NVIDIA card in my Windows computer, but I don’t have a toolchain set up to write CUDA code for it. I’m also not used to developing C++ on Windows, so I’ll need to figure out how to get that running as well. I have a feeling this won’t be particularly tricky, it’ll just take time.
  2. I have a lot of unknown unknowns about CUDA – I’m not even sure what I don’t know about it. I think I’ll have more questions here as I get into the materials and textbooks.
  3. It seems like there are a few parts of ML with various difficulties. If you use a pretrained model, it seems pretty trivial (~20 lines of Python) to make it do text prediction or what have you. But training the models is really, really difficult and involves getting a lot of training data. Or, perhaps not difficult, but expensive and time consuming. Designing the ML pipeline seems moderately difficult, and is probably where I’ll spend most of my time. But I need to understand more about this.

Thatā€™s it for Day One

If you’re reading this and you see something I’ve done wrong already, or know of a resource that helped you learn the tools that I’m talking about here, please do reach out!

From Grand Rapids,
Erty

Cabal Package Installation Woes

tl;dr: Nuke ~/.ghc and then run cabal install --lib (every lib you need)

Edit: Since writing this post, there’s been some movement on the cabal bug, and it seems like there’s soon going to be a fix!

I’m trying to write a simple webserver based on Warp, but I ran into an issue with a hidden package. Here’s my imports in Server.hs:

{-# LANGUAGE OverloadedStrings #-}
import Network.Wai (Application, Response, rawPathInfo, responseFile, responseLBS)
import Network.HTTP.Types (status200, status404)
import Network.Wai.Handler.Warp (run)

And here’s the relevant part of my .cabal file:

executable server
  main-is: Server.hs
  build-depends:
    base >=4.12 && <4.13
    , wai
    , warp
  default-language:    Haskell2010

Note that http-types is missing, but we’ll come back to that at the end.

I’ll start by saying that I don’t fully understand the difference between cabal and stack, and at the beginning I decided to just use cabal and not worry about stack.

I ran cabal install wai warp and got the following error:

Resolving dependencies...
cabal: Cannot build the executables in the package wai because it does not
contain any executables. Check the .cabal file for the package and make sure
that it properly declares the components that you expect.
Cannot build the executables in the package warp because it does not contain
any executables. Check the .cabal file for the package and make sure that it
properly declares the components that you expect.

It turns out the solution to this is to append --lib and run cabal install --lib wai warp. (I wish it would say that in the warning though.)

I ran cabal install wai warp --lib and tried running Server.hs by pressing command+b in Sublime Text 3, but I ran into this error:

Could not load module ‘Network.HTTP.Types’
    It is a member of the hidden package ‘http-types-0.12.3’.
    You can run ‘:set -package http-types’ to expose it.
    (Note: this unloads all the modules in the current scope.)
    Use -v to see a list of the files searched for.

I think what’s happening here is that http-types is installed, but not explicitly. Haskell wants me to definitely say that I want it, so I try running cabal install --lib http-types.

This, however, runs into a very frustrating error. Apparently the process library is required at two different versions in two different packages – despite already being happily installed as an indirect dependency:

cabal: Could not resolve dependencies:
[__0] trying: base-4.12.0.0/installed-4.1... (user goal)
[__1] trying: ghc-8.6.5/installed-8.6... (user goal)
[__2] next goal: process (user goal)
[__2] rejecting: process-1.6.8.2, process-1.6.8.1, process-1.6.8.0 (constraint
from user target requires ==1.6.7.0)
[__2] rejecting: process-1.6.7.0 (conflict: ghc =>
process==1.6.5.0/installed-1.6...)
[__2] rejecting: process-1.6.6.0, process-1.6.5.1,
process-1.6.5.0/installed-1.6..., process-1.6.5.0, process-1.6.4.0,
process-1.6.3.0, process-1.6.2.0, process-1.6.1.0, process-1.6.0.0,
process-1.5.0.0, process-1.4.3.0, process-1.4.2.0, process-1.4.1.0,
process-1.4.0.0, process-1.3.0.0, process-1.2.3.0, process-1.2.2.0,
process-1.2.1.0, process-1.2.0.0, process-1.1.0.2, process-1.1.0.1,
process-1.1.0.0, process-1.0.1.5, process-1.0.1.4, process-1.0.1.3,
process-1.0.1.2, process-1.0.1.1, process-1.0.0.0 (constraint from user target
requires ==1.6.7.0)
[__2] fail (backjumping, conflict set: ghc, process)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: process, base, ghc

The solution to this is very frustrating, because even rolling back my git repo to the last known good commit didn’t fix it – it’s a global system problem (ironic, for Haskell, which is so demanding of “pureness” in the language). I deleted ~/.ghc and ran the install again:

rm -rf ~/.ghc && cabal install --lib wai warp http-types

And it worked! My server runs šŸ™‚

The problem is now, I want to build some tests, so I run cabal install --lib hspec and I run into the same “could not resolve dependencies” as above!

Hm, let’s see if just a rm -rf ~/.ghc && cabal install will fix it, if I declare hspec in build-depends in my .cabal file? I get the following error:

cabal: Path '/Users/erty/.cabal/bin/server' already exists. Use
--overwrite-policy=always to overwrite.

So let’s try the suggestion and run with --overwrite-policy=always. Infuriatingly, this build succeeds, but when I try to actually run Server.hs (by pressing cmd+b in Sublime Text – perhaps that’s missing a flag or something? I wonder if cabal install builds a binary but fails to install the libraries), it fails to find any of my modules:

Could not find module ‘Network.Wai’
Could not find module ‘Network.HTTP.Types’
Could not find module ‘Network.Wai.Handler.Warp’

Let’s try rm -rf ~/.ghc && cabal install --lib, since adding --lib worked before. First, I also added http-types to my build-depends in the .cabal file. Nope:

Resolving dependencies...
cabal: Cannot build the libraries in the package crossword-hs because it does
not contain any libraries. Check the .cabal file for the package and make sure
that it properly declares the components that you expect.

But! We were able to get it working by listing all of the dependencies explicitly during the install phase. So let’s try that and run rm -rf ~/.ghc && cabal install --lib wai warp http-types hspec:

Works! The problem is that I have to remove ~/.ghc and manually list out all of my deps every time I want to install something, but at least I can move forward for now.

I also added http-types to my cabal file, but it didn’t seem to really matter for running in sublime text, as long as I’d installed it via cabal install --lib.

I would love to hear from any more experienced Haskellers out there if I’m not understanding something about cabal. Specifically, coming from Node, I feel like cabal install (or at least cabal install --lib) should “just work” and install all of the deps I’ve listed in the .cabal file.

Hopefully this writeup saves someone else time 🙂

Radish Cache

After coming across a (now deleted) answer on StackOverflow, I took some time to find instances of “redis” (a popular caching program) misspelled as “radish”. I think autocorrect is likely the culprit.

I’m posting these here because I think that this is a wonderfully innocent error and not to shame the people involved. Imagining someone refreshing their cache of small red vegetables brings joy to my heart and I hope it does to yours as well.

The Original

This came up in an edit queue, but I didn’t have the heart to edit it. I now suggest this as the solution to most of our technical problems at work, much to my coworkers’ annoyance.

From Harvard

This one is in an article published by Harvard, so you know it’s legit.

LinkedIn

This person has radish cache on their LinkedIn profile, so you know they’re an expert. (Last item)

Please go to my LinkedIn and recommend me for radish cache. I’d like to add you to my persimmon network.

Speeding up Magento

Vegetables are part of a healthy diet. Varnish, not so much.

IRL

Unfortunately, the only IRL radish cache has been removed and no longer exists.

Conclusion

Remember, if you have a problem, flush your radish cache!

a e s t h e t i c 💫 g e n e r a t o r

Update: the a e s t h e t i c g e n e r a t o r is now online.

 

Ryan McVerry, Per-Andre Stromhaug, and I got together this weekend to do a mini-hackathon. We’d been inspired by a Chicago-based artist on Tumblr named Galactic Castle. Specifically, this image:

Our goal was to write Python to generate images that looked like Galactic Castle’s. We used the Pillow Python library for the most part, messing around with 256×256 arrays of integers, limited to 10 or so colors.

We experimented with a few different rendering techniques, eventually settling on a layered technique. I’ll start with some examples of the finished product, and then a quick walk through some of the fun images we generated along the way. You can run it yourself at https://aesthetic-landscape.herokuapp.com/.

Here are a few shots from the process along the way.

This is one of the first renders we saved. You can see most of the elements coming together, and the dithering in the sky is already in place.

Experimenting with color palettes:

I messed up the color palette:

Getting reflections working. At this point I was doing two things that I eventually stopped doing: reflecting the actual colors (instead, we just use a single color to represent any reflection) and at this point we weren’t doing any sort of layering, so the reflections were crude and unaware of what they were reflecting. In the latest version we use a different algorithm for things above the horizon (mountains, moon) and things in the water (rocks/land).

Better reflections, added a moon.

Re-worked the rocks code. We were originally using polygons that were then filled by PIL. The new code instead generates an enormous int buffer and we manually fill in from the edge. The rocks at the base of the land spits are just handled by keeping track of a number that grows and shrinks, and switching colors when we reach that threshold.

We were originally working at 512×512 (scaling up x4 for the final image), but eventually realized that Galactic Castle works at about 150×150, so we scaled down. The resulting pixelation is much more pleasing. Added trees and improved the mountain cross-hatching.

I hope you enjoy this! You can see it for yourself at https://erty.me/aesthetic.

In 2024, I rewrote it in TypeScript so that it runs fully in the browser (instead of a thin frontend on a Python backend). Code here 🙂

We Built an Arcade Cabinet in a Weekend!

We built an arcade cabinet! In 72 hours!

The team:

Project Lead
ERTY SEIDOHL
Software Engineers
RYAN MCVERRY
MAX FELDKAMP
Hardware Engineer
BEN GOODING
Music and Sound
EVAN CONWAY
Cabinet Construction
ERIC VAN DER HEIDE
MATT GOLON
Cabinet Art
HALEY WHITE-BALLOWE

Finished image first 🙂

We had the idea to build an arcade cabinet several months ago, and by the time Ludum Dare 38 rolled around, we had the time and space to make it happen! We met a few times beforehand to test out ideas and hardware, and parts started arriving in early April:

We decided to use the LÖVE framework on Raspberry Pi 2-Bs, via the PiLove raspbian image.

Ideas started flying on Friday at 7pm Mountain, though, when we all got together to create the game. Sadly, our game idea about slipper and flipper the penguins didn’t make it to the final round.

We quickly began to mock out the cabinet design, and started building the game!

Our first prototypes were… okay 🙂

But things began to come together…

The cabinet really began to come together! We thrifted the TV for about $30. The speakers were old computer speakers we had lying around, and the coin door we purchased on Amazon for about $40.

More construction pics:

Haley (our artist) couldn’t join us until day 3, but she did an amazing job!

The yellow paint was a little… weird. Ah well :\

Finally, we had the whole thing rigged up! Time to plug everything in and turn it on!

A few touchups on the title…

And, about 15 minutes before the deadline, we had a functioning, working arcade game!

Here’s a quick video of the gameplay:

 

 
YAY!

We ran into a major issue right at the end, where (to get technical for a moment) our audio would cut out from time to time and just stop working. It seemed like an issue with pulseaudio. The way we solved it was to kill pulseaudio on startup. And for some reason that worked? So I have no idea what’s playing the sound, but whatever it is crashes less.
 

We also had a wonderful “blur” effect that pulsed in time with the music, but it turned out to be too intensive for the raspberry pi, so we had to turn it off in the end. We tried doing it with shaders (way too intensive) and we tried doing it with just drawing translucent, larger versions of everything (just a little too intensive) and finally turned it off. Ludum Dare definitely requires you to just cut things if they’re not going to work, no matter how much time you put into them!

I’m going to attempt to get a copy of the game working so you can play it online by Friday, but no promises!

Dictionaries and Word Lists for Programmers

I love to play with words, and I especially love to play with words programmatically. I’ve written three small apps (so far) which use some form of a dictionary to create readable, humorous text:

I’ve had some people ask, so here are some great resources that I’ve found while building these apps.

Dictionaries

/usr/share/dict/words (~235k words)
Available on any *nix system, this word list is a local way to check for words using a simple grep. You can also read the file in as long as you have permission to do so. Won’t work well if you’re trying to write something for the internet or Windows.
Most of these dictionaries are licensed very freely, but you should check on your own system. Versions of this are available online, e.g. the FreeBSD version at https://svnweb.freebsd.org/csrg/share/dict/ (click “words”)
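For example, here's a minimal Python sketch of reading the file in (the path is the usual *nix location, but it may differ on your system):

```python
# Sketch: load a word list into a set for fast membership checks.
# parse_words is split out so it works on any newline-separated list;
# the default path is the common *nix location but is not guaranteed.
def parse_words(text):
    return {line.strip().lower() for line in text.splitlines() if line.strip()}

def load_words(path="/usr/share/dict/words"):
    with open(path) as f:
        return parse_words(f.read())

print(sorted(parse_words("Apple\nradish\n")))  # → ['apple', 'radish']
```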

GCIDE (~185k words)
http://www.ibiblio.org/webster/ or http://gcide.gnu.org.ua/download
This dictionary contains words and definitions. Very useful if you actually want to look up the words you are using. Sources for the definitions are available as well. There are two versions – GCIDE, which comes in a strange format and needs its own reader software, and GCIDE-XML. Licensed under GNU.

SCOWL and friends (variable word count)
http://wordlist.aspell.net/
A very complete set of wordlists, used for the aspell spell checker. My favorite part is the customizable interface where you can create your own custom dictionary. Many links and different dictionaries are available on this page, including some with part-of-speech and/or inflection data. Be aware: many versions of SCOWL contain swears and racial slurs.
Variable licensure, but all are released for private or commercial use as long as you maintain a version of the license.

CMU Pronouncing Dictionary (~134k words)
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Contains not only words but their phonemes, meaning this is a great dictionary for text-to-speech, rhyming, and syllable counting. There are CMUDict libraries for node/browser, just node, Python, and many other languages (send links please).
The file is copyright Carnegie Mellon, but is unrestricted for personal or commercial use.
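As a quick illustration of why the phonemes are handy, here's a Python sketch of syllable counting: in CMUdict's format, vowel phonemes carry a trailing stress digit (0, 1, or 2), so counting digits counts syllables. (The two sample pronunciations below are typed from memory in that format, so double-check them against the dictionary itself.)

```python
# Sketch: count syllables in a CMUdict-style pronunciation string.
# Vowel phonemes end in a stress digit (0/1/2); consonants don't.
def syllables(pronunciation):
    return sum(1 for phone in pronunciation.split() if phone[-1].isdigit())

print(syllables("K AH0 M P Y UW1 T ER0"))  # "computer" → 3
print(syllables("R AE1 D IH0 SH"))         # "radish" → 2
```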

Corpora (Lists of ~1k words)
https://github.com/dariusk/corpora
A really neat set of lists, broken down by category (e.g. /data/foods/beer_styles.json). A great starting resource for small projects that don’t need an extensive dictionary of the English language. Licensed under CC0 (no copyright).

Wordnik Developer (Web API)
http://developer.wordnik.com/
A powerful web API. From the site: “request definitions, example sentences, spelling suggestions, related words like synonyms and antonyms, phrases containing a given word, word autocompletion, random words, words of the day, and much more.”
Free 15k calls per hour, licensed for anything that isn’t a direct clone of Wordnik itself.

(Please, send any more dictionaries you know of my way and I’ll add them to the list!)

Libraries

FastTag / jsPos
https://github.com/mark-watson/fasttag_v2 (Java) and https://code.google.com/p/jspos/ (JS)
Java and JavaScript libraries to tag parts of speech in words. Very handy if you’re doing any sort of lexical generation.
Licensed under LGPL3 or Apache 2 licenses

(Please, send more libraries my way if you know of them!)

People / Blogs

Peter Norvig
Probably my greatest source of inspiration on this front is Peter Norvig’s How to Write a Spelling Corrector (Python). He shows that you don’t need any sort of fancy tooling or arcane knowledge to write something that at first seems complex – just don’t be afraid of making the computer do a lot of work for you. That’s what computers do – they do work really, really fast. (See, for example, this scrabble solver.)

Allison Parrish
Allison plays with words in amazing ways. She is the brains behind @everyword. Her website and research are full of great inspiration for playing with words and experimenting with language.

(please, send anyone doing interesting things with words my way and I’ll add them to the list!)