Startup idea: Swear words API

27.02.2022

How to (maybe - prove it yourself) make money off corporate family-friendliness.

My Facebook feed consists of only two things: job postings and ads. Today’s piece isn’t about careers, so you can guess it was an ad that made me click on that nice, personalizable, Swiss-made watch.

After giving my timepiece a unique graphic design (try it yourself - much fun!), the final touch was to add some custom text to be engraved on the watch. The first word that came to mind?

kurwa

No good! This most sophisticated profanity was rejected by the online creator. A quick substitution with Greek Small Letter Alpha, however, was enough to defeat the validation. Looking almost the same, sounding exactly the same, and still vulgar - kurwα was my not-even-script-kiddie-level workaround.

Apparently, the check for inappropriate language is based on a rather big regex of known bad words. You’ll get the idea from the picture below:

Sample of forbidden words list defined on swatch.com

I’m really proud of the Polish ones taking up a significant part of the whole! Diving further into the list is as educational as it is entertaining:

Imagine how ridiculous it must have been to propose, discuss and implement this validation. “Why is Mike not at the computer?” “He’s off to the library, in search of Bulgarian obscenities.” “Well, tell him later I created a new Jira ticket - add »buttpirate« to the list.” I haven’t mentioned testing, because the list clearly wasn’t checked by anyone aware of the Zero Width Space, or of the fact that there are several characters looking just like “o”.

It’s not how you’re supposed to handle censorship in 2022! The downsides of the current approach are:

The above case got me thinking: maybe there’s a market for an online service that could be described as a “Swear words API”. Customization is hot in e-commerce, but it must be kept polite - no company wants to allow designs/wording that would undermine their brand.

I’m giving away the idea for free, along with the following API draft, simply because I don’t know how to implement it. My best guess is: render whatever the client sent, and let some low-paid human native speaker decide whether it’s offensive. Rather unacceptable response time.

paths:
  /rate:
    get:
      produces:
      - "application/json"
      parameters:
      - in: "query"
        name: "term"
        type: "string"
        required: true
      responses:
        "200":
          description: "OK"
          schema:
            type: "object"
            properties:
              score:
                description: "Severity of the term, on a scale from 0 (safe) to 1 (definitely a swear)."
                type: "number"
                minimum: 0
                maximum: 1
              isOriginalStopWord:
                description: "Whether the term comes from the lame, pre-API regex."
                type: "boolean"
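
For illustration only, this is roughly what a handler behind /rate could return so that it matches the draft above. It is a sketch under loud assumptions: score_term is a placeholder (the actual scoring is exactly the part I don’t know how to build), and LEGACY_STOP_WORDS is a made-up two-word stand-in for the old regex.

```python
import re

# Hypothetical stand-in for the "lame, pre-API regex".
LEGACY_STOP_WORDS = re.compile(r"^(kurwa|buttpirate)$", re.IGNORECASE)


def score_term(term: str) -> float:
    """Placeholder scorer: 1.0 for legacy stop words, 0.0 for everything else.

    A real service would plug a human reviewer or a model in here.
    """
    return 1.0 if LEGACY_STOP_WORDS.match(term) else 0.0


def rate(term: str) -> dict:
    """Build the response body for GET /rate?term=... as drafted in the schema."""
    if not term:
        # 'term' is marked as required in the draft.
        raise ValueError("query parameter 'term' is required")
    # Clamp to the 0..1 range declared by the schema.
    score = min(1.0, max(0.0, score_term(term)))
    return {
        "score": score,
        "isOriginalStopWord": LEGACY_STOP_WORDS.match(term) is not None,
    }
```

Calling rate("kurwa") gives a score of 1.0 with isOriginalStopWord set to true, while rate("kurwα") slips through with 0.0, which is precisely the gap a real implementation would have to close.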

In the end, I didn’t buy. If you stumbled upon this article and got rich by turning it into a working solution, please consider ordering me that watch. The decoration isn’t going to be as important as the inscription, and my word of choice is “SODOMIZER”. Thanks!