ProfanityFilter Documentation¶
ProfanityFilter is a universal Python library for detecting and filtering profanity in text.
Quick Start¶
from profanityfilter import ProfanityFilter
pf = ProfanityFilter()
pf.censor("That's bullshit!")
> "That's ********!"
pf.set_censor("@")
pf.censor("That's bullshit!")
> "That's @@@@@@@@!"
pf.define_words(["icecream", "choco"])
pf.censor("I love icecream and choco!")
> "I love ******** and *****!"
pf.is_clean("That's awesome!")
> True
pf.is_clean("That's bullshit!")
> False
pf.is_profane("Profane shit is not good")
> True
pf_custom = ProfanityFilter(custom_censor_list=["chocolate", "orange"])
pf_custom.censor("Fuck orange chocolates")
> "Fuck ****** **********"
pf_extended = ProfanityFilter(extra_censor_list=["chocolate", "orange"])
pf_extended.censor("Fuck orange chocolates")
> "**** ****** **********"
ProfanityFilter also comes with a simple command-line utility. Run profanityfilter -h
for more details.
Installing ProfanityFilter¶
You can install profanityfilter using pip.
> pip install profanityfilter
Using a Custom List¶
You can use a custom list of bad words in profanityfilter in either of the following two ways:
During instantiation:
from profanityfilter import ProfanityFilter
with open("custom_words.txt", "r") as f:
    custom_list = [l.replace("\n", "") for l in f.readlines()]
pf_custom = ProfanityFilter(custom_censor_list = custom_list)
After instantiation:
from profanityfilter import ProfanityFilter
pf_custom = ProfanityFilter()
with open("custom_words.txt", "r") as f:
    custom_list = [l.replace("\n", "") for l in f.readlines()]
pf_custom.define_words(custom_list)
Note
Using a custom censor list means that profanityfilter will not use the default censor list. If you’re looking for a way to add bad words to the default list instead, see Adding Bad Words.
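When the word list comes from a hand-edited file, blank lines and stray whitespace can end up as entries in the list. The following is a minimal sketch of a loader that cleans the list first. It assumes a file with one word per line, and load_words is a hypothetical helper written for illustration, not part of profanityfilter.

```python
from pathlib import Path


def load_words(path):
    """Read one word per line, dropping blanks, padding, and duplicates.

    Hypothetical helper for illustration; not part of profanityfilter.
    """
    words = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if word and word not in words:  # skip empty lines and repeated words
            words.append(word)
    return words
```

The returned list can be passed wherever the examples above use custom_list, for instance ProfanityFilter(custom_censor_list=load_words("custom_words.txt")).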
Adding Bad Words¶
You can use an additional list of words in conjunction with the default words in profanityfilter.
During instantiation:
from profanityfilter import ProfanityFilter
with open("more_words.txt", "r") as f:
    more_words = [l.replace("\n", "") for l in f.readlines()]
pf = ProfanityFilter(extra_censor_list = more_words)
After instantiation:
from profanityfilter import ProfanityFilter
pf = ProfanityFilter()
with open("more_words.txt", "r") as f:
    more_words = [l.replace("\n", "") for l in f.readlines()]
pf.append_words(more_words)
Word Boundaries¶
By default, profanityfilter applies word boundaries to bad words during censoring. As a result, a bad word is censored only when it appears as a whole word, and bad words embedded inside other words are ignored.
Example:
from profanityfilter import ProfanityFilter
pf = ProfanityFilter()
pf.censor("My username is fuckyouusername@bitch.com")
> "My username is fuckyouusername@*****.com"
To change this behaviour, pass the no_word_boundaries keyword to ProfanityFilter, which tells it to also detect bad words inside other words.
from profanityfilter import ProfanityFilter
pf = ProfanityFilter(no_word_boundaries = True)
pf.censor("My username is fuckyouusername@bitch.com")
> "My username is ****youusername@*****.com"
Note
Without word boundaries there’s a risk of censoring harmless words. For example, fun is censored to **n when fu is in the censor list. To avoid this, you can either build custom censor lists or use separate ProfanityFilter instances, with and without word boundaries, for different kinds of input.
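The difference between the two modes can be illustrated with Python's standard re module. This is a sketch of the general word-boundary technique only: censor_sketch and its parameters are hypothetical names, and the code is not claimed to be profanityfilter's actual implementation.

```python
import re


def censor_sketch(text, words, no_word_boundaries=False, censor_char="*"):
    """Replace each listed word with censor characters of the same length.

    Illustrative only; not profanityfilter's own code.
    """
    for word in words:
        pattern = re.escape(word)
        if not no_word_boundaries:
            # \b matches only at the edge of a word, so "fu" inside "fun" is skipped
            pattern = r"\b" + pattern + r"\b"
        text = re.sub(pattern, censor_char * len(word), text, flags=re.IGNORECASE)
    return text


print(censor_sketch("That was fun", ["fu"]))                           # That was fun
print(censor_sketch("That was fun", ["fu"], no_word_boundaries=True))  # That was **n
```

With boundaries enabled the short entry fu never matches inside fun, which is why the default is the safer choice for general text.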