Visualizing Part-of-Speech Tags with NLTK and SpaCy

In this tutorial, we will develop a function to visualize part-of-speech (POS) tags with NLTK and SpaCy.

The resulting function will turn a plain sentence into a color-highlighted rendering of its POS tags.

Motivation

POS tagging is a technique used in Natural Language Processing. It categorizes the tokens in a text as nouns, verbs, adjectives, and so on. In Python, you can use the NLTK library for this purpose.

import nltk
from nltk import word_tokenize

text = "This is one simple example."
tokens = word_tokenize(text)
tags = nltk.pos_tag(tokens, tagset = "universal")

In the above code snippet, the example text = "This is one simple example." is first tokenized (This, is, one, simple, example, and .) with the word_tokenize() function. Then the tokens are POS tagged with the function pos_tag(). For this example, we’ll use tagset = "universal" because it provides coarser, more general tags than the default tagset, which is more fine-grained.

Below you can see the POS-tagged tokens of the example sentence.

POS-tagged example sentence (Image by the author)

SpaCy comes with a visualizer called displaCy. For example, you can render the POS tags and syntactic dependencies with style = "dep" as follows.

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.render(doc, style = "dep")

Visualized dependencies of example sentence with displaCy (Image by the author)

A more colorful option is to highlight named entities and their labels with style = "ent".

displacy.render(doc, style = "ent")

Visualized entities of example sentence with displaCy (Image by the author)

Unfortunately, the style = "dep" option does not use any color to visualize the POS tags, and the style = "ent" option does not visualize POS tags at all. Therefore, we’ll develop a function to highlight the POS tags similarly to the entity highlighting of SpaCy with the help of NLTK.

Developing the Visualization Function

In this section, we will develop the visualization function in two simple steps:

Customizing the Display Options
Filling the Entity Dictionary

Customizing the Display Options

Although displaCy’s named entity highlighting does not highlight POS tags out-of-the-box, you can customize what it should highlight.

You can also use displaCy to manually render data. […] If you set manual=True on either render() or serve(), you can pass in data in displaCy’s format as a dictionary (instead of Doc objects). – [2]

from spacy import displacy

displacy.render(doc,
                style = "ent",
                options = options,
                manual = True)

The entity visualizer lets you customize the following options:
ents: Entity types to highlight.
colors: Color overrides. Entity types should be mapped to color names or values. – [2]

In the case of this example, the entity types to highlight will be the different POS tags. We will use tagset = "universal". This tag set consists of the following 12 coarse tags: [1]

VERB – verbs (all tenses and modes)
NOUN – nouns (common and proper)
PRON – pronouns
ADJ – adjectives
ADV – adverbs
ADP – adpositions (prepositions and postpositions)
CONJ – conjunctions
DET – determiners
NUM – cardinal numbers
PRT – particles or other function words
X – other: foreign words, typos, abbreviations
. – punctuation

We will use all POS tags except “X” and “.”, so the ents and colors options look like this:

pos_tags = ["PRON", "VERB", "NOUN", "ADJ", "ADP", "ADV", "CONJ", "DET", "NUM", "PRT"]

colors = {"PRON": "blueviolet",
          "VERB": "lightpink",
          "NOUN": "turquoise",
          "ADJ" : "lime",
          "ADP" : "khaki",
          "ADV" : "orange",
          "CONJ" : "cornflowerblue",
          "DET" : "forestgreen",
          "NUM" : "salmon",
          "PRT" : "yellow"}

options = {"ents": pos_tags, "colors": colors}

In this section, you can decide which tags you want to use and customize the colors.
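As an illustration of that customization, the following sketch builds the options for just two tags. The helper name make_options is hypothetical (it is not part of spaCy); it only assembles a dictionary with the ents and colors keys described above. The hex value shows that colors can be given as values as well as names.

```python
def make_options(tag_colors):
    """Build an options dict with "ents" and "colors" from a {tag: color} mapping.

    Hypothetical helper for illustration; not part of spaCy.
    """
    return {"ents": list(tag_colors), "colors": dict(tag_colors)}


# Highlight only nouns and verbs; colors may be names or values such as hex codes.
options = make_options({"NOUN": "turquoise", "VERB": "#ffb6c1"})
print(options)
```

Tags that are missing from ents are simply left unhighlighted, so a shorter mapping gives a less busy picture.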

Filling the Entity Dictionary

Next, we need to define the doc.

doc = {"text" : text, "ents" : ents}

While "text" is simply the text we want to visualize, "ents" is a list of dictionaries, one for each entity to highlight.

For every entity, we need to define the start and end index in the text. Also, we need to define the entity’s label, which is the POS tag in our case.
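To make the format concrete, here is a hand-written sketch for the example sentence that highlights only two words. The offsets and labels are typed in by hand purely for illustration.

```python
text = "This is one simple example."

# Each entity is a dictionary with character offsets into `text` and a label.
# "end" is exclusive, so text[start:end] yields the highlighted substring.
ents = [
    {"start": 12, "end": 18, "label": "ADJ"},   # "simple"
    {"start": 19, "end": 26, "label": "NOUN"},  # "example"
]
doc = {"text": text, "ents": ents}

for ent in doc["ents"]:
    print(text[ent["start"]:ent["end"]], ent["label"])
# simple ADJ
# example NOUN
```

Not every token has to appear in "ents"; anything without an entry is rendered as plain text.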

Let’s begin with tokenizing the text and POS tagging the tokens. In contrast to the code snippet in the “Motivation” section, we will use the TreebankWordTokenizer instead of the word_tokenize() function. The reason is that the TreebankWordTokenizer can also return the character span of each token, which we’ll need in a minute.

import nltk
from nltk.tokenize import TreebankWordTokenizer as twt

# Tokenize text and pos tag each token
tokens = twt().tokenize(text)
tags = nltk.pos_tag(tokens, tagset = "universal")

The POS-tagged tokens in tags look like this:

[('This', 'DET'),  
 ('is', 'VERB'),  
 ('one', 'NUM'),  
 ('simple', 'ADJ'),  
 ('example', 'NOUN'),  
 ('.', '.')]

As mentioned above, the TreebankWordTokenizer offers a method, span_tokenize(), to get the span of each token, which we need for the "ents" list.

# Get start and end index (span) for each token  
span_generator = twt().span_tokenize(text)  
spans = [span for span in span_generator]

The spans look like this:

# text = "This is one simple example."  
[(0, 4), (5, 7), (8, 11), (12, 18), (19, 26), (26, 27)]
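
A quick way to check these spans is to slice the text with them. The sketch below hard-codes the spans shown above, so it runs without NLTK, and recovers the tokens. Note that the period's span (26, 27) starts exactly where "example" ends, because no whitespace separates them.

```python
text = "This is one simple example."
spans = [(0, 4), (5, 7), (8, 11), (12, 18), (19, 26), (26, 27)]

# Each span is (start, end) with an exclusive end, so slicing recovers the token.
tokens = [text[start:end] for start, end in spans]
print(tokens)
# ['This', 'is', 'one', 'simple', 'example', '.']
```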

Now that we have the tags and the spans, we can fill the "ents" list.

# Build a list with one dictionary (start index, end index, POS tag) per token
ents = []
for tag, span in zip(tags, spans):
    if tag[1] in pos_tags:
        ents.append({"start" : span[0],
                     "end" : span[1],
                     "label" : tag[1] })

And that’s it!

Results and Conclusion

In this tutorial, we developed a short function to visualize POS tags with NLTK and SpaCy.

The full function is shown below:
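
Assembled from the snippets above, a minimal sketch of such a function might look like this. The name visualize_pos comes from the calls further down; the helper build_ents, the constants POS_TAGS and COLORS, and the lazy imports inside the function are assumptions made for this sketch and not necessarily the original layout.

```python
# Colors per universal POS tag ("X" and "." are deliberately left out).
COLORS = {
    "PRON": "blueviolet", "VERB": "lightpink", "NOUN": "turquoise",
    "ADJ": "lime", "ADP": "khaki", "ADV": "orange",
    "CONJ": "cornflowerblue", "DET": "forestgreen",
    "NUM": "salmon", "PRT": "yellow",
}
POS_TAGS = list(COLORS)


def build_ents(tags, spans, pos_tags):
    """Pair each (token, tag) with its (start, end) span; keep only wanted tags."""
    return [
        {"start": start, "end": end, "label": tag}
        for (_, tag), (start, end) in zip(tags, spans)
        if tag in pos_tags
    ]


def visualize_pos(text, pos_tags=POS_TAGS, colors=COLORS):
    """Highlight the POS tags of `text` with displaCy's entity visualizer."""
    # Imported here so build_ents can be used without NLTK or spaCy installed.
    import nltk
    from nltk.tokenize import TreebankWordTokenizer
    from spacy import displacy

    tokenizer = TreebankWordTokenizer()
    tokens = tokenizer.tokenize(text)
    tags = nltk.pos_tag(tokens, tagset="universal")
    spans = list(tokenizer.span_tokenize(text))

    doc = {"text": text, "ents": build_ents(tags, spans, pos_tags)}
    options = {"ents": pos_tags, "colors": colors}
    return displacy.render(doc, style="ent", options=options, manual=True)
```

In a Jupyter notebook, displacy.render() displays the result inline; elsewhere it returns the HTML markup as a string. NLTK's tagger model and the universal tagset mapping have to be downloaded once with nltk.download() before pos_tag() works; the exact resource names depend on the NLTK version.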

Let’s plot a few examples:

visualize_pos("Call me Ishmael.")

![python](images/visualizing-part-of-speech-tags-with-nltk-and-spacy-image-6.webp "python")

visualize_pos("It was a bright cold day in April, and the clocks were striking thirteen.")

![python](images/visualizing-part-of-speech-tags-with-nltk-and-spacy-image-7.webp "python")

visualize_pos("The train leaves here at 9AM.")

visualizing part of speech tags with nltk and spacy image 8

References

[1] “NLTK”, “Source code for nltk.tag.mapping”. nltk.org. https://www.nltk.org/_modules/nltk/tag/mapping.html (accessed August 2, 2022)

[2] “spaCy”, “Visualizers”. spacy.io. https://spacy.io/usage/visualizers (accessed August 1, 2022)

This blog was originally published on Towards Data Science on Aug 9, 2022 and moved to this site on Feb 1, 2026.