
Workflow Examples

NERDA offers a simple, easy-to-use interface for fine-tuning transformers for Named-Entity Recognition (NER). We call this family of models NERDA models.

NERDA can be used in two ways. You can either (1) train your own customized NERDA model or (2) download one of our precooked NERDA models and use it for inference, i.e. for identifying named entities in new texts.

Train Your Own NERDA model

We want to fine-tune a transformer for English.

First, we download the English NER dataset CoNLL-2003, which contains annotated named entities and which we will use for training and evaluating our model.

from NERDA.datasets import get_conll_data, download_conll_data
download_conll_data()
Reading https://data.deepai.org/conll2003.zip

'archive extracted to /home/runner/.conll'

CoNLL-2003 operates with the following types of named entities:

  1. PERsons
  2. ORGanizations
  3. LOCations
  4. MISCellaneous
  5. Outside (Not a named Entity)

An observation from the CoNLL-2003 dataset looks like this:

# extract the first 5 observations from the training and validation data splits.
training = get_conll_data('train', 5)
validation = get_conll_data('valid', 5)
# example
sentence = training.get('sentences')[0]
tags = training.get('tags')[0]
print("\n".join(["{}/{}".format(word, tag) for word, tag in zip(sentence, tags)]))
EU/B-ORG
rejects/O
German/B-MISC
call/O
to/O
boycott/O
British/B-MISC
lamb/O
./O

If you provide your own dataset, it must have the same structure:

  • It must be a dictionary
  • The dictionary must contain
    • 'sentences': a list of word-tokenized sentences with one sentence per entry
    • 'tags': a list with the corresponding named-entity tags.

The dataset does not, however, have to follow the Inside-Outside-Beginning (IOB) tagging scheme.

The IOB tagging scheme implies that words at the beginning of named entities are tagged with 'B-' and words 'inside' (i.e. continuations of) named entities are tagged with 'I-'. This means that 'Joe Biden' should be tagged as Joe(B-PER) Biden(I-PER).
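For illustration, a minimal custom dataset in this format could look as follows (a hypothetical sketch; the sentences and tags are made up for the example):

# A minimal, hypothetical custom dataset in the format NERDA expects:
# a dictionary with word-tokenized 'sentences' and corresponding 'tags'.
my_dataset = {'sentences': [['Joe', 'Biden', 'visited', 'Copenhagen', '.'],
                            ['I', 'like', 'pizza', '.']],
              'tags': [['B-PER', 'I-PER', 'O', 'B-LOC', 'O'],
                       ['O', 'O', 'O', 'O']]}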

Now, instantiate a NERDA model for fine-tuning an ELECTRA transformer for NER.

from NERDA.models import NERDA
tag_scheme = ['B-PER',
              'I-PER', 
              'B-ORG', 
              'I-ORG', 
              'B-LOC', 
              'I-LOC', 
              'B-MISC', 
              'I-MISC']
model = NERDA(dataset_training = training,
              dataset_validation = validation,
              tag_scheme = tag_scheme,
              tag_outside = 'O',
              transformer = 'google/electra-small-discriminator',
              hyperparameters = {'epochs' : 1,
                                 'warmup_steps' : 10,
                                 'train_batch_size': 5,
                                 'learning_rate': 0.0001},)
Device automatically set to: cpu

Some weights of the model checkpoint at google/electra-small-discriminator were not used when initializing ElectraModel: ['discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense.weight']
- This IS expected if you are initializing ElectraModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Note that this model configuration only uses 5 sentences for model training in order to minimize execution time, and the hyperparameters have likewise been chosen to minimize execution time. This example therefore only serves to illustrate the functionality, i.e. the resulting model will suck.

By default the network architecture is analogous to that of the models in Hvingelby et al. 2020.

The model can be trained right away by invoking the train method.

model.train()

 Epoch 1 / 1

100%|██████████| 1/1 [00:00<00:00,  1.94it/s]
100%|██████████| 1/1 [00:00<00:00,  2.83it/s]
Train Loss = 2.361093521118164 Valid Loss = 2.3556602001190186



'Model trained successfully'

We can compute the performance of the model on a test set (limited to 5 sentences):

test = get_conll_data('test', 5)
model.evaluate_performance(test)
  Level      F1-Score  Precision  Recall
0 B-PER      0.0       0.0        0.0
1 I-PER      0.0       0.0        0.0
2 B-ORG      0.0       0.0        0.0
3 I-ORG      0.0       0.0        0.0
4 B-LOC      0.0       0.0        0.0
5 I-LOC      0.0       0.0        0.0
6 B-MISC     0.0       0.0        0.0
7 I-MISC     0.0       0.0        0.0
0 AVG_MICRO  0.0       NaN        NaN
0 AVG_MACRO  0.0       NaN        NaN

Unsurprisingly, the model sucks in this case due to the ludicrous specification.

Named entities in new texts can be predicted with the model's predict functions.

text = "Old MacDonald had a farm"
model.predict_text(text)
([['Old', 'MacDonald', 'had', 'a', 'farm']],
 [['I-PER', 'I-ORG', 'B-MISC', 'B-MISC', 'B-MISC']])

Needless to say, the predicted entities for this model are nonsensical.

To get a more reasonable model, provide more data and a more meaningful model specification.

In general, NERDA gives you the following handles; a sketch illustrating the first three follows the list.

  1. provide your own dataset
  2. choose whichever pretrained transformer you would like to fine-tune
  3. provide your own set of hyperparameters
  4. provide your own torch network (architecture), by instantiating a NERDA model with the parameter 'network' set to your own network (torch.nn.Module)
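
As an illustration of the first three handles, here is a hypothetical specification; the dataset, the 'bert-base-uncased' checkpoint and the hyperparameter values are example choices for this sketch, not recommendations:

from NERDA.models import NERDA

# Hypothetical customization combining handles 1-3: your own dataset,
# a different pretrained transformer and your own hyperparameters.
my_training = {'sentences': [['Joe', 'Biden', 'visited', 'Copenhagen', '.']],
               'tags': [['B-PER', 'I-PER', 'O', 'B-LOC', 'O']]}
my_validation = {'sentences': [['Angela', 'Merkel', 'spoke', '.']],
                 'tags': [['B-PER', 'I-PER', 'O', 'O']]}
model = NERDA(dataset_training = my_training,
              dataset_validation = my_validation,
              tag_scheme = tag_scheme,  # the IOB tag scheme defined earlier
              tag_outside = 'O',
              transformer = 'bert-base-uncased',
              hyperparameters = {'epochs': 3,
                                 'warmup_steps': 500,
                                 'train_batch_size': 13,
                                 'learning_rate': 0.0001})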

Use a Precooked NERDA model

We have precooked a number of NERDA models that you can download and use right off the shelf.

Here is an example.

Instantiate a NERDA model based on the English ELECTRA transformer that has been fine-tuned for NER in English, EN_ELECTRA_EN.


from NERDA.precooked import EN_ELECTRA_EN
model = EN_ELECTRA_EN()


Device automatically set to: cpu

Some weights of the model checkpoint at google/electra-small-discriminator were not used when initializing ElectraModel: ['discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense.weight']
- This IS expected if you are initializing ElectraModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

(Down)load network:


model.download_network()
model.load_network()


        Please make sure, that you're running the latest version of 'NERDA'
        otherwise the model is not guaranteed to work.

Downloading https://nerda.s3-eu-west-1.amazonaws.com/EN_ELECTRA_EN.bin to /home/runner/.nerda/EN_ELECTRA_EN.bin

100% |########################################################################|


        Model loaded. Please make sure, that you're running the latest version 
        of 'NERDA' otherwise the model is not guaranteed to work.


This model performs much better:

model.evaluate_performance(get_conll_data('test', 100))
  Level      F1-Score  Precision  Recall
0 B-PER      0.990909  1.000000   0.981982
1 I-PER      1.000000  1.000000   1.000000
2 B-ORG      0.800000  0.666667   1.000000
3 I-ORG      1.000000  1.000000   1.000000
4 B-LOC      0.993939  0.987952   1.000000
5 I-LOC      1.000000  1.000000   1.000000
6 B-MISC     0.930233  0.952381   0.909091
7 I-MISC     0.956522  1.000000   0.916667
0 AVG_MICRO  0.988060  NaN        NaN
0 AVG_MACRO  0.958950  NaN        NaN

Predict named entities in new texts:

text = 'Old MacDonald had a farm'
model.predict_text(text)

([['Old', 'MacDonald', 'had', 'a', 'farm']],
 [['B-PER', 'I-PER', 'O', 'O', 'O']])
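
NERDA also exposes a predict method for texts that are already word-tokenized. A hypothetical usage sketch, assuming the input format mirrors the 'sentences' entries of the training data:

# Assumed usage of the predict method on pre-tokenized input: a list of
# word-tokenized sentences, mirroring the training data format.
sentences = [['Old', 'MacDonald', 'had', 'a', 'farm']]
model.predict(sentences)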

List of Precooked Models

The table below shows the precooked NERDA models publicly available for download. We have trained models for Danish and English.

Model          Language  Transformer        Dataset     F1-score
DA_BERT_ML     Danish    Multilingual BERT  DaNE        82.8
DA_ELECTRA_DA  Danish    Danish ELECTRA     DaNE        79.8
EN_BERT_ML     English   Multilingual BERT  CoNLL-2003  90.4
EN_ELECTRA_EN  English   English ELECTRA    CoNLL-2003  89.1

F1-score is the micro-averaged F1-score across entity tags, evaluated on the respective test sets (which were used for neither training nor validation of the models).
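
All precooked models follow the same interface as EN_ELECTRA_EN above, so any of them can be loaded the same way. A sketch, assuming the class names in NERDA.precooked match the 'Model' column of the table:

# Assumes the class names in NERDA.precooked mirror the 'Model' column.
from NERDA.precooked import DA_BERT_ML
model = DA_BERT_ML()
model.download_network()
model.load_network()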

Note that we have not spent a lot of time on actually fine-tuning the models, so there could be room for improvement. If you are able to improve the models, we will be happy to hear from you and include your NERDA model.

Performance of Precooked Models

The table below summarizes the performance, as measured by F1-scores, of the model configurations that NERDA ships with.

Level      DA_BERT_ML  DA_ELECTRA_DA  EN_BERT_ML  EN_ELECTRA_EN
B-PER      93.8        92.0           96.0        95.1
I-PER      97.8        97.1           98.5        97.9
B-ORG      69.5        66.9           88.4        86.2
I-ORG      69.9        70.7           85.7        83.1
B-LOC      82.5        79.0           92.3        91.1
I-LOC      31.6        44.4           83.9        80.5
B-MISC     73.4        68.6           81.8        80.1
I-MISC     86.1        63.6           63.4        68.4
AVG_MICRO  82.8        79.8           90.4        89.1
AVG_MACRO  75.6        72.8           86.3        85.3

This concludes our walkthrough of NERDA. If you have any questions, please do not hesitate to contact us!