
Workflow Examples

NERDA offers a simple, easy-to-use interface for fine-tuning transformers for Named-Entity Recognition (NER). We call this family of models NERDA models.

NERDA can be used in two ways. You can either (1) train your own customized NERDA model or (2) download one of our precooked NERDA models and use it for inference, i.e. for identifying named entities in new texts.

Train Your Own NERDA model

We want to fine-tune a transformer for English.

First, we download the English NER dataset CoNLL-2003, which contains annotated named entities and which we will use for training and evaluating our model.

from NERDA.datasets import get_conll_data, download_conll_data
download_conll_data()
Reading https://data.deepai.org/conll2003.zip

'archive extracted to /home/runner/.conll'

CoNLL-2003 operates with the following types of named entities:

  1. PERsons
  2. ORGanizations
  3. LOCations
  4. MISCellaneous
  5. Outside (Not a named Entity)

An observation from the CoNLL-2003 dataset looks like this:

# extract the first 5 observations from the training and validation data splits.
training = get_conll_data('train', 5)
validation = get_conll_data('valid', 5)
# example
sentence = training.get('sentences')[0]
tags = training.get('tags')[0]
print("\n".join(["{}/{}".format(word, tag) for word, tag in zip(sentence, tags)]))
EU/B-ORG
rejects/O
German/B-MISC
call/O
to/O
boycott/O
British/B-MISC
lamb/O
./O

If you provide your own dataset, it must have the same structure:

  • It must be a dictionary
  • The dictionary must contain
    • 'sentences': a list of word-tokenized sentences with one sentence per entry
    • 'tags': a list with the corresponding named-entity tags.

The dataset does not, however, have to follow the Inside-Outside-Beginning (IOB) tagging scheme.

The IOB tagging scheme implies that words at the beginning of named entities are tagged with 'B-' and words 'inside' (i.e. continuations of) named entities are tagged with 'I-'. This means that 'Joe Biden' should be tagged as Joe(B-PER) Biden(I-PER).
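For illustration, a minimal custom dataset in this format could look as follows (a hypothetical sketch; the sentences and tags are made up for the example):

# A minimal, hypothetical custom dataset in the format NERDA expects:
# a dictionary with word-tokenized 'sentences' and corresponding 'tags'.
my_dataset = {'sentences': [['Joe', 'Biden', 'visited', 'Copenhagen', '.'],
                            ['I', 'like', 'pizza', '.']],
              'tags': [['B-PER', 'I-PER', 'O', 'B-LOC', 'O'],
                       ['O', 'O', 'O', 'O']]}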

Now, instantiate a NERDA model for fine-tuning an ELECTRA transformer for NER.

from NERDA.models import NERDA
tag_scheme = ['B-PER',
              'I-PER', 
              'B-ORG', 
              'I-ORG', 
              'B-LOC', 
              'I-LOC', 
              'B-MISC', 
              'I-MISC']
model = NERDA(dataset_training = training,
              dataset_validation = validation,
              tag_scheme = tag_scheme,
              tag_outside = 'O',
              transformer = 'google/electra-small-discriminator',
              hyperparameters = {'epochs' : 1,
                                 'warmup_steps' : 10,
                                 'train_batch_size': 5,
                                 'learning_rate': 0.0001},)
Device automatically set to: cpu

Some weights of the model checkpoint at google/electra-small-discriminator were not used when initializing ElectraModel: ['discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense.weight']
- This IS expected if you are initializing ElectraModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Note that this model configuration only uses 5 sentences for model training in order to minimize execution time, and the hyperparameters have likewise been chosen to minimize execution time. This example therefore only serves to illustrate the functionality, i.e. the resulting model will suck.

By default the network architecture is analogous to that of the models in Hvingelby et al. 2020.

The model can be trained right away by invoking the train method.

model.train()

 Epoch 1 / 1

100%|██████████| 1/1 [00:00<00:00,  1.94it/s]
100%|██████████| 1/1 [00:00<00:00,  2.83it/s]
Train Loss = 2.361093521118164 Valid Loss = 2.3556602001190186



'Model trained successfully'

We can compute the performance of the model on a test set (limited to 5 sentences):

test = get_conll_data('test', 5)
model.evaluate_performance(test)
  Level      F1-Score  Precision  Recall
0 B-PER      0.0       0.0        0.0
1 I-PER      0.0       0.0        0.0
2 B-ORG      0.0       0.0        0.0
3 I-ORG      0.0       0.0        0.0
4 B-LOC      0.0       0.0        0.0
5 I-LOC      0.0       0.0        0.0
6 B-MISC     0.0       0.0        0.0
7 I-MISC     0.0       0.0        0.0
0 AVG_MICRO  0.0       NaN        NaN
0 AVG_MACRO  0.0       NaN        NaN

Unsurprisingly, the model sucks in this case due to the ludicrous specification.

Named entities in new texts can be predicted with the model's predict functions.

text = "Old MacDonald had a farm"
model.predict_text(text)
([['Old', 'MacDonald', 'had', 'a', 'farm']],
 [['I-PER', 'I-ORG', 'B-MISC', 'B-MISC', 'B-MISC']])

Needless to say, the predicted entities for this model are nonsensical.

To get a more reasonable model, provide more data and a more meaningful model specification.

In general, NERDA gives you the following handles; a sketch illustrating the first three follows the list.

  1. provide your own dataset
  2. choose whichever pretrained transformer you would like to fine-tune
  3. provide your own set of hyperparameters
  4. provide your own torch network (architecture), by instantiating a NERDA model with the parameter 'network' set to your own network (torch.nn.Module)
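
As an illustration of the first three handles, here is a hypothetical specification; the dataset, the 'bert-base-uncased' checkpoint and the hyperparameter values are example choices for this sketch, not recommendations:

from NERDA.models import NERDA

# Hypothetical customization combining handles 1-3: your own dataset,
# a different pretrained transformer and your own hyperparameters.
my_training = {'sentences': [['Joe', 'Biden', 'visited', 'Copenhagen', '.']],
               'tags': [['B-PER', 'I-PER', 'O', 'B-LOC', 'O']]}
my_validation = {'sentences': [['Angela', 'Merkel', 'spoke', '.']],
                 'tags': [['B-PER', 'I-PER', 'O', 'O']]}
model = NERDA(dataset_training = my_training,
              dataset_validation = my_validation,
              tag_scheme = tag_scheme,  # the IOB tag scheme defined earlier
              tag_outside = 'O',
              transformer = 'bert-base-uncased',
              hyperparameters = {'epochs': 3,
                                 'warmup_steps': 500,
                                 'train_batch_size': 13,
                                 'learning_rate': 0.0001})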

Use a Precooked NERDA model

We have precooked a number of NERDA models that you can download and use right off the shelf.

Here is an example.

Instantiate a NERDA model based on the English ELECTRA transformer that has been fine-tuned for NER in English, EN_ELECTRA_EN.


from NERDA.precooked import EN_ELECTRA_EN
model = EN_ELECTRA_EN()


Device automatically set to: cpu

Some weights of the model checkpoint at google/electra-small-discriminator were not used when initializing ElectraModel: ['discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense.weight']
- This IS expected if you are initializing ElectraModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

(Down)load network:


model.download_network()
model.load_network()


        Please make sure, that you're running the latest version of 'NERDA'
        otherwise the model is not guaranteed to work.

Downloading https://nerda.s3-eu-west-1.amazonaws.com/EN_ELECTRA_EN.bin to /home/runner/.nerda/EN_ELECTRA_EN.bin

100% |########################################################################|


        Model loaded. Please make sure, that you're running the latest version 
        of 'NERDA' otherwise the model is not guaranteed to work.


This model performs much better:

model.evaluate_performance(get_conll_data('test', 100))
  Level      F1-Score  Precision  Recall
0 B-PER      0.990909  1.000000   0.981982
1 I-PER      1.000000  1.000000   1.000000
2 B-ORG      0.800000  0.666667   1.000000
3 I-ORG      1.000000  1.000000   1.000000
4 B-LOC      0.993939  0.987952   1.000000
5 I-LOC      1.000000  1.000000   1.000000
6 B-MISC     0.930233  0.952381   0.909091
7 I-MISC     0.956522  1.000000   0.916667
0 AVG_MICRO  0.988060  NaN        NaN
0 AVG_MACRO  0.958950  NaN        NaN

Predict named entities in new texts:

text = 'Old MacDonald had a farm'
model.predict_text(text)

([['Old', 'MacDonald', 'had', 'a', 'farm']],
 [['B-PER', 'I-PER', 'O', 'O', 'O']])
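
NERDA also exposes a predict method for texts that are already word-tokenized. A hypothetical usage sketch, assuming the input format mirrors the 'sentences' entries of the training data:

# Assumed usage of the predict method on pre-tokenized input: a list of
# word-tokenized sentences, mirroring the training data format.
sentences = [['Old', 'MacDonald', 'had', 'a', 'farm']]
model.predict(sentences)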

List of Precooked Models

The table below shows the precooked NERDA models publicly available for download. We have trained models for Danish and English.

Model          Language  Transformer        Dataset     F1-score
DA_BERT_ML     Danish    Multilingual BERT  DaNE        82.8
DA_ELECTRA_DA  Danish    Danish ELECTRA     DaNE        79.8
EN_BERT_ML     English   Multilingual BERT  CoNLL-2003  90.4
EN_ELECTRA_EN  English   English ELECTRA    CoNLL-2003  89.1

F1-score is the micro-averaged F1-score across entity tags, evaluated on the respective test sets (which were used for neither training nor validation of the models).
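
All precooked models follow the same interface as EN_ELECTRA_EN above, so any of them can be loaded the same way. A sketch, assuming the class names in NERDA.precooked match the 'Model' column of the table:

# Assumes the class names in NERDA.precooked mirror the 'Model' column.
from NERDA.precooked import DA_BERT_ML
model = DA_BERT_ML()
model.download_network()
model.load_network()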

Note that we have not spent a lot of time on actually fine-tuning the models, so there could be room for improvement. If you are able to improve the models, we will be happy to hear from you and include your NERDA model.

Performance of Precooked Models

The table below summarizes the performance, as measured by F1-scores, of the model configurations that NERDA ships with.

Level      DA_BERT_ML  DA_ELECTRA_DA  EN_BERT_ML  EN_ELECTRA_EN
B-PER      93.8        92.0           96.0        95.1
I-PER      97.8        97.1           98.5        97.9
B-ORG      69.5        66.9           88.4        86.2
I-ORG      69.9        70.7           85.7        83.1
B-LOC      82.5        79.0           92.3        91.1
I-LOC      31.6        44.4           83.9        80.5
B-MISC     73.4        68.6           81.8        80.1
I-MISC     86.1        63.6           63.4        68.4
AVG_MICRO  82.8        79.8           90.4        89.1
AVG_MACRO  75.6        72.8           86.3        85.3

This concludes our walkthrough of NERDA. If you have any questions, please do not hesitate to contact us!