In information extraction, a named entity is a real-world object, such as a person, location, organization, or product, that can be denoted with a proper name. It can be abstract or have a physical existence. Examples of named entities include Barack Obama, New York City, Volkswagen Golf, or anything else that can be named. Named entities can be viewed as entity instances (e.g., New York City is an instance of a city).
Historically, the term Named Entity was coined during the MUC-6 evaluation campaign,[1] where it covered ENAMEX (entity name expressions, e.g. persons, locations, and organizations) and NUMEX (numerical expressions).
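In MUC-style data, these categories are marked inline with SGML-like tags such as `<ENAMEX TYPE="PERSON">…</ENAMEX>`. The sketch below reads such tags back out of a marked-up string. It is a minimal illustration under stated assumptions: tags are not nested, the `TYPE` attribute is double-quoted, and the function name `extract_entities` and the sample sentence are invented for this example. It is not the official MUC scoring software.

```python
import re
from typing import List, Tuple

# Assumption: MUC-style inline tags, not nested, with a double-quoted TYPE
# attribute, e.g. <ENAMEX TYPE="PERSON">Barack Obama</ENAMEX>.
# The \1 backreference requires the closing tag to match the opening one.
TAG_RE = re.compile(r'<(ENAMEX|NUMEX)\s+TYPE="([A-Z]+)">(.*?)</\1>')


def extract_entities(marked_up: str) -> List[Tuple[str, str, str]]:
    """Return (tag, type, text) triples for every annotated span, in order."""
    return [(m.group(1), m.group(2), m.group(3)) for m in TAG_RE.finditer(marked_up)]


# Invented sample sentence, annotated by hand for illustration only.
sample = ('<ENAMEX TYPE="PERSON">Barack Obama</ENAMEX> visited '
          '<ENAMEX TYPE="LOCATION">New York City</ENAMEX> and donated '
          '<NUMEX TYPE="MONEY">$100</NUMEX>.')

for tag, etype, text in extract_entities(sample):
    print(tag, etype, text)
```

Using a backreference for the closing tag keeps an `ENAMEX` span from being closed by a `NUMEX` end tag; real corpora with nested or malformed markup would need a proper SGML/XML parser instead.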
A more formal definition can be derived from Saul Kripke's notion of the rigid designator. In the expression "Named Entity", the word "Named" restricts the set of possible entities to those for which one or more rigid designators stand for the referent.[2] A designator is rigid when it designates the same thing in every possible world. By contrast, flaccid designators may designate different things in different possible worlds.
As an example, consider the sentence "Biden is the president of the United States". Both "Biden" and "the United States" are named entities, since they refer to specific objects (Joe Biden and the United States). However, "president" is not a named entity, since it can refer to many different objects in different worlds: in different presidential periods it refers to different persons, and in other countries or organizations it refers to different people altogether. Rigid designators usually include proper names as well as certain natural kind terms, such as biological species and substances.
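The rigid/flaccid distinction can be made concrete with a toy model in which each "world" maps a designator to the object it picks out there, and a designator counts as rigid when it picks out the same object in every world. This is only a sketch: the two worlds below (modelled, as in the example above, as two presidential periods) and the names `worlds` and `is_rigid` are hypothetical, and the model assumes every designator is defined in every world.

```python
from typing import Dict

# A "world" maps each designator to the referent it picks out in that world.
World = Dict[str, str]

# Hypothetical toy worlds, modelled as two different presidential periods.
worlds: Dict[str, World] = {
    "period_2012": {"Biden": "Joe Biden",
                    "United States": "United States",
                    "president": "Barack Obama"},
    "period_2022": {"Biden": "Joe Biden",
                    "United States": "United States",
                    "president": "Joe Biden"},
}


def is_rigid(designator: str, worlds: Dict[str, World]) -> bool:
    """Rigid iff the designator has exactly one referent across all worlds.

    Assumes the designator is defined in every world (raises KeyError otherwise).
    """
    referents = {world[designator] for world in worlds.values()}
    return len(referents) == 1


for d in ["Biden", "United States", "president"]:
    print(d, is_rigid(d, worlds))
```

Here "Biden" and "United States" keep a single referent across both worlds, while "president" changes referent, so only the first two behave like the proper names that qualify as named entities.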
There is also general agreement in the Named Entity Recognition community to treat temporal and numerical expressions, such as amounts of money and other types of units, as named entities, even though they may violate the rigid designator perspective.
The task of recognizing named entities in text is called Named Entity Recognition, while the task of determining the identity of the named entities mentioned in text is called Named Entity Disambiguation. Both tasks require dedicated algorithms and resources.[3]
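The difference between the two tasks can be illustrated with a deliberately simple pipeline: a recognition step that only locates mention spans, followed by a disambiguation step that decides which real-world entity an ambiguous mention such as "Washington" refers to. This is a toy sketch, not how production systems work (they typically rely on trained statistical models and large knowledge bases); the gazetteer-style table `CANDIDATES`, its cue words, and the functions `recognize` and `disambiguate` are all hypothetical, and context-word overlap is an assumed scoring heuristic.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical candidate table: surface form -> {entity ID: cue words}.
CANDIDATES: Dict[str, Dict[str, Set[str]]] = {
    "Washington": {
        "George Washington": {"president", "general", "born"},
        "Washington, D.C.": {"capital", "city", "visited"},
    },
    "Paris": {
        "Paris (France)": {"france", "eiffel", "capital"},
        "Paris (Texas)": {"texas", "county"},
    },
}

# Longest surface forms first, so the leftmost match prefers the longest name.
_MENTION_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(s) for s in sorted(CANDIDATES, key=len, reverse=True))
    + r")\b"
)


def recognize(text: str) -> List[Tuple[int, int, str]]:
    """Recognition step (toy): return (start, end, surface) for each mention."""
    return [(m.start(), m.end(), m.group()) for m in _MENTION_RE.finditer(text)]


def disambiguate(text: str, surface: str) -> str:
    """Disambiguation step (toy): pick the candidate whose cue words overlap
    the lower-cased context words most; ties go to the first candidate listed."""
    context = set(re.findall(r"[a-z]+", text.lower()))
    candidates = CANDIDATES[surface]
    return max(candidates, key=lambda eid: len(candidates[eid] & context))


text = "Washington was born in Virginia and became the first president."
for start, end, surface in recognize(text):
    print(surface, "->", disambiguate(text, surface))
```

Recognition answers "where are the names?", while disambiguation answers "which entity is meant?"; the same surface form "Washington" resolves to different entities depending on the surrounding words.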
See also
- Named-entity recognition (also referred to as entity identification, entity chunking and entity extraction)
- Entity linking (also referred to as named entity linking (NEL), named entity disambiguation (NED), named entity recognition and disambiguation (NERD) or named entity normalization)
- Information extraction
- Knowledge extraction
- Text mining (also referred to as text data mining)
- Truecasing
- Apache OpenNLP
- spaCy
- General Architecture for Text Engineering
- Natural Language Toolkit
References
- [1] Grishman, Ralph; Sundheim, Beth (1996). Design of the MUC-6 evaluation. TIPSTER '96 Proceedings.
- [2] Nadeau, David; Sekine, Satoshi (2007). A survey of named entity recognition and classification. Lingvisticae Investigationes.
- [3] Nouvel, Damien; Ehrmann, Maud; Rosset, Sophie (2015). Named Entities for Computational Linguistics. Wiley. ISBN 978-1-84821-838-3.