HumanName Class Documentation¶
HumanName.parser¶
- class nameparser.parser.HumanName(full_name: str | bytes = '', constants: Constants = <Constants() instance>, encoding: str = 'UTF-8', string_format: str | None = None, initials_format: str | None = None, initials_delimiter: str | None = None, initials_separator: str | None = None, suffix_delimiter: str | None = None, first: str | list[str] | None = None, middle: str | list[str] | None = None, last: str | list[str] | None = None, title: str | list[str] | None = None, suffix: str | list[str] | None = None, nickname: str | list[str] | None = None)[source]¶
Parse a person’s name into individual components.
Instantiation assigns to full_name, and assignment to full_name triggers parse_full_name(). After parsing the name, these instance attributes are available. Alternatively, you can pass any of the instance attributes to the constructor and skip the parsing process: if any of the instance attributes are passed to the constructor as keywords, parse_full_name() will not be performed.
HumanName Instance Attributes
- Parameters:
full_name (str) – The name string to be parsed.
constants (constants) – a Constants instance. Pass None for per-instance config.
encoding (str) – string representing the encoding of your input
string_format (str) – python string formatting
initials_format (str) – python initials string formatting
initials_delimiter (str) – string delimiter for initials
initials_separator (str) – string separator between consecutive initials
suffix_delimiter (str) – additional delimiter to split post-comma parts before suffix detection, e.g. " - " for "RN - CRNA"
first (str) – first name
middle (str) – middle name
last (str) – last name
title (str) – The title or prenominal
suffix (str) – The suffix or postnominal
nickname (str) – Nicknames
- C = <Constants() instance>¶
A reference to the configuration for this instance, which may or may not be a reference to the shared, module-wide instance at
CONSTANTS. See Customizing the Parser.
- __eq__(other: object) bool[source]¶
HumanName instances are equal to other objects whose lower case unicode representation is the same.
- __init__(full_name: str | bytes = '', constants: Constants = <Constants() instance>, encoding: str = 'UTF-8', string_format: str | None = None, initials_format: str | None = None, initials_delimiter: str | None = None, initials_separator: str | None = None, suffix_delimiter: str | None = None, first: str | list[str] | None = None, middle: str | list[str] | None = None, last: str | list[str] | None = None, title: str | list[str] | None = None, suffix: str | list[str] | None = None, nickname: str | list[str] | None = None) None[source]¶
- are_suffixes_after_comma(pieces: Iterable[str]) bool[source]¶
Like are_suffixes, but pieces found in suffix_not_acronyms are accepted unconditionally without passing through is_suffix().
Used when detecting suffix-comma format (e.g. “John Ingram, V”) where the post-comma position is unambiguous. This covers all suffix_not_acronyms members (i, ii, iii, iv, v, jr, sr, etc.), case-insensitively, including single-letter entries that is_suffix() would otherwise reject via is_an_initial().
- as_dict(include_empty: bool = True) dict[str, str][source]¶
Return the parsed name as a dictionary of its attributes.
- Parameters:
include_empty (bool) – Include keys in the dictionary for empty name attributes.
- Return type:
dict
>>> name = HumanName("Bob Dole")
>>> name.as_dict()
{'title': '', 'first': 'Bob', 'middle': '', 'last': 'Dole', 'suffix': '', 'nickname': ''}
>>> name.as_dict(False)
{'first': 'Bob', 'last': 'Dole'}
- capitalize(force: bool | None = None) None[source]¶
The HumanName class can try to guess the correct capitalization of name entered in all upper or lower case. By default, it will not adjust the case of names entered in mixed case. To run capitalization on all names pass the parameter force=True.
- Parameters:
force (bool) – Forces capitalization of mixed case strings. This parameter overrides rules set within CONSTANTS.
Usage
>>> name = HumanName('bob v. de la macdole-eisenhower phd')
>>> name.capitalize()
>>> str(name)
'Bob V. de la MacDole-Eisenhower Ph.D.'
>>> # Don't touch good names
>>> name = HumanName('Shirley Maclaine')
>>> name.capitalize()
>>> str(name)
'Shirley Maclaine'
>>> name.capitalize(force=True)
>>> str(name)
'Shirley MacLaine'
- property family_prefixes: str¶
Alias for
last_prefixes.
- property first: str¶
The person’s first name. The first name piece after any known title pieces parsed from full_name.
- property full_name: str¶
The string output of the HumanName instance.
- property given_names: str¶
A string of the first name followed by all middle names.
- property given_names_list: list[str]¶
List of first name followed by middle names.
- handle_firstnames() None[source]¶
If there are only two parts and one is a title, assume it’s a last name instead of a first name. e.g. Mr. Johnson. Unless it’s a special title like “Sir”, then when it’s followed by a single name that name is always a first name.
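The two-part rule above can be sketched in isolation. This is a simplified stand-in rather than the library's code: `assign_lone_name` is a hypothetical helper, and `FIRST_NAME_TITLES` is only a subset of the defaults listed for `first_name_titles`.

```python
# Hypothetical sketch of the two-part rule; not nameparser's implementation.
FIRST_NAME_TITLES = {"sir", "dame", "king", "queen", "pope"}  # subset of the documented defaults


def assign_lone_name(title: str, name: str) -> dict[str, str]:
    """Decide whether the single name that follows a title is a first or last name."""
    key = title.lower().strip(".")
    if key in FIRST_NAME_TITLES:
        # Special titles such as "Sir" are followed by a first name.
        return {"title": title, "first": name, "last": ""}
    # Ordinary titles such as "Mr." are followed by a last name.
    return {"title": title, "first": "", "last": name}


print(assign_lone_name("Mr.", "Johnson"))
print(assign_lone_name("Sir", "Elton"))
```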
- handle_middle_name_as_last() None[source]¶
When middle_name_as_last is enabled, fold middle_list into last_list (prepended, preserving order) and clear middle_list. No-op when middle_list is already empty.
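As a rough illustration of the fold, assuming plain Python lists stand in for `middle_list` and `last_list` (the helper name `fold_middle_into_last` is hypothetical):

```python
# Hypothetical sketch of the fold on plain lists; not nameparser's implementation.
def fold_middle_into_last(middle_list: list[str], last_list: list[str]) -> tuple[list[str], list[str]]:
    """Return (new_middle_list, new_last_list) with middle names prepended to the last name."""
    if not middle_list:
        return middle_list, last_list  # no-op when there is nothing to fold
    return [], middle_list + last_list


print(fold_middle_into_last(["Ahmad", "Ali"], ["Hassan"]))
```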
- handle_patronymic_name_order() None[source]¶
When patronymic_name_order is enabled, detect Russian formal order (Surname GivenName Patronymic) and rotate to Western order. Fires only for no-comma, single-token first/middle/last where the last token is a patronymic and the middle token is not. Title, suffix, and nickname parts do not affect this guard — reordering proceeds regardless of whether they are present.
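A minimal sketch of the rotation and its guard, assuming the name has already been split into single-token first/middle/last lists. `rotate_formal_order` and `looks_patronymic` are hypothetical names, the suffix tuple covers only the regular Latin endings, and the no-comma condition is not modelled.

```python
# Hypothetical sketch; the real method works on the instance's parsed lists.
PATRONYMIC_ENDINGS = ("ovich", "ovna", "evich", "evna", "ichna")  # regular Latin suffixes only


def looks_patronymic(token: str) -> bool:
    return token.lower().endswith(PATRONYMIC_ENDINGS)


def rotate_formal_order(first: list[str], middle: list[str], last: list[str]):
    """Return (first, middle, last), rotated to Western order when the guard holds."""
    single_tokens = len(first) == len(middle) == len(last) == 1
    if single_tokens and looks_patronymic(last[0]) and not looks_patronymic(middle[0]):
        # Input was parsed as first=Surname, middle=GivenName, last=Patronymic.
        return middle, last, first
    return first, middle, last


print(rotate_formal_order(["Ivanov"], ["Ivan"], ["Ivanovich"]))
print(rotate_formal_order(["Ivan"], ["Ivanovich"], ["Ivanov"]))
```

The second call is left unchanged because the last token is not a patronymic.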
- property has_own_config: bool¶
True if this instance is not using the shared module-level configuration.
- initials() str[source]¶
Return formatted initials for the name, controlled by initials_format, initials_delimiter, and initials_separator.
initials_delimiter is appended after each individual initial. initials_separator is placed between consecutive initials within a name group (first, middle, or last). Both can be set as Constants attributes or as HumanName constructor kwargs.
>>> name = HumanName("Sir Bob Andrew Dole")
>>> name.initials()
'B. A. D.'
>>> name = HumanName("Sir Bob Andrew Dole", initials_format="{first} {middle}")
>>> name.initials()
'B. A.'
>>> name = HumanName("Doe, John A.", initials_delimiter="", initials_separator="")
>>> name.initials()
'J A D'
- initials_list() list[str][source]¶
Returns the initials as a list.
>>> name = HumanName("Sir Bob Andrew Dole")
>>> name.initials_list()
['B', 'A', 'D']
>>> name = HumanName("J. Doe")
>>> name.initials_list()
['J', 'D']
- is_an_initial(value: str) bool[source]¶
Words with a single period at the end, or a single uppercase letter.
Matches the initial regular expression in REGEXES.
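The behaviour of the `initial` pattern can be checked on its own with the standard `re` module. The pattern below is copied from the `regexes` default shown in the `Constants` signature; applying it with a plain `match` is an assumption about how the method uses it.

```python
import re

# The documented `initial` pattern, applied standalone.
INITIAL = re.compile(r"^(\w\.|[A-Z])?$")

for value in ["J.", "J", "j", "Jr.", "John"]:
    print(f"{value!r}: {bool(INITIAL.match(value))}")
```

Because the group is optional, the pattern as printed also matches the empty string.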
- is_conjunction(piece: str) bool[source]¶
Is in the conjunctions set and not
is_an_initial().
- is_first_name_prefix(piece: str) bool[source]¶
Lowercased, leading/trailing-periods-stripped version of piece is in
first_name_prefixes.
- is_leading_title(piece: str) bool[source]¶
True if piece is a known title, or an unrecognized multi-letter word ending in a single trailing period (e.g. "Major."). The {2,} in the period_abbreviation regex, not a separate is_an_initial() check, is what excludes single-letter initials like "J.". Only meaningful for pieces in the title position (before the first name is set); a period-abbreviation appearing later in the name is left as a middle name. Does not mutate C.titles, so the periodless form ("Major") is never affected in later parses.
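To see how the `{2,}` quantifier separates abbreviations from initials, the `period_abbreviation` pattern from the `regexes` default can be tried directly. This sketch covers only the regex half of the check, not the known-title lookup.

```python
import re

# The documented `period_abbreviation` pattern: two or more letters, then one trailing period.
PERIOD_ABBREVIATION = re.compile(r"^[^\W\d_]{2,}\.$")

for piece in ["Major.", "Dr.", "J.", "Major", "Ph.D."]:
    print(f"{piece!r}: {bool(PERIOD_ABBREVIATION.match(piece))}")
```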
- is_patronymic(piece: str) bool[source]¶
Return True if piece ends with a recognised East-Slavic patronymic suffix, checked against both Latin-script and Cyrillic patterns in self.C.regexes. Latin suffixes: -ovich, -ovna, -evich, -evna, -ichna, and the irregular forms -ilyich, -kuzmich, -lukich, -fomich, -fokich. Cyrillic equivalents are matched by a separate pattern.
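A standalone sketch of the two-pattern check, using the `patronymic` and `patronymic_cyrillic` patterns as printed in the `regexes` default. Using `search` (the patterns have no leading anchor) and combining the two with `or` are assumptions.

```python
import re

# Patterns copied from the documented `regexes` default.
PATRONYMIC = re.compile(
    r"(ovich|ovna|evich|evna|ichna|ilyich|kuzmich|lukich|fomich|fokich)$", re.IGNORECASE
)
PATRONYMIC_CYRILLIC = re.compile(
    r"(ович|овна|евич|евна|ична|ильич|кузьмич|лукич|фомич|фокич)$"
)


def is_patronymic(piece: str) -> bool:
    """True if piece ends with a Latin-script or Cyrillic patronymic suffix."""
    return bool(PATRONYMIC.search(piece) or PATRONYMIC_CYRILLIC.search(piece))


for piece in ["Ivanovich", "PETROVNA", "Иванович", "Ivanov", "Smith"]:
    print(f"{piece!r}: {is_patronymic(piece)}")
```

As printed, only the Latin pattern carries `re.IGNORECASE`, so in this sketch an all-uppercase Cyrillic token would not match.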
- is_prefix(piece: str) bool[source]¶
Lowercased, leading/trailing-periods-stripped version of piece is in the PREFIXES set.
- is_rootname(piece: str) bool[source]¶
Is not a known title, suffix or prefix. Just first, middle, last names.
- is_suffix(piece: str) bool[source]¶
Is in the suffixes set and not is_an_initial().
Some suffixes may be acronyms (M.B.A) while some are not (Jr.), so we remove the periods from piece when testing against C.suffix_acronyms.
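The combination of set membership and the initial check might look roughly like the following. This is a sketch under stated assumptions: the two sets are small subsets of the documented defaults, the normalization (lowercase, strip outer periods) is a stand-in for the library's own helper, and the function is not the library's implementation.

```python
import re

INITIAL = re.compile(r"^(\w\.|[A-Z])?$")  # documented `initial` pattern
SUFFIX_ACRONYMS = {"mba", "phd", "md"}  # subset of the documented defaults
SUFFIX_NOT_ACRONYMS = {"jr", "sr", "i", "ii", "iii", "iv", "v"}  # subset of the documented defaults


def is_suffix(piece: str) -> bool:
    """Sketch: known suffix (periods removed for acronyms) and not an initial."""
    lowered = piece.lower().strip(".")
    in_sets = lowered.replace(".", "") in SUFFIX_ACRONYMS or lowered in SUFFIX_NOT_ACRONYMS
    return in_sets and not INITIAL.match(piece)


for piece in ["M.B.A.", "Jr.", "V", "Smith"]:
    print(f"{piece!r}: {is_suffix(piece)}")
```

The bare "V" is rejected here because it also looks like an initial, which is the case the comma-aware helpers exist to cover.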
- is_suffix_at_lastname_comma_end(piece: str, nxt: str | None, parts: list[str]) bool[source]¶
True when piece is a suffix_not_acronyms member that should be treated as a suffix at the end of parts[1] (the post-comma segment) in a lastname-comma name, where parts is the full comma-split of the name string.
Returns True only when all three conditions hold:
- nxt is None: piece is the last token in the post-comma segment
- len(parts) == 2: no parts[2] suffix segment exists
- lc(piece) in suffix_not_acronyms
When parts[2] exists the caller already declared an explicit suffix via comma (e.g. ‘Doe, Rev. John V, Jr.’), making the trailing token more likely a middle initial; len(parts) == 2 excludes that case. Used as an OR alternative to is_suffix() for pieces that is_suffix() would reject via is_an_initial().
- join_on_conjunctions(pieces: list[str], additional_parts_count: int = 0) list[str][source]¶
Join conjunctions to surrounding pieces. Title- and prefix-aware. e.g.:
- [‘Mr.’, ‘and’, ‘Mrs.’, ‘John’, ‘Doe’] ==>
[‘Mr. and Mrs.’, ‘John’, ‘Doe’]
- [‘The’, ‘Secretary’, ‘of’, ‘State’, ‘Hillary’, ‘Clinton’] ==>
[‘The Secretary of State’, ‘Hillary’, ‘Clinton’]
When joining titles, saves newly formed piece to the instance’s titles constant so they will be parsed correctly later. E.g. after parsing the example names above, ‘The Secretary of State’ and ‘Mr. and Mrs.’ would be present in the titles constant set.
- Parameters:
pieces (list) – name pieces strings after split on spaces
additional_parts_count (int)
- Returns:
new list with piece next to conjunctions merged into one piece with spaces in it.
- Return type:
list
- property last_base: str¶
The last name with leading prefix particles removed (the core surname). For "van Gogh" this is "Gogh"; for "Smith" it is "Smith". last is always unchanged. When every word in the last name matches a prefix particle, no stripping occurs and the full last name is returned.
>>> HumanName("Vincent van Gogh").last_base
'Gogh'
>>> HumanName("John Smith").last_base
'Smith'
- property last_base_list: list[str]¶
List of last-name words after stripping leading prefix particles. Never empty: when every word matches a prefix, no stripping occurs and the full last name is returned; see _split_last().
>>> HumanName("Vincent van Gogh").last_base_list
['Gogh']
- property last_prefixes: str¶
The leading prefix particle(s) of the last name (the tussenvoegsel). Returns "" (or empty_attribute_default) when there are none, including when every word in the last name matches a prefix particle (the all-particles guard; see _split_last()).
>>> HumanName("Vincent van Gogh").last_prefixes
'van'
>>> HumanName("Juan de la Vega").last_prefixes
'de la'
- property last_prefixes_list: list[str]¶
List of leading prefix particles in the last name (the tussenvoegsel). Returns [] when there are none, including the case where every word in the last name matches a prefix; see _split_last().
>>> HumanName("Juan de la Vega").last_prefixes_list
['de', 'la']
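The split behind the last_base and last_prefixes properties, including the all-particles guard, can be sketched as follows. `split_last` is a hypothetical stand-in for the private `_split_last()`, whose actual signature is not shown here, and `PREFIXES` is a subset of the documented defaults.

```python
# Hypothetical stand-in for _split_last(); PREFIXES is a subset of the documented defaults.
PREFIXES = {"van", "von", "de", "la", "der", "den"}


def split_last(words: list[str]) -> tuple[list[str], list[str]]:
    """Return (prefix_words, base_words) for the words of a last name."""
    i = 0
    while i < len(words) and words[i].lower().strip(".") in PREFIXES:
        i += 1
    if i == len(words):
        # All-particles guard: nothing is stripped, the full name is the base.
        return [], words
    return words[:i], words[i:]


print(split_last(["van", "Gogh"]))
print(split_last(["de", "la", "Vega"]))
print(split_last(["De", "La"]))
```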
- property middle: str¶
The person’s middle names. All name pieces after the first name and before the last name parsed from
full_name.
- property nickname: str¶
The person’s nicknames. Any text found inside of quotes ("") or parentheses (()).
- original: str | bytes = ''¶
The original string, untouched by the parser.
- parse_full_name() None[source]¶
The main parse method for the parser. This method is run upon assignment to the full_name attribute or instantiation.
Basic flow is to hand off to pre_process() to handle nicknames. It then splits on commas and chooses a code path depending on the number of commas. parse_pieces() then splits those parts on spaces and join_on_conjunctions() joins any pieces next to conjunctions.
- parse_nicknames() None[source]¶
The content of parentheses or quotes in the name will be added to the nicknames list, unless that content is suffix-shaped (an unambiguous suffix_not_acronyms/suffix_acronyms member, or content ending in a period), in which case it’s left in place (undelimited) for normal downstream suffix/title/word parsing instead. This happens before any other processing of the name.
Single quotes cannot span white space characters and must border white space to allow for quotes in names like O’Connor and Kawai’ae’a. Double quotes and parentheses can span white space.
Loops through the built-in quoted_word, double_quotes and parenthesis patterns in regexes, followed by any patterns added to extra_nickname_delimiters; see the “Adding Custom Nickname Delimiters” section of the customization docs.
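The quoting rules can be observed by running the three built-in patterns directly. The patterns are copied from the `regexes` default; `find_delimited` is a hypothetical helper that only collects matches and does not model the suffix-shaped exception or the removal of matches from the name.

```python
import re

# Patterns copied from the documented `regexes` default.
QUOTED_WORD = re.compile(r"(?<!\w)\'([^\s]*?)\'(?!\w)")
DOUBLE_QUOTES = re.compile(r'\"(.*?)\"')
PARENTHESIS = re.compile(r"\((.*?)\)")


def find_delimited(name: str) -> list[str]:
    """Collect text found inside single quotes, double quotes, or parentheses."""
    found: list[str] = []
    for pattern in (QUOTED_WORD, DOUBLE_QUOTES, PARENTHESIS):
        found.extend(pattern.findall(name))
    return found


print(find_delimited("Benjamin 'Ben' Franklin"))
print(find_delimited('Benjamin "Big Ben" Franklin'))
print(find_delimited("Sinead O'Connor"))
```

The lookarounds in `quoted_word` are what keep apostrophes inside a name from being read as quote delimiters.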
- parse_pieces(parts: Iterable[str], additional_parts_count: int = 0) list[str][source]¶
Split parts on spaces and remove commas, join on conjunctions and lastname prefixes. If parts have periods in the middle, try splitting on periods and check if the parts are titles or suffixes. If they are, add them to the constants so they will be found.
- Parameters:
parts (list) – name part strings from the comma split
additional_parts_count (int) – if the comma format contains other parts, we need to know how many there are to decide if things should be considered a conjunction.
- Returns:
pieces split on spaces and joined on conjunctions
- Return type:
list
- post_process() None[source]¶
This happens at the end of parse_full_name(), after all other processing has taken place. Runs handle_firstnames() and handle_capitalization().
- pre_process() None[source]¶
This method happens at the beginning of parse_full_name(), before any other processing of the string aside from unicode normalization, so it’s a good place to do any custom handling in a subclass. Runs parse_nicknames() and squash_emoji().
- property suffix: str¶
The person’s suffixes. Pieces at the end of the name that are found in suffixes, or pieces that are at the end of comma separated formats, e.g. “Lastname, Title Firstname Middle[,] Suffix [, Suffix]” parsed from full_name.
- property surnames: str¶
A string of all middle names followed by the last name.
- property surnames_list: list[str]¶
List of middle names followed by last name.
- property title: str¶
The person’s titles. Any string of consecutive pieces in titles or conjunctions at the beginning of full_name.
HumanName.config¶
The nameparser.config module manages the configuration of the
nameparser.
A module-level instance of Constants is created
and used by default for all HumanName instances. You can adjust the entire module’s
configuration by importing this instance and changing it.
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.titles.remove('hon').add('chemistry','dean')
You can also adjust the configuration of individual instances by passing
None as the second argument upon instantiation.
>>> from nameparser import HumanName
>>> hn = HumanName("Dean Robert Johns", None)
>>> hn.C.titles.add('dean')
>>> hn.parse_full_name() # need to run this again after config changes
Potential Gotcha: If you do not pass None as the second argument,
hn.C will be a reference to the module config, possibly yielding
unexpected results. See Customizing the Parser.
- nameparser.config.CONSTANTS = <Constants() instance>¶
A module-level instance of the Constants() class. Provides a common instance for the module to share to easily adjust configuration for the entire module. See Customizing the Parser with Your Own Configuration.
- class nameparser.config.Constants(prefixes: Iterable[str] = {"'t", 'aan', 'abu', 'aen', 'af', 'al', 'auf', 'av', 'bar', 'bat', 'bin', 'bint', 'bon', 'da', 'dal', 'de', "de'", 'degli', 'dei', 'del', 'dela', 'della', 'delle', 'delli', 'dello', 'dem', 'den', 'der', 'di', 'do', 'dos', 'du', 'dí', 'freiherr', 'freiherrin', 'heer', 'het', 'ibn', 'la', 'le', 'mac', 'mc', 'op', 'san', 'santa', 'st', 'ste', 'te', 'ter', 'tho', 'thoe', 'van', 'vande', 'vander', 'vd', 'vel', 'vom', 'von', 'zu'}, suffix_acronyms: Iterable[str] = {'8-vsb', 'aas', 'aba', 'abc', 'abd', 'abpp', 'abr', 'aca', 'acas', 'ace', 'acha', 'acp', 'ae', 'aem', 'afasma', 'afc', 'afm', 'agsf', 'aia', 'aicp', 'ala', 'alc', 'alp', 'am', 'amd', 'ame', 'amieee', 'ams', 'aphr', 'apn', 'apr', 'aprn', 'apss', 'aqp', 'arm', 'arrc', 'asa', 'asc', 'asid', 'asla', 'asp', 'atc', 'awb', 'ba', 'bca', 'bcl', 'bcss', 'bds', 'bem', 'bls-i', 'bn', 'bpe', 'bpi', 'bpt', 'bsc', 'bt', 'btcs', 'bts', 'cacts', 'cae', 'caha', 'caia', 'cams', 'cap', 'capa', 'capm', 'capp', 'caps', 'caro', 'cas', 'casp', 'cb', 'cbe', 'cbm', 'cbne', 'cbnt', 'cbp', 'cbrte', 'cbs', 'cbsp', 'cbt', 'cbte', 'cbv', 'cca', 'ccc', 'ccca', 'cccm', 'cce', 'cchp', 'ccie', 'ccim', 'cciso', 'ccm', 'ccmt', 'ccna', 'ccnp', 'ccp', 'ccp-c', 'ccpr', 'ccs', 'ccufc', 'cd', 'cdal', 'cdfm', 'cdmp', 'cds', 'cdt', 'cea', 'ceas', 'cebs', 'ceds', 'ceh', 'cela', 'cem', 'cep', 'cera', 'cet', 'cfa', 'cfc', 'cfcc', 'cfce', 'cfcm', 'cfe', 'cfeds', 'cfi', 'cfm', 'cfp', 'cfps', 'cfr', 'cfre', 'cga', 'cgap', 'cgb', 'cgc', 'cgfm', 'cgfo', 'cgm', 'cgma', 'cgp', 'cgr', 'cgsp', 'ch', 'cha', 'chba', 'chdm', 'che', 'ches', 'chfc', 'chi', 'chmc', 'chmm', 'chp', 'chpa', 'chpe', 'chpln', 'chpse', 'chrm', 'chsc', 'chse', 'chse-a', 'chsos', 'chss', 'cht', 'cia', 'cic', 'cie', 'cig', 'cip', 'cipm', 'cips', 'ciro', 'cisa', 'cism', 'cissp', 'cla', 'clsd', 'cltd', 'clu', 'cm', 'cma', 'cmas', 'cmc', 'cmfo', 'cmg', 'cmp', 'cms', 'cmsp', 'cmt', 'cna', 'cnm', 'cnp', 'cp', 'cp-c', 'cpa', 'cpacc', 'cpbe', 
'cpcm', 'cpcu', 'cpe', 'cpfa', 'cpfo', 'cpg', 'cph', 'cpht', 'cpim', 'cpl', 'cplp', 'cpm', 'cpo', 'cpp', 'cppm', 'cprc', 'cpre', 'cprp', 'cpsc', 'cpsi', 'cpss', 'cpt', 'cpwa', 'crde', 'crisc', 'crma', 'crme', 'crna', 'cro', 'crp', 'crt', 'crtt', 'csa', 'csbe', 'csc', 'cscp', 'cscu', 'csep', 'csi', 'csm', 'csp', 'cspo', 'csre', 'csrte', 'csslp', 'cssm', 'cst', 'cste', 'ctbs', 'ctfa', 'cto', 'ctp', 'cts', 'cua', 'cusp', 'cva', 'cva[22]', 'cvo', 'cvp', 'cvrs', 'cwap', 'cwb', 'cwdp', 'cwep', 'cwna', 'cwne', 'cwp', 'cwsp', 'cxa', 'cyds', 'cysa', 'dabfm', 'dabvlm', 'dacvim', 'dbe', 'dc', 'dcb', 'dcm', 'dcmg', 'dcvo', 'dd', 'dds', 'ded', 'dep', 'dfc', 'dfm', 'diplac', 'diplom', 'djur', 'dma', 'dmd', 'dmin', 'dnp', 'do', 'dpm', 'dpt', 'drb', 'drmp', 'drph', 'dsc', 'dsm', 'dso', 'dss', 'dtr', 'dvep', 'dvm', 'ea', 'ed', 'edd', 'ei', 'eit', 'els', 'emd', 'emt-b', 'emt-i/85', 'emt-i/99', 'emt-p', 'enp', 'erd', 'esq', 'evp', 'faafp', 'faan', 'faap', 'fac-c', 'facc', 'facd', 'facem', 'facep', 'facha', 'facofp', 'facog', 'facp', 'facph', 'facs', 'faia', 'faicp', 'fala', 'fashp', 'fasid', 'fasla', 'fasma', 'faspen', 'fca', 'fcas', 'fcela', 'fd', 'fec', 'fhames', 'fic', 'ficf', 'fieee', 'fmp', 'fmva', 'fnss', 'fp&a', 'fp-c', 'fpc', 'frm', 'fsa', 'fsdp', 'fws', 'gaee[14]', 'gba', 'gbe', 'gc', 'gcb', 'gchs', 'gcie', 'gcmg', 'gcsi', 'gcvo', 'gisp', 'git', 'gm', 'gmb', 'gmr', 'gphr', 'gri', 'grp', 'gsmieee', 'hccp', 'hrs', 'iaccp', 'iaee', 'iccm-d', 'iccm-f', 'idsm', 'ifgict', 'iom', 'ipep', 'ipm', 'iso', 'issp-csp', 'issp-sa', 'itil', 'jd', 'jp', 'kbe', 'kcb', 'kchs/dchs', 'kcie', 'kcmg', 'kcsi', 'kcvo', 'kg', 'khs/dhs', 'kp', 'kt', 'lac', 'lcmt', 'lcpc', 'lcsw', 'leed ap', 'lg', 'litk', 'litl', 'litp', 'llm', 'lm', 'lmsw', 'lmt', 'lp', 'lpa', 'lpc', 'lpn', 'lpss', 'lsi', 'lsit', 'lt', 'lvn', 'lvo', 'lvt', 'ma', 'maaa', 'mai', 'mba', 'mbe', 'mbs', 'mc', 'mcct', 'mcdba', 'mches', 'mcm', 'mcp', 'mcpd', 'mcsa', 'mcsd', 'mcse', 'mct', 'md', 'mda', 'mdb', 'mdbb', 'mdep', 'mdhb', 'mdiv', 
'mdl', 'mem', 'meng', 'mfa', 'micp', 'mieee', 'mirm', 'mle', 'mls', 'mlse', 'mlt', 'mm', 'mmad', 'mmas', 'mnaa', 'mnae', 'mp', 'mpa', 'mph', 'mpse', 'mra', 'ms', 'msa', 'msc', 'mscmsm', 'msm', 'mt', 'mts', 'mvo', 'nbc-his', 'nbcch', 'nbcch-ps', 'nbcdch', 'nbcdch-ps', 'nbcfch', 'nbcfch-ps', 'nbct', 'ncarb', 'nccp', 'ncidq', 'ncps', 'ncso', 'ncto', 'nd', 'ndtr', 'nicet i', 'nicet ii', 'nicet iii', 'nicet iv', 'nmd', 'np', 'np[18]', 'nraemt', 'nremr', 'nremt', 'nrp', 'obe', 'obi', 'oca', 'ocm', 'ocp', 'od', 'om', 'oscp', 'ot', 'pa-c', 'pcc', 'pci', 'pe', 'pfmp', 'pg', 'pgmp', 'ph', 'pharmd', 'phc', 'phd', 'phr', 'phrca', 'pla', 'pls', 'pmc', 'pmi-acp', 'pmp', 'pp', 'pps', 'prm', 'psm', 'psm i', 'psm ii', 'psp', 'psyd', 'pt', 'pta', 'qam', 'qc', 'qcsw', 'qfsm', 'qgm', 'qpm', 'qsd', 'qsp', 'ra', 'rai', 'rba', 'rci', 'rcp', 'rd', 'rdcs', 'rdh', 'rdms', 'rdn', 'res', 'rfp', 'rhca', 'rid', 'rls', 'rmsks', 'rn', 'rp', 'rpa', 'rph', 'rpl', 'rrc', 'rrt', 'rrt-accs', 'rrt-nps', 'rrt-sds', 'rtrp', 'rvm', 'rvt', 'sa', 'same', 'sasm', 'sccp', 'scmp', 'se', 'secb', 'sfp', 'sgm', 'shrm-cp', 'shrm-scp', 'si', 'siie', 'smieee', 'sphr', 'sra', 'sscp', 'stb', 'stmieee', 'tbr-ct', 'td', 'thd', 'thm', 'ud', 'usa', 'usaf', 'usar', 'uscg', 'usmc', 'usn', 'usnr', 'uxc', 'uxmc', 'vc', 'vcp', 'vd', 'vrd'}, suffix_not_acronyms: Iterable[str] = {'2', 'dr', 'esq', 'esquire', 'i', 'ii', 'iii', 'iv', 'jnr', 'jr', 'junior', 'ret', 'snr', 'sr', 'v', 'vet'}, suffix_acronyms_ambiguous: Iterable[str] = {'ed', 'jd'}, titles: Iterable[str] = {'10th', '1lt', '1sgt', '1st', '1stlt', '1stsgt', '2lt', '2nd', '2ndlt', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', 'a1c', 'ab', 'abbess', 'abbot', 'abolitionist', 'academic', 'acolyte', 'activist', 'actor ', 'actress', 'adept', 'adjutant', 'adm', 'admiral', 'advertising', 'adviser', 'advocate', 'air', 'akhoond', 'alderman', 'almoner', 'ambassador', 'amn', 'analytics', 'anarchist', 'animator', 'anthropologist', 'appellate', 'apprentice', 'arbitrator', 
'archbishop', 'archdeacon', 'archdruid', 'archduchess', 'archduke', 'archeologist', 'architect', 'arhat', 'army', 'arranger', 'assistant', 'assoc', 'associate', 'asst', 'astronomer', 'attache', 'attaché', 'attorney', 'aunt', 'auntie', 'author', 'award-winning', 'ayatollah', 'baba', 'bailiff', 'ballet', 'bandleader', 'banker', 'banner', 'bard', 'baron', 'baroness', 'barrister', 'baseball', 'bearer', 'behavioral', 'bench', 'bg', 'bgen', 'biblical', 'bibliographer', 'biochemist', 'biographer', 'biologist', 'bishop', 'blessed', 'blogger', 'blues', 'bodhisattva', 'bookseller', 'botanist', 'bp', 'brigadier', 'briggen', 'british', 'broadcaster', 'brother', 'buddha', 'burgess', 'burlesque', 'business', 'businessman', 'businesswoman', 'bwana', 'canon', 'capt', 'captain', 'cardinal', 'cartographer', 'cartoonist', 'catholicos', 'ccmsgt', 'cdr', 'celebrity', 'ceo', 'cfo', 'chair', 'chairs', 'chancellor', 'chaplain', "chargé d'affaires", 'chef', 'cheikh', 'chemist', 'chief', 'chieftain', 'choreographer', 'civil', 'classical', 'clergyman', 'clerk', 'cmsaf', 'cmsgt', 'co-chair', 'co-chairs', 'co-founder', 'coach', 'col', 'collector', 'colonel', 'comedian', 'comedienne', 'comic', 'commander', 'commander-in-chief', 'commodore', 'composer', 'compositeur', 'comptroller', 'computer', 'comtesse', 'conductor', 'consultant', 'controller', 'corporal', 'corporate', 'correspondent', 'councillor', 'counselor', 'count', 'countess', 'courtier', 'cpl', 'cpo', 'cpt', 'credit', 'criminal', 'criminologist', 'critic', 'csm', 'curator', 'customs', 'cwo-2', 'cwo-3', 'cwo-4', 'cwo-5', 'cwo2', 'cwo3', 'cwo4', 'cwo5', 'cyclist', 'dame', 'dancer', 'dcn', 'deacon', 'delegate', 'deputy', 'designated', 'designer', 'detective', 'developer', 'dhr', 'dipl.-ing', 'diplomat', 'dir', 'director', 'discovery', 'dissident', 'district', 'division', 'do', 'docent', 'docket', 'doctor', 'doyen', 'dpty', 'dr', 'dra', 'dramatist', 'druid', 'drummer', 'duchesse', 'dutchess', 'ecologist', 'economist', 'editor', 'edler', 
'edmi', 'edohen', 'educator', 'effendi', 'ekegbian', 'elerunwon', 'eminence', 'emperor', 'empress', 'engineer', 'english', 'ens', 'entertainer', 'entrepreneur', 'envoy', 'erzbischof', 'essayist', 'evangelist', 'excellency', 'excellent', 'exec', 'executive', 'expert', 'fadm', 'family', 'father', 'federal', 'fh-prof', 'field', 'film', 'financial', 'first', 'flag', 'flying', 'foreign', 'forester', 'founder', 'fr', 'frau', 'freifrau', 'freiherr', 'friar', 'frk', 'fru', 'fräulein', 'frøken', 'fürst', 'fürsterzbischof', 'gaf', 'gen', 'general', 'generalissimo', 'gentiluomo', 'giani', 'goodman', 'goodwife', 'governor', 'graf', 'grand', 'group', 'großfürst', 'gräfin', 'guitarist', 'guru', 'gyani', 'gysgt', 'hajji', 'headman', 'heir', 'heiress', 'her', 'hereditary', 'heren', 'herr', 'herren', 'herrn', 'herzog', 'high', 'highness', 'his', 'historian', 'historicus', 'historien', 'holiness', 'hon', 'honorable', 'honourable', 'host', 'hr', 'illustrator', 'imam', 'industrialist', 'information', 'instructor', 'intelligence', 'intendant', 'inventor', 'investigator', 'investor', 'journalist', 'journeyman', 'jr', 'judge', 'judicial', 'junior', 'jurist', 'keyboardist', 'king', "king's", 'kingdom', 'knowledge', 'lady', 'lama', 'lamido', 'law', 'lawyer', 'lcdr', 'lcpl', 'leader', 'lecturer', 'legal', 'librarian', 'lieutenant', 'linguist', 'literary', 'lord', 'lt', 'ltc', 'ltcol', 'ltg', 'ltgen', 'ltjg', 'lyricist', 'madam', 'madame', 'mademoiselle', 'mag', 'mag-judge', 'mag/judge', 'magistrate', 'magistrate-judge', 'magnate', 'maharajah', 'maharani', 'mahdi', 'maid', 'maj', 'majesty', 'majgen', 'manager', 'marcher', 'marchess', 'marchioness', 'marketing', 'marquess', 'marquis', 'marquise', 'master', 'mathematician', 'mathematics', 'matriarch', 'mayor', 'mcpo', 'mcpoc', 'mcpon', 'md', 'me', 'member', 'memoirist', 'merchant', 'met', 'metropolitan', 'mevr', 'mevrouw', 'mevrouwe', 'mg', 'mgr', 'mgysgt', 'military', 'minister', 'miss', 'misses', 'missionary', 'mister', 'mlle', 'mme', 
'mobster', 'model', 'monk', 'monseigneur', 'monsieur', 'monsignor', 'most', 'mother', 'mountaineer', 'mpco-cg', 'mr', 'mrs', 'ms', 'msg', 'msgt', 'mufti', 'mullah', 'municipal', 'murshid', 'musician', 'musicologist', 'mx', 'mystery', 'nanny', 'narrator', 'national', 'naturalist', 'navy', 'neuroscientist', 'novelist', 'nurse', 'obstetritian', 'officer', 'opera', 'operating', 'ornithologist', 'painter', 'paleontologist', 'pastor', 'patriarch', 'pd', 'pediatrician', 'personality', 'petty', 'pfc', 'pharaoh', 'phd', 'philantropist', 'philosopher', 'photographer', 'physician', 'physicist', 'pianist', 'pilot', 'pioneer', 'pir', 'player', 'playwright', 'po1', 'po2', 'po3', 'poet', 'police', 'political', 'politician', 'pope', 'prefect', 'prelate', 'premier', 'pres', 'presbyter', 'president', 'presiding', 'priest', 'priestess', 'primate', 'prime', 'prin', 'prince', 'princess', 'principal', 'printer', 'printmaker', 'prinz', 'prior', 'priv.-doz', 'private', 'pro', 'producer', 'prof', 'professor', 'provost', 'pslc', 'psychiatrist', 'psychologist', 'publisher', 'pursuivant', 'pv2', 'pvt', 'queen', "queen's", 'ra', 'rabbi', 'radio', 'radm', 'rangatira', 'ranger', 'rdml', 'rear', 'rebbe', 'registrar', 'reichsgraf', 'rep', 'representative', 'researcher', 'resident', 'rev', 'revenue', 'reverend', 'right', 'risk', 'ritter', 'rock', 'royal', 'rt', 'sa', 'sailor', 'saint', 'sainte', 'saoshyant', 'satirist', 'scholar', 'schoolmaster', 'scientist', 'scpo', 'screenwriter', 'se', 'secretary', 'security', 'seigneur', 'senator', 'senhor', 'senhora', 'senhorita', 'senior', 'senior-judge', 'sergeant', 'servant', 'señor', 'señora', 'señores', 'señorita', 'señoritas', 'sfc', 'sgm', 'sgt', 'sgtmaj', 'sgtmajmc', 'shaik', 'shaikh', 'shayk', 'shaykh', 'shehu', 'sheik', 'sheikh', 'shekh', 'sheriff', 'siddha', 'signor', 'signora', 'signore', 'signorina', 'singer', 'singer-songwriter', 'sir', 'sister', 'sma', 'smsgt', 'sn', 'soccer', 'social', 'sociologist', 'software', 'soldier', 'solicitor', 
'soprano', 'spc', 'speaker', 'special', 'sr', 'sra', 'sres', 'srta', 'srtas', 'ssg', 'ssgt', 'st', 'staff', 'state', 'states', 'strategy', 'subaltern', 'subedar', 'suffragist', 'sultan', 'sultana', 'superior', 'supreme', 'surgeon', 'swami', 'swordbearer', 'sysselmann', 'tax', 'teacher', 'technical', 'technologist', 'television ', 'tenor', 'theater', 'theatre', 'theologian', 'theorist', 'timi', 'tirthankar', 'translator', 'travel', 'treasurer', 'tsar', 'tsarina', 'tsgt', 'uk', 'uncle', 'united', 'univ.prof', 'us', 'vadm', 'vardapet', 'vc', 'venerable', 'verderer', 'vicar', 'vice', 'viscount', 'vizier', 'vocalist', 'voice', 'vrouwe', 'warden', 'warrant', 'wing', 'wm', 'wo-1', 'wo1', 'wo2', 'wo3', 'wo4', 'wo5', 'woodman', 'wp', 'writer', 'zoologist'}, first_name_titles: Iterable[str] = {'aunt', 'auntie', 'brother', 'cheikh', 'dame', 'father', 'king', 'maid', 'master', 'mother', 'pope', 'queen', 'shaik', 'shaikh', 'shayk', 'shaykh', 'sheik', 'sheikh', 'shekh', 'sir', 'sister', 'uncle'}, conjunctions: Iterable[str] = {'&', 'and', 'e', 'et', 'of', 'the', 'und', 'y'}, first_name_prefixes: Iterable[str] = {'abdal', 'abdel', 'abdul', 'abou', 'abu', 'umm'}, capitalization_exceptions: TupleManager[str] | Iterable[tuple[str, str]] = (('ii', 'II'), ('iii', 'III'), ('iv', 'IV'), ('md', 'M.D.'), ('phd', 'Ph.D.')), regexes: RegexTupleManager | TupleManager[Pattern[str]] | Iterable[tuple[str, Pattern[str]]] = {('double_quotes', re.compile('\\"(.*?)\\"')), ('emoji', re.compile('[🌀-🙏🚀-\U0001f6ff☀-⛿✀-➿]+')), ('initial', re.compile('^(\\w\\.|[A-Z])?$')), ('mac', re.compile('^(ma?c)(\\w{2,})', re.IGNORECASE)), ('no_vowels', re.compile('^[^aeyiuo]+$', re.IGNORECASE)), ('parenthesis', re.compile('\\((.*?)\\)')), ('patronymic', re.compile('(ovich|ovna|evich|evna|ichna|ilyich|kuzmich|lukich|fomich|fokich)$', re.IGNORECASE)), ('patronymic_cyrillic', re.compile('(ович|овна|евич|евна|ична|ильич|кузьмич|лукич|фомич|фокич)$')), ('period_abbreviation', re.compile('^[^\\W\\d_]{2,}\\.$')), 
('period_not_at_end', re.compile('.*\\..+$', re.IGNORECASE)), ('phd', re.compile('\\s(ph\\.?\\s+d\\.?)', re.IGNORECASE)), ('quoted_word', re.compile("(?<!\\w)\\'([^\\s]*?)\\'(?!\\w)")), ('roman_numeral', re.compile('^(X|IX|IV|V?I{0,3})$', re.IGNORECASE)), ('space_before_comma', re.compile('\\s+,')), ('spaces', re.compile('\\s+')), ('word', re.compile('(\\w|\\.)+'))}, patronymic_name_order: bool = False, middle_name_as_last: bool = False)[source]¶
An instance of this class holds all of the configuration constants for the parser.
- Parameters:
prefixes (set) – prefixes wrapped with SetManager.
titles (set) – titles wrapped with SetManager.
first_name_titles (set) – FIRST_NAME_TITLES wrapped with SetManager.
suffix_acronyms (set) – SUFFIX_ACRONYMS wrapped with SetManager.
suffix_not_acronyms (set) – SUFFIX_NOT_ACRONYMS wrapped with SetManager.
suffix_acronyms_ambiguous (set) – SUFFIX_ACRONYMS_AMBIGUOUS wrapped with SetManager.
conjunctions (set) – conjunctions wrapped with SetManager.
first_name_prefixes (set) – FIRST_NAME_PREFIXES wrapped with SetManager.
capitalization_exceptions (tuple or dict) – CAPITALIZATION_EXCEPTIONS wrapped with TupleManager.
regexes (tuple or dict) – regexes wrapped with TupleManager.
- capitalize_name = False¶
If set, applies capitalize() to the HumanName instance.
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.capitalize_name = True
>>> name = HumanName("bob v. de la macdole-eisenhower phd")
>>> str(name)
'Bob V. de la MacDole-Eisenhower Ph.D.'
- empty_attribute_default = ''¶
Default return value for empty attributes.
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.empty_attribute_default = None
>>> name = HumanName("John Doe")
>>> print(name.title)
None
>>> name.first
'John'
- force_mixed_case_capitalization = False¶
If set, forces the capitalization of mixed case strings when capitalize() is called.
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.force_mixed_case_capitalization = True
>>> name = HumanName('Shirley Maclaine')
>>> name.capitalize()
>>> str(name)
'Shirley MacLaine'
- initials_delimiter = '.'¶
The default initials delimiter used for all new HumanName instances. It is appended after each individual initial.
- initials_format = '{first} {middle} {last}'¶
The default initials format used for all new HumanName instances.
- initials_separator = ' '¶
The default separator placed between consecutive initials within a name group (first, middle, or last). Distinct from initials_delimiter, which is the trailing character after each individual initial.
With defaults initials_delimiter="." and initials_separator=" ", initials() produces "J. A. D.". Setting initials_separator="" with initials_delimiter="." and initials_format="{first}{middle}{last}" produces "J.A.D.". With the default initials_format, group-level spacing from the template is still applied.
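The interaction between these three settings can be sketched in plain Python. This is an illustration only, not nameparser's implementation: the helper `format_initials` and its arguments are hypothetical, and it assumes each name group is already available as a list of words.

```python
def format_initials(groups, fmt="{first} {middle} {last}",
                    delimiter=".", separator=" "):
    """Hypothetical helper: render initials from name groups.

    `groups` maps 'first', 'middle' and 'last' to lists of words.
    """
    rendered = {}
    for key in ("first", "middle", "last"):
        # The delimiter trails each initial; the separator sits between
        # consecutive initials inside a single group.
        rendered[key] = separator.join(
            word[0].upper() + delimiter for word in groups.get(key, []) if word
        )
    # Spacing between groups comes from the format template itself.
    return fmt.format(**rendered)


name = {"first": ["John"], "middle": ["Allen"], "last": ["Doe"]}
default_style = format_initials(name)
compact_style = format_initials(name, fmt="{first}{middle}{last}", separator="")

# Two middle names, empty separator, default template: the middle group
# is compacted, but the template still puts spaces between groups.
two_middles = {"first": ["John"], "middle": ["Allen", "Bob"], "last": ["Doe"]}
mixed_style = format_initials(two_middles, separator="")
```

Under these assumptions `default_style` is `"J. A. D."`, `compact_style` is `"J.A.D."`, and `mixed_style` is `"J. A.B. D."`, which shows the group-level spacing of the default template surviving an empty separator.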
- middle_name_as_last = False¶
If set, folds middle names into the last name: middle_list is prepended to last_list and middle_list is cleared, so .last becomes what .surnames already was and .middle becomes empty. Useful for naming systems with no middle-name concept, where everything after the given name is lineage/family (e.g. Arabic patronymic chaining: given + father + grandfather + family).
The fold is uniform across both no-comma and comma (“Last, First Middle”) input, so the two written forms of a name converge on the same result.
For per-instance control without a shared Constants, pass a dedicated instance: HumanName("...", constants=Constants(middle_name_as_last=True)).
>>> from nameparser import HumanName
>>> from nameparser.config import Constants
>>> C = Constants(middle_name_as_last=True)
>>> hn = HumanName("Mohamad Ahmad Ali Hassan", constants=C)
>>> hn.first, hn.middle, hn.last
('Mohamad', '', 'Ahmad Ali Hassan')
- patronymic_name_order = False¶
If set, detects names in Russian formal order (Surname GivenName Patronymic) by recognizing a trailing East-Slavic patronymic suffix on the last token, and rotates the three name parts so that first/middle/last map to given name / patronymic / surname respectively. Detection requires exactly one token in each of first, middle, and last; names with multi-part given names or multiple middle names are left unchanged.
Opt-in because a Western person whose surname happens to end in a patronymic suffix (e.g. "David Michael Abramovich") will be reordered incorrectly when the flag is on. Enable only when your data is predominantly Russian formal-order names.
For per-instance control without a shared Constants, pass a dedicated instance: HumanName("...", constants=Constants(patronymic_name_order=True)).
>>> from nameparser import HumanName
>>> from nameparser.config import Constants
>>> C = Constants(patronymic_name_order=True)
>>> hn = HumanName("Ivanov Ivan Ivanovich", constants=C)
>>> hn.first, hn.middle, hn.last
('Ivan', 'Ivanovich', 'Ivanov')
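The detection rule described above can be approximated with the standard library alone. The sketch below is not the parser's own code: `rotate_formal_order` is a hypothetical helper, and only the regular expression is copied from the `patronymic` entry in REGEXES.

```python
import re

# Pattern copied from the 'patronymic' entry in REGEXES.
PATRONYMIC = re.compile(
    r"(ovich|ovna|evich|evna|ichna|ilyich|kuzmich|lukich|fomich|fokich)$",
    re.IGNORECASE,
)


def rotate_formal_order(first, middle, last):
    """Hypothetical helper: reorder 'Surname GivenName Patronymic' parts."""
    parts = (first, middle, last)
    # Require exactly one token in each part; otherwise leave unchanged.
    if any(len(part.split()) != 1 for part in parts):
        return parts
    if PATRONYMIC.search(last):
        # A naive parse yields first=Surname, middle=GivenName,
        # last=Patronymic, so rotate into given / patronymic / surname.
        return (middle, last, first)
    return parts


russian = rotate_formal_order("Ivanov", "Ivan", "Ivanovich")
western = rotate_formal_order("David", "Michael", "Abramovich")
untouched = rotate_formal_order("John", "Quincy", "Adams")
```

Here `russian` becomes `('Ivan', 'Ivanovich', 'Ivanov')`, while `western` is rotated to `('Michael', 'Abramovich', 'David')`, reproducing the false positive that makes the flag opt-in.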
- string_format = '{title} {first} {middle} {last} {suffix} ({nickname})'¶
The default string format used for all new HumanName instances.
- suffix_delimiter = None¶
If set, an additional delimiter used to split suffix groups after comma-splitting. For example, setting suffix_delimiter=" - " allows "RN - CRNA" to be parsed as two separate suffixes. Default is None (no additional splitting beyond the standard comma split).
Note: setting this to "," or ", " has no additional effect: the full name is already split on bare commas first, and each resulting part is stripped of surrounding whitespace before this step runs.
Known limitation: the expansion is applied to all post-comma parts, not just suffix groups. In inverted format ("Last, First, suffix"), the first-name part is also split on the delimiter. In practice this is harmless since first names rarely contain the delimiter string, but a name like "Doe, Mary - Kate, RN" with suffix_delimiter=" - " would misparse.
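The two-stage split and its known limitation can be illustrated without the library. This is a minimal sketch under the assumptions stated in the text (comma split first, parts stripped, then every post-comma part expanded); `split_parts` is a hypothetical helper, not a nameparser function.

```python
def split_parts(full_name, suffix_delimiter=None):
    """Hypothetical helper: comma split, then optional delimiter split."""
    # Stage 1: split on bare commas and strip surrounding whitespace.
    parts = [part.strip() for part in full_name.split(",")]
    if not suffix_delimiter:
        return parts
    expanded = [parts[0]]
    # Stage 2: every post-comma part is expanded, not only suffix groups.
    for part in parts[1:]:
        expanded.extend(piece.strip() for piece in part.split(suffix_delimiter))
    return expanded


suffixes = split_parts("John Doe, RN - CRNA", suffix_delimiter=" - ")
misparse = split_parts("Doe, Mary - Kate, RN", suffix_delimiter=" - ")
no_effect = split_parts("John Doe, RN, CRNA", suffix_delimiter=", ")
```

In this sketch `suffixes` is `['John Doe', 'RN', 'CRNA']`, `misparse` is `['Doe', 'Mary', 'Kate', 'RN']` (the first name is split as well), and `no_effect` equals the plain comma split because no part still contains a comma.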
- class nameparser.config.SetManager(elements: Iterable[str])[source]¶
Easily add and remove config variables per module or instance. Subclass of collections.abc.Set.
The only special functionality beyond that provided by set() is to normalize constants for comparison (lower case, no periods) when they are add()ed and remove()d, and to allow passing multiple string arguments to the add() and remove() methods.
- add(*strings: str) Self[source]¶
Add the lowercased, leading/trailing-periods-stripped version of the string arguments to the set. Can pass a list of strings. Returns self for chaining.
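The normalization and chaining behaviour can be sketched with a small stand-in class. `MiniSetManager` below is hypothetical and written only for illustration; it is not the library's SetManager, and details such as silently ignoring a missing element in `remove()` are assumptions.

```python
from collections.abc import Set


class MiniSetManager(Set):
    """Hypothetical stand-in for SetManager, for illustration only."""

    def __init__(self, elements=()):
        self._elements = {self._normalize(e) for e in elements}

    @staticmethod
    def _normalize(value):
        # Lowercase and strip leading/trailing periods.
        return value.lower().strip(".")

    # The three methods required by collections.abc.Set.
    def __contains__(self, value):
        return value in self._elements

    def __iter__(self):
        return iter(self._elements)

    def __len__(self):
        return len(self._elements)

    def add(self, *strings):
        for s in strings:
            self._elements.add(self._normalize(s))
        return self  # returning self allows chaining

    def remove(self, *strings):
        for s in strings:
            # Assumption: a missing element is ignored rather than raising.
            self._elements.discard(self._normalize(s))
        return self


titles = MiniSetManager(["Dr", "Mr"])
titles.add("Hon.", "PROF").remove("Mr.")
remaining = sorted(titles)
```

After the chained call, `remaining` is `['dr', 'hon', 'prof']`: both additions were normalized, and `"Mr."` matched the stored `"mr"`.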
HumanName.config Defaults¶
- nameparser.config.titles.FIRST_NAME_TITLES = {'aunt', 'auntie', 'brother', 'cheikh', 'dame', 'father', 'king', 'maid', 'master', 'mother', 'pope', 'queen', 'shaik', 'shaikh', 'shayk', 'shaykh', 'sheik', 'sheikh', 'shekh', 'sir', 'sister', 'uncle'}¶
When these titles appear with a single other name, that name is a first name, e.g. “Sir John”, “Sister Mary”, “Queen Elizabeth”.
- nameparser.config.titles.TITLES = {'10th', '1lt', '1sgt', '1st', '1stlt', '1stsgt', '2lt', '2nd', '2ndlt', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', 'a1c', 'ab', 'abbess', 'abbot', 'abolitionist', 'academic', 'acolyte', 'activist', 'actor ', 'actress', 'adept', 'adjutant', 'adm', 'admiral', 'advertising', 'adviser', 'advocate', 'air', 'akhoond', 'alderman', 'almoner', 'ambassador', 'amn', 'analytics', 'anarchist', 'animator', 'anthropologist', 'appellate', 'apprentice', 'arbitrator', 'archbishop', 'archdeacon', 'archdruid', 'archduchess', 'archduke', 'archeologist', 'architect', 'arhat', 'army', 'arranger', 'assistant', 'assoc', 'associate', 'asst', 'astronomer', 'attache', 'attaché', 'attorney', 'aunt', 'auntie', 'author', 'award-winning', 'ayatollah', 'baba', 'bailiff', 'ballet', 'bandleader', 'banker', 'banner', 'bard', 'baron', 'baroness', 'barrister', 'baseball', 'bearer', 'behavioral', 'bench', 'bg', 'bgen', 'biblical', 'bibliographer', 'biochemist', 'biographer', 'biologist', 'bishop', 'blessed', 'blogger', 'blues', 'bodhisattva', 'bookseller', 'botanist', 'bp', 'brigadier', 'briggen', 'british', 'broadcaster', 'brother', 'buddha', 'burgess', 'burlesque', 'business', 'businessman', 'businesswoman', 'bwana', 'canon', 'capt', 'captain', 'cardinal', 'cartographer', 'cartoonist', 'catholicos', 'ccmsgt', 'cdr', 'celebrity', 'ceo', 'cfo', 'chair', 'chairs', 'chancellor', 'chaplain', "chargé d'affaires", 'chef', 'cheikh', 'chemist', 'chief', 'chieftain', 'choreographer', 'civil', 'classical', 'clergyman', 'clerk', 'cmsaf', 'cmsgt', 'co-chair', 'co-chairs', 'co-founder', 'coach', 'col', 'collector', 'colonel', 'comedian', 'comedienne', 'comic', 'commander', 'commander-in-chief', 'commodore', 'composer', 'compositeur', 'comptroller', 'computer', 'comtesse', 'conductor', 'consultant', 'controller', 'corporal', 'corporate', 'correspondent', 'councillor', 'counselor', 'count', 'countess', 'courtier', 'cpl', 'cpo', 'cpt', 'credit', 'criminal', 'criminologist', 
'critic', 'csm', 'curator', 'customs', 'cwo-2', 'cwo-3', 'cwo-4', 'cwo-5', 'cwo2', 'cwo3', 'cwo4', 'cwo5', 'cyclist', 'dame', 'dancer', 'dcn', 'deacon', 'delegate', 'deputy', 'designated', 'designer', 'detective', 'developer', 'dhr', 'dipl.-ing', 'diplomat', 'dir', 'director', 'discovery', 'dissident', 'district', 'division', 'do', 'docent', 'docket', 'doctor', 'doyen', 'dpty', 'dr', 'dra', 'dramatist', 'druid', 'drummer', 'duchesse', 'dutchess', 'ecologist', 'economist', 'editor', 'edler', 'edmi', 'edohen', 'educator', 'effendi', 'ekegbian', 'elerunwon', 'eminence', 'emperor', 'empress', 'engineer', 'english', 'ens', 'entertainer', 'entrepreneur', 'envoy', 'erzbischof', 'essayist', 'evangelist', 'excellency', 'excellent', 'exec', 'executive', 'expert', 'fadm', 'family', 'father', 'federal', 'fh-prof', 'field', 'film', 'financial', 'first', 'flag', 'flying', 'foreign', 'forester', 'founder', 'fr', 'frau', 'freifrau', 'freiherr', 'friar', 'frk', 'fru', 'fräulein', 'frøken', 'fürst', 'fürsterzbischof', 'gaf', 'gen', 'general', 'generalissimo', 'gentiluomo', 'giani', 'goodman', 'goodwife', 'governor', 'graf', 'grand', 'group', 'großfürst', 'gräfin', 'guitarist', 'guru', 'gyani', 'gysgt', 'hajji', 'headman', 'heir', 'heiress', 'her', 'hereditary', 'heren', 'herr', 'herren', 'herrn', 'herzog', 'high', 'highness', 'his', 'historian', 'historicus', 'historien', 'holiness', 'hon', 'honorable', 'honourable', 'host', 'hr', 'illustrator', 'imam', 'industrialist', 'information', 'instructor', 'intelligence', 'intendant', 'inventor', 'investigator', 'investor', 'journalist', 'journeyman', 'jr', 'judge', 'judicial', 'junior', 'jurist', 'keyboardist', 'king', "king's", 'kingdom', 'knowledge', 'lady', 'lama', 'lamido', 'law', 'lawyer', 'lcdr', 'lcpl', 'leader', 'lecturer', 'legal', 'librarian', 'lieutenant', 'linguist', 'literary', 'lord', 'lt', 'ltc', 'ltcol', 'ltg', 'ltgen', 'ltjg', 'lyricist', 'madam', 'madame', 'mademoiselle', 'mag', 'mag-judge', 'mag/judge', 'magistrate', 
'magistrate-judge', 'magnate', 'maharajah', 'maharani', 'mahdi', 'maid', 'maj', 'majesty', 'majgen', 'manager', 'marcher', 'marchess', 'marchioness', 'marketing', 'marquess', 'marquis', 'marquise', 'master', 'mathematician', 'mathematics', 'matriarch', 'mayor', 'mcpo', 'mcpoc', 'mcpon', 'md', 'me', 'member', 'memoirist', 'merchant', 'met', 'metropolitan', 'mevr', 'mevrouw', 'mevrouwe', 'mg', 'mgr', 'mgysgt', 'military', 'minister', 'miss', 'misses', 'missionary', 'mister', 'mlle', 'mme', 'mobster', 'model', 'monk', 'monseigneur', 'monsieur', 'monsignor', 'most', 'mother', 'mountaineer', 'mpco-cg', 'mr', 'mrs', 'ms', 'msg', 'msgt', 'mufti', 'mullah', 'municipal', 'murshid', 'musician', 'musicologist', 'mx', 'mystery', 'nanny', 'narrator', 'national', 'naturalist', 'navy', 'neuroscientist', 'novelist', 'nurse', 'obstetritian', 'officer', 'opera', 'operating', 'ornithologist', 'painter', 'paleontologist', 'pastor', 'patriarch', 'pd', 'pediatrician', 'personality', 'petty', 'pfc', 'pharaoh', 'phd', 'philantropist', 'philosopher', 'photographer', 'physician', 'physicist', 'pianist', 'pilot', 'pioneer', 'pir', 'player', 'playwright', 'po1', 'po2', 'po3', 'poet', 'police', 'political', 'politician', 'pope', 'prefect', 'prelate', 'premier', 'pres', 'presbyter', 'president', 'presiding', 'priest', 'priestess', 'primate', 'prime', 'prin', 'prince', 'princess', 'principal', 'printer', 'printmaker', 'prinz', 'prior', 'priv.-doz', 'private', 'pro', 'producer', 'prof', 'professor', 'provost', 'pslc', 'psychiatrist', 'psychologist', 'publisher', 'pursuivant', 'pv2', 'pvt', 'queen', "queen's", 'ra', 'rabbi', 'radio', 'radm', 'rangatira', 'ranger', 'rdml', 'rear', 'rebbe', 'registrar', 'reichsgraf', 'rep', 'representative', 'researcher', 'resident', 'rev', 'revenue', 'reverend', 'right', 'risk', 'ritter', 'rock', 'royal', 'rt', 'sa', 'sailor', 'saint', 'sainte', 'saoshyant', 'satirist', 'scholar', 'schoolmaster', 'scientist', 'scpo', 'screenwriter', 'se', 'secretary', 'security', 
'seigneur', 'senator', 'senhor', 'senhora', 'senhorita', 'senior', 'senior-judge', 'sergeant', 'servant', 'señor', 'señora', 'señores', 'señorita', 'señoritas', 'sfc', 'sgm', 'sgt', 'sgtmaj', 'sgtmajmc', 'shaik', 'shaikh', 'shayk', 'shaykh', 'shehu', 'sheik', 'sheikh', 'shekh', 'sheriff', 'siddha', 'signor', 'signora', 'signore', 'signorina', 'singer', 'singer-songwriter', 'sir', 'sister', 'sma', 'smsgt', 'sn', 'soccer', 'social', 'sociologist', 'software', 'soldier', 'solicitor', 'soprano', 'spc', 'speaker', 'special', 'sr', 'sra', 'sres', 'srta', 'srtas', 'ssg', 'ssgt', 'st', 'staff', 'state', 'states', 'strategy', 'subaltern', 'subedar', 'suffragist', 'sultan', 'sultana', 'superior', 'supreme', 'surgeon', 'swami', 'swordbearer', 'sysselmann', 'tax', 'teacher', 'technical', 'technologist', 'television ', 'tenor', 'theater', 'theatre', 'theologian', 'theorist', 'timi', 'tirthankar', 'translator', 'travel', 'treasurer', 'tsar', 'tsarina', 'tsgt', 'uk', 'uncle', 'united', 'univ.prof', 'us', 'vadm', 'vardapet', 'vc', 'venerable', 'verderer', 'vicar', 'vice', 'viscount', 'vizier', 'vocalist', 'voice', 'vrouwe', 'warden', 'warrant', 'wing', 'wm', 'wo-1', 'wo1', 'wo2', 'wo3', 'wo4', 'wo5', 'woodman', 'wp', 'writer', 'zoologist'}¶
Cannot include things that could also be first names, e.g. “dean”. Many of these are from Wikipedia: https://en.wikipedia.org/wiki/Title. The parser recognizes chains of these, including conjunctions, allowing recognition of titles like “Deputy Secretary of State”.
- nameparser.config.suffixes.SUFFIX_ACRONYMS = {'8-vsb', 'aas', 'aba', 'abc', 'abd', 'abpp', 'abr', 'aca', 'acas', 'ace', 'acha', 'acp', 'ae', 'aem', 'afasma', 'afc', 'afm', 'agsf', 'aia', 'aicp', 'ala', 'alc', 'alp', 'am', 'amd', 'ame', 'amieee', 'ams', 'aphr', 'apn', 'apr', 'aprn', 'apss', 'aqp', 'arm', 'arrc', 'asa', 'asc', 'asid', 'asla', 'asp', 'atc', 'awb', 'ba', 'bca', 'bcl', 'bcss', 'bds', 'bem', 'bls-i', 'bn', 'bpe', 'bpi', 'bpt', 'bsc', 'bt', 'btcs', 'bts', 'cacts', 'cae', 'caha', 'caia', 'cams', 'cap', 'capa', 'capm', 'capp', 'caps', 'caro', 'cas', 'casp', 'cb', 'cbe', 'cbm', 'cbne', 'cbnt', 'cbp', 'cbrte', 'cbs', 'cbsp', 'cbt', 'cbte', 'cbv', 'cca', 'ccc', 'ccca', 'cccm', 'cce', 'cchp', 'ccie', 'ccim', 'cciso', 'ccm', 'ccmt', 'ccna', 'ccnp', 'ccp', 'ccp-c', 'ccpr', 'ccs', 'ccufc', 'cd', 'cdal', 'cdfm', 'cdmp', 'cds', 'cdt', 'cea', 'ceas', 'cebs', 'ceds', 'ceh', 'cela', 'cem', 'cep', 'cera', 'cet', 'cfa', 'cfc', 'cfcc', 'cfce', 'cfcm', 'cfe', 'cfeds', 'cfi', 'cfm', 'cfp', 'cfps', 'cfr', 'cfre', 'cga', 'cgap', 'cgb', 'cgc', 'cgfm', 'cgfo', 'cgm', 'cgma', 'cgp', 'cgr', 'cgsp', 'ch', 'cha', 'chba', 'chdm', 'che', 'ches', 'chfc', 'chi', 'chmc', 'chmm', 'chp', 'chpa', 'chpe', 'chpln', 'chpse', 'chrm', 'chsc', 'chse', 'chse-a', 'chsos', 'chss', 'cht', 'cia', 'cic', 'cie', 'cig', 'cip', 'cipm', 'cips', 'ciro', 'cisa', 'cism', 'cissp', 'cla', 'clsd', 'cltd', 'clu', 'cm', 'cma', 'cmas', 'cmc', 'cmfo', 'cmg', 'cmp', 'cms', 'cmsp', 'cmt', 'cna', 'cnm', 'cnp', 'cp', 'cp-c', 'cpa', 'cpacc', 'cpbe', 'cpcm', 'cpcu', 'cpe', 'cpfa', 'cpfo', 'cpg', 'cph', 'cpht', 'cpim', 'cpl', 'cplp', 'cpm', 'cpo', 'cpp', 'cppm', 'cprc', 'cpre', 'cprp', 'cpsc', 'cpsi', 'cpss', 'cpt', 'cpwa', 'crde', 'crisc', 'crma', 'crme', 'crna', 'cro', 'crp', 'crt', 'crtt', 'csa', 'csbe', 'csc', 'cscp', 'cscu', 'csep', 'csi', 'csm', 'csp', 'cspo', 'csre', 'csrte', 'csslp', 'cssm', 'cst', 'cste', 'ctbs', 'ctfa', 'cto', 'ctp', 'cts', 'cua', 'cusp', 'cva', 'cva[22]', 'cvo', 'cvp', 'cvrs', 'cwap', 'cwb', 
'cwdp', 'cwep', 'cwna', 'cwne', 'cwp', 'cwsp', 'cxa', 'cyds', 'cysa', 'dabfm', 'dabvlm', 'dacvim', 'dbe', 'dc', 'dcb', 'dcm', 'dcmg', 'dcvo', 'dd', 'dds', 'ded', 'dep', 'dfc', 'dfm', 'diplac', 'diplom', 'djur', 'dma', 'dmd', 'dmin', 'dnp', 'do', 'dpm', 'dpt', 'drb', 'drmp', 'drph', 'dsc', 'dsm', 'dso', 'dss', 'dtr', 'dvep', 'dvm', 'ea', 'ed', 'edd', 'ei', 'eit', 'els', 'emd', 'emt-b', 'emt-i/85', 'emt-i/99', 'emt-p', 'enp', 'erd', 'esq', 'evp', 'faafp', 'faan', 'faap', 'fac-c', 'facc', 'facd', 'facem', 'facep', 'facha', 'facofp', 'facog', 'facp', 'facph', 'facs', 'faia', 'faicp', 'fala', 'fashp', 'fasid', 'fasla', 'fasma', 'faspen', 'fca', 'fcas', 'fcela', 'fd', 'fec', 'fhames', 'fic', 'ficf', 'fieee', 'fmp', 'fmva', 'fnss', 'fp&a', 'fp-c', 'fpc', 'frm', 'fsa', 'fsdp', 'fws', 'gaee[14]', 'gba', 'gbe', 'gc', 'gcb', 'gchs', 'gcie', 'gcmg', 'gcsi', 'gcvo', 'gisp', 'git', 'gm', 'gmb', 'gmr', 'gphr', 'gri', 'grp', 'gsmieee', 'hccp', 'hrs', 'iaccp', 'iaee', 'iccm-d', 'iccm-f', 'idsm', 'ifgict', 'iom', 'ipep', 'ipm', 'iso', 'issp-csp', 'issp-sa', 'itil', 'jd', 'jp', 'kbe', 'kcb', 'kchs/dchs', 'kcie', 'kcmg', 'kcsi', 'kcvo', 'kg', 'khs/dhs', 'kp', 'kt', 'lac', 'lcmt', 'lcpc', 'lcsw', 'leed ap', 'lg', 'litk', 'litl', 'litp', 'llm', 'lm', 'lmsw', 'lmt', 'lp', 'lpa', 'lpc', 'lpn', 'lpss', 'lsi', 'lsit', 'lt', 'lvn', 'lvo', 'lvt', 'ma', 'maaa', 'mai', 'mba', 'mbe', 'mbs', 'mc', 'mcct', 'mcdba', 'mches', 'mcm', 'mcp', 'mcpd', 'mcsa', 'mcsd', 'mcse', 'mct', 'md', 'mda', 'mdb', 'mdbb', 'mdep', 'mdhb', 'mdiv', 'mdl', 'mem', 'meng', 'mfa', 'micp', 'mieee', 'mirm', 'mle', 'mls', 'mlse', 'mlt', 'mm', 'mmad', 'mmas', 'mnaa', 'mnae', 'mp', 'mpa', 'mph', 'mpse', 'mra', 'ms', 'msa', 'msc', 'mscmsm', 'msm', 'mt', 'mts', 'mvo', 'nbc-his', 'nbcch', 'nbcch-ps', 'nbcdch', 'nbcdch-ps', 'nbcfch', 'nbcfch-ps', 'nbct', 'ncarb', 'nccp', 'ncidq', 'ncps', 'ncso', 'ncto', 'nd', 'ndtr', 'nicet i', 'nicet ii', 'nicet iii', 'nicet iv', 'nmd', 'np', 'np[18]', 'nraemt', 'nremr', 'nremt', 'nrp', 'obe', 
'obi', 'oca', 'ocm', 'ocp', 'od', 'om', 'oscp', 'ot', 'pa-c', 'pcc', 'pci', 'pe', 'pfmp', 'pg', 'pgmp', 'ph', 'pharmd', 'phc', 'phd', 'phr', 'phrca', 'pla', 'pls', 'pmc', 'pmi-acp', 'pmp', 'pp', 'pps', 'prm', 'psm', 'psm i', 'psm ii', 'psp', 'psyd', 'pt', 'pta', 'qam', 'qc', 'qcsw', 'qfsm', 'qgm', 'qpm', 'qsd', 'qsp', 'ra', 'rai', 'rba', 'rci', 'rcp', 'rd', 'rdcs', 'rdh', 'rdms', 'rdn', 'res', 'rfp', 'rhca', 'rid', 'rls', 'rmsks', 'rn', 'rp', 'rpa', 'rph', 'rpl', 'rrc', 'rrt', 'rrt-accs', 'rrt-nps', 'rrt-sds', 'rtrp', 'rvm', 'rvt', 'sa', 'same', 'sasm', 'sccp', 'scmp', 'se', 'secb', 'sfp', 'sgm', 'shrm-cp', 'shrm-scp', 'si', 'siie', 'smieee', 'sphr', 'sra', 'sscp', 'stb', 'stmieee', 'tbr-ct', 'td', 'thd', 'thm', 'ud', 'usa', 'usaf', 'usar', 'uscg', 'usmc', 'usn', 'usnr', 'uxc', 'uxmc', 'vc', 'vcp', 'vd', 'vrd'}¶
Post-nominal acronyms. Titles, degrees and other things people stick after their name that may or may not have periods between the letters. The parser removes periods when matching against these pieces.
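As a rough illustration of period-insensitive matching, the sketch below compares a piece against a small sample of the set with its periods removed. `is_suffix_acronym` and `SAMPLE_ACRONYMS` are hypothetical names, and lowercasing before the lookup is an assumption.

```python
# A small sample taken from SUFFIX_ACRONYMS, not the full set.
SAMPLE_ACRONYMS = {"md", "phd", "rn", "cpa"}


def is_suffix_acronym(piece, acronyms=SAMPLE_ACRONYMS):
    """Hypothetical helper: compare a piece with its periods removed."""
    return piece.replace(".", "").lower() in acronyms


with_periods = is_suffix_acronym("M.D.")
without_periods = is_suffix_acronym("PhD")
not_a_suffix = is_suffix_acronym("Smith")
```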
- nameparser.config.suffixes.SUFFIX_ACRONYMS_AMBIGUOUS = {'ed', 'jd'}¶
Acronym suffixes from SUFFIX_ACRONYMS that also plausibly collide with a common given-name nickname. Not a partition of SUFFIX_ACRONYMS – a small, standalone exception list consulted only by parse_nicknames().
- nameparser.config.suffixes.SUFFIX_NOT_ACRONYMS = {'2', 'dr', 'esq', 'esquire', 'i', 'ii', 'iii', 'iv', 'jnr', 'jr', 'junior', 'ret', 'snr', 'sr', 'v', 'vet'}¶
Post-nominal pieces that are not acronyms. The parser does not remove periods when matching against these pieces.
- nameparser.config.prefixes.PREFIXES = {"'t", 'aan', 'abu', 'aen', 'af', 'al', 'auf', 'av', 'bar', 'bat', 'bin', 'bint', 'bon', 'da', 'dal', 'de', "de'", 'degli', 'dei', 'del', 'dela', 'della', 'delle', 'delli', 'dello', 'dem', 'den', 'der', 'di', 'do', 'dos', 'du', 'dí', 'freiherr', 'freiherrin', 'heer', 'het', 'ibn', 'la', 'le', 'mac', 'mc', 'op', 'san', 'santa', 'st', 'ste', 'te', 'ter', 'tho', 'thoe', 'van', 'vande', 'vander', 'vd', 'vel', 'vom', 'von', 'zu'}¶
Name pieces that appear before a last name. Prefixes join to the piece that follows them to make one new piece. They can be chained together, e.g. “von der” and “de la”. Because they only appear in middle or last names, they also signify that all following name pieces should be in the same name part. For example, “von” will be joined to all following pieces that are not prefixes or suffixes, allowing recognition of double last names when they appear after a prefix. So in “pennie von bergen wessels MD”, “von” will join with all following name pieces until the suffix “MD”, resulting in the correct parsing of the last name “von bergen wessels”.
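The joining rule can be approximated as follows. This is a simplified sketch rather than the parser's algorithm: `join_prefixes` is a hypothetical helper, the two sets are small samples, and a prefix here simply absorbs every following piece up to the next suffix.

```python
# Small samples of PREFIXES and of the suffix sets, for illustration.
SAMPLE_PREFIXES = {"von", "van", "der", "de", "la"}
SAMPLE_SUFFIXES = {"md", "phd", "jr"}


def join_prefixes(pieces):
    """Hypothetical helper: merge a prefix with the pieces that follow it."""
    result = []
    i = 0
    while i < len(pieces):
        if pieces[i].lower() in SAMPLE_PREFIXES:
            j = i + 1
            # Absorb everything up to the next suffix (or the end).
            while (j < len(pieces)
                   and pieces[j].lower().strip(".") not in SAMPLE_SUFFIXES):
                j += 1
            result.append(" ".join(pieces[i:j]))
            i = j
        else:
            result.append(pieces[i])
            i += 1
    return result


double_last = join_prefixes("pennie von bergen wessels MD".split())
chained = join_prefixes("Juan de la Vega".split())
```

With these samples `double_last` is `['pennie', 'von bergen wessels', 'MD']` and `chained` is `['Juan', 'de la Vega']`.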
- nameparser.config.conjunctions.CONJUNCTIONS = {'&', 'and', 'e', 'et', 'of', 'the', 'und', 'y'}¶
Pieces that should join to their neighboring pieces, e.g. “and”, “y” and “&”. “of” and “the” are also included to facilitate joining multiple titles, e.g. “President of the United States”.
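A minimal sketch of joining a conjunction to its neighbours is shown below. It is an assumption-laden simplification, not the parser's logic: `join_conjunctions` is a hypothetical helper, and any special handling the parser applies to edge cases is ignored.

```python
# Copied from CONJUNCTIONS.
CONJUNCTIONS = {"&", "and", "e", "et", "of", "the", "und", "y"}


def join_conjunctions(pieces):
    """Hypothetical helper: glue conjunction runs to both neighbours."""
    result = []
    i = 0
    while i < len(pieces):
        piece = pieces[i]
        if piece.lower() in CONJUNCTIONS and result:
            # Find the end of a run of consecutive conjunctions.
            j = i
            while j < len(pieces) and pieces[j].lower() in CONJUNCTIONS:
                j += 1
            if j < len(pieces):
                # Merge: previous piece + conjunction run + next piece.
                result[-1] = " ".join([result[-1]] + pieces[i:j + 1])
                i = j + 1
                continue
        result.append(piece)
        i += 1
    return result


couple = join_conjunctions("Mr. and Mrs. John Smith".split())
office = join_conjunctions("President of the United States".split())
```

Here `couple` is `['Mr. and Mrs.', 'John', 'Smith']` and `office` is `['President of the United', 'States']`. In the sketch the last word stays separate because no title chaining is modelled.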
- nameparser.config.capitalization.CAPITALIZATION_EXCEPTIONS = (('ii', 'II'), ('iii', 'III'), ('iv', 'IV'), ('md', 'M.D.'), ('phd', 'Ph.D.'))¶
Any pieces that are not capitalized by capitalizing the first letter.
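One way to picture how such exceptions could be applied is a lookup that falls back to ordinary first-letter capitalization. The helper `capitalize_piece` is hypothetical, and the lowercase lookup key is an assumption; only the exception pairs are copied from the value above.

```python
# Pairs copied from CAPITALIZATION_EXCEPTIONS.
CAPITALIZATION_EXCEPTIONS = (
    ("ii", "II"), ("iii", "III"), ("iv", "IV"),
    ("md", "M.D."), ("phd", "Ph.D."),
)
EXCEPTIONS = dict(CAPITALIZATION_EXCEPTIONS)


def capitalize_piece(piece):
    """Hypothetical helper: use an exception if one exists."""
    key = piece.lower()
    if key in EXCEPTIONS:
        return EXCEPTIONS[key]
    # Fallback: capitalize the first letter only.
    return piece.capitalize()


degree = capitalize_piece("phd")
numeral = capitalize_piece("iii")
plain = capitalize_piece("smith")
```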
- nameparser.config.regexes.REGEXES = {('double_quotes', re.compile('\\"(.*?)\\"')), ('emoji', re.compile('[🌀-🙏🚀-\U0001f6ff☀-⛿✀-➿]+')), ('initial', re.compile('^(\\w\\.|[A-Z])?$')), ('mac', re.compile('^(ma?c)(\\w{2,})', re.IGNORECASE)), ('no_vowels', re.compile('^[^aeyiuo]+$', re.IGNORECASE)), ('parenthesis', re.compile('\\((.*?)\\)')), ('patronymic', re.compile('(ovich|ovna|evich|evna|ichna|ilyich|kuzmich|lukich|fomich|fokich)$', re.IGNORECASE)), ('patronymic_cyrillic', re.compile('(ович|овна|евич|евна|ична|ильич|кузьмич|лукич|фомич|фокич)$')), ('period_abbreviation', re.compile('^[^\\W\\d_]{2,}\\.$')), ('period_not_at_end', re.compile('.*\\..+$', re.IGNORECASE)), ('phd', re.compile('\\s(ph\\.?\\s+d\\.?)', re.IGNORECASE)), ('quoted_word', re.compile("(?<!\\w)\\'([^\\s]*?)\\'(?!\\w)")), ('roman_numeral', re.compile('^(X|IX|IV|V?I{0,3})$', re.IGNORECASE)), ('space_before_comma', re.compile('\\s+,')), ('spaces', re.compile('\\s+')), ('word', re.compile('(\\w|\\.)+'))}¶
All regular expressions used by the parser are precompiled and stored in the config.
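A few of these patterns can be tried directly with the standard `re` module. The patterns below are copied from the `roman_numeral`, `initial` and `mac` entries; the sample strings and variable names are illustrative only.

```python
import re

# Patterns copied from the REGEXES entries of the same names.
ROMAN_NUMERAL = re.compile(r"^(X|IX|IV|V?I{0,3})$", re.IGNORECASE)
INITIAL = re.compile(r"^(\w\.|[A-Z])?$")
MAC = re.compile(r"^(ma?c)(\w{2,})", re.IGNORECASE)

# roman_numeral covers I through X, case-insensitively.
is_third = bool(ROMAN_NUMERAL.match("III"))
is_eleventh = bool(ROMAN_NUMERAL.match("XI"))

# initial accepts one word character plus a period, or a bare capital.
is_initial = bool(INITIAL.match("J."))
is_word = bool(INITIAL.match("John"))

# mac splits a Mac/Mc prefix from the rest of the word.
mac_parts = MAC.match("macdole").groups()
```

With these inputs `is_third` and `is_initial` are true, `is_eleventh` and `is_word` are false, and `mac_parts` is `('mac', 'dole')`.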