HumanName Class Documentation¶

HumanName.parser¶

class nameparser.parser.HumanName[source]

Parse a person’s name into individual components.

Instantiation assigns to full_name, and assignment to full_name triggers parse_full_name(). After parsing, the instance attributes listed below are available. Alternatively, you can pass any of the instance attributes to the constructor as keyword arguments and skip the parsing process: if any of them is passed, parse_full_name() is not run.

HumanName Instance Attributes

title
first
middle
last
suffix
nickname
maiden
surnames
given_names

Parameters:

full_name (str) – The name string to be parsed.
constants – a Constants instance (subclasses are honored). Defaults to the shared module-level CONSTANTS. For per-instance config, pass Constants() for fresh library defaults, or CONSTANTS.copy() for a private snapshot of the current shared config. Passing None also builds a fresh Constants(), but is deprecated (warns; raises TypeError in 2.0, see issue #260) since it silently discards any customizations the caller may have expected to carry over. Anything else raises TypeError.
encoding (str) – string representing the encoding of your input (deprecated with bytes input, removal in 2.0 — decode before passing; see issue #245)
string_format (str) – python string formatting
initials_format (str) – python initials string formatting
initials_delimiter (str) – string delimiter for initials
initials_separator (str) – string separator between consecutive initials
suffix_delimiter (str) – additional delimiter to split post-comma parts before suffix detection, e.g. " - " for "RN - CRNA"
first (str) – first name
middle (str) – middle name
last (str) – last name
title (str) – The title or prenominal
suffix (str) – The suffix or postnominal
nickname (str) – Nicknames
maiden (str) – Maiden name

property C: Constants¶

A reference to the configuration for this instance, which may or may not be a reference to the shared, module-wide instance at CONSTANTS. See Customizing the Parser.

Assigning a non-Constants value (besides None, which builds a fresh private Constants() and emits a DeprecationWarning – see __init__()) raises the same TypeError as passing an invalid constants argument to the constructor (#239).

__eq__(other: object) → bool[source]¶: Deprecated since version 1.3.0: Removed in 2.0 (see issue #223); use matches().

HumanName instances are equal to other objects whose lower case unicode representation is the same. Note the differences from matches(): this compares formatted output, so it depends on string_format and cannot see maiden, and it stringifies operands of any type.

__init__(full_name: str | bytes = '', constants: Constants | None = <Constants : [ prefixes: 59 suffix_acronyms: 620 suffix_not_acronyms: 16 titles: 670 first_name_titles: 22 conjunctions: 8 bound_first_names: 6 non_first_name_prefixes: 22 suffix_acronyms_ambiguous: 2 ]>, encoding: str = 'UTF-8', string_format: str | None = None, initials_format: str | None = None, initials_delimiter: str | None = None, initials_separator: str | None = None, suffix_delimiter: str | None = None, first: str | list[str] | None = None, middle: str | list[str] | None = None, last: str | list[str] | None = None, title: str | list[str] | None = None, suffix: str | list[str] | None = None, nickname: str | list[str] | None = None, maiden: str | list[str] | None = None) → None[source]¶

are_suffixes(pieces: Iterable[str]) → bool[source]¶

Return True if all pieces are suffixes.

Vacuously True for an empty iterable — the piece loops in parse_full_name() rely on this to route the final piece to the last-name branch.
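
The vacuous-truth behaviour described above is what Python's built-in all() gives for an empty iterable. The following is a minimal sketch, not the library's implementation: SUFFIXES is a hypothetical stand-in for the configured suffix sets, and are_suffixes_sketch is an illustrative name.

```python
# Hypothetical stand-in for the configured suffix sets; not the library's data.
SUFFIXES = {"jr", "sr", "phd", "md"}

def are_suffixes_sketch(pieces):
    """Return True if every piece is in SUFFIXES (True when there are no pieces)."""
    return all(piece.rstrip(".").lower() in SUFFIXES for piece in pieces)

print(are_suffixes_sketch(["Jr.", "PhD"]))   # True
print(are_suffixes_sketch(["Jr.", "Dole"]))  # False
print(are_suffixes_sketch([]))               # True: all() of an empty iterable
```

A loop that asks "are all remaining pieces suffixes?" therefore answers yes once nothing remains, which is the property the note above says parse_full_name() relies on.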

are_suffixes_after_comma(pieces: Iterable[str]) → bool[source]¶: Return True if all pieces are suffixes by the lenient is_suffix_lenient() test. Used when detecting suffix-comma format (e.g. “John Ingram, V”) where the post-comma position is unambiguous.

as_dict(include_empty: bool = True) → dict[str, str][source]¶

Return the parsed name as a dictionary of its attributes.

Parameters: include_empty (bool) – Include keys in the dictionary for empty name attributes.
Return type: dict

>>> name = HumanName("Bob Dole")
>>> name.as_dict()
{'title': '', 'first': 'Bob', 'middle': '', 'last': 'Dole', 'suffix': '', 'nickname': '', 'maiden': ''}
>>> name.as_dict(False)
{'first': 'Bob', 'last': 'Dole'}

capitalize(force: bool | None = None) → None[source]¶

The HumanName class can try to guess the correct capitalization of names entered in all upper or lower case. By default, it will not adjust the case of names entered in mixed case. To run capitalization on all names, pass force=True.

Parameters: force (bool) – Forces capitalization of mixed case strings. This parameter overrides rules set within CONSTANTS.

Usage

>>> name = HumanName('bob v. de la macdole-eisenhower phd')
>>> name.capitalize()
>>> str(name)
'Bob V. de la MacDole-Eisenhower Ph.D.'
>>> # Don't touch good names
>>> name = HumanName('Shirley Maclaine')
>>> name.capitalize()
>>> str(name)
'Shirley Maclaine'
>>> name.capitalize(force=True)
>>> str(name)
'Shirley MacLaine'

comparison_key() → tuple[str, ...][source]¶

The seven name components (title, first, middle, last, suffix, nickname, maiden) as a lowercased tuple: a canonical, hashable identity for the parsed name. Use it for dedup, dict keys, and sorting or grouping, e.g. unique = {n.comparison_key(): n for n in names}.values().

Built from the *_list attributes, so it is unaffected by display settings like string_format and empty_attribute_default.

Empty or unparsable input yields the all-empty key, so such names all compare equal and collide in dedup; screen them out with len(name) == 0 first.

>>> HumanName("Dr. Juan Q. Xavier de la Vega III").comparison_key()
('dr.', 'juan', 'q. xavier', 'de la vega', 'iii', '', '')
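
The dedup idiom and the empty-name caveat above can be illustrated without the library. This is a sketch under stated assumptions: ParsedName is a hypothetical two-component stand-in for HumanName, and its __len__ (count of non-empty components) is an assumption made for the example.

```python
from dataclasses import dataclass

@dataclass
class ParsedName:
    """Hypothetical stand-in for HumanName with only two components."""
    first: str = ""
    last: str = ""

    def comparison_key(self):
        # Lowercased tuple of components, mirroring the key described above.
        return (self.first.lower(), self.last.lower())

    def __len__(self):
        # Assumed for this sketch: number of non-empty components.
        return sum(1 for part in (self.first, self.last) if part)

names = [ParsedName("Bob", "Dole"), ParsedName("BOB", "dole"),
         ParsedName(), ParsedName()]

# Screen out empty names first so they do not collapse into a single entry.
parsed = [n for n in names if len(n) > 0]
unique = list({n.comparison_key(): n for n in parsed}.values())
print(unique)  # [ParsedName(first='BOB', last='dole')]
```

When two names share a key, the dict comprehension keeps the later one; reverse the input or use setdefault if the first occurrence should win.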

expand_suffix_delimiter(part: str) → list[str][source]¶: Split a single post-comma part on suffix_delimiter, if configured. Used only at suffix-consumption sites, where a part has already been identified as a suffix group, so splitting it further can’t misparse an unrelated name segment. Returns [part] unchanged if no delimiter is configured.

property first: str¶: The person’s first name. The first name piece after any known title pieces parsed from full_name.

property full_name: str¶: The string output of the HumanName instance.

property given_names: str¶: A string of the first name followed by all middle names.

property given_names_list: list[str]¶: List of first name followed by middle names.

handle_capitalization() → None[source]¶: Handles capitalization configurations set within CONSTANTS.

handle_east_slavic_patronymic_name_order() → None[source]¶: When patronymic_name_order is enabled, detect Russian formal order (Surname GivenName Patronymic) and rotate to Western order. Fires only for no-comma, single-token first/middle/last where the last token is a patronymic and the middle token is not. Title, suffix, and nickname parts do not affect this guard — reordering proceeds regardless of whether they are present.
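
A rough sketch of the guard and rotation described above, using only the standard library. The Latin-script pattern is copied from the default regexes listed on this page; the function names and the exact target assignment (given name as first, patronymic as middle, surname as last) are assumptions for illustration, not the library's code.

```python
import re

# Latin-script pattern copied from the default regexes listed on this page.
EAST_SLAVIC_PATRONYMIC = re.compile(
    r"(ovich|ovna|evich|evna|ichna|ilyich|kuzmich|lukich|fomich|fokich)$",
    re.IGNORECASE,
)

def is_patronymic(token):
    return bool(EAST_SLAVIC_PATRONYMIC.search(token))

def rotate_formal_order(first, middle, last):
    """Rotate (Surname, GivenName, Patronymic) to (GivenName, Patronymic, Surname).

    Each argument is a list of tokens as a left-to-right parse would assign
    them. The lists are returned unchanged when the guard does not hold.
    """
    single_tokens = len(first) == len(middle) == len(last) == 1
    if single_tokens and is_patronymic(last[0]) and not is_patronymic(middle[0]):
        # Assumed target: given name first, patronymic as middle, surname last.
        return middle, last, first
    return first, middle, last

print(rotate_formal_order(["Ivanov"], ["Ivan"], ["Ivanovich"]))
# (['Ivan'], ['Ivanovich'], ['Ivanov'])
print(rotate_formal_order(["Ivan"], ["Ivanovich"], ["Ivanov"]))
# (['Ivan'], ['Ivanovich'], ['Ivanov'])  unchanged: last token is not a patronymic
```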

handle_firstnames() → None[source]¶: If there are only two parts and one is a title, assume the other is a last name rather than a first name, e.g. “Mr. Johnson”. The exception is a special title like “Sir”: a single name that follows it is always a first name.

handle_middle_name_as_last() → None[source]¶: When middle_name_as_last is enabled, fold middle_list into last_list (prepended, preserving order) and clear middle_list. No-op when middle_list is already empty.

handle_non_first_name_prefix() → None[source]¶: A leading prefix that is never a first name means the whole name is a surname – fold first (and any middle) into last. Keys on the parsed first name, so a non-leading particle (“Jean de Mesnil”) is untouched and title/suffix are preserved. The middle_list/last_list guard leaves a degenerate bare “de” as first=”de” rather than inventing a surname.
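
The fold and its guard can be sketched on plain component lists. This is an illustration only: the set is a small subset of the default non_first_name_prefixes listed on this page, and fold_leading_prefix is a hypothetical helper, not a library function.

```python
# Small subset of the default non_first_name_prefixes listed on this page.
NON_FIRST_NAME_PREFIXES = {"de", "ter", "vom", "zu"}

def fold_leading_prefix(first, middle, last):
    """Fold first and middle into last when first is a never-first-name particle."""
    leading = first[0].strip(".").lower() if first else ""
    # Guard: with no middle or last there is nothing to build a surname from.
    if leading in NON_FIRST_NAME_PREFIXES and (middle or last):
        return [], [], first + middle + last
    return first, middle, last

print(fold_leading_prefix(["de"], [], ["Mesnil"]))       # ([], [], ['de', 'Mesnil'])
print(fold_leading_prefix(["Jean"], [], ["de Mesnil"]))  # (['Jean'], [], ['de Mesnil'])
print(fold_leading_prefix(["de"], [], []))               # (['de'], [], [])
```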

handle_turkic_patronymic_name_order() → None[source]¶: When patronymic_name_order is enabled, detect the reversed Turkic formal order (Surname GivenName PatronymicRoot Marker) and rotate to Western order. Fires only for the strict 4-token, no-comma shape: single-token first/last and exactly two middle tokens, where the last token is a recognised Turkic patronymic marker.

property has_own_config: bool¶: True if this instance is not using the shared module-level configuration.

initials() → str[source]¶

Return formatted initials for the name, controlled by initials_format, initials_delimiter, and initials_separator.

initials_delimiter is appended after each individual initial. initials_separator is placed between consecutive initials within a name group (first, middle, or last). Both can be set as Constants attributes or as HumanName constructor kwargs.

>>> name = HumanName("Sir Bob Andrew Dole")
>>> name.initials()
'B. A. D.'
>>> name = HumanName("Sir Bob Andrew Dole", initials_format="{first} {middle}")
>>> name.initials()
'B. A.'
>>> name = HumanName("Doe, John A.", initials_delimiter="", initials_separator="")
>>> name.initials()
'J A D'

initials_list() → list[str][source]¶

Returns the initials as a list

>>> name = HumanName("Sir Bob Andrew Dole")
>>> name.initials_list()
['B', 'A', 'D']
>>> name = HumanName("J. Doe")
>>> name.initials_list()
['J', 'D']

is_an_initial(value: str) → bool[source]¶

Words with a single period at the end, or a single uppercase letter.

Matches the initial regular expression in REGEXES.

is_bound_first_name(piece: str) → bool[source]¶: Lowercased, leading/trailing-periods-stripped version of piece is in bound_first_names.

is_conjunction(piece: str | list[str]) → bool[source]¶: Is in the conjunctions set — config or derived earlier in this parse (e.g. "of the") — and not is_an_initial().

is_east_slavic_patronymic(piece: str) → bool[source]¶: Return True if piece ends with a recognised East-Slavic patronymic suffix, checked against both Latin-script and Cyrillic patterns in self.C.regexes. Latin suffixes: -ovich, -ovna, -evich, -evna, -ichna, and the irregular forms -ilyich, -kuzmich, -lukich, -fomich, -fokich. Cyrillic equivalents are matched by a separate pattern.

is_leading_title(piece: str) → bool[source]¶: True if piece is a known title, or an unrecognized multi-letter word ending in a single trailing period (e.g. "Major."). The {2,} in the period_abbreviation regex, not a separate is_an_initial() check, is what excludes single-letter initials like "J.". Only meaningful for pieces in the title position (before the first name is set) — a period-abbreviation appearing later in the name is left as a middle name. The match is not registered in C.titles or the per-parse derived titles, so matching "Major." here never makes "Major" (or "Major.") a recognized title elsewhere, even within the same parse.

is_non_first_name_prefix(piece: str) → bool[source]¶: Lowercased, leading/trailing-periods-stripped version of piece is in non_first_name_prefixes.

is_prefix(piece: str | list[str]) → bool[source]¶: Lowercased, leading/trailing-periods-stripped version of piece is in the PREFIXES set, or was derived as a prefix earlier in this parse (e.g. "von und").

is_roman_numeral(value: str) → bool[source]¶: Matches the roman_numeral regular expression in REGEXES.

is_rootname(piece: str) → bool[source]¶: Is not a known title, suffix or prefix. Just first, middle, last names.

is_suffix(piece: str | list[str]) → bool[source]¶

Is in the suffixes set — or was derived as a period-joined suffix earlier in this parse (e.g. "JD.CPA") — and not is_an_initial().

Some suffixes may be acronyms (M.B.A) while some are not (Jr.), so we remove the periods from piece when testing against C.suffix_acronyms.
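
A simplified sketch of the period handling described above. The two sets are small subsets of the defaults listed on this page, looks_like_suffix is a hypothetical name, and the sketch omits both the is_an_initial() veto and the per-parse derived suffixes.

```python
# Small subsets of the default sets listed on this page.
SUFFIX_ACRONYMS = {"mba", "phd", "md"}
SUFFIX_NOT_ACRONYMS = {"jr", "sr", "ii", "iii"}

def looks_like_suffix(piece):
    lowered = piece.lower()
    # Acronyms may carry interior periods ("M.B.A"), so drop every period.
    if lowered.replace(".", "") in SUFFIX_ACRONYMS:
        return True
    # Assumption for this sketch: non-acronyms only lose edge periods ("Jr.").
    return lowered.strip(".") in SUFFIX_NOT_ACRONYMS

print(looks_like_suffix("M.B.A"))  # True
print(looks_like_suffix("Jr."))    # True
print(looks_like_suffix("Dole"))   # False
```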

is_suffix_lenient(piece: str) → bool[source]¶

Like is_suffix(), but suffix_not_acronyms members are accepted unconditionally, bypassing is_suffix()’s is_an_initial() veto.

This covers all suffix_not_acronyms members (i, ii, iii, iv, v, jr, sr, etc.), case-insensitively, including single-letter entries that is_suffix() would otherwise reject. Only safe for pieces in unambiguous positions, e.g. after a comma (“John Ingram, V”).

is_title(value: str) → bool[source]¶: Is in the TITLES set or was derived as a title earlier in this parse (e.g. "Lt.Gov.", "Mr. and Mrs.").

is_turkic_patronymic_marker(piece: str) → bool[source]¶: Return True if piece is exactly a recognised Turkic patronymic marker word (e.g. oglu, qizi, uly), checked against both Latin-script and Cyrillic patterns in self.C.regexes. Unlike East-Slavic patronymics, these are standalone marker words, not suffixes, so the match is whole-word rather than a suffix search.
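
The difference between a suffix search and a whole-word match can be shown with the standard re module. The East-Slavic pattern is copied from the defaults listed on this page; TURKIC_MARKER is a hypothetical pattern built only from the three example markers named above, so the library's real pattern will differ.

```python
import re

# Suffix pattern copied from the default regexes listed on this page.
EAST_SLAVIC = re.compile(
    r"(ovich|ovna|evich|evna|ichna|ilyich|kuzmich|lukich|fomich|fokich)$",
    re.IGNORECASE,
)

# Hypothetical: covers only the example markers mentioned in the text.
TURKIC_MARKER = re.compile(r"(oglu|qizi|uly)", re.IGNORECASE)

def ends_with_patronymic_suffix(piece):
    return bool(EAST_SLAVIC.search(piece))       # trailing suffix is enough

def is_marker_word(piece):
    return bool(TURKIC_MARKER.fullmatch(piece))  # the whole word must match

print(ends_with_patronymic_suffix("Petrovna"))  # True
print(is_marker_word("oglu"))                   # True
print(is_marker_word("Hasanoglu"))              # False: marker is only a suffix here
```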

join_on_conjunctions(pieces: list[str], additional_parts_count: int = 0) → list[str][source]¶

Join conjunctions to surrounding pieces. Title- and prefix-aware. e.g.:

['Mr.', 'and', 'Mrs.', 'John', 'Doe'] ==>
['Mr. and Mrs.', 'John', 'Doe']

['The', 'Secretary', 'of', 'State', 'Hillary', 'Clinton'] ==>
['The Secretary of State', 'Hillary', 'Clinton']

When joining titles, registers the newly formed piece as a derived title for the current parse so it will be recognized correctly later in the same parse. E.g. while parsing the example names above, ‘The Secretary of State’ and ‘Mr. and Mrs.’ are treated as titles. The configuration in self.C is never modified.

Parameters:

pieces (list) – name pieces strings after split on spaces
additional_parts_count (int)

Returns:

A new list in which the pieces next to each conjunction are merged into a single piece containing spaces.

Return type:

list
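
A deliberately naive sketch of the joining step, enough to reproduce the two examples above. It is not the library's algorithm: it is not title- or prefix-aware, ignores additional_parts_count, registers no derived titles, and does not merge runs of consecutive conjunctions. CONJUNCTIONS is a subset of the defaults listed on this page.

```python
# Subset of the default conjunctions listed on this page.
CONJUNCTIONS = {"and", "of", "the"}

def join_on_conjunctions_sketch(pieces):
    out = []
    i = 0
    while i < len(pieces):
        piece = pieces[i]
        if piece.lower() in CONJUNCTIONS and i + 1 < len(pieces):
            following = pieces[i + 1]
            if out:
                # Merge previous piece, conjunction, and following piece.
                out[-1] = f"{out[-1]} {piece} {following}"
            else:
                # Leading conjunction: merge with the following piece only.
                out.append(f"{piece} {following}")
            i += 2
        else:
            out.append(piece)
            i += 1
    return out

print(join_on_conjunctions_sketch(["Mr.", "and", "Mrs.", "John", "Doe"]))
# ['Mr. and Mrs.', 'John', 'Doe']
print(join_on_conjunctions_sketch(["The", "Secretary", "of", "State", "Hillary", "Clinton"]))
# ['The Secretary of State', 'Hillary', 'Clinton']
```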

property last: str¶: The person’s last name. The last name piece parsed from full_name.

property last_base: str¶

The last name with leading prefix particles removed (the core surname). For "van Gogh" this is "Gogh"; for "Smith" it is "Smith". last is always unchanged. When every word in the last name matches a prefix particle, no stripping occurs and the full last name is returned.

>>> HumanName("Vincent van Gogh").last_base
'Gogh'
>>> HumanName("John Smith").last_base
'Smith'

property last_base_list: list[str]¶

List of last-name words after stripping leading prefix particles. Never empty: when every word matches a prefix, no stripping occurs and the full last name is returned — see _split_last().

>>> HumanName("Vincent van Gogh").last_base_list
['Gogh']

property last_prefixes: str¶

The leading prefix particle(s) of the last name (the tussenvoegsel). Returns "" (or empty_attribute_default) when there are none, including when every word in the last name matches a prefix particle (the all-particles guard; see _split_last()).

>>> HumanName("Vincent van Gogh").last_prefixes
'van'
>>> HumanName("Juan de la Vega").last_prefixes
'de la'

property last_prefixes_list: list[str]¶

List of leading prefix particles in the last name (the tussenvoegsel). Returns [] when there are none, including the case where every word in the last name matches a prefix — see _split_last().

>>> HumanName("Juan de la Vega").last_prefixes_list
['de', 'la']
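
The split behind last_prefixes and last_base, including the all-particles guard, can be sketched as follows. This is an assumption-laden illustration rather than the body of _split_last(): PREFIXES is a small subset of the default prefixes listed on this page, and split_last is a hypothetical helper.

```python
# Small subset of the default prefixes listed on this page.
PREFIXES = {"van", "de", "la", "von", "der"}

def split_last(words):
    """Return (prefix_particles, base_words) for a list of last-name words."""
    i = 0
    while i < len(words) and words[i].lower() in PREFIXES:
        i += 1
    if i == len(words):
        # All-particles guard: strip nothing, keep the full last name as base.
        return [], list(words)
    return list(words[:i]), list(words[i:])

print(split_last(["van", "Gogh"]))       # (['van'], ['Gogh'])
print(split_last(["de", "la", "Vega"]))  # (['de', 'la'], ['Vega'])
print(split_last(["Smith"]))             # ([], ['Smith'])
print(split_last(["de", "la"]))          # ([], ['de', 'la'])
```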

property maiden: str¶: The person’s maiden (alternate/prior) last name. Empty unless a delimiter has been routed to it via maiden_delimiters – see the “Routing to Maiden Name” section of the customization docs.

matches(other: str | HumanName) → bool[source]¶

Compare parsed components case-insensitively; the semantic replacement for the deprecated ==. A str argument is parsed first, using this instance’s configuration, so any written form of the same name matches; a HumanName argument is compared as already parsed — its own configuration determined its components. Two empty or unparsable names match each other; check len(name) == 0 to screen them.

>>> name = HumanName("Dr. Juan Q. Xavier de la Vega III")
>>> name.matches("de la vega, dr. juan Q. xavier III")
True
>>> name.matches("Juan de la Vega")
False

Unlike the deprecated ==, all seven components participate (including maiden, which the default string_format omits) and display settings have no effect. Raises TypeError for anything that is not a str or HumanName; guard optional values explicitly, e.g. x is not None and name.matches(x).

Parses string arguments on every call. When matching one name against many candidates, parse the candidates once or compare comparison_key() values instead.

property middle: str¶: The person’s middle names. All name pieces after the first name and before the last name parsed from full_name.

property nickname: str¶: The person’s nicknames. Any text found inside quotes ("") or parentheses (()).

original: str | bytes = ''¶: The original string, untouched by the parser.

parse_full_name() → None[source]¶

The main parse method for the parser. This method is run upon assignment to the full_name attribute or instantiation.

Basic flow is to hand off to pre_process() to handle nicknames. It then splits on commas and chooses a code path depending on the number of commas.

parse_pieces() then splits those parts on spaces and join_on_conjunctions() joins any pieces next to conjunctions.

parse_nicknames() → None[source]¶

Delimited content in the name is routed to either the nickname or maiden bucket, depending on whether the matching delimiter belongs to nickname_delimiters or maiden_delimiters. The exception is suffix-shaped content (an unambiguous suffix_not_acronyms/suffix_acronyms member, or content ending in a period), which is left in place, undelimited, for normal downstream suffix/title/word parsing. This happens before any other processing of the name.

Single quotes cannot span white space characters and must border white space to allow for quotes in names like O’Connor and Kawai’ae’a. Double quotes and parentheses can span white space.

By default, nickname_delimiters holds the three built-in delimiters (quoted_word, double_quotes and parenthesis, resolved live from regexes so overriding e.g. CONSTANTS.regexes.parenthesis keeps affecting nickname parsing) and maiden_delimiters is empty. Move a key between the two dicts, e.g. maiden_delimiters['parenthesis'] = nickname_delimiters.pop('parenthesis'), to route it to maiden instead, or add a new compiled pattern under any key to recognize an additional delimiter – see the “Adding Custom Nickname Delimiters” and “Routing to Maiden Name” sections of the customization docs.
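
The routing idea can be sketched with two plain dicts of compiled patterns. Only the double_quotes pattern is taken from the defaults listed on this page; the quoted_word and parenthesis patterns are hypothetical ones that satisfy the rules above, and route() is an illustrative helper that collects matches without removing them from the name or applying the suffix-shape exception.

```python
import re

nickname_delimiters = {
    # Hypothetical: quotes must not touch word characters on the outside.
    "quoted_word": re.compile(r"(?<!\w)'([^\s]*?)'(?!\w)"),
    # Copied from the default regexes listed on this page.
    "double_quotes": re.compile(r'\"(.*?)\"'),
    # Hypothetical: content may span white space.
    "parenthesis": re.compile(r"\((.*?)\)"),
}
maiden_delimiters = {}

def route(text):
    """Collect delimited content into (nicknames, maiden_names)."""
    nick = [m for p in nickname_delimiters.values() for m in p.findall(text)]
    maiden = [m for p in maiden_delimiters.values() for m in p.findall(text)]
    return nick, maiden

text = "Jane 'JJ' (Doe) O'Connor"
print(route(text))  # (['JJ', 'Doe'], [])

# Move the parenthesis delimiter to the maiden bucket.
maiden_delimiters["parenthesis"] = nickname_delimiters.pop("parenthesis")
print(route(text))  # (['JJ'], ['Doe'])
```

The apostrophe in O'Connor is never treated as a delimiter because it sits between word characters.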

parse_pieces(parts: Iterable[str], additional_parts_count: int = 0) → list[str][source]¶

Split parts on spaces and remove commas, join on conjunctions and lastname prefixes. Tokens that are empty after stripping spaces and commas are dropped, so the returned pieces never contain empty strings. If parts have periods in the middle, try splitting on periods and check if the parts are titles or suffixes. If they are, register the periods-joined part as a derived title/suffix for this parse so it will be recognized; the constants are not modified.

Parameters:

parts (list) – name part strings from the comma split
additional_parts_count (int) – if the comma format contains other parts, we need to know how many there are to decide if things should be considered a conjunction.

Returns:

pieces split on spaces and joined on conjunctions

Return type:

list

post_process() → None[source]¶: Runs at the end of parse_full_name(), after all other processing has taken place. Runs handle_firstnames() and handle_capitalization().

pre_process() → None[source]¶: Runs at the beginning of parse_full_name(), before any processing of the string other than unicode normalization, so it’s a good place to do any custom handling in a subclass. Runs squash_bidi(), parse_nicknames() and squash_emoji().

squash_bidi() → None[source]¶: Remove invisible bidirectional control characters from the input string. They carry no name content but stick to the parts they surround, so parsed attributes stop comparing equal to the clean name.

squash_emoji() → None[source]¶: Remove emoji from the input string.

property suffix: str¶: The person’s suffixes. Pieces at the end of the name that are found in suffixes, or pieces that are at the end of comma separated formats, e.g. “Lastname, Title Firstname Middle[,] Suffix [, Suffix]” parsed from full_name.

property surnames: str¶: A string of all middle names followed by the last name.

property surnames_list: list[str]¶: List of middle names followed by last name.

property title: str¶: The person’s titles. Any string of consecutive pieces in titles or conjunctions at the beginning of full_name.

HumanName.config¶

The nameparser.config module manages the configuration of the nameparser.

Constants is for application-level configuration, set once at startup. CONSTANTS, the module-level instance used by every HumanName created without its own config, is the only channel that reaches parses happening in code you don’t own (helpers, pipelines, a third-party library using nameparser internally) – the same role logging and locale play elsewhere. Import it and change it directly:

>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.titles.remove('hon').add('chemistry','dean')

For anything scoped – one dataset, one library, one test – pass your own Constants instance as the second argument upon instantiation instead: Constants() for fresh library defaults, or CONSTANTS.copy() for a private snapshot of the current module config.

>>> from nameparser import HumanName
>>> from nameparser.config import Constants
>>> hn = HumanName("Dean Robert Johns", Constants())
>>> hn.C.titles.add('dean')
>>> hn.parse_full_name() # need to run this again after config changes

Mixing the two up is where the surprises come from, not the API itself: if you do not pass your own Constants instance as the second argument, hn.C will be a reference to the module config, and a change there reaches every other instance sharing it. See Customizing the Parser.

Deprecated since version 1.4.0: Passing None as the second argument also builds a fresh Constants(), but is deprecated in favor of the explicit spellings above; it will raise TypeError in 2.0 (issue #260).

nameparser.config.CONSTANTS = <Constants : [ prefixes: 59 suffix_acronyms: 620 suffix_not_acronyms: 16 titles: 670 first_name_titles: 22 conjunctions: 8 bound_first_names: 6 non_first_name_prefixes: 22 suffix_acronyms_ambiguous: 2 ]>¶: A module-level instance of the Constants() class. Provides a common instance for the module to share to easily adjust configuration for the entire module. See Customizing the Parser with Your Own Configuration.

class nameparser.config.Constants(prefixes: Iterable[str] = {"'t", 'aan', 'abu', 'aen', 'af', 'al', 'auf', 'av', 'bar', 'bat', 'bin', 'bint', 'bon', 'da', 'dal', 'de', "de'", 'degli', 'dei', 'del', 'dela', 'della', 'delle', 'delli', 'dello', 'dem', 'den', 'der', 'di', 'do', 'dos', 'du', 'dí', 'freiherr', 'freiherrin', 'heer', 'het', 'ibn', 'la', 'le', 'mac', 'mc', 'op', 'san', 'santa', 'st', 'ste', 'te', 'ter', 'tho', 'thoe', 'van', 'vande', 'vander', 'vd', 'vel', 'vom', 'von', 'zu'}, suffix_acronyms: Iterable[str] = {'8-vsb', 'aas', 'aba', 'abc', 'abd', 'abpp', 'abr', 'aca', 'acas', 'ace', 'acha', 'acp', 'ae', 'aem', 'afasma', 'afc', 'afm', 'agsf', 'aia', 'aicp', 'ala', 'alc', 'alp', 'am', 'amd', 'ame', 'amieee', 'ams', 'aphr', 'apn', 'apr', 'aprn', 'apss', 'aqp', 'arm', 'arrc', 'asa', 'asc', 'asid', 'asla', 'asp', 'atc', 'awb', 'ba', 'bca', 'bcl', 'bcss', 'bds', 'bem', 'bls-i', 'bn', 'bpe', 'bpi', 'bpt', 'bsc', 'bt', 'btcs', 'bts', 'cacts', 'cae', 'caha', 'caia', 'cams', 'cap', 'capa', 'capm', 'capp', 'caps', 'caro', 'cas', 'casp', 'cb', 'cbe', 'cbm', 'cbne', 'cbnt', 'cbp', 'cbrte', 'cbs', 'cbsp', 'cbt', 'cbte', 'cbv', 'cca', 'ccc', 'ccca', 'cccm', 'cce', 'cchp', 'ccie', 'ccim', 'cciso', 'ccm', 'ccmt', 'ccna', 'ccnp', 'ccp', 'ccp-c', 'ccpr', 'ccs', 'ccufc', 'cd', 'cdal', 'cdfm', 'cdmp', 'cds', 'cdt', 'cea', 'ceas', 'cebs', 'ceds', 'ceh', 'cela', 'cem', 'cep', 'cera', 'cet', 'cfa', 'cfc', 'cfcc', 'cfce', 'cfcm', 'cfe', 'cfeds', 'cfi', 'cfm', 'cfp', 'cfps', 'cfr', 'cfre', 'cga', 'cgap', 'cgb', 'cgc', 'cgfm', 'cgfo', 'cgm', 'cgma', 'cgp', 'cgr', 'cgsp', 'ch', 'cha', 'chba', 'chdm', 'che', 'ches', 'chfc', 'chi', 'chmc', 'chmm', 'chp', 'chpa', 'chpe', 'chpln', 'chpse', 'chrm', 'chsc', 'chse', 'chse-a', 'chsos', 'chss', 'cht', 'cia', 'cic', 'cie', 'cig', 'cip', 'cipm', 'cips', 'ciro', 'cisa', 'cism', 'cissp', 'cla', 'clsd', 'cltd', 'clu', 'cm', 'cma', 'cmas', 'cmc', 'cmfo', 'cmg', 'cmp', 'cms', 'cmsp', 'cmt', 'cna', 'cnm', 'cnp', 'cp', 'cp-c', 'cpa', 'cpacc', 'cpbe', 
'cpcm', 'cpcu', 'cpe', 'cpfa', 'cpfo', 'cpg', 'cph', 'cpht', 'cpim', 'cpl', 'cplp', 'cpm', 'cpo', 'cpp', 'cppm', 'cprc', 'cpre', 'cprp', 'cpsc', 'cpsi', 'cpss', 'cpt', 'cpwa', 'crde', 'crisc', 'crma', 'crme', 'crna', 'cro', 'crp', 'crt', 'crtt', 'csa', 'csbe', 'csc', 'cscp', 'cscu', 'csep', 'csi', 'csm', 'csp', 'cspo', 'csre', 'csrte', 'csslp', 'cssm', 'cst', 'cste', 'ctbs', 'ctfa', 'cto', 'ctp', 'cts', 'cua', 'cusp', 'cva', 'cva[22]', 'cvo', 'cvp', 'cvrs', 'cwap', 'cwb', 'cwdp', 'cwep', 'cwna', 'cwne', 'cwp', 'cwsp', 'cxa', 'cyds', 'cysa', 'dabfm', 'dabvlm', 'dacvim', 'dbe', 'dc', 'dcb', 'dcm', 'dcmg', 'dcvo', 'dd', 'dds', 'ded', 'dep', 'dfc', 'dfm', 'diplac', 'diplom', 'djur', 'dma', 'dmd', 'dmin', 'dnp', 'do', 'dpm', 'dpt', 'drb', 'drmp', 'drph', 'dsc', 'dsm', 'dso', 'dss', 'dtr', 'dvep', 'dvm', 'ea', 'ed', 'edd', 'ei', 'eit', 'els', 'emd', 'emt-b', 'emt-i/85', 'emt-i/99', 'emt-p', 'enp', 'erd', 'esq', 'evp', 'faafp', 'faan', 'faap', 'fac-c', 'facc', 'facd', 'facem', 'facep', 'facha', 'facofp', 'facog', 'facp', 'facph', 'facs', 'faia', 'faicp', 'fala', 'fashp', 'fasid', 'fasla', 'fasma', 'faspen', 'fca', 'fcas', 'fcela', 'fd', 'fec', 'fhames', 'fic', 'ficf', 'fieee', 'fmp', 'fmva', 'fnss', 'fp&a', 'fp-c', 'fpc', 'frm', 'fsa', 'fsdp', 'fws', 'gaee[14]', 'gba', 'gbe', 'gc', 'gcb', 'gchs', 'gcie', 'gcmg', 'gcsi', 'gcvo', 'gisp', 'git', 'gm', 'gmb', 'gmr', 'gphr', 'gri', 'grp', 'gsmieee', 'hccp', 'hrs', 'iaccp', 'iaee', 'iccm-d', 'iccm-f', 'idsm', 'ifgict', 'iom', 'ipep', 'ipm', 'iso', 'issp-csp', 'issp-sa', 'itil', 'jd', 'jp', 'kbe', 'kcb', 'kchs/dchs', 'kcie', 'kcmg', 'kcsi', 'kcvo', 'kg', 'khs/dhs', 'kp', 'kt', 'lac', 'lcmt', 'lcpc', 'lcsw', 'leed ap', 'lg', 'litk', 'litl', 'litp', 'llm', 'lm', 'lmsw', 'lmt', 'lp', 'lpa', 'lpc', 'lpn', 'lpss', 'lsi', 'lsit', 'lt', 'lvn', 'lvo', 'lvt', 'ma', 'maaa', 'mai', 'mba', 'mbe', 'mbs', 'mc', 'mcct', 'mcdba', 'mches', 'mcm', 'mcp', 'mcpd', 'mcsa', 'mcsd', 'mcse', 'mct', 'md', 'mda', 'mdb', 'mdbb', 'mdep', 'mdhb', 'mdiv', 
'mdl', 'mem', 'meng', 'mfa', 'micp', 'mieee', 'mirm', 'mle', 'mls', 'mlse', 'mlt', 'mm', 'mmad', 'mmas', 'mnaa', 'mnae', 'mp', 'mpa', 'mph', 'mpse', 'mra', 'ms', 'msa', 'msc', 'mscmsm', 'msm', 'mt', 'mts', 'mvo', 'nbc-his', 'nbcch', 'nbcch-ps', 'nbcdch', 'nbcdch-ps', 'nbcfch', 'nbcfch-ps', 'nbct', 'ncarb', 'nccp', 'ncidq', 'ncps', 'ncso', 'ncto', 'nd', 'ndtr', 'nicet i', 'nicet ii', 'nicet iii', 'nicet iv', 'nmd', 'np', 'np[18]', 'nraemt', 'nremr', 'nremt', 'nrp', 'obe', 'obi', 'oca', 'ocm', 'ocp', 'od', 'om', 'oscp', 'ot', 'pa-c', 'pcc', 'pci', 'pe', 'pfmp', 'pg', 'pgmp', 'ph', 'pharmd', 'phc', 'phd', 'phr', 'phrca', 'pla', 'pls', 'pmc', 'pmi-acp', 'pmp', 'pp', 'pps', 'prm', 'psm', 'psm i', 'psm ii', 'psp', 'psyd', 'pt', 'pta', 'qam', 'qc', 'qcsw', 'qfsm', 'qgm', 'qpm', 'qsd', 'qsp', 'ra', 'rai', 'rba', 'rci', 'rcp', 'rd', 'rdcs', 'rdh', 'rdms', 'rdn', 'res', 'rfp', 'rhca', 'rid', 'rls', 'rmsks', 'rn', 'rp', 'rpa', 'rph', 'rpl', 'rrc', 'rrt', 'rrt-accs', 'rrt-nps', 'rrt-sds', 'rtrp', 'rvm', 'rvt', 'sa', 'same', 'sasm', 'sccp', 'scmp', 'se', 'secb', 'sfp', 'sgm', 'shrm-cp', 'shrm-scp', 'si', 'siie', 'smieee', 'sphr', 'sra', 'sscp', 'stb', 'stmieee', 'tbr-ct', 'td', 'thd', 'thm', 'ud', 'usa', 'usaf', 'usar', 'uscg', 'usmc', 'usn', 'usnr', 'uxc', 'uxmc', 'vc', 'vcp', 'vd', 'vrd'}, suffix_not_acronyms: Iterable[str] = {'2', 'dr', 'esq', 'esquire', 'i', 'ii', 'iii', 'iv', 'jnr', 'jr', 'junior', 'ret', 'snr', 'sr', 'v', 'vet'}, suffix_acronyms_ambiguous: Iterable[str] = {'ed', 'jd'}, titles: Iterable[str] = {'10th', '1lt', '1sgt', '1st', '1stlt', '1stsgt', '2lt', '2nd', '2ndlt', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', 'a1c', 'ab', 'abbess', 'abbot', 'abolitionist', 'academic', 'acolyte', 'activist', 'actor ', 'actress', 'adept', 'adjutant', 'adm', 'admiral', 'advertising', 'adviser', 'advocate', 'air', 'akhoond', 'alderman', 'almoner', 'ambassador', 'amn', 'analytics', 'anarchist', 'animator', 'anthropologist', 'appellate', 'apprentice', 'arbitrator', 
'archbishop', 'archdeacon', 'archdruid', 'archduchess', 'archduke', 'archeologist', 'architect', 'arhat', 'army', 'arranger', 'assistant', 'assoc', 'associate', 'asst', 'astronomer', 'attache', 'attaché', 'attorney', 'aunt', 'auntie', 'author', 'award-winning', 'ayatollah', 'baba', 'bailiff', 'ballet', 'bandleader', 'banker', 'banner', 'bard', 'baron', 'baroness', 'barrister', 'baseball', 'bearer', 'behavioral', 'bench', 'bg', 'bgen', 'biblical', 'bibliographer', 'biochemist', 'biographer', 'biologist', 'bishop', 'blessed', 'blogger', 'blues', 'bodhisattva', 'bookseller', 'botanist', 'bp', 'brigadier', 'briggen', 'british', 'broadcaster', 'brother', 'buddha', 'burgess', 'burlesque', 'business', 'businessman', 'businesswoman', 'bwana', 'canon', 'capt', 'captain', 'cardinal', 'cartographer', 'cartoonist', 'catholicos', 'ccmsgt', 'cdr', 'celebrity', 'ceo', 'cfo', 'chair', 'chairs', 'chancellor', 'chaplain', "chargé d'affaires", 'chef', 'cheikh', 'chemist', 'chief', 'chieftain', 'choreographer', 'civil', 'classical', 'clergyman', 'clerk', 'cmsaf', 'cmsgt', 'co-chair', 'co-chairs', 'co-founder', 'coach', 'col', 'collector', 'colonel', 'comedian', 'comedienne', 'comic', 'commander', 'commander-in-chief', 'commodore', 'composer', 'compositeur', 'comptroller', 'computer', 'comtesse', 'conductor', 'consultant', 'controller', 'corporal', 'corporate', 'correspondent', 'councillor', 'counselor', 'count', 'countess', 'courtier', 'cpl', 'cpo', 'cpt', 'credit', 'criminal', 'criminologist', 'critic', 'csm', 'curator', 'customs', 'cwo-2', 'cwo-3', 'cwo-4', 'cwo-5', 'cwo2', 'cwo3', 'cwo4', 'cwo5', 'cyclist', 'dame', 'dancer', 'dcn', 'deacon', 'delegate', 'deputy', 'designated', 'designer', 'detective', 'developer', 'dhr', 'dipl.-ing', 'diplomat', 'dir', 'director', 'discovery', 'dissident', 'district', 'division', 'do', 'docent', 'docket', 'doctor', 'doyen', 'dpty', 'dr', 'dra', 'dramatist', 'druid', 'drummer', 'duchesse', 'dutchess', 'ecologist', 'economist', 'editor', 'edler', 
'edmi', 'edohen', 'educator', 'effendi', 'ekegbian', 'elerunwon', 'eminence', 'emperor', 'empress', 'engineer', 'english', 'ens', 'entertainer', 'entrepreneur', 'envoy', 'erzbischof', 'essayist', 'evangelist', 'excellency', 'excellent', 'exec', 'executive', 'expert', 'fadm', 'family', 'father', 'federal', 'fh-prof', 'field', 'film', 'financial', 'first', 'flag', 'flying', 'foreign', 'forester', 'founder', 'fr', 'frau', 'freifrau', 'freiherr', 'friar', 'frk', 'fru', 'fräulein', 'frøken', 'fürst', 'fürsterzbischof', 'gaf', 'gen', 'general', 'generalissimo', 'gentiluomo', 'giani', 'goodman', 'goodwife', 'governor', 'graf', 'grand', 'group', 'großfürst', 'gräfin', 'guitarist', 'guru', 'gyani', 'gysgt', 'hajji', 'headman', 'heir', 'heiress', 'her', 'hereditary', 'heren', 'herr', 'herren', 'herrn', 'herzog', 'high', 'highness', 'his', 'historian', 'historicus', 'historien', 'holiness', 'hon', 'honorable', 'honourable', 'host', 'hr', 'illustrator', 'imam', 'industrialist', 'information', 'instructor', 'intelligence', 'intendant', 'inventor', 'investigator', 'investor', 'journalist', 'journeyman', 'jr', 'judge', 'judicial', 'junior', 'jurist', 'keyboardist', 'king', "king's", 'kingdom', 'knowledge', 'lady', 'lama', 'lamido', 'law', 'lawyer', 'lcdr', 'lcpl', 'leader', 'lecturer', 'legal', 'librarian', 'lieutenant', 'linguist', 'literary', 'lord', 'lt', 'ltc', 'ltcol', 'ltg', 'ltgen', 'ltjg', 'lyricist', 'madam', 'madame', 'mademoiselle', 'mag', 'mag-judge', 'mag/judge', 'magistrate', 'magistrate-judge', 'magnate', 'maharajah', 'maharani', 'mahdi', 'maid', 'maj', 'majesty', 'majgen', 'manager', 'marcher', 'marchess', 'marchioness', 'marketing', 'marquess', 'marquis', 'marquise', 'master', 'mathematician', 'mathematics', 'matriarch', 'mayor', 'mcpo', 'mcpoc', 'mcpon', 'md', 'me', 'member', 'memoirist', 'merchant', 'met', 'metropolitan', 'mevr', 'mevrouw', 'mevrouwe', 'mg', 'mgr', 'mgysgt', 'military', 'minister', 'miss', 'misses', 'missionary', 'mister', 'mlle', 'mme', 
'mobster', 'model', 'monk', 'monseigneur', 'monsieur', 'monsignor', 'most', 'mother', 'mountaineer', 'mpco-cg', 'mr', 'mrs', 'ms', 'msg', 'msgt', 'mufti', 'mullah', 'municipal', 'murshid', 'musician', 'musicologist', 'mx', 'mystery', 'nanny', 'narrator', 'national', 'naturalist', 'navy', 'neuroscientist', 'novelist', 'nurse', 'obstetritian', 'officer', 'opera', 'operating', 'ornithologist', 'painter', 'paleontologist', 'pastor', 'patriarch', 'pd', 'pediatrician', 'personality', 'petty', 'pfc', 'pharaoh', 'phd', 'philantropist', 'philosopher', 'photographer', 'physician', 'physicist', 'pianist', 'pilot', 'pioneer', 'pir', 'player', 'playwright', 'po1', 'po2', 'po3', 'poet', 'police', 'political', 'politician', 'pope', 'prefect', 'prelate', 'premier', 'pres', 'presbyter', 'president', 'presiding', 'priest', 'priestess', 'primate', 'prime', 'prin', 'prince', 'princess', 'principal', 'printer', 'printmaker', 'prinz', 'prior', 'priv.-doz', 'private', 'pro', 'producer', 'prof', 'professor', 'provost', 'pslc', 'psychiatrist', 'psychologist', 'publisher', 'pursuivant', 'pv2', 'pvt', 'queen', "queen's", 'ra', 'rabbi', 'radio', 'radm', 'rangatira', 'ranger', 'rdml', 'rear', 'rebbe', 'registrar', 'reichsgraf', 'rep', 'representative', 'researcher', 'resident', 'rev', 'revenue', 'reverend', 'right', 'risk', 'ritter', 'rock', 'royal', 'rt', 'sa', 'sailor', 'saint', 'sainte', 'saoshyant', 'satirist', 'scholar', 'schoolmaster', 'scientist', 'scpo', 'screenwriter', 'se', 'secretary', 'security', 'seigneur', 'senator', 'senhor', 'senhora', 'senhorita', 'senior', 'senior-judge', 'sergeant', 'servant', 'señor', 'señora', 'señores', 'señorita', 'señoritas', 'sfc', 'sgm', 'sgt', 'sgtmaj', 'sgtmajmc', 'shaik', 'shaikh', 'shayk', 'shaykh', 'shehu', 'sheik', 'sheikh', 'shekh', 'sheriff', 'siddha', 'signor', 'signora', 'signore', 'signorina', 'singer', 'singer-songwriter', 'sir', 'sister', 'sma', 'smsgt', 'sn', 'soccer', 'social', 'sociologist', 'software', 'soldier', 'solicitor', 
'soprano', 'spc', 'speaker', 'special', 'sr', 'sra', 'sres', 'srta', 'srtas', 'ssg', 'ssgt', 'st', 'staff', 'state', 'states', 'strategy', 'subaltern', 'subedar', 'suffragist', 'sultan', 'sultana', 'superior', 'supreme', 'surgeon', 'swami', 'swordbearer', 'sysselmann', 'tax', 'teacher', 'technical', 'technologist', 'television ', 'tenor', 'theater', 'theatre', 'theologian', 'theorist', 'timi', 'tirthankar', 'translator', 'travel', 'treasurer', 'tsar', 'tsarina', 'tsgt', 'uk', 'uncle', 'united', 'univ.prof', 'us', 'vadm', 'vardapet', 'vc', 'venerable', 'verderer', 'vicar', 'vice', 'viscount', 'vizier', 'vocalist', 'voice', 'vrouwe', 'warden', 'warrant', 'wing', 'wm', 'wo-1', 'wo1', 'wo2', 'wo3', 'wo4', 'wo5', 'woodman', 'wp', 'writer', 'zoologist'}, first_name_titles: Iterable[str] = {'aunt', 'auntie', 'brother', 'cheikh', 'dame', 'father', 'king', 'maid', 'master', 'mother', 'pope', 'queen', 'shaik', 'shaikh', 'shayk', 'shaykh', 'sheik', 'sheikh', 'shekh', 'sir', 'sister', 'uncle'}, conjunctions: Iterable[str] = {'&', 'and', 'e', 'et', 'of', 'the', 'und', 'y'}, bound_first_names: Iterable[str] = {'abdal', 'abdel', 'abdul', 'abou', 'abu', 'umm'}, non_first_name_prefixes: Iterable[str] = {"'t", 'af', 'auf', 'av', 'bint', 'de', "de'", 'degli', 'dei', 'delle', 'delli', 'dello', 'dem', 'der', 'dos', 'het', 'ibn', 'op', 'ter', 'vd', 'vom', 'zu'}, capitalization_exceptions: Mapping[str, str] | Iterable[tuple[str, str]] = {'ii': 'II', 'iii': 'III', 'iv': 'IV', 'md': 'M.D.', 'phd': 'Ph.D.'}, regexes: Mapping[str, Pattern[str]] | Iterable[tuple[str, Pattern[str]]] = {'bidi': re.compile('[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]+'), 'commas': re.compile('[,،，]'), 'double_quotes': re.compile('\\"(.*?)\\"'), 'east_slavic_patronymic': re.compile('(ovich|ovna|evich|evna|ichna|ilyich|kuzmich|lukich|fomich|fokich)$', re.IGNORECASE), 'east_slavic_patronymic_cyrillic': re.compile('(ович|овна|евич|евна|ична|ильич|кузьмич|лукич|фомич|фокич)$', re.IGNORECASE), 'emoji': 
re.compile('[🌀-🙏🚀-\U0001f6ff☀-⛿✀-➿]+'), 'initial': re.compile('^(\\w\\.|[A-Z])?$'), 'mac': re.compile('^(ma?c)(\\w{2,})', re.IGNORECASE), 'no_vowels': re.compile('^[^aeyiuo]+$', re.IGNORECASE), 'parenthesis': re.compile('\\((.*?)\\)'), 'period_abbreviation': re.compile('^[^\\W\\d_]{2,}\\.$'), 'period_not_at_end': re.compile('.*\\..+$', re.IGNORECASE), 'phd': re.compile('\\s(ph\\.?\\s+d\\.?)', re.IGNORECASE), 'quoted_word': re.compile("(?<!\\w)\\'([^\\s]*?)\\'(?!\\w)"), 'roman_numeral': re.compile('^(X|IX|IV|V?I{0,3})$', re.IGNORECASE), 'space_before_comma': re.compile('\\s+,'), 'spaces': re.compile('\\s+'), 'turkic_patronymic_marker': re.compile("^(oglu|oğlu|ogly|ogli|o['’ʻ]g['’ʻ]li|qizi|qızı|kizi|kyzy|gyzy|uly|uulu)$", re.IGNORECASE), 'turkic_patronymic_marker_cyrillic': re.compile('^(оглу|оглы|оғлу|ўғли|угли|кызы|гызы|қызы|қизи|улы|ұлы|уулу)$', re.IGNORECASE), 'word': re.compile('(\\w|\\.)+')}, patronymic_name_order: bool = False, middle_name_as_last: bool = False)[source]¶

An instance of this class holds all of the configuration constants for the parser.

Parameters:

prefixes (set) – prefixes wrapped with SetManager.
titles (set) – titles wrapped with SetManager.
first_name_titles (set) – FIRST_NAME_TITLES wrapped with SetManager.
suffix_acronyms (set) – SUFFIX_ACRONYMS wrapped with SetManager.
suffix_not_acronyms (set) – SUFFIX_NOT_ACRONYMS wrapped with SetManager.
suffix_acronyms_ambiguous (set) – SUFFIX_ACRONYMS_AMBIGUOUS wrapped with SetManager.
conjunctions (set) – conjunctions wrapped with SetManager.
bound_first_names (set) – BOUND_FIRST_NAMES wrapped with SetManager.
non_first_name_prefixes (set) – NON_FIRST_NAME_PREFIXES wrapped with SetManager. The subset of prefixes that are never a first name, so a leading one marks the whole name as a surname. Must stay disjoint from bound_first_names.
capitalization_exceptions (dict or iterable of (key, value) tuples) – CAPITALIZATION_EXCEPTIONS wrapped with TupleManager.
regexes (dict or iterable of (name, compiled pattern) tuples) – REGEXES wrapped with RegexTupleManager.

nickname_delimiters and maiden_delimiters are not constructor arguments – they’re always set in __init__ (see the comment there for the string-sentinel-vs-compiled-pattern mechanism) – but are documented here since they’re the two Constants attributes a caller is most likely to want to look up: per-bucket TupleManager collections that parse_nicknames() consults to route delimited content into nickname/maiden. See the “Adding Custom Nickname Delimiters” and “Routing to Maiden Name” sections of the customization docs.

capitalize_name = False¶

If set, applies capitalize() to each HumanName instance.

>>> from nameparser import HumanName
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.capitalize_name = True
>>> name = HumanName("bob v. de la macdole-eisenhower phd")
>>> str(name)
'Bob V. de la MacDole-Eisenhower Ph.D.'
>>> CONSTANTS.capitalize_name = False

copy() → Constants[source]¶: Return a detached deep copy of this Constants instance, preserving its current customizations – unlike Constants’s own constructor, which always starts from library defaults. Useful for snapshotting the shared module-level CONSTANTS (including whatever it’s been customized with) into a private instance, e.g. CONSTANTS.copy(). Relies on the same __getstate__/__setstate__ pair pickling uses, so it’s as cheap and correct as pickle round-tripping.

empty_attribute_default = ''¶

Default return value for empty attributes.

Deprecated since version 1.4.0: Assignment emits DeprecationWarning; the option is removed in 2.0 (see issue #255) and empty attributes will always return ''.

>>> import warnings
>>> from nameparser import HumanName
>>> from nameparser.config import CONSTANTS
>>> with warnings.catch_warnings():
...     warnings.simplefilter('ignore', DeprecationWarning)
...     CONSTANTS.empty_attribute_default = None
>>> name = HumanName("John Doe")
>>> print(name.title)
None
>>> name.first
'John'
>>> with warnings.catch_warnings():
...     warnings.simplefilter('ignore', DeprecationWarning)
...     CONSTANTS.empty_attribute_default = ''

force_mixed_case_capitalization = False¶

If set, forces the capitalization of mixed case strings when capitalize() is called.

>>> from nameparser import HumanName
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.force_mixed_case_capitalization = True
>>> name = HumanName('Shirley Maclaine')
>>> name.capitalize()
>>> str(name)
'Shirley MacLaine'
>>> CONSTANTS.force_mixed_case_capitalization = False

initials_delimiter = '.'¶: The default initials delimiter used for all new HumanName instances. Will be used to add a delimiter between each initial.

initials_format = '{first} {middle} {last}'¶: The default initials format used for all new HumanName instances.

initials_separator = ' '¶

The default separator placed between consecutive initials within a name group (first, middle, or last). Distinct from initials_delimiter, which is the trailing character after each individual initial.

With defaults initials_delimiter="." and initials_separator=" ", initials() produces "J. A. D.". Setting initials_separator="" with initials_delimiter="." and initials_format="{first}{middle}{last}" produces "J.A.D.". With the default initials_format, group-level spacing from the template is still applied.
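The interaction between the delimiter, the separator and the format template can be sketched in plain Python. This is an illustration only, not the library's implementation: format_initials and its arguments are hypothetical names, and it assumes each name group arrives as a list of strings.

```python
def format_initials(first, middle, last, delimiter=".", separator=" ",
                    template="{first} {middle} {last}"):
    """Hypothetical helper: build an initials string from lists of name parts."""
    def group(names):
        # One initial per name, each followed by the delimiter,
        # joined within the group by the separator.
        return separator.join(name[0].upper() + delimiter for name in names if name)

    # Spacing between groups comes from the template, not from the separator.
    return template.format(first=group(first), middle=group(middle), last=group(last))


default_initials = format_initials(["John"], ["Alan"], ["Doe"])
compact_initials = format_initials(["John"], ["Alan"], ["Doe"], separator="",
                                   template="{first}{middle}{last}")
# Empty separator with the default template: only in-group spacing disappears.
two_middles = format_initials(["John"], ["Alan", "Bob"], ["Doe"], separator="")
```

In this sketch, two_middles keeps the spaces between groups because they belong to the template, while the two middle initials are joined without a space.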

middle_name_as_last = False¶

If set, folds middle names into the last name: middle_list is prepended to last_list and middle_list is cleared, so .last becomes what .surnames already was and .middle becomes empty. Useful for naming systems with no middle-name concept, where everything after the given name is lineage/family (e.g. Arabic patronymic chaining: given + father + grandfather + family).

The fold is uniform across both no-comma and comma (“Last, First Middle”) input, so the two written forms of a name converge on the same result.

For per-instance control without a shared Constants, pass a dedicated instance: HumanName("...", constants=Constants(middle_name_as_last=True)).

>>> from nameparser import HumanName
>>> from nameparser.config import Constants
>>> C = Constants(middle_name_as_last=True)
>>> hn = HumanName("Mohamad Ahmad Ali Hassan", constants=C)
>>> hn.first, hn.middle, hn.last
('Mohamad', '', 'Ahmad Ali Hassan')
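
The fold itself amounts to a list concatenation. The following is a minimal sketch of that step, assuming the name parts are held as lists of strings; fold_middle_into_last is a hypothetical name, not a function of the library.

```python
def fold_middle_into_last(first_list, middle_list, last_list):
    """Hypothetical helper: prepend the middle names to the last-name list."""
    # The middle list is emptied; the given-name list is left untouched.
    return first_list, [], middle_list + last_list


first, middle, last = fold_middle_into_last(["Mohamad"], ["Ahmad", "Ali"], ["Hassan"])
folded = (" ".join(first), " ".join(middle), " ".join(last))
```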

patronymic_name_order = False¶

If set, detects names in Russian formal order (Surname GivenName Patronymic) by recognizing a trailing East-Slavic patronymic suffix on the last token, and rotates the three name parts so that first/middle/last map to given name / patronymic / surname respectively. Detection requires exactly one token in each of first, middle, and last; names with multi-part given names or multiple middle names are left unchanged.

Also detects reversed-order Azerbaijani/Central-Asian Turkic patronymics (Surname GivenName PatronymicRoot Marker, e.g. oglu/qizi), a structurally different, standalone-marker-word patronymic family. Detection requires exactly one token in each of first and last, exactly two tokens in middle, and the last token a recognised Turkic marker.

Opt-in because a Western person whose surname happens to end in a patronymic suffix (e.g. "David Michael Abramovich") will be reordered incorrectly when the flag is on. Enable only when your data is predominantly Russian formal-order names.

For per-instance control without a shared Constants, pass a dedicated instance: HumanName("...", constants=Constants(patronymic_name_order=True)).

>>> from nameparser import HumanName
>>> from nameparser.config import Constants
>>> C = Constants(patronymic_name_order=True)
>>> hn = HumanName("Ivanov Ivan Ivanovich", constants=C)
>>> hn.first, hn.middle, hn.last
('Ivan', 'Ivanovich', 'Ivanov')
>>> hn2 = HumanName("Aliyev Vusal Said oglu", constants=C)
>>> hn2.first, hn2.middle, hn2.last
('Vusal', 'Said oglu', 'Aliyev')
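
The detection rules above can be approximated with the two Latin-script patterns listed in the default regexes. This is a simplified sketch rather than the parser's code: reorder is a hypothetical helper, it takes the already-parsed first/middle/last token lists, and it ignores the Cyrillic patterns.

```python
import re

# Patterns copied from the 'east_slavic_patronymic' and
# 'turkic_patronymic_marker' entries of the default regexes.
EAST_SLAVIC = re.compile(
    "(ovich|ovna|evich|evna|ichna|ilyich|kuzmich|lukich|fomich|fokich)$",
    re.IGNORECASE,
)
TURKIC_MARKER = re.compile(
    "^(oglu|oğlu|ogly|ogli|o['’ʻ]g['’ʻ]li|qizi|qızı|kizi|kyzy|gyzy|uly|uulu)$",
    re.IGNORECASE,
)


def reorder(first, middle, last):
    """Hypothetical helper: rotate (first, middle, last) token lists."""
    # East-Slavic: exactly one token per part, patronymic suffix on the last token.
    if len(first) == len(middle) == len(last) == 1 and EAST_SLAVIC.search(last[0]):
        return middle, last, first
    # Turkic: one first token, two middle tokens, marker word as the last token.
    if (len(first) == 1 and len(middle) == 2 and len(last) == 1
            and TURKIC_MARKER.match(last[0])):
        return [middle[0]], [middle[1], last[0]], first
    return first, middle, last


russian = reorder(["Ivanov"], ["Ivan"], ["Ivanovich"])
turkic = reorder(["Aliyev"], ["Vusal", "Said"], ["oglu"])
western = reorder(["David"], ["Michael"], ["Abramovich"])  # the false positive
untouched = reorder(["John"], ["Quincy"], ["Adams"])
```

The western case shows why the flag is opt-in: the surname "Abramovich" matches the suffix pattern, so the sketch rotates a name that was already in the right order.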

string_format: str | None = '{title} {first} {middle} {last} {suffix} ({nickname})'¶: The default string format used for all new HumanName instances.

suffix_delimiter: str | None = None¶

If set, an additional delimiter used to split suffix groups after comma-splitting. For example, setting suffix_delimiter=" - " allows "RN - CRNA" to be parsed as two separate suffixes. Default is None (no additional splitting beyond the standard comma split).

Note: setting this to "," or ", " has no additional effect — the full name is already split on comma characters first (including the Arabic ، and fullwidth ， variants), and each resulting part is stripped of surrounding whitespace before this step runs.

The delimiter is only applied to parts once they’ve been identified as a suffix group, so it never leaks into a first- or middle-name part. For example, in inverted format ("Last, First, suffix") a hyphenated given name like "Doe, Mary - Kate, RN" with suffix_delimiter=" - " does not get mistaken for a suffix split.
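
The order of operations described here (comma split first, then the extra delimiter on suffix groups only) can be sketched as follows. This is an assumption-laden illustration for the inverted "Last, First, suffix" form only: split_inverted is a hypothetical helper, and it treats every part after the second comma-separated part as a suffix group without checking it against the suffix sets.

```python
import re

# Copied from the 'commas' entry of the default regexes.
COMMAS = re.compile("[,،，]")


def split_inverted(full_name, suffix_delimiter=None):
    """Hypothetical helper for "Last, First, suffix" input."""
    parts = [part.strip() for part in COMMAS.split(full_name)]
    last, given, suffix_groups = parts[0], parts[1], parts[2:]
    suffixes = []
    for group in suffix_groups:
        # The extra delimiter is applied to suffix groups only.
        pieces = group.split(suffix_delimiter) if suffix_delimiter else [group]
        suffixes.extend(piece.strip() for piece in pieces)
    return last, given, suffixes


with_delimiter = split_inverted("Doe, Mary - Kate, RN - CRNA", suffix_delimiter=" - ")
without_delimiter = split_inverted("Doe, Mary - Kate, RN - CRNA")
```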

class nameparser.config.RegexTupleManager(arg: Mapping[str, T] | Iterable[tuple[str, T]] = (), **kwargs: T)[source]¶

class nameparser.config.SetManager(elements: Iterable[str])[source]¶

Easily add and remove config variables per module or instance. Subclass of collections.abc.Set.

Special functionality beyond that provided by set() is to normalize constants for comparison (lowercase, leading/trailing periods stripped) when they are add()ed and remove()d, and to allow passing multiple string arguments to the add() and remove() methods. The constructor and the set operators apply the same normalization to their elements and operands, so every entry is stored in the form the parser’s lookups expect, and they reject a bare string with TypeError, since e.g. set('dr') would silently build a set of single characters.
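
The normalization and the bare-string pitfall can be illustrated without the library. The sketch below uses only built-in sets; normalize and build are hypothetical stand-ins for what SetManager is described as doing, not its actual code.

```python
def normalize(value):
    """Lowercase and strip leading/trailing periods (interior periods stay)."""
    return value.lower().strip(".")


def build(elements):
    """Hypothetical constructor: reject a bare string, normalize the rest."""
    if isinstance(elements, str):
        raise TypeError("expected an iterable of strings, not a bare string")
    return {normalize(element) for element in elements}


titles = build(["Dr.", "dr", "PROF", "Ph.D."])
single_characters = set("dr")  # the silent mistake the TypeError guards against
```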

add(*strings: str) → Self[source]¶: Add the lowercased, leading/trailing-periods-stripped version of the string arguments to the set. Returns self for chaining.

Deprecated since version 1.3.0: bytes arguments will raise TypeError in 2.0 (see issue #245); decode before adding.

add_with_encoding(s: str | bytes, encoding: str | None = None) → None[source]¶: Add the lowercased, leading/trailing-periods-stripped version of the string to the set. Pass an explicit encoding parameter to specify the encoding of binary strings that are not DEFAULT_ENCODING (UTF-8).

Deprecated since version 1.3.0: bytes arguments will raise TypeError in 2.0 (see issue #245); decode before adding.

Deprecated since version 1.4.0: The method itself is removed in 2.0 (see issue #245); use add() instead, decoding bytes first.

clear() → Self[source]¶: Remove all entries from the set. Returns self for chaining.

discard(*strings: str) → Self[source]¶: Remove the lower case and no-period version of the string arguments from the set if present; missing members are ignored, like set.discard. Returns self for chaining.

remove(*strings: str) → Self[source]¶: Remove the lower case and no-period version of the string arguments from the set. Returns self for chaining.

Deprecated since version 1.3.0: Removing a missing member currently does nothing but will raise KeyError in 2.0, matching set.remove (see issue #243); use discard() to ignore missing members.

class nameparser.config.TupleManager(arg: Mapping[str, T] | Iterable[tuple[str, T]] = (), **kwargs: T)[source]¶: A dictionary with dot.notation access. Subclass of dict. Wraps the mapping config constants (capitalization_exceptions, regexes, and the nickname/maiden delimiter buckets). The name is historical: before 1.3.0 these constants were tuples of pairs.

HumanName.config Defaults¶

nameparser.config.titles.FIRST_NAME_TITLES = {'aunt', 'auntie', 'brother', 'cheikh', 'dame', 'father', 'king', 'maid', 'master', 'mother', 'pope', 'queen', 'shaik', 'shaikh', 'shayk', 'shaykh', 'sheik', 'sheikh', 'shekh', 'sir', 'sister', 'uncle'}¶: When these titles appear with a single other name, that name is a first name, e.g. “Sir John”, “Sister Mary”, “Queen Elizabeth”.

nameparser.config.titles.TITLES = {'10th', '1lt', '1sgt', '1st', '1stlt', '1stsgt', '2lt', '2nd', '2ndlt', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', 'a1c', 'ab', 'abbess', 'abbot', 'abolitionist', 'academic', 'acolyte', 'activist', 'actor ', 'actress', 'adept', 'adjutant', 'adm', 'admiral', 'advertising', 'adviser', 'advocate', 'air', 'akhoond', 'alderman', 'almoner', 'ambassador', 'amn', 'analytics', 'anarchist', 'animator', 'anthropologist', 'appellate', 'apprentice', 'arbitrator', 'archbishop', 'archdeacon', 'archdruid', 'archduchess', 'archduke', 'archeologist', 'architect', 'arhat', 'army', 'arranger', 'assistant', 'assoc', 'associate', 'asst', 'astronomer', 'attache', 'attaché', 'attorney', 'aunt', 'auntie', 'author', 'award-winning', 'ayatollah', 'baba', 'bailiff', 'ballet', 'bandleader', 'banker', 'banner', 'bard', 'baron', 'baroness', 'barrister', 'baseball', 'bearer', 'behavioral', 'bench', 'bg', 'bgen', 'biblical', 'bibliographer', 'biochemist', 'biographer', 'biologist', 'bishop', 'blessed', 'blogger', 'blues', 'bodhisattva', 'bookseller', 'botanist', 'bp', 'brigadier', 'briggen', 'british', 'broadcaster', 'brother', 'buddha', 'burgess', 'burlesque', 'business', 'businessman', 'businesswoman', 'bwana', 'canon', 'capt', 'captain', 'cardinal', 'cartographer', 'cartoonist', 'catholicos', 'ccmsgt', 'cdr', 'celebrity', 'ceo', 'cfo', 'chair', 'chairs', 'chancellor', 'chaplain', "chargé d'affaires", 'chef', 'cheikh', 'chemist', 'chief', 'chieftain', 'choreographer', 'civil', 'classical', 'clergyman', 'clerk', 'cmsaf', 'cmsgt', 'co-chair', 'co-chairs', 'co-founder', 'coach', 'col', 'collector', 'colonel', 'comedian', 'comedienne', 'comic', 'commander', 'commander-in-chief', 'commodore', 'composer', 'compositeur', 'comptroller', 'computer', 'comtesse', 'conductor', 'consultant', 'controller', 'corporal', 'corporate', 'correspondent', 'councillor', 'counselor', 'count', 'countess', 'courtier', 'cpl', 'cpo', 'cpt', 'credit', 'criminal', 'criminologist', 
'critic', 'csm', 'curator', 'customs', 'cwo-2', 'cwo-3', 'cwo-4', 'cwo-5', 'cwo2', 'cwo3', 'cwo4', 'cwo5', 'cyclist', 'dame', 'dancer', 'dcn', 'deacon', 'delegate', 'deputy', 'designated', 'designer', 'detective', 'developer', 'dhr', 'dipl.-ing', 'diplomat', 'dir', 'director', 'discovery', 'dissident', 'district', 'division', 'do', 'docent', 'docket', 'doctor', 'doyen', 'dpty', 'dr', 'dra', 'dramatist', 'druid', 'drummer', 'duchesse', 'dutchess', 'ecologist', 'economist', 'editor', 'edler', 'edmi', 'edohen', 'educator', 'effendi', 'ekegbian', 'elerunwon', 'eminence', 'emperor', 'empress', 'engineer', 'english', 'ens', 'entertainer', 'entrepreneur', 'envoy', 'erzbischof', 'essayist', 'evangelist', 'excellency', 'excellent', 'exec', 'executive', 'expert', 'fadm', 'family', 'father', 'federal', 'fh-prof', 'field', 'film', 'financial', 'first', 'flag', 'flying', 'foreign', 'forester', 'founder', 'fr', 'frau', 'freifrau', 'freiherr', 'friar', 'frk', 'fru', 'fräulein', 'frøken', 'fürst', 'fürsterzbischof', 'gaf', 'gen', 'general', 'generalissimo', 'gentiluomo', 'giani', 'goodman', 'goodwife', 'governor', 'graf', 'grand', 'group', 'großfürst', 'gräfin', 'guitarist', 'guru', 'gyani', 'gysgt', 'hajji', 'headman', 'heir', 'heiress', 'her', 'hereditary', 'heren', 'herr', 'herren', 'herrn', 'herzog', 'high', 'highness', 'his', 'historian', 'historicus', 'historien', 'holiness', 'hon', 'honorable', 'honourable', 'host', 'hr', 'illustrator', 'imam', 'industrialist', 'information', 'instructor', 'intelligence', 'intendant', 'inventor', 'investigator', 'investor', 'journalist', 'journeyman', 'jr', 'judge', 'judicial', 'junior', 'jurist', 'keyboardist', 'king', "king's", 'kingdom', 'knowledge', 'lady', 'lama', 'lamido', 'law', 'lawyer', 'lcdr', 'lcpl', 'leader', 'lecturer', 'legal', 'librarian', 'lieutenant', 'linguist', 'literary', 'lord', 'lt', 'ltc', 'ltcol', 'ltg', 'ltgen', 'ltjg', 'lyricist', 'madam', 'madame', 'mademoiselle', 'mag', 'mag-judge', 'mag/judge', 'magistrate', 
'magistrate-judge', 'magnate', 'maharajah', 'maharani', 'mahdi', 'maid', 'maj', 'majesty', 'majgen', 'manager', 'marcher', 'marchess', 'marchioness', 'marketing', 'marquess', 'marquis', 'marquise', 'master', 'mathematician', 'mathematics', 'matriarch', 'mayor', 'mcpo', 'mcpoc', 'mcpon', 'md', 'me', 'member', 'memoirist', 'merchant', 'met', 'metropolitan', 'mevr', 'mevrouw', 'mevrouwe', 'mg', 'mgr', 'mgysgt', 'military', 'minister', 'miss', 'misses', 'missionary', 'mister', 'mlle', 'mme', 'mobster', 'model', 'monk', 'monseigneur', 'monsieur', 'monsignor', 'most', 'mother', 'mountaineer', 'mpco-cg', 'mr', 'mrs', 'ms', 'msg', 'msgt', 'mufti', 'mullah', 'municipal', 'murshid', 'musician', 'musicologist', 'mx', 'mystery', 'nanny', 'narrator', 'national', 'naturalist', 'navy', 'neuroscientist', 'novelist', 'nurse', 'obstetritian', 'officer', 'opera', 'operating', 'ornithologist', 'painter', 'paleontologist', 'pastor', 'patriarch', 'pd', 'pediatrician', 'personality', 'petty', 'pfc', 'pharaoh', 'phd', 'philantropist', 'philosopher', 'photographer', 'physician', 'physicist', 'pianist', 'pilot', 'pioneer', 'pir', 'player', 'playwright', 'po1', 'po2', 'po3', 'poet', 'police', 'political', 'politician', 'pope', 'prefect', 'prelate', 'premier', 'pres', 'presbyter', 'president', 'presiding', 'priest', 'priestess', 'primate', 'prime', 'prin', 'prince', 'princess', 'principal', 'printer', 'printmaker', 'prinz', 'prior', 'priv.-doz', 'private', 'pro', 'producer', 'prof', 'professor', 'provost', 'pslc', 'psychiatrist', 'psychologist', 'publisher', 'pursuivant', 'pv2', 'pvt', 'queen', "queen's", 'ra', 'rabbi', 'radio', 'radm', 'rangatira', 'ranger', 'rdml', 'rear', 'rebbe', 'registrar', 'reichsgraf', 'rep', 'representative', 'researcher', 'resident', 'rev', 'revenue', 'reverend', 'right', 'risk', 'ritter', 'rock', 'royal', 'rt', 'sa', 'sailor', 'saint', 'sainte', 'saoshyant', 'satirist', 'scholar', 'schoolmaster', 'scientist', 'scpo', 'screenwriter', 'se', 'secretary', 'security', 
'seigneur', 'senator', 'senhor', 'senhora', 'senhorita', 'senior', 'senior-judge', 'sergeant', 'servant', 'señor', 'señora', 'señores', 'señorita', 'señoritas', 'sfc', 'sgm', 'sgt', 'sgtmaj', 'sgtmajmc', 'shaik', 'shaikh', 'shayk', 'shaykh', 'shehu', 'sheik', 'sheikh', 'shekh', 'sheriff', 'siddha', 'signor', 'signora', 'signore', 'signorina', 'singer', 'singer-songwriter', 'sir', 'sister', 'sma', 'smsgt', 'sn', 'soccer', 'social', 'sociologist', 'software', 'soldier', 'solicitor', 'soprano', 'spc', 'speaker', 'special', 'sr', 'sra', 'sres', 'srta', 'srtas', 'ssg', 'ssgt', 'st', 'staff', 'state', 'states', 'strategy', 'subaltern', 'subedar', 'suffragist', 'sultan', 'sultana', 'superior', 'supreme', 'surgeon', 'swami', 'swordbearer', 'sysselmann', 'tax', 'teacher', 'technical', 'technologist', 'television ', 'tenor', 'theater', 'theatre', 'theologian', 'theorist', 'timi', 'tirthankar', 'translator', 'travel', 'treasurer', 'tsar', 'tsarina', 'tsgt', 'uk', 'uncle', 'united', 'univ.prof', 'us', 'vadm', 'vardapet', 'vc', 'venerable', 'verderer', 'vicar', 'vice', 'viscount', 'vizier', 'vocalist', 'voice', 'vrouwe', 'warden', 'warrant', 'wing', 'wm', 'wo-1', 'wo1', 'wo2', 'wo3', 'wo4', 'wo5', 'woodman', 'wp', 'writer', 'zoologist'}¶: Cannot include things that could also be first names, e.g. “dean”. Many of these are from Wikipedia: https://en.wikipedia.org/wiki/Title. The parser recognizes chains of these, including conjunctions, allowing recognition of titles like “Deputy Secretary of State”.

nameparser.config.suffixes.SUFFIX_ACRONYMS = {'8-vsb', 'aas', 'aba', 'abc', 'abd', 'abpp', 'abr', 'aca', 'acas', 'ace', 'acha', 'acp', 'ae', 'aem', 'afasma', 'afc', 'afm', 'agsf', 'aia', 'aicp', 'ala', 'alc', 'alp', 'am', 'amd', 'ame', 'amieee', 'ams', 'aphr', 'apn', 'apr', 'aprn', 'apss', 'aqp', 'arm', 'arrc', 'asa', 'asc', 'asid', 'asla', 'asp', 'atc', 'awb', 'ba', 'bca', 'bcl', 'bcss', 'bds', 'bem', 'bls-i', 'bn', 'bpe', 'bpi', 'bpt', 'bsc', 'bt', 'btcs', 'bts', 'cacts', 'cae', 'caha', 'caia', 'cams', 'cap', 'capa', 'capm', 'capp', 'caps', 'caro', 'cas', 'casp', 'cb', 'cbe', 'cbm', 'cbne', 'cbnt', 'cbp', 'cbrte', 'cbs', 'cbsp', 'cbt', 'cbte', 'cbv', 'cca', 'ccc', 'ccca', 'cccm', 'cce', 'cchp', 'ccie', 'ccim', 'cciso', 'ccm', 'ccmt', 'ccna', 'ccnp', 'ccp', 'ccp-c', 'ccpr', 'ccs', 'ccufc', 'cd', 'cdal', 'cdfm', 'cdmp', 'cds', 'cdt', 'cea', 'ceas', 'cebs', 'ceds', 'ceh', 'cela', 'cem', 'cep', 'cera', 'cet', 'cfa', 'cfc', 'cfcc', 'cfce', 'cfcm', 'cfe', 'cfeds', 'cfi', 'cfm', 'cfp', 'cfps', 'cfr', 'cfre', 'cga', 'cgap', 'cgb', 'cgc', 'cgfm', 'cgfo', 'cgm', 'cgma', 'cgp', 'cgr', 'cgsp', 'ch', 'cha', 'chba', 'chdm', 'che', 'ches', 'chfc', 'chi', 'chmc', 'chmm', 'chp', 'chpa', 'chpe', 'chpln', 'chpse', 'chrm', 'chsc', 'chse', 'chse-a', 'chsos', 'chss', 'cht', 'cia', 'cic', 'cie', 'cig', 'cip', 'cipm', 'cips', 'ciro', 'cisa', 'cism', 'cissp', 'cla', 'clsd', 'cltd', 'clu', 'cm', 'cma', 'cmas', 'cmc', 'cmfo', 'cmg', 'cmp', 'cms', 'cmsp', 'cmt', 'cna', 'cnm', 'cnp', 'cp', 'cp-c', 'cpa', 'cpacc', 'cpbe', 'cpcm', 'cpcu', 'cpe', 'cpfa', 'cpfo', 'cpg', 'cph', 'cpht', 'cpim', 'cpl', 'cplp', 'cpm', 'cpo', 'cpp', 'cppm', 'cprc', 'cpre', 'cprp', 'cpsc', 'cpsi', 'cpss', 'cpt', 'cpwa', 'crde', 'crisc', 'crma', 'crme', 'crna', 'cro', 'crp', 'crt', 'crtt', 'csa', 'csbe', 'csc', 'cscp', 'cscu', 'csep', 'csi', 'csm', 'csp', 'cspo', 'csre', 'csrte', 'csslp', 'cssm', 'cst', 'cste', 'ctbs', 'ctfa', 'cto', 'ctp', 'cts', 'cua', 'cusp', 'cva', 'cva[22]', 'cvo', 'cvp', 'cvrs', 'cwap', 'cwb', 
'cwdp', 'cwep', 'cwna', 'cwne', 'cwp', 'cwsp', 'cxa', 'cyds', 'cysa', 'dabfm', 'dabvlm', 'dacvim', 'dbe', 'dc', 'dcb', 'dcm', 'dcmg', 'dcvo', 'dd', 'dds', 'ded', 'dep', 'dfc', 'dfm', 'diplac', 'diplom', 'djur', 'dma', 'dmd', 'dmin', 'dnp', 'do', 'dpm', 'dpt', 'drb', 'drmp', 'drph', 'dsc', 'dsm', 'dso', 'dss', 'dtr', 'dvep', 'dvm', 'ea', 'ed', 'edd', 'ei', 'eit', 'els', 'emd', 'emt-b', 'emt-i/85', 'emt-i/99', 'emt-p', 'enp', 'erd', 'esq', 'evp', 'faafp', 'faan', 'faap', 'fac-c', 'facc', 'facd', 'facem', 'facep', 'facha', 'facofp', 'facog', 'facp', 'facph', 'facs', 'faia', 'faicp', 'fala', 'fashp', 'fasid', 'fasla', 'fasma', 'faspen', 'fca', 'fcas', 'fcela', 'fd', 'fec', 'fhames', 'fic', 'ficf', 'fieee', 'fmp', 'fmva', 'fnss', 'fp&a', 'fp-c', 'fpc', 'frm', 'fsa', 'fsdp', 'fws', 'gaee[14]', 'gba', 'gbe', 'gc', 'gcb', 'gchs', 'gcie', 'gcmg', 'gcsi', 'gcvo', 'gisp', 'git', 'gm', 'gmb', 'gmr', 'gphr', 'gri', 'grp', 'gsmieee', 'hccp', 'hrs', 'iaccp', 'iaee', 'iccm-d', 'iccm-f', 'idsm', 'ifgict', 'iom', 'ipep', 'ipm', 'iso', 'issp-csp', 'issp-sa', 'itil', 'jd', 'jp', 'kbe', 'kcb', 'kchs/dchs', 'kcie', 'kcmg', 'kcsi', 'kcvo', 'kg', 'khs/dhs', 'kp', 'kt', 'lac', 'lcmt', 'lcpc', 'lcsw', 'leed ap', 'lg', 'litk', 'litl', 'litp', 'llm', 'lm', 'lmsw', 'lmt', 'lp', 'lpa', 'lpc', 'lpn', 'lpss', 'lsi', 'lsit', 'lt', 'lvn', 'lvo', 'lvt', 'ma', 'maaa', 'mai', 'mba', 'mbe', 'mbs', 'mc', 'mcct', 'mcdba', 'mches', 'mcm', 'mcp', 'mcpd', 'mcsa', 'mcsd', 'mcse', 'mct', 'md', 'mda', 'mdb', 'mdbb', 'mdep', 'mdhb', 'mdiv', 'mdl', 'mem', 'meng', 'mfa', 'micp', 'mieee', 'mirm', 'mle', 'mls', 'mlse', 'mlt', 'mm', 'mmad', 'mmas', 'mnaa', 'mnae', 'mp', 'mpa', 'mph', 'mpse', 'mra', 'ms', 'msa', 'msc', 'mscmsm', 'msm', 'mt', 'mts', 'mvo', 'nbc-his', 'nbcch', 'nbcch-ps', 'nbcdch', 'nbcdch-ps', 'nbcfch', 'nbcfch-ps', 'nbct', 'ncarb', 'nccp', 'ncidq', 'ncps', 'ncso', 'ncto', 'nd', 'ndtr', 'nicet i', 'nicet ii', 'nicet iii', 'nicet iv', 'nmd', 'np', 'np[18]', 'nraemt', 'nremr', 'nremt', 'nrp', 'obe', 
'obi', 'oca', 'ocm', 'ocp', 'od', 'om', 'oscp', 'ot', 'pa-c', 'pcc', 'pci', 'pe', 'pfmp', 'pg', 'pgmp', 'ph', 'pharmd', 'phc', 'phd', 'phr', 'phrca', 'pla', 'pls', 'pmc', 'pmi-acp', 'pmp', 'pp', 'pps', 'prm', 'psm', 'psm i', 'psm ii', 'psp', 'psyd', 'pt', 'pta', 'qam', 'qc', 'qcsw', 'qfsm', 'qgm', 'qpm', 'qsd', 'qsp', 'ra', 'rai', 'rba', 'rci', 'rcp', 'rd', 'rdcs', 'rdh', 'rdms', 'rdn', 'res', 'rfp', 'rhca', 'rid', 'rls', 'rmsks', 'rn', 'rp', 'rpa', 'rph', 'rpl', 'rrc', 'rrt', 'rrt-accs', 'rrt-nps', 'rrt-sds', 'rtrp', 'rvm', 'rvt', 'sa', 'same', 'sasm', 'sccp', 'scmp', 'se', 'secb', 'sfp', 'sgm', 'shrm-cp', 'shrm-scp', 'si', 'siie', 'smieee', 'sphr', 'sra', 'sscp', 'stb', 'stmieee', 'tbr-ct', 'td', 'thd', 'thm', 'ud', 'usa', 'usaf', 'usar', 'uscg', 'usmc', 'usn', 'usnr', 'uxc', 'uxmc', 'vc', 'vcp', 'vd', 'vrd'}¶: Post-nominal acronyms. Titles, degrees and other things people stick after their name that may or may not have periods between the letters. The parser removes periods when matching against these pieces.
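
Period-insensitive matching against the acronym set can be sketched in a few lines. This is an illustration under stated assumptions: SUFFIX_ACRONYMS_SAMPLE is a hand-picked sample of the set above, and is_suffix_acronym is a hypothetical helper, not the parser's method.

```python
# A small sample of the acronym set above, for illustration only.
SUFFIX_ACRONYMS_SAMPLE = {"md", "phd", "cpa", "rn"}


def is_suffix_acronym(piece):
    """Hypothetical helper: drop every period, lowercase, then look up."""
    return piece.replace(".", "").lower() in SUFFIX_ACRONYMS_SAMPLE


checks = {piece: is_suffix_acronym(piece) for piece in ["M.D.", "Ph.D", "RN", "Smith"]}
```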

nameparser.config.suffixes.SUFFIX_ACRONYMS_AMBIGUOUS = {'ed', 'jd'}¶: Acronym suffixes from SUFFIX_ACRONYMS that also plausibly collide with a common given-name nickname. Not a partition of SUFFIX_ACRONYMS – a small, standalone exception list consulted only by parse_nicknames().

nameparser.config.suffixes.SUFFIX_NOT_ACRONYMS = {'2', 'dr', 'esq', 'esquire', 'i', 'ii', 'iii', 'iv', 'jnr', 'jr', 'junior', 'ret', 'snr', 'sr', 'v', 'vet'}¶: Post-nominal pieces that are not acronyms. The parser does not remove periods when matching against these pieces.

nameparser.config.prefixes.NON_FIRST_NAME_PREFIXES = {"'t", 'af', 'auf', 'av', 'bint', 'de', "de'", 'degli', 'dei', 'delle', 'delli', 'dello', 'dem', 'der', 'dos', 'het', 'ibn', 'op', 'ter', 'vd', 'vom', 'zu'}¶: The sub-set of PREFIXES that are never a standalone first name. A name that starts with one of these has no first name – the whole thing is a surname (e.g. “de Mesnil” -> last name “de Mesnil”). Curated to exclude anything that can be a given name in some culture (al, van, von, della, di, del, da, vander, …) and anything that is also a first name prefix (abu). When unsure, leave a word out: a missing member just means that name is not auto-fixed, whereas a wrong member misparses a real person. Must stay a subset of PREFIXES and disjoint from BOUND_FIRST_NAMES.

nameparser.config.prefixes.PREFIXES = {"'t", 'aan', 'abu', 'aen', 'af', 'al', 'auf', 'av', 'bar', 'bat', 'bin', 'bint', 'bon', 'da', 'dal', 'de', "de'", 'degli', 'dei', 'del', 'dela', 'della', 'delle', 'delli', 'dello', 'dem', 'den', 'der', 'di', 'do', 'dos', 'du', 'dí', 'freiherr', 'freiherrin', 'heer', 'het', 'ibn', 'la', 'le', 'mac', 'mc', 'op', 'san', 'santa', 'st', 'ste', 'te', 'ter', 'tho', 'thoe', 'van', 'vande', 'vander', 'vd', 'vel', 'vom', 'von', 'zu'}¶

Name pieces that appear before a last name. Prefixes join to the piece that follows them to make one new piece. They can be chained together, e.g. “von der” and “de la”. Because they only appear in middle or last names, they also signify that all following name pieces should be in the same name part. For example, “von” will be joined to all following pieces that are not prefixes or suffixes, allowing recognition of double last names when they appear after a prefix. So in “pennie von bergen wessels MD”, “von” will join with all following name pieces until the suffix “MD”, resulting in the correct parsing of the last name “von bergen wessels”.

Defined as a static union so every NON_FIRST_NAME_PREFIXES member is guaranteed to also be a prefix (and still join forward), with no drift – mirroring TITLES = FIRST_NAME_TITLES | {...} in nameparser.config.titles.
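
The forward-joining behaviour described above can be sketched roughly as below. This is a simplification and not the parser's algorithm: join_prefixes is a hypothetical helper, the two sample sets are small hand-picked subsets, and everything from the first prefix up to the first suffix is merged into a single piece.

```python
# Small hand-picked samples, for illustration only.
PREFIXES_SAMPLE = {"von", "der", "de", "la", "van"}
SUFFIXES_SAMPLE = {"md", "phd", "jr"}


def join_prefixes(pieces):
    """Hypothetical helper: merge the first prefix with the pieces after it."""
    for start, piece in enumerate(pieces):
        if piece.lower() in PREFIXES_SAMPLE:
            # Extend until the first suffix (or the end of the name).
            end = start + 1
            while end < len(pieces) and pieces[end].lower() not in SUFFIXES_SAMPLE:
                end += 1
            return pieces[:start] + [" ".join(pieces[start:end])] + pieces[end:]
    return pieces


joined = join_prefixes("pennie von bergen wessels MD".split())
chained = join_prefixes("juan de la vega".split())
```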

nameparser.config.bound_first_names.BOUND_FIRST_NAMES: set[str] = {'abdal', 'abdel', 'abdul', 'abou', 'abu', 'umm'}¶: Bound Arabic given-name prefixes that attach to the following word to form one first name (e.g. “abdul salam” → first name “abdul salam”). They are never standalone names. Join logic runs in the given-name region only, mirroring PREFIXES for last names.

nameparser.config.conjunctions.CONJUNCTIONS = {'&', 'and', 'e', 'et', 'of', 'the', 'und', 'y'}¶: Pieces that should join to their neighboring pieces, e.g. “and”, “y” and “&”. “of” and “the” are also included to facilitate joining multiple titles, e.g. “President of the United States”.

nameparser.config.capitalization.CAPITALIZATION_EXCEPTIONS = {'ii': 'II', 'iii': 'III', 'iv': 'IV', 'md': 'M.D.', 'phd': 'Ph.D.'}¶: Any pieces that are not capitalized by capitalizing the first letter.
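
How an exception table like this might be consulted before falling back to ordinary capitalization is sketched below. It is an assumption about the lookup order, not the library's capitalize() code: capitalize_piece is a hypothetical helper, and the MAC pattern is the 'mac' entry from the default regexes.

```python
import re

EXCEPTIONS = {"ii": "II", "iii": "III", "iv": "IV", "md": "M.D.", "phd": "Ph.D."}
# Copied from the 'mac' entry of the default regexes.
MAC = re.compile(r"^(ma?c)(\w{2,})", re.IGNORECASE)


def capitalize_piece(piece):
    """Hypothetical helper: exceptions first, then Mac/Mc names, then default."""
    key = piece.lower()
    if key in EXCEPTIONS:
        return EXCEPTIONS[key]
    match = MAC.match(piece)
    if match:
        # Capitalize the "mac"/"mc" part and the remainder separately.
        return match.group(1).capitalize() + match.group(2).capitalize()
    return piece.capitalize()


capitalized = [capitalize_piece(p) for p in ["bob", "iii", "phd", "macdole", "mcdonald"]]
```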

nameparser.config.regexes.REGEXES = {'bidi': re.compile('[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]+'), 'commas': re.compile('[,،，]'), 'double_quotes': re.compile('\\"(.*?)\\"'), 'east_slavic_patronymic': re.compile('(ovich|ovna|evich|evna|ichna|ilyich|kuzmich|lukich|fomich|fokich)$', re.IGNORECASE), 'east_slavic_patronymic_cyrillic': re.compile('(ович|овна|евич|евна|ична|ильич|кузьмич|лукич|фомич|фокич)$', re.IGNORECASE), 'emoji': re.compile('[🌀-🙏🚀-\U0001f6ff☀-⛿✀-➿]+'), 'initial': re.compile('^(\\w\\.|[A-Z])?$'), 'mac': re.compile('^(ma?c)(\\w{2,})', re.IGNORECASE), 'no_vowels': re.compile('^[^aeyiuo]+$', re.IGNORECASE), 'parenthesis': re.compile('\\((.*?)\\)'), 'period_abbreviation': re.compile('^[^\\W\\d_]{2,}\\.$'), 'period_not_at_end': re.compile('.*\\..+$', re.IGNORECASE), 'phd': re.compile('\\s(ph\\.?\\s+d\\.?)', re.IGNORECASE), 'quoted_word': re.compile("(?<!\\w)\\'([^\\s]*?)\\'(?!\\w)"), 'roman_numeral': re.compile('^(X|IX|IV|V?I{0,3})$', re.IGNORECASE), 'space_before_comma': re.compile('\\s+,'), 'spaces': re.compile('\\s+'), 'turkic_patronymic_marker': re.compile("^(oglu|oğlu|ogly|ogli|o['’ʻ]g['’ʻ]li|qizi|qızı|kizi|kyzy|gyzy|uly|uulu)$", re.IGNORECASE), 'turkic_patronymic_marker_cyrillic': re.compile('^(оглу|оглы|оғлу|ўғли|угли|кызы|гызы|қызы|қизи|улы|ұлы|уулу)$', re.IGNORECASE), 'word': re.compile('(\\w|\\.)+')}¶: All regular expressions used by the parser are precompiled and stored in the config.
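
A few of these patterns can be tried directly with the standard re module to see what they accept. The snippet recompiles three of them from the listing above rather than importing them from the library, so the variable names are only local to this sketch.

```python
import re

# Recompiled from the 'roman_numeral', 'initial' and 'spaces' entries above.
ROMAN_NUMERAL = re.compile("^(X|IX|IV|V?I{0,3})$", re.IGNORECASE)
INITIAL = re.compile(r"^(\w\.|[A-Z])?$")
SPACES = re.compile(r"\s+")

# The numeral pattern covers I through X only; "XI" is outside its range.
numerals = {s: bool(ROMAN_NUMERAL.match(s)) for s in ["iii", "VIII", "XI", "Ivan"]}
# An initial is one word character plus a period, or a single capital letter.
initials = {s: bool(INITIAL.match(s)) for s in ["J.", "J", "John"]}
# Runs of whitespace collapse to a single space.
collapsed = SPACES.sub(" ", "John   Q.\tPublic")
```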