HumanName Class Documentation

HumanName.parser

class nameparser.parser.HumanName(full_name='', constants=<Constants() instance>, encoding='UTF-8', string_format=None, initials_format=None, initials_delimiter=None, first=None, middle=None, last=None, title=None, suffix=None, nickname=None)[source]

Parse a person’s name into individual components.

Instantiation assigns to full_name, and assignment to full_name triggers parse_full_name(). After parsing the name, these instance attributes are available. Alternatively, you can pass any of the instance attributes to the constructor and skip the parsing process. If any of the instance attributes are passed to the constructor as keywords, parse_full_name() will not be performed.
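
The parse-on-assignment behaviour described above can be pictured with a property setter. The following is only a toy sketch under stated assumptions: ToyName and its whitespace-only parse_full_name() are hypothetical stand-ins and are not the library's code.

```python
class ToyName:
    """Hypothetical stand-in showing parse-on-assignment; not nameparser code."""

    def __init__(self, full_name="", first=None, last=None):
        self.first = self.last = ""
        self._full_name = ""
        if first is not None or last is not None:
            # Attributes were passed as keywords: store them and skip parsing.
            self.first = first or ""
            self.last = last or ""
        else:
            self.full_name = full_name  # assignment triggers parsing

    @property
    def full_name(self):
        # The string output is rebuilt from the parsed attributes.
        return " ".join(p for p in (self.first, self.last) if p)

    @full_name.setter
    def full_name(self, value):
        self._full_name = value
        self.parse_full_name()

    def parse_full_name(self):
        # Toy rule: first token is the first name, last token is the last name.
        parts = self._full_name.split()
        self.first = parts[0] if parts else ""
        self.last = parts[-1] if len(parts) > 1 else ""
```

Assigning a new string to full_name on an existing ToyName re-runs the toy parser, which mirrors the behaviour the paragraph describes for HumanName.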

HumanName Instance Attributes

Parameters:
  • full_name (str) – The name string to be parsed.
  • constants (Constants) – a Constants instance. Pass None for per-instance config.
  • encoding (str) – string representing the encoding of your input
  • string_format (str) – python string formatting
  • initials_format (str) – python initials string formatting
  • initials_delimiter (str) – string delimiter for initials
  • first (str) – first name
  • middle (str) – middle name
  • last (str) – last name
  • title (str) – The title or prenominal
  • suffix (str) – The suffix or postnominal
  • nickname (str) – Nicknames
C = <Constants() instance>

A reference to the configuration for this instance, which may or may not be a reference to the shared, module-wide instance at CONSTANTS. See Customizing the Parser.

__eq__(other)[source]

HumanName instances are equal to other objects whose lower case unicode representation is the same.

__init__(full_name='', constants=<Constants() instance>, encoding='UTF-8', string_format=None, initials_format=None, initials_delimiter=None, first=None, middle=None, last=None, title=None, suffix=None, nickname=None)[source]

Initialize self. See help(type(self)) for accurate signature.

are_suffixes(pieces)[source]

Return True if all pieces are suffixes.

as_dict(include_empty=True)[source]

Return the parsed name as a dictionary of its attributes.

Parameters: include_empty (bool) – Include keys in the dictionary for empty name attributes.
Return type: dict
>>> name = HumanName("Bob Dole")
>>> name.as_dict()
{'last': 'Dole', 'suffix': '', 'title': '', 'middle': '', 'nickname': '', 'first': 'Bob'}
>>> name.as_dict(False)
{'last': 'Dole', 'first': 'Bob'}
capitalize(force=None)[source]

The HumanName class can try to guess the correct capitalization of names entered in all upper or lower case. By default, it will not adjust the case of names entered in mixed case. To run capitalization on all names, pass the parameter force=True.

Parameters: force (bool) – Forces capitalization of mixed case strings. This parameter overrides rules set within CONSTANTS.

Usage

>>> name = HumanName('bob v. de la macdole-eisenhower phd')
>>> name.capitalize()
>>> str(name)
'Bob V. de la MacDole-Eisenhower Ph.D.'
>>> # Don't touch good names
>>> name = HumanName('Shirley Maclaine')
>>> name.capitalize()
>>> str(name)
'Shirley Maclaine'
>>> name.capitalize(force=True)
>>> str(name)
'Shirley MacLaine'
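
The mixed-case rule can be sketched in a few lines. This is a simplified illustration, not the library's implementation: capitalize_name and cap_word are hypothetical helpers, the "mac" pattern is adapted from the regexes listed for Constants (assuming the space in `{2, }` is a rendering artifact), and prefixes, hyphens and capitalization exceptions are deliberately ignored.

```python
import re

# Adapted from the 'mac' entry in the Constants regexes (assumed to be \w{2,}).
MAC = re.compile(r"^(ma?c)(\w{2,})", re.IGNORECASE)

def cap_word(word):
    """Capitalize one word, treating a leading Mac/Mc as its own part."""
    match = MAC.match(word)
    if match:
        return match.group(1).capitalize() + match.group(2).capitalize()
    return word.capitalize()

def capitalize_name(name, force=False):
    """Leave mixed-case input alone unless force is True."""
    is_single_case = name in (name.upper(), name.lower())
    if not force and not is_single_case:
        return name
    return " ".join(cap_word(word) for word in name.split())
```

With this sketch, capitalize_name("Shirley Maclaine") returns the input unchanged, while passing force=True yields "Shirley MacLaine".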
first

The person’s first name. The first name piece after any known title pieces parsed from full_name.

full_name

The string output of the HumanName instance.

handle_capitalization()[source]

Handles capitalization configurations set within CONSTANTS.

handle_firstnames()[source]

If there are only two parts and one is a title, assume the other is a last name instead of a first name, e.g. “Mr. Johnson”. The exception is a special title like “Sir”: when it is followed by a single name, that name is always a first name.
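
The rule can be sketched as follows. This is an illustrative assumption about the decision only: assign_single_name and normalize are hypothetical helpers, and the two sets are small subsets of the TITLES and FIRST_NAME_TITLES defaults listed later on this page.

```python
# Small subsets of the default sets, for illustration only.
TITLES = {"mr", "mrs", "dr", "sir", "sister", "queen"}
FIRST_NAME_TITLES = {"sir", "sister", "queen"}

def normalize(piece):
    """Lower-case the piece and drop periods before set lookups."""
    return piece.lower().replace(".", "")

def assign_single_name(title, name):
    """Decide whether the lone name after a title is a first or last name."""
    key = normalize(title)
    if key not in TITLES:
        raise ValueError(f"{title!r} is not a known title in this sketch")
    if key in FIRST_NAME_TITLES:
        return {"title": title, "first": name, "last": ""}
    return {"title": title, "first": "", "last": name}
```

Under these assumptions, ("Mr.", "Johnson") is assigned to last, while ("Sir", "John") is assigned to first.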

has_own_config

True if this instance is not using the shared module-level configuration.

initials()[source]

Return period-delimited initials of the first, middle and optionally last name.

Parameters: include_last_name (bool) – Include the last name as part of the initials
Return type: str
>>> name = HumanName("Sir Bob Andrew Dole")
>>> name.initials()
'B. A. D.'
>>> name = HumanName("Sir Bob Andrew Dole", initials_format="{first} {middle}")
>>> name.initials()
'B. A.'
initials_list()[source]

Returns the initials as a list.

>>> name = HumanName("Sir Bob Andrew Dole")
>>> name.initials_list()
['B', 'A', 'D']
>>> name = HumanName("J. Doe")
>>> name.initials_list()
['J', 'D']
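
How initials_format and initials_delimiter might combine can be sketched with plain string formatting. The two functions below are hypothetical stand-alone helpers that take already-parsed name parts; they are an assumption about the general approach, not the library's code.

```python
def initials_list(first, middle, last):
    """Collect the first character of every word in first, middle and last."""
    return [word[0] for part in (first, middle, last) for word in part.split()]

def initials(first, middle, last,
             initials_format="{first} {middle} {last}",
             initials_delimiter="."):
    """Format the initials of each part, then collapse leftover whitespace."""
    def abbreviate(part):
        return " ".join(word[0] + initials_delimiter for word in part.split())

    text = initials_format.format(
        first=abbreviate(first),
        middle=abbreviate(middle),
        last=abbreviate(last),
    )
    return " ".join(text.split())
```

Collapsing whitespace at the end keeps the result tidy when a part such as the middle name is empty.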
is_an_initial(value)[source]

Words with a single period at the end, or a single uppercase letter.

Matches the initial regular expression in REGEXES.
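
As an illustration, the check can be reproduced with the 'initial' pattern as it appears in the Constants regexes on this page. The stand-alone function is a sketch for experimentation, not the method itself.

```python
import re

# Pattern copied from the 'initial' entry shown in the Constants regexes.
INITIAL = re.compile(r"^(\w\.|[A-Z])?$")

def is_an_initial(value):
    """True for one word character plus a period, or one uppercase letter."""
    return bool(INITIAL.match(value))
```

Because the group in the pattern is optional, an empty string also matches.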

is_conjunction(piece)[source]

Is in the conjunctions set and not is_an_initial().

is_prefix(piece)[source]

Lowercase and no periods version of piece is in the PREFIXES set.

is_roman_numeral(value)[source]

Matches the roman_numeral regular expression in REGEXES.
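
A stand-alone sketch of the same test is shown here. It assumes the quantifier in the listed pattern is `I{0,3}`; the page renders it as `I{0, 3}`, which is taken to be a rendering artifact.

```python
import re

# Adapted from the 'roman_numeral' entry in the Constants regexes
# (assumption: the quantifier is I{0,3} with no space).
ROMAN_NUMERAL = re.compile(r"^(X|IX|IV|V?I{0,3})$", re.IGNORECASE)

def is_roman_numeral(value):
    """True for the small numerals the pattern covers, in either case."""
    return bool(ROMAN_NUMERAL.match(value))
```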

is_rootname(piece)[source]

Is not a known title, suffix or prefix. Just first, middle, last names.

is_suffix(piece)[source]

Is in the suffixes set and not is_an_initial().

Some suffixes may be acronyms (M.B.A) while some are not (Jr.), so we remove the periods from piece when testing against C.suffix_acronyms.
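
The period handling can be sketched like this. The sets are small subsets of the defaults listed on this page, and the exact normalization for non-acronym suffixes (stripping only leading and trailing periods) is an assumption made for the illustration.

```python
import re

SUFFIX_ACRONYMS = {"mba", "md", "phd"}                    # subset of the defaults
SUFFIX_NOT_ACRONYMS = {"jr", "sr", "i", "ii", "iii", "esq"}  # subset of the defaults
INITIAL = re.compile(r"^(\w\.|[A-Z])?$")                  # 'initial' pattern from REGEXES

def is_suffix(piece):
    """Known suffix, compared without periods, that is not a lone initial."""
    if INITIAL.match(piece):
        return False
    lowered = piece.lower()
    as_acronym = lowered.replace(".", "")   # "m.b.a." -> "mba"
    as_word = lowered.strip(".")            # "jr." -> "jr" (assumed normalization)
    return as_acronym in SUFFIX_ACRONYMS or as_word in SUFFIX_NOT_ACRONYMS
```

The initial check is what keeps a lone "I" from being read as the suffix 'i'.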

is_title(value)[source]

Is in the TITLES set.

join_on_conjunctions(pieces, additional_parts_count=0)[source]

Join conjunctions to surrounding pieces. Title- and prefix-aware. e.g.:

['Mr.', 'and', 'Mrs.', 'John', 'Doe'] ==>
['Mr. and Mrs.', 'John', 'Doe']
['The', 'Secretary', 'of', 'State', 'Hillary', 'Clinton'] ==>
['The Secretary of State', 'Hillary', 'Clinton']

When joining titles, saves newly formed piece to the instance’s titles constant so they will be parsed correctly later. E.g. after parsing the example names above, ‘The Secretary of State’ and ‘Mr. and Mrs.’ would be present in the titles constant set.

Parameters:
  • pieces (list) – name pieces strings after split on spaces
  • additional_parts_count (int) –
Returns: new list with pieces next to conjunctions merged into one piece with spaces in it.
Return type: list
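
The merging step can be sketched with a single pass over the pieces. This is a deliberately simplified illustration: it uses the default conjunctions listed on this page, and it ignores the title and prefix awareness, the initial check, and the additional_parts_count handling that the real method performs.

```python
CONJUNCTIONS = {"&", "and", "e", "et", "of", "the", "und", "y"}

def join_on_conjunctions(pieces):
    """Glue each conjunction to the piece before it and the piece after it."""
    result = []
    glue_next = False  # True when the previous piece was a conjunction
    for piece in pieces:
        is_conjunction = piece.lower() in CONJUNCTIONS
        if result and (is_conjunction or glue_next):
            result[-1] = f"{result[-1]} {piece}"
        else:
            result.append(piece)
        glue_next = is_conjunction
    return result
```

A leading conjunction such as "The" has no previous piece, so it starts a new piece and is joined to whatever follows.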

last

The person’s last name. The last name piece parsed from full_name.

middle

The person’s middle names. All name pieces after the first name and before the last name parsed from full_name.

nickname

The person’s nicknames. Any text found inside of quotes ("") or parentheses (()).

original = ''

The original string, untouched by the parser.

parse_full_name()[source]

The main parse method for the parser. This method is run upon assignment to the full_name attribute or instantiation.

Basic flow is to hand off to pre_process() to handle nicknames. It then splits on commas and chooses a code path depending on the number of commas.

parse_pieces() then splits those parts on spaces and join_on_conjunctions() joins any pieces next to conjunctions.

parse_nicknames()[source]

The content of parentheses or quotes in the name will be added to the nicknames list. This happens before any other processing of the name.

Single quotes cannot span white space characters and must border white space to allow for quotes in names like O’Connor and Kawai’ae’a. Double quotes and parentheses can span white space.

Loops through 3 REGEXES: quoted_word, double_quotes and parenthesis.
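
A sketch of that loop, using the three patterns as they appear in the Constants regexes on this page. The function extract_nicknames is a hypothetical stand-alone helper, and returning the cleaned string together with the nickname list is an assumption made for the illustration.

```python
import re

# Patterns copied from the quoted_word, double_quotes and parenthesis entries.
NICKNAME_PATTERNS = [
    re.compile(r"(?<!\w)'([^\s]*?)'(?!\w)"),  # quoted_word
    re.compile(r'"(.*?)"'),                   # double_quotes
    re.compile(r"\((.*?)\)"),                 # parenthesis
]

def extract_nicknames(full_name):
    """Return (name without nicknames, list of nicknames found)."""
    nicknames = []
    for pattern in NICKNAME_PATTERNS:
        nicknames.extend(pattern.findall(full_name))
        full_name = pattern.sub("", full_name)
    # Collapse the whitespace left behind by the removed nicknames.
    return " ".join(full_name.split()), nicknames
```

The lookbehind and lookahead in quoted_word are what leave the apostrophes in O'Connor and Kawai'ae'a untouched.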

parse_pieces(parts, additional_parts_count=0)[source]

Split parts on spaces and remove commas, join on conjunctions and lastname prefixes. If parts have periods in the middle, try splitting on periods and check if the parts are titles or suffixes. If they are, add them to the constants so they will be found.

Parameters:
  • parts (list) – name part strings from the comma split
  • additional_parts_count (int) – if the comma format contains other parts, we need to know how many there are to decide if things should be considered a conjunction.
Returns: pieces split on spaces and joined on conjunctions
Return type: list

post_process()[source]

This happens at the end of parse_full_name() after all other processing has taken place. Runs handle_firstnames() and handle_capitalization().

pre_process()[source]

This method runs at the beginning of parse_full_name() before any other processing of the string aside from unicode normalization, so it’s a good place to do any custom handling in a subclass. Runs parse_nicknames() and squash_emoji().

squash_emoji()[source]

Remove emoji from the input string.

suffix

The person’s suffixes. Pieces at the end of the name that are found in suffixes, or pieces that are at the end of comma separated formats, e.g. “Lastname, Title Firstname Middle[,] Suffix [, Suffix]” parsed from full_name.

surnames

A string of all middle names followed by the last name.

surnames_list

List of middle names followed by last name.

title

The person’s titles. Any string of consecutive pieces in titles or conjunctions at the beginning of full_name.

HumanName.config

The nameparser.config module manages the configuration of the nameparser.

A module-level instance of Constants is created and used by default for all HumanName instances. You can adjust the entire module’s configuration by importing this instance and changing it.

>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.titles.remove('hon').add('chemistry','dean') 
SetManager(set([u'msgt', ..., u'adjutant']))

You can also adjust the configuration of individual instances by passing None as the second argument upon instantiation.

>>> from nameparser import HumanName
>>> hn = HumanName("Dean Robert Johns", None)
>>> hn.C.titles.add('dean') 
SetManager(set([u'msgt', ..., u'adjutant']))
>>> hn.parse_full_name() # need to run this again after config changes

Potential Gotcha: If you do not pass None as the second argument, hn.C will be a reference to the module config, possibly yielding unexpected results. See Customizing the Parser.
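
The gotcha is ordinary shared mutable state. The toy below illustrates it with a plain set; ToyParser and SHARED_TITLES are hypothetical stand-ins, and copying the shared set when None is passed is an assumption of this sketch rather than a description of how Constants instances are created.

```python
import copy

SHARED_TITLES = {"mr", "dr"}  # stands in for the module-level configuration

class ToyParser:
    """Hypothetical parser whose config defaults to the shared set."""

    def __init__(self, titles=SHARED_TITLES):
        # None means: give this instance its own private configuration.
        self.titles = copy.copy(SHARED_TITLES) if titles is None else titles

shared = ToyParser()
private = ToyParser(None)

private.titles.add("dean")  # affects only the private instance
shared.titles.add("hon")    # mutates SHARED_TITLES for every sharing instance
```

After these calls a new ToyParser() also sees "hon", while "dean" stays confined to the private instance.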

nameparser.config.CONSTANTS = <Constants() instance>

A module-level instance of the Constants() class. Provides a common instance for the module to share to easily adjust configuration for the entire module. See Customizing the Parser with Your Own Configuration.

class nameparser.config.Constants(prefixes={'abu', 'al', 'bin', 'bon', 'da', 'dal', 'de', "de'", 'degli', 'dei', 'del', 'dela', 'della', 'delle', 'delli', 'dello', 'der', 'di', 'do', 'dos', 'du', 'dí', 'ibn', 'la', 'le', 'mac', 'mc', 'san', 'santa', 'st', 'ste', 'van', 'vander', 'vel', 'vom', 'von'}, suffix_acronyms={'(ret)', '(vet)', '8-vsb', 'aas', 'aba', 'abc', 'abd', 'abpp', 'abr', 'aca', 'acas', 'ace', 'acha', 'acp', 'ae', 'aem', 'afasma', 'afc', 'afm', 'agsf', 'aia', 'aicp', 'ala', 'alc', 'alp', 'am', 'amd', 'ame', 'amieee', 'ams', 'aphr', 'apn aprn', 'apr', 'apss', 'aqp', 'arm', 'arrc', 'asa', 'asc', 'asid', 'asla', 'asp', 'atc', 'awb', 'bca', 'bcl', 'bcss', 'bds', 'bem', 'bls-i', 'bpe', 'bpi', 'bpt', 'bt', 'btcs', 'bts', 'cacts', 'cae', 'caha', 'caia', 'cams', 'cap', 'capa', 'capm', 'capp', 'caps', 'caro', 'cas', 'casp', 'cb', 'cbe', 'cbm', 'cbne', 'cbnt', 'cbp', 'cbrte', 'cbs', 'cbsp', 'cbt', 'cbte', 'cbv', 'cca', 'ccc', 'ccca', 'cccm', 'cce', 'cchp', 'ccie', 'ccim', 'cciso', 'ccm', 'ccmt', 'ccna', 'ccnp', 'ccp', 'ccp-c', 'ccpr', 'ccs', 'ccufc', 'cd', 'cdal', 'cdfm', 'cdmp', 'cds', 'cdt', 'cea', 'ceas', 'cebs', 'ceds', 'ceh', 'cela', 'cem', 'cep', 'cera', 'cet', 'cfa', 'cfc', 'cfcc', 'cfce', 'cfcm', 'cfe', 'cfeds', 'cfi', 'cfm', 'cfp', 'cfps', 'cfr', 'cfre', 'cga', 'cgap', 'cgb', 'cgc', 'cgfm', 'cgfo', 'cgm', 'cgma', 'cgp', 'cgr', 'cgsp', 'ch', 'cha', 'chba', 'chdm', 'che', 'ches', 'chfc', 'chi', 'chmc', 'chmm', 'chp', 'chpa', 'chpe', 'chpln', 'chpse', 'chrm', 'chsc', 'chse', 'chse-a', 'chsos', 'chss', 'cht', 'cia', 'cic', 'cie', 'cig', 'cip', 'cipm', 'cips', 'ciro', 'cisa', 'cism', 'cissp', 'cla', 'clsd', 'cltd', 'clu', 'cm', 'cma', 'cmas', 'cmc', 'cmfo', 'cmg', 'cmp', 'cms', 'cmsp', 'cmt', 'cna', 'cnm', 'cnp', 'cp', 'cp-c', 'cpa', 'cpacc', 'cpbe', 'cpcm', 'cpcu', 'cpe', 'cpfa', 'cpfo', 'cpg', 'cph', 'cpht', 'cpim', 'cpl', 'cplp', 'cpm', 'cpo', 'cpp', 'cppm', 'cprc', 'cpre', 'cprp', 'cpsc', 'cpsi', 'cpss', 'cpt', 'cpwa', 'crde', 'crisc', 'crma', 'crme', 
'crna', 'cro', 'crp', 'crt', 'crtt', 'csa', 'csbe', 'csc', 'cscp', 'cscu', 'csep', 'csi', 'csm', 'csp', 'cspo', 'csre', 'csrte', 'csslp', 'cssm', 'cst', 'cste', 'ctbs', 'ctfa', 'cto', 'ctp', 'cts', 'cua', 'cusp', 'cva', 'cva[22]', 'cvo', 'cvp', 'cvrs', 'cwap', 'cwb', 'cwdp', 'cwep', 'cwna', 'cwne', 'cwp', 'cwsp', 'cxa', 'cyds', 'cysa', 'dabfm', 'dabvlm', 'dacvim', 'dbe', 'dc', 'dcb', 'dcm', 'dcmg', 'dcvo', 'dd', 'dds', 'ded', 'dep', 'dfc', 'dfm', 'diplac', 'diplom', 'djur', 'dma', 'dmd', 'dmin', 'dnp', 'do', 'dpm', 'dpt', 'drb', 'drmp', 'drph', 'dsc', 'dsm', 'dso', 'dss', 'dtr', 'dvep', 'dvm', 'ea', 'ed', 'edd', 'ei', 'eit', 'els', 'emd', 'emt-b', 'emt-i/85', 'emt-i/99', 'emt-p', 'enp', 'erd', 'esq', 'evp', 'faafp', 'faan', 'faap', 'fac-c', 'facc', 'facd', 'facem', 'facep', 'facha', 'facofp', 'facog', 'facp', 'facph', 'facs', 'faia', 'faicp', 'fala', 'fashp', 'fasid', 'fasla', 'fasma', 'faspen', 'fca', 'fcas', 'fcela', 'fd', 'fec', 'fhames', 'fic', 'ficf', 'fieee', 'fmp', 'fmva', 'fnss', 'fp&a', 'fp-c', 'fpc', 'frm', 'fsa', 'fsdp', 'fws', 'gaee[14]', 'gba', 'gbe', 'gc', 'gcb', 'gchs', 'gcie', 'gcmg', 'gcsi', 'gcvo', 'gisp', 'git', 'gm', 'gmb', 'gmr', 'gphr', 'gri', 'grp', 'gsmieee', 'hccp', 'hrs', 'iaccp', 'iaee', 'iccm-d', 'iccm-f', 'idsm', 'ifgict', 'iom', 'ipep', 'ipm', 'iso', 'issp-csp', 'issp-sa', 'itil', 'jd', 'jp', 'kbe', 'kcb', 'kchs/dchs', 'kcie', 'kcmg', 'kcsi', 'kcvo', 'kg', 'khs/dhs', 'kp', 'kt', 'lac', 'lcmt', 'lcpc', 'lcsw', 'leed ap', 'lg', 'litk', 'litl', 'litp', 'llm', 'lm', 'lmsw', 'lmt', 'lp', 'lpa', 'lpc', 'lpn', 'lpss', 'lsi', 'lsit', 'lt', 'lvn', 'lvo', 'lvt', 'ma', 'maaa', 'mai', 'mba', 'mbe', 'mbs', 'mc', 'mcct', 'mcdba', 'mches', 'mcm', 'mcp', 'mcpd', 'mcsa', 'mcsd', 'mcse', 'mct', 'md', 'mdiv', 'mem', 'mfa', 'micp', 'mieee', 'mirm', 'mle', 'mls', 'mlse', 'mlt', 'mm', 'mmad', 'mmas', 'mnaa', 'mnae', 'mp', 'mpa', 'mph', 'mpse', 'mra', 'ms', 'msa', 'mscmscmsm', 'msm', 'mt', 'mts', 'mvo', 'nbc-his', 'nbcch', 'nbcch-ps', 'nbcdch', 'nbcdch-ps', 
'nbcfch', 'nbcfch-ps', 'nbct', 'ncarb', 'nccp', 'ncidq', 'ncps', 'ncso', 'ncto', 'nd', 'ndtr', 'nicet i', 'nicet ii', 'nicet iii', 'nicet iv', 'nmd', 'np', 'np[18]', 'nraemt', 'nremr', 'nremt', 'nrp', 'obe', 'obi', 'oca', 'ocm', 'ocp', 'od', 'om', 'oscp', 'ot', 'pa-c', 'pcc', 'pci', 'pe', 'pfmp', 'pg', 'pgmp', 'ph', 'pharmd', 'phc', 'phd', 'phr', 'phrca', 'pla', 'pls', 'pmc', 'pmi-acp', 'pmp', 'pp', 'pps', 'prm', 'psm', 'psm i', 'psm ii', 'psp', 'psyd', 'pt', 'pta', 'qam', 'qc', 'qcsw', 'qfsm', 'qgm', 'qpm', 'qsd', 'qsp', 'ra', 'rai', 'rba', 'rci', 'rcp', 'rd', 'rdcs', 'rdh', 'rdms', 'rdn', 'res', 'rfp', 'rhca', 'rid', 'rls', 'rmsks', 'rn', 'rp', 'rpa', 'rph', 'rpl', 'rrc', 'rrt', 'rrt-accs', 'rrt-nps', 'rrt-sds', 'rtrp', 'rvm', 'rvt', 'sa', 'same', 'sasm', 'sccp', 'scmp', 'se', 'secb', 'sfp', 'sgm', 'shrm-cp', 'shrm-scp', 'si', 'siie', 'smieee', 'sphr', 'sra', 'sscp', 'stmieee', 'tbr-ct', 'td', 'thd', 'thm', 'ud', 'usa', 'usaf', 'usar', 'uscg', 'usmc', 'usn', 'usnr', 'uxc', 'uxmc', 'vc', 'vcp', 'vd', 'vrd'}, suffix_not_acronyms={'2', 'dr', 'esq', 'esquire', 'i', 'ii', 'iii', 'iv', 'jnr', 'jr', 'junior', 'snr', 'sr', 'v'}, titles={'10th', '1lt', '1sgt', '1st', '1stlt', '1stsgt', '2lt', '2nd', '2ndlt', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', 'a1c', 'ab', 'abbess', 'abbot', 'abolitionist', 'academic', 'acolyte', 'activist', 'actor ', 'actress', 'adept', 'adjutant', 'adm', 'admiral', 'advertising', 'adviser', 'advocate', 'air', 'akhoond', 'alderman', 'almoner', 'ambassador', 'amn', 'analytics', 'anarchist', 'animator', 'anthropologist', 'appellate', 'apprentice', 'arbitrator', 'archbishop', 'archdeacon', 'archdruid', 'archduchess', 'archduke', 'archeologist', 'architect', 'arhat', 'army', 'arranger', 'assistant', 'assoc', 'associate', 'asst', 'astronomer', 'attache', 'attaché', 'attorney', 'aunt', 'auntie', 'author', 'award-winning', 'ayatollah', 'baba', 'bailiff', 'ballet', 'bandleader', 'banker', 'banner', 'bard', 'baron', 'baroness', 'barrister', 
'baseball', 'bearer', 'behavioral', 'bench', 'bg', 'bgen', 'biblical', 'bibliographer', 'biochemist', 'biographer', 'biologist', 'bishop', 'blessed', 'blogger', 'blues', 'bodhisattva', 'bookseller', 'botanist', 'bp', 'brigadier', 'briggen', 'british', 'broadcaster', 'brother', 'buddha', 'burgess', 'burlesque', 'business', 'businessman', 'businesswoman', 'bwana', 'canon', 'capt', 'captain', 'cardinal', 'cartographer', 'cartoonist', 'catholicos', 'ccmsgt', 'cdr', 'celebrity', 'ceo', 'cfo', 'chair', 'chairs', 'chancellor', 'chaplain', "chargé d'affaires", 'chef', 'cheikh', 'chemist', 'chief', 'chieftain', 'choreographer', 'civil', 'classical', 'clergyman', 'clerk', 'cmsaf', 'cmsgt', 'co-chair', 'co-chairs', 'co-founder', 'coach', 'col', 'collector', 'colonel', 'comedian', 'comedienne', 'comic', 'commander', 'commander-in-chief', 'commodore', 'composer', 'compositeur', 'comptroller', 'computer', 'comtesse', 'conductor', 'consultant', 'controller', 'corporal', 'corporate', 'correspondent', 'councillor', 'counselor', 'count', 'countess', 'courtier', 'cpl', 'cpo', 'cpt', 'credit', 'criminal', 'criminologist', 'critic', 'csm', 'curator', 'customs', 'cwo-2', 'cwo-3', 'cwo-4', 'cwo-5', 'cwo2', 'cwo3', 'cwo4', 'cwo5', 'cyclist', 'dame', 'dancer', 'dcn', 'deacon', 'delegate', 'deputy', 'designated', 'designer', 'detective', 'developer', 'diplomat', 'dir', 'director', 'discovery', 'dissident', 'district', 'division', 'do', 'docent', 'docket', 'doctor', 'doyen', 'dpty', 'dr', 'dra', 'dramatist', 'druid', 'drummer', 'duchesse', 'dutchess', 'ecologist', 'economist', 'editor', 'edmi', 'edohen', 'educator', 'effendi', 'ekegbian', 'elerunwon', 'eminence', 'emperor', 'empress', 'engineer', 'english', 'ens', 'entertainer', 'entrepreneur', 'envoy', 'essayist', 'evangelist', 'excellency', 'excellent', 'exec', 'executive', 'expert', 'fadm', 'family', 'father', 'federal', 'field', 'film', 'financial', 'first', 'flag', 'flying', 'foreign', 'forester', 'founder', 'fr', 'friar', 'gaf', 'gen', 
'general', 'generalissimo', 'gentiluomo', 'giani', 'goodman', 'goodwife', 'governor', 'graf', 'grand', 'group', 'guitarist', 'guru', 'gyani', 'gysgt', 'hajji', 'headman', 'heir', 'heiress', 'her', 'hereditary', 'high', 'highness', 'his', 'historian', 'historicus', 'historien', 'holiness', 'hon', 'honorable', 'honourable', 'host', 'illustrator', 'imam', 'industrialist', 'information', 'instructor', 'intelligence', 'intendant', 'inventor', 'investigator', 'investor', 'journalist', 'journeyman', 'jr', 'judge', 'judicial', 'junior', 'jurist', 'keyboardist', 'king', "king's", 'kingdom', 'knowledge', 'lady', 'lama', 'lamido', 'law', 'lawyer', 'lcdr', 'lcpl', 'leader', 'lecturer', 'legal', 'librarian', 'lieutenant', 'linguist', 'literary', 'lord', 'lt', 'ltc', 'ltcol', 'ltg', 'ltgen', 'ltjg', 'lyricist', 'madam', 'madame', 'mademoiselle', 'mag', 'mag-judge', 'mag/judge', 'magistrate', 'magistrate-judge', 'magnate', 'maharajah', 'maharani', 'mahdi', 'maid', 'maj', 'majesty', 'majgen', 'manager', 'marcher', 'marchess', 'marchioness', 'marketing', 'marquess', 'marquis', 'marquise', 'master', 'mathematician', 'mathematics', 'matriarch', 'mayor', 'mcpo', 'mcpoc', 'mcpon', 'md', 'member', 'memoirist', 'merchant', 'met', 'metropolitan', 'mg', 'mgr', 'mgysgt', 'military', 'minister', 'miss', 'misses', 'missionary', 'mister', 'mlle', 'mme', 'mobster', 'model', 'monk', 'monsignor', 'most', 'mother', 'mountaineer', 'mpco-cg', 'mr', 'mrs', 'ms', 'msg', 'msgt', 'mufti', 'mullah', 'municipal', 'murshid', 'musician', 'musicologist', 'mx', 'mystery', 'nanny', 'narrator', 'national', 'naturalist', 'navy', 'neuroscientist', 'novelist', 'nurse', 'obstetritian', 'officer', 'opera', 'operating', 'ornithologist', 'painter', 'paleontologist', 'pastor', 'patriarch', 'pediatrician', 'personality', 'petty', 'pfc', 'pharaoh', 'phd', 'philantropist', 'philosopher', 'photographer', 'physician', 'physicist', 'pianist', 'pilot', 'pioneer', 'pir', 'player', 'playwright', 'po1', 'po2', 'po3', 'poet', 
'police', 'political', 'politician', 'pope', 'prefect', 'prelate', 'premier', 'pres', 'presbyter', 'president', 'presiding', 'priest', 'priestess', 'primate', 'prime', 'prin', 'prince', 'princess', 'principal', 'printer', 'printmaker', 'prior', 'private', 'pro', 'producer', 'prof', 'professor', 'provost', 'pslc', 'psychiatrist', 'psychologist', 'publisher', 'pursuivant', 'pv2', 'pvt', 'queen', "queen's", 'rabbi', 'radio', 'radm', 'rangatira', 'ranger', 'rdml', 'rear', 'rebbe', 'registrar', 'rep', 'representative', 'researcher', 'resident', 'rev', 'revenue', 'reverend', 'right', 'risk', 'rock', 'royal', 'rt', 'sa', 'sailor', 'saint', 'sainte', 'saoshyant', 'satirist', 'scholar', 'schoolmaster', 'scientist', 'scpo', 'screenwriter', 'se', 'secretary', 'security', 'seigneur', 'senator', 'senior', 'senior-judge', 'sergeant', 'servant', 'sfc', 'sgm', 'sgt', 'sgtmaj', 'sgtmajmc', 'shaik', 'shaikh', 'shayk', 'shaykh', 'shehu', 'sheik', 'sheikh', 'shekh', 'sheriff', 'siddha', 'singer', 'singer-songwriter', 'sir', 'sister', 'sma', 'smsgt', 'sn', 'soccer', 'social', 'sociologist', 'software', 'soldier', 'solicitor', 'soprano', 'spc', 'speaker', 'special', 'sr', 'sra', 'srta', 'ssg', 'ssgt', 'st', 'staff', 'state', 'states', 'strategy', 'subaltern', 'subedar', 'suffragist', 'sultan', 'sultana', 'superior', 'supreme', 'surgeon', 'swami', 'swordbearer', 'sysselmann', 'tax', 'teacher', 'technical', 'technologist', 'television ', 'tenor', 'theater', 'theatre', 'theologian', 'theorist', 'timi', 'tirthankar', 'translator', 'travel', 'treasurer', 'tsar', 'tsarina', 'tsgt', 'uk', 'uncle', 'united', 'us', 'vadm', 'vardapet', 'vc', 'venerable', 'verderer', 'vicar', 'vice', 'viscount', 'vizier', 'vocalist', 'voice', 'warden', 'warrant', 'wing', 'wm', 'wo-1', 'wo1', 'wo2', 'wo3', 'wo4', 'wo5', 'woodman', 'writer', 'zoologist'}, first_name_titles={'aunt', 'auntie', 'brother', 'cheikh', 'dame', 'father', 'king', 'maid', 'master', 'mother', 'pope', 'queen', 'shaik', 'shaikh', 'shayk', 
'shaykh', 'sheik', 'sheikh', 'shekh', 'sir', 'sister', 'uncle'}, conjunctions={'&', 'and', 'e', 'et', 'of', 'the', 'und', 'y'}, capitalization_exceptions=(('ii', 'II'), ('iii', 'III'), ('iv', 'IV'), ('md', 'M.D.'), ('phd', 'Ph.D.')), regexes={('double_quotes', re.compile('\"(.*?)\"')), ('emoji', re.compile('[🌀-🙏🚀-U0001f6ff☀-⛿✀-➿]+')), ('initial', re.compile('^(\w\.|[A-Z])?$')), ('mac', re.compile('^(ma?c)(\w{2, })', re.IGNORECASE)), ('no_vowels', re.compile('^[^aeyiuo]+$', re.IGNORECASE)), ('parenthesis', re.compile('\((.*?)\)')), ('period_not_at_end', re.compile('.*\..+$', re.IGNORECASE)), ('phd', re.compile('\s(ph\.?\s+d\.?)', re.IGNORECASE)), ('quoted_word', re.compile("(?<!\w)\'([^\s]*?)\'(?!\w)")), ('roman_numeral', re.compile('^(X|IX|IV|V?I{0, 3})$', re.IGNORECASE)), ('spaces', re.compile('\s+')), ('word', re.compile('(\w|\.)+'))})[source]

An instance of this class holds all of the configuration constants for the parser.

capitalize_name = False

If set, applies capitalize() to HumanName instance.

>>> from nameparser import HumanName
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.capitalize_name = True
>>> name = HumanName("bob v. de la macdole-eisenhower phd")
>>> str(name)
'Bob V. de la MacDole-Eisenhower Ph.D.'
empty_attribute_default = ''

Default return value for empty attributes.

>>> from nameparser import HumanName
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.empty_attribute_default = None
>>> name = HumanName("John Doe")
>>> name.title
None
>>> name.first
'John'
force_mixed_case_capitalization = False

If set, forces the capitalization of mixed case strings when capitalize() is called.

>>> from nameparser import HumanName
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.force_mixed_case_capitalization = True
>>> name = HumanName('Shirley Maclaine')
>>> name.capitalize()
>>> str(name)
'Shirley MacLaine'
initials_delimiter = '.'

The default initials delimiter used for all new HumanName instances. Will be used to add a delimiter between each initial.

initials_format = '{first} {middle} {last}'

The default initials format used for all new HumanName instances.

string_format = '{title} {first} {middle} {last} {suffix} ({nickname})'

The default string format used for all new HumanName instances.
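
A format string like this is filled with ordinary str.format placeholders. The helper below is a hypothetical sketch of how the output could be tidied when some attributes are empty; the cleanup rules (dropping empty parentheses, collapsing whitespace) are assumptions for illustration.

```python
import re

DEFAULT_FORMAT = "{title} {first} {middle} {last} {suffix} ({nickname})"

def render(parts, string_format=DEFAULT_FORMAT):
    """Fill the format from a dict of name parts and tidy the result."""
    text = string_format.format(**parts)
    text = text.replace("()", "")              # drop empty nickname parentheses
    return re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace
```

Keys that the format string does not mention are simply ignored by str.format, so a shorter format such as "{last}, {first}" works with the same dictionary.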

class nameparser.config.SetManager(elements)[source]

Easily add and remove config variables per module or instance. Subclass of collections.abc.Set.

The only special functionality beyond that provided by set() is to normalize constants for comparison (lower case, no periods) when they are add()ed and remove()d, and to allow passing multiple string arguments to the add() and remove() methods.

add(*strings)[source]

Add the lower case and no-period version of the string arguments to the set. Can pass a list of strings. Returns self for chaining.

add_with_encoding(s, encoding=None)[source]

Add the lower case and no-period version of the string to the set. Pass an explicit encoding parameter to specify the encoding of binary strings that are not DEFAULT_ENCODING (UTF-8).

remove(*strings)[source]

Remove the lower case and no-period version of the string arguments from the set. Returns self for chaining.
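
The normalizing and chaining behaviour can be sketched on top of collections.abc.Set. NormalizingSet is a hypothetical class written for this illustration, not the SetManager source; in particular, silently ignoring the removal of a missing element is an assumption.

```python
from collections.abc import Set

class NormalizingSet(Set):
    """Set that stores lower-case, period-free versions of its strings."""

    def __init__(self, elements=()):
        self._items = {self._normalize(e) for e in elements}

    @staticmethod
    def _normalize(value):
        return value.lower().replace(".", "")

    def __contains__(self, value):
        return value in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, *strings):
        for value in strings:
            self._items.add(self._normalize(value))
        return self  # returning self allows chaining

    def remove(self, *strings):
        for value in strings:
            # discard() ignores missing elements (assumption of this sketch)
            self._items.discard(self._normalize(value))
        return self
```

Because both methods return the set itself, calls can be chained in the same style as the CONSTANTS.titles.remove(...).add(...) example earlier on this page.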

class nameparser.config.TupleManager[source]

A dictionary with dot.notation access. Subclass of dict. Makes the tuple constants more friendly.
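
Dot-notation access on a dict subclass typically relies on __getattr__. The class below is a minimal hypothetical sketch of that idea, not the TupleManager source.

```python
class DotDict(dict):
    """dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

regexes = DotDict(spaces=r"\s+")
```

With this sketch, regexes.spaces and regexes["spaces"] return the same value, and a missing key raises AttributeError so that hasattr() behaves as expected.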

HumanName.config Defaults

nameparser.config.titles.FIRST_NAME_TITLES = {'aunt', 'auntie', 'brother', 'cheikh', 'dame', 'father', 'king', 'maid', 'master', 'mother', 'pope', 'queen', 'shaik', 'shaikh', 'shayk', 'shaykh', 'sheik', 'sheikh', 'shekh', 'sir', 'sister', 'uncle'}

When these titles appear with a single other name, that name is a first name, e.g. “Sir John”, “Sister Mary”, “Queen Elizabeth”.

nameparser.config.titles.TITLES = {'10th', '1lt', '1sgt', '1st', '1stlt', '1stsgt', '2lt', '2nd', '2ndlt', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', 'a1c', 'ab', 'abbess', 'abbot', 'abolitionist', 'academic', 'acolyte', 'activist', 'actor ', 'actress', 'adept', 'adjutant', 'adm', 'admiral', 'advertising', 'adviser', 'advocate', 'air', 'akhoond', 'alderman', 'almoner', 'ambassador', 'amn', 'analytics', 'anarchist', 'animator', 'anthropologist', 'appellate', 'apprentice', 'arbitrator', 'archbishop', 'archdeacon', 'archdruid', 'archduchess', 'archduke', 'archeologist', 'architect', 'arhat', 'army', 'arranger', 'assistant', 'assoc', 'associate', 'asst', 'astronomer', 'attache', 'attaché', 'attorney', 'aunt', 'auntie', 'author', 'award-winning', 'ayatollah', 'baba', 'bailiff', 'ballet', 'bandleader', 'banker', 'banner', 'bard', 'baron', 'baroness', 'barrister', 'baseball', 'bearer', 'behavioral', 'bench', 'bg', 'bgen', 'biblical', 'bibliographer', 'biochemist', 'biographer', 'biologist', 'bishop', 'blessed', 'blogger', 'blues', 'bodhisattva', 'bookseller', 'botanist', 'bp', 'brigadier', 'briggen', 'british', 'broadcaster', 'brother', 'buddha', 'burgess', 'burlesque', 'business', 'businessman', 'businesswoman', 'bwana', 'canon', 'capt', 'captain', 'cardinal', 'cartographer', 'cartoonist', 'catholicos', 'ccmsgt', 'cdr', 'celebrity', 'ceo', 'cfo', 'chair', 'chairs', 'chancellor', 'chaplain', "chargé d'affaires", 'chef', 'cheikh', 'chemist', 'chief', 'chieftain', 'choreographer', 'civil', 'classical', 'clergyman', 'clerk', 'cmsaf', 'cmsgt', 'co-chair', 'co-chairs', 'co-founder', 'coach', 'col', 'collector', 'colonel', 'comedian', 'comedienne', 'comic', 'commander', 'commander-in-chief', 'commodore', 'composer', 'compositeur', 'comptroller', 'computer', 'comtesse', 'conductor', 'consultant', 'controller', 'corporal', 'corporate', 'correspondent', 'councillor', 'counselor', 'count', 'countess', 'courtier', 'cpl', 'cpo', 'cpt', 'credit', 'criminal', 'criminologist', 
'critic', 'csm', 'curator', 'customs', 'cwo-2', 'cwo-3', 'cwo-4', 'cwo-5', 'cwo2', 'cwo3', 'cwo4', 'cwo5', 'cyclist', 'dame', 'dancer', 'dcn', 'deacon', 'delegate', 'deputy', 'designated', 'designer', 'detective', 'developer', 'diplomat', 'dir', 'director', 'discovery', 'dissident', 'district', 'division', 'do', 'docent', 'docket', 'doctor', 'doyen', 'dpty', 'dr', 'dra', 'dramatist', 'druid', 'drummer', 'duchesse', 'dutchess', 'ecologist', 'economist', 'editor', 'edmi', 'edohen', 'educator', 'effendi', 'ekegbian', 'elerunwon', 'eminence', 'emperor', 'empress', 'engineer', 'english', 'ens', 'entertainer', 'entrepreneur', 'envoy', 'essayist', 'evangelist', 'excellency', 'excellent', 'exec', 'executive', 'expert', 'fadm', 'family', 'father', 'federal', 'field', 'film', 'financial', 'first', 'flag', 'flying', 'foreign', 'forester', 'founder', 'fr', 'friar', 'gaf', 'gen', 'general', 'generalissimo', 'gentiluomo', 'giani', 'goodman', 'goodwife', 'governor', 'graf', 'grand', 'group', 'guitarist', 'guru', 'gyani', 'gysgt', 'hajji', 'headman', 'heir', 'heiress', 'her', 'hereditary', 'high', 'highness', 'his', 'historian', 'historicus', 'historien', 'holiness', 'hon', 'honorable', 'honourable', 'host', 'illustrator', 'imam', 'industrialist', 'information', 'instructor', 'intelligence', 'intendant', 'inventor', 'investigator', 'investor', 'journalist', 'journeyman', 'jr', 'judge', 'judicial', 'junior', 'jurist', 'keyboardist', 'king', "king's", 'kingdom', 'knowledge', 'lady', 'lama', 'lamido', 'law', 'lawyer', 'lcdr', 'lcpl', 'leader', 'lecturer', 'legal', 'librarian', 'lieutenant', 'linguist', 'literary', 'lord', 'lt', 'ltc', 'ltcol', 'ltg', 'ltgen', 'ltjg', 'lyricist', 'madam', 'madame', 'mademoiselle', 'mag', 'mag-judge', 'mag/judge', 'magistrate', 'magistrate-judge', 'magnate', 'maharajah', 'maharani', 'mahdi', 'maid', 'maj', 'majesty', 'majgen', 'manager', 'marcher', 'marchess', 'marchioness', 'marketing', 'marquess', 'marquis', 'marquise', 'master', 'mathematician', 
'mathematics', 'matriarch', 'mayor', 'mcpo', 'mcpoc', 'mcpon', 'md', 'member', 'memoirist', 'merchant', 'met', 'metropolitan', 'mg', 'mgr', 'mgysgt', 'military', 'minister', 'miss', 'misses', 'missionary', 'mister', 'mlle', 'mme', 'mobster', 'model', 'monk', 'monsignor', 'most', 'mother', 'mountaineer', 'mpco-cg', 'mr', 'mrs', 'ms', 'msg', 'msgt', 'mufti', 'mullah', 'municipal', 'murshid', 'musician', 'musicologist', 'mx', 'mystery', 'nanny', 'narrator', 'national', 'naturalist', 'navy', 'neuroscientist', 'novelist', 'nurse', 'obstetritian', 'officer', 'opera', 'operating', 'ornithologist', 'painter', 'paleontologist', 'pastor', 'patriarch', 'pediatrician', 'personality', 'petty', 'pfc', 'pharaoh', 'phd', 'philantropist', 'philosopher', 'photographer', 'physician', 'physicist', 'pianist', 'pilot', 'pioneer', 'pir', 'player', 'playwright', 'po1', 'po2', 'po3', 'poet', 'police', 'political', 'politician', 'pope', 'prefect', 'prelate', 'premier', 'pres', 'presbyter', 'president', 'presiding', 'priest', 'priestess', 'primate', 'prime', 'prin', 'prince', 'princess', 'principal', 'printer', 'printmaker', 'prior', 'private', 'pro', 'producer', 'prof', 'professor', 'provost', 'pslc', 'psychiatrist', 'psychologist', 'publisher', 'pursuivant', 'pv2', 'pvt', 'queen', "queen's", 'rabbi', 'radio', 'radm', 'rangatira', 'ranger', 'rdml', 'rear', 'rebbe', 'registrar', 'rep', 'representative', 'researcher', 'resident', 'rev', 'revenue', 'reverend', 'right', 'risk', 'rock', 'royal', 'rt', 'sa', 'sailor', 'saint', 'sainte', 'saoshyant', 'satirist', 'scholar', 'schoolmaster', 'scientist', 'scpo', 'screenwriter', 'se', 'secretary', 'security', 'seigneur', 'senator', 'senior', 'senior-judge', 'sergeant', 'servant', 'sfc', 'sgm', 'sgt', 'sgtmaj', 'sgtmajmc', 'shaik', 'shaikh', 'shayk', 'shaykh', 'shehu', 'sheik', 'sheikh', 'shekh', 'sheriff', 'siddha', 'singer', 'singer-songwriter', 'sir', 'sister', 'sma', 'smsgt', 'sn', 'soccer', 'social', 'sociologist', 'software', 'soldier', 
'solicitor', 'soprano', 'spc', 'speaker', 'special', 'sr', 'sra', 'srta', 'ssg', 'ssgt', 'st', 'staff', 'state', 'states', 'strategy', 'subaltern', 'subedar', 'suffragist', 'sultan', 'sultana', 'superior', 'supreme', 'surgeon', 'swami', 'swordbearer', 'sysselmann', 'tax', 'teacher', 'technical', 'technologist', 'television ', 'tenor', 'theater', 'theatre', 'theologian', 'theorist', 'timi', 'tirthankar', 'translator', 'travel', 'treasurer', 'tsar', 'tsarina', 'tsgt', 'uk', 'uncle', 'united', 'us', 'vadm', 'vardapet', 'vc', 'venerable', 'verderer', 'vicar', 'vice', 'viscount', 'vizier', 'vocalist', 'voice', 'warden', 'warrant', 'wing', 'wm', 'wo-1', 'wo1', 'wo2', 'wo3', 'wo4', 'wo5', 'woodman', 'writer', 'zoologist'}

Cannot include things that could also be first names, e.g. “dean”. Many of these are taken from Wikipedia: https://en.wikipedia.org/wiki/Title. The parser recognizes chains of these, including conjunctions, allowing recognition of titles like “Deputy Secretary of State”.
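
The chaining behaviour can be pictured with a minimal, standalone sketch. It does not use nameparser itself: the function `split_leading_title`, the helper `normalize`, and the tiny `TITLES` and `CONJUNCTIONS` subsets are hypothetical and exist only for illustration, so the real parser may differ in detail.

```python
# Illustrative subsets only; the real sets are far larger.
TITLES = {"deputy", "secretary", "state", "dr", "mr"}
CONJUNCTIONS = {"of", "the", "and"}


def normalize(piece):
    """Lowercase and drop surrounding periods for comparison."""
    return piece.lower().strip(".")


def split_leading_title(pieces):
    """Return (title, rest): the leading run of title words and the remainder.

    A run of conjunctions is absorbed only when another title word follows it.
    """
    end = 0  # index just past the last confirmed title word
    i = 0
    while i < len(pieces):
        word = normalize(pieces[i])
        if word in TITLES:
            i += 1
            end = i
        elif end > 0 and word in CONJUNCTIONS:
            i += 1  # tentative: kept only if a title word follows
        else:
            break
    return " ".join(pieces[:end]), pieces[end:]


title, rest = split_leading_title("Deputy Secretary of State Jane Doe".split())
print(title)  # Deputy Secretary of State
print(rest)   # ['Jane', 'Doe']
```

Because “dean” is not in the set, a name such as “Dean Martin” yields an empty title in this sketch, which is the reason ambiguous words are kept out of the list.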

nameparser.config.suffixes.SUFFIX_ACRONYMS = {'(ret)', '(vet)', '8-vsb', 'aas', 'aba', 'abc', 'abd', 'abpp', 'abr', 'aca', 'acas', 'ace', 'acha', 'acp', 'ae', 'aem', 'afasma', 'afc', 'afm', 'agsf', 'aia', 'aicp', 'ala', 'alc', 'alp', 'am', 'amd', 'ame', 'amieee', 'ams', 'aphr', 'apn aprn', 'apr', 'apss', 'aqp', 'arm', 'arrc', 'asa', 'asc', 'asid', 'asla', 'asp', 'atc', 'awb', 'bca', 'bcl', 'bcss', 'bds', 'bem', 'bls-i', 'bpe', 'bpi', 'bpt', 'bt', 'btcs', 'bts', 'cacts', 'cae', 'caha', 'caia', 'cams', 'cap', 'capa', 'capm', 'capp', 'caps', 'caro', 'cas', 'casp', 'cb', 'cbe', 'cbm', 'cbne', 'cbnt', 'cbp', 'cbrte', 'cbs', 'cbsp', 'cbt', 'cbte', 'cbv', 'cca', 'ccc', 'ccca', 'cccm', 'cce', 'cchp', 'ccie', 'ccim', 'cciso', 'ccm', 'ccmt', 'ccna', 'ccnp', 'ccp', 'ccp-c', 'ccpr', 'ccs', 'ccufc', 'cd', 'cdal', 'cdfm', 'cdmp', 'cds', 'cdt', 'cea', 'ceas', 'cebs', 'ceds', 'ceh', 'cela', 'cem', 'cep', 'cera', 'cet', 'cfa', 'cfc', 'cfcc', 'cfce', 'cfcm', 'cfe', 'cfeds', 'cfi', 'cfm', 'cfp', 'cfps', 'cfr', 'cfre', 'cga', 'cgap', 'cgb', 'cgc', 'cgfm', 'cgfo', 'cgm', 'cgma', 'cgp', 'cgr', 'cgsp', 'ch', 'cha', 'chba', 'chdm', 'che', 'ches', 'chfc', 'chi', 'chmc', 'chmm', 'chp', 'chpa', 'chpe', 'chpln', 'chpse', 'chrm', 'chsc', 'chse', 'chse-a', 'chsos', 'chss', 'cht', 'cia', 'cic', 'cie', 'cig', 'cip', 'cipm', 'cips', 'ciro', 'cisa', 'cism', 'cissp', 'cla', 'clsd', 'cltd', 'clu', 'cm', 'cma', 'cmas', 'cmc', 'cmfo', 'cmg', 'cmp', 'cms', 'cmsp', 'cmt', 'cna', 'cnm', 'cnp', 'cp', 'cp-c', 'cpa', 'cpacc', 'cpbe', 'cpcm', 'cpcu', 'cpe', 'cpfa', 'cpfo', 'cpg', 'cph', 'cpht', 'cpim', 'cpl', 'cplp', 'cpm', 'cpo', 'cpp', 'cppm', 'cprc', 'cpre', 'cprp', 'cpsc', 'cpsi', 'cpss', 'cpt', 'cpwa', 'crde', 'crisc', 'crma', 'crme', 'crna', 'cro', 'crp', 'crt', 'crtt', 'csa', 'csbe', 'csc', 'cscp', 'cscu', 'csep', 'csi', 'csm', 'csp', 'cspo', 'csre', 'csrte', 'csslp', 'cssm', 'cst', 'cste', 'ctbs', 'ctfa', 'cto', 'ctp', 'cts', 'cua', 'cusp', 'cva', 'cva[22]', 'cvo', 'cvp', 'cvrs', 'cwap', 'cwb', 
'cwdp', 'cwep', 'cwna', 'cwne', 'cwp', 'cwsp', 'cxa', 'cyds', 'cysa', 'dabfm', 'dabvlm', 'dacvim', 'dbe', 'dc', 'dcb', 'dcm', 'dcmg', 'dcvo', 'dd', 'dds', 'ded', 'dep', 'dfc', 'dfm', 'diplac', 'diplom', 'djur', 'dma', 'dmd', 'dmin', 'dnp', 'do', 'dpm', 'dpt', 'drb', 'drmp', 'drph', 'dsc', 'dsm', 'dso', 'dss', 'dtr', 'dvep', 'dvm', 'ea', 'ed', 'edd', 'ei', 'eit', 'els', 'emd', 'emt-b', 'emt-i/85', 'emt-i/99', 'emt-p', 'enp', 'erd', 'esq', 'evp', 'faafp', 'faan', 'faap', 'fac-c', 'facc', 'facd', 'facem', 'facep', 'facha', 'facofp', 'facog', 'facp', 'facph', 'facs', 'faia', 'faicp', 'fala', 'fashp', 'fasid', 'fasla', 'fasma', 'faspen', 'fca', 'fcas', 'fcela', 'fd', 'fec', 'fhames', 'fic', 'ficf', 'fieee', 'fmp', 'fmva', 'fnss', 'fp&a', 'fp-c', 'fpc', 'frm', 'fsa', 'fsdp', 'fws', 'gaee[14]', 'gba', 'gbe', 'gc', 'gcb', 'gchs', 'gcie', 'gcmg', 'gcsi', 'gcvo', 'gisp', 'git', 'gm', 'gmb', 'gmr', 'gphr', 'gri', 'grp', 'gsmieee', 'hccp', 'hrs', 'iaccp', 'iaee', 'iccm-d', 'iccm-f', 'idsm', 'ifgict', 'iom', 'ipep', 'ipm', 'iso', 'issp-csp', 'issp-sa', 'itil', 'jd', 'jp', 'kbe', 'kcb', 'kchs/dchs', 'kcie', 'kcmg', 'kcsi', 'kcvo', 'kg', 'khs/dhs', 'kp', 'kt', 'lac', 'lcmt', 'lcpc', 'lcsw', 'leed ap', 'lg', 'litk', 'litl', 'litp', 'llm', 'lm', 'lmsw', 'lmt', 'lp', 'lpa', 'lpc', 'lpn', 'lpss', 'lsi', 'lsit', 'lt', 'lvn', 'lvo', 'lvt', 'ma', 'maaa', 'mai', 'mba', 'mbe', 'mbs', 'mc', 'mcct', 'mcdba', 'mches', 'mcm', 'mcp', 'mcpd', 'mcsa', 'mcsd', 'mcse', 'mct', 'md', 'mdiv', 'mem', 'mfa', 'micp', 'mieee', 'mirm', 'mle', 'mls', 'mlse', 'mlt', 'mm', 'mmad', 'mmas', 'mnaa', 'mnae', 'mp', 'mpa', 'mph', 'mpse', 'mra', 'ms', 'msa', 'mscmscmsm', 'msm', 'mt', 'mts', 'mvo', 'nbc-his', 'nbcch', 'nbcch-ps', 'nbcdch', 'nbcdch-ps', 'nbcfch', 'nbcfch-ps', 'nbct', 'ncarb', 'nccp', 'ncidq', 'ncps', 'ncso', 'ncto', 'nd', 'ndtr', 'nicet i', 'nicet ii', 'nicet iii', 'nicet iv', 'nmd', 'np', 'np[18]', 'nraemt', 'nremr', 'nremt', 'nrp', 'obe', 'obi', 'oca', 'ocm', 'ocp', 'od', 'om', 'oscp', 'ot', 
'pa-c', 'pcc', 'pci', 'pe', 'pfmp', 'pg', 'pgmp', 'ph', 'pharmd', 'phc', 'phd', 'phr', 'phrca', 'pla', 'pls', 'pmc', 'pmi-acp', 'pmp', 'pp', 'pps', 'prm', 'psm', 'psm i', 'psm ii', 'psp', 'psyd', 'pt', 'pta', 'qam', 'qc', 'qcsw', 'qfsm', 'qgm', 'qpm', 'qsd', 'qsp', 'ra', 'rai', 'rba', 'rci', 'rcp', 'rd', 'rdcs', 'rdh', 'rdms', 'rdn', 'res', 'rfp', 'rhca', 'rid', 'rls', 'rmsks', 'rn', 'rp', 'rpa', 'rph', 'rpl', 'rrc', 'rrt', 'rrt-accs', 'rrt-nps', 'rrt-sds', 'rtrp', 'rvm', 'rvt', 'sa', 'same', 'sasm', 'sccp', 'scmp', 'se', 'secb', 'sfp', 'sgm', 'shrm-cp', 'shrm-scp', 'si', 'siie', 'smieee', 'sphr', 'sra', 'sscp', 'stmieee', 'tbr-ct', 'td', 'thd', 'thm', 'ud', 'usa', 'usaf', 'usar', 'uscg', 'usmc', 'usn', 'usnr', 'uxc', 'uxmc', 'vc', 'vcp', 'vd', 'vrd'}

Post-nominal acronyms. Titles, degrees and other things people stick after their name that may or may not have periods between the letters. The parser removes periods when matching against these pieces.

nameparser.config.suffixes.SUFFIX_NOT_ACRONYMS = {'2', 'dr', 'esq', 'esquire', 'i', 'ii', 'iii', 'iv', 'jnr', 'jr', 'junior', 'snr', 'sr', 'v'}

Post-nominal pieces that are not acronyms. The parser does not remove periods when matching against these pieces.
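
The difference between the two suffix sets can be sketched as follows. This is a standalone illustration rather than the library's code: `is_suffix` here is a hypothetical helper, the sets are small subsets, and ignoring a trailing period for the non-acronym set (so that “Jr.” still matches) is an assumption.

```python
# Illustrative subsets of the two suffix sets.
SUFFIX_ACRONYMS = {"md", "phd", "mba", "cpa"}
SUFFIX_NOT_ACRONYMS = {"jr", "sr", "ii", "iii", "esquire"}


def is_suffix(piece):
    """Return True if the piece looks like a post-nominal suffix."""
    lowered = piece.lower()
    # Acronyms: every period is removed before the lookup.
    if lowered.replace(".", "") in SUFFIX_ACRONYMS:
        return True
    # Non-acronyms: interior periods are kept; only leading/trailing
    # periods are ignored (an assumption of this sketch).
    return lowered.strip(".") in SUFFIX_NOT_ACRONYMS


print(is_suffix("M.D."))   # True
print(is_suffix("Jr."))    # True
print(is_suffix("J.r."))   # False
print(is_suffix("Smith"))  # False
```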

nameparser.config.prefixes.PREFIXES = {'abu', 'al', 'bin', 'bon', 'da', 'dal', 'de', "de'", 'degli', 'dei', 'del', 'dela', 'della', 'delle', 'delli', 'dello', 'der', 'di', 'do', 'dos', 'du', 'dí', 'ibn', 'la', 'le', 'mac', 'mc', 'san', 'santa', 'st', 'ste', 'van', 'vander', 'vel', 'vom', 'von'}

Name pieces that appear before a last name. Prefixes join to the piece that follows them to make one new piece. They can be chained together, e.g. “von der” and “de la”. Because they only appear in middle or last names, they also signify that all following name pieces belong to the same name part. For example, “von” will be joined to all following pieces that are not prefixes or suffixes, allowing recognition of double last names when they appear after a prefix. So in “pennie von bergen wessels MD”, “von” will join with all following name pieces until the suffix “MD”, resulting in the correct parsing of the last name “von bergen wessels”.
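
The joining rule described above can be sketched in a few lines. This is not the parser's implementation: `join_prefixes` and `normalize` are hypothetical names, the sets are small subsets, and special cases (such as a first name that happens to be a prefix) are ignored.

```python
# Illustrative subsets only.
PREFIXES = {"von", "der", "de", "la", "van"}
SUFFIXES = {"md", "jr", "phd"}
STOP = PREFIXES | SUFFIXES  # pieces that end a joined run


def normalize(piece):
    """Lowercase and drop surrounding periods for comparison."""
    return piece.lower().strip(".")


def join_prefixes(pieces):
    """Join each prefix with the pieces that follow it into one piece."""
    joined = []
    i = 0
    while i < len(pieces):
        if normalize(pieces[i]) not in PREFIXES:
            joined.append(pieces[i])
            i += 1
            continue
        j = i
        # Chain consecutive prefixes, e.g. "von der" or "de la".
        while j < len(pieces) and normalize(pieces[j]) in PREFIXES:
            j += 1
        # Absorb following pieces until the next prefix or suffix.
        while j < len(pieces) and normalize(pieces[j]) not in STOP:
            j += 1
        joined.append(" ".join(pieces[i:j]))
        i = j
    return joined


print(join_prefixes("pennie von bergen wessels MD".split()))
# ['pennie', 'von bergen wessels', 'MD']
```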

nameparser.config.conjunctions.CONJUNCTIONS = {'&', 'and', 'e', 'et', 'of', 'the', 'und', 'y'}

Pieces that should join to their neighboring pieces, e.g. “and”, “y” and “&”. “of” and “the” are also included to facilitate joining multiple titles, e.g. “President of the United States”.
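
As a rough illustration of joining on conjunctions, the sketch below merges a conjunction (or a run of them) with the piece before and the piece after it. The function `join_conjunctions` is hypothetical and deliberately simplified; it does not handle edge cases such as a single-letter initial that happens to look like a conjunction.

```python
CONJUNCTIONS = {"&", "and", "e", "et", "of", "the", "und", "y"}


def join_conjunctions(pieces):
    """Merge each run of conjunctions with its two neighboring pieces."""
    result = []
    i = 0
    while i < len(pieces):
        if pieces[i].lower() in CONJUNCTIONS and result:
            # Extend over contiguous conjunctions, e.g. "of the".
            j = i
            while j < len(pieces) and pieces[j].lower() in CONJUNCTIONS:
                j += 1
            if j < len(pieces):
                # Glue: previous piece + conjunction run + next piece.
                result[-1] = " ".join([result[-1]] + pieces[i:j + 1])
                i = j + 1
                continue
        result.append(pieces[i])
        i += 1
    return result


print(join_conjunctions(["Mr.", "and", "Mrs.", "Smith"]))
# ['Mr. and Mrs.', 'Smith']
print(join_conjunctions(["President", "of", "the", "United", "States"]))
# ['President of the United', 'States']
```

In the second call the trailing “States” is left as its own piece; in this sketch it would only be attached by a separate title-chaining step.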

nameparser.config.capitalization.CAPITALIZATION_EXCEPTIONS = (('ii', 'II'), ('iii', 'III'), ('iv', 'IV'), ('md', 'M.D.'), ('phd', 'Ph.D.'))

Pieces whose correct capitalization is not produced by simply capitalizing the first letter, e.g. “phd” becomes “Ph.D.”.
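
One way to read this tuple of pairs is as a lookup table consulted before a default capitalization is applied. The sketch below assumes exactly that; `capitalize_piece` is a hypothetical helper, and stripping periods before the lookup is an assumption of the sketch rather than documented behaviour.

```python
CAPITALIZATION_EXCEPTIONS = (
    ("ii", "II"),
    ("iii", "III"),
    ("iv", "IV"),
    ("md", "M.D."),
    ("phd", "Ph.D."),
)

# A tuple of (key, value) pairs converts directly to a dict.
EXCEPTIONS = dict(CAPITALIZATION_EXCEPTIONS)


def capitalize_piece(piece):
    """Use the exception table if the piece is listed, else capitalize."""
    key = piece.lower().replace(".", "")  # assumption: periods ignored
    return EXCEPTIONS.get(key, piece.capitalize())


print(capitalize_piece("phd"))    # Ph.D.
print(capitalize_piece("iii"))    # III
print(capitalize_piece("smith"))  # Smith
```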

nameparser.config.regexes.REGEXES = {('double_quotes', re.compile('\\"(.*?)\\"')), ('emoji', re.compile('[🌀-🙏🚀-\U0001f6ff☀-⛿✀-➿]+')), ('initial', re.compile('^(\\w\\.|[A-Z])?$')), ('mac', re.compile('^(ma?c)(\\w{2,})', re.IGNORECASE)), ('no_vowels', re.compile('^[^aeyiuo]+$', re.IGNORECASE)), ('parenthesis', re.compile('\\((.*?)\\)')), ('period_not_at_end', re.compile('.*\\..+$', re.IGNORECASE)), ('phd', re.compile('\\s(ph\\.?\\s+d\\.?)', re.IGNORECASE)), ('quoted_word', re.compile("(?<!\\w)\\'([^\\s]*?)\\'(?!\\w)")), ('roman_numeral', re.compile('^(X|IX|IV|V?I{0,3})$', re.IGNORECASE)), ('spaces', re.compile('\\s+')), ('word', re.compile('(\\w|\\.)+'))}

All regular expressions used by the parser are precompiled and stored in the config.
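
To get a feel for what some of these patterns match, the snippet below recompiles a handful of them with the standard `re` module, copying the pattern strings from the listing above. The local `REGEXES` dictionary is only a stand-in for this illustration and is not the library's config object.

```python
import re

# Patterns copied from the listing above; the dict itself is local to
# this sketch.
REGEXES = {
    "roman_numeral": re.compile(r"^(X|IX|IV|V?I{0,3})$", re.IGNORECASE),
    "initial": re.compile(r"^(\w\.|[A-Z])?$"),
    "mac": re.compile(r"^(ma?c)(\w{2,})", re.IGNORECASE),
    "double_quotes": re.compile(r'\"(.*?)\"'),
    "parenthesis": re.compile(r"\((.*?)\)"),
}

# Roman numerals match regardless of case.
print(bool(REGEXES["roman_numeral"].match("iv")))     # True
print(bool(REGEXES["roman_numeral"].match("smith")))  # False

# An initial is one word character plus a period, or one capital letter.
print(bool(REGEXES["initial"].match("J.")))    # True
print(bool(REGEXES["initial"].match("John")))  # False

# "mac" splits a Mac/Mc surname into its prefix and the remainder.
print(REGEXES["mac"].match("MacDonald").groups())  # ('Mac', 'Donald')

# Quoted or parenthesized text is captured without its delimiters.
print(REGEXES["double_quotes"].findall('Benjamin "Ben" Franklin'))  # ['Ben']
print(REGEXES["parenthesis"].findall("Benjamin (Ben) Franklin"))    # ['Ben']
```

Note that both `roman_numeral` and `initial` also match the empty string, since every part of each pattern is optional.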