gender_analysis.text package¶
common module¶
-
exception
gender_analysis.text.common.MissingMetadataError(metadata_fields, message='')¶ Bases: Exception
Raised when a function that assumes certain metadata is called on a corpus without that metadata
-
gender_analysis.text.common.convert_text_file_to_new_encoding(source_path, target_path, target_encoding)¶ Converts a text file in source_path to the specified encoding in target_encoding
Note: Currently supports only the utf-8, ascii and iso-8859-1 encodings
Parameters: - source_path – str or Path
- target_path – str or Path
- target_encoding – str
Returns: None
>>> from gender_analysis.common import BASE_PATH >>> sample_text = ' ¶¶¶¶ here is a test file' >>> source_path = Path(BASE_PATH, 'source_file.txt') >>> target_path = Path(BASE_PATH, 'target_file.txt') >>> with open(source_path, 'w', encoding='iso-8859-1') as source: ... _ = source.write(sample_text) >>> get_text_file_encoding(source_path) 'ISO-8859-1' >>> convert_text_file_to_new_encoding(source_path, target_path, target_encoding='utf-8') >>> get_text_file_encoding(target_path) 'utf-8' >>> import os >>> os.remove(source_path) >>> os.remove(target_path)
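The re-encoding step can be sketched with the standard library alone. This is an illustration, not the library's implementation: the helper name reencode_text_file and its explicit source_encoding argument are assumptions, since the documented function takes only a target encoding.

```python
import tempfile
from pathlib import Path


def reencode_text_file(source_path, target_path, source_encoding, target_encoding):
    """Hypothetical helper: decode the source file, then write it in the target encoding."""
    text = Path(source_path).read_text(encoding=source_encoding)
    Path(target_path).write_text(text, encoding=target_encoding)


with tempfile.TemporaryDirectory() as tmp_dir:
    source = Path(tmp_dir, 'source_file.txt')
    target = Path(tmp_dir, 'target_file.txt')
    source.write_text('¶¶¶¶ here is a test file', encoding='iso-8859-1')

    reencode_text_file(source, target, 'iso-8859-1', 'utf-8')

    # The pilcrow is one byte in iso-8859-1 but two bytes in utf-8.
    source_bytes = source.read_bytes()
    target_bytes = target.read_bytes()
    round_trip = target.read_text(encoding='utf-8')
```

The text is unchanged after the round trip; only the byte representation of the non-ASCII characters differs.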
-
gender_analysis.text.common.create_path_object_and_directories(output_dir, filename=None)¶ Creates a Path object from the absolute output_dir and, if provided, the given filename. Creates the output_dir directory if it does not already exist
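A rough sketch of the behavior described above, using pathlib. The helper name make_output_path is hypothetical, and the actual function may differ in details such as how it resolves relative paths.

```python
import tempfile
from pathlib import Path


def make_output_path(output_dir, filename=None):
    """Hypothetical helper: ensure output_dir exists, then return a Path to it or to a file in it."""
    directory = Path(output_dir)
    # parents=True builds any missing intermediate directories;
    # exist_ok=True makes the call safe when the directory is already there.
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename if filename else directory


with tempfile.TemporaryDirectory() as tmp_dir:
    result = make_output_path(Path(tmp_dir, 'results', 'plots'), 'figure.png')
    directory_exists = result.parent.is_dir()
    file_exists = result.exists()
```

Only the directories are created. The file itself does not exist until something is written to the returned path.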
-
gender_analysis.text.common.download_nltk_package_if_not_present(package_name)¶ Checks to see whether the user already has a given nltk package, and if not, prompts the user whether to download it.
We download all necessary packages at install time, but this is just in case the user has deleted them.
Parameters: package_name – name of the nltk package Returns:
-
gender_analysis.text.common.get_text_file_encoding(filepath)¶ Returns the text encoding as a string for a txt file at the given filepath.
Parameters: filepath – str or Path object Returns: Name of encoding scheme as a string >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> from pathlib import Path >>> import os >>> path=Path(TEST_DATA_DIR, 'sample_novels', 'texts', 'hawthorne_scarlet.txt') >>> get_text_file_encoding(path) 'ascii'
Note: For files containing only ascii characters, this function will return ‘ascii’ even if the file was encoded with utf-8
>>> import os >>> from pathlib import Path >>> from gender_analysis.common import BASE_PATH >>> text = 'here is an ascii text' >>> file_path = Path(BASE_PATH, 'example_file.txt') >>> with open(file_path, 'w', encoding='utf-8') as source: ... _ = source.write(text) ... source.close() >>> get_text_file_encoding(file_path) 'ascii' >>> os.remove(file_path)
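The reason for the note above is that ASCII is a subset of UTF-8: text containing only ASCII characters produces identical bytes under both encodings, so no detector can tell them apart. A short standard-library illustration (variable names are for this example only):

```python
text = 'here is an ascii text'

# Both encodings yield exactly the same bytes for ASCII-only text.
utf8_bytes = text.encode('utf-8')
ascii_bytes = text.encode('ascii')
same_bytes = utf8_bytes == ascii_bytes

# A single non-ASCII character is enough to make the encodings distinguishable.
accented = 'café'
accented_is_ascii = accented.isascii()
accented_utf8_length = len(accented.encode('utf-8'))
```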
-
gender_analysis.text.common.load_csv_to_list(file_path)¶ Loads a csv file from the given filepath and returns its contents as a list of strings.
Parameters: file_path – str or Path object Returns: a list of strings >>> from pathlib import Path >>> from gender_analysis.testing.common import LARGE_TEST_CORPUS_CSV >>> corpus_metadata_path = LARGE_TEST_CORPUS_CSV >>> corpus_metadata = load_csv_to_list(corpus_metadata_path) >>> type(corpus_metadata) <class 'list'>
-
gender_analysis.text.common.load_graph_settings(show_grid_lines=True)¶ Sets the seaborn graph settings to the defaults for the project. Defaults to displaying gridlines. To remove gridlines, call with False.
Parameters: show_grid_lines – Boolean; Determines whether to show gridlines in graphs. Returns: None
-
gender_analysis.text.common.load_pickle(filepath)¶ Loads the pickle stored at a given filepath, and returns the Python object that was stored.
Parameters: filepath – str or Path object Returns: Previously-pickled object >>> from gender_analysis.common import BASE_PATH >>> from pathlib import Path >>> from gender_analysis.text.common import load_pickle >>> pickle_filepath = Path(BASE_PATH, 'testing', 'test_data','test_pickle.pgz') >>> loaded_object = load_pickle(pickle_filepath) >>> loaded_object {'a': 4, 'b': 5, 'c': [1, 2, 3]}
-
gender_analysis.text.common.load_txt_to_string(file_path)¶ Loads a txt file and returns a str representation of it.
Parameters: file_path – str or Path object Returns: The file’s text as a string >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> novel_path = Path(TEST_DATA_DIR, 'sample_novels', 'texts', 'austen_persuasion.txt') >>> novel_text = load_txt_to_string(novel_path) >>> type(novel_text), len(novel_text) (<class 'str'>, 466887)
-
gender_analysis.text.common.store_pickle(obj, filepath)¶ Store a compressed “pickle” of the object in the “pickle_data” directory and return the full path to it.
Parameters: - obj – Any Python object that can be pickled
- filepath – str or Path object
Returns: Path object
Example in lieu of Doctest to avoid writing out a file.
my_object = {'a': 4, 'b': 5, 'c': [1, 2, 3]}
gender_analysis.common.store_pickle(my_object, 'path_to_pickle/example_pickle.pgz')
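A compressed pickle round trip can be sketched as follows. This assumes gzip compression, which the .pgz extension suggests but the documentation does not state; the helper names are hypothetical and the example writes to a temporary directory rather than to "pickle_data".

```python
import gzip
import pickle
import tempfile
from pathlib import Path


def store_compressed_pickle(obj, filepath):
    """Hypothetical helper: pickle obj into a gzip-compressed file and return its Path."""
    filepath = Path(filepath)
    with gzip.open(filepath, 'wb') as handle:
        pickle.dump(obj, handle)
    return filepath


def load_compressed_pickle(filepath):
    """Hypothetical helper: read back an object written by store_compressed_pickle."""
    with gzip.open(filepath, 'rb') as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    my_object = {'a': 4, 'b': 5, 'c': [1, 2, 3]}
    pickle_path = store_compressed_pickle(my_object, Path(tmp_dir, 'example_pickle.pgz'))
    restored = load_compressed_pickle(pickle_path)
```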
corpus module¶
-
class
gender_analysis.text.corpus.Corpus(path_to_files, name=None, csv_path=None, pickle_on_load=None, ignore_warnings=False)¶ Bases: object
The Corpus class is used to load the metadata and full texts of all documents in a corpus
Once loaded, each corpus contains a list of Document objects
Parameters: - path_to_files – Must be either the path to a directory of txt files or an already-pickled corpus
- name – Optional name of the corpus, for ease of use and readability
- csv_path – Optional path to a csv metadata file
- pickle_on_load – Filepath to save a pickled copy of the corpus
>>> from gender_analysis import Corpus >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> path = TEST_DATA_DIR / 'sample_novels' / 'texts' >>> c = Corpus(path) >>> type(c.documents), len(c) (<class 'list'>, 99)
-
clone()¶ Return a copy of the Corpus object
Returns: Corpus object >>> from gender_analysis import Corpus >>> from gender_analysis.testing.common import TEST_CORPUS_PATH >>> path = TEST_CORPUS_PATH >>> sample_corpus = Corpus(path) >>> corpus_copy = sample_corpus.clone() >>> len(corpus_copy) == len(sample_corpus) True
-
count_authors_by_gender(gender)¶ Returns the number of authors in the corpus with the specified gender.
NOTE: there must be an ‘author_gender’ field in the metadata of all documents.
Parameters: gender – gender identifier to search for in the metadata (e.g. ‘female’, ‘male’) Returns: Number of authors of the given gender >>> from gender_analysis import Corpus >>> from gender_analysis.testing.common import ( ... TEST_CORPUS_PATH as path, ... SMALL_TEST_CORPUS_CSV as path_to_csv ... ) >>> c = Corpus(path, csv_path=path_to_csv, ignore_warnings=True) >>> c.count_authors_by_gender('female') 7
-
filter_by_gender(gender)¶ Return a new Corpus object that contains documents only with authors whose gender matches the given parameter.
Parameters: gender – gender identifier (e.g. ‘male’, ‘female’, ‘unknown’) Returns: Corpus object >>> from gender_analysis import Corpus >>> from gender_analysis.testing.common import ( ... TEST_CORPUS_PATH as path, ... LARGE_TEST_CORPUS_CSV as path_to_csv ... ) >>> c = Corpus(path, csv_path=path_to_csv) >>> female_corpus = c.filter_by_gender('female') >>> len(female_corpus) 39 >>> female_corpus.documents[0].title 'The Indiscreet Letter'
>>> male_corpus = c.filter_by_gender('male') >>> len(male_corpus) 59
>>> male_corpus.documents[0].title 'Lisbeth Longfrock'
-
get_document(metadata_field, field_val)¶ Returns a specific Document object from self.documents that has metadata matching field_val for metadata_field.
This function returns only the first matching document in self.documents. It should only be used if you’re certain there is only one match in the Corpus or if you’re not picky about which Document you get. If you want more selectivity, use get_document_multiple_fields; if you want multiple documents, use subcorpus.
Parameters: - metadata_field – metadata field to search
- field_val – search term
Returns: Document Object
>>> from gender_analysis import Corpus >>> from gender_analysis.text.common import MissingMetadataError >>> from gender_analysis.testing.common import ( ... TEST_CORPUS_PATH as path, ... LARGE_TEST_CORPUS_CSV as path_to_csv ... )
>>> c = Corpus(path, csv_path=path_to_csv) >>> c.get_document("author", "Dickens, Charles") <Document (dickens_twocities)> >>> c.get_document("date", '1857') <Document (bronte_professor)> >>> try: ... c.get_document("meme_quality", "over 9000") ... except MissingMetadataError as exception: ... print(exception) This Corpus is missing the following metadata field: meme_quality In order to run this function, you must create a new metadata csv with this field and run Corpus.update_metadata().
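The first-match behavior can be illustrated with plain dictionaries standing in for Document objects. Everything here is a hypothetical stand-in: MissingFieldError mimics MissingMetadataError, and returning None when nothing matches is an assumption, since the documentation does not say what happens in that case.

```python
class MissingFieldError(Exception):
    """Hypothetical stand-in for MissingMetadataError."""


def first_match(documents, metadata_field, field_val):
    """Return the first document whose metadata_field equals field_val."""
    if not all(metadata_field in doc for doc in documents):
        raise MissingFieldError(metadata_field)
    # next() stops at the first hit, so later matches are never inspected.
    return next((doc for doc in documents if doc[metadata_field] == field_val), None)


documents = [
    {'filename': 'austen_emma.txt', 'author': 'Austen, Jane', 'date': '1815'},
    {'filename': 'austen_persuasion.txt', 'author': 'Austen, Jane', 'date': '1818'},
    {'filename': 'dickens_twocities.txt', 'author': 'Dickens, Charles', 'date': '1859'},
]

match = first_match(documents, 'author', 'Austen, Jane')

try:
    first_match(documents, 'meme_quality', 'over 9000')
    missing_field = None
except MissingFieldError as error:
    missing_field = str(error)
```

Two documents match the author query, but only the first in list order is returned, which is why the method is best reserved for fields that identify a document uniquely.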
-
get_document_multiple_fields(metadata_dict)¶ Returns a specific Document object from the corpus that has metadata matching a given metadata dict.
This method returns only the first matching document in the corpus. It should only be used if you’re certain there is only one match in the Corpus or if you’re not picky about which Document you get.
If you want multiple documents, use subcorpus.
Parameters: metadata_dict – Dictionary with metadata fields as keys and search terms as values Returns: Document object >>> from gender_analysis import Corpus >>> from gender_analysis.testing.common import ( ... TEST_CORPUS_PATH as path, ... LARGE_TEST_CORPUS_CSV as path_to_csv ... ) >>> c = Corpus(path, csv_path=path_to_csv) >>> c.get_document_multiple_fields({"author": "Dickens, Charles", "author_gender": "male"}) <Document (dickens_twocities)> >>> c.get_document_multiple_fields({"author": "Chopin, Kate", "title": "The Awakening"}) <Document (chopin_awakening)>
-
get_field_vals(field)¶ This function returns a sorted list of the values present in the corpus for a given metadata field.
Parameters: field – field to search for (e.g. ‘location’, ‘author_gender’) Returns: list of strings >>> from gender_analysis import Corpus >>> from gender_analysis.testing.common import ( ... TEST_CORPUS_PATH as path, ... LARGE_TEST_CORPUS_CSV as path_to_csv ... ) >>> c = Corpus(path, name='sample_novels', csv_path=path_to_csv) >>> c.get_field_vals('author_gender') ['both', 'female', 'male']
-
get_sample_text_passages(expression, no_passages)¶ Returns a specified number of example passages that include a certain expression.
The number of passages that you request is a maximum, and this function may return fewer if the expression occurs fewer times in the corpus.
Parameters: - expression – expression to search for
- no_passages – number of passages to return
Returns: List of passages as strings
>>> from gender_analysis import Corpus >>> from gender_analysis.testing.common import ( ... TEST_CORPUS_PATH as path, ... LARGE_TEST_CORPUS_CSV as path_to_csv ... ) >>> corpus = Corpus(path, csv_path=path_to_csv, ignore_warnings=True) >>> results = corpus.get_sample_text_passages('he cried', 2) >>> 'he cried' in results[0][1] True >>> 'he cried' in results[1][1] True
-
multi_filter(characteristic_dict)¶ Returns a copy of the corpus, but with only the documents that fulfill the metadata parameters passed in by characteristic_dict. Multiple metadata keys can be searched at one time, provided that the metadata is available for the documents in the corpus.
Parameters: characteristic_dict – dict with metadata fields as keys and search terms as values Returns: Corpus object >>> from gender_analysis import Corpus >>> from gender_analysis.testing.common import ( ... TEST_CORPUS_PATH as path, ... LARGE_TEST_CORPUS_CSV as path_to_csv ... ) >>> c = Corpus(path, csv_path=path_to_csv) >>> corpus_filter = {'author_gender': 'male'} >>> len(c.multi_filter(corpus_filter)) 59
>>> corpus_filter['filename'] = 'aanrud_longfrock.txt' >>> len(c.multi_filter(corpus_filter)) 1
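Filtering on several metadata fields at once amounts to keeping the documents for which every key in the dictionary matches. A minimal sketch over plain dictionaries (the function name and sample metadata are illustrative, not the library's code):

```python
def filter_documents(documents, characteristic_dict):
    """Keep only documents whose metadata matches every key/value pair given."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in characteristic_dict.items())
    ]


documents = [
    {'filename': 'aanrud_longfrock.txt', 'author_gender': 'male'},
    {'filename': 'dickens_twocities.txt', 'author_gender': 'male'},
    {'filename': 'austen_emma.txt', 'author_gender': 'female'},
]

corpus_filter = {'author_gender': 'male'}
male_documents = filter_documents(documents, corpus_filter)

# Adding a second key narrows the result further.
corpus_filter['filename'] = 'aanrud_longfrock.txt'
single_document = filter_documents(documents, corpus_filter)
```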
-
subcorpus(metadata_field, field_value)¶ Returns a new Corpus object that contains only documents with a given field_value for metadata_field
Parameters: - metadata_field – metadata field to search
- field_value – search term
Returns: Corpus object
>>> from gender_analysis import Corpus >>> from gender_analysis.testing.common import ( ... TEST_CORPUS_PATH as path, ... LARGE_TEST_CORPUS_CSV as path_to_csv ... ) >>> corp = Corpus(path, csv_path=path_to_csv) >>> female_corpus = corp.subcorpus('author_gender','female') >>> len(female_corpus) 39 >>> female_corpus.documents[0].title 'The Indiscreet Letter'
>>> male_corpus = corp.subcorpus('author_gender','male') >>> len(male_corpus) 59 >>> male_corpus.documents[0].title 'Lisbeth Longfrock'
>>> eighteen_fifty_corpus = corp.subcorpus('date','1850') >>> len(eighteen_fifty_corpus) 1 >>> eighteen_fifty_corpus.documents[0].title 'The Scarlet Letter'
>>> jane_austen_corpus = corp.subcorpus('author','Austen, Jane') >>> len(jane_austen_corpus) 2 >>> jane_austen_corpus.documents[0].title 'Emma'
>>> england_corpus = corp.subcorpus('country_publication','England') >>> len(england_corpus) 51 >>> england_corpus.documents[0].title 'Flatland'
-
update_metadata(new_metadata_path)¶ Takes a filepath to a csv with new metadata and updates the metadata in the corpus’ documents accordingly. The new file does not need to contain every metadata field in the documents - only the fields that you wish to update.
NOTE: The csv file must include at least a filename for the documents that will be altered.
Parameters: new_metadata_path – Path to new metadata csv file Returns: None
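The update logic can be pictured as a join on the filename column: each csv row is matched to a document by filename, and only the fields present in the row are overwritten. The sketch below uses an in-memory csv and plain dictionaries; it is an assumption about the general approach, not the method's actual code.

```python
import csv
import io

# Existing metadata, keyed by filename for direct lookup.
documents = {
    'aanrud_longfrock.txt': {
        'filename': 'aanrud_longfrock.txt',
        'date': '2098',
        'author_gender': 'male',
    },
}

# The new csv only carries the filename and the field being corrected.
new_metadata_csv = io.StringIO('filename,date\naanrud_longfrock.txt,1903\n')

for row in csv.DictReader(new_metadata_csv):
    filename = row.pop('filename')
    # Fields absent from the csv (here author_gender) are left untouched.
    documents[filename].update(row)

updated = documents['aanrud_longfrock.txt']
```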
document module¶
-
class
gender_analysis.text.document.Document(metadata_dict)¶ Bases: object
The Document class loads and holds the full text and metadata (author, title, publication date, etc.) of a document
Parameters: metadata_dict – Dictionary with metadata fields as keys and data as values >>> from gender_analysis import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_metadata = {'author': 'Austen, Jane', 'title': 'Persuasion', 'date': '1818', ... 'filename': 'austen_persuasion.txt', ... 'filepath': Path(TEST_DATA_DIR, ... 'sample_novels', 'texts', 'austen_persuasion.txt')} >>> austen = Document(document_metadata) >>> type(austen.text) <class 'str'> >>> len(austen.text) 466887
-
find_quoted_text()¶ Finds all of the quoted statements in the document text.
Returns: List of strings enclosed in double-quotations >>> from gender_analysis import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_metadata = {'author': 'Austen, Jane', 'title': 'Persuasion', ... 'date': '1818', 'filename': 'test_text_0.txt', ... 'filepath': Path(TEST_DATA_DIR, ... 'document_test_files', 'test_text_0.txt')} >>> document_novel = Document(document_metadata) >>> document_novel.find_quoted_text() ['"This is a quote"', '"This is my quote"']
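One simple way to extract double-quoted statements is a regular expression. This sketch is an assumption about the general technique; the library may handle typographic quotes or nested quotations differently.

```python
import re


def find_double_quoted(text):
    """Return every substring enclosed in straight double quotes, quotes included."""
    # [^"]* matches any run of characters that are not a double quote,
    # so each match ends at the nearest closing quote.
    return re.findall(r'"[^"]*"', text)


sample = 'He said "This is a quote" and she replied "This is my quote" at once.'
quotes = find_double_quoted(sample)
```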
-
get_count_of_word(word)¶ Returns the number of instances of a word in the text. Not case-sensitive.
If this is your first time running this method, this can be slow.
Parameters: word – word to be counted in text Returns: Number of occurrences of the word, as an int >>> from gender_analysis import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_metadata = {'author': 'Hawthorne, Nathaniel', 'title': 'Scarlet Letter', ... 'date': '2018', 'filename': 'test_text_2.txt', ... 'filepath': Path(TEST_DATA_DIR, ... 'document_test_files', 'test_text_2.txt')} >>> scarlett = Document(document_metadata) >>> scarlett.get_count_of_word("sad") 4 >>> scarlett.get_count_of_word('ThisWordIsNotInTheWordCounts') 0
-
get_count_of_words(words)¶ A helper method for retrieving the number of occurrences of a given set of words within a Document.
Parameters: words – a list of strings. Returns: a Counter with each word in words keyed to its number of occurrences. >>> from gender_analysis.text.document import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_filepath = Path(TEST_DATA_DIR, 'document_test_files', 'test_text_9.txt') >>> document_metadata = {'filename': 'test_text_2.txt', 'filepath': document_filepath} >>> test_document = Document(document_metadata) >>> test_document.get_count_of_words(['sad', 'was', 'sadness', 'very']) Counter({'was': 5, 'sad': 1, 'very': 1, 'sadness': 0})
-
get_part_of_speech_tags()¶ Returns the part of speech tags as a list of tuples. The first element of each tuple is the term, the second is the part of speech tag.
Note: the same word can have different part of speech tags. In the example below, see “refuse” and “permit”.
Returns: List of tuples (term, speech_tag) >>> from gender_analysis import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_metadata = {'author': 'Hawthorne, Nathaniel', 'title': 'Scarlet Letter', ... 'date': '1900', 'filename': 'test_text_13.txt', ... 'filepath': Path(TEST_DATA_DIR, ... 'document_test_files', 'test_text_13.txt')} >>> document = Document(document_metadata) >>> document.get_part_of_speech_tags()[:4] [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB')] >>> document.get_part_of_speech_tags()[-4:] [('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN'), ('.', '.')]
-
get_part_of_speech_words(words, remove_swords=True)¶ A helper method for retrieving the number of occurrences of input words keyed to their NLTK tag values (e.g., ‘NN’ for noun).
Parameters: - words – a list of strings.
- remove_swords – optional boolean, remove stop words from return.
Returns: a dictionary keying NLTK tag strings to Counter instances.
>>> from gender_analysis.text.document import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_filepath = Path(TEST_DATA_DIR, 'document_test_files', 'test_text_9.txt') >>> document_metadata = {'filename': 'test_text_2.txt', 'filepath': document_filepath} >>> test_document = Document(document_metadata) >>> test_document.get_part_of_speech_words(['peace', 'died', 'beautiful', 'foobar']) {'JJ': Counter({'beautiful': 3}), 'VBD': Counter({'died': 1}), 'NN': Counter({'peace': 1})}
-
get_tokenized_text()¶ Tokenizes the text and returns it as a list of tokens, while removing all punctuation.
Note: This does not currently properly handle dashes or contractions.
Returns: List of each word in the Document >>> from gender_analysis import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_metadata = {'author': 'Austen, Jane', 'title': 'Persuasion', 'date': '1818', ... 'filename': 'test_text_1.txt', ... 'filepath': Path(TEST_DATA_DIR, ... 'document_test_files', 'test_text_1.txt')} >>> austin = Document(document_metadata) >>> tokenized_text = austin.get_tokenized_text() >>> tokenized_text ['allkinds', 'of', 'punctuation', 'and', 'special', 'chars']
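The note about dashes and contractions follows from stripping punctuation before splitting on whitespace: the characters are deleted rather than replaced by a space, so the surrounding fragments fuse. The sketch below reproduces that effect with the standard library; it is an assumption about the approach, and the library's tokenizer may differ.

```python
import string


def tokenize(text):
    """Lowercase the text, delete all ASCII punctuation, and split on whitespace."""
    remove_punctuation = str.maketrans('', '', string.punctuation)
    return text.lower().translate(remove_punctuation).split()


tokens = tokenize('All-kinds of punctuation, and special chars!')
contraction_tokens = tokenize("don't stop")
```

Here 'All-kinds' collapses into the single token 'allkinds' and "don't" becomes 'dont', which is the limitation the note describes.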
-
get_word_freq(word)¶ Returns the frequency of appearance of a word in the document
Parameters: word – str to search for in document Returns: float representing the portion of words in the text that are the parameter word >>> from gender_analysis import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_metadata = {'author': 'Hawthorne, Nathaniel', 'title': 'Scarlet Letter', ... 'date': '1900', 'filename': 'test_text_2.txt', ... 'filepath': Path(TEST_DATA_DIR, ... 'document_test_files', 'test_text_2.txt')} >>> scarlett = Document(document_metadata) >>> frequency = scarlett.get_word_freq('sad') >>> frequency 0.13333333333333333
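The frequency is the count of the word divided by the total number of words, which is consistent with the documented result of 0.1333 for a word counted 4 times (4 / 30). A minimal sketch over a token list (the function name and sample sentence are illustrative):

```python
from collections import Counter


def word_frequency(tokens, word):
    """Return the share of tokens equal to word, or 0.0 for an empty token list."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # A Counter returns 0 for missing keys, so unknown words need no special case.
    return counts[word.lower()] / len(tokens)


tokens = 'the sad dog was sad and the cat was glad'.split()
sad_frequency = word_frequency(tokens, 'sad')
missing_frequency = word_frequency(tokens, 'foobar')
```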
-
get_word_frequencies(words)¶ A helper method for retrieving the frequencies of a given set of words within a Document.
Parameters: words – a list of strings. Returns: a dictionary of words keyed to float frequencies. >>> from gender_analysis.text.document import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_filepath = Path(TEST_DATA_DIR, 'document_test_files', 'test_text_9.txt') >>> document_metadata = {'filename': 'test_text_2.txt', 'filepath': document_filepath} >>> test_document = Document(document_metadata) >>> test_document.get_word_frequencies(['peace', 'died', 'foobar']) {'peace': 0.02702702702702703, 'died': 0.02702702702702703, 'foobar': 0.0}
-
get_word_windows(search_terms, window_size=2)¶ Finds all instances of the search terms and returns a counter of the words around them. window_size is the number of words before and after to return, so the total window is 2*window_size + 1.
This is not case sensitive.
Parameters: - search_terms – String or list of strings to search for
- window_size – integer representing number of words to search for in either direction
Returns: Python Counter object
>>> from gender_analysis import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_metadata = {'author': 'Hawthorne, Nathaniel', 'title': 'Scarlet Letter', ... 'date': '2018', 'filename': 'test_text_12.txt', ... 'filepath': Path(TEST_DATA_DIR, ... 'document_test_files', 'test_text_12.txt')} >>> scarlett = Document(document_metadata)
search_terms can be either a string…
>>> scarlett.get_word_windows("his", window_size=2) Counter({'he': 1, 'lit': 1, 'cigarette': 1, 'and': 1, 'then': 1, 'began': 1, 'speech': 1, 'which': 1})
… or a list of strings.
>>> scarlett.get_word_windows(['purse', 'tears']) Counter({'her': 2, 'of': 1, 'and': 1, 'handed': 1, 'proposal': 1, 'drowned': 1, 'the': 1})
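The windowing logic can be sketched over a token list as follows. This is an illustration under assumptions: the function name is hypothetical, and how the library treats a search term that falls inside another term's window is not documented, so this version simply counts it like any other neighbor.

```python
from collections import Counter


def word_windows(tokens, search_terms, window_size=2):
    """Count the words within window_size positions of each search term occurrence."""
    if isinstance(search_terms, str):
        search_terms = [search_terms]
    targets = {term.lower() for term in search_terms}
    tokens = [token.lower() for token in tokens]

    counter = Counter()
    for index, token in enumerate(tokens):
        if token in targets:
            start = max(0, index - window_size)  # clamp at the start of the text
            before = tokens[start:index]
            after = tokens[index + 1:index + 1 + window_size]
            counter.update(before + after)
    return counter


tokens = 'He lit his cigarette and then began his speech which'.split()
windows = word_windows(tokens, 'his', window_size=2)
```

With window_size=2, each of the two occurrences of 'his' contributes its two preceding and two following words, and the search term itself is left out of the count.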
-
get_wordcount_counter()¶ Returns a counter object of all of the words in the text.
If this is your first time running this method, this can be slow.
Returns: Python Counter object >>> from gender_analysis import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_metadata = {'author': 'Hawthorne, Nathaniel', 'title': 'Scarlet Letter', ... 'date': '2018', 'filename': 'test_text_10.txt', ... 'filepath': Path(TEST_DATA_DIR, ... 'document_test_files', 'test_text_10.txt')} >>> scarlett = Document(document_metadata) >>> scarlett.get_wordcount_counter() Counter({'was': 2, 'convicted': 2, 'hester': 1, 'of': 1, 'adultery': 1})
-
update_metadata(new_metadata)¶ Updates the metadata of the document without requiring a complete reloading of the text and other properties.
‘filename’ cannot be updated with this method.
Parameters: new_metadata – dict of new metadata to apply to the document Returns: None This can be used to correct mistakes in the metadata:
>>> from gender_analysis import Document >>> from gender_analysis.testing.common import TEST_CORPUS_PATH >>> from pathlib import Path >>> metadata = {'filename': 'aanrud_longfrock.txt', ... 'filepath': Path(TEST_CORPUS_PATH, 'aanrud_longfrock.txt'), ... 'date': '2098'} >>> d = Document(metadata) >>> new_metadata = {'date': '1903'} >>> d.update_metadata(new_metadata) >>> d.date 1903
Or it can be used to add completely new attributes:
>>> new_attribute = {'cookies': 'chocolate chip'} >>> d.update_metadata(new_attribute) >>> d.cookies 'chocolate chip'
-
word_count¶ Lazily loaded attribute holding the number of words in the document. It is useful for the get_word_freq function, but it is expensive to compute, so it is only calculated when it is actually required.
Returns: Number of words in the document’s text as an int >>> from gender_analysis import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_metadata = {'author': 'Austen, Jane', 'title': 'Persuasion', 'date': '1818', ... 'filename': 'austen_persuasion.txt', ... 'filepath': Path(TEST_DATA_DIR, 'sample_novels', ... 'texts', 'austen_persuasion.txt')} >>> austen = Document(document_metadata) >>> austen.word_count 83285
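Lazy loading of this kind is commonly written as a property that computes the value on first access and caches it. The class below is a hypothetical illustration of the pattern, not the Document class itself; the times_computed counter exists only to make the caching visible.

```python
class LazyDocument:
    """Hypothetical stand-in showing a lazily computed word count."""

    def __init__(self, text):
        self.text = text
        self._word_count = None   # not computed until first requested
        self.times_computed = 0

    @property
    def word_count(self):
        if self._word_count is None:
            self.times_computed += 1
            self._word_count = len(self.text.split())
        return self._word_count


document = LazyDocument('Hester was convicted of adultery')
first_access = document.word_count
second_access = document.word_count
```

The second access returns the cached value, so the potentially slow count runs once per document.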
-
words_associated(target_word)¶ Returns a Counter of the words found after a given word.
If the target word is repeated consecutively, the counter includes the word itself as well as the next new word.
Note: words are always returned in lowercase.
Parameters: target_word – Single word to search for in the document’s text Returns: a Python Counter() object with {associated_word: occurrences} >>> from gender_analysis import Document >>> from pathlib import Path >>> from gender_analysis.testing.common import TEST_DATA_DIR >>> document_metadata = {'author': 'Hawthorne, Nathaniel', 'title': 'Scarlet Letter', ... 'date': '2018', 'filename': 'test_text_11.txt', ... 'filepath': Path(TEST_DATA_DIR, ... 'document_test_files', 'test_text_11.txt')} >>> scarlett = Document(document_metadata) >>> scarlett.words_associated("his") Counter({'cigarette': 1, 'speech': 1})
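Counting the words that immediately follow a target can be sketched by pairing each token with its successor. The function name and sample sentences are illustrative; the library's tokenization may produce different pairs on real text.

```python
from collections import Counter


def following_words(tokens, target_word):
    """Count the word that comes directly after each occurrence of target_word."""
    target = target_word.lower()
    tokens = [token.lower() for token in tokens]
    # zip(tokens, tokens[1:]) yields every (current, next) pair in order.
    return Counter(nxt for current, nxt in zip(tokens, tokens[1:]) if current == target)


tokens = 'He lit his cigarette and then began his speech'.split()
associated = following_words(tokens, 'his')

# With a repeated target, the target itself appears in the result.
repeated = following_words('his his hat'.split(), 'his')
```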
character module¶
-
class
gender_analysis.text.character.Character(name, gender=None, mentions=None)¶ Bases: object
Defines a character that will be operated on in analysis functions
-
get_char_gender()¶ Gets the gender for the character based on:
1. If a user entry exists, fetch the entered gender
2. If not, infer the Character’s gender based on coreference resolution and pronouns
Currently, this function only retrieves the user-entered gender for Character objects.
Returns: a gender object >>> from gender_analysis.text.character import Character >>> from gender_analysis.gender.common import FEMALE >>> emma_name = 'Emma' >>> emma_gender = FEMALE >>> emma_mentions = ["Emma Woodhouse", "Emma", "Miss Woodhouse"] >>> emma = Character(emma_name, emma_gender, emma_mentions) >>> emma.get_char_gender() <Female>
-