A Comparative Analysis of Advertising in the 2020 Presidential Elections & Phoenix Mayoral Elections using Natural Language Processing

This project examines advertisements used in political campaigns, specifically the 2020 City of Phoenix mayoral election. We study political microtargeting by analyzing the tone of content in Instagram and Facebook ads during the 8 months leading up to the election. Using the Facebook Ad Library API, the written and spoken language within ads was collected, together with the target demographic or group associated with each ad on the platform. Microtargeting allows campaigns to send tailored messages to voters based on factors like location, media preference, associated political beliefs, age, and other critical variables. This data was then compared to a peer-reviewed study from the University of Amsterdam that examined Facebook and Instagram ads in the 8 months before the 2020 US presidential election (Votta et al., 2023). Media companies offer paid advertisements targeted at specific audiences, and during highly polarized events such as an election, uncivil negative ads can be pushed to groups to which the expressed ideology appeals (Votta et al., 2023). The question posed by Votta and colleagues in that comparative study motivates our analysis.


Introduction
In this project, we pose the question: "Compared to the 2020 US presidential campaign, how extensively was microtargeting used for political advertisement in the 2020 City of Phoenix mayoral election?" Using an NLP model, we attempt to answer this question.
Our model outputs a score from 0% to 100% that indicates how extensively microtargeting was used for political advertisement in the 2020 City of Phoenix mayoral election. Different ranges represent different extents: for example, 0–30, 30–60, and 60–100 percent can be interpreted as weak, medium, and extensive levels of targeting, respectively.
Spreading uncivil negative campaign messages is a "high-risk, high-reward" campaign strategy, since certain voters are more likely to be swayed by negative messaging whereas other voters are more inclined to feel sympathy with the target of the attack (Votta et al., 2023). With advertising being dependent on monetary resources in most cases, it is important to understand how much money is being spent on advertising and the nature of that advertising.

Related work
For this project, we build on the paper "Going Micro to Go Negative? Targeting Toxicity using Facebook and Instagram Ads" (Votta et al., 2023). The paper introduces the basic ideas of political communication, microtargeting, negative campaigning, toxicity, and the Ad Library, and explains how they operate during a campaign. Moreover, the paper links to the Facebook Ad Library API. Through this API, we obtained the dataset used to analyze and train our model. The dataset contains several variables, such as ad creation time, ad delivery start time, ad delivery stop time, byline, ad creative bodies, ad creative link titles, ad creative link captions, ad creative link descriptions, impressions, spend, demographic distribution, delivery by region, publisher platforms, and estimated audience size. By analyzing the ad creative bodies, ad creative link titles, and ad creative link descriptions, we can measure how toxic the messages in an ad are.

Using NLP model for analysis
NLP techniques are well suited to answering this question. The text must first be preprocessed with steps such as grammar checking, word stemming, and stop-word removal. Most importantly, the NLP model can draw on mature algorithms such as decision trees, SVMs, ANNs, and RNNs. These help us build the best model to predict how extensive the effect of political microtargeting is.
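The preprocessing steps above can be sketched in plain Python. This is a minimal, hypothetical illustration: the stoplist is tiny and the stemmer is a crude suffix stripper standing in for a real one such as Porter's, not the project's actual code.

```python
import re

# A minimal, illustrative stoplist; a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "the", "in", "is", "of", "and", "to", "for"}

def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem each remaining token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

ad_text = "Voting for the future of our city and supporting new schools"
print(preprocess(ad_text))
```

In a real pipeline, each ad creative body would be passed through such a function before feature extraction.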

Identify the textual data.
Our dataset comes from the Facebook Ad Library API. Facebook's advertisement data supports our question: the Ad Library API provides reliable data sources year over year and across election campaigns. Although the dataset still has some empty values and outliers, we can remove them with data-cleaning techniques. Based on this, we believe the dataset can support and answer our question well.

Process and Outcome
This project is a comparative analysis of the use of microtargeting for political advertisement in the 2020 US presidential campaign and the 2020 City of Phoenix mayoral election. We intended to recreate the model used in the original study of political microtargeting in the 2020 US presidential campaign (Votta et al., 2023). After the model was created, we applied it to advertising data from the 2020 City of Phoenix mayoral election, collected through the Facebook Ad Library API.

Data Source:
The data source for this project is the Facebook Ad Library API, the same source the original researchers primarily used to collect their dataset. The researchers also used the financial sponsorships the different contestants received and information about the different advertisers in their analysis. For several reasons, such as city elections focusing on a small group of people and low advertising budgets, very little information is available on how much money was spent on advertising in the City of Phoenix mayoral election. However, this narrowed down the possible sources of political advertising. The data we collected consists of all the textual information about the advertising campaigns, the timeline sponsors used for each type of advertising, impression counts, estimated audience size, and demographic information. This can be used to understand which type of messaging reaches which type of audience.

Relation to NLP:
In politics, smear campaigns play a critical role. This is the basis of toxic advertising. If money can be spent on promoting a candidate, it can also be spent on tarnishing an opponent's reputation.
Sometimes advertising can be used to manipulate facts to fit a narrative that creates hatred or fear among certain communities. It is therefore important to understand the kind of advertising being done before an election in order to analyze the sentiments of voters.
The nature of advertising changes with the demographics of the region. If we can define an objective metric to measure toxicity, we can use this model to understand the nature of the advertising being done. This requires a deeper understanding of the ways language is used to veil toxic messages. In other words, we perform sentiment analysis of political campaign advertisements to compare their toxicity.
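To make "an objective metric" concrete, here is a toy lexicon-based toxicity score: the fraction of an ad's tokens that appear in a small list of attack terms. Both the term list and the function are hypothetical illustrations, not the metric used in the original study or in our model.

```python
# Hypothetical list of attack/negative terms for illustration only.
NEGATIVE_TERMS = {"corrupt", "liar", "dangerous", "failed", "disgrace"}

def toxicity_score(ad_text):
    """Return the share of tokens that match the negative-term lexicon."""
    tokens = ad_text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.strip(".,!?") in NEGATIVE_TERMS)
    return hits / len(tokens)

print(toxicity_score("Our corrupt mayor has failed this city"))  # 2 of 7 tokens
```

A trained classifier would replace the fixed lexicon, but the output, a score on a fixed scale, plays the same role in comparing ads.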
For natural language models to be well trained, they must capture multiple layers of linguistic understanding, which requires iterative training. In our case, the model must capture toxicity, and detecting veiled toxic messaging requires iterative training on a large dataset. Our dataset, however, is small, which prevents a fully iterative approach; as a result, the model may be improperly trained and could overfit.

Data Cleaning
Based on our scenario, the NLP dataset comes from the Facebook Ad Library API. This API allows us to use custom keywords to search for ads stored in the Ad Library, so data on all social-issue, election, or political ads can be searched, regardless of whether the ad is live or has stopped (Votta et al., 2023).

Regarding data sensitivity, three main things need to be considered: personal data privacy, business secrets, and government data.

Data privacy
Data may include personally identifiable information such as social security numbers, bank account numbers, and health records. When dealing with this kind of data, the relevant laws, regulations, and privacy policies need to be considered to ensure personal data security and confidentiality (Rogers et al., 2021).

Business secrets
Companies and other organizations may hold commercially confidential data, such as business plans, customer lists, and research and development results. This data is critical to the competitiveness of businesses, and measures need to be taken to prevent unauthorized access, disclosure, or misuse (Rogers et al., 2021).

Government data
Government agencies may process sensitive data related to national security, law enforcement, and intelligence. The security and confidentiality of this data are critical to the security and stability of the nation (Rogers et al., 2021).

Data classification and labeling
Classify and label sensitive data and clarify its sensitivity level and access rights so that it can be identified and managed in subsequent processing and storage.

Access Control
Restrict access to sensitive data to authorized personnel only. This can be achieved through authentication, authorization, and rights management, ensuring that only authorized users can access and process sensitive data (Gupta & Lakhwani, 2022).

Data encryption
Encrypt sensitive data to protect it during transmission and storage. Methods such as symmetric or asymmetric encryption can be used to ensure that data cannot be read without the proper key (Gupta & Lakhwani, 2022).

Data backup and recovery
Regularly back up sensitive data and establish a reliable data-recovery mechanism to prevent data loss or damage and to ensure data reliability and integrity (Gupta & Lakhwani, 2022).

Process and Methodology
To clean the dataset, several things need to be considered. First, remove irrelevant columns (data reduction) from the dataset, such as ID, currency type, and language. Second, remove rows that contain null values. Third, transform some text fields into numbers. Data preparation is the process of transforming data from its raw format into the input format required by data-mining libraries, addressing data-quality issues such as noise or missing values (Elbadrawy, 2023).
If the null values are negligible or do not significantly impact the overall dataset, they can simply be removed (Elbadrawy, 2023). Irrelevant columns are those that do not contain meaningful information for the intended analysis or purpose of the dataset.
Removing irrelevant columns can help reduce the size and complexity of the dataset, simplify data analysis, and improve data-processing efficiency (Elbadrawy, 2023). Noise and outliers can be detected using techniques such as machine-learning algorithms that identify data points falling outside expected ranges, having extreme values, or exhibiting abnormal patterns (Elbadrawy, 2023).
For data-cleaning tools, Anaconda Navigator was used. In a Jupyter Notebook, the dataset can be cleaned with Python code and commands.
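The three cleaning steps above can be sketched with pandas. The column names below (`id`, `currency`, `ad_creative_body`, `spend`) are hypothetical stand-ins; the real Ad Library export uses its own field names.

```python
import pandas as pd

# Toy frame standing in for an Ad Library export.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "currency": ["USD", "USD", "USD"],
    "ad_creative_body": ["Vote for change", None, "Stop the corruption"],
    "spend": ["100-199", "0-99", "100-199"],
})

df = df.drop(columns=["id", "currency"])       # 1. drop irrelevant columns
df = df.dropna(subset=["ad_creative_body"])    # 2. drop rows with null text
# 3. turn a text field into numbers (here, a categorical code per spend band)
df["spend_code"] = df["spend"].astype("category").cat.codes

print(df)
```

The same three calls (`drop`, `dropna`, categorical encoding) apply unchanged to the full dataset once the real column names are substituted.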

Upload the dataset:
The stoplist: to improve the efficiency of the NLP process, a stoplist is used to remove certain words from the dataset. These words are not helpful, and at the same time they consume CPU and memory resources. As a result, we created a stoplist for our dataset; for example, "a", "an", "the", "in", "is", "of", and "and" are placed in it. Doing this improves search accuracy, reduces noise, and extracts more helpful text. The general stoplist in Appendix 2 can be used for our data cleaning, and we can add more stop words as the scenario grows (Lane et al., 2019).
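Extending a general stoplist with scenario-specific words, as suggested above, is a one-line set union. The domain words added here are hypothetical examples, not entries from Appendix 2.

```python
# A small general stoplist, extended with hypothetical campaign-specific terms.
GENERAL_STOPLIST = {"a", "an", "the", "in", "is", "of", "and"}
DOMAIN_STOPLIST = GENERAL_STOPLIST | {"vote", "election", "campaign"}

def remove_stop_words(tokens, stoplist):
    """Keep only tokens that are not in the given stoplist."""
    return [t for t in tokens if t not in stoplist]

tokens = "the campaign is a fight for the soul of the city".split()
print(remove_stop_words(tokens, DOMAIN_STOPLIST))
```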

Unique Jargon
Unique jargon is a challenge for NLP, because NLP handles general language and grammar much better. Some unique jargon cannot be treated as general terms, so we need solutions to handle it: term extraction, term dictionaries, domain-specific word embeddings, custom rules, and manual annotation can all be applied to unique jargon in NLP (Lane et al., 2019).

Acronyms
Acronyms pose significant trouble for NLP. To solve these problems, we can use acronym expansion, contextual disambiguation, word embeddings, and manual annotation.
When these strategies are used, they help the machine better understand the meaning of words (Lane et al., 2019).
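The simplest of these strategies, dictionary-based acronym expansion, can be sketched in a few lines. The acronym table below is a hypothetical example, not taken from the project's data.

```python
# Hypothetical acronym table for political-ad text.
ACRONYMS = {"gop": "grand old party", "pac": "political action committee"}

def expand_acronyms(text):
    """Replace each known acronym with its expansion, leaving other tokens as-is."""
    return " ".join(ACRONYMS.get(tok, tok) for tok in text.lower().split())

print(expand_acronyms("The PAC funded new GOP ads"))
```

Contextual disambiguation would be needed where one acronym has several expansions; a flat table like this assumes each acronym is unambiguous.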

Any other "noise" that might prove problematic for your process.
Stop words such as conjunctions, prepositions, and pronouns can be seen as noise. Spelling errors and grammar errors can also be seen as noise.

Spelling errors
The text may contain spelling errors, which may cause the model to misidentify and misinterpret words. Processing methods include spelling correction, such as automatic spell checkers, error-correction models, or rule-based spelling-correction methods, to fix spelling errors in the text (Lane et al., 2019).
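A toy spelling corrector can be built from the standard library's `difflib`, matching each word against a known vocabulary. The vocabulary here is a small hypothetical list, not a full dictionary.

```python
import difflib

# Hypothetical vocabulary of expected campaign-ad words.
VOCAB = ["election", "candidate", "advertisement", "campaign", "voter"]

def correct(word):
    """Snap a word to its closest vocabulary entry if one is similar enough."""
    matches = difflib.get_close_matches(word.lower(), VOCAB, n=1, cutoff=0.8)
    return matches[0] if matches else word

print(correct("electin"))  # close to "election"
print(correct("banana"))   # no close match, returned unchanged
```

Production spell checkers use frequency-weighted edit distance rather than a flat cutoff, but the lookup pattern is the same.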

Grammatical errors
The text may contain grammatical errors, such as wrong grammatical structures or wrong word forms. Processing methods include syntax-analysis tools or rule-based syntax-error correction to identify and fix grammatical errors in the text (Lane et al., 2019).
In NLP tasks, noise can negatively affect a model, making it difficult for the model to understand and process text correctly. For example, in a text-classification task, noise may include missing words, wrong punctuation, or other text-formatting errors, which can cause the model to make incorrect predictions (Lane et al., 2019). To tackle these issues, solutions such as data cleaning, text preprocessing, and error correction can be employed.

Data cleaning
By cleaning the text data, unnecessary characters, punctuation marks, HTML tags, special characters, and so on are removed, reducing the noise in the text data (Lane et al., 2019).

Recurrent Neural Network (RNN)
A recurrent neural network (RNN) is a very useful model in natural language processing (NLP). An RNN is a type of neural network capable of processing sequence data, so it is well suited to text (DiPietro & Hager, 2020).
In NLP, RNNs are commonly used for tasks such as text classification, sentiment analysis, language modeling, and machine translation. By using a recurrent structure, an RNN takes the output of the previous time step as input to the current time step, thereby capturing the temporal dependence of the text sequence. The key feature of an RNN is memory: it retains previous information internally and uses it in subsequent steps (DiPietro & Hager, 2020).
When processing text, RNNs typically use word embeddings to represent words. Word embedding is a technique that maps words to a low-dimensional vector space, encoding the semantic information of words in vector form so that the RNN can better understand the meaning of the text (DiPietro & Hager, 2020).
Another important RNN variant is the long short-term memory (LSTM) network. LSTM solves the vanishing-gradient problem that traditional RNNs face on long sequences by using a gating mechanism to control the flow of information. LSTM is very useful in NLP because it can handle long text sequences and performs well in tasks such as language modeling and machine translation (DiPietro & Hager, 2020). For our RNN, the training accuracy was 0.093 and the validation accuracy was 0.364.
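The recurrence described above can be sketched in plain NumPy: at each step the hidden state mixes the current input with the previous hidden state via h_t = tanh(W_x x_t + W_h h_{t-1} + b). The dimensions and random weights are arbitrary illustration, not our trained model.

```python
import numpy as np

# Random weights for a tiny vanilla RNN cell (illustrative only).
rng = np.random.default_rng(0)
embed_dim, hidden_dim = 4, 3
W_x = rng.normal(size=(hidden_dim, embed_dim)) * 0.1
W_h = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b = np.zeros(hidden_dim)

def rnn_forward(sequence):
    """Run the cell over a sequence of embedding vectors, returning the final h."""
    h = np.zeros(hidden_dim)
    for x_t in sequence:
        h = np.tanh(W_x @ x_t + W_h @ h + b)  # memory carried through h
    return h

sequence = rng.normal(size=(5, embed_dim))  # five word-embedding vectors
h_final = rnn_forward(sequence)
print(h_final.shape)  # (3,)
```

An LSTM replaces the single tanh update with gated input, forget, and output paths, which is what lets gradients survive over long sequences.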
We transformed the dataset into numbers before training. The result is not ideal: the parameters still need tuning, and the dataset needs to be enlarged for better accuracy. In the code, we use TensorFlow and Keras as our libraries; both provide powerful functionality. Our RNN uses four layers (the input layer, a first hidden layer, a second hidden layer, and the output layer), with different activation functions such as sigmoid and ReLU. The model is evaluated with:

_, valid_acc = model.evaluate(X_test, y_test, verbose=0)
print('Train: %.3f, Valid: %.3f' % (train_acc, valid_acc))

For training the SVM model, Anaconda was used; in the notebook, Python 3 and scikit-learn fit and train the model. For the RNN, we used PyCharm as the IDE, with TensorFlow and Keras as libraries, which make it easy to train a neural-network model with activation functions.
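On the scikit-learn side, a TF-IDF plus linear-SVM text classifier can be fit and trained in a few lines. The toy ads and toxicity labels below are hypothetical, not our dataset, and the pipeline is a sketch of the approach rather than our exact configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: 1 = toxic attack ad, 0 = non-toxic ad (hypothetical labels).
ads = [
    "Our opponent is a corrupt liar who failed this city",
    "A dangerous disgrace who betrayed every voter",
    "Join us to build better schools and safer parks",
    "Vote for clean energy and new jobs in our community",
]
labels = [1, 1, 0, 0]

# TF-IDF turns each ad into a sparse vector; LinearSVC learns the boundary.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(ads, labels)

print(model.predict(["a corrupt candidate who failed us"]))
```

The pipeline object bundles vectorizer and classifier, so new ad text can be passed to `predict` directly as raw strings.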

Data Analysis
Because the product in the analysis phase is abstract, users cannot see what it looks like. In this phase, we can provide graphical representations of the statistics and the results achieved from modeling. Representing the data as toxicity versus audience reach, and as comparisons of ad toxicity, can help the audience understand the role of toxic advertising in political campaigns. We have a very small dataset, which prevents us from getting accurate results or even training the model to good accuracy.
Our model is successful when the whole process, from data acquisition to generating the model and results, is completed and a deeper understanding of the toxicity of political campaigns is achieved. This happens when the model returns a prediction or classifies a given set of features related to an advertising campaign as toxic on the given scale.
We find that the SVM outperforms the RNN. In principle, an RNN should be the stronger model, and at the beginning we expected it to be. However, once we finished building both models, the SVM performed much better. The reasons are the small number of data points and the absence of a powerful GPU. In theory, RNNs are good neural-network models, but realizing that advantage requires a larger dataset and better equipment.

Conclusion
In conclusion, our study examined the extent to which microtargeting was used in political advertisements during the 2020 City of Phoenix mayoral election, compared to the 2020 US presidential campaign. Our results suggest that social media advertising is a valuable resource for targeting the specific groups of individuals who are most susceptible to guided advertisements. This finding highlights the importance of understanding the nature of political advertising and the strategies used to target voters. As political campaigns become more dependent on monetary resources, understanding how much money is being spent on advertising and the nature of that advertising becomes increasingly important. Ultimately, our findings provide insight into how microtargeting may affect voter behavior in city elections and underscore the need for continued research in this area.
Based on the analysis of political advertisements used in the 2020 City of Phoenix mayoral election, it can be concluded that microtargeting is used extensively for political advertisement in city elections. Social media advertising presents a valuable resource for targeting the specific groups of individuals who are most susceptible to guided advertisement.
While traditional media remains an important part of electoral campaigning, social media advertising allows political campaigns to send targeted messages to voters based on factors like location, media preference, associated political beliefs, age, and other critical variables.
The trained model can be reused in other projects, processes, and solutions. There are two ways to enhance the product for future use.
First, the model can be published as an API. Users can upload data points from their own area to judge how toxic ads affect voters there. When they upload data points for their region, they receive the model's accuracy, an index indicating the level of toxic ads. The model can also be used to understand the sentiments of an area.
Second, if data privacy is a concern, users can copy the code into their own project and generate their own model to fit their business predictions. This prevents the data from being exposed to untrusted transfers over the network, and in this way the model can enable the creation of inclusive and friendly advertising.
NLP techniques are very powerful for dealing with this situation. For measuring toxic ads, word embedding and stop-word removal improve efficiency when training the model. Because the ads do not contain grammar errors, we can skip that step in our code; however, users working with datasets that do not come from the Facebook Ads API still need to consider checking for grammar errors. We used only SVM and RNN models as examples; in the future, decision trees, ANNs, or other algorithms could be tried and might outperform the current model.
When working with data from a remote rural area where language use differs from the language the model was trained on, the model might not be as successful.
This opens the door to understanding language in a new light, and it may require the model to be retrained if there is minimal linguistic similarity.