Text Sentiment Analyzer
1. Introduction
The Text Sentiment Analyzer is a project that uses Natural Language Processing (NLP) techniques to classify text reviews or comments as positive, negative, or neutral. Sentiment analysis of this kind is widely used in customer feedback analysis, social media monitoring, and product reviews to gauge user sentiment.
2. Prerequisites
• Python: Install Python 3.x from the official Python website.
• Required Libraries:
- nltk: Install using `pip install nltk`
- scikit-learn: Install using `pip install scikit-learn`
- pandas: Install using `pip install pandas`
- numpy: Install using `pip install numpy`
• Dataset: A labeled dataset of text samples with their corresponding sentiment labels (e.g., positive, negative).
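For reference, here is a minimal sketch of the file layout the script in section 4 reads: a CSV with a `text` column and a `sentiment` column. The file name `sample_sentiment_data.csv` and the three rows are invented placeholders, not real data; a usable dataset needs many more samples per class.

```python
import csv

# Hypothetical file name, chosen so a real sentiment_data.csv is not overwritten.
SAMPLE_PATH = "sample_sentiment_data.csv"

# Placeholder rows for illustration only.
rows = [
    {"text": "I love this product, it works perfectly!", "sentiment": "positive"},
    {"text": "Terrible quality, it broke after two days.", "sentiment": "negative"},
    {"text": "The package arrived on Tuesday.", "sentiment": "neutral"},
]

def write_sample(path=SAMPLE_PATH):
    """Write the placeholder rows with the header the script expects."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "sentiment"])
        writer.writeheader()
        writer.writerows(rows)
    return path

def read_sample(path=SAMPLE_PATH):
    """Read the file back as a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

if __name__ == "__main__":
    write_sample()
    for row in read_sample():
        print(row["sentiment"], "|", row["text"])
```

Text that contains commas is quoted automatically by the `csv` module, so it still loads as a single `text` field.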
3. Project Setup
1. Create a Project Directory:
- Name your project folder, e.g., `Text_Sentiment_Analyzer`.
- Inside this folder, create the Python script file (`sentiment_analyzer.py`).
2. Install Required Libraries:
Ensure NLTK, scikit-learn, pandas, and NumPy are installed using `pip`, as listed in the prerequisites.
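One way to confirm the installation is to ask Python which versions it can see. The sketch below uses the standard library's `importlib.metadata` (Python 3.8 or later); the helper name `installed_versions` is hypothetical, and the package names are the ones from the prerequisites list.

```python
from importlib import metadata

# Distribution names as used with pip in the prerequisites list.
REQUIRED = ["nltk", "scikit-learn", "pandas", "numpy"]

def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

if __name__ == "__main__":
    for name, version in installed_versions().items():
        status = version if version else f"NOT INSTALLED (try: pip install {name})"
        print(f"{name}: {status}")
```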
4. Writing the Code
Below is an example code snippet for the Text Sentiment Analyzer:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import nltk
# Download NLTK data (newer NLTK releases look for 'punkt_tab' instead of 'punkt')
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')
# Load dataset (expects 'text' and 'sentiment' columns)
data = pd.read_csv('sentiment_data.csv')
# Preprocess text
stop_words = set(stopwords.words('english'))
def preprocess_text(text):
    # Lowercase, tokenize, then drop stopwords and non-alphanumeric tokens
    words = word_tokenize(text.lower())
    filtered_words = [word for word in words
                      if word.isalnum() and word not in stop_words]
    return " ".join(filtered_words)

data['text'] = data['text'].apply(preprocess_text)
# Split data
X_train, X_test, y_train, y_test = train_test_split(data['text'],
data['sentiment'], test_size=0.2, random_state=42)
# Convert text to feature vectors
vectorizer = CountVectorizer()
X_train_vectors = vectorizer.fit_transform(X_train)
X_test_vectors = vectorizer.transform(X_test)
# Train Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_vectors, y_train)
# Evaluate model
y_pred = model.predict(X_test_vectors)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
# Predict sentiment for new text
def predict_sentiment(text):
    processed_text = preprocess_text(text)
    vector = vectorizer.transform([processed_text])
    return model.predict(vector)[0]

new_text = "I love this product, it works perfectly!"
print(f"Sentiment for '{new_text}':", predict_sentiment(new_text))
5. Key Components
• Text Preprocessing: Lowercases the input text and removes stopwords and punctuation.
• Feature Extraction: Converts text into numerical features using techniques like Bag of Words.
• Model Training: Trains a classification model (e.g., Naive Bayes) on the preprocessed data.
• Sentiment Prediction: Classifies new text inputs using the trained model.
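To make the feature-extraction and training steps concrete, here is a rough pure-Python sketch of what a Bag of Words count and a multinomial Naive Bayes classifier with add-one (Laplace) smoothing compute. It is not how scikit-learn implements `CountVectorizer` or `MultinomialNB`; the class `TinyNaiveBayes`, the helper `bag_of_words`, and the four training sentences are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict

def bag_of_words(text):
    """Count how often each whitespace-separated, lowercased token occurs."""
    return Counter(text.lower().split())

class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            counts = bag_of_words(text)
            self.word_counts[label].update(counts)
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(class) + sum over words of count * log P(word | class)
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + len(self.vocab)
            for word, count in bag_of_words(text).items():
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                p = (self.word_counts[label][word] + 1) / denom
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

if __name__ == "__main__":
    # Invented training sentences, far too few for a real model.
    texts = ["great product love it", "love the quality",
             "terrible product hate it", "hate the poor quality"]
    labels = ["positive", "positive", "negative", "negative"]
    model = TinyNaiveBayes().fit(texts, labels)
    print(model.predict("love it"))
    print(model.predict("poor and terrible"))
```

Working in log space avoids multiplying many small probabilities together, and the add-one term keeps a word that never appeared in a class from driving that class's probability to zero.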
6. Testing
1. Ensure the dataset (`sentiment_data.csv`) is available in the project directory.
2. Run the script:
python sentiment_analyzer.py
3. Check the printed accuracy and classification report, then try custom text inputs with `predict_sentiment`.
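Step 3 can be made repeatable with a small spot-check helper. The sketch below assumes a prediction function with the same shape as `predict_sentiment` from section 4 (one string in, one label out). The names `spot_check` and `keyword_stub` are hypothetical, and the stub only stands in for the trained model so that the example runs on its own.

```python
def spot_check(predict, cases):
    """Run predict on each (text, expected_label) pair.

    Returns (accuracy, failures), where failures lists
    (text, expected, predicted) for every mismatch.
    """
    failures = []
    for text, expected in cases:
        predicted = predict(text)
        if predicted != expected:
            failures.append((text, expected, predicted))
    accuracy = 1 - len(failures) / len(cases) if cases else 0.0
    return accuracy, failures

def keyword_stub(text):
    """Stand-in predictor for illustration; replace with predict_sentiment."""
    lowered = text.lower()
    if "love" in lowered or "great" in lowered:
        return "positive"
    if "hate" in lowered or "broke" in lowered:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    cases = [
        ("I love this product, it works perfectly!", "positive"),
        ("It broke after one day.", "negative"),
        ("Not great at all.", "negative"),  # the stub gets this one wrong
    ]
    accuracy, failures = spot_check(keyword_stub, cases)
    print(f"Spot-check accuracy: {accuracy:.2f}")
    for text, expected, predicted in failures:
        print(f"MISMATCH: {text!r} expected {expected}, got {predicted}")
```

Including a few negated sentences such as the last case is worthwhile, since simple word-count models tend to struggle with negation.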
7. Enhancements
• Advanced Models: Use neural models such as LSTM networks or pretrained transformers such as BERT, which often improve accuracy.
• Multi-Language Support: Extend the system to handle reviews in multiple languages.
• GUI Integration: Create a user-friendly interface for non-technical users.
8. Troubleshooting
• Low Accuracy: Use a larger or more diverse dataset for training.
• Text Processing Errors: Check for missing or incorrect preprocessing steps, and for empty or non-string values in the `text` column.
• Library Issues: Ensure all required libraries are installed and up-to-date.
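Before collecting more data, it can help to check whether the existing file is part of the problem. The following sketch counts labels and flags rows with missing text or labels, which are plausible causes of low accuracy and preprocessing errors. It assumes the `text` and `sentiment` column names used in section 4; `audit_rows` and `audit_csv` are hypothetical helpers, and they use only the standard library so they still run when a third-party install is what is broken.

```python
import csv
from collections import Counter

def audit_rows(rows):
    """Summarize label balance and find rows with an empty text or label.

    rows is an iterable of dicts with 'text' and 'sentiment' keys.
    """
    label_counts = Counter()
    bad_rows = []
    for index, row in enumerate(rows):
        text = (row.get("text") or "").strip()
        label = (row.get("sentiment") or "").strip()
        if not text or not label:
            bad_rows.append(index)
            continue
        label_counts[label] += 1
    return label_counts, bad_rows

def audit_csv(path):
    """Apply audit_rows to a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return audit_rows(csv.DictReader(f))

if __name__ == "__main__":
    # Invented rows for illustration; call audit_csv("sentiment_data.csv") on real data.
    demo = [
        {"text": "Works well", "sentiment": "positive"},
        {"text": "", "sentiment": "negative"},
        {"text": "Stopped working", "sentiment": "negative"},
    ]
    counts, bad = audit_rows(demo)
    print("Label counts:", dict(counts))
    print("Rows with missing text or label:", bad)
```

A heavily skewed label count suggests that the accuracy figure may mostly reflect the majority class, in which case the per-class numbers in the classification report are more informative.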
9. Conclusion
The Text Sentiment Analyzer classifies the sentiment of user-written text, making it a useful tool for businesses and organizations that want to understand customer feedback and improve their services.