Context-Aware Spelling Correction
Implemented a probabilistic spelling correction pipeline emphasizing contextual disambiguation beyond edit-distance heuristics.
- Language Models: Trained unigram through 5-gram models; evaluated perplexity across smoothing variants (Kneser‑Ney, interpolated, Add‑k, Good‑Turing, Stupid Backoff); see the LM sketch after this list.
- Noisy Channel: Candidate generation via Damerau‑Levenshtein edits plus phonetic approximations; candidates ranked by P(word | context) × P(error | word), as in the ranking sketch after this list.
- Smoothing Study: Comparative analysis showed modified Kneser‑Ney performing best on sparse tail distributions.
- Evaluation: 88% accuracy on a curated context‑sensitive confusion set (homophones, morphological variants).
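A minimal sketch of the language-model step using NLTK's `nltk.lm` (the project's stated stack). The toy corpus, trigram order, and test sentence are illustrative placeholders, not the project's data:

```python
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline, padded_everygrams

# Toy corpus: pre-tokenized sentences (placeholder data).
corpus = [
    ["their", "house", "is", "near", "the", "park"],
    ["they", "are", "walking", "to", "their", "house"],
]

ORDER = 3  # trigram shown; the project swept unigram through 5-gram

# Build padded training n-grams and the vocabulary in one pass.
train_ngrams, vocab = padded_everygram_pipeline(ORDER, corpus)
lm = KneserNeyInterpolated(ORDER)
lm.fit(train_ngrams, vocab)

# Perplexity over a (here, in-sample) sentence; lower is better. Swapping in
# other nltk.lm models (Laplace, Lidstone, StupidBackoff, ...) reproduces the
# smoothing comparison.
test_sentence = ["their", "house", "is", "near", "the", "park"]
print(lm.perplexity(list(padded_everygrams(ORDER, test_sentence))))
```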
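And a hedged sketch of the noisy-channel ranking over distance-1 Damerau‑Levenshtein candidates (the phonetic pass is omitted); `channel_prob`, the P(error | word) error model, is a hypothetical stand-in for whatever confusion-matrix estimate the pipeline uses:

```python
import string

def dl_candidates(word):
    """Strings one Damerau-Levenshtein edit away: deletes, transposes,
    replaces, and inserts."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(observed, left_context, lm, channel_prob):
    """Return the argmax over candidates of P(word | context) * P(observed | word).

    `lm` is a fitted nltk.lm model (e.g. the one above); `left_context` is a
    tuple of preceding tokens.
    """
    pool = dl_candidates(observed) | {observed}
    candidates = [w for w in pool if w in lm.vocab] or [observed]
    return max(candidates,
               key=lambda w: lm.score(w, left_context) * channel_prob(observed, w))
```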
Results: Demonstrated robust context modeling that outperforms naive-frequency and pure edit-distance baselines.
Tech: Python, NLTK.
Error analysis logs false corrections with their surrounding context windows to guide smoothing and candidate-generation adjustments (sketch below).
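An illustrative sketch of that logging step; the JSON field names, log path, and `log_false_correction` helper are assumptions, not the project's actual schema:

```python
import json
import logging

logging.basicConfig(filename="false_corrections.log", level=logging.INFO)

def log_false_correction(tokens, i, predicted, gold, window=2):
    """Record a miscorrection at position i with +/- `window` tokens of context."""
    logging.info(json.dumps({
        "observed": tokens[i],    # token as typed
        "predicted": predicted,   # the system's (wrong) correction
        "gold": gold,             # reference correction
        "context": tokens[max(0, i - window): i + window + 1],
    }))
```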