Context-Aware Spelling Correction

Implemented a probabilistic spelling-correction pipeline that uses surrounding context to disambiguate candidates, rather than relying on edit-distance heuristics alone.

  • Language Models: Trained unigram through 5-gram models; evaluated perplexity across smoothing variants (Kneser‑Ney, linear interpolation, Add‑k, Good‑Turing, Stupid Backoff).
  • Noisy Channel: Generated candidates via Damerau‑Levenshtein edits plus phonetic approximations; ranked them by P(word | context) · P(error | word).
  • Smoothing Study: Comparative analysis showed modified Kneser‑Ney performing best on sparse tail distributions.
  • Evaluation: 88% accuracy on a curated, context‑sensitive confusion set (homophones, morphological variants).
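The smoothing comparison can be illustrated in miniature: an unsmoothed MLE bigram model assigns zero probability to any unseen bigram, so its held-out perplexity diverges, while even simple Add-k stays finite. This is a self-contained toy sketch (the corpus and the Add-k stand-in are illustrative assumptions; the project's actual models go up to 5-grams with Kneser-Ney):

```python
# Toy perplexity comparison: MLE vs. add-k smoothed bigram model.
# TRAIN/HELDOUT are made-up sentences, not the project's data.
import math
from collections import Counter
from itertools import chain

TRAIN = [
    ["<s>", "the", "dog", "ate", "the", "food", "</s>"],
    ["<s>", "they", "went", "there", "after", "work", "</s>"],
]
HELDOUT = ["<s>", "the", "dog", "went", "there", "</s>"]

vocab = set(chain.from_iterable(TRAIN))
uni = Counter(chain.from_iterable(TRAIN))
bi = Counter((s[i], s[i + 1]) for s in TRAIN for i in range(len(s) - 1))

def prob(w, prev, k):
    # k = 0 gives the unsmoothed MLE estimate; k > 0 is add-k smoothing
    return (bi[(prev, w)] + k) / (uni[prev] + k * len(vocab))

def perplexity(sent, k):
    logp = 0.0
    for prev, w in zip(sent, sent[1:]):
        p = prob(w, prev, k)
        if p == 0.0:
            return math.inf  # MLE assigns zero mass to unseen bigrams
        logp += math.log(p)
    return math.exp(-logp / (len(sent) - 1))
```

The held-out sentence contains the unseen bigram ("dog", "went"), so `perplexity(HELDOUT, 0)` is infinite while `perplexity(HELDOUT, 0.5)` is finite; the same sparsity problem, at the 4- and 5-gram tail, is what the Kneser-Ney variants address.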
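The noisy-channel ranking step can be sketched as follows. Everything here is a simplified stand-in: a toy corpus, an add-k bigram prior in place of the full Kneser-Ney 5-gram model, and a constant-mass channel model instead of learned error probabilities (and no phonetic candidates):

```python
# Noisy-channel ranker sketch: Damerau-Levenshtein candidates scored by
# P(word | context) * P(error | word). Corpus and constants are illustrative.
from collections import Counter
from itertools import chain

CORPUS = [
    ["the", "dog", "ate", "the", "food"],
    ["they", "went", "there", "after", "work"],
    ["their", "dog", "went", "there", "too"],
]
VOCAB = set(chain.from_iterable(CORPUS))
UNI = Counter(chain.from_iterable(CORPUS))
BI = Counter((s[i], s[i + 1]) for s in CORPUS for i in range(len(s) - 1))

def dl_distance(a, b):
    """Restricted Damerau-Levenshtein: edits plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def prior(word, prev, k=0.5):
    """Add-k smoothed bigram P(word | prev) -- stand-in for the 5-gram LM."""
    return (BI[(prev, word)] + k) / (UNI[prev] + k * len(VOCAB))

def channel(typed, true):
    """Toy P(typed | true): favor keeping the word as typed (an assumption)."""
    return 0.9 if typed == true else 0.1

def correct(typed, prev):
    cands = {w for w in VOCAB if dl_distance(typed, w) <= 1} or {typed}
    return max(cands, key=lambda w: prior(w, prev) * channel(typed, w))
```

With this corpus, `correct("ther", "went")` resolves to "there" while `correct("ther", "ate")` resolves to "the": the distance-1 candidate set {the, there, their, they} is identical in both calls, so the context prior alone breaks the tie.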

Results: Context-aware ranking outperformed both naive word-frequency and pure edit-distance baselines.

Tech: Python, NLTK.

View on GitHub →

Error analysis logs each false correction alongside its context window, guiding adjustments to smoothing and candidate generation.