Leverage NLP techniques for risk classification of legal documents
Objective: Apply NLP-based techniques to classify a large number of contracts by risk with a quick turnaround
Key Challenges:
- 100,000+ legal documents to be evaluated
- Lack of a standard format
- Foreign language usage in some of the contracts
Approach:
- Pre-processing engine for whitespace removal, punctuation removal, stop-word removal, etc.
- Term document matrix creation
- Apply text classification and NLP algorithms to build the foundational ontology through feature extraction
- Use Expectation-Maximization algorithms to transfer the classification knowledge across languages by translating the model features
- Use the extracted feature set in conjunction with business rules to flag contracts into three risk categories: high, medium, and low
- Validate results against a test set and incorporate a feedback loop to continuously improve model accuracy
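The first two steps and the final flagging step of the approach can be illustrated with a minimal Python sketch. This is not the engine described above: the stop-word list, the risk terms, their weights, and the thresholds are all hypothetical, and a simple weighted keyword score stands in for the trained classifier and business rules. The cross-language transfer step is not covered.

```python
import string
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller,
# language-specific list.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in",
              "for", "is", "be", "shall", "by", "any"}

# Map every punctuation character to a space so tokens split cleanly.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation,
                                " " * len(string.punctuation))


def preprocess(text):
    """Lowercase, strip punctuation, collapse whitespace, drop stop words."""
    cleaned = text.lower().translate(_PUNCT_TO_SPACE)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (vocab, matrix): one row per term, one column per document."""
    counts = [Counter(preprocess(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


# Hypothetical business rules: terms, weights, and thresholds are
# illustrative assumptions, not values from the project.
RISK_TERMS = {"indemnify": 3, "unlimited": 3, "liability": 2,
              "penalty": 2, "termination": 1}


def risk_category(text, high=5, medium=2):
    """Flag a contract as high, medium, or low risk from a weighted term score."""
    counts = Counter(preprocess(text))
    score = sum(weight * counts[term] for term, weight in RISK_TERMS.items())
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


docs = [
    "The supplier shall indemnify the buyer; unlimited liability applies.",
    "Payment is due in 30 days.",
]
vocab, matrix = term_document_matrix(docs)
labels = [risk_category(doc) for doc in docs]
```

In a production setting the keyword score would be replaced by the features and model learned from labeled contracts, with the business rules applied on top of the model output.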
Benefits:
- Identified fraudulent contracts with a precision of 98%
- Automated engine to efficiently parse 10,000+ contracts per day