Accuracy for text classification

The classifier algorithms have been more or less unchanged for almost 10 years. I think a good reason for that is that algorithms such as NB and SVM have long been able to achieve relatively high accuracy, given optimal or near-optimal parameters.

At the same time, in my experience, a good way to bump up the overall accuracy of text classification is data/corpus preparation, including stop-word removal, POS tagging, TF-IDF weighting, etc.
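
To make the corpus preparation idea concrete, here is a minimal sketch of stop-word removal followed by TF-IDF weighting, in plain Python. The stop-word list, the sample reviews and the function names are all made up for illustration, and the formula is the textbook tf * log(N / df) variant, which is not exactly what libraries such as scikit-learn's `TfidfVectorizer` compute by default (they add smoothing and normalization).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "is", "was", "and", "of", "to", "in", "it"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample reviews, only here to exercise the functions.
docs = [
    "The room was clean and the staff was friendly",
    "The room was dirty",
    "Friendly staff and a great location",
]
weights = tf_idf(docs)
```

In this toy corpus, a word that shows up in only one review ("clean") ends up with a higher weight than a word shared by several reviews ("room"), which is the effect that tends to help the classifier.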

I saw a good post on the accuracy of text classification that echoes this:

https://www.analyticsvidhya.com/blog/2015/10/6-practices-enhance-performance-text-classification-model/

Supervised machine learning

libsvm is the first supervised machine learning library I used extensively, more than 10 years ago.

It was pretty awesome back then to see 78% text classification accuracy on more than 100,000 hotel reviews that I had crawled from ctrip.com.

Now, at version 3, the libsvm guide reports accuracy as high as 96.875% on one of its worked examples:

https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

https://www.csie.ntu.edu.tw/~cjlin/libsvm/

AI for system support

I have been trying to build an AI bot for almost 3 years and finally finished a prototype. Here is the outline, in case anybody would like to do something similar:

Technologies:

Java, Scala, Python, Anaconda, scikit-learn, EWS, Bootstrap, AngularJS/jQuery

Components:

Data Set

  1. I built a Scala web crawler to download all historical support issues.
  2. At the same time, I manually read through and cleaned up each of the thousands of support issues, and recorded the corresponding resolution for each.
AI
  1. Used Anaconda and scikit-learn for the NLP steps: tokenize each support issue (text), remove stop words, stem each token, and remove punctuation.
  2. Used Anaconda and scikit-learn to bag the tokens of each text as features against the class, and fed them into a linear regression classifier. I also tried SLDA. So far it is working at 72% accuracy.
AI Exposer
  1. Exposed the AI as a service.
Issue Feeder
  1. Used EWS to read in all issues and post them to the AI service.
UI
  1. Built a web user interface on top of HTML5 + jQuery + Bootstrap to show the support emails alongside the resolutions suggested by the AI.
  2. Added an option on the UI for users to give feedback to the AI, to keep its intelligence updated.
Notifier
  1. Uses the Java Mail API and a chat API to post alerts for critical issues.
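
The preprocessing and bag-of-words steps of the AI component above can be sketched roughly as follows. This is plain Python rather than the actual scikit-learn code, so treat it as an illustration only: the stop-word list, the suffix-stripping `crude_stem` helper, the sample issues and the resolution labels are all hypothetical. In practice scikit-learn's `CountVectorizer` would handle the tokenizing and counting, and a proper stemmer would replace the crude one here.

```python
import re
from collections import Counter

# Illustrative stop-word list; hypothetical and far shorter than a real one.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "on", "in", "and", "of", "for"}


def crude_stem(token):
    """Strip a few common English suffixes. A stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop punctuation and stop words, then stem each token."""
    # Keeping only runs of letters/digits discards punctuation.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    return {term: i for i, term in enumerate(vocab)}


def to_vector(tokens, vocabulary):
    """Turn a token list into a bag-of-words count vector."""
    counts = Counter(tokens)
    return [counts.get(term, 0) for term in vocabulary]


# Hypothetical (issue text, resolution label) pairs.
issues = [
    ("Login page is failing for users.", "restart-auth-service"),
    ("Disk is full on the reporting server!", "clean-up-logs"),
]

token_lists = [preprocess(text) for text, _ in issues]
vocabulary = build_vocabulary(token_lists)
vectors = [to_vector(tokens, vocabulary) for tokens in token_lists]
labels = [label for _, label in issues]
```

The `vectors` and `labels` lists are the features-versus-class pairs that would then be fed into whichever classifier is being tried.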