The classifier algorithms have been more or less unchanged for almost 10 years. I think a good reason is that NB and SVM, for example, have long been able to achieve relatively high accuracy, given optimal or near-optimal parameters.
At the same time, in my experience, a good way to bump up the overall accuracy of text classification is data/corpus preparation, including stop word removal, POS tagging, TF-IDF weighting, etc.
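To make the preparation step concrete, here is a minimal sketch of stop word removal plus TF-IDF weighting in plain Python. The stop word list, the sample sentences, and the helper names `tokenize` and `tfidf` are my own assumptions for illustration, and the smoothed IDF formula is just one common variant, not necessarily what any particular library uses.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "is", "was", "and", "to", "of", "in", "it"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights

# Hypothetical review snippets, for illustration only.
docs = [
    "The room was clean and the staff was friendly",
    "The room was dirty and the staff was rude",
    "Breakfast was great",
]
for doc_weights in tfidf(docs):
    # Show the two highest-weighted terms of each document.
    print(sorted(doc_weights, key=doc_weights.get, reverse=True)[:2])
```

Terms that appear in only one document (like "clean") end up weighted higher than terms shared across documents (like "room"), and stop words never reach the classifier at all. The classifier itself is untouched; only its input changes.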
I saw a good post on the accuracy of text classification, echoing this:
libsvm is the first supervised machine learning library I used extensively, more than 10 years ago.
It was pretty awesome back then to see 78% text classification accuracy on more than 100,000 hotel reviews that I had crawled from ctrip.com.
Now, at version 3, they are able to achieve 96.875% for text classification:
I have been trying to build an AI bot for almost 3 years and finally finished a prototype. In case anybody would like to do something similar:
Java, Scala, Python, Anaconda, scikit-learn, EWS, Bootstrap, AngularJS/jQuery
- I built a Scala web crawler to download all historical support issues.
- at the same time, manually read through and cleaned up each of the thousands of support issues, and recorded the corresponding resolution for each
- used Anaconda and scikit-learn for the NLP step: tokenize each support issue (text), remove stop words, stem each token, remove punctuation
- used Anaconda and scikit-learn to bag the tokens of each text as features against its class and feed them into a linear regression classifier; also tried SLDA; so far working at 72% accuracy
- exposed the AI as a service
- used EWS to read in all issues and post them to the AI service
- built a web user interface on top of HTML5 + jQuery + Bootstrap to show the support emails alongside the resolutions suggested by the AI
- added an option in the UI for users to give feedback to the AI, to keep its intelligence updated
- used the Java Mail API and Chat API to post alerts for critical issues
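
For anyone who wants to try the tokenize / stop words / stem / bag-of-words steps above, here is a minimal sketch in plain Python, so it runs without any dependencies. It is not the prototype's code: it swaps in a simple multinomial Naive Bayes instead of the classifier I used, and the stop word list, the crude `stem` helper, the `NaiveBayes` class, and the sample issues and resolution labels are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "to", "on", "my", "i", "and", "for", "after"}

def stem(token):
    """Very crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(preprocess(text))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, text):
        tokens = preprocess(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical support issues and resolutions, for illustration only.
issues = [
    "Cannot login, password rejected",
    "Password expired and login fails",
    "Disk is full on the server",
    "Server storage running out of space",
]
resolutions = ["reset_password", "reset_password", "free_disk_space", "free_disk_space"]

model = NaiveBayes().fit(issues, resolutions)
print(model.predict("User login failing with wrong password"))  # prints "reset_password"
```

With real data you would hold out part of the labelled issues to measure accuracy, and the user feedback from the UI could simply be appended to the training set before refitting.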