A classifier is a machine-learning technique that assigns documents to the most probable
class, out of a set of predefined classes. Classifiers try to find correlations between
attributes and classes. As a supervised ML method, classification involves two required
steps:
- Training Phase: The set of classes is determined and a set of training documents is analysed. Correlations are extracted and a model is built. The input is typically a set of (class, document) pairs.
- Application Phase: A new testing document is compared against the model and then assigned to a class.
The Naive Bayes Classifier (NBC) is one of the methods that create a classification model. It is based on probabilities and on the distribution of features across the training set. A downloadable presentation covers three classification algorithms in detail, including NBC.
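To make the two phases concrete, here is a minimal sketch of a Naive Bayes classifier in Python. It assumes a multinomial model with add-one (Laplace) smoothing, which is one common variant and not necessarily the one this program uses. The function names `train` and `classify` and the sample data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(pairs):
    """Training phase: build a model from (class, document) pairs.

    Each document is a list of terms.
    """
    class_counts = Counter()            # number of documents per class
    term_counts = defaultdict(Counter)  # term frequencies per class
    vocabulary = set()
    for label, terms in pairs:
        class_counts[label] += 1
        term_counts[label].update(terms)
        vocabulary.update(terms)
    return class_counts, term_counts, vocabulary


def classify(model, query):
    """Application phase: return the most probable class for the query terms."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted for deterministic tie-breaking
        # log P(class) + sum of log P(term | class), with add-one smoothing
        # (assumption: the smoothing choice is not stated in the text above)
        score = math.log(class_counts[label] / total_docs)
        denominator = sum(term_counts[label].values()) + len(vocabulary)
        for term in query:
            score += math.log((term_counts[label][term] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set of (class, document) pairs
training = [
    ("sport", ["ball", "goal", "team"]),
    ("sport", ["team", "match"]),
    ("tech", ["code", "bug", "team"]),
]
model = train(training)
print(classify(model, ["goal", "team"]))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and smoothing keeps a single unseen term from forcing a class probability to zero.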
How does this program work?
1. Setting a training set and a query
- Manually
- From file
  - txt
  - xml
For manual entry, fill in one query and 2 to 10 documents. To change the number of documents, change the value of the "Number of docs" field. At least two different classes are required; otherwise the classification is trivial. Each document must contain at least one term.
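The constraints on manual input can be sketched as a small validation routine. This is only an illustration, not the program's own code: the function name `validate_manual_input`, the data layout, and the check that the query is non-empty are assumptions.

```python
def validate_manual_input(query, documents, max_docs=10):
    """Return a list of problems; an empty list means the input is acceptable.

    query: list of terms; documents: list of (class, terms) pairs.
    """
    problems = []
    if not query:  # assumption: a query needs at least one term
        problems.append("the query is empty")
    if not 2 <= len(documents) <= max_docs:
        problems.append(f"expected 2-{max_docs} documents, got {len(documents)}")
    if len({label for label, _ in documents}) < 2:
        problems.append("at least two different classes are required")
    for index, (_, terms) in enumerate(documents, start=1):
        if not terms:
            problems.append(f"document {index} has no terms")
    return problems


# Hypothetical input: two documents that share a single class
print(validate_manual_input(["goal"], [("sport", ["goal"]), ("sport", ["team"])]))
```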
When loading from a file, only .txt and .xml files are valid. For convenience, you can download example files to use as templates. Files are not limited to 10 documents or 8 terms, which allows for wider applications.
A .txt file must start with the query terms, separated by spaces. Every following line is treated as a training document, and the first word of each line is that document's class. Empty lines are not allowed.
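A reader for this text layout might look like the sketch below. It assumes that the query occupies the first line and that terms within a document line are separated by whitespace, like the query terms. The function name `parse_training_text` and the sample content are hypothetical.

```python
def parse_training_text(text):
    """Split file contents into (query_terms, [(class, terms), ...])."""
    lines = text.splitlines()
    if any(not line.strip() for line in lines):
        raise ValueError("empty lines are not allowed")
    if len(lines) < 2:
        raise ValueError("expected a query line followed by training documents")
    query = lines[0].split()
    documents = []
    for line in lines[1:]:
        # The first word is the class; the remaining words are the terms
        label, *terms = line.split()
        documents.append((label, terms))
    return query, documents


# Hypothetical file content: one query line, then one document per line
sample = "goal team\nsport ball goal team\ntech code bug\n"
query, documents = parse_training_text(sample)
print(query)
print(documents)
```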
An .xml file should contain all the training documents, alongside the query. See the xsd schema and example for more.
2. Results
A solution is compiled.