Data Mining - Mehmed Kantardzic [270]
This summary of some publicly available commercial data-mining products is provided to help readers understand which software tools are on the market and what features they offer. It is not intended to endorse or critique any specific product; potential users will need to judge for themselves the suitability of each product for their specific applications and data-mining environments. The summary is primarily a starting point from which users can obtain more information. New products appear on the market constantly, so this list is by no means comprehensive. Because these changes are so frequent, the author suggests two Web sites for information about the latest tools and their performance: http://www.kdnuggets.com and http://www.knowledgestorm.com.
A.5.1 Free Software
DataLab
Publisher: Epina Software Labs (www.lohninger.com/datalab/en_home.html)
DataLab is a complete and powerful data-mining tool with a unique data exploration process, focused on marketing and interoperability with SAS. There is a public version for students.
DBMiner
Publisher: Simon Fraser University (http://ddm.cs.sfu.ca)
DBMiner is a publicly available tool for data mining. It is a multiple-strategy tool and it supports methodologies such as clustering, association rules, summarization, and visualization. DBMiner uses Microsoft SQL Server 7.0 Plato and runs on different Windows platforms.
GenIQ Model
Publisher: DM STAT-1 Consulting (www.geniqmodel.com)
GenIQ Model uses machine learning for regression tasks. It automatically performs variable selection and new variable construction, and then specifies the model equation to “optimize the decile table.”
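For readers unfamiliar with the term, a decile table is commonly built by ranking scored cases from highest to lowest predicted score, splitting them into ten equal-sized groups, and reporting the response rate and cumulative lift of each group. The sketch below illustrates only that general idea. It is not GenIQ's method or interface, the function name `decile_table` and the toy data are hypothetical, and it assumes at least ten cases with at least one responder.

```python
def decile_table(scores, responses):
    """Rank cases by score (highest first) and summarize ten equal-sized groups.

    Returns one dict per decile with its size, number of responders,
    response rate, and cumulative lift over the overall response rate.
    Assumes len(scores) >= 10 and at least one response equal to 1.
    """
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(responses) / n
    rows, cum_n, cum_resp = [], 0, 0
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        hits = sum(r for _, r in group)
        cum_n += len(group)
        cum_resp += hits
        rows.append({
            "decile": d + 1,
            "size": len(group),
            "responders": hits,
            "rate": hits / len(group),
            "cum_lift": (cum_resp / cum_n) / overall_rate,
        })
    return rows


# Hypothetical toy data: 20 cases, model scores and 0/1 responses.
scores = [i / 20 for i in range(20, 0, -1)]
responses = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

rows = decile_table(scores, responses)
for row in rows:
    print(row)
```

A model that ranks responders well concentrates them in the top deciles, so the cumulative lift starts well above 1 and falls to exactly 1 in the last decile.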
NETMAP
Publisher: http://sourceforge.net/projects/netmap
NETMAP is a general-purpose, information-visualization tool. It is most effective for large, qualitative, text-based data sets. It runs on Unix workstations.
RapidMiner
Publisher: Rapid-I (http://rapid-i.com)
Rapid-I provides software, solutions, and services in the fields of predictive analytics, data mining, and text mining. The company concentrates on automatic intelligent analyses on a large scale, that is, for large amounts of structured data, such as database systems, and unstructured data, such as texts. The open-source data-mining specialist Rapid-I enables other companies to use leading-edge technologies for data mining and business intelligence. Discovering and leveraging unused business intelligence in existing data enables better-informed decisions and allows for process optimization.
SIPINA
Publisher: http://eric.univ-lyon2.fr/~ricco/sipina.html
Sipina-W is publicly available software that includes different traditional data-mining techniques such as CART, Elisee, ID3, C4.5, and some new methods for generating decision trees.
SNNS
Publisher: University of Stuttgart (http://www.nada.kth.se/~orre/snns-manual/)
SNNS is publicly available software. It is a simulation environment for research on and application of artificial neural networks. The environment is available on Unix and Windows platforms.
TiMBL
Publisher: Tilburg University (http://ilk.uvt.nl/timbl/)
TiMBL is publicly available software. It includes several memory-based learning techniques for discrete data. A representation of the training set is explicitly stored in memory, and new cases are classified by extrapolation from the most similar cases.
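The memory-based idea described above can be sketched in a few lines: keep every training case in memory, measure how many discrete feature values differ between a new case and each stored case, and let the nearest cases vote. This is a generic nearest-neighbor illustration, not TiMBL's interface or its exact algorithms. The names `overlap_distance` and `classify` and the toy data are hypothetical.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions where two discrete-valued cases differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(memory, query, k=1):
    """Return the majority label among the k stored cases nearest to query.

    memory is a list of (features, label) pairs kept verbatim; no model
    is fitted in advance, so all work happens at classification time.
    """
    ranked = sorted(memory, key=lambda case: overlap_distance(case[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (outlook, temperature, windy) -> decision
memory = [
    (("sunny", "hot", "no"), "stay"),
    (("sunny", "mild", "no"), "play"),
    (("rainy", "mild", "yes"), "stay"),
    (("overcast", "mild", "no"), "play"),
    (("overcast", "hot", "yes"), "play"),
]

prediction = classify(memory, ("overcast", "mild", "yes"), k=3)
print(prediction)
```

Here the three nearest stored cases each differ from the query in one feature, and two of the three carry the label "play", so that label wins the vote. The simple mismatch count is one possible similarity measure for discrete data; weighting features by their informativeness is a common refinement.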
TOOLDIAG
Publisher: http://sites.google.com/site/tooldiag/Home
TOOLDIAG is a publicly available tool for data mining. It consists of several programs in C for statistical pattern recognition of multivariate numeric data. The tool is primarily oriented toward classification problems.
Weka
Publisher: University of Waikato (http://www.cs.waikato.ac.nz/ml/weka/)
Weka is a software environment that integrates several machine-learning tools within a common framework and a uniform GUI. Classification and summarization are the main data-mining tasks supported by the Weka system.