Out of the Box: Machine Learning

The Talend (Real-Time) Big Data Platform includes a large assortment of Machine Learning components that let you perform analysis directly in Talend Studio without custom coding.

The table below lists each Machine Learning component along with a short description. For complete information on the Machine Learning components, see the Talend documentation.

Note: The components listed below are current as of version 6.2.1.

| Name | Component | Description |
| --- | --- | --- |
| ALS (Alternating Least Squares) | tALSModel | Processes information received from Spark and performs ALS computations over these data sets to generate and write a product recommender model (user ranking) in Parquet format |
| Bayes (Naïve) | tNaiveBayesModel | Analyzes data sets, applies Bayes’ law with a naïve assumption, and generates a classification model in PMML (Predictive Model Markup Language) format |
| Classification | tClassify | Uses a given model to classify elements in the dataset |
| Classification (Support Vector Machine) | tClassifySVM | Uses an SVM (Support Vector Machine) model to classify elements in the dataset |
| Decision Trees | tDecisionTreeModel | Uses the Decision Tree algorithm to generate a classification model |
| Gradient Boosted Tree Model | tGradientBoostedTreeModel | Generates a binary classification model |
| K-Means | tKMeansModel | Analyzes incoming datasets and applies the K-means algorithm to produce a clustering model |
| K-Means (Streaming) | tKMeansStrModel | Analyzes incoming datasets and applies the K-means algorithm in real time |
| Linear Regression | tLinearRegressionModel | Builds a linear regression model using a training dataset |
| Logistic Regression | tLogisticRegressionModel | Analyzes incoming datasets and applies the Logistic Regression algorithm to produce a classification model |
| Model Encoder | tModelEncoder | Can apply a wide range of feature processing algorithms: HashingTF, Inverse document frequency, Word2Vec, CountVectorizer, Binarizer, Bucketizer, Discrete Cosine Transform (DCT), MinMaxScaler, N-gram, Normalizer, One-hot encoder, PCA, Polynomial expansion, Quantile Discretizer, Regex tokenizer, Tokenizer, SQL Transformer, Standard scaler, StopWordsRemover, String indexer, Vector indexer, Vector assembler, ChiSqSelector, RFormula, VectorSlicer |
| Predict | tPredict | Uses a given classification, clustering, or relationship model to analyze datasets |
| Predict (Cluster) | tPredictCluster | Uses a given clustering model to sort datasets into different clusters |
| Random Forest Model | tRandomForestModel | Analyzes incoming datasets and applies the Random Forest algorithm |
| Recommend | tRecommend | Analyzes incoming data in conjunction with ALS computations using a user-defined recommendation model |
| SVM (Support Vector Machine) | tSVMModel | Applies the SVM algorithm to analyze feature vectors |
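
Many of the components above come in pairs: a "Model" component (such as tKMeansModel) that learns a model from training data, and a prediction component (such as tPredictCluster) that applies that model to new records. For readers curious what that split looks like under the hood, here is a minimal plain-Python sketch of K-means clustering. It is not Talend's or Spark's implementation: the function names `train_kmeans` and `predict_cluster`, the sample points, and the simple evenly spaced choice of starting centroids are all assumptions made for illustration.

```python
import math


def predict_cluster(centroids, point):
    """Return the index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], point))


def train_kmeans(points, k, iterations=10):
    """Fit k centroids to the points and return them as the 'model'.

    Assumption for this sketch: starting centroids are taken at evenly spaced
    positions in the input list, which keeps the example deterministic.
    Production libraries use smarter, randomized initialization.
    """
    centroids = [tuple(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: group every point under its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[predict_cluster(centroids, p)].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


# Hypothetical training data: two well-separated groups of 2-D points.
training = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (9.0, 9.0), (8.8, 9.2), (9.1, 8.9)]

model = train_kmeans(training, k=2)          # the "model building" step
print(predict_cluster(model, (1.1, 0.9)))    # the "prediction" step on a new record
print(predict_cluster(model, (8.5, 9.5)))
```

The point of the sketch is the separation of concerns: training produces a small, reusable artifact (here, just a list of centroids), and prediction only needs that artifact plus the new record. That is the same reason the Talend components write their models out in formats such as Parquet or PMML so that a later job can pick them up.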
