Python module to perform under sampling and over sampling with various techniques.
You just need to reshape the data properly. I once worked with time-series activity data that I split into chunks of N time steps. Each input had shape (1, 100, 4), so the training set had shape (n_samples, 1, 100, 4). It was a five-class, multi-minority problem that I wanted to oversample using SMOTE.
The way I went about it was to flatten the input, like so:
# Reshape (flatten) Train_X to 2D for SMOTE resampling
from imblearn.over_sampling import SMOTE
nsamples, k, nx, ny = Train_X.shape
Train_X = Train_X.reshape((nsamples, k * nx * ny))
smote = SMOTE(sampling_strategy='not majority', random_state=42, k_neighbors=5)
X_resample, Y_resample = smote.fit_resample(Train_X, Train_Y)
And then reshape the instances back to the original input shape, like so:
# Reshape the input back to the CNN structure
X_resample = X_resample.reshape(len(X_resample), k, nx, ny)
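To check that the flatten-and-restore round trip does not scramble the windows, here is a minimal NumPy-only sketch. The random array is a hypothetical stand-in for Train_X, and the single interpolated row only imitates the kind of sample SMOTE generates (a point between two existing samples); it is not a replacement for the sampler itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for Train_X: 6 windows, each of shape (1, 100, 4)
Train_X = rng.normal(size=(6, 1, 100, 4))
nsamples, k, nx, ny = Train_X.shape

# Flatten every window to one row, as a 2D-only sampler expects
flat = Train_X.reshape(nsamples, k * nx * ny)

# Stand-in for a SMOTE-style synthetic sample: a point on the segment
# between two existing flattened windows (lam is an arbitrary choice)
lam = 0.3
synthetic = flat[0] + lam * (flat[1] - flat[0])
flat_resampled = np.vstack([flat, synthetic])

# Restore the 4D layout expected by the CNN
restored = flat_resampled.reshape(len(flat_resampled), k, nx, ny)

print(restored.shape)
```

Both reshapes use the same row-major order, so the original windows come back unchanged and the synthetic row maps to an element-wise interpolation of two windows.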
sampling_strategy parameter as well.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
model = Pipeline(steps=[
    ("preprocessor", StandardScaler()),
    ("classifier", KNeighborsClassifier(n_neighbors=5)),
])
preprocessor step
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import QuantileTransformer
from sklearn.preprocessing import PowerTransformer
# Candidate preprocessors; None skips the preprocessing step
all_preprocessors = [
    None,
    StandardScaler(),
    MinMaxScaler(),
    QuantileTransformer(n_quantiles=100),
    PowerTransformer(method="box-cox"),  # box-cox requires strictly positive features
]
param_grid = {
    "preprocessor": all_preprocessors,
}
search_cv = GridSearchCV(model, param_grid=param_grid)
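For anyone who wants to run the search end to end, here is a small self-contained sketch. The iris dataset is my own choice for illustration (its features are strictly positive, which box-cox needs); it is not part of the snippet above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (
    MinMaxScaler,
    PowerTransformer,
    QuantileTransformer,
    StandardScaler,
)

# Assumed example data: all iris features are strictly positive
X, y = load_iris(return_X_y=True)

model = Pipeline(steps=[
    ("preprocessor", StandardScaler()),
    ("classifier", KNeighborsClassifier(n_neighbors=5)),
])

# Each entry replaces the whole "preprocessor" step; None skips it
param_grid = {
    "preprocessor": [
        None,
        StandardScaler(),
        MinMaxScaler(),
        QuantileTransformer(n_quantiles=100),
        PowerTransformer(method="box-cox"),
    ],
}

search_cv = GridSearchCV(model, param_grid=param_grid, cv=5)
search_cv.fit(X, y)

print(search_cv.best_params_)
print(search_cv.best_score_)
```

The grid key is the step name itself, so the search swaps the entire transformer instead of tuning one of its hyperparameters.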
Pipeline from imblearn.pipeline such that it can handle samplers