Helper¶
This module provides helper functions and classes for the user. They make working with skorch easier but are not used by skorch itself.
SliceDict¶
A SliceDict
is a wrapper for Python dictionaries that makes
them behave a little bit like numpy.ndarray
s. That way, you
can slice your dictionary across values, len()
will show the
length of the arrays and not the number of keys, and you get a
shape
attribute. This is useful because if your data is in a
dict
, you would normally not be able to use sklearn
GridSearchCV
and similar things;
with SliceDict
, this works.
SliceDataset¶
A SliceDataset
is a wrapper for
PyTorch Dataset
s that makes them behave a little
bit like numpy.ndarray
s. That way, you can slice your
dataset with lists and arrays, and you get a shape
attribute.
These properties are useful because if your data is in a dataset, you
would normally not be able to use sklearn
GridSearchCV
and similar things;
with SliceDataset
, this works.
Note that SliceDataset
can only ever return one of the
values returned by the dataset. Typically, this will be either the X
or the y value. Therefore, if you want to wrap both X and y, you
should create two instances of SliceDataset
, one for X (with
argument idx=0
, the default) and one for y (with argument
idx=1
):
ds = MyCustomDataset()
X_sl = SliceDataset(ds, idx=0) # idx=0 is the default
y_sl = SliceDataset(ds, idx=1)
gs.fit(X_sl, y_sl)
Command line interface helpers¶
Often you want to wrap up your experiments by writing a small script that allows others to reproduce your work. With the help of skorch and the fire library, it becomes very easy to write command line interfaces without boilerplate. All arguments pertaining to skorch or its PyTorch module are immediately available as command line arguments, without the need to write a custom parser. If docstrings in the numpydoc specification are available, there is also an comprehensive help for the user. Overall, this allows you to make your work reproducible without the usual hassle.
There is an example in the skorch repository that shows how to use the CLI tools. Below is a snippet that shows the output created by the help function without writing a single line of argument parsing:
$ python examples/cli/train.py pipeline --help
<SelectKBest> options:
--select__score_func : callable
Function taking two arrays X and y, and returning a pair of arrays
(scores, pvalues) or a single array with scores.
Default is f_classif (see below "See also"). The default function only
works with classification tasks.
--select__k : int or "all", optional, default=10
Number of top features to select.
The "all" option bypasses selection, for use in a parameter search.
...
<NeuralNetClassifier> options:
--net__module : torch module (class or instance)
A PyTorch :class:`~torch.nn.Module`. In general, the
uninstantiated class should be passed, although instantiated
modules will also work.
--net__criterion : torch criterion (class, default=torch.nn.NLLLoss)
Negative log likelihood loss. Note that the module should return
probabilities, the log is applied during ``get_loss``.
--net__optimizer : torch optim (class, default=torch.optim.SGD)
The uninitialized optimizer (update rule) used to optimize the
module
--net__lr : float (default=0.01)
Learning rate passed to the optimizer. You may use ``lr`` instead
of using ``optimizer__lr``, which would result in the same outcome.
--net__max_epochs : int (default=10)
The number of epochs to train for each ``fit`` call. Note that you
may keyboard-interrupt training at any time.
--net__batch_size : int (default=128)
...
--net__verbose : int (default=1)
Control the verbosity level.
--net__device : str, torch.device (default='cpu')
The compute device to be used. If set to 'cuda', data in torch
tensors will be pushed to cuda tensors before being sent to the
module.
<MLPClassifier> options:
--net__module__hidden_units : int (default=10)
Number of units in hidden layers.
--net__module__num_hidden : int (default=1)
Number of hidden layers.
--net__module__nonlin : torch.nn.Module instance (default=torch.nn.ReLU())
Non-linearity to apply after hidden layers.
--net__module__dropout : float (default=0)
Dropout rate. Dropout is applied between layers.
Installation¶
To use this functionality, you need some further libraries that are not part of skorch, namely fire and numpydoc. You can install them thusly:
python -m pip install fire numpydoc
Usage¶
When you write your own script, only the following bits need to be added:
import fire
from skorch.helper import parse_args
# your model definition and data fetching code below
...
def main(**kwargs):
X, y = get_data()
my_model = get_model()
# important: wrap the model with the parsed arguments
parsed = parse_args(kwargs)
my_model = parsed(my_model)
my_model.fit(X, y)
if __name__ == '__main__':
fire.Fire(main)
This even works if your neural net is part of an sklearn pipeline, in which case the help extends to all other estimators of your pipeline.
In case you would like to change some defaults for the net (e.g. using
a batch_size
of 256 instead of 128), this is also possible. You
should have a dictionary containing your new defaults and pass it as
an additional argument to parse_args
:
my_defaults = {'batch_size': 128, 'module__hidden_units': 30}
def main(**kwargs):
...
parsed = parse_args(kwargs, defaults=my_defaults)
my_model = parsed(my_model)
This will update the displayed help to your new defaults, as well as
set the parameters on the net or pipeline for you. However, the
arguments passed via the commandline have precedence. Thus, if you
additionally pass --batch_size 512
to the script, batch size will
be 512.
Restrictions¶
Almost all arguments should work out of the box. Therefore, you get
command line arguments for the number of epochs, learning rate, batch
size, etc. for free. Moreover, you can access the module parameters
with the double-underscore notation as usual with skorch
(e.g. --module__num_units 100
). This should cover almost all
common cases.
Parsing command line arguments that are non-primitive Python objects
is more difficult, though. skorch’s custom parsing should support
normal Python types and simple custom objects, e.g. this works:
--module__nonlin 'torch.nn.RReLU(0.1, upper=0.4)'
. More complex
parsing might not work. E.g., it is currently not possible to add new
callbacks through the command line (but you can modify existing ones
as usual).