4. Overview

Note

Throughout this overview, and in certain other sections, the examples provided are for Files-only/ development installations. This is purely for simplicity: it lets you use the inbuilt example and sample files rather than having to define your own cleaning, learning, and inferring scripts.

If you are not using the Files-only/ development installation, you will have to point Nemesyst to the cleaners, learners, predictors, etc. that you want to use. Even if you are using Files-only/ development, once you have better understood and tested Nemesyst you should likely move to creating the scripts you require and to using a normal installation of Nemesyst, such as one of the Automated examples.

4.1. Nemesyst literal un-abstract stages

This image is a use case example of Nemesyst applied to a distributed refrigeration fleet over multiple sites, with both online and offline learning occurring simultaneously.

Nemesyst has been made generic enough to handle many possible configurations, but it cannot handle every possible scenario. Sometimes it may be necessary to manually configure certain aspects of the process, especially regarding MongoDB, as it is a well-developed, mature database with more features than we could, or should, automate.

4.2. Nemesyst Abstraction of stages

Nemesyst stages of data from input to output.

Nemesyst has abstracted, grouped, and formalised what we believe are the core stages of applying deep learning at all scales.

Deep learning can be said to include three stages: data wrangling, test-training, and inferring. Nemesyst adds an extra layer we call serving. This is the stage at which databases act as the message passing interface (MPI) and generator between the layers, machines, and algorithms, as well as being the storage mechanism for data and models.

4.3. Nemesyst Parallelisation

As of: 2.0.1.r6.f9f92c3

Nemesyst round depiction diagram, showing the order and values of rounds.

Nemesyst parallelises each script, up to the maximum number of processes in the process pool.

Local parallelisation of your scripts occurs using Python's process pools from multiprocessing. This diagram shows how the rounds of processing are abstracted, and their order. Rounds do not continue between stages, i.e. if there is a spare process but not enough scripts left in the current stage (e.g. cleaning), Nemesyst will not fill it with a script from the next stage (e.g. learning). This prevents a learning script from starting before a cleaning script whose output it may depend on has finished.
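The round behaviour described above can be pictured with a minimal sketch. This is not Nemesyst's implementation: the function names (`plan_rounds`, `run_script`, `run_rounds`), the script file names, and the assumption that a round waits for all of its scripts before the next round begins are all hypothetical, chosen only to illustrate that a round never mixes scripts from two stages.

```python
from multiprocessing import Pool


def plan_rounds(stages, pool_size):
    """Split each stage's scripts into rounds of at most pool_size scripts.

    A round never contains scripts from two different stages, so a spare
    process at the end of one stage is left idle rather than given work
    from the next stage.
    """
    rounds = []
    for stage_name, scripts in stages:
        for start in range(0, len(scripts), pool_size):
            rounds.append((stage_name, scripts[start:start + pool_size]))
    return rounds


def run_script(script_name):
    # Hypothetical stand-in for executing one user script.
    return f"ran {script_name}"


def run_rounds(stages, pool_size):
    """Run every round in order using a multiprocessing process pool."""
    results = []
    with Pool(processes=pool_size) as pool:
        for stage_name, batch in plan_rounds(stages, pool_size):
            # pool.map blocks until every script in this round has finished.
            results.append((stage_name, pool.map(run_script, batch)))
    return results


if __name__ == "__main__":
    # Hypothetical script names, for illustration only.
    example_stages = [
        ("cleaning", ["clean_a.py", "clean_b.py", "clean_c.py"]),
        ("learning", ["learn_a.py"]),
    ]
    for stage_name, batch in plan_rounds(example_stages, pool_size=2):
        print(stage_name, batch)
```

With a pool of two processes and three cleaning scripts, the second cleaning round holds a single script, and the learning script waits for its own round instead of taking the spare process.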

4.4. Wrangling / cleaning

See All Options by Category for a full list of options.

Wrangling is the stage where the data is cleaned into single atomic examples to be imported to the database.

Files-only/ development example:

nemesyst
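To make "single atomic examples" more concrete, here is a hedged sketch of what a cleaning step might do. It is only one possible reading of the term. The column names (`case_id`, `timestamp`, `temperature`), the sample data, and the function `to_atomic_examples` are hypothetical and are not part of Nemesyst.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw sensor log; the columns are assumptions for illustration.
RAW_CSV = """case_id,timestamp,temperature
a,0,4.1
a,1,4.3
b,0,3.9
"""


def to_atomic_examples(csv_text, key="case_id"):
    """Group raw rows into one self-contained document per example."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row[key]].append(
            {
                "timestamp": int(row["timestamp"]),
                "temperature": float(row["temperature"]),
            }
        )
    # Each resulting dictionary could be imported as a single database document.
    return [{key: name, "readings": rows} for name, rows in grouped.items()]


if __name__ == "__main__":
    for example in to_atomic_examples(RAW_CSV):
        print(example)
```

Each returned dictionary holds everything needed for one example, which is what makes it suitable for import as an independent unit.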

4.5. Serving

See All Options by Category for a full list of options.

Nemesyst database serving puzzle diagram.

Serving is the stage where the data, and eventually the trained models, are stored and passed to other processes, potentially on other machines.

Nemesyst uses MongoDB databases, through PyMongo, as both a data store and a distribution mechanism. The database(s) are among the most important parts of the chain of processes, as nothing can operate without a properly functioning database. We have therefore attempted to simplify operations, on both the user-script side and our side, by abstracting the slightly raw PyMongo interface into a much friendlier class of operations called Mongo.

A Mongo object is automatically passed into the entry point of each of your scripts, so that you can easily operate on the database if you choose to, although aside from our data generator we handle the majority of use cases before they reach your scripts.

Automated example:

# creating basic non-config, non-replica, localhost, mongodb instance
nemesyst --db-init --db-start --db-login --db-stop \
         --db-user-name USERNAME --db-password \
         --db-path DBPATH --db-log-path DBPATH/LOGDIR

Note

Please see Serving with MongoDB for a more in-depth treatment of serving with Nemesyst.

4.6. Learning

See All Options by Category for a full list of options.

Learning is the stage where the data is used to train new models or to update an existing model already in the database.

Files-only/ development example:

nemesyst

Warning

Special attention should be paid to the size of the resultant neural networks. Beyond a certain size it will be necessary to store them as GridFS objects. Basic GridFS functionality is included in nemesyst’s Mongo; however, it is still experimental and should not be depended upon at this time.
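As a rough guide to when this applies, MongoDB limits a single BSON document to 16 MiB, and anything larger has to be stored through GridFS. The sketch below checks a serialised model against that limit. The helper `needs_gridfs`, the headroom value, and the use of pickle for serialisation are assumptions for illustration; how Nemesyst itself serialises models is not specified here.

```python
import pickle

# MongoDB's maximum BSON document size (16 MiB).
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024


def needs_gridfs(model_bytes, headroom_bytes=1024 * 1024):
    """Return True if the serialised model is too large for one document.

    headroom_bytes is a hypothetical safety margin for the other fields
    stored alongside the model in the same document.
    """
    return len(model_bytes) > MAX_BSON_DOCUMENT_BYTES - headroom_bytes


if __name__ == "__main__":
    # A small stand-in "model"; pickle is an assumed serialisation format.
    small_model = pickle.dumps({"weights": [0.0] * 1000})
    large_model = bytes(17 * 1024 * 1024)
    print(needs_gridfs(small_model))
    print(needs_gridfs(large_model))
```

Checking the size before writing lets a learning script fail early, or switch storage strategy, rather than hitting the document limit at insert time.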

4.7. Inferring / predicting

As of: 2.0.2.r7.1cf3eab

See All Options by Category for a full list of options.

Inferring is the stage where the model(s) are used to predict on newly provided data.

Files-only/ development example:

nemesyst