I suspect this is a bug in joblib, and that you won't get it with n_jobs=1.
Joblib uses memmap for inter-process communication if the array is
larger than a fixed size:
https://github.com/joblib/joblib/blob/master/joblib/pool.py#L203. That would
explain why a small subset works but the full file fails: only arrays above
the threshold get memmapped. It seems joblib needs another criterion to
ensure that the data is indeed memmappable.
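
To illustrate the kind of extra criterion I mean, here is a minimal sketch.
It is not joblib's actual code: the function name should_memmap and the
threshold value are made up for the example. The only real API it relies on
is NumPy's dtype.hasobject, which is True for arrays that hold Python
objects (such as an object-dtype column of strings).

```python
import numpy as np

# Hypothetical size threshold in bytes; joblib's real default may differ.
MAX_NBYTES = 1_000_000


def should_memmap(arr, max_nbytes=MAX_NBYTES):
    """Return True only if arr is above the threshold AND memmappable."""
    if max_nbytes is None:
        # Memmapping disabled altogether.
        return False
    if arr.nbytes <= max_nbytes:
        # Small arrays are passed to workers without memmapping.
        return False
    # Arrays containing Python objects cannot be memory-mapped.
    return not arr.dtype.hasobject


small_strings = np.array(["a", "b"], dtype=object)
large_strings = np.array(["some text"] * 500_000, dtype=object)
large_floats = np.zeros(500_000, dtype=np.float64)

print(should_memmap(small_strings))
print(should_memmap(large_strings))
print(should_memmap(large_floats))
```

With a size-only check, large_strings would be sent down the memmap path and
fail, while small_strings would never reach it, which matches the behaviour
reported below.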

You could monkey-patch joblib's Parallel to be constructed with
max_nbytes=None to disable memmapping (untested):

from sklearn.externals import joblib
from functools import partial
joblib.Parallel = partial(joblib.Parallel, max_nbytes=None)
# now import other scikit-learn modules...
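
One caveat about how such a patch behaves, sketched with a stand-in class
rather than joblib itself. FakeParallel is hypothetical and only mimics a
constructor that accepts max_nbytes; its default value is an assumption.
The point is that functools.partial only changes the default, so a caller
that passes max_nbytes explicitly still overrides it.

```python
from functools import partial


class FakeParallel:
    """Hypothetical stand-in for a class whose constructor takes max_nbytes."""

    def __init__(self, n_jobs=1, max_nbytes=1_000_000):
        self.n_jobs = n_jobs
        self.max_nbytes = max_nbytes


# Same pattern as the patch above: pre-bind max_nbytes=None.
Patched = partial(FakeParallel, max_nbytes=None)

# Caller does not mention max_nbytes: the pre-bound None is used.
default_call = Patched(n_jobs=2)

# Caller passes max_nbytes explicitly: the caller's value wins.
explicit_call = Patched(n_jobs=2, max_nbytes=10)

print(default_call.max_nbytes, explicit_call.max_nbytes)
```

So the patch should only help where Parallel is constructed without an
explicit max_nbytes. Note also that the patched name is a partial object,
not a class, so any code that subclasses it or uses it in isinstance checks
would break.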


Issue at https://github.com/joblib/joblib/issues/162


On 19 August 2014 05:05, Anders Aagaard <[email protected]> wrote:

> Hi
>
> I've got a reasonably large dataset I'm trying to do a gridsearch on. If I
> feed in a subset of it it works fine, but if I feed in the entire file it
> dies with : "Array can't be memory-mapped: Python objects in dtype.". Now I
> realize what that's telling me, but I seem to remember building pipelines
> with a countvectorizer in it a ton of times, and feeding datasets with
> columns of strings to my gridsearches fit methods. Also why would this work
> on a small file, but not a large one?
>
> I stuck a fake classifier in the top of my pipeline with some print
> statements to find out if it was my pipeline that was causing it, but I
> never get there. So it seems to be before any of the input data is passed
> to my pipeline.
>
> Backtrace : https://gist.github.com/andaag/f8e4c3df2e41fcc1f84f
>
> Anyone have any ideas what's going on? This is on scikit 0.15.1. The dtypes
> are identical on the large file and the smaller one.
>
> --
> Best regards
> Anders Aagaard
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>