Oh well. I'm not a very experienced monkey-patcher. There may be a better way to do it (*make sure you apply the monkey patch before importing any other scikit-learn modules*).
That part seems pretty obvious now that you mention it ;). Works now, thank you very much!

On Tue, Aug 19, 2014 at 8:55 AM, Joel Nothman wrote:
> Oh well. I'm not a very experienced monkey-patcher. There may be a better
> way to do it (make sure you apply the monkey patch before importing any
> other scikit-learn modules).
>
> On 19 August 2014 16:52, Anders Aagaard wrote:
>> It does work with 1 job.
>>
>> I tried your monkey patch:
>> # joblib.Parallel
>> functools.partial(<class 'sklearn.externals.joblib.parallel.Parallel'>,
>> max_nbytes=None)
>>
>> I still get the same error though.
>>
>> On Tue, Aug 19, 2014 at 8:19 AM, Joel Nothman wrote:
>>> I suspect this is a bug in joblib, and that you won't get it with
>>> n_jobs=1. Joblib employs memmap for inter-process communication if the
>>> array is larger than a fixed size:
>>> https://github.com/joblib/joblib/blob/master/joblib/pool.py#L203. It
>>> seems it needs another criterion to ensure that the data is indeed
>>> memmappable.
>>>
>>> You could monkey-patch joblib's Parallel to be constructed with
>>> max_nbytes=None to disable memmapping (untested):
>>>
>>>     from sklearn.externals import joblib
>>>     from functools import partial
>>>     joblib.Parallel = partial(joblib.Parallel, max_nbytes=None)
>>>     # now import other scikit-learn modules...
>>>
>>> Issue at https://github.com/joblib/joblib/issues/162
>>>
>>> On 19 August 2014 05:05, Anders Aagaard wrote:
>>>> Hi
>>>>
>>>> I've got a reasonably large dataset I'm trying to do a gridsearch on.
>>>> If I feed in a subset of it, it works fine, but if I feed in the entire
>>>> file it dies with: "Array can't be memory-mapped: Python objects in
>>>> dtype."
>>>>
>>>> Now I realize what that's telling me, but I seem to remember building
>>>> pipelines with a countvectorizer in them a ton of times, and feeding
>>>> datasets with columns of strings to my gridsearch's fit methods. Also,
>>>> why would this work on a small file, but not a large one?
>>>>
>>>> I stuck a fake classifier in the top of my pipeline with some print
>>>> statements to find out if it was my pipeline that was causing it, but I
>>>> never get there. So it seems to happen before any of the input data is
>>>> passed to my pipeline.
>>>>
>>>> Backtrace: https://gist.github.com/andaag/f8e4c3df2e41fcc1f84f
>>>>
>>>> Anyone have any ideas what's going on? This is on scikit 0.15.1. The
>>>> dtypes are identical on the large file and the smaller one.
>>>>
>>>> --
>>>> Best regards
>>>> Anders Aagaard

--
Best regards
Anders Aagaard
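
For anyone finding this thread later, here is a minimal sketch of why the patch has to be applied before the other imports. It does not use scikit-learn or joblib at all: `fake_joblib`, `early_consumer`, `late_consumer` and the default value of `max_nbytes` are hypothetical stand-ins, and it assumes the consuming modules bind the name with a `from ... import Parallel` statement.

```python
import sys
import types
from functools import partial


class Parallel:
    """Hypothetical stand-in for joblib's Parallel class."""

    def __init__(self, n_jobs=1, max_nbytes=1000000):  # placeholder default
        self.n_jobs = n_jobs
        self.max_nbytes = max_nbytes


# Register a fake module so that ordinary import statements can find it.
fake_joblib = types.ModuleType("fake_joblib")
fake_joblib.Parallel = Parallel
sys.modules["fake_joblib"] = fake_joblib

# A module imported BEFORE the patch copies the original class
# into its own namespace.
early = types.ModuleType("early_consumer")
exec("from fake_joblib import Parallel", early.__dict__)

# The monkey patch from the thread, applied to the stand-in module.
fake_joblib.Parallel = partial(fake_joblib.Parallel, max_nbytes=None)

# A module imported AFTER the patch picks up the partial object.
late = types.ModuleType("late_consumer")
exec("from fake_joblib import Parallel", late.__dict__)

early_pool = early.Parallel(n_jobs=2)
late_pool = late.Parallel(n_jobs=2)

print(early_pool.max_nbytes)  # 1000000: the early import never sees the patch
print(late_pool.max_nbytes)   # None: the late import gets the patched default
```

One caveat with this style of patch: the replacement is a `functools.partial` object rather than a class, so code that subclasses the patched name or passes it to `isinstance` would break.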
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
