> Oh well. I'm not a very experienced monkey-patcher. There may be a better
> way to do it (*make sure you apply the monkey patch before importing any
> other scikit-learn modules*).

That part seems pretty obvious now that you mention it ;). Works now, thank
you very much!
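
For anyone finding this thread later, here is a minimal sketch of why the order matters. It does not use scikit-learn or joblib at all: `fakelib` and its `Parallel` class are hypothetical stand-ins, and the default value of `max_nbytes` is a placeholder. The point is only that `from module import name` copies the binding at import time, so a module that imported the name before the patch keeps the unpatched object.

```python
import sys
import types
from functools import partial

# Hypothetical stand-in for the library module (plays the role of joblib).
lib = types.ModuleType("fakelib")


class Parallel:
    """Toy class; the default max_nbytes here is a placeholder value."""

    def __init__(self, n_jobs=1, max_nbytes=1000000):
        self.n_jobs = n_jobs
        self.max_nbytes = max_nbytes


lib.Parallel = Parallel
sys.modules["fakelib"] = lib  # make `from fakelib import ...` resolvable

# A consumer that imported the name BEFORE the patch was applied.
from fakelib import Parallel as early_binding

# The monkey patch: rebind the module attribute to a pre-configured partial.
lib.Parallel = partial(lib.Parallel, max_nbytes=None)

# A consumer that imports the name AFTER the patch was applied.
from fakelib import Parallel as late_binding

early = early_binding(n_jobs=2)
late = late_binding(n_jobs=2)

print(early.max_nbytes)  # still the original default
print(late.max_nbytes)   # the patched value, None
```

The early import still sees the original class, which would match what I saw when the patch was applied too late: the repr showed the partial, but the code doing the work had already grabbed the unpatched Parallel.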


On Tue, Aug 19, 2014 at 8:55 AM, Joel Nothman <[email protected]>
wrote:

> Oh well. I'm not a very experienced monkey-patcher. There may be a better
> way to do it (make sure you apply the monkey patch before importing any
> other scikit-learn modules).
>
>
> On 19 August 2014 16:52, Anders Aagaard <[email protected]> wrote:
>
>> It does work with 1 job.
>>
>> I tried your monkey patch:
>> # joblib.Parallel
>> functools.partial(<class 'sklearn.externals.joblib.parallel.Parallel'>,
>> max_nbytes=None)
>>
>> I still get the same error though.
>>
>>
>>
>> On Tue, Aug 19, 2014 at 8:19 AM, Joel Nothman <[email protected]>
>> wrote:
>>
>>> I suspect this is a bug in joblib, and that you won't get it with
>>> n_jobs=1. Joblib employs memmap for inter-process communication if the
>>> array is larger than a fixed size:
>>> https://github.com/joblib/joblib/blob/master/joblib/pool.py#L203. It
>>> seems it needs another criterion to ensure that the data is indeed
>>> memmappable.
>>>
>>> You could monkey-patch joblib's Parallel to be constructed with
>>> max_nbytes=None to disable memmapping (untested):
>>>
>>> from sklearn.externals import joblib
>>> from functools import partial
>>> joblib.Parallel = partial(joblib.Parallel, max_nbytes=None)
>>> # now import other scikit-learn modules...
>>>
>>>
>>> Issue at https://github.com/joblib/joblib/issues/162
>>>
>>>
>>> On 19 August 2014 05:05, Anders Aagaard <[email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> I've got a reasonably large dataset I'm trying to do a grid search on.
>>>> If I feed in a subset of it, it works fine, but if I feed in the entire file
>>>> it dies with: "Array can't be memory-mapped: Python objects in dtype.".
>>>> Now I realize what that's telling me, but I seem to remember building
>>>> pipelines with a CountVectorizer in them a ton of times, and feeding datasets
>>>> with columns of strings to my grid searches' fit methods. Also, why would this
>>>> work on a small file, but not a large one?
>>>>
>>>> I stuck a fake classifier in the top of my pipeline with some print
>>>> statements to find out if it was my pipeline that was causing it, but I
>>>> never get there. So it seems to be before any of the input data is passed
>>>> to my pipeline.
>>>>
>>>> Backtrace: https://gist.github.com/andaag/f8e4c3df2e41fcc1f84f
>>>>
>>>> Anyone have any ideas what's going on? This is on scikit-learn 0.15.1. The
>>>> dtypes are identical on the large file and the smaller one.
>>>>
>>>> --
>>>> Best regards
>>>> Anders Aagaard
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>
>>
>> --
>> Best regards
>> Anders Aagaard


-- 
Best regards
Anders Aagaard
