[Numpy-discussion] Picking rows with the first (or last) occurrence of each key
(I'm probably going to botch the description...)

Suppose I have a 2D array of Python objects. The first n elements of each row form a key; the rest of the elements form the value. Each key can (and generally does) occur multiple times. I'd like to generate a new array consisting of just the first (or last) row for each key occurrence. Rows retain their relative order on output.

For example, suppose I have this array with key length 2:

    [ 'a', 27, 14.5 ]
    [ 'b', 12, 99.0 ]
    [ 'a', 27, 15.7 ]
    [ 'a', 17, 100.3 ]
    [ 'b', 12, -329.0 ]

Selecting the first occurrence of each key would return this array:

    [ 'a', 27, 14.5 ]
    [ 'b', 12, 99.0 ]
    [ 'a', 17, 100.3 ]

while selecting the last occurrence would return this array:

    [ 'a', 27, 15.7 ]
    [ 'a', 17, 100.3 ]
    [ 'b', 12, -329.0 ]

In real life, my array is a bit larger than this example, with the input being on the order of a million rows, and the output being around 5000 rows. Avoiding processing all those extra rows at the Python level would speed things up.

I don't know what this filter might be called (though I'm sure I haven't thought of something new), so searching Google or Bing for it would seem to be fruitless. It strikes me as something which numpy or Pandas might already have in their bag(s) of tricks.

Pointers appreciated,

Skip

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
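For reference, the filter described above can be sketched with pandas, assuming the rows fit in a DataFrame and that the key is the first key_len columns. This is only one possible approach, not something proposed in the thread; the names rows, key_len, first and last are made up for the illustration. drop_duplicates keeps the surviving rows in their original relative order.

```python
import pandas as pd

# The example array from the message, with a key length of 2.
rows = [
    ['a', 27, 14.5],
    ['b', 12, 99.0],
    ['a', 27, 15.7],
    ['a', 17, 100.3],
    ['b', 12, -329.0],
]
key_len = 2

df = pd.DataFrame(rows)
key_cols = list(df.columns[:key_len])  # the first key_len columns form the key

# keep='first' retains the first row seen for each key, keep='last' the last one.
first = df.drop_duplicates(subset=key_cols, keep='first')
last = df.drop_duplicates(subset=key_cols, keep='last')

first_rows = first.values.tolist()
last_rows = last.values.tolist()
print(first_rows)
print(last_rows)
```

The index of each result (first.index, last.index) gives the positions of the selected rows in the original input, which can be used to index back into a NumPy array if the data lives there.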
Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key
Hey Skip,

Any way that you can make your keys numeric? Then you can run np.diff on that first column, and use the indices of nonzero entries (np.flatnonzero) to know where values change. With a +1/-1 offset (that I am too lazy to figure out right now ;) you can then index into the original rows to get either the first or last occurrence of each run.

Juan.
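A minimal sketch of the np.diff idea, under a few assumptions that are not in the message: the string key 'a'/'b' is encoded as 0/1 (a hypothetical encoding), the whole array is numeric, and the rows are first put through a stable sort by key. The sort is an addition here, because np.diff only sees changes between adjacent rows, and the example keys are not contiguous. The offset works out to: a run starts at change index + 1 and ends at the change index itself.

```python
import numpy as np

# Hypothetical numeric encoding of the example: 'a' -> 0, 'b' -> 1.
data = np.array([
    [0, 27, 14.5],
    [1, 12, 99.0],
    [0, 27, 15.7],
    [0, 17, 100.3],
    [1, 12, -329.0],
])
key_len = 2
keys = data[:, :key_len]

# Stable sort by key so that equal keys form adjacent runs, with the
# original row order preserved inside each run. np.lexsort treats the
# LAST key it is given as the primary one, hence the reversal.
order = np.lexsort(keys.T[::-1])
sorted_keys = keys[order]

# Position i is a change point when sorted row i+1 differs from row i
# in any key column.
changed = np.flatnonzero(np.any(np.diff(sorted_keys, axis=0) != 0, axis=1))

# Runs start one past each change point and end at each change point.
run_starts = np.concatenate(([0], changed + 1))
run_ends = np.concatenate((changed, [len(order) - 1]))

# Map back to original row numbers; sorting restores input order.
first_idx = np.sort(order[run_starts])
last_idx = np.sort(order[run_ends])

first_rows = data[first_idx]
last_rows = data[last_idx]
print(first_idx, last_idx)
```

If the input is already grouped by key, the sort can be skipped and np.diff applied to the key columns directly.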