On Tue, 24 Apr 2012, Ulrich Weigand wrote: > Hello, > > PR 52633 is caused by bad interaction between two different vectorizer > pattern recognition passed. A minimal test case is: > > void > test (unsigned short *x, signed char *y) > { > int i; > for (i = 0; i < 32; i++) > x[i] = (short) (y[i] << 5); > } > > built with "cc1 -O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp" > on a arm-linux-gnueabi target. > > Before the vectorizer, we have something like: > > short unsigned int D.4976; > int D.4975; > int D.4974; > signed char D.4973; > signed char * D.4972; > short unsigned int * D.4970; > [snip] > D.4973_8 = *D.4972_7; > D.4974_9 = (int) D.4973_8; > D.4975_10 = D.4974_9 << 5; > D.4976_11 = (short unsigned int) D.4975_10; > *D.4970_5 = D.4976_11; > > > The pattern recognizer now goes through its list of patterns it tries > to detect. The first successful one is vect_recog_over_widening_pattern. > This will annotate the statements (via related_stmt fields): > > D.4973_8 = *D.4972_7; > D.4974_9 = (int) D.4973_8; > --> D.4992_26 = (signed short) D.4973_8 > D.4975_10 = D.4974_9 << 5; > --> patt.16_31 = D.4992_26 << 5 > D.4976_11 = (short unsigned int) D.4975_10; > --> D.4994_32 = (short unsigned int) patt.16_31 > *D.4970_5 = D.4976_11; > > > In the next step, vect_recog_widen_shift_pattern *also* matches, and > creates a new annotation for the shift statement (using a widening > shift): > > D.4973_8 = *D.4972_7; > D.4974_9 = (int) D.4973_8; > --> D.4992_26 = (signed short) D.4973_8 > D.4975_10 = D.4974_9 << 5; > [--> patt.16_31 = D.4992_26 << 5] > --> patt.17_33 = D.4973_8 w<< 5 > D.4976_11 = (short unsigned int) D.4975_10; > --> D.4994_32 = (short unsigned int) patt.16_31 > *D.4970_5 = D.4976_11; > > > Since the original statement can only point to a single related_stmt, > the statement setting patt.16_31 is now longer refered to as related_stmt > by any other statement. This causes it to no longer be considered > relevant for the vectorizer. > > However, the statement: > --> D.4994_32 = (short unsigned int) patt.16_31 > *is* still considered relevant. While analysing it, however, the > vectorizer follows through to the def statement for patt.16_31, > and gets quite confused to find that it doesn't have a vectype > (because it wasn't considered by the vectorizer). The symptom > is a failing assertion > gcc_assert (*vectype != NULL_TREE); > in vect_is_simple_use_1. > > > Now, it seems quite unusual for multiple patterns to match for a > single original statement. In fact, most pattern recognizers > explicitly refuse to even consider statements that were already > recognized. However, vect_recog_widen_shift_pattern makes an > exception: > > /* This statement was also detected as over-widening operation (it can't > be any other pattern, because only over-widening detects shifts). > LAST_STMT is the final type demotion statement, but its related > statement is shift. We analyze the related statement to catch cases: > > orig code: > type a_t; > itype res; > TYPE a_T, res_T; > > S1 a_T = (TYPE) a_t; > S2 res_T = a_T << CONST; > S3 res = (itype)res_T; > > (size of type * 2 <= size of itype > and size of itype * 2 <= size of TYPE) > > code after over-widening pattern detection: > > S1 a_T = (TYPE) a_t; > --> a_it = (itype) a_t; > S2 res_T = a_T << CONST; > S3 res = (itype)res_T; <--- LAST_STMT > --> res = a_it << CONST; > > after widen_shift: > > S1 a_T = (TYPE) a_t; > --> a_it = (itype) a_t; - redundant > S2 res_T = a_T << CONST; > S3 res = (itype)res_T; > --> res = a_t w<< CONST; > > i.e., we replace the three statements with res = a_t w<< CONST. */ > > > If everything were indeed as described in that comment, things would work out > fine. However, what is described above as "code after over-widening pattern > detection" is only one of two possible outcomes of that latter pattern; the > other is the one that happens in the current test case, where we still have > a final type conversion left after applying the over-widening pattern. > > > I guess one could try to distiguish the two cases somehow and handle both; > but the overall approach seems quite fragile to me; it doesn't look really > maintainable to have to rely on so many details of the operation of one > particular pattern detection function while writing another one, or else > risk creating subtle problems like the one described above. > > So I was wondering why vect_recog_widen_shift_pattern tries to take advantage > of an already recognized over-widening pattern. But indeed, if it does not, > it will generate less efficient code in cases like the above test case: by > itself vect_recog_widen_shift_pattern, would generate code to expand the > char to a short, then do a widening shift resulting in an int, followed > by narrowing back down to a short. > > > However, even so, it might actually be preferable to just handle such > cases within vect_recog_widen_shift_pattern itself. Indeed, the routine > already looks for another subsequent type cast, in order to handle > unsigned shift variants. Maybe it simply ought to always look for > another cast, and detect over-widening situations itself? > > > Does this look reasonable? Any comments or suggestions appreciated!
Yes, getting rid of this fragile interaction by doing more work in vect_recog_widen_shift_pattern sounds like the correct thing to do. Richard.