On Wed, Jun 18, 2008 at 8:45 AM, Wacek Kusnierczyk <[EMAIL PROTECTED]>
asked for opinions:

> When the pattern matches the beginning of the search string, the empty
> string is added to the result, but that's not the case when the pattern
> matches the end of the search string:
>
> strsplit(" hello dolly ")
> [1] "" "hello" "dolly"

With R version 2.6.1 Patched (2007-11-26 r43541), I get

    Error in strsplit(" hello dolly ") :
      argument "split" is missing, with no default

But strsplit(" hello dolly ", " ") reproduces your results.

> The man for strsplit explains the algorithm:
>
> "
> The algorithm applied to each input string is
>
>     repeat {
>         if the string is empty
>             break.
>         if there is a match
>             add the string to the left of the match to the output.
>             remove the match and all to the left of it.
>         else
>             add the string to the output.
>             break.
>     }
>
> Note that this means that if there is a match at the beginning of
> a (non-empty) string, the first element of the output is '""', but
> if there is a match at the end of the string, the output is the
> same as with the match removed.
> "

The algorithm, the comment after it, and your results are consistent.
Whether it is intuitive is a matter of taste. I agree it's not as
symmetric as one might like.

> If the pattern matches, (second if above), the match is added to the
> output, and removed from the input -- which after this step is the empty
> string;

Close. The string to the left of the match, "dolly", is added to the
output. I agree, the input is now the empty string.

> in the next step, there is no match (else above), so the rest of
> the input string (= the empty string) *should* be added, but it is not
> what happens.

No, in the next step, the string is empty (first 'if' above), and we
break. The else branch never applies in your example.

> (i see no good
> reason for including the empty string at the beginning but not at the
> end of the output; no other language i know would do that this way)

I checked Perl, and it does exactly the same:

    print join "==", split / /, " hello dolly "
    ==hello==dolly

(that's 3 elements: "", "hello", and "dolly").

Cheers,
/Christian

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
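
For anyone who wants to step through it, the loop quoted from the help page can be sketched in R roughly as follows. This is only an illustration, not the actual strsplit() source: split_sketch is a made-up name, and it assumes a single input string and a fixed (non-regex), non-empty split string.

```r
# Hypothetical helper that mirrors the documented loop; NOT the real
# strsplit() implementation. Assumes one input string and a fixed,
# non-empty 'split' string.
split_sketch <- function(x, split) {
  out <- character(0)
  repeat {
    # First 'if': an empty string ends the loop without adding anything.
    if (nchar(x) == 0) break

    m   <- regexpr(split, x, fixed = TRUE)
    pos <- as.integer(m)               # -1 when there is no match
    len <- attr(m, "match.length")

    if (pos > 0) {
      # Add the string to the left of the match to the output ...
      out <- c(out, substr(x, 1, pos - 1))
      # ... then remove the match and everything to the left of it.
      x <- substr(x, pos + len, nchar(x))
    } else {
      # No match: add what remains and stop.
      out <- c(out, x)
      break
    }
  }
  out
}

split_sketch(" hello dolly ", " ")
# [1] ""      "hello" "dolly"
```

A match at the start yields "" because the string to its left is empty; a match at the end leaves an empty remainder, so the first 'if' breaks before anything more is added. That is the same three elements as strsplit(" hello dolly ", " ")[[1]].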