To be honest, I've only used the hex values as that was the format in which the patterns were passed to me.
However from your explanation, I now understand what's going on. I didn't appreciate that the characters were passed to another layer and not seeing the hex code as the raw backslash. Many thanks for the explanation. > On 25 Aug 2020, at 20:53, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote: > > In my opinion, using hexadecimal ASCII is much more obscure than simply > using the escape character properly... that is, you are doing no-one any > favors by using them. But to attain clarity here, you need to envision what > the various software layers are doing. > > In your case, SQLServer may not utilize escape character, but it is passing > your R code to the R interpreter, which does use the escape character to > convert source code into strings in memory, which are then passed into the > regex parser, which is the final layer that also handles the same escape > character. > > What may be confusing you is the distinction between what is in memory that > the regex parser sees: > > ["',?\\`] > > and what the R string literal looks like that you should type to get this > string into memory: > > "[\"',?\\\\`]" > > When you pass the latter literal to the cat() function, it will show you the > former version. When you have the literal stored in memory, you can use the > print() function to see what you have to type as a literal string to get the > in-memory version. I use this trick (cat) to help me zero in on what is > actually getting passed to the regex engine when I have difficulty > envisioning what is going on. > > The regex engine needs that doubled backslash to recognize that _it_ should > not give special treatment to the \ there, and should look for it in the > input data. > >> On August 25, 2020 12:16:35 PM PDT, Peter Bishop <bishop_pet...@hotmail.com> >> wrote: >> The feed is coming from a SQL table and this is using the embedded >> support for R which comes with SQL 2016. The source is therefore a >> SELECT statement. >> >> >> As an aside, I found a workaround by changing the pattern from: >> >> >> "[\x22\x27\x2c\x3f\x5c\x60]" >> >> >> to: >> >> >> "[\x22\x27\x2c\x3f\x5c\x5c\x60]" >> >> >> This seems to be escaping the backslash in the R script rather than in >> the data - which confuses me. >> >> ________________________________ >> From: Bert Gunter <bgunter.4...@gmail.com> >> Sent: Wednesday, 26 August 2020 4:26 AM >> To: Peter Bishop <bishop_pet...@hotmail.com> >> Cc: r-help@r-project.org <r-help@r-project.org> >> Subject: Re: [R] Matching backslash in a table's column using R >> language >> >> 1. I am far from an expert on such matters >> 2. It is unclear to me what your input is -- I assume a file. >> >> The problem, as you indicate, is that R's parser sees "\B" as an >> incorrect escape character, so, for example: >>> cat("\B") >> Error: '\B' is an unrecognized escape in character string starting >> ""\B" >> >> In any case, I think you should look at ?scan. Here is an example where >> I scan from the keyboard first and then remove the "\". You may have to >> scan from a file to do this. >> >>> z <-scan(file = "", what = "character") >> 1: A\BCDEFG >> 2: #CR terminates input >> Read 1 item >> >>> cat(z) >> A\BCDEFG >> >>> nchar(z) >> [1] 8 ## scan read in the "\" as a single character from the console. >> >>> sub("\\\\","",z) ## Yes, 4 backslashes >> [1] "ABCDEFG" >> >> There may be better ways to do this, but as I said, I'm no expert. >> >> BTW, in posting here, please post in *plain text,* as the server can >> mangle html. >> >> Cheers, >> Bert >> >> >> Bert Gunter >> >> "The trouble with having an open mind is that people keep coming along >> and sticking things into it." >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >> >> On Tue, Aug 25, 2020 at 9:02 AM Peter Bishop >> <bishop_pet...@hotmail.com<mailto:bishop_pet...@hotmail.com>> wrote: >> In SQL, I'm using R as a way to filter data based on: >> - 20 characters in the range <space> to <tilde> >> - excluding <quote>, <apostrophe>, <comma>, <question mark>, >> <backslash>, <backtick> >> >> Given a SQL column containing the data: >> >> code >> ---- >> A\BCDEFG >> >> and the T-SQL script: >> >> EXEC [sys].[sp_execute_external_script] >> @language=N'R', >> @script=N' >> pattern1 = "^[\x20-\x7e]{1,20}$" >> pattern2 = "[\x22\x27\x2c\x3f\x5c\x60]" >> >> outData <- subset(inData, grepl(pattern1, code, perl=TRUE) & >> !grepl(pattern2, code, perl=TRUE))', >> @input_data_1 = N'SELECT [code] FROM [dbo].[products]', >> @input_data_1_name = N'inData', >> @output_data_1_name = N'outData' >> WITH >> RESULT SETS (AS OBJECT [dbo].[products]); >> GO >> >> why does the row detailed above get returned? I know that backslash is >> a special character but not in the SQL table. Consequently, the T-SQL >> code: >> >> SELECT ASCII(SUBSTRING([value], 2, 1)) FROM [table] >> >> returns 92 (the ASCII code for <backslash>) which shows that this is >> being recognised as a backslash character and not as an escape >> indicator for the following "B". >> >> Can anyone advise how I can filter out the <backslash> in the way that >> the other identified characters are being successfully filtered? As the >> data is being retrieved from a table, I can�t ask the data provider to >> use �\\� instead of �\� as that will be invalid for other uses. >> >> Thanks. >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To >> UNSUBSCRIBE and more, see >> https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&sdata=9t4wAFzYNfo%2B%2BITjORuSPUVNDtcSHm5hJN8yYGnEUnU%3D&reserved=0<https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&sdata=9t4wAFzYNfo%2B%2BITjORuSPUVNDtcSHm5hJN8yYGnEUnU%3D&reserved=0> >> PLEASE do read the posting guide >> https://eur04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&sdata=UVGq0iXkHMdLFC2oL1l5WG730HW1fvEJPDFaiHS3a98%3D&reserved=0<https://eur04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&sdata=UVGq0iXkHMdLFC2oL1l5WG730HW1fvEJPDFaiHS3a98%3D&reserved=0> >> and provide commented, minimal, self-contained, reproducible code. >> >> [[alternative HTML version deleted]] > > -- > Sent from my phone. Please excuse my brevity. > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&sdata=9t4wAFzYNfo%2B%2BITjORuSPUVNDtcSHm5hJN8yYGnEUnU%3D&reserved=0 > PLEASE do read the posting guide > https://eur04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&sdata=UVGq0iXkHMdLFC2oL1l5WG730HW1fvEJPDFaiHS3a98%3D&reserved=0 > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.