To be honest, I've only used the hex values as that was the format in which the 
patterns were passed to me. 

However from your explanation, I now understand what's going on. I didn't 
appreciate that the characters were passed to another layer and not seeing the 
hex code as the raw backslash. 

Many thanks for the explanation. 

> On 25 Aug 2020, at 20:53, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
> 
> In my opinion, using hexadecimal ASCII is much more obscure than simply 
> using the escape character properly... that is, you are doing no-one any 
> favors by using them. But to attain clarity here, you need to envision what 
> the various software layers are doing.
> 
> In your case, SQLServer may not utilize escape character, but it is passing 
> your R code to the R interpreter, which does use the escape character to 
> convert source code into strings in memory, which are then passed into the 
> regex parser, which is the final layer that also handles the same escape 
> character. 
> 
> What may be confusing you is the distinction between what is in memory that 
> the regex parser sees:
> 
> ["',?\\`]
> 
> and what the R string literal looks like that you should type to get this 
> string into memory:
> 
> "[\"',?\\\\`]"
> 
> When you pass the latter literal to the cat() function, it will show you the 
> former version. When you have the literal stored in memory, you can use the 
> print() function to see what you have to type as a literal string to get the 
> in-memory version. I use this trick (cat) to help me zero in on what is 
> actually getting passed to the regex engine when I have difficulty 
> envisioning what is going on.
> 
> The regex engine needs that doubled backslash to recognize that _it_ should 
> not give special treatment to the \ there, and should look for it in the 
> input data.
> 
>> On August 25, 2020 12:16:35 PM PDT, Peter Bishop <bishop_pet...@hotmail.com> 
>> wrote:
>> The feed is coming from a SQL table and this is using the embedded
>> support for R which comes with SQL 2016. The source is therefore a
>> SELECT statement.
>> 
>> 
>> As an aside, I found a workaround by changing the pattern from:
>> 
>> 
>> "[\x22\x27\x2c\x3f\x5c\x60]"
>> 
>> 
>> to:
>> 
>> 
>> "[\x22\x27\x2c\x3f\x5c\x5c\x60]"
>> 
>> 
>> This seems to be escaping the backslash in the R script rather than in
>> the data - which confuses me.
>> 
>> ________________________________
>> From: Bert Gunter <bgunter.4...@gmail.com>
>> Sent: Wednesday, 26 August 2020 4:26 AM
>> To: Peter Bishop <bishop_pet...@hotmail.com>
>> Cc: r-help@r-project.org <r-help@r-project.org>
>> Subject: Re: [R] Matching backslash in a table's column using R
>> language
>> 
>> 1. I am far from an expert on such matters
>> 2. It is unclear to me what your input is -- I assume a file.
>> 
>> The problem, as you indicate, is that R's parser sees "\B" as an
>> incorrect escape character, so, for example:
>>> cat("\B")
>> Error: '\B' is an unrecognized escape in character string starting
>> ""\B"
>> 
>> In any case, I think you should look at ?scan. Here is an example where
>> I scan from the keyboard first and then remove the "\". You may have to
>> scan from a file to do this.
>> 
>>> z <-scan(file = "", what = "character")
>> 1: A\BCDEFG
>> 2:     #CR terminates input
>> Read 1 item
>> 
>>> cat(z)
>> A\BCDEFG
>> 
>>> nchar(z)
>> [1] 8  ## scan read in the "\" as a single character from the console.
>> 
>>> sub("\\\\","",z)  ## Yes, 4 backslashes
>> [1] "ABCDEFG"
>> 
>> There may be better ways to do this, but as I said, I'm no expert.
>> 
>> BTW, in posting here, please post in *plain text,* as the server can
>> mangle html.
>> 
>> Cheers,
>> Bert
>> 
>> 
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> 
>> 
>> On Tue, Aug 25, 2020 at 9:02 AM Peter Bishop
>> <bishop_pet...@hotmail.com<mailto:bishop_pet...@hotmail.com>> wrote:
>> In SQL, I'm using R as a way to filter data based on:
>>   - 20 characters in the range <space> to <tilde>
>> - excluding <quote>, <apostrophe>, <comma>, <question mark>,
>> <backslash>, <backtick>
>> 
>> Given a SQL column containing the data:
>> 
>>   code
>>   ----
>>   A\BCDEFG
>> 
>> and the T-SQL script:
>> 
>>   EXEC [sys].[sp_execute_external_script]
>>           @language=N'R',
>>           @script=N'
>>   pattern1 = "^[\x20-\x7e]{1,20}$"
>>   pattern2 = "[\x22\x27\x2c\x3f\x5c\x60]"
>> 
>> outData <- subset(inData, grepl(pattern1, code, perl=TRUE) &
>> !grepl(pattern2, code, perl=TRUE))',
>>           @input_data_1 = N'SELECT [code] FROM [dbo].[products]',
>>           @input_data_1_name = N'inData',
>>           @output_data_1_name = N'outData'
>>   WITH
>>           RESULT SETS (AS OBJECT [dbo].[products]);
>>   GO
>> 
>> why does the row detailed above get returned? I know that backslash is
>> a special character but not in the SQL table. Consequently, the T-SQL
>> code:
>> 
>>   SELECT ASCII(SUBSTRING([value], 2, 1)) FROM [table]
>> 
>> returns 92 (the ASCII code for <backslash>) which shows that this is
>> being recognised as a backslash character and not as an escape
>> indicator for the following "B".
>> 
>> Can anyone advise how I can filter out the <backslash> in the way that
>> the other identified characters are being successfully filtered? As the
>> data is being retrieved from a table, I can�t ask the data provider to
>> use �\\� instead of �\� as that will be invalid for other uses.
>> 
>> Thanks.
>> 
>>       [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To
>> UNSUBSCRIBE and more, see
>> https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&amp;data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&amp;sdata=9t4wAFzYNfo%2B%2BITjORuSPUVNDtcSHm5hJN8yYGnEUnU%3D&amp;reserved=0<https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&amp;data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&amp;sdata=9t4wAFzYNfo%2B%2BITjORuSPUVNDtcSHm5hJN8yYGnEUnU%3D&amp;reserved=0>
>> PLEASE do read the posting guide
>> https://eur04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&amp;data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&amp;sdata=UVGq0iXkHMdLFC2oL1l5WG730HW1fvEJPDFaiHS3a98%3D&amp;reserved=0<https://eur04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&amp;data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&amp;sdata=UVGq0iXkHMdLFC2oL1l5WG730HW1fvEJPDFaiHS3a98%3D&amp;reserved=0>
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>>    [[alternative HTML version deleted]]
> 
> -- 
> Sent from my phone. Please excuse my brevity.
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&amp;data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&amp;sdata=9t4wAFzYNfo%2B%2BITjORuSPUVNDtcSHm5hJN8yYGnEUnU%3D&amp;reserved=0
> PLEASE do read the posting guide 
> https://eur04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&amp;data=02%7C01%7C%7C61b6372acf8c4d215faa08d8493093e1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339820268206663&amp;sdata=UVGq0iXkHMdLFC2oL1l5WG730HW1fvEJPDFaiHS3a98%3D&amp;reserved=0
> and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to