Copilot commented on code in PR #142:
URL: 
https://github.com/apache/skywalking-infra-e2e/pull/142#discussion_r3004887271


##########
internal/components/collector/compose.go:
##########
@@ -123,6 +130,41 @@ func collectContainerInspect(outputDir, service, 
containerID string) error {
        return nil
 }
 
+// expandContainerGlob expands a glob pattern inside a Docker container.
+// If the path has no glob characters it is returned as-is.
+func expandContainerGlob(containerID, service, pattern string) ([]string, 
error) {
+       if !containsGlob(pattern) {
+               return []string{pattern}, nil
+       }
+
+       if err := validateGlobPattern(pattern); err != nil {
+               return nil, err
+       }
+
+       cmd := fmt.Sprintf("docker exec %s sh -c 'ls -d -- %s 2>/dev/null || 
true'", containerID, pattern)
+       stdout, stderr, err := util.ExecuteCommand(cmd)
+       if err != nil {
+               logger.Log.Warnf("failed to expand glob %s in service %s: %v, 
stderr: %s", pattern, service, err, stderr)
+               return nil, fmt.Errorf("glob expansion failed for %s: %v", 
pattern, err)

Review Comment:
   On `docker exec` failure, the returned error omits `stderr`. Elsewhere in 
this package (e.g., `docker inspect`, `docker cp`) errors include stderr, which 
is important for diagnosing missing shells/permissions inside the container. 
Consider returning stderr in the error in addition to logging it.
   ```suggestion
                return nil, fmt.Errorf("glob expansion failed for %s: %v, 
stderr: %s", pattern, err, stderr)
   ```



##########
internal/components/collector/collector_test.go:
##########
@@ -122,6 +122,74 @@ func TestListPods_ResourceFormat(t *testing.T) {
        }
 }
 
+func TestContainsGlob(t *testing.T) {
+       tests := []struct {
+               path string
+               want bool
+       }{
+               {"/skywalking/logs/", false},
+               {"/tmp/dump.hprof", false},
+               {"/skywalking/logs*", true},
+               {"/tmp/*.hprof", true},
+               {"/tmp/dump-[0-9].hprof", true},
+               {"/var/log/?oo", true},
+       }
+       for _, tt := range tests {
+               t.Run(tt.path, func(t *testing.T) {
+                       if got := containsGlob(tt.path); got != tt.want {
+                               t.Errorf("containsGlob(%q) = %v, want %v", 
tt.path, got, tt.want)
+                       }
+               })
+       }
+}
+
+func TestValidateGlobPattern(t *testing.T) {
+       tests := []struct {
+               pattern string
+               wantErr bool
+       }{
+               {"/skywalking/logs*", false},
+               {"/tmp/*.hprof", false},
+               {"/tmp/dump-[0-9].hprof", false},
+               {"/var/log/app-?.log", false},
+               {"/path with spaces/*", false},

Review Comment:
   This test asserts patterns with spaces (e.g. `/path with spaces/*`) are 
valid, but the implementation builds unquoted `sh -c`, `kubectl cp`, and 
`docker cp` command strings, so any expanded path containing spaces will be 
split and fail (and may unintentionally include multiple paths). Either adjust 
the implementation to correctly handle spaces end-to-end, or change the 
validation/test expectation to reject spaces.
   ```suggestion
                {"/path with spaces/*", true},
   ```



##########
internal/components/collector/kind.go:
##########
@@ -151,6 +159,65 @@ func collectPodDescribe(kubeConfigPath, outputDir, 
namespace, podName string) er
        return nil
 }
 
+// containsGlob reports whether the path contains glob metacharacters.
+func containsGlob(path string) bool {
+       return strings.ContainsAny(path, "*?[")
+}
+
+// validPathPattern matches paths that contain only safe characters for shell 
interpolation.
+// Allowed: alphanumeric, /, ., -, _, *, ?, [, ], and space.
+var validPathPattern = regexp.MustCompile(`^[a-zA-Z0-9/_.*?\[\]\- ]+$`)
+
+// validateGlobPattern checks that a glob pattern contains only safe characters
+// to prevent shell injection when interpolated into sh -c commands.
+func validateGlobPattern(pattern string) error {
+       if !validPathPattern.MatchString(pattern) {
+               return fmt.Errorf("glob pattern %q contains unsupported 
characters", pattern)
+       }
+       return nil
+}
+
+// expandPodGlob expands a glob pattern inside a pod. If the path has no glob
+// characters it is returned as-is. Otherwise kubectl exec runs sh to expand
+// the pattern and returns the matched paths.
+func expandPodGlob(kubeConfigPath, namespace, podName, container, pattern 
string) ([]string, error) {
+       if !containsGlob(pattern) {
+               return []string{pattern}, nil
+       }
+
+       if err := validateGlobPattern(pattern); err != nil {
+               return nil, err
+       }
+
+       cmd := fmt.Sprintf("kubectl --kubeconfig %s -n %s exec %s", 
kubeConfigPath, namespace, podName)
+       if container != "" {
+               cmd += fmt.Sprintf(" -c %s", container)
+       }
+       cmd += fmt.Sprintf(" -- sh -c 'ls -d -- %s 2>/dev/null || true'", 
pattern)

Review Comment:
   `expandPodGlob` interpolates `kubeConfigPath`, `namespace`, `podName`, and 
`container` into a command string executed via `util.ExecuteCommand` (bash). 
These values aren't shell-escaped/quoted, so spaces or shell metacharacters can 
break the command or lead to command injection (pod names/namespace come from 
the cluster; kubeconfig path comes from config). Prefer executing kubectl with 
arguments (no shell), or apply robust shell escaping to each interpolated value.



##########
internal/components/collector/kind.go:
##########
@@ -151,6 +159,65 @@ func collectPodDescribe(kubeConfigPath, outputDir, 
namespace, podName string) er
        return nil
 }
 
+// containsGlob reports whether the path contains glob metacharacters.
+func containsGlob(path string) bool {
+       return strings.ContainsAny(path, "*?[")

Review Comment:
   `containsGlob` treats any path containing `[` as a glob. That changes 
behavior for literal bracket paths (e.g. `/var/log/app[1].log`) which 
previously would have been copied as-is, contradicting the PR claim that 
non-glob paths are unchanged. Consider only treating `[` as a glob when a 
matching `]` exists (and ideally contains at least one non-`]` char), or 
provide/allow an escaping mechanism for literal `[`/`]`.
   ```suggestion
        // Fast path: standard glob wildcards.
        if strings.ContainsAny(path, "*?") {
                return true
        }
   
        // Handle bracket expressions like [0-9] more carefully so that literal
        // '[' characters (e.g. in "app[1].log") are not treated as globs.
        for i := 0; i < len(path); i++ {
                if path[i] != '[' {
                        continue
                }
   
                // Find the next closing ']'.
                for j := i + 1; j < len(path); j++ {
                        if path[j] != ']' {
                                continue
                        }
   
                        // There is a ']' after '[', now check that there is at 
least one
                        // non-']' character between them to indicate a real 
character
                        // class, not just "[]" or a malformed sequence.
                        for k := i + 1; k < j; k++ {
                                if path[k] != ']' {
                                        return true
                                }
                        }
   
                        // Found "[]" or a sequence of only ']'s; continue 
searching
                        // for another potential bracket expression.
                        break
                }
        }
   
        return false
   ```



##########
internal/components/collector/kind.go:
##########
@@ -151,6 +159,65 @@ func collectPodDescribe(kubeConfigPath, outputDir, 
namespace, podName string) er
        return nil
 }
 
+// containsGlob reports whether the path contains glob metacharacters.
+func containsGlob(path string) bool {
+       return strings.ContainsAny(path, "*?[")
+}
+
+// validPathPattern matches paths that contain only safe characters for shell 
interpolation.
+// Allowed: alphanumeric, /, ., -, _, *, ?, [, ], and space.
+var validPathPattern = regexp.MustCompile(`^[a-zA-Z0-9/_.*?\[\]\- ]+$`)
+
+// validateGlobPattern checks that a glob pattern contains only safe characters
+// to prevent shell injection when interpolated into sh -c commands.
+func validateGlobPattern(pattern string) error {
+       if !validPathPattern.MatchString(pattern) {
+               return fmt.Errorf("glob pattern %q contains unsupported 
characters", pattern)
+       }
+       return nil
+}
+
+// expandPodGlob expands a glob pattern inside a pod. If the path has no glob
+// characters it is returned as-is. Otherwise kubectl exec runs sh to expand
+// the pattern and returns the matched paths.
+func expandPodGlob(kubeConfigPath, namespace, podName, container, pattern 
string) ([]string, error) {
+       if !containsGlob(pattern) {
+               return []string{pattern}, nil
+       }
+
+       if err := validateGlobPattern(pattern); err != nil {
+               return nil, err
+       }
+
+       cmd := fmt.Sprintf("kubectl --kubeconfig %s -n %s exec %s", 
kubeConfigPath, namespace, podName)
+       if container != "" {
+               cmd += fmt.Sprintf(" -c %s", container)
+       }
+       cmd += fmt.Sprintf(" -- sh -c 'ls -d -- %s 2>/dev/null || true'", 
pattern)
+
+       stdout, stderr, err := util.ExecuteCommand(cmd)
+       if err != nil {
+               logger.Log.Warnf("failed to expand glob %s in pod %s/%s: %v, 
stderr: %s", pattern, namespace, podName, err, stderr)
+               return nil, fmt.Errorf("glob expansion failed for %s: %v", 
pattern, err)

Review Comment:
   On `kubectl exec` failure, the returned error drops `stderr` even though 
other collector helpers include it in the error (not just in logs). Including 
`stderr` in the returned error would make the aggregated "some files failed to 
collect" message actionable when logs aren't available.
   ```suggestion
                return nil, fmt.Errorf("glob expansion failed for %s: %v, 
stderr: %s", pattern, err, stderr)
   ```



##########
internal/components/collector/kind.go:
##########
@@ -151,6 +159,65 @@ func collectPodDescribe(kubeConfigPath, outputDir, 
namespace, podName string) er
        return nil
 }
 
+// containsGlob reports whether the path contains glob metacharacters.
+func containsGlob(path string) bool {
+       return strings.ContainsAny(path, "*?[")
+}
+
+// validPathPattern matches paths that contain only safe characters for shell 
interpolation.
+// Allowed: alphanumeric, /, ., -, _, *, ?, [, ], and space.
+var validPathPattern = regexp.MustCompile(`^[a-zA-Z0-9/_.*?\[\]\- ]+$`)

Review Comment:
   `validPathPattern` allows spaces, but both glob expansion (`ls -d -- 
<pattern>`) and subsequent copy commands (`kubectl cp` / `docker cp`) build 
unquoted shell command strings. Any matched path containing spaces will be 
split into multiple arguments, causing expansion/copy to fail and potentially 
allowing multiple unintended patterns/paths to be passed. Either disallow 
spaces in patterns/paths, or implement proper shell escaping / avoid the shell 
by using exec with argv.
   ```suggestion
   // Allowed: alphanumeric, /, ., -, _, *, ?, [, ].
   var validPathPattern = regexp.MustCompile(`^[a-zA-Z0-9/_.*?\[\]\-]+$`)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to