SWE-bench-secret: Automating AI Agent Evaluation for Software Engineering Tasks