Dieharder Entropy Scorecard

Dieharder Test Suite • 500 sequences × 1,000,000 bits • Binary mode
Generated: 2026-04-20 19:37 UTC
Winner: System
Headline Verdict: Both sources show concerns under Dieharder. System is the stronger performer, with 14 failed and 6 weak tests versus QSE's 24 failed and 10 weak. Failed tests (p < 0.0001 or p > 0.9999) indicate potential randomness weaknesses.

QSE Overall

Passed: 82/116

Weak: 10 • Failed: 24

System Overall

Passed: 96/116

Weak: 6 • Failed: 14

Wins by Test

15 QSE • 13 System

4 ties

Key Risk / Weakest Tests

Source Weakest Test P-value Assessment
QSE rgb_lagged_sum 0.00688252 PASSED
System rgb_lagged_sum 0.00677909 PASSED
Assessment Criteria: Dieharder is a strength assessment tool, not a binary pass/fail test.
  • STRONG: No failed tests and <5% weak tests (some weak results are statistically expected with many tests)
  • REVIEW: No failed tests but ≥5% weak tests (may indicate statistical variation or require further investigation)
  • WEAK: One or more failed tests (p < 0.0001 or p > 0.9999) indicating potential randomness weaknesses
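As a sketch, these criteria map directly to a few lines of Python (the function name and signature are illustrative, not part of Dieharder):

```python
def overall_assessment(failed: int, weak: int, total: int) -> str:
    """Classify a source per the STRONG/REVIEW/WEAK criteria above."""
    if failed > 0:
        return "WEAK"           # any FAILED test indicates potential weakness
    if weak / total >= 0.05:    # >= 5% weak results warrant a closer look
        return "REVIEW"
    return "STRONG"             # no failures, weak results within expectation

# Applied to this report's counts, both sources classify as WEAK:
print(overall_assessment(failed=24, weak=10, total=116))  # QSE    -> WEAK
print(overall_assessment(failed=14, weak=6, total=116))   # System -> WEAK
```

Note that the "System wins" headline is relative: under the absolute criteria, both sources land in the WEAK tier because each has at least one failed test.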
Test Reliability: Some Dieharder tests have known issues. Tests marked "Do Not Use" (e.g., diehard_sums) are automatically excluded. Tests marked "Suspect" (e.g., diehard_operm5) are included but flagged below.
Note: Dieharder evaluates statistical randomness. It does not alone certify cryptographic strength or "quantum resilience."

Per-Test Comparison (All Tests)

Test QSE P-value QSE Assessment System P-value System Assessment Winner
dab_birthdays1 0.32422668 PASSED 0.22105501 PASSED QSE
dab_bytedistrib 0.0 FAILED 0.0 FAILED Tie
dab_dct 0.01423758 PASSED 0.52281197 PASSED System
dab_filltree 0.68537192 PASSED 0.01268884 PASSED QSE
dab_filltree2 0.00088211 WEAK 9e-08 FAILED QSE
dab_monobit2 1.0 FAILED 1.0 FAILED Tie
dab_opso2 0.0 FAILED 0.0 FAILED Tie
diehard_2dsphere 0.8550421 PASSED 0.21198227 PASSED System
diehard_3dsphere 0.73169093 PASSED 0.20702679 PASSED QSE
diehard_birthdays 0.68948069 PASSED 0.68506382 PASSED System
diehard_bitstream 0.64906889 PASSED 0.70955138 PASSED QSE
diehard_count_1s_byt 0.65645361 PASSED 0.45952164 PASSED System
diehard_count_1s_str 0.11219792 PASSED 0.92278747 PASSED QSE
diehard_craps 0.00246887 WEAK 0.00020252 WEAK QSE
diehard_dna 0.59557509 PASSED 0.98496265 PASSED QSE
diehard_operm5 ⚠️ 0.36033879 PASSED 0.8149558 PASSED QSE
diehard_opso 0.26861769 PASSED 0.2340647 PASSED QSE
diehard_oqso 0.90230451 PASSED 0.13813094 PASSED System
diehard_parking_lot 0.37143226 PASSED 0.02719069 PASSED QSE
diehard_rank_32x32 0.13840187 PASSED 0.22385803 PASSED System
diehard_rank_6x8 0.12052741 PASSED 0.9981702 WEAK QSE
diehard_runs 0.09303189 PASSED 0.80231838 PASSED System
diehard_squeeze 0.02449221 PASSED 0.07693327 PASSED System
marsaglia_tsang_gcd 0.0 FAILED 0.0 FAILED Tie
rgb_bitdist 0.4951037 PASSED 0.51149519 PASSED QSE
rgb_kstest_test 0.13269817 PASSED 0.72946678 PASSED System
rgb_lagged_sum 0.06924377 PASSED 0.59383071 PASSED System
rgb_minimum_distance 0.00825861 PASSED 0.25365625 PASSED System
rgb_permutations 0.98330418 PASSED 0.56002545 PASSED System
sts_monobit 0.19813119 PASSED 0.81913595 PASSED QSE
sts_runs 0.9196863 PASSED 0.99291088 PASSED QSE
sts_serial 0.59063634 PASSED 0.48763708 PASSED System
⚠️ Tests marked with warning icon are "Suspect" per Dieharder documentation and may have known implementation issues.
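For rows where both sources share an assessment, the Winner column is consistent with the "p-value closer to 0.5" rule described under Comparison Methodology below. A hypothetical reconstruction of the per-row logic (this is an assumed rule, not Dieharder output):

```python
RANK = {"PASSED": 0, "WEAK": 1, "FAILED": 2}

def row_winner(p_qse, assess_qse, p_sys, assess_sys):
    """Reconstruct a row's Winner: better assessment level first,
    then the p-value closer to 0.5 as an assumed tie-break."""
    if RANK[assess_qse] != RANK[assess_sys]:
        return "QSE" if RANK[assess_qse] < RANK[assess_sys] else "System"
    d_qse, d_sys = abs(p_qse - 0.5), abs(p_sys - 0.5)
    if d_qse == d_sys:
        return "Tie"
    return "QSE" if d_qse < d_sys else "System"

# Spot checks against rows from the table above:
assert row_winner(0.32422668, "PASSED", 0.22105501, "PASSED") == "QSE"  # dab_birthdays1
assert row_winner(0.12052741, "PASSED", 0.9981702, "WEAK") == "QSE"     # diehard_rank_6x8
assert row_winner(0.0, "FAILED", 0.0, "FAILED") == "Tie"                # dab_bytedistrib
```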

⚠️ Suspect Tests (Known Issues)

Note: The following tests are marked as "Suspect" in Dieharder documentation due to known issues. For example, diehard_operm5 (-d 1) "seems to fail all generators in dieharder" and may have bugs in the original test implementation. Results from these tests should be interpreted with caution.

QSE Suspect Tests (1):

  • diehard_operm5 (p-value: 0.36033879)

System Suspect Tests (1):

  • diehard_operm5 (p-value: 0.8149558)

Understanding Dieharder Results

Dieharder vs NIST STS: Unlike NIST STS (which uses binary pass/fail criteria), Dieharder is a strength assessment tool that evaluates the distribution of p-values across multiple statistical tests. It is designed to "push a weak generator to unambiguous failure" rather than provide simple pass/fail results.

P-value Interpretation: For a truly random source, p-values should follow a uniform distribution between 0 and 1. Individual WEAK results (p < 0.005 or p > 0.995) are statistically expected with many tests, as approximately 1% of p-values should naturally fall in this range.
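Since roughly 1% of uniform p-values land in the WEAK bands, the expected number of weak results among 116 tests is about 1.2, and the chance of seeing many more purely by luck is small. A rough sanity check using a binomial tail (the 1% figure and the test count come from this report):

```python
import math

n, p_weak = 116, 0.01  # tests run; chance a uniform p-value lands in a WEAK band

def binom_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

print(f"expected WEAK results: {n * p_weak:.2f}")  # -> 1.16
print(f"P(>= 6 WEAK by chance): {binom_tail(n, p_weak, 6):.5f}")
print(f"P(>= 10 WEAK by chance): {binom_tail(n, p_weak, 10):.8f}")
```

Both sources' WEAK counts (10 and 6) sit well above the expected ~1.16, which argues against dismissing them as routine statistical variation.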

Comparison Methodology: When comparing two sources, the winner is determined by:

  1. Fewer FAILED tests (p < 0.0001 or p > 0.9999) - most important indicator of randomness weaknesses
  2. Fewer WEAK tests (if same number of failed tests) - secondary concern
  3. More PASSED tests (if same failed/weak counts) - indicates overall strength
  4. P-value distribution quality (if all counts are equal) - p-values farther from the 0/1 extremes (i.e., closer to 0.5) are treated as less suspicious
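A minimal sketch of this tie-break order (the function and the summary dicts are illustrative; the counts are this report's):

```python
def compare_sources(a: dict, b: dict) -> str:
    """Apply the tie-break order above to two result summaries."""
    if a["failed"] != b["failed"]:   # 1. fewer FAILED tests wins
        return a["name"] if a["failed"] < b["failed"] else b["name"]
    if a["weak"] != b["weak"]:       # 2. fewer WEAK tests wins
        return a["name"] if a["weak"] < b["weak"] else b["name"]
    if a["passed"] != b["passed"]:   # 3. more PASSED tests wins
        return a["name"] if a["passed"] > b["passed"] else b["name"]
    return "Tie"                     # 4. fall back to p-value distribution

qse = {"name": "QSE", "failed": 24, "weak": 10, "passed": 82}
system = {"name": "System", "failed": 14, "weak": 6, "passed": 96}
print(compare_sources(qse, system))  # -> System (decided at step 1)
```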

Important: Individual test-by-test wins are less meaningful than overall failure counts. A source with fewer FAILED tests is stronger, regardless of individual test comparisons.

Test Reliability: Some Dieharder tests have known issues. Tests marked "Do Not Use" (e.g., diehard_sums) are automatically excluded from results. Tests marked "Suspect" (e.g., diehard_operm5) are included but flagged, as they may produce misleading results due to test implementation issues rather than generator weaknesses.

Assessment Levels:

  • PASSED: P-value between 0.005 and 0.995 (normal range)
  • WEAK: P-value between 0.0001-0.005 or 0.995-0.9999 (borderline, may indicate statistical variation)
  • FAILED: P-value < 0.0001 or > 0.9999 (extreme, indicates potential randomness weaknesses)
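These bands translate directly into a small classifier (a sketch; the exact boundaries, e.g. whether p = 0.005 counts as PASSED, are assumed here):

```python
def assess(p: float) -> str:
    """Map a single Dieharder p-value to this report's assessment bands."""
    if p < 0.0001 or p > 0.9999:
        return "FAILED"   # extreme tail: potential randomness weakness
    if p < 0.005 or p > 0.995:
        return "WEAK"     # borderline: may be statistical variation
    return "PASSED"       # normal range

# Spot checks against the per-test table:
assert assess(0.32422668) == "PASSED"   # dab_birthdays1 (QSE)
assert assess(0.9981702) == "WEAK"      # diehard_rank_6x8 (System)
assert assess(9e-08) == "FAILED"        # dab_filltree2 (System)
assert assess(1.0) == "FAILED"          # dab_monobit2
```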

Overall Assessment: A source is considered STRONG if it has no FAILED tests and fewer than 5% WEAK tests. This accounts for expected statistical variation while flagging genuine concerns.

Recommended Next Steps

• Run multiple independent batches (e.g., 5 runs) with newly generated data for both sources.
• Increase the sequence count per run (beyond the current 500) for stronger statistical confidence.
• Track stability: count how often any test hits WEAK or FAILED across runs.
• If FAILED tests appear consistently, investigate the specific test and entropy source.
• Archive all Dieharder reports and parameters for auditability.
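The stability-tracking step might look like the following sketch. The run data here is hypothetical; only the test names come from this report:

```python
from collections import Counter

# Hypothetical WEAK/FAILED flags from 5 independent runs (illustrative data):
runs = [
    {"dab_bytedistrib", "dab_monobit2", "diehard_craps"},
    {"dab_bytedistrib", "dab_monobit2"},
    {"dab_bytedistrib", "dab_monobit2", "marsaglia_tsang_gcd"},
    {"dab_bytedistrib", "dab_monobit2", "diehard_craps"},
    {"dab_bytedistrib", "dab_monobit2"},
]
flags = Counter(test for run in runs for test in run)

# Tests flagged in most runs point at the source, not statistical noise:
persistent = sorted(t for t, n in flags.items() if n >= 4)
print(persistent)  # -> ['dab_bytedistrib', 'dab_monobit2']
```

Tests that flag only once or twice across runs (like diehard_craps in this example) are candidates for statistical variation; tests that flag every time warrant investigation of the entropy source itself.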
— End of Report —