Volume Filtering in 1-Minute Backtests
One-minute data is seductive — high resolution, lots of samples, “feels” like you’re closer to market truth.
But raw 1-minute bars lie. They mix real bars (liquid, information-bearing) with dust bars (low volume, random noise).
Without filtering, your study’s entire statistical foundation bends around that noise.
Why Volume Thresholds Matter
Including dust bars causes:
- Inflated tails — random ticks masquerade as alpha.
- Distorted wick structure — false volatility spikes.
- Unstable paths — patterns collapse out-of-sample.
Filtering for minimum liquidity keeps the sample representative of meaningful market reactions rather than random microstructure drift.
How to Define Liquidity Thresholds
1. Relative to Recent Volume
vol_rel = df.volume / df.volume.rolling(50, min_periods=10).median()   # volume vs the local 50-bar median
cond_liquid = vol_rel >= 0.5        # keep bars with at least half the typical recent volume
cond = cond_signal & cond_liquid    # combine with your existing signal condition
This adapts dynamically to the local volume rhythm, so it stays robust across assets and regimes.
2. Relative to Time-of-Day
Volume naturally forms a U-shaped intraday curve (active at the open and close, lull at midday). To avoid penalizing quiet periods that are normal for their time of day:
tod_med = df.groupby(df.index.time)['volume'].transform('median')   # typical volume for this minute of the day
tod_std = df.groupby(df.index.time)['volume'].transform('std')      # dispersion for this minute of the day
vol_z = (df.volume - tod_med) / tod_std
cond_liquid = vol_z > -1   # keep bars no more than one std below their time-of-day norm
This prevents misclassifying midday quiet bars as “illiquid.”
3. Absolute Floor (Fail-Safe)
cond_liquid &= df.volume > 50   # absolute minimum; tune per exchange and contract
Use an exchange-specific hard cutoff so ghost bars never slip through.
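Combined, a minimal sketch assuming vol_rel and vol_z from the snippets above and a precomputed cond_signal (use only the conditions that suit your data):
cond_liquid = (vol_rel >= 0.5) & (vol_z > -1) & (df.volume > 50)
cond = cond_signal & cond_liquid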
How Filtering Affects Confidence
Sample Size vs Reliability
Filtering cuts your n, but improves statistical quality:
| Metric | Before Filter | After Filter |
|---|---|---|
| N events | 800 | 650 |
| Mean return | +11% | +12% |
| CI width | 0.028 | 0.022 |
The loss of coverage is outweighed by reduced variance.
Plot coverage (n_filtered / n_total) vs CI width — the “elbow” is your sweet spot.
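A minimal sketch of that sweep, assuming cum_ret is an (events × horizons) array of event-aligned cumulative returns, h is the horizon of interest, and vol_rel_at_event holds each event's relative volume (the last name is illustrative):
import numpy as np

def ci_width(x, n_boot=1000, seed=0):
    """Width of the 95% bootstrap CI for the mean of x."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))   # resample events with replacement
    boot_means = x[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return hi - lo

rows = []
for q in np.arange(0.0, 0.9, 0.1):                 # candidate relative-volume thresholds
    keep = np.asarray(vol_rel_at_event >= q)
    if keep.sum() < 30:                            # too few events left to say anything
        break
    rows.append({
        "threshold": round(q, 1),
        "coverage": keep.mean(),                   # n_filtered / n_total
        "ci_width": ci_width(cum_ret[keep, h]),
    })
# plot coverage vs ci_width and look for the elbow
The resulting threshold / coverage / ci_width rows are exactly what the Coverage Curve view described further down plots.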
Bias Risk
Signals often correlate with volume (e.g. breakouts cause volume spikes), so an overly aggressive filter conditions on the very spike the signal creates and skews the retained sample toward its own trigger.
Sanity check: overlay volume-normalized results against the unfiltered ones and inspect the median volume profile around the event. If median volume spikes sharply at t=0, consider lighter filtering or stratified sampling by volume.
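A rough sketch of that profile check, assuming events is a list of event timestamps present in df.index and vol_rel is the relative-volume series from above (the helper name and window lengths are illustrative):
import numpy as np
import pandas as pd

def median_vol_profile(df, events, vol_rel, pre=10, post=30):
    """Median relative volume at each bar offset around the events."""
    pos = df.index.get_indexer(pd.Index(events))
    profile = {}
    for off in range(-pre, post + 1):
        idx = pos + off
        idx = idx[(idx >= 0) & (idx < len(df))]   # drop offsets that run off the data
        profile[off] = vol_rel.iloc[idx].median()
    return pd.Series(profile)

profile = median_vol_profile(df, events, vol_rel)
# if the value at t=0 dwarfs the rest of the curve, the filter and the signal overlap heavily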
Confidence Intervals via Bootstrap
idx = np.random.randint(0, cum_ret.shape[0], size=(1000, cum_ret.shape[0]))  # resample events with replacement
boot_means = cum_ret[idx, h].mean(axis=1)                                    # mean cumulative return per resample
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])                     # 95% CI for the mean at horizon h
Compare before/after filtering — narrower bands imply more consistent paths.
Practical Heuristics
For liquid crypto or equities:
cond_liquid = df.volume >= 0.25 * df.volume.rolling(100).median()   # a quarter of the local 100-bar median volume
This usually retains 75-90% of events while cutting most of the noise.
💡 The sweet spot is where variance reduction outweighs sample loss, typically a threshold near the 20th-30th percentile of relative volume.
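As a sketch, that percentile heuristic in pandas (vol_rel as defined earlier; the 25th percentile is just one point inside that band):
thr = vol_rel.quantile(0.25)    # 25th percentile of relative volume
cond_liquid = vol_rel >= thr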
Next-Step Ideas
You could visualize or cluster events by liquidity signature:
- K-means on standardized (vol_rel, spread, vol_z) → groups of structurally similar bars (see the sketch below)
- Heatmaps or violin plots in Tableau / Grafana / Plotly → show how pattern reliability scales with liquidity percentile
- Volume-weighted confidence surfaces → a 3D view of {lookahead × vol percentile → mean return}
That exposes the regimes where your model genuinely holds — and where it’s just microstructure noise pretending to be edge.
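A quick sketch of the clustering idea with scikit-learn, assuming events_df is a per-event frame with vol_rel, spread, and vol_z columns (the frame name and cluster count are illustrative):
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

feats = events_df[["vol_rel", "spread", "vol_z"]].dropna()
X = StandardScaler().fit_transform(feats)                          # standardize each liquidity feature
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
events_df.loc[feats.index, "liq_cluster"] = labels                 # structurally similar bars share a label
# then compare event-aligned return paths per liq_cluster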
Summary
- 1-minute bars contain both signal and dust.
- Volume gating improves signal fidelity and confidence.
- The best filters adapt to local rhythm (relative, not absolute).
- Always test both filtered and unfiltered sets to detect volume-linked bias.
- Evaluate trade-off via coverage vs CI width — the elbow defines your sweet spot.
Goal: Reduce variance faster than you reduce sample count.
That’s the hallmark of a robust micro-scale event study.
Some Visualization Ideas
🧱 Base Data Schema (for Tableau or any BI tool)
You want each row to represent one event-window observation.
| Column | Role | Description | Notes |
|---|---|---|---|
| event_id | Dimension | Unique identifier for each triggered event | Used for count distinct / grouping |
| t_offset | Dimension (numeric or discrete) | Relative bar index (e.g. –10 … +30) | X-axis for event-aligned plots |
| return | Measure | Event-aligned return at t_offset | Y-axis metric |
| volume | Measure | Raw traded volume for the bar | Used for scaling or normalization |
| vol_rel | Measure | Relative volume (vs rolling median or time-of-day) | Primary liquidity feature |
| signal_type | Dimension | e.g., “breakout_up”, “mean_revert_down” | Filters in Tableau |
| pair | Dimension | e.g., “SOL-PERP”, “BTC-PERP” | Optional facet |
| session | Dimension | e.g., “Asia”, “US”, “EU” (could combine with DoW, DoM) | Datetime stratification |
| standard Tableau datetime dims | Dimension | e.g., hour, day of week (DoW), etc. | Datetime stratification |
| cond_liquid | Boolean Dimension | 1 if above the volume threshold | For split comparison |
| sample_group | Dimension | e.g., “Filtered” vs “Unfiltered” | Used for color/split |
That’s all you need — everything else is derived.
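A rough sketch of how to build that long-format table from the 1-minute frame, assuming df has a DatetimeIndex with close, volume, and vol_rel columns and events is a DataFrame whose event_time values exist in df.index alongside signal_type and pair columns (all names are illustrative):
import numpy as np
import pandas as pd

pre, post = 10, 30
rows = []
for eid, ev in events.iterrows():
    pos = df.index.get_loc(ev["event_time"])
    if pos - pre < 0 or pos + post >= len(df):
        continue                                        # skip windows that run off the data
    window = df.iloc[pos - pre: pos + post + 1]
    base = df["close"].iloc[pos]
    rows.append(pd.DataFrame({
        "event_id": eid,
        "t_offset": np.arange(-pre, post + 1),
        "return": window["close"].to_numpy() / base - 1,   # event-aligned cumulative return
        "volume": window["volume"].to_numpy(),
        "vol_rel": window["vol_rel"].to_numpy(),
        "signal_type": ev["signal_type"],
        "pair": ev["pair"],
    }))

tidy = pd.concat(rows, ignore_index=True)
tidy.to_csv("event_windows.csv", index=False)   # one row per event-window observation, ready for Tableau
session, cond_liquid, and sample_group can then be added as plain column assignments.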
📊 Core Tableau Views
1. Event Path Plot
Goal: Compare average return trajectories with and without volume filtering.
| Role | Field |
|---|---|
| Columns | t_offset |
| Rows | AVG([return]) |
| Color | sample_group (“Filtered” vs “Unfiltered”) |
| Tooltip | N = COUNTD([event_id]) |
| Filter | signal_type = desired signal |
→ Add a reference band for 0-return and optional confidence intervals.
2. CI Width vs Volume Percentile
If you precompute bootstrapped CI width for each horizon:
| Role | Field |
|---|---|
| Columns | vol_rel_percentile |
| Rows | ci_width |
| Color | t_offset |
| Tooltip | coverage %, N events |
→ Shows where reliability (narrow CI) improves most sharply — the “sweet spot” volume band.
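A minimal pandas sketch of that precompute, reusing the ci_width bootstrap helper and the tidy frame sketched earlier (both are assumptions), with deciles of trigger-bar relative volume standing in for the percentile bins:
import pandas as pd

trigger = tidy.loc[tidy["t_offset"] == 0, ["event_id", "vol_rel"]]
trigger = trigger.rename(columns={"vol_rel": "trigger_vol_rel"})
tidy2 = tidy.merge(trigger, on="event_id")
tidy2["vol_decile"] = pd.qcut(tidy2["trigger_vol_rel"], 10, labels=False, duplicates="drop")

ci_tbl = (tidy2.groupby(["vol_decile", "t_offset"])["return"]
               .apply(lambda x: ci_width(x.to_numpy()))     # bootstrap CI width per (decile, horizon)
               .reset_index(name="ci_width"))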
3. Heatmap: Mean Return by (t_offset × Volume Decile)
| Role | Field |
|---|---|
| Columns | t_offset |
| Rows | vol_rel_decile |
| Color | AVG([return]) |
| Tooltip | mean ± std, N events |
→ Reveals how return asymmetry or momentum persists differently under low/high liquidity regimes.
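The same matrix as a pandas pivot, feeding the heatmap (tidy2 and vol_decile as assumed above):
heat = tidy2.pivot_table(index="vol_decile", columns="t_offset",
                         values="return", aggfunc="mean")   # mean return per (decile, offset) cell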
4. Distribution Comparison
Goal: Visualize volatility compression after filtering.
| Role | Field |
|---|---|
| Columns | sample_group |
| Rows | return (histogram / boxplot) |
| Tooltip | median, IQR, N |
→ Confirms the variance reduction effect quantitatively.
5. Coverage Curve
| Role | Field |
|---|---|
| Columns | volume_threshold (as % of rolling median) |
| Rows | coverage = N_filtered / N_total |
| Secondary Axis | ci_width |
| Dual Axis Type | Line / Line |
| Color | metric (“Coverage” vs “CI Width”) |
→ The intersection (“elbow”) shows the optimal filtering level.
🧮 Derived Calculations (in Tableau)
| Field | Formula | Purpose |
|---|---|---|
| vol_rel | SUM([volume]) / WINDOW_MEDIAN(SUM([volume])) | Relative liquidity |
| coverage | COUNTD(IF [cond_liquid] THEN [event_id] END) / COUNTD([event_id]) | % retained after filter |
| return_norm | [return] / { FIXED : STDEV([return]) } | Optional normalization |
| ci_width | WINDOW_PERCENTILE(AVG([return]), 0.975) - WINDOW_PERCENTILE(AVG([return]), 0.025) | 95% span across events (use the precomputed bootstrap CI for the mean) |