Volume Filtering in 1-Minute Backtests
One-minute data is seductive — high resolution, lots of samples, “feels” like you’re closer to market truth.
But raw 1-minute bars lie. They mix real bars (liquid, information-bearing) with dust bars (low volume, random noise).
Without filtering, your study’s entire statistical foundation bends around that noise.
Why Volume Thresholds Matter
Including dust bars causes:
- Inflated tails — random ticks masquerade as alpha.
- Distorted wick structure — false volatility spikes.
- Unstable paths — patterns collapse out-of-sample.
Filtering for minimum liquidity keeps the sample representative of meaningful market reactions rather than random microstructure drift.
How to Define Liquidity Thresholds
1. Relative to Recent Volume
vol_rel = df.volume / df.volume.rolling(50, min_periods=10).median()   # volume vs the local 50-bar median
cond_liquid = vol_rel >= 0.5        # keep bars with at least half the typical recent volume
cond = cond_signal & cond_liquid    # combine with your existing signal condition
This adapts dynamically to the local volume rhythm, so it stays robust across assets and regimes.
2. Relative to Time-of-Day
Volume naturally forms a U-shaped intraday curve (active at the open and close, lull at midday). To avoid penalizing quiet periods that are normal for their time of day:
tod_med = df.groupby(df.index.time)['volume'].transform('median')   # typical volume for this minute of the day
tod_std = df.groupby(df.index.time)['volume'].transform('std')      # dispersion for this minute of the day
vol_z = (df.volume - tod_med) / tod_std
cond_liquid = vol_z > -1   # keep bars no more than one std below their time-of-day norm
This prevents misclassifying midday quiet bars as “illiquid.”
3. Absolute Floor (Fail-Safe)
cond_liquid &= df.volume > 50   # absolute minimum; tune per exchange and contract
Use an exchange-specific hard cutoff so ghost bars never slip through.
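Combined, a minimal sketch assuming vol_rel and vol_z from the snippets above and a precomputed cond_signal (use only the conditions that suit your data):
cond_liquid = (vol_rel >= 0.5) & (vol_z > -1) & (df.volume > 50)
cond = cond_signal & cond_liquid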
How Filtering Affects Confidence
Sample Size vs Reliability
Filtering cuts your n, but improves statistical quality:
| Metric | Before Filter | After Filter |
|---|---|---|
| N events | 800 | 650 |
| Mean return | +11% | +12% |
| CI width | 0.028 | 0.022 |
The loss of coverage is outweighed by reduced variance.
Plot coverage (n_filtered / n_total) vs CI width — the “elbow” is your sweet spot.
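A minimal sketch of that sweep, assuming cum_ret is an (events × horizons) array of event-aligned cumulative returns, h is the horizon of interest, and vol_rel_at_event holds each event's relative volume (the last name is illustrative):
import numpy as np

def ci_width(x, n_boot=1000, seed=0):
    """Width of the 95% bootstrap CI for the mean of x."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))   # resample events with replacement
    boot_means = x[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return hi - lo

rows = []
for q in np.arange(0.0, 0.9, 0.1):                 # candidate relative-volume thresholds
    keep = np.asarray(vol_rel_at_event >= q)
    if keep.sum() < 30:                            # too few events left to say anything
        break
    rows.append({
        "threshold": round(q, 1),
        "coverage": keep.mean(),                   # n_filtered / n_total
        "ci_width": ci_width(cum_ret[keep, h]),
    })
# plot coverage vs ci_width and look for the elbow
The resulting threshold / coverage / ci_width rows are exactly what the Coverage Curve view described further down plots.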
Bias Risk
Signals often correlate with volume (e.g. breakouts cause volume spikes), so an overly aggressive filter conditions on the very spike the signal creates and skews the retained sample toward its own trigger.
Sanity check: overlay volume-normalized results against the unfiltered ones and inspect the median volume profile around the event. If median volume spikes sharply at t=0, consider lighter filtering or stratified sampling by volume.
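A rough sketch of that profile check, assuming events is a list of event timestamps present in df.index and vol_rel is the relative-volume series from above (the helper name and window lengths are illustrative):
import numpy as np
import pandas as pd

def median_vol_profile(df, events, vol_rel, pre=10, post=30):
    """Median relative volume at each bar offset around the events."""
    pos = df.index.get_indexer(pd.Index(events))
    profile = {}
    for off in range(-pre, post + 1):
        idx = pos + off
        idx = idx[(idx >= 0) & (idx < len(df))]   # drop offsets that run off the data
        profile[off] = vol_rel.iloc[idx].median()
    return pd.Series(profile)

profile = median_vol_profile(df, events, vol_rel)
# if the value at t=0 dwarfs the rest of the curve, the filter and the signal overlap heavily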
Confidence Intervals via Bootstrap
idx = np.random.randint(0, cum_ret.shape[0], size=(1000, cum_ret.shape[0]))  # resample events with replacement
boot_means = cum_ret[idx, h].mean(axis=1)                                    # mean cumulative return per resample
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])                     # 95% CI for the mean at horizon h
Compare before/after filtering — narrower bands imply more consistent paths.
Practical Heuristics
For liquid crypto or equities:
cond_liquid = df.volume >= 0.25 * df.volume.rolling(100).median()   # a quarter of the local 100-bar median volume
This usually retains 75-90% of events while cutting most of the noise.
💡 The sweet spot is where variance reduction outweighs sample loss, typically a threshold near the 20th-30th percentile of relative volume.
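As a sketch, that percentile heuristic in pandas (vol_rel as defined earlier; the 25th percentile is just one point inside that band):
thr = vol_rel.quantile(0.25)    # 25th percentile of relative volume
cond_liquid = vol_rel >= thr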
Next-Step Ideas
You could visualize or cluster events by liquidity signature:
- K-means on standardized (vol_rel, spread, vol_z) → groups of structurally similar bars (see the sketch below)
- Heatmaps or violin plots in Tableau / Grafana / Plotly → show how pattern reliability scales with liquidity percentile
- Volume-weighted confidence surfaces → a 3D view of {lookahead × vol percentile → mean return}
That exposes the regimes where your model genuinely holds — and where it’s just microstructure noise pretending to be edge.
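A quick sketch of the clustering idea with scikit-learn, assuming events_df is a per-event frame with vol_rel, spread, and vol_z columns (the frame name and cluster count are illustrative):
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

feats = events_df[["vol_rel", "spread", "vol_z"]].dropna()
X = StandardScaler().fit_transform(feats)                          # standardize each liquidity feature
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
events_df.loc[feats.index, "liq_cluster"] = labels                 # structurally similar bars share a label
# then compare event-aligned return paths per liq_cluster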
Summary
- 1-minute bars contain both signal and dust.
- Volume gating improves signal fidelity and confidence.
- The best filters adapt to local rhythm (relative, not absolute).
- Always test both filtered and unfiltered sets to detect volume-linked bias.
- Evaluate trade-off via coverage vs CI width — the elbow defines your sweet spot.
Goal: Reduce variance faster than you reduce sample count.
That’s the hallmark of a robust micro-scale event study.
Some Visualization Ideas
🧱 Base Data Schema (for Tableau or any BI tool)
You want each row to represent one event-window observation.
| Column | Role | Description | Notes |
|---|---|---|---|
| event_id | Dimension | Unique identifier for each triggered event | Used for count distinct / grouping |
| t_offset | Dimension (numeric or discrete) | Relative bar index (e.g. –10 … +30) | X-axis for event-aligned plots |
| return | Measure | Event-aligned return at t_offset | Y-axis metric |
| volume | Measure | Raw traded volume for the bar | Used for scaling or normalization |
| vol_rel | Measure | Relative volume (vs rolling median or time-of-day) | Primary liquidity feature |
| signal_type | Dimension | e.g., “breakout_up”, “mean_revert_down” | Filters in Tableau |
| pair | Dimension | e.g., “SOL-PERP”, “BTC-PERP” | Optional facet |
| session | Dimension | e.g., “Asia”, “US”, “EU” (could combine with DoW, DoM) | Datetime stratification |
| standard Tableau datetime dims | Dimension | e.g., hour, day of week (DoW), etc. | Datetime stratification |
| cond_liquid | Boolean Dimension | 1 if above the volume threshold | For split comparison |
| sample_group | Dimension | e.g., “Filtered” vs “Unfiltered” | Used for color/split |
That’s all you need — everything else is derived.
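A rough sketch of how to build that long-format table from the 1-minute frame, assuming df has a DatetimeIndex with close, volume, and vol_rel columns and events is a DataFrame whose event_time values exist in df.index alongside signal_type and pair columns (all names are illustrative):
import numpy as np
import pandas as pd

pre, post = 10, 30
rows = []
for eid, ev in events.iterrows():
    pos = df.index.get_loc(ev["event_time"])
    if pos - pre < 0 or pos + post >= len(df):
        continue                                        # skip windows that run off the data
    window = df.iloc[pos - pre: pos + post + 1]
    base = df["close"].iloc[pos]
    rows.append(pd.DataFrame({
        "event_id": eid,
        "t_offset": np.arange(-pre, post + 1),
        "return": window["close"].to_numpy() / base - 1,   # event-aligned cumulative return
        "volume": window["volume"].to_numpy(),
        "vol_rel": window["vol_rel"].to_numpy(),
        "signal_type": ev["signal_type"],
        "pair": ev["pair"],
    }))

tidy = pd.concat(rows, ignore_index=True)
tidy.to_csv("event_windows.csv", index=False)   # one row per event-window observation, ready for Tableau
session, cond_liquid, and sample_group can then be added as plain column assignments.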
📊 Core Tableau Views
1. Event Path Plot
Goal: Compare average return trajectories with and without volume filtering.
| Role | Field |
|---|---|
| Columns | t_offset |
| Rows | AVG([return]) |
| Color | sample_group (“Filtered” vs “Unfiltered”) |
| Tooltip | N = COUNTD([event_id]) |
| Filter | signal_type = desired signal |
→ Add a reference band for 0-return and optional confidence intervals.
2. CI Width vs Volume Percentile
If you precompute bootstrapped CI width for each horizon:
| Role | Field |
|---|---|
| Columns | vol_rel_percentile |
| Rows | ci_width |
| Color | t_offset |
| Tooltip | coverage %, N events |
→ Shows where reliability (narrow CI) improves most sharply — the “sweet spot” volume band.
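A minimal pandas sketch of that precompute, reusing the ci_width bootstrap helper and the tidy frame sketched earlier (both are assumptions), with deciles of trigger-bar relative volume standing in for the percentile bins:
import pandas as pd

trigger = tidy.loc[tidy["t_offset"] == 0, ["event_id", "vol_rel"]]
trigger = trigger.rename(columns={"vol_rel": "trigger_vol_rel"})
tidy2 = tidy.merge(trigger, on="event_id")
tidy2["vol_decile"] = pd.qcut(tidy2["trigger_vol_rel"], 10, labels=False, duplicates="drop")

ci_tbl = (tidy2.groupby(["vol_decile", "t_offset"])["return"]
               .apply(lambda x: ci_width(x.to_numpy()))     # bootstrap CI width per (decile, horizon)
               .reset_index(name="ci_width"))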
3. Heatmap: Mean Return by (t_offset × Volume Decile)
| Role | Field |
|---|---|
| Columns | t_offset |
| Rows | vol_rel_decile |
| Color | AVG([return]) |
| Tooltip | mean ± std, N events |
→ Reveals how return asymmetry or momentum persists differently under low/high liquidity regimes.
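The same matrix as a pandas pivot, feeding the heatmap (tidy2 and vol_decile as assumed above):
heat = tidy2.pivot_table(index="vol_decile", columns="t_offset",
                         values="return", aggfunc="mean")   # mean return per (decile, offset) cell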
4. Distribution Comparison
Goal: Visualize volatility compression after filtering.
| Role | Field |
|---|---|
| Columns | sample_group |
| Rows | return (histogram / boxplot) |
| Tooltip | median, IQR, N |
→ Confirms the variance reduction effect quantitatively.
5. Coverage Curve
| Role | Field |
|---|---|
| Columns | volume_threshold (as % of rolling median) |
| Rows | coverage = N_filtered / N_total |
| Secondary Axis | ci_width |
| Dual Axis Type | Line / Line |
| Color | metric (“Coverage” vs “CI Width”) |
→ The intersection (“elbow”) shows the optimal filtering level.
🧮 Derived Calculations (in Tableau)
| Field | Formula | Purpose |
|---|---|---|
| vol_rel | SUM([volume]) / WINDOW_MEDIAN(SUM([volume])) | Relative liquidity |
| coverage | COUNTD(IF [cond_liquid] THEN [event_id] END) / COUNTD([event_id]) | % retained after filter |
| return_norm | [return] / { FIXED : STDEV([return]) } | Optional normalization |
| ci_width | WINDOW_PERCENTILE(AVG([return]), 0.975) - WINDOW_PERCENTILE(AVG([return]), 0.025) | 95% span across events (use the precomputed bootstrap CI for the mean) |